Finished Sync Scan Caching

15 February 2002

First a little refresh...

As we described in the last WIP update, once a protection has been reverse engineered and the information obtained has been fed into the analyser using our “disk structure description language”, any other game that uses this protection is automatically detected. A disk with an unknown protection is also easily visible.

This detection of known protection is one of a few tasks that the analyser does for each disk image and is completely automated. In this update, these tasks will collectively be referred to as “analysing” or “the analysation process”. The development effort that was required to do it this way was admittedly rather huge, but it will be worth its weight in gold every time we have a game that is automatically recognised.

The slowest task that the analyser performs so far is sync scanning, but bear in mind that the actual analysation part is not written yet.

Sync

See here to understand what a sync is.

Sync Sets

Different formats tend to use different styles of syncing that vary in different ways:

  • Lead clocking
  • Amount of sync
  • Value of sync
  • Various syncs grouped together, commencing with data
  • etc...

We define all these parameters in terms of a “sync set”. This information is used to filter the scan result, which leaves a very small number of spots on a track (sometimes the exact positions, sometimes a few more) where valid data is likely to be found.

Sync Scanning

Remember from the last WIP that the track (format) is made up of a varying number (one or more) of blocks. Sync scanning is basically the process of finding the sync values on a track, and thus finding the block areas.

Finding syncs normally produces a high volume of data, especially on formats where the encoding itself produces sync values. Normally there are a few hundred hits, but in extreme cases there can be tens of thousands. Therefore filtering must be applied to the data or “table” produced by the sync scan.

This pre-processing significantly reduces the number of areas on a track where decoding of data could commence. “Sync scanning” refers to the scanning and the filtering algorithms together.
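The two stages described above (scan, then filter) can be sketched roughly as follows. This is only an illustration, not the analyser's actual code: the track is assumed to be a string of '0'/'1' characters for readability, the names find_syncs and filter_runs are hypothetical, the value 0x4489 is used purely as an example sync, and the only filter rule shown is "amount of sync".

```python
from typing import List


def find_syncs(bits: str, sync: str) -> List[int]:
    """Scan stage: return every bit offset where the sync pattern occurs."""
    hits = []
    pos = bits.find(sync)
    while pos != -1:
        hits.append(pos)
        pos = bits.find(sync, pos + 1)  # step by one bit so overlaps are seen
    return hits


def filter_runs(hits: List[int], sync_len: int, min_count: int) -> List[int]:
    """Filter stage: keep the start of each run of >= min_count adjacent syncs."""
    hit_set = set(hits)
    starts = []
    for pos in hits:
        if pos - sync_len in hit_set:
            continue  # this hit is in the middle of a run, not its start
        count = 1
        while pos + count * sync_len in hit_set:
            count += 1
        if count >= min_count:
            starts.append(pos)
    return starts


# Example sync value (an assumption for this sketch), as a 16-bit pattern.
SYNC = format(0x4489, "016b")
FILLER = "10" * 8  # 16 bits of plain clocking-like filler

# Two syncs in a row at bit 16, then a lone sync at bit 64.
track = FILLER + SYNC + SYNC + FILLER + SYNC + FILLER

all_hits = find_syncs(track, SYNC)                # [16, 32, 64]
block_starts = filter_runs(all_hits, len(SYNC), 2)  # [16]
```

With a requirement of two syncs in a row, the lone hit at bit 64 is dropped and only the spot at bit 16 remains as a candidate for decoding.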

Brute Force Method

There are a few cases where the sync scanning technique does not work, specifically where syncs are just normal data values. For these cases a brute force method is used to find data areas. It is based on finding encoded data and the clocking (or gap) that precedes the data area. This method is necessary to catch the data/sync properly in such cases and can find pretty much everything on a track. It does take a lot more time, since the scanning produces a large number of possible spots for the filtering.

The Problem

This task as a whole is unavoidable. The problem is that it takes a few seconds for each known block type (i.e. every different block structure encountered), so the analyser will quickly become rather unusable, especially once it contains a few hundred block descriptors.

Imagine this:

168 tracks
  x hundreds of scans
  x multiple blocks per track
  x a few seconds each block
--------------------------
  = ...
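To put rough numbers on the product above, here is a small worked estimate. Only the 168 tracks come from the text; the other three figures are assumptions picked from the low end of "hundreds", "multiple" and "a few", and the function name is hypothetical.

```python
def estimate_hours(tracks: int, scans: int, blocks: int, seconds: float) -> float:
    """Multiply out the four factors and convert seconds to hours."""
    return tracks * scans * blocks * seconds / 3600


# 168 tracks is from the text; 100 scans, 2 blocks and 3 seconds are assumed.
hours = estimate_hours(tracks=168, scans=100, blocks=2, seconds=3)
# 168 * 100 * 2 * 3 = 100800 seconds = 28 hours
```

Even with these conservative guesses the total is more than a day per disk image.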

You can do the maths: it would easily take hours to analyse each disk image for just this task. With this in mind, a caching system was developed this week that enables the analyser to decide whether a “sync set” has already been applied to the raw data, so that the result can be re-used. Re-use is possible if the result of a previous scan contains at least the required data, or more.

This is now all implemented and working; scans are only done when unavoidable. This reduces the few hundred complete scans per track to about the number of predefined valid sync sets (10-20 at most). The number is not exactly the same, since a data descriptor may contain syncs that are out of range for the sync set; in that case the analyser automatically creates a temporary sync set and applies that to the data instead of the preset one.
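The caching idea can be sketched as below. This is a minimal illustration under stated assumptions, not the analyser's real design: a sync set is modelled as a set of sync values plus a permitted count range, a descriptor's needs are expressed in the same form, and SyncSet, ScanCache, choose_sync_set and fake_scan are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class SyncSet:
    """Assumed model: which sync values to find and how many in a row."""
    values: FrozenSet[int]
    min_count: int
    max_count: int

    def covers(self, other: "SyncSet") -> bool:
        """True if a scan with self holds at least what other would need."""
        return (other.values <= self.values
                and self.min_count <= other.min_count
                and other.max_count <= self.max_count)


def choose_sync_set(needed: SyncSet, presets: List[SyncSet]) -> SyncSet:
    """Use a preset if one covers the descriptor, else a temporary set."""
    for preset in presets:
        if preset.covers(needed):
            return preset
    return needed  # out of range for every preset: temporary sync set


class ScanCache:
    """Remembers scan results per track and re-uses them where possible."""

    def __init__(self, scan: Callable[[int, SyncSet], List[int]]):
        self._scan = scan
        self._results: Dict[int, List[Tuple[SyncSet, List[int]]]] = {}
        self.scans_done = 0

    def lookup(self, track: int, wanted: SyncSet) -> List[int]:
        for cached_set, hits in self._results.get(track, []):
            if cached_set.covers(wanted):
                return hits  # previous scan has the required data, or more
        hits = self._scan(track, wanted)  # unavoidable: do a real scan
        self.scans_done += 1
        self._results.setdefault(track, []).append((wanted, hits))
        return hits


def fake_scan(track: int, sync_set: SyncSet) -> List[int]:
    """Stand-in for the slow scan; returns a dummy hit table."""
    return [track]


presets = [SyncSet(frozenset({0x4489}), 1, 4)]
cache = ScanCache(fake_scan)
descriptors = [
    SyncSet(frozenset({0x4489}), 2, 2),  # fits the preset: first real scan
    SyncSet(frozenset({0x4489}), 1, 3),  # fits the preset: re-used
    SyncSet(frozenset({0x4489}), 2, 8),  # out of range: temporary set, scan
    SyncSet(frozenset({0x4489}), 2, 8),  # same temporary set: re-used
]
for needed in descriptors:
    cache.lookup(0, choose_sync_set(needed, presets))
# cache.scans_done is now 2, although four descriptors were processed
```

In this sketch a re-used result may contain more hits than the descriptor needs, so the caller is expected to filter it further.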

Next comes search mask creation and data matching. Everything else works: the analyser goes as far as taking all the steps required for block analysing, but does not actually do it yet.