KryoFlux - Work To Do and Compression

2009-11-06

The board firmware uses an encoding that ensures that any time value measured between flux reversals is faithfully reproduced and not altered in any way. This means there are no values that would be clipped, clamped or ignored.
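
To illustrate the idea (the real wire format differs and is not documented here), a hypothetical encoding along these lines never has to clamp anything: common intervals fit in a single byte, rarer values fall back to a wider form, and an overflow marker extends the range without limit. All codes and names below are made up for this sketch.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical stream codes -- illustrative only, not the real firmware format. */
    #define CODE_OVERFLOW 0x00   /* adds 0x10000 to the value that follows */
    #define CODE_WORD     0x01   /* next two bytes hold a 16-bit sample count */

    /* Emit one flux interval (in sample clocks) without ever clipping it:
       values too large for 16 bits are prefixed by overflow markers. */
    static void emit_interval(FILE *out, uint32_t clocks)
    {
        while (clocks > 0xFFFF) {         /* extend the range with overflow markers */
            fputc(CODE_OVERFLOW, out);
            clocks -= 0x10000;
        }
        if (clocks >= 0x02 && clocks <= 0xFF) {
            fputc((int)clocks, out);      /* common case: one byte per interval */
        } else {
            fputc(CODE_WORD, out);        /* rare case: explicit 16-bit value */
            fputc((int)(clocks >> 8), out);
            fputc((int)(clocks & 0xFF), out);
        }
    }

The important property is that every possible input value has a reversible representation, so nothing is ever lost on the wire.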

Compression of the flux transition stream will be done by the host; the board is simply too weak to do anything meaningful. Currently, there is no compression at all.

Things To Do

  • Current streamed data should be converted to the DRAFT format (see below).
  • Apply some kind of compression (maybe).
  • Create an .ADF/.ST conversion tool.
  • Make our analyser, CTA, read DRAFT files.

The conversion tool will help us see things working early, and will hopefully show up any obvious problems before we do the work of making the analyser read the DRAFT format.

These things have various sub-issues to resolve. For example, converting to the DRAFT format is not as simple as writing the data out in a different way: the data should be fully decoded and re-arranged in a way that is convenient for later processing.
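
The DRAFT layout itself is still being designed, so the following is only a sketch of what "fully decoded and re-arranged" might mean in practice: fixed-width flux values with all escape and overflow codes resolved, grouped per track and per revolution so later stages can index them directly. Every name and limit here is hypothetical.

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical in-memory layout -- the real DRAFT format may differ. */
    typedef struct {
        uint32_t *flux_ns;    /* decoded flux intervals, fixed width, no
                                 escape or overflow codes left to resolve */
        size_t    flux_count;
        size_t   *index_pos;  /* offsets into flux_ns where the index
                                 pulse was seen, one per revolution */
        size_t    rev_count;
    } draft_track;

    typedef struct {
        draft_track tracks[84][2];  /* assumed limits: 84 cylinders, 2 heads */
    } draft_disk;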

What made KryoFlux possible at all is the fact that the firmware creates and sends data in a way that is very convenient and highly optimised for the hardware. This had to be done by thinking outside the box a bit, using some rather unconventional solutions such as deferred processing and a rather inventive signalling system. However, understanding this data is the responsibility of the host software.

Compression

As for compression, we are not currently sure whether there is much point in writing our own compression algorithm. The main argument for doing so is that we might do much better than “blind” compression algorithms such as ZIP and RAR by understanding what the data is supposed to represent.

There is something called delta compression in CT, which might work quite nicely. It was written with the capabilities of a 68020 in mind, so it could probably be made much more sophisticated on a modern desktop. Unfortunately, we will not know for sure until we have something working; the results could simply be disappointing. If it doesn’t work out, we’ll recommend using something like RAR or 7-Zip.
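
How CT’s delta compression works internally is not covered here, but the basic delta idea is simple enough to sketch. Flux intervals on a regular track cluster around a few cell lengths, so the difference between successive values is usually tiny, and tiny values compress far better. The functions below are an illustration, not CT’s code.

    #include <stdint.h>
    #include <stddef.h>

    /* Delta pre-filter: replace each decoded flux value with its difference
       from the previous one. On clean tracks most deltas are near zero,
       which a generic entropy coder can then squeeze much harder.
       Fully reversible: undelta() restores the original values exactly. */
    static void delta(int32_t *v, size_t n)
    {
        int32_t prev = 0;
        for (size_t i = 0; i < n; i++) {
            int32_t cur = v[i];
            v[i] = cur - prev;
            prev = cur;
        }
    }

    static void undelta(int32_t *v, size_t n)
    {
        int32_t acc = 0;
        for (size_t i = 0; i < n; i++) {
            acc += v[i];
            v[i] = acc;
        }
    }

A pre-filter like this shrinks nothing by itself; the gain comes from running a general-purpose compressor over the near-zero deltas afterwards.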

There is currently some minor processing done on the stream. It is not really compression; the data is just encoded in such a way that common values take less space and uncommon values take more.

Without this pre-processing, the original data would be 80 MiB per disk. You could say our pre-processing gives us a 1:2 compression ratio, but really it just re-arranges the data representation slightly, a bit like using a new numeral system.

This data would first have to be converted to 160 MiB by the host to allow representation of all possible values. Ideally, that 160 MiB would then be compressed down to at most 10 MiB, which would be a rather impressive 1:16 lossless codec (although on very special data sets, and so not applicable to general systems). Actually, a ratio more like 1:32 would be better: 5 MiB files.
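
To make that expansion step concrete, here is a decoder for the hypothetical encoding sketched earlier: it turns the variable-width wire bytes back into fixed-width 32-bit values, one per flux reversal, which is precisely the step that grows the data before real compression is applied.

    #include <stdint.h>
    #include <stddef.h>

    #define CODE_OVERFLOW 0x00   /* same hypothetical codes as the earlier sketch */
    #define CODE_WORD     0x01

    /* Decode the toy wire format back into fixed-width 32-bit values.
       Returns the number of flux intervals written to out. */
    static size_t decode_stream(const uint8_t *in, size_t len,
                                uint32_t *out, size_t max_out)
    {
        size_t n = 0, i = 0;
        uint32_t base = 0;
        while (i < len && n < max_out) {
            uint8_t c = in[i++];
            if (c == CODE_OVERFLOW) {            /* extends the next value */
                base += 0x10000;
            } else if (c == CODE_WORD) {         /* explicit 16-bit value */
                if (i + 2 > len)
                    break;
                out[n++] = base + (((uint32_t)in[i] << 8) | in[i + 1]);
                i += 2;
                base = 0;
            } else {                             /* common one-byte value */
                out[n++] = base + c;
                base = 0;
            }
        }
        return n;
    }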

We store a lot of these files, and so it is worth a try. However, this is all just wishful thinking until we have a working algorithm that shows the potential, if there is any. We may just end up with 20 MiB files instead, which RAR can already produce on the current datasets.