Thanks to the unbelievable simplicity of the VCF algorithm, countless improvements, added features and alternative implementations can be proposed.
Here is a non-exhaustive list of the most notable proposals we have.
Improving the test implementation
The test implementation has exposed a weakness in the algorithm: the strong cell-boundary artefacts are far too visible and waste compression resources. Fortunately, an easy solution has been proposed: instead of building "flat" cells in which all pixels have the same value, we should build a gradient across the cell by interpolating the cell value with the values of its 8 neighbours.
This improvement does not require adding any parameter to the cell: changing the way a cell is rebuilt from its single value is enough to make the cell boundaries invisible. At the same time, it will increase the compression rate, because the VCF algorithm will no longer need to waste transitions compensating for the artefacts.
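The text does not specify the interpolation formula. A minimal sketch, assuming a uniform grid of equal-size cells (the real VCF cell tree has variable sizes) and bilinear interpolation between cell centres, shows how each pixel blends its own cell value with those of the neighbouring cells. The function name `rebuild_interpolated` is hypothetical.

```python
import math

def rebuild_interpolated(values, cell_size):
    """Rebuild an image from one value per cell (hypothetical uniform grid),
    blending each pixel with the neighbouring cells by bilinear interpolation
    between cell centres instead of painting every cell flat."""
    rows, cols = len(values), len(values[0])
    height, width = rows * cell_size, cols * cell_size
    image = [[0.0] * width for _ in range(height)]
    for y in range(height):
        # Pixel centre in "cell-centre" coordinates, clamped so that the outer
        # half-cell along each image border takes a single cell's value.
        gy = min(max((y + 0.5) / cell_size - 0.5, 0.0), rows - 1)
        r0 = int(math.floor(gy))
        r1 = min(r0 + 1, rows - 1)
        fy = gy - r0
        for x in range(width):
            gx = min(max((x + 0.5) / cell_size - 0.5, 0.0), cols - 1)
            c0 = int(math.floor(gx))
            c1 = min(c0 + 1, cols - 1)
            fx = gx - c0
            top = values[r0][c0] * (1 - fx) + values[r0][c1] * fx
            bottom = values[r1][c0] * (1 - fx) + values[r1][c1] * fx
            image[y][x] = top * (1 - fy) + bottom * fy
    return image
```

With `values = [[0, 100]]` and 4-pixel cells, the first row ramps 0, 0, 12.5, 37.5, 62.5, 87.5, 100, 100 instead of jumping from 0 to 100 at the cell boundary.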
However, we feel that flat cells might also be needed. Flat areas of an image might well be coded more efficiently with flat cells, and we don't want the VCF algorithm to waste transition resources fighting the generated gradients.
More experiments are needed on this subject, to make sure this is really required and to decide how to encode the information. The proposed approach is to add, in a dedicated table, one bit per top-level cell to decide whether that cell's tree will be interpolated or flat. We might also want to add a flag in the header to indicate whether this table is present in the image.
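The dedicated table could simply be a bit array. A sketch, assuming 1 means interpolated and 0 means flat, bits packed most-significant first, and a one-byte header flag whose value 0 means the table is absent and every tree uses the default (interpolated) mode. The byte layout and all function names are hypothetical.

```python
def pack_flags(flags):
    """Pack one bit per top-level cell (1 = interpolated, 0 = flat), MSB first."""
    data = bytearray((len(flags) + 7) // 8)
    for i, interpolated in enumerate(flags):
        if interpolated:
            data[i // 8] |= 0x80 >> (i % 8)
    return bytes(data)

def unpack_flags(data, count):
    """Read back `count` flags from the packed table."""
    return [bool(data[i // 8] & (0x80 >> (i % 8))) for i in range(count)]

def encode_mode_table(flags):
    """Hypothetical header byte + table: the table is omitted entirely
    when every tree uses the default interpolated mode."""
    if all(flags):
        return bytes([0])                 # header flag 0: no table present
    return bytes([1]) + pack_flags(flags)

def decode_mode_table(blob, count):
    if blob[0] == 0:
        return [True] * count             # default: every tree interpolated
    return unpack_flags(blob[1:], count)
```

Nine trees need two bytes of table, while an image whose trees are all interpolated pays only for the header flag.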
Multiple depth support
The tests have also shown the need to handle pixel depths other than 8 bits. The test was done with 1-bit images, but a much larger set of depths is actually useful. We propose to handle depths of 1, 2, 4, 6, 8, 12, 14, 16, 24 and 32 bits per pixel, either in grayscale (at least for 1 to 16 bits) or in color (RGB or RGBA, at least for 8 to 32 bits). For instance, grayscale images with 1 to 4 bits per pixel can be used for facsimile storage and transmission, while those with 8 to 16 bits per pixel might be used for medical images.
One idea worth testing is how useful it would be to store the pixel values at a lower depth (say 4 bits for an 8-bit image, possibly encoded with a mu-law curve for more dynamic range), while still building the gradients at the screen's depth.
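A sketch of the lower-depth idea, assuming pixel values are companded with the classic mu-law curve (mu = 255, borrowed from telephony, not from the text) before being quantized to 16 levels. The decoder expands each 4-bit code back to 8 bits and would build its gradients at that depth. Function names are hypothetical.

```python
import math

MU = 255.0  # companding strength: an assumption, 255 is the telephony value

def compress_8_to_4(value):
    """Map an 8-bit pixel (0..255) to a 4-bit code (0..15) through mu-law,
    which spends more of the 16 codes on dark values."""
    x = value / 255.0
    y = math.log1p(MU * x) / math.log1p(MU)
    return min(15, int(round(y * 15)))

def expand_4_to_8(code):
    """Inverse mapping: 4-bit code back to an 8-bit pixel value."""
    y = code / 15.0
    x = math.expm1(y * math.log1p(MU)) / MU
    return int(round(x * 255))
```

With this curve a dark value of 10 comes back as 8, whereas plain linear 4-bit quantization (steps of 17) would return 17; bright values pay for this with coarser steps.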
To become a real image format, VCF must of course handle color images. We propose three implementations for that, and further tests will be made to check that all of them are needed.
The naive way to handle color images is to treat 16, 24 or 32 bits per pixel as a single pixel value, with only the mean calculation and the interpolation functions being aware of the presence of 3 (RGB, CMY or YUV) or 4 (RGBA or CMYK) fields in the pixel value. This is easy to implement and is likely to give very good results.
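A sketch of the naive approach for 24-bit pixels, assuming a 0xRRGGBB packing (the field order is an assumption): the cell value stays a single integer, but the mean is computed field by field. Function names are hypothetical.

```python
def unpack_rgb(pixel):
    return (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF

def pack_rgb(r, g, b):
    return (r << 16) | (g << 8) | b

def mean_rgb(pixels):
    """Cell value of packed 0xRRGGBB pixels: the mean is taken per field,
    never on the packed integer itself, which would mix the channels."""
    n = len(pixels)
    sums = [0, 0, 0]
    for p in pixels:
        for i, v in enumerate(unpack_rgb(p)):
            sums[i] += v
    return pack_rgb(*(round(s / n) for s in sums))
```

Averaging a pure red and a pure blue pixel gives 0x800080; averaging the packed integers directly would give 0x7F807F, leaking a spurious green component. The same idea extends to a fourth field for RGBA or CMYK.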
The second proposal is to treat each color and alpha channel as a separate image. To take advantage of the human eye's different sensitivity to different colors (as 4:2:2 and 4:1:1 encodings do), we might want to use a different depth for each channel. For instance, we might encode R and B with 5 bits and G with 6 bits. This way, the R and B channels will have their pertinences naturally scaled down by a factor of 4 relative to channel G, while still keeping the high-pertinence details.
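A sketch of the per-channel split with the 5/6/5 depths mentioned above, assuming 8-bit source channels packed as 0xRRGGBB; each channel list would then be compressed as its own image. Expanding back to 8 bits by bit replication is a common choice, not something the text specifies, and the function names are hypothetical.

```python
def split_channels(pixels, bits=(5, 6, 5)):
    """Split packed 0xRRGGBB pixels into three separate channel images,
    each truncated to its own depth (here R and B on 5 bits, G on 6)."""
    channels = ([], [], [])
    for p in pixels:
        for i, shift in enumerate((16, 8, 0)):
            value8 = (p >> shift) & 0xFF
            channels[i].append(value8 >> (8 - bits[i]))
    return channels

def merge_channels(channels, bits=(5, 6, 5)):
    """Expand each channel back to 8 bits by bit replication and repack.
    Assumes every depth is between 4 and 8 bits."""
    out = []
    for r, g, b in zip(*channels):
        vals = [(v << (8 - n)) | (v >> (2 * n - 8))
                for v, n in zip((r, g, b), bits)]
        out.append((vals[0] << 16) | (vals[1] << 8) | vals[2])
    return out
```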
This proposal also allows encoding channels that do not map to the usual chromaticity components. For instance, we could compress so-called "sepia" images with only two channels: one for luminance and one for the sepia chroma. The drawback is the need for one transition table per channel. But image quality might be improved by separating the channels, and some tests are needed to decide whether or not to support this feature.
The last proposition we have for supporting color images has never been proposed for lossy compression of images, but, thanks to the simplicity of the VCF algorithm, we can easily test unusual features.
We can try to compress images with indexed colors (using a look-up table). For that, flat cells are probably the only way to go, since interpolating between palette indices would produce unrelated colors, and the cell value calculation has to be changed from the mean of the pixel values to the majority (most frequent) pixel value. The reconstruction uses the same algorithm. As usual, more tests are needed for that feature.
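For indexed images the mean is meaningless: averaging palette indices 0 and 2 yields index 1, which may be an unrelated color. A minimal sketch of the majority rule using Python's `collections.Counter`; the tie-breaking rule (first index encountered wins) is an assumption the text leaves open, and the function name is hypothetical.

```python
from collections import Counter

def majority_index(cell_pixels):
    """Cell value for an indexed-color image: the most frequent palette index.
    On a tie, Counter.most_common keeps the index encountered first."""
    return Counter(cell_pixels).most_common(1)[0][0]
```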
More control over the compression
Since compression and decompression are comparable to a memory-to-memory copy, an image viewer or an image retouching application could use the VCF encoding as a memory cache for large images, painting the needed image areas by decompressing the relevant top-level cells on the fly. An advanced compression utility could even let the user increase the pertinence of some areas (such as text in the image) to improve image quality, or decrease it in others (such as the borders of the image or the background behind the subject) to increase the compression rate.
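A sketch of the viewer-cache lookup, assuming a regular grid of square top-level cells of `cell_size` pixels; the function name and parameters are hypothetical. Only the trees it returns would need to be decompressed to repaint the viewport.

```python
def cells_for_viewport(x, y, width, height, cell_size, grid_cols, grid_rows):
    """Return (col, row) of every top-level cell overlapping the viewport
    rectangle (x, y, width, height), clipped to the image grid."""
    first_col = max(0, x // cell_size)
    first_row = max(0, y // cell_size)
    last_col = min(grid_cols - 1, (x + width - 1) // cell_size)
    last_row = min(grid_rows - 1, (y + height - 1) // cell_size)
    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]
```

For a 64x64 viewport at (100, 50) over 64-pixel cells, only four trees are touched: (1, 0), (2, 0), (1, 1) and (2, 1).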
This feature could be paired with the ability to request a given size for the compressed image: if a user wants the image to be, say, exactly 16 KB, the VCF algorithm can add more and more details up to the desired size, by progressively lowering the pertinence threshold.
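A greedy sketch of the size-targeted mode, under a simplified, hypothetical cost model in which every candidate transition has a known pertinence and an independent cost in bits. In a real VCF tree a child transition also depends on its parent being kept, which this sketch ignores.

```python
def select_for_budget(transitions, budget_bytes):
    """Add transitions from the most to the least pertinent (i.e. keep
    lowering the pertinence threshold) until the next one would exceed
    the size budget.

    transitions: list of (pertinence, cost_in_bits) pairs (hypothetical).
    Returns the kept transitions, the final threshold and the bits used."""
    budget_bits = budget_bytes * 8
    used = 0
    kept = []
    for pertinence, cost in sorted(transitions, key=lambda t: t[0], reverse=True):
        if used + cost > budget_bits:
            break  # the threshold cannot go lower without overshooting
        used += cost
        kept.append((pertinence, cost))
    threshold = kept[-1][0] if kept else None
    return kept, threshold, used
```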
Movies are just a sequence of images. We believe that the VCF algorithm can be adapted to perform very well in this area too. Because fast movements produce blurry images, the human eye can only perceive a limited amount of new detail in each frame. By spreading the addition of details over several frames, we can achieve high compression rates while keeping images sharp: the image will eventually become sharp if it stays still long enough for the eye to catch the details.
For that, we cannot simply compress every image independently: we have to take advantage of the fact that the next image is likely to look almost the same as the current one. We propose to use a flag for each top-level cell (all flags stored in yet another table) to decide whether this tree will use plain pixel values, as for regular images, or a reference to the same tree in the preceding image.
Preliminary tests we have made show that it would be better to use two versions of the current image: the regular one, and a slightly blurred one. Each tree decides which one to use. This is needed to avoid wasting transitions to "erase" details.
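A sketch of how a decoder might pick the starting point of each top-level tree in a movie frame, assuming the per-tree flag is widened to three hypothetical modes (intra, previous frame, blurred previous frame). The text does not say how the blurred version is made; a 3x3 box blur stands in for it here.

```python
from enum import Enum

class TreeSource(Enum):
    """Hypothetical per-tree mode, a widened version of the proposed flag."""
    INTRA = 0          # plain pixel values, as in a still image
    PREVIOUS = 1       # same area of the preceding frame
    PREVIOUS_BLUR = 2  # same area of a slightly blurred preceding frame

def box_blur(img):
    """3x3 box blur with edge clamping, standing in for 'slightly blurred'."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += img[yy][xx]
            out[y][x] = acc / 9.0
    return out

def reference_block(mode, prev, prev_blurred, x0, y0, size, flat_value=0):
    """Starting block of one top-level tree, before its own transitions add
    or correct details. INTRA starts from the tree's flat root value."""
    if mode is TreeSource.INTRA:
        return [[flat_value] * size for _ in range(size)]
    src = prev if mode is TreeSource.PREVIOUS else prev_blurred
    return [row[x0:x0 + size] for row in src[y0:y0 + size]]
```

The blurred reference gives a tree a starting point that already lacks the old fine details, so no transitions are spent removing them.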
Another improvement would probably be motion compensation: a tree could reference a slightly shifted area, the size of a top-level cell, in one of the reference images. The shift only needs to be smaller than the top-level cell size, both horizontally and vertically.
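A sketch of the motion-compensated reference, assuming reference frames are plain 2D lists and that pixels falling outside the frame are clamped to the nearest edge pixel (the text does not say how borders are handled). The function name is hypothetical.

```python
def shifted_reference(ref, x0, y0, dx, dy, size):
    """Copy a size x size block from reference frame `ref`, displaced by
    (dx, dy) from the tree's own position (x0, y0). The shift must be
    smaller than the top-level cell size in each direction; out-of-frame
    pixels are clamped to the nearest edge pixel."""
    if abs(dx) >= size or abs(dy) >= size:
        raise ValueError("shift must be smaller than the top-level cell size")
    h, w = len(ref), len(ref[0])
    block = []
    for y in range(y0 + dy, y0 + dy + size):
        yy = min(max(y, 0), h - 1)
        block.append([ref[yy][min(max(x, 0), w - 1)]
                      for x in range(x0 + dx, x0 + dx + size)])
    return block
```

Limiting the shift keeps it cheap to store: each component lies in -(size-1)..size-1, so with 16-pixel top-level cells (an illustrative size, not from the text) it takes 31 values and fits in 5 bits.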
In the same way that different channels can be encoded as different images, we can embed several layers in the same compressed file. Layers are not only useful for image retouching: viewer applications could also take advantage of such a feature, much like subtitles in a movie. For instance, schematics could be displayed very conveniently in a viewer that can hide and show different layers. One can also imagine a medical viewer in which anatomical images could be explored while showing and hiding sets of organs.
Some of these layers could contain non-graphical data chunks. We could use them to store metadata related to the image, such as copyright information, EXIF data, etc.
The VCF principle of transitions can be applied not only to images (2D), but to data with any number of dimensions.
For instance, one might want to compress 1D data, such as sound, using a transition from 1 cell to 2 half-size cells. Of course, the cell value would probably not be calculated as the mean of the samples, and the cell reconstruction would be neither flat nor interpolated. To be efficient, such a sound compression scheme would likely need some kind of time-frequency representation.
On the other hand, compressing 3D data sets (such as medical ultra-sound images or mechanical models) would use most of the concepts present in this paper. The transitions would be 1 cell to 8 smaller cells, and the flat and interpolated values still apply. A viewer application would build a 2D section and projection of the data set. The user would be allowed to rotate and move the object in the viewer, move the section, and it would be possible to show and hide layers to make some parts visible or not.
There is no reason to stop at the third dimension. A set of n-dimensional data could be displayed with exactly the same viewer: after all, an nD-to-2D projection is not much more complicated than a 3D-to-2D projection, and the same goes for interpolation. Transitions will be from 1 cell to 2^n cells, and flat or interpolated cells will be built. Statistical visualisation could benefit from such a viewer using flat cells, possibly with the indexed color encoding, and layers.
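A sketch of the generic subdivision rule, assuming cells are hypercubes with a power-of-two edge length, identified by their origin corner: enumerating every 0/1 offset per axis with `itertools.product` yields exactly 2^n children, i.e. 2 in 1D, 4 in 2D and 8 in 3D. The function name is hypothetical.

```python
from itertools import product

def split_cell(origin, size):
    """Return the origin corners of the 2**n half-size children of an
    n-dimensional cell with corner `origin` and edge length `size`."""
    half = size // 2
    return [tuple(o + b * half for o, b in zip(origin, bits))
            for bits in product((0, 1), repeat=len(origin))]
```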
With a different projection formula, we can also display 2D data in a panoramic fashion, either seen from outside (for an object) or seen from inside (for a room or a landscape).
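The text does not name a projection. One common choice for the inside view is the equirectangular mapping, sketched here under that assumption: a viewing direction (yaw, pitch) is mapped to a pixel of the 2D image. The function name and conventions (angles in radians, top row straight up) are hypothetical.

```python
import math

def panorama_lookup(yaw, pitch, pano_width, pano_height):
    """Map a viewing direction, seen from inside the scene, to a pixel of an
    equirectangular panorama. Yaw wraps around the full width; pitch runs
    from +pi/2 (top row, straight up) to -pi/2 (bottom row)."""
    u = ((yaw + math.pi) / (2 * math.pi)) % 1.0   # 0..1 across the width, wrapping
    v = (math.pi / 2 - pitch) / math.pi           # 0..1 from top to bottom
    v = min(max(v, 0.0), 1.0)
    x = min(int(u * pano_width), pano_width - 1)
    y = min(int(v * pano_height), pano_height - 1)
    return x, y
```

A viewer would evaluate this for the direction behind each screen pixel to fill the view.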
One great promise of fractal compression was the storage of multi-resolution images. VCF can do that too! In the test implementation, we chose to leave the transitions unpainted. If we keep the pixel value of the cell even for transitions, we can choose down to which level we want to decompress the image: there is no need to decompress a huge 3000x2000 image and scale it down to display it on a 640x480 screen. The same goes for displaying an 80x60 thumbnail of the image.
Of course, that would bloat the compressed file with pixel values that will seldom be displayed. But not by much, since upper-level cells are far less numerous than pixels. Moreover, we can partly compensate for this by allowing "not-painted non-transitions": if no transition occurs, the cell is painted with the value of the upper level, and no value is stored for that particular cell.
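A sketch of multi-resolution decoding, assuming a quadtree stored as nested `(value, children)` tuples where every cell keeps its painted value and `None` marks a not-painted cell that inherits its parent's value. This layout is hypothetical, and cells are painted flat for brevity. The `level` argument sets how deep decoding goes, so a thumbnail simply stops a few levels above the full image.

```python
def render(node, level, inherited=None):
    """Render a quadtree cell to a (2**level x 2**level) grid.

    node = (value, children): value is None for a not-painted cell, which
    takes its parent's value; children is None (no transition) or a list of
    four nodes: top-left, top-right, bottom-left, bottom-right."""
    value, children = node
    if value is None:
        value = inherited
    size = 2 ** level
    if level == 0 or children is None:
        # Requested resolution reached, or no transition: paint flat.
        return [[value] * size for _ in range(size)]
    quads = [render(child, level - 1, value) for child in children]
    top = [a + b for a, b in zip(quads[0], quads[1])]
    bottom = [a + b for a, b in zip(quads[2], quads[3])]
    return top + bottom
```

Since each level halves both dimensions, stopping two levels early on a 3000x2000 image already gives 750x500, which only needs a light final resize for a 640x480 screen.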
Hidden details and Infinite loops
Since we can have multi-resolution images, we can display any image at the resolution we want without decompressing details smaller than a screen pixel.
We can use that to put details in an image that only become visible when zoomed in. For instance, one could zoom into an image of a person's face and display a very detailed representation of the reflections in one of the person's eyes. Many areas of such an image could hide "easter eggs" to be explored.
Even better: zooming into the eyes of the person would reveal a complete landscape in which the same person is standing. One could continuously zoom from inside the eyes to the landscape, to the person's face, to the eyes... in an infinite loop. Several completely independent infinite zooming loops of this kind could exist in the same image.
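The text does not say how such a loop would be stored. One hypothetical way, sketched below, is to let a cell's children point back to an ancestor cell, turning the tree into a cyclic graph. A zoom path can then be followed forever, yet rendering always terminates because recursion stops once a cell covers a single screen pixel. The `Cell` class and both functions are illustrative names.

```python
class Cell:
    """A cell with a painted value and optional children (top-left,
    top-right, bottom-left, bottom-right). Children may point back to an
    ancestor, which is what makes an infinite zoom loop possible."""
    def __init__(self, value, children=None):
        self.value = value
        self.children = children

def render(cell, size):
    """Render `cell` onto a size x size pixel grid (size a power of two).
    Recursion stops when a cell covers one screen pixel, so even a cyclic
    structure yields a finite image at any zoom level."""
    if size == 1 or cell.children is None:
        return [[cell.value] * size for _ in range(size)]
    q = [render(child, size // 2) for child in cell.children]
    top = [a + b for a, b in zip(q[0], q[1])]
    bottom = [a + b for a, b in zip(q[2], q[3])]
    return top + bottom

def zoom(cell, path):
    """Follow a path of quadrant indices (0..3); in a looping structure the
    path can be arbitrarily long."""
    for quadrant in path:
        if cell.children is None:
            break
        cell = cell.children[quadrant]
    return cell
```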
Last modification: March 2002, © Eric GAUDET, 2002