Notes
| Via author Matthew Clark: I have also decided not to include any special information in the vector about the order of the Hilbert curve, as the order is recoverable from the number of values already in the array (4^n, where n is the order). An implication of this is that there is a maximum order of Hilbert curve usable for a discrete image of pixels: once the number of points in the curve exceeds the number of pixels in the image, the code will not work properly. I don't believe this is an issue, since having more points in the Hilbert curve than pixels in the image does not add any useful information to the curve. If desired, it would be possible to edit the code to automatically choose the highest possible order for a given image instead of selecting it manually within the script. However, there may be some use for a "lower-fidelity," lower-order curve that I have not thought of, in which case that particular change may be too limiting.

These are my main findings concerning loss of information from the image when applying the Hilbert curve with the code I wrote previously. Since all of the data in the image is stored as a scalar field of intensity values over discrete coordinates within a square range (even if the "square" is in fact a rectangle in which one dimension has been altered), loss of information is caused solely by skipping pixels in the image while processing. This occurs when the "scale factor" I define in the code to overlay the Hilbert curve onto the image exceeds 1. I use the scale factor in part to decide which pixels are recorded in the Hilbert curve; it is defined as the ratio of the square image's pixel dimension to the dimension of the generated Hilbert curve. If the two are equal, then there will be the same number of points in the Hilbert curve as there are pixels in the image, so each point will correspond to one unique pixel.
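As a concrete sketch of the overlay scheme just described: the snippet below generates an order-8 curve, scales its coordinates by the ratio of image side to curve side, and rounds to pixel indices. This is a reconstruction under my own assumptions, not the author's script; `d2xy` is the classic bit-manipulation distance-to-coordinate conversion for the Hilbert curve, which may differ from the routine the author used, and 0-based pixel indices are assumed.

```python
def d2xy(order, d):
    """Standard Hilbert distance-to-(x, y) conversion on a
    2**order x 2**order grid (classic bit-twiddling algorithm;
    not necessarily the routine in the author's script)."""
    n = 2 ** order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:            # rotate the quadrant when needed
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

order = 8                  # curve order, chosen manually as in the notes
side = 300                 # square image dimension from the example
scale = side / 2 ** order  # scale factor: image side / curve side

# Scale every curve point up to image coordinates and round to the
# nearest (0-based) pixel index; with scale > 1 this undersamples.
pixels = [(round(x * scale), round(y * scale))
          for x, y in (d2xy(order, d) for d in range(4 ** order))]
```

For the 300x300 example, every curve point lands on a distinct pixel, so the 65,536 points cover about 73% of the 90,000 pixels.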
In the case where the selected order of the Hilbert curve has fewer points than the image has pixels, not all pixels will be sampled; I call this the undersampled case. For instance, for a 300x300 picture, a Hilbert curve of order 8 has 65,536 points to the image's 90,000 pixels. The order-8 curve spans a 256x256 grid, so the scale factor is 300/256 ≈ 1.17 (its square, 90,000/65,536 ≈ 1.37, is the ratio of pixels to curve points). All of the coordinates in the Hilbert curve are increased in magnitude by 1.17 times. This makes most of them non-integers, so these points cannot index a pixel, as pixel indices are all whole numbers. My code handles this by rounding each point to the nearest whole number. This slightly distorts the shape of the Hilbert curve, as the lines are no longer of equal length, but that is not actually a concern, because the change in length is by definition smaller than the resolution of the image. However, the rounding does cause the undersampling problem, as the Hilbert curve points progress through the image in steps of length 1.17 along each axis:

Original:    1    2    3    4    5    ...
Scf applied: 1.17 2.34 3.52 4.69 5.86 ...
Pixel index: 1    2    4    5    6    ...

In this simplified one-dimensional example, pixel 3 is not sampled (continuing the sequence, pixel 10 is skipped next). Since the fraction of pixels not recorded is the fraction of the data lost, the fraction of data recorded is the inverse of the scale factor along each axis; across both axes it is the inverse of the scale factor squared, here 1/1.37, or about 73%. Note that it is never necessary in my method to have a scale factor of 2 or larger: if the scale factor were 2 or larger, the geometry of the problem is such that a Hilbert curve one order higher would still fit within the image and record more of the pixels. Thus there is no practical reason to have a scale factor above 2, so in my method, looking at the undersampling case only, at most half of the pixel positions along each axis can be skipped, which compounds to a maximum loss of just under 75% of the data overall.
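The stepping-and-rounding behavior can be checked in one dimension. A minimal sketch, assuming the per-axis scale factor is the ratio of side lengths, 300/256 ≈ 1.17 (its square, ≈ 1.37, is the ratio of pixel count to curve point count); indices are 1-based as in the table, and the variable names are mine, not the author's:

```python
side, curve_side = 300, 256  # image side and order-8 curve side
scale = side / curve_side    # per-axis scale factor, about 1.17

# Scale each 1-based curve index up to the image and round it,
# then collect the pixels that are never reached.
sampled = {round(i * scale) for i in range(1, curve_side + 1)}
skipped = sorted(set(range(1, side + 1)) - sampled)
```

Running this, 256 of the 300 positions are sampled (a fraction of exactly 1/scale per axis), and the skipped pixels begin 3, 10, 17, ... at a stride of roughly one in seven.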
The second case is that of oversampling. This is something I was not able to do before, but the issue in the code has been fixed: my example image was 667x666 pixels, which is not square, and this caused an out-of-bounds error, because there is no 667th pixel along the shorter dimension for the code to sample when it assumes the image is square. Simply ignoring the extra row, or stretching the image to make up for the missing row before processing, are both feasible ways of avoiding the mistake I initially made.

In the case of oversampling, the selected Hilbert curve order gives the curve more points than the image has pixels, so the scale factor will be less than one. Considering the 300x300 pixel image again, now with a Hilbert curve of order 9, which has 262,144 points on a 512x512 grid, the scale factor is 300/512 ≈ 0.59 (its square, 90,000/262,144 ≈ 0.34, is the ratio of pixels to curve points). Again, most of the scaled pixel indices will not be integers, and they are rounded accordingly:

Original:    1    2    3    4    5    ...
Scf applied: 0.59 1.17 1.76 2.34 2.93 ...
Pixel index: 1    1    2    2    3    ...

In this case, pixels 1, 2, and 3 are all sampled with no gaps, but pixels 1 and 2 are each sampled more than once. In the oversampling case, therefore, each point on the Hilbert curve will not necessarily refer to a unique pixel (though locality will still be preserved by the nature of the Hilbert curve itself, which the one-dimensional example above does not demonstrate). In addition to the Hilbert point indices no longer indexing unique pixels, oversampling adds inefficiency by storing the intensity value of a single pixel multiple times. And since not all pixels are sampled the same number of times, this may slightly increase the complexity of the method required to transform the curve back into the original image (the line method I employed in the code would no longer work, since one pixel would appear to be 'lengthened').
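The oversampling case can be sketched the same way, assuming the per-axis scale factor is the side-length ratio 300/512 ≈ 0.59 (whose square is the ≈ 0.34 pixel-to-point count ratio); again the names are mine, not the author's:

```python
from collections import Counter

side, curve_side = 300, 512  # image side and order-9 curve side
scale = side / curve_side    # per-axis scale factor, about 0.59

# With scale < 1 the rounded indices revisit pixels rather than skip them.
indices = [round(i * scale) for i in range(1, curve_side + 1)]
hits = Counter(indices)      # how many curve indices land on each pixel
```

Every pixel position is reached at least once, and because the stride exceeds half a pixel, none is reached more than twice; the 512 - 300 = 212 duplicates are the redundant samples described above.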
This, however, is not an enormous challenge, as locality ensures that it is not overly difficult to identify which points are repeated.

The final, and most advantageous, case is equal sampling, wherein the number of points on the Hilbert curve and the number of pixels in the image are identical. In this case the scale factor is of course 1, and the Hilbert point coordinates correspond perfectly and uniquely to every individual pixel in the image; there is no loss or redundancy. Because of its clear advantage over undersampling and oversampling, I considered completely redoing the design of the program: from the current scheme, where a Hilbert curve is scaled up to the size of a square image (or of an image that has been transformed into a square), to a scheme where the pixels of a square image of 4^n pixels are simply laid over a Hilbert curve of order n to store them. I ultimately decided against this because I think it unnecessarily limits the ability of the code as it already exists; simply submitting an image of the proper 4^n pixel size through the current code will produce the same output, and I do not think the increase in computational time would be an issue. Furthermore, abandoning the current framework would restrict the file sizes and shapes that could be used with the code, and seeing as the optimal equal-sampling functionality exists in the present code, I see no reason to jettison the capacity for under- or oversampling if that is the user's choice. |
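Should the automatic order selection floated in the notes ever be wanted, it is a one-line computation. `best_order` below is a hypothetical helper, not part of the original script: it picks the largest order whose curve grid still fits inside the image side, which keeps the scale factor in [1, 2) and yields the equal-sampling case exactly when the side is a power of two.

```python
import math

def best_order(side):
    """Largest Hilbert curve order whose 2**order grid fits within the
    image side, keeping the scale factor side / 2**order in [1, 2).
    Hypothetical helper; not part of the author's code."""
    return int(math.log2(side))
```

For example, a 300-pixel side gets order 8 (the undersampled case above), while a 256-pixel side also gets order 8 with 4^8 = 256 x 256 points, i.e. exactly one curve point per pixel.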