To develop and standardize a portable pixel encoding that efficiently represents the full range of colors and luminances perceivable by the human visual system.
We propose the development of a new pixel encoding standard based on human perception rather than conventional computer hardware. This standard is needed by the computer graphics community to preserve information in image rendering, digital photography and image processing. The proposed format is simple and compact, taking no more space than a standard 24-bit/pixel encoding, yet its basis in human perception means that it will not be outmoded by new computer hardware, nor will it compromise important visual information. Our hope is that this new format will be shared and supported by all major sources and consumers of digital imagery, and will be maintained as an open standard by a small industry group.
Most image pixel encodings are based on what a conventional CRT monitor can display, rather than what people are able to see. For example, the CCIR-709 reference primaries adopted by Kodak for their Photo YCC encoding are designed to correspond well with typical monitor phosphors, and the power law used to compress the dynamic range corresponds closely to the typical gamma response curve of a CRT. The resulting 24-bit/pixel encoding can be mapped very easily to a monitor simply by scaling the RGB values to the appropriate range and converting them to gun voltages. However, monitor-oriented encodings have some very important limitations:
Future display technologies may have a much greater dynamic range and wider gamuts than today's monitors, but if we have not preserved this information in our encoded images, we will not be able to take full advantage of these enhanced display capabilities. Since the human eye is unlikely to be upgraded in the immediate future, it is more sensible to represent the full range of what can be seen rather than what can currently be displayed.
Most image manipulation software works in a 24-bit monitor RGB space, which is very limiting since it is neither linear nor logarithmic. Even something as simple as adding two images together or filtering an image to a smaller size cannot be done correctly without reverting to a linear representation. However, converting each primary to a linear, 8-bit integer value would be a mistake, because it would create visible quantization error in shadow regions. Because doing the math correctly would require costly conversion and manipulation of floating point values, most image processing applications approximate the math in monitor space, and leave users to suffer with the aliasing and contrast problems that result.
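As a small illustration of the point about adding images, the following Python sketch averages a black pixel and a white pixel in two ways. It assumes a pure power-law monitor response with a gamma of 2.2, which is only an approximation of a real display, and the function names are ours, chosen for illustration.

```python
GAMMA = 2.2  # assumed pure power-law monitor response (an approximation)

def decode(v8):
    """Convert an 8-bit monitor value to linear intensity in [0, 1]."""
    return (v8 / 255.0) ** GAMMA

def encode(lin):
    """Convert linear intensity in [0, 1] to an 8-bit monitor value."""
    return round(255.0 * lin ** (1.0 / GAMMA))

def average_in_monitor_space(a8, b8):
    """The common shortcut: average the encoded values directly."""
    return round((a8 + b8) / 2)

def average_in_linear_space(a8, b8):
    """Decode to linear intensity, average, then re-encode."""
    return encode((decode(a8) + decode(b8)) / 2)

black, white = 0, 255
naive = average_in_monitor_space(black, white)    # 128
correct = average_in_linear_space(black, white)   # 186
print(naive, correct)
```

Under this assumed gamma, the shortcut yields a value that displays at roughly a fifth of white's intensity rather than half, which is one source of the contrast problems described above.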
In film photography, the latitude of a negative determines the maximum range in exposure over which one can capture image detail. A negative film with a wide latitude allows one to make adjustments during the printing stage to the exposure and distribution of highlight and shadow regions in the image. In the simplest case, the developer can increase or decrease the print exposure to maximize contrast in the most important part of the image. Using more advanced techniques such as "dodge and burn," professional photographers and hobbyists can bring out multiple areas in the image that would normally be outside the latitude of the print paper. The forgiving nature of color negative film is also very important for amateur photographers, who don't always have the best equipment, and don't always shoot pictures under optimal lighting. Even professionals are challenged when photographing outdoor scenes that have extreme differences between light and shadow, and the wide latitude of negative film is needed to obtain a reasonable result. Unfortunately, once the information contained on the negative has been converted to a conventional digital representation such as Photo YCC, the original latitude is lost and exposure balancing is not possible, no matter what equipment we bring to the digital darkroom.
We suggest that it is possible to represent the full range of what humans are capable of seeing in the same number of bits as is currently used to represent what a typical monitor is capable of displaying. In experiments we have performed using a log encoding of luminance and a CIE (u',v') encoding of chroma, 24 bits/pixel is enough to cover nearly 5 orders of magnitude with luminance and chroma step sizes that are imperceptible at any display level. A more accurate encoding using 32 bits/pixel would enable us to use even smaller step sizes and to encode negative luminance values as well, which are useful in filter kernels and other types of image manipulation.
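To make the arithmetic concrete, here is a minimal Python sketch of a log luminance code together with the standard CIE 1976 (u',v') conversion from XYZ. The bit allocation and constants are assumptions made for illustration, not a specification: 10 bits of luminance at 64 steps per factor of two, starting at 2^-12, which would leave 14 bits (16,384 values) for a chroma index in a 24-bit pixel. The chroma quantization itself is omitted.

```python
import math

L_BITS = 10          # assumed number of luminance bits
STEPS_PER_STOP = 64  # assumed code values per factor of two in luminance
MIN_LOG2 = -12       # assumed log2 of the darkest representable luminance

def encode_luminance(Y):
    """Map a positive luminance to a clamped 10-bit logarithmic code."""
    code = math.floor(STEPS_PER_STOP * (math.log2(Y) - MIN_LOG2))
    return max(0, min((1 << L_BITS) - 1, code))

def decode_luminance(code):
    """Return the luminance at the centre of the code's interval."""
    return 2.0 ** ((code + 0.5) / STEPS_PER_STOP + MIN_LOG2)

def xyz_to_uv(X, Y, Z):
    """CIE 1976 (u', v') chromaticity from XYZ tristimulus values."""
    d = X + 15.0 * Y + 3.0 * Z
    return 4.0 * X / d, 9.0 * Y / d

# Relative size of one luminance step, and the total range in orders of magnitude.
step = 2.0 ** (1.0 / STEPS_PER_STOP) - 1.0
orders = ((1 << L_BITS) - 1) / STEPS_PER_STOP * math.log10(2.0)
print(f"step = {step:.2%}, range = {orders:.2f} orders of magnitude")
```

With these assumed constants, each step changes luminance by about 1.1 percent and the code spans about 4.8 orders of magnitude, in line with the "nearly 5 orders" figure quoted above.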
This type of encoding would have three principal advantages:
Tone mapping is the process of taking world luminance and chroma values and converting them to display luminance and chroma values. In conventional photography, extensive research has been done to find the optimal response functions to map tones from typical indoor and outdoor scenes to printed or projected media. These tone mapping functions have been implemented as faithfully as possible in the appropriate chemical photofinishing processes. These same functions, as well as functions not yet devised, could be applied even more faithfully to digital images, but only if the original information survives the analog-to-digital encoding process.
Especially in the nascent field of digital photography, a perceptual color encoding is badly needed to enable us to do with computer images what can presently be done only with film. Likewise, rendering and image manipulation are severely constrained by a 24-bit integer space that is neither native nor natural to the algorithms employed.
The following figures illustrate the limitations of current monitor-based encoding methods. Specifically, we consider Kodak's Photo YCC encoding, since it is the most comprehensive and widespread standard currently in use. Figure 1 shows the limited color space of the standard RGB primary system. Figure 2 shows the limited dynamic range of YCC luminance encoding as compared to our proposed logarithmic representation. Figure 3 shows how quantization error becomes excessive at the low end of the YCC response curve.
Figure 1. CCIR-709 gamut shown in CIE (u',v') perceptually uniform color space. The proposed encoding would cover the entire visible gamut with over 16,000 perceptually spaced chroma values.
Figure 2. The gamma-law compression used to encode Photo YCC pixels on PhotoCDs is shown in red. The proposed encoding is shown in blue, with a larger dynamic range and a linear response relative to world brightness.
Figure 3. The relative error due to the encoding used for YCC rises sharply near the low end. In contrast, the proposed encoding maintains a constant error over its entire range.
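The behavior described for Figure 3 can be reproduced numerically. The sketch below compares the relative luminance jump between adjacent code values for an 8-bit power-law encoding and for a log encoding. It assumes a pure power law with a gamma of 2.2 as a stand-in for the actual Photo YCC transfer curve, and an assumed log resolution of 64 steps per factor of two; both are illustrative choices rather than values taken from either format.

```python
GAMMA = 2.2          # assumed pure power law, standing in for the real curve
STEPS_PER_STOP = 64  # assumed resolution of the log encoding

def gamma_relative_step(code):
    """Relative luminance jump from `code` to `code + 1` (code >= 1)."""
    return ((code + 1) / code) ** GAMMA - 1.0

def log_relative_step():
    """Relative luminance jump between adjacent log codes (constant)."""
    return 2.0 ** (1.0 / STEPS_PER_STOP) - 1.0

for code in (1, 4, 16, 64, 254):
    print(f"code {code:3d}: gamma {gamma_relative_step(code):8.2%}"
          f"   log {log_relative_step():.2%}")
```

Under these assumptions, the power-law steps exceed 100 percent near black and fall below 1 percent near white, while the log steps stay the same size everywhere.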
To display the following images, make sure that your monitor is adjusted for a gamma of approximately 2.2. See the page on gamma correction for more information.
Figure 4 shows a PhotoCD scan of an image captured on Kodak Gold 200 negative film. Although the negative itself records a wide dynamic range, the scan is capable of storing only about 2 orders of magnitude. Therefore, it is not possible through any adjustment applied after digitization to recover information recorded in the bright window region of this image.
Figure 5 shows brightness adjustments applied to the PhotoCD scan and to a high dynamic range scan of the same negative. Figure 5a shows the original scan darkened in an attempt to show the area outside the window. Figure 5b shows that the darkening process does work when the high dynamic range information is preserved. Figure 5c shows the original scan brightened, yet we still cannot see detail under the desk. This detail is visible, however, in the brightened high dynamic range scan (Figure 5d).
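The reason darkening fails on the standard scan can be shown with a toy calculation. In the Python sketch below, two hypothetical window-region luminances (expressed relative to display white, and invented purely for illustration) are stored once with clipping at white and once without. Darkening both versions by the same number of stops restores the difference only where it was preserved.

```python
def store_clipped(luminance, white=1.0):
    """Model a limited-range encoding: everything above white is clipped."""
    return min(luminance, white)

def darken(value, stops):
    """Reduce exposure by the given number of photographic stops."""
    return value / (2.0 ** stops)

# Hypothetical scene values relative to display white (not measured data).
window_sky, window_frame = 40.0, 8.0

clipped = [store_clipped(v) for v in (window_sky, window_frame)]
preserved = [window_sky, window_frame]

dark_clipped = [darken(v, 6) for v in clipped]      # both values identical
dark_preserved = [darken(v, 6) for v in preserved]  # values remain distinct
print(dark_clipped, dark_preserved)
```

After darkening, the clipped values are still equal to each other, so the window remains a featureless patch, whereas the preserved values fall within the displayable range and keep their original ratio.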
Figure 6 shows the high dynamic range scan, mapped using a dynamic range compression scheme to fit within the displayable range. Here we can see that the original negative contains detail in all regions of the image; we just couldn't see it with the default tone mapping.
Figure 4. A photograph containing a large range of luminance values, shown here as it was recorded on a PhotoCD by standard procedures.
Figure 5. A standard PhotoCD scan shown darkened in (A) and brightened in (C). The same negative scanned and stored to preserve the original dynamic range, shown darkened in (B) and brightened in (D).
Figure 6. The same image, digitized using a high dynamic-range encoding, then later adjusted to fit within the dynamic range of a standard display.
A note about color in the above images.
For more information on the tone mapping of high dynamic range images, see the following: