THE STRUCTURE OF JPEG PICTURES
This document is an appendix to the main manual page of the Image::MetaData::JPEG module, which the reader should refer to for further details and the general scope. The structure of a well formed JPEG file can be described by the following pseudo production rules (for sake of simplicity, some additional constraints between tables and SOF segments are neglected).
JPEG --> (SOI)(misc)*(image)?(EOI)
(image) --> (hierarch.)|(non-hier.)
(hierarch.) --> (DHP)(frame)+
(frame) --> (misc)*(EXP)?(non-hier.)
(non-hier.) --> (SOF)(scan)+
(scan) --> (misc)*(SOS)(data)*(ECS)(DNL)?
(data) --> (ECS)(RST)
(misc) --> (DQT)|(DHT)|(DAC)|(DRI)|(COM)|(APP)
(SOI) = Start Of Image
(EOI) = End Of Image
(SOF) = Start Of Frame header (10 types)
(SOS) = Start Of Scan header
(ECS) = Entropy Coded Segment (row data, not a real segment)
(DNL) = Define Number of Lines segment
(DHP) = Define Hierarchical P??? segment
(EXP) = EXPansion segment
(RST) = ReSTart segment (8 types)
(DQT) = Define Quantisation Table
(DHT) = Define Huffman coding Table
(DAC) = Define Arithmetic coding Table
(DRI) = Define Restart Interval
(COM) = COMment segment
(APP) = APPlication segment
This package does not check that a JPEG file is really correct; it accepts a looser syntax, were segments and ECS blocks are just contiguous (basically, because it does not need to display the image!). All meta-data information is concentrated in the (COM*) and (APP) Segments, exception made for some records in the (SOF*) segment (e.g. image dimensions). For further details see
B<"Digital compression and coding of continuous-tone still images:
requirements and guidelines", CCITT recommendation T.81, 09/1992,
The International Telegraph and Telephone Consultative Committee>.
Structure of a JFIF APP0 segment
APP0 segments are used in the old JFIF standard to store information about the picture dimensions and an optional thumbnail. The format of a JFIF APP0 segment is as follows (note that the size of thumbnail data is 3n, where n = Xthumbnail * Ythumbnail, and it is present only if n is not zero; only the first 8 records are mandatory):
[Record name] [size] [description]
---------------------------------------
Identifier 5 bytes ("JFIF\000" = 0x4a46494600)
MajorVersion 1 byte major version (e.g. 0x01)
MinorVersion 1 byte minor version (e.g. 0x01 or 0x02)
Units 1 byte units (0: densities give aspect ratio
1: density values are dots per inch
2: density values are dots per cm)
Xdensity 2 bytes horizontal pixel density
Ydensity 2 bytes vertical pixel density
Xthumbnail 1 byte thumbnail horizontal pixel count
Ythumbnail 1 byte thumbnail vertical pixel count
ThumbnailData 3n bytes thumbnail image
There is also an extended JFIF (only possible for JFIF versions 1.02 and above). In this case the identifier is not JFIF but JFXX. This extension allows for the inclusion of differently encoded thumbnails. The syntax in this case is modified as follows:
[Record name] [size] [description]
---------------------------------------
Identifier 5 bytes ("JFXX\000" = 0x4a46585800)
ExtensionCode 1 byte (0x10 Thumbnail coded using JPEG
0x11 Thumbnail using 1 byte/pixel
0x13 Thumbnail using 3 bytes/pixel)
Then, depending on the extension code, there are other records to define the thumbnail. If the thumbnail is coded using a JPEG stream, a binary JPEG stream immediately follows the extension code (the byte count of this file is included in the byte count of the APP0 Segment). This stream conforms to the syntax for a JPEG file (SOI .... SOF ... EOI); however, no 'JFIF' or 'JFXX' marker Segments should be present:
[Record name] [size] [description]
---------------------------------------
JPEGThumbnail ... bytes a variable length JPEG picture
If the thumbnail is stored using one byte per pixel, after the extension code one should find a palette and an indexed RGB. The records are as follows (remember that n = Xthumbnail * Ythumbnail):
[Record name] [size] [description]
---------------------------------------
Xthumbnail 1 byte thumbnail horizontal pixel count
YThumbnail 1 byte thumbnail vertical pixel count
ColorPalette 768 bytes 24-bit RGB values for the colour palette
(defining the colours represented by each
value of an 8-bit binary encoding)
1ByteThumbnail n bytes 8-bit indexed values for the thumbnail
If the thumbnail is stored using three bytes per pixel, there is no colour palette, so the previous fields simplify into:
[Record name] [size] [description]
---------------------------------------
Xthumbnail 1 byte thumbnail horizontal pixel count
YThumbnail 1 byte thumbnail vertical pixel count
3BytesThumbnail 3n bytes 24-bit RGB values for the thumbnail
Structure of an Exif APP1 segment
Exif (Exchangeable Image File format) JPEG files use APP1 segments in order not to conflict with JFIF files (which use APP0). Exif APP1 segments store a great amount of information on photographic parameters for digital cameras and are the preferred way to store thumbnail images nowadays. They can also host an additional section with GPS data. The reference document for Exif 2.2 and the Interoperability standards are respectively:
B<"Exchangeable image file format for digital still cameras:
Exif Version 2.2", JEITA CP-3451, Apr 2002
Japan Electronic Industry Development Association (JEIDA)>
B<"Design rule for Camera File system", (DCF), v1.0
English Version 1999.1.7, Adopted December 1998
Japan Electronic Industry Development Association (JEIDA)>
The TIFF (Tagged Image File format) standard documents, as well as some updates and corrections, are also useful:
B<- "TIFF(TM) Revision 6.0, Final", June 3, 1992, Adobe Devel. Association
- ISO 12639, "Graphic technology -- Prepress digital data exchange
-- Tag image file format for image technology (TIFF/IT)"
- ISO 12234-2, "Electronic still-picture imaging -- Removable memory
-- Part 2: TIFF/EP image data format"
- DRAFT - TIFF CLASS F, October 1, 1991
- DRAFT - TIFF Technical Note #2, 17-Mar-95 (updates for JPEG-in-TIFF)
- "Adobe Pagemaker 6.0 TIFF Technical Notes", (1,2,3 and OPI), 14-Sep-1995>
Exif APP1 segments are made up by an identifier, a TIFF header and a sequence of IFDs (Image File Directories) and subIFDs. The high level IFDs are only two (IFD0, for photographic parameters, and IFD1 for thumbnail parameters); they can be followed by thumbnail data. The structure is as follows:
[Record name] [size] [description]
---------------------------------------
Identifier 6 bytes ("Exif\000\000" = 0x457869660000), not stored
Endianness 2 bytes 'II' (little endian) or 'MM' (big endian)
Signature 2 bytes a fixed value = 42
IFD0_Pointer 4 bytes offset of 0th IFD (usually 8), not stored
IFD0 ... main image IFD
IFD0@SubIFD ... Exif private tags (optional, linked by IFD0)
IFD0@SubIFD@Interop ... Interoperability IFD (optional,linked by SubIFD)
IFD0@GPS ... GPS IFD (optional, linked by IFD0)
APP1@IFD1 ... thumbnail IFD (optional, pointed to by IFD0)
ThumbnailData ... Thumbnail image (optional, 0xffd8.....ffd9)
So, each Exif APP1 segment starts with the identifier string "Exif\000\000"; this avoids a conflict with other applications using APP1, for instance XMP data. The three following fields (Endianness, Signature and IFD0_Pointer) constitute the so called TIFF header. The offset of the 0th IFD in the TIFF header, as well as IFD links in the following IFDs, is given with respect to the beginning of the TIFF header (i.e. the address of the 'MM' or 'II' pair). This means that if the 0th IFD begins (as usual) immediately after the end of the TIFF header, the offset value is 8. An Exif segment is the only part of a JPEG file whose endianness is not fixed to big endian.
If the thumbnail is present it is located after the 1st IFD. There are 3 possible formats: JPEG (only this is compressed), RGB TIFF, and YCbCr TIFF. It seems that JPEG and 160x120 pixels are recommended for Exif ver. 2.1 or higher (mandatory for DCF files). Since the segment size for a segment is recorded in 2 bytes, thumbnails are limited to slightly less than 64KB.
Each IFD block is a structured sequence of records, called, in the Exif jargon, Interoperability arrays. The beginning of the 0th IFD is given by the 'IFD0_Pointer' value. The structure of an IFD is the following:
[Record name] [size] [description]
---------------------------------------
2 bytes number n of Interoperability arrays
12n bytes the n arrays (12 bytes each)
4 bytes link to next IFD (can be zero)
... additional data area
The next_link field of the 0th IFD, if non-null, points to the beginning of the 1st IFD. The 1st IFD as well as all other sub-IFDs must have next_link set to zero. The thumbnail location and size is given by some interoperability arrays in the 1st IFD. The structure of an Interoperability array is:
[Record name] [size] [description]
---------------------------------------
2 bytes Tag (a unique 2-byte number)
2 bytes Type (one out of 12 types)
4 bytes Count (the number of values)
4 bytes Value Offset (value or offset)
The possible types are the same as for the Record class, exception made for nibbles and references (see "Managing a JPEG Record object" in Image::MetaData::JPEG). Indeed, the Record class is modelled after interoperability arrays, and each interoperability array gets stored as a Record with given tag, type, count and values. The "value offset" field gives the offset from the TIFF header base where the value is recorded. It contains the actual value if it is not larger than 4 bytes (32 bits). If the value is shorter than 4 bytes, it is recorded in the lower end of the 4-byte area (smaller offsets). For further details see the section "Valid tags for Exif APP1 data" in Image::MetaData::JPEG::TagLists.
Structure of a Photoshop-style APP13 segment
The Adobe's Photoshop program, a de-facto standard for image manipulation, uses the APP13 segment for storing non-graphic information, such as layers, paths, IPTC data and more. The unit for this kind of information is called a "resource data block" (because they hold data that was stored in the Macintosh's resource fork in early versions of Photoshop). The content of an APP13 segment is the string "Photoshop 3.0\000" followed by a sequence of resource data blocks; a resource block has the following structure:
[Record name] [size] [description]
---------------------------------------
(Type) 4 bytes Photoshop always uses '8BIM'
(ID) 2 bytes a unique identifier, e.g., "\004\004" for IPTC
(Name) ... a Pascal string (padded to make size even)
(Size) 4 bytes actual size of resource data
(Data) ... resource data, padded to make size even
(a Pascal string is made up of a single byte, giving the string length, followed by the string itself, padded to make size even including the length byte; since the string length is explicit, there is no need of a terminating null character). Valid Image Resource IDs are listed in the Photoshop-style tags' list section. In general a resource block contains only a few bytes, but there is an important block, the IPTC block, which can be quite large; the structure of this block is analysed in more detail in the IPTC data block section.
The reference document for the Photoshop file format is:
B<"Adobe Photoshop 6.0: File Formats Specifications",
Adobe System Inc., ver.6.0, rel.2, November 2000>.
Another interesting source of information is:
B<"\"Solo\" Image File Format. RichTIFF and its
replacement by \"Solo\" JFIF", version 2.0a,
Coatsworth Comm. Inc., Brampton, Ontario, Canada>
Structure of an IPTC data block
An IPTC/NAA resource data block of a Photoshop-style APP13 segment embeds an IPTC stream conforming to the standard defined by the International Press and Telecommunications Council (IPTC) and the Newspaper Association of America (NAA) for exchanging interoperability information related to various news objects. The data part of a resource block, an IPTC stream, is simply a sequence of units called datasets; no preamble nor count is present. Each dataset consists of a unique tag header and a data field (the list of valid tags [dataset numbers] can be found in section about IPTC data). A standard tag header is used when the data field size is less than 32768 bytes; otherwise, an extended tag header is used. The datasets do not need to show up in numerical order according to their tag. The structure of a dataset is:
[Record name] [size] [description]
---------------------------------------
(Tag marker) 1 byte this must be 0x1c
(Record number) 1 byte always 2 for 2:xx datasets
(Dataset number) 1 byte this is what we call a "tag"
(Size specifier) 2 bytes data length (< 32768 bytes) or length of ...
(Size specifier) ... data length (> 32767 bytes only)
(Data) ... (its length is specified before)
So, standard datasets have a 5 bytes tag header; the last two bytes in the header contain the data field length, the most significant bit being always 0. For extended datasets instead, these two bytes contain the length of the (following) data field length, the most significant bit being always 1. The value of the most significant bit thus distinguishes "standard" from "extended"; in digital photographies, I assume that the datasets which are actually used (a subset of the standard) are always standard; therefore, we likely do not have the IPTC block spanning more than one APP13 segment. The record types defined by the IPTC-NAA standard are the following (but the "pseudo"-standard by Adobe for APP13 IPTC data is restricted to the first application record, 2:xx, I believe, because the enveloping structure is replaced by the resource data block):
[Record name] [dataset record number]
----------------------------------------------------
Object Envelop Record 1:xx
Application Records: 2:xx through 6:xx
Pre-ObjectData Descriptor Record: 7:xx
ObjectData Record: 8:xx
Post-ObjectData Descriptor Record: 9:xx
The reference document for the IPTC standard is:
B<"IPTC-NAA: Information Interchange Model", version 4, 1-Jul-1999,
Comité International des Télécommunications de Presse>
AUTHOR
Stefano Bettelli, bettelli@cpan.org
COPYRIGHT AND LICENSE
Copyright (C) 2004 by Stefano Bettelli
This library is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License. See the COPYING and LICENSE file for the license terms.
SEE ALSO
The main documentation page for the Image::MetaData::JPEG module.
1 POD Error
The following errors were encountered while parsing the POD:
- Around line 314:
Non-ASCII character seen before =encoding in 'Comité'. Assuming CP1252