Thursday, February 18, 2010

Recovering Deleted JPEGs from a FAT File System - Part 8

Part 8 in a series of posts on recovering deleted JPEG files from a FAT file system.

In part 7, I demonstrated recovering deleted JPEG files by using their known pre-deletion locations in a FAT file system. In the real use case of recovering accidentally deleted files, the locations are unknown, making this approach impossible.

Recovering deleted files without knowing their location requires a method to find them within the unerased data. In this post, I'll show how the structure of a JPEG file can be used to do just that.


JPEG File Structure

Generally speaking, files meant to be processed programmatically employ some form of deterministic structure. Two common approaches are to partition the file into well-defined segments or to use an embedded catalog to record the file's contents (similar to a file system directory). Regardless of the approach, this deterministic structure can be used to reconstitute a file from its residual data.

In JPEG's case, the segmentation approach is used. The official JPEG format is specified in Annex B of the ISO/IEC International Standard 10918-1 - otherwise known as the JPEG Interchange Format (JIF). The specification defines:

  • a variety of segment types to store metadata, compressed image data, etc.
  • the combinations of segment types that form valid JPEG files.
  • unique two-byte markers used to demarcate segments and other key aspects of a JPEG file's structure.

For the purposes of this discussion, the markers are particularly important.

Every two-byte marker consists of the value 0xFF followed by a byte other than 0x00 or 0xFF that represents the marker's type. In some cases, markers may be preceded by a series of 0xFF "fill bytes" to meet alignment requirements - these fill bytes can be ignored. To avoid false markers in segment payloads (e.g. compressed image data), every 0xFF value that is not part of a marker must be followed by 0x00 to escape it - these null bytes should be dropped during processing. This simple marker scheme allows a JPEG file to be parsed without having to interpret each constituent segment.
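The marker rules above can be sketched as a small scanner. This is only an illustration: the function name scan_markers and the sample bytes are made up for this example, and the code applies the escaping rule to the whole buffer exactly as described above, so hits that land inside table or application payloads should be treated with suspicion.

```python
def scan_markers(buf: bytes):
    """Yield (offset, marker_type) for every 0xFF-prefixed marker in buf.

    Hypothetical helper: skips escaped 0xFF 0x00 pairs and 0xFF fill bytes.
    """
    i = 0
    while i < len(buf) - 1:
        if buf[i] != 0xFF:
            i += 1
            continue
        nxt = buf[i + 1]
        if nxt == 0x00:
            # Escaped 0xFF inside a payload: not a marker, skip both bytes.
            i += 2
        elif nxt == 0xFF:
            # Fill byte: the real marker (if any) starts at the next 0xFF.
            i += 1
        else:
            yield i, nxt
            i += 2


# Made-up sample: SOI, an escaped 0xFF, two fill bytes, then EOI.
sample = bytes([0xFF, 0xD8, 0x12, 0xFF, 0x00, 0x34, 0xFF, 0xFF, 0xFF, 0xD9])
found = list(scan_markers(sample))
print([(hex(off), hex(kind)) for off, kind in found])
# prints [('0x0', '0xd8'), ('0x8', '0xd9')]
```

Note that the scanner never looks inside a segment to decide what it is; it only needs the byte that follows each 0xFF.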

Apparently, the JIF format is rarely used in practice due to its complexity. Instead, two simplified variants - JFIF and EXIF - are commonly used. Both JFIF and EXIF utilize JIF's built-in extension mechanism and marker values. This means that a program that understands JIF markers can determine the structure of both JFIF and EXIF files.


An Example

To further understand the structure of a JPEG file, let's examine one of the test image files. Inspecting the first 128 bytes of the file 4.1.01.jpg using hexdump results in:

$ hexdump -n 128 -s 0 -C images/4.1.01.jpg
00000  ff d8 ff e0 00 10 4a 46  49 46 00 01 01 01 00 48  |......JFIF.....H|
00010  00 48 00 00 ff db 00 43  00 03 02 02 03 02 02 03  |.H.....C........|
00020  03 03 03 04 03 03 04 05  08 05 05 04 04 05 0a 07  |................|
00030  07 06 08 0c 0a 0c 0c 0b  0a 0b 0b 0d 0e 12 10 0d  |................|
00040  0e 11 0e 0b 0b 10 16 10  11 13 14 15 15 15 0c 0f  |................|
00050  17 18 16 14 18 12 14 15  14 ff db 00 43 01 03 04  |............C...|
00060  04 05 04 05 09 05 05 09  14 0d 0b 0d 14 14 14 14  |................|
00070  14 14 14 14 14 14 14 14  14 14 14 14 14 14 14 14  |................|

To make the output clearer, the markers have been bolded and color coded.

First, we see that the file starts off with the marker 0xFF_D8. This is the Start-of-Image marker (SOI) that must begin all JPEG files. SOI markers stand alone and do not have an associated segment.

Next comes the marker 0xFF_E0, which identifies an APP0 application-specific segment. Application segments are JIF's built-in extension mechanism to allow application-specific information to be embedded inside JPEG files. The JIF standard reserves 16 markers (0xFF_E0 to 0xFF_EF) for application segments which are available for general use - the JIF standard makes no attempt to assign application segments to specific applications. By convention, JFIF files use an APP0 segment while EXIF files use an APP1 segment. To guard against other applications using the same application segments, both JFIF and EXIF identify themselves by including the ASCII string 'JFIF' or 'Exif' at byte offsets 4 through 7 of the segment (counting from zero at the marker's first byte). In this case, the APP0 marker starts at file offset 2, so the string 'JFIF' indeed appears at file offsets 6 through 9.
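As a rough sketch of that identification step, the snippet below checks the four bytes that start at offset 4 from an application marker's first byte. The function name app_identifier is hypothetical, and HEADER is simply the first 20 bytes of the hexdump above.

```python
# First 20 bytes of images/4.1.01.jpg, copied from the hexdump above.
HEADER = bytes.fromhex("ffd8ffe000104a46494600010101004800480000")

# Conventional pairing of application marker type and identifier string.
CONVENTION = {0xE0: b"JFIF", 0xE1: b"Exif"}


def app_identifier(buf: bytes, marker_offset: int):
    """Return 'JFIF' or 'Exif' if the APPn segment at marker_offset
    identifies itself by convention, otherwise None (hypothetical helper)."""
    if buf[marker_offset] != 0xFF:
        return None
    expected = CONVENTION.get(buf[marker_offset + 1])
    if expected is None:
        return None
    # Marker (2 bytes) + length (2 bytes), then the 4-byte identifier.
    ident = buf[marker_offset + 4:marker_offset + 8]
    return ident.decode("ascii") if ident == expected else None


print(app_identifier(HEADER, 2))  # prints JFIF
```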

After the marker, each segment begins with a two-byte, big-endian length parameter. The length counts its own two bytes but excludes the marker. The output above indicates that the APP0 segment is 0x10 bytes long, so the next marker should be at offset 0x02 + 2 + 0x10 = 0x14, and sure enough that is where it is found. This time the marker 0xFF_DB indicates the beginning of a quantization table segment (DQT) that is 0x43 bytes long. This is followed by another DQT segment at offset 0x14 + 2 + 0x43 = 0x59.
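The offset arithmetic can be checked with a few lines of Python. This is a minimal sketch: next_marker_offset is a name invented for this example, and FIRST_32 is the first two rows of the hexdump above.

```python
# First 32 bytes of images/4.1.01.jpg, copied from the hexdump above.
FIRST_32 = bytes.fromhex(
    "ffd8ffe000104a46494600010101004800480000"
    "ffdb00430003020203020203"
)


def next_marker_offset(buf: bytes, marker_offset: int) -> int:
    """Offset of the marker that follows the segment at marker_offset.

    Hypothetical helper: the length field is the two bytes right after the
    marker, read big-endian, and it excludes the marker itself.
    """
    length = int.from_bytes(buf[marker_offset + 2:marker_offset + 4], "big")
    return marker_offset + 2 + length


print(hex(next_marker_offset(FIRST_32, 0x02)))  # prints 0x14
print(hex(next_marker_offset(FIRST_32, 0x14)))  # prints 0x59
```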

Similarly analyzing the remainder of the file reveals the following markers and segments.

OFFSET   MARKER            SEGMENT?   LENGTH (B)    DESCRIPTION
0x0000   0xFF_D8 (SOI)     N          -             Start of image
0x0002   0xFF_E0 (APP0)    Y          0x10          Application segment
0x0014   0xFF_DB (DQT)     Y          0x43          Quantization table
0x0059   0xFF_DB (DQT)     Y          0x43          Quantization table
0x009E   0xFF_C0 (SOF0)    Y          0x11          Start of frame
0x00B1   0xFF_C4 (DHT)     Y          0x1D          Huffman table
0x00D0   0xFF_C4 (DHT)     Y          0x3E          Huffman table
0x0110   0xFF_C4 (DHT)     Y          0x1B          Huffman table
0x012D   0xFF_C4 (DHT)     Y          0x37          Huffman table
0x0166   0xFF_DA (SOS)     Y          UNSPECIFIED   Start of scan
0x5B7D   0xFF_D9 (EOI)     N          -             End of image
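A table like this can be produced by hopping from marker to marker using each segment's length. The sketch below is an assumption about how one might do it, not the tool used for this post: walk_segments, NAMES and the tiny demo buffer are all invented for illustration, and the walk stops at SOS because the data after it carries no length.

```python
NAMES = {0xD8: "SOI", 0xE0: "APP0", 0xDB: "DQT", 0xC0: "SOF0",
         0xC4: "DHT", 0xDA: "SOS", 0xD9: "EOI"}
STANDALONE = {0xD8, 0xD9}  # markers with no associated segment


def walk_segments(buf: bytes):
    """Yield (offset, marker_type, length) up to and including SOS.

    length is None for stand-alone markers and for SOS, whose image data
    has no explicit length (hypothetical helper).
    """
    off = 0
    while off + 1 < len(buf):
        if buf[off] != 0xFF:
            raise ValueError(f"expected a marker at offset {off:#x}")
        kind = buf[off + 1]
        if kind == 0xFF:          # fill byte before the real marker
            off += 1
        elif kind in STANDALONE:
            yield off, kind, None
            if kind == 0xD9:      # EOI: nothing follows
                return
            off += 2
        elif kind == 0xDA:        # SOS: stop, the scan length is unknown
            yield off, kind, None
            return
        else:
            length = int.from_bytes(buf[off + 2:off + 4], "big")
            yield off, kind, length
            off += 2 + length


# Made-up miniature file: SOI, a 4-byte APP0, SOS, two data bytes, EOI.
demo = b"\xff\xd8" + b"\xff\xe0\x00\x04AB" + b"\xff\xda\x00\x02" + b"\x12\x34\xff\xd9"
rows = list(walk_segments(demo))
for off, kind, length in rows:
    size = "-" if length is None else f"{length:#x}"
    print(f"{off:#06x}  0xFF_{kind:02X} ({NAMES.get(kind, '?')})  {size}")
```

Run against a real file, the same loop would be fed the file's bytes instead of demo.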

Houston, We Have a Problem

Notice in the table above that the SOS segment has an unspecified length. Based on my reading of the JIF specification and other references, the length of the entropy-coded image data that follows the SOS marker is not explicitly specified. Instead, the data is terminated by an EOI or other marker. I suspect this was done to allow encoders to write JPEG files as images are compressed - in this case the length of the compressed data is unknown when the SOS is started.
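In practice this means the end of the image data has to be found by scanning for the terminating marker. The sketch below is one possible way to do that, under two assumptions that go beyond this post: restart markers (0xFF_D0 to 0xFF_D7) may legitimately appear inside the compressed data and are skipped, and the data is contiguous in the buffer. The name find_scan_end and the sample bytes are made up.

```python
def find_scan_end(buf: bytes, start: int):
    """Return the offset of the marker that terminates the entropy-coded
    data beginning at start, or None if no such marker is found.

    Hypothetical helper; assumes the data is contiguous in buf.
    """
    i = start
    while i + 1 < len(buf):
        if buf[i] != 0xFF:
            i += 1
            continue
        nxt = buf[i + 1]
        if nxt == 0x00 or 0xD0 <= nxt <= 0xD7:
            # Escaped 0xFF or a restart marker: still inside the scan.
            i += 2
        elif nxt == 0xFF:
            # Fill byte: keep looking at the next 0xFF.
            i += 1
        else:
            return i
    return None


# Made-up scan data: an escaped 0xFF, a restart marker, then EOI.
scan = bytes([0x12, 0xFF, 0x00, 0x34, 0xFF, 0xD0, 0x56, 0xFF, 0xD9])
print(find_scan_end(scan, 0))  # prints 7
```

If the function returns None, the buffer ended before any terminating marker turned up, which is exactly the situation a recovery tool has to be prepared for.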

This presents a problem for recovering deleted JPEG files as it means their structure isn't completely deterministic. In the next post, I'll discuss the limitations this imposes and demonstrate how markers can be used to recover contiguous deleted files.