Tuesday, December 15, 2009

Recovering Deleted JPEGs from a FAT File System - Part 3

Part 3 in a series of posts on recovering deleted JPEG files from a FAT file system.

The FAT file system was invented in 1980 by Tim Paterson while developing 86-DOS, the precursor of MS-DOS. Since its creation, FAT has gone through a number of evolutions to accommodate growing disk sizes, and provide more capabilities. Although FAT is nearly 30 years old, it remains in common use due to its ubiquitous OS support and simplicity - both important factors for embedded consumer devices like digital cameras.

FAT's history is complex, a retelling could consume an entire series of posts itself. For this project, it is sufficient to know that there are three main variants - FAT12, FAT16, and FAT32 - each successively supporting larger maximum disk sizes. Generally speaking, the on-disk format of all three variants is very similar, so much so that code developed to grok one format can, with reasonable effort, be hacked to grok another1. In this project, we'll be working with a FAT16 file system which hdiutil can be explicitly requested to create by using the argument -fs "MS-DOS FAT16"2.

The following diagram depicts the high-level, on-disk structure of a FAT16 file system (not drawn to scale).

+---------+-------+--------+-------+---------------------+
|   BOOT  | RESV  |   FAT  | ROOT  |  DATA ............. |
|  SECTOR | (OPT) |        | DIR   |  AREA ............. |
+---------+-------+--------+-------+---------------------+
0                                                        N

In computer storage, capacity is divided into fixed sized units called sectors - typically 512 bytes long3. The sector is the fundamental unit for addressing storage and smallest amount of data that can be transferred to/from a disk drive. While older disk drives used an awkward addressing scheme based on physical geometry, newer drives use a linear series of sector addresses.

The first sector (offset 0) of a FAT16 file system contains the BOOT SECTOR which provides critical information about the file system's organization. Optionally, a number of sectors may be reserved after the boot sector (RESV).

The next major data structure is the File Allocation Table (FAT) from which the file system gets its name. The FAT is essentially an array of 16bit elements 4 each representing a fixed size portion of the DATA AREA. To minimize the FAT's relative size, each element represents a cluster5, a power-of-two multiple number of sectors. In addition to tracking the allocation status of each DATA AREA cluster, FAT elements are also used to store pointers forming linked-lists describing the on-disk location of files and sub-directories spanning multiple clusters.

After the FAT comes the root directory area (ROOT DIR) used to hold the information describing the files and sub-directories at the top of the file system namespace tree. In FAT12 and FAT16, the root directory area is a fixed size which limits the numbers of files and sub-directories that can be stored in it.

The remainder of the disk capacity (DATA AREA) is essentially a heap allocated as needed to store file and sub-directory data. As mentioned previously, use of the DATA AREA is managed by the FAT.

In the next few posts, I'll describe each of these major areas in greater detail and present the code needed to interpret them. By the end, we should essentially have a read-only FAT file system implementation that will serve as the basis for the remainder of the project.

Footnotes:

1 Here I am ignoring many, many subtle differences and complications. A thorough discussion of the technical details for all of FAT's variants is outside the scope of this series. The curious reader is recommended to read the bountiful information available via a Google search. Brian Carrier's book, File System Forensic Analysis, is another valuable resource (which sadly wasn't available when I first did this project in 2004).

2 Note that FAT16 file systems have a minimum supported size - usually 16MB but some tools may allow smaller sizes.

3 Some enterprise level disk drives use a 520 byte sector - the additional 8 bytes are often used to store error checking and recovery information for RAID systems.

4 The size of the FAT elements is one of the principal differences between the file system variants. FAT12 uses 12bit entries while FAT32 uses 32bit entries. The FAT element size is one of the factors that determines each variant's maximum supported disk size.

5 Other file systems may use the term block instead of cluster.