Part 3 in a series of posts on recovering deleted JPEG files from a FAT file system.
The FAT file system was invented in 1980 by Tim Paterson while
developing 86-DOS, the precursor of MS-DOS. Since its creation, FAT
has gone through a number of evolutions to accommodate growing disk
sizes, and provide more capabilities. Although FAT is nearly 30 years
old, it remains in common use due to its ubiquitous OS support and
simplicity - both important factors for embedded consumer devices like
digital cameras.
FAT's history is complex, a retelling could consume an entire series
of posts itself. For this project, it is sufficient to know that there
are three main variants - FAT12, FAT16, and FAT32 - each successively
supporting larger maximum disk sizes. Generally speaking, the on-disk
format of all three variants is very similar, so much so that code
developed to grok one format can, with reasonable effort, be hacked to
grok another. In this project, we'll be working with a FAT16
file system which hdiutil
can be explicitly requested to create by
using the argument -fs "MS-DOS FAT16"
.
The following diagram depicts the high-level, on-disk structure of a
FAT16 file system (not drawn to scale).
+---------+-------+--------+-------+---------------------+
| BOOT | RESV | FAT | ROOT | DATA ............. |
| SECTOR | (OPT) | | DIR | AREA ............. |
+---------+-------+--------+-------+---------------------+
0 N
In computer storage, capacity is divided into fixed sized units called
sectors - typically 512 bytes long. The sector is the
fundamental unit for addressing storage and smallest amount of data
that can be transferred to/from a disk drive. While older disk drives
used an awkward addressing scheme based on physical geometry, newer
drives use a linear series of sector addresses.
The first sector (offset 0) of a FAT16 file system contains the BOOT SECTOR
which provides critical information about the file system's
organization. Optionally, a number of sectors may be reserved after
the boot sector (RESV
).
The next major data structure is the File Allocation Table (FAT
)
from which the file system gets its name. The FAT is essentially an
array of 16bit elements each representing a fixed size portion
of the DATA AREA
. To minimize the FAT's relative size, each element
represents a cluster, a power-of-two multiple number of
sectors. In addition to tracking the allocation status of each DATA AREA
cluster, FAT elements are also used to store pointers forming
linked-lists describing the on-disk location of files and
sub-directories spanning multiple clusters.
After the FAT comes the root directory area (ROOT DIR
) used to hold
the information describing the files and sub-directories at the top of
the file system namespace tree. In FAT12 and FAT16, the root directory
area is a fixed size which limits the numbers of files and
sub-directories that can be stored in it.
The remainder of the disk capacity (DATA AREA
) is essentially a heap
allocated as needed to store file and sub-directory data. As mentioned
previously, use of the DATA AREA
is managed by the FAT
.
In the next few posts, I'll describe each of these major areas in
greater detail and present the code needed to interpret them. By the
end, we should essentially have a read-only FAT file system
implementation that will serve as the basis for the remainder of the
project.