Thursday, December 31, 2009

Recovering Deleted JPEGs from a FAT File System - Part 6

Part 6 in a series of posts on recovering deleted JPEG files from a FAT file system.

In this post we'll tackle the last major FAT file system data structure - the file allocation table. Compared to the previous post on the root directory, this post is much less complex. Follow the "read more" link for an overview of the file allocation table and the code required to process it.

As mentioned in previous posts, a FAT16 file allocation table consists of one 16 bit entry for each cluster in the data area. The value stored in each entry indicates the associated cluster's allocation state and, if needed, the next cluster containing data for the same file system object.

Although table entries exist for clusters 0 and 1, they are reserved for the system's use. The first byte of entry 0 is used to store a copy of the mediaType field from the common boot sector - the entry's remaining bits are set to 1. Entry 1 is sometimes used to record if the volume was unmounted cleanly. Because of these reservations, the data area cluster information beings with entry 2.

Initially, table entries 2 and higher are initialized to a value of 0x0000 indicating that the associated clusters are available for use. Bad clusters are signified by the value 0xFFF7 while reserved clusters have a value between 0xFFF0 and 0xFFF6 (inclusive).

Table entries for allocated clusters contain either:

  • The number of the next cluster containing additional data for the associated file system object (file or directory).
  • A value between 0xFFF8 and 0xFFFF (inclusive) indicating that this is the last cluster containing data for the associated file system object. Collectively, these values are known as end-of-chain (EOC) values.

Essentially, allocated table entries form a singly linked-list specifying the location and order of the clusters containing a file system object's data. To illustrate this, consider the following diagram:

          ROOT                        FAT
          DIR                        TABLE
  +--------+------+            +-------+-------+
  | FNMAME | 1ST C|            | ENTRY | VALUE |
  +========+======+            +=======+=======+
  | FILEA  | 0x02 +-------+    | 0X00  | RESVD |
  |        |      |       |    |       |       |
  +--------+------+       |    +-------+-------+
  | FILEB  | 0x03 +-----+ |    | 0X01  | RESVD |
  |        |      |     | |    |       |       |
  +--------+------+     | |    +-------+-------+
  |        |      |     | +--->| 0X02  | EOC   |
  |        |      |     |      |       |       |
                        |      +-------+-------+
                        +----->| 0X03  | 0X04  |
                               |       |       +----+
                               +-------+-------+    |
                               | 0X04  | 0X06  |<---+
                               |       |       +----+
                               +-------+-------+    |
                               | 0X05  | 0X00  |    |
                               |       |       |    |
                               +-------+-------+    |
                               | 0X06  | EOC   |<---+
                               |       |       |
                               +-------+-------+
                               |       |       |
                               |       |       |

The diagram depicts two examples, FILEA and FILEB, and their associated entries in the root directory and file allocation table.

FILEA's root directory entry states that cluster 0x02 contains the first portion of its data. To reflect this allocation, file allocation table entry 0x02 has a non-zero value. Furthermore, entry 0x02 contains an EOC value indicating that no other clusters contain data for FILEA - all of FILEA's data fits within the single cluster.

FILEB is a longer file requiring three clusters to store its data. Its root directory entry identifies cluster 0x03 as the first and, appropriately, file allocation table entry 0x03 contains a non-zero value. In this case, entry 0x03 contains the value 0x04 indicating that FILEB's second cluster of data resides in cluster 0x04. File allocation table entry 0x04 in turn contains the value of 0x06 identifying cluster 0x06 as the third holding FILEB's data. Finally, file allocation table 0x06 contains an EOC value indicating that it is the last cluster containing FILEB's data. Note that cluster 0x05, although between clusters holding FILEB's data, remains unallocated and is available for use.

The example illustrates how the file allocation table is used to both track cluster allocations and construct linked-lists describing the location of file system object data. In FAT parlance, these linked lists are called cluster chains.

The file allocation table forms the heart of the file system - a fact reflected by the file system's name, FAT. Before discussing the code required to process the on-disk table, it's worth noting a couple of things:

  1. The smallest allocation unit is a cluster - 2KB in our test image's case. File system objects smaller than a cluster consume the whole thing. As a result, a large number of small files will result in severe internal fragmentation.
  2. The file allocation table only contains forward pointers. There are are no back pointers to prior clusters or the directory entry for the file system object the chain belongs to.
  3. The file allocation table alone describes the location of file system object data - erasing the table essentially erases the file system although the data remains unchanged.

Recall from the first post that the third point was the basis for the original project in 2004. File systems that erase files by only clearing their directory and file allocation table entries leave the data in place thus making it theoretically possible to recover them.

Compared to the root directory, processing the file allocation table is trivial. The following code fragment provides all of the required constants, macros, and type definitions:

#define FAT_FAT16ENT_FREESECTOR       (0x0000U)
#define FAT_FAT16ENT_RESERVED_START   (0xfff0U)
#define FAT_FAT16ENT_RESERVED_END     (0xfff6U)
#define FAT_FAT16ENT_BADSECTOR        (0xfff7U)
#define FAT_FAT16ENT_LASTSECTOR_START (0xfff8U)
#define FAT_FAT16ENT_LASTSECTOR_END   (0xffffU)

#define FAT_FAT16ENT_ISFREESECTOR(x) \
  ((x) == FAT_FAT16ENT_FREESECTOR)
#define FAT_FAT16ENT_ISBADSECTOR(x) \
  ((x) == FAT_FAT16ENT_BADSECTOR)
#define FAT_FAT16ENT_ISLASTSECTOR(x) \
  ((x) >= FAT_FAT16ENT_LASTSECTOR_START)
#define FAT_FAT16ENT_ISRESERVEDSECTOR(x)\
  (((x) >= FAT_FAT16ENT_RESERVED_START) && \
   ((x) <= FAT_FAT16ENT_RESERVED_END))

typedef uint16_t fatFAT16Entry_t;

Reading a cluster entry requires calculating the associated sector number and offset, loading the sector into memory, and extracting the correct two bytes.

fatTableOffset = clusterNum * sizeof(fatFAT16Entry_t); 
sectorNum      = (fatTableOffset / bytesPerSector)
                 + pimageInfo->fatFirstSector;
sectorOffset   = fatTableOffset % bytesPerSector;

frSectorRead(pimageInfo, sectorNum,
             1, pSectorBuff);

memcpy(pfatEntry, (pSectorBuff + sectorOffset),
       sizeof(fatFAT16Entry_t));

Alternatively, a portion of the table loaded into a memory buffer can be iterated over as an array.

 fatFAT16Entry_t *fatTable;
 fatFAT16Entry_t  entry;

  fatTable = (fatFAT16Entry_t *) pfatTableBuffer;
  for(entryIndex=0; entryIndex < numEntries; entryIndex++ ) {

    entry = fatTable[entryIndex];
    ....
  }

Using the code fragments above, I enhanced the program I am (re)developing alongside these posts to process and printout the file allocation table. Running it against the test disk image, frtest1.dmg, created in the second post and used to produce the examples in the prior two posts yielded:

$ sw_vers
ProductName:    Mac OS X
ProductVersion: 10.6.2
BuildVersion:   10C540

$ wc -l fatrecover.c 
     829 fatrecover.c

$ make clean all
rm -r fatrecover fatrecover.dSYM 
gcc -g -Wall fatrecover.c -o fatrecover

$ ./fatrecover frtest1.dmg 

=== FAT RECOVER v0.0 ===

Opening file frtest1.dmg................OK
Reading boot sector.....................OK
Processing boot sector..................OK
Reading root dir........................OK

BOOT SECTOR:
          OEMName: (BSD  4.4)
         volumeID: 63260cf5
     Volume Label: FRTEST1    
           FSType: FAT16   
     Bytes/Sector: 512
  Sectors/Cluster: 4
 Reserved Sectors: 1
   Hidden Sectors: 0
 Sectors/Vol (2B): 32768
 Sectors/Vol (4B): 0
      Sectors/FAT: 32
        # of FATs: 2
# RootDir Entries: 512

MAJOR STRUCTURES:
      AREA    OFFSET      SIZE
----------  --------  --------
 BOOT+RESV  00000000  0x000001 (sectors)
       FAT  0x000001  0x000040 (sectors)
   ROOTDIR  0x000041  0x000020 (sectors)
      DATA  0x000061  0x001fe7 (clusters)

LISTING DIR: ROOT
 ATTR     SIZE            NAME          1ST CLUSTER
------  --------  --------------------  -----------
.....A     92889            4.2.04.jpg  (0x00000a)
.H..D.         0              .Trashes  (0x000002)
.H...A      4096            ._.Trashes  (0x000003)
...V.A         0              FRTEST1.  (00000000)

FAT TABLE ( KEY: .=free  B=bad  -=used X=last R=reserved)

00000000-0000001F: RXX-X.....----------------------
00000020-0000003F: -----------------------X........
00000040-0000005F: ................................
00000060-0000007F: ................................
<OUTPUT FOR EMPTY ENTRIES OMITTED>

CLUSTER CHAIN FOR ENTRY 0xa:
0x000a->0x000b->0x000c->0x000d->0x000e->0x000f->0x0010->
0x0011->0x0012->0x0013->0x0014->0x0015->0x0016->0x0017->
0x0018->0x0019->0x001a->0x001b->0x001c->0x001d->0x001e->
0x001f->0x0020->0x0021->0x0022->0x0023->0x0024->0x0025->
0x0026->0x0027->0x0028->0x0029->0x002a->0x002b->0x002c->
0x002d->0x002e->0x002f->0x0030->0x0031->0x0032->0x0033->
0x0034->0x0035->0x0036->0x0037->EOC (46 clusters)

As discussed previously, file allocation table entries 0 and 1 are reserved by the system. The program output indeed shows that entry 0 contains a RESERVED value but entry 1 instead contains an EOC value. A little research1 uncovered that EOC values are used to both prevent entry 1 from being used and to encode the volume's last unmount state - the output thus makes sense.

Table entry 2 is also allocated and the root directory listing tells us that it belongs to the .Trashes subdirectory. Furthermore, the EOC value indicates that the directory's contents fit within the single sector.

Entry 4 is allocated to the ._.Trashes file which requires two clusters to store its 4096 bytes of data. Presumably, entry 5 also belongs to this file and is the last in its cluster chain.

Finally, entries 0x0a to 0x37 are all allocated to the test JPG file 4.2.04.jpg. This is suggested by the root directory listing and allocation table summary and confirmed by the additional output showing the cluster chain starting at entry 0x0a. The cluster chain's length, 46, provides a sanity check as it is the smallest number of clusters required to store the file's 92889 bytes of data (46 * 2KB -> 94208B).

With that we now have all the code required to examine a FAT file system disk image. With a little work, the code fragments provided in this and prior posts can be used to create a program capable of inspecting file system images of varying size and complexity.

In the next post, we'll regroup by using the functionality developed thus far to inspect various file system images to determine the best approach for recovering deleted JPG files.