Part 5 in a series of posts on recovering deleted JPEG files from a FAT file system.
UPDATE: 2010/1/28: it's come to my attention that the long file name support code described in this post is broken - it only works for files that utilize a single LFN entry. For the time being I plan to work around this bug for the FATRecover series but I do intend to post a fix eventually. I'll be sure to add another update with a link to the fix when it is available.
The topic of this installment is the root directory and the code required to process it. Although strictly speaking the file allocation table is the next major on-disk area, I think covering the root directory first will make the overall discussion clearer.
Note that this is a watershed post as the "hack" level of the example code increases dramatically. This was unfortunate but necessary to accommodate certain FAT functionality while limiting the discussion's length. That said, readers with sufficient skill should be able to use the example code to create more robust implementations.
Follow the "read more" link for a detailed description of how to read and interpret the root directory.
As discussed in parts 3 and 4, the root directory area is located on-disk immediately after the FAT area and consumes a fixed amount of pre-allocated space. The root directory area is essentially an array of 32 byte entries with the following structure:
#define FAT_DIRENTS_FNSZ (8) #define FAT_DIRENTS_EXTSZ (3) typedef struct fatDirectoryEntrySFN_s { uint8_t filename[FAT_DIRENTS_FNSZ]; // 07-00 uint8_t extension[FAT_DIRENTS_EXTSZ]; // 10-08 uint8_t attributes; // 11-11 uint8_t reserved1; // 12-12 uint8_t createTimeDsec; // 13-13 uint16_t createTimeHMS; // 15-14 uint16_t createTimeDay; // 17-16 uint16_t accessedDay; // 19-18 uint16_t firstClusterHigh; // 21-20 uint16_t writeTimeHMS; // 23-22 uint16_t writeDay; // 25-24 uint16_t firstCluster; // 26-27 uint32_t fileSize; // 31-28 } __attribute__ ((packed)) fatDirectoryEntrySFN_t;
The the first byte of the filename
field specifies an entry's
allocation status. Initially, all directory table entries are in the
UNUSED
state (0x00
). Entries that are allocated and then later
deallocated (due to file system deletions) are moved to the DELETED
state (0xE5
). The first byte of valid directory entries contains the
first file name character which must have a value other than 0x00
or
0xE5
(if necessary the value 0x05
is used to encode the character
value 0xE5
).
File systems are expected to allocate directory entries sequentially
starting with the first. As a result, partially allocated directory tables
should consist of an inter-mixture of allocated and
DELETED
entries followed by one or more UNUSED
entries. Directory
tables that have been fully utilized will consist of only allocated
and DELETED
entries. When allocating entries, file systems can
either re-use DELETED
entries or allocate the next UNUSED
entry
(if one exists) - the preference is implementation specific.
The following code fragment provides #defines
and a macro that can
be used to determine a directory entry's allocation status.
#define FAT_DIRENTS_FNUNUSED (0x00) #define FAT_DIRENTS_FNDELETED (0xE5) #define FAT_DIRENTS_FNVALID(pfatDE) \ ((FAT_DIRENTS_FNUNUSED != (pfatDE)->filename[0]) && \ (FAT_DIRENTS_FNDELETED != (pfatDE)->filename[0]))
Valid directory entries contain the associated file system object's
name in 8.3 format. The filename
field contains the base file name
while the extension
field may contain an optional 3 character
extension. As with the boot sector's text fields, the filename
and
extension
fields are padded with spaces and lack null
termination.
The attributes
field contains flags describing the entry's type and
characteristics. The following code fragment defines the various flags
and provides macros for testing them.
#define FAT_DIRATT_READONLY (0x01) #define FAT_DIRATT_HIDDEN (0x02) #define FAT_DIRATT_SYSTEM (0x04) #define FAT_DIRATT_VOLUMELABEL (0x08) #define FAT_DIRATT_LFN (0x0f) #define FAT_DIRATT_DIRECTORY (0x10) #define FAT_DIRATT_ARCHIVE (0x20) #define FAT_DIRATT_MASK (0x3f); #define FAT_DIRATT_ISREADONLY(x) \ (((x) & FAT_DIRATT_READONLY) == FAT_DIRATT_READONLY) #define FAT_DIRATT_ISHIDDEN(x) \ (((x) & FAT_DIRATT_HIDDEN) == FAT_DIRATT_HIDDEN) #define FAT_DIRATT_ISSYSTEM(x) \ (((x) & FAT_DIRATT_SYSTEM) == FAT_DIRATT_SYSTEM) #define FAT_DIRATT_ISVOLUMELABEL(x) \ (((x) & FAT_DIRATT_VOLUMELABEL) == FAT_DIRATT_VOLUMELABEL) #define FAT_DIRATT_ISLFN(x) \ (((x) & FAT_DIRATT_LFN) == FAT_DIRATT_LFN) #define FAT_DIRATT_ISDIRECTORY(x) \ (((x) & FAT_DIRATT_DIRECTORY) == FAT_DIRATT_DIRECTORY) #define FAT_DIRATT_ISARCHIVE(x) \ (((x) & FAT_DIRATT_ARCHIVE) == FAT_DIRATT_ARCHIVE)
Most of the attributes should be self-explanatory. Later in the post
we'll discuss the LFN
attribute and its dire implications.
For this project, the only other fields of interest are firstCluster
and fileSize
. The firstCluster
field specifies the data area
cluster containing the associated file system object's first chunk of
data - the next post on the file allocation table will discuss this in
greater depth. The fileSize
field indicates a file's size in bytes -
non-file entries should have a value of 0 in this field.
For the program I am re-developing alongside these posts, I took the following approach to processing the root directory:
-
malloc
a temporary buffer the same size as the root directory area. - Read the on-disk root directory area into the buffer.
- Iterate over the directory table entries in the buffer and create an abstracted data structure for each valid entry.
-
free
the temporary buffer
Below is a high-level code fragment that implements the above steps:
pRootDirBuff = malloc(rootdirSizeBytes); memset(pRootDirBuff, 0, rootdirSizeBytes); frSectorRead(pimageInfo, pimageInfo->rootdirFirstSector, pimageInfo->rootdirSizeSectors, pRootDirBuff); frDirProcessEntries(&(pimageInfo->rootDir), pRootDirBuff, rootdirSizeBytes); free(pRootDirBuff);
pimageInfo
is a housekeeping data structure my program uses to pass
around context information like the disk image's open file descriptor,
copies of the boot sector structures, and the information calculated
in part 4. frSectorRead()
is a utility routine that reads the
specified sector extent from the disk image file into the memory
buffer provided. All of the real action occurs in
frDirProcessEntries()
.
My first implementation of frDirProcessEntries()
looked something
like the following:
int frDirProcessEntries(frDirEntry_t *pParentDir, char *pdirBuff, size_t sizeBuff) { fatDirectoryEntrySFN_t *fatDirectoryTable; size_t numFatDirTableEntries; fatDirectoryTable = (fatDirectoryEntrySFN_t *) pdirBuff; numFatDirTableEntries = sizeBuff / sizeof(fatDirectoryEntrySFN_t); for (entryIndex = 0; entryIndex < numFatDirTableEntries; entryIndex++) { if (!(FAT_DIRENTS_FNVALID(&(fatDirectoryTable[entryIndex])))) { continue; } entryCount++; pDirEntry = (frDirEntry_t * )malloc(sizeof(frDirEntry_t)); memset(pDirEntry, 0, sizeof(frDirEntry_t)); ptmp = (char *) &(pDirEntry->filename); ptmp += utilCopyNameChars((char *)fatDirectoryTable[entryIndex].filename, ptmp, (int) FAT_DIRENTS_FNSZ, 1); if (FAT_DIRENTS_FNVALID(&(fatDirectoryTable[entryIndex]))) { *ptmp = '.'; ptmp++; ptmp += utilCopyNameChars((char *)fatDirectoryTable[entryIndex].extension, ptmp, (int) FAT_DIRENTS_EXTSZ, 1); } *ptmp = '\0'; pDirEntry->attributes = fatDirectoryTable[entryIndex].attributes; pDirEntry->fileSize = fatDirectoryTable[entryIndex].fileSize; pDirEntry->firstCluster = fatDirectoryTable[entryIndex].firstCluster; pDirEntry->pParentDir = pParentDir; pDirEntry->pNextSibling = pParentDir->pDirEntries; pParentDir->pDirEntries = pDirEntry; pDirEntry = NULL; } return entryCount; }
Basically, the routine:
- Calculates the number of directory entries in the buffer provided.
- Casts it as an array of directory entries.
-
Iterates over the entries via a
for
loop. - Checks the first byte of each entry to see if it is valid.
- Allocates an abstracted data for each valid entry.
- Copies the file name and other metadata to the abstract structure.
- Adds the abstract structure to the parent directory's linked list of contents.
Technically speaking the for
loop should terminate when a
FAT_DIRENTS_FNUNUSED
entry is encountered but the code above just
skips over such entries and continues processing the table - this
shouldn't be a problem so I didn't bother to fix it (this is a hobby
endeavor after all).
As mentioned, frDirEntry_t
is an abstracted data structure used to
represent directory entries. Its structure can be inferred from the
code above. The motivation for using a secondary structure is to avoid
re-processing the raw entries for each listing (the importance of this
will become clear later).
utilCopyNameChars()
is a utility routine that copies characters from
the raw directory entry text fields to a C string.
int utilCopyNameChars(char *srcString, char *destString, int maxChars, int srcSkip) { // Copy characters from srcString to destString until // a space character, null character, 0xff value, or // maxChars is reached. // // Returns the number of characters copied char *psrcChar = srcString; char *pdestChar = destString; int count = 0; while ((*psrcChar != ' ') && (*psrcChar != '\0') && (((unsigned char)*psrcChar) != 0xff) && (count < maxChars)) { *pdestChar = *psrcChar; psrcChar += srcSkip; pdestChar++; count++; } return count; }
The while
loop terminates upon reaching either a space character,
NULL character, or the value 0xff
(which will be explained
later). The skip
argument supports a hack that is explained below.
After adding a short routine to print the abstracted directory entry
structures (which I won't show), I got the following output from
running the improved fatrecover
program against the test image,
frtest1.dmg
, from the prior two posts.
LISTING DIR: ROOT ATTR SIZE NAME 1ST CLUSTER ------ -------- -------------------- ----------- .....A 92889 4204~1.JPG (0x00000a) RHSV.. -1 A4. (00000000) .H..D. 0 FSEVEN~1. (0x000008) RHSV.. -1 A.. (00000000) .H..D. 0 TRASHE~1. (0x000002) RHSV.. -1 A.. (00000000) .H...A 4096 _~1.TRA (0x000003) RHSV.. -1 A.. (00000000) ...V.A 0 FRTEST1. (00000000)
YUCK! The funky file names, -1
sizes, and simultaneous setting of
the RHSV
flags indicated that I had run into the dreaded (by me)
long file name support.
When I first did this project in 2004 I punted on handling long file names. I considered doing the same for this re-implementation but the idea didn't sit well with me - I have a hard time avoiding the same problem twice. Besides, without long file name support I would have had to modify the test disk image to clean up the output and update the prior posts as needed. This felt too much like hiding dirty laundry so I decided to suck it up and hack together long file name support - this isn't going to be pretty.
Long file names are supported by using multiple directory entries with the following structure:
#define FAT_DIRENTL_FN1SZ (10) #define FAT_DIRENTL_FN2SZ (12) #define FAT_DIRENTL_FN3SZ (4) typedef struct fatDirectoryEntryLFN_s { uint8_t sequenceNum; // 00-00 uint8_t filename1[FAT_DIRENTL_FN1SZ]; // 10-01 uint8_t attributes; // 11-11 uint8_t reserved; // 12-12 uint8_t checksum; // 13-13 uint8_t filename2[FAT_DIRENTL_FN2SZ]; // 25-14 uint8_t reserved2[2]; // 27-26 uint8_t filename3[FAT_DIRENTL_FN3SZ]; // 31-28 } __attribute__ ((packed)) fatDirectoryEntryLFN_t;
These long file name (LFN) structures are identified by the RHSV
flags being simultaneously set and precede their associated 8.3
entry (SFN) in the directory table. The 8.3
directory entry includes
a shortened version of the long file name and the object's associated
metadata (i.e. size, starting cluster, etc).
Each LFN entry includes a sequence number indicating its order. The
entries are stored in highest to lowest order in the directory table
making it easier to determine their total number. The last LFN entry
has a a sequence number of 1 bitwise OR
'd with the value 0x40
.
Below is a macro that can be used to check for the 0x40
marker:
#define FAT_DIRENTL_LASTSEQ (0x40) #define FAT_DIRENTL_ISLASTSEQ(x) \ (((x) & FAT_DIRENTL_LASTSEQ) == FAT_DIRENTL_LASTSEQ)
The following is a simple example of an LFN and SFN directory table sequence:
| ENTRY# | TYPE | SEQ# | |--------+------+------| | 1 | LFN | 0x04 | | 2 | LFN | 0x03 | | 3 | LFN | 0x02 | | 4 | LFN | 0x41 | | 5 | SFN | -- |
Each LFN entry has three fields for storing file name characters. Some or all of an LFN entry's text fields may contain file name characters. Those that do will have a non-null value in the first byte. Below are macros that can be used to test each field for valid contents:
#define FAT_DIRENTL_FNUNUSED (0x00) #define FAT_DIRENTL_FN1VALID(pfatDEL) \ (FAT_DIRENTL_FNUNUSED != (pfatDEL)->filename1[0]) #define FAT_DIRENTL_FN2VALID(pfatDEL) \ (FAT_DIRENTL_FNUNUSED != (pfatDEL)->filename2[0]) #define FAT_DIRENTL_FN3VALID(pfatDEL) \ (FAT_DIRENTL_FNUNUSED != (pfatDEL)->filename3[0])
As with other FAT structures, these fields lack null termination so they must be handled similarly to other on-disk text fields.
Unfortunately, unlike the other on-disk structures, the LFN entry text
fields use a UTF-16 UNICODE encoding. I prefer my tangents one at a
time and since I don't have much experience in dealing with UTF-16 in
C I decided to hack a shortcut (I warned you). Since I know my test
images will only use UTF-8 characters I decided to only copy the
low-order byte of the UTF-16 characters. I accomplished this by
calling utilCopyNameChars()
with a skip
argument value of 2 and a maxChars
argument value of the field length divided by 2. Not
my proudest coding moment but it worked.
To handle LFN entries, I modified my original directory table
processing for
loop to include a sub while
loop to iterate over
the LFN entries. The code fragment below shows the modified loop
along with comments documenting the (many) assumptions made (the
remainder of frDirProcessEntries()
remained the same):
fatDirectoryTable = (fatDirectoryEntrySFN_t *) pdirBuff; numFatDirTableEntries = sizeBuff / sizeof(fatDirectoryEntrySFN_t); for (entryIndex = 0; entryIndex < numFatDirTableEntries; entryIndex++) { if (!(FAT_DIRENTS_FNVALID(&(fatDirectoryTable[entryIndex])))) { continue; } entryCount++; pDirEntry = (frDirEntry_t * )malloc(sizeof(frDirEntry_t)); memset(pDirEntry, 0, sizeof(frDirEntry_t)); ptmp = (char *) &(pDirEntry->filename); if (FAT_DIRATT_ISLFN(fatDirectoryTable[entryIndex].attributes)) { // N.B. - Hack alert. Rudimentary support for long file names. // Assumes that LFN entries are valid (no checksum checking). // Assumes that last LFN marked with end of sequence flag // Assumes that file names are in UTF-16. // Assumes that all LFN entries exist and are in proper order. // Assumes that LFN and SFN entries are contiguous. // Assumes that file name is less than 256 characters long. // // Iterates over LFN entries, copies filename to pDirEntry. // Leaves entryIndex at the associated SFN. fatDirectoryEntryLFN_t *pfatDirEntLFN = NULL; do { pfatDirEntLFN = (fatDirectoryEntryLFN_t *)&(fatDirectoryTable[entryIndex]); assert(FAT_DIRATT_ISLFN(fatDirectoryTable[entryIndex].attributes)); assert(entryIndex < numFatDirTableEntries); if (FAT_DIRENTL_FN1VALID(pfatDirEntLFN)) { ptmp += utilCopyNameChars((char *)pfatDirEntLFN->filename1, ptmp, (int) FAT_DIRENTL_FN1SZ/2, 2); } if (FAT_DIRENTL_FN2VALID(pfatDirEntLFN)) { ptmp += utilCopyNameChars((char *)pfatDirEntLFN->filename2, ptmp, (int) FAT_DIRENTL_FN2SZ/2, 2); } if (FAT_DIRENTL_FN3VALID(pfatDirEntLFN)) { ptmp += utilCopyNameChars((char *)pfatDirEntLFN->filename3, ptmp, (int) FAT_DIRENTL_FN3SZ/2, 2); } entryIndex++; } while (!(FAT_DIRENTL_ISLASTSEQ(pfatDirEntLFN->sequenceNum))); } else { // copy file name and extension ptmp += utilCopyNameChars((char *)fatDirectoryTable[entryIndex].filename, ptmp, (int) FAT_DIRENTS_FNSZ, 1); if (FAT_DIRENTS_FNVALID(&(fatDirectoryTable[entryIndex]))) { *ptmp = '.'; ptmp++; ptmp += utilCopyNameChars((char *)fatDirectoryTable[entryIndex].extension, ptmp, (int) FAT_DIRENTS_EXTSZ, 1); } } *ptmp = '\0'; pDirEntry->attributes = fatDirectoryTable[entryIndex].attributes; pDirEntry->fileSize = fatDirectoryTable[entryIndex].fileSize; pDirEntry->firstCluster = fatDirectoryTable[entryIndex].firstCluster; pDirEntry->pParentDir = pParentDir; pDirEntry->pNextSibling = pParentDir->pDirEntries; pParentDir->pDirEntries = pDirEntry; pDirEntry = NULL; }
With this LFN hack in place, re-running the program against the test image produced:
$ sw_vers ProductName: Mac OS X ProductVersion: 10.6.2 BuildVersion: 10C540 $ wc -l fatrecover.c 632 fatrecover.c $ make clean all rm -r fatrecover fatrecover.dSYM gcc -g -Wall fatrecover.c -o fatrecover $ fatrecover frtest1.dmg === FAT RECOVER v0.0 === Opening file frtest1.dmg................OK Reading boot sector.....................OK Processing boot sector..................OK Reading root dir........................OK <BOOT SECTOR INFO REMOVED FOR BREVITY> LISTING DIR: ROOT ATTR SIZE NAME 1ST CLUSTER ------ -------- -------------------- ----------- .....A 92889 4.2.04.jpg (0x00000a) .H..D. 0 .fseventsd (0x000008) .H..D. 0 .Trashes (0x000002) .H...A 4096 ._.Trashes (0x000003) ...V.A 0 FRTEST1. (00000000)
Ah, much better. The first entry is the JPG image file which is 92889
bytes long and begins in cluster 0xa
. The .fseventsd
and
.Trashes
entries are artifacts from mounting the disk image on an
OSX system - they can be ignored. The last entry (which actually
appears first in the on-disk table) is a volume label as indicated by
the V
attribute flag.
As a sanity check, mounting the disk image and listing the root directory from OSX produces:
$ hdiutil attach frtest1.dmg /dev/disk1 /Volumes/FRTEST1 $ls -ao /Volumes/FRTEST1/ total 232 drwxrwxrwx 1 jcardent 16384 Dec 24 10:52 . drwxrwxrwt@ 4 root 136 Dec 24 10:52 .. drwxrwxrwx@ 1 jcardent 2048 Dec 24 10:52 .Trashes -rwxrwxrwx 1 jcardent 4096 Dec 14 05:28 ._.Trashes drwxrwxrwx 1 jcardent 2048 Dec 24 10:52 .fseventsd -rwxrwxrwx 1 jcardent 92889 Dec 14 05:28 4.2.04.jpg
Other than being in opposite orders the two listings match. Woot!
Before declaring total victory, there is one major shortcoming with
the LFN hack shown above - it requires an entire directory table to be
read into an in-memory buffer. My original intent was to process
directory tables one sector or cluster at a time to minimize the
memory footprint. However, the hack above can't handle LFN entries
spanning a sector or cluster boundary as this would involve separate
calls to frDirProcessEntries()
. I thought of a couple of ways to
achieve both goals but decided that they would unnecessarily
complicate the discussion as we'll only be dealing with small disk
images in this series. If you plan to use this code to handle
arbitrarily large file systems you will probably want to consider
solving this problem.
In the next post we'll tackle the last file system area, the file allocation table.