Thursday, January 28, 2010

Recovering Deleted JPEGs from a FAT File System - Part 7

Part 7 in a series of posts on recovering deleted JPEG files from a FAT file system.

After a hiatus due to the holidays, personal matters, and work stuff, I'm ready to continue the FAT Recover project. In this post I'll:

  • finally demonstrate the two principal assumptions that this project is based on.
  • actually recover deleted files using a manual approach.


But first, a mea culpa - I've discovered that the long file name support in post 5 is broken. The code works for files that use a single long file name entry but does not correctly process file names that span multiple entries. For now, I'll side-step the issue and post a fix later. I'll also update post 5 to warn future readers. Apologies for the error - that's the danger of hacking in the wee hours with minimal unit testing.

By this point, it's likely apparent that this project is based on two assumptions:

  1. Deleting a file from a FAT file system often does not erase the associated data.
  2. In simple cases - like a digital camera saving pictures on a fresh memory card - files are stored contiguously in FAT file systems.

To test if these assumptions hold, let's create and analyze a new disk image with multiple JPEG files.

Expanding on the method from post 2, the following shell script creates a new 16 MB FAT16 disk image and copies the ten smallest 4*.jpg images from the test corpus to it.

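#!/bin/bash
# Create a FAT16 disk image and copy the ten smallest test JPEGs to it.

IMAGENAME=frtest2
VOLNAME=FRTEST2
VOLSIZE='-megabytes 16'
MOUNTPATH=/Volumes/${VOLNAME}
TESTPATH=/Users/jcardent/projects/fatrecover2/test/images
# ls -S sorts by size and -r reverses, so head keeps the ten smallest files.
# Word splitting of this list is safe only because the paths contain no spaces.
TESTFILES=$(ls -1Sr "${TESTPATH}"/4*.jpg | head -10)

# VOLSIZE is deliberately unquoted so it splits into a flag and its value.
hdiutil create \
         -fs "MS-DOS FAT16" \
         ${VOLSIZE} \
         -layout NONE \
         -volname "${VOLNAME}" "${IMAGENAME}"

hdiutil attach "${IMAGENAME}.dmg"

for file in $TESTFILES
do
    echo "Copying file ${file}"
    cp "${file}" "${MOUNTPATH}"
done

hdiutil detach "${MOUNTPATH}"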

Mounting the image and listing the root directory confirms that ten image files were copied to the encapsulated FAT file system.

$ hdiutil attach frtest2.dmg 
/dev/disk1                      /Volumes/FRTEST2

$ ls -ao /Volumes/FRTEST2/
total 712
drwxrwxrwx  1 jcardent  16384 Jan 26 07:18 .
drwxrwxrwt@ 4 root        136 Jan 26 07:18 ..
drwxrwxrwx@ 1 jcardent   2048 Jan 26 07:18 .Trashes
-rwxrwxrwx  1 jcardent   4096 Jan 26 06:30 ._.Trashes
drwxrwxrwx  1 jcardent   2048 Jan 26 07:18 .fseventsd
-rwxrwxrwx  1 jcardent  23423 Jan 26 06:30 4.1.01.jpg
-rwxrwxrwx  1 jcardent  21630 Jan 26 06:30 4.1.02.jpg
-rwxrwxrwx  1 jcardent  12278 Jan 26 06:30 4.1.03.jpg
-rwxrwxrwx  1 jcardent  22504 Jan 26 06:30 4.1.04.jpg
-rwxrwxrwx  1 jcardent  22935 Jan 26 06:30 4.1.05.jpg
-rwxrwxrwx  1 jcardent  35456 Jan 26 06:30 4.1.06.jpg
-rwxrwxrwx  1 jcardent  13670 Jan 26 06:30 4.1.07.jpg
-rwxrwxrwx  1 jcardent  18166 Jan 26 06:30 4.1.08.jpg
-rwxrwxrwx  1 jcardent  77486 Jan 26 06:30 4.2.01.jpg
-rwxrwxrwx  1 jcardent  85332 Jan 26 06:30 4.2.05.jpg

Since the last post, I've made a number of enhancements to the fatrecover program that I am developing for this series. One new feature is a simple shell that allows interactively analyzing a disk image. Building and running the program results in:

$ sw_vers
ProductName:    Mac OS X
ProductVersion: 10.6.2
BuildVersion:   10C540

$ wc -l fatrecover.c 
    1348 fatrecover.c

$ make all
gcc -g -Wall fatrecover.c -o fatrecover

$ ./fatrecover frtest2.dmg

=== FAT RECOVER v0.0 ===

Opening file frtest2.dmg................OK
Reading boot sector.....................OK
Processing boot sector..................OK
Reading root dir........................OK

AT YOUR COMMAND:

>> help
      quit     exits program
      help     displays commands
   bsector     prints boot sector
   fsareas     prints size and location of fs areas
       fat     prints file allocation table
        ls     prints current directory
     chain     prints cluster chain
    contig     prints file continuity status
  checksum     prints cluster checksums
   extract     extracts clusters to file

>>

Listing the root directory within fatrecover also shows the ten copied files, along with the volume name entry that a normal directory listing hides.

>> ls

LISTING DIR: ROOT
 ATTR     SIZE            NAME          1ST CLUSTER
------  --------  --------------------  -----------
.....A     85332            4.2.05.jpg  (0x000083)
.....A     77486            4.2.01.jpg  (0x00005d)
.....A     35456            4.1.06.jpg  (0x00004b)
.....A     23423            4.1.01.jpg  (0x00003f)
.....A     22935            4.1.05.jpg  (0x000033)
.....A     22504            4.1.04.jpg  (0x000028)
.....A     21630            4.1.02.jpg  (0x00001d)
.....A     18166            4.1.08.jpg  (0x000014)
.....A     13670            4.1.07.jpg  (0x00000d)
.....A     12278            4.1.03.jpg  (0x000007)
.H..D.         0            .fseventsd  (0x000006)
.H..D.         0              .Trashes  (0x000002)
.H...A      4096            ._.Trashes  (0x000003)
...V.A         0              FRTEST2.  (00000000)

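Printing the FAT table suggests that the JPEG files are laid out back to back in the data area: each run of used clusters (-) ends in a last-cluster mark (X), and the next run starts immediately after it.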

>> fat

FAT TABLE ( KEY: .=free  B=bad  -=used X=last R=reserved)

00000000-0000001F: RXX-X..-----X------X--------X---
00000020-0000003F: -------X----------X-----------X-
00000040-0000005F: ----------X-----------------X---
00000060-0000007F: --------------------------------
00000080-0000009F: --X-----------------------------
000000A0-000000BF: ------------X...................
000000C0-000000DF: ................................

Inspecting the cluster chains of the first three files confirms their contiguous layout.

>> chain 0x7

0x0007 -> 0x0008 -> 0x0009 -> 0x000a -> 0x000b -> 
0x000c -> EOC
(6 clusters, check 6, Contiguous)

>> chain 0xd

0x000d -> 0x000e -> 0x000f -> 0x0010 -> 0x0011 -> 
0x0012 -> 0x0013 -> EOC
(7 clusters, check 7, Contiguous)

>> chain 0x14

0x0014 -> 0x0015 -> 0x0016 -> 0x0017 -> 0x0018 -> 
0x0019 -> 0x001a -> 0x001b -> 0x001c -> EOC
(9 clusters, check 9, Contiguous)
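The chain walk and the contiguity test shown above can be sketched in a few lines of Python. This is only an illustration, not the fatrecover source: the function names are hypothetical, the FAT is modeled as a plain list of integers, and the 2,048-byte cluster size is the value implied by this image's numbers. I'm also assuming that "check" in the output is the cluster count expected from the file size.

```python
EOC_MIN = 0xFFF8  # FAT16: entries 0xFFF8-0xFFFF mark the end of a chain


def walk_chain(fat, first_cluster):
    """Follow FAT entries from first_cluster until an end-of-chain mark."""
    chain = []
    seen = set()
    cluster = first_cluster
    while cluster < EOC_MIN:
        # Clusters 0 and 1 are reserved; a repeat means the chain loops.
        if cluster < 2 or cluster >= len(fat) or cluster in seen:
            raise ValueError(f"corrupt chain at cluster {cluster:#06x}")
        seen.add(cluster)
        chain.append(cluster)
        cluster = fat[cluster]
    return chain


def is_contiguous(chain):
    """True when every cluster is exactly one past its predecessor."""
    return all(b == a + 1 for a, b in zip(chain, chain[1:]))


def expected_clusters(size_bytes, cluster_size=2048):
    """Clusters needed to hold size_bytes, rounding up."""
    return (size_bytes + cluster_size - 1) // cluster_size


# Toy FAT mirroring 4.1.03.jpg: clusters 0x07..0x0c, 12278 bytes.
fat = [0] * 0x20
for c in range(0x07, 0x0C):
    fat[c] = c + 1
fat[0x0C] = 0xFFFF

chain = walk_chain(fat, 0x07)
print(" -> ".join(f"{c:#06x}" for c in chain), "-> EOC")
print(len(chain), "clusters, check", expected_clusters(12278),
      "Contiguous" if is_contiguous(chain) else "Fragmented")
```

Comparing the walked length against the count derived from the file size is a cheap sanity check: a mismatch would point to a damaged chain or directory entry.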

For convenience, I implemented the contig shell command, which lists each file and reports whether it is stored contiguously in the file system.

>> contig

LAYOUT STATUS OF FILES IN DIR: ROOT

       NAME          1ST      LENGTH    CONTIGUOUS?
 ---------------  --------  --------  -------------
      4.2.05.jpg  0x000083       42     CONTIGUOUS
      4.2.01.jpg  0x00005d       38     CONTIGUOUS
      4.1.06.jpg  0x00004b       18     CONTIGUOUS
      4.1.01.jpg  0x00003f       12     CONTIGUOUS
      4.1.05.jpg  0x000033       12     CONTIGUOUS
      4.1.04.jpg  0x000028       11     CONTIGUOUS
      4.1.02.jpg  0x00001d       11     CONTIGUOUS
      4.1.08.jpg  0x000014        9     CONTIGUOUS
      4.1.07.jpg  0x00000d        7     CONTIGUOUS
      4.1.03.jpg  0x000007        6     CONTIGUOUS
      ._.Trashes  0x000003        2     CONTIGUOUS

The contig command confirms that all of the copied files are indeed stored contiguously. This evidence supports the second assumption - contiguous storage - but what about the first assumption that file data doesn't get erased after deletion?

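I've also enhanced fatrecover with the ability to checksum a series of clusters using the FNV hash algorithm. The checksum command takes a starting cluster and a cluster count and prints one 32-bit hash per cluster. Hashing the clusters for the first three files yields: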

>> checksum 0x7 6

0007-000A: 621102b1 5f618b21 d2855912 f5e1af11 
000B-000C: 5587dd40 8988110b 

>> checksum 0xd 7

000D-0010: e736c48d 2554ee35 e613f172 fe82cc2f 
0011-0013: de7bf92b 515eea02 acefd6c3 

>> checksum 0x14 9

0014-0017: 73e7cfea cdc2d024 5625892b 560cd9b1 
0018-001B: e5257e13 49fddb98 44610606 6011d7c7 
001C-001C: eadda707 

To validate these checksums, I also implemented a standalone FNV hash utility that checksums the original image files in 2 KB chunks, matching the cluster size of this file system, with the last chunk padded with zeros.

$ ./fnvtest 4.1.03.jpg

0000-0003: 621102b1 5f618b21 d2855912 f5e1af11 
0004-0007: 5587dd40 8988110b 

$ ./fnvtest 4.1.07.jpg 

0000-0003: e736c48d 2554ee35 e613f172 fe82cc2f 
0004-0007: de7bf92b 515eea02 acefd6c3 

$ ./fnvtest 4.1.08.jpg 

0000-0003: 73e7cfea cdc2d024 5625892b 560cd9b1 
0004-0007: e5257e13 49fddb98 44610606 6011d7c7 
0008-000B: eadda707 
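For readers who want to reproduce this kind of per-chunk checksum, here is a minimal sketch. The post does not say which FNV variant fatrecover and fnvtest use, so the 32-bit FNV-1a variant below is an assumption and its output should not be expected to match the values above. The function names are hypothetical.

```python
FNV32_OFFSET_BASIS = 0x811C9DC5
FNV32_PRIME = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR each byte in, then multiply by the FNV prime."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV32_PRIME) & 0xFFFFFFFF  # keep the hash to 32 bits
    return h


def chunk_checksums(data: bytes, chunk_size: int = 2048) -> list:
    """Hash data in fixed-size chunks, zero-padding the final chunk."""
    sums = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        chunk = chunk.ljust(chunk_size, b"\x00")  # pad a short last chunk
        sums.append(fnv1a_32(chunk))
    return sums


# Synthetic 5120-byte input: two full chunks plus one padded chunk.
sample = bytes(range(256)) * 20
for index, checksum in enumerate(chunk_checksums(sample)):
    print(f"{index:04X}: {checksum:08x}")
```

Padding the last chunk with zeros only reproduces the on-disk hash when the unused tail of the file's final cluster is also zero, which the matching checksums suggest is the case on this freshly created image.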

They match! This indicates that we have indeed located the data for these files in the disk image.

Now let's mount the disk image and delete the JPEG files.

$ hdiutil attach frtest2.dmg 
/dev/disk1                      /Volumes/FRTEST2

$ rm /Volumes/FRTEST2/4*.jpg

$ ls -ao /Volumes/FRTEST2/
total 48
drwxrwxrwx  1 jcardent  16384 Jan 28 06:59 .
drwxrwxrwt@ 4 root        136 Jan 28 06:59 ..
drwxrwxrwx@ 1 jcardent   2048 Jan 28 06:59 .Trashes
-rwxrwxrwx  1 jcardent   4096 Jan 26 06:30 ._.Trashes
drwxrwxrwx  1 jcardent   2048 Jan 28 06:59 .fseventsd

$ hdiutil detach /Volumes/FRTEST2/
"disk1" unmounted.
"disk1" ejected.

Re-examining the image in fatrecover confirms that the files are no longer listed in the root directory and that their associated clusters have been marked as unused in the FAT table.

>> ls

LISTING DIR: ROOT
 ATTR     SIZE            NAME          1ST CLUSTER
------  --------  --------------------  -----------
.H..D.         0            .fseventsd  (0x000006)
.H..D.         0              .Trashes  (0x000002)
.H...A      4096            ._.Trashes  (0x000003)
...V.A         0              FRTEST2.  (00000000)

>> fat

FAT TABLE ( KEY: .=free  B=bad  -=used X=last R=reserved)

00000000-0000001F: RXX-XXX.........................
00000020-0000003F: ................................
00000040-0000005F: ................................
00000060-0000007F: ................................
00000080-0000009F: ................................
000000A0-000000BF: ................................
000000C0-000000DF: ................................
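The before and after listings are consistent with how FAT deletion usually works: the driver marks the directory entry as deleted by setting its first byte to 0xE5 and resets the file's FAT entries to free, without touching the data clusters. The toy model below illustrates that idea only. The structures and names are hypothetical, and a real driver may behave differently.

```python
DELETED_MARKER = 0xE5  # first byte of a deleted FAT directory entry
FREE = 0x0000
EOC = 0xFFFF
EOC_MIN = 0xFFF8


def delete_file(dir_entry: bytearray, fat: list, first_cluster: int) -> None:
    """Mimic a typical FAT delete: only metadata is modified."""
    dir_entry[0] = DELETED_MARKER
    cluster = first_cluster
    while 2 <= cluster < min(len(fat), EOC_MIN):
        next_cluster = fat[cluster]
        fat[cluster] = FREE  # release the cluster in the FAT
        cluster = next_cluster


# Toy file occupying clusters 7, 8 and 9.
fat = [FREE] * 16
fat[7], fat[8], fat[9] = 8, 9, EOC
data_area = {7: b"JPEG part 1", 8: b"JPEG part 2", 9: b"JPEG part 3"}
entry = bytearray(b"PHOTO   JPG" + bytes(21))  # 32-byte directory entry

before = dict(data_area)
delete_file(entry, fat, 7)
print(hex(entry[0]), fat[7:10], data_area == before)
```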

Checksumming the same clusters as before results in:

>> checksum 0x7 6

0007-000A: 621102b1 5f618b21 d2855912 f5e1af11 
000B-000C: 5587dd40 8988110b 

>> checksum 0xd 7

000D-0010: e736c48d 2554ee35 e613f172 fe82cc2f 
0011-0013: de7bf92b 515eea02 acefd6c3 

>> checksum 0x14 9

0014-0017: 73e7cfea cdc2d024 5625892b 560cd9b1 
0018-001B: e5257e13 49fddb98 44610606 6011d7c7 
001C-001C: eadda707 

The same checksums! Although the files were deleted, their data remains unchanged in the file system. This supports the first assumption that deleting files does not erase their associated data.

To recover the deleted files, all that is needed is to extract the data from the disk image and write it to a new file. I added the extract command to do just that:

>> extract 0x7 6 recover1.jpg
Extracting clusters:
0x07 0x08 0x09 0x0a 0x0b 0x0c 

>> extract 0xd 7 recover2.jpg
Extracting clusters:
0x0d 0x0e 0x0f 0x10 0x11 0x12 0x13 

>> extract 0x14 9 recover3.jpg
Extracting clusters:
0x14 0x15 0x16 0x17 0x18 0x19 0x1a 0x1b 0x1c 
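Extracting a contiguous run of clusters boils down to one seek and one read. The sketch below is an illustration, not the fatrecover code: extract_clusters is a hypothetical name, and data_start (the byte offset of the data area, normally derived from the boot sector) is passed in as an assumed parameter. It relies on the FAT convention that the first data cluster is numbered 2.

```python
import io


def extract_clusters(image, data_start, cluster_size, first_cluster, count):
    """Read `count` consecutive clusters starting at `first_cluster`."""
    if first_cluster < 2:
        raise ValueError("data clusters are numbered from 2")
    # Cluster 2 is the first cluster of the data area.
    image.seek(data_start + (first_cluster - 2) * cluster_size)
    data = image.read(count * cluster_size)
    if len(data) != count * cluster_size:
        raise EOFError("image ends before the requested clusters")
    return data


# Synthetic image: a 512-byte header, then 4-byte clusters 2..11,
# each filled with its own cluster number so the result is easy to check.
raw = bytes(512) + b"".join(bytes([n]) * 4 for n in range(2, 12))
image = io.BytesIO(raw)

recovered = extract_clusters(image, data_start=512, cluster_size=4,
                             first_cluster=7, count=3)
print(recovered.hex())
```

With a real image, the same function would be called on a file opened in binary mode and the result written to the output file.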

The recovered files are slightly larger than the originals because extract writes whole clusters: the final cluster is copied in full even though the file only partially filled it.

$ ls -ao images/4.1.0[378].jpg 
-rw-r--r--  1 jcardent  12278 Dec  4 21:16 images/4.1.03.jpg
-rw-r--r--  1 jcardent  13670 Dec  4 21:16 images/4.1.07.jpg
-rw-r--r--  1 jcardent  18166 Dec  4 21:16 images/4.1.08.jpg

$ ls -ao recover*.jpg
-rw-r--r--  1 jcardent  12288 Jan 28 10:50 recover1.jpg
-rw-r--r--  1 jcardent  14336 Jan 28 10:50 recover2.jpg
-rw-r--r--  1 jcardent  18432 Jan 28 10:50 recover3.jpg
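The size difference is just rounding up to a whole number of 2,048-byte clusters, which is easy to check. The sketch below also shows a stricter, byte-level test than a visual comparison: the recovered data should start with the original bytes and carry less than one cluster of extra data. Both function names are hypothetical, and the sample data is synthetic.

```python
def padded_size(size_bytes, cluster_size=2048):
    """File size rounded up to a whole number of clusters."""
    clusters = (size_bytes + cluster_size - 1) // cluster_size
    return clusters * cluster_size


def matches_original(original, recovered, cluster_size=2048):
    """True if recovered is the original plus under one cluster of slack."""
    slack = len(recovered) - len(original)
    return (0 <= slack < cluster_size
            and recovered[:len(original)] == original)


# Sizes of the three original files from the listing above.
for size in (12278, 13670, 18166):
    print(size, "->", padded_size(size))

# Synthetic stand-in for an original file and its recovered copy.
original = (bytes(range(256)) * 48)[:12278]
recovered = original + bytes(padded_size(len(original)) - len(original))
print(matches_original(original, recovered))
```

If the original size is known, slicing the recovered bytes to that length yields a copy of exactly the original size.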

Comparing the original and recovered files with ImageMagick's identify and compare utilities results in:

$ identify images/4.1.0[378].jpg
images/4.1.03.jpg JPEG 256x256 256x256+0+0 8-bit DirectClass 12kb 
images/4.1.07.jpg[1] JPEG 256x256 256x256+0+0 8-bit DirectClass 13.3kb 
images/4.1.08.jpg[2] JPEG 256x256 256x256+0+0 8-bit DirectClass 17.7kb 

$ identify recover[123].jpg
recover1.jpg JPEG 256x256 256x256+0+0 8-bit DirectClass 12kb 
recover2.jpg[1] JPEG 256x256 256x256+0+0 8-bit DirectClass 14kb 
recover3.jpg[2] JPEG 256x256 256x256+0+0 8-bit DirectClass 18kb 

$ compare -metric RMSE images/4.1.03.jpg recover1.jpg null:
0 (0)

$ compare -metric RMSE images/4.1.07.jpg recover2.jpg null:
0 (0)

$ compare -metric RMSE images/4.1.08.jpg recover3.jpg null:
0 (0)

Success: an RMSE of 0 means the recovered files decode to exactly the same pixels as the originals. For further confirmation, I opened these files with Apple's Preview application and visually compared the recovered images to the originals - they were indeed identical.

It's important to note that this manual recovery process worked because we knew where the files originally were in the file system. In a real recovery, the location of the original files is unknown, so a method is needed to discover where each deleted file begins and ends. In the next post, I'll begin developing such a method.