Sunday, March 21, 2010

Recovering Deleted JPEGs from a FAT File System - Part 9

Part 9 in a series of posts on recovering deleted JPEG files from a FAT file system.

A month ago (!), in part 8, we looked at the JPEG file format specification to determine if there was sufficient determinism in the on-disk layout to allow the recovery of deleted files through analyzing the residual data in the file system. The answer was mixed:

  1. GOOD: Uniquely valued markers, discoverable through data inspection, identify the beginning and type of the segments that constitute a JPEG file.
  2. GOOD: the metadata segments declare their own size in a length field that immediately follows the marker.
  3. BAD: the length of the entropy encoded image data is, to the best of my knowledge, unspecified in the START-OF-SCAN segment header. Instead, an END-OF-IMAGE marker is used to identify the end of the entropy encoded data. The theory is that this is done to allow JPEG files to be written as the image is processed.
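
To make points 1 and 2 concrete, here is a minimal sketch (not code from fatrecover; jpeg_find_scan_data is a hypothetical name) that walks the marker segments of an in-memory JPEG. It uses each segment's 2-byte big-endian length field to hop from marker to marker and stops at the START-OF-SCAN header, returning the offset where the entropy-coded data begins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define JPEG_SOI 0xD8  /* START-OF-IMAGE */
#define JPEG_EOI 0xD9  /* END-OF-IMAGE   */
#define JPEG_SOS 0xDA  /* START-OF-SCAN  */

/* Walk the marker segments of an in-memory JPEG and return the offset of the
 * first byte of entropy-coded data (just past the SOS segment), or -1 if the
 * buffer does not start with SOI or a segment length is inconsistent.
 * After SOI, each segment is 0xFF, a marker byte, then a 2-byte big-endian
 * length that counts the length bytes themselves but not the marker. */
long jpeg_find_scan_data(const uint8_t *buf, size_t len)
{
    size_t pos = 2;

    assert(buf != NULL);
    if (len < 2 || buf[0] != 0xFF || buf[1] != JPEG_SOI)
        return -1;

    while (pos + 4 <= len) {
        uint8_t marker;
        size_t  seglen;

        if (buf[pos] != 0xFF)
            return -1;                  /* expected a marker here */
        marker = buf[pos + 1];
        if (marker == 0xFF) {           /* optional fill byte before a marker */
            pos++;
            continue;
        }
        if (marker == JPEG_EOI)
            return -1;                  /* image ended before any scan */
        seglen = ((size_t)buf[pos + 2] << 8) | buf[pos + 3];
        if (seglen < 2 || pos + 2 + seglen > len)
            return -1;                  /* truncated or bogus length */
        if (marker == JPEG_SOS)
            return (long)(pos + 2 + seglen);
        pos += 2 + seglen;              /* skip marker + segment */
    }
    return -1;
}
```

Note that the function has to stop once it reaches the SOS header: nothing in that header says how long the scan data that follows is, which is exactly the BAD point above.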

Essentially, this means that there is no way to determine through data inspection the length or location of the clusters containing the encoded image data. The only clue available is the END-OF-IMAGE marker at the end of the entropy encoded data.
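
The END-OF-IMAGE clue is usable because of byte stuffing: inside entropy-coded data, the encoder writes every 0xFF data byte as 0xFF 0x00, so the only 0xFF-prefixed pairs that can appear there are stuffed bytes and restart markers (0xD0-0xD7). The minimal sketch below (jpeg_find_next_marker is a hypothetical name) scans forward from the start of the scan data to the first genuine marker. For a single-scan baseline file that marker would normally be END-OF-IMAGE; progressive files contain several scans separated by other segments, so there the result would be the next segment header instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the 0xFF byte of the first real marker at or after
 * 'start', skipping stuffed bytes (0xFF 0x00), restart markers (0xFF 0xD0-0xD7)
 * and fill bytes (0xFF 0xFF). Returns -1 if no marker is found. */
long jpeg_find_next_marker(const uint8_t *buf, size_t len, size_t start)
{
    size_t i;

    assert(buf != NULL);
    for (i = start; i + 1 < len; i++) {
        uint8_t next;

        if (buf[i] != 0xFF)
            continue;
        next = buf[i + 1];
        if (next == 0x00)                  /* stuffed 0xFF data byte */
            continue;
        if (next >= 0xD0 && next <= 0xD7)  /* RSTn belongs to the scan */
            continue;
        if (next == 0xFF)                  /* fill byte, marker follows */
            continue;
        return (long)i;
    }
    return -1;
}
```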

One option is to discover and analyze latent directory entries in the data area - doing so could provide valuable clues to the start and length of erased JPEG files. The downsides to this approach are added complexity (recovering deleted directory entries) and incompleteness (directory entries for deleted JPEG files may not exist due to reuse).
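
For reference, a deleted short (8.3) directory entry is cheap to recognize: FAT marks it by overwriting the first byte of the name with 0xE5, clears the file's cluster chain in the FAT, and otherwise leaves the starting cluster and file size in the 32-byte entry. A minimal sketch, using a hypothetical DeletedEntry type and fatParseDeletedEntry function that are not part of fatrecover; it skips deleted long-file-name entries, whose attribute byte is 0x0F.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FAT_DIRENT_DELETED 0xE5  /* first name byte of a deleted entry */
#define FAT_ATTR_LFN       0x0F  /* attribute value of a long-file-name entry */

/* Hypothetical result type for this sketch. */
typedef struct {
    char     name[12];      /* 8.3 name; first character is lost on deletion */
    uint32_t firstCluster;
    uint32_t sizeBytes;
} DeletedEntry;

static uint16_t le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return 1 and fill *out if the 32-byte entry 'e' is a deleted 8.3 entry,
 * otherwise return 0. The high cluster word (offset 20) is only meaningful
 * on FAT32 and is normally zero on FAT12/FAT16. */
int fatParseDeletedEntry(const uint8_t *e, DeletedEntry *out)
{
    assert(e != NULL && out != NULL);
    if (e[0] != FAT_DIRENT_DELETED || e[11] == FAT_ATTR_LFN)
        return 0;
    memcpy(out->name, e, 11);
    out->name[0]  = '_';                 /* placeholder for the lost character */
    out->name[11] = '\0';
    out->firstCluster = ((uint32_t)le16(e + 20) << 16) | le16(e + 26);
    out->sizeBytes    = le32(e + 28);
    return 1;
}
```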

A simpler approach is to inspect each cluster in the data area to see if it begins with a START-OF-IMAGE marker or contains an END-OF-IMAGE marker. Any extent of clusters bounded by START-OF-IMAGE and END-OF-IMAGE markers stands a good chance of being the data for a contiguous JPEG file - the very kind of file we've been trying to recover in this series. In this post, I'll implement this simple method and test the results.

I added the following code to the fatrecover program being developed alongside these posts to perform the marker scan (excerpted from a larger procedure).

#define JPEG_MARKER_COMMON (0xFF)
#define JPEG_MARKER_SOI    (0xD8)
#define JPEG_MARKER_EOI    (0xD9)

int firstCluster;
int lastCluster;
int clusterIndex;
int byteOffset;

unsigned char  lastMarkerType;
int            lastMarkerCluster;

printf("Scanning clusters:\n");
lastMarkerType    = 0;
lastMarkerCluster = 0;
for(clusterIndex = firstCluster; 
    clusterIndex < lastCluster; 
    clusterIndex++) {

  frClusterRead(pimageInfo, clusterIndex, 
                1, pimageInfo->tmpBuff);

  // N.B. - the following code does not cover the
  //        case of markers spanning cluster boundaries.
  //        Cluster boundaries are a file-system artifact
  //        that the JPEG byte stream knows nothing about,
  //        so a two-byte marker can straddle two clusters.
  for (byteOffset = 0; 
       byteOffset < (pimageInfo->clusterSizeBytes-1);
       byteOffset++) {

    // N.B. - SOI markers should be in the first two bytes of the
    //        cluster. This constraint can be relaxed to defeat
    //        attempts to hide images by adding a preamble.
    if ((byteOffset == 0) &&
        (JPEG_MARKER_COMMON == pimageInfo->tmpBuff[byteOffset]) &&
        (JPEG_MARKER_SOI    == pimageInfo->tmpBuff[byteOffset+1])) {
      printf("SOI cluster: %#08x\n",clusterIndex);

      lastMarkerType    = JPEG_MARKER_SOI;
      lastMarkerCluster = clusterIndex;
    }

    if ((JPEG_MARKER_COMMON == pimageInfo->tmpBuff[byteOffset]) &&
        (JPEG_MARKER_EOI    == pimageInfo->tmpBuff[byteOffset+1])) {
      printf("EOI cluster: %#08x  offset: %#x\n",
             clusterIndex, byteOffset);

      if (JPEG_MARKER_SOI == lastMarkerType) {
        printf("** CONTIG JPEG? start: %#08X  length: %d\n", 
               lastMarkerCluster,
               (clusterIndex-lastMarkerCluster+1));
      } 

      lastMarkerType    = JPEG_MARKER_EOI;
      lastMarkerCluster = clusterIndex;
    }
  }
}
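
The N.B. about markers spanning cluster boundaries could be handled by carrying one bit of state from cluster to cluster. The sketch below is a hypothetical stand-in for just the EOI check (MarkerScanState and scanClusterForEOI are illustrative names, not fatrecover code); unlike the loop above, it reports only the first EOI in each cluster.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state carried between clusters so that a marker split across
 * a boundary (0xFF as the last byte of cluster N, 0xD9 as the first byte of
 * cluster N+1) is still detected. */
typedef struct {
    int prevWasFF;   /* last byte of the previous cluster was 0xFF */
} MarkerScanState;

/* Scan one cluster for END-OF-IMAGE (0xFF 0xD9).
 * Returns the offset of the EOI's 0xFF within this cluster, -1 if the EOI
 * straddles the boundary with the previous cluster, or -2 if none is found. */
long scanClusterForEOI(MarkerScanState *st, const unsigned char *buf, size_t n)
{
    long   found = -2;
    size_t i;

    assert(st != NULL && buf != NULL && n > 0);
    if (st->prevWasFF && buf[0] == 0xD9)
        found = -1;
    for (i = 0; found == -2 && i + 1 < n; i++)
        if (buf[i] == 0xFF && buf[i + 1] == 0xD9)
            found = (long)i;
    /* Remember whether this cluster ends in a possible marker prefix. */
    st->prevWasFF = (buf[n - 1] == 0xFF);
    return found;
}
```

A return of -1 means the EOI's 0xFF was the last byte of the previous cluster; the file still ends in the current cluster, since that is where the 0xD9 byte lives.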

Processing the post-deletion test disk image from part 7 with this new code, via the new jpgscan command at the fatrecover prompt, results in:

$ sw_vers
ProductName:    Mac OS X
ProductVersion: 10.6.2
BuildVersion:   10C540

$ wc -l fatrecover.c 
    1431 fatrecover.c

$ make all
gcc -g -Wall fatrecover.c -o fatrecover

$ ./fatrecover frtest2.dmg 

=== FAT RECOVER v0.0 ===

Opening file frtest2.dmg................OK
Reading boot sector.....................OK
Processing boot sector..................OK
Reading root dir........................OK

AT YOUR COMMAND:

>> ls

LISTING DIR: ROOT
 ATTR     SIZE            NAME          1ST CLUSTER
------  --------  --------------------  -----------
.H..D.         0            .fseventsd  (0x000006)
.H..D.         0              .Trashes  (0x000002)
.H...A      4096            ._.Trashes  (0x000003)
...V.A         0              FRTEST2.  (00000000)

>> fat

FAT TABLE ( KEY: .=free  B=bad  -=used X=last R=reserved)

00000000-0000001F: RXX-XXX.........................
00000020-0000003F: ................................
00000040-0000005F: ................................
00000060-0000007F: ................................
00000080-0000009F: ................................
000000A0-000000BF: ................................
[output omitted for brevity]

>> jpgscan
Scanning clusters:
SOI cluster: 0x000007
EOI cluster: 0x00000c  offset: 0x7f4
** CONTIG JPEG? start: 0X000007  length: 6
SOI cluster: 0x00000d
EOI cluster: 0x000013  offset: 0x564
** CONTIG JPEG? start: 0X00000D  length: 7
SOI cluster: 0x000014
EOI cluster: 0x00001c  offset: 0x6f4
** CONTIG JPEG? start: 0X000014  length: 9
SOI cluster: 0x00001d
EOI cluster: 0x000027  offset: 0x47c
** CONTIG JPEG? start: 0X00001D  length: 11
SOI cluster: 0x000028
EOI cluster: 0x000032  offset: 0x7e6
** CONTIG JPEG? start: 0X000028  length: 11
SOI cluster: 0x000033
EOI cluster: 0x00003e  offset: 0x195
** CONTIG JPEG? start: 0X000033  length: 12
SOI cluster: 0x00003f
EOI cluster: 0x00004a  offset: 0x37d
** CONTIG JPEG? start: 0X00003F  length: 12
SOI cluster: 0x00004b
EOI cluster: 0x00005c  offset: 0x27e
** CONTIG JPEG? start: 0X00004B  length: 18
SOI cluster: 0x00005d
EOI cluster: 0x000082  offset: 0x6ac
** CONTIG JPEG? start: 0X00005D  length: 38
SOI cluster: 0x000083
EOI cluster: 0x0000ac  offset: 0x552
** CONTIG JPEG? start: 0X000083  length: 42

In part 7, we used the contig command to list the location and size of the contiguous JPEG files before they were deleted. For convenience, I've copied that output from part 7 below:

>> contig

LAYOUT STATUS OF FILES IN DIR: ROOT

       NAME          1ST      LENGTH    CONTIGUOUS?
 ---------------  --------  --------  -------------
      4.2.05.jpg  0x000083       42     CONTIGUOUS
      4.2.01.jpg  0x00005d       38     CONTIGUOUS
      4.1.06.jpg  0x00004b       18     CONTIGUOUS
      4.1.01.jpg  0x00003f       12     CONTIGUOUS
      4.1.05.jpg  0x000033       12     CONTIGUOUS
      4.1.04.jpg  0x000028       11     CONTIGUOUS
      4.1.02.jpg  0x00001d       11     CONTIGUOUS
      4.1.08.jpg  0x000014        9     CONTIGUOUS
      4.1.07.jpg  0x00000d        7     CONTIGUOUS
      4.1.03.jpg  0x000007        6     CONTIGUOUS
      ._.Trashes  0x000003        2     CONTIGUOUS

Comparing the contig and jpgscan outputs confirms that each CONTIG JPEG? line matches the starting cluster and length (in clusters) of one of the ten contiguous JPEG files as they were laid out before deletion - success, the simple marker scan method worked on this test image!

The contiguous files discovered by the scan can be recovered manually using the fatrecover utility's extract command that was implemented in part 7. With a little more effort, we can automate the recovery of such extents into files for further analysis. This is fairly straightforward and shouldn't require explanation.
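
One way to automate it, as a minimal sketch: every cluster of a contiguous file except the last is full, and the file ends just after the 2-byte END-OF-IMAGE marker, so its size can be estimated as (length - 1) * clusterSize + eoiOffset + 2. The sketch assumes the disk image is read with plain stdio, that dataAreaOffset (a hypothetical parameter) is the byte offset of cluster 2, the first data cluster in FAT, and that nothing after the EOI belongs to the file. recoveredSize and extractExtent are illustrative names, not fatrecover's extract command.

```c
#include <assert.h>
#include <stdio.h>

/* Estimated size of a contiguous JPEG whose EOI starts at eoiOffset in its
 * last cluster: all earlier clusters are full, and the file ends right
 * after the two EOI bytes. */
long recoveredSize(long clusterCount, long clusterSize, long eoiOffset)
{
    assert(clusterCount > 0 && eoiOffset >= 0 && eoiOffset + 2 <= clusterSize);
    return (clusterCount - 1) * clusterSize + eoiOffset + 2;
}

/* Copy nbytes starting at firstCluster from the disk image to 'out'.
 * dataAreaOffset is the byte offset of cluster 2 (the first FAT data cluster).
 * Returns 0 on success, -1 on a seek, read, or write error. */
int extractExtent(FILE *img, long dataAreaOffset, long clusterSize,
                  long firstCluster, long nbytes, FILE *out)
{
    char buf[4096];
    long pos = dataAreaOffset + (firstCluster - 2) * clusterSize;

    if (fseek(img, pos, SEEK_SET) != 0)
        return -1;
    while (nbytes > 0) {
        size_t want = nbytes < (long)sizeof buf ? (size_t)nbytes : sizeof buf;
        size_t got  = fread(buf, 1, want, img);

        if (got != want || fwrite(buf, 1, got, out) != got)
            return -1;
        nbytes -= (long)got;
    }
    return 0;
}
```

For example, if the clusters are 2048 bytes (which the 4096-byte ._.Trashes occupying two clusters suggests), the first candidate from the scan (start 0x000007, 6 clusters, EOI at offset 0x7f4) works out to 5 * 2048 + 2036 + 2 = 12278 bytes.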

In the next post, we'll take a look at what happens when there are non-contiguous files in the file system. After that, I'll try to fix the long file name code from post 5 and wrap up this part of the series.