Analysing the byte entropy of a FAT formatted disk

Written by Tariq. Date: 2009-1-27

Over at the Honeynet Project they used to run security competitions which were quite a bit of fun. I remembered one in particular which I had looked at but never completed: the forensic investigation of a floppy disk. I was tinkering with an application to measure byte entropy and thinking about how it could be used in a forensic investigation. There is no point using the little application to analyse my terabyte (TB) sized drives, so, remembering the floppy disk challenge, I downloaded the floppy disk image (1.44 MB; MD5 = b676147f63923e1f428131d59b1d6a72).

Details

Solutions to this competition do exist, and although I had read one of them some 12 months ago I was fairly sure that any information was now gone from my head. So, hopefully, this won’t skew my investigation too much. The goal is to use entropy to discover information about the contents of the disk that aids in the completion of the challenge. Let’s go!

Competition questions

  • Who is Joe Jacobs’ supplier of marijuana and what is the address listed for the supplier?
  • What crucial data is available within the coverpage.jpg file and why is this data crucial?
  • What (if any) other high schools besides Smith Hill does Joe Jacobs frequent?
  • For each file, what processes were taken by the suspect to mask them from others?
  • What processes did you (the investigator) use to successfully examine the entire contents of each file?
  • What Microsoft program was used to create the Cover Page file. What is your proof (Proof is the key to getting this question right, not just making a guess).

I didn’t have these questions in front of me when I performed the investigation so sorry for being a little random. I wrote things down as I progressed and filled in a few lines in italics to reference the actual questions later on.

Image verification

$ md5sum image.zip
b676147f63923e1f428131d59b1d6a72  image.zip
I then unzipped the contents which gave me a single file named image to work with. Excellent.
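If md5sum isn’t to hand, the same check can be sketched in a few lines of Python. This is only an illustration: the function name md5_of_file and the constant EXPECTED_MD5 are my own, and the only value taken from the challenge is the digest quoted above.

```python
import hashlib

# MD5 quoted above for the downloaded image.zip
EXPECTED_MD5 = "b676147f63923e1f428131d59b1d6a72"

def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Usage (assumes image.zip is in the current directory):
#   md5_of_file("image.zip") == EXPECTED_MD5
```

Reading in chunks is overkill for a floppy image, but it means the same helper would cope with the terabyte drives too.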

Entropy analysis of the entire disk

The rest of this article uses the now hacked-to-pieces ByteEntropy.java program. You’ll need to grab that file and compile it, or download the class file, and run it as shown below; either way you need Java installed on your system.

$ java ByteEntropy -f image -w 200 -o image.ent -r 512 -c

This command writes an entropy file, image.ent, holding the number of distinct byte values seen in a sliding 200-byte window. The values are averaged over 512 bytes (one cluster on this disk) and a graph, Image.ent.png, is also created. This is shown below.

[Figure: Image.ent.png, entropy of image]
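The measure behind the graph, the number of distinct byte values in a sliding window, is simple enough to sketch. The following is a rough Python illustration and not the ByteEntropy.java code: the function names are mine, and how the real tool aligns its windows and averages is an assumption.

```python
def distinct_byte_counts(data, window=200):
    """Number of distinct byte values in every full sliding window of `data`."""
    if len(data) < window:
        return []
    freq = [0] * 256      # how often each byte value occurs in the current window
    distinct = 0          # how many byte values currently have freq > 0
    counts = []
    for i, byte in enumerate(data):
        if freq[byte] == 0:
            distinct += 1
        freq[byte] += 1
        if i >= window:   # drop the byte that just slid out of the window
            old = data[i - window]
            freq[old] -= 1
            if freq[old] == 0:
                distinct -= 1
        if i >= window - 1:
            counts.append(distinct)
    return counts

def average_per_block(counts, block=512):
    """Average the per-window counts over consecutive runs of `block` positions."""
    averages = []
    for start in range(0, len(counts), block):
        chunk = counts[start:start + block]
        averages.append(sum(chunk) / len(chunk))
    return averages
```

A run of zero bytes scores 1 in every window, while a window in which no byte repeats scores the window size, so the measure is bounded by 1 and 200 for the settings used here.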

As can be seen from the graph there is nothing too interesting happening after the first 120 clusters. Let’s zoom in on that entropy.

$ dd if=image of=image_0t120 bs=512 count=120 skip=31
120+0 records in
120+0 records out
61440 bytes (61 kB) copied, 0.004 s, 15 MB/s
$ java ByteEntropy -f image_0t120 -w 200 -o image_0t120.ent -r 512 -c -S
Read,61440,bytes Took,109,ms Rate,0.0,MB/s

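Here we have skipped the boot sector, both copies of the FAT and most of the root directory. On a standard 1.44 MB FAT12 floppy the data area starts at sector 33 and holds cluster 2, so with skip=31 block n of the carved file lines up with cluster n on the disk. We then examine the entropy of these 120 blocks as before, which gives us the following entropy graph (image_0t120.ent.png).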

[Figure: image_0t120.ent.png, zoomed-in entropy graph of image]
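The skip values passed to dd throughout this article follow from the FAT12 layout. Here is a small Python sketch of the arithmetic; it assumes the standard 1.44 MB floppy geometry (one reserved sector, two FATs of nine sectors, 224 root directory entries, one sector per cluster) rather than reading the values from the image’s boot sector, and the names are my own.

```python
BYTES_PER_SECTOR = 512
RESERVED_SECTORS = 1        # the boot sector
NUM_FATS = 2
SECTORS_PER_FAT = 9
ROOT_DIR_ENTRIES = 224      # 32 bytes per directory entry
SECTORS_PER_CLUSTER = 1

def first_data_sector():
    """Sector number at which the data area (cluster 2) begins."""
    root_dir_bytes = ROOT_DIR_ENTRIES * 32
    root_dir_sectors = (root_dir_bytes + BYTES_PER_SECTOR - 1) // BYTES_PER_SECTOR
    return RESERVED_SECTORS + NUM_FATS * SECTORS_PER_FAT + root_dir_sectors

def cluster_to_sector(cluster):
    """Map a FAT cluster number (numbering starts at 2) to a disk sector."""
    return first_data_sector() + (cluster - 2) * SECTORS_PER_CLUSTER

# With this geometry a cluster number and its sector always differ by 31,
# which is why skip=31 makes block n of the carved file match cluster n.
```

Under these assumptions cluster 42 maps to sector 73 and cluster 73 to sector 104, the two skip values used further down.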

Let’s hypothesise about what could be on this disk, based on our earlier experimentation with different file types. Clusters 0-41 could contain plaintext as the entropy values are in the correct range. We also know from earlier experimentation that entropy values of 125-145 indicate random, encrypted, or compressed data. There appear to be two files which fit the bill in the graph above, starting around clusters ~42 and ~71. There are also some interesting low entropy chunks between these two files.

Looking at RAW bytes

The thing which interested me the most was a weird low entropy section of the disk between clusters 62 and 70. The contents should be easily viewable using a hex editor. So let’s take a look!

$ dd if=image bs=512 count=8 skip=96 | od -t a
8+0 records in
8+0 records out
4096 bytes (4.1 kB) copied, 0 s, Infinity B/s
0000000   "  nl nul   (   "  nl nul   (   "  nl nul   (   "  nl nul   (
*
0007320   "  nl nul   (   "  nl nul   (   "  nl nul   (   "  nl nul del
0007340   Y nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
0007360 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
*
0007440   p   w   =   g   o   o   d   t   i   m   e   s nul nul nul nul
0007460 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
*
0010000

There is a repeated byte pattern (a28a0028) for much of that disk section and then out pops pw=goodtimes. Is goodtimes a password? As this string sits in the slack space of cover_page.jpgc, it answers question 2, provided we can show in a later section that it really is a password. Time to look at that low entropy section in the first 41 clusters.
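Spotting a string like that by eye in od output is error prone, so here is a rough Python sketch of the same idea as the Unix strings utility: list every run of printable ASCII in a chunk of the disk. The function name and the minimum run length of four are my own choices.

```python
import re

def printable_strings(data, min_len=4):
    """Return (offset, text) for each run of at least min_len printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(match.start(), match.group().decode("ascii"))
            for match in pattern.finditer(data)]

# A miniature stand-in for the disk section above: the fill pattern,
# some zero bytes, then the interesting string.
sample = bytes.fromhex("a28a0028") * 4 + bytes(8) + b"pw=goodtimes" + bytes(4)
found = printable_strings(sample)
```

The fill pattern contains one printable byte (0x28 is an opening bracket) but never four in a row, so only the pw=goodtimes run is reported.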

 $ dd if=image bs=512 count=41 skip=31 | xxd -a | egrep -v "\.{4}"
 *
 0000790: 0000 2608 0000 5202 0000 780a 0000 5200  ..&...R...x...R.
 *
 *
 *
 0000e00: 4a69 6d6d 7920 4a75 6e67 6c65 0d36 3236  Jimmy Jungle.626
 0000e10: 204a 756e 676c 6520 4176 6520 4170 7420   Jungle Ave Apt
 0000e20: 320d 4a75 6e67 6c65 2c20 4e59 2031 3131  2.Jungle, NY 111
 0000e30: 3131 0d0d 4a69 6d6d 793a 0d0d 4475 6465  11..Jimmy:..Dude
 0000e40: 2c20 796f 7572 2070 6f74 206d 7573 7420  , your pot must
 0000e50: 6265 2074 6865 2062 6573 7420 9620 6974  be the best . it
 0000e60: 206d 6164 6520 7468 6520 636f 7665 7220   made the cover
 0000e70: 6f66 2048 6967 6820 5469 6d65 7320 4d61  of High Times Ma
 0000e80: 6761 7a69 6e65 2120 5468 616e 6b73 2066  gazine! Thanks f
 0000e90: 6f72 2073 656e 6469 6e67 206d 6520 7468  or sending me th
 0000ea0: 6520 436f 7665 7220 5061 6765 2e20 5768  e Cover Page. Wh
 0000eb0: 6174 2064 6f20 796f 7520 7075 7420 696e  at do you put in
 0000ec0: 2079 6f75 7220 736f 696c 2077 6865 6e20   your soil when
 0000ed0: 796f 7520 706c 616e 7420 7468 6520 6d61  you plant the ma
 0000ee0: 7269 6a75 616e 6120 7365 6564 733f 2041  rijuana seeds? A
 0000ef0: 7420 6c65 6173 7420 4920 6b6e 6f77 2079  t least I know y
 0000f00: 6f75 7220 6772 6f77 696e 6720 6974 2061  our growing it a
 0000f10: 6e64 206e 6f74 2073 6f6d 6520 6775 7920  nd not some guy
 0000f20: 696e 2043 6f6c 756d 6269 612e 0d20 0d54  in Columbia.. .T
 0000f30: 6865 7365 206b 6964 732c 2074 6865 7920  hese kids, they
 0000f40: 7465 6c6c 206d 6520 6d61 7269 6a75 616e  tell me marijuan
 0000f50: 6120 6973 6e92 7420 6164 6469 6374 6976  a isn.t addictiv
 0000f60: 652c 2062 7574 2074 6865 7920 646f 6e92  e, but they don.
 0000f70: 7420 7374 6f70 2062 7579 696e 6720 6672  t stop buying fr
 0000f80: 6f6d 206d 652e 204d 616e 2c20 4992 6d20  om me. Man, I.m
 0000f90: 7375 7265 2067 6c61 6420 796f 7520 746f  sure glad you to
 0000fa0: 6c64 206d 6520 6162 6f75 7420 7461 7267  ld me about targ
 0000fb0: 6574 696e 6720 7468 6520 6869 6768 2073  eting the high s
 0000fc0: 6368 6f6f 6c20 7374 7564 656e 7473 2e20  chool students.
 0000fd0: 596f 7520 6d75 7374 2068 6176 6520 736f  You must have so
 0000fe0: 6d65 2065 7870 6572 6965 6e63 652e 2049  me experience. I
 0000ff0: 7492 7320 6c69 6b65 2061 2067 7561 7261  t.s like a guara
 0001000: 6e74 6565 6420 7061 7963 6865 636b 2e20  nteed paycheck.
 0001010: 5468 6569 7220 7061 7265 6e74 7320 6769  Their parents gi
 0001020: 7665 2074 6865 6d20 6d6f 6e65 7920 666f  ve them money fo
 0001030: 7220 6c75 6e63 6820 616e 6420 7468 6579  r lunch and they
 0001040: 2073 7065 6e64 2069 7420 6f6e 206d 7920   spend it on my
 0001050: 7374 7566 662e 2049 926d 2061 6e20 656e  stuff. I.m an en
 0001060: 7472 6570 7265 6e65 7572 2e20 416d 2049  trepreneur. Am I
 0001070: 206f 6e6c 7920 6f6e 6520 796f 7520 7365   only one you se
 0001080: 6c6c 2074 6f3f 204d 6179 6265 2049 2063  ll to? Maybe I c
 0001090: 616e 2062 6563 6f6d 6520 6469 7374 7269  an become distri
 00010a0: 6275 746f 7220 6f66 2074 6865 2079 6561  butor of the yea
 00010b0: 7221 0d0d 4920 656d 6169 6c65 6420 796f  r!..I emailed yo
 00010c0: 7520 7468 6520 7363 6865 6475 6c65 2074  u the schedule t
 00010d0: 6861 7420 4920 616d 2075 7369 6e67 2e20  hat I am using.
 00010e0: 4920 7468 696e 6b20 6974 2068 656c 7073  I think it helps
 00010f0: 206d 6520 636f 7665 7220 6d79 7365 6c66   me cover myself
 0001100: 2061 6e64 206e 6f74 2062 6520 7072 6564   and not be pred
 0001110: 6963 7469 7665 2e20 2054 656c 6c20 6d65  ictive.  Tell me
 0001120: 2077 6861 7420 796f 7520 7468 696e 6b2e   what you think.
 0001130: 2054 6f20 6f70 656e 2069 742c 2075 7365   To open it, use
 0001140: 2074 6865 2073 616d 6520 7061 7373 776f   the same passwo
 0001150: 7264 2074 6861 7420 796f 7520 7365 6e74  rd that you sent
 0001160: 206d 6520 6265 666f 7265 2077 6974 6820   me before with
 0001170: 7468 6174 2066 696c 652e 2054 616c 6b20  that file. Talk
 0001180: 746f 2079 6f75 206c 6174 6572 2e0d 0d54  to you later...T
 *
 *
 0001410: 3408 0000 3b08 0000 3c08 0000 2d09 0000  4...;...<...-...
 *
 *
 0001810: 0807 22b0 0807 2390 a005 2490 a005 25b0  .."...#...$...%.
 *
 0001a40: 434a 1800 5f48 0104 614a 1800 6d48 0904  CJ.._H..aJ..mH..
 0001a80: 4400 6500 6600 6100 7500 6c00 7400 2000  D.e.f.a.u.l.t. .
 0001a90: 5000 6100 7200 6100 6700 7200 6100 7000  P.a.r.a.g.r.a.p.
 0001ad0: 2000 4e00 6f00 7200 6d00 6100 6c00 0000   .N.o.r.m.a.l...
 0001b50: 0000 3300 0000 3400 0000 3b00 0000 3c00  ..3...4...;...<.
 0001dd0: 2200 0000 2a00 0000 2d00 0000 2301 0000  "...*...-...#...
 0001e20: 2800 0000 2c00 0000 3200 0000 3200 0000  (...,...2...2...
 0001e70: 6865 6d61 732d 6d69 6372 6f73 6f66 742d  hemas-microsoft-
 0001e80: 636f 6d3a 6f66 6669 6365 3a73 6d61 7274  com:office:smart
 0001e90: 7461 6773 0680 5374 7265 6574 0080 3b00  tags..Street..;.
 0001eb0: 6d61 732d 6d69 6372 6f73 6f66 742d 636f  mas-microsoft-co
 0001ec0: 6d3a 6f66 6669 6365 3a73 6d61 7274 7461  m:office:smartta
 0001ed0: 6773 0780 6164 6472 6573 7300 8038 0000  gs..address..8..
 0001ef0: 6173 2d6d 6963 726f 736f 6674 2d63 6f6d  as-microsoft-com
 0001f00: 3a6f 6666 6963 653a 736d 6172 7474 6167  :office:smarttag
 0001f20: 002a 8075 726e 3a73 6368 656d 6173 2d6d  .*.urn:schemas-m
 0001f30: 6963 726f 736f 6674 2d63 6f6d 3a6f 6666  icrosoft-com:off
 0001f40: 6963 653a 736d 6172 7474 6167 7305 8070  ice:smarttags..p
 0001f60: 7572 6e3a 7363 6865 6d61 732d 6d69 6372  urn:schemas-micr
 0001f70: 6f73 6f66 742d 636f 6d3a 6f66 6669 6365  osoft-com:office
 0001f80: 3a73 6d61 7274 7461 6773 0580 5374 6174  :smarttags..Stat
 0001fa0: 3a73 6368 656d 6173 2d6d 6963 726f 736f  :schemas-microso
 0001fb0: 6674 2d63 6f6d 3a6f 6666 6963 653a 736d  ft-com:office:sm
 0001fc0: 6172 7474 6167 730a 8050 6f73 7461 6c43  arttags..PostalC
 0002130: 0700 5500 6e00 6b00 6e00 6f00 7700 6e00  ..U.n.k.n.o.w.n.
 00021a0: 6d00 6500 7300 2000 4e00 6500 7700 2000  m.e.s. .N.e.w. .
 00021b0: 5200 6f00 6d00 6100 6e00 0000 3516 9001  R.o.m.a.n...5...
 *
 *
 *
 0002ad0: 4a69 6d6d 7920 4a75 6e67 6c65 0000 6f00  Jimmy Jungle..o.
 0002b50: 7420 576f 7264 2031 302e 3000 4000 0000  t Word 10.0.@...
 *
 *
 0004c00: 5200 6f00 6f00 7400 2000 4500 6e00 7400  R.o.o.t. .E.n.t.
 *
 0004d00: 5700 6f00 7200 6400 4400 6f00 6300 7500  W.o.r.d.D.o.c.u.
 0004d80: 0500 5300 7500 6d00 6d00 6100 7200 7900  ..S.u.m.m.a.r.y.
 0004d90: 4900 6e00 6600 6f00 7200 6d00 6100 7400  I.n.f.o.r.m.a.t.
 0004e00: 0500 4400 6f00 6300 7500 6d00 6500 6e00  ..D.o.c.u.m.e.n.
 0004e10: 7400 5300 7500 6d00 6d00 6100 7200 7900  t.S.u.m.m.a.r.y.
 0004e20: 4900 6e00 6600 6f00 7200 6d00 6100 7400  I.n.f.o.r.m.a.t.
 0004e80: 0100 4300 6f00 6d00 7000 4f00 6200 6a00  ..C.o.m.p.O.b.j.
 *
 *
 *
 *

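It looks like we may be dealing with one or more Microsoft Office files: note the "t Word 10.0" fragment and the stream names such as WordDocument and SummaryInformation. We also found a letter, shown below as pieced together from xxd’s ASCII view.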

Jimmy Jungle
626 Jungle Ave Apt 2
Jungle, NY 11111

Jimmy:

Dude, your pot must be the best - it made the cover of High Times Magazine! Thanks for sending me the Cover Page. What do you put in your soil when you plant the marijuana seeds? At least I know your growing it and not some guy in Columbia.

These kids, they tell me marijuana isn’t addictive, but they don’t stop buying from me. Man, I’m sure glad you told me about targeting the high school students. You must have some experience. It’s like a guaranteed paycheck. Their parents give them money for lunch and they spend it on my stuff. I’m an entrepreneur. Am I only one you sell to? Maybe I can become distributor of the year!

I emailed you the schedule that I am using. I think it helps me cover myself and not be predictive. Tell me what you think. To open it, use the same password that you sent me before with that file. Talk to you later.

Thanks,
Joe

Hmmm could goodtimes be the password Joe speaks of? One Jimmy Jungle of 626 Jungle Ave Apt 2, Jungle, NY 11111 is Joe’s supplier and that’s the answer to question 1.

Moving forward quickly

Always use the best tools for the job. I am going to continue with WinHex. I could use xxd, but that would just make everything tedious and I have better things to be doing. You can get yourself a trial copy of WinHex if you don’t have it.

  1. Open the file image in WinHex.
  2. From the tool bar click Specialist -> Interpret file as disk

This will give you the view shown below. We can quickly note a few things:

  1. cover_page.jpgc is said to start at cluster 4200. This is complete rubbish, as there is nothing in those sections of the disk which looks anything like a jpg file (entropy wise).
  2. SCHEDU~1.exe is said to start at cluster 72. The entropy values say this can’t be an ordinary executable: the entropy is too high. This file is compressed, encrypted or contains random bytes.
  3. SCHEDU~1.exe is larger than the stated size of 1 KB, according to the entropy graph.
  4. Jimmy Jungle.doc is a deleted file. Optional: Recovering deleted files.

It’s likely that there is something amiss with cover_page.jpgc and SCHEDU~1.exe; however, Jimmy Jungle.doc looks like it might be OK. In the WinHex view shown above, right click on the Jimmy Jungle.doc row, click Recover/Copy and save the file wherever you want. When I did this there were no issues with the file, which contained the letter from Joe we found before.

A better look at SCHEDU~1.exe

Let’s look at the first block of this file.

$ dd if=image bs=512 count=1 skip=104 | xxd -a | head
0000000: 504b 0304 1400 0100 0800 985a b72c c755  PK.........Z.,.U
0000010: 608d ea08 0000 0042 0000 1400 0000 5363  `......B......Sc
0000020: 6865 6475 6c65 6420 5669 7369 7473 2e78  heduled Visits.x
0000030: 6c73 94c8 312a e349 0bdb a810 c270 9dfc  ls..1*.I.....p..
0000040: 1003 31a2 8e48 e83c 4b81 75c9 8b86 51af  ..1..H.<K.u...Q.
0000050: df2a 36c3 24db 1a7e 7546 98ee 4e56 4f05  .*6.$..~uF..NVO.
0000060: ba8d c460 3654 0e11 ab2e 23a5 e816 0252  ...`6T....#....R
0000070: e21f ef90 a3f5 232d 3410 0248 54c1 62cb  ......#-4..HT.b.
0000080: 5ef1 3f91 5272 e3dc 660a 4a20 d3e2 02cf  ^.?.Rr..f.J ....
0000090: 789a 356b 554d b798 fb88 615f 8399 0553  x.5kUM....a_...S

This file is not what it appears to be. It is a PKZIP file (.zip), not an executable; see PKZIP file format. We already know it isn’t 1000 bytes in length, so let’s check where this file is likely to end.
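The first 30 bytes of that dump are a ZIP local file header, and decoding them tells us more than the file name. The sketch below is my own illustration in Python using the struct module; the function and key names are hypothetical, the field layout is the standard local file header, and the sample bytes are copied from the first three lines of the dump above.

```python
import struct

# signature, version, flags, method, time, date, crc32,
# compressed size, uncompressed size, name length, extra length
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")   # 30 bytes, little-endian

def parse_local_header(data):
    """Decode the ZIP local file header at the start of `data`."""
    (sig, _version, flags, method, _time, _date, _crc,
     comp_size, uncomp_size, name_len, extra_len) = LOCAL_HEADER.unpack_from(data, 0)
    if sig != b"PK\x03\x04":
        raise ValueError("not a ZIP local file header")
    name_end = LOCAL_HEADER.size + name_len
    return {
        "name": data[LOCAL_HEADER.size:name_end].decode("cp437"),
        "encrypted": bool(flags & 0x0001),      # bit 0 of the flags field
        "method": method,                       # 8 means deflate
        "compressed_size": comp_size,
        "uncompressed_size": uncomp_size,
        "data_offset": name_end + extra_len,    # where the file data begins
    }

# Bytes 0x00-0x31 of the dump above
sample = bytes.fromhex(
    "504b03041400010008009" "85ab72cc755608dea080000"
    "0042000014000000"
    "5363686564756c6564205669736974732e786c73"
)
header = parse_local_header(sample)
```

For these bytes the encrypted flag is set and the compressed size is 2282 bytes, so the file data should stop 50 + 2282 = 2332 (0x91c) bytes in, which is well beyond the 1 KB claimed by the directory entry.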

$ dd if=image bs=512 count=5 skip=104 | xxd -a | tail
5+0 records in
5+0 records out
2560 bytes (2.6 kB) copied, 0 s, Infinity B/s
0000900: c6c5 954e 28c9 7de7 48d3 2d1e b30f d7dc  ...N(.}.H.-.....
0000910: 9c23 e676 51aa ec21 dd21 0671 504b 0102  .#.vQ..!.!.qPK..
0000920: 1400 1400 0100 0800 985a b72c c755 608d  .........Z.,.U`.
0000930: ea08 0000 0042 0000 1400 0000 0000 0000  .....B..........
0000940: 0000 2000 b681 0000 0000 5363 6865 6475  .. .......Schedu
0000950: 6c65 6420 5669 7369 7473 2e78 6c73 504b  led Visits.xlsPK
0000960: 0506 0000 0000 0100 0100 4200 0000 1c09  ..........B.....
0000970: 0000 0000 0000 0000 0000 0000 0000 0000  ................
*
00009f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................

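With this information we can see that the file ends around the line at offset 0x970, which is 2416 in decimal. So I carved out the first 2416 bytes, but the resulting file was corrupt; adding an extra 4 bytes made everything work out nicely. The dump explains why: the end of central directory record (PK followed by 05 06) starts at 0x95e and is 22 bytes long when there is no archive comment, so the archive really ends at 0x95e + 22 = 0x974, which is 2420 bytes.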

$ dd if=image bs=512 count=5 skip=104 | dd bs=1 count=2420 > out.zip
5+0 records in
5+0 records out
2560 bytes (2.6 kB) copied, 0 s, Infinity B/s
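Rather than guessing the length from the hex dump, the end of the archive can be worked out from the end of central directory record. Here is a minimal Python sketch, assuming a single archive with no ZIP64 extensions; the function name is my own.

```python
import struct

# signature, disk number, disk with central directory, entries on this disk,
# total entries, central directory size, central directory offset, comment length
EOCD = struct.Struct("<4sHHHHIIH")   # 22 bytes, little-endian

def zip_end_offset(data):
    """Offset just past the last byte of the ZIP archive that starts at data[0]."""
    pos = data.rfind(b"PK\x05\x06")          # last end of central directory signature
    if pos < 0 or pos + EOCD.size > len(data):
        raise ValueError("no complete end of central directory record found")
    *_, _cd_size, _cd_offset, comment_len = EOCD.unpack_from(data, pos)
    return pos + EOCD.size + comment_len

# Usage (assumes `carved` holds the five sectors read with dd above):
#   length = zip_end_offset(carved)
#   open("out.zip", "wb").write(carved[:length])
```

With the record at 0x95e and a comment length of zero, as in the dump, this gives 0x95e + 22 = 2420.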

This appears to work: when I unzip the archive I am asked for a password. I enter the goodtimes string I found earlier and WinRAR happily extracts a file called Scheduled Visits.xls. A small segment of the file is shown below. Oh no, Joe is going down. He frequents Birard High School, Hull High School, Key High School, Leetch High School, Richter High School and Smith Hill High School. This is the answer to question 3.

A closer look at cover_page.jpgc

It is obvious that this file does not begin at cluster 4200. Instead cluster 42 looks far more likely. A quick examination of cluster 42 gives me the evidence I need.

$ dd if=image bs=512 count=22 skip=73 | xxd | head -n22
22+0 records in
22+0 records out
11264 bytes (11 kB) copied, 0.002 s, 5.6 MB/s
0000000: ffd8 ffe0 0010 4a46 4946 0001 0101 0060  ......JFIF.....`
0000010: 0060 0000 ffdb 0043 0008 0606 0706 0508  .`.....C........
0000020: 0707 0709 0908 0a0c 140d 0c0b 0b0c 1912  ................
0000030: 130f 141d 1a1f 1e1d 1a1c 1c20 242e 2720  ........... $.'
0000040: 222c 231c 1c28 3729 2c30 3134 3434 1f27  ",#..(7),01444.'
0000050: 393d 3832 3c2e 3334 32ff db00 4301 0909  9=82<.342...C...
0000060: 090c 0b0c 180d 0d18 3221 1c21 3232 3232  ........2!.!2222
0000070: 3232 3232 3232 3232 3232 3232 3232 3232  2222222222222222
0000080: 3232 3232 3232 3232 3232 3232 3232 3232  2222222222222222
0000090: 3232 3232 3232 3232 3232 3232 3232 ffc0  22222222222222..

The JPEG header looks good (FF D8 FF E0 followed by the JFIF tag), so we are good to carve. A quick look at cluster 64, where the entropy dips, reveals a probable end of file.

$ dd if=image bs=512 count=21 skip=73 | xxd |  tail -n40
21+0 records in
21+0 records out
10752 bytes (11 kB) copied, 0.005 s, 2.2 MB/s
0002780: 6713 a4d9 f9d9 0285 0f22 8c21 72e3 0dc0  g........".!r...
0002790: 522a 5d49 a5b1 d14f 0786 94da 53ba 5bbf  R*]I...O....S.[.
00027a0: 2b5e ff00 7e96 dfaf 4b1d ab5d 69d7 7713  +^..~...K..]i.w.
00027b0: e96f 716b 34e2 33e7 5a17 566d 840c ee4e  .oqk4.3.Z.Vm...N
00027c0: b821 8751 dc7a d5ba e6ec 3c3b 35ae b42e  .!.Q.z....<;5...
00027d0: 1c2b c31d ccf7 492b 5e4c c732 1738 1064  .+....I+^L.2.8.d
00027e0: 4684 7984 6ec9 c807 805b 2bd0 c464 30a1  F.y.n....[+..d0.
00027f0: 9915 252a 37aa 36e5 07b8 0481 91ef 81f4  ..%*7.6.........
0002800: ad20 dbdd 1c38 8853 834a 9cae bf5f ebe6  . ...8.S.J..._..
0002810: b663 e8a2 8ab3 9c28 a28a 0028 a28a 0028  .c.....(...(...(
0002820: a28a 0028 a28a 0028 a28a 0028 a28a 0028  ...(...(...(...(
0002830: a28a 0028 a28a 0028 a28a 0028 a28a 0028  ...(...(...(...(
0002840: a28a 0028 a28a 0028 a28a 0028 a28a 0028  ...(...(...(...(

The file appears to end after 0x2818 (10264) bytes, where the repeated a28a0028 pattern begins. Let’s carve!

$ dd if=image bs=512 count=22 skip=73 | dd bs=1 count=10264 > cover_page.jpg
21+0 records in
21+0 records out
10752 bytes (11 kB) copied, 0.005 s, 2.2 MB/s
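The cut-off point was found by eye in the tail of the hex dump; the same heuristic can be sketched in Python by searching for the first long run of the fill pattern. The function names and the threshold of eight repeats are my own assumptions, and the sketch only mirrors the heuristic: it cannot tell whether the pattern is padding or part of the image.

```python
FILL = bytes.fromhex("a28a0028")   # the repeated pattern seen in the dumps above

def find_fill_start(data, pattern=FILL, min_repeats=8):
    """Offset of the first run of at least min_repeats back-to-back copies of pattern, or -1."""
    return data.find(pattern * min_repeats)

def carve_before_fill(data, pattern=FILL, min_repeats=8):
    """Return the bytes that precede the first long run of the fill pattern."""
    cut = find_fill_start(data, pattern, min_repeats)
    return data if cut < 0 else data[:cut]

# Usage (assumes `sectors` holds the sectors read from skip=73 above):
#   open("cover_page.jpg", "wb").write(carve_before_fill(sectors))
```

Requiring several repeats guards against a chance occurrence of the four bytes inside the compressed image data.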

[Figure: cover_page, the carved image]

Image looks good.

Finishing up

I am not interested in finishing the challenge as described; I just wanted to illustrate that entropy is a useful tool when investigating any sort of data. It very quickly uncovered the attempts that had been made to hide data: renaming files to suggest another format, and messing around with root directory entries to hide where files start and end. I managed to find all the important bits within a few minutes, which is good going. Anyhow, give it a go!

You can’t end there, what about questions 4, 5, and 6? Sigh, OK then.

Question 4: As shown above, Jimmy Jungle.doc was deleted; SCHEDU~1.zip was password protected and renamed SCHEDU~1.exe, and its size was misrepresented; the starting cluster of cover_page.jpgc was misrepresented.

Question 5: This is outlined throughout this document.

Question 6: This is a bonus question and I am not in a bonus-getting mood; if you need to know, take a look at the top 30 solutions on Scan 24.