Writeblocked Blog

Resources for learning Python for forensics

This is just a small collection of the resources available if you are interested in learning Python. It is not intended to be a comprehensive list of everything out there, just enough to get you started. They are not listed in any particular order, although I may have saved the best till last ;-)

•    The official Python tutorial

Free Online Classes

•    Google Python classes

Books (free online)

How to Think Like a Computer Scientist:
Learn Python the Hard Way:
Invent with Python:
Hacking Secret Ciphers with Python (from Invent with Python)

Books (not free but worthwhile getting)

T.J. O'Connor, Violent Python:
Justin Seitz, Gray Hat Python: Python Programming for Hackers and Reverse Engineers
John Zelle, Python Programming: An Introduction to Computer Science, 2nd ed.


Official Documentation

Forensics & Python

Willi’s modules:
The Volatility project:
Joachim Metz’s libraries: (not all of these are Python, but many have Python bindings and some are pure Python!)
Dave Nides’ blog (author of 4n6time):
Plaso (backend engine for log2timeline):
T.J. O'Connor’s SANS paper Grow Your Own Forensic Tools: A Taxonomy of Python Libraries Helpful for Forensic Analysis

<shameless plug> the course I teach at Champlain College Scripting for Digital Forensics
and of course the list would not be complete without a cheat sheet

Filegen - file generator for tool testing

One of my students is currently researching data recovery on solid state drives. Part of the testing requires that he create a large number of files with known and easily identifiable content. There are many ways of doing this and it is something I have done many times before; however, every time I have meant to write a script to do the work. This time round I figured I would write something to solve the problem once and for all.

In this case the objective was to be able to determine the amount of recoverable data after a collection of files had been wiped. So we needed:

  1. Files of different sizes (including small enough to be resident)
  2. Unique filenames
  3. Readily identifiable unique content for each file

So I wrote Filegen to generate the files we needed. It is a pretty simple program (in fact, processing the options takes more code than generating the files), but it is simple to use and makes generating test files easy.

In order to address the unique filename requirement, the user is able to pass a base filename that will then have a number appended to it. The file is then filled with a repeating pattern of the filename plus the size of the file (in bytes). When the pattern does not evenly fit the desired file size, the end of the file is padded with zeros. This way a keyword search can be used to identify how much of any given file is recoverable: given the file size, it is a simple matter to determine how many times the pattern repeats in the file.
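The fill-and-pad approach can be sketched in a few lines of Python. This is a sketch of the idea only, not the actual Filegen source; the function name and defaults are my own:

```python
import os

def make_file(path, size, block=8192):
    """Fill a file with a repeating pattern of filename+size,
    zero-padding the tail (sketch of the Filegen approach)."""
    name = os.path.basename(path)
    pattern = f"{name}{size}".encode()
    repeats, remainder = divmod(size, len(pattern))
    data = pattern * repeats + b"\x00" * remainder
    with open(path, "wb") as f:
        # Write in block-sized chunks rather than all at once.
        for i in range(0, size, block):
            f.write(data[i:i + block])
    # The repeat count is the number of keyword hits to expect
    # if the whole file is recovered intact.
    return repeats
```

A keyword search for the pattern after wiping then gives recoverable bytes as roughly (hits found) x (pattern length), which can be compared against the expected repeat count.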

The options are:
-s [1024]      size in bytes; can include a range if -c is also used, e.g. -s 1024-1024000 -c 5 will produce 5 files between 1024 and 1024000 bytes
-c [1]         count of files to be produced
-t [evidence]  text to be included in the file; this will have a file count number appended to the end of it
-i [1]         initial number, the starting number to be appended to the string
-b [8192]      block size (how much data is written to the file at once)
-e [1]         step for the difference between sizes in random files
-p [.\]        path for the files to be written to; will be created if it does not exist
-v             version

One surprising outcome of the research was that in some cases more file content was recoverable than should have been written to the disk, but that is a story for another time.

The program is available as Python code or a Windows binary.

February and March recordings posted

I have just posted the recordings of the February and March meetups to the YouTube channel. These were both lively meetings with great information from Dave Kleiman and David Cowen. The next meetup is coming up in a couple of weeks and I don't have anyone booked for the May meetup yet, so if you have an idea for a presentation, shoot me an email and let's get something sorted out. The June meetup will not be happening, as I will be heading back to Australia around then and will have too much other stuff going on to run it. Things should be back to normal in July.

Live Challenge

Tonight we will have the first 5 minute challenge on DFIROnline. The idea behind this is to have a bit of fun and also have the chance to share the different ways the same problem can be solved. The challenge can be downloaded here; of course, it is encrypted, so you will have to wait to start working on it until we give you the password tonight. It is going to be an extra hands-on session with Dave Kleiman sharing a collection of tools and scripts that he uses in Windows examinations. He will also be taking us through how he uses them all. Throughout the presentation he will be giving away ebooks, just to make sure everyone is paying attention.

Updated filesystem cheat sheets

At PFIC last year I ran a workshop on the analysis of NTFS and handed out some cheat sheets I made for examining NTFS in a hex editor. I have been using these cheat sheets for ages and over the weekend finally got around to updating the partition table and FAT ones. They are available here:


Of course I still have to make them for HFS and ext but I will get there one day.

If you have any suggestions for improvements please let me know.

4096 byte sector drives, NTFS and forensic tools

One of the topics that came up during Kevin Ripa's DFIROnline presentation was the concept of 4k sectors, or really, sectors larger than 512 bytes. I have been aware of these for a few years; as far as I am aware, Western Digital started using them on their 1TB drives around 2009. At the time we were buying these drives by the box load and they were our main case-working drives. Our main analysis OS was Windows XP, which uses the MBR and by default creates the first partition at sector 63. This creates performance issues with 4k-sector drives: while they use 4k sectors internally, they use logical translation in the firmware to present 512-byte sectors to the operating system. With a partition starting at sector 63, the clusters are not aligned with the drive sectors, resulting in at least twice as many reads and writes as necessary.
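The misalignment is simple arithmetic: a partition start is aligned only if its byte offset is a multiple of the physical sector size. A quick illustration (the function is mine, the sector values are from the paragraph above):

```python
LOGICAL = 512      # sector size the firmware presents to the OS
PHYSICAL = 4096    # native sector size of the medium

def is_aligned(start_lba, logical=LOGICAL, physical=PHYSICAL):
    """True if a partition starting at this logical LBA falls on a
    physical-sector boundary."""
    return (start_lba * logical) % physical == 0

# Windows XP default: LBA 63 -> byte offset 32256, which is not a
# multiple of 4096, so every 4 KiB cluster straddles two physical
# sectors and each write becomes a read-modify-write.
```

For comparison, Vista and later align the first partition at 1 MiB (LBA 2048 in 512-byte sectors), which is a multiple of every common physical sector size.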

However, I had not really thought about all this from a forensic perspective until Adam posted an MFT carver and started asking questions on the win4n6 mailing list about 4k sectors and their impact on the MFT. I checked the large drive on my system (which uses 1TB WD Caviar Blacks) and sure enough, it had 4k sectors. I then checked the MFT and found it was using 4k file record segments.

My theory at the time was that the file record segment would match the sector size if the sector was 1024 bytes or over. This was sort of confirmed by Troy, one of the other win4n6 folk. Troy also pointed out that Microsoft has reported that it is not yet supporting 4k sectors, as mentioned in this blog post from the Microsoft storage team.

This of course got my interest up, as I had a drive clearly reporting 4k sectors, as did Adam. Upon rebooting my system I realized/remembered that I was in fact using a Highpoint Rocket RAID, not connecting directly to the drive (why I did not notice this in Windows I don't know). So I copied everything off the RAID, wiped the three 1TB drives and then configured a new RAID, at which point I found that the controller will let you set the sector size to 512, 1024, 2048, 3072 or 4096 bytes (how's that for cool?). I then created a 2TB RAID using 4k sectors. This was then formatted as NTFS on Windows 7 Pro SP1. I confirmed once again that the MFT file record segments were 4k in size. By default it still used a 4k cluster size, so there goes file slack space! The picture below shows the BPB of the partition. The key values are bytes per sector (0x1000, or 4096), sectors per cluster (1) and clusters per file record segment (1).
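The three values called out above live at fixed offsets in the NTFS boot sector, and the clusters-per-file-record-segment byte is signed: a negative value means the FRS size is 2 to the power of its absolute value, independent of the cluster size. A minimal parsing sketch (offsets are from the NTFS on-disk layout; the function name is mine):

```python
import struct

def parse_ntfs_bpb(boot):
    """Pull the key geometry values from an NTFS boot sector.
    Sketch only, not a full boot-sector parser."""
    bps = struct.unpack_from("<H", boot, 0x0B)[0]   # bytes per sector
    spc = boot[0x0D]                                # sectors per cluster
    raw = struct.unpack_from("<b", boot, 0x40)[0]   # clusters per FRS, signed
    # Positive: FRS size = raw clusters. Negative: FRS size = 2**abs(raw).
    frs = raw * spc * bps if raw > 0 else 1 << -raw
    return bps, spc, frs
```

On a typical 512-byte-sector disk with 4 KiB clusters the stored value is -10, giving the familiar 1024-byte record; on the 4k RAID above the value is 1, giving a 4096-byte record.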

Hex view of the NTFS BPB on a 4096-byte-sector drive

Now the interesting part, what tools can handle a disk using 4k sectors?

X-Ways 16.1 SR-3 has no problem.

FTK Imager will mount the drive or image, but does not recognize any file system.*

EnCase 6.19.4 & 6.16.2 crash when they start processing the MFT; apparently EnCase 7 works.*

SIFT 2.13 will mount it and show/access files correctly, although mmls appears to be hard-coded to report 512-byte sectors, so the offsets it provides for the start of the partition are wrong.

analyzeMFT 1.7 does not support it. Adam pointed out the following comment in the code:
    # 1024 is valid for current version of Windows but should really get this value from somewhere
David Kovar has confirmed that he was aware of this as a potential issue.

I have not had the time to test anything else, so I figured I would put the test image out there for everyone to play with. If you have the time, please download the test image, try it out in your favourite tools, and post your results in the comments. You can download it here: Since the RAID sectors were pretty much all 0x00 once acquired, it compresses down to 2GB.

In the meantime I have investigated a few hard drive spec sheets and have not found any individual drives that are using 4096-byte sectors yet. In my case it was the RAID card, and in Adam's case it was the controller in the WD My Book. So I guess you would have to be unlucky to run across this in the field, but it is something to be aware of. If you do know of any drives using 4k sectors, please let me know. Even if you do run across one, you should still be able to acquire it (as long as you are not using EnCase) and then get a copy of X-Ways or SIFT for the analysis ;-).

* I have contacted both Guidance and AccessData; the initial response from AccessData is that they cannot reproduce the problem and will look into it. The thread on the Guidance forums can be found here: at this point they have not responded, but I imagine they may have been a little distracted by CEIC last week....

Updated 27 May 2012: added the versions of all software tested.