EnScript and Python: Exporting Many Files for Heuristic Processing - Part 1

James Habben with Chet Hosmer

I discovered something very cool this year at CEIC: people actually read my blog posts! The realization came when I found out there were two sessions focusing on Python, and both of them talked about my #en2py techniques that I presented in this blog last year.

One of the sessions, Heuristic Reasoning with Python and EnCase, was presented by the Python forensics guy, Chet Hosmer of python-forensics.org. I got a chance to chat with him after his session, and the discussion led to what you are about to read. Chet has a number of Python scripts that can make a difference in forensic cases, and we decided a joint blog post would be a fun way to touch on the integration between EnCase and Python with another technique. This will be a two-part post with the first part focusing on getting the files out. The second will get some fancy on it by putting a GUI on the front to accept options in the processing. I will now let Chet explain the benefits of his work.

Function and Benefits of Heuristic Reasoning

Applying heuristics during deep-dive investigation allows us to apply rules of thumb during the process. In order to bring this to light, we chose to integrate a Python script that performs “what I call” heuristic indexing of binary files. Binary files like memory snapshots, executable files and photo graphic images have ASCII text embedded with the binary data. Extracting these “text sequences or remnants” and then making sense of them can be a challenge. 

The issue with traditional approaches like dictionary comparisons or keyword lists, is the occurrence of misspelled words, slang, technical jargon, malware strings, filenames, and function names. These can all be missed because they are not in the dictionary or keyword list, an example is shown in the Casey Anthony investigation. Another traditional approach would be to report on all “text sequences or remnants” this can results in a voluminous number of nonsensical meaningless text strings that can overwhelm investigators.

My approach (originally outlined in my text, Python Forensics) uses a set of 400,000 common English words, (loosely a mini corpus of words) to generate a weighted heuristic model.  I have since created additional models for medical and pharmaceutical domains and I’m working on common words used within text messages.  

Using Python, I load the specific weighted heuristics into a Set. Then during the process of extracting “text sequences or remnants” from the binary file(s), the same algorithm is applied to each extracted sequence as was used to build the weighted heuristics. The calculated heuristic is then used as a lookup value. If the value is found in the loaded weighted set, then the word is considered probable and reported. One other final step I should mention…. most languages have what are referred to as “stop words” such as, (whenever, always, another, elsewhere etc). English is no exception. These stop words are filtered from the final list as they typically have little probative value. Each identified word that passes these filters is stored in a dictionary, one of the great built-in data structures within Python. Dictionaries are key, value pairs, in this case the key is the probable word string and the value is the number of times the word is discovered. This allows me to then produce a resulting list of probable words either sorted alphabetically or by frequency of occurrence.

Therefore, the bottom line benefits of heuristic indexing include:
  1. Accurate identification of a broad set of probable words from binary data
  2. Slang, technical jargon, filenames, misspelled words are also identified
  3. Strings that represent nonsense strings are filtered out
  4. Common stop words are ignored
  5. The frequency of words found or alphabetical results are possible
  6. New weighted heuristic models can be created
In order to apply this method more broadly to a case instead of a single file, we needed a method to allow EnCase (via an EnScript), to export multiple selected files to be processed by the Python script. I turned to James, the EnScript Guru for help.

Method of Choosing Files

In my previous posts, I used a simple technique in EnScript to send the highlighted file out from EnCase to the local disk to allow for a Python script to access the data. This works great for Python scripts that are designed to process one file at a time, but it is not very efficient for the examiner when that one file has not been pinpointed yet. There are many Python scripts out there that are designed to process a whole set of files in a designated folder.

In another post, I looped through files in the case, but I was targeting certain filenames known to contain evidence from Windows 8 Phone apps. The structure there is similar to what I have here, but the interaction with Python is the difference.

Chet and I talked at CEIC about how to do exactly this in EnScript, and came to the conclusion that the rest of the world should know about this as well! OK, maybe not the world, but I’m sure you appreciate that we didn’t keep this buried in some dark closet somewhere.

I have talked about ItemIteratorClass before, but it was in a simple post about the changes in EnScript from EnCase v6 to v7. This is the class that gives us access to all of the files in the case. There are a lot of options explained in that post, so I won’t drag it out here. The mode we will focus on is CURRENTVIEW_SELECTED, which will give us a collection of the files that the examiner has blue-checked in the EnCase interface before running the EnScript.

Because we are processing multiple files, the execution of the Python script needs to happen once the loop is complete. The loop will be doing the work of identifying selected files and exporting them to the disk.

EnScript Walkthrough

The usage of ItemIteratorClass starts off with setting some values in variables. I defined these as global variables for reasons you will see in part 2. The mode I chose here allows an examiner to blue-check any number of files in EnCase, and send this collection to the EnScript for export.

The NOPROXY is used because I am not looking to get any hashes calculated and it speeds up the loop. The NORECURSE option is also used to speed up the loop. With the mode using the current view, the recursing into compound files isn’t possible, anyway.


Then we enter into the loop to find all of the files. There's a fairly bulky chunk of code here, but it has a purpose behind it. When you are dealing with files from evidence, you are potentially pulling files from folders all over the drive. Chances are good you will find a couple files with the same name. On line 22, I am using a GUID that is generated by EnCase and is unique inside the evidence file. Lines 20-23 all together are modifying the filename to include this GUID, but also retain the same extension for identity.



There is a little irritation that pops up when you use any of the modes focusing on the current view. It locks that view in EnCase for the examiner running the EnScript while the iterator is active. Line 31 happens immediately after the looping export code, and this clears the iterator to release the view for the examiner while Python does its thing. Little things matter!




Depending on the Python script you are using and the amount of data you are processing, you may have to adjust the timeout value on line 41. If this value is not large enough, the output from Python will be either missing or cut short.



You're getting a two-for-one deal in this joint blog post, because now Chet is going to explain some Python code now. (I don’t want to read any complaints about the length of this post!)

Python Walkthrough

The overview of the Python script is shown in the figure below:



The Script employs a Heuristic Model created from one or more word dictionaries. Dictionaries and vernaculars can be expanded through the training of the model. The Heuristic Indexer receives selected file(s) from EnCase and then extracts possible word strings from each of the files. Heuristics are calculated for each extracted string and then compared against the Heuristic Model. The result is a report that is delivered back to EnCase.

For Part I of the blog I want to focus on the primary integration between James’ EnScript and the Python Heuristic Indexer.

The main entry point for the Python Script prints out some information messages and then obtains the path and individual filenames exported by the EnScript by parsing the command line arguments. Then for each file found, the IndexAllWords() function is called to perform the string extraction and subsequent Heuristic analysis.  I have highlighted the key lines of the Python script.

Python Main Entry Point

# Main program for pyIndex

if __name__ == "__main__":

    # Print Script Basics
    print "\nHeuristic Indexer v 1.1 CEIC 2015"
    print "Python Forensics, Inc. \n"

    print "Script Started", str(datetime.now())
    print

    # Obtain the arguments passed in by the Enscript
    # In Phase I the only argument passed is
    # path where the EnScript copied the selected files

    targetPath = ParseCommandLine()
 
    print "Processing EnCase Target Path: ", targetPath
    print

    # using the targetPath, obtain a list of filenames
    # using the Python os module

    targetList = os.listdir(targetPath)
 
 
    # Creating an object to process
    # probable words
    # the matrix.txt file contains heuristic model

    wordCheck = classWordHeuristics("matrix.txt")
 
    # Now we can iterate through the list of files
    # Calling the IndexAllWords() function for each
    # file. The IndexAllWords() performs the word
    # extraction, heuristic processing and reports
    # results back to EnCase via Standard Out

 
    for eachFile in targetList:
     
        fullPath = os.path.join(targetPath, eachFile)
        print "####################################"
        print "## Processing File: ", eachFile
        print "####################################\n"
     
        IndexAllWords(fullPath, wordCheck)
 
    print "Script Ended", str(datetime.now())
    print

    # Script End

Results: So What Do I Get From All of This?

Here is a screen shot and an abbreviated excerpt from an actual EnCase / Python marriage.


Closing Thoughts

James: This was a new (and exciting) opportunity for me to have a guest author in a joint post. I am so happy to hear that my #en2py techniques have helped others. EnCase is a powerful platform on its own, but enhancing it with the libraries available in other languages and tools just makes everything that much better for examiners. I hope you find this useful and thanks for taking the time to read through this!

Chet: The catalyst behind Python Forensics, Inc. is to create a collaborative environment for the rapid development of new investigative scripts that can directly benefit investigators.  I hope this blog will get you interested in developing and/or using EnScripts and Python in your next endeavor.  I would like to thank James for his enthusiasm for the project and I look forward to Part II.

The Final Details

Download the EnScript here.
Download Chet's Python script here.
Look for an email invitation and announcements on Twitter about an upcoming webinar we're planning with Chet called, "EnCase and Python: Extending Your Investigative Capabilities."

Chet Hosmer
@PythonForensics
Founder of python-forensics.org

James Habben
@JamesHabben
Master Instructor at Guidance Software

My Thoughts on CEIC 2015


CEIC 2015 is Over

This year’s CEIC is over. After a long and relaxing holiday weekend, it feels almost like it was months ago. I really enjoy being involved with CEIC every year because it gives me a chance to catch up with old friends and meet new ones. The real reason (at least the one we tell our bosses) we all go to CEIC is for the great sessions. There were so many of them this year that I wish I could have cloned myself to see them all. To make it a bit more difficult, CEIC is not just a training conference for me since I am part of the team putting it on. I wanted to put down some of my experiences from this year.

The most rewarding thing to me during the entire conference is to hear from past students about their success in completing the EnCE certification. The only way to achieve that cert is by dedication and perseverance. I get thanks from them for teaching classes they attended, but I didn’t take the test. Their excitement and enthusiasm is infectious and I love it! Congratulations to everyone who passed the 1st phase during CEIC, and good luck on the 2nd.

If you didn’t get to attend CEIC this year, you missed a good one. Try again for next year, and I think you will be well rewarded.

Some Sessions

Because I am part of the setup and operations of CEIC, I am not usually able to attend full session, but there are a few that I really enjoyed that I wanted to give mention to.

Monday started off great hearing about new features in IEF from Jamie McQuaid and Rob Maddox of Magnet Forensics in Investigating a User’s Internet Activity across Computers, Smartphones and Tablets. This team knows how to stay on top of industry trends and to enhance their tools with a quick response. It is great to know that Guidance has a partner dedicated to examiners like we are.

A must-see for me is Tracking the Use of USB Storage on Windows 8 by Colin Cree. He has been researching USB artifacts on Windows for many years, and somehow seems to find new intricacies every year. No disappointment this year!

It’s a safe bet on the SANS crew. I enjoyed APT Attacks Exposed: Network, Host, Memory and Malware Analysis since you can never learn too much about how others operate and think. It helps us all grow, and I am glad that Rob Lee, Anuj Soni, Chad Tilbury, and Jake Williams are sharing their experiences.

I am a firm believer in everyone learning to code as a skill. Mari DeGrazia and Ron Dormido laid out a great foundation in Practical Python Forensics for those wanting to learn Python as their language. Extra points since they showed how to integrate EnCase and Python!

Memory forensics has become a huge source of information in all types of investigations, and Jamie Levy knows this better than most. As a part of the Volatility team, she is an immense resource and shared it in Rootkits, Exfil and APT: RAM Conquers All to help us all. I learned a lot about using Volatility from this session. I also learned about her twitter handle outside of the session, but leave it to her to spread that.

My Sessions

I had a lot of fun this year talking in my sessions. I talked about how you can expand EnScript with .NET and Python code. It was exciting to me since everyone seemed to also be excited about the possibilities. I also got a chance to speak with Matt McFadden about EnCase Portable and the huge potential it has for examiners. Got to share how I used Portable on a case to handle a location with 4 examiners and 60+ computers, and we were done before dinner! Talked to many after the session that were excited about using it at home.

Deserved Recognition

Lastly, I wanted to give some recognition for a couple people from the Guidance Software team that really make CEIC the conference that it is. The entire Guidance team works really hard for this event, but these two really make it shine.

There is a technical team that I am part of every year, and it is managed by Jamey Tubbs from the training division. He puts in a ton of hours, before many of you even register for CEIC, in working with the event team, hotel technical staff, and our computer rental vendor. Our conference is unique from many others because of the large scale labs with supplied computers, and it would not be the same without him.

Until you read from me again!
James Habben

Digital Forensic Notables and Top-flight Instructors On Tap at CEIC 2015

(This is Part 3 of a 3-part series on the all-new and enhanced digital forensics labs and lectures at CEIC 2015.)

The first post in this series talked about how we're expanding on the core competency of the EnCase community who converge on CEIC each year. The second post drilled down into the plethora and diversity of digital artifacts and showcased sessions designed to address these exploding challenges. In this final post, we present the marquee of acclaimed industry experts who will be on hand to teach new technologies and tools and share hard-earned insight from decades of experience in digital investigations.

Learn to Expand on the Value of EnCase at CEIC 2015 with EnScripts and Third-Party Apps

Robert Batzloff

This year at CEIC®, we’re committing more training and trainer resources than ever before to help you boost the benefits of EnCase® in your company’s deployment.

Our goal is to show you the brawn behind power EnCase users and apps, and by learning more about the EnScript® language, help you get to that same level.

With an expanded conference track called EnCase Apps and Integrations, we’ve added 12 sessions that will showcase some of the most dynamic apps developed by EnCase forensic investigators that are easy for you to integrate. We’re also boosting the App World booth hosted by EnScript gurus from Guidance Software and developers from the EnCase community, so you’ve got more experts close at hand during all hours of the conference day.

The Good, the Bad, and the Diverse: Gain More Visibility into the Growing Diversity of Devices, OS’s and Artifacts

(This is Part 2 of a 3-part series on the all-new and enhanced digital forensics labs and lectures at CEIC® 2015. Read Part 1 here.)

One of the biggest challenges for investigators today is not only the number of devices or the amount of data (the average hard drive has just crossed the 1TB threshold), but the number and diversity of applications and artifacts that are on a system.

Frankly, we feel your pain. We know there’s no single tool that investigators can rely on to support all applications, browsers, and file systems. We get it when practitioners tell us they require a larger toolbox and deeper skill set to support the overwhelming challenges in digital investigations.

Guidance Software uses CEIC to bring together all of the speakers with their tools and apps that integrate with EnCase and provide you with better visibility into systems, applications and artifacts.

There are four tracks that focus on digital investigations:

  • Digital Forensics Labs
  • Advanced Digital Forensics Labs
  • Topics in Digital Forensics
  • Mobile Devices and Cloud Investigations
We want to remind you that the hands-on labs fill up fast, as 70 percent of attendees say that labs are the number one reason they attend CEIC. So, click here to register now.

You can view the agenda here to read session descriptions and speaker bios on the 44 lab, lecture, and panel sessions that focus on digital forensics.  You can also get a sneak preview on a few of the hands-on lab topics that are sure to warrant a packed room, such as the ones we've highlighted here below.


Digital Forensics Session Highlight: File System Journaling Forensics

David Cowen and Matthew Seyer of G-C Partners, LLC, will outline the three major file systems in use today that utilize journaling (NTFS, EXT3/4, HFS+) and explain what is stored and its impact on your investigations. You will learn:

  • What data is stored by your file systems?
  • How to gather the data using EnCase.
  • How to use a free parser to understand the data.

Digital Forensics Session Highlight: Vehicle Systems Forensics

Ben LeMere, CEO of Berla Corporation, is back by popular demand this year. We know students of vehicle forensics will be glad to hear that you'll be able to get your hands on the data stored in several different infotainment and telematics systems in his practical, hands-on lab session. Vehicle Infotainment and Telematics systems store a vast amount of data such as recent destinations, favorite locations, call logs, contact lists, SMS messages, emails, pictures, videos, social media feeds, and the navigation history of everywhere the vehicle has been. This information is not easily retrievable and is typically stored in several different systems within a vehicle not traditionally associated with event data. This is cutting-edge technology that is quickly becoming more pervasive in the field of investigations.

Digital Forensics Session Highlight: Windows ShellBag Forensics in Depth

Vincent Lo, Digital Forensics and Incident Response Investigator, knows that ShellBag behavior is a challenging task for “forensicators.” The problem of identifying when and which folders a user accessed arises often and investigators attempt to search for them in the ShellBag information because it may contain registry keys indicating which folders the user accessed previously. Their timestamps may demonstrate when they were accessed. Nevertheless, a lot of activities can create/update the timestamps. That’s why you won’t want to miss this hands-on lab, where you’ll understand the details of ShellBag information, review various activities across Windows operating systems and learn how to interpret it correctly.

If it wasn’t obvious before this blog, now it should be loud and clear: this year’s sessions on digital forensics pull no punches when it comes to providing more visibility to the good, the bad, and the sometimes very ugly and diverse applications and artifacts you face every day.

Stay tuned for Part 3 of this blog topic on digital forensics, where we’ll shed light on the caliber of speakers we’re bringing in to teach these sessions mentioned here. We're confident that these are experts whom you know and trust.

In the meantime, be sure to visit the CEIC website for information on the current event agenda, registration information, sponsor and exhibitor opportunities, and to register now. Also, be sure to follow us on Facebook, Twitter, and LinkedIn for the latest CEIC buzz and conversation.