Showing posts with label Python. Show all posts
Showing posts with label Python. Show all posts

EnScript and Python: Exporting Many Files for Heuristic Processing - Part 1

James Habben with Chet Hosmer

I discovered something very cool this year at CEIC: people actually read my blog posts! The realization came when I found out there were two sessions focusing on Python, and both of them talked about my #en2py techniques that I presented in this blog last year.

One of the sessions, Heuristic Reasoning with Python and EnCase, was presented by the Python forensics guy, Chet Hosmer of python-forensics.org. I got a chance to chat with him after his session, and the discussion led to what you are about to read. Chet has a number of Python scripts that can make a difference in forensic cases, and we decided a joint blog post would be a fun way to touch on the integration between EnCase and Python with another technique. This will be a two-part post with the first part focusing on getting the files out. The second will get some fancy on it by putting a GUI on the front to accept options in the processing. I will now let Chet explain the benefits of his work.

Function and Benefits of Heuristic Reasoning

Applying heuristics during deep-dive investigation allows us to apply rules of thumb during the process. In order to bring this to light, we chose to integrate a Python script that performs “what I call” heuristic indexing of binary files. Binary files like memory snapshots, executable files and photo graphic images have ASCII text embedded with the binary data. Extracting these “text sequences or remnants” and then making sense of them can be a challenge. 

The issue with traditional approaches like dictionary comparisons or keyword lists, is the occurrence of misspelled words, slang, technical jargon, malware strings, filenames, and function names. These can all be missed because they are not in the dictionary or keyword list, an example is shown in the Casey Anthony investigation. Another traditional approach would be to report on all “text sequences or remnants” this can results in a voluminous number of nonsensical meaningless text strings that can overwhelm investigators.

My approach (originally outlined in my text, Python Forensics) uses a set of 400,000 common English words, (loosely a mini corpus of words) to generate a weighted heuristic model.  I have since created additional models for medical and pharmaceutical domains and I’m working on common words used within text messages.  

Using Python, I load the specific weighted heuristics into a Set. Then during the process of extracting “text sequences or remnants” from the binary file(s), the same algorithm is applied to each extracted sequence as was used to build the weighted heuristics. The calculated heuristic is then used as a lookup value. If the value is found in the loaded weighted set, then the word is considered probable and reported. One other final step I should mention…. most languages have what are referred to as “stop words” such as, (whenever, always, another, elsewhere etc). English is no exception. These stop words are filtered from the final list as they typically have little probative value. Each identified word that passes these filters is stored in a dictionary, one of the great built-in data structures within Python. Dictionaries are key, value pairs, in this case the key is the probable word string and the value is the number of times the word is discovered. This allows me to then produce a resulting list of probable words either sorted alphabetically or by frequency of occurrence.

Therefore, the bottom line benefits of heuristic indexing include:
  1. Accurate identification of a broad set of probable words from binary data
  2. Slang, technical jargon, filenames, misspelled words are also identified
  3. Strings that represent nonsense strings are filtered out
  4. Common stop words are ignored
  5. The frequency of words found or alphabetical results are possible
  6. New weighted heuristic models can be created
In order to apply this method more broadly to a case instead of a single file, we needed a method to allow EnCase (via an EnScript), to export multiple selected files to be processed by the Python script. I turned to James, the EnScript Guru for help.

Method of Choosing Files

In my previous posts, I used a simple technique in EnScript to send the highlighted file out from EnCase to the local disk to allow for a Python script to access the data. This works great for Python scripts that are designed to process one file at a time, but it is not very efficient for the examiner when that one file has not been pinpointed yet. There are many Python scripts out there that are designed to process a whole set of files in a designated folder.

In another post, I looped through files in the case, but I was targeting certain filenames known to contain evidence from Windows 8 Phone apps. The structure there is similar to what I have here, but the interaction with Python is the difference.

Chet and I talked at CEIC about how to do exactly this in EnScript, and came to the conclusion that the rest of the world should know about this as well! OK, maybe not the world, but I’m sure you appreciate that we didn’t keep this buried in some dark closet somewhere.

I have talked about ItemIteratorClass before, but it was in a simple post about the changes in EnScript from EnCase v6 to v7. This is the class that gives us access to all of the files in the case. There are a lot of options explained in that post, so I won’t drag it out here. The mode we will focus on is CURRENTVIEW_SELECTED, which will give us a collection of the files that the examiner has blue-checked in the EnCase interface before running the EnScript.

Because we are processing multiple files, the execution of the Python script needs to happen once the loop is complete. The loop will be doing the work of identifying selected files and exporting them to the disk.

EnScript Walkthrough

The usage of ItemIteratorClass starts off with setting some values in variables. I defined these as global variables for reasons you will see in part 2. The mode I chose here allows an examiner to blue-check any number of files in EnCase, and send this collection to the EnScript for export.

The NOPROXY is used because I am not looking to get any hashes calculated and it speeds up the loop. The NORECURSE option is also used to speed up the loop. With the mode using the current view, the recursing into compound files isn’t possible, anyway.


Then we enter into the loop to find all of the files. There's a fairly bulky chunk of code here, but it has a purpose behind it. When you are dealing with files from evidence, you are potentially pulling files from folders all over the drive. Chances are good you will find a couple files with the same name. On line 22, I am using a GUID that is generated by EnCase and is unique inside the evidence file. Lines 20-23 all together are modifying the filename to include this GUID, but also retain the same extension for identity.



There is a little irritation that pops up when you use any of the modes focusing on the current view. It locks that view in EnCase for the examiner running the EnScript while the iterator is active. Line 31 happens immediately after the looping export code, and this clears the iterator to release the view for the examiner while Python does its thing. Little things matter!




Depending on the Python script you are using and the amount of data you are processing, you may have to adjust the timeout value on line 41. If this value is not large enough, the output from Python will be either missing or cut short.



You're getting a two-for-one deal in this joint blog post, because now Chet is going to explain some Python code now. (I don’t want to read any complaints about the length of this post!)

Python Walkthrough

The overview of the Python script is shown in the figure below:



The Script employs a Heuristic Model created from one or more word dictionaries. Dictionaries and vernaculars can be expanded through the training of the model. The Heuristic Indexer receives selected file(s) from EnCase and then extracts possible word strings from each of the files. Heuristics are calculated for each extracted string and then compared against the Heuristic Model. The result is a report that is delivered back to EnCase.

For Part I of the blog I want to focus on the primary integration between James’ EnScript and the Python Heuristic Indexer.

The main entry point for the Python Script prints out some information messages and then obtains the path and individual filenames exported by the EnScript by parsing the command line arguments. Then for each file found, the IndexAllWords() function is called to perform the string extraction and subsequent Heuristic analysis.  I have highlighted the key lines of the Python script.

Python Main Entry Point

# Main program for pyIndex

if __name__ == "__main__":

    # Print Script Basics
    print "\nHeuristic Indexer v 1.1 CEIC 2015"
    print "Python Forensics, Inc. \n"

    print "Script Started", str(datetime.now())
    print

    # Obtain the arguments passed in by the Enscript
    # In Phase I the only argument passed is
    # path where the EnScript copied the selected files

    targetPath = ParseCommandLine()
 
    print "Processing EnCase Target Path: ", targetPath
    print

    # using the targetPath, obtain a list of filenames
    # using the Python os module

    targetList = os.listdir(targetPath)
 
 
    # Creating an object to process
    # probable words
    # the matrix.txt file contains heuristic model

    wordCheck = classWordHeuristics("matrix.txt")
 
    # Now we can iterate through the list of files
    # Calling the IndexAllWords() function for each
    # file. The IndexAllWords() performs the word
    # extraction, heuristic processing and reports
    # results back to EnCase via Standard Out

 
    for eachFile in targetList:
     
        fullPath = os.path.join(targetPath, eachFile)
        print "####################################"
        print "## Processing File: ", eachFile
        print "####################################\n"
     
        IndexAllWords(fullPath, wordCheck)
 
    print "Script Ended", str(datetime.now())
    print

    # Script End

Results: So What Do I Get From All of This?

Here is a screen shot and an abbreviated excerpt from an actual EnCase / Python marriage.


Closing Thoughts

James: This was a new (and exciting) opportunity for me to have a guest author in a joint post. I am so happy to hear that my #en2py techniques have helped others. EnCase is a powerful platform on its own, but enhancing it with the libraries available in other languages and tools just makes everything that much better for examiners. I hope you find this useful and thanks for taking the time to read through this!

Chet: The catalyst behind Python Forensics, Inc. is to create a collaborative environment for the rapid development of new investigative scripts that can directly benefit investigators.  I hope this blog will get you interested in developing and/or using EnScripts and Python in your next endeavor.  I would like to thank James for his enthusiasm for the project and I look forward to Part II.

The Final Details

Download the EnScript here.
Download Chet's Python script here.
Look for an email invitation and announcements on Twitter about an upcoming webinar we're planning with Chet called, "EnCase and Python: Extending Your Investigative Capabilities."

Chet Hosmer
@PythonForensics
Founder of python-forensics.org

James Habben
@JamesHabben
Master Instructor at Guidance Software

Learn to Expand on the Value of EnCase at CEIC 2015 with EnScripts and Third-Party Apps

Robert Batzloff

This year at CEIC®, we’re committing more training and trainer resources than ever before to help you boost the benefits of EnCase® in your company’s deployment.

Our goal is to show you the brawn behind power EnCase users and apps, and by learning more about the EnScript® language, help you get to that same level.

With an expanded conference track called EnCase Apps and Integrations, we’ve added 12 sessions that will showcase some of the most dynamic apps developed by EnCase forensic investigators that are easy for you to integrate. We’re also boosting the App World booth hosted by EnScript gurus from Guidance Software and developers from the EnCase community, so you’ve got more experts close at hand during all hours of the conference day.

EnCase and Python – Automating Windows Phone 8 Analysis

James Habben

Roll Call


You may have read my introductory post about using Python scripts with encase. You may have also read my part 2 follow-up, which put a GUI on top of Didier Stevens’ pdf-parser. Did you also read Kevin Breen’s post? He wrote about using EnScript to call out to David Kovar’s analyzemft script using EnScript. Then Chip wrote a post about sending data out to get parsed by parser-usnjrnl.

EnCase and Python – Part 2

James Habben

In Part 1 of this post, I shared a method that lets you use Python scripts by configuring a file viewer in EnCase. We used Didier Stevens’ pdf-parser as an example. I also showed how EnScript could be used to greater effect by allowing us to capture the output of pdf-parser directly in a bookmark without having to manually copy and paste. Both of these techniques reduce effort by leveraging capabilities of both EnCase and the Python language.

In this post, I’ll take the same principles and apply them into an EnScript that provides a little more flexibility and functionality. Our goal is to have a GUI that gives you control over the exact functionality you want from the pdf-parser tool.

EnCase and Python - Part 1

James Habben

As a co-author and instructor for Guidance Software’s EnScript Programming course, I spend a lot of time teaching investigators in person around the globe. Investigators are faced with a dizzying variety of challenges. We work together in class, coming up with solutions that send EnCase off to do our bidding. EnCase and EnScript allow us to “bottle” the result of our efforts to share with other investigators (e.g. categorizing internet history, detecting files hidden by rootkits).

Python is used similarly. The interweb hosts great tools written in Python to accomplish all measures of tasks facing DFIR examiners. The community benefits from the hours of work that go into each and every .py that gets baked. It seemed to me that there should be a way for EnCase and Python to work together, so I put together a brief tutorial.