Adversaries often exploit PDF documents by embedding malicious scripts (mostly JavaScript) that store malicious content and install malware or a dropper on the victim’s system.
Below, I explain some of the useful tools available in REMnux that can be used to analyze a malicious PDF.
1. pdfid
Pdfid is a tool that scans a PDF document to identify whether it may contain JavaScript or execute an action when opened.
Pdfid will scan a PDF document for the following strings and count their occurrences (total and obfuscated):
obj
endobj
stream
endstream
xref
trailer
startxref
/Page
/Encrypt
/ObjStm
/JS
/JavaScript
/AA
/OpenAction
/JBIG2Decode
/RichMedia
/Launch
/AcroForm
/XFA
Almost every PDF document will contain the first 7 keywords (obj through startxref). The more interesting fields to analyze when identifying a malicious PDF are:
/Page gives an indication of the number of pages in the PDF document. Most malicious PDF documents have only one page.
/Encrypt indicates that the PDF document has DRM or needs a password to be read.
/ObjStm counts the number of object streams. An object stream can contain other objects and can therefore be used to obfuscate objects.
/JS and /JavaScript indicate that the PDF document contains JavaScript. Almost all malicious PDF documents contain JavaScript, although JavaScript can also be found in PDF documents without malicious intent.
/AA and /OpenAction indicate an automatic action to be performed when the page/document is viewed. Malicious PDF documents with JavaScript usually have an automatic action to launch the JavaScript without user interaction.
/AcroForm indicates that the PDF document contains an interactive form (AcroForm).
/JBIG2Decode indicates if the PDF document uses JBIG2 compression.
/RichMedia is for embedded Flash.
/Launch counts launch actions.
/XFA is for XML Forms Architecture.
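To make the "total and obfuscated" counts concrete, here is a minimal Python sketch of the counting idea. It is not pdfid's actual implementation: the function names are hypothetical, it only covers the keywords that start with a slash, and it scans raw bytes, so compressed streams can produce false hits. It assumes that "obfuscated" means a name written with #xx hex escapes, such as /J#53 for /JS.

```python
import re

# Slash keywords taken from the list above (an illustrative subset of what pdfid reports).
KEYWORDS = [b"/Page", b"/Encrypt", b"/ObjStm", b"/JS", b"/JavaScript", b"/AA",
            b"/OpenAction", b"/AcroForm", b"/JBIG2Decode", b"/RichMedia",
            b"/Launch", b"/XFA"]

# A PDF name starts with "/" and runs until whitespace or a delimiter character.
NAME_RE = re.compile(rb"/[^\s()<>\[\]{}/%]+")
HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")


def decode_name(raw: bytes) -> bytes:
    """Replace each #xx hex escape in a PDF name with the byte it encodes."""
    return HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)


def count_keywords(data: bytes) -> dict:
    """Count whole-name matches of each keyword, noting hex-escaped spellings."""
    counts = {k.decode(): {"total": 0, "obfuscated": 0} for k in KEYWORDS}
    for match in NAME_RE.finditer(data):
        raw = match.group(0)
        name = decode_name(raw)
        key = name.decode("latin-1")
        if key in counts:
            counts[key]["total"] += 1
            if raw != name:  # the name only matched after decoding escapes
                counts[key]["obfuscated"] += 1
    return counts
```

Calling count_keywords(open("malicious.pdf", "rb").read()) returns a dictionary keyed by keyword. Because whole names are compared, /Pages is not counted as /Page.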
Ex: We have a PDF file named ‘malicious.pdf’ that we need to analyze on our REMnux box, say ‘vmstation’. Running pdfid on the document gives us the following output:
root@vmstation:/home/remnux/# pdfid malicious.pdf
PDFiD 0.2.1 malicious.pdf
PDF Header: %PDF-1.7
obj 6
endobj 6
stream 2
endstream 2
xref 1
trailer 1
startxref 1
/Page 1
/Encrypt 0
/ObjStm 0
/JS 0
/JavaScript 0
/AA 0
/OpenAction 0
/AcroForm 1
/JBIG2Decode 0
/RichMedia 0
/Launch 0
/EmbeddedFile 0
/XFA 1
We can see that although the values for /JS and /JavaScript are 0, /AcroForm and /XFA each have a value of 1.
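When triaging many files, reading the counts by eye gets tedious. The following Python sketch parses pdfid's text output into a dictionary and lists the keywords with a non-zero count. The function names and the SUSPICIOUS tuple are my own choices for illustration, and the parser assumes the "keyword count" layout shown above, optionally followed by an obfuscated count in parentheses.

```python
import re

# Keywords worth a closer look when their count is non-zero (illustrative choice).
SUSPICIOUS = ("/JS", "/JavaScript", "/AA", "/OpenAction", "/AcroForm",
              "/JBIG2Decode", "/RichMedia", "/Launch", "/XFA")

# Matches lines such as "/XFA 1" or "/JS 1(1)"; banner and header lines do not match.
LINE_RE = re.compile(r"^\s*(\S+)\s+(\d+)(?:\((\d+)\))?\s*$")


def parse_pdfid_output(text: str) -> dict:
    """Map each keyword in pdfid's text output to its total count."""
    counts = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def flag_suspicious(counts: dict) -> list:
    """Return the suspicious keywords that occur at least once."""
    return [name for name in SUSPICIOUS if counts.get(name, 0) > 0]
```

Fed the output above, flag_suspicious returns /AcroForm and /XFA, which matches the observation made by hand.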
2. peepdf
Peepdf is another helpful Python tool that points out suspicious objects such as AcroForm, OpenAction, JS and JavaScript, which are often misused. It also detects which vulnerability the PDF document triggers, provided it has a signature for it.
Ex. Running peepdf on the same file shows not only the suspicious objects but also the heap corruption vulnerability (CVE-2013-2729) that it triggers.
root@vmstation:/home/remnux# peepdf malicious.pdf
File: malicious.pdf
MD5: aaf8534120b88423f042b9d19f1c59ab
SHA1: ed0c7ab19d689554b5e112b3c45b68718908de4c
Size: 50717 bytes
Version: 1.7
Binary: True
Linearized: False
Encrypted: False
Updates: 0
Objects: 6
Streams: 2
Comments: 0
Errors: 0
Version 0:
Catalog: 3
Info: No
Objects (6): [1, 2, 3, 4, 5, 6]
Errors (2): [1, 6]
Streams (2): [1, 6]
Encoded (1): [1]
Objects with JS code (1): [1]
Suspicious elements:
/AcroForm: [3]
/XFA: [2]
BMP/RLE heap corruption (CVE-2013-2729): [1]
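The MD5 and SHA1 values at the top of the peepdf report are handy for looking the sample up elsewhere. If you need the same values without peepdf, a small Python sketch using the standard hashlib module can reproduce them. The function name file_hashes and the chunk size are arbitrary choices for this example.

```python
import hashlib


def file_hashes(path: str, chunk_size: int = 65536) -> dict:
    """Compute the MD5, SHA1 and size of a file, reading it in chunks."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    size = 0
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
            size += len(chunk)
    return {"md5": md5.hexdigest(), "sha1": sha1.hexdigest(), "size": size}
```

Running file_hashes("malicious.pdf") should give the same MD5, SHA1 and size that peepdf prints for the file.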
3. pdf-parser
Pdf-parser is a command-line program that parses and analyses PDF documents. It can be used to identify PDF documents with unusual/unexpected objects.
Ex. Running pdf-parser on the malicious PDF shows object 1, which contains a stream compressed twice with the /Fl filter and looks rather suspicious.
root@vmstation:/home/remnux/# pdf-parser malicious.pdf
PDF Comment ‘%PDF-1.7\n’
PDF Comment ‘%\xc0\xff\xee\xfa\xba\xda\n’
obj 1 0
Type:
Referencing:
Contains stream
<<
/Filter [ /Fl /Fl ]
/L 544
>>
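The /Filter [ /Fl /Fl ] entry means the stream was compressed twice with Flate, /Fl being the abbreviated name for /FlateDecode. As a rough sketch of what undoing such a filter chain looks like, the following Python function applies zlib decompression once per listed filter. It assumes you have already carved the raw stream bytes out of the object; decode_flate_chain is a hypothetical helper name, and any other filter is simply rejected.

```python
import zlib


def decode_flate_chain(raw: bytes, filters: list) -> bytes:
    """Undo a chain of Flate filters, applying them in the listed order."""
    data = raw
    for name in filters:
        if name in ("/FlateDecode", "/Fl"):
            data = zlib.decompress(data)
        else:
            # Other filters (ASCIIHexDecode, LZWDecode, ...) are out of scope here.
            raise ValueError(f"unsupported filter: {name}")
    return data
```

For object 1 above, the call would be decode_flate_chain(raw_stream, ["/Fl", "/Fl"]). pdf-parser can do the same from the command line with its --object, --filter and --raw options.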
4. pdfextract
By now, we have confirmed that the file is indeed malicious and contains obfuscated JavaScript. If we want to extract the JavaScript from the PDF document and analyze it further, we can use the pdfextract tool.
Ex. Running pdfextract on malicious.pdf extracts 4 scripts and dumps them to ‘malicious.pdf.dump/scripts’:
root@vmstation:/home/remnux# pdfextract malicious.pdf
Extracted 2 PDF streams to ‘malicious.pdf.dump/streams’.
Extracted 4 scripts to ‘malicious.pdf.dump/scripts’.
Extracted 0 attachments to ‘malicious.pdf.dump/attachments’.
Extracted 0 fonts to ‘malicious.pdf.dump/fonts’.
Extracted 0 images to ‘malicious.pdf.dump/images’.
root@vmstation:/home/remnux# cd malicious.pdf.dump/
root@vmstation:/home/remnux/malicious.pdf.dump# cd scripts/
root@vmstation:/home/remnux/malicious.pdf.dump/scripts# ls
script_1208899462995164754.js script_-3196157284528695661.js script_3802492399520803490.js script_537885029827703918.js
script_2339165404470982253.js script_-3507872836391180146.js script_-4152317962809273.js script_566555573402854993.js
root@vmstation:/home/remnux/malicious.pdf.dump/scripts#
We now have the malicious scripts at hand for further analysis.