Nitya Garg

How to analyze a malicious PDF document using tools available in REMnux

Adversaries often abuse PDF documents by embedding malicious scripts (mostly JavaScript) that deliver malicious content and install malware or a dropper on the victim’s system.

In this post, I explain some of the useful REMnux tools for analyzing a malicious PDF.

1. pdfid

pdfid scans a PDF document to identify whether it may contain JavaScript or execute an action when opened.

pdfid scans a PDF document for the following keywords and counts their occurrences (total and obfuscated):

  • obj

  • endobj

  • stream

  • endstream

  • xref

  • trailer

  • startxref

  • /Page

  • /Encrypt

  • /ObjStm

  • /JS

  • /JavaScript

  • /AA

  • /OpenAction

  • /JBIG2Decode

  • /RichMedia

  • /Launch

  • /EmbeddedFile

  • /AcroForm

  • /XFA
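To illustrate what "total and obfuscated" means: a PDF name may encode any character as # followed by two hex digits, so /J#61vaScript is simply /JavaScript in disguise. Below is a minimal Python sketch of that counting idea, not pdfid's actual code. The function name count_keywords is hypothetical, the whole file is read as raw bytes, and streams are not decompressed, so keywords hidden inside compressed streams are missed.

```python
import re

# Keywords pdfid reports (see the list above and the sample output below).
KEYWORDS = ["obj", "endobj", "stream", "endstream", "xref", "trailer",
            "startxref", "/Page", "/Encrypt", "/ObjStm", "/JS",
            "/JavaScript", "/AA", "/OpenAction", "/AcroForm",
            "/JBIG2Decode", "/RichMedia", "/Launch", "/EmbeddedFile", "/XFA"]

# A PDF name: "/" followed by characters that are not whitespace or delimiters.
NAME_RE = re.compile(rb"/[^\s/\[\]<>(){}%]+")
# "#xx" hex escape inside a name.
HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")
# Bare keywords such as obj or startxref, matched as whole words.
WORD_RE = re.compile(rb"[A-Za-z]+")


def count_keywords(data: bytes) -> dict:
    """Return {keyword: (total, obfuscated)} for the keywords above."""
    counts = {k: [0, 0] for k in KEYWORDS}
    for word in WORD_RE.findall(data):
        key = word.decode("ascii")
        if key in counts:
            counts[key][0] += 1
    # Names are hex-decoded first, so /J#61vaScript is counted as /JavaScript
    # and also recorded as obfuscated.
    for raw in NAME_RE.findall(data):
        decoded = HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
        key = decoded.decode("latin-1")
        if key in counts:
            counts[key][0] += 1
            if decoded != raw:
                counts[key][1] += 1
    return {k: tuple(v) for k, v in counts.items()}


sample = (b"1 0 obj << /Type /Catalog /OpenAction 2 0 R >> endobj "
          b"2 0 obj << /S /J#61vaScript /JS (app.alert(1)) >> endobj")
print(count_keywords(sample)["/JavaScript"])  # (1, 1)
```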

Almost every PDF document contains the first seven keywords (obj through startxref). The more interesting ones when identifying a malicious PDF are:

  • /Page gives an indication of the number of pages in the PDF document. Most malicious PDF documents have only one page.

  • /Encrypt indicates that the PDF document has DRM or needs a password to be read.

  • /ObjStm counts the number of object streams. An object stream can contain other objects and can therefore be used to obfuscate objects.

  • /JS and /JavaScript indicate that the PDF document contains JavaScript. Almost all malicious PDF documents contain JavaScript, although JavaScript can also be found in PDF documents without malicious intent.

  • /AA and /OpenAction indicate an automatic action to be performed when the page/document is viewed. Malicious PDF documents with JavaScript usually have an automatic action to launch the JavaScript without user interaction.

  • /AcroForm indicates that the PDF document contains an interactive form (AcroForm).

  • /JBIG2Decode indicates whether the PDF document uses JBIG2 compression.

  • /RichMedia is for embedded Flash.

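  • /Launch counts launch actions, which can start an application or open a file.

  • /EmbeddedFile counts files embedded in the PDF document.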

  • /XFA is for XML Forms Architecture.
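The notes above can be turned into a quick triage checklist. Below is a hypothetical helper (triage is my own name, not part of any REMnux tool) that takes keyword counts, for example as read off pdfid's output, and returns the features worth a closer look. It only encodes the heuristics listed above, so a flag is a reason to dig deeper, not a verdict.

```python
def triage(counts: dict) -> list:
    """Hypothetical helper: map pdfid-style keyword counts to warning flags."""
    def n(key):
        return counts.get(key, 0)

    flags = []
    if n("/Page") == 1:
        flags.append("single page")
    if n("/Encrypt"):
        flags.append("encrypted")
    if n("/ObjStm"):
        flags.append("object streams (may hide objects)")
    has_js = n("/JS") or n("/JavaScript")
    auto = n("/AA") or n("/OpenAction")
    if has_js:
        flags.append("JavaScript")
    if auto:
        flags.append("automatic action")
    if has_js and auto:
        # Script that can run without user interaction: the classic pattern.
        flags.append("JavaScript with automatic action")
    for key in ("/AcroForm", "/JBIG2Decode", "/RichMedia",
                "/Launch", "/EmbeddedFile", "/XFA"):
        if n(key):
            flags.append(key)
    return flags


# Counts copied from the pdfid run on malicious.pdf shown in this post.
sample_counts = {"/Page": 1, "/Encrypt": 0, "/ObjStm": 0, "/JS": 0,
                 "/JavaScript": 0, "/AA": 0, "/OpenAction": 0,
                 "/AcroForm": 1, "/JBIG2Decode": 0, "/RichMedia": 0,
                 "/Launch": 0, "/EmbeddedFile": 0, "/XFA": 1}
print(triage(sample_counts))  # ['single page', '/AcroForm', '/XFA']
```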

Example: we need to analyze a PDF file named ‘malicious.pdf’ on our REMnux box, ‘vmstation’. Running pdfid on the document gives the following output:

root@vmstation:/home/remnux/# pdfid malicious.pdf
PDFiD 0.2.1 malicious.pdf
 PDF Header: %PDF-1.7
 obj                    6
 endobj                 6
 stream                 2
 endstream              2
 xref                   1
 trailer                1
 startxref              1
 /Page                  1
 /Encrypt               0
 /ObjStm                0
 /JS                    0
 /JavaScript            0
 /AA                    0
 /OpenAction            0
 /AcroForm              1
 /JBIG2Decode           0
 /RichMedia             0
 /Launch                0
 /EmbeddedFile          0
 /XFA                   1

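Although /JS and /JavaScript are both 0, /AcroForm and /XFA are both 1. This deserves a closer look: an XFA form is XML that can carry its own script elements, so JavaScript hidden inside it is not counted under /JS or /JavaScript.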
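To see how JavaScript can hide in an XFA form, here is a minimal Python sketch that pulls the script elements out of an XFA XML packet using the standard library's xml.etree.ElementTree. It assumes the XFA stream has already been decompressed and is well-formed XML (malicious samples are often deliberately malformed, in which case a more forgiving parser is needed). The function name and the sample packet are made up for illustration.

```python
import xml.etree.ElementTree as ET


def xfa_scripts(xml_text: str) -> list:
    """Return the text of every <script> element in an XFA XML packet."""
    root = ET.fromstring(xml_text)
    scripts = []
    for elem in root.iter():
        # XFA tags are namespaced ("{uri}script"), so compare the local name only.
        if elem.tag.rsplit("}", 1)[-1] == "script" and elem.text:
            scripts.append(elem.text.strip())
    return scripts


# Made-up XFA packet for illustration only.
SAMPLE_XFA = """<xdp:xdp xmlns:xdp="http://ns.adobe.com/xdp/">
  <template xmlns="http://www.xfa.org/schema/xfa-template/3.3/">
    <subform name="form1">
      <event activity="initialize">
        <script contentType="application/x-javascript">app.alert("hello");</script>
      </event>
    </subform>
  </template>
</xdp:xdp>"""

print(xfa_scripts(SAMPLE_XFA))
```

pdfid never sees the keyword /JS in a packet like this, which is why a form-related hit is worth following up even when the JavaScript counters are zero.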

2. peepdf

peepdf is another helpful Python tool. It points out suspicious elements that are often misused, such as /AcroForm, /OpenAction, /JS and /JavaScript, and, when it has a signature for a known vulnerability, it also reports which vulnerability the PDF document triggers.

Example: running peepdf on the same file not only shows the suspicious objects but also identifies the vulnerability the file triggers, a BMP/RLE heap corruption (CVE-2013-2729).

root@vmstation:/home/remnux# peepdf malicious.pdf
File: malicious.pdf
MD5: aaf8534120b88423f042b9d19f1c59ab
SHA1: ed0c7ab19d689554b5e112b3c45b68718908de4c
Size: 50717 bytes
Version: 1.7
Binary: True
Linearized: False
Encrypted: False
Updates: 0
Objects: 6
Streams: 2
Comments: 0
Errors: 0

Version 0:
    Catalog: 3
    Info: No
    Objects (6): [1, 2, 3, 4, 5, 6]
    Errors (2): [1, 6]
    Streams (2): [1, 6]
    Encoded (1): [1]
    Objects with JS code (1): [1]
    Suspicious elements:
        /AcroForm: [3]
        /XFA: [2]
        BMP/RLE heap corruption (CVE-2013-2729): [1]
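peepdf's header lists the file's MD5, SHA1 and size, which are handy for recording the sample and searching threat-intelligence sources. A minimal sketch of computing the same fingerprint with Python's hashlib follows; the helper name fingerprint is hypothetical, and SHA-256 is my own addition, since peepdf's output above shows only MD5 and SHA1.

```python
import hashlib


def fingerprint(data: bytes) -> dict:
    """Return the MD5/SHA1/size values peepdf prints, plus SHA-256."""
    return {
        "MD5": hashlib.md5(data).hexdigest(),
        "SHA1": hashlib.sha1(data).hexdigest(),
        "SHA256": hashlib.sha256(data).hexdigest(),  # not in peepdf's output above
        "Size": len(data),
    }


# Usage on the sample (run from the directory that holds it):
# from pathlib import Path
# print(fingerprint(Path("malicious.pdf").read_bytes()))
```

If the values match the ones peepdf printed, you know both tools looked at the same file.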

3. pdf-parser

pdf-parser is a command-line program that parses and analyzes PDF documents. It can be used to identify PDF documents with unusual or unexpected objects.
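To get a feel for what pdf-parser walks through, here is a rough Python sketch that locates indirect object headers of the form "number generation obj". It is only a regex scan under simplifying assumptions, not how pdf-parser works internally: the same bytes appearing inside stream data can fool it, and it cannot see objects stored inside object streams (/ObjStm). The function name list_objects is hypothetical.

```python
import re

# Header of an indirect object: "<object number> <generation number> obj".
OBJ_HEADER_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")


def list_objects(data: bytes) -> list:
    """Return (object number, generation, byte offset) for each header found."""
    return [(int(m.group(1)), int(m.group(2)), m.start())
            for m in OBJ_HEADER_RE.finditer(data)]


# Usage on the sample:
# from pathlib import Path
# for num, gen, offset in list_objects(Path("malicious.pdf").read_bytes()):
#     print(f"obj {num} {gen} at offset {offset}")
```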

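Example: running pdf-parser on malicious.pdf shows that object 1 contains a compressed stream and looks rather suspicious: its /Filter entry applies /Fl, the abbreviated name of the FlateDecode filter, twice. Stacking filters like this is a common way to hide content from simple scanners.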

root@vmstation:/home/remnux/# pdf-parser malicious.pdf
PDF Comment '%PDF-1.7\n'

PDF Comment '%\xc0\xff\xee\xfa\xba\xda\n'

obj 1 0
 Contains stream
  /Filter [ /Fl /Fl ]
  /L 544
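/Fl stands for FlateDecode (zlib/deflate compression), and the filters in a /Filter array are applied in the order listed when decoding. Below is a minimal sketch of undoing such a chain with Python's zlib, under the assumption that only Flate filters are present; decode_stream is a hypothetical name. pdf-parser itself can do this for you with -o to select the object and -f to pass its stream through the filters.

```python
import zlib

# FlateDecode under its full name and under the abbreviation (/Fl) seen above.
FLATE_NAMES = {"/FlateDecode", "/Fl"}


def decode_stream(raw: bytes, filters: list) -> bytes:
    """Undo a /Filter array, applying the filters in the order listed.

    Only Flate is handled in this sketch; any other filter raises.
    """
    data = raw
    for name in filters:
        if name in FLATE_NAMES:
            data = zlib.decompress(data)
        else:
            raise NotImplementedError(f"{name} is not handled in this sketch")
    return data


# Round trip: compress a sample payload twice, as in [ /Fl /Fl ], then decode.
payload = b"app.alert('demo');"
print(decode_stream(zlib.compress(zlib.compress(payload)), ["/Fl", "/Fl"]))
```

For a truncated stream, zlib.decompressobj() is more forgiving than zlib.decompress(), because it returns whatever it managed to decode instead of raising.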


4. pdfextract

By now, we have confirmed that the file is indeed malicious and contains obfuscated JavaScript. To extract the JavaScript from the PDF document for further analysis, we can use pdfextract, a tool from the Origami framework.
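As a rough illustration of one thing an extractor has to handle, the sketch below pulls JavaScript stored as a literal string right after a /JS key and decodes the PDF string escapes (\n, \(, \), octal \ddd, line continuations) while tracking balanced parentheses. It is not pdfextract's implementation: the function names are hypothetical, and it ignores /JS values given as hex strings or as references to stream objects, which real samples often use.

```python
import re

# "/JS" followed by optional whitespace and the "(" that opens a literal string.
JS_KEY_RE = re.compile(rb"/JS\s*\(")

# Single-character escapes defined for PDF literal strings.
SIMPLE_ESCAPES = {
    ord("n"): b"\n", ord("r"): b"\r", ord("t"): b"\t",
    ord("b"): b"\b", ord("f"): b"\f",
    ord("("): b"(", ord(")"): b")", ord("\\"): b"\\",
}


def read_literal(data: bytes, start: int) -> bytes:
    """Decode the PDF literal string whose opening '(' is at data[start]."""
    out = bytearray()
    depth = 1  # unescaped parentheses must balance
    i = start + 1
    while i < len(data):
        c = data[i]
        if c == 0x5C:  # backslash starts an escape sequence
            i += 1
            if i >= len(data):
                break
            nxt = data[i]
            if nxt in SIMPLE_ESCAPES:
                out += SIMPLE_ESCAPES[nxt]
                i += 1
            elif 0x30 <= nxt <= 0x37:  # \d, \dd or \ddd octal code
                j = i
                while j < len(data) and j < i + 3 and 0x30 <= data[j] <= 0x37:
                    j += 1
                out.append(int(data[i:j], 8) & 0xFF)
                i = j
            elif nxt in (0x0D, 0x0A):  # backslash + end of line: continuation
                i += 1
                if nxt == 0x0D and i < len(data) and data[i] == 0x0A:
                    i += 1
            else:  # unknown escape: drop the backslash, keep the character
                out.append(nxt)
                i += 1
            continue
        if c == 0x28:  # "("
            depth += 1
        elif c == 0x29:  # ")"
            depth -= 1
            if depth == 0:
                return bytes(out)
        out.append(c)
        i += 1
    return bytes(out)  # unterminated string: return what was read


def extract_js(data: bytes) -> list:
    """Return the decoded text of every '/JS (...)' literal string in data."""
    return [read_literal(data, m.end() - 1).decode("latin-1")
            for m in JS_KEY_RE.finditer(data)]


sample = b"<< /S /JavaScript /JS (app.alert\\(1\\); f(2)\\101) >>"
print(extract_js(sample))
```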

Example: running pdfextract on malicious.pdf extracts 4 scripts and dumps them to ‘malicious.pdf.dump/scripts’:

root@vmstation:/home/remnux# pdfextract malicious.pdf
Extracted 2 PDF streams to 'malicious.pdf.dump/streams'.
Extracted 4 scripts to 'malicious.pdf.dump/scripts'.
Extracted 0 attachments to 'malicious.pdf.dump/attachments'.
Extracted 0 fonts to 'malicious.pdf.dump/fonts'.
Extracted 0 images to 'malicious.pdf.dump/images'.
root@vmstation:/home/remnux# cd malicious.pdf.dump/
root@vmstation:/home/remnux/malicious.pdf.dump# cd scripts/
root@vmstation:/home/remnux/malicious.pdf.dump/scripts# ls
script_1208899462995164754.js  script_-3196157284528695661.js  script_3802492399520803490.js  script_537885029827703918.js
script_2339165404470982253.js  script_-3507872836391180146.js  script_-4152317962809273.js    script_566555573402854993.js
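The dumped files are named only by a number (script_number.js), so the listing does not reveal whether any of them share the same content. A small sketch that groups the .js files by SHA-256 so duplicates can be skipped during review; the helper names load_scripts and group_by_hash are hypothetical, and the directory path is the one used above.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def load_scripts(directory: str) -> dict:
    """Read every .js file in directory into {file name: content}."""
    return {p.name: p.read_bytes() for p in sorted(Path(directory).glob("*.js"))}


def group_by_hash(scripts: dict) -> dict:
    """Group file names by the SHA-256 digest of their content."""
    groups = defaultdict(list)
    for name, content in scripts.items():
        groups[hashlib.sha256(content).hexdigest()].append(name)
    return dict(groups)


# Usage (path from the walkthrough above):
# for digest, names in group_by_hash(load_scripts("malicious.pdf.dump/scripts")).items():
#     print(digest[:16], names)
```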


We now have the malicious scripts at hand for further analysis.
