XML Parser attacks

A summary of ways to attack an XML Parser

Shlomo Yona

Shlomo.Yona@perl.org.il
http://yeda.cs.technion.ac.il/~yona/hebrew


What is an XML Parser Attack?

An XML parser attack is anything you can do to make an XML parser:

  • crash
  • consume too much of a resource (e.g., memory)
  • execute too slowly
  • execute your own code

It can also be the exploitation of a functional or implementation detail to make the parser, or an application that uses it, do anything beyond its intended design.


Main types of attacks
  • (re)definition
  • include/import or XXE
  • boundary/limit or XML Bomb
  • garbage
  • Standards Soup


(re)definition

An XML document is allowed to include a DTD, or to refer to a DTD or an XML Schema located elsewhere.

This can be used to:

  • fetch sensitive information from disk or a remote location
  • delegate an attack: trick the parser's user agent into attacking some application/site
  • cause a Denial of Service (DoS) on the parser via a bad/problematic URI
  • replace the DTD/XML Schema that you validate against
  • plant entity expansion bombs
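To make the "replace the DTD" point concrete, here is a minimal Python sketch using the standard-library expat binding (`xml.parsers.expat`). It assumes a non-validating parser with default settings: an internal DTD subset supplied by the sender declares a default value for an attribute, and the parser reports that attribute as if the sender had written it in the tag. The helper `collect_attributes` and the `user`/`role`/`admin` names are hypothetical.

```python
import xml.parsers.expat


def collect_attributes(xml_text: str, specified_only: bool = False) -> dict:
    """Map each element name to the attributes the parser reports for it.

    Hypothetical helper for illustration: if a name occurs more than once,
    only the last occurrence is kept.
    """
    seen = {}
    parser = xml.parsers.expat.ParserCreate()
    # specified_attributes=True makes expat report only attributes written in
    # the tag itself, not defaults coming from <!ATTLIST ...> declarations.
    parser.specified_attributes = specified_only

    def start(name, attrs):
        seen[name] = dict(attrs)

    parser.StartElementHandler = start
    parser.Parse(xml_text, True)
    return seen


# The sender's own internal DTD subset decides what <user/> "means".
HOSTILE = """<?xml version="1.0"?>
<!DOCTYPE user [
  <!ATTLIST user role CDATA "admin">
]>
<user name="guest"/>"""

print(collect_attributes(HOSTILE))                       # role was never written in <user/>
print(collect_attributes(HOSTILE, specified_only=True))  # only what the tag itself says
```

Setting `specified_attributes` only hides the defaults; whether to trust a sender-supplied DTD at all is the real design decision.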


XXE

XXE (XML eXternal Entity) attack [credit due to: Gregory Steuck]

An XXE attack targets an application that parses XML input from untrusted sources with an incorrectly configured XML parser. The application may be coerced into opening arbitrary files and/or TCP connections.

The XML 1.0 specification (http://www.w3.org/TR/REC-xml/#include-if-valid) says:

    When an XML processor recognizes a reference to a parsed entity, in order to validate the document, the processor MUST include its replacement text. If the entity is external, and the processor is not attempting to validate the XML document, the processor MAY, but need not, include the entity's replacement text. If a non-validating processor does not include the replacement text, it MUST inform the application that it recognized, but did not read, the entity.

    This rule is based on the recognition that the automatic inclusion provided by the SGML and XML entity mechanism, primarily designed to support modularity in authoring, is not necessarily appropriate for other applications, in particular document browsing. Browsers, for example, when encountering an external parsed entity reference, might choose to provide a visual indication of the entity's presence and retrieve it for display only on demand.


XXE -- suggested fix

For untrusted XML input, it is best to prohibit all external general entities.
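A minimal sketch of that advice in Python, again assuming the standard-library expat binding: an `EntityDeclHandler` sees every entity declaration, and any declaration with a system identifier (an external entity) aborts the parse. As an extra precaution beyond external general entities, it rejects external parameter entities too. `parse_untrusted` and `ExternalEntityForbidden` are hypothetical names. The sketch looks only at entity declarations; refusing any `<!DOCTYPE ...>` at all is the stricter option.

```python
import xml.parsers.expat


class ExternalEntityForbidden(ValueError):
    """Hypothetical error type: the document declared an external entity."""


def parse_untrusted(xml_bytes: bytes) -> list:
    """Parse untrusted XML, refusing any entity declared with a system id.

    Returns element names in document order (hypothetical helper).
    """
    names = []
    parser = xml.parsers.expat.ParserCreate()

    def on_entity_decl(name, is_parameter_entity, value, base,
                       system_id, public_id, notation_name):
        # Internal entities carry their replacement text and no system id;
        # external ones point somewhere (file:, http:, UNC paths, ...).
        if system_id is not None:
            kind = "parameter" if is_parameter_entity else "general"
            raise ExternalEntityForbidden(
                f"external {kind} entity {name!r} -> {system_id!r}")

    def on_start(name, attrs):
        names.append(name)

    parser.EntityDeclHandler = on_entity_decl
    parser.StartElementHandler = on_start
    # An exception raised inside a handler propagates out of Parse().
    parser.Parse(xml_bytes, True)
    return names


XXE = (b'<?xml version="1.0"?>'
       b'<!DOCTYPE r [<!ENTITY x SYSTEM "file:///etc/passwd">]>'
       b'<r>&x;</r>')

try:
    parse_untrusted(XXE)
except ExternalEntityForbidden as err:
    print("refused:", err)
```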


XXE -- what can this possibly do to me?
  • DoS on the parsing system by making it open a blocking, endless, or crash-inducing resource, e.g., file:///dev/random, file:///dev/urandom, or file://c:/con/con
  • TCP port scans using HTTP external entities (including behind firewalls, since application servers often have a different view of the network than the attacker)
  • Unauthorized access to data stored as XML files on the parsing system's file system (the attacker still needs a way to get the data back)
  • DoS on other systems (if the parsing system is allowed to establish TCP connections to other systems)
  • Theft of NTLM authentication material by initiating UNC file access to systems under the attacker's control
  • Doomsday scenario: a widely deployed, highly connected application vulnerable to this attack may be used for DDoS.


XML bombs
  • XML document contains too many bytes
  • XML document contains too many characters (one character doesn't necessarily translate to one byte...)
  • Nesting depth too deep
  • Too many elements
  • Too many siblings to an element
  • Too many attributes
  • Too many namespaces
  • Element/attribute/namespace-prefix/value too long (bytes? characters?)
  • Recursive nesting of elements (this is not well-formed XML!)
  • Opening and closing tags too many times (too many push/pop stack operations)
  • Entity resolution depth

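Not necessarily a problem, depending on your implementation...

Note that schema validation will not save you here: the parser typically has to process the input (and expand its entities) before a validator ever sees it.

Any other ideas?

(O(n^2) duplicate lookups for names/prefixes)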
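To illustrate the O(n^2) remark: a hand-written parser that checks an element's attribute names (or a scope's namespace prefixes) for duplicates by comparing each name with all earlier ones does about n^2/2 comparisons, so a single element with 50,000 distinct attributes costs on the order of 10^9 comparisons. A hash set keeps the same check linear on average. Both helpers below are hypothetical sketches in Python.

```python
from typing import Sequence


def has_duplicate_quadratic(names: Sequence[str]) -> bool:
    """Compare every name with every earlier one: about n*n/2 comparisons."""
    for i in range(len(names)):
        for j in range(i):
            if names[i] == names[j]:
                return True
    return False


def has_duplicate_linear(names: Sequence[str]) -> bool:
    """Remember names in a hash set: linear time on average."""
    seen = set()
    for name in names:
        if name in seen:
            return True
        seen.add(name)
    return False


# Worst case for the quadratic version: all names distinct, so it never exits early.
attack = [f"a{i}" for i in range(2000)]
print(has_duplicate_linear(attack), has_duplicate_quadratic(attack))  # False False
```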


XML bombs -- how to protect against them?
  • restrict size/length/depth of everything
  • consider pre-allocation / static-allocation and manage memory yourself (no malloc/free)
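A sketch of "restrict size/length/depth of everything" for a streaming parser, assuming Python's expat binding: check the byte size before parsing, then count depth, elements, and attributes per element in the callbacks and abort as soon as a limit is crossed. `Limits`, `parse_with_limits`, and the default values are hypothetical, illustrative choices.

```python
import xml.parsers.expat
from dataclasses import dataclass
from typing import Optional


class LimitExceeded(Exception):
    """Raised when a document breaks one of the configured limits."""


@dataclass(frozen=True)
class Limits:
    # Illustrative values only; pick limits that fit your application.
    max_bytes: int = 1_000_000
    max_depth: int = 64
    max_elements: int = 100_000
    max_attributes: int = 32


def parse_with_limits(data: bytes, limits: Optional[Limits] = None) -> int:
    """Parse data with expat, enforcing limits; return the element count."""
    limits = limits or Limits()
    if len(data) > limits.max_bytes:  # cheapest check first, before parsing
        raise LimitExceeded(f"document is {len(data)} bytes")

    depth = 0
    elements = 0
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        nonlocal depth, elements
        depth += 1
        elements += 1
        if depth > limits.max_depth:
            raise LimitExceeded(f"nesting deeper than {limits.max_depth}")
        if elements > limits.max_elements:
            raise LimitExceeded(f"more than {limits.max_elements} elements")
        if len(attrs) > limits.max_attributes:
            raise LimitExceeded(f"<{name}> has {len(attrs)} attributes")

    def on_end(name):
        nonlocal depth
        depth -= 1

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.Parse(data, True)
    return elements


print(parse_with_limits(b"<a><b/><c x='1'/></a>"))  # 3
```

The callback checks fire only after expat has processed each start tag, so they cap the total work rather than every individual allocation, and they do not cover entity expansion, which needs a limit of its own.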


Garbage

Don't play by the rules:

  • not well-formed XML
  • XML not valid against the schema
  • lie about your encoding
  • use illegal characters

These can be considered boundary/limit attacks...

You would want your parser to handle "random junk" gracefully and declare that it will not accept garbage. Or would you? [Let's discuss!]

How would you effectively generate random junk for testing purposes? [Let's discuss!]
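One possible answer, sketched under our own assumptions: mutation fuzzing. Start from a small valid seed document, apply a few random byte-level edits, and check that the parser either accepts the result or rejects it with its normal parse error, never anything else. Python's `xml.etree.ElementTree` stands in for "your parser" here; the seed document, the mutation operators, and the helper names are hypothetical.

```python
import random
import xml.etree.ElementTree as ET
from collections import Counter

SEED_DOC = b'<order id="42"><item sku="a&amp;b">2</item><note>hi</note></order>'


def mutate(doc: bytes, rng: random.Random, max_edits: int = 4) -> bytes:
    """Apply a few random byte-level edits: flip, insert, delete or duplicate."""
    buf = bytearray(doc)
    for _ in range(rng.randint(1, max_edits)):
        op = rng.choice(("flip", "insert", "delete", "dup"))
        pos = rng.randrange(len(buf)) if buf else 0
        if op == "flip" and buf:
            buf[pos] = rng.randrange(256)
        elif op == "insert":
            buf.insert(pos, rng.randrange(256))
        elif op == "delete" and buf:
            del buf[pos]
        elif op == "dup" and buf:
            # Repeat a short slice: a cheap way to get longer or deeper input.
            end = min(len(buf), pos + rng.randint(1, 16))
            buf[pos:pos] = buf[pos:end] * rng.randint(2, 50)
    return bytes(buf)


def fuzz(iterations: int = 1000, seed: int = 1) -> Counter:
    """Feed mutated documents to ElementTree and classify each outcome."""
    rng = random.Random(seed)
    outcomes: Counter = Counter()
    for _ in range(iterations):
        sample = mutate(SEED_DOC, rng)
        try:
            ET.fromstring(sample)
            outcomes["accepted"] += 1
        except ET.ParseError:
            outcomes["rejected"] += 1  # graceful: the parser's normal error
        except Exception as exc:  # anything else deserves a closer look
            outcomes[f"unexpected {type(exc).__name__}"] += 1
    return outcomes


print(fuzz())
```

A fixed seed for `random.Random` makes any surprising case reproducible. A blind loop like this is only a starting point; coverage-guided fuzzers usually explore far more of a parser's code.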


Standards Soup

An attacker can exploit the implementer's ignorance of the standards:

Take normalization (AKA canonicalization) for example:

  • <a></a> is the same as <a/>
  • <a b="c" c="b"/> is the same as <a c="b" b="c"/>
  • entities / escaping (&#65; is the same as A)
  • character encoding (how do I read my bytes as characters?!)
  • character decomposition (Unicode Normalization Forms http://www.unicode.org/reports/tr15/ )
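Python's `xml.etree.ElementTree.canonicalize()` (Canonical XML 2.0, available since Python 3.8) can be used to check the first few bullets above. The sketch also shows that canonicalization does not perform Unicode normalization, so the last bullet is left to the application.

```python
import unicodedata
from xml.etree.ElementTree import canonicalize  # C14N 2.0, Python 3.8+

# Empty-element tag vs start/end pair: same canonical form.
print(canonicalize("<a/>"), canonicalize("<a></a>"))  # <a></a> <a></a>

# Attribute order is not significant; C14N writes attributes sorted.
print(canonicalize('<a b="c" c="b"/>') == canonicalize('<a c="b" b="c"/>'))

# Character references are replaced by the characters they denote.
print(canonicalize("<a>&#65;</a>"))  # <a>A</a>

# C14N does NOT apply Unicode normalization: these stay different...
composed = "<a>\u00e9</a>"      # e-acute as a single code point
decomposed = "<a>e\u0301</a>"   # 'e' followed by COMBINING ACUTE ACCENT
print(canonicalize(composed) == canonicalize(decomposed))  # False
# ...unless the application normalizes the text itself (NFC here).
print(canonicalize(unicodedata.normalize("NFC", decomposed)) == canonicalize(composed))  # True
```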

Any other examples of normalization? [Let's discuss!]

Any other examples of Standards Soup confusion/ignorance? [Let's discuss!]

  • does your implementation enforce only one occurrence of the same attribute in an element?
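A quick way to probe that question, assuming Python's `xml.etree.ElementTree` (backed by expat) as the parser under test. The second case is the Standards Soup version: two different prefixes bound to the same namespace URI give the attributes the same expanded name, which Namespaces in XML also forbids. `rejects` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def rejects(document: str) -> bool:
    """True if ElementTree refuses the document with a ParseError."""
    try:
        ET.fromstring(document)
    except ET.ParseError as err:
        print("rejected:", err)
        return True
    return False


# The literal case: the same attribute name twice.
print(rejects('<a b="1" b="2"/>'))
# The namespace case: different prefixes bound to the same URI give the
# two attributes the same expanded name.
print(rejects('<a xmlns:p="urn:x" xmlns:q="urn:x" p:b="1" q:b="2"/>'))
```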


some escaping methods found in the wild...

Other than the normative and non-normative XML escaping:

  • &#28;
  • &#28;#28;
  • &2040;
  • %xx
  • %25xx
  • %%3230
  • %Uxxxx
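Several of the forms above are URL-style (percent) escapes layered around XML rather than XML escapes. A minimal Python sketch, assuming the value later reaches a component that decodes it again: `%25` is the escape for `%`, so a filter that decodes once and looks for `/` sees `%2F`, while the next layer gets `/`. It also checks that a conforming XML parser refuses `&#28;`, because U+001C is not an allowed XML 1.0 character. `decode_layers` and `xml_accepts` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote


def decode_layers(value: str, max_rounds: int = 5) -> list:
    """Percent-decode repeatedly, recording each layer, until nothing changes."""
    layers = [value]
    for _ in range(max_rounds):
        decoded = unquote(layers[-1])
        if decoded == layers[-1]:
            break
        layers.append(decoded)
    return layers


def xml_accepts(document: str) -> bool:
    """True if ElementTree parses the document without a ParseError."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# '%25' is the escape for '%' itself, so each decode peels off one layer.
print(decode_layers("%252F"))  # ['%252F', '%2F', '/']

# A conforming XML parser refuses a reference to U+001C, a control character
# that XML 1.0 does not allow, while an ordinary character reference is fine.
print(xml_accepts("<v>&#65;</v>"), xml_accepts("<v>&#28;</v>"))  # True False
```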


We did not talk about specific applications
  • XML-RPC
  • WSDL/SOAP
  • WebDAV
  • XML file formats (e.g., new file formats in Vista)
  • WS-Security... WS-* applications

Knowing how your XML parser is implemented and how it behaves on bad/malformed/tricky input can facilitate attacks on the programs/applications/layers that use your parser as a component.

Think about how encryption/decryption of XML data is implemented (per block? per element? ...)

Templating systems used with your XML parser's callbacks or tree traversal can end up executing malicious code.


Keep in mind

Attacks are asymmetric:

  • They are trivial to generate with print statements
  • The recipient consumes a lot of resources as a result
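The asymmetry is easy to quantify for the classic nested-entity ("billion laughs") bomb. In this Python sketch (the helper names and the `lol` seed are illustrative), each entity refers `fanout` times to the previous one, so a document of well under a kilobyte asks the parser for len(seed) * fanout^(levels - 1) characters: 3 * 10^9 for 10 levels with fanout 10. Only a tiny instance is actually parsed.

```python
import xml.etree.ElementTree as ET


def entity_bomb(levels: int, fanout: int, seed: str = "lol") -> str:
    """Build a nested-entity ("billion laughs") document as a string."""
    decls = [f'<!ENTITY e0 "{seed}">']
    for i in range(1, levels):
        # Entity e{i} is fanout references to entity e{i-1}.
        decls.append(f'<!ENTITY e{i} "' + f"&e{i - 1};" * fanout + '">')
    return (
        '<?xml version="1.0"?>\n<!DOCTYPE bomb [\n'
        + "\n".join(decls)
        + f"\n]>\n<bomb>&e{levels - 1};</bomb>"
    )


def expanded_length(levels: int, fanout: int, seed: str = "lol") -> int:
    """Characters produced by fully expanding the root entity reference."""
    return len(seed) * fanout ** (levels - 1)


doc = entity_bomb(10, 10)
print(f"{len(doc)} characters sent, {expanded_length(10, 10):,} after expansion")

# A tiny instance is safe to parse and shows the expansion really happens.
small = ET.fromstring(entity_bomb(3, 3))
print(small.text == "lol" * 9)  # True
```

Recent expat releases (2.4.0 and later) include built-in protection against this kind of amplification; an older or different parser may not, which is why the limits discussed above still matter.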


Thank you!


Shlomo Yona (c) 2007. All rights reserved. Monthly meetings of Israeli Perl Mongers. Last update: Tue Apr 17 08:31:57 IDT 2007