Shlomo Yona
Shlomo.Yona@perl.org.il
http://yeda.cs.technion.ac.il/~yona/hebrew
An XML parser attack is anything that you can do to make an XML parser:
It can also be the exploitation of a functional or implementation detail in order to cause the parser, or an application that uses it, to do anything beyond its intended design.
An XML document is allowed to include a DTD, or to refer to a DTD or an XML Schema located elsewhere.
This can be exploited for:
XXE (XML eXternal Entity) Attack [Credit: Gregory Steuck]
An XXE attack is an attack on an application that parses XML input from untrusted sources using an incorrectly configured XML parser. The application may be coerced into opening arbitrary files and/or TCP connections.
http://www.w3.org/TR/REC-xml/#include-if-valid says:
When an XML processor recognizes a reference to a parsed entity, in order to validate the document, the processor MUST include its replacement text. If the entity is external, and the processor is not attempting to validate the XML document, the processor MAY, but need not, include the entity's replacement text. If a non-validating processor does not include the replacement text, it MUST inform the application that it recognized, but did not read, the entity.
This rule is based on the recognition that the automatic inclusion provided by the SGML and XML entity mechanism, primarily designed to support modularity in authoring, is not necessarily appropriate for other applications, in particular document browsing. Browsers, for example, when encountering an external parsed entity reference, might choose to provide a visual indication of the entity's presence and retrieve it for display only on demand.
In case of untrusted XML input it is best to prohibit all external general entities.
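One way to prohibit external entities can be sketched with Python's standard `xml.parsers.expat` module. This is a minimal sketch, not a complete hardening recipe: the names `parse_untrusted` and `ExternalEntityForbidden` are hypothetical, and the approach (raising from the declaration and reference handlers) is only one of several possible policies.

```python
import xml.parsers.expat


class ExternalEntityForbidden(ValueError):
    """Raised when untrusted XML declares or references an external entity."""


def parse_untrusted(xml_bytes):
    """Parse XML with external entities prohibited; return the character data."""
    text_parts = []
    parser = xml.parsers.expat.ParserCreate()

    def on_entity_decl(name, is_parameter, value, base, system_id, public_id, notation):
        # Internal entities carry a literal value; external ones carry identifiers.
        if system_id is not None or public_id is not None:
            raise ExternalEntityForbidden(f"external entity {name!r}: {system_id!r}")

    def on_external_ref(context, base, system_id, public_id):
        # Second line of defence: refuse to resolve anything outside the document.
        raise ExternalEntityForbidden(f"external reference: {system_id!r}")

    parser.EntityDeclHandler = on_entity_decl
    parser.ExternalEntityRefHandler = on_external_ref
    parser.CharacterDataHandler = text_parts.append
    parser.Parse(xml_bytes, True)
    return "".join(text_parts)
```

A document such as `<!DOCTYPE r [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><r>&xxe;</r>` is rejected as soon as the declaration is seen, before any file or network access is attempted.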
Not necessarily a problem, depending on your implementation...
Note that Schema validation will not save you here...
Any other ideas?
(O(n^2) on duplicate lookup for names/prefixes)
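To illustrate the quadratic cost, here is a minimal sketch that contrasts a naive duplicate-name check with a hashed one. It assumes the parser must verify that attribute names (or prefixes) on an element are unique; all function names are hypothetical and do not describe any particular parser.

```python
def has_duplicate_quadratic(names):
    """Compare every name with all earlier ones: O(n^2) comparisons."""
    for i in range(len(names)):
        for j in range(i):
            if names[i] == names[j]:
                return True
    return False


def has_duplicate_hashed(names):
    """Track seen names in a set: O(n) expected time."""
    seen = set()
    for name in names:
        if name in seen:
            return True
        seen.add(name)
    return False


def many_attributes_element(n):
    """Build one empty element carrying n distinct attribute names."""
    attrs = " ".join(f'a{i}="x"' for i in range(n))
    return f"<e {attrs}/>"
```

With the naive check, an element built by `many_attributes_element(n)` forces roughly n*n/2 string comparisons, since no duplicate is ever found and the scan never exits early.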
Don't play by the rules:
These can be considered as boundary/limit attacks...
You would want your parser to be able to handle "random junk" gracefully and declare that it will not tolerate garbage. Or would you? [let's discuss!]
How would you effectively generate random junk for testing purposes? [let's discuss!]
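As one possible starting point for the discussion, here is a minimal mutation-fuzzing sketch: take a small well-formed document, overwrite a few random bytes, and tally how the parser reacts. It assumes Python's standard `xml.parsers.expat` as the parser under test; `SEED_DOC`, `mutate` and `fuzz` are hypothetical names, and a fixed seed is used so that runs are repeatable.

```python
import random
import xml.parsers.expat

SEED_DOC = b'<root a="1"><child>text</child><!-- c --></root>'


def mutate(data, rng, max_flips=4):
    """Return a copy of data with a few random bytes overwritten."""
    buf = bytearray(data)
    for _ in range(rng.randint(1, max_flips)):
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(iterations=1000, seed=0):
    """Feed mutated documents to expat and tally how each parse ended."""
    rng = random.Random(seed)
    tally = {"accepted": 0, "rejected": 0, "unexpected": 0}
    for _ in range(iterations):
        sample = mutate(SEED_DOC, rng)
        parser = xml.parsers.expat.ParserCreate()
        try:
            parser.Parse(sample, True)
            tally["accepted"] += 1      # mutation left the document well-formed
        except xml.parsers.expat.ExpatError:
            tally["rejected"] += 1      # graceful refusal of garbage
        except Exception:
            tally["unexpected"] += 1    # anything else deserves a closer look
    return tally
```

The interesting bucket is "unexpected": a parser that handles junk gracefully should only ever accept or cleanly reject.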
An attacker can exploit the implementor's ignorance of standards:
Take Normalization (AKA Canonicalization) for example:
Any other examples of normalization? [Let's discuss!]
Any other examples of Standards Soup confusion/ignorance? [Let's discuss!]
Other than the normative and non-normative XML escaping:
#28;
&2040;
%xx
%25xx
%%3230
%Uxxxx
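The `%25xx` form above (a percent sign that is itself percent-encoded) can be sketched as follows. This is a minimal illustration using Python's standard `urllib.parse.unquote`; `naive_filter` and `decode_fully` are hypothetical names, and the "filter" stands in for any layer that decodes once before a later layer decodes again.

```python
from urllib.parse import unquote


def naive_filter(value):
    """Decode once and reject angle brackets (hypothetical filter)."""
    decoded = unquote(value)
    if "<" in decoded or ">" in decoded:
        raise ValueError("markup not allowed")
    return decoded


def decode_fully(value, max_rounds=5):
    """Decode repeatedly until the value stops changing, or give up."""
    for _ in range(max_rounds):
        decoded = unquote(value)
        if decoded == value:
            return value
        value = decoded
    raise ValueError("too many encoding layers")
```

Here `naive_filter("%253Cb%253E")` passes, because one round of decoding yields `%3Cb%3E`, which contains no angle brackets. A second decoding step downstream then turns it into `<b>`. Decoding to a fixed point before checking, as in `decode_fully`, is one way to close that gap.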
Knowing how your XML parser is implemented and behaves on bad/malformed/tricky input can facilitate attacks on the programs/applications/layers that use your parser as a component.
Think about the implementation of encryption/decryption of XML data (per block? per element? ...)
Templating systems used with your XML parser's callbacks or tree traversal can result in executing malicious code.
Attacks are asymmetric:
Trivial to generate these with print statements
The recipient will consume a lot of resources as a result
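The asymmetry can be sketched in a few lines. The example below assumes deep element nesting as the stress case and Python's standard `xml.parsers.expat` as the recipient; `deeply_nested` and `max_depth` are hypothetical names. The sender needs two string repetitions, while the recipient has to tokenize and track every element.

```python
import xml.parsers.expat


def deeply_nested(depth):
    """Build a document of `depth` nested <a> elements with two string repeats."""
    return "<a>" * depth + "</a>" * depth


def max_depth(document):
    """Parse with expat and report the deepest element nesting seen."""
    state = {"current": 0, "deepest": 0}

    def start(name, attrs):
        state["current"] += 1
        state["deepest"] = max(state["deepest"], state["current"])

    def end(name):
        state["current"] -= 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(document, True)
    return state["deepest"]
```

A tree-building consumer would additionally allocate one node per element, so the cost on the receiving side grows with `depth` while the cost of generating the input stays negligible.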