
Woodstox XML compliancy

The current stable version (3.2.x) is:

  • XML 1.0 compliant (minus bugs), and mostly XML 1.1 compliant (character normalization rules are not implemented). This can be verified, for example, by running SAXTest with Woodstox, which passes with a success rate of over 99% (most of the remaining failures are due to the SAX API implementation, not XML compatibility).
  • Namespaces 1.0 and 1.1 compliant.
  • Fully Stax 1.0 compliant, both as per the TCK and according to the authors' best understanding of the specification (and the related Javadocs).
  • Stax2 Extension API compliant (up to v2.1 of the extension API)

Earlier versions (1.0.x, 2.0.x) missed some of the less obvious well-formedness checks (or handled them incompletely). This was because the initial main goal of Woodstox was to make sure that all well-formed documents are parsed successfully; catching problems with well-formedness or validity was a secondary (although important) concern.

Known issues regarding (full) XML 1.0 compliancy in Woodstox versions 1.0.x and 2.0.x were:

  • XML character validity checking was not complete. Specifically:
    • Name (character) validity checks used the simpler and more sensible XML 1.1 rules, not the convoluted XML 1.0 rules, even in XML 1.0 mode (changed for version 2.9).
    • Text content was only checked for NULL characters during parsing.
  • No verification was done to ensure that expanded entities are properly nested (that is, balanced within the scope of the entity itself), so it was possible to use non-compliant unbalanced entities.
  • Validation problems were often reported as fatal errors. Although this may be compliant behaviour (I believe parsers are allowed to stop parsing), it was unexpected for users of other XML parsing packages (like Xerces).
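
The entity nesting rule can be illustrated with a small check written against the standard Stax API. This is only a sketch: the class name `EntityNestingCheck`, the helper `isWellFormed`, and the two sample documents are made up for illustration, and `XMLInputFactory.newInstance()` returns whichever Stax implementation is configured (Woodstox if it is on the classpath and registered, otherwise the JDK default), so the result depends on the parser actually in use.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class EntityNestingCheck {

    /** Returns true if the whole document can be read without a parsing error. */
    public static boolean isWellFormed(String xml) {
        // Uses whichever Stax implementation is configured for this JVM.
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Expand entity references inline so their replacement text is parsed.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.TRUE);
        factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.TRUE);
        XMLStreamReader reader = null;
        try {
            reader = factory.createXMLStreamReader(new StringReader(xml));
            // Pull every event; a compliant parser throws on the first
            // well-formedness violation it encounters.
            while (reader.hasNext()) {
                reader.next();
            }
            return true;
        } catch (XMLStreamException e) {
            return false;
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (XMLStreamException ignored) {
                    // Nothing useful to do if closing fails.
                }
            }
        }
    }

    public static void main(String[] args) {
        // Entity whose replacement text is balanced: <a> opens and closes inside it.
        String balanced =
            "<!DOCTYPE root [<!ENTITY e '<a>text</a>'>]><root>&e;</root>";
        // Entity that opens <a> but leaves it to the enclosing document to close it.
        String unbalanced =
            "<!DOCTYPE root [<!ENTITY e '<a>text'>]><root>&e;</a></root>";
        System.out.println("balanced:   " + isWellFormed(balanced));
        System.out.println("unbalanced: " + isWellFormed(unbalanced));
    }
}
```

A parser that enforces the nesting rule should accept the first document and reject the second. A parser with the gap described above would be expected to accept both.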

All of the issues listed above were resolved for the 3.0 release.
