XML encoding: pom.xml, site.xml, ...
Up to and including Maven 2.0.7, XML encoding support is buggy:
- XML streams are read with the platform encoding, which causes problems with non-ASCII characters on ASCII-based platforms, and with every character on non-ASCII platforms (Z/OS with EBCDIC),
- XML streams are transformed to String (with platform encoding), and the resulting String is reworked a lot before being parsed by an XML parser (interpolation...)
- even if XML streams were directly passed to the XML parser, MXParser used by Maven does not support encoding itself...
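The first problem can be reproduced with the JDK alone. The following is a minimal sketch, not Maven code: the class and method names are illustrative, and UTF-8 merely stands in for a platform encoding that differs from the one declared in the file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: not part of Maven or plexus-utils.
class PlatformEncodingDemo {

    // Decodes raw bytes with the given charset, which is what a Reader
    // built on the platform encoding does implicitly.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<name>caf\u00e9</name>";
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);

        // Decoded with the encoding declared in the XML prolog: text is intact.
        System.out.println(decode(bytes, StandardCharsets.ISO_8859_1));

        // Decoded as if the platform encoding were UTF-8: the byte 0xE9 is
        // not a valid UTF-8 sequence, so the accented character is lost.
        System.out.println(decode(bytes, StandardCharsets.UTF_8));
    }
}
```

Because the damage happens while bytes are turned into characters, nothing downstream (interpolation, parsing) can repair it.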
Changing the parser, and then the interpolation code, is a big task.
Detecting the encoding when the Reader is created (the XmlReader approach) is much less invasive: only the Reader instantiation changes. All other code (particularly interpolation) can remain the same.
- [PLXUTILS-11]: add XmlReader to plexus-utils, done in plexus-utils 1.4.3
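To illustrate the idea of an encoding-detecting Reader, here is a simplified sketch using only the JDK. It is not the plexus-utils XmlReader source, and the real class may well handle more cases. The class name `XmlEncodingSniffer`, the 1024-byte limit and the UTF-8 default are assumptions made for this example. It checks for a byte order mark, then looks for the `encoding` pseudo-attribute in the XML declaration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified illustration only; NOT the plexus-utils XmlReader source.
class XmlEncodingSniffer {

    // Assumption: the XML declaration fits in the first 1024 bytes.
    private static final int PROLOG_LIMIT = 1024;

    private static final Pattern ENCODING = Pattern.compile(
            "<\\?xml\\s[^>]*?encoding\\s*=\\s*[\"']([A-Za-z0-9._-]+)[\"']");

    private static boolean startsWith(byte[] head, int length, int... bom) {
        if (length < bom.length) {
            return false;
        }
        for (int i = 0; i < bom.length; i++) {
            if ((head[i] & 0xFF) != bom[i]) {
                return false;
            }
        }
        return true;
    }

    // Guesses the encoding from a BOM, then from the XML declaration,
    // and falls back to UTF-8. UTF-16 without a BOM, UTF-32 and an
    // EBCDIC-encoded declaration are not handled here.
    static String detectEncoding(byte[] head, int length) {
        if (startsWith(head, length, 0xEF, 0xBB, 0xBF)) {
            return "UTF-8";
        }
        if (startsWith(head, length, 0xFE, 0xFF) || startsWith(head, length, 0xFF, 0xFE)) {
            return "UTF-16"; // Java's UTF-16 decoder picks the byte order from the BOM
        }
        // The declaration only uses ASCII characters, so ISO-8859-1 can
        // decode it safely for inspection in any ASCII-compatible encoding.
        String prolog = new String(head, 0, length, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(prolog);
        return m.find() ? m.group(1) : "UTF-8";
    }

    // Peeks at the start of the stream, rewinds, and returns a Reader
    // that uses the detected encoding instead of the platform encoding.
    static Reader newXmlReader(InputStream in) throws IOException {
        BufferedInputStream buffered = new BufferedInputStream(in);
        buffered.mark(PROLOG_LIMIT);
        byte[] head = new byte[PROLOG_LIMIT];
        int length = buffered.readNBytes(head, 0, head.length);
        buffered.reset();
        if (startsWith(head, length, 0xEF, 0xBB, 0xBF)) {
            // Java's UTF-8 decoder keeps the BOM as U+FEFF, so skip its three bytes.
            buffered.read();
            buffered.read();
            buffered.read();
        }
        return new InputStreamReader(buffered, detectEncoding(head, length));
    }

    public static void main(String[] args) throws IOException {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>caf\u00e9</name>";
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        StringBuilder text = new StringBuilder();
        try (Reader reader = newXmlReader(new ByteArrayInputStream(bytes))) {
            for (int c = reader.read(); c != -1; c = reader.read()) {
                text.append((char) c);
            }
        }
        // Decoded with the declared ISO-8859-1, not the platform default.
        System.out.println(text);
    }
}
```

Since callers still receive a plain `Reader`, code that consumes it (such as interpolation) does not need to know how the encoding was chosen.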
Integration Level 1: detect XML encoding for user-written XML files
These files really need good XML encoding support, since users need accents and other characters (Japanese, Greek, Cyrillic, ...)
- [MNG-2254]: use XmlReader to read pom.xml, settings.xml and profiles.xml
- [MSITE-239]: use XmlReader to read site.xml
- [DOXIA-133]: XML encoding detection for xdoc, docbook, fml and xhtml files
Integration Level 2: detect XML encoding for internal XML files
These files shouldn't really need special characters, since they are technical descriptors (plexus.xml and so on). But this change is useful for non-ASCII platforms (Z/OS with EBCDIC), where even simple ASCII characters can't be read with the platform encoding.
- [PLX-343]: use XmlReader in plexus-container-default to load internal XML configuration files
- [MANTTASKS-14]: make Maven Ant Tasks work on Z/OS
- TODO: use the XmlReader class wherever an XML stream has to be changed into a String/Reader
- TODO: integrate it into Modello
XML files should ideally be marked as "text/xml" to let svn and other tools know that XML encoding detection should be used.
Quick tests with viewvc 1.0.3 showed that such a mark did not change anything: a UTF-16 XML file was still considered binary, and no diff was provided.