XML encoding: pom.xml, site.xml, ... 

Up to and including Maven 2.0.7, XML encoding support is buggy:

  • XML streams are read with the platform encoding, which causes problems with non-ASCII characters on ASCII-based platforms, and with every character on non-ASCII platforms (Z/OS with EBCDIC),
  • XML streams are converted to a String (with the platform encoding), and the resulting String is reworked extensively (interpolation...) before being parsed by an XML parser,
  • even if XML streams were passed directly to the XML parser, the MXParser used by Maven does not handle encoding itself...
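
The first problem can be reproduced with the JDK alone. The sketch below is only an illustration, not Maven code: the class name PlatformEncodingDemo and the sample XML are made up, and ISO-8859-1 stands in for whatever the platform encoding happens to be.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class PlatformEncodingDemo {

    // Decodes bytes the way a Reader built on the given charset would.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // \u00e9 is "e with acute accent"; the file declares and uses UTF-8.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>caf\u00e9</name>";
        byte[] onDisk = xml.getBytes(StandardCharsets.UTF_8);

        // Assumption: the platform encoding is ISO-8859-1, so the XML
        // declaration is ignored and the two UTF-8 bytes of the accented
        // character are decoded as two unrelated characters.
        String misread = decode(onDisk, StandardCharsets.ISO_8859_1);

        // Honouring the declared encoding gives the original text back.
        String correct = decode(onDisk, StandardCharsets.UTF_8);

        System.out.println("platform encoding round-trips: " + misread.equals(xml)); // false
        System.out.println("declared encoding round-trips: " + correct.equals(xml)); // true
    }
}
```

Because interpolation then works on the already garbled String, nothing later in the pipeline can repair the damage.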

Changing the parser, and then the interpolation code, would be a big task.

Solution: use the XmlReader class from Rome to detect the encoding of XML streams, as defined in the XML specification

It won't change much in the code: only the Reader instantiation. All other code (particularly interpolation) can remain the same.
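
To give an idea of what such detection involves, here is a deliberately simplified sketch using only the JDK. It is not the Rome or plexus-utils implementation: the class XmlEncodingSniffer and its methods are hypothetical names, and the sketch only looks at a byte order mark and at an XML declaration written in an ASCII-compatible encoding. It does not cover UTF-16 without a BOM, UTF-32, or EBCDIC, which the detection procedure described in the XML specification also addresses.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class XmlEncodingSniffer {

    // Matches encoding="..." inside the XML declaration; the name follows
    // the EncName production of the XML specification.
    private static final Pattern ENCODING = Pattern.compile(
        "<\\?xml[^>]*\\bencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    static Charset detect(byte[] bytes) {
        // 1. A byte order mark wins.
        if (startsWith(bytes, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        if (startsWith(bytes, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        if (startsWith(bytes, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;

        // 2. Otherwise read the start of the stream as ISO-8859-1 (every
        //    byte maps to one char) and look for a declared encoding.
        int length = Math.min(bytes.length, 200);
        String head = new String(bytes, 0, length, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(head);
        if (m.find() && Charset.isSupported(m.group(1))) {
            return Charset.forName(m.group(1));
        }

        // 3. No BOM and no declaration: XML defaults to UTF-8.
        return StandardCharsets.UTF_8;
    }

    // Decodes the stream with the detected encoding and drops a leading BOM.
    static String decode(byte[] bytes) {
        String text = new String(bytes, detect(bytes));
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }

    private static boolean startsWith(byte[] bytes, int... prefix) {
        if (bytes.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((bytes[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] latin1 =
            "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>caf\u00e9</name>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(detect(latin1));
        System.out.println(decode(latin1).endsWith("<name>caf\u00e9</name>"));
    }
}
```

The point of the design is that callers keep working with a plain String or Reader; only the place where bytes become characters has to know about XML.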

Note: the corresponding XmlStreamWriter and WriterFactory were added in plexus-utils 1.4.4, along with XmlStreamReader for consistency

Integration Level 1: detect XML encoding for user-written XML files

These files really need good XML encoding support, since users need accents and other local characters (Japanese, Greek, Cyrillic, ...)

  • [MODELLO-92]: use XmlStreamReader to read Modello .mdo files and update the misc. generators, DONE in modello 1.0-alpha-17
  • [MNG-2254]: use XmlStreamReader to read pom.xml, settings.xml and profiles.xml
  • [MANTTASKS-79]: add XML encoding detection support for pom.xml and settings.xml in Maven Ant Tasks
  • [MRELEASE-87]: Poms are written with wrong encodings
  • [MSITE-239]: use XmlStreamReader to read site.xml, DONE
  • [DOXIA-133]: XML encoding detection for xdoc, docbook, fml and xhtml files, DONE in doxia 1.0-alpha-9-SNAPSHOT and doxia-site 1.0-SNAPSHOT

Integration Level 2: detect XML encoding for internal XML files

These files shouldn't really need special characters, since they are technical descriptors (plexus.xml and so on). But this change is useful for non-ASCII platforms (Z/OS with EBCDIC), where even simple ASCII characters can't be read with the platform encoding.

  • [PLX-343]: use XmlStreamReader in plexus-container-default to load internal XML configuration files, done in 1.0-alpha-30
  • [MANTTASKS-14]: make Maven Ant Tasks work on Z/OS
  • TODO: use XmlStreamReader class wherever an XML stream has to be changed into a String/Reader
  • TODO: check correct encoding when XML data are written to a stream through a Writer, using XmlStreamWriter if necessary
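
The writing side of the last TODO comes down to one rule: the encoding named in the XML declaration must be the encoding the Writer actually uses. A minimal JDK-only sketch of that rule follows; the class XmlWriteDemo and its method are hypothetical and are not the plexus-utils XmlStreamWriter.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class XmlWriteDemo {

    // Writes an XML declaration plus body, deriving both the declared
    // encoding and the Writer's encoding from the same Charset so that
    // they cannot disagree.
    static byte[] writeXml(String body, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write("<?xml version=\"1.0\" encoding=\"" + charset.name() + "\"?>\n");
            writer.write(body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String body = "<name>caf\u00e9</name>";
        byte[] utf8 = writeXml(body, StandardCharsets.UTF_8);
        byte[] latin1 = writeXml(body, StandardCharsets.ISO_8859_1);

        // Each output decodes correctly with the encoding it declares.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).endsWith(body));
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1).endsWith(body));
    }
}
```

A Writer created with the platform encoding while the declaration says something else is the mismatch this avoids.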

Subversion note

XML files should ideally be marked as "text/xml" to let svn and other tools know that XML encoding detection should be used:

 svn propset svn:mime-type text/xml *.xml *.mdo *.fml *.xhtml

Quick tests with viewvc 1.0.3 showed that setting this property did not change anything: a UTF-16 XML file was still considered binary, and no diff was provided.