Up to now (Maven 2.0.7), XML encoding support has been buggy:
Changing the parser, then the interpolation code is a big task.
Solution: use the XmlReader class from Rome to detect the encoding of XML streams, as defined in the XML specification.
It won't change much in the code: only the Reader instantiation. All other code (particularly interpolation) can remain the same.
Note: the corresponding XmlStreamWriter and WriterFactory have been added in plexus-utils 1.4.4, and XmlReader has been renamed to XmlStreamReader for consistency.
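To illustrate what detection "as defined in the XML specification" involves, here is a minimal sketch in plain Java. It is not the Rome or plexus-utils implementation: the class name XmlEncodingSniffer and its detect method are hypothetical, and the sketch only covers a byte order mark, the encoding pseudo-attribute of the XML declaration, and the UTF-8 default. UTF-32, UTF-16 without a BOM and EBCDIC are deliberately left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only (not the XmlReader/XmlStreamReader code).
final class XmlEncodingSniffer {
    // Matches the encoding pseudo-attribute of an XML declaration at the start of the text.
    private static final Pattern ENCODING = Pattern.compile(
        "^<\\?xml\\s[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Guesses the encoding name from the first bytes of an XML document. */
    static String detect(byte[] head) {
        // 1. A byte order mark, when present, decides the encoding.
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";

        // 2. Otherwise decode the prolog as ISO-8859-1 (one char per byte,
        //    never fails) and look for encoding="..." in the XML declaration.
        String prolog = new String(head, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(prolog);
        if (m.find()) return m.group(1);

        // 3. No BOM and no declared encoding: XML defaults to UTF-8.
        return "UTF-8";
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

Given the first bytes of a file, detect returns the declared name when there is one and falls back to UTF-8 otherwise. A Reader would then be created with that charset instead of the platform one, which is why only the Reader instantiation has to change.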
These files really need good XML encoding support, since users need accents and other local characters (Japanese, Greek, Cyrillic, ...).
These files shouldn't really need special characters, since they are technical descriptors (plexus.xml and so on). But this change is useful for non-ASCII platforms (z/OS with EBCDIC), where even simple ASCII characters can't be read with the platform encoding.
When using the new FileReader(File) or new FileWriter(File) API, the platform encoding is used for conversion between bytes and characters.
The Java API documentation is explicit about this fact (if you read it carefully: yes, look at the class description, not the constructor comments), but this is not obvious when using the API: developers tend to forget that they chose an encoding when using this API.
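The effect of that implicit choice can be shown with a small sketch using only the JDK. The class and method names below are hypothetical. decodeLikeFileReader mimics what new FileReader(File) does with the bytes it reads, assuming only that FileReader decodes with the platform default charset as its class description states.

```java
import java.nio.charset.Charset;

// Hypothetical demo class, for illustration only.
final class PlatformEncodingDemo {
    /** Converts bytes to characters with an explicitly chosen charset. */
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    /** Same conversion, but with the charset FileReader silently picks: the platform default. */
    static String decodeLikeFileReader(byte[] bytes) {
        return decode(bytes, Charset.defaultCharset());
    }
}
```

The two bytes 0xC3 0xA9 decode to the single character é under UTF-8 but to the two characters Ã© under ISO-8859-1, so the result of decodeLikeFileReader depends on the machine the code runs on.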
The WriterFactory.newPlatformWriter(File) API simply calls the previous API, but when using it, the encoding choice is explicit.
After you have replaced your FileReader/FileWriter constructor with this API, which is explicit about the encoding choice, you understand that if the file being read or written is XML, platform encoding is the wrong choice: you need XML encoding detection, which is the purpose of
A lot of Maven plugins read and write XML files, and they currently do so with platform encoding (i.e. FileReader/FileWriter): they should be changed to use ReaderFactory.newPlatformReader / WriterFactory.newPlatformWriter.
But there is a problem with Maven versions earlier than 2.0.6: in Maven 2.0.5 and earlier, the plexus-utils version is forced by Maven Core and cannot be overridden by a plugin. MNG-2892 (released in Maven 2.0.6) fixed this limitation. Maven 2.0.6 is therefore a prerequisite for fixing plugins.
What can be done?
Replace new FileReader( File ) with new InputStreamReader( new FileInputStream( File ), "utf-8" ): if XML encoding detection is not supported, at least reading the file with the default XML encoding, UTF-8, is both more powerful and more coherent (not a bug but a missing feature).
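A minimal sketch of reading and writing a file as UTF-8 with JDK classes only could look like this. The class name Utf8FileIo and its methods are hypothetical. The sketch assumes Java 7 or later for StandardCharsets and try-with-resources, whereas code targeting older JDKs would pass the "utf-8" string as in the text above.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class Utf8FileIo {
    /** Reads the whole file, decoding bytes as UTF-8 regardless of the platform encoding. */
    static String read(File file) {
        try (Reader reader = new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8)) {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[4096];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes the text, encoding characters as UTF-8 regardless of the platform encoding. */
    static void write(File file, String text) {
        try (Writer writer = new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this, a file written on one machine is read back identically on another, whatever their platform encodings are. A file that declares another encoding in its XML declaration would still be misread, which is the missing feature mentioned above.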
In plugins reading and writing POM files (install, deploy and release), there is no choice: XML encoding support must be the same as in Maven Core, so the classes will be copied into the plugins. But in assembly, for example, the assembly.xml file is now simply read as UTF-8.
Here are the Jira issues that track where the classes have been copied, so that their removal can be scheduled when the prerequisite is upgraded to Maven 2.0.6+:
XML files should ideally be marked as "text/xml" to let svn and other tools know that XML encoding detection should be used:
svn propset svn:mime-type text/xml *.xml *.mdo *.fml *.xhtml
Quick tests with viewvc 1.0.3 showed that such a mark did not change anything: a UTF-16 XML file was still considered binary, and no diff was provided.