<h1>XML encoding: pom.xml, site.xml, ...</h1> <p>Up to and including Maven 2.0.7, XML encoding support is buggy:</p> <ul> <li>XML streams are read with the platform encoding, which causes problems with non-ASCII characters on ASCII-based platforms, and with every character on non-ASCII platforms (Z/OS with EBCDIC),</li> <li>XML streams are converted to a String (with the platform encoding), and the resulting String is reworked extensively (interpolation...) before being parsed by an XML parser,</li> <li>even if XML streams were passed directly to the XML parser, the MXParser used by Maven does not support encoding itself...</li> </ul> <p><a class="confluence-link" href="/display/MAVEN/POM+Loading+and+Building" data-linked-resource-id="58360" data-linked-resource-type="page" data-linked-resource-default-alias="POM Loading and Building" data-base-url="http://docs.codehaus.org">Changing the parser</a>, and then the interpolation code, is a big task.</p> <p><strong>Solution:</strong> use the <a href="https://rome.dev.java.net/apidocs/0_9/com/sun/syndication/io/XmlReader.html">XmlReader</a> class from <a href="https://rome.dev.java.net/">Rome</a> to detect the encoding of XML streams as defined in the <a href="http://www.w3.org/TR/2004/REC-xml-20040204/#sec-guessing">XML specification</a>.</p> <p>This changes very little in the code: only the Reader instantiation. 
All other code (interpolation in particular) can remain the same.</p> <ul> <li>[<a href="http://jira.codehaus.org/browse/PLXUTILS-11">PLXUTILS-11</a>]: add XmlReader to plexus-utils, DONE in plexus-utils 1.4.5 as <a href="http://plexus.codehaus.org/plexus-utils/apidocs/org/codehaus/plexus/util/xml/XmlStreamReader.html">XmlStreamReader</a> with <a href="http://plexus.codehaus.org/plexus-utils/apidocs/org/codehaus/plexus/util/ReaderFactory.html">ReaderFactory</a></li> </ul> <p>Note: the corresponding <a href="http://plexus.codehaus.org/plexus-utils/apidocs/org/codehaus/plexus/util/xml/XmlStreamWriter.html">XmlStreamWriter</a> and <a href="http://plexus.codehaus.org/plexus-utils/apidocs/org/codehaus/plexus/util/WriterFactory.html">WriterFactory</a> were added in plexus-utils 1.4.4, and XmlReader was renamed to XmlStreamReader for consistency.</p> <h2>Integration Level 1: detect XML encoding for user-written XML files</h2> <p>These files really need good XML encoding support, since users need accented and other local characters (Japanese, Greek, Cyrillic, ...).</p> <ul> <li>[<a href="http://jira.codehaus.org/browse/MODELLO-92">MODELLO-92</a>]: use XmlStreamReader to read Modello .mdo files and update the various 
generators, DONE in modello 1.0-alpha-17</li> <li>[<a href="http://jira.codehaus.org/browse/MNG-2254">MNG-2254</a>]: use XmlStreamReader to read pom.xml, settings.xml and profiles.xml, DONE in <strong>Maven 2.0.8</strong></li> <li>[<a href="http://jira.codehaus.org/browse/MANTTASKS-79">MANTTASKS-79</a>]: add XML encoding detection support for pom.xml and settings.xml in Maven Ant Tasks, DONE in 2.0.8</li> <li>[<a href="http://jira.codehaus.org/browse/MINSTALL-44">MINSTALL-44</a>]: add XML encoding support when reading/writing POM files in install plugin, DONE in 2.3</li> <li>[<a href="http://jira.codehaus.org/browse/MDEPLOY-66">MDEPLOY-66</a>]: add XML encoding support when reading/writing POM files in deploy plugin, DONE in 2.4</li> <li>[<a href="http://jira.codehaus.org/browse/MRELEASE-87">MRELEASE-87</a>]: Poms are written with wrong encodings, DONE in 2.0-beta-8</li> <li>[<a href="http://jira.codehaus.org/browse/MSITE-239">MSITE-239</a>]: use XmlStreamReader to read site.xml, DONE in <strong>maven-site-plugin 2.0-beta-6</strong></li> <li>[<a href="http://jira.codehaus.org/browse/DOXIA-133">DOXIA-133</a>]: XML encoding detection for xdoc, docbook, fml and xhtml files, DONE in doxia 1.0-alpha-9 and doxia-site 1.0-alpha-9</li> <li>[<a href="http://jira.codehaus.org/browse/MECLIPSE-56">MECLIPSE-56</a>]: problem with non-ascii characters in generated .project-file, DONE in eclipse-plugin 2.5</li> <li>[<a href="http://jira.codehaus.org/browse/MREPOSITORY-10">MREPOSITORY-10</a>]: done in 2.1</li> </ul> <h2>Integration Level 2: detect XML encoding for internal XML files</h2> <p>These files shouldn't really need special characters, since they are technical descriptors (plexus.xml and so on). 
But this change is useful for non-ASCII platforms (Z/OS with EBCDIC), where even simple ASCII characters cannot be read with the platform encoding.</p> <ul> <li>[<a href="http://jira.codehaus.org/browse/PLX-343">PLX-343</a>]: use XmlStreamReader in plexus-container-default to load internal XML configuration files, DONE in 1.0-alpha-30, integrated in Maven 2.1-SNAPSHOT 31/7/2007</li> <li>[<a href="http://jira.codehaus.org/browse/MANTTASKS-14">MANTTASKS-14</a>]: make Maven Ant Tasks work on Z/OS</li> <li>TODO: use the XmlStreamReader class wherever an XML stream has to be converted into a String/Reader</li> <li>TODO: check that the correct encoding is used when XML data is written to a stream through a Writer, using XmlStreamWriter if necessary</li> </ul> <h2>Technical Notes </h2> <h3><code>new FileReader/Writer(File)</code> vs <code>Reader/WriterFactory.newPlatformReader/Writer(File)</code></h3> <p>When using the <code><a href="http://java.sun.com/j2se/1.4.2/docs/api/java/io/FileReader.html">new FileReader(File)</a></code> or <code><a href="http://java.sun.com/j2se/1.4.2/docs/api/java/io/FileWriter.html">new FileWriter(File)</a></code> API, the platform encoding is used for conversion between bytes and characters.</p> <p>The Java API documentation is explicit about this (if you read it carefully: look at the class description, not the constructor comments), but it is not obvious when using the API: developers tend to forget that they are choosing an encoding when they call it.</p> <p>The <code><a href="http://plexus.codehaus.org/plexus-utils/apidocs/org/codehaus/plexus/util/ReaderFactory.html#newPlatformReader(java.io.File)">ReaderFactory.newPlatformReader(File)</a></code> and <code><a href="http://plexus.codehaus.org/plexus-utils/apidocs/org/codehaus/plexus/util/WriterFactory.html#newPlatformWriter(java.io.File)">WriterFactory.newPlatformWriter(File)</a></code> APIs simply call the previous ones, but when using them, the encoding choice is explicit.</p> <p>After you have replaced your 
<code>FileReader/Writer</code> constructor calls with this API, which is explicit about the encoding choice, it becomes clear that if the file being read or written is XML, the platform encoding is the wrong choice: you need XML encoding detection, which is the purpose of <code><a href="http://plexus.codehaus.org/plexus-utils/apidocs/org/codehaus/plexus/util/ReaderFactory.html#newXmlReader(java.io.File)">ReaderFactory.newXmlReader(File)</a></code> and <code><a href="http://plexus.codehaus.org/plexus-utils/apidocs/org/codehaus/plexus/util/WriterFactory.html#newXmlWriter(java.io.File)">WriterFactory.newXmlWriter(File)</a></code>.</p> <h3>Integrating XML encoding detection in Maven plugins</h3> <p>A lot of Maven plugins read and write XML files, and they currently do it with the platform encoding (i.e. <code>FileReader/Writer</code>): the change to <code>Reader/WriterFactory.newPlatformReader/Writer</code> should be done.</p> <p><strong>But there is a problem</strong> with Maven versions earlier than 2.0.6: in Maven 2.0.5 and earlier, the plexus-utils version is forced by Maven Core and cannot be overridden by a plugin. <a href="http://jira.codehaus.org/browse/MNG-2892">MNG-2892</a> (released in Maven 2.0.6) fixed this limitation. So Maven 2.0.6 is a prerequisite for fixing plugins...</p> <p>What can be done?</p> <ol> <li>In maven-site-plugin, the XML encoding classes from plexus-utils were copied into the plugin's sources (<a href="http://jira.codehaus.org/browse/MSITE-242">MSITE-242</a> to remove them): this plugin reads a lot of XML files and has a strong need for encoding support, so this otherwise bad solution was really the best one. 
But making such a copy in every plugin would not be good.</li> <li>A light solution is to replace <code>new FileReader( File )</code> with <code>new InputStreamReader( new FileInputStream( File ), "utf-8" )</code>: if XML encoding detection is not supported, at least reading the file with the default XML encoding, UTF-8, is both more powerful and more coherent (not a bug but a missing feature).</li> <li>Another solution would be to have the XML encoding classes in a library other than plexus-utils...</li> </ol> <p>In plugins reading and writing POM files (install, deploy and release), there is no choice: XML encoding support must be the same as in Maven Core, so the classes will be copied into the plugins. But in assembly, for example, the assembly.xml file is now simply read as UTF-8.</p> <p>Here are the Jira issues that track where the classes have been copied, to schedule their removal when the prerequisite is upgraded to Maven 2.0.6+:</p> <ul> <li><a href="http://jira.codehaus.org/browse/MSITE-242">MSITE-242</a> for site plugin: done in maven-site-plugin 2.0</li> <li><a href="http://jira.codehaus.org/browse/MINSTALL-46">MINSTALL-46</a> for install plugin: done in maven-install-plugin 2.3</li> <li><a href="http://jira.codehaus.org/browse/MDEPLOY-70">MDEPLOY-70</a> for deploy plugin: done in maven-deploy-plugin 2.5</li> <li><a href="http://jira.codehaus.org/browse/MRELEASE-316">MRELEASE-316</a> for release plugin: done in maven-release-plugin 2.0-beta-8</li> <li><a href="http://jira.codehaus.org/browse/MODELLO-110">MODELLO-110</a> for Modello: done in modello 1.0-alpha-19</li> </ul> <h3>Subversion properties</h3> <p>XML files should ideally be marked as "text/xml" to let svn and other tools know that XML encoding detection should be used:</p> <table class="wysiwyg-macro" data-macro-name="noformat" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e25vZm9ybWF0fQ&locale=en_GB&version=2); background-repeat: no-repeat;" 
data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre> svn propset svn:mime-type text/xml *.xml *.mdo *.fml *.xhtml </pre></td></tr></table> <p>Quick tests with viewvc 1.0.3 showed that setting this property did not change anything: a UTF-16 XML file was still treated as binary, and no diff was provided.</p>
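<p>Coming back to the "light solution" from the plugin integration notes above (an explicit UTF-8 <code>InputStreamReader</code> instead of <code>new FileReader( File )</code>), here is a minimal sketch using only the JDK. The class name <code>Utf8ReadSketch</code> and its two helper methods are hypothetical names chosen for this illustration; they are not part of plexus-utils or Maven, and no XML encoding detection is performed.</p>

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

// Hypothetical helper class, for illustration only.
class Utf8ReadSketch {

    // Reads the whole file as UTF-8, independently of the platform encoding.
    // This does NOT detect the encoding from a BOM or the XML declaration.
    public static String readAsUtf8(File file) throws IOException {
        try (Reader reader = new InputStreamReader(new FileInputStream(file), "UTF-8")) {
            StringBuilder content = new StringBuilder();
            char[] buffer = new char[1024];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                content.append(buffer, 0, read);
            }
            return content.toString();
        }
    }

    // Writes the content as UTF-8, independently of the platform encoding.
    public static void writeAsUtf8(File file, String content) throws IOException {
        try (Writer writer = new OutputStreamWriter(new FileOutputStream(file), "UTF-8")) {
            writer.write(content);
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("assembly", ".xml");
        file.deleteOnExit();
        // \u00e9 is a non-ASCII character (e with acute accent).
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>caf\u00e9</name>";
        writeAsUtf8(file, xml);
        // The round trip preserves the accented character on any platform.
        System.out.println(xml.equals(readAsUtf8(file)));
    }
}
```

<p>With <code>new FileReader( File )</code>, the same round trip would depend on the platform encoding of the machine running the build. When plexus-utils 1.4.5 or later is available, <code>ReaderFactory.newXmlReader(File)</code> remains the preferable choice, since it also handles files declared in other encodings.</p>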