<p>Currently, the character encoding for source files needs to be configured individually for each and every plugin that processes source files. In this context, <em>source file</em> refers to a plain text file that, unlike an XML file, lacks an intrinsic means of specifying its own encoding. Java source files are the most prominent example of such text files. Velocity templates, BeanShell scripts and APT documents are further examples. This proposal does not apply to XML files, as their encoding can be determined from the file itself; see <a class="confluence-link" href="/display/MAVENUSER/XML+encoding" data-linked-resource-id="3866696" data-linked-resource-type="page" data-linked-resource-default-alias="XML encoding" data-base-url="http://docs.codehaus.org">XML encoding</a> for further information.</p> <p>Life would become easier if there were a dedicated POM element like <code><strong>${project.build.sourceEncoding}</strong></code> which could be used to specify the encoding once for the entire project. 
Every plugin could use it as its default value:</p> <table class="wysiwyg-macro" data-macro-name="code" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e2NvZGV9&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre>/**
 * @parameter expression="${encoding}" default-value="${project.build.sourceEncoding}"
 */
private String encoding;
</pre></td></tr></table> <p>Adding this element to the POM structure can only happen in Maven 3.x (tracked as issue <a href="http://jira.codehaus.org/browse/MNG-2216">MNG-2216</a>):</p> <table class="wysiwyg-macro" data-macro-name="code" data-macro-default-parameter="xml" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e2NvZGU6eG1sfQ&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre>&lt;project&gt;
  ...
  &lt;build&gt;
    &lt;!-- NOTE: This is just a vision for the future, it's not yet implemented: see MNG-2216 --&gt;
    &lt;sourceEncoding&gt;UTF-8&lt;/sourceEncoding&gt;
    ...
  &lt;/build&gt;
  ...
&lt;/project&gt;
</pre></td></tr></table> <p>For Maven 2.x, the value can be defined as an equivalent property:</p> <table class="wysiwyg-macro" data-macro-name="code" data-macro-default-parameter="xml" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e2NvZGU6eG1sfQ&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre>&lt;project&gt;
  ...
  &lt;properties&gt;
    &lt;project.build.sourceEncoding&gt;UTF-8&lt;/project.build.sourceEncoding&gt;
    ...
  &lt;/properties&gt;
  ...
&lt;/project&gt;
</pre></td></tr></table> <p>Thus plugins could immediately be modified to use the <code>${project.build.sourceEncoding}</code> expression, whatever Maven version is used.</p> <h2>Motivation</h2> <p>Why bother with file encoding at all? 
A file encoding (also known as a charset) is required to resolve the following discrepancy: a file stored on disk or transmitted over a network is merely a stream of bytes (octets). In contrast, text is a stream of characters. However, a character is <em>not</em> a byte.</p> <p>To illustrate this, consider the Unicode standard chosen for a Java <a href="http://java.sun.com/javase/6/docs/api/java/lang/String.html">String</a>. Unicode defines more than 65,000 characters, which obviously cannot each be mapped to a single byte. Hence, one needs a reversible transformation that defines how to map a character to bytes and vice versa. This transformation is called a file or character encoding.</p> <p>There are many different encodings, each potentially yielding different bytes for the same character. For example, the common encoding <a href="http://en.wikipedia.org/wiki/ASCII">ASCII</a> maps the character 'A' to the byte with the hex code 0x41. The same character is mapped to the byte 0xC1 when using the encoding <a href="http://en.wikipedia.org/wiki/EBCDIC">EBCDIC</a>. Another example is the character 'ü' (small letter u with umlaut), which maps to the single byte 0xFC when using <a href="http://en.wikipedia.org/wiki/ISO/IEC_8859-1">ISO-8859-1</a> but to the two-byte sequence 0xC3 0xBC when using <a href="http://en.wikipedia.org/wiki/UTF-8">UTF-8</a>.</p> <p>It should be clear by now that encoding a character with one encoding and later decoding it with a different encoding can corrupt the character. 
To avoid such errors, it is crucial that all developers of a project agree to use the same encoding when editing the project sources and running the build.</p> <h2>Default Value</h2> <p>Following a <a href="http://www.nabble.com/-POLL--Default-Value-for-File-Encoding-td16958386.html">user poll on the mailing list</a> and the numerous comments on this article, this proposal has been revised: plugins should use the platform default encoding if no explicit file encoding has been provided in the plugin configuration.</p> <p>Since using the platform encoding yields platform-dependent and hence potentially irreproducible builds, plugins should output a warning to inform the user about this risk, e.g.:</p> <table class="wysiwyg-macro" data-macro-name="noformat" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e25vZm9ybWF0fQ&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre>[WARNING] Using platform encoding (Cp1252 actually) to copy filtered resources, i.e. build is platform dependent! 
</pre></td></tr></table> <p>This way, users can smoothly update their POMs to follow best practices.</p> <h2>Code Spots to Review for Proper Encoding Handling</h2> <p>The following classes and methods indicate usage of the JVM's default encoding and hence should be reviewed:</p> <ul class="alternate"> <li><code>String(byte[])</code></li> <li><code>String.getBytes()</code></li> <li><code>FileReader</code></li> <li><code>FileWriter</code></li> <li><code>PrintWriter(File)</code> <em>(new in JDK 5)</em></li> <li><code>PrintWriter(OutputStream)</code></li> <li><code>InputStreamReader(InputStream)</code></li> <li><code>OutputStreamWriter(OutputStream)</code></li> <li><code>ReaderFactory.newPlatformReader()</code></li> <li><code>WriterFactory.newPlatformWriter()</code></li> <li><code>FileUtils.fileRead(String)</code></li> <li><code>FileUtils.fileRead(File)</code></li> <li><code>FileUtils.fileWrite(String, String)</code></li> <li><code>FileUtils.fileAppend(String, String)</code></li> <li><code>IOUtils.toString(InputStream)</code></li> <li><code>IOUtils.toString(InputStream, int)</code></li> </ul> <h2>Plugins to Modify</h2> <p>Build plugins are highlighted in bold, since the change has a more critical impact on the built artifact for them than for reporting plugins. 
</p> <p><strong>Affected Apache plugins:</strong></p> <ul class="alternate"> <li>maven-changes-plugin (velocity template for announcement): <a href="http://jira.codehaus.org/browse/MCHANGES-71">MCHANGES-71</a>, done in 2.1</li> <li>maven-checkstyle-plugin (source analysis): <a href="http://jira.codehaus.org/browse/MCHECKSTYLE-95">MCHECKSTYLE-95</a>, done in 2.2</li> <li><strong>maven-compiler-plugin</strong> (source processing): <a href="http://jira.codehaus.org/browse/MCOMPILER-70">MCOMPILER-70</a>, done in 2.1</li> <li><strong>maven-invoker-plugin</strong> (beanshell script evaluation): <a href="http://jira.codehaus.org/browse/MINVOKER-30">MINVOKER-30</a>, done in 1.2</li> <li>maven-javadoc-plugin (source processing): <a href="http://jira.codehaus.org/browse/MJAVADOC-182">MJAVADOC-182</a>, done in 2.5</li> <li>maven-jxr-plugin (source processing): <a href="http://jira.codehaus.org/browse/JXR-60">JXR-60</a>, done in 2.2</li> <li><strong>maven-plugin-plugin</strong> (javadoc extraction, java source generation): <a href="http://jira.codehaus.org/browse/MPLUGIN-101">MPLUGIN-101</a>, <a href="http://jira.codehaus.org/browse/MPLUGIN-100">MPLUGIN-100</a>, done in 2.5</li> <li>maven-pmd-plugin (source analysis): <a href="http://jira.codehaus.org/browse/MPMD-76">MPMD-76</a>, done in 2.4</li> <li><strong>maven-resources-plugin</strong> (contents filtering): <a href="http://jira.codehaus.org/browse/MRESOURCES-57">MRESOURCES-57</a>, done in 2.3</li> <li>maven-site-plugin (apt sources): <a href="http://jira.codehaus.org/browse/MSITE-314">MSITE-314</a>, done in 2.0-beta-7</li> </ul> <p><strong>Affected Codehaus plugins:</strong></p> <ul class="alternate"> <li>findbugs-maven-plugin: (no Jira issue), done in 2.2</li> <li>jalopy-maven-plugin: <a href="http://jira.codehaus.org/browse/MOJO-1138">MOJO-1138</a>, done in 1.0-alpha-2-SNAPSHOT</li> <li>javancss-maven-plugin: <a href="http://jira.codehaus.org/browse/MJNCSS-31">MJNCSS-31</a></li> 
<li><strong>modello-maven-plugin</strong>/modello-core (java source generation): <a href="http://jira.codehaus.org/browse/MODELLO-109">MODELLO-109</a>, done in 1.0-alpha-19</li> <li>native2ascii-maven-plugin</li> <li><strong>plexus-component-metadata</strong> (formerly <strong>plexus-maven-plugin</strong>) (javadoc extraction): <a href="http://jira.codehaus.org/browse/PLX-371">PLX-371</a>, done in 1.0-beta-3.0.4</li> <li><strong>shitty-maven-plugin</strong> (groovy script evaluation)</li> <li>simian-maven-plugin</li> <li>taglist-maven-plugin (javadoc extraction): <a href="http://jira.codehaus.org/browse/MTAGLIST-27">MTAGLIST-27</a>, done in 2.3</li> </ul> <h2>References</h2> <p>Please see [0] for the related thread from the mailing list, [1] for some further descriptions and [2] for a similar feature request in JIRA. Also note a related proposal for the output encoding of reports [3].</p> <p>[0] <a href="http://www.nabble.com/POM-Element-for-Source-File-Encoding-to14930345s177.html">http://www.nabble.com/POM-Element-for-Source-File-Encoding-to14930345s177.html</a></p> <p>[1] <a href="http://www.nabble.com/Re%3A-Maven-and-File-Encoding-p16301958s177.html">http://www.nabble.com/Re%3A-Maven-and-File-Encoding-p16301958s177.html</a></p> <p>[2] <a href="http://jira.codehaus.org/browse/MNG-2216">MNG-2216</a></p> <p>[3] <a href="http://docs.codehaus.org/display/MAVEN/Reporting+Encoding+Configuration">Reporting Encoding Configuration</a></p>
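<p>The byte values quoted in the Motivation section, and the fallback behaviour described under Default Value, can be tried out with a small standalone Java program. This is only a minimal sketch: the class <code>EncodingSketch</code> and its helpers <code>hex</code> and <code>resolveEncoding</code> are hypothetical names introduced for illustration and are not part of Maven or any plugin, and the warning text is merely modelled on the example message above.</p>

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSketch {

    /** Formats bytes as space-separated upper-case hex, e.g. "C3 BC". */
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /**
     * Hypothetical helper (not a Maven API): returns the configured encoding,
     * or falls back to the platform default with a warning if none is given.
     */
    public static Charset resolveEncoding(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            Charset platform = Charset.defaultCharset();
            System.out.println("[WARNING] Using platform encoding (" + platform.name()
                    + " actually), i.e. build is platform dependent!");
            return platform;
        }
        return Charset.forName(configured.trim());
    }

    public static void main(String[] args) {
        // 'ü' written as a Unicode escape so this file's own encoding does not matter
        String u = "\u00FC";

        System.out.println(hex("A".getBytes(StandardCharsets.US_ASCII))); // 41
        System.out.println(hex(u.getBytes(StandardCharsets.ISO_8859_1))); // FC
        System.out.println(hex(u.getBytes(StandardCharsets.UTF_8)));      // C3 BC

        // Encode with UTF-8, decode with ISO-8859-1: one character becomes two
        String corrupted = new String(u.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        System.out.println(corrupted.length()); // 2

        // Always pass an explicit charset instead of relying on String.getBytes()
        Charset cs = resolveEncoding("UTF-8");
        System.out.println(hex(u.getBytes(cs))); // C3 BC

        // No configured value: warns and uses the platform default
        resolveEncoding(null);
    }
}
```

<p>The point of the sketch is that every conversion between bytes and characters names its charset explicitly, which is what the review list under "Code Spots to Review" is meant to enforce.</p>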