Currently, the character encoding for source files has to be configured individually for every plugin that processes source files. In this context, a source file is a plain text file that, unlike an XML file, lacks an intrinsic means of specifying its own encoding. Java source files are the most prominent example of such text files. Velocity templates, BeanShell scripts and APT documents are further examples.

Life would become easier if there were a dedicated POM element like ${} which could be used to specify the encoding once for the entire project. Every plugin could use it as its default value:

Adding this element to the POM structure can only happen in Maven 2.1:

For Maven 2.0.x, the value can be defined as an equivalent property:

Plugins could thus be modified immediately to use the ${} expression, regardless of the Maven version in use.

Default Value

Without a default value for the source encoding, the platform encoding is used, so the same sources can be processed differently on different machines, which is bad for build reproducibility. Setting one default value consistently across all Maven plugins will improve build reproducibility.
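
As a minimal illustration of the reproducibility problem, the following Java sketch decodes the same byte with two different charsets. The class name EncodingMismatchDemo and the sample byte are made up for this example and do not come from any Maven plugin.

```java
import java.nio.charset.Charset;

// Hypothetical demo class, for illustration only.
class EncodingMismatchDemo {

    // The byte 0xE9 is the letter 'é' in ISO-8859-1, but on its own it is
    // an incomplete multi-byte sequence in UTF-8.
    static final byte[] SAMPLE = { (byte) 0xE9 };

    // Decodes raw bytes with an explicitly chosen charset, the way a plugin
    // would if it did not fall back to the platform encoding.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    // Decodes raw bytes with whatever encoding the current JVM defaults to.
    // The result depends on the machine running the build.
    static String decodeWithPlatformEncoding(byte[] raw) {
        return new String(raw, Charset.defaultCharset());
    }
}
```

Calling decode with StandardCharsets.ISO_8859_1 and with StandardCharsets.UTF_8 on SAMPLE yields two different strings, which is the kind of divergence a build sees when two machines use different platform encodings.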

Proposed default value: ISO-8859-1, which must be supported by every JVM (see java.nio.charset.Charset) and is already the default value for some plugins (the majority of plugins currently use the platform encoding as their default value instead).

Note: Using a fixed default value for the encoding instead of the platform encoding can potentially break builds that rely on a platform encoding other than the proposed Latin-1 but did not lock this down in the POM. It is assumed that those builds:

  1. are negligible in number
  2. are easy to fix by setting the new property

As such, the general benefit of out-of-the-box reproducibility outweighs the cost of breaking those builds.

A check has to be coded in every plugin with the default value:
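
A minimal sketch of such a check might look like the following. The class and method names (SourceEncodingDefaults, resolve, resolveCharset) are hypothetical; an actual plugin would perform the equivalent test on its own encoding parameter, and may also want to log a warning when it falls back.

```java
import java.nio.charset.Charset;

// Hypothetical helper, for illustration only; not an existing Maven class.
class SourceEncodingDefaults {

    // The default value proposed in this document.
    static final String DEFAULT_ENCODING = "ISO-8859-1";

    // Returns the configured encoding, or the proposed default when the
    // parameter was left unset (null) or blank.
    static String resolve(String configuredEncoding) {
        if (configuredEncoding == null || configuredEncoding.trim().isEmpty()) {
            return DEFAULT_ENCODING;
        }
        return configuredEncoding;
    }

    // Same as resolve(), but returns a Charset object. Charset.forName throws
    // an unchecked exception if the name is illegal or unsupported.
    static Charset resolveCharset(String configuredEncoding) {
        return Charset.forName(resolve(configuredEncoding));
    }
}
```

With this in place, resolve(null) falls back to ISO-8859-1 while resolve("UTF-8") passes the user's choice through unchanged.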

The default value can also be set in the POM model for Maven 2.1.x (as the default value of the encoding attribute) and in the super POM for Maven 2.0.x. This change is only for clarity, however: without it, the check coded in every plugin still turns a null value into the chosen default value.

Code Spots to Review for Proper Encoding Handling

The following classes and/or methods indicate usage of the JVM's default encoding and hence should be reviewed:

  • FileReader
  • FileWriter
  • InputStreamReader(InputStream)
  • OutputStreamWriter(OutputStream)
  • ReaderFactory.newPlatformReader()
  • WriterFactory.newPlatformWriter()
  • FileUtils.fileRead(String)
  • FileUtils.fileRead(File)
  • FileUtils.fileWrite(String, String)
  • FileUtils.fileAppend(String, String)
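
For the plain JDK classes in this list, the usual fix is to switch to the constructor overloads that take an encoding name. The sketch below shows one way to do this; the class name ExplicitEncodingIo is hypothetical, and the try-with-resources syntax assumes Java 7 or later, so plugins targeting older JVMs would need explicit finally blocks instead.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

// Hypothetical helper, for illustration only.
class ExplicitEncodingIo {

    // Replacement for new FileWriter(file): the encoding is named explicitly.
    static void write(File file, String text, String encoding) throws IOException {
        try (Writer writer = new OutputStreamWriter(new FileOutputStream(file), encoding)) {
            writer.write(text);
        }
    }

    // Replacement for new FileReader(file): the encoding is named explicitly.
    static String read(File file, String encoding) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(new FileInputStream(file), encoding)) {
            char[] buffer = new char[4096];
            int count;
            // Reader.read returns -1 once the end of the stream is reached.
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        }
        return text.toString();
    }
}
```

Both methods throw an IOException subclass (UnsupportedEncodingException) if the named encoding is not available, so a misconfigured value fails loudly instead of silently falling back to the platform encoding.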

Plugins to Modify

Affected Apache plugins:

  • maven-changes-plugin (velocity template processing)
  • maven-compiler-plugin (source processing): MCOMPILER-70, done in 2.1-SNAPSHOT
  • maven-invoker-plugin (beanshell script evaluation): MINVOKER-30, done in 1.2-SNAPSHOT
  • maven-javadoc-plugin (source processing): MJAVADOC-182, done in 2.5-SNAPSHOT
  • maven-jxr-plugin (source processing): JXR-60, done in 2.2-SNAPSHOT
  • maven-plugin-plugin (javadoc extraction, java source generation): MPLUGIN-101, MPLUGIN-100
  • maven-pmd-plugin (source analysis): MPMD-76
  • maven-resources-plugin (contents filtering): MRESOURCES-57, done in 2.3-SNAPSHOT
  • maven-site-plugin (apt sources): MSITE-314, done in 2.0-beta-7-SNAPSHOT

Affected Codehaus plugins:

  • modello-maven-plugin/modello-core (java source generation)
  • plexus-maven-plugin (javadoc extraction)
  • shitty-maven-plugin (groovy script evaluation)
  • taglist-maven-plugin (javadoc extraction)


Please see [0] for the related thread from the mailing list, and [1] for some further descriptions.


