Currently, the character encoding for source files needs to be configured individually for every plugin that processes source files. In this context, a source file is a plain text file that, unlike an XML file, lacks any intrinsic means to specify its own file encoding. Java source files are the most prominent example of such text files. Velocity templates, BeanShell scripts and APT documents are further examples.

Life would become easier if there were a dedicated POM element, accessible via the expression ${project.build.sourceEncoding}, which could be used to specify the encoding once for the entire project. Every plugin could then use it as the default value of its own encoding parameter.
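As a rough sketch of what such a plugin parameter might look like, the class below uses the javadoc-tag style of Maven 2 mojo parameters. The class name, the field name encoding and the user property ${encoding} are assumptions chosen for illustration; a real mojo would extend org.apache.maven.plugin.AbstractMojo and have the field injected by Maven, which is left out here so that the example compiles without Maven on the classpath.

```java
/**
 * Hypothetical mojo-like class. A real plugin would extend
 * org.apache.maven.plugin.AbstractMojo; that is omitted here so the
 * sketch compiles without any Maven dependency.
 */
class ExampleSourceProcessingMojo {

    /**
     * The character encoding of the source files. The user property
     * ${encoding} wins if set; otherwise the project-wide value is used.
     *
     * @parameter expression="${encoding}" default-value="${project.build.sourceEncoding}"
     */
    private String encoding;

    /** Stand-in for Maven's parameter injection (Maven sets the field itself). */
    void setEncoding(String encoding) {
        this.encoding = encoding;
    }

    /** Returns the injected encoding, or null if nothing was configured. */
    String getEncoding() {
        return encoding;
    }
}
```

With a declaration like this, a project that sets the value once would not need to repeat the encoding in the configuration of each plugin.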

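Adding this element to the POM structure (a sourceEncoding element inside the build element, as the expression implies) can only happen in Maven 2.1, since it requires a change to the POM model.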

For Maven 2.0.x, the value can be defined as an equivalent property, that is, a property named project.build.sourceEncoding in the properties section of the POM.

Thus plugins could immediately be modified to use the ${project.build.sourceEncoding} expression, whichever Maven version is used.

Default Value

Without a default value for the source encoding, each machine's detected platform encoding is used, which is not ideal for build reproducibility: the same source bytes can be decoded differently on different machines. Setting a static default value consistently across every Maven plugin would therefore improve build reproducibility.

Note: Who is affected? Teams with a non-uniform configuration: either spread over multiple countries with different character sets, or working on different operating systems (Unix-like systems tend to use UTF-8 as the platform encoding, whereas Windows typically uses a regional code page such as windows-1252, which is close to but not identical to ISO-8859-1). For teams with a uniform configuration, there is no immediate problem with using each machine's detected platform encoding; it "simply works".
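The effect on a mixed team can be illustrated with a small sketch: the same bytes yield different text depending on the charset used to decode them. The class and method names are hypothetical, and Unicode escapes are used so that the example itself does not depend on the encoding of this file.

```java
import java.nio.charset.Charset;

/** Hypothetical helper that shows how a charset mismatch corrupts text. */
final class PlatformEncodingMismatch {

    /** Encodes text into bytes using the named charset. */
    static byte[] encode(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    /** Decodes bytes into text using the named charset. */
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    /** Simulates a file written on one machine and read on another. */
    static String writeThenRead(String text, String writerCharset, String readerCharset) {
        return decode(encode(text, writerCharset), readerCharset);
    }
}
```

For example, writeThenRead("\u00E9", "UTF-8", "ISO-8859-1") turns the single character U+00E9 into the two characters U+00C3 U+00A9, whereas using the same charset on both sides returns the text unchanged.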

Proposed default value: ISO-8859-1, which must be supported by every JVM (see java.nio.charset.Charset) and is already the default value for some plugins (the majority of plugins use the platform encoding as their default value instead).

Note: Using a fixed default value for the encoding instead of the locally detected platform encoding will break builds that rely on a platform encoding other than the proposed Latin-1 (ISO-8859-1) but did not lock this down in the POM. The impact is limited for reporting plugins (it will "only" lead to some garbage in reports), but produced artifacts will really be broken when build plugins are involved (compiler, resources, modello, plugin, invoker, shitty).
It is assumed that those builds:

  1. are not the vast majority
  2. are easy to fix by setting the new property (the platform encoding has been added to the "mvn -v" output to help choose the value, see MNG-3509)

As such, the general benefit of out-of-the-box reproducibility outweighs the cost of breaking those builds.

Every plugin has to include a check that falls back to the default value when no encoding has been configured.
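A minimal sketch of such a check, assuming the proposed default of ISO-8859-1, is shown below. The class, constant and method names are hypothetical; an actual plugin would more likely perform the check inline in its execute method.

```java
/** Hypothetical helper holding the per-plugin fallback logic. */
final class EncodingDefaults {

    /** The default proposed on this page; an assumption of this sketch. */
    static final String DEFAULT_SOURCE_ENCODING = "ISO-8859-1";

    /**
     * Returns the configured encoding, or the fixed default when the
     * parameter was left unset (null) or blank.
     */
    static String effectiveEncoding(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            return DEFAULT_SOURCE_ENCODING;
        }
        return configured;
    }
}
```

A plugin would call effectiveEncoding(encoding) once and pass the result to every reader and writer it creates, so that it never silently falls back to the platform encoding.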

This default value can also be coded in the POM model for 2.1.x (as the default value of the encoding attribute) and in the super POM for Maven 2.0.x. That change would only be for clarity, though: without it, the check coded in every plugin still transforms a null value into the chosen default value.

Code Spots to Review for Proper Encoding Handling

The following classes and/or methods indicate usage of the JVM's default encoding and hence should be reviewed:

  • FileReader
  • FileWriter
  • InputStreamReader(InputStream)
  • OutputStreamWriter(OutputStream)
  • ReaderFactory.newPlatformReader()
  • WriterFactory.newPlatformWriter()
  • FileUtils.fileRead(String)
  • FileUtils.fileRead(File)
  • FileUtils.fileWrite(String, String)
  • FileUtils.fileAppend(String, String)
  • IOUtils.toString(InputStream)
  • IOUtils.toString(InputStream, int)

Plugins to Modify

Affected Apache plugins:

  • maven-changes-plugin (velocity template processing): MCHANGES-71
  • maven-compiler-plugin (source processing): MCOMPILER-70, done in 2.1-SNAPSHOT
  • maven-invoker-plugin (beanshell script evaluation): MINVOKER-30, done in 1.2-SNAPSHOT
  • maven-javadoc-plugin (source processing): MJAVADOC-182, done in 2.5-SNAPSHOT
  • maven-jxr-plugin (source processing): JXR-60, done in 2.2-SNAPSHOT
  • maven-plugin-plugin (javadoc extraction, java source generation): MPLUGIN-101, MPLUGIN-100
  • maven-pmd-plugin (source analysis): MPMD-76, done in 2.4-SNAPSHOT
  • maven-resources-plugin (contents filtering): MRESOURCES-57, done in 2.3-SNAPSHOT
  • maven-site-plugin (apt sources): MSITE-314, done in 2.0-beta-7-SNAPSHOT

Affected Codehaus plugins:

  • modello-maven-plugin/modello-core (java source generation)
  • plexus-maven-plugin (javadoc extraction)
  • shitty-maven-plugin (groovy script evaluation)
  • taglist-maven-plugin (javadoc extraction)

References

Please see [0] for the related thread from the mailing list, and [1] for some further descriptions.

[0] http://www.nabble.com/POM-Element-for-Source-File-Encoding-to14930345s177.html

[1] http://www.nabble.com/Re%3A-Maven-and-File-Encoding-p16301958s177.html