Unfortunately, many browsers do not send the charset information, leaving the web server to guess the correct encoding. For this reason, the Servlet API provides the ServletRequest.setCharacterEncoding(String) method, which lets the webapp developer control how the form content is decoded.
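The fallback a server or webapp has to apply can be sketched in plain Java. This is only an illustration: the class FormCharset and the method charsetOf are hypothetical names, not part of the Servlet API or of jetty. The method returns the charset named in a Content-Type header value, or a caller-supplied default when the browser sent none.

```java
import java.util.Locale;

class FormCharset {
    /**
     * Returns the charset parameter of a Content-Type header value,
     * or the given fallback when the header carries no charset.
     * Hypothetical helper for illustration only.
     */
    static String charsetOf(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        // Parameters follow the media type, separated by semicolons.
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // The value may be quoted, e.g. charset="UTF-8".
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? fallback : value;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // The browser stated the charset: use it.
        System.out.println(charsetOf("application/x-www-form-urlencoded; charset=UTF-8", "ISO-8859-1"));
        // The browser stated nothing: fall back to the webapp's choice.
        System.out.println(charsetOf("application/x-www-form-urlencoded", "ISO-8859-1"));
    }
}
```

In a servlet the equivalent check is whether request.getCharacterEncoding() returns null; if so, the webapp can call request.setCharacterEncoding(...) with its own choice, which must happen before any parameter is read.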
It is a common fallacy that %HH escaping permits international characters to be reliably transmitted. It does not: the %HH escape permits the transmission of a sequence of octets, but has nothing to say about which character encoding those octets are in.
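A minimal sketch of the problem, using only the JDK's java.net.URLDecoder (the class name PercentEscapeDemo and the sample string are made up for illustration). The same escaped octets decode to different strings depending on which charset the receiver assumes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class PercentEscapeDemo {
    /** Decodes %HH escapes, interpreting the resulting octets in the given charset. */
    static String decode(String escaped, String charset) {
        try {
            return URLDecoder.decode(escaped, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // The octets C3 A9 are the UTF-8 encoding of U+00E9 (e with acute accent).
        String escaped = "caf%C3%A9";

        String asUtf8 = decode(escaped, "UTF-8");        // "caf" + U+00E9
        String asLatin1 = decode(escaped, "ISO-8859-1"); // "caf" + U+00C3 + U+00A9

        System.out.println(asUtf8.length());         // 4
        System.out.println(asLatin1.length());       // 5
        System.out.println(asUtf8.equals(asLatin1)); // false
    }
}
```

Nothing in the escaped string itself says which of the two readings the sender intended, so the receiver has to know or guess the encoding.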
Due to the lack of a standard, different browsers took different approaches to the character encoding used. Some use the encoding of the page and some use UTF-8. Various standards bodies prepared drafts suggesting that UTF-8 would become the standard encoding. Older versions of jetty (e.g. the 4.0.x series) used UTF-8 as the default in anticipation of a standard being adopted. As a standard was not forthcoming, jetty-4.1.x reverted to a default encoding of ISO-8859-1.
The W3C's HTML 4 specification now recommends the use of UTF-8 (http://www.w3.org/TR/html40/appendix/notes.html#non-ascii-chars), and accordingly the jetty-6 series uses a default of UTF-8.
If UTF-8 is not correct for your environment, you may use one of two jetty-specific methods to set the charset encoding of the query string in GET requests:
- call Request.setQueryEncoding(String) before reading any of the content or parameters.
- set the system property org.mortbay.util.URI.charset to the encoding you want to use.
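The system property option might look like the following sketch. The property name comes from the text above; the class QueryCharsetSetup and the method setQueryCharset are hypothetical. It assumes that jetty reads the property once, when its URI utility classes are first loaded, so the call has to run before the server is started. Passing -Dorg.mortbay.util.URI.charset=ISO-8859-1 on the java command line avoids that ordering concern.

```java
import java.nio.charset.Charset;

class QueryCharsetSetup {
    /** System property named in the jetty documentation for the query string charset. */
    static final String QUERY_CHARSET_PROPERTY = "org.mortbay.util.URI.charset";

    /** Validates the charset name, then publishes it as a system property. */
    static void setQueryCharset(String charsetName) {
        // Charset.forName throws an unchecked exception if this JVM
        // does not support the charset, so typos fail fast here.
        Charset.forName(charsetName);
        System.setProperty(QUERY_CHARSET_PROPERTY, charsetName);
    }

    public static void main(String[] args) {
        // Assumption: this runs before any jetty class is loaded.
        setQueryCharset("ISO-8859-1");
        System.out.println(System.getProperty(QUERY_CHARSET_PROPERTY)); // ISO-8859-1
        // ... create and start the server here
    }
}
```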
Handling of International characters by browsers
Anyone interested in the full complexity of handling international characters and languages might like to read the W3C's Character Model (currently a working draft) and follow the W3C's Internationalization Activity.
originally contributed by Chris Hayes