Non-ASCII Taglib Encoding Bug

Bug Report

Some users of non-English browsers reported that non-ASCII characters were getting corrupted. Other users reported no problems. JAMWiki enables UTF-8 support by doing the following:

  • All JSP pages include the <%@ page contentType="text/html; charset=utf-8" %> directive.
  • The JAMWikiFilter class handles setting encoding for servlets using:
    request.setCharacterEncoding("UTF-8");
    response.setContentType("text/html; charset=utf-8");
  • Tomcat has been configured for UTF-8 by setting all <Connector> elements in server.xml to include a URIEncoding="UTF-8" attribute.

Despite these settings, some users reported that the encoding they saw was localized, indicating that the response was not actually being sent as UTF-8. Most users did not see this problem. It is also worth noting that the problem only seemed to occur with Tomcat4 and was not present with Tomcat5.
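The kind of corruption described here can be reproduced in isolation. The following is a minimal sketch, assuming the mismatch is between UTF-8 and ISO-8859-1; the actual localized charset involved in the reports is not stated, and the class and method names are hypothetical. It uses the Korean test string from the archived bug report.

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // Encode text as UTF-8, then decode the bytes with the wrong charset.
    // ISO-8859-1 is an assumption chosen for illustration only.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverse the mistake: recover the original bytes, then decode them as UTF-8.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The test string from the bug report, written with Unicode escapes
        // so that this source file is itself immune to encoding problems.
        String original = "\uBE14\uB85C\uADF8";
        String garbled = misdecode(original);

        System.out.println(garbled.equals(original));         // false
        System.out.println(garbled.length());                 // 9 (three bytes per syllable)
        System.out.println(repair(garbled).equals(original)); // true
    }
}
```

Three characters become nine because each one occupies three bytes in UTF-8 and the wrong charset treats every byte as a separate character.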

The Problem

The source of the problem turned out to be the JSTL format library. For some reason the fmt:setBundle tag was resetting the response encoding, but only for users whose language had a matching translation file. Because an ApplicationResources_hu.properties file exists, a Hungarian user reported the problem. Because no ApplicationResources_pt.properties file exists, a Portuguese user did not see the problem.
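Why only some users were affected can be sketched with the standard java.util.ResourceBundle candidate order. This is an illustration only: JSTL's own lookup algorithm may differ in detail, the class and method names are hypothetical, and the set of available bundles is limited to what this page mentions.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.ResourceBundle;
import java.util.Set;

public class BundleMatchDemo {

    // Returns the name of the first locale-specific bundle that exists for the
    // given locale, or empty if only the base (untranslated) bundle would be used.
    static Optional<String> localizedMatch(String baseName, Locale locale, Set<String> available) {
        ResourceBundle.Control control =
            ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        for (Locale candidate : control.getCandidateLocales(baseName, locale)) {
            if (candidate.equals(Locale.ROOT)) {
                continue; // the base bundle is not a translation
            }
            String name = control.toBundleName(baseName, candidate);
            if (available.contains(name)) {
                return Optional.of(name);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed for illustration: only the base file and a Hungarian translation exist.
        Set<String> available = Set.of("ApplicationResources", "ApplicationResources_hu");

        // Hungarian browser: a translation matches, so this user hit the bug.
        System.out.println(localizedMatch("ApplicationResources",
            Locale.forLanguageTag("hu-HU"), available)); // Optional[ApplicationResources_hu]

        // Portuguese browser: no translation matches, so the encoding was left alone.
        System.out.println(localizedMatch("ApplicationResources",
            Locale.forLanguageTag("pt-BR"), available)); // Optional.empty
    }
}
```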

The Solution

The solution to this problem was partially inspired by a mailing list message indicating that setting the fmt:message bundle from the web.xml file caused the response encoding to be permanently set to a non-UTF-8 value. Using the fmt:setBundle tag also reset the response encoding value. The first step was therefore to remove the following lines from web.xml:

    <context-param>
        <param-name>javax.servlet.jsp.jstl.fmt.localizationContext</param-name>
        <param-value>ApplicationResources</param-value>
    </context-param>

The message resource bundle was then initialized using the following code:

    <f:setBundle basename="ApplicationResources" />
    <%
    // must re-set the response header because the f:setBundle tag can sometimes override it
    response.setContentType("text/html; charset=utf-8");
    %>

This solution seems to have resolved the problem.

Archived Bug Report

  • Non-ASCII characters (accents, Asian characters, etc.) are being corrupted during edits by some browsers. I haven't been able to reproduce this problem, so I need help tracking it down. You can help test this issue by editing the Sandbox topic on jamwiki.org and trying to save non-ASCII text such as "블로그". If possible, please include your browser version, language, and operating system in the edit, such as:
블로그 - entered with Mozilla Firefox / Windows XP / English (US).
    • UPDATE: response.setCharacterEncoding() is a servlet 2.4 only method, so I haven't used it thus far to avoid breaking 2.3 compatibility. However, given the problems some people are having I've now included it in such a way so that it will not break 2.3 servers but should work on 2.4 servers. With any luck this may help resolve the issue, but feedback is appreciated.
    • UPDATE2: So as of 23:50 PDT 31-July just about every possible setting that can be set UTF-8 has now been set. I really need help from people, so please copy the note above (including the Korean characters), add your browser/OS/language, and try to edit the Sandbox page to include the non-ASCII.
    • UPDATE3: The report that international character sets worked with JAMWiki 0.0.5 is a good one. I've begun reverting changes to bring the character set encoding back to where it was, but could use some help with testing. For users who have had trouble editing, can you also let me know if any Asian characters display on the site properly? Also, if you can let me know the character encoding that your browser reports it would also be helpful - on Firefox that can be done with "View" → "Character Encoding" and the value will (hopefully) be "UTF-8". Thanks! -- Ryan 02-Aug-2006 17:52 PDT
    • UPDATE4: I've tried something new based on suggestions on http://www.theserverside.com/discussions/thread.tss?thread_id=28944. I'm not hopeful that this will fix the problem, but testing is still appreciated by those who have had trouble editing with non-ASCII. -- Ryan 02-Aug-2006 19:16 PDT
    • UPDATE5: I seriously doubt this will make a difference, but MediaWiki set the Content-Type header as "charset=utf-8" while JAMWiki set it as "charset=UTF-8". I've updated JAMWiki to use lower-case, although if that fixes it then some browser designers need to be taken out and beaten. Anyhow, feedback is appreciated. -- Ryan 03-Aug-2006 19:45 PDT
    • UPDATE6: Installing the Chinese language packs for IE has allowed me to reproduce the problem (FINALLY!) so I can now test things much more quickly. This issue is bugging me enough that I'll be up for a while trying to fix it, so hopefully it will get resolved tonight. -- Ryan 03-Aug-2006 20:03 PDT
    • UPDATE7: So this now looks like it's a Tomcat configuration issue - using the Chinese language pack it breaks on jamwiki.org (Tomcat4) but works fine on my laptop (Tomcat5). I'm digging around trying to figure out what the difference is. -- Ryan 03-Aug-2006 21:06 PDT
    • UPDATE8: Got it. It has to do with the translation files - when the file is read Tomcat4 is setting an encoding. I'll figure out a fix shortly. -- Ryan 03-Aug-2006 23:01 PDT
    • UPDATE9: I think it's fixed now. I'll write up a full report later, but the problem seemed to be that the JSP format taglib was resetting encoding values for languages for which a translation file existed. That explains why Hungarian and Chinese users were seeing gibberish. Hopefully it works for everyone now - feedback is appreciated.
  • Feedback is needed from users of non-English operating systems and browsers, especially Japanese. Does JAMWiki work OK, especially with non-English topics like Jyväskylä?
  • Search also fails with Chinese characters, but this is likely due to the issues above.
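The first UPDATE note above mentions calling response.setCharacterEncoding() in a way that works on servlet 2.4 without breaking servlet 2.3. The archive does not say how this was done. One possible approach is to look the method up by reflection and skip it when it is absent, as in the sketch below. The response classes are hypothetical stand-ins so that the example runs without a servlet container.

```java
import java.lang.reflect.Method;

public class OptionalEncodingCall {

    // Calls target.setCharacterEncoding(encoding) if the method exists.
    // Returns true if the call was made, false if the method is missing.
    static boolean trySetCharacterEncoding(Object target, String encoding) {
        try {
            Method method = target.getClass().getMethod("setCharacterEncoding", String.class);
            method.invoke(target, encoding);
            return true;
        } catch (NoSuchMethodException e) {
            return false; // older API: nothing to call
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("setCharacterEncoding failed", e);
        }
    }

    // Hypothetical stand-in for a servlet 2.3 style response (no such method).
    public static class OldResponse {
    }

    // Hypothetical stand-in for a servlet 2.4 style response.
    public static class NewResponse {
        public String encoding;

        public void setCharacterEncoding(String encoding) {
            this.encoding = encoding;
        }
    }

    public static void main(String[] args) {
        NewResponse newer = new NewResponse();
        System.out.println(trySetCharacterEncoding(new OldResponse(), "UTF-8")); // false
        System.out.println(trySetCharacterEncoding(newer, "UTF-8"));             // true
        System.out.println(newer.encoding);                                      // UTF-8
    }
}
```

In a real container the response implementation class may not be public, so the lookup would more likely be done against the response interface rather than the runtime class.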