Solving Character Set Issues With Legacy Systems

While all good modern software supports non-English text (usually by supporting Unicode), we sometimes still have to work with software that either doesn't yet support Unicode or isn't correctly configured for it. Fortunately, EditLive! is designed not only to work in these situations but also to provide a really useful setting that ensures the content coming out of EditLive! is safe to pass on to the non-Unicode parts of the system. It's as simple as one JavaScript call:

editlive.setOutputCharset("US-ASCII");

The setOutputCharset call tells EditLive! to ensure that its output is compatible with the specified character set. You can still use the full set of Unicode characters; EditLive! will simply HTML-encode them so that they pass through legacy systems without being corrupted. Users can paste content with special characters from any source, including Microsoft Word and other web pages, and EditLive! will make sure that the content is safely encoded so that the output is still completely valid US-ASCII.
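
To make "HTML encoding" concrete, here is a minimal sketch of the general technique: every character outside US-ASCII is replaced with a numeric character reference. The function name encodeNonAscii is hypothetical and is not part of the EditLive! API; it only illustrates the kind of output to expect, not how EditLive! implements it internally.

```javascript
// Hypothetical helper for illustration only; not part of the EditLive! API.
// Replaces every character outside US-ASCII with an HTML numeric
// character reference so the result contains only 7-bit characters.
function encodeNonAscii(html) {
  let out = "";
  // for...of iterates by code point, so surrogate pairs stay together.
  for (const ch of html) {
    const codePoint = ch.codePointAt(0);
    out += codePoint > 0x7f ? "&#" + codePoint + ";" : ch;
  }
  return out;
}

// Input contains e-acute, curly quotes and a euro sign (written as escapes).
console.log(encodeNonAscii("<p>caf\u00e9 \u201cquoted\u201d \u20ac5</p>"));
// prints: <p>caf&#233; &#8220;quoted&#8221; &#8364;5</p>
```

Because the result contains only ASCII bytes, it reads the same whether a downstream system treats it as US-ASCII, ISO-8859-1 or UTF-8, and a browser turns the references back into the original characters when it renders the page.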

You can specify any supported character set in the setOutputCharset call. For instance, if your systems fully support Unicode, you could use it to gradually convert your content from the legacy character set to UTF-8. Generally, though, it's most useful to set it to US-ASCII to ensure compatibility.

If the entire document is being retrieved, EditLive! will even update the Content-Type meta tag to indicate the new character set, or insert one if there isn't one already. This ensures that compliant HTML parsers will automatically use the right character set to parse the document.

One important thing to note about setOutputCharset is that it affects only the character set EditLive! uses when it outputs content, not the one it uses when it loads content. This allows you to load content using the character set it is currently stored in (assuming it hasn't already been corrupted by the legacy systems), and EditLive! will convert it to the specified output character set.
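
The importance of loading content in the character set it was actually stored in can be shown with a small sketch that is independent of EditLive!. It uses the standard TextDecoder API and assumes, purely for illustration, that the legacy system stored the text as windows-1252; the byte values and variable names are made up for the example.

```javascript
// Bytes as a legacy system might store them: windows-1252 for
// a left curly quote, "hi", a right curly quote, a space and e-acute.
const stored = Uint8Array.of(0x93, 0x68, 0x69, 0x94, 0x20, 0xe9);

// Decoding with the charset the content was stored in recovers the text.
const correct = new TextDecoder("windows-1252").decode(stored);

// Decoding the same bytes as UTF-8 produces U+FFFD replacement
// characters, which is the corruption to watch for.
const wrong = new TextDecoder("utf-8").decode(stored);

console.log(correct === "\u201chi\u201d \u00e9"); // true
console.log(wrong.includes("\ufffd"));            // true
```

Once the text has been decoded correctly, converting it to another output character set is lossless; once it has been decoded with the wrong character set, the original characters cannot be recovered from the result.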

Finally, if you are seeing problems with character set support in your system, there are some common places where character set issues occur:

  • the character set where the content is stored (the database or filesystem).
  • the character set of the web servers or application servers, both for displaying the content to users and for editing it.
  • the way in which data is read from the database (try outputting the data after reading it from the database).
  • the encoding of data before it is transferred into EditLive! (the URL encoding required before setting the body, document or content).
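
For the last item, a quick way to rule out the URL-encoding step is to check that the content survives a round trip before it is handed to the editor. The sketch below uses only the standard encodeURIComponent and decodeURIComponent functions and assumes the receiving side expects UTF-8 percent-encoding; the sample string is invented for the example.

```javascript
// Sample content containing e-acute, an ampersand and curly quotes.
const content = "caf\u00e9 & \u201cquotes\u201d";

// encodeURIComponent percent-encodes the UTF-8 bytes of each character,
// so the encoded string contains only ASCII characters.
const encoded = encodeURIComponent(content);
console.log(encoded);
// prints: caf%C3%A9%20%26%20%E2%80%9Cquotes%E2%80%9D

// Decoding should give back exactly what went in; if it doesn't,
// the corruption happens at or before this step.
console.log(decodeURIComponent(encoded) === content); // true
```

If the round trip succeeds but the characters still appear corrupted in the editor, the problem more likely lies in one of the earlier items: the storage character set, the server configuration, or the way the data is read from the database.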
