Internationalized Styles Menu

If you integrated EditLive! prior to the 6.0 release, the style toolbarComboBox definition in your configuration file probably looks like this:

<toolbarComboBox name="Style">
    <comboBoxItem name="P" text="Normal"/>
    <comboBoxItem name="H1" text="Heading 1"/>
    <comboBoxItem name="H2" text="Heading 2"/>
    <comboBoxItem name="H3" text="Heading 3"/>
    <comboBoxItem name="H4" text="Heading 4"/>
    <comboBoxItem name="H5" text="Heading 5"/>
    <comboBoxItem name="H6" text="Heading 6"/>
    <comboBoxItem name="PRE" text="Formatted"/>
    <comboBoxItem name="ADDRESS" text="Address"/>
</toolbarComboBox>

The 6.0 release maintained backwards compatibility with this format, but if you’re still using it then you are missing out on the translations we added.  The new format removes the text attribute:

<toolbarComboBox name="Style">
    <comboBoxItem name="P"/>
    <comboBoxItem name="H1"/>
    <comboBoxItem name="H2"/>
    <comboBoxItem name="H3"/>
    <comboBoxItem name="H4"/>
    <comboBoxItem name="H5"/>
    <comboBoxItem name="H6"/>
</toolbarComboBox>

Once the text attribute is removed, your English-only style names will be translated into all supported languages.

We have included translations for all major block tags:

  • P
  • DIV (in 6.4)
  • H1-6
  • PRE
  • ADDRESS
  • TD
  • TH
  • TR
  • TABLE
  • LI
  • UL
  • OL

We’ve even included some of the less common ones like DT and DL.  These translations will apply to styles loaded via CSS as well as from the configuration file.

URL Encoding And Character Sets

Most people don't realize it, but URL encoding is actually different depending on which character set is used at the time. Characters can be corrupted if a string is URL encoded using one character set and decoded using a different character set.

Take for example the Ä character. Using the default character encoding for Windows in the US (cp1252), it's encoded as %C4; using UTF-8, it's encoded as %C3%84. If you encode it with UTF-8 and decode it with cp1252, you wind up with two characters (Ã„) instead of one.
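The mismatch can be reproduced in a few lines of JavaScript. This is a minimal sketch, assuming a runtime whose TextDecoder supports the windows-1252 label (current browsers and recent Node.js builds do). The percentDecodeToBytes helper is hypothetical and written only for this illustration.

```javascript
// Hypothetical helper: turn an ASCII percent-encoded string into raw bytes.
function percentDecodeToBytes(encoded) {
  const bytes = [];
  for (let i = 0; i < encoded.length; i++) {
    if (encoded[i] === "%") {
      // "%C3" -> 0xC3
      bytes.push(parseInt(encoded.slice(i + 1, i + 3), 16));
      i += 2;
    } else {
      bytes.push(encoded.charCodeAt(i));
    }
  }
  return Uint8Array.from(bytes);
}

// encodeURIComponent always produces UTF-8 percent-encoding.
const utf8Encoded = encodeURIComponent("\u00C4"); // the Ä character
const bytes = percentDecodeToBytes(utf8Encoded);

// Decoding the same bytes with two different character sets.
const correct = new TextDecoder("utf-8").decode(bytes);
const mismatched = new TextDecoder("windows-1252").decode(bytes);

console.log(utf8Encoded);       // %C3%84
console.log(correct);           // Ä
console.log(mismatched.length); // 2
```

Decoding with the character set that was used for encoding returns the original character; decoding with the other one silently yields two unrelated characters, which is exactly the kind of corruption described above.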

Unfortunately, most URL encoding functions don't let you choose a character set. Some, such as JavaScript's encodeURIComponent, always use UTF-8, while others, typically including those in PHP and ASP, work with whatever encoding the page or string already uses. Either way, make sure that the page's encoding matches the encoding of the document you're encoding. As always, check that EditLive! is using the same encoding, either by specifying it in your configuration file or with a meta tag in the head of the document.

Related documents: Specifying Character Sets For Internationalization

Controlling Entity Encoding

There are two key APIs that control how special characters, such as non-English characters or complex punctuation, are encoded in the HTML output from EditLive!

The setOutputCharset function allows you to control which character set is used when serializing the HTML content (see also Solving Character Set Issues With Legacy Systems). Any characters that are supported by that character set are output directly as single characters without any encoding; any characters that aren't supported are encoded. The outputXHTML and outputXML attributes control whether named or numeric entities are used (see our previous article on how and why that works). The table below shows how these two settings work together when outputting special characters.

                                          ASCII Output Charset    UTF-8 Output Charset
outputXHTML or outputXML is true          Numeric entities        No encoding
outputXHTML and outputXML are both false  Named entities          No encoding

The ASCII charset is one of the simplest charsets and supports very few characters, which also means it tends to avoid character set problems with other systems. The UTF-8 character set, on the other hand, can directly represent nearly any character supported by HTML, so you generally don't get any encoded characters.
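The behavior in the table can be sketched in JavaScript. This is only an illustration of the rules described above, not EditLive!'s actual implementation: encodeSpecialChars and the tiny NAMED_ENTITIES table are hypothetical, and falling back to a numeric entity when no named entity is known is an assumption of this sketch.

```javascript
// Hypothetical, deliberately tiny table of named entities.
const NAMED_ENTITIES = {
  "\u00E9": "&eacute;",
  "\u00C4": "&Auml;",
  "\u00A9": "&copy;"
};

// Sketch of the table: outputCharset is "US-ASCII" or "UTF-8";
// xmlOutput stands for "outputXHTML or outputXML is true".
function encodeSpecialChars(text, outputCharset, xmlOutput) {
  if (outputCharset === "UTF-8") {
    return text; // every character is representable: no encoding
  }
  if (outputCharset !== "US-ASCII") {
    throw new Error("this sketch only handles US-ASCII and UTF-8");
  }
  return Array.from(text, (ch) => {
    const codePoint = ch.codePointAt(0);
    if (codePoint < 128) {
      return ch; // supported by US-ASCII: output directly
    }
    if (!xmlOutput && NAMED_ENTITIES[ch]) {
      return NAMED_ENTITIES[ch]; // named entity
    }
    return "&#" + codePoint + ";"; // numeric entity (assumed fallback)
  }).join("");
}

console.log(encodeSpecialChars("caf\u00E9", "US-ASCII", true));  // caf&#233;
console.log(encodeSpecialChars("caf\u00E9", "US-ASCII", false)); // caf&eacute;
console.log(encodeSpecialChars("caf\u00E9", "UTF-8", false));    // café
```

Markup characters such as < and & are left alone here, since the sketch is concerned only with characters that the output character set cannot represent.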

Automatically Choosing Spelling Dictionaries

One of the most commonly changed settings for EditLive! is the location of the spelling dictionary. It's not only annoying to keep the URL to the dictionary correct across different sites but it can also be difficult to support clients using different languages. Fortunately, EditLive! 6.0 introduced automatic selection of dictionaries. The correct dictionary for the user's locale is automatically selected and downloaded without needing to be specified in the configuration file.

To take advantage of this all you have to do is:

  1. Make sure that you copied the dictionaries folder into the download directory for EditLive! This should already be the case for most deployments.
  2. Remove the "jar" attribute from the spellCheck element in your configuration file.

Of course, if you want to force a particular language regardless of the user's locale, or use a custom dictionary, you can still specify the download location for the spelling jar as usual. Otherwise, sit back, relax and let EditLive! do all the work.

Why International Characters Display As Boxes

Sometimes when editing content that uses international characters, particularly from Asian languages, the characters appear as little boxes. The good news is that this nearly always means you've sorted out all the character encoding issues and the content is being loaded in without corruption. So why don't the characters display correctly? It's all about the font.

First, a little background on what fonts actually do: they're more than just a pretty (font) face. It's important to understand that computers don't think of characters like we do. Like everything else to a computer, a character is just a sequence of 1s and 0s. When the character is displayed on screen (or printed), the computer has to translate that sequence of 1s and 0s into something that a human would recognize. That's where fonts come in. The fonts on your computer contain a mapping between the computer's representation of a character and the actual glyph you see on screen.

The problem is, it's time consuming and expensive to develop fonts - each and every character has to be carefully drawn in the font along with a lot of other information to get the font to display just right in any situation. To reduce this cost, most fonts don't include a rendering for every possible character and when they don't support a character they render it as a little box.

So, how do we solve the problem?

First we need to make sure that there is at least one font on your system that supports the characters you're using. To test this, see if your browser can render the characters correctly.

The first option is to use a font that supports the characters you're using. Since EditLive!, like most browsers, defaults to Times New Roman for the font, you'll probably need to explicitly specify the font using the CSS stylesheet, for example:

body {
    font-family: "Hiragino Mincho Pro";
}

Which font to use will depend on what fonts you have installed on your machine, what OS you use and what characters you need to support. Hiragino Mincho Pro is a font available on Mac OS X that supports Japanese characters.

All that's fairly complex, but fortunately there's another option. You'll notice that even without specifying the right font, browsers will tend to render the characters correctly. That's because they support "font fallback". If the required character isn't supported by the current font, the browser automatically picks a different font that does support it. Fortunately, font fallback was added to Java in Java 1.5, so you can solve the problem simply by upgrading to Java 1.5 or above (and making sure there's at least one font that supports the international characters).

If instead of little boxes you're seeing question marks, corrupted content or little boxes every second character, you've probably got actual character set problems. Take a look at the previous article on character sets for some tips on tracking down and solving the problem.

Solving Character Set Issues With Legacy Systems

While all good modern software supports non-English text (usually by supporting Unicode), sometimes we still have to work with software that either doesn't yet have Unicode support or isn't correctly configured for it. Fortunately, EditLive! is not only designed to work in these situations but also provides a really useful setting to ensure that the content coming out of EditLive! is safe to pass on to the non-Unicode parts of the system. It's as simple as one JavaScript call:

editlive.setOutputCharset("US-ASCII");

The setOutputCharset call tells EditLive! to ensure that its output is compatible with the specified character set. You can still use the full set of Unicode characters; EditLive! will simply HTML encode them so that they pass through legacy systems without being corrupted. Users can paste content with special characters from any source, including Microsoft Word and other web pages, and EditLive! will make sure that the content is safely encoded so that the output is still completely valid US-ASCII.
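If you want to confirm that content leaving the editor really is US-ASCII before handing it to a legacy system, a small check along these lines may help. This is a sketch only: findNonAscii is a hypothetical helper, not part of the EditLive! API, and the sample strings are made up for illustration.

```javascript
// Hypothetical helper: list every character outside the US-ASCII range,
// together with its position in the string.
function findNonAscii(html) {
  const offenders = [];
  let index = 0;
  for (const ch of html) {
    if (ch.codePointAt(0) > 0x7f) {
      offenders.push({ index: index, ch: ch });
    }
    index += ch.length; // surrogate pairs occupy two string positions
  }
  return offenders;
}

// Entity-encoded output contains nothing but ASCII characters...
const encodedSample = "<p>caf&#233; costs 10&#8364;</p>";
console.log(findNonAscii(encodedSample).length); // 0

// ...whereas raw content still carries the original character.
const rawSample = "<p>caf\u00E9</p>";
console.log(findNonAscii(rawSample)); // [ { index: 6, ch: 'é' } ]
```

An empty result means every character in the string can be stored and transmitted as US-ASCII.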

You can specify any supported character set in the setOutputCharset call. For instance, if your systems fully support Unicode, you could use it to gradually convert your content from the legacy character set to UTF-8. Generally, though, it's most useful to set it to US-ASCII to ensure compatibility.

If the entire document is being retrieved, EditLive! will even update the Content-Type meta tag to indicate the new character set or insert one if there isn't one already. This ensures that compliant HTML parsers will automatically use the right character set to parse the document.

One important thing to note about setOutputCharset is that it doesn't affect the character set that EditLive! uses when it loads in the content, only when it outputs it. This allows you to load in content using the character set the content is currently stored in (assuming it hasn't already been corrupted by the legacy systems) and EditLive! will convert it to the specified output character set.

Finally, if you are seeing problems with character set support in your system, there are some common places where character set issues occur:

  • the character set where the content is stored (the database or filesystem).
  • the character set of the web servers or application servers, both for displaying the content to users, and editing the content.
  • the way in which data is read from the database (try outputting the data after reading it from the database).
  • the encoding of data before it is transferred into EditLive! (the URL encoding required before setting the body, document or content).
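When checking each of these places, it often helps to print the code points of a known test string rather than relying on how it happens to be displayed. The following is a minimal sketch; describeCodePoints is a hypothetical debugging helper, not part of EditLive!.

```javascript
// Hypothetical helper: render each character of a string as U+XXXX.
function describeCodePoints(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  ).join(" ");
}

// An intact Ä is a single code point.
console.log(describeCodePoints("\u00C4"));       // U+00C4

// UTF-8 bytes that were decoded as cp1252 show up as two code points.
console.log(describeCodePoints("\u00C3\u201E")); // U+00C3 U+201E

// A literal question mark means the character was lost in a lossy conversion.
console.log(describeCodePoints("?"));            // U+003F
```

Running the same test string through each stage (storage, server, database read, URL encoding) and comparing the output shows the first point at which the code points change.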
