Controlling Entity Encoding
Two key APIs control how special characters, such as non-English characters or complex punctuation, are encoded in the HTML output from EditLive!:
- setOutputCharset
- the htmlFilter element in the config file, specifically its outputXHTML and outputXML attributes
The setOutputCharset function controls which character set is used when serializing the HTML content (see also Solving Character Set Issues With Legacy Systems). Any character that the chosen character set supports is output directly as a single character without any encoding; any character it doesn't support is encoded as an entity. The outputXHTML and outputXML attributes control whether those entities are named or numeric (see our previous article on how and why that works). The table below shows how these two settings work together when outputting special characters.
| | ASCII Output Charset | UTF-8 Output Charset |
|---|---|---|
| outputXHTML or outputXML is true | Numeric entities | No encoding |
| outputXHTML and outputXML are both false | Named entities | No encoding |
The ASCII charset is one of the simplest charsets and supports very few characters, so most special characters end up encoded as entities; it also tends to avoid character set problems with other systems. The UTF-8 character set, on the other hand, can directly represent nearly any character supported by HTML, so you generally don't get any encoded characters.
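To make the interaction between the two settings concrete, here is a minimal Python sketch of the decision the table describes. It is not EditLive! code: the function `encode_special_chars` and its parameters `charset` and `xml_output` are hypothetical names chosen for illustration, and the fallback to a numeric entity when no named entity exists is an assumption rather than documented behavior.

```python
from html.entities import codepoint2name


def encode_special_chars(text, charset="ascii", xml_output=False):
    """Illustrative model of the table above (not the EditLive! implementation).

    charset    -- stands in for the value passed to setOutputCharset
    xml_output -- stands in for outputXHTML or outputXML being true
    """
    out = []
    for ch in text:
        try:
            # Characters the charset can represent are output directly.
            ch.encode(charset)
            out.append(ch)
            continue
        except UnicodeEncodeError:
            pass

        name = codepoint2name.get(ord(ch))
        if xml_output or name is None:
            # Numeric entity. Assumption: also used when no named entity exists.
            out.append(f"&#{ord(ch)};")
        else:
            # Named entity for plain HTML output.
            out.append(f"&{name};")
    return "".join(out)


print(encode_special_chars("café", "ascii", xml_output=False))  # caf&eacute;
print(encode_special_chars("café", "ascii", xml_output=True))   # caf&#233;
print(encode_special_chars("café", "utf-8", xml_output=True))   # café
```

The three calls correspond to the three distinct outcomes in the table: named entities, numeric entities, and no encoding.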