URL Encoding And Character Sets

Most people don't realize it, but the result of URL encoding depends on the character set in use. Each character is first converted to bytes in that character set, and each byte is then written as a percent sign followed by two hexadecimal digits. Characters will be corrupted if a string is URL encoded using one character set and decoded using a different one.

Take for example the Ä character. Using the default character encoding for Windows in the US (cp1252) it's encoded as %C4, whereas using UTF-8 it's encoded as %C3%84. If you encode it with UTF-8 and decode it with cp1252, you wind up with two characters (Ã„) instead of one.
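The Ä example can be reproduced with a short script. This is only a sketch in Python, chosen for illustration and not tied to any language mentioned in this article; it uses the standard library's urllib.parse functions, which accept an explicit character set, and the variable names are arbitrary.

```python
from urllib.parse import quote, unquote

text = "Ä"

# Encode the same character under two different character sets.
encoded_cp1252 = quote(text, encoding="cp1252")
encoded_utf8 = quote(text, encoding="utf-8")

# Decode the UTF-8 form with the wrong character set, then with the right one.
mismatched = unquote(encoded_utf8, encoding="cp1252")
matched = unquote(encoded_utf8, encoding="utf-8")

print(encoded_cp1252)  # %C4
print(encoded_utf8)    # %C3%84
print(mismatched)      # Ã„ (two characters)
print(matched)         # Ä
```

The two bytes that UTF-8 produces for Ä are each a valid character on their own in cp1252, which is why the mismatch yields two wrong characters and no error.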

Unfortunately, most URL encoding functions don't let you specify a character set and instead use the character encoding of the page. JavaScript, PHP and ASP all act this way, so make sure that the page's encoding is set to the same encoding as the document you're encoding. As always, check that EditLive! is using the same encoding, either by specifying it in your configuration file or with a meta tag in the head of the document.
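Where a language does let you name the character set, one way to keep both sides in agreement is to define it once and pass it to every encode and decode call. The following is a minimal sketch in Python, not a description of how EditLive! or the languages above work; CHARSET, encode_component and decode_component are hypothetical names introduced for this example.

```python
from urllib.parse import quote, unquote

# Hypothetical setting: the single character set agreed on by the page,
# the server-side code and the editor configuration.
CHARSET = "utf-8"


def encode_component(value: str, charset: str = CHARSET) -> str:
    # safe="" means reserved characters such as "/" are escaped too.
    return quote(value, safe="", encoding=charset, errors="strict")


def decode_component(value: str, charset: str = CHARSET) -> str:
    # errors="strict" raises an exception on bytes that are invalid in
    # the chosen character set instead of silently substituting them.
    return unquote(value, encoding=charset, errors="strict")


original = "Ärger/Äpfel"
encoded = encode_component(original)
round_tripped = decode_component(encoded)

print(encoded)
print(round_tripped == original)
```

Because both helpers default to the same constant, changing the character set in one place changes it for encoding and decoding together.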

Related documents: Specifying Character Sets For Internationalization