This document lists things to keep in mind when creating UTF-8-aware web applications with Dojo and PHP.
Although the focus is on Dojo and PHP, most of the tips below apply to any other web (scripting) language as well.
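- save your files UTF-8 encoded with a UTF-8-capable editor, such as PSPad
Make sure that your text editor handles the BOM correctly (this has been an issue with some editors in the past). For PHP files, saving without a BOM is usually the safer choice, because the BOM bytes are sent as output before any header() call can run.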
- insert the following in your .htaccess:
php_value default_charset UTF-8
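This makes PHP declare UTF-8 as the character set in the Content-Type header it sends by default. It does not convert the data itself, so your output must already be UTF-8 encoded.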
- specify the correct character set in the HTML head:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
- specify the correct language on the <html> element:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
- force Dojo to use UTF-8 for dojo.io.bind:
var djConfig = {isDebug:true, parseOnLoad:false, bindEncoding:"UTF-8"};
- specify the character set for your <form>:
<form accept-charset="UTF-8">
Hint: This is optional, because browsers submit a form in the encoding of the page by default, but specifying it makes the intended encoding obvious.
- define the character set of the data returned by the server:
header("Content-Type: text/html; charset=utf-8");
- encode data on the client side before sending it to the server:
encodeURIComponent();
This makes sure that special characters such as “’$%&,… are transferred correctly.
- decode the data received on the server side:
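As a minimal sketch of this client-side step, the snippet below builds a query string with encodeURIComponent() and decodes one value again. The helper name buildQuery and the sample field names are hypothetical and only serve as illustration.

```javascript
// Hypothetical helper: turns a plain object into a URL-encoded query string.
function buildQuery(params) {
  return Object.keys(params)
    .map(function (key) {
      // Encode both key and value as UTF-8 percent escapes.
      return encodeURIComponent(key) + "=" + encodeURIComponent(params[key]);
    })
    .join("&");
}

var query = buildQuery({ name: "Müller & Söhne", price: "5 %" });
console.log(query);
// name=M%C3%BCller%20%26%20S%C3%B6hne&price=5%20%25

// decodeURIComponent() reverses the encoding.
console.log(decodeURIComponent("M%C3%BCller%20%26%20S%C3%B6hne"));
// Müller & Söhne
```

Encoding each value separately keeps characters like & and % inside a value from being mistaken for query-string syntax.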
html_entity_decode(urldecode($p), ENT_QUOTES, "UTF-8");
- use utf8_general_ci as the collation for your MySQL database.
This ensures that data is actually stored UTF-8 encoded and not converted to another character set such as latin1 (with the collation latin1_swedish_ci), which is the default in many MySQL installations.
- right before reading from or writing to your database, issue the following query:
mysql_query("SET NAMES 'utf8'");
This tells the MySQL server that the client sends and expects UTF-8 encoded data on this connection.
- don’t forget to escape quotation marks to avoid SQL injection when issuing database queries:
addSlashes();
Note that addSlashes is not the best way to escape special characters; it only serves as an example here. mysql_real_escape_string() or prepared statements are more robust.
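To illustrate why escaping matters, here is a small JavaScript sketch of what goes wrong when a value containing a quotation mark is pasted into a query unescaped. The function escapeQuotes is a hypothetical stand-in that mimics the idea behind addSlashes (a backslash in front of quotes and backslashes, NUL bytes written as \0), and the table name users is made up. It is meant as an illustration, not as a recommended way to build real queries.

```javascript
// Hypothetical stand-in for the idea behind addSlashes():
// prefix ', " and \ with a backslash, and write NUL bytes as \0.
function escapeQuotes(value) {
  return String(value)
    .replace(/[\\'"]/g, "\\$&")
    .replace(/\0/g, "\\0");
}

var input = "x' OR '1'='1";

// Unescaped: the quote in the input ends the string literal early.
console.log("SELECT * FROM users WHERE name = '" + input + "'");
// SELECT * FROM users WHERE name = 'x' OR '1'='1'

// Escaped: the quotes stay part of the value.
console.log("SELECT * FROM users WHERE name = '" + escapeQuotes(input) + "'");
// SELECT * FROM users WHERE name = 'x\' OR \'1\'=\'1'
```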
- when using PHP’s json_encode() function, be sure to utf8_encode() strings that are not yet UTF-8 encoded, as json_encode() will otherwise cut off German umlauts (etc.):
json_encode(utf8_encode(rawurlencode($x)));
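The reason behind this tip can be sketched in JavaScript: a Latin-1 (ISO-8859-1) byte such as 0xFC for "ü" is not valid UTF-8 on its own, and converting a string that already is UTF-8 a second time garbles it. The function latin1ToUtf8 below is a hypothetical helper that imitates what utf8_encode does (treat each byte as an ISO-8859-1 character and write it out as UTF-8). It assumes an environment that provides TextEncoder and TextDecoder, such as a current browser or Node.js.

```javascript
// Hypothetical helper imitating PHP's utf8_encode():
// read every byte as an ISO-8859-1 character, then encode as UTF-8.
function latin1ToUtf8(bytes) {
  var text = "";
  for (var i = 0; i < bytes.length; i++) {
    text += String.fromCharCode(bytes[i]);
  }
  return Array.from(new TextEncoder().encode(text));
}

// Returns true if the bytes form a valid UTF-8 sequence.
function isValidUtf8(bytes) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(new Uint8Array(bytes));
    return true;
  } catch (e) {
    return false;
  }
}

var latin1 = [0xfc];               // "ü" in ISO-8859-1
var utf8 = latin1ToUtf8(latin1);   // [0xc3, 0xbc], "ü" in UTF-8
var twice = latin1ToUtf8(utf8);    // [0xc3, 0x83, 0xc2, 0xbc], displays as "Ã¼"

console.log(isValidUtf8(latin1));  // false
console.log(isValidUtf8(utf8));    // true
console.log(twice);
```

In other words, the conversion is only appropriate for strings that are still ISO-8859-1 encoded; applying it to data that is already UTF-8 produces the doubled bytes shown in the last line.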
Summary:
Client:
- encodeURIComponent(), decodeURIComponent()
Server:
- html_entity_decode(urldecode($p), ENT_QUOTES, "UTF-8")
- mysql_query("SET NAMES 'utf8'"), addSlashes()
- header("Content-Type: text/html; charset=utf-8")
- json_encode(utf8_encode(rawurlencode($x)))
MySQL:
- utf8_general_ci