I'm not 100% sure what all those ini_set calls actually do - I don't use them myself.
What I do:
When connecting to a database on the command line (mysql -p -u xyz)
CHARSET utf8;
-> this stops the latin-1 conversions; I now talk UTF-8 to the database.
When creating a database, e.g.:
CREATE DATABASE test DEFAULT CHARACTER SET = `utf8`;
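-> this makes the database I create UTF-8 (note that MySQL's utf8 stores at most 3 bytes per character; utf8mb4 covers all of Unicode, including emoji)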
When connecting from PHP I only use mysqli (I've no clue how the obsolete mysql extension works in this respect).
I use this in my connect script:
// connect to the database
$mysqli = new mysqli($server, $user, $pass, $db);
// error handling omitted
//set UTF-8
$mysqli->set_charset("utf8");
This does the same as the "CHARSET utf8" command when talking directly to mysql on the command line.
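Since the error handling is left out above, here is a minimal sketch of how the connect step could look with it. The function name connect_utf8 is made up for illustration; mysqli_report() and set_charset() are the standard mysqli calls.

```php
<?php
// Hypothetical helper (not from the original script): opens a mysqli
// connection and switches the connection charset to UTF-8.
function connect_utf8(string $server, string $user, string $pass, string $db): mysqli
{
    // Make mysqli throw mysqli_sql_exception instead of failing silently.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    // connect to the database
    $mysqli = new mysqli($server, $user, $pass, $db);

    // set UTF-8: same effect as "CHARSET utf8" in the command-line client
    $mysqli->set_charset("utf8");

    return $mysqli;
}
```

You would wrap the call in try/catch and handle mysqli_sql_exception in whatever way fits your application.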
I output polyglot HTML5 nowadays. Where I do that, I use:
// print start of doc
if (isset($_SERVER["HTTP_ACCEPT"]) && stristr($_SERVER["HTTP_ACCEPT"], "application/xhtml+xml") !== false) {
header('Content-Type: application/xhtml+xml;charset=UTF-8');
}
print('<!DOCTYPE html>'."\n");
print('<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">'."\n");
print(' <head>'."\n");
print(' <meta charset="UTF-8" />'."\n");
print(' <title>...
Note that the charset appears twice: once in the HTTP header and once in the document. This puts browsers in UTF-8 mode, so any form data they send back to you (unless you ask for another encoding) will be UTF-8 encoded.
When processing input, I validate it (of course), and part of that is checking that I get valid UTF-8 sequences wherever I expect text strings:
if ( !mb_check_encoding($input, 'UTF-8')) {
// error handling goes here
}
Similarly, to check that some input is not too long (using 50 characters as an example here):
if ( mb_strlen($input,'UTF-8') > 50 ) {
// error handling goes here
}
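The two checks above can be rolled into one helper. This is only a sketch: the name is_valid_text and its signature are made up for illustration, and it assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical helper: returns true only if $input is valid UTF-8 and is
// at most $maxChars characters (not bytes) long.
function is_valid_text(string $input, int $maxChars): bool
{
    // Reject byte sequences that are not well-formed UTF-8.
    if (!mb_check_encoding($input, 'UTF-8')) {
        return false;
    }

    // mb_strlen counts characters, so multi-byte characters count once.
    return mb_strlen($input, 'UTF-8') <= $maxChars;
}
```

Checking the encoding first matters, because a character count on malformed input is not meaningful.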
Oh yes, and of course since I use polyglot HTML5, I'm only allowed five named entities any more: &amp;, &quot;, &lt;, &gt; and &apos; (for &, ", <, > and '). So I have my own output filters to replace those characters where needed.
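A minimal sketch of what such an output filter could look like. This is not the actual filter, just one way to do it; the function name escape_polyglot is made up for illustration.

```php
<?php
// Hypothetical output filter: replaces the five characters that have a
// predefined XML entity with that entity, and leaves everything else alone.
function escape_polyglot(string $text): string
{
    // strtr() with an array does all replacements in a single pass, so the
    // "&" inside an entity that was just inserted is never escaped again.
    return strtr($text, [
        '&'  => '&amp;',
        '"'  => '&quot;',
        '<'  => '&lt;',
        '>'  => '&gt;',
        "'"  => '&apos;',
    ]);
}
```

The built-in htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8') should give an equivalent result, if you would rather not maintain your own map.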
I don't worry about SQL injection all that much, because I use prepared statements everywhere I touch user input.
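For completeness, a minimal sketch of a prepared statement with mysqli. The table users, its columns id and name, and the function name find_user_by_name are all assumptions made up for this example.

```php
<?php
// Hypothetical lookup: the table "users" and its columns are assumptions.
// Returns the matching row as an array, or null if nothing was found.
function find_user_by_name(mysqli $mysqli, string $name): ?array
{
    // The "?" placeholder keeps user input out of the SQL text itself.
    $stmt = $mysqli->prepare('SELECT id, name FROM users WHERE name = ?');
    if ($stmt === false) {
        return null;
    }

    // "s" means the bound parameter is sent as a string.
    $stmt->bind_param('s', $name);
    if (!$stmt->execute()) {
        $stmt->close();
        return null;
    }

    // Bind result columns to variables and fetch the first row, if any.
    $stmt->bind_result($id, $foundName);
    $found = $stmt->fetch();
    $stmt->close();

    return $found ? ['id' => $id, 'name' => $foundName] : null;
}
```

Because the value travels separately from the query text, quotes or SQL fragments inside $name are treated as data and never as SQL.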