I need some help handling special characters in my database.
GOAL: To create SEO-friendly URLs that include the title of the movie.
Example: www.domain.com/movie/name-of-movie,12345.htm
ENVIRONMENT: PHP 5, MySQL with all fields set to UTF-8
THE PROBLEM:
Because the movie titles contain a variety of special characters (HTML entities), I cannot output them to URLs directly. I have tried a variety of encoding/decoding methods, but I am still having these issues. As far as I know, the original data source is all Latin-1.
My code:
function makeurl($input) {
$input = str_replace("&#039;", "", $input); // apparently html_entity_decode() doesn't decode this entity (a single quote)
$input = html_entity_decode($input); // first, undo html entities back into their raw characters
$input = trim($input);
$input = strip_tags($input);
$input = str_replace(" & ", "-and-", $input);
$input = str_replace("/", "-", $input);
$input = str_replace(",", "", $input);
$input = str_replace("'", "", $input);
$input = str_replace('"', '', $input);
$input = str_replace(":", "-", $input);
$input = str_replace(" ", "-", $input);
$input = str_replace(".", "", $input);
$input = preg_replace("/[-\s]+/", '-', $input); // collapse runs of hyphens and whitespace into a single hyphen
$input = htmlentities($input);
$input = strtolower($input);
return $input;
}
Specific problems I need help with:
1. html_entity_decode() does not seem to handle all HTML entities. According to a comment on php.net, it only handles about 100 of the 250 possible entities.
For example, it does not decode &#039;, which is a single quote ('). It also has problems decoding other characters, like those in this title:
Prêt-à-Porter comes out as:
pr%26ecirc;t-%26agrave;-porter
Any suggestions?
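One possible way to tackle both points, sketched under the assumption that the titles are stored as UTF-8 with HTML entities in them: pass ENT_QUOTES and an explicit 'UTF-8' charset to html_entity_decode() (so &#039; and entities such as &ecirc; decode properly; PHP releases before 5.4 default to ISO-8859-1), and drop the htmlentities() call near the end of makeurl(), which appears to be what turns ê back into &ecirc; in the output. The function name makeSlug() is hypothetical, not something from PHP or the original post.

```php
<?php
// Hypothetical helper (not from the original post): builds a lowercase slug
// from a title that is stored as UTF-8 and may contain HTML entities.
function makeSlug($title)
{
    // ENT_QUOTES also decodes &#039; (single quote); 'UTF-8' makes entities
    // such as &ecirc; decode to UTF-8 bytes instead of Latin-1 bytes.
    $title = html_entity_decode($title, ENT_QUOTES, 'UTF-8');
    $title = trim(strip_tags($title));
    $title = str_replace(' & ', ' and ', $title);
    // Remove apostrophes, double quotes, commas and periods outright.
    $title = str_replace(array("'", '"', ',', '.'), '', $title);
    // mb_strtolower() also lowercases accented capitals when mbstring is loaded;
    // the strtolower() fallback only lowercases ASCII letters in the default locale.
    $title = function_exists('mb_strtolower')
        ? mb_strtolower($title, 'UTF-8')
        : strtolower($title);
    // Replace each run of characters that are not Unicode letters or digits with one hyphen.
    $title = preg_replace('/[^\p{L}\p{N}]+/u', '-', $title);
    // Drop any leading or trailing hyphens left over.
    return trim($title, '-');
}

echo makeSlug('Pr&ecirc;t-&agrave;-Porter'), "\n";
echo urlencode(makeSlug('Pr&ecirc;t-&agrave;-Porter')), "\n";
```

With these assumptions, the first line prints prêt-à-porter and the second pr%C3%AAt-%C3%A0-porter. The slug keeps the accented letters, so it still has to be URL-encoded before it goes into a link.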
Anyway, to insert the movie name into the URL, all you need to do is pass it through urlencode(), something like:
$url = 'http://example.com/movie/' . urlencode($movie_name) . ',12345.htm';
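To show what that line produces, here is a small wrapper around it. The function name buildMovieUrl() is hypothetical, the example.com base and ",ID.htm" pattern come from the post, and the title is assumed to be UTF-8 (written as escaped bytes so the example does not depend on the file's own encoding).

```php
<?php
// Hypothetical wrapper around the $url line above; $movie_name is assumed to be UTF-8.
function buildMovieUrl($movie_name, $id)
{
    return 'http://example.com/movie/' . urlencode($movie_name) . ',' . (int) $id . '.htm';
}

// "Prêt-à-Porter" as UTF-8 bytes: each byte of ê and à becomes a %XX escape.
echo buildMovieUrl("Pr\xC3\xAAt-\xC3\xA0-Porter", 12345), "\n";
// The colon becomes %3A and the space becomes +.
echo buildMovieUrl('Mission: Impossible', 42), "\n";
```

One design note: urlencode() follows form encoding and turns spaces into +, but in the path part of a URL a + is a literal plus sign. If that matters, rawurlencode() (RFC 3986 style, spaces become %20) may be the better fit for a path segment like this.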
And then when you're outputting the URL to HTML, pass it through htmlentities():
echo '<a href="'. htmlentities($url) . '">movie</a>';
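A sketch of the same idea with the htmlentities() options spelled out, assuming the page is served as UTF-8: ENT_QUOTES also escapes single quotes, and the explicit 'UTF-8' charset keeps older PHP 5 releases from treating multibyte text as Latin-1. The movieLink() name and the query string in the sample URL are hypothetical, added only to show & turning into &amp; inside the attribute.

```php
<?php
// Hypothetical helper: escape both the URL and the link text for HTML output.
function movieLink($url, $text)
{
    $href = htmlentities($url, ENT_QUOTES, 'UTF-8');
    $label = htmlentities($text, ENT_QUOTES, 'UTF-8');
    return '<a href="' . $href . '">' . $label . '</a>';
}

// Hypothetical URL with a query string; the title is UTF-8 bytes for "Prêt-à-Porter".
echo movieLink(
    'http://example.com/movie/Pr%C3%AAt-%C3%A0-Porter,12345.htm?ref=home&page=2',
    "Pr\xC3\xAAt-\xC3\xA0-Porter"
), "\n";
```

Here the & in the query string comes out as &amp; and the title text as Pr&ecirc;t-&agrave;-Porter; the percent escapes in the URL pass through unchanged.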
No need to strip tags or anything, unless you really need to for SEO.