Google site map issues when URL contains %3A


KenB - 2:49 pm on Jan 2, 2010 (gmt 0)


I've discovered that Google's bots seem unable to properly follow URLs that encode colons (':') as %3A. Google appears to insist on replacing the '%3A' with a ':' before following the link. This creates URL consolidation problems, as I've found other cases where links could not be properly followed by others if they use a ':' instead of '%3A'.

For example, if the urlencoded URL http://example.com/foo/widgets%3A%20blue.html was in a site map, Google would follow it as http://example.com/foo/widgets:%20blue.html.
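To make the two forms concrete, here is a minimal Python sketch of how the same path segment ends up as either '%3A' or ':' depending on which characters the encoder treats as safe. The variable names are only illustrative, and this says nothing about how Google's crawler is actually implemented.

```python
from urllib.parse import quote, unquote

# The path segment from the example above, before any encoding.
segment = "widgets: blue.html"

# With no safe characters, both ':' and ' ' are percent-encoded.
encoded = quote(segment, safe="")

# An encoder that treats ':' as safe leaves the colon literal.
colon_literal = quote(segment, safe=":")

print(encoded)        # widgets%3A%20blue.html
print(colon_literal)  # widgets:%20blue.html

# Both forms decode to the same string, which is why a client may
# treat them as interchangeable even when the server does not.
print(unquote(encoded) == unquote(colon_literal))  # True
```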

To prevent duplicate content penalties and to consolidate all variants onto a single URL, I had coded a 301 redirect from URI requests containing ':' to URIs using '%3A'. This threw Google into a redirect loop, as Google's bot would still make its request using ':'.
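The loop can be reproduced with a small simulation. This is only a sketch: server_redirect stands in for the 301 rule described above, and crawler_normalize is an assumption about the bot's behaviour based on what I observed, not documented Google behaviour. Both function names are made up for illustration.

```python
def server_redirect(path):
    """Hypothetical server rule: 301 any path with a literal ':' to the %3A form."""
    if ":" in path:
        return path.replace(":", "%3A")
    return None  # no redirect, the page is served


def crawler_normalize(path):
    """Assumed crawler behaviour: turn %3A back into ':' before requesting."""
    return path.replace("%3A", ":").replace("%3a", ":")


def follow(path, max_hops=5):
    """Return how many redirects were followed, giving up at max_hops."""
    hops = 0
    while hops < max_hops:
        requested = crawler_normalize(path)
        target = server_redirect(requested)
        if target is None:
            return hops
        path = target
        hops += 1
    return hops


# The crawler keeps requesting ':' and the server keeps answering with '%3A',
# so the hop limit is reached.
print(follow("/foo/widgets%3A%20blue.html"))  # 5
print(follow("/foo/plain.html"))              # 0
```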

My method for dealing with this issue has been to stop redirecting requests whose URIs contain ':' to URIs using '%3A', and instead to use the following in my HTML header:
<link rel="canonical" href="http://example.com/foo/widgets%3A%20blue.html">
where the URL above is replaced with the properly urlencoded URL of the page in question.
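If the pages are generated by a script, the tag can be built from the unencoded path. A minimal sketch in Python, assuming a hypothetical helper canonical_link and a site that keeps '/' as the only unencoded reserved character in its paths:

```python
from html import escape
from urllib.parse import quote


def canonical_link(base, path):
    """Build a canonical <link> tag with ':' and spaces percent-encoded.

    canonical_link is a hypothetical helper, not part of any library.
    """
    # Keep '/' literal so path separators survive; ':' becomes %3A.
    encoded_path = quote(path, safe="/")
    href = base.rstrip("/") + encoded_path
    # Escape the attribute value in case the URL contains '&' or quotes.
    return '<link rel="canonical" href="{}">'.format(escape(href, quote=True))


print(canonical_link("http://example.com", "/foo/widgets: blue.html"))
# <link rel="canonical" href="http://example.com/foo/widgets%3A%20blue.html">
```

The same tag is emitted whether the page was requested with ':' or with '%3A', so both variants point at one URL without any redirect.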


Thread source: http://www.webmasterworld.com/robots_txt/4052903.htm