Msg#: 4549053 posted 5:44 pm on Feb 26, 2013 (gmt 0)
Hopefully someone can help me understand what's happening here.
I have a site where all URLs were recently 301 redirected to a non-capitalized version of the URL.
A lot of the internal URLs on the site have encoded characters like %2F, and we realized the redirect was sending all of those URLs to the %2f version.
We found Google Webmaster Tools complaining about an 'increase in not followed pages'.
When I do a 'Fetch as Googlebot' on the capitalized %2F URL, the result is a 301 to %2f, all well and good. But if I then copy and paste the lowercase %2f URL and do another Fetch as Googlebot, the grid that shows my requests lists it as %2F (capitalized) and shows another 301 to the lowercase %2f.
There must be something about capitalization and character encoding I don't understand...
Msg#: 4549053 posted 10:14 pm on Feb 26, 2013 (gmt 0)
example.com/page and example.com/Page
are different URLs.
%2F and %2f are the same character, but not all functions can decode both.
Google probably has its own internal function that regularizes [a-f] in encodings to [A-F]. (My logs apparently do the same-- or possibly it's the server itself-- because lower-case letters don't seem to occur.)
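To make the point concrete, here is a minimal Python sketch, not anything Google or Apache is known to run. It shows that both spellings decode to the same character and what an uppercase-normalizing step might look like. The function name `normalize_escapes` and the sample URL are made up for illustration.

```python
import re
from urllib.parse import unquote

# Matches one percent-encoded triplet such as %2f or %2F.
_TRIPLET = re.compile(r"%[0-9A-Fa-f]{2}")


def normalize_escapes(url: str) -> str:
    """Uppercase the hex digits of every percent-encoded triplet.

    Hypothetical helper: it only illustrates the kind of normalization
    described above; everything outside the triplets is left alone.
    """
    return _TRIPLET.sub(lambda m: m.group(0).upper(), url)


if __name__ == "__main__":
    # Both spellings decode to the same character, the slash.
    print(unquote("%2F") == unquote("%2f"))
    # Only the triplets change case; the rest of the URL is untouched.
    print(normalize_escapes("/go?url=http%3a%2f%2fexample.com"))
```

If a crawler applies a step like this before requesting the URL while the server redirects to the lowercase form, the two will keep undoing each other, which would match the repeated 301 described in the first post.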
You shouldn't have encodable characters in the path of your URL anyway. Do you, or are they coming in from query strings? If so, I hope it isn't really %2F since that is the / slash, a character that doesn't belong in a query string. Either way, the path and the query should be handled separately.
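As a rough illustration of handling the path and the query separately, the Python sketch below lowercases only the path and leaves the query string exactly as it arrived. The function name `lowercase_path_only` and the example URLs are assumptions, and the real redirect would more likely live in PHP or Apache configuration.

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_path_only(url: str) -> str:
    """Lowercase the path component and leave the query untouched.

    Hypothetical helper for illustration. Note that it still lowercases
    any percent-encoded triplets that appear in the path itself.
    """
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=parts.path.lower()))


if __name__ == "__main__":
    url = "http://example.com/Page?u=http%3A%2F%2FOther.example%2FPage"
    # The path becomes /page; the encoded off-site URL keeps its case.
    print(lowercase_path_only(url))
```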
Msg#: 4549053 posted 11:21 pm on Feb 26, 2013 (gmt 0)
Overlapping swa because I detoured to look up some stuff.
Yup, that's a query string. And I guess you're stuck with the slashes :)
Since your de-capitalization is concerned only with your own URLs, there should be no need for your off-site redirect page to get involved. It may even create errors if you're changing the capitalization of some other site's page names.
But at this point it's no longer an HTML question but some combination of-- probably-- PHP and Apache.
The quickest temporary fix is to put an exclusion in your redirect code that skips anything in the form %\h\h, where \h stands for a hex digit. Write it out as [\dA-F] or [0-9A-F] if your RegEx dialect doesn't read \h that way; PCRE, which both PHP and Apache use, treats \h as horizontal whitespace. Exact mechanics will again depend on the exact form of the redirecting function.
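One possible shape for that exclusion, sketched in Python rather than in the PHP or Apache rules the site actually uses: lowercase everything except the percent-encoded triplets, and redirect only if that changes the string. The names `lowercase_except_escapes` and `needs_redirect` are hypothetical.

```python
import re

# The capturing group makes re.split keep the triplets in its result.
_TRIPLET = re.compile(r"(%[0-9A-Fa-f]{2})")


def lowercase_except_escapes(url: str) -> str:
    """Lowercase a URL string but leave %XX triplets exactly as they are."""
    pieces = _TRIPLET.split(url)
    # Even indexes hold ordinary text, odd indexes hold the triplets.
    return "".join(p if i % 2 else p.lower() for i, p in enumerate(pieces))


def needs_redirect(url: str) -> bool:
    """True only when something other than a triplet contains uppercase."""
    return lowercase_except_escapes(url) != url


if __name__ == "__main__":
    print(lowercase_except_escapes("/Page%2FSub"))
    print(needs_redirect("/page%2Fsub"))
    print(needs_redirect("/page%2fsub"))
    print(needs_redirect("/Page"))
```

With this check, a request for either the %2F or the %2f spelling of an otherwise lowercase URL is served directly, so a client that rewrites the escapes to uppercase no longer triggers a second 301.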