Home / Forums Index / Code, Content, and Presentation / HTML

HTML Forum

    
Capitalized encoded characters causing issues
when site is set to 301 URLs to lowercase
metamax



 
Msg#: 4549053 posted 5:44 pm on Feb 26, 2013 (gmt 0)

Hi everyone,

Hopefully someone can help me understand what's happening here.

I have a site where all URLs were recently 301 redirected to a non-capitalized version of the URL.

Many of the internal URLs on the site contain percent-encoded characters such as %2F, and we realized the new rule was redirecting all of those URLs to the %2f version.

We found Google Webmaster Tools complaining about an 'increase in not followed pages'.

When I do a 'Fetch as Googlebot' on the capitalized %2F URL, the result is a 301 to %2f, all well and good. But if I then copy and paste the lowercase %2f URL and fetch it as Googlebot, the grid that lists my requests shows it as %2F (capitalized), with another 301 to lowercase %2f.

There must be something about capitalization and character encoding I don't understand...

 

lucy24




 
Msg#: 4549053 posted 10:14 pm on Feb 26, 2013 (gmt 0)

example.com/page
and
example.com/Page

are different URLs.

%2F and %2f are the same character. But not all functions can decode both.

Google probably has its own internal function that regularizes [a-f] in encodings to [A-F]. (My logs apparently do the same-- or possibly it's the server itself-- because lower-case letters don't seem to occur.)
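A minimal Python sketch of the point above: both spellings decode to the same character, and a normalizer can rewrite the hex digits of every escape to upper case. RFC 3986 recommends upper-case hex digits in percent-encodings; whether Googlebot or any particular server does exactly this is not confirmed here. The helper name `uppercase_escapes` is made up for illustration.

```python
import re
from urllib.parse import unquote

# Both spellings decode to the same character, the slash.
assert unquote("%2F") == unquote("%2f") == "/"

# Matches one percent-encoded triplet, in either letter case.
_PCT = re.compile(r"%[0-9A-Fa-f]{2}")

def uppercase_escapes(url: str) -> str:
    """Uppercase the hex digits of each %XX escape; leave everything else alone."""
    return _PCT.sub(lambda m: m.group(0).upper(), url)

print(uppercase_escapes("/redirect?url=http%3a%2f%2fexample.com"))
# prints /redirect?url=http%3A%2F%2Fexample.com
```

If a client normalizes requests this way while the site redirects anything containing capitals to lower case, every request would be redirected again, which would match the behaviour described in the first post.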

You shouldn't have encodable characters in the path of your URL anyway. Do you, or are they coming in from query strings? If so, I hope it isn't really %2F since that is the / slash, a character that doesn't belong in a query string. Either way, the path and the query should be handled separately.
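Handling the path and the query separately could look roughly like this in Python. This is only a sketch: the function name `lowercase_path_only` and the example URL are assumptions, and the site's real redirect code may live in PHP or Apache rules instead.

```python
from urllib.parse import urlsplit, urlunsplit

def lowercase_path_only(url: str) -> str:
    """Lowercase the path component and leave the query string untouched."""
    parts = urlsplit(url)
    # Note: any %XX escapes inside the path itself are lowercased too.
    return urlunsplit(parts._replace(path=parts.path.lower()))

print(lowercase_path_only("/Redirect?url=http%3A%2F%2FOther.example%2FPage"))
# prints /redirect?url=http%3A%2F%2FOther.example%2FPage
```

Because the query is passed through as-is, an encoded off-site URL in a parameter keeps its original capitalization.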

metamax



 
Msg#: 4549053 posted 10:31 pm on Feb 26, 2013 (gmt 0)

Hi Lucy24,

Thanks very much for your help.

Welp, in this case it's a query string that contains an off-site URL, which is the redirect target.

like, /redirect?url=http%3a%2f%2f etc.

So... that's a query string, right?

swa66




 
Msg#: 4549053 posted 10:46 pm on Feb 26, 2013 (gmt 0)

/redirect ... are you sure that whatever is doing the "redirect" is actually working?
Is it validating where it redirects to?
E.g. avoiding redirecting to itself?

A 301 response from a webserver is a redirect.
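The kind of validation asked about here might be sketched as follows in Python. The parameter name `url` comes from the example earlier in the thread; the function name `redirect_target`, the `own_host` argument, and the rule of rejecting anything that is not http(s) are assumptions for illustration, not the site's actual logic.

```python
from urllib.parse import parse_qs, urlsplit

def redirect_target(request_url: str, own_host: str):
    """Return the decoded 'url' parameter, or None if it is missing,
    not an http(s) URL, or points back at own_host."""
    query = urlsplit(request_url).query
    values = parse_qs(query).get("url")  # parse_qs decodes %XX escapes
    if not values:
        return None
    target = urlsplit(values[0])
    if target.scheme not in ("http", "https") or not target.hostname:
        return None
    if target.hostname == own_host.lower():
        return None  # would redirect to the site itself
    return values[0]

print(redirect_target("/redirect?url=http%3a%2f%2fother.example%2fPage", "example.com"))
# prints http://other.example/Page
```

Note that the decoded target keeps its own capitalization, and that `%3a%2f%2f` and `%3A%2F%2F` decode to the same thing.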

lucy24




 
Msg#: 4549053 posted 11:21 pm on Feb 26, 2013 (gmt 0)

Overlapping swa because I detoured to look up some stuff.

Yup, that's a query string. And I guess you're stuck with the slashes :)

Since your de-capitalization is concerned only with your own URLs, there should be no need for your off-site redirect page to get involved. It may even create errors if you're changing the capitalization of some other site's page names.

But at this point it's no longer an HTML question but some combination of-- probably-- PHP and Apache.

The quickest temporary fix is to put an exclusion in your redirect code that skips anything in the form %\h\h (write it as %[\dA-F][\dA-F] or %[0-9A-F][0-9A-F] if your RegEx dialect doesn't treat \h as a hex digit). Exact mechanics will again depend on the exact form of the redirecting function.
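As a rough illustration of such an exclusion, here is a Python sketch that ignores %XX escapes when deciding whether a URL needs the lower-case redirect, and leaves them untouched when building the redirect target. The function names are hypothetical, and the pattern accepts both upper- and lower-case hex digits as a precaution.

```python
import re

# One percent-encoded triplet; Python's re has no \h, so spell the class out.
_PCT = re.compile(r"%[0-9A-Fa-f]{2}")

def needs_lowercase_redirect(url: str) -> bool:
    """True if the URL has upper-case letters outside of %XX escapes."""
    stripped = _PCT.sub("", url)
    return stripped != stripped.lower()

def lowercase_except_escapes(url: str) -> str:
    """Lowercase the URL but copy every %XX escape through unchanged."""
    out, pos = [], 0
    for m in _PCT.finditer(url):
        out.append(url[pos:m.start()].lower())
        out.append(m.group(0))
        pos = m.end()
    out.append(url[pos:].lower())
    return "".join(out)

print(needs_lowercase_redirect("/redirect?url=http%3A%2F%2Fexample.com"))  # prints False
print(lowercase_except_escapes("/Redirect?url=http%3A%2F%2Fexample.com"))
# prints /redirect?url=http%3A%2F%2Fexample.com
```

This still lowercases unencoded text in the query string, so it does not by itself protect the capitalization of another site's page names.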
