
How to have upper & lower case URLs

     
5:13 pm on Apr 2, 2012 (gmt 0)

5+ Year Member



Going through one of our old sites, I've found that many page names are all upper case, all lower case, or a mix. Some of these pages go back to the 1990s, and there are too many to change.

But Google is reporting errors from looking for all-lower-case URLs when the actual URLs and file names use mixed case.

For example, we might have:
www.example.com/DIR1/dir2/Dir3/FileName.html

So how would I have Apache respond to both upper- and lower-case requests? Would it be in the .htaccess file? If so, what would the code be?

Thank you.
6:02 pm on Apr 2, 2012 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



But Google is reporting errors looking for all lower-case URLs when the actual URL and file are a combination.


This usually happens because some stupid scraper running a kiddie script on a Windows box, which is case-insensitive, converts all the scraped URLs to lower case without knowing that the Linux world is case-sensitive.

I've had this happen many times: Google indexes their scraped content, then comes looking for the lower-case URLs.

If you spend time on this, you're probably just throwing time, money and resources at a ZERO ROI problem, a complete boondoggle IMO.

The best defense against this problem is to find the source of those lower-case links and block them from scraping your site.
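Once the scraper shows up in the access logs, a minimal .htaccess sketch for blocking it might look like the following. It assumes mod_rewrite is enabled; the address prefix 192.0.2. (a reserved documentation range) and the user agent ExampleScraperBot are placeholders, not real scrapers.

```apache
RewriteEngine On

# Placeholder values: substitute the address prefix and user agent
# that your logs actually show for the scraper.
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\. [OR]
RewriteCond %{HTTP_USER_AGENT} ExampleScraperBot [NC]
# "-" leaves the URL unchanged; [F] answers 403 Forbidden.
RewriteRule ^ - [F]
```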
6:30 pm on Apr 2, 2012 (gmt 0)

5+ Year Member



Thanks. I normally wouldn't worry that much, except that in the last few months our Google AdSense income (which was nothing to laugh at) has shrunk by almost 50%. So I'm trying to find what might be causing that.

Doesn't mod_rewrite have some way to respond to both upper- and lower-case requests? I know I could create a symbolic link, but that's one file at a time.
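Outside mod_rewrite, Apache also ships a separate module, mod_speling, that handles case mismatches in bulk rather than one file at a time. When a request has no exact match, it looks for a file whose name differs only in case and, if it finds one, redirects the client there, so each page still ends up at a single URL. A sketch, assuming mod_speling is loaded and, for .htaccess use, that AllowOverride includes Options:

```apache
# Requires mod_speling. In .htaccess these need AllowOverride Options.
CheckSpelling On
# Limit corrections to differences in letter case,
# rather than general misspellings.
CheckCaseOnly On
```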
6:35 pm on Apr 2, 2012 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Going through one of our old sites, many page names are either all upper case, all lower case, or a mix. Some of these pages go back to the 1990's and there are too many to change.


Mixed-case names are difficult to deal with in regex; however, if you're able to determine some consistency in the original file names, then an expression is certainly possible.

There are three portions of regex that you'll need to explore:

[a-z] (any single lower-case letter)
[A-Z] (any single upper-case letter)
{n} (exactly n of the preceding item)

You'll need to combine these with other expressions to cover whatever naming patterns you've identified in the original file names.
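A sketch of how those pieces can combine, under a hypothetical naming convention: legacy top-level directories are three capital letters plus a digit, like DIR1. Because RewriteMap cannot be declared in .htaccess, this assumes access to the vhost config, and the map name uc is arbitrary. It only corrects the first path segment, so a URL like /dir1/dir2/dir3/filename.html would still need further rules for Dir3 and FileName.html.

```apache
# Virtual host context only: RewriteMap is not allowed in .htaccess.
RewriteEngine On
RewriteMap uc int:toupper

# Hypothetical convention: real directories match [A-Z]{3}[0-9] (DIR1).
# A request for the same name in lower case, [a-z]{3}[0-9] (dir1), is
# redirected to the upper-case directory, but only if it really exists.
RewriteCond %{DOCUMENT_ROOT}/${uc:$1} -d
RewriteRule ^/([a-z]{3}[0-9])/(.*)$ /${uc:$1}/$2 [R=301,L]
```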
8:22 pm on Apr 2, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



You could return content for whatever casing is requested, but then you'd have a huge duplicate-content issue on your hands. If the wrong casing is requested, you should return a 404.

Make sure all new content uses all lower-case URLs wherever possible from now on.
8:58 pm on Apr 2, 2012 (gmt 0)

5+ Year Member



Anything from the last ten years is lower case. I expect the entire site will be redone with a CMS, so I'm not looking to spend too much time on it.

Right now, when a URL is requested with the incorrect case, the site returns the home page. How would I set it to return a 404? Would that go in the .htaccess file in the docroot, or in the vhost file? Thanks.
9:56 pm on Apr 2, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



How does it "return the home page"?

Does it return the home page *content* at the originally requested URL, i.e. at the URL that should have been 404? What HTTP status code is returned in the HTTP Header for this request?

OR

Does the site issue a 301 or a 302 redirect from the requested URL to the alternative URL of either example.com/ or example.com/index.html or similar? After the redirect, what HTTP status code is returned for the second HTTP request?
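One common cause of the "home page instead of 404" symptom, offered as something to check rather than a diagnosis, is an ErrorDocument that points at a full URL. Apache's documentation notes that a remote URL in ErrorDocument makes the server send the client a redirect instead of the error status. A sketch of the difference (the /notfound.html page is hypothetical):

```apache
# Problematic: because this is a full URL, a missing page is answered
# with a 302 redirect to the home page, so no 404 status is ever sent.
#ErrorDocument 404 http://www.example.com/

# Preferred: a local path serves the custom page while keeping the
# 404 status on the originally requested URL.
ErrorDocument 404 /notfound.html
```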
10:17 pm on Apr 2, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



AFAIK, everything within htaccess is case-sensitive unless you tell it not to be (e.g. with the [NC] flag). If you've already got code in place that redirects to the home page, then you've done the hard part: identifying the culprits. Just replace the redirect with

RewriteRule {blahblah} - [G]

to return a 410 (Gone). Or get rid of the redirect entirely and it will revert to a 404, assuming you're on a case-sensitive server. But 410 is probably what you want: it should make g### crawl the pages less often and stop sooner.
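Filled in for the example URL from the first post, a sketch for the docroot .htaccess might look like this. The lower-case path is an assumption about what Google is requesting; use the exact URLs from the crawl-error report.

```apache
RewriteEngine On

# In a docroot .htaccess the pattern is matched without the leading slash.
# No [NC] flag, so the match is case-sensitive and the real
# /DIR1/dir2/Dir3/FileName.html is not affected.
# "-" means no substitution; [G] sends 410 Gone.
RewriteRule ^dir1/dir2/dir3/filename\.html$ - [G]
```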
 
