Forum Library, Charter, Moderators: Ocean10000 & incrediBILL & phranque

Apache Web Server Forum

    
How to have upper & lower case URLs
lappert2001




msg:4436288
 5:13 pm on Apr 2, 2012 (gmt 0)

Going through one of our old sites, I found that many page names are all upper case, all lower case, or a mix. Some of these pages go back to the 1990s and there are too many to change.

But Google is reporting errors looking for all lower-case URLs when the actual URL and file are a combination.

For example, we might have:
www.example.com/DIR1/dir2/Dir3/FileName.html

So how would I have Apache respond to either upper or lower case requests? Would it be in the .htaccess file? If so, what would the code be?

Thank you.

 

incrediBILL




msg:4436318
 6:02 pm on Apr 2, 2012 (gmt 0)

But Google is reporting errors looking for all lower-case URLs when the actual URL and file are a combination.


This usually happens because some stupid scraper with some kiddy script on a Windows box, which is case insensitive, converts all the scraped URLs to lower case and doesn't know any better that the Linux world is case sensitive.

I've had this happen many times: Google indexes their scraped content, then comes looking for lower case URLs.

If you waste time on this, you're probably just throwing time, money and resources after a ZERO ROI, a complete boondoggle IMO.

The best defense to this problem is to find the source of those lower case links and block it from scraping your site.

lappert2001




msg:4436328
 6:30 pm on Apr 2, 2012 (gmt 0)

Thanks. I normally wouldn't worry that much except in the last few months our Google Adsense income (which was nothing to laugh at) has shrunk by almost 50%. So I'm looking to find what might be the cause of that.

Doesn't mod_rewrite have some code to respond to UC/lc requests? I know I could do a symbolic link, but that's one file at a time.

wilderness




msg:4436333
 6:35 pm on Apr 2, 2012 (gmt 0)

Going through one of our old sites, many page names are either all upper case, all lower case, or a mix. Some of these pages go back to the 1990's and there are too many to change.


Mixed case names are difficult to deal with in regex; however, if you're able to determine some consistency in the original file names, then an expression is certainly possible.

There are three portions of regex that you'll need to explore:

[a-z]
[A-Z]
{n}

You'll need to combine these with other expressions to cover whatever consistent patterns you find in the original file names.
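
As a rough sketch of how those pieces combine, assume (purely for illustration) that only the top-level directory uses capitals, as in DIR1, and that everything beneath it is lower case. The directory layout and the depth limit below are hypothetical, not taken from the actual site:

```apache
RewriteEngine On

# Hypothetical convention: top-level directory is "DIR" plus one digit,
# and all directories and file names below it are lower case.
#   [a-z0-9]+   one or more lower-case letters or digits
#   {0,3}       between zero and three intermediate directories
# A request for /dir1/dir2/page.html is redirected to /DIR1/dir2/page.html.
# The pattern is case-sensitive (no [NC]), so the corrected URL does not
# match again and there is no redirect loop.
RewriteRule ^dir([0-9])/((?:[a-z0-9]+/){0,3}[a-z0-9]+\.html)$ /DIR$1/$2 [R=301,L]
```

The same approach works with [A-Z] for names that should be lower case but are requested in capitals. Each distinct naming convention needs its own rule, which is why consistency in the original names matters.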

g1smd




msg:4436385
 8:22 pm on Apr 2, 2012 (gmt 0)

You could return content whatever casing is requested, but then you'd have a huge duplicate content issue on your hands. If the wrong casing is requested you should return 404.

Make sure all new content uses all lower-case URLs wherever possible from now on.
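
For reference, the original question (answering requests in any case) is usually handled with mod_speling rather than mod_rewrite, although nobody in this thread mentions it. As far as I know, when it finds exactly one file whose name differs only by case, it answers with a redirect to the correctly cased URL instead of serving the content at the wrong URL, which avoids the duplicate content problem described above. A minimal sketch, assuming the module is loaded and that the server allows these directives in .htaccess (AllowOverride Options):

```apache
# Assumes mod_speling is available on the server.
<IfModule mod_speling.c>
    # Look for a close match when the requested file is not found
    CheckSpelling On
    # Restrict corrections to upper/lower case differences only
    CheckCaseOnly On
</IfModule>
```

The trade-off is an extra directory scan on every request that would otherwise be a 404, so returning a plain 404 remains the cheaper option.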

lappert2001




msg:4436404
 8:58 pm on Apr 2, 2012 (gmt 0)

Anything in the last ten years is lower case. I expect the entire site will be redone with a CMS, so not looking to spend too much time on it.

When a URL is requested with the incorrect case, the site currently returns the home page. How would I set it to return a 404? Would that be in .htaccess in the docroot, or in the vhost file? Thanks.

g1smd




msg:4436425
 9:56 pm on Apr 2, 2012 (gmt 0)

How does it "return the home page"?

Does it return the home page *content* at the originally requested URL, i.e. at the URL that should have been 404? What HTTP status code is returned in the HTTP Header for this request?

OR

Does the site issue a 301 or a 302 redirect from the requested URL to the alternative URL of either example.com/ or example.com/index.html or similar? After the redirect, what HTTP status code is returned for the second HTTP request?
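
One common cause of either behaviour is the ErrorDocument directive. This is only a guess at the setup, since the thread does not show the site's configuration, and the file path below is hypothetical:

```apache
# Full URL: Apache answers the bad request with a redirect to the home page,
# so the client never receives the original 404 status.
# ErrorDocument 404 http://www.example.com/

# Local path to the home page: the home page content is served at the
# requested URL, and the 404 status is kept.
# ErrorDocument 404 /index.html

# Local path to a dedicated error page (hypothetical file): the 404 status
# is kept and visitors see a proper "not found" page.
ErrorDocument 404 /errors/not-found.html
```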

lucy24




msg:4436433
 10:17 pm on Apr 2, 2012 (gmt 0)

AFAIK, everything within .htaccess is case-sensitive unless you tell it not to be. If you've already got code in place that redirects to the home page, then you've done the hard part: identifying the culprits. Just replace the redirect with

{blahblah} - [G]

to return a 410. Or get rid of the redirect entirely and it will revert to 404, assuming you're on a case-sensitive server. But 410 is probably what you want, and it should make g### crawl the pages less often and stop sooner.
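
Spelled out, that rule might look like the sketch below. The pattern is a hypothetical stand-in for {blahblah}, using the all-lower-case form of the example path from the first post:

```apache
RewriteEngine On

# Hypothetical pattern: the lower-cased version of the example URL.
# Rule patterns are case-sensitive unless [NC] is added, so the correctly
# cased /DIR1/dir2/Dir3/FileName.html is not affected.
# "-" means no substitution; [G] returns 410 Gone.
RewriteRule ^dir1/dir2/dir3/filename\.html$ - [G]
```

This assumes the rule lives in the .htaccess file in the docroot, where the pattern is matched against the path without its leading slash.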


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved