errordocument for incorrect url

double slashes

         

soapystar

5:59 pm on Aug 30, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I am trying to use .htaccess to return a 404 for any requested URL that contains a double slash.

For example:
www.domain.com//page.htm
www.domain.com//directory/newpage.htm

They all resolve and return a page. I believe the best solution is to return a 404, and I was wondering about the best way to write this to cover all possible occurrences of //.

Thanks for any help!

Caterham

10:22 pm on Aug 30, 2006 (gmt 0)

10+ Year Member



Use a condition to check r->unparsed_uri, which is part of THE_REQUEST.

If you're using a recent version of Apache, you can use the R flag to force a 404 Not Found.

RewriteEngine on
RewriteCond %{THE_REQUEST} //
RewriteRule ^ - [R=404]

jdMorgan

11:48 pm on Aug 30, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've been using this to 'correct' such URLs:

# Fix extra leading slashes in URL
RewriteCond %{REQUEST_URI} ^//+(.*)
RewriteRule .* /%1 [R=301,L]

Note that [R=404] won't work on older versions of Apache. For these older Apache versions, you can simply rewrite the request to a file-path that you know does not exist, and let the default 404 error handler handle it.
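
A minimal sketch of that fallback, assuming the double-slash test is made against THE_REQUEST. The target path /no-such-file-for-404 is a hypothetical name chosen for illustration; any path known not to exist on the server would do.

```apache
RewriteEngine on
# Match "//" in the path part of the raw request line (before any "?")
RewriteCond %{THE_REQUEST} ^[A-Z]+\ [^?\ ]*//
# Hypothetical target: nothing exists at this path, so the default handler answers 404
RewriteRule .* /no-such-file-for-404 [L]
```

Restricting the match to the text before the "?" keeps a "//" inside a query string from triggering the rule.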

Jim

soapystar

8:14 am on Aug 31, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for that, guys!

I wonder what the best fix is for already-indexed duplicates, then: 301s or 404s? But I guess that's a question for a different forum!

soapystar

8:46 am on Aug 31, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



jd, that code only seems to work for double slashes at the top-level directory. It's not redirecting deeper directories such as:

domain.com/level1/level2//page.htm

jdMorgan

4:01 pm on Aug 31, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> that code only seems to work for double slashes on the top level directory

That's true, and the code is documented as behaving in that way...

# Fix extra leading slashes in URL

Really only a small tweak is needed to modify it:


# Fix double slashes in URL
# %1 and %2 are the RewriteCond captures; %1 is empty or already begins with "/"
RewriteCond %{REQUEST_URI} ^(.*)//+(.*)
RewriteRule .* %1/%2 [R=301,L]

Note that this will correct only one instance of double-slash and then redirect. If you ever see multiple instances of double-slashes in requests, then you may want to modify the code to correct multiple instances before doing the redirect. Otherwise, you'll get multiple redirects, with one instance of double-slash removed per redirect.
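
One way to cut down on the number of redirects, sketched here under the assumption of Apache 2.x (whose PCRE regular expressions support the lazy quantifier .*?), is to list patterns for several runs of slashes and try the longest first. The maximum of three runs is an arbitrary choice for illustration; a URL with more runs than that still needs a further redirect.

```apache
RewriteEngine on
# Three separate runs of slashes, corrected in one redirect
RewriteCond %{REQUEST_URI} ^(.*?)//+(.*?)//+(.*?)//+(.*)$
RewriteRule .* %1/%2/%3/%4 [R=301,L]
# Two separate runs of slashes
RewriteCond %{REQUEST_URI} ^(.*?)//+(.*?)//+(.*)$
RewriteRule .* %1/%2/%3 [R=301,L]
# A single run of slashes
RewriteCond %{REQUEST_URI} ^(.*?)//+(.*)$
RewriteRule .* %1/%2 [R=301,L]
```

With the lazy first group, a request for /a//b//c captures "/a", "b" and "c" in the second rule and is redirected straight to /a/b/c.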

For already-indexed URLs, a 301 is the correct response.

Jim

Caterham

7:27 pm on Aug 31, 2006 (gmt 0)

10+ Year Member



Keep in mind that, from the 2.0 branch onward, r->uri does not contain multiple leading slashes, so REQUEST_URI won't catch them there.
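
A sketch of a leading-slash fix that tests THE_REQUEST instead, so that it does not depend on what r->uri contains. The pattern and the NE flag are illustrative choices, not something given earlier in the thread.

```apache
RewriteEngine on
# The raw request line looks like: GET //page.htm HTTP/1.1
# Capture everything after the leading slashes, up to a "?" or the next space
RewriteCond %{THE_REQUEST} ^[A-Z]+\ //+([^?\ ]*)
# NE: the captured text is still URL-encoded, so do not escape it a second time
RewriteRule .* /%1 [R=301,NE,L]
```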

soapystar

6:27 pm on Sep 6, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



jd, when I say "already indexed content," what I mean is that Google has my double-slash pages in the supplemental index and the duplicate page with a single slash in the normal index. I wonder if it's better to serve a 404 for those duplicates that I want removed rather than associate them further with the real page.

jdMorgan

2:40 am on Sep 7, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No, a 301 is the correct response, especially if someone is linking to that URL.

Remember that Google indexes/lists/analyzes URLs, not pages. In fact, dynamic sites don't have 'pages' at all -- everything is generated on-the-fly by one script or a few scripts.

If they find a URL, then that URL exists, plain and simple. Therefore, the 301 is used to say, "The content you want is now located at this (new/different) URL. Please re-request it from that URL." This tells the search engines to throw away or ignore the current URL and use the one you provide with the 301 response.

A 404 is generally an indicator of a poor-quality site, while a 301 --if there are not too many of them and they are not used to make up for too many URLs that change too frequently-- indicates that "someone cares for this site enough to ask us to correct this URL."

If you feel you must reject these double-slash requests, then use a 410-Gone response. This tells the search engines unequivocally that the resource associated with the requested URL has been removed and will not come back.
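
A minimal sketch of the 410 variant, using mod_rewrite's G (Gone) flag. The THE_REQUEST pattern is an assumption carried over from the 404 discussion, restricted to the part of the request line before any query string.

```apache
RewriteEngine on
RewriteCond %{THE_REQUEST} ^[A-Z]+\ [^?\ ]*//
# "-" means no substitution; G makes Apache answer 410 Gone
RewriteRule .* - [G]
```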

But both 404 and 410 say the resource is not available and, if a customized error document is not used to provide some help in finding what was requested, that the Webmaster can't be bothered to help visitors who request that URL...
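
As a sketch of such a customized error document: the /errors/ paths below are hypothetical and would need to exist on the server. Because THE_REQUEST still holds the original request line while Apache fetches the error page internally, the rule excludes /errors/ so that the error page itself is not rejected.

```apache
# Hypothetical help pages served with the 404 and 410 status codes
ErrorDocument 404 /errors/not-found.html
ErrorDocument 410 /errors/gone.html

RewriteEngine on
# Do not apply the rule to the internal request for the error page
RewriteCond %{REQUEST_URI} !^/errors/
RewriteCond %{THE_REQUEST} ^[A-Z]+\ [^?\ ]*//
RewriteRule .* - [G]
```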

Jim