Forum Moderators: phranque


how do I turn a 200 into a 404

How do I get incorrect urls to return a 404?


proboscis

10:08 pm on Jul 17, 2006 (gmt 0)

10+ Year Member



Hello,

Sorry I know nothing on this subject so I hope I am asking the right question...

Google has somehow indexed hundreds or maybe thousands of incorrect urls from my site and the incorrect urls return a status code of 200 ok. I have been told that I can get the bad urls back out of the index by having them return a 404 instead.

One of the problems is the double //

This would be a correct url:

example.com/directory/

Then the bad url that I would want to 404:

example.com//directory/

The other major problem is the extra /

This url is correct:

example.com/page.html

and an incorrect url:

example.com/page.html/

So how would I do that? And should I do that? I don't want to cause more problems than I fix :)

Advice appreciated. Thanks!

jdMorgan

11:17 pm on Jul 17, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This problem is fixable with mod_rewrite, but you'll need to become conversant with that Apache module before using it. As someone just posted in another thread, "I've brought my server down with a single typo."

For more information, see the documents cited in our forum charter [webmasterworld.com] and the tutorials in the Apache forum section of the WebmasterWorld library [webmasterworld.com].

I've seen this problem before, so I'll go look and see if I can find the code. But generally, you won't be able to just copy it and expect it to work, so check out those links.

Jim

[edited by: jdMorgan at 2:41 am (utc) on July 18, 2006]

mr_lumpy

8:41 am on Jul 18, 2006 (gmt 0)

10+ Year Member



Thanks for the links jdMorgan, I'm going over them right now.

So is it possible, using a Perl snippet in my Perl program, to return a 404 page? I can spit out 200 OK pages with no problem - I just can't master the trick of getting Perl to output a 404 when I need to. Any particular info I should look up?

Thanks again for your time and help!

mr_lumpy

jdMorgan

3:23 pm on Jul 18, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you want/need to use Perl, then all you have to do is output a 404 response status header when your script cannot generate the requested page content. Something like this:

print ("Status: 404 Not Found\n");
print ("Cache-Control: no-store\n");
print ("Content-Type: text/html\n\n");
# Note: the two linefeeds at the end of the line above are required to mark the end of the HTTP headers.
print ("<html>\n<head>\n");
print ("<meta http-equiv=\"Content-Type\" content=\"text/html; charset=iso-8859-1\">\n");
print ("<meta http-equiv=\"Content-Language\" content=\"en-US\">\n");
print ("<title>Resource Not Found</title>\n");
print ("</head>\n");
print ("<body text=\"#000000\" bgcolor=\"#FFFFFF\" link=\"#000099\">\n");
print ("<center><h1><font face=\"Arial,Helvetica\" color=\"#CC0000\">Requested page cannot be found</font></h1></center>\n");
print ("<p><font face=\"Arial,Helvetica\">The page you are looking for is one that has been removed or replaced.</font></p>\n");
print ("<p><font face=\"Arial,Helvetica\">Please visit our <a href=\"/site-map.html\">Site Map</a> or ");
print ("our <a href=\"/\">Home Page</a> to find your way around our site.</font></p>\n");
print ("</body>\n</html>\n");

This was taken off a live server, but modified and simplified for posting here. There may be typos in it.

Jim

proboscis

10:48 pm on Jul 18, 2006 (gmt 0)

10+ Year Member



Is this close?

RewriteCond %{QUERY_STRING} shtml/
RewriteRule ^(.*) [R=404]

The goal is to make any URL that ends with a / return a 404.

Thanks:)
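[Editor's note: the snippet above won't work as written - RewriteRule needs both a pattern and a substitution, and the trailing slash is in the URL path, not the query string. A sketch of a rule closer to the stated goal, assuming Apache 2.2 or later, where the R flag accepts a non-3xx status code when the substitution is "-":]

```apache
# Return 404 for any URL whose path has a trailing slash after a file extension
RewriteEngine on
RewriteRule ^(.+\.[a-zA-Z0-9]+)/$ - [R=404,L]
```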

LunaC

1:22 am on Jul 19, 2006 (gmt 0)

10+ Year Member



I'm having a similar problem; what I did to fix page.html/ was to use a 301 redirect.

Here's what I'm using in my .htaccess. (I got this from this forum - I am in no way an expert or even remotely literate in this stuff. Oh, and this code won't work if you're on a Windows server, just Unix.)

This goes in .htaccess:

Options +FollowSymLinks
RewriteEngine on
# Remove multiple slashes anywhere in URL
RewriteCond %{REQUEST_URI} ^(.*)//(.*)$
RewriteRule . http://www.example.com%1/%2 [R=301,L]
#
# Remove trailing slash if filetype present in URL
RewriteRule ^(.+\.[^/]+)/$ http://www.example.com/$1 [R=301,L]

It's taking a while, but the bots are following the redirects to the proper urls on my site, and slowly the bad urls are dropping out of the search results while the good ones are staying in.

If you try this make sure you test the headers to be sure they really are returning a 301 permanent redirect (you don't want a 302 temporary redirect). Test, test and re-test every variation you can think of... this is hugely important... you want a 301 response.

If you're a Firefox user, try the 'Live HTTP Headers' extension. It gives the most detailed results I've found. Be sure to clear your cache before testing, otherwise it will still show the old results.

If you're not a Firefox user, search for a header checker and you'll find a few online.

Other things to watch out for are being indexed both with and without the www (i.e. www.example.com and example.com) - Google has been having issues with that for ages - and example.com/ vs. example.com/index.html. Those can both cause indexing troubles too.
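[Editor's note: a sketch of the usual fix for the www/non-www split, in the same .htaccess style as above, assuming www.example.com is the preferred host - adjust to your own domain:]

```apache
# 301-redirect bare-domain requests to the www host
RewriteEngine on
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```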

jdMorgan

2:12 am on Jul 19, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The 301-redirect solution posted by LunaC is preferable to using a 404. This allows you to 'recover' any traffic arriving via the bad links, as well as telling the search engines to 'correct' the URL.

Jim