redirected 404 - But

Still get 404 in error_log and Google WMT's

         

dougwilson

5:37 pm on Apr 3, 2012 (gmt 0)

10+ Year Member Top Contributors Of The Month



I redid some pages, one of which is a sort of PDF library just for site users. With all the PDF search engines out there, I was getting many hundreds of Not Found errors, and I have no desire to redirect each one.

I added an .htaccess file to the directory with one line in it:

ErrorDocument 404 /pdf/

So all the malformed nonsense goes to index.html where anything that does exist resides.

Good for people, but machines still record it as a 404, which I don't need. Solutions welcome, thanks - Doug

g1smd

10:23 pm on Apr 3, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The ErrorDocument directive is there to tell the server what file to serve when the requested URL does not exist. In this case you're telling it to serve the index file in the /pdf/ folder when the requested document does not exist.
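As a hedged illustration of the directive described above: when the target is a local URL-path, Apache serves that document internally and the response keeps its 404 status. When the target is a full URL, Apache sends the client a redirect instead, so the original 404 is not reported. The hostname below is a placeholder, not something taken from this thread.

```apache
# .htaccess in the /pdf/ directory

# Local URL-path: the /pdf/ index is served as the error page,
# and the response status remains 404 Not Found.
ErrorDocument 404 /pdf/

# Full URL (commented out; www.example.com is a placeholder):
# Apache would send a redirect to this address instead, so the
# client would not see the original 404 status.
# ErrorDocument 404 http://www.example.com/pdf/
```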

When something does not exist at the URL that was requested, the correct response is to return the 404 Not Found status code. You could return 410 Gone instead. Whatever you do, there will always be a log entry for the access and some sort of status code returned. That's how a web server works.
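A minimal sketch of returning 410 Gone with mod_alias, assuming a hypothetical retired file and folder (neither name comes from this thread):

```apache
# Return 410 Gone for one retired file (hypothetical path).
Redirect gone /pdf/old-report.pdf

# Or for every PDF under a retired subfolder (hypothetical folder name).
RedirectMatch gone ^/pdf/archive/.*\.pdf$
```

Either way the request is still written to the access log; only the status code changes.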

dougwilson

5:10 am on Apr 8, 2012 (gmt 0)

Thanks, 410 and 404 get dumped in the same box in Google WMT.

"there will always be a log entry for the access and some sort of status code returned"

Yeah, I found out there's no way around this. It is what I was after but, Oh well.

At least, set up this way, people who follow broken links get to the right page. That's the #1 priority. I've also set things up so all those scraped links get a 403, while any malformed requests make it to the page. I set meta nofollow and noarchive. I'll watch this for a while and see what happens.
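The post does not say how the scraped links are identified. One possible approach, sketched here under the assumption that they can be recognised by referrer, is a mod_rewrite rule that answers 403 Forbidden. The domain scraper-example.com is a placeholder.

```apache
# .htaccess sketch: refuse requests referred by a known scraper domain.
RewriteEngine On

# Match the placeholder domain and any of its subdomains, case-insensitively.
RewriteCond %{HTTP_REFERER} ^https?://([^/]+\.)?scraper-example\.com [NC]

# "-" means no substitution; [F] returns 403 Forbidden.
RewriteRule ^ - [F]
```

Requests with no matching referrer fall through to the normal handling, including the ErrorDocument for anything that does not exist.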

I thought there must be some way to knock out the "in addition a 404 was issued", like using the [L]. I guess I could 301 the 404, but I'm not sure how to do it right.

g1smd

7:16 am on Apr 8, 2012 (gmt 0)

For non-existent pages you can return 404 status at the originally requested URL along with a human-readable error message page OR you can 301 redirect the URL request to a different URL, and then return 200 OK status for that other URL when it shows a content page. Only redirect when a page has moved, or there's a good-fit replacement page for the old content. When content has gone for good, return 404 or 410.

Any other combination of status codes is a configuration error. You especially cannot combine returning a 404 status code with returning a 301 status code.
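The two valid combinations can be sketched roughly as follows. The file names and hostname are hypothetical, chosen only to illustrate the distinction:

```apache
# Case 1: the content is gone. Serve a human-readable page,
# keeping the 404 status at the requested URL.
ErrorDocument 404 /pdf/

# Case 2: the content has moved. Send a 301 to the new URL,
# which then answers 200 OK with the real content.
# (old-name.pdf, new-name.pdf and www.example.com are placeholders.)
Redirect 301 /pdf/old-name.pdf http://www.example.com/pdf/new-name.pdf
```

A given URL should fall into one case or the other, never both.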

dougwilson

12:30 pm on Apr 8, 2012 (gmt 0)

thanks, watching to see results from current config.

dougwilson

10:26 am on Apr 16, 2012 (gmt 0)

set /pdf/ 404 405.shtml, sent error to logger, found about 12 scraper site linked (wrongly), blocked domains, removed errorDocument [R].

g1smd

11:09 am on Apr 16, 2012 (gmt 0)

set /pdf/ 404 405.shtml, sent error to logger, found about 12 scraper site linked (wrongly), blocked domains, removed errorDocument [R].

Somewhat cryptic, but appears to report success. :) Good job!

dougwilson

1:22 pm on Apr 28, 2012 (gmt 0)

Cracked collar bone and dislocated shoulder so it's been hard to type.

I recently modified the feedback.html form I've been using to act as a "Complete Captcha to continue" page.

Now all the malformed links from scraper sites, and others, go to captcha.

Also use it to see who's requesting - login.php, myadmin, WootWoot and the like.

So far it's working really well, with SSI to the page and a nice printout.

thesitewizard-com (C. Heng) provided the form and
PerlScriptsJavaScripts-com provided the 404 Alerter which can be set up to report as desired.

If someone wants to they can make links live for others.