Forum Moderators: Robert Charlton & goodroi
If you search Google for "http error codes", one of the top results is a W3C page which states:
10.4.4 403 Forbidden: The server understood the request, but is refusing to fulfill it. Authorization will not help and the request SHOULD NOT be repeated. If the request method was not HEAD and the server wishes to make public why the request has not been fulfilled, it SHOULD describe the reason for the refusal in the entity. If the server does not wish to make this information available to the client, the status code 404 (Not Found) can be used instead.
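That "explain the refusal in the body, or hide the page behind a 404 instead" choice can be sketched like this (a toy Python example; the function name and messages are made up for illustration):

```python
def forbidden_response(explain: bool) -> tuple[int, str]:
    """Status code and body for a request the server refuses.

    Per RFC 2616 10.4.4: a 403 SHOULD explain the refusal in the
    entity; a server that doesn't want to reveal the resource even
    exists can answer 404 instead.
    """
    if explain:
        return 403, "Access denied: members-only area."
    return 404, "Not Found"

print(forbidden_response(explain=True))   # explains the refusal
print(forbidden_response(explain=False))  # hides the page entirely
```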
Could you also talk about how Google reacts to 410, 500, 404, 302, and 301? Will a 410 (Gone) url receive requests in the future? What about a 500 (Server Error)? Or a 404 (Not Found)?
And do mention whether any of the error codes will prevent Google from ever trying to fetch a page again.
Thanks.
So the answer to 500 server error is:
The bot will leave your site immediately and try again later.
As for 404 file not found:
If there's a link taking the bot to the missing file, it will try to follow the link again on the next crawl.
301: obviously the bot follows the redirect just like users, and treats the new site as the old.
[edited by: callivert at 12:46 am (utc) on Aug. 24, 2007]
301 - Google will index the target url and its information
302 - Google will index the original url, but with the target url's information. This is only true within the same domain. If the 302 is on a different domain than its target, then Google will index the target url and the target url's information
403 - Google will try again. They know mistakes happen
404 and 410 - Treated identically, at present. Google will continue to spider the url. Any previous content from those urls may hang around in the Supplemental Index for several months or more, but it will be removed from the regular index rather quickly.
500 - Google will try again.
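The list above can be summarized as data (a sketch; the behaviors are paraphrased from this thread, not from any official Google documentation):

```python
# Googlebot behavior per status code, as described in this thread.
GOOGLEBOT_BEHAVIOR = {
    301: "indexes the target url and its information",
    302: "indexes the original url with the target's information "
         "(same-domain only; a cross-domain 302 indexes the target)",
    403: "retries later -- mistakes happen",
    404: "keeps spidering the url; old content may linger in the "
         "Supplemental Index but leaves the regular index quickly",
    410: "treated identically to 404, at present",
    500: "retries later",
}

def describe(code: int) -> str:
    """Look up the thread's description of how Googlebot reacts."""
    return GOOGLEBOT_BEHAVIOR.get(
        code, "unknown -- Googlebot will likely retry anyway")
```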
In general, Google tries over and over to be sure the url is really gone. Googlebot has a huge appetite for urls and doesn't want to miss anything.
As it is, they'll keep trying no matter what error code you give them.
Jim
I think this might have caused my supplemental x-files event on that page I mentioned on a different thread.
I'll keep you updated on how soon it'll try to crawl again the 403ed pages.
Otherwise, it takes about 15 minutes to rewrite the entire directory structure of my site, so ... as a last resort ... I'll do it.
We'll see!
[edited by: TheSeoDude at 1:15 am (utc) on Aug. 24, 2007]
So for all those that funk up once in a while, there's still hope ... unless you 301.
-
On a 301, will Google ever look again at the page that issued it, or will he just replace it with the new url permanently? That's what he should do ...
-
PS: I'm addressing a robot with "he". I'm lost! He might be a she. It would explain the moods.
Yes, and regularly. Since 301 redirects can transfer backlink influence of all kinds, if that redirect goes away, Google wants to know about it.
In general, Google tries over and over to be sure the url is really gone. Googlebot has a huge appetite for urls and doesn't want to miss anything.
I've noticed that some CMS software will temporarily lock a page while you're editing it - if Googlebot or somebody comes along and hits that page, they get a 404 until you are done. I think WordPress does this. It's not that big of a deal, but I helped a friend track down why some pages were showing in his logs as 404 even though they weren't, and this was why.
404 and 410 - Treated identically...
I'm just back from SES San Jose, and I'd asked about this at the Meet the Crawlers session. It turns out that Google, Yahoo, MSN, and Ask all treat 404s and 410s the same.
Google still hasn't figured out 410-Gone. They want to 'forgive' Webmaster errors, so they keep trying.
Yes, this was the reason several engineers gave... that we'd be amazed at the inappropriateness of the headers they see, so treating a 410 as a 404, in their eyes, is safer.
Not to get into a long discussion about Ask, Ask recommended that for their engine, instead of a 410, you use a robots.txt disallow to remove a page from the index.
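For Ask, that robots.txt removal might look like this (assuming Ask's crawler still answers to the Teoma user-agent, and a hypothetical page path):

```
User-agent: Teoma
Disallow: /old-page.html
```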
I've noticed that some CMS software will temporarily lock a page while you're editing it - if Googlebot or somebody comes along and hits that page, they get a 404 until you are done.
I believe it was Google that suggested you send a 500 or a 503 when you anticipate temporary site problems, and the other engines apparently concurred.
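A minimal sketch of that advice in Python (the function name, the one-hour Retry-After value, and the body text are illustrative assumptions, not anything the engines specified):

```python
def build_response(maintenance: bool) -> tuple[int, dict, bytes]:
    """Status, headers, and body for a crawler-friendly outage page."""
    if maintenance:
        # 503 signals a temporary condition; Retry-After hints when
        # the crawler should come back rather than dropping the url.
        return 503, {"Retry-After": "3600"}, b"Down for maintenance"
    return 200, {}, b"OK"
```

In a real server (e.g. http.server's BaseHTTPRequestHandler) you'd emit these via send_response() and send_header() while the maintenance window is open.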
301 - Google will index the target url and its information
Another aside about Ask.... I have some 301s that Ask hasn't followed for some three years now, and I asked them specifically whether this might in fact be on purpose rather than a glitch, and they said that yes, it might have been intentional. If they see something on a site that suggests to them that you might be buying an old domain and rebranding it, they might not follow the 301. We didn't have time to go into the specifics at the session... I'll be contacting the engineer... nor did I have time to ask the other engines whether some of their apparent 301 glitches might be intentional as well, but the Ask answer does suggest that's a possibility.
Forever is a very long time.
I would always expect search engines to recheck "gone" URLs from time to time in order to see if they ever came back... a year, a decade, or a century later...
What if there was a URL on your site that didn't ever get indexed, and you could not understand why; and the answer was that the URL had been marked "Gone Forever" several decades ago and three owners back?
What if you bought an expired domain that, while parked, had returned "Gone" for every request, for every one of their tens of millions of previously indexed pages, including root? Forever would mean they would never index your new site.
Indeed, specs are made for machines and error-free environments, not for humans.