Forum Moderators: Robert Charlton & goodroi


Google & HTTP Header Status Codes


austtr

1:51 am on Nov 22, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Q1) Does a 410 GONE statement need to remain in the .htaccess after the specified URLs are gone from Google’s index? Once the instruction has been actioned, it would seem unnecessary to keep repeating it. Correct or not?
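
For reference, a 410 in .htaccess is typically set with mod_alias's `Redirect gone` or mod_rewrite's `[G]` flag. A minimal sketch (the paths and pattern here are made up for illustration):

```apache
# mod_alias: return 410 Gone for a single retired URL (hypothetical path)
Redirect gone /old-page.htm

# mod_rewrite equivalent, handy for whole retired sections (hypothetical pattern)
RewriteEngine On
RewriteRule ^old-section/ - [G,L]
```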

Q2) A bunch of old, DELETED .htm pages have been 301 redirected to new, matching CMS pages to preserve the existing link juice coming via external pages/links. Enter an old URL and the browser renders the new page. Good… redirect is working as intended and Screaming Frog, Seobook and other checkers all report a 301 header on the old URL…. all good.
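
A sketch of that kind of redirect with mod_alias, assuming one-to-one old-to-new mappings (both URLs are hypothetical):

```apache
# 301 each deleted .htm page to its matching CMS URL (hypothetical mappings)
RedirectPermanent /about-us.htm /about/
RedirectPermanent /widgets.htm /products/widgets/
```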

However, in GSC the old URLs show as 404 Page Not Found with a detection date just two days ago. If only one header status is possible, and the web server is showing the correct 301 to all the other checkers, where/how/why is Google finding a 404?

rainborick

4:48 am on Nov 22, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Q1) The 410 response code isn't critical. As long as your site returns either a 404 or a 410 response for these URLs, the URL will remain out of the index and you'll be fine.

Q2) I would suggest that you double-check the URLs in the Crawl Errors section of the GSC. Click on one to bring up the error details dialog box where you'll find a "Fetch As Googlebot" link at the bottom of the box. There could be a subtle difference in the URLs that Google has seen that your redirect instruction is missing. In any case, it would be worthwhile knowing the result in order to find a solution.

not2easy

5:08 am on Nov 22, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Another place to check: make sure the old URLs are not in the current sitemap, and check the date of the sitemap Google is using for the old URLs. I've found that submitting a new sitemap can help lower 404s; if old URLs are still in the sitemap Google is using, it confuses them. The list view for sitemaps shows the date each one was last crawled, but if you click on a sitemap you may see it was last fetched a month ago.

lucy24

7:39 am on Nov 22, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Does a 410 GONE statement need to remain in the .htaccess after the specified URLs are gone from Google’s index? Once the instruction has been actioned, it would seem unnecessary to keep repeating it. Correct or not?

Google never forgets a URL. It's true that once something is gone from its index it won't reappear on its own, but they will keep requesting it periodically. An explicit 410 helps reinforce the message that it's really gone and they should stop asking. (This applies specifically to Google. Other search engines don't seem to distinguish between 404 and 410 when it comes to crawling.)

Besides, there's a practical advantage: If you return a 410, the server doesn't have to go physically look for the file, as it would before returning a 404. (Still worse, of course, if the 404 is returned by a CMS only after pawing through your whole database.)

blend27

1:58 pm on Nov 23, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Google never forgets a URL.

Ditto that, especially if there are links from other sites pointing to the old URL.

I am currently working on a site that got hacked 18 months ago. The site got cleaned up, but over 100,000 backlinks from other hacked sites (over 4,000 domains, three-quarters from China) are still pointing to nonexistent pages that were created by the hack. So Google requests about 50-60 of those URIs on a daily basis. All 410s for 18 months now - no matter.

keyplyr

1:13 am on Dec 1, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Does a 410 GONE statement need to remain in the .htaccess after the specified URLs are gone from Google’s index?
Google never forgets a URL...

So to keep it out of GWT (Google Search Console) Crawl Errors, I always Disallow the old page in robots.txt.
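
A sketch of that robots.txt approach (the path is hypothetical). Note this keeps the crawl-error reports quiet because Googlebot is told not to fetch the URL at all:

```
User-agent: Googlebot
Disallow: /old-page.htm
```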