Forum Moderators: Robert Charlton & goodroi
[edited by: goodroi at 11:56 am (utc) on Aug 22, 2017]
[edit reason] Fixed formatting [/edit]
Btw, the links that point to these 404 pages are mostly from within the site - one reason I'm willing to just block them all with robots.txt.
I've mostly gotten rid of the 404 errors in GSC (they're clearing at about 1,500 pages per day) by using robots.txt.
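For anyone trying the robots.txt approach, a minimal sketch of the kind of rules involved might look like this; the directory paths here are made-up placeholders, not paths from the site being discussed:

```
# Hypothetical robots.txt rules blocking crawler access to removed sections.
# Blocking stops Googlebot from requesting the urls (so no more 404 reports),
# but it does not tell Google the pages are gone.
User-agent: *
Disallow: /old-section/
Disallow: /retired-catalog/
```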
By reporting the 404s, Google is just telling you that they requested the url for the page, and that your server didn't find anything and returned a "404 Not Found" response to Googlebot.
If you think that your server should have found something... i.e., that you believe the pages are still around and that Google should not have gotten a 404 Not Found response when it requested the url, then Google's message is useful because it alerts you to a possible problem. Otherwise, 404s are the expected response and are perfectly normal.
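To make the request/response exchange concrete, here is a small self-contained sketch using only Python's standard library: a toy server that answers 200 for a live page, 410 for a url it knows is permanently gone, and 404 for anything else. The paths and the "gone" list are invented for illustration.

```python
# Toy server illustrating 200 vs 410 vs 404 responses (stdlib only).
import threading
import urllib.request
import urllib.error
from http.server import BaseHTTPRequestHandler, HTTPServer

GONE = {"/removed-forever"}  # hypothetical permanently removed urls

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/":
            self.send_response(200)          # page exists
            self.end_headers()
            self.wfile.write(b"home")
        elif self.path in GONE:
            self.send_response(410)          # gone for good: stronger signal than 404
            self.end_headers()
        else:
            self.send_response(404)          # unknown url: plain "not found"
            self.end_headers()

    def log_message(self, *args):            # silence per-request console logging
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

def status(path):
    """Return the HTTP status code the server sends for a given path."""
    try:
        return urllib.request.urlopen(f"http://127.0.0.1:{port}{path}").status
    except urllib.error.HTTPError as e:
        return e.code

print(status("/"), status("/removed-forever"), status("/anything-else"))  # 200 410 404
```

The point is that a 404 is just the server's honest answer to a request for a url it doesn't know; it isn't an error in the site unless you believe the page should still exist.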
As to why Google recrawls urls that you think are gone or non-existent, there are numerous reasons. One is that links to the urls may persist somewhere on the web....
...It might be... that a site will still have internal nav links to the urls of pages that have been removed.... It can be worth checking a site with Xenu or Screaming Frog... to make sure that these urls aren't in the site's code.
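Besides crawling tools like Xenu or Screaming Frog, a quick check like this can be scripted. The sketch below parses a page's HTML with the standard library and flags any links that point at urls you already removed; the sample markup and the "dead" set are hypothetical:

```python
# Sketch: find internal links that still point at removed urls (stdlib only).
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects every href found in <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Hypothetical navigation markup that still references a removed page.
html = '<nav><a href="/products/">Products</a><a href="/old-page/">Old</a></nav>'
dead_urls = {"/old-page/"}  # urls already removed from the site

parser = LinkCollector()
parser.feed(html)
stale = [link for link in parser.links if link in dead_urls]
print(stale)  # -> ['/old-page/'] : links that should be cleaned out of the templates
```

Any urls it reports are internal links feeding Googlebot the dead pages, which is often why the 404s keep reappearing in GSC.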
For large-scale site changes like this, I'd recommend:
- don't use the robots.txt
- use a 301 redirect for content that moved
- use a 410 (or 404 if you need to) for URLs that were removed
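On a typical Apache setup, the second and third recommendations above could be sketched like this, assuming mod_alias is enabled; the paths are placeholders, not urls from the thread:

```
# Hypothetical .htaccess sketch (Apache, mod_alias)

# 301: content that moved to a new url
Redirect 301 /moved-article /new-location/moved-article

# 410: content permanently removed (a 410 takes no destination url)
Redirect gone /retired-page
```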
...this seems to fit my situation, as Google's reincarnating a lot of dead urls when I'm moving from the old domain to the new domain.

"This most likely means that Google is running an old dataset, possibly among other datasets, perhaps for purposes of comparison. This seems to happen at times of big change."
Question is -- would it be better to let them 404 or just prevent spider access to these pages?
5) We list crawl errors in Webmaster Tools by priority, which is based on several factors. If the first page of crawl errors is clearly irrelevant, you probably won't find important crawl errors on further pages.
https://webmasters.googleblog.com/2012/03/crawl-errors-next-generation.html
I wonder if Google will again try to access those old urls, since all of them have been purged from Google's index over the past three days thanks to the robots.txt. I guess the ones with links will at least be tried again.
It's again started filling my GSC with error reports, this time 410s listed under 'Not found'.