Forum Moderators: Robert Charlton & goodroi
Massive drop in Google rankings due to spidering issues
...I simply ignored those messages. Obviously, I shouldn't have: Over the course of 5 months and accompanied by a total of 30 warning messages, Google eventually started slamming my site...
The only time I redirect any more is when it will help actual visitors. Other than that, if I need to remove a page for some reason, even if there's something similar, it's 410 Gone.
So the general line of recommendation is to actually let the Googlebot in again, just to let it know that the stuff to spider is gone?
To be honest I don't fully understand how deliberately placing a robots.txt and deliberately removing pages from Google's index doesn't convey the same message, namely that the removal was intentional.
Because when you block their access they don't know if anything has been removed or not, because they can't access it.
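To illustrate the point, here is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt rules and the example.com URL are made up for illustration, and a real crawler may differ in detail. The idea is only that a compliant crawler checks robots.txt before requesting a page, so a disallowed URL is never fetched and whatever status it would return (200, 404 or 410) stays unseen.

```python
from urllib.robotparser import RobotFileParser


def crawler_may_fetch(robots_lines, url, agent="Googlebot"):
    """Return True if a robots.txt-compliant crawler would request the URL."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules from a list of lines, no network access
    return parser.can_fetch(agent, url)


# Hypothetical robots.txt contents, for illustration only.
BLOCKED = ["User-agent: *", "Disallow: /old/"]
OPEN = ["User-agent: *", "Disallow:"]

# Hypothetical URL of a page that has been removed and now answers 410.
removed_page = "https://example.com/old/page.html"

# With the block in place the crawler never requests the page,
# so it cannot learn that the page is gone.
print(crawler_may_fetch(BLOCKED, removed_page))

# With the block lifted the crawler requests the page and sees the 410.
print(crawler_may_fetch(OPEN, removed_page))
```

In other words, lifting the disallow is what lets the removal be noticed at all.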
...therefore the links don't either.
While Google won't crawl or index the content of pages blocked by robots.txt, we may still index the URLs if we find them on other pages on the web. As a result, the URL of the page and, potentially, other publicly available information such as anchor text in links to the site, or the title from the Open Directory Project (www.dmoz.org), can appear in Google search results.
The requested resource is no longer available at the server and no forwarding address is known. This condition is expected to be considered permanent. Clients with link editing capabilities SHOULD delete references to the Request-URI after user approval. If the server does not know, or has no facility to determine, whether or not the condition is permanent, the status code 404 (Not Found) SHOULD be used instead. This response is cacheable unless indicated otherwise.
The 410 response is primarily intended to assist the task of web maintenance by notifying the recipient that the resource is intentionally unavailable and that the server owners desire that remote links to that resource be removed. Such an event is common for limited-time, promotional services and for resources belonging to individuals no longer working at the server's site. It is not necessary to mark all permanently unavailable resources as "gone" or to keep the mark for any length of time -- that is left to the discretion of the server owner.
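As a rough sketch of how the quoted wording might translate into practice, the Python snippet below picks a status code per URL path. The paths and the three lookup tables are hypothetical and stand in for whatever your server or CMS actually knows. It only encodes the distinction described above: 410 when a removal is known to be permanent and intentional, 404 when the server cannot tell, and a 301 only where a redirect target exists.

```python
from http import HTTPStatus

# Hypothetical site data, for illustration only.
LIVE = {"/", "/guide.html"}
REDIRECTS = {"/old-guide.html": "/guide.html"}  # kept because they help visitors
REMOVED_FOR_GOOD = {"/promo-2011.html"}         # intentionally and permanently gone


def status_for(path):
    """Return (status, redirect_target) for a requested path."""
    if path in LIVE:
        return HTTPStatus.OK, None
    if path in REDIRECTS:
        return HTTPStatus.MOVED_PERMANENTLY, REDIRECTS[path]
    if path in REMOVED_FOR_GOOD:
        # Known, deliberate removal: 410 Gone.
        return HTTPStatus.GONE, None
    # Unknown path, or permanence cannot be determined: fall back to 404.
    return HTTPStatus.NOT_FOUND, None


for p in ("/guide.html", "/old-guide.html", "/promo-2011.html", "/typo.html"):
    status, target = status_for(p)
    print(p, int(status), status.phrase, target or "")
```

None of this has any effect unless the crawler is allowed to request the URL in the first place.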
Just to reiterate - my site ranked well for a decade with only a couple of dozen pages indexed.
Middle of last year, Google suddenly started sending me warning mails via my Webmaster Tools account, telling me about "possible outages" and that "Googlebot can't access the site". I looked into those, and noticed that Google had found...
An agency has suggested that if the content of a page is not cached thanks to a robots.txt disallow, the content essentially doesn't exist anymore and therefore the links don't either.
If the page is already indexed, then simply adding that line will not remove it from the index, and the content still exists and the links do as well.
There are no rules for this. Sometimes this is how it happens, sometimes not. I would guess it depends on other external factors (perhaps on links pointing to the page, etc.).
Otherwise, disallowing the site in robots.txt would have no effect and the site would continue to rank rather than dropping out of the index like a stone (a very recent experience).
Further, if the above were the standard behaviour, it would be heaven for spammers: just create a page, let it be indexed and ranked, then disallow it in robots.txt, put spammy content on it instead, and watch it keep ranking for the old content.
I think this would be closer to what happens: using robots.txt to block a page that was previously indexed may or may not result in the page remaining in the index, and it may or may not rank equally well after being blocked (or it may drop like a stone).