|Crawl errors - remove dir physically and from search results|
| 1:00 am on Dec 5, 2010 (gmt 0)|
Google Webmaster Tools keeps complaining about a directory that I have already removed from my website, from my web server, and via Webmaster Tools.
I know that I need to add a Disallow rule to robots.txt if I want to remove a directory from the search results through Webmaster Tools. Yet after six months Google still has not handled this properly.
Google started complaining about "pages blocked by robots.txt", even though a Disallow entry in robots.txt is the normal way to handle directories or pages that I no longer want to appear in Google's search results.
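To see exactly which URLs a directory-level Disallow rule blocks, you can test it locally with Python's standard `urllib.robotparser`. This is a minimal sketch: the `/old-car/` directory, the `example.com` host and the `is_blocked` helper are hypothetical stand-ins, and the parser follows the classic robots.txt prefix matching rather than every rule Googlebot itself supports.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "/old-car/" stands in for the removed directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /old-car/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_blocked(url: str, agent: str = "Googlebot") -> bool:
    """Return True if the parsed robots.txt forbids `agent` from fetching `url`."""
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/old-car/ad.html",
                "https://example.com/old-car-photos.html",
                "https://example.com/index.html"):
        print(url, "blocked" if is_blocked(url) else "allowed")
```

Note that the rule is a path prefix: `/old-car/ad.html` is blocked, while `/old-car-photos.html` is not, because it does not start with `/old-car/`.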
After a few months, just to see what would happen, I removed this Disallow entry, and guess what?
Right: Google started complaining that the HTML pages in the directory I had asked it to remove could not be found.
What is going wrong here?
Best regards. I like this site.
| 5:11 am on Dec 5, 2010 (gmt 0)|
The "blocked by robots.txt" and "not found" reports are mostly for your information. They don't necessarily indicate something that would hurt your site's performance in Google. In a case like yours, where you know exactly why the pages are no longer accessible, you can safely ignore those warnings. If the reports referred to URLs that you expected to be indexed properly but were inaccessible, or to malformed URLs or URLs you didn't recognize, then you would be right to be concerned. That's why Google makes that information available in the WMT console.
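The triage described above (ignore errors you caused on purpose, look closely at everything else) can be sketched in a few lines of Python. This assumes you have copied the reported URLs out of the console into a list; `KNOWN_REMOVED` and `triage` are hypothetical names, and the `/old-car/` prefix is only an example.

```python
from urllib.parse import urlparse

# Hypothetical: path prefixes that were removed deliberately.
KNOWN_REMOVED = ("/old-car/",)


def triage(reported_urls):
    """Split crawl-error URLs into expected ones and ones worth investigating."""
    expected, investigate = [], []
    for url in reported_urls:
        path = urlparse(url).path or "/"
        # str.startswith accepts a tuple, so any known prefix matches.
        if path.startswith(KNOWN_REMOVED):
            expected.append(url)
        else:
            investigate.append(url)
    return expected, investigate


expected, investigate = triage([
    "https://example.com/old-car/ad.html",
    "https://example.com/products/widget.html",
])
```

Here the old car ad lands in `expected`, while the product page lands in `investigate`, since nobody meant to remove it.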
| 8:13 pm on Dec 5, 2010 (gmt 0)|
I agree with rainborick - you don't need to have any concern about those reports.
|google started to complain about the html pages held in the map |
Here's what it sounds like to me. When you use robots.txt to block Google from even requesting the URLs, there is no way for it to know that those URLs now return a 404. If you remove that Disallow rule, Google can see the 404 and will eventually drop those pages from the SERPs, usually sooner rather than later.
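The reasoning above can be modeled with a toy crawler: it checks robots.txt first and only requests a URL if allowed, so a blocked URL never gets a chance to show its 404. This is a simplified sketch under stated assumptions, not Google's actual pipeline; `STATUS_BY_PATH`, `crawl_outcome` and the `/old-car/` directory are hypothetical, and the "server" is just a dictionary in which every missing path returns 404.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical site state: the old directory was deleted, so its pages 404.
STATUS_BY_PATH = {"/index.html": 200}  # any path not listed here returns 404


def crawl_outcome(path, robots_lines):
    """Describe what a robots.txt-respecting crawler learns about `path`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    if not parser.can_fetch("Googlebot", "https://example.com" + path):
        # The request is never sent, so the 404 stays invisible.
        return "blocked by robots.txt"
    status = STATUS_BY_PATH.get(path, 404)
    return "not found (404)" if status == 404 else f"fetched ({status})"


with_rule = ["User-agent: *", "Disallow: /old-car/"]
without_rule = ["User-agent: *", "Disallow:"]  # empty Disallow allows everything
```

With the rule in place, `/old-car/ad.html` is reported as blocked; once the rule is removed, the same URL is reported as a 404, which is the signal that lets it drop out of the index.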
But if those URLs still appear in a Sitemap, Googlebot will dutifully try to retrieve them, which generates a 404 error message related to your Sitemap.
Have I got that right? If so, then you just submit a new Sitemap that doesn't include those URLs. If I missed something, then I'm all ears.
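If the Sitemap is generated from a list of page URLs, the fix suggested above amounts to filtering out the removed directory before writing the file. This is a minimal sketch using Python's standard `xml.etree.ElementTree` and the sitemaps.org namespace; `build_sitemap` and the `/old-car/` prefix are hypothetical, and a real Sitemap would normally also carry an XML declaration and optional tags such as `lastmod`.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Serialize the sitemap namespace as the default (unprefixed) namespace.
ET.register_namespace("", SITEMAP_NS)


def build_sitemap(urls, removed_prefixes=("/old-car/",)):
    """Return Sitemap XML that lists only URLs outside the removed directories."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        if urlparse(url).path.startswith(removed_prefixes):
            continue  # leave deliberately deleted pages out of the Sitemap
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap([
    "https://example.com/index.html",
    "https://example.com/old-car/ad.html",
]))
```

Only `index.html` ends up in the output, so Googlebot is no longer invited to request the deleted pages through the Sitemap.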
| 11:02 pm on Dec 5, 2010 (gmt 0)|
Here is the situation.
I wanted to sell my old car, so I created an ad on an HTML page with some photos in a web album, all in one directory. As intended, the pages in that directory were indexed. Now that the car is sold, I went to Webmaster Tools and requested removal of the whole directory from the Google index. A requirement for that is that the directory be disallowed in robots.txt, which I did.
The request to remove the directory from the Google index succeeded, but Google still complains that some pages in that directory are blocked. So I removed the Disallow, and now the problem is that the bot cannot find the pages, which I thought that request had already removed from the Google index.
To me it looks like a never-ending loop.
I think I'll follow tedster's advice: remove the Disallow and wait for Google to realize that the pages are gone.
Thanks for the help, all of you.