Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
Crawl errors - remove dir physically and from search results
bolognese




msg:4238795
 1:00 am on Dec 5, 2010 (gmt 0)

Hi there,

Google Webmaster Tools keeps complaining about a directory that I removed from my website, from the web server, and from Webmaster Tools.

I know that I need to specify a Disallow in robots.txt if I want a map (directory) to no longer appear in the search results after removing it through Webmaster Tools. Yet Google has not handled this properly for 6 months already.

Google started complaining about "pages blocked by robots.txt", which is a normal entry in robots.txt for maps or pages that I no longer want to appear in Google's search results.
After a few months, just to find out what would happen, I removed this Disallow entry, and guess what?
Right, Google started to complain about the HTML pages held in the map that I requested to remove, reporting them as not found.

What goes wrong here?

Best regards. I like this site.

Jos

 

rainborick




msg:4238842
 5:11 am on Dec 5, 2010 (gmt 0)

The blocked pages and not found reports are mostly for your information. They don't necessarily indicate something that would hurt your site's performance in Google. In a case like yours where you definitely know why pages are no longer accessible, you can safely ignore those warnings. If the reports referred to URLs that you expected to be indexed properly but were inaccessible, or if the reports referred to malformed URLs or URLs you didn't recognize, then you would be right to be concerned. That's why Google makes that information available in the WMT console.

tedster




msg:4239017
 8:13 pm on Dec 5, 2010 (gmt 0)

I agree with rainborick - you don't need to have any concern about those reports.

google started to complain about the html pages held in the map


Here's what it sounds like to me. When you use robots.txt to block Google from even requesting the URLs, then there is no way it can know that those URLs now return a 404. If you remove that Disallow rule, then Google can see the 404 and will eventually remove those pages from the SERPs - usually sooner rather than later.
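
A rough sketch of that first point, using Python's standard urllib.robotparser. The directory name /car-ad/ and the example.com URL are hypothetical placeholders, not taken from this thread. It only illustrates that a crawler obeying robots.txt never requests a disallowed URL, so it has no chance to observe the 404 behind it.

```python
from urllib.robotparser import RobotFileParser


def crawler_may_fetch(robots_lines, url, agent="Googlebot"):
    """Return True if a crawler that obeys robots.txt may request the URL."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules from memory, no network access
    return parser.can_fetch(agent, url)


# Hypothetical URL inside the removed directory (an assumption for illustration).
URL = "http://www.example.com/car-ad/index.html"

# robots.txt while the Disallow rule is in place.
with_disallow = ["User-agent: *", "Disallow: /car-ad/"]
# robots.txt after the rule is removed (an empty Disallow allows everything).
without_disallow = ["User-agent: *", "Disallow:"]

# Blocked: the crawler never sends the request, so the 404 stays invisible to it.
blocked = not crawler_may_fetch(with_disallow, URL)
# Unblocked: the crawler may request the URL and can then see the 404.
visible = crawler_may_fetch(without_disallow, URL)

print(blocked, visible)
```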

But if those URLs still appear in a Sitemap, then googlebot will dutifully try to retrieve them. This generates a 404 error message related to your Sitemap.

Have I got that right? If so, then you just submit a new Sitemap that doesn't include those URLs. If I missed something, then I'm all ears.
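
If the removed URLs do turn out to be listed in a Sitemap, one possible way to produce the cleaned file is sketched below with Python's standard xml.etree.ElementTree. The sample Sitemap, the /car-ad/ prefix, and the function name drop_removed_urls are all made up for illustration; only the namespace URI comes from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def drop_removed_urls(sitemap_xml, removed_prefix):
    """Return the Sitemap XML without <url> entries under removed_prefix."""
    ET.register_namespace("", NS)  # keep the default namespace on output
    root = ET.fromstring(sitemap_xml)
    # Iterate over a copy, because entries are removed while looping.
    for url in list(root.findall(f"{{{NS}}}url")):
        loc = url.findtext(f"{{{NS}}}loc", default="").strip()
        if loc.startswith(removed_prefix):
            root.remove(url)
    return ET.tostring(root, encoding="unicode")


# Hypothetical Sitemap with one live page and one page from the removed directory.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/car-ad/index.html</loc></url>
</urlset>"""

cleaned = drop_removed_urls(SAMPLE, "http://www.example.com/car-ad/")
print(cleaned)
```

The cleaned file would then be uploaded and resubmitted in place of the old Sitemap.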

bolognese




msg:4239099
 11:02 pm on Dec 5, 2010 (gmt 0)

Here is the situation.

I wanted to sell my old car and created an ad on an HTML page with some photos in a web album, all put in a particular directory. Of course what I wanted did happen: the pages in the dir were indexed. Now that the car is sold, I went to Webmaster Tools and created a request to remove the whole dir from the Google index. A requirement is that the dir is disallowed in robots.txt, which is what I did.
The request to remove the dir from the Google index was successful, but Google still complains that some pages of that dir are being blocked. So I removed the Disallow, and now the problem is that the pages - which I thought I had removed from the Google index with that request - are not found by the bot.
To me it looks like a neverending loop.
I think I'll follow tedster's advice: remove the Disallow and wait for Google to realize that the pages are gone.

Thanks for the help, all of you.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved