Forum Moderators: open
However, the files will eventually get dropped from the index, and since Googlebot can't crawl them anymore, they won't reappear.
When I deleted pages off my server it took a couple of weeks before they were finally gone from the SERPS.
I'd guess you're just having to wait for the fetched page to go through the indexing process. Googlebot crawls very widely, and we've had very few genuine REP violations that I can remember. In fact, I'm not sure I do remember one.
On another occasion Googlebot ignored my robots.txt file. I contacted the Googlebot team and they disposed of the pages. Three months later the pages were back in the index, displaying titles only. Now with this Florida update they finally seem to have disappeared, at least from -ex. They're still sitting on most other DCs.
Note: If you believe your request is urgent and cannot wait until the next time Google crawls your site, use our automatic URL removal system. In order for this automated process to work, your webmaster must first insert the appropriate meta tags into the page's HTML code.
Source: Remove Content from Google's Index [google.com]
This option isn't as automatic as it sounds and works only temporarily. The pages are re-included in the index after 3 months - without being recrawled and without robots.txt being re-checked. Since nothing is fetched, robots meta tags can't have any influence this way.
From the same message:
Doing this and submitting via the automatic URL removal system will cause a temporary, 90 day removal of your site from the Google index. (Keeping the robots.txt file at the same level would require you to return to the URL removal system every 90 days to reissue the removal.)
The robots.txt bit is garbage.
If you meant "displaying URLs only," then this is standard behaviour for Google and Ask Jeeves/Teoma for pages Disallowed by robots.txt. They do not fetch the page, and they therefore do not list the page title or description, but if they find a link to the page, they will list the URL found in that link.
The fix is to *allow* Google and Ask to fetch the page, and then use the on-page meta robots tag to prevent indexing.
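As a sketch of that fix (the file names and path here are hypothetical, not from the thread): the page must NOT be Disallowed in robots.txt, so the crawler is free to fetch it, and the page itself carries the noindex instruction.

```
# robots.txt at the site root - note the page is NOT Disallowed,
# so Googlebot is allowed to fetch it and read its meta tags
User-agent: *
Disallow:
```

```html
<!-- In the <head> of the page you want kept out of the index -->
<meta name="robots" content="noindex">
```

Once the crawler fetches the page and sees the noindex tag, the URL-only listing should drop out, instead of lingering the way a robots.txt-blocked URL does.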
It all comes down to varying interpretations of the word "indexing" in the Standard for Robots Exclusion. Some search engines take a robots.txt Disallow to mean "don't mention it" and others take it to mean, "don't fetch this page." The varying results of a robots.txt Disallow are a result of these interpretations. Note also that compliance with the Standard is purely voluntary; it carries no authority.
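Under the "don't fetch this page" reading, a compliant crawler checks robots.txt before every request. Python's standard urllib.robotparser module sketches that check; the domain, paths, and rules below are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules - in practice these would be
# fetched from http://example.com/robots.txt
robots_txt = [
    "User-agent: *",
    "Disallow: /private/",
]

rp = RobotFileParser()
rp.parse(robots_txt)

# A "don't fetch" crawler skips the request entirely when this is False;
# it may still list the bare URL if it finds a link to it elsewhere.
print(rp.can_fetch("Googlebot", "http://example.com/private/page.html"))
print(rp.can_fetch("Googlebot", "http://example.com/public/page.html"))
```

Note that nothing enforces this check; a crawler that ignores robots.txt simply never runs it, which is the "purely voluntary" point above.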
Jim
My understanding of the remark on the 90 days is different. It only applies to the situation where the robots.txt file is in a subdirectory. The normal place for this file is the root, and then it should work OK. What leanweb described was using a meta tag like:
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"> and not the robots.txt file. But maybe your experience is not the same as the explanation on the Google site.
Slip of the keyboard; of course I meant URL.
The funniest thing is that the pages don't even exist; they were made up by Googlebot following relative URLs.
Takagi,
It's indeed my experience and not just interpretation.
btw: I'm not concerned about these pages anymore, it's just an observation of how things work.