Sitemaps, Meta Data, and robots.txt Forum

    
Google still lists disallowed pages
How long do disallowed pages stay in the index?
jam13




 
Msg#: 380 posted 1:16 pm on May 10, 2004 (gmt 0)

We've had robots.txt set like this since July 03:

User-agent: *
Disallow: /ord/
Disallow: /scan/
Disallow: /images/
Disallow: /customerservice.html
Disallow: /login.html
...

and Google seems to be obeying it because it hasn't downloaded any disallowed files since August 03.

However, it still hasn't dropped the pages from its index; they are just listed with no title, snippet or cached copy. Some even have a PageRank of 5!

So how long does it take for Google to actually drop these pages? Or do they stay there indefinitely?

 

Macguru




 
Msg#: 380 posted 1:24 pm on May 10, 2004 (gmt 0)

Hi,

[google.com...]

Hope this helps.

jam13




 
Msg#: 380 posted 1:33 pm on May 10, 2004 (gmt 0)

> [google.com...]

Hmm - read through that and can't find anything I don't already know: robots.txt, robots metatag etc. Tried the automatic URL link - seems to be dead.

These entries are over 8 months old - surely they should have been removed by now (according to Google's FAQ it takes 6-8 weeks).

(BTW our site is crawled heavily every day).

jdMorgan




 
Msg#: 380 posted 2:42 pm on May 20, 2004 (gmt 0)

You're experiencing Google's "link listing" behaviour.

Google will list any page it finds a link to, whether or not it is allowed (by robots.txt) to fetch and analyze that page. If the page is disallowed in robots.txt, it just lists it as a URL, with no title or description.
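To make the point above concrete, here is a minimal sketch that feeds the rules from the first post to Python's standard `urllib.robotparser` and asks whether a crawler may fetch a given path. The helper name `is_fetch_allowed` and the "Googlebot" user-agent string are assumptions for illustration; this only models how a well-behaved crawler reads robots.txt, not how Google decides what to list.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the first post; the elided "..." lines are left out.
ROBOTS_TXT = """\
User-agent: *
Disallow: /ord/
Disallow: /scan/
Disallow: /images/
Disallow: /customerservice.html
Disallow: /login.html
"""


def is_fetch_allowed(path: str, user_agent: str = "Googlebot") -> bool:
    """Return True if robots.txt lets `user_agent` fetch `path`.

    Hypothetical helper: a disallowed page is never downloaded, so the
    crawler cannot read its title, its text, or any meta tag on it.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/login.html", "/ord/12345", "/index.html"):
        print(path, "fetch allowed:", is_fetch_allowed(path))
```

A path that comes back as not allowed can still turn up as a bare URL listing if other pages link to it, which matches what the original poster is seeing.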

The solution to this problem is non-intuitive: You must *allow* the pages to be fetched in robots.txt, and then use the on-page HTML <meta name="robots" content="noindex"> tag to tell Google not to index the page. A crawler that is blocked by robots.txt never fetches the page, so it never sees the tag.
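As a rough way to check that a page really carries the tag once its Disallow line has been removed, the sketch below scans HTML for a robots meta tag using Python's standard `html.parser`. The class name `RobotsMetaParser`, the helper `has_noindex` and the sample page are assumptions for illustration only.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" ...> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser already lower-cases tag and attribute names.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical login page that is fetchable but asks not to be indexed.
SAMPLE_PAGE = """<html><head>
<title>Login</title>
<meta name="robots" content="noindex">
</head><body>...</body></html>"""

if __name__ == "__main__":
    print("noindex present:", has_noindex(SAMPLE_PAGE))
```

The tag only helps on pages the crawler is allowed to fetch, so the matching Disallow line (for example the one for /login.html) has to come out of robots.txt first.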

Ask Jeeves/Teoma does the same thing, and as of this month, Yahoo's Slurp is now apparently doing it, too. Yahoo adds the interesting twist of using the link text it finds on the link as the title for the listing.

Jim

jam13




 
Msg#: 380 posted 10:56 am on May 23, 2004 (gmt 0)

Thanks - I'll give that a go.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved