
Google SEO News and Discussion Forum

    
Is Google Disregarding robots.txt?
Google seems to be indexing pages from site blocked using robots.txt
latimer




msg:746228
 5:59 pm on Jul 7, 2005 (gmt 0)

Can someone help me understand this one?

We have used robots.txt on one of our sites to prevent Google from accessing any of the files, as follows:

User-agent: Googlebot
Disallow: /

What I have noticed is that Google is somehow getting some of the pages anyway. Out of about 20,000, they now have about 3,670.

Also interesting is that on the search results page for:

oursitename site:www.example.com

Google shows: Results 1 - 9 of about 3,670

And only 9 URL links show up, without title or description. There is no way to access any of the other supposed 3,670 results.

We have another site that has the same pages, and the reason we block Google from the mirror site is to avoid a penalty. We are concerned about these pages getting in despite the robots.txt block, and about a possible penalty.

Any help on understanding this would be appreciated.

[edited by: ciml at 6:06 pm (utc) on July 7, 2005]
[edit reason] Examplified [/edit]

 

ciml




msg:746229
 6:08 pm on Jul 7, 2005 (gmt 0)

I suggest looking at your logs. /robots.txt exclusion does not prevent Google from listing the URLs; it prevents Googlebot from fetching them.

The URL-only listings indicate that Google are doing the right thing, so the question is how they found the URLs.

My guess is that either Google visited the site before the /robots.txt was added, or there's some other way for Googlebot to see links to those URLs.
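The two checks above (whether the rules actually block fetching, and what the logs show Googlebot requesting) can be sketched in Python. This is only an illustration under stated assumptions: it uses Python's standard urllib.robotparser rather than Google's own parser, the helper names may_fetch and googlebot_paths are hypothetical, and the log lines are assumed to be in the common/combined access log format with the user-agent string included. It says nothing about whether a URL can be listed, only whether it may be fetched.

```python
import re
from urllib.robotparser import RobotFileParser

# The rules quoted in the first post.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /
"""


def may_fetch(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url.

    Uses Python's standard parser, which is an approximation of how
    any particular crawler interprets the file.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Assumed log format: the request appears as "METHOD /path HTTP/x.y".
REQUEST_RE = re.compile(r'"(?:GET|HEAD|POST) (\S+)')


def googlebot_paths(log_lines):
    """List paths, other than /robots.txt, requested by a client
    whose log line mentions Googlebot (a simple substring check)."""
    paths = []
    for line in log_lines:
        if "Googlebot" not in line:
            continue
        match = REQUEST_RE.search(line)
        if match and match.group(1) != "/robots.txt":
            paths.append(match.group(1))
    return paths


if __name__ == "__main__":
    url = "http://www.example.com/page.html"
    print("Googlebot may fetch:", may_fetch("Googlebot", url))
    print("Other bot may fetch:", may_fetch("SomeOtherBot", url))
```

If googlebot_paths returns an empty list for the period after the robots.txt went up, Googlebot is obeying the block, and the URL-only listings must come from URLs it learned about some other way.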

joeduck




msg:746230
 6:16 pm on Jul 7, 2005 (gmt 0)

latimer -

As ciml says. Also, my understanding is that if other sites link to you, Googlebot will index those links during spidering, but then not flesh them out with content when another bot returns to index the content. You can remove those listings (until the same thing happens again) using the Google robots.txt removal tool, but use great caution with it!

latimer




msg:746231
 8:39 pm on Jul 7, 2005 (gmt 0)

Thanks for the replies. Very helpful.

AndAgain




msg:746232
 9:29 pm on Jul 7, 2005 (gmt 0)

I see them obeying, but even though the pages are not indexed, the homepage still is (without a title being displayed)... it has been, and I suspect it will be in the future.

minnapple




msg:746233
 4:33 am on Jul 8, 2005 (gmt 0)

For "research" I created a site over a year ago that did every dirty trick in the book that was outside Google's TOS, to see if it would get blasted.
This site ranked near the top in most searches.
I put the site "up for review" and Google whacked it in three days.

It didn't follow the classic sandbox effect like other sites; it was more deliberate.

The domain name came up for renewal two months ago, and I decided to block Googlebot and move the content to a new domain.

Two weeks later the site started to get Google traffic.

Go figure

[edited by: minnapple at 4:38 am (utc) on July 8, 2005]

Adversity Sure Fire




msg:746234
 4:37 am on Jul 8, 2005 (gmt 0)

I think the Google removal tool/program should work...

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved