Can someone help me understand this one?
We have used robots.txt on one of our sites to prevent Google from accessing any of the files, as follows:
User-agent: Googlebot
Disallow: /
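For anyone who wants to double-check that the two lines above really block Googlebot and nothing else, here is a minimal sketch using Python's standard urllib.robotparser. The helper name is_allowed, the agent name SomeOtherBot and the page URL are made up for illustration. It only tests how the rules parse, not what Google actually does with them.

```python
from urllib.robotparser import RobotFileParser

# The rules exactly as quoted in this post.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper: parses the rules from a string instead of
    downloading them, so it can be run without touching the live site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Illustrative URL on the example domain; any path gives the same answer.
    page = "http://www.example.com/page.html"
    for agent in ("Googlebot", "SomeOtherBot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, page))
```

If this reports Googlebot as not allowed and any other agent as allowed, the file at least says what we meant it to say.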
What I have noticed is that Google is somehow getting some of the pages anyway: out of about 20,000, they now have about 3,670.
Also interesting is the search results page for:
oursitename site:www.example.com
Google shows: Results 1 - 9 of about 3,670
And only 9 URL links show up, without title or description. There is no way to access any of the other supposed 3,670 results.
We have another site that has the same pages, and the reason we block Google from the mirror site is to avoid a penalty. We are concerned about these pages getting in despite the robots.txt block, and about a possible penalty.
Any help on understanding this would be appreciated.
[edited by: ciml at 6:06 pm (utc) on July 7, 2005]
[edit reason] Examplified [/edit]