Forum Moderators: Robert Charlton & goodroi
Now that Google seems to recognize the robots.txt file, I am wondering whether there is a consequence to all this. Will I be penalized for this robots.txt file?
Thanks
I put the lines in my robots.txt file to only block the G bot from one PDF file. I started dropping in G after I did that!
FWIW, G doesn't seem to obey the robots meta tag line.
I also have an htaccess.txt file, not implemented as .htaccess, that you may look at.
KBleivik
Make it simple, as simple as possible, but no simpler.
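For anyone wanting to check a single-file block like the one described in the post above, here is a minimal sketch using Python's standard urllib.robotparser. The PDF path, the example.com host and the second bot name are made up for illustration; the poster's actual rules were not shown.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the PDF path is an assumption, not the poster's real file.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /docs/report.pdf
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Only the one PDF should be off limits to Googlebot.
pdf_blocked = not allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/docs/report.pdf")
# The rest of the site stays crawlable.
home_allowed = allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/")
# Bots not named in the file are unaffected by the rule.
other_bot_allowed = allowed(ROBOTS_TXT, "Slurp", "http://www.example.com/docs/report.pdf")

print(pdf_blocked, home_allowed, other_bot_allowed)
```

If the second value came back False, the Disallow line would be catching more than the one file, which is worth ruling out before blaming the drop on the block itself.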
We have used robots.txt on one of our sites to prevent Google from accessing any of the files, as follows:
User-agent: Googlebot
Disallow: /
What I have noticed is that Google is somehow getting some of the pages anyway. Out of about 20,000, they now have about 3,670.
Also interesting is that on the search results page for:
oursitename site:www.oursite.com
Google shows: Results 1 - 9 of about 3,670
And only 9 URL links, without title or description, show up. There is no way to access any of the other supposed 3,670 results.
We have another site that has the same pages, and the reason we block Google from the mirror site is to avoid a penalty. We are concerned about these pages getting in despite the robots.txt block, and about a possible penalty.
Any help on understanding this would be appreciated.
I'm not sure about your search results though. It is possible to have a lot of 'hidden' IBLs (inbound links), possibly from your mirror, in Google.
If you are confident of your robots.txt (validate it), you can submit both robots.txt files, and it will remove the one site and clean up the other of anything that is disallowed.
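As a rough way to do the "validate it" step locally, the Googlebot block quoted earlier in the thread can be run through Python's standard urllib.robotparser. This is only a sketch: the sample paths and the second bot name are hypothetical, and it is not a substitute for whatever checker Google itself provides.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted earlier in the thread for the mirror site.
MIRROR_ROBOTS = """\
User-agent: Googlebot
Disallow: /
"""


def can_crawl(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Hypothetical sample of paths on the mirror; substitute real ones.
sample_paths = ["/", "/index.html", "/products/page1.html"]

# Every sampled path should be closed to Googlebot.
blocked = [p for p in sample_paths if not can_crawl(MIRROR_ROBOTS, "Googlebot", p)]
# A bot the file does not name should still be allowed everywhere.
open_to_others = all(can_crawl(MIRROR_ROBOTS, "msnbot", p) for p in sample_paths)

print(blocked)
print(open_to_others)
```

A check like this only confirms that the rules parse as intended. A Disallow rule stops fetching, and Google can still list a blocked URL that it has found through links, which would fit the title-less, description-less results described above.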