User-agent: Googlebot
Allow: /
as it had previously had:
User-agent: *
Disallow: /
I thought this was strange, as I have never had any such correspondence from Google before and had never heard of them requesting such things...
Now, I understand that not many people would want to block Google from indexing their site ;) but has anyone else had anything like this?
So it could have been a Googler, especially if someone thought your site was good. :)
Best wishes,
GoogleGuy
and you'll not notice that it's malicious unless you check IPs against the google lists... you'll just think it's google's bot :-S
or is that too much of paranoia?
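For anyone who does want to check whether a visitor claiming to be Googlebot really is one, here is a minimal sketch of a reverse-then-forward DNS check. The function name, the injectable lookup parameters and the hostname suffixes are assumptions made for illustration (the suffixes follow Google's published guidance as I understand it), and the default forward lookup only handles IPv4.

```python
import socket

# Assumed hostname suffixes for genuine Google crawlers.
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def is_probably_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip reverse-resolves to a Google crawler hostname
    and that hostname resolves forward to the same ip.

    reverse_lookup and forward_lookup are hypothetical hooks so the
    check can be exercised without touching the network.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        host = reverse_lookup(ip)
    except OSError:
        # No PTR record or lookup failure: cannot confirm.
        return False

    # The leading dot in each suffix rejects names like "evilgooglebot.com".
    if not host.lower().rstrip(".").endswith(GOOGLE_CRAWLER_SUFFIXES):
        return False

    try:
        # Forward-confirm, since anyone can set their own PTR record.
        return ip in forward_lookup(host)
    except OSError:
        return False
```

Calling is_probably_googlebot(remote_ip) with the address from your access log does the real lookups. The user-agent string alone proves nothing, since anyone can send it.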
Did make me feel good. I had to talk a vendor into changing the robots.txt, but they did eventually come around - for us and a lot of other customers as well.
I was kinda shocked to get an email seemingly 'out-of-the-blue' from Google as well, though. Made me feel good. ;)
vincevincevince, I think that's too much paranoia. :) robots.txt is a voluntary standard that spiders comply with. If a malicious user wanted the content, they could just grab it. There are plenty of "rude" user agents that they could masquerade as instead of the genteel Googlebot. :)
The best suggestion I can make is not to change the file unless you feel 100% sure.
Note: maybe I'm a bit paranoid about this, but I've hacked and been hacked, and I learned that the only one I can trust is the one I see reflected when I look in the mirror.
I hope this helps.
If your robots.txt file had
User-agent: Googlebot
Disallow: /
You would be preventing googlebot from entering your site.
In order to allow it you need
# Allow all
User-agent: *
Disallow:
Take a look at
[robotstxt.org...]
and
[robotstxt.org...]
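The two files above can be tried out with a short script. This is only a sketch: it uses Python's standard urllib.robotparser as a stand-in for a well-behaved crawler (its matching is not Googlebot's own), and the helper name allowed and the bot name SomeOtherBot are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, user_agent, path):
    """Parse robots.txt text and report whether user_agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# The file that shuts Googlebot out of the whole site.
BLOCK_GOOGLEBOT = """\
User-agent: Googlebot
Disallow: /
"""

# The file that lets every robot in: an empty Disallow blocks nothing.
ALLOW_ALL = """\
# Allow all
User-agent: *
Disallow:
"""

if __name__ == "__main__":
    print(allowed(BLOCK_GOOGLEBOT, "Googlebot", "/index.html"))
    print(allowed(BLOCK_GOOGLEBOT, "SomeOtherBot", "/index.html"))
    print(allowed(ALLOW_ALL, "Googlebot", "/index.html"))
```

With the first file only Googlebot is refused, since no record applies to other agents. With the second file everyone is allowed.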
I don't think it would be someone malicious, because nothing actually forces a bot to take any notice of the robots.txt file. It's just a standard that respectable bots follow. If a person can see a page, there's nothing to stop a bot.
The only situation I can think of is someone with a bot they didn't write and don't know how to modify, but whose UA string they do know how to change. It's a pretty far-out chance, though, and hardly worth mentioning.
> or is that too much of paranoia?
Yep.
However I have noticed that some bots/people do do this, set the UA string to GoogleBot, but whether it's bots or Mozilla users playing around I don't know.
Cheers,
Nigel
dillonstars, I would not be suspicious but happy about such privileged attention. I do hope you keep your code squeaky clean ;-) You would not want that attention to backfire on you.
Please note that there is a small difference between the way Googlebot handles the robots.txt file and the way the robots.txt standard says we should (keeping in mind the distinction between "should" and "must"). The standard says we should obey the first applicable rule, whereas Googlebot obeys the longest (that is, the most specific) applicable rule. This more intuitive practice matches what people actually do, and what they expect us to do. For example, consider the following robots.txt file:
User-Agent: *
Allow: /
Disallow: /cgi-bin
It's obvious that the webmaster's intent here is to allow robots to crawl everything except the /cgi-bin directory. Consequently, that's what we do.
[google.com...]
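The difference between the two rules in the quote can be seen in a few lines of Python. This is a rough sketch and not Googlebot's actual code: rules are plain path prefixes with no wildcards, the function names are made up, and keeping the earlier rule when two prefixes tie in length is my own assumption.

```python
def first_match(rules, path):
    """First applicable rule wins, as the quote says the standard suggests."""
    for allow, prefix in rules:
        if path.startswith(prefix):
            return allow
    return True  # no rule applies: crawling is allowed


def longest_match(rules, path):
    """Longest (most specific) applicable rule wins, as described for Googlebot."""
    best_allow, best_len = True, -1
    for allow, prefix in rules:
        # Strictly longer only, so an earlier rule survives a tie (assumption).
        if path.startswith(prefix) and len(prefix) > best_len:
            best_allow, best_len = allow, len(prefix)
    return best_allow


# The example file from the quote, as (allow?, path prefix) pairs in file order.
RULES = [
    (True, "/"),          # Allow: /
    (False, "/cgi-bin"),  # Disallow: /cgi-bin
]

if __name__ == "__main__":
    for path in ("/index.html", "/cgi-bin/script.pl"):
        print(path, first_match(RULES, path), longest_match(RULES, path))
```

Under first-match, "Allow: /" applies to every path, so the Disallow line is never reached and /cgi-bin/script.pl is crawlable. Under longest-match, /cgi-bin is the more specific prefix, so that path is blocked while /index.html stays open.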
I was aware that allowing spiders to visit my site wasn't going to do any harm, as yes, the code is pretty squeaky clean.
I never really believed that anything malicious was intended (perhaps a hoax at worst), and am delighted at the positive feedback for the site.