
Did Google Ask Me to Rewrite My robots.txt?

A strange email.

dillonstars

1:34 pm on Jul 11, 2003 (gmt 0)

10+ Year Member



I got an email from someone at Google today asking me to include the following lines in my robots.txt file

User-agent: Googlebot
Allow: /

as it previously had:

User-agent: *
Disallow: /

I thought this was strange, as I have never had any such correspondence from Google before and had never heard of them requesting such things...

Now, I understand that not many people would want to block Google from indexing their site ;) but has anyone else had anything like this?
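To see what the requested change actually does, here is a small sketch that evaluates the combined file (the original `User-agent: *` group plus the `Googlebot` group from the email) with Python's standard-library `urllib.robotparser`. That parser is an illustrative choice, not something anyone in this thread used, and real crawlers may interpret edge cases differently. The host `example.com` and the bot name `SomeOtherBot` are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The site's original rules plus the group the email asked for.
# example.com and SomeOtherBot are placeholders, not names from the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A crawler uses the group that names it; any other crawler falls back to "*".
for agent in ("Googlebot", "SomeOtherBot"):
    ok = parser.can_fetch(agent, "http://example.com/index.html")
    print(f"{agent}: {'allowed' if ok else 'blocked'}")
# Googlebot: allowed
# SomeOtherBot: blocked
```

In other words, the extra group opens the site to Googlebot only; every other well-behaved bot stays blocked.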

ciml

7:02 pm on Jul 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



dillonstars, I would be very surprised if Google would email you out of the blue just to ask you to add that to your /robots.txt

I would treat such a request with great suspicion.

bakedjake

7:03 pm on Jul 12, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



<post deleted: I'm an idiot>

[edited by: bakedjake at 7:09 pm (utc) on July 12, 2003]

ikbenhet1

7:07 pm on Jul 12, 2003 (gmt 0)

10+ Year Member



It has happened before that Google asked a webmaster to allow Googlebot to index their pages.
A link was posted on WebmasterWorld many months ago.
I don't have the link, though.

The site in question was something with "damian", I believe. I forget.

[edited by: ikbenhet1 at 7:09 pm (utc) on July 12, 2003]

bakedjake

7:09 pm on Jul 12, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



LOL, sorry. I read your message completely backwards. I thought you had allow all, and Google was asking you to disallow.

I apologize!

ciml

7:32 pm on Jul 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sorry, I read it the same way Jake did. There's a smudge on my screen that made 'Allow' look more like 'Disallow'. (and if you believe that you're as daft as I am...)

Thanks for the info, ikbenhet1. I must have missed that thread (conspiracy theories back in the drawer).

Shak

7:35 pm on Jul 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That last thread, and the email that did the rounds, was #@x$%£&*... (IMO). Actually, there was NO such employee at Google with that name!

Shak

killroy

7:56 pm on Jul 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hmm, did you apply to AdSense? Maybe that's why?

SN

GoogleGuy

12:40 am on Jul 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



dillonstars, it's possible that you just have a really great website, and someone at Google noticed that we didn't have it in our index. For example, I think that for the longest time, the California Department of Motor Vehicles (DMV) had a robots.txt that disallowed any spiders. :) After a while, some engineer wanted to sell/transfer a car, and couldn't find out how to do it when they searched on Google. So they investigated and found out that the DMV was blocking bots. I think someone dropped them a note asking them to consider allowing us to index their page--problem solved. :)

So it could have been a Googler, especially if someone thought your site was good. :)

Best wishes,
GoogleGuy

vincevincevince

1:02 am on Jul 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Or it could be another malicious user who hopes you'll change your robots.txt, then sneak in using a fake Googlebot user-agent and get full access?

And you won't notice that it's malicious unless you check IPs against Google's lists... you'll just think it's Google's bot :-S

or is that too much of paranoia?
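Checking IPs is the right instinct, since the user-agent string is just a header anyone can set. A minimal sketch of one way to do it, assuming the reverse-then-forward DNS method Google later documented for verifying Googlebot (not something described in this thread): resolve the IP to a hostname, check the domain, then resolve that hostname back and confirm it returns the same IP. The accepted suffixes, the function name `is_real_googlebot`, and all addresses and hostnames in the usage part are assumptions or hypothetical documentation-range values.

```python
import socket

# Hostname suffixes accepted as Google crawlers. Assumption: based on
# Google's later published guidance (googlebot.com / google.com).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(ip, reverse_lookup=socket.gethostbyaddr,
                      forward_lookup=socket.gethostbyname_ex):
    """True only if `ip` reverse-resolves to a Google hostname that
    forward-resolves back to the same `ip`."""
    try:
        host = reverse_lookup(ip)[0]
    except OSError:  # socket.herror / socket.gaierror subclass OSError
        return False
    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False
    try:
        addresses = forward_lookup(host)[2]
    except OSError:
        return False
    # A spoofer can set its own PTR record, but not Google's forward DNS.
    return ip in addresses


# Hypothetical DNS data so the example runs offline.
PTR = {"192.0.2.10": "crawl-192-0-2-10.googlebot.com",
       "203.0.113.7": "crawl-fake.googlebot.com"}  # spoofed PTR record
A = {"crawl-192-0-2-10.googlebot.com": ["192.0.2.10"],
     "crawl-fake.googlebot.com": ["198.51.100.1"]}


def fake_reverse(ip):
    if ip not in PTR:
        raise socket.herror("no PTR record")
    return PTR[ip], [], [ip]


def fake_forward(host):
    if host not in A:
        raise socket.gaierror("no A record")
    return host, [], A[host]


print(is_real_googlebot("192.0.2.10", fake_reverse, fake_forward))    # True
print(is_real_googlebot("203.0.113.7", fake_reverse, fake_forward))   # False
print(is_real_googlebot("198.51.100.99", fake_reverse, fake_forward)) # False
```

Passing the resolvers in as parameters keeps the check testable; in production you would call `is_real_googlebot(ip)` with the real `socket` defaults.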

kpaul

2:20 am on Jul 13, 2003 (gmt 0)

10+ Year Member



It's happened to me. Not for a personal site, but another one I work for (offering news on Widgets daily ;)

Did make me feel good. I had to talk a vendor into changing the robots.txt, but they did eventually come around - for us and a lot of other customers as well.

I was kinda shocked to get an email seemingly 'out-of-the-blue' from Google as well, though. Made me feel good. ;)

GoogleGuy

6:44 am on Jul 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Cool, kpaul. Sounds like you've got a tres useful site too. :)

vincevincevince, I think that's too much paranoia. :) robots.txt is a voluntary standard that spiders comply with. If a malicious user wanted the content, they could just grab it. There are plenty of "rude" user agents that they could masquerade as instead of the genteel Googlebot. :)
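The "voluntary" part can be made concrete with a hedged sketch: the robots.txt check lives entirely in the crawler's own code, so a rude crawler simply never calls it. `polite_fetch`, `rude_fetch` and `fake_fetch` are hypothetical names, the check uses Python's `urllib.robotparser`, and `example.com` is a placeholder.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /"])  # the site blocks every bot


def fake_fetch(url):
    # Stand-in for a real HTTP request (hypothetical helper).
    return f"<html>contents of {url}</html>"


def polite_fetch(agent, url, fetch=fake_fetch):
    # A well-behaved crawler consults robots.txt and backs off if told to.
    if not parser.can_fetch(agent, url):
        return None
    return fetch(url)


def rude_fetch(agent, url, fetch=fake_fetch):
    # Nothing on the server forces the check above; this crawler skips it.
    return fetch(url)


url = "http://example.com/page.html"
print(polite_fetch("Googlebot", url))  # None
print(rude_fetch("Googlebot", url))    # <html>contents of http://example.com/page.html</html>
```

The server gets no vote in either function, which is why opening robots.txt to Googlebot doesn't hand anything new to a bot that was never going to obey it.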

dillonstars

8:40 am on Jul 14, 2003 (gmt 0)

10+ Year Member



Many thanks for all your replies!

I'll now return to building up the site with renewed enthusiasm :)

Herenvardo

4:47 pm on Jul 14, 2003 (gmt 0)

10+ Year Member



I'll warn you: maybe the mail really comes from Google, maybe not. Do you have any way to check it? If so, what are you waiting for? If not, try replying to the message and asking the sender for more information: who they are, why they are asking you to change your robots.txt file, etc. If you receive no answer, don't do it. If you get an answer, try to verify all the information you get. Even in the best case, you will always be in a little danger.

The best suggestion I can make is not to change the file unless you feel 100% sure.

Note: maybe I'm a bit paranoid about this, but I've hacked and been hacked, and I learned that the only person I can trust is the one I see when I look in the mirror.

I hope this helps.

GoogleGuy

5:07 pm on Jul 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm guessing that the mail came from Google. The robots.txt file doesn't affect whether the files are actually there or not, so any random impolite bot could already crawl the site. The request would let Google index the site--it feels quite real to me; someone probably found a useful resource on your site, dillonstars, and wished that it was in Google. :)

athinktank

5:12 pm on Jul 14, 2003 (gmt 0)

10+ Year Member



I would be suspicious of this email. Take a look at the email headers to see where it *really* came from.
The syntax
User-agent: Googlebot
Allow: /
is not valid. *Allow:* is not a valid command... only *Disallow:* is.

If your robots.txt file had
User-agent: Googlebot
Disallow: /
You would be preventing googlebot from entering your site.
In order to allow it you need
# Allow all
User-agent: *
Disallow:

Take a look at
[robotstxt.org...]
and
[robotstxt.org...]
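Both of athinktank's examples can be checked with a short sketch built on Python's `urllib.robotparser` (an illustrative choice, not a tool mentioned in the thread; individual crawlers may differ on edge cases). The helper name `can_crawl`, the host, the page and `OtherBot` are placeholders.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_lines, agent, url="http://example.com/page.html"):
    """Parse a robots.txt given as a list of lines and ask about one URL."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


# An empty Disallow value means "nothing is disallowed".
allow_all = ["# Allow all", "User-agent: *", "Disallow:"]

# Blocks Googlebot only; bots with no matching group are unrestricted.
block_googlebot = ["User-agent: Googlebot", "Disallow: /"]

print(can_crawl(allow_all, "Googlebot"))        # True
print(can_crawl(block_googlebot, "Googlebot"))  # False
print(can_crawl(block_googlebot, "OtherBot"))   # True
```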

Made In Sheffield

5:51 pm on Jul 14, 2003 (gmt 0)

10+ Year Member Top Contributors Of The Month



vincevincevince,

I don't think it would be someone malicious, because nothing actually forces a bot to take any notice of the robots.txt file. It's just a standard that respectable bots follow; if a person can see a page, there's nothing to stop a bot.

The only situation I can think of is if someone had a bot they didn't write and didn't know how to change, but did know how to change the UA string it uses. It's a pretty far-out chance, though, and hardly worth even mentioning.

> or is that too much of paranoia?

Yep.

However, I have noticed that some bots/people do this: set the UA string to Googlebot. Whether it's bots or Mozilla users playing around, I don't know.

Cheers,
Nigel

ciml

6:53 pm on Jul 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'd be very suspicious if the email said "Disallow: /" (as I misread it initially), instead of "Allow: /".

As long as a webmaster like dillonstars understands what Allow: / achieves, and is therefore happy to allow Googlebot to crawl his URLs, then there really isn't a danger.

alxdean

6:54 pm on Jul 14, 2003 (gmt 0)

10+ Year Member



athinktank,
I would not go as far as calling it invalid code, as the robots standard, as it stands, is not backed by any official standards body (neither an RFC nor the IETF). The Internet Draft "A Method for Web Robots Control", found at http://www.robotstxt.org/wc/norobots-rfc.html, might have expired in 1997, but that does not stop many forward-looking search engines from starting to use some of the flexibility it provides.
Bear in mind that not even the robots standard from 1994 made it to RFC level.

dillonstars, I would not be suspicious but happy about such privileged attention. I do hope you keep your code squeaky clean ;-) You would not want that attention to backfire on you.

Yidaki

8:49 pm on Jul 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



athinktank, it might not be an RFC standard (yet), but it is (already) a Googlebot standard:

Please note that there is a small difference between the way Googlebot handles the robots.txt file and the way the robots.txt standard says we should (keeping in mind the distinction between "should" and "must"). The standard says we should obey the first applicable rule, whereas Googlebot obeys the longest (that is, the most specific) applicable rule. This more intuitive practice matches what people actually do, and what they expect us to do. For example, consider the following robots.txt file:

User-Agent: *
Allow: /
Disallow: /cgi-bin

It's obvious that the webmaster's intent here is to allow robots to crawl everything except the /cgi-bin directory. Consequently, that's what we do.

[google.com...]
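The difference between the two precedence rules in the quote is easy to see in a small sketch. The functions below are a simplification of my own (plain prefix matching, with rules given as `(is_allow, path)` pairs instead of parsed from a file), not Google's actual implementation. Breaking length ties in favour of Allow is an assumption, since the quote doesn't cover ties.

```python
# Rules from the quoted robots.txt, in file order: (is_allow, path_prefix).
RULES = [(True, "/"), (False, "/cgi-bin")]


def first_match(rules, path):
    """What the standard describes: the first applicable rule wins."""
    for is_allow, prefix in rules:
        if path.startswith(prefix):
            return is_allow
    return True  # no applicable rule: crawling is allowed


def longest_match(rules, path):
    """What the quote says Googlebot does: the longest applicable rule wins."""
    matches = [(len(prefix), is_allow)
               for is_allow, prefix in rules if path.startswith(prefix)]
    # max() compares lengths first; on a tie True > False, so Allow wins
    # (assumed tie-break).
    return max(matches)[1] if matches else True


for path in ("/index.html", "/cgi-bin/search.pl"):
    print(path, first_match(RULES, path), longest_match(RULES, path))
# /index.html True True
# /cgi-bin/search.pl True False
```

Under first-match, `Allow: /` matches every path before `Disallow: /cgi-bin` is ever reached, so /cgi-bin stays crawlable; longest-match gives the result the webmaster evidently intended.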

HitProf

8:59 pm on Jul 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Allow: is specific to Google; no other bot uses it.
Why doesn't Google stick to Disallow: (with nothing after it)?

Symbios

9:57 pm on Jul 14, 2003 (gmt 0)

10+ Year Member



Why not just delete:

User-agent: Googlebot
Allow: /

And Googlebot will follow.

dillonstars

7:48 am on Jul 15, 2003 (gmt 0)

10+ Year Member



Thanks for all the attention folks!

I was aware that allowing spiders to visit my site wasn't going to do any harm, as yes, the code is pretty squeaky clean.

I never really believed that anything malicious was intended (perhaps a hoax at worst), and am delighted at the positive feedback for the site.

Powdork

8:26 am on Jul 15, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Dillonstars,
Is your site such that someone could search for something, expect to find your site, not find it, and then complete the "dissatisfied with results" form, stating the results were unsatisfactory because your site was not listed? Perhaps this would trigger a manual check, and the Googlemployee would say "Hey, this should be listed! I'll fire off an email."

dillonstars

8:33 am on Jul 15, 2003 (gmt 0)

10+ Year Member



Powdork > no, it's a personal humor site. It's listed in the ODP, so I presume they found it, found it funny, and noticed on their toolbar that it didn't have a PageRank... or something like that...