robots.txt and googlebot

why isn't googlebot obeying?

     
2:06 am on Mar 11, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Oct 25, 2002
posts:46
votes: 0


I added something like the following at the top of my robots.txt:

User-agent: *
Disallow: /adserver/
Disallow: /addbiography.cgi
Disallow: /addtrivia.cgi
Disallow: /addquotes.cgi
Disallow: /search.cgi

I added that about 4 hours ago. I am aware googlebot doesn't request robots.txt all the time, but the following showed up in my logs:

crawler9.googlebot.com - - [11/Mar/2003:00:55:19 +0000] "GET /robots.txt HTTP/1.0" 200 869 "-" "Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html)"
crawler9.googlebot.com - - [11/Mar/2003:00:55:19 +0000] "GET /robots.txt HTTP/1.0" 200 869 "-" "Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html)"
crawler9.googlebot.com - - [11/Mar/2003:00:56:48 +0000] "GET /addquotes.cgi?celeb=Robin%20Williams HTTP/1.0" 200 12988 "-" "Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html)"
crawler9.googlebot.com - - [11/Mar/2003:00:58:29 +0000] "GET /addquotes.cgi?celeb=Robin%20Williams HTTP/1.0" 200 12988 "-" "Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html)"

crawler9.googlebot.com read robots.txt but still fetched /addquotes.cgi?celeb=Robin%20Williams. Why? Is there more than one instance of googlebot running at once? If so, is it possible that the instance that fetched the addquotes.cgi page had read robots.txt earlier, before I added the Disallow?
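For anyone who wants to sanity-check rules like these offline, here is a minimal sketch using Python's standard `urllib.robotparser`. It only shows how a compliant parser would read the rules quoted above; it says nothing about how Google's own crawler interprets them. The `is_allowed` helper and the `/index.html` path are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /adserver/
Disallow: /addbiography.cgi
Disallow: /addtrivia.cgi
Disallow: /addquotes.cgi
Disallow: /search.cgi
"""


def is_allowed(user_agent: str, path: str) -> bool:
    """Return True if the rules above permit user_agent to fetch path.

    Hypothetical helper: it parses the rules from a string, so no
    network request is made.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    ua = "Mediapartners-Google/2.1"
    # The first path is from the log; /index.html is an assumed example.
    for path in ("/addquotes.cgi?celeb=Robin%20Williams", "/index.html"):
        verdict = "allowed" if is_allowed(ua, path) else "blocked"
        print(path, "->", verdict)
```

Because Disallow rules are prefix matches, the query string after /addquotes.cgi does not change the result, so the rules themselves are not what let the request through.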

2:22 am on Mar 11, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 30, 2003
posts:48
votes: 0


Google caches your robots.txt file, so it only requests it once per crawl.

Why not put a robots meta tag on those pages, i.e. <meta name="robots" content="noindex,follow" />?

This will stop Google from indexing the page, but it will still follow the links.
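To confirm that a page really carries the tag after you edit your templates, a rough sketch like the one below can pull the directives out of the HTML. It uses Python's standard `html.parser`; the class and function names are made up for this example, and it is not a description of how any crawler parses pages.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for item in attr_map.get("content", "").split(","):
            item = item.strip().lower()
            if item:
                self.directives.add(item)


def robots_directives(html: str) -> set[str]:
    """Return the set of robots directives found in the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex,follow" /></head></html>'
    print(sorted(robots_directives(page)))
```

Keep in mind that a crawler can only see this tag if it is allowed to fetch the page, so a page that is also blocked in robots.txt never gets its meta tag read by a compliant bot.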

2:29 am on Mar 11, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Oct 25, 2002
posts:46
votes: 0


projectphp, as you can see, googlebot requested robots.txt just before it crawled the cgi page. The only thing I can think of is that there is more than one instance of googlebot running per IP, and the request for the cgi page came from an instance that had requested robots.txt before I updated it.

Good idea to change my cgi pages as you suggested, with the <meta name="robots" content="noindex,follow" /> tag.

4:46 am on Mar 11, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member powdork is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Sept 13, 2002
posts:3346
votes: 0


What are the IP ranges of the visits? I've never seen that "Mediapartners" bit.
5:11 am on Mar 11, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Oct 25, 2002
posts:46
votes: 0


if you ping crawler9.googlebot.com it is 64.68.87.79
5:17 am on Mar 11, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member powdork is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Sept 13, 2002
posts:3346
votes: 0


if you ping crawler9.googlebot.com it is 64.68.87.79

I know that. What I'm asking is whether the visits with that user agent string are from IPs that are normally associated with googlebot, or whether someone is masquerading as Gbot. I'm guessing that 'Mediapartners' is part of an rDNS lookup, but my stats program doesn't have the functionality to check.
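If your stats program can't do the lookup, the check can be scripted. The sketch below does a reverse DNS lookup on the visiting IP, checks that the hostname ends in `.googlebot.com` (an assumption based on the `crawler9.googlebot.com` host in the log above), and then does a forward lookup to confirm the hostname resolves back to the same IP. The function names are hypothetical, and the resolvers are passed in as parameters so the logic can be tried without network access.

```python
import socket
from typing import Callable, List


def default_reverse(ip: str) -> str:
    """Reverse DNS: IP address -> primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def default_forward(host: str) -> List[str]:
    """Forward DNS: hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def looks_like_googlebot(
    ip: str,
    suffix: str = ".googlebot.com",  # assumed from the log in this thread
    reverse: Callable[[str], str] = default_reverse,
    forward: Callable[[str], List[str]] = default_forward,
) -> bool:
    """Return True if ip reverse-resolves under suffix and resolves back."""
    try:
        host = reverse(ip)
    except OSError:  # covers socket.herror and socket.gaierror
        return False
    if not host.lower().rstrip(".").endswith(suffix):
        return False
    try:
        # The forward lookup must include the original IP, otherwise
        # anyone controlling their own reverse DNS could fake the name.
        return ip in forward(host)
    except OSError:
        return False


if __name__ == "__main__":
    # IP taken from the thread; the result depends on live DNS today.
    print(looks_like_googlebot("64.68.87.79"))
```

The user agent string alone proves nothing, since any client can send it; the two-way DNS check is what ties a visit to a hostname its owner actually controls.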

5:23 am on Mar 11, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 17, 2003
posts:687
votes: 0


ME TOO! Now I have both meta robots and robots.txt to see if they obey this time.
5:54 am on Mar 11, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Oct 25, 2002
posts:46
votes: 0


Powdork, I have no idea. I must admit I thought that user agent was strange. Is it possible someone could be masquerading as googlebot?
6:14 am on Mar 11, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member googleguy is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Oct 8, 2001
posts:2882
votes: 0


kryton, I believe that this user agent is associated with our content ads program. Are you on a site that shows our content ads?
3:12 pm on Mar 11, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Oct 25, 2002
posts:46
votes: 0


No, GoogleGuy, I am not.
7:04 pm on Mar 13, 2003 (gmt 0)

New User

10+ Year Member

joined:Mar 3, 2003
posts:38
votes: 0


I'm seeing this mediapartners bot, too.

See my post here:

I am not doing anything with "content ads" that I know of (unless my ad network is serving them up somehow)?

GoogleGuy, what's this about and why is this bot so aggressive?

Mike

7:10 pm on Mar 13, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 20, 2003
posts:80
votes: 0


Us too - we've got the Google MediaPartners bot and have been wondering what it is... We are not involved with the google ads in any way (at this point, only overture).

Chris

8:40 pm on Mar 13, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member googleguy is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Oct 8, 2001
posts:2882
votes: 0


kryton, do you sell any banner ads? Content ads can also show up on your site that way. I believe the Mediapartners user agent is the bot for content ads, but it would be helpful if anyone could mention specific urls. I guess that's out of bounds, but I'm pretty sure that these are for content ads.
9:12 pm on Mar 13, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member powdork is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Sept 13, 2002
posts:3346
votes: 0


Wouldn't the ip ranges of the bots give all the needed info?
11:27 pm on Mar 13, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 20, 2003
posts:80
votes: 0


GG,

We use a little cgi script to rotate through some 234x60 banners - but it's only informational internal ads that link to different pages on our site - nothing external.

I've heard your stickymail is turned off - how should I send you our URL?

Thanks,

Chris

2:25 am on Mar 14, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member googleguy is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Oct 8, 2001
posts:2882
votes: 0


Mmm. Do it as a spam report and mention your nick and this url. I'll get it. Thanks..
3:12 am on Mar 14, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member powdork is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Sept 13, 2002
posts:3346
votes: 0


GoogleGuy,
Do people really send in their own URLs to the spam report when you ask them? I try to keep everything above board, but that would still make me a tad nervous. ;)
3:53 am on Mar 14, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 27, 2001
posts:1472
votes: 0


I'm seeing the bot also although it is behaving itself. And though I don't serve Google content ads (yet), I'm certainly interested.