Mediapartners-Bot: Multiple page accesses

Not once, not twice, but thrice and more

         

zCat

12:16 pm on Jul 26, 2005 (gmt 0)

10+ Year Member



Just idly trawling through my stats and logs and noticed the AdSense bot ("Mediapartners") is by far and away my best single "customer". Not that I'm complaining mind you, it's the only bot I know of whose presence translates directly into cents.

What gives me cause for slight concern, though, is that it fetches the same page multiple times within the same 1-10 second window, even though it gets exactly the same page back every time. The pages generally have three ad code blocks, but are fetched up to four times.

Can I do anything to stop this? Surely one hit per page would be enough - it would save me and Google bandwidth and processing power.

wyweb

3:43 pm on Jul 26, 2005 (gmt 0)



I believe .htaccess can limit this... or maybe it's robots.txt. I'm pretty sure you can regulate this.

Offhand I can't tell you how to do it though, and I apologize for the non-answer.

Anyway, I'll get you bumped back up where a good coder might see it...

EricGiguere

4:24 pm on Jul 26, 2005 (gmt 0)

10+ Year Member



I wouldn't try blocking the bot; that seems counter-productive. There are a number of valid reasons why they might be accessing the site multiple times. It may simply be that multiple instances of the bot pick the same request off the queue. Or it may be that the bot is trying to determine if your page content changes on each access.

You're better off spending your time on content and traffic development...

Eric

asinah

4:26 pm on Jul 26, 2005 (gmt 0)

10+ Year Member



Could be that the MediaBot is looking for cloaked pages.

zCat

4:30 pm on Jul 26, 2005 (gmt 0)

10+ Year Member



in reply to wyweb:

I don't think there's any way to regulate the number of hits to a certain page, at least using those methods; it's either all or nothing, and I certainly don't want to risk blocking that particular bot.

Methinks I should take a look at the headers Mediapartners sends with its requests; maybe it's sending an If-Modified-Since header and expecting a 304 Not Modified response that my site isn't giving it.

Has anyone else experienced this kind of behavior?
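
For anyone wanting to check the conditional-request idea, here is a minimal Python sketch of how a server might answer the standard HTTP If-Modified-Since header. Nothing in this thread confirms that the Mediapartners bot actually sends that header; the function name conditional_status and the sample date are made up for illustration.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Return 304 if the client's copy is still current, otherwise 200.

    if_modified_since: raw header value from the request, or None.
    last_modified: timezone-aware datetime of the page's last change.
    """
    if not if_modified_since:
        return 200  # no conditional header: send the full page
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: fall back to a normal response
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so ignore microseconds.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


# Hypothetical page timestamp; a real site would read it from the file
# system or the database.
page_changed = datetime(2005, 7, 25, 18, 0, 0, tzinfo=timezone.utc)

# Value to send in the Last-Modified response header.
last_modified_header = format_datetime(page_changed, usegmt=True)
```

A client that echoes last_modified_header back in If-Modified-Since would get a 304 with no body, which is where any bandwidth saving would come from.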

zCat

4:40 pm on Jul 26, 2005 (gmt 0)

10+ Year Member



in reply to EricGiguere:
All the requests come from the same IP address (in fact, for a long time now, all hits from the MediaBot have come from the same address). In the logs it looks like this:

66.249.00.000 - - [26/Jul/2005:03:32:59 +0200] "GET /some-page-or-other.html HTTP/1.1" 200 19649 "-" "Mediapartners-Google/2.1"
66.249.00.000 - - [26/Jul/2005:03:32:59 +0200] "GET /some-page-or-other.html HTTP/1.1" 200 19649 "-" "Mediapartners-Google/2.1"
66.249.00.000 - - [26/Jul/2005:03:32:59 +0200] "GET /some-page-or-other.html HTTP/1.1" 200 19649 "-" "Mediapartners-Google/2.1"
66.249.00.000 - - [26/Jul/2005:03:32:59 +0200] "GET /some-page-or-other.html HTTP/1.1" 200 19649 "-" "Mediapartners-Google/2.1"

(some data modified to protect innocent parties).

I'm not losing sleep over it, but from an admin point of view I like to minimize bandwidth usage and server load.
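
To measure how often this happens across a whole log file, a rough Python sketch along these lines might help. It assumes the Apache combined log format shown in the excerpt above; the function name repeated_fetches and the 10-second default window are illustrative choices, not anything Google documents.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Apache "combined" log format, as in the excerpt above.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def repeated_fetches(lines, agent_substring="Mediapartners-Google",
                     window=timedelta(seconds=10)):
    """Map each path to the most hits seen inside one time window.

    Only paths fetched more than once within a window are reported.
    """
    hits = defaultdict(list)
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or agent_substring not in m["agent"]:
            continue
        when = datetime.strptime(m["when"], "%d/%b/%Y:%H:%M:%S %z")
        hits[m["path"]].append(when)

    worst = {}
    for path, times in hits.items():
        times.sort()
        best, start = 1, 0
        # Sliding window over the sorted timestamps.
        for end in range(len(times)):
            while times[end] - times[start] > window:
                start += 1
            best = max(best, end - start + 1)
        if best > 1:
            worst[path] = best
    return worst


# The four identical log lines quoted above.
sample = [
    '66.249.00.000 - - [26/Jul/2005:03:32:59 +0200] '
    '"GET /some-page-or-other.html HTTP/1.1" 200 19649 "-" '
    '"Mediapartners-Google/2.1"'
] * 4

print(repeated_fetches(sample))  # {'/some-page-or-other.html': 4}
```

Multiplying the repeat count by the response size (19649 bytes here) gives a quick estimate of the extra bandwidth per page.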

EricGiguere

4:47 pm on Jul 26, 2005 (gmt 0)

10+ Year Member



Just because they have the same IP address doesn't mean they're not different instances. And yes, they may be looking for other things. I understand you want to minimize the bandwidth... but look at it as extra incentive to get your earnings up to cover any extra bandwidth costs :-)

Eric

SebastianX

1:29 pm on Jul 28, 2005 (gmt 0)

10+ Year Member



The AdSense bot has just started to download Google XML Sitemaps. Probably that's a test, but it could solve this problem and a few other issues as well.

I don't yet have enough data to say that the Mediapartners bot definitely makes use of the information (especially the lastmod timestamp) harvested from XML sitemaps. However, I'm speculating that it will do so after a probably short period of testing.

So if you don't provide a dynamic XML sitemap yet, it's possibly a good idea to refresh your programming skills;)

DonMateo

2:56 pm on Jul 28, 2005 (gmt 0)

10+ Year Member



Don't worry about it. The Mediapartners bot always visits my sites multiple times for the same page within a short time (usually 3 times). It's always done this and I think it's normal.

CodeJockey

3:41 pm on Jul 28, 2005 (gmt 0)

10+ Year Member



I noticed this some time ago in our logs, and decided to at least ask about it.

Got a nice person-written email back from Google's 'contact us' saying that my email (and the copy of the log) was being forwarded to their engineers for 'review'.

Haven't heard back from them (nor at this point do I expect to), but the bot is still hitting the same URL multiple times after it's been called.

I've gotten used to ignoring it.

SebastianX

7:40 pm on Jul 28, 2005 (gmt 0)

10+ Year Member



>The Mediapartners bot always visits my sites multiple times for the same page within a short time (usually 3 times).
>...the bot is still hitting up the same url multiple times after it's been called.

That happens on the first page view(s). Once 'the bot' has made its decision about the new page's content, it puts the page on a to-crawl-every-now-and-then list. If you change the content, you rely on a regular re-crawl by Googlebot and/or the AdSense bot to adjust topic targeting of your ads. Sometimes you wait very long for this re-crawl.

That's the point where Google Sitemaps could drastically improve things. The sitemaps protocol defines a 'last modified' element (lastmod), populated with a timestamp by the site's underlying script on every content change in the database (or updated manually on static sites).
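
As a rough illustration of a dynamic sitemap with lastmod timestamps, here is a small Python sketch. It uses the sitemaps.org 0.9 namespace, which may differ from the schema URL Google accepted at the time of this thread; the function name build_sitemap and the example URL are placeholders.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespace of the sitemaps.org protocol (assumed here; check which
# schema version your target crawler expects).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build a sitemap XML string.

    pages: iterable of (url, last_modified) pairs, where last_modified
    is a timezone-aware datetime of the page's last content change.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, modified in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # W3C datetime format, normalized to UTC.
        stamp = modified.astimezone(timezone.utc)
        ET.SubElement(url, "lastmod").text = stamp.strftime(
            "%Y-%m-%dT%H:%M:%S+00:00"
        )
    return ET.tostring(urlset, encoding="unicode")


# Placeholder data; a real script would pull URLs and timestamps from
# the site's database.
xml_text = build_sitemap([
    ("http://www.example.com/some-page-or-other.html",
     datetime(2005, 7, 28, 13, 29, tzinfo=timezone.utc)),
])
print(xml_text)
```

The important part is that lastmod reflects real content changes; a timestamp that changes on every request tells a crawler nothing about which pages are worth re-fetching.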

Googlebot (and hopefully the Mediapartners bot in the future) downloads the XML sitemap twice a day, harvests fresh (new or modified) content from its page list, and re-crawls the updated pages.

If that's not just a mistake in assigning the user agent name on a few occasions (from my tiny set of data it seemingly isn't, but I don't have enough data to back anything up), I speculate that the AdSense bot making use of submitted content changes will improve ad targeting to a great degree. That's not a big issue with static content, but all dynamic pages would profit.

<speculation>I guess possible AdSense improvements were part of Google's sitemap master plan from the very beginning. Perhaps AdSense issues were even the project's 'detonator'.</speculation>