Forum Moderators: martinibuster

Adsense bot did a brain-fart.

         

noodlebox

2:33 pm on Dec 29, 2006 (gmt 0)



haha I just looked at my blocked crawl attempts, and they are Google cache result pages being crawled by AdSense, with "Robots.txt error" given as the reason for the block... obviously the /robots.txt on the Google IP is going to block it, but haha..

Has anyone else had this?

indias next no1

3:29 pm on Dec 29, 2006 (gmt 0)

10+ Year Member



I am getting the same failed-attempt message for Google cache pages, with robots.txt as the reason.

What does it mean?

Is there any cause on our side?

noodlebox

12:29 pm on Dec 30, 2006 (gmt 0)



My opinion: our visitors clicked "Cached" in Google search.

RonS

5:19 pm on Dec 30, 2006 (gmt 0)

10+ Year Member



LOL

Google's own robots.txt file is blocking the AdSense bot from scanning cached pages! I've ALWAYS wondered why AdSense can't be SMART ENOUGH to understand that, when an ad request comes in from a URL that starts with a Google IP and /search?q=cache, it should scan that URL, pick out the page's real URL, and serve up ads based on the previously scanned contents of the page. But this takes the cake!
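The "pick up the page's real URL" idea could look roughly like the Python sketch below. It is only an illustration, not anything AdSense is known to do. The function name `original_url_from_cache` is hypothetical, and the cache URL layout it assumes (an optional opaque ID, then the page address, then the search terms after a "+") is an assumption based on how these links typically looked at the time.

```python
from urllib.parse import urlsplit, parse_qs


def original_url_from_cache(cache_url):
    """Return the cached page's own URL, or None if this is not a cache request.

    Assumed (not documented) layout: /search?q=cache:[ID:]page-address+terms
    """
    parts = urlsplit(cache_url)
    if parts.path != "/search":
        return None

    # parse_qs decodes "+" as a space, so search terms end up after a space.
    query = parse_qs(parts.query).get("q", [""])[0]
    if not query.startswith("cache:"):
        return None

    # Drop the "cache:" prefix and anything after the first space.
    target = query[len("cache:"):].split(" ", 1)[0]

    # Strip an optional leading ID: a token before ":" that does not look
    # like a host name or a URL scheme.
    head, sep, tail = target.partition(":")
    if sep and "." not in head and "/" not in head and head not in ("http", "https"):
        target = tail

    if not target:
        return None
    if "://" not in target:
        target = "http://" + target
    return target


# Hypothetical cache link; the ID and the example.com address are made up.
print(original_url_from_cache(
    "http://72.14.253.104/search?q=cache:AbCdEf123:www.example.com/page.html+widgets&hl=en"
))
```

With the real URL in hand, the ad server could look up what it already knows about that page instead of trying to crawl the cache copy.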

In my "blocked URLs" report, I have
[72.14.253.104...] [snip]

Looking at
[72.14.253.104...]

User-agent: *
Allow: /searchhistory/
Disallow: /news?output=xhtml&
Allow: /news?output=xhtml
Disallow: /search
[...]

BTW, looking at [google.com...] I see the same thing:

User-agent: *
Allow: /searchhistory/
Disallow: /news?output=xhtml&
Allow: /news?output=xhtml
Disallow: /search
[...]
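For anyone who wants to check this locally, here is a small Python sketch that feeds only the rules quoted above (the part hidden behind "[...]" is left out) to the standard library's `urllib.robotparser`. `Mediapartners-Google` is the AdSense crawler's user agent; the cache URL and the name `ROBOTS_EXCERPT` are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Only the rules quoted in the post; the rest of the file is not reproduced.
ROBOTS_EXCERPT = """\
User-agent: *
Allow: /searchhistory/
Disallow: /news?output=xhtml&
Allow: /news?output=xhtml
Disallow: /search
"""

ADSENSE_AGENT = "Mediapartners-Google"

parser = RobotFileParser()
parser.parse(ROBOTS_EXCERPT.splitlines())

# Hypothetical URLs: a cache request and a path the excerpt explicitly allows.
urls = [
    "http://www.google.com/search?q=cache:www.example.com/page.html",
    "http://www.google.com/searchhistory/",
]

for url in urls:
    verdict = "allowed" if parser.can_fetch(ADSENSE_AGENT, url) else "blocked"
    print(verdict, url)
```

Because the excerpt applies to every user agent and disallows everything under /search, the cache request is reported as blocked, which matches the robots.txt reason shown in the AdSense report.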

Maybe it's for the same reason I didn't get a GoogleGift. "Who knows"

[edited by: RonS at 5:37 pm (utc) on Dec. 30, 2006]

DamonHD

5:37 pm on Dec 30, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi,

I think that this is only going to happen when visitors click on the ads in your cached pages at G.

Of course, if it really bothers you then you could avoid serving ads on pages for the non-AS G bots, or use the appropriate meta-tags to prevent G showing the cached page, but I don't recommend either.

This might happen, for example, if your domain name contains a "rude" word that is blocked by someone's filter, e.g. at work, and they have bypassed it by looking up your page in the cache. If this is happening for your fu**********.com domain then I'd suggest you have an explanation.

Rgds

Damon

[edited by: DamonHD at 5:40 pm (utc) on Dec. 30, 2006]
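For reference, the two workarounds Damon mentions (and does not recommend) could be sketched like this in Python. The helper name `should_serve_ads` and the user-agent substrings it checks are assumptions for illustration; `noarchive` is the standard robots meta value that asks search engines not to show a cached copy.

```python
# Option 1 (assumed approach): leave the ad code out of pages served to
# Google's regular crawler, so the cached copy carries no ads, while still
# showing ads to the AdSense crawler and to normal visitors.
def should_serve_ads(user_agent):
    ua = user_agent.lower()
    if "mediapartners-google" in ua:
        return True   # AdSense crawler: needs to see the page as visitors do
    if "googlebot" in ua:
        return False  # regular Google crawler: its copy ends up in the cache
    return True       # everyone else


# Option 2: ask search engines not to offer a cached copy at all.
NOARCHIVE_META = '<meta name="robots" content="noarchive">'

print(should_serve_ads("Mediapartners-Google"))
print(should_serve_ads("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
print(NOARCHIVE_META)
```

Either option removes the cached pages from the blocked-URL report only by removing ads or the cache itself, which is why neither is recommended above.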

plasma

5:45 pm on Dec 30, 2006 (gmt 0)

10+ Year Member



I don't see a reason to have google cache enabled.
It's your content and not Google's, and you want the visitors to visit your site.

RonS

5:47 pm on Dec 30, 2006 (gmt 0)

10+ Year Member



Damon and plasma,

I have thousands and thousands of visitors who have viewed Google's cached version of my pages over the years, and this is the first time it has shown up in this report.

Just quickly glancing over the last 6 months, my click-tracking software has tracked at least 100 clicks on my ads while they were viewing Google's cache of my pages. This is on a very well respected, Family Friendly site.

People use Google's cache for many reasons, including if your site is offline, or to use the highlighting of search terms, which I personally find to be very nice. The toolbar can also help you locate search terms on a page you've googled.

[edited by: RonS at 5:50 pm (utc) on Dec. 30, 2006]

DamonHD

5:54 pm on Dec 30, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi,

Sure, agreed it isn't ONLY for filter dodging, though that would be more common in the kind of case I'm suggesting.

Yes, the highlighted text in the cache, and the "still available if the site is dead" feature also. One of the reasons that I became a fan of G search.

Rgds

Damon

Sunflux

4:27 am on Dec 31, 2006 (gmt 0)

10+ Year Member



Agreed, the bot had a brain fart. My "Site diagnostics" page has been clear of errors ever since they introduced the feature. I know when I checked a couple days ago it was empty. After your post I decided to go look, and sure enough I now have 10 pages of these cached robots.txt errors all from the 28th and 29th.

indias next no1

5:32 am on Dec 31, 2006 (gmt 0)

10+ Year Member



I read in the news that most of the cables were damaged by an earthquake in Asia and that traffic is being redirected through Atlantic servers. This may be why the Google AdSense bots are spidering the pages from the servers in Asia: the newspaper said that most of the internet traffic in Asia is affected. Any idea?

noodlebox

11:26 am on Dec 31, 2006 (gmt 0)



On the 28th my reports showed Google IP cache copies of my pages.

I had just thought that Google ads don't show up on cached pages, without Google telling you... this is a new entry on the list of the biggest brain farts Google has made so far :)

ashii

12:03 pm on Dec 31, 2006 (gmt 0)

10+ Year Member



Yes, for the first time since this feature was introduced I am getting about 30 URLs reported. All of them are Google cache pages.
All of them were reported on the 29th.

fredw

7:33 pm on Dec 31, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's more than a temporary fart. It is still occurring. URLs have been added to my blocked list as late as yesterday, and it's not only Google cache pages. Yesterday an AltaVista Babel Fish URL was added to my blocked list with the same robots.txt reason.

Someone up at the plex is going to have a headache when they get back from vacation...

zCat

8:17 pm on Dec 31, 2006 (gmt 0)

10+ Year Member



Here too.

If google would like to hand me the passwords to their servers, I'd be glad to "resolve these issues" ;-).