Forum Moderators: martinibuster

Problem with Adsense crawler and dynamic sites?

Crawler is retrieving incorrect URLs for dynamic sites

     
1:00 pm on Dec 1, 2006 (gmt 0)

New User

10+ Year Member

joined:May 21, 2003
posts:11
votes: 0


I've been seeing strange things in my access log since last night:


66.249.66.12 - - [01/Dec/2006:03:32:07 +0100] "GET /forums/viewtopic.php%3Ff%3D153&t%3D678703 HTTP/1.1" 404 237 "-" "Mediapartners-Google/2.1"

It requests:


/forums/viewtopic.php%3Ff%3D153&t%3D678703

Which should be (and used to be):


/forums/viewtopic.php?f=153&t=678703

I saw both normal and messed-up requests around that time (03:27 was the first bad request). Now, a few hours later, 100% of the requests return 404s.

I'm 99.8% sure that I haven't changed anything. The site mainly runs a forum and I haven't touched the code in the last couple of days. I also haven't updated any software on that machine (AFAIK). Weird stuff, and probably not exactly good for my revenue.

3:59 pm on Dec 1, 2006 (gmt 0)

New User

10+ Year Member

joined:July 12, 2006
posts:6
votes: 0


We're seeing the exact same problem. I emailed our AdSense account manager about it; hopefully they will have some insight. So far our AdSense revenue looks OK for today, but I imagine it would hurt long-term performance if this keeps up (not to mention filling up everyone's error logs!)
5:15 pm on Dec 1, 2006 (gmt 0)

Administrator

WebmasterWorld Administrator jatar_k is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:July 24, 2001
posts:15756
votes: 0


the thing is those 2 urls are the same, the first one shouldn't return a 404 if the second is valid. It's just an encoded url.

strange that the bot is requesting them that way though

7:39 pm on Dec 1, 2006 (gmt 0)

New User

10+ Year Member

joined:May 21, 2003
posts: 11
votes: 0


Thanks greeneggsandham! I wasn't really sure whether it was a problem with my setup or with Google, so it's kind of good to know that I'm not the only one suffering from this.

An account manager *sigh* That would be nice. I'm still stuck with the normal contact form where it takes Google up to 3 weeks to respond :\

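The URL is indeed URL-encoded, but that is exactly the problem: once delimiter characters like ? and = are escaped, Apache and PHP no longer treat them as the start of the query string and its parameters. The whole thing is read as one long file name, which doesn't exist, hence the 404.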
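To illustrate the point about escaped delimiters, here is a minimal Python sketch. It is not what Apache or PHP actually run; it only uses the standard library's `urllib.parse` to show how a generic URL parser splits the two request targets from the log above. The variable and function names are my own.

```python
from urllib.parse import urlsplit, unquote

# The two request targets quoted earlier in this thread.
broken = "/forums/viewtopic.php%3Ff%3D153&t%3D678703"
normal = "/forums/viewtopic.php?f=153&t=678703"


def split_request(target):
    """Return (path, query) the way a generic URL parser sees the target."""
    parts = urlsplit(target)
    return parts.path, parts.query


broken_path, broken_query = split_request(broken)
normal_path, normal_query = split_request(normal)

# With "?" escaped as %3F there is no query string at all:
# the entire target is treated as the path.
print(repr(broken_path), repr(broken_query))

# With a literal "?" the parser separates script path and parameters.
print(repr(normal_path), repr(normal_query))

# Percent-decoding the whole broken target gives back the normal URL,
# which is why the two look "the same" to a human reader.
decoded = unquote(broken)
print(decoded == normal)
```

So the two strings decode to the same text, but a server only splits on a literal ?, which is why only the second one reaches viewtopic.php with its parameters.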

10:34 pm on Dec 1, 2006 (gmt 0)

New User

10+ Year Member

joined:July 12, 2006
posts:6
votes: 0


BartVB,

Do you have the adsense code hard coded on your site? Or do you use an ad server like phpAdsNew? Are your adsense ads delivered as defaults to another ad network?

I'm just trying to figure out why this might be happening. We didn't change anything on our site (it was fine when I went to bed, then this morning all the errors were there!).

The AdSense engineers say they can't figure out why we're seeing errors, and that they haven't received reports of this from anyone else. The only reason I noticed it is that we get about 400K pageviews a day, and when an error is generated for every AdSense view on a page... the logs take some abuse!

So if anyone else IS seeing this in their logs (look in your domain logs and error_log!) please report it to Adsense so they know it's not an isolated case!
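For anyone who wants to check their own logs, here is a rough Python sketch that scans access-log lines for this pattern. It assumes the Apache "combined" log format shown in the first post. The regular expression, the function name `find_encoded_404s`, and the second sample line (a made-up normal request for contrast) are illustrative assumptions, not anything provided by Google or Apache.

```python
import re

# Assumed layout: Apache "combined" log format, as in the line quoted
# in the first post of this thread.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def find_encoded_404s(lines, agent_prefix="Mediapartners-Google"):
    """Return (time, target) for 404s where the "?" arrived as %3F."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that do not fit the assumed format
        target = m["target"]
        if (m["status"] == "404"
                and m["agent"].startswith(agent_prefix)
                and "?" not in target
                and "%3f" in target.lower()):
            hits.append((m["time"], target))
    return hits


sample = [
    # Real line from the first post.
    '66.249.66.12 - - [01/Dec/2006:03:32:07 +0100] '
    '"GET /forums/viewtopic.php%3Ff%3D153&t%3D678703 HTTP/1.1" '
    '404 237 "-" "Mediapartners-Google/2.1"',
    # Hypothetical normal request, invented here for contrast.
    '66.249.66.12 - - [01/Dec/2006:03:20:00 +0100] '
    '"GET /forums/viewtopic.php?f=153&t=678703 HTTP/1.1" '
    '200 5123 "-" "Mediapartners-Google/2.1"',
]

for when, target in find_encoded_404s(sample):
    print(when, target)
```

To run it against a real file, pass the open file object instead of `sample`; the path to your access log depends on your own setup.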

10:50 pm on Dec 1, 2006 (gmt 0)

Preferred Member

joined:Apr 17, 2006
posts:433
votes: 0


I've been getting a lot of 404s lately in Google's webmaster tools, so I'll make sure all 404s go to my root so Google can scan the content.
11:02 pm on Dec 1, 2006 (gmt 0)

New User

10+ Year Member

joined:May 21, 2003
posts: 11
votes: 0


I'm serving AdSense through phpAdsNew (site is in my profile). I'm not using another ad network. I'm using the normal AdSense code, and the JavaScript is retrieved from the pagead2.googlesyndication.com server by the users (i.e. I haven't mirrored that file locally).

I'm using Apache 2.0.54-5sarge1 and PHP 5.2.0-0.dotdeb.3. The site serves some 600,000 pages with AdSense banners a day, and I'm currently seeing a LOT of 404s :)

No idea what could be causing this except for a problem with their crawler?

I just did some network sniffing and this is what a request from the AdSense crawler looks like:


GET /forums/viewtopic.php%3Ff%3D34&t%3D685890&p%3D20564416%23p20564416 HTTP/1.1
Host: www.[sitename].tld
Connection: Keep-alive
User-Agent: Mediapartners-Google/2.1
Accept-Encoding: gzip

So either something goes wrong in the JavaScript (which I haven't changed) when it sends the URL of the current page with an AdSense ad to Google, or something goes wrong inside Google's systems.

I just received a reply from AdSense support that they have escalated the problem and that they would appreciate it if I would send a screenshot of the problem *sigh* :)

@greedy player: The webmaster tools report on the 'normal' Googlebot; that's a completely different beast.

8:38 am on Dec 2, 2006 (gmt 0)

New User

10+ Year Member

joined:May 21, 2003
posts: 11
votes: 0


It seems the problem has been fixed. I'm seeing lots of normal 200s and 301s for AdSense at the moment. Not one 404 so far.