Forum Moderators: Robert Charlton & goodroi


URL timeouts with Googlebot - but page loads fine elsewhere


devil_dog

3:51 am on Dec 10, 2007 (gmt 0)

10+ Year Member



I'm running a news publishing website.
Webmaster Tools shows timeouts for most of my URLs, and my news stories aren't being picked up by Google - or if they are, it's often hours later (compared to seconds previously).

Using ngrep, I found that requests from Googlebot (all from 66.249.70.116) are being served really slowly... the first packet goes out almost instantly, but the following packets take a long time.
I'm using mod_deflate on Apache, so I don't think page generation is the issue - when gzipping, the entire page has to be generated before the first packet is sent.
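For reference, a monitoring session like the one described might look like this (the interface name eth0 and port 80 are assumptions, and ngrep needs root to sniff traffic):

```shell
# Print HTTP payloads from any packet containing "Googlebot",
# quietly (-q) and with payload line breaks preserved (-W byline):
ngrep -q -W byline -d eth0 'Googlebot' tcp port 80

# Or capture everything to/from the crawler IP seen in the logs:
ngrep -q -d eth0 '' host 66.249.70.116
```

Watching the timestamps between successive packets of one response is what reveals the stall described above.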

On some pages it's taking over 5 minutes! The average is over 1 minute... but it's definitely not an Apache/PHP issue: per my testing and feedback, even the bulkier pages are generated in about 1 second.
Moreover, other sites on this server aren't reporting timeouts.

I've rechecked with my host... there is no firewall/router shaping packets from Googlebot.

Why is Googlebot using only one IP to crawl my site? They must have loads of datacenters... maybe if they crawled from another datacenter it would be faster? Is it possible my site got dumped onto their slowest datacenter?

This problem started at the end of November.

Any pointers?

devil_dog

4:49 am on Dec 10, 2007 (gmt 0)




Just some clarifications I missed in the OP.

1) Pages load fine (1 to 5 seconds) from browsers in my target country, my hosting country, and the USA.
2) Even on the slow Googlebot GETs, my access log shows status 200, but Webmaster Tools still shows the URL as unreachable.

devil_dog

7:29 am on Dec 10, 2007 (gmt 0)




After a week of (almost) no sleep and pulling out what's left of my hair... I finally found a solution.

While monitoring network traffic, I noticed that many packets were being re-sent to Googlebot again and again, so I assumed the delay was caused by packet loss... ping confirmed it... after some tests, searching the forums, etc...

I reduced the MTU from 1500 (the default) to 500, and almost instantaneously Googlebot started showing me the love it once did...

G works in mysterious ways!

tedster

4:25 pm on Dec 10, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks very much for the details of your experience. If I understand correctly, this issue was fixed by reducing the maximum transmission unit (MTU) - so that the largest packet you now send is greatly reduced in size. I wonder if the larger packets were being lost in transit somewhere, or at googlebot's end of things.

I tried several searches looking for some documentation, and there are no results on google.com for either "googlebot mtu" or "googlebot 'packet size'".

devil_dog

1:10 pm on Dec 11, 2007 (gmt 0)




After the fortnight-long ordeal... the only explanation I can think of is that something changed in the routing between Google (the server at IP 66.249.70.116) and the Malaysian data center I'm using... perhaps one of the routers in between made a boo-boo?

I had tried virtually everything, even to the extent of (kind of) spamming Google Groups with multiple threads about my problem... even now I'm watching the bot activity live...

Here are the events that occurred:

1) Sometime in September my site was included as a Google News source.
2) In mid-November we started adding hundreds of stories per day from a paid news agency plus in-house journalists.
3) High traffic... my crappy WAMP setup became unstable... I ordered a new server to load Linux.
4) Nov 27th or 28th... new stories stopped being picked up; even the ones that were picked up were hours late, and only about 5% of them. <-- From this point until 2 days ago I thought it was some penalty for the poor server performance in the previous weeks... Googlebot was hitting about 3 to 4 times per 10 minutes (previously it was 15 to 20 times a minute)!
5) Around December 2nd, moved to LAMP... the site was very stable and fast, but Google still said otherwise.
6) About 2 days ago I learnt to use ngrep and monitored Googlebot traffic... I noticed that the first packet was sent instantly, but the following packets took up to 10 minutes, often with the same packet sent over and over again. I'm using mod_deflate, so I know the server had generated the page quickly but was having trouble sending it...
7) Used ping 66.249.70.116 -f -l #*$!x and discovered that 500 got results really fast... anything higher made ping take much longer to run and often report packet loss. So, following instructions on some forum/blog/website, I set the MTU to 500 + 28.
8) As soon as I did this, my other screen (where I was running a watch on the access log for user-agent Googlebot) was flooded with activity, and the current day's articles started being picked up in Google News... now it's picking up news even faster than with my previous setup.
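The probe in step 7 looks like Windows ping syntax, where -f sets the Don't Fragment bit and -l sets the payload size (on Linux the equivalent flags are -M do and -s; that mapping is my reading, not stated in the thread). The "500 + 28" comes from ICMP framing: a ping payload rides inside an 8-byte ICMP header and a 20-byte IPv4 header, so the on-wire packet - and hence the matching MTU - is payload + 28. A sketch:

```shell
# Probe the path with the Don't Fragment bit set (Linux ping syntax;
# needs network access, so the live command is shown commented out):
#   ping -M do -s 500 -c 3 66.249.70.116

# The MTU matching a given ping payload is payload + 28:
# 20 bytes IPv4 header + 8 bytes ICMP header.
payload=500
mtu=$((payload + 20 + 8))
echo "MTU for a ${payload}-byte ping payload: $mtu"   # 528
```

That 528 is exactly the MTU value that shows up in the ifconfig output quoted later in the thread.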

Would love to see if anyone can explain what happened.
Isn't the packet size supposed to be limited to the smallest MTU of the routers along the path?

What MTUs are you people running? My default was 1500.
You can check this on Linux with:
ifconfig

One of the lines it returns will be:
UP BROADCAST RUNNING MULTICAST MTU:528 Metric:1
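For anyone wanting to try the same fix, one way to lower the MTU on Linux looks like this (the interface name eth0 is an assumption; both commands need root, and the change is lost on reboot unless added to your distro's network config):

```shell
# Classic net-tools syntax, matching the ifconfig output above:
ifconfig eth0 mtu 528

# Equivalent with the newer iproute2 tools:
ip link set dev eth0 mtu 528
```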

devil_dog

1:18 pm on Dec 11, 2007 (gmt 0)




Yes tedster, I think you got it right...

All I know is that the exact moment I reduced the MTU, my two-week-long problem was solved.

I followed instructions from a forum thread where people were having packet loss on a DSL connection.