Forum Moderators: Robert Charlton & goodroi
Using ngrep, I found that responses to Googlebot (all requests come from 66.249.70.116) are being served really slowly... the first packet goes out almost instantly, but the following packets take ages.
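For anyone who wants to watch this themselves, the capture looked roughly like the following; the interface name (eth0) and port 80 are assumptions for a typical single-NIC web server, and ngrep needs root:

```shell
# Print packets exchanged with the Googlebot IP from this thread.
# -q: quiet output, -t: print timestamps, -d eth0: capture interface (assumed name).
# The empty '' match pattern means "show every packet" that matches the filter.
ngrep -q -t -d eth0 '' host 66.249.70.116 and tcp port 80
```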
I am using mod_deflate on Apache, so I don't think page generation is the issue: with gzip, the entire page has to be generated before the first packet is sent.
On some pages it's taking over 5 minutes, with the average over 1 minute... but it's definitely not an Apache/PHP issue; per my testing and feedback, even the bulkier pages are generated in about 1 second.
Moreover, other sites on this server aren't reporting timeouts.
I've rechecked with my host... there is no firewall or router shaping packets from Googlebot.
Why is Googlebot using only one IP to crawl my site? They must have loads of datacenters... maybe if they crawled from another one it would be faster? Is it possible my site got dumped onto their slowest datacenter?
This problem started at the end of November.
Any pointers?
While monitoring network traffic I noticed that many packets were being resent to Googlebot again and again, so I assumed the delay was down to packet loss... ping confirmed it. After some tests, searching the forums, etc...
I reduced the MTU from 1500 (the default) to 528, and almost instantly Googlebot started showing me the love it once used to...
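For reference, the change itself is a one-liner; eth0 is an assumed interface name, and 528 = 500 bytes of ping payload + 28 bytes of IP and ICMP headers:

```shell
# Set the interface MTU (needs root; does not survive a reboot,
# so add it to your distro's network config to make it permanent)
ifconfig eth0 mtu 528
```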
G works in mysterious ways!
I tried several searches looking for documentation, but there are no results on google.com for either "googlebot mtu" or "googlebot 'packet size'".
I had tried virtually everything, even to the extent of (kinda) spamming Google Groups with multiple threads about my problem... even now I'm watching the bot activity live.
Here are the events that occurred:
1) Sometime in September my site got included as a Google News source.
2) In mid-November we started adding hundreds of stories per day from a paid news agency plus in-house journalists.
3) High traffic... my crappy WAMP setup became unstable, so I ordered a new server to load Linux.
4) Nov 27th or 28th... new stories stopped being picked up; even the ones that were picked up were hours late, and only 5% of the stories made it. Until 2 days ago I thought this was some penalty for the poor server performance of the previous weeks. Googlebot was hitting about 3 to 4 times per 10 minutes (previously 15 to 20 times a minute).
5) About December 2nd... moved to LAMP. Site very stable and fast, but Google still said otherwise.
6) About 2 days ago... learnt to use ngrep and monitored Googlebot traffic. Noticed that the first packet was sent instantly but the following packets took up to 10 minutes, often with the same packet sent over and over again. I'm using mod_deflate, so I know the server had generated the page fast but was having trouble sending it.
7) Used ping 66.249.70.116 -f -l #*$!x and discovered that a payload of 500 got replies really fast... anything higher either took much longer or reported packet loss. So, following instructions on some forum/blog/website, I set the MTU to 500 + 28.
8) As soon as I did this, my other screen (the one running a watch on the access log for the Googlebot user agent) was flooded with activity, with the current day's articles being picked up in Google News. It's now picking up news even faster than with my previous setup.
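On Linux, the probe from step 7 can be sketched like this; `ping -f -l` above is the Windows syntax (don't fragment + payload length), while Linux iputils ping uses `-M do` (set don't-fragment) and `-s` (payload size) instead. The payload sizes below are just guesses around the values in this thread:

```shell
#!/bin/sh
# Walk payload sizes downward with "don't fragment" set; the first size
# that gets echo replies through cleanly gives MTU = payload + 28
# (20-byte IP header + 8-byte ICMP header).
HOST=66.249.70.116
for payload in 1472 1000 800 600 500; do
    if ping -c 2 -W 2 -M do -s "$payload" "$HOST" >/dev/null 2>&1; then
        echo "payload $payload got through -> try MTU $((payload + 28))"
        break
    fi
done
```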
Would love to see if anyone can explain what has happened.
Isn't the packet size limited by the smallest-MTU link in the chain?
What MTUs are you people running? My default was 1500.
You can check this on Linux by running:
ifconfig
One of the lines it returns will look like:
UP BROADCAST RUNNING MULTICAST  MTU:528  Metric:1
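If ifconfig isn't available (newer distros ship iproute2 instead), the same value can be read like this; the grep just pulls the MTU fields out of the output:

```shell
# List the MTU of every interface using iproute2 (ifconfig's successor)
ip link | grep -o 'mtu [0-9]*'
```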