| 10:37 am on May 29, 2004 (gmt 0)|
I don't see anything there that would put Google off, but the 'Last-Modified' header might encourage deeper spidering.
| 11:09 am on May 29, 2004 (gmt 0)|
I have noticed Googlebot doing something similar. My theory is that if Googlebot sees the "Content-Length:" field in the header, it assumes the content is static and crawls the website more aggressively. If it doesn't see that field, it assumes the content is dynamic and slows down its crawling a bit.
This is just a theory of mine that supports what I have seen Googlebot do. I may of course be totally wrong.
| 11:44 am on May 29, 2004 (gmt 0)|
That's funny, racer_x. I was thinking just that the other day. With dynamic content, the server doesn't know how much data it's sending, so it chunks it. If the content length is known, as in a static page, then the server will send the Content-Length header instead. So those headers are a dead giveaway.
If there's any truth that G favors static pages, this is something to be aware of.
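To make the "dead giveaway" concrete, here is a minimal Python sketch of the header check described above. The function name `looks_static` and the sample header values are made up for illustration, and nothing here is confirmed to be what Googlebot actually does. It only shows how a client could tell a fixed-length response from a chunked one.

```python
def looks_static(headers):
    """Guess whether a response was sent with a known length.

    `headers` is a plain dict of HTTP response headers. This is a
    hypothetical helper for illustration, not a known crawler rule.
    """
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value.strip().lower()
                  for name, value in headers.items()}

    # A chunked response means the server did not know the size up front.
    if "chunked" in normalized.get("transfer-encoding", ""):
        return False

    # A Content-Length header means the size was known before sending.
    return "content-length" in normalized


# Sample values only; not taken from any real site.
print(looks_static({"Content-Length": "5120"}))
print(looks_static({"Transfer-Encoding": "chunked"}))
```

Note that a dynamic script can still send Content-Length if it buffers its output first, so the presence of the header does not prove that a page is static.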
| 2:11 pm on May 29, 2004 (gmt 0)|
You can change a setting in your php.ini file so that Apache won't "identify itself" as using PHP. I doubt that this matters to Google, but I did it anyway.
| 2:24 pm on May 29, 2004 (gmt 0)|
Could this be due to simple PR differences between the sites? It's often reported that sites with higher PR are spidered more frequently.
| 7:45 pm on May 29, 2004 (gmt 0)|
Thanks for all your info thus far,
|You can change a setting in your php.ini file so that Apache won't "identify itself" as using PHP. I doubt that this matters to Google, but I did it anyway. |
Which parts of the server header would that change or hide?
| 10:25 am on May 31, 2004 (gmt 0)|
The "X-Powered-By: PHP/4.3.4" line can be removed and the "Server: Apache/1.3.27 (Unix) [etc.]" line can be changed, but as stevenmusumeche points out, the 'Server' and 'X-Powered-By' HTTP headers are not an issue for Google.
DaveAtIFG makes a good point: more PR helps with crawling.
A quick server response also helps encourage Googlebot to fetch more pages during its time on the site.
The main benefit of the 'Last-Modified' header is that Googlebot can send 'If-Modified-Since' headers. If it asks for a page that hasn't changed then the server can send a 304 (not modified) response, allowing Googlebot to crawl deeper into the site instead of indexing the same pages again and again.
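As a rough illustration of that exchange, here is a minimal Python sketch of the server-side decision. The function name `conditional_response` and the example date are assumptions made for this sketch. A real server such as Apache handles this itself for static files, so something like this only matters for dynamically generated pages.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(last_modified, if_modified_since=None):
    """Return (status, headers) for a page last changed at `last_modified`.

    `last_modified` must be a timezone-aware UTC datetime.
    `if_modified_since` is the raw request header value, or None.
    Hypothetical helper for illustration only.
    """
    # HTTP dates have one-second resolution, so drop microseconds.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # Unparseable date: fall back to a full response.

        # Only compare when the parsed date carries a timezone.
        if since is not None and since.tzinfo is not None:
            if last_modified <= since:
                return 304, headers  # Not Modified: no body needs sending.

    return 200, headers  # Send the full page.


# Example date chosen arbitrarily for the sketch.
page_changed = datetime(2004, 5, 29, 10, 37, tzinfo=timezone.utc)
print(conditional_response(page_changed, "Sat, 29 May 2004 10:37:00 GMT"))
```

The 304 reply carries no body, which is where the saving comes from: the crawler spends less time and bandwidth on pages it already has.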
| 4:00 pm on Jun 1, 2004 (gmt 0)|
|Could this be due to simple PR differences between the sites? It's often reported that sites with higher PR are spidered more frequently. |
No, the site that is not being crawled much is PR4 and the site that is being indexed really fast is PR0 (it has not yet been given a rank as it has only been in the index two months and Google hasn't done a PR update for ages).