"Status: HTTP/1.1 200 OK
Date: Sat, 29 May 2004 10:05:39 GMT
Server: Apache/1.3.27 (Unix) (Red-Hat/Linux) FrontPage/5.0.2.2623 mod_python/2.7.8 Python/1.5.2 mod_ssl/2.8.12 OpenSSL/0.9.6b DAV/1.0.3 PHP/4.3.4 mod_perl/1.26
X-Powered-By: PHP/4.3.4
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html"
Whereas here is the Server Header for a site that uses a completely different system which Google is indexing at a much faster rate:
"Status: HTTP/1.1 200 OK
Date: Sat, 29 May 2004 10:05:09 GMT
Server: Apache/1.3.27 (Unix) (Red-Hat/Linux) FrontPage/5.0.2.2623 mod_python/2.7.8 Python/1.5.2 mod_ssl/2.8.12 OpenSSL/0.9.6b DAV/1.0.3 PHP/4.3.4 mod_perl/1.26
Last-Modified: Fri, 21 May 2004 11:18:38 GMT
ETag: "3d621d-12ed-40ade58e"
Accept-Ranges: bytes
Content-Length: 4845
Connection: close
Content-Type: text/html"
Is the way my servers are 'talking' to Google putting him off? The site that is being indexed more slowly has many, many more inbound links and has been in the Google index for over two years. There's a lot of advice floating around just now to 'Check your Server Headers', but I'm not sure what I'm supposed to be looking for.
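For anyone wanting to compare their own headers the same way, here is a rough sketch in Python. It only reports whether a header dump contains the two cache validators (Last-Modified and ETag) that appear in the second set of headers above but not the first. The function names and the shortened sample dumps are made up for illustration; this is not a definitive "header check".

```python
def parse_headers(raw):
    """Parse a raw 'Name: value' header dump into a dict with lowercase names."""
    headers = {}
    for line in raw.strip().splitlines():
        # Split on the first colon only, so dates like 10:05:39 stay intact.
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def cache_validators(headers):
    """Return which cache validators (Last-Modified, ETag) are present."""
    return [name for name in ("last-modified", "etag") if name in headers]


# Shortened copies of the two header dumps quoted above.
dynamic_site = """
Status: HTTP/1.1 200 OK
Date: Sat, 29 May 2004 10:05:39 GMT
X-Powered-By: PHP/4.3.4
Transfer-Encoding: chunked
Content-Type: text/html
"""

static_site = """
Status: HTTP/1.1 200 OK
Date: Sat, 29 May 2004 10:05:09 GMT
Last-Modified: Fri, 21 May 2004 11:18:38 GMT
ETag: "3d621d-12ed-40ade58e"
Content-Length: 4845
Content-Type: text/html
"""

print(cache_validators(parse_headers(dynamic_site)))
print(cache_validators(parse_headers(static_site)))
```

The first site returns no validators at all, which is typical of pages generated by a script on every request; the second returns both.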
This is just a theory of mine that supports what I have seen Googlebot do. I may of course be totally wrong.
If there's any truth to the idea that Google favors static pages, this is something to be aware of.
DaveAtIFG makes a good point, more PR helps with crawling.
A quick server response also helps encourage Googlebot to fetch more pages during its time on the site.
The main benefit of the 'Last-Modified' header is that Googlebot can send an 'If-Modified-Since' header on its next visit. If it asks for a page that hasn't changed since that date, the server can send a 304 (Not Modified) response with no body, allowing Googlebot to crawl deeper into the site instead of downloading the same pages again and again.
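To make the 304 logic concrete, here is a minimal sketch in Python of the decision a server makes when a request carries 'If-Modified-Since'. The function name `respond` and its arguments are hypothetical, and this is not how Apache or any particular script implements it; a dynamic page would need to do something along these lines itself, since it does not get the header for free the way a static file does.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(last_modified, if_modified_since=None):
    """Return (status, headers) for a GET of a page last changed at `last_modified`.

    `last_modified` must be a timezone-aware UTC datetime.
    `if_modified_since` is the raw request header value, or None if absent.
    """
    # HTTP dates have one-second resolution, so drop microseconds.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable date: ignore it and send the full page
        if since is not None:
            if since.tzinfo is None:
                # Treat a date without a usable zone as UTC (an assumption).
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304, headers  # Not Modified: no body is sent

    return 200, headers  # send the full page


# Date taken from the Last-Modified header quoted earlier in the thread.
page_changed = datetime(2004, 5, 21, 11, 18, 38, tzinfo=timezone.utc)

print(respond(page_changed))
print(respond(page_changed, "Fri, 21 May 2004 11:18:38 GMT"))
print(respond(page_changed, "Thu, 20 May 2004 00:00:00 GMT"))
```

The first and third calls return a 200 with the full page; the second returns a 304 because the page has not changed since the date the crawler supplied.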
Could this be due to simple PR differences between the sites? It's often reported that sites with higher PR are spidered more frequently.
No, the site that is not being crawled much is PR4 and the site that is being indexed really fast is PR0 (it has not yet been given a rank as it has only been in the index two months and Google hasn't done a PR update for ages).