
Google News Archive Forum

    
Apache Server Headers and Googlebot
Not up to scratch on 'server-talk' ...
MikeBeverley

10+ Year Member



 
Msg#: 24177 posted 10:12 am on May 29, 2004 (gmt 0)

Google has been seriously lagging behind in indexing the pages of one of my sites. Yahoo currently adds almost three times as many pages to its index each day as Google does. Here is the server header for the site in question:

"Status: HTTP/1.1 200 OK
Date: Sat, 29 May 2004 10:05:39 GMT
Server: Apache/1.3.27 (Unix) (Red-Hat/Linux) FrontPage/5.0.2.2623 mod_python/2.7.8 Python/1.5.2 mod_ssl/2.8.12 OpenSSL/0.9.6b DAV/1.0.3 PHP/4.3.4 mod_perl/1.26
X-Powered-By: PHP/4.3.4
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html"

Whereas here is the Server Header for a site that uses a completely different system which Google is indexing at a much faster rate:

"Status: HTTP/1.1 200 OK
Date: Sat, 29 May 2004 10:05:09 GMT
Server: Apache/1.3.27 (Unix) (Red-Hat/Linux) FrontPage/5.0.2.2623 mod_python/2.7.8 Python/1.5.2 mod_ssl/2.8.12 OpenSSL/0.9.6b DAV/1.0.3 PHP/4.3.4 mod_perl/1.26
Last-Modified: Fri, 21 May 2004 11:18:38 GMT
ETag: "3d621d-12ed-40ade58e"
Accept-Ranges: bytes
Content-Length: 4845
Connection: close
Content-Type: text/html"

Is the way my servers are 'talking' to Google putting him off? The site that is being indexed more slowly has many, many more inbound links and has been in the Google index for over two years. It's just that there's a lot of advice floating around just now to 'Check your Server Headers', but I'm not sure what I'm supposed to be looking for.
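One way to "check your server headers" is to look for the fields a crawler can use to size or revalidate a response. The sketch below parses the two header blocks quoted above and reports which of those fields each site sends. The function name `summarize_headers` and the list of headers treated as interesting are assumptions made for illustration, and nothing here says how Googlebot actually weighs them.

```python
from email.parser import Parser

# Headers a crawler can use to size or revalidate a response.
# This particular list is an assumption made for illustration.
HEADERS_OF_INTEREST = ("Last-Modified", "ETag", "Content-Length", "Transfer-Encoding")

# Header blocks copied from the post above (long Server line omitted for brevity).
SLOW_SITE = """\
Status: HTTP/1.1 200 OK
Date: Sat, 29 May 2004 10:05:39 GMT
X-Powered-By: PHP/4.3.4
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html
"""

FAST_SITE = """\
Status: HTTP/1.1 200 OK
Date: Sat, 29 May 2004 10:05:09 GMT
Last-Modified: Fri, 21 May 2004 11:18:38 GMT
ETag: "3d621d-12ed-40ade58e"
Accept-Ranges: bytes
Content-Length: 4845
Connection: close
Content-Type: text/html
"""


def summarize_headers(raw: str) -> dict:
    """Return the value of each header of interest, or None if it is absent."""
    lines = raw.strip().splitlines()
    # Drop the status line so only "Name: value" pairs reach the parser.
    if lines and "HTTP/" in lines[0]:
        lines = lines[1:]
    msg = Parser().parsestr("\n".join(lines), headersonly=True)
    # Header lookup on the parsed message is case-insensitive.
    return {name: msg.get(name) for name in HEADERS_OF_INTEREST}


if __name__ == "__main__":
    for label, raw in (("slow site", SLOW_SITE), ("fast site", FAST_SITE)):
        print(label, summarize_headers(raw))
```

Run against the two blocks, the visible difference is that the slower site sends `Transfer-Encoding: chunked` with no `Last-Modified`, `ETag` or `Content-Length`, while the faster site sends all three.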

 

ciml

WebmasterWorld Senior Member ciml is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 24177 posted 10:37 am on May 29, 2004 (gmt 0)

I don't see anything there that would put Google off, but the 'Last-Modified' header might encourage deeper spidering.
[webmasterworld.com...]

racer_x

10+ Year Member



 
Msg#: 24177 posted 11:09 am on May 29, 2004 (gmt 0)

I have noticed Googlebot doing something similar. My theory is that if Googlebot sees the "Content-Length:" field in the header, it assumes the content is static and crawls the website more aggressively. If it doesn't see that field, it assumes the content is dynamic and slows down its crawling a bit.

This is just a theory of mine that supports what I have seen Googlebot do. I may of course be totally wrong.

jamesa

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 24177 posted 11:44 am on May 29, 2004 (gmt 0)

That's funny, racer_x. I was thinking just that the other day. With dynamic content, the server doesn't know in advance how much data it's sending, so it chunks it. If the content length is known, as with a static page, the server will send the Content-Length header instead. So those headers are a dead giveaway.

If there's any truth that G favors static pages, this is something to be aware of.
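If the chunked response is a concern, a dynamic page can still send a Content-Length header by building its whole body before any headers go out. Here is a minimal sketch of that idea as a Python WSGI application. The thread's site runs PHP, so this is only an analogue; `render_page` and `call_app` are hypothetical helpers, and whether Googlebot treats the two cases differently remains the speculation it is above.

```python
def render_page() -> bytes:
    """Hypothetical stand-in for any dynamically generated page."""
    return b"<html><body>Hello from a dynamic page</body></html>"


def app(environ, start_response):
    """WSGI app that buffers its output so the body size is known up front."""
    body = render_page()
    headers = [
        ("Content-Type", "text/html"),
        # The full body is already in memory, so its length can be declared.
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(wsgi_app):
    """Invoke a WSGI app once and return (status, headers dict, body bytes)."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(wsgi_app({"REQUEST_METHOD": "GET"}, start_response))
    return captured["status"], captured["headers"], body


if __name__ == "__main__":
    print(call_app(app))
```

The trade-off is memory and latency: the whole page has to be generated before the first byte is sent, which is why servers fall back to chunking when output is streamed.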

stevenmusumeche

10+ Year Member



 
Msg#: 24177 posted 2:11 pm on May 29, 2004 (gmt 0)

You can change a setting in your php.ini file so that Apache won't "identify itself" as using PHP. I doubt that this matters to Google, but I did it anyway.

DaveAtIFG

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 24177 posted 2:24 pm on May 29, 2004 (gmt 0)

Could this be due to simple PR differences between the sites? It's often reported that sites with higher PR are spidered more frequently.

MikeBeverley

10+ Year Member



 
Msg#: 24177 posted 7:45 pm on May 29, 2004 (gmt 0)

Thanks for all your info thus far.

"You can change a setting in your php.ini file so that Apache won't "identify itself" as using PHP. I doubt that this matters to Google, but I did it anyway."

Which parts of the server header would that change or hide?

ciml

WebmasterWorld Senior Member ciml is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 24177 posted 10:25 am on May 31, 2004 (gmt 0)

The "X-Powered-By: PHP/4.3.4" line can be removed and the "Server: Apache/1.3.27 (Unix) [etc.]" line can be changed, but as stevenmusumeche points out, the 'Server' and 'X-Powered-By' HTTP headers are not an issue for Google.

DaveAtIFG makes a good point: more PR helps with crawling.

A quick server response also helps encourage Googlebot to fetch more pages during its time on the site.

The main benefit of the 'Last-Modified' header is that Googlebot can send 'If-Modified-Since' headers. If it asks for a page that hasn't changed then the server can send a 304 (not modified) response, allowing Googlebot to crawl deeper into the site instead of indexing the same pages again and again.
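The If-Modified-Since exchange described above can be sketched as a small decision function: compare the date the client sends with the page's last change and answer 304 when nothing is newer. This is an illustration in Python under stated assumptions, not how Apache implements it. The name `respond` is hypothetical, and the timestamp in the usage lines is the Last-Modified value quoted earlier in the thread.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def respond(last_modified: datetime, if_modified_since: Optional[str]):
    """Return (status, headers) for a GET on a page last changed at `last_modified`.

    `last_modified` must be an aware datetime in UTC.
    """
    # HTTP dates have one-second resolution, so drop any fraction.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable date: ignore it and send the full page
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                # Nothing newer than the crawler's copy: no body needed.
                return 304, headers

    return 200, headers


if __name__ == "__main__":
    changed = datetime(2004, 5, 21, 11, 18, 38, tzinfo=timezone.utc)
    print(respond(changed, None))
    print(respond(changed, "Fri, 21 May 2004 11:18:38 GMT"))
```

A first visit (no If-Modified-Since) gets a 200 with the Last-Modified date. A repeat visit that echoes that date back gets a 304 with no body, which is the saving ciml describes.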

MikeBeverley

10+ Year Member



 
Msg#: 24177 posted 4:00 pm on Jun 1, 2004 (gmt 0)

"Could this be due to simple PR differences between the sites? It's often reported that sites with higher PR are spidered more frequently."

No, the site that is not being crawled much is PR4 and the site that is being indexed really fast is PR0 (it has not yet been given a rank, as it has only been in the index for two months and Google hasn't done a PR update for ages).

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved