
Website Analytics - Tracking and Logging Forum

    
MSN/Yahoo code 200 - Googlebot code 304
jeremymgp




msg:895915
 9:41 am on Feb 25, 2005 (gmt 0)

Hi,

I've recently added a large database to a site and am hoping the search engines will crawl as many pages as possible. Looking through my logs, I found that while MSN, for example, is doing nicely with lots of "200" status codes, the database pages come up as 304 redirects when Googlebot crawls them. For example:

207.46.98.129 - - [24/Feb/2005] "GET /adirectory/index.php/AnEntry HTTP/1.0" 200 7832 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"

OK so far, but with Googlebot I get:
66.249.64.6 - - [24/Feb/2005] "GET /adirectory/index.php/AnEntry HTTP/1.0" 304 0 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"

I believe the database uses a lot of internal redirecting, but how come MSN and even minor bots like BecomeBot have no problem finding the pages, whereas Google is hitting redirects and going no further? Google eats up the static pages no problem, but for the database it's always a 304 redirect, 0 bytes read, and Googlebot stops crawling the page, missing the database underneath entirely.
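For anyone wanting to check their own logs the same way, here is a minimal Python sketch that tallies status codes per crawler. It assumes log lines shaped exactly like the two samples above; the names `LOG_LINE` and `status_counts_by_agent` are made up for illustration, and a real log format may need a different pattern.

```python
import re
from collections import Counter

# Pattern for lines shaped like the two samples quoted above
# (an assumption: other log formats need a different regex).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def status_counts_by_agent(lines):
    """Count (bot name, status code) pairs; unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue
        # "Googlebot/2.1 (+http://...)" -> "Googlebot"
        bot = match.group("agent").split("/", 1)[0]
        counts[(bot, match.group("status"))] += 1
    return counts


sample = [
    '207.46.98.129 - - [24/Feb/2005] "GET /adirectory/index.php/AnEntry HTTP/1.0" '
    '200 7832 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '66.249.64.6 - - [24/Feb/2005] "GET /adirectory/index.php/AnEntry HTTP/1.0" '
    '304 0 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
]

for (bot, status), n in sorted(status_counts_by_agent(sample).items()):
    print(bot, status, n)
```

A tally like this makes it easy to see whether a given bot only ever gets 304s on a section of the site or a mix of 200s and 304s.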

Do I have a problem and what can I do about it?

Thanks for your help,

Jeremy

 

NameNick




msg:895916
 12:39 pm on Feb 25, 2005 (gmt 0)

jeremymgp,

304 is not a redirect; it is the "Not Modified" status code.

From w3.org
Not Modified 304

If the client has done a conditional GET and access is allowed, but the document has not been modified since the date and time specified in If-Modified-Since field, the server responds with a 304 status code and does not send the document body to the client.

Response headers are as if the client had sent a HEAD request, but limited to only those headers which make sense in this context. This means only headers that are relevant to cache managers and which may have changed independently of the document's Last-Modified date. Examples include Date , Server and Expires .

The purpose of this feature is to allow efficient updates of local cache information (including relevant metainformation) without requiring the overhead of multiple HTTP requests (e.g. a HEAD followed by a GET) and minimizing the transmittal of information already known by the requesting client (usually a caching proxy).

That means that MSN bot is downloading your content again and again, even if it hasn't been changed.

Google bot instead recognizes unchanged content and saves your traffic quota.
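To make the mechanism concrete, here is a rough Python sketch of the decision a server makes on a conditional GET. It is not how any particular server or script implements it; the function name `respond` and the dates are invented for the example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(last_modified, if_modified_since=None):
    """Return (status, headers, send_body) for one GET request.

    last_modified: timezone-aware UTC datetime of the page's last change.
    if_modified_since: raw If-Modified-Since header value, or None.
    """
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable date: treat as an unconditional GET
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                # Unchanged since the client's copy: no body is sent.
                return 304, headers, False
    return 200, headers, True


# Hypothetical page last changed on 20 Feb 2005.
page_changed = datetime(2005, 2, 20, 12, 0, tzinfo=timezone.utc)

print(respond(page_changed)[0])                                   # plain GET
print(respond(page_changed, "Thu, 24 Feb 2005 00:00:00 GMT")[0])  # copy is newer
print(respond(page_changed, "Tue, 15 Feb 2005 00:00:00 GMT")[0])  # copy is older
```

The first and third calls yield 200 with a body; the second yields 304 with no body, which is the "304 0" line seen in the log.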

Best regards

NN

cgrantski




msg:895917
 4:29 pm on Feb 25, 2005 (gmt 0)

NameNick - this is great information, worth the price of admission for sure. I have wondered about this for so long.

pendanticist




msg:895918
 5:01 pm on Feb 25, 2005 (gmt 0)

[w3.org...]

mattglet




msg:895919
 7:03 pm on Feb 25, 2005 (gmt 0)

A nice discussion [webmasterworld.com] to review, straight from the horse's mouth.

jeremymgp




msg:895920
 3:49 am on Feb 26, 2005 (gmt 0)

Hi,

Thanks for the great info. Googlebot and "If-Modified-Since" sound like a great idea. Unfortunately, in my case Google is not saving time by skipping pages it has already crawled, but rather not crawling new pages at all.

How can I turn off the "IMS header" functionality and get my 200 status codes back?

Thanks,

Jeremy

mattglet




msg:895921
 12:59 pm on Feb 26, 2005 (gmt 0)

I would say your best bet is not to turn off IMS, but to use it in your favor. Look into some functionality that sets the Last-Modified date your pages report to "now" or "a few minutes ago", so any bot that sends an If-Modified-Since request sees a very recent modification date. This will hopefully entice them to crawl your pages, and you would get your 200s back.
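A small Python sketch of that idea, under some assumptions: `last_modified_header` is a made-up helper, and `record_updated_at` stands for whatever "last changed" timestamp the database happens to store for an entry. If no such timestamp exists, it falls back to "a few minutes ago".

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def last_modified_header(record_updated_at=None, now=None,
                         backdate=timedelta(minutes=5)):
    """Build a Last-Modified header value for a database-driven page.

    record_updated_at: aware UTC datetime of the entry's last change, if
    known (hypothetical field). Without it, report "a few minutes ago".
    """
    now = now or datetime.now(timezone.utc)
    stamp = record_updated_at if record_updated_at is not None else now - backdate
    stamp = min(stamp, now)  # never report a date in the future
    return format_datetime(stamp.replace(microsecond=0), usegmt=True)


now = datetime(2005, 2, 26, 13, 0, tzinfo=timezone.utc)

# No stored timestamp: always looks freshly modified.
print(last_modified_header(now=now))

# Stored timestamp available: report the real change date.
changed = datetime(2005, 2, 24, 9, 30, tzinfo=timezone.utc)
print(last_modified_header(record_updated_at=changed, now=now))
```

Keep in mind the trade-off: a page that always reports "a few minutes ago" will answer every conditional GET with a full 200, so the bandwidth saving from 304s is lost. Using the entry's real change date, where one is available, keeps both behaviours.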

jeremymgp




msg:895922
 3:16 pm on Feb 26, 2005 (gmt 0)

mattglet,

Intelligent reply. It's always a buzz when you finally cut through to the good information and smart ideas. Will do my best to put this into action - thank you :)

Jeremy
