Are you using If Modified Since?

You should be!

     
5:57 pm on Oct 8, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member googleguy is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Oct 8, 2001
posts:2882
votes: 0


I wanted to urge people to configure their servers to support the "If-Modified-Since" header.

Why should you care about "IMS"? When a smart spider like Googlebot comes around, IMS lets you tell the spider that a page hasn't changed. Then Googlebot can use the old copy of the page. That frees up the bot to download more pages while saving bandwidth. Because no page body is sent, IMS hits are almost "free" in terms of server load. Plain Apache can serve _lots_ of IMS queries per second before slowing a machine down.

IMS can work for dynamically generated pages too. Someone posted how to do it for PHP-generated pages, for example. The bottom line is that if your server supports IMS correctly, you can tell Googlebot about more pages without as much server load or bandwidth on your part. As Google crawls more often to make the web a fresher place, adding this flag will help you and search engines.

6:04 pm on Oct 8, 2002 (gmt 0)

Senior Member

joined:June 28, 2002
posts:851
votes: 0


Hey GG,

Although I am not a techie, I am passing your message on to my tech team. It seems to make a lot of sense and is probably a way forward in the relationship between webmasters and Google.

Shak

6:04 pm on Oct 8, 2002 (gmt 0)

Administrator from GB 

WebmasterWorld Administrator engine is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:May 9, 2000
posts:23267
votes: 359


Thanks for the tip, GoogleGuy.
I assume that this means the "Fresh" will be picked up more easily and I'll be happy to instigate it asap.
6:13 pm on Oct 8, 2002 (gmt 0)

Preferred Member

10+ Year Member

joined:Aug 3, 2002
posts:482
votes: 0


It's really nice that you like to see Last-Modified and 304 Not Modified more often but it's not always so easy to use this for generated pages.

I return some 304s on PHP-generated pages, but it's not always so easy to get the modification time right. It's a lot harder than getting the file modification date and using it.

6:16 pm on Oct 8, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Hmmm...

I thought those 304 response codes to googlebot in my log were something new... :)
Glad to see CondGET supported.

Jim

6:31 pm on Oct 8, 2002 (gmt 0)

Preferred Member

10+ Year Member

joined:July 16, 2001
posts:545
votes: 0


In the same vein, why doesn't the bot do HEAD requests?

edit: OK, a little bit of reading says that could actually be less efficient.

[edited by: Slade at 6:33 pm (utc) on Oct. 8, 2002]

6:33 pm on Oct 8, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member ciml is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 22, 2001
posts:3805
votes: 2


I wonder how much bandwidth can be saved (for Google and other users) by a best-practice suggestion from a Google engineer. Probably more than if the same suggestion was made by just about anyone else. This is a very nice tip.

For Apache users with static content included by SSI (eg. headers and footers that don't change often), XBitHack Full [httpd.apache.org] is the answer. For those who can't edit the server configuration, it can be enabled by .htaccess

6:41 pm on Oct 8, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Jan 1, 2002
posts:1017
votes: 1


Perhaps Brett could do up a little tool for all of us non-techies to CHECK for it? (Sorta like the keyword density tool?)
6:46 pm on Oct 8, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member ciml is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 22, 2001
posts:3805
votes: 2


>... HEAD requests?
> edit: OK, a little bit of reading says that could actually be less efficient

Yep. If the GET has If-Modified-Since, then the server should send either the body with a 200 status, or a 304 status with headers only and no body.

With HEAD, the server should just send the header. If the content's changed, the bot would need to ask again with a GET.
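The trade-off described above can be put into a tiny sketch. This is only an illustration of the round-trip counting, not how any real crawler is implemented; the function name and the two method labels are made up for the example.

```python
def requests_needed(method, changed):
    """Count the HTTP requests a crawler needs to end up with a current copy.

    method: "conditional-get" (GET with If-Modified-Since) or "head".
    changed: True if the page changed since the crawler's last visit.
    """
    if method == "conditional-get":
        # One round trip either way: a 304 with no body, or a 200 with the new body.
        return 1
    if method == "head":
        # HEAD returns headers only, so a changed page needs a follow-up GET.
        return 2 if changed else 1
    raise ValueError(f"unknown method: {method}")
```

For an unchanged page both approaches cost one request. The conditional GET only pulls ahead when the page has changed.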

Scott, click control panel [webmasterworld.com], then Server Headers [webmasterworld.com]. If there's a Last-Modified header then it should be cache friendly.
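For anyone who would rather check from a script than from the control panel, here is a rough Python sketch of the same test: look for a Last-Modified header in the response. The function names are invented for this example, and a Last-Modified header alone does not prove the server answers If-Modified-Since correctly.

```python
import urllib.request


def has_last_modified(headers):
    """True if a header mapping contains a non-empty Last-Modified value."""
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lower case.
        if name.lower() == "last-modified" and value.strip():
            return True
    return False


def fetch_headers(url):
    """Fetch response headers with a HEAD request (needs network access)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return dict(response.headers.items())
```

Calling `has_last_modified(fetch_headers(url))` with one of your own URLs gives a quick yes/no answer.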

6:50 pm on Oct 8, 2002 (gmt 0)

Moderator

WebmasterWorld Administrator buckworks is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 9, 2001
posts:5654
votes: 58


Is there a techie in the house who could spell out EXACTLY how to do this for folks like me whose techie skills are rudimentary?

Or point out some good tutorials or something?

I want to do anything I can to help make Googlebot happy, but I simply don't know what to do here!

6:58 pm on Oct 8, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 16, 2001
posts:2006
votes: 0


Can someone explain the relationship between
If-Modified-Since, Last-Modified, and <META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">?
7:17 pm on Oct 8, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member ciml is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 22, 2001
posts:3805
votes: 2


Last-Modified is the date and time when the page was last modified. This can be recorded by the user agent or robot.

The user agent or robot can then send an If-Modified-Since header with the date and time of the Last-Modified header last time the URL was fetched.

If the content has changed, the server can send the new version with a 200 (OK) header. If the content has not changed, then the server can send a 304 (not modified) header and no content. The robot can just keep the content from last time, saving bandwidth. RFC 2616 [ietf.org] explains in more detail.
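To make the exchange concrete, here is a minimal Python sketch of the server-side decision. It assumes the server already knows the page's last change time as a timezone-aware datetime; the function name and return shape are invented for illustration and are not taken from any particular server.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(last_modified, if_modified_since=None):
    """Return (status, headers, send_body) for a GET on one resource.

    last_modified: timezone-aware datetime of the resource's last change.
    if_modified_since: raw If-Modified-Since header value, or None.
    """
    # HTTP dates are in GMT with one-second resolution.
    last_modified = last_modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable date: fall through to a full 200
        if since is not None and since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        if since is not None and last_modified <= since:
            # Unchanged since the client's copy: headers only, no body.
            return "304 Not Modified", headers, False

    return "200 OK", headers, True
```

A malformed or missing date is treated as "send the full page", which is the safe fallback.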

The META NOARCHIVE tag asks Googlebot not to keep a cache of your page. Google's support pages [google.com] describe its use.

7:22 pm on Oct 8, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


buckworks,

I found this site [mnot.net] to be quite useful. It has a tutorial and a page cacheability checker.

Jim

Sasquatch

7:52 pm on Oct 8, 2002 (gmt 0)

Inactive Member
Account Expired

 
 


Hmmm, this is going to be fun. I will need to keep track of the last modified time of all the real content elements that my PHP code uses to generate the page and use that date for last-modified when the request comes from googlebot, and give it a monthly last-modified so that the update will include my navigation updates.

Then for the users I will need to include the latest times for both the navigation and the update parts of the code, in the last-modified dates.
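The bookkeeping described here boils down to taking the newest timestamp among the pieces that make up a page. A small Python sketch of that idea follows; the component names and dates are hypothetical placeholders, and the monthly cut-off for Googlebot is not modelled.

```python
from datetime import datetime, timezone


def page_last_modified(component_times, include=None):
    """Newest modification time among the chosen page components.

    component_times: dict mapping a component name to an aware datetime.
    include: optional iterable of names to consider; None means all of them.
    """
    names = component_times.keys() if include is None else include
    return max(component_times[name] for name in names)


# Hypothetical components of one generated page.
times = {
    "article": datetime(2002, 9, 20, 12, 0, tzinfo=timezone.utc),
    "navigation": datetime(2002, 10, 1, 9, 30, tzinfo=timezone.utc),
}

content_only = page_last_modified(times, include=["article"])  # real content only
for_users = page_last_modified(times)  # content plus navigation
```

Passing an empty set of components raises ValueError from `max`, so a real page generator would need a fallback such as the current time.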

All in all a worthwhile goal, especially as we climb towards our bandwidth cap, even aside from slowing down the freshbot. But I still think a "nofresh" meta tag would get a lot more use and would free up the freshbot from even having to send an IMS GET in the first place.

But I will take what I can get.

8:37 pm on Oct 8, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 20, 2002
posts:812
votes: 1


GoogleGuy

If you'll be saving bandwidth you can do your crawl faster right?
So would this lead to more frequent updates?

8:56 pm on Oct 8, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 10, 2001
posts:1459
votes: 0


Thanks for the tip, GG... it will be useful for sites with a large number of pages.
9:07 pm on Oct 8, 2002 (gmt 0)

Full Member

10+ Year Member

joined:Mar 25, 2002
posts:292
votes: 0


Does this tip just apply to those larger enterprises that have their own (dedicated) servers? What about the little guy that only needs / can only afford shared (virtual) hosting? Anything we can do 'client side'? (this is from a non-techie).

Sasquatch

9:17 pm on Oct 8, 2002 (gmt 0)

Inactive Member
Account Expired

 
 


quiet_man,

What sort of server is hosting your website?

What language do you use to produce your pages?

It turns out that with most methods of generating dynamic pages, you are able to manually process the headers.

On the other hand, if you are using static HTML, all the settings are server side, and I would hope they are set up properly by default.

9:39 pm on Oct 8, 2002 (gmt 0)

New User

10+ Year Member

joined:Sept 17, 2002
posts:16
votes: 0


"So would this lead to more frequent updates?"

As long as they are not like the latest one.

Sasquatch

9:49 pm on Oct 8, 2002 (gmt 0)

Inactive Member
Account Expired

 
 


As long as they are not like the latest one.

I am getting really tired of this topic sneaking into every thread. Please refrain from adding these comments every time someone mentions "update".

10:01 pm on Oct 8, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member googleguy is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Oct 8, 2001
posts:2882
votes: 0


"If you'll be saving bandwidth you can do your crawl faster right?"

Right on, Chico_Loco. The other nice bonus is that if we don't have to waste time fetching pages that haven't changed, that gives us a chance to crawl more new pages. Everybody wins because crawling gets more efficient.

10:09 pm on Oct 8, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 22, 2002
posts:1807
votes: 1


Good tip Google Guy.....
It's nice when a suggestion is mutually beneficial:)
It keeps sweet lil' googlebot from hammering servers too!:)

I just checked on Brett's tool, and it said that I had it turned on. I'm not REAL techie, but my boss and I just installed this Linux (apache) webserver recently......is 'last modified' turned on by default?

10:33 pm on Oct 8, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:Dec 1, 2001
posts:106
votes: 0


Hi GoogleGuy

My web site is not hosted on an Apache server but on Microsoft IIS. Anything like this "If Modified Since" available for the Microsoft NT/IIS platform?

10:44 pm on Oct 8, 2002 (gmt 0)

New User

10+ Year Member

joined:Apr 7, 2003
posts:2
votes: 0


john5 - you can read headers in ASP on IIS by using the Request object. I am not in a position to test it, but I think 'Request.ServerVariables("HTTP_IF_MODIFIED_SINCE")' would return the date Googlebot is asking you to check.

You would then check that internally and if your page has not changed return 'Response.Status="304 Not Modified"' and 'Response.End'. If it has changed simply return the page as normal.
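The same read-compare-respond flow, sketched in Python as a WSGI app rather than ASP (an assumption made for this example, since the thread has no tested ASP code). `PAGE_BODY` and `PAGE_MODIFIED` are hypothetical stand-ins for a real page and its last change time. In WSGI the request header arrives as the `HTTP_IF_MODIFIED_SINCE` environ key.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Hypothetical stand-ins for a real page and its last change time.
PAGE_BODY = b"<html><body>Hello</body></html>"
PAGE_MODIFIED = datetime(2002, 10, 8, 17, 57, 0, tzinfo=timezone.utc)


def app(environ, start_response):
    """WSGI app that answers If-Modified-Since with a 304 when appropriate."""
    headers = [("Last-Modified", format_datetime(PAGE_MODIFIED, usegmt=True))]
    raw = environ.get("HTTP_IF_MODIFIED_SINCE")
    if raw:
        try:
            since = parsedate_to_datetime(raw)
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
        except (TypeError, ValueError):
            since = None  # ignore a malformed date and serve the page
        if since is not None and PAGE_MODIFIED <= since:
            # Page unchanged: send the status and headers, then stop.
            start_response("304 Not Modified", headers)
            return []
    # Page changed (or no usable date was sent): return it as normal.
    headers.append(("Content-Type", "text/html"))
    headers.append(("Content-Length", str(len(PAGE_BODY))))
    start_response("200 OK", headers)
    return [PAGE_BODY]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server` can serve `app` on a spare port.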

Gareth

10:45 pm on Oct 8, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member rfgdxm1 is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 12, 2002
posts:4479
votes: 0


ME! Does this mean Googleguy I'll get higher SERPs because the server I'm on does so? ;) While I don't pay that much attention to the raw logs, I definitely know that the server gives 304 responses. I noticed Googlebot is nosy and comes around like every 2 days for the home page, and is getting a 304.
10:53 pm on Oct 8, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 18, 2002
posts:131
votes: 0


Someone posted how to do it for PHP-generated pages

Does anyone know which thread this refers to? I did a couple of searches and couldn't find anything.

My server does send 304 responses, but only for image files. My index page has PHP at the very top that creates an "expires" header each time the page is requested. But there is no if-modified-since header being sent with that page.

Sasquatch

11:14 pm on Oct 8, 2002 (gmt 0)

Inactive Member
Account Expired

 
 


The solution was incomplete since he did not send the 304 if the dates were the same.

Do a search on "php if modified since" without quotes in google and you should find some good sites right there on the front page.

1:39 am on Oct 9, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:May 18, 2002
posts:126
votes: 0


My server is returning a correct 304 for pages that are not updated. However, Googlebot gets the 304 and then immediately requests the page again and gets a 200 and the full payload. Is Google currently asking for the page no matter what? Or is this a new feature that we will see soon? Or is there something more to this than returning a 304?
1:44 am on Oct 9, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 18, 2001
posts:889
votes: 0


This is a timely thread for me, I think, but I could just be totally confused too. Is this the same as TTL (time to live)?

I just moved my site from one host to another (the DNS name server was changed Sunday about noon). Late yesterday I started seeing hits in the new server logs, including bunches of visits from ms. googlebot today :) - one worry down.

However, I checked the old server logs a few minutes ago and there are just a few hits there now - most are my own IP and some from inktomi slurp :(

My question .. am I still seeing my own hits in the old log because my ISP has not updated their DNS? I've dumped the cache manually several times today. I know I'm still seeing the old server files because I used absolute urls there, but on the new server I'm using relative urls. Can't get my email downloaded from the new server either and I'm assuming this is an ISP DNS issue.

My old server logs have consistently shown 304's when I changed pages so I'm not sure if this discussion is the same thing as TTL or not.

Thanks..

<edit-- said that wrong, 304 when the file was not changed >

6:35 am on Oct 9, 2002 (gmt 0)

Preferred Member

10+ Year Member

joined:Aug 3, 2002
posts:482
votes: 0


GoogleGuy, tell Googlebot to send Accept-Encoding: gzip, deflate, x-gzip and you can save even more bandwidth.
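As a rough illustration of the saving being suggested, this Python sketch checks whether a client advertises gzip and compares the raw and gzipped size of a body. Both function names are invented for the example, and the header parsing is deliberately simplified: it ignores `q=` weights.

```python
import gzip


def accepts_gzip(accept_encoding):
    """True if an Accept-Encoding value lists gzip or x-gzip (q-values ignored)."""
    tokens = (part.split(";")[0].strip().lower() for part in accept_encoding.split(","))
    return any(token in ("gzip", "x-gzip") for token in tokens)


def gzip_savings(body):
    """Return (raw_size, gzipped_size) in bytes for a response body."""
    return len(body), len(gzip.compress(body))
```

Repetitive markup such as HTML usually compresses well, but the actual ratio depends on the page.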