Are you using If Modified Since?

You should be!

         

GoogleGuy

5:57 pm on Oct 8, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I wanted to urge people to configure their servers to support the "If-Modified-Since" header.

Why should you care about "IMS"? When a smart spider like Googlebot comes around, IMS lets you tell the spider that a page hasn't changed. Then Googlebot can use the old copy of the page. That frees up the bot to download more pages while saving bandwidth. Because of the bandwidth savings, IMS hits are almost "free" in terms of server load. Plain Apache can serve _lots_ of IMS queries per second before slowing a machine down.

IMS can work for dynamically generated pages too. Someone posted how to do it for PHP-generated pages, for example. The bottom line is that if your server supports IMS correctly, you can tell Googlebot about more pages without as much server load or bandwidth on your part. As Google crawls more often to make the web a fresher place, adding this flag will help you and search engines.

Lisa

8:15 am on Oct 9, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



GoogleGuy tell Googlebot to send Accept-Encoding: gzip, deflate, x-gzip and you can save even more bandwidth.

Yes, I agree. I am starting to encode my pages with gzip if the browser supports it. It saves my server bandwidth and helps speed up downloads of plain HTML pages. Google would save a lot of money each month if they supported this transfer method.
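For anyone wondering what "gzip if the browser supports it" looks like in code, here is a minimal Python sketch. The function name `maybe_gzip` is made up for illustration, and it only looks for the plain `gzip` token in Accept-Encoding (it ignores `q=0` opt-outs and `x-gzip`), so treat it as a starting point rather than a complete negotiation.

```python
import gzip


def maybe_gzip(body: bytes, accept_encoding: str):
    """Return (payload, extra_headers) for a response body.

    Compresses only when the client's Accept-Encoding header lists gzip.
    maybe_gzip is a hypothetical helper, not part of any server API.
    """
    # Split "gzip, deflate, compress;q=0.9" into bare coding names,
    # dropping any ";q=..." parameters.
    codings = {part.split(";")[0].strip().lower()
               for part in accept_encoding.split(",")}
    if "gzip" in codings:
        headers = {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
        return gzip.compress(body), headers
    # Client did not ask for gzip: send the body untouched.
    return body, {}
```

The `Vary` header is there so caches keep the compressed and uncompressed copies apart.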

martin

8:44 am on Oct 9, 2002 (gmt 0)

10+ Year Member



>Google would save a lot of money each month if they supported this transfer method.

According to my calculations it makes my PHP generated pages 3 times smaller.

There is a nice feature in Apache too: because I have gzip support only in PHP, I am using content negotiation for CSS and JavaScript. This also saves server processing time for static files.

Sirius

11:35 am on Oct 9, 2002 (gmt 0)



>GoogleGuy tell Googlebot to send Accept-Encoding: gzip, deflate, x-gzip and you can save even more bandwidth.

I'd like to see that too.

Here's one of my larger pages:

Not compressed length: 111,395 bytes
Compressed length: 14,394 bytes

The compressed page is 13% of the size of the uncompressed page. That is a huge bandwidth and speed difference. Google could download 6 of those compressed pages in the time it takes to download 1 of the uncompressed pages.
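The arithmetic behind those two figures, worked through in Python using only the byte counts quoted above:

```python
uncompressed = 111_395  # bytes, from the post above
compressed = 14_394     # bytes, from the post above

ratio = compressed / uncompressed    # fraction of the original size
speedup = uncompressed / compressed  # compressed pages per uncompressed page

print(f"compressed size: {ratio:.1%} of original")    # 12.9% of original
print(f"pages per unit of bandwidth: {speedup:.1f}")  # 7.7
```

So "6 pages" is on the conservative side; by bytes alone it is closer to 7, ignoring per-request overhead.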

nell

2:02 pm on Oct 9, 2002 (gmt 0)

10+ Year Member



For PHP
This will put in today's date

<?php
// Sends the current GMT time as Last-Modified on every request.
header("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");
?>

MagicManiac

2:10 pm on Oct 9, 2002 (gmt 0)

10+ Year Member



Does anyone know exactly how to do this with IIS server (windows 2000)?

martin

2:39 pm on Oct 9, 2002 (gmt 0)

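10+ Year Member



<%
' lastModified is a placeholder for an HTTP date string such as "Wed, 09 Oct 2002 14:39:00 GMT"
Response.AddHeader "Last-Modified", lastModified
%>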

MagicManiac

2:43 pm on Oct 9, 2002 (gmt 0)

10+ Year Member



So there is no server setting to activate?

Sasquatch

3:26 pm on Oct 9, 2002 (gmt 0)



Just putting in a Last-Modified date will do you no good if you do not send back the 304 status when you get an If-Modified-Since. That will just increase your bandwidth usage.

Grumpus

3:46 pm on Oct 9, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



ASP Code For ya'll


<script runat=SERVER language=VBSCRIPT>
function DoDateTime(str, nNamedFormat, nLCID)
dim strRet
dim oldLCID

strRet = str
If (nLCID > -1) Then
oldLCID = Session.LCID
End If

On Error Resume Next

If (nLCID > -1) Then
Session.LCID = nLCID
End If

If ((nLCID < 0) Or (Session.LCID = nLCID)) Then
strRet = FormatDateTime(str, nNamedFormat)
End If

If (nLCID > -1) Then
Session.LCID = oldLCID
End If

DoDateTime = strRet
End Function
</script>
<% if (getHead.Fields.Item("lastModifiedDate").Value) = "" then
strLastMod = getHead.Fields.Item("addedDate").Value
Else
strLastMod = getHead.Fields.Item("lastModifiedDate").Value
End If
If strLastMod > "1" then strLastMod = strLastMod else strLastMod = "03/01/2002"

%><% modHead = DoDateTime((strLastMod), 1, -1) & " 00:00:00 GMT" %><% '=modHead %> <% Response.AddHeader "Last-Modified", modHead %>
<%
getHead.Close()
%>
<% end if %>
<% =Request.ServerVariables("ALL_HTTP") %>

This assumes you've got a recordset named "getHead" that contains your modified dates.

In my case, I opened my site on March 1 and didn't add my time stamp field until sometime in May, so in some cases, it's an empty field.

I've also got two fields in the database. When a record is added (new) it goes into the "added date" field. If it's updated, it goes into the "lastModifiedDate" field. I need to pull both in case it was added but never modified (though the added date SHOULD appear in the modified field if it's new, I'm not taking chances). IF neither have a date stamp, I just slap in the value "03/01/2002".

I don't have a time of day part of the field, so I'm just adding midnight as the time stamp. If you have time of day in there, you won't need that.

If your date stamp is already in the proper format, then you don't need the conversion routine. My stamps are: "MM/DD/YYYY" so I need to convert them.
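For comparison, here is a hedged Python sketch of the same conversion: it takes an "MM/DD/YYYY" stamp, assumes midnight GMT as described above, and produces the standard HTTP date form ("Fri, 01 Mar 2002 00:00:00 GMT"). The helper name `http_date_from_mdy` is invented for this example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def http_date_from_mdy(stamp: str) -> str:
    """Convert an "MM/DD/YYYY" stamp to an HTTP date at midnight GMT.

    http_date_from_mdy is a hypothetical helper for illustration.
    """
    # No time of day is stored, so strptime leaves it at 00:00:00.
    day = datetime.strptime(stamp, "%m/%d/%Y").replace(tzinfo=timezone.utc)
    # usegmt=True writes the zone as "GMT", which HTTP dates require.
    return format_datetime(day, usegmt=True)


print(http_date_from_mdy("03/01/2002"))  # Fri, 01 Mar 2002 00:00:00 GMT
```

Note the abbreviated day and month names; this differs from the long form ("Friday, March 01, 2002") that the ASP long-date format produces.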

Finally, when you get everything working, you'll want to remove the last lines. That's just there for debugging and returns all of the HTTP headers. This will help you see if your server is returning the "IF MODIFIED SINCE" header. Here are the results from my server:

RUN #1 (I don't have it in my browser cache)


HTTP_ACCEPT:text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,text/css,*/*;q=0.1
HTTP_ACCEPT_LANGUAGE:en-us, en;q=0.50
HTTP_CONNECTION:keep-alive
HTTP_HOST:www.rock-n-reel.com
HTTP_USER_AGENT:Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.0.1) Gecko/20020823 Netscape/7.0
HTTP_COOKIE:ASPSESSIONIDQGQGQVWO=MPGDBLJBJGMHPBHJCFEEIFNG; ASPSESSIONIDGQQGGVWU=PBHHNNLBOFDFKGGEEAIDPGME; ASPSESSIONIDQGGGGLZC=EPDNBEOBPEEPAJMLOHBMMNKI
HTTP_ACCEPT_ENCODING:gzip, deflate, compress;q=0.9
HTTP_ACCEPT_CHARSET:ISO-8859-1, utf-8;q=0.66, *;q=0.66
HTTP_KEEP_ALIVE:300

Now I hit RELOAD in my browser...


HTTP_ACCEPT:text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,text/css,*/*;q=0.1
HTTP_ACCEPT_LANGUAGE:en-us, en;q=0.50
HTTP_CONNECTION:keep-alive
HTTP_HOST:www.rock-n-reel.com
HTTP_USER_AGENT:Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.0.1) Gecko/20020823 Netscape/7.0
HTTP_COOKIE:ASPSESSIONIDQGQGQVWO=MPGDBLJBJGMHPBHJCFEEIFNG; ASPSESSIONIDGQQGGVWU=PBHHNNLBOFDFKGGEEAIDPGME; ASPSESSIONIDQGGGGLZC=EPDNBEOBPEEPAJMLOHBMMNKI
HTTP_IF_MODIFIED_SINCE:Friday, March 01, 2002 00:00:00 GMT
HTTP_ACCEPT_ENCODING:gzip, deflate, compress;q=0.9
HTTP_ACCEPT_CHARSET:ISO-8859-1, utf-8;q=0.66, *;q=0.66
HTTP_KEEP_ALIVE:300
HTTP_CACHE_CONTROL:max-age=0

WOOOOOO Freakin' Hooooo! It works!

Now, if I'm reading Googleguy correctly, the pages that are in the index this month can get a nice quick and soft hit next month and it can move on and it'll crawl me more deeply. Well, at least that's what I'm hoping. Still have a lot more pages to do, but I wanted to share this with you folks because I was completely lost for about an hour and a half this morning. Once I figured it all out, the whole process was about 10 minutes with an additional 3 minutes per page that I have a last updated recordset on. :)

Cheers and hope this helps at least a few folks!

G.

Grumpus

3:49 pm on Oct 9, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ack - I forgot to explain this line in the source code, above...

If strLastMod > "1" then strLastMod = strLastMod else strLastMod = "03/01/2002"

For some reason when I converted my database from Access to mySQL, a value other than "" or " " got put into the fields that were empty. I was lazy and didn't bother finding out what was there. I just found that if there was a date there, then it was greater than "1" and if not, it was less than "1". Just me being a lazy coder. ;)

G.

Sasquatch

4:03 pm on Oct 9, 2002 (gmt 0)



>WOOOOOO Freakin' Hooooo! It works!

But did you send back a 304?

I am not seeing any code to check the date in the If-Modified-Since header. As it is a dynamic page, your server has no way of knowing what your content modification date is.

All you are doing is sending a header to the client the first time and getting a header sent back to you on the next request. You are wasting bandwidth if you do not check the IMS header and return the 304 when appropriate.

Grumpus

4:17 pm on Oct 9, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The way I understand it from what I read this morning is that Google, like a browser, stores the date that it first got the page and cached it. It sends that date when it asks for the page again and if the date that googlebot sends is greater than the date returned in the "if_modified_since" header, then it knows to use the cached version as it hasn't changed. If it's less than, it's not going to use its cached version, but it's actually going to pull the page again.

From what I read, if the agent (browser or spider) requests the "if_modified_since" header and gets it, then that's all you need. If it ISN'T getting that header, that's where you run into the problems of more load.

Correct me (and clarify, please) if I'm wrong...

G.

Sasquatch

4:31 pm on Oct 9, 2002 (gmt 0)



When you send it a Last-Modified date, it will then include that date in its next request. That part you got right.

The part you are missing is that the client (browser, Googlebot) makes no decision here; the decision is made by the server.

The client is saying "GET if-modified-since <date>" and the server looks at the date it has stored for that file. It will then send the page plus the new modified date, or, if the page hasn't been modified since the date that was sent, it will return a 304.

The server handles this automatically with static content like images or html, but with any form of dynamic content it must be handled by your code.

andreasfriedrich

4:37 pm on Oct 9, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No Grumpus.

  1. The UA sends a request containing the If-Modified-Since: date field in the request header.
  2. The server reads the date and checks whether the requested resource changed since the date given in the If-Modified-Since: date field.
  3. If it changed the server returns a code of 200 and sends the changed data. If the resource did not change, then the server returns a code of 304 Not Modified. The UA will use its cached version.
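The three steps above can be sketched in a few lines of Python. This is only an illustration of the server-side decision, not code from any particular server; `conditional_status` is a made-up name, and it assumes you already know the page's last-modified time as a UTC datetime (for example from a database field).

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Return 304 if the resource is unchanged since the client's date, else 200.

    if_modified_since: raw request header value, or None if it was absent.
    last_modified: timezone-aware UTC datetime of the resource's last change.
    conditional_status is a hypothetical helper for illustration.
    """
    if not if_modified_since:
        return 200  # plain request: always send the page
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the header, send the page
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds.
    if last_modified.replace(microsecond=0) <= since:
        return 304  # not modified: send headers only, no body
    return 200
```

On a 304 the response carries no body, which is where the bandwidth saving comes from; on a 200 you send the page along with its current Last-Modified date.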

Andreas

Grumpus

4:39 pm on Oct 9, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ack. Back to more reading I guess.

Thanks gang!

G.
