Why should you care about "IMS" (If-Modified-Since)? When a smart spider like Googlebot comes around, IMS lets you tell the spider that a page hasn't changed. Then Googlebot can use the old copy of the page. That frees up the bot to download more pages while saving bandwidth. Because of the bandwidth savings, IMS hits are almost "free" in terms of server load. Plain Apache can serve _lots_ of IMS queries per second before slowing a machine down.
IMS can work for dynamically generated pages too. Someone posted how to do it for PHP-generated pages, for example. The bottom line is that if your server supports IMS correctly, you can tell Googlebot about more pages without as much server load or bandwidth on your part. As Google crawls more often to make the web a fresher place, adding this flag will help you and search engines.
GoogleGuy, tell Googlebot to send "Accept-Encoding: gzip, deflate, x-gzip" and you can save even more bandwidth.
Yes, I agree. I am starting to encode my pages with gzip if the browser supports it. It saves my server bandwidth and helps speed up downloads of plain HTML pages. Google would save a lot of money each month if they supported this transfer method.
According to my calculations, it makes my PHP-generated pages 3 times smaller.
There is a nice feature in Apache too. Because I have gzip support only in PHP, I am using content negotiation for CSS and JavaScript. This also saves server processing time for static files.
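For anyone who wants to see the idea outside PHP or Apache, here is a rough Python sketch of gzip negotiation: compress the body only when the client's Accept-Encoding header allows gzip. The function names are made up for illustration, and the header parsing is deliberately simplified (it only checks for a gzip or x-gzip token with a non-zero q value).

```python
import gzip


def client_accepts_gzip(accept_encoding: str) -> bool:
    """Return True if the Accept-Encoding value allows gzip or x-gzip."""
    for item in accept_encoding.split(","):
        token, _, params = item.strip().partition(";")
        if token.strip().lower() not in ("gzip", "x-gzip"):
            continue
        qvalue = 1.0  # no q parameter means the coding is fully acceptable
        for param in params.split(";"):
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    qvalue = float(value)
                except ValueError:
                    qvalue = 0.0  # malformed q value: treat as refused
        if qvalue > 0:
            return True
    return False


def encode_body(body: bytes, accept_encoding: str):
    """Return (payload, extra_headers) for a response body."""
    if client_accepts_gzip(accept_encoding):
        payload = gzip.compress(body)
        return payload, {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
    # Client did not ask for gzip: send the body untouched.
    return body, {}
```

The Vary header is included so that caches keep the compressed and uncompressed versions apart.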
Here's one of my larger pages:
Not compressed length: 111,395 bytes
Compressed length: 14,394 bytes
The compressed page is 13% of the size of the uncompressed page. That is a huge bandwidth and speed difference. Google could download more than 7 of those compressed pages in the time it takes to download 1 of the uncompressed pages.
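Working the two byte counts through in a few lines of Python gives the exact figures. This assumes transfer time scales with size only and ignores latency and per-request overhead.

```python
uncompressed = 111_395  # bytes, not compressed
compressed = 14_394     # bytes, compressed

ratio = compressed / uncompressed            # fraction of the original size
pages_per_slot = uncompressed / compressed   # compressed pages per uncompressed page

print(f"compressed size: {ratio:.1%} of original")                 # 12.9% of original
print(f"pages in the same transfer time: {pages_per_slot:.1f}")    # 7.7
```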
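<script runat=SERVER language=VBSCRIPT>
' Formats a date string with FormatDateTime, optionally under a temporary locale (nLCID).
function DoDateTime(str, nNamedFormat, nLCID)
  dim strRet
  dim nOldLCID

  strRet = str
  ' Remember the current locale so it can be restored afterwards.
  If (nLCID > -1) Then
    nOldLCID = Session.LCID
  End If

  On Error Resume Next

  If (nLCID > -1) Then
    Session.LCID = nLCID
  End If

  ' Only format if no locale was requested or the locale switch succeeded.
  If ((nLCID < 0) Or (Session.LCID = nLCID)) Then
    strRet = FormatDateTime(str, nNamedFormat)
  End If

  If (nLCID > -1) Then
    Session.LCID = nOldLCID
  End If

  DoDateTime = strRet
End Function
</script>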
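<%
' Prefer lastModifiedDate; fall back to addedDate when it is empty.
If (getHead.Fields.Item("lastModifiedDate").Value) = "" Then
  strLastMod = getHead.Fields.Item("addedDate").Value
Else
  strLastMod = getHead.Fields.Item("lastModifiedDate").Value
End If
' No usable date stamp: use the site launch date.
If strLastMod > "1" Then strLastMod = strLastMod Else strLastMod = "03/01/2002"
' Named format 1 is the long date; midnight is appended because the field has no time of day.
modHead = DoDateTime((strLastMod), 1, -1) & " 00:00:00 GMT"
' Response.Write modHead   ' uncomment to see the generated value
' AddHeader returns nothing, so call it as a statement.
Response.AddHeader "Last-Modified", modHead
getHead.Close()
%>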
<% end if %>
<% =Request.ServerVariables("ALL_HTTP") %>
This assumes you've got a recordset named "getHead" that contains your modified dates.
In my case, I opened my site on March 1 and didn't add my time stamp field until sometime in May, so in some cases, it's an empty field.
I've also got two fields in the database. When a record is added (new) it goes into the "addedDate" field. If it's updated, it goes into the "lastModifiedDate" field. I need to pull both in case it was added but never modified (though the added date SHOULD appear in the modified field if it's new, I'm not taking chances). If neither has a date stamp, I just slap in the value "03/01/2002".
I don't have a time of day part of the field, so I'm just adding midnight as the time stamp. If you have time of day in there, you won't need that.
If your date stamp is already in the proper format, then you don't need the conversion routine. My stamps are: "MM/DD/YYYY" so I need to convert them.
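For reference, the date form HTTP expects in Last-Modified looks like "Fri, 01 Mar 2002 00:00:00 GMT" (abbreviated day and month names, day before month). The long date format used in this thread is not that form, so a strict client may not parse it. As a point of comparison only, here is a small Python sketch of the same MM/DD/YYYY conversion; the function name is made up for illustration, and it assumes midnight GMT because the stamp has no time of day.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def to_http_date(stamp: str) -> str:
    """Convert an 'MM/DD/YYYY' stamp to an HTTP-date at midnight GMT."""
    day = datetime.strptime(stamp, "%m/%d/%Y").replace(tzinfo=timezone.utc)
    return format_datetime(day, usegmt=True)


print(to_http_date("03/01/2002"))  # Fri, 01 Mar 2002 00:00:00 GMT
```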
Finally, when you get everything working, you'll want to remove the last line (the ALL_HTTP one). That's just there for debugging and returns all of the HTTP request headers. This will help you see whether the browser is sending the "If-Modified-Since" header back to your server. Here are the results from my server:
RUN #1 (I don't have it in my browser cache)
HTTP_ACCEPT:text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,text/css,*/*;q=0.1
HTTP_ACCEPT_LANGUAGE:en-us, en;q=0.50
HTTP_CONNECTION:keep-alive
HTTP_HOST:www.rock-n-reel.com
HTTP_USER_AGENT:Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.0.1) Gecko/20020823 Netscape/7.0
HTTP_COOKIE:ASPSESSIONIDQGQGQVWO=MPGDBLJBJGMHPBHJCFEEIFNG; ASPSESSIONIDGQQGGVWU=PBHHNNLBOFDFKGGEEAIDPGME; ASPSESSIONIDQGGGGLZC=EPDNBEOBPEEPAJMLOHBMMNKI
HTTP_ACCEPT_ENCODING:gzip, deflate, compress;q=0.9
HTTP_ACCEPT_CHARSET:ISO-8859-1, utf-8;q=0.66, *;q=0.66
HTTP_KEEP_ALIVE:300
Now I hit RELOAD in my browser...
HTTP_ACCEPT:text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,text/css,*/*;q=0.1
HTTP_ACCEPT_LANGUAGE:en-us, en;q=0.50
HTTP_CONNECTION:keep-alive
HTTP_HOST:www.rock-n-reel.com
HTTP_USER_AGENT:Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.0.1) Gecko/20020823 Netscape/7.0
HTTP_COOKIE:ASPSESSIONIDQGQGQVWO=MPGDBLJBJGMHPBHJCFEEIFNG; ASPSESSIONIDGQQGGVWU=PBHHNNLBOFDFKGGEEAIDPGME; ASPSESSIONIDQGGGGLZC=EPDNBEOBPEEPAJMLOHBMMNKI
HTTP_IF_MODIFIED_SINCE:Friday, March 01, 2002 00:00:00 GMT
HTTP_ACCEPT_ENCODING:gzip, deflate, compress;q=0.9
HTTP_ACCEPT_CHARSET:ISO-8859-1, utf-8;q=0.66, *;q=0.66
HTTP_KEEP_ALIVE:300
HTTP_CACHE_CONTROL:max-age=0
WOOOOOO Freakin' Hooooo! It works!
Now, if I'm reading Googleguy correctly, the pages that are in the index this month can get a nice quick and soft hit next month and it can move on and it'll crawl me more deeply. Well, at least that's what I'm hoping. Still have a lot more pages to do, but I wanted to share this with you folks because I was completely lost for about an hour and a half this morning. Once I figured it all out, the whole process was about 10 minutes with an additional 3 minutes per page that I have a last updated recordset on. :)
Cheers and hope this helps at least a few folks!
G.
If strLastMod > "1" then strLastMod = strLastMod else strLastMod = "03/01/2002"
For some reason, when I converted my database from Access to MySQL, a value other than "" or " " got put into the fields that were empty. I was lazy and didn't bother finding out what was there. I just found that if there was a date there, then it was greater than "1" and if not, it was less than "1". Just me being a lazy coder. ;)
G.
WOOOOOO Freakin' Hooooo! It works!
But did you send back a 304?
I am not seeing any code to check the date in the If-Modified-Since header. As it is a dynamic page, your server has no way of knowing what your content modification date is.
All you are doing is sending a header to the client the first time and getting a header sent back to you on the next request. You are wasting bandwidth if you do not check the IMS header and return the 304 when appropriate.
From what I read, if the agent (browser or spider) requests the "if_modified_since" header and gets it, then that's all you need. If it ISN'T getting that header, that's where you run into the problems of more load.
Correct me (and clarify, please) if I'm wrong...
G.
The part you are missing is that the decision is not made by the client (browser, Googlebot); it is made by the server.
The client is saying "GET if-modified-since <date>" and the server looks at the date it has stored for that file. It will then send the page plus the new modified date or, if the page hasn't been modified since the date that was sent, return a 304.
The server handles this automatically with static content like images or HTML, but with any form of dynamic content it must be handled by your code.
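To make the server-side decision concrete, here is a minimal sketch in Python rather than ASP. It is not the code from the posts above; the function name and parameters are hypothetical, and it assumes you can get the content's last change as a timezone-aware datetime. It compares the If-Modified-Since value against that date and skips rendering the page when a 304 will do.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(if_modified_since, last_modified, render_page):
    """Return (status, headers, body) for a GET on a dynamic page.

    if_modified_since: raw header value from the client, or None.
    last_modified: timezone-aware datetime of the content's last change.
    render_page: zero-argument callable producing the page body.
    """
    # HTTP dates are in GMT with one-second resolution.
    last_modified = last_modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable date: ignore it and send the page
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                # Not modified since the client's copy: no body needed.
                return 304, headers, b""

    return 200, headers, render_page()
```

Because render_page is only called on the 200 path, a 304 saves the page generation work as well as the bandwidth.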
Andreas