Why should you care about "IMS" (If-Modified-Since)? When a smart spider like Googlebot comes around, IMS lets you tell the spider that a page hasn't changed. Then Googlebot can use the old copy of the page. That frees up the bot to download more pages while saving bandwidth. Because of the bandwidth savings, IMS hits are almost "free" in terms of server load. Plain Apache can serve _lots_ of IMS queries per second before slowing a machine down.
IMS can work for dynamically generated pages too. Someone posted how to do it for PHP-generated pages, for example. The bottom line is that if your server supports IMS correctly, you can tell Googlebot about more pages without as much server load or bandwidth on your part. As Google crawls more often to make the web a fresher place, adding this flag will help you and search engines.
GoogleGuy, tell Googlebot to send Accept-Encoding: gzip, deflate, x-gzip and you can save even more bandwidth.
Yes, I agree. I am starting to encode my pages with gzip if the browser supports it. It saves my server bandwidth and helps speed up downloads of plain HTML pages. Google would save a lot of money each month if they supported this transfer method.
According to my calculations it makes my PHP generated pages 3 times smaller.
There is a nice feature in Apache too. Because I have gzip support only in PHP, I am using content negotiation for CSS and JavaScript, which also saves server processing time for static files.
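For anyone who wants to see the "gzip only if the client supports it" idea spelled out, here is a rough sketch in Python. The function names `accepts_gzip` and `encode_body` are made up for illustration, and the header parsing is deliberately simplified (it ignores the `*` wildcard), so treat it as a starting point rather than a complete implementation.

```python
import gzip


def accepts_gzip(accept_encoding: str) -> bool:
    """Return True if the Accept-Encoding value allows gzip or x-gzip.

    Simplified on purpose: the "*" wildcard is not handled.
    """
    for item in accept_encoding.split(","):
        parts = item.strip().split(";")
        coding = parts[0].strip().lower()
        q = 1.0  # a coding without an explicit q value is fully acceptable
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if coding in ("gzip", "x-gzip") and q > 0:
            return True
    return False


def encode_body(body: bytes, accept_encoding: str):
    """Return (payload, content_encoding); content_encoding is None if uncompressed."""
    if accepts_gzip(accept_encoding):
        return gzip.compress(body), "gzip"
    return body, None
```

When `encode_body` returns "gzip", the server would also need to send a `Content-Encoding: gzip` header with the response.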
GoogleGuy, tell Googlebot to send Accept-Encoding: gzip, deflate, x-gzip and you can save even more bandwidth.
Here's one of my larger pages:
Not compressed length: 111,395 bytes
Compressed length: 14,394 bytes
The compressed page is 13% of the size of the uncompressed page. That is a huge bandwidth and speed difference. Google could download about 7 of those compressed pages in the time it takes to download 1 of the uncompressed pages.
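The arithmetic behind those two byte counts can be checked with a few lines of Python. The helper name `compression_summary` is just for illustration, and the "times smaller" figure only reflects bytes transferred, not connection overhead.

```python
def compression_summary(uncompressed: int, compressed: int):
    """Return (fraction of original size, how many times smaller)."""
    ratio = compressed / uncompressed
    factor = uncompressed / compressed
    return ratio, factor


# Byte counts taken from the post above
ratio, factor = compression_summary(111_395, 14_394)
print(f"{ratio:.1%} of the original size, {factor:.1f} times smaller")
```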
<script runat=SERVER language=VBSCRIPT>
function DoDateTime(str, nNamedFormat, nLCID)
  dim strRet
  dim oldLCID
  strRet = str
  ' Remember the current locale so it can be restored afterwards
  If (nLCID > -1) Then
    oldLCID = Session.LCID
  End If
  On Error Resume Next
  If (nLCID > -1) Then
    Session.LCID = nLCID
  End If
  ' Only format the date if the requested locale could be applied
  If ((nLCID < 0) Or (Session.LCID = nLCID)) Then
    strRet = FormatDateTime(str, nNamedFormat)
  End If
  If (nLCID > -1) Then
    Session.LCID = oldLCID
  End If
  DoDateTime = strRet
End Function
</script>
<% if (getHead.Fields.Item("lastModifiedDate").Value) = "" then
  strLastMod = getHead.Fields.Item("addedDate").Value
Else
  strLastMod = getHead.Fields.Item("lastModifiedDate").Value
End If
' Fall back to the site's opening date when no usable date stamp is stored
If strLastMod > "1" then strLastMod = strLastMod else strLastMod = "03/01/2002"
' Named format 1 is vbLongDate, e.g. "Friday, March 01, 2002"
modHead = DoDateTime((strLastMod), 1, -1) & " 00:00:00 GMT"
'Response.Write modHead
' AddHeader is a statement, so it must not be written with "="
Response.AddHeader "Last-Modified", modHead
%>
<%
getHead.Close()
%>
<% end if %>
<% =Request.ServerVariables("ALL_HTTP") %>
This assumes you've got a recordset named "getHead" that contains your modified dates.
In my case, I opened my site on March 1 and didn't add my time stamp field until sometime in May, so in some cases, it's an empty field.
I've also got two fields in the database. When a record is added (new) it goes into the "addedDate" field. If it's updated, it goes into the "lastModifiedDate" field. I need to pull both in case it was added but never modified (though the added date SHOULD appear in the modified field if it's new, I'm not taking chances). If neither has a date stamp, I just slap in the value "03/01/2002".
I don't have a time of day part of the field, so I'm just adding midnight as the time stamp. If you have time of day in there, you won't need that.
If your date stamp is already in the proper format, then you don't need the conversion routine. My stamps are: "MM/DD/YYYY" so I need to convert them.
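As a language-neutral illustration of that conversion, here is a small Python sketch that turns an "MM/DD/YYYY" stamp into an HTTP-style date at midnight GMT, falling back to the default date when the stamp is missing or unparseable. The function name `http_date_from_stamp` is hypothetical, and note that it emits the short "Fri, 01 Mar 2002" form rather than the long date the ASP routine above produces.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

DEFAULT_DATE = "03/01/2002"  # fallback date used in the post above


def http_date_from_stamp(stamp):
    """Convert an MM/DD/YYYY stamp to an HTTP date at midnight GMT."""
    try:
        day = datetime.strptime((stamp or "").strip(), "%m/%d/%Y")
    except ValueError:
        # Empty or unusable stamp: use the default date instead
        day = datetime.strptime(DEFAULT_DATE, "%m/%d/%Y")
    # No time of day is stored, so midnight is assumed
    return format_datetime(day.replace(tzinfo=timezone.utc), usegmt=True)


print(http_date_from_stamp("03/01/2002"))
```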
Finally, when you get everything working, you'll want to remove the last lines. That's just there for debugging and returns all of the HTTP headers. This will help you see if your server is returning the "IF MODIFIED SINCE" header. Here are the results from my server:
RUN #1 (I don't have it in my browser cache)
HTTP_ACCEPT:text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,text/css,*/*;q=0.1
HTTP_ACCEPT_LANGUAGE:en-us, en;q=0.50
HTTP_CONNECTION:keep-alive
HTTP_HOST:www.rock-n-reel.com
HTTP_USER_AGENT:Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.0.1) Gecko/20020823 Netscape/7.0
HTTP_COOKIE:ASPSESSIONIDQGQGQVWO=MPGDBLJBJGMHPBHJCFEEIFNG; ASPSESSIONIDGQQGGVWU=PBHHNNLBOFDFKGGEEAIDPGME; ASPSESSIONIDQGGGGLZC=EPDNBEOBPEEPAJMLOHBMMNKI
HTTP_ACCEPT_ENCODING:gzip, deflate, compress;q=0.9
HTTP_ACCEPT_CHARSET:ISO-8859-1, utf-8;q=0.66, *;q=0.66
HTTP_KEEP_ALIVE:300
Now I hit RELOAD in my browser...
HTTP_ACCEPT:text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,text/css,*/*;q=0.1
HTTP_ACCEPT_LANGUAGE:en-us, en;q=0.50
HTTP_CONNECTION:keep-alive
HTTP_HOST:www.rock-n-reel.com
HTTP_USER_AGENT:Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.0.1) Gecko/20020823 Netscape/7.0
HTTP_COOKIE:ASPSESSIONIDQGQGQVWO=MPGDBLJBJGMHPBHJCFEEIFNG; ASPSESSIONIDGQQGGVWU=PBHHNNLBOFDFKGGEEAIDPGME; ASPSESSIONIDQGGGGLZC=EPDNBEOBPEEPAJMLOHBMMNKI
HTTP_IF_MODIFIED_SINCE:Friday, March 01, 2002 00:00:00 GMT
HTTP_ACCEPT_ENCODING:gzip, deflate, compress;q=0.9
HTTP_ACCEPT_CHARSET:ISO-8859-1, utf-8;q=0.66, *;q=0.66
HTTP_KEEP_ALIVE:300
HTTP_CACHE_CONTROL:max-age=0
WOOOOOO Freakin' Hooooo! It works!
Now, if I'm reading Googleguy correctly, the pages that are in the index this month can get a nice quick and soft hit next month and it can move on and it'll crawl me more deeply. Well, at least that's what I'm hoping. Still have a lot more pages to do, but I wanted to share this with you folks because I was completely lost for about an hour and a half this morning. Once I figured it all out, the whole process was about 10 minutes with an additional 3 minutes per page that I have a last updated recordset on. :)
Cheers and hope this helps at least a few folks!
G.
If strLastMod > "1" then strLastMod = strLastMod else strLastMod = "03/01/2002"
For some reason when I converted my database from Access to mySQL, a value other than "" or " " got put into the fields that were empty. I was lazy and didn't bother finding out what was there. I just found that if there was a date there, then it was greater than "1" and if not, it was less than "1". Just me being a lazy coder. ;)
G.
WOOOOOO Freakin' Hooooo! It works!
But did you send back a 304?
I am not seeing any code to check the date in the If-Modified-Since header. As it is a dynamic page, your server has no way of knowing what your content modification date is.
All you are doing is sending a header to the client the first time and getting a header sent back to you on the next browse. You are wasting bandwidth if you do not check the IMS header and return the 304 when appropriate.
From what I read, if the agent (browser or spider) requests the "if_modified_since" header and gets it, then that's all you need. If it ISN'T getting that header, that's where you run into the problems of more load.
Correct me (and clarify, please) if I'm wrong...
G.
The part you are missing is that there are no decisions on the part of the client (browser, Googlebot); the decision is on the part of the server.
The client is saying "GET if-modified since <date>" and the server looks at the date it has stored for that file. It will then send the page plus the new modified date, or, if the page hasn't been modified since the date that was sent, it will return a 304.
The server handles this automatically with static content like images or html, but with any form of dynamic content it must be handled by your code.
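To make the server-side decision concrete, here is a minimal Python sketch of the comparison a dynamic page has to do itself. The function name `conditional_response` and the (status, headers) return shape are assumptions for illustration, and `last_modified` is assumed to be a timezone-aware UTC datetime supplied by your own code.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(last_modified, if_modified_since=None):
    """Return (status, headers) for a GET, honouring If-Modified-Since."""
    # HTTP dates have one-second resolution, so drop microseconds first
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: ignore it and send the page
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304, headers  # client copy is current, send no body
    return 200, headers  # send the full page


page_date = datetime(2002, 10, 10, 0, 18, 53, tzinfo=timezone.utc)
print(conditional_response(page_date, "Thu, 10 Oct 2002 00:18:53 GMT"))
```

The important part is the early 304 return: only adding a Last-Modified header, without this comparison, saves nothing.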
Andreas
What about the dynamic pages? We have thousands of products, and items come off and go up every day (products sell out, different stuff comes in, we rebuild this file every day). Should I put a Last-Modified here?
Andreas
In my case, I don't want the freshbot hitting all my content pages every time I change the navigation of my site, but I want the users to get the new navigation.
1. I want to keep everyone within 1 month of current, even on navigation. Set the first date to the most recent 20th of the month.
2. If it's not Googlebot, add the dates of all the source files to the array.
3. Add the update dates of the content to the array.
4. Do a MAX() on the array, and that is my last-modified date.
In the case of catalog pages you probably don't care if google crawls each time your "in stock" count for an item goes up or down, so you might not want to include that value in your last-modified date for google, but you want to include it for your customers.
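The four steps above could be sketched in Python roughly like this. The function names, the `is_googlebot` flag, and the two date lists are placeholders for whatever your own site provides; how you detect Googlebot is left out entirely.

```python
from datetime import date


def most_recent_20th(today: date) -> date:
    """Step 1: the most recent 20th of the month, on or before today."""
    if today.day >= 20:
        return today.replace(day=20)
    if today.month == 1:
        return date(today.year - 1, 12, 20)
    return date(today.year, today.month - 1, 20)


def last_modified_for(today, is_googlebot, source_file_dates, content_dates):
    """Steps 2 to 4: collect the candidate dates and take the newest."""
    candidates = [most_recent_20th(today)]
    if not is_googlebot:
        # Real visitors also see navigation and template changes
        candidates.extend(source_file_dates)
    candidates.extend(content_dates)
    return max(candidates)


today = date(2002, 10, 12)
print(last_modified_for(today, True, [date(2002, 10, 11)], [date(2002, 8, 1)]))
```

With these sample dates the spider is told 20 September, while a regular browser would get 11 October because the source file change is included.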
May I boldly suggest to GG that an information page for server administrators broken down between IIS & Apache & etc (other web servers) would be appropriate, and perhaps even a header testing tool.
the "if modified since" header is automatic on IIS 5
To be sure let me stress this again. It is the useragent that makes a conditional request with the If-Modified-Since: date field. The server does not send such a header.
The quoted statement would have to read: the "if modified since" header is handled automatically by IIS 5.
Andreas
I agree that some sort of best practices document would be welcome. But the truth is that they are not going to officially suggest this for dynamic content as too many people will not implement it right.
NOTE: IIS 5 does support RFC 2616 (If-Modified-Since) for static websites and images, but when it comes to ASP you will want to add the following functionality so that you are certain that an unchanged page will be bypassed.
I think the Grumpus example above is a little off. He is sending a date in the response, when you actually need to check the date sent in the request (from the client) and compare it to your modified date (from the server); if the page has not been modified since then, you need to send a 304 to the client.
This should give the ASP coders out there a start.
<%
Dim dModified, sModifiedSince, sModifiedLast, ckDate

' This is the date on which the last update was made.
dModified = "10/29/1994 7:43:31 PM"

' Add 7 hours to our time for the Pacific (daylight) time to GMT difference
dModified = DateAdd("h", 7, dModified)

' If HTTP_IF_MODIFIED_SINCE exists then compare it
If Len(Request.ServerVariables("HTTP_IF_MODIFIED_SINCE")) > 0 Then
  ' Modify our date to make it readable in VBScript:
  ' strip the trailing " GMT" and the leading day name ("Thu, ")
  sModifiedSince = Request.ServerVariables("HTTP_IF_MODIFIED_SINCE")
  sModifiedSince = Left(sModifiedSince, Len(sModifiedSince) - 4)
  sModifiedSince = Right(sModifiedSince, Len(sModifiedSince) - 5)
  ckDate = CDate(sModifiedSince)

  ' Compare our dates and throw a 304 if our date is less than or equal to the If-Modified-Since date
  If (dModified <= ckDate) Then
    Response.Clear
    Response.Status = "304 Not Modified"
    Response.End
  End If
End If

' This may not be necessary but I am passing back the Last-Modified date,
' converting it back to the standard date format (two-digit day, hour, minute and second)
sModifiedLast = WeekDayName(WeekDay(dModified), True) & ", " & Right("0" & Day(dModified), 2) & " " & MonthName(Month(dModified), True) & _
  " " & Year(dModified) & " " & Right("0" & Hour(dModified), 2) & ":" & Right("0" & Minute(dModified), 2) & ":" & Right("0" & Second(dModified), 2) & " GMT"

' Passing back the Last-Modified
Response.AddHeader "Last-Modified", sModifiedLast
%>
On our e-commerce sites (on order submit) we send ourselves an e-mail with full customer details, enter selected customer and order details in a MySQL database and send the customer an order confirmation e-mail. We do all that in a single sendorder.php page and use this to no-cache and expire that page:
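<?php
// Expire the page one second from now (Expires takes an HTTP date, not a raw timestamp)
$delete = time() + 1;
header("Expires: " . gmdate("D, d M Y H:i:s", $delete) . " GMT");
header("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");
header("Cache-Control: no-cache, must-revalidate"); // HTTP/1.1
header("Pragma: no-cache"); // HTTP/1.0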
.
Other Last-Modified Usage
Again: the Last-Modified entity header field specifies when the URL was last modified. This is just to inform the useragent when the last change was made.
The If-Modified-Since request header field is sent by the useragent to ask the server to send the requested information pointed to by the URL only if it has been modified since the specified time.
nell's example, which can be found in the documentation for the header [php.net] function, is just the server's attempt to prevent caching of the document.
The If-Modified-Since request header field is the useragent's way to always work with the latest version of a given document.
Hope this clears things up.
Andreas
As in, it visits today and gets
Last-Modified: Thu, 10 Oct 2002 00:18:53 GMT
Then visits next week and sends
If-Modified-Since: Thu, 10 Oct 2002 00:18:53 GMT
I can use a string comparison in PHP if this is the case. But if Googlebot uses some other method of creating the if-modified-since then I'll need something more complicated.
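Whether Googlebot echoes back the exact Last-Modified string is not something this thread settles, so a comparison on parsed dates may be the safer bet. Here is a small Python sketch of the difference; the function name `not_modified` is made up for the example.

```python
from email.utils import parsedate_to_datetime


def not_modified(last_modified: str, if_modified_since: str) -> bool:
    """True when the client's date is the same as or later than ours."""
    return parsedate_to_datetime(last_modified) <= parsedate_to_datetime(if_modified_since)


ours = "Thu, 10 Oct 2002 00:18:53 GMT"
theirs = "Fri, 11 Oct 2002 09:00:00 GMT"  # a client that sends a later date

print(ours == theirs)              # string equality sees two different values
print(not_modified(ours, theirs))  # the date comparison sees "not modified"
```

Plain string equality only works if the client sends back exactly what it was given; parsing both sides also covers a client that sends a different but later date.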
Does anyone have working PHP code for an if-modified-since check? (looking for non-register-globals code)
No. But it should be fairly easy to write. Just look at my explanation on what to do. Gizmare's ASP code should help you as well.
I've been toying with it, but don't seem to be able to get my browser to actually make that request.
This is entirely unrelated to the question of how to implement If-Modified-Since handling in PHP. There is nothing you can do on the server side to force a useragent to use the If-Modified-Since header field.
To test your implementation, telnet to your server at port 80 and send:
GET / HTTP/1.1
Host: your.server.tld
Connection: close
If-Modified-Since: date
Andreas
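If you would rather script the telnet test above, something along these lines should work in Python. Both function names are hypothetical, `your.server.tld` is the same placeholder used in the post, and `fetch_status_line` only reads the first line of the response.

```python
import socket


def build_conditional_get(host, path="/", if_modified_since=None):
    """Build the raw request bytes you would otherwise type into telnet."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}", "Connection: close"]
    if if_modified_since:
        lines.append(f"If-Modified-Since: {if_modified_since}")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def fetch_status_line(host, request, port=80, timeout=10.0):
    """Send the request and return only the status line of the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        data = b""
        while b"\r\n" not in data:
            chunk = sock.recv(1024)
            if not chunk:
                break
            data += chunk
    return data.split(b"\r\n", 1)[0].decode("latin-1")


request = build_conditional_get("your.server.tld", "/", "Thu, 10 Oct 2002 00:18:53 GMT")
print(request.decode("ascii"))
```

Calling `fetch_status_line("your.server.tld", request)` against your own host should show a 304 status line if the server handles the header, and a 200 if it does not.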
$last holds the string version of the time you think the page was last modified.
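if ($last) {
    $last = strtotime($last);
    $cond = isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) ? $_SERVER['HTTP_IF_MODIFIED_SINCE'] : 0;
    // Answer a conditional GET with 304 when the client's copy is at least as new as ours
    if ($cond and $_SERVER['REQUEST_METHOD'] == 'GET' and strtotime($cond) >= $last) {
        header('HTTP/1.0 304 Not Modified');
        exit;
    }
    // Otherwise advertise our modification time as an RFC 1123 date
    header("Last-Modified: " . gmdate('D, d M Y H:i:s', $last) . ' GMT');
}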
Strip $_SERVER[' and the matching '] to make it work with register_globals.
Google's educational september update - a history:
Step 1: show the webmasters that they waste their time with seo
Step 2: show them that they should learn about webmastering first
[add]Goal: if they work on their server they don't have time to seo-stress google. :)[/add]
.. ts, ts ...
[Yidaki now drives to his office to clean his server's response headers, too ;)]