
Google News Archive Forum

This 75-message thread spans 3 pages; this is page 2.
Are you using If Modified Since?
You should be!
GoogleGuy




msg:201337
 5:57 pm on Oct 8, 2002 (gmt 0)

I wanted to urge people to configure their servers to support the "If Modified Since" header.

Why should you care about "IMS"? When a smart spider like Googlebot comes around, IMS lets you tell the spider that a page hasn't changed. Then Googlebot can use the old copy of the page. That frees up the bot to download more pages while saving bandwidth. Because of the bandwidth savings, IMS hits are almost "free" in terms of server load. Plain Apache can serve _lots_ of IMS queries per second before slowing a machine down.

IMS can work for dynamically generated pages too. Someone posted how to do it for PHP-generated pages, for example. The bottom line is that if your server supports IMS correctly, you can tell Googlebot about more pages without as much server load or bandwidth on your part. As Google crawls more often to make the web a fresher place, adding this flag will help you and search engines.

 

Lisa




msg:201367
 8:15 am on Oct 9, 2002 (gmt 0)

GoogleGuy tell Googlebot to send Accept-Encoding: gzip, deflate, x-gzip and you can save even more bandwidth.

Yes, I agree. I am starting to encode my pages with gzip if the browser supports it. It saves my server bandwidth and helps speed up downloads of plain HTML pages. Google would save a lot of money each month if they supported this transfer method.
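Below is a minimal Python sketch of the server side of this. It is for illustration only, since the thread's own code is PHP and ASP. The sketch compresses the response with gzip only when the request's Accept-Encoding header allows it. The function names are hypothetical, and the parser skips the `*` wildcard and `identity` rules to stay short.

```python
import gzip
from typing import Dict, Tuple


def accepts_gzip(accept_encoding: str) -> bool:
    """Return True if an Accept-Encoding value such as
    "gzip, deflate, x-gzip" allows a gzip-coded response.

    An explicit q=0 (e.g. "gzip;q=0") means the client refuses that coding.
    The "*" wildcard and "identity" rules are not handled in this sketch.
    """
    for part in accept_encoding.split(","):
        coding, _, params = part.partition(";")
        if coding.strip().lower() not in ("gzip", "x-gzip"):
            continue
        q = 1.0
        for param in params.split(";"):
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q-value as a refusal
        if q > 0:
            return True
    return False


def encode_body(body: bytes, accept_encoding: str) -> Tuple[Dict[str, str], bytes]:
    """Return (extra response headers, payload), compressing only if allowed."""
    headers = {"Vary": "Accept-Encoding"}
    if accepts_gzip(accept_encoding):
        headers["Content-Encoding"] = "gzip"
        return headers, gzip.compress(body)
    return headers, body
```

Sending `Vary: Accept-Encoding` tells caches that the compressed and uncompressed bodies are different variants of the same URL.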

martin




msg:201368
 8:44 am on Oct 9, 2002 (gmt 0)

>Google would save a lot of money each month if they supported this transfer method.

According to my calculations, it makes my PHP-generated pages 3 times smaller.

Apache has a nice feature for this too: because I only have gzip support in PHP, I use content negotiation for CSS and JavaScript. This also saves server processing time for static files.

Sirius




msg:201369
 11:35 am on Oct 9, 2002 (gmt 0)

GoogleGuy tell Googlebot to send Accept-Encoding: gzip, deflate, x-gzip and you can save even more bandwidth.

I'd like to see that too.

Here's one of my larger pages:

Not compressed length: 111,395 bytes
Compressed length: 14,394 bytes

The compressed page is 13% of the size of the uncompressed page. That is a huge bandwidth and speed difference. Google could download more than 7 of those compressed pages in the time it takes to download 1 of the uncompressed pages.
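Here is the arithmetic behind these figures. It assumes transfer time scales with the number of bytes and ignores per-request latency:

```latex
\frac{14{,}394}{111{,}395} \approx 0.129 \approx 13\%,
\qquad
\frac{111{,}395}{14{,}394} \approx 7.74
```

So roughly 7 to 8 compressed copies fit in the transfer time of one uncompressed page.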

nell




msg:201370
 2:02 pm on Oct 9, 2002 (gmt 0)

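For PHP, this sends the current date and time as the Last-Modified header:

<?php
// gmdate() with no timestamp uses the time of the request,
// so this value changes on every hit.
header("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");
?>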
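As written, that header carries the time of the request. It changes on every hit, so an If-Modified-Since comparison against it can never report the page as unchanged. Below is a minimal Python sketch (illustration only; function names are hypothetical). It derives the value from a file's modification time instead, using the same RFC 1123 shape that the `gmdate()` pattern above produces.

```python
import os
from email.utils import formatdate


def http_date(timestamp: float) -> str:
    """Format a Unix timestamp as an RFC 1123 HTTP-date in GMT,
    the same shape as PHP's gmdate("D, d M Y H:i:s") . " GMT"."""
    return formatdate(timestamp, usegmt=True)


def last_modified_header(path: str) -> str:
    """Build a Last-Modified header from when the file last changed,
    not from when the request arrived."""
    return "Last-Modified: " + http_date(os.path.getmtime(path))
```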

MagicManiac




msg:201371
 2:10 pm on Oct 9, 2002 (gmt 0)

Does anyone know exactly how to do this with an IIS server (Windows 2000)?

martin




msg:201372
 2:39 pm on Oct 9, 2002 (gmt 0)

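<%
' Sends a Last-Modified header; strDate is a placeholder that must hold
' an RFC 1123 date such as "Wed, 09 Oct 2002 14:39:00 GMT"
Response.AddHeader "Last-Modified", strDate
%>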

MagicManiac




msg:201373
 2:43 pm on Oct 9, 2002 (gmt 0)

So there is no server setting to activate?

Sasquatch




msg:201374
 3:26 pm on Oct 9, 2002 (gmt 0)

Just putting in a Last-Modified date will do you no good if you do not send back a 304 status when you get an If-Modified-Since request. That will just increase your bandwidth usage.

Grumpus




msg:201375
 3:46 pm on Oct 9, 2002 (gmt 0)

ASP code for y'all:


<script runat=SERVER language=VBSCRIPT>
' Formats a date with FormatDateTime, temporarily switching Session.LCID when nLCID > -1
function DoDateTime(str, nNamedFormat, nLCID)
dim strRet
dim oldLCID

strRet = str
If (nLCID > -1) Then
oldLCID = Session.LCID
End If

On Error Resume Next

If (nLCID > -1) Then
Session.LCID = nLCID
End If

If ((nLCID < 0) Or (Session.LCID = nLCID)) Then
strRet = FormatDateTime(str, nNamedFormat)
End If

If (nLCID > -1) Then
Session.LCID = oldLCID
End If

DoDateTime = strRet
End Function
</script>
<%
' Use the modified date if there is one, otherwise the date the record was added
If (getHead.Fields.Item("lastModifiedDate").Value) = "" Then
strLastMod = getHead.Fields.Item("addedDate").Value
Else
strLastMod = getHead.Fields.Item("lastModifiedDate").Value
End If
' Empty fields compare below "1"; fall back to the site's launch date
If strLastMod > "1" Then strLastMod = strLastMod Else strLastMod = "03/01/2002"

' Format 1 is vbLongDate (e.g. "Friday, March 01, 2002"); the time is fixed at midnight
modHead = DoDateTime((strLastMod), 1, -1) & " 00:00:00 GMT"
Response.AddHeader "Last-Modified", modHead
getHead.Close()
%>
<% ' Closes an If block presumably opened earlier in the page (not shown)
End If %>
<% =Request.ServerVariables("ALL_HTTP") %>

This assumes you've got a recordset named "getHead" that contains your modified dates.

In my case, I opened my site on March 1 and didn't add my time stamp field until sometime in May, so in some cases, it's an empty field.

I've also got two fields in the database. When a record is added (new), its date goes into the "added date" field. If it's updated, the date goes into the "lastModifiedDate" field. I need to pull both in case a record was added but never modified (though the added date SHOULD appear in the modified field if it's new, I'm not taking chances). If neither has a date stamp, I just slap in the value "03/01/2002".

I don't have a time of day part of the field, so I'm just adding midnight as the time stamp. If you have time of day in there, you won't need that.

If your date stamp is already in the proper format, then you don't need the conversion routine. My stamps are: "MM/DD/YYYY" so I need to convert them.
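For comparison, here is a minimal Python sketch (illustration only; the function name is hypothetical). It turns a date-only `MM/DD/YYYY` stamp into the RFC 1123 form HTTP expects, with the time fixed at midnight GMT as described above. The long-date string the ASP code produces is not in that form. You can see it in the request headers below as "Friday, March 01, 2002 00:00:00 GMT".

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def mmddyyyy_to_http_date(stamp: str) -> str:
    """Turn a "MM/DD/YYYY" stamp into an HTTP-date at midnight GMT,
    e.g. "03/01/2002" -> "Fri, 01 Mar 2002 00:00:00 GMT"."""
    day = datetime.strptime(stamp, "%m/%d/%Y").replace(tzinfo=timezone.utc)
    return format_datetime(day, usegmt=True)
```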

Finally, when you get everything working, you'll want to remove the last lines. That's just there for debugging and returns all of the HTTP headers. This will help you see if your server is returning the "IF MODIFIED SINCE" header. Here are the results from my server:

RUN #1 (I don't have it in my browser cache)

HTTP_ACCEPT:text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,text/css,*/*;q=0.1
HTTP_ACCEPT_LANGUAGE:en-us, en;q=0.50
HTTP_CONNECTION:keep-alive
HTTP_HOST:www.rock-n-reel.com
HTTP_USER_AGENT:Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.0.1) Gecko/20020823 Netscape/7.0
HTTP_COOKIE:ASPSESSIONIDQGQGQVWO=MPGDBLJBJGMHPBHJCFEEIFNG; ASPSESSIONIDGQQGGVWU=PBHHNNLBOFDFKGGEEAIDPGME; ASPSESSIONIDQGGGGLZC=EPDNBEOBPEEPAJMLOHBMMNKI
HTTP_ACCEPT_ENCODING:gzip, deflate, compress;q=0.9
HTTP_ACCEPT_CHARSET:ISO-8859-1, utf-8;q=0.66, *;q=0.66
HTTP_KEEP_ALIVE:300

Now I hit RELOAD in my browser...


HTTP_ACCEPT:text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,text/css,*/*;q=0.1
HTTP_ACCEPT_LANGUAGE:en-us, en;q=0.50
HTTP_CONNECTION:keep-alive
HTTP_HOST:www.rock-n-reel.com
HTTP_USER_AGENT:Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.0.1) Gecko/20020823 Netscape/7.0
HTTP_COOKIE:ASPSESSIONIDQGQGQVWO=MPGDBLJBJGMHPBHJCFEEIFNG; ASPSESSIONIDGQQGGVWU=PBHHNNLBOFDFKGGEEAIDPGME; ASPSESSIONIDQGGGGLZC=EPDNBEOBPEEPAJMLOHBMMNKI
HTTP_IF_MODIFIED_SINCE:Friday, March 01, 2002 00:00:00 GMT
HTTP_ACCEPT_ENCODING:gzip, deflate, compress;q=0.9
HTTP_ACCEPT_CHARSET:ISO-8859-1, utf-8;q=0.66, *;q=0.66
HTTP_KEEP_ALIVE:300
HTTP_CACHE_CONTROL:max-age=0

WOOOOOO Freakin' Hooooo! It works!

Now, if I'm reading Googleguy correctly, the pages that are in the index this month can get a nice quick and soft hit next month and it can move on and it'll crawl me more deeply. Well, at least that's what I'm hoping. Still have a lot more pages to do, but I wanted to share this with you folks because I was completely lost for about an hour and a half this morning. Once I figured it all out, the whole process was about 10 minutes with an additional 3 minutes per page that I have a last updated recordset on. :)

Cheers and hope this helps at least a few folks!

G.

Grumpus




msg:201376
 3:49 pm on Oct 9, 2002 (gmt 0)

Ack - I forgot to explain this line in the source code, above...

If strLastMod > "1" then strLastMod = strLastMod else strLastMod = "03/01/2002"

For some reason when I converted my database from Access to mySQL, a value other than "" or " " got put into the fields that were empty. I was lazy and didn't bother finding out what was there. I just found that if there was a date there, then it was greater than "1" and if not, it was less than "1". Just me being a lazy coder. ;)

G.

Sasquatch




msg:201377
 4:03 pm on Oct 9, 2002 (gmt 0)

WOOOOOO Freakin' Hooooo! It works!

But did you send back a 304?

I am not seeing any code to check the date in the If-Modified-Since header. As it is a dynamic page, your server has no way of knowing what your content modification date is.

All you are doing is adding a header to the response the first time, and the client then adds a header that is sent back to you on the next visit. You are wasting bandwidth if you do not check the IMS header and return the 304 when appropriate.

Grumpus




msg:201378
 4:17 pm on Oct 9, 2002 (gmt 0)

The way I understand it from what I read this morning is that Google, like a browser, stores the date that it first got the page and cached it. It sends that date when it asks for the page again and if the date that googlebot sends is greater than the date returned in the "if_modified_since" header, then it knows to use the cached version as it hasn't changed. If it's less than, it's not going to use its cached version, but it's actually going to pull the page again.

From what I read, if the agent (browser or spider) requests the "if_modified_since" header and gets it, then that's all you need. If it ISN'T getting that header, that's where you run into the problems of more load.

Correct me (and clarify, please) if I'm wrong...

G.

Sasquatch




msg:201379
 4:31 pm on Oct 9, 2002 (gmt 0)

When you send it a Last-Modified date, it will then include that date in its next request. That part you got right.

The part you are missing is that there are no decisions on the part of the client (browser, Googlebot); the decision is on the part of the server.

The client is saying "GET if-modified-since <date>" and the server looks at the date it has stored for that file. It will then send the page plus the new modified date, or, if it hasn't been modified since the date that was sent, it will return a 304.

The server handles this automatically with static content like images or HTML, but with any form of dynamic content it must be handled by your code.

andreasfriedrich




msg:201380
 4:37 pm on Oct 9, 2002 (gmt 0)

No Grumpus.

  1. The UA sends a request containing the If-Modified-Since: date field in the request header.
  2. The server reads the date and checks whether the requested resource changed since the date given in the If-Modified-Since: date field.
  3. If it changed the server returns a code of 200 and sends the changed data. If the resource did not change, then the server returns a code of 304 Not Modified. The UA will use its cached version.
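Here are Andreas's three steps from the user agent's side, as a minimal Python sketch (illustration only). The `fetch` callable stands in for a real HTTP library and is a hypothetical interface, so the example runs without network access. The client simply echoes back the Last-Modified value the server sent last time.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# (status, headers, body): a stand-in for a real HTTP response object.
Response = Tuple[int, Dict[str, str], bytes]


@dataclass
class CachedPage:
    last_modified: str  # Last-Modified value the server sent with this copy
    body: bytes


class ConditionalClient:
    """A tiny user-agent cache that revalidates with If-Modified-Since."""

    def __init__(self, fetch: Callable[[str, Dict[str, str]], Response]):
        self.fetch = fetch  # hypothetical transport: (url, headers) -> Response
        self.cache: Dict[str, CachedPage] = {}

    def get(self, url: str) -> Tuple[bytes, bool]:
        """Return (body, served_from_cache)."""
        cached = self.cache.get(url)
        headers: Dict[str, str] = {}
        if cached is not None:
            # Step 1: send the stored date back as a conditional request.
            headers["If-Modified-Since"] = cached.last_modified
        status, resp_headers, body = self.fetch(url, headers)
        if status == 304 and cached is not None:
            # Step 3, unchanged: reuse the cached copy.
            return cached.body, True
        # Step 3, changed (200): keep the new copy if it carries a date.
        if status == 200 and "Last-Modified" in resp_headers:
            self.cache[url] = CachedPage(resp_headers["Last-Modified"], body)
        return body, False
```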

Andreas

Grumpus




msg:201381
 4:39 pm on Oct 9, 2002 (gmt 0)

Ack. Back to more reading I guess.

Thanks gang!

G.

graywolf




msg:201382
 4:57 pm on Oct 9, 2002 (gmt 0)

Ok so I run an asp site with static and dynamic content. For the static pages I can keep a table of the pages and the last modified date and publish that into my header (a little more maintenance but no big deal).

What about the dynamic pages? We have thousands of products, and items come off and go up every day (products sell out, different stuff comes in, and we rebuild this file every day). Should I put a Last-Modified header here?

andreasfriedrich




msg:201383
 4:58 pm on Oct 9, 2002 (gmt 0)

Answering a request containing an If-Modified-Since: date field is easy to implement in any server-side scripting language. Any implementation will involve the following steps.

  1. Check incoming header for If-Modified-Since: date field.
  2. If it is not there, continue with your script as usual.
  3. Read the date the Useragent supplied.
  4. Check whether the content your script would produce has changed since that date. How you do that, depends on a lot of circumstances. Grumpus pointed out one way. If the content you produce depends on more than one database record you need to take that into consideration.
  5. Send a 304 - Not Modified response header if there were no changes and exit your script. As you see, this method saves not only bandwidth but processing time for your own server as well.
  6. If there were changes, just run your script as you did before.
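Here are the six steps above as a minimal Python sketch (illustration only; the thread's own code is PHP and ASP). `conditional_get` and its parameters are hypothetical.

- `last_modified` is whatever timezone-aware date your own logic decides the content changed. Step 4 is the application-specific part.
- `render` produces the page only when it is actually needed.
- Header names are assumed to be normalized already to the capitalization shown, since a plain dict lookup is case-sensitive.

Following RFC 2616, an If-Modified-Since date later than the server's current time is treated as invalid and ignored.

```python
from datetime import datetime, timezone
from email.utils import formatdate, parsedate_to_datetime
from typing import Callable, Dict, Optional, Tuple


def parse_http_date(value: str) -> Optional[datetime]:
    """Parse an HTTP-date header value; return None if it is malformed."""
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # older Pythons raise TypeError on bad input
        return None
    if parsed.tzinfo is None:  # e.g. asctime format; HTTP-dates are always GMT
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def conditional_get(request_headers: Dict[str, str],
                    last_modified: datetime,
                    render: Callable[[], bytes],
                    now: Optional[datetime] = None) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET, honouring If-Modified-Since."""
    now = now or datetime.now(timezone.utc)
    # HTTP-dates have one-second resolution, so drop sub-second precision.
    lm = last_modified.astimezone(timezone.utc).replace(microsecond=0)

    since_value = request_headers.get("If-Modified-Since")  # steps 1-2
    if since_value:
        since = parse_http_date(since_value)                 # step 3
        # Step 4: unchanged if our date is not newer than theirs.
        if since is not None and since <= now and lm <= since:
            return 304, {}, b""                              # step 5: skip render()

    headers = {"Last-Modified": formatdate(lm.timestamp(), usegmt=True)}
    return 200, headers, render()                            # step 6
```

Returning early on the 304 path is what saves the processing time Andreas mentions, because the page is never generated.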

Andreas

Sasquatch




msg:201384
 5:09 pm on Oct 9, 2002 (gmt 0)

You should also consider which information is important enough to count, and who you are sending it to.

In my case, I don't want the freshbot hitting all my content pages every time I change the navigation of my site, but I want the users to get the new navigation.

1. I want to keep everyone within 1 month of current, even on navigation. Set the first date to the most recent 20th of the month.
2. If it's not Googlebot, add the dates of all the source files to the array.
3. Add the update dates of the content to the array.
4. Do a MAX() on the array, and that is my last-modified date.

In the case of catalog pages you probably don't care if google crawls each time your "in stock" count for an item goes up or down, so you might not want to include that value in your last-modified date for google, but you want to include it for your customers.
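Here are Sasquatch's four steps as a minimal Python sketch (illustration only). How a request is recognized as Googlebot (for example, by its User-Agent) is left out. The function and parameter names are hypothetical, and dates are plain calendar dates to keep the idea visible.

```python
from datetime import date
from typing import Iterable, List


def most_recent_20th(today: date) -> date:
    """Step 1: the latest 20th of a month on or before `today`."""
    if today.day >= 20:
        return today.replace(day=20)
    if today.month == 1:
        return date(today.year - 1, 12, 20)  # roll back into December
    return date(today.year, today.month - 1, 20)


def page_last_modified(today: date, is_googlebot: bool,
                       template_dates: Iterable[date],
                       content_dates: Iterable[date]) -> date:
    """Pick the Last-Modified date for one page and one kind of visitor."""
    dates: List[date] = [most_recent_20th(today)]   # step 1
    if not is_googlebot:
        dates.extend(template_dates)                # step 2: navigation/source files
    dates.extend(content_dates)                     # step 3: content updates
    return max(dates)                               # step 4
```

A navigation change then moves the date for browsers at once, but for Googlebot only at the next 20th.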

chinook




msg:201385
 9:23 pm on Oct 9, 2002 (gmt 0)

A quick scan of the MS knowledge base seems to imply that the "if modified since" header is automatic on IIS 5, EXCEPT when urlscan has been configured in a particular way.

May I boldly suggest to GG that an information page for server administrators broken down between IIS & Apache & etc (other web servers) would be appropriate, and perhaps even a header testing tool.

andreasfriedrich




msg:201386
 9:38 pm on Oct 9, 2002 (gmt 0)

the "if modified since" header is automatic on IIS 5

To be sure let me stress this again. It is the useragent that makes a conditional request with the If-Modified-Since: date field. The server does not send such a header.

The quoted statement would have to read: the "if modified since" header is handled automatically by IIS 5.

Andreas

Sasquatch




msg:201387
 9:39 pm on Oct 9, 2002 (gmt 0)

I seriously hope that MS is only saying that about static pages! Of course it would be just like MS to decide that they know better than you about your dynamic pages.

I agree that some sort of best practices document would be welcome. But the truth is that they are not going to officially suggest this for dynamic content as too many people will not implement it right.

Gizmare




msg:201388
 10:04 pm on Oct 9, 2002 (gmt 0)

Ok here is an example I wrote that you may use to get your ASP to function properly according to RFC2616(If-Modified-Since). Note I have no error trapping, and I have not tested this solution, but anyone familiar with ASP should be able to add that functionality.

NOTE: IIS 5 does support RFC2616(If-Modified-Since) for static websites and images, but when it comes to ASP you will want to add the following functionality so that you are certain that an unchanged page will be bypassed.

I think the Grumpus example above is a little off - He is sending the response a date when actually you need to check the request date sent (from the client) and compare it to your modified date (from the server), and if it has not been modified since then you need to send a 304 to the client.

This should get the ASP coders out there a start.


<%
Dim dModified, sModifiedSince, sModifiedLast, ckDate

' Pads a number to two digits, as HTTP-dates require ("05", not "5")
Function Pad2(n)
Pad2 = Right("0" & n, 2)
End Function

'This is the date in which the last update was made.
dModified = "10/29/1994 7:43:31 PM"

' Add 7 hours to convert Pacific Daylight Time to GMT (use 8 during Pacific Standard Time)
dModified = DateAdd("h", 7, dModified)

'If the HTTP_IF_MODIFIED_SINCE exists then compare it
If Len(Request.ServerVariables("HTTP_IF_MODIFIED_SINCE")) > 0 Then

'Strip the leading "Ddd, " and trailing " GMT" so CDate can parse the rest
sModifiedSince = Request.ServerVariables("HTTP_IF_MODIFIED_SINCE")
sModifiedSince = Left(sModifiedSince, Len(sModifiedSince) - 4)
sModifiedSince = Right(sModifiedSince, Len(sModifiedSince) - 5)
ckDate = CDate(sModifiedSince)

'Compare our dates and send a 304 if our date is less than or equal to the If-Modified-Since date
If (dModified <= ckDate) Then
Response.Clear
Response.Status = "304 Not Modified"
Response.End
End If
End If

'Convert back to the RFC 1123 format, e.g. "Sun, 30 Oct 1994 02:43:31 GMT"
'(WeekDayName/MonthName assume an English server locale)
sModifiedLast = WeekDayName(WeekDay(dModified), True) & ", " & Pad2(Day(dModified)) & " " & _
MonthName(Month(dModified), True) & " " & Year(dModified) & " " & _
Pad2(Hour(dModified)) & ":" & Pad2(Minute(dModified)) & ":" & Pad2(Second(dModified)) & " GMT"
' Passing back the Last-Modified header
Response.AddHeader "Last-Modified", sModifiedLast
%>


nell




msg:201389
 12:35 am on Oct 10, 2002 (gmt 0)

Other Last-Modified Usage

On our e-commerce sites (on order submit) we send ourselves an e-mail with full customer details, enter selected customer and order details in a MySQL database and send the customer an order confirmation e-mail. We do all that in a single sendorder.php page and use this to no-cache and expire that page:

<?php
$delete = time() + 1;
// Expires must be an HTTP-date, not a Unix timestamp
header("Expires: " . gmdate("D, d M Y H:i:s", $delete) . " GMT");
header("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");
header("Cache-Control: no-cache, must-revalidate"); // HTTP/1.1
header("Pragma: no-cache"); // HTTP/1.0

.

andreasfriedrich




msg:201390
 12:55 am on Oct 10, 2002 (gmt 0)

Other Last-Modified Usage

Again: the Last-Modified entity header field specifies when the URL was last modified. This is just to inform the useragent when the last change was made.

The If-Modified-Since request header field is sent by the useragent to ask the server to send the requested information pointed to by the URL only if it has been modified since the specified time.

nell's example, which can be found in the documentation for the header [php.net] function, is just the server's attempt to prevent caching of the document.

The If-Modified-Since request header field is the useragent's way to always work with the latest version of a given document.

Hope this clears things up.

Andreas

Finder




msg:201391
 6:48 am on Oct 10, 2002 (gmt 0)

Does Googlebot use the Last-Modified header from a previous visit to form its If-Modified-Since request?

As in, it visits today and gets

Last-Modified: Thu, 10 Oct 2002 00:18:53 GMT

Then visits next week and sends

If-Modified-Since: Thu, 10 Oct 2002 00:18:53 GMT

I can use a string comparison in PHP if this is the case. But if Googlebot uses some other method of creating the if-modified-since then I'll need something more complicated.

Sasquatch




msg:201392
 7:22 am on Oct 10, 2002 (gmt 0)

That should work, but a better method would be to use strtotime() on the time that they send you, then do an if ($modified > $since) in case of some sort of screwup.
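A parsed comparison is safer than a string match because the same instant can be written in more than one legal HTTP-date format. Examples are the RFC 1123 form and the older RFC 850 form. Below is a minimal Python sketch of this (illustration only). Python's `email.utils.parsedate_to_datetime` plays the role of PHP's `strtotime()` here, and the helper name is hypothetical.

```python
from email.utils import parsedate_to_datetime

# The same instant written in two legal HTTP-date formats.
SENT = "Thu, 10 Oct 2002 00:18:53 GMT"        # RFC 1123, as in Finder's example
ECHOED = "Thursday, 10-Oct-02 00:18:53 GMT"   # RFC 850 form of the same time


def not_modified(last_modified: str, if_modified_since: str) -> bool:
    """True if the page has not changed since the client's date.

    Both values are parsed into timezone-aware datetimes and compared as
    points in time, so differences in formatting do not matter.
    """
    return parsedate_to_datetime(last_modified) <= parsedate_to_datetime(if_modified_since)


print(SENT == ECHOED)              # False: a string comparison would miss the match
print(not_modified(SENT, ECHOED))  # True: same instant, so answer 304
```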

Slade




msg:201393
 2:03 pm on Oct 10, 2002 (gmt 0)

Does anyone have working PHP code for an if-modified-since check? (looking for non-register-globals code)

I've been toying with it, but don't seem to be able to get my browser to actually make that request.

andreasfriedrich




msg:201394
 2:29 pm on Oct 10, 2002 (gmt 0)

Does anyone have working PHP code for an if-modified-since check? (looking for non-register-globals code)

No. But it should be fairly easy to write. Just look at my explanation of what to do. Gizmare's ASP code should help you as well.

I've been toying with it, but don't seem to be able to get my browser to actually make that request.

This is entirely unrelated to the question of how to implement If-Modified-Since handling in PHP. There is nothing you can do on the server side to force a useragent to use the If-Modified-Since header field.

To test your implementation telnet to your server at port 80 and do:

GET / HTTP/1.1 
Host: your.server.tld
Connection: close
If-Modified-Since: date
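Here is the same telnet check, scripted as a minimal Python sketch (illustration only; the function names are hypothetical). It sends exactly the request shown above, with the CRLF line endings HTTP requires and the empty line that ends the headers. It then returns the server's status line. A handler that honours the condition answers `304 Not Modified`; one that ignores it answers `200 OK`.

```python
import socket


def conditional_request(host: str, path: str, since: str) -> bytes:
    """Build the raw request from the telnet session above."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        f"If-Modified-Since: {since}",
    ]
    # Each header line ends with CRLF; an empty line (CRLF CRLF) ends the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def status_line(host: str, path: str, since: str, port: int = 80) -> str:
    """Send the request and return the first line of the reply,
    e.g. "HTTP/1.1 304 Not Modified"."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(conditional_request(host, path, since))
        reply = b""
        while b"\r\n" not in reply:
            chunk = sock.recv(1024)
            if not chunk:  # server closed the connection early
                break
            reply += chunk
    return reply.split(b"\r\n", 1)[0].decode("latin-1")


# Usage: print(status_line("your.server.tld", "/", "Wed, 09 Oct 2002 12:00:00 GMT"))
```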

Andreas

martin




msg:201395
 2:41 pm on Oct 10, 2002 (gmt 0)

>Does anyone have working PHP code for an if-modified-since check?

$last holds the string version of the time you think the page was last modified.

if ($last) {
$last = strtotime($last);
$cond = isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])? $_SERVER['HTTP_IF_MODIFIED_SINCE'] : 0;

if ($cond and $_SERVER['REQUEST_METHOD'] == 'GET' and strtotime($cond) >= $last) {
header('HTTP/1.0 304 Not Modified');
exit;
}

header("Last-Modified: " . gmdate('D, d M Y H:i:s', $last) . ' GMT'); // RFC 1123 date
}

For register_globals, replace each $_SERVER['NAME'] with $NAME.

Yidaki




msg:201396
 4:34 pm on Oct 10, 2002 (gmt 0)

GoogleGuy, do you smile about SEO techies who know everything about "hacking the Google algo" but don't even know the basics of how their webservers work? ... bad boy! ;)

Google's educational September update - a history:
Step 1: show the webmasters that they waste their time with SEO
Step 2: show them that they should learn about webmastering first
(Added:) Goal: if they work on their servers, they don't have time to SEO-stress Google. :)

.. ts, ts ...

[Yidaki now drives to his office to clean his server's response headers, too ;)]

Sasquatch




msg:201397
 5:41 pm on Oct 10, 2002 (gmt 0)

Now, now. All he was doing was passing on some info when I asked him about cutting down on the freshbot load.

If he really wanted to get webmasters to work on it he would tell them that you would get some of the bandwidth back for deeper crawls. Or drop some sort of hint that sites that return 304s will get a very minor boost in the SERPs.

Imagine how much faster surfing would be if they were to do that!

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved