I've tried: Checking for the HTTP_IF_MODIFIED_SINCE header and returning "304 Not Modified" when possible.
Problem: Googlebot doesn't always send this header. Even when it already knows about a page, it doesn't always send the header.
I've tried: Using the Expires header to tell Google that each page expires one month after the request.
Problem: Googlebot keeps requesting the pages. It seems to ignore this header.
I've tried: Lowering the crawl rate to "Slow" in Google Webmaster Tools.
Problem: This doesn't seem to have any significant effect.
Are there other solutions to this problem? I don't want to ban Googlebot, since we get a lot of visitors from Google.
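For reference, the month-ahead Expires header described above might be built like this. This is a minimal sketch: `expires_header()` is a hypothetical helper name (not part of PHP), and a "month" is assumed to be 30 days.

```php
<?php
// Hypothetical helper: build an Expires header $days after $now.
// Assumption: one "month" is treated as 30 days.
function expires_header(int $now, int $days = 30): string
{
    $expires = $now + $days * 24 * 60 * 60;
    // gmdate() formats in UTC, which HTTP dates require.
    return 'Expires: ' . gmdate('D, d M Y H:i:s', $expires) . ' GMT';
}

// In a page, before any output:
// header(expires_header(time()));
```

Whether a crawler honours Expires is up to the crawler; the header only states how long the response may be considered fresh.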
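header("Expires: " . gmdate("D, d M Y H:i:s", time()+24*60*60) . " GMT");
// $date and $etag hold this page's Last-Modified date and ETag (including its quotes)
if ((isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) && $_SERVER['HTTP_IF_MODIFIED_SINCE'] === $date)
 || (isset($_SERVER['HTTP_IF_NONE_MATCH']) && $_SERVER['HTTP_IF_NONE_MATCH'] === $etag)) {
header('HTTP/1.1 304 Not Modified'); exit();
}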
I use a simple md5 hash based on the URL and filemtime() to create custom ETag headers, so that when I update the file the ETag changes and a full new response is sent.
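A minimal sketch of that ETag scheme, assuming the URL is the request path and the "/" separator between the two parts is an arbitrary choice; `make_etag()` is a hypothetical name, not something from the thread.

```php
<?php
// Hypothetical helper: derive an ETag from the request URL and the file's
// modification time. Changing either value changes the hash, so an old
// ETag held by a client will no longer match.
function make_etag(string $url, int $mtime): string
{
    // ETag values are sent as quoted strings.
    return '"' . md5($url . '/' . $mtime) . '"';
}

// Usage in a page (assumes the current script is the content file):
// $etag = make_etag($_SERVER['REQUEST_URI'], filemtime(__FILE__));
// header('ETag: ' . $etag);
```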
The order in the file should be reversed: the Expires header is created and set only if neither header matches, and it needs to be set before any other output.
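Here is one way to express that order, as a sketch rather than the poster's actual code. Both function names are hypothetical; the If-None-Match precedence follows RFC 7232, and weak ETags (`W/"..."`) are deliberately not handled.

```php
<?php
// Hypothetical helper: decide whether a conditional GET can be answered
// with 304. $server is normally $_SERVER; $lastModified and $etag are the
// values this page sends in its Last-Modified and ETag headers.
function is_not_modified(array $server, string $lastModified, string $etag): bool
{
    // RFC 7232: when If-None-Match is present, If-Modified-Since is ignored.
    if (isset($server['HTTP_IF_NONE_MATCH'])) {
        // The header may list several comma-separated ETags, or be "*".
        // Weak validators (W/"...") are not matched by this sketch.
        foreach (explode(',', $server['HTTP_IF_NONE_MATCH']) as $candidate) {
            $candidate = trim($candidate);
            if ($candidate === '*' || $candidate === $etag) {
                return true;
            }
        }
        return false;
    }
    if (isset($server['HTTP_IF_MODIFIED_SINCE'])) {
        return $server['HTTP_IF_MODIFIED_SINCE'] === $lastModified;
    }
    return false;
}

// Check first; only when the full page must be sent are the caching
// headers (including Expires) emitted. Call before any output.
function send_cache_headers(array $server, string $lastModified, string $etag): void
{
    if (is_not_modified($server, $lastModified, $etag)) {
        header('HTTP/1.1 304 Not Modified');
        exit();
    }
    header('Last-Modified: ' . $lastModified);
    header('ETag: ' . $etag);
    header('Expires: ' . gmdate('D, d M Y H:i:s', time() + 24 * 60 * 60) . ' GMT');
}
```

In a real page the first argument would be `$_SERVER`; keeping the decision in a separate function makes it easy to test without sending headers.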
Justin
I have this header so far:
<?php
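// $file contains the file name of the page being displayed (the actual
// content, not any templates you may be using). We take the last modified
// date of this file. If $file has not been set before this point, fall
// back to the script that is currently running.
if (!isset($file)) {
$file = $_SERVER['SCRIPT_FILENAME'];
}
$mtime = filemtime($file);
// Create an HTTP-conformant date, example 'Mon, 22 Dec 2003 14:16:16 GMT'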
$gmt_mtime = gmdate('D, d M Y H:i:s', $mtime).' GMT';
// send a unique 'strong' identifier. This is always the same for this
// particular file while the file itself remains the same.
header('ETag: "'.md5($mtime.$file).'"');
// check if the last modified date sent by the client is the same as
// the last modified date of the requested file. If so, return 304 header
// and exit.
if(isset($_SERVER['HTTP_IF_MODIFIED_SINCE']))
{
if ($_SERVER['HTTP_IF_MODIFIED_SINCE'] == $gmt_mtime)
{
header('HTTP/1.1 304 Not Modified');
exit();
}
}
// check if the Etag sent by the client is the same as the Etag of the
// requested file. If so, return 304 header and exit.
if (isset($_SERVER['HTTP_IF_NONE_MATCH']))
{
if (str_replace('"', '', stripslashes($_SERVER['HTTP_IF_NONE_MATCH'])) == md5($mtime.$file))
{
header("HTTP/1.1 304 Not Modified");
// abort processing and exit
exit();
}
}
// output last modified header using the last modified date of the file.
header('Last-Modified: '.$gmt_mtime);
// tell all caches that this resource is publicly cacheable.
header('Cache-Control: must-revalidate');
// this resource expires one day from now.
header("Expires: " . gmdate("D, d M Y H:i:s", time()+24*60*60) . " GMT");
// set the content-type
if (isset($_SERVER["HTTP_ACCEPT"]) && stristr( $_SERVER["HTTP_ACCEPT"], "application/xhtml+xml") ) {
header ("Content-type: application/xhtml+xml; charset=utf-8");
} else {
header ("Content-type: text/html; charset=utf-8");
}
// start output.
// Note that no output can precede the headers unless you call ob_start().
// You don't have to use gzip, but it greatly saves on bandwidth (for text)
// at the cost of a little more processing.
ob_start ("ob_gzhandler");
?>
What do you think about that?
[edited by: Webnauts at 6:21 am (utc) on Oct. 11, 2007]
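The If-Modified-Since check above compares strings exactly, which works when the client echoes the Last-Modified value back verbatim. A more forgiving alternative, sketched here as an assumption rather than a required change, is to compare timestamps; `not_modified_since()` is a hypothetical helper.

```php
<?php
// Hypothetical helper: compare If-Modified-Since as a timestamp rather than
// as an exact string, so equivalent date spellings still match.
function not_modified_since(string $header, int $mtime): bool
{
    // Some older clients append "; length=..." to the date; drop it.
    $date = trim(explode(';', $header)[0]);
    if ($date === '') {
        return false;
    }
    // strtotime() returns false for dates it cannot parse.
    $since = strtotime($date);
    return $since !== false && $mtime <= $since;
}

// Usage: if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])
//     && not_modified_since($_SERVER['HTTP_IF_MODIFIED_SINCE'], $mtime)) { ... 304 ... }
```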
There are a few things I notice at a glance:
1. You need to move the ETag header down to where the rest of the headers are set, so the checks are first.
2. I don't usually set the Cache-Control header, but when it includes a max-age, that value overrides the Expires setting. If you are still having issues, you could set it to cache for a day (or the time period of your choice; 24*60*60 = 86400 seconds = 1 day) using max-age=86400.
3. The comment for the Cache-Control header says you want to let the file be cached publicly, but you are sending 'must-revalidate', so once a cached copy goes stale the remote cache must check it against the original source before reusing it. Are you sure that's what you want?
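If public caching for a day is what's intended, the header from points 2 and 3 could be built as follows. This is a sketch; `cache_control()` is a hypothetical helper, and whether to drop must-revalidate is the site owner's call.

```php
<?php
// Hypothetical helper: build a Cache-Control header that lets caches
// reuse the response for $seconds. "public" allows shared caches;
// "private" restricts storage to the end user's browser.
function cache_control(int $seconds, bool $public = true): string
{
    $scope = $public ? 'public' : 'private';
    return 'Cache-Control: ' . $scope . ', max-age=' . $seconds;
}

// One day, matching the Expires header used in the thread:
// header(cache_control(24 * 60 * 60));
```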
Justin
Try [alexandre.alapetite.net...]
[w3.org...]
Expires:
The format is an absolute date and time as defined by HTTP-date in section 3.3.1; it MUST be in RFC 1123 date format.
max-age:
Indicates that the client is willing to accept a response whose age is no greater than the specified time in seconds.
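Both quoted rules can be checked in a few lines. This sketch uses hypothetical helper names; `gmdate()` with the pattern below produces the RFC 1123 form (for example 'Mon, 22 Dec 2003 14:16:16 GMT'), and the freshness check mirrors the "no greater than" wording above.

```php
<?php
// Format a Unix timestamp as an RFC 1123 HTTP-date (always in GMT).
function http_date(int $timestamp): string
{
    return gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
}

// Per the quoted max-age text: a response is acceptable while its age
// in seconds is no greater than max-age.
function is_fresh(int $ageSeconds, int $maxAge): bool
{
    return $ageSeconds <= $maxAge;
}
```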
Justin