Why should you care about "IMS" (If-Modified-Since)? When a smart spider like Googlebot comes around, IMS lets you tell the spider that a page hasn't changed. Then Googlebot can use the old copy of the page. That frees up the bot to download more pages while saving bandwidth. Because of the bandwidth savings, IMS hits are almost "free" in terms of server load. Plain Apache can serve _lots_ of IMS queries per second before slowing a machine down.
IMS can work for dynamically generated pages too. Someone posted how to do it for PHP-generated pages, for example. The bottom line is that if your server supports IMS correctly, you can tell Googlebot about more pages without as much server load or bandwidth on your part. As Google crawls more often to make the web a fresher place, adding this flag will help you and search engines.
I return some 304s on PHP-generated pages, but it's not always easy to get the modification time right. It's a lot harder than just getting the file modification date and using it.
For Apache users with static content included by SSI (e.g. headers and footers that don't change often), XBitHack Full [httpd.apache.org] is the answer. For those who can't edit the server configuration, it can be enabled in .htaccess.
Yep. If the GET has an If-Modified-Since header, the server should send either the body with a 200 status or a 304 header with no body.
With HEAD, the server should just send the header. If the content's changed, the bot would need to ask again with a GET.
The user agent or robot can then send an If-Modified-Since header carrying the date and time from the Last-Modified header it received the last time the URL was fetched.
If the content has changed, the server can send the new version with a 200 (OK) header. If the content has not changed, then the server can send a 304 (not modified) header and no content. The robot can just keep the content from last time, saving bandwidth. RFC 2616 [ietf.org] explains in more detail.
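To make the 200 vs. 304 decision above concrete, here is a minimal sketch in Python. The function name conditional_response and its arguments are made up for illustration, and it assumes your code already knows the page's modification time as a timezone-aware UTC datetime. It is not how Apache or any particular server implements this, just the comparison described in the post.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(last_modified, if_modified_since, body):
    """Return (status, headers, body) for a GET, honouring If-Modified-Since.

    last_modified must be a timezone-aware UTC datetime (an assumption of
    this sketch); if_modified_since is the raw request header value or None.
    """
    # HTTP dates carry whole seconds only, so drop microseconds before comparing.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparsable date: ignore the header and send the page
        if since is not None and since.tzinfo is None:
            # A date without a usable zone is treated as UTC here.
            since = since.replace(tzinfo=timezone.utc)
        if since is not None and last_modified <= since:
            return 304, headers, b""  # unchanged: headers only, no content

    return 200, headers, body  # changed, or no usable IMS header: full page
```

A robot that sends back exactly the Last-Modified value it was given gets a 304 with an empty body; a robot holding an older date, or sending no If-Modified-Since at all, gets the full page with a 200.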
The META NOARCHIVE tag asks Googlebot not to keep a cache of your page. Google's support pages [google.com] describe its use.
Then for the users I will need to include the latest times for both the navigation and the update parts of the code, in the last-modified dates.
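One way to handle a page built from several parts, sketched in Python: take the newest modification time among the parts and report that as the page's Last-Modified. The function names and the example timestamps below are hypothetical, and the sketch assumes every part's time is available as a timezone-aware UTC datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def page_last_modified(part_times):
    """Return the newest modification time among a page's parts."""
    # The page as a whole changed whenever any single part changed,
    # so the newest timestamp wins.
    return max(part_times)


def last_modified_header(part_times):
    """Format the newest part time as an HTTP Last-Modified value."""
    newest = page_last_modified(part_times).replace(microsecond=0)
    return format_datetime(newest, usegmt=True)


# Hypothetical timestamps for the navigation and the update parts of a page.
navigation_changed = datetime(2003, 4, 28, 9, 30, 0, tzinfo=timezone.utc)
updates_changed = datetime(2003, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
header_value = last_modified_header([navigation_changed, updates_changed])
```

Here header_value comes from the updates timestamp, since it is the later of the two. If the navigation were edited afterwards, its time would take over, and robots holding the older date would get a fresh copy.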
All in all a worthwhile goal, especially as we climb towards our bandwidth cap, even aside from slowing down the freshbot. But I still think a "nofresh" meta tag would get a lot more use and would free up the freshbot from even having to send an IMS GET in the first place.
But I will take what I can get.
What sort of server is hosting your website?
What language do you use to produce your pages?
It turns out that with most methods of generating dynamic pages, you are able to manually process the headers.
On the other hand, if you are using static HTML, all the settings are server side, and I would hope they are set up properly by default.
I just checked on Brett's tool, and it said that I had it turned on. I'm not REAL techie, but my boss and I just installed this Linux (apache) webserver recently......is 'last modified' turned on by default?
You would then check that internally and if your page has not changed return 'Response.Status="304 Not Modified"' and 'Response.End'. If it has changed simply return the page as normal.
"Someone posted how to do it for PHP-generated pages"
Does anyone know which thread this refers to? I did a couple of searches and couldn't find anything.
My server does send 304 responses, but only for image files. My index page has PHP at the very top that creates an "expires" header each time the page is requested. But there is no if-modified-since header being sent with that page.
I just moved my site from one host to another (the DNS name server was changed Sunday about noon). Late yesterday I started seeing hits in the new server logs, including bunches of visits from ms. googlebot today :) - one worry down.
However, I checked the old server logs a few minutes ago and there are just a few hits there now - most are my own IP and some from inktomi slurp :(
My question: am I still seeing my own hits in the old log because my ISP has not updated their DNS? I've dumped the cache manually several times today. I know I'm still seeing the old server files because I used absolute urls there, but on the new server I'm using relative urls. Can't get my email downloaded from the new server either and I'm assuming this is an ISP DNS issue.
My old server logs have consistently shown 304's when I changed pages so I'm not sure if this discussion is the same thing as TTL or not.
<edit-- said that wrong, 304 when the file was not changed >