Deepcrawl question, and HTTP_IF_MODIFIED_SINCE


Jesse_Smith

4:19 am on Mar 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



On the sites I haven't changed since the last deepcrawl, the deepcrawl is hardly doing anything, but it is going through the sites where I have added content or made changes. Will it delete links that it doesn't look at in the deepcrawl? If it skips a site that hasn't changed, does that mean the server supports HTTP_IF_MODIFIED_SINCE?

How do you know if you have HTTP_IF_MODIFIED_SINCE on the server?

On 2/11/03, when I tested it with Google using a log script, it logged
HTTP_IF_MODIFIED_SINCE >> Thu, 02 Jan 2003 03:57:20 GMT
so I'm guessing my server does support it, which saves me a ton of bandwidth, and that Google won't delete the links it doesn't look at. Is that correct?

jdMorgan

4:54 am on Mar 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Jesse_Smith,

> Will it delete links that it doesn't look at in the deepcrawl?

No.

> How do you know if you have HTTP_IF_MODIFIED_SINCE on the server?

Use the WebmasterWorld Server Header Checker [webmasterworld.com] and look for "Last-Modified" in your server's response.
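If you'd rather check from your own machine, here's a minimal Python sketch (standard library only) that sends a HEAD request and returns the Last-Modified header, much like a header checker does. The function name fetch_last_modified is hypothetical, not part of any existing tool.

```python
import http.client
from typing import Optional
from urllib.parse import urlsplit


def fetch_last_modified(url: str) -> Optional[str]:
    """Send a HEAD request to url and return its Last-Modified header, or None.

    HEAD asks for the response headers only, which is all we need to see
    whether the server sends a Last-Modified date for this file.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # parts.port is None when the URL has no explicit port; http.client
    # then falls back to the default port for the scheme.
    conn = conn_cls(parts.hostname, parts.port, timeout=10)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.getheader("Last-Modified")
    finally:
        conn.close()
```

For example, fetch_last_modified("http://www.example.com/index.html") (a placeholder URL) returns None if the server sends no Last-Modified header for that page, in which case a crawler has no date to send back in If-Modified-Since.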

In your logs, you should still see the 'bot requesting files, but instead of a 200-OK response code, you should see a 304-Not Modified response. This still saves bandwidth, since only the HTTP header is returned if the content has not been modified.
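To show the decision behind that 304, here's a simplified Python model of what a server does when a 'bot sends If-Modified-Since (the request header a CGI script sees as HTTP_IF_MODIFIED_SINCE). It's a sketch, not Apache's actual code: it only compares the file's modification time with the date the client sent and ignores ETag/If-None-Match handling. The function name conditional_status is hypothetical.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_status(if_modified_since: Optional[str], mtime: float) -> int:
    """Return the status a server would send for a GET of a static file.

    if_modified_since is the raw If-Modified-Since request header (None if
    the client didn't send one); mtime is the file's modification time as a
    Unix timestamp.
    """
    if not if_modified_since:
        return 200  # unconditional request: send the whole file
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the header and send the file
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # treat "-0000" dates as UTC
    # HTTP dates have one-second resolution, so drop the fractional part of mtime.
    if int(mtime) <= since.timestamp():
        return 304  # unchanged: headers only, no body
    return 200  # changed since the client's copy: send the new version
```

With the date from the log above, a file last modified at exactly Thu, 02 Jan 2003 03:57:20 GMT gets a 304, while one changed a second later gets a full 200 response.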

HTH,
Jim

Jesse_Smith

5:22 am on Mar 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Does anyone know how to get it working on HTML files? It only works on .txt files.

jdMorgan

5:37 am on Mar 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What server are you hosted on, Apache, IIS, or other?

Jim

Jesse_Smith

7:11 am on Mar 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Apache

jdMorgan

2:04 pm on Mar 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Jesse_Smith,

Using the Apache <Files> directive [httpd.apache.org], mod_expires [httpd.apache.org], and mod_headers [httpd.apache.org], here's an example for use in .htaccess:


# Set up Cache-Control headers
# Default: expire everything 1 week (604800 seconds) after access, and add must-revalidate
ExpiresActive On
ExpiresDefault A604800
Header append Cache-Control "must-revalidate"
# Apply a customized Cache-Control header to frequently updated files
# robots.txt: expire 1 second after access, and replace Cache-Control entirely
<Files robots.txt>
ExpiresDefault A1
Header unset Cache-Control
Header append Cache-Control "no-cache, must-revalidate"
</Files>
# index.html: expire 1 hour (3600 seconds) after access
<Files index.html>
ExpiresDefault A3600
</Files>

You can also use ExpiresByType in mod_expires if that is more suitable to your needs.
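ExpiresByType sets the expiry per MIME type (for example, ExpiresByType text/html A3600), and a matching type rule takes precedence over ExpiresDefault. The Python sketch below is only a rough model of that lookup for access-based ("A") codes; it ignores modification-based ("M") codes, and the function name and sample rules are hypothetical.

```python
from email.utils import formatdate
from typing import Dict


def expires_headers(mime_type: str, access_time: float,
                    by_type: Dict[str, str], default: str) -> Dict[str, str]:
    """Rough model of mod_expires for "A<seconds>" codes only.

    by_type mimics ExpiresByType lines (MIME type -> code); default mimics
    ExpiresDefault. A type-specific rule wins over the default.
    """
    code = by_type.get(mime_type, default)
    if not code.startswith("A"):
        raise ValueError("this sketch only models access-time (A) codes")
    seconds = int(code[1:])
    return {
        # Expires is an absolute date: access time plus the configured seconds.
        "Expires": formatdate(access_time + seconds, usegmt=True),
        # For an access-based code, max-age is simply the configured seconds.
        "Cache-Control": f"max-age={seconds}",
    }


if __name__ == "__main__":
    # Hypothetical rules: HTML for an hour, GIFs for 30 days, everything else a week.
    rules = {"text/html": "A3600", "image/gif": "A2592000"}
    print(expires_headers("text/html", 0, rules, "A604800"))
```

Per-type rules are handy when a whole class of files (images, stylesheets) rarely changes, while <Files> blocks like the ones above suit individual files such as robots.txt.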

Jim