IIS stripping the "if-modified-since" header(?)

         

waveform

7:30 pm on Jun 17, 2009 (gmt 0)

10+ Year Member



Hi! Need some help with a hairy Googlebot/IIS issue.

I've written a mini-cms for a client, mainly for delivering dynamic PDF documents, which vary somewhat in content depending on who is viewing them.

My approach was to use a custom 404 script, which reads the request, checks the user's credentials and delivers the right version of the document. This also means the links to the PDF files "look right". They don't look like "showmethedoc.aspx?doc=xyz.pdf". It's simply "/docs/doctitle.pdf" and the 404 script does the rest. All fairly straightforward.

As Googlebot crawls the site, I therefore need to check the "if-modified-since" header so my code doesn't keep sending Googlebot the same files when they haven't changed.

While testing that, I found a very weird problem. IIS seems to specifically strip off the "if-modified-since" header from the headers collection. It doesn't matter if I use ASP.NET, PHP or classic ASP.

If I call the script directly, no problem - the header is there. But during IIS's handling of a 404 error document, it seems to "eat" that header before passing control to the script.

What's more bizarre is that if I add, say, ".aspx" to the end of my non-existent link (e.g. /docs/doctitle.aspx), suddenly the header reappears in the collection! It seems to disappear only if both of the following are true: a) the script is a 404 (or other) error document, and b) the requested file is something other than .asp or .aspx.

This has been driving me nuts for the past 2 days. Anyone have an idea how I can get access to the "if-modified-since" header from a 404 script?
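For readers following along, the "if-modified-since" check described above can be sketched roughly as follows. This is only an illustration in Python, not the poster's ASP.NET code: the function name `conditional_status` and the idea of passing the document's last-modified time in as a parameter are assumptions made for the example. It also presumes the header actually reaches the script, which is exactly what fails in the 404 case.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Return 304 if the client's copy is still current, otherwise 200.

    if_modified_since: raw header value, or None if the header is absent.
    last_modified: timezone-aware datetime of the document's last change.
    (Both names are hypothetical; they are not from the original post.)
    """
    if not if_modified_since:
        return 200  # no validator sent: always deliver the document
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the header
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds.
    return 304 if last_modified.replace(microsecond=0) <= since else 200


if __name__ == "__main__":
    changed = datetime(2009, 6, 17, 19, 30, tzinfo=timezone.utc)
    header = format_datetime(changed, usegmt=True)
    print(conditional_status(header, changed))
    print(conditional_status(None, changed))
```

When the result is 304 the script would send only the status line and headers; otherwise it sends the file together with a "Last-Modified" header so the crawler has a date to send back next time.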

marcel

7:01 am on Jun 18, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi waveform,

It looks like what you need is not possible. Here is a discussion on the subject:
[eggheadcafe.com...]

I'm not really a fan of using 404 handling for something other than its intended use, preferring to leave it for what it is.

Which technology are you using (ASP, ASP.Net ...)? There is most likely a neater way to achieve what you need.

waveform

10:18 am on Jun 18, 2009 (gmt 0)

10+ Year Member



Hi Marcel. Thanks for the link, I don't feel so alone now. :)

I'm using .NET, and I even tried an httpModule, but IIS eats the header before it gets there too. I found, however, that if I'm handling a 404 request for an .aspx file, the header IS passed along! It only gets eaten when the request is for other types of files: doc, pdf, etc.

So I'm toying with this idea: detect when a crawler is making the request (via the user-agent) and, if so, send back a 302 (temporary redirect) to a URL like this:

/showdoc.aspx?doc=/somepath/mydoco.pdf

This way, IIS passes the if-modified-since header to showdoc.aspx and it's no problem to work out whether to respond with the file or a 304 Not Modified. Normal site visitors can still use the "friendly" URL.

The only question is whether Google etc. will react negatively to that sort of thing. Do you know if Googlebot will frown on 302s to the files?
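A rough sketch of the redirect idea from this post, written in Python purely for illustration. The function name `crawler_redirect` and the list of user-agent tokens are assumptions for the example; only the "/showdoc.aspx?doc=" URL pattern comes from the post. It says nothing about how crawlers will treat the 302, which is the open question above.

```python
from urllib.parse import quote

# Illustrative substrings only; a real list would need maintaining.
CRAWLER_TOKENS = ("googlebot", "msnbot", "slurp")


def crawler_redirect(path, user_agent):
    """Return the showdoc URL for a crawler, or None for ordinary visitors.

    path: the "friendly" URL that was requested, e.g. /somepath/mydoco.pdf
    user_agent: raw User-Agent header value, or None if absent.
    """
    agent = (user_agent or "").lower()
    if any(token in agent for token in CRAWLER_TOKENS):
        # Keep slashes readable; escape anything unsafe in a query string.
        return "/showdoc.aspx?doc=" + quote(path, safe="/")
    return None  # normal visitors stay on the friendly URL


if __name__ == "__main__":
    bot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    print(crawler_redirect("/somepath/mydoco.pdf", bot))
    print(crawler_redirect("/somepath/mydoco.pdf", "Mozilla/5.0 (Windows NT 6.0)"))
```

When the function returns a URL, the 404 script would respond with status 302 and a "Location" header set to that URL; when it returns None, the script serves the document as before.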

marcel

11:43 am on Jun 18, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm not sure how Google would interpret a 302 in this case. Your best bet is to ask in the Google Forum [webmasterworld.com]

Have you considered using an HTTP handler to map *.pdf files to your custom code?
[msdn.microsoft.com...]

(also configuration for IIS7 [infosysblogs.com])

waveform

12:30 pm on Jun 18, 2009 (gmt 0)

10+ Year Member



Thanks Marcel, will do.

An httpHandler would be nice, but I believe the file type needs to be registered directly in IIS as well for the httpHandler to work. The site is on a remote server, so I don't have direct control over IIS. :(

waveform

1:39 pm on Jun 18, 2009 (gmt 0)

10+ Year Member



I may have found a solution... it seems, although the "if-modified-since" header is unavailable to my code, the "ETag" header remains!

[en.wikipedia.org...]

If I understand correctly, this does basically the same thing. Googlebot stores the ETag it was sent last time it requested the file, and sends it back to the server next time. All I do is update my files' ETags when their contents are updated, right?

So if Googlebot is sent back a *different* ETag along with the updated file, it should store it and send me the new ETag next time it requests that file.

Is that how it works? If so, bingo! I don't need to use if-modified-since at all, as long as I maintain my ETags properly. Is someone able to verify this is the case?

waveform

7:38 pm on Jun 18, 2009 (gmt 0)

10+ Year Member



Wait.. scratch that. I misunderstood the protocol. ETag is the header I should be sending back to the client. What the client sends the server is the "if-none-match" header.

And guess what.. IIS strips that out as well. argh. MS really gets my goat sometimes. Why make certain client headers completely unavailable to our code? I'm stunned. :o
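For anyone unsure of the protocol described in this post, here is a minimal sketch of the ETag / "if-none-match" exchange, in Python for illustration only. The names `make_etag` and `not_modified` are made up for the example, and hashing the content with SHA-1 is just one assumed way to derive a tag. As with "if-modified-since", it only helps where the header actually reaches the code.

```python
import hashlib


def make_etag(content: bytes) -> str:
    """Build a quoted ETag from the document bytes (assumed scheme)."""
    return '"' + hashlib.sha1(content).hexdigest() + '"'


def _strong(tag: str) -> str:
    """Drop the weak-validator prefix so W/"x" compares equal to "x"."""
    return tag[2:] if tag.startswith("W/") else tag


def not_modified(if_none_match, current_etag):
    """True if the client's If-None-Match value matches the current ETag.

    Simplification: splits on commas, so tags containing commas
    are not handled.
    """
    if not if_none_match:
        return False
    candidates = [tag.strip() for tag in if_none_match.split(",")]
    if "*" in candidates:
        return True
    return any(_strong(tag) == _strong(current_etag) for tag in candidates)


if __name__ == "__main__":
    etag = make_etag(b"version 1 of the PDF")
    print(not_modified(etag, etag))
    print(not_modified(etag, make_etag(b"version 2 of the PDF")))
```

The server sends the value from `make_etag` in the "ETag" response header; the client echoes it back in "if-none-match", and a match means the server can answer 304 Not Modified instead of resending the file.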

marcel

7:49 pm on Jun 18, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"but I believe the file type needs to be registered directly in IIS"

Yes, that is correct (for IIS 6). But if I am not mistaken, you can register this in the web.config if you are using IIS 7. I haven't tried this myself, though.

You may also be able to convince your web host to set this up for you, if you're lucky :)

* Edit - Possible solution here: IIS stripping the "If-Modified-Since" header [webmasterworld.com]