I've written a mini-cms for a client, mainly for delivering dynamic PDF documents, which vary somewhat in content depending on who is viewing them.
My approach was to use a custom 404 script, which reads the request, checks the user's credentials and delivers the right version of the document. This also means the links to the PDF files "look right". They don't look like "showmethedoc.aspx?doc=xyz.pdf". It's simply "/docs/doctitle.pdf" and the 404 script does the rest. All fairly straightforward.
As Googlebot crawls the site, I therefore need to check the "if-modified-since" header so my code doesn't keep sending Googlebot the same files when they haven't changed.
While testing that, I found a very weird problem. IIS seems to specifically strip off the "if-modified-since" header from the headers collection. It doesn't matter if I use ASP.NET, PHP or classic ASP.
If I call the script directly, no problem - the header is there. But during IIS's handling of a 404 error document, it seems to "eat" that header before passing control to the script.
What's more bizarre is that if I add, say, ".aspx" to the end of my non-existent link (eg. /docs/doctitle.aspx), suddenly the header reappears in the collection! It seems to only disappear if both the following are true: a) the script is a 404 (or other) error document, and b) the requested file is something other than .asp or .aspx.
This has been driving me nuts for the past 2 days. Anyone have an idea how I can get access to the "if-modified-since" header from a 404 script?
It looks like what you need is not possible, here is a discussion on the subject:
[eggheadcafe.com...]
I'm not really a fan of using 404 handling for something other than its intended use, preferring to leave it for what it is.
Which technology are you using (ASP, ASP.Net ...)? There is most likely a neater way to achieve what you need.
I'm using .NET, and I even tried an HttpModule, but the header is eaten by IIS before it gets there too. I found, however, that if I'm handling a 404 request for an .aspx file, the header IS passed along! It only gets eaten when the request is for other types of files: doc, pdf, etc.
So I'm toying with this idea: Detect when a crawler is making the request (via user-agent) and, if so, send back a 302 (temp redirect) to a url like this:
/showdoc.aspx?doc=/somepath/mydoco.pdf
This way, IIS passes the if-modified-since header to showdoc.aspx and it's no problem to work out whether to respond with the file or a 304 Not Modified. Normal site visitors can still use the "friendly" URL.
The only question is whether Google etc. will react negatively to that sort of thing. Do you know if Googlebot will frown on 302s to the files?
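For anyone following along, here is a rough sketch of the "file or 304" decision that showdoc.aspx would have to make once it can see the header. It is written in Python purely to show the logic, not as the ASP.NET code. The names `not_modified_since` and `respond` are made up for illustration, and it assumes you already know each document's last-modified time in UTC.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def not_modified_since(if_modified_since, last_modified):
    """Return True when a 304 Not Modified is appropriate.

    if_modified_since: raw request header value, or None if the header is absent.
    last_modified: timezone-aware UTC datetime of the document's last change.
    """
    if not if_modified_since:
        return False  # no header sent (or it was stripped): send the full file
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparseable date: ignore the header
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # treat a zone-less date as UTC
    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    return last_modified.replace(microsecond=0) <= since


def respond(request_headers, last_modified, body):
    """Return (status, headers, body) for a hypothetical document request."""
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if not_modified_since(request_headers.get("If-Modified-Since"), last_modified):
        return 304, headers, b""  # client copy is current: no body
    return 200, headers, body
```

Note that a missing header simply falls through to a normal 200, so if IIS strips it the crawler still gets the file, just not the bandwidth saving.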
Have you considered using an HTTP handler to map *.pdf files to your custom code?
[msdn.microsoft.com...]
(also configuration for IIS7 [infosysblogs.com])
[en.wikipedia.org...]
If I understand correctly, this does basically the same thing. Googlebot stores the ETag it was sent last time it requested the file, and sends it back to the server next time. All I do is update my files' ETags when their contents are updated, right?
So if Googlebot is sent back a *different* ETag along with the updated file, it should store it and send me the new ETag next time it requests that file.
Is that how it works? If so, bingo! I don't need to use if-modified-since at all, as long as I maintain my ETags properly. Is someone able to verify this is the case?
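As far as I understand the ETag mechanism, the client sends the stored value back in an "If-None-Match" request header, and the server answers 304 if it still matches the current ETag. A minimal sketch of that check, in Python just to show the idea: `make_etag` and `etag_matches` are made-up names, and hashing the file contents is only one possible way of keeping ETags in step with updates.

```python
import hashlib


def make_etag(content: bytes) -> str:
    """Derive an ETag from the document bytes; the value changes when the content does."""
    return '"' + hashlib.sha256(content).hexdigest()[:32] + '"'


def etag_matches(if_none_match, current_etag):
    """Return True when the client's stored ETag is still current (send 304)."""
    if not if_none_match:
        return False  # no header sent: send the full file
    if if_none_match.strip() == "*":
        return True  # "*" matches any existing representation
    # Simplified parsing: assumes ETag values contain no commas (true for hex digests).
    for candidate in if_none_match.split(","):
        candidate = candidate.strip()
        if candidate.startswith("W/"):
            candidate = candidate[2:]  # weak comparison ignores the W/ prefix
        if candidate == current_etag:
            return True
    return False
```

This only helps if the "If-None-Match" header actually reaches the script, which is the same requirement as for "if-modified-since".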
And guess what.. IIS strips that out as well. argh. MS really gets my goat sometimes. Why make certain client headers completely unavailable to our code? I'm stunned. :o
but I believe the file type needs to be registered directly in IIS
You may also be able to convince your web host to set this up for you, if you're lucky :)
* Edit - Possible solution here: IIS stripping the "If-Modified-Since" header [webmasterworld.com]