Forum Moderators: open
The objective of the 404 handler is to easily manage files that have moved. Any time you move a file, you can add one line to a 404.xml file that tells the handler what the URL used to be and what it should be now. The handler then automatically forwards the browser to the new URL.
There is also a list of spiders that it does not forward. They get a legitimate 404 error. I use this to encourage the spiders to drop the old page. This is a sort of inverse cloaking, since the intention is to get the spider to drop the page, not index it better.
Since I implemented this, site management has become a whole lot easier. I have no fear of moving or renaming files because I just create one new line in the 404.xml file to fix it.
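To give a feel for what one of those lines looks like, here is a sketch of a mapping entry. The element and attribute names are my own invention for illustration; the handler's actual 404.xml format may differ.

```xml
<!-- Hypothetical 404.xml entry: element/attribute names are illustrative only -->
<redirects>
  <!-- One line per moved file: old URL on the left, new URL on the right -->
  <redirect oldUrl="/products/widgets.html" newUrl="/catalog/widgets.asp" />
</redirects>
```

The handler would look up the requested path in this list and, for regular browsers, forward to the new URL; spiders on the exclusion list would get the real 404 instead.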
Let me know how it works for you.
Since you gave me special permission, I'll give a little plug since it actually is related to what we do in these forums: AppDev is a major national training company, so I teach in a number of different cities around the U.S. The courseware is top notch, and all of the AppDev instructors are very good. They also have a video line, if taking a week off from work is difficult. The Visual Basic courseware is written by Ken Getz, author of such books as the Visual Basic Language Developer's Handbook. The ASP courseware is written by Paul Litwin, author of such books as the Access 2000 Developer's Handbook. The XML/XSLT courseware is written by Don Kiely, author of such books as the Visual Basic Developer's Guide to E-Commerce with ASP and SQL Server. These guys are the best in their fields. Even if they weren't friends, I'd still recommend just about anything these guys are associated with: books, articles, conferences, whatever.
And no, I don't get any sort of kickback! :)
1) Redirect the browser to a page that exists;
2) Use script in the custom 404 page to build the page.
When redirecting the browser to the real page, I'd have to redirect it to the page that uses query strings, thus obviating the reason for using the custom 404 page in the first place. In this case, the status for the URL without the query strings is 302, which I assume would prevent spiders from indexing it.
When building the page using script in the custom 404 page, the URL looks something like this:
[mysite.com...]
Not very pretty. The status for the [mysite.com...] URL gets logged as 200, though, which is what we're after.
Anyone know how to get rid of that [mysite.com...] in the URL?
Because if you can place a file on the server for each one, then the easiest thing to do is use SSI from each one to generate the content.
The goal is to create dynamically generated, friendly-URL, crawlable pages, and on IIS 4 (no $$$ right now to upgrade to Win 2K). If I had to create a static file and an SSI entry for every new page in an 11-language Web site, that would sort of defeat the whole reason for having a dynamically generated site in the first place.
<%
Response.Status = "200 OK"
'The rest of your script
%>
in the 404 handler, you should be okay. Don't support query strings at all. Instead you will need a complicated script that parses the URL that was asked for (see the top of the 404 handler code), then calls the appropriate subroutines. If you had Win2K/IIS 5 you could use Server.Transfer, but you can't with IIS 4.
You can't do a Response.Redirect because that sends a new URL down to the client, which defeats part of the purpose. Instead you must do all processing on the server directly from the 404 handler.
Using Server Side Includes will probably make managing the script in the 404 handler easier.
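A minimal sketch of that dispatch logic in the 404 handler might look like the following. It assumes IIS passes the original request to the custom 404 page in the query string as "404;http://..." (which matches what later posters report seeing); the ShowShoppingPage/ShowArticlePage subroutine names are invented for illustration.

```asp
<%
Response.Status = "200 OK"

' IIS hands the custom 404 page the original request in the query string,
' e.g. "404;http://www.mysite.com/shopping/computers/laptops/"
Dim rawQS, origUrl, path
rawQS = Request.ServerVariables("QUERY_STRING")
origUrl = Mid(rawQS, InStr(rawQS, ";") + 1)   ' strip the leading "404;"

' Reduce the full URL to just the path, e.g. "/shopping/computers/laptops/"
path = Mid(origUrl, InStr(origUrl, "//") + 2) ' drop "http://"
path = Mid(path, InStr(path, "/"))            ' drop the host name

' Call the appropriate subroutine based on the first path segment.
' These subroutines are hypothetical; yours would query the database
' and Response.Write the page content directly.
If InStr(path, "/shopping/") = 1 Then
    ShowShoppingPage path
ElseIf InStr(path, "/articles/") = 1 Then
    ShowArticlePage path
Else
    Response.Status = "404 Not Found"
    Response.Write "Page not found."
End If
%>
```

Everything happens server-side in the one request, so the client never sees a redirect, and the logged status for the friendly URL is 200.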
---
As a completely different option, you could look into ISAPI programming and a filter map. I think this will work, but I've never implemented one.
But I've never understood why one would 'complicate' things with the 404. Just set an Apache directive:
<Location /shopping>
ForceType application/x-httpd-php
</Location>
And place a script named 'shopping' in the root directory. The script will 'explode' the requested URL and query a database to get the dynamic content.
So:
www.mysite.com/shopping/computers/laptops/dell/
will pull the information out, while eliminating the need for ?'s in the URL.
It looks like the browser (or spider) is being sent a 301 redirect. Elsewhere, there has been considerable discussion of using 404 error handling to create spiderable URLs. For example, if a site uses dynamic URLs, one creates a link to [mydomain.com...]. This page doesn't actually exist, but the 404 handler parses it and delivers the appropriate dynamic page. To be indexed, though, the spider/browser can't be given an error code - it has to think the page is there. Visitors who click on the link in the SERP, of course, have to get the content, too. Can your system be tweaked to do this, as well?
The URL does appear to be a returned error, so I would guess the error handler is not being executed correctly.
WebBug [cyberspyder.com] is a great tool for checking status codes returned by web pages. When I tested the 404 handler with IIS 4.0, WebBug returned a status 200 for HTTP 1.0 requests and above. Only HTTP 0.9 returned errors, which were corrected by adding the Response.Status code listed above.
The status code gets set to 200 in the log file. The content is properly displayed by parsing the URL grabbed via Request.QueryString, querying the database and displaying the content using a script in the custom 404 ASP file. What the visitor still sees as the URL, however, is [mysite.com...]
This is the part that's still preventing me from having a production-ready solution on IIS 4. Is there a step I'm missing? I know it's not Server.Transfer, because IIS 4 doesn't support that method.
(edited by: cbg3 at 3:57 pm (gmt) on Oct. 22, 2001)
The useful bit of info that came out of my mistake, for you IIS 4 users out there, is not to use an ASP file name as part of your friendly URL. Thanks for your help on Oct. 6, Xoc!
I tried going to your website and it's not working; it says the website can't be found.
The Web site cannot be found
The Web site you are looking for is unavailable due to its identification configuration settings.
--------------------------------------------------------------------------------
Please try the following:
Click the Refresh button, or try again later.
If you typed the page address in the Address bar, make sure that it is spelled correctly.
Click the Back button to try another link.
11001 - Host not found
Internet Security and Acceleration Server
--------------------------------------------------------------------------------
Technical Information (for support personnel)
Background:
This error indicates that the gateway could not find the IP address of the Web site you are trying to access.
ISA Server: troy_proxy.troy.k12.mi.us
Via:
URL: [xoc.net...]
Time: 11/12/2001 5:48:30 PM GMT
I'd really appreciate it if you could help me.
I know this thread is a bit old, but I could use your help anyway.
OK, I implemented 404 trapping, and also put in Xoc's patch:
Response.Status = "200 OK"
I used WebBug as suggested, and I cannot see any errors coming back... but my query string has a "404;..." in it. I know I am paranoid here, but is that not visible to SEs?
Please help... urgently.
ezGuy