Forum Moderators: open
The objective of the 404 handler is to easily manage files that have moved. Any time you move a file, you can add one line to a 404.xml file that tells the handler what the URL used to be and what it should be now. The handler then automatically forwards the browser to the new URL.
There is also a list of spiders that it does not forward. They get a legitimate 404 error. I use this to encourage the spiders to drop the old page. This is a sort of inverse cloaking, since the intention is to get the spider to drop the page, not index it better.
Since I implemented this, site management has become a whole lot easier. I have no fear of moving or renaming files because I just create one new line in the 404.xml file to fix it.
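To give a feel for what one of those lines looks like, here is a sketch of a mapping entry. The element and attribute names are my own invention for illustration; the handler's actual 404.xml format may differ.

```xml
<!-- Hypothetical 404.xml entry: element/attribute names are illustrative only -->
<redirects>
  <!-- One line per moved file: old URL on the left, new URL on the right -->
  <redirect oldUrl="/products/widgets.html" newUrl="/catalog/widgets.asp" />
</redirects>
```

The handler would look up the requested path in this list and, for regular browsers, forward to the new URL; spiders on the exclusion list would get the real 404 instead.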
Let me know how it works for you.
Since you gave me special permission, I'll give a little plug since it actually is related to what we do in these forums: AppDev is a major national training company, so I teach in a number of different cities around the U.S. The courseware is top notch, and all of the AppDev instructors are very good. They also have a video line, if taking a week off from work is difficult. The Visual Basic courseware is written by Ken Getz, author of such books as the Visual Basic Language Developer's Handbook. The ASP courseware is written by Paul Litwin, author of such books as the Access 2000 Developer's Handbook. The XML/XSLT courseware is written by Don Kiely, author of such books as the Visual Basic Developer's Guide to E-Commerce with ASP and SQL Server. These guys are the best in their fields. Even if they weren't friends, I'd still recommend just about anything these guys are associated with: books, articles, conferences, whatever.
And no, I don't get any sort of kickback! :)
1) Redirect the browser to a page that exists;
2) Use script in the custom 404 page to build the page.
When redirecting the browser to the real page, I'd have to redirect it to the page that uses query strings, thus obviating the reason for using the custom 404 page in the first place. In this case, the status for the URL without the query strings is 302, which I assume would prevent spiders from indexing it.
When building the page using script in the custom 404 page, the URL looks something like this:
[mysite.com...]
Not very pretty. The status for the [mysite.com...] URL gets logged as 200, though, which is what we're after.
Anyone know how to get rid of that [mysite.com...] in the URL?
Because if you can place a file on the server for each one, then the easiest thing to do is use SSI from each one to generate the content.
The goal is to create dynamically generated, friendly-URL, crawlable pages, and on IIS 4 (no $$$ right now to upgrade to Win 2K). If I had to create a static file and an SSI entry for every new page in an 11-language Web site, that would sort of defeat the whole reason for having a dynamically generated site in the first place.
<%
Response.Status = "200 OK"
'The rest of your script
%>
in the 404 handler, you should be okay. Don't support query strings at all. Instead you will need a complicated script that parses the URL that was asked for (see the top of the 404 handler code), then calls the appropriate subroutines. If you had Win2K/IIS 5 you could use Server.Transfer, but you can't with IIS 4.
You can't do a Response.Redirect because that sends a new URL down to the client, which defeats part of the purpose. Instead you must do all processing on the server directly from the 404 handler.
Using Server Side Includes will probably make managing the script in the 404 handler easier.
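A minimal sketch of that dispatch logic in the 404 handler might look like the following. It assumes IIS passes the original request to the custom 404 page in the query string as "404;http://..." (which matches what later posters report seeing); the ShowShoppingPage/ShowArticlePage subroutine names are invented for illustration.

```asp
<%
Response.Status = "200 OK"

' IIS hands the custom 404 page the original request in the query string,
' e.g. "404;http://www.mysite.com/shopping/computers/laptops/"
Dim rawQS, origUrl, path
rawQS = Request.ServerVariables("QUERY_STRING")
origUrl = Mid(rawQS, InStr(rawQS, ";") + 1)   ' strip the leading "404;"

' Reduce the full URL to just the path, e.g. "/shopping/computers/laptops/"
path = Mid(origUrl, InStr(origUrl, "//") + 2) ' drop "http://"
path = Mid(path, InStr(path, "/"))            ' drop the host name

' Call the appropriate subroutine based on the first path segment.
' These subroutines are hypothetical; yours would query the database
' and Response.Write the page content directly.
If InStr(path, "/shopping/") = 1 Then
    ShowShoppingPage path
ElseIf InStr(path, "/articles/") = 1 Then
    ShowArticlePage path
Else
    Response.Status = "404 Not Found"
    Response.Write "Page not found."
End If
%>
```

Everything happens server-side in the one request, so the client never sees a redirect, and the logged status for the friendly URL is 200.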
---
As a completely different option, you could look into ISAPI programming and a filter map. I think this will work, but I've never implemented one.
But I've never understood why one would 'complicate' things with the 404. Just set an Apache directive:
<Location /shopping>
ForceType application/x-httpd-php
</Location>
And place a script named 'shopping' in the root directory. The script will 'explode' the requested URL and query a database to get the dynamic content.
So:
www.mysite.com/shopping/computers/laptops/dell/
will pull the information out, while eliminating the need for ?'s in the URL.
It looks like the browser (or spider) is being sent a 301 redirect. Elsewhere, there has been considerable discussion of using 404 error handling to create spiderable URLs. For example, if a site uses dynamic URLs, one creates a link to [mydomain.com...]. This page doesn't actually exist, but the 404 handler parses it and delivers the appropriate dynamic page. To be indexed, though, the spider/browser can't be given an error code - it has to think the page is there. Visitors who click on the link in the SERP, of course, have to get the content, too. Can your system be tweaked to do this, as well?
The URL does appear to be a returned error, so I would guess the error handler is not being executed correctly.
WebBug [cyberspyder.com] is a great tool for checking status codes returned by web pages. When I tested the 404 handler with IIS 4.0, WebBug returned a status 200 for HTTP 1.0 requests and above. Only HTTP 0.9 returned errors, which were corrected by adding the Response.Status code listed above.
The status code gets set to 200 in the log file. The content is properly displayed by parsing the URL grabbed via Request.QueryString, querying the database and displaying the content using a script in the custom 404 ASP file. What the visitor still sees as the URL, however, is [mysite.com...]
This is the part that's still preventing me from having a production-ready solution on IIS 4. Is there a step I'm missing? I know it's not Server.Transfer, because IIS 4 doesn't support that method.
(edited by: cbg3 at 3:57 pm (gmt) on Oct. 22, 2001)
The useful bit of info that came out of my mistake, for you IIS 4 users out there, is not to use an ASP file name as part of your friendly URL. Thanks for your help on Oct. 6, Xoc!
I tried going to your website and it's not working; it says the website can't be found.
The Web site cannot be found
The Web site you are looking for is unavailable due to its identification configuration settings.
--------------------------------------------------------------------------------
Please try the following:
Click the Refresh button, or try again later.
If you typed the page address in the Address bar, make sure that it is spelled correctly.
Click the Back button to try another link.
11001 - Host not found
Internet Security and Acceleration Server
--------------------------------------------------------------------------------
Technical Information (for support personnel)
Background:
This error indicates that the gateway could not find the IP address of the Web site you are trying to access.
ISA Server: troy_proxy.troy.k12.mi.us
Via:
URL: [xoc.net...]
Time: 11/12/2001 5:48:30 PM GMT
I'd really appreciate it if you could help me.
I know this thread is a bit old, but I could use your help anyway.
OK, I implemented 404 trapping, and also put in Xoc's patch:
Response.Status = "200 OK"
I used WebBug as suggested, and I cannot see any errors coming back... but my query string has a "404;..." in it. I know I am paranoid here, but is that not visible to SEs?
Please help... urgently.
ezGuy