404 Handling URL Problem

         

rogerd

3:33 pm on Jul 23, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



I hope this isn't too off-topic, but since there's not an ASP/IIS forum this seemed like a good place to start.

I'm in the final throes of debugging an ASP error processor. The error page parses the request and determines whether the original page request was for a page that will call a dynamic query from an online catalog, or a true error. Here's the problem... When the error page is called, the server requests the page in the form:
http: //www.example.com/404error.asp?http: //www.example.com/bogus.htm

I.e., the original page request is attached as a query string, with the URL being that of the error page. From what I can tell, this is standard behavior for IIS if the error page is dynamic. (When the server didn't think the page was dynamic, it didn't run the ASP code at all.)

My error page parses the query string and determines whether to load a dynamic page or just a static 404 page. (It uses the Server.CreateObject approach described by wardbekker here rather than Server.Transfer, which allows it to pass a query string when calling dynamic content.)

Here's the problem: under some conditions, the URL shown to the browser (or spider) is the http: //www.example.com/404error.asp?http: //www.example.com/bogus.htm URL. Oddly, it appears that if I call a nonexistent directory, only the requested URL appears (e.g., http: //www.example.com/bogusdir/ shows in the address bar when processed by the error page); if the request generating the error is for a .htm file, the error page + query string URL appears.

Is there a Response object I need to set that will send (or keep) the requested URL in the address bar? If not, is there another way to accomplish that? Thanks...

Dreamquick

8:46 pm on Jul 23, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've played with the exact same thing myself and it can be very frustrating...

Assuming you have set a custom error page through IIS, the basic behaviour is as you described;

user requests a page that doesn't exist
user gets redirected to the 404 handler w/ querystring
404 script does whatever it is supposed to

It's important to understand that *every time* your script is used it is referred to as;

www.example.com/404error.asp?404;http://www.example.com/bogus.htm

Your browser simply hides this redirect from you, a spider will *always* see this page referenced in the redirect, that's simply the way IIS has chosen to handle custom errors...
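To make the query string format concrete, here is a minimal sketch of splitting it into the status code and the originally requested URL. It is written in Python rather than ASP purely to illustrate the parsing logic; the function name `parse_iis_error_query` is made up for this example, and the "status;URL" layout is assumed from the example URL quoted above.

```python
def parse_iis_error_query(query_string):
    """Split a "404;http://..." style query string into (status, original_url).

    Assumes the status code and the original URL are separated by the
    first semicolon, as in the example from this thread.
    """
    status, sep, original_url = query_string.partition(";")
    if not sep or not status.isdigit():
        # Not in the expected form; hand back the raw string as the URL.
        return None, query_string
    return int(status), original_url


status, url = parse_iis_error_query("404;http://www.example.com/bogus.htm")
print(status, url)
```

The same split is what an ASP handler would need to do on its query string before deciding what to serve.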

Unfortunately, as you have spotted, some of the time your browser decides to show you the redirect and as far as I'm aware there is no reliable way to force the browser to always show the old URL.

I have seen the problem before but I seem to remember it being one of those things I couldn't get to happen every time and so it got put aside.

Look on the plus side of the situation - it's not just you that this happens to, it's everyone else that uses IIS too... (me included)

mavherick

9:08 pm on Jul 23, 2002 (gmt 0)

10+ Year Member



There might still be hope. Take a look at this thread by Xoc: An ASP 404 Handler [webmasterworld.com]

JuDDer

9:17 pm on Jul 23, 2002 (gmt 0)

10+ Year Member



I do this on many ASP sites under IIS.

Indeed the querystring is always passed, but if you Server.Transfer to the page you want to load then the querystring will be hidden.

If you are transferring to a page that needs the querystring to select a record from a database, parse the URL and set a session variable to the value of the ID.

The page that you are transferring to will need to be modified so it runs the query on the ID stored in the session variable instead of the one in the querystring.

Here's a good article for more info:
[asp101.com ]

The principle can be adapted to make it do whatever you like.
I have tens of thousands of static-looking URLs that don't actually exist listed in search engines using this method. They are listed along the lines of mydomain.com/directory/page12345.html when really, it's page.asp?id=12345
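As a rough illustration of the mapping described above, the sketch below turns a static-looking path into the dynamic page and ID. It is Python, not ASP, and only shows the pattern matching; the regular expression and the name `friendly_to_dynamic` are assumptions made for this example, based on the page12345.html form mentioned in the post.

```python
import re

# Hypothetical pattern: ".../page12345.html" maps to "page.asp?id=12345".
FRIENDLY_URL = re.compile(r"/page(\d+)\.html$")


def friendly_to_dynamic(path):
    """Return the dynamic target for a friendly path, or None if no match."""
    match = FRIENDLY_URL.search(path)
    if match is None:
        # Not a friendly URL; the caller should serve a real 404.
        return None
    return "page.asp?id=" + match.group(1)


print(friendly_to_dynamic("/directory/page12345.html"))
print(friendly_to_dynamic("/bogus.htm"))
```

In the ASP handler the extracted ID would go into the session variable before the transfer, as described above.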

rogerd

2:04 am on Jul 24, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



"Your browser simply hides this redirect from you, a spider will *always* see this page referenced in the redirect, that's simply the way IIS has chosen to handle custom errors..."

I thought this was true, but... actually, the reverse seems to be true, DQ! This is weird... I can spider the fake/friendly page all day, and the spider sees the URL it requested - NOT the error + query string URL. But IE requests the same page, and sees the latter URL.

Is this some sort of session thing that requires browser interaction? The more I know, the more confused I get... :)

Dreamquick

8:16 am on Jul 24, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



rogerd,

I took two examples from my own script;

1) if I request a page which doesn't exist (and hasn't been assigned a redirect) then I get shown the 404 page immediately

2) if I request a page which doesn't exist but *has* been assigned a redirect then IIS appears to issue a redirect to the custom 404;

HTTP/1.1 302 Object moved
Server: Microsoft-IIS/5.0
Date: Wed, 24 Jul 2002 08:30:03 GMT
Connection: close
Location: /helpers/status_404.asp?404;http://example.com/about.asp

...and then the script executes and handles the request.

Clearly in some cases a 404 script can produce a "clean" result, but other times it will use a redirect - this warrants further investigation...
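For anyone comparing the two cases, a small sketch like the following can pull the status code and Location header out of a raw response so a redirect is easy to tell apart from a plain 404. It is Python, written only for illustration; `summarize_response` is a made-up helper name, and the sample text is the header block quoted above.

```python
def summarize_response(raw_headers):
    """Return (status_code, reason, location) from a raw header block."""
    lines = [ln.strip() for ln in raw_headers.strip().splitlines() if ln.strip()]
    # The status line looks like "HTTP/1.1 302 Object moved".
    parts = lines[0].split(" ", 2)
    code = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, so URLs in the value stay intact.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return code, reason, headers.get("location")


SAMPLE = """\
HTTP/1.1 302 Object moved
Server: Microsoft-IIS/5.0
Date: Wed, 24 Jul 2002 08:30:03 GMT
Connection: close
Location: /helpers/status_404.asp?404;http://example.com/about.asp
"""

code, reason, location = summarize_response(SAMPLE)
print(code, reason, location)
```

A 302 with a Location pointing at the error script is the redirect case; a response with no Location is the "clean" case.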

< edited due to stupidity on my part >

- Tony

aspdaddy

12:48 pm on Jul 24, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi,

I am planning to do something similar on a site, but I am going to let the custom error page select and display from the database. This way I won't use any querystring or redirect/transfer.

When the error page gets hit, my stats show the actual page name entered in the address bar.

Hopefully it's going to work something like this -

select content from the db where [page] = Request.ServerVariables("SCRIPT_NAME"),

If an empty recordset returns, display the standard 404 content, otherwise display the page template with the content from the recordset.

I will let you know if it works or not...
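The lookup-then-fallback idea can be sketched roughly as follows. This is Python with an in-memory SQLite database, not ASP, and is only meant to show the control flow; the table name `pages`, its columns, and the function `render_page` are all hypothetical. How the requested path is obtained is left to the caller, since earlier posts in this thread suggest IIS may hand it over in the query string.

```python
import sqlite3


def render_page(conn, requested_path):
    """Return (status, body) for a path, falling back to the 404 content."""
    row = conn.execute(
        "SELECT content FROM pages WHERE page = ?", (requested_path,)
    ).fetchone()
    if row is None:
        # Empty result: this is a genuine 404.
        return "404 Not Found", "Standard 404 content"
    return "200 OK", row[0]


# Hypothetical schema and data, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (page TEXT PRIMARY KEY, content TEXT)")
conn.execute("INSERT INTO pages VALUES (?, ?)", ("/about.htm", "About us"))

print(render_page(conn, "/about.htm"))
print(render_page(conn, "/bogus.htm"))
```

Note the parameterized query: the requested path comes straight from the visitor, so it should never be concatenated into the SQL string.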

rogerd

5:19 pm on Jul 24, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



It will be interesting to see what happens, aspdaddy. My guess is that you will see the query string URL, which seems to be the way IIS handles transfers to a dynamic error page. The fact that under many conditions the "friendly", nonexistent page URL is supplied, though, gives one reason to hope.

jdMorgan

7:49 pm on Jul 24, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What I've seen is that a 404 response is not a redirect. However, it can contain a link to a custom 404 page, and most browsers will pick up that link and follow it automatically.

I'm not an ASP user, but spent some time working on an Apache-hosted site because I had used a custom error page redirect containing a canonical path (http://www....) and instead of returning a 404 response, Apache was returning a 302-Moved Temporarily response. So, SE spiders never de-listed that page - It was drivin' me nuts!

Looking at the actual server response headers is where I saw how custom 404 page redirection works.

Brett has a nice "Server Headers" viewer here on wmw which may prove useful to you.
It's under "control panel" and over in the left-side nav bar.

Jim

Xoc

9:38 pm on Jul 24, 2002 (gmt 0)

rogerd

4:24 pm on Jul 25, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Hmmm, it seems that the browser can get sent a 200 while the server header can still be a 400. This is perplexing - I'm working with some code loosely based on wardbekker's suggestion at [webmasterworld.com...] . It seems that several participants in that thread found that the server headers were coming out as 200s, which is what one wants to happen when serving dynamic content. I'm finding that the error page can deliver the correct content, show the browser a 200, but still have a server header with a 400.

Is there some additional code needed to fix the server header, or is this an IIS setting that needs to be changed? Or could it be something as simple as where the Response.Status "200 OK" is located in the code?

JuniorHarris

1:50 pm on Aug 2, 2002 (gmt 0)

10+ Year Member



It's important that the status be set prior to any output being returned to the browser. So setting or clearing the status has to be determined early in the script. It could be possible to enable buffering for large/complex scripts, and setting, clearing, flushing it as necessary.
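The ordering point is not specific to ASP. As an analogy only, here is a small Python WSGI sketch in which the handler decides the status before any body is produced; in ASP the corresponding pieces would be Response.Status and response buffering, as mentioned above. The "/catalog/" rule and the names `error_handler_app` and `call_app` are invented for this example.

```python
def error_handler_app(environ, start_response):
    """Decide the status first, then emit the body."""
    path = environ.get("PATH_INFO", "")
    if path.startswith("/catalog/"):  # hypothetical rule for dynamic pages
        status, body = "200 OK", "catalog page for " + path
    else:
        status, body = "404 Not Found", "Sorry, page not found"
    payload = body.encode("utf-8")
    # The status line is committed here, before any output is returned.
    start_response(status, [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]


def call_app(app, path):
    """Tiny test harness: call the app and capture the status and body."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app({"PATH_INFO": path}, start_response))
    return captured["status"], body.decode("utf-8")


print(call_app(error_handler_app, "/catalog/widget.htm"))
print(call_app(error_handler_app, "/bogus.htm"))
```

If the decision came after some output had already been sent, the status line would already be on the wire, which matches the symptom described earlier in the thread.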

For those IIS users looking for an alternative to Server.Transfer, you may want to review Server.Execute. It functions very much like Server.Transfer, with all parameters accessible etc., except it will return control to the calling script! ;)

New Directions in Redirection [msdn.microsoft.com]

rogerd

2:02 pm on Aug 2, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Hmmm, that could be my problem, JH. I'll check that out, thanks!