Trying to understand my fall from grace in the Google results, I've found some things in my server logs that I don't like and suspect may not be sitting well with search engines (SEs). I'd like to get some ideas on how to clean them up. I recently added a 404 error handler so that I could use clean, friendly URLs.
When a browser requests a page using one of the friendly rewritten URLs, the error handler parses the URL, queries the database, finds the data, sets the status to 200 and writes the page on the fly. If the data is not found, the handler sets the status to 404 and writes out a plain 404 Not Found page.
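The handler logic described above can be sketched roughly as follows. This is Python rather than ASP, purely for illustration, and it assumes that IIS passes the original request to the handler as "404;<original URL>" in the query string (the form visible in the second log entry below). The DOCUMENTS dictionary and the handle_404 function are hypothetical stand-ins for the real database lookup and handler page.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for the database: (typeOfdoc, docID) -> page body.
DOCUMENTS = {
    ("articles", "123"): "<html><body>Article 123</body></html>",
}

NOT_FOUND = (404, "404 Not Found")


def handle_404(query_string):
    """Map a friendly URL such as /articles/123.htm to a stored document.

    Assumes the query string looks like "404;http://host/typeOfdoc/docID.htm".
    Returns a (status, body) tuple.
    """
    code, sep, original_url = query_string.partition(";")
    if code != "404" or not sep:
        return NOT_FOUND
    path = urlsplit(original_url).path          # e.g. "/articles/123.htm"
    parts = [p for p in path.split("/") if p]   # e.g. ["articles", "123.htm"]
    if len(parts) != 2 or not parts[1].endswith(".htm"):
        return NOT_FOUND
    type_of_doc, doc_id = parts[0], parts[1][: -len(".htm")]
    body = DOCUMENTS.get((type_of_doc, doc_id))
    if body is None:
        return NOT_FOUND                        # no data: real 404
    return 200, body                            # data found: serve with 200
```

A request for a known document gets a 200 and the page body; anything else, including a stray ASP link like /articles/SomeFile.asp, falls through to a plain 404.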
I thought everything was perfect. I used <BASE href="http://www.example.com"> so that images and links would work regardless of the directory structure built into the friendly URLs, for example www.example.com/typeOfdoc/docID.htm. But it turns out that most search engines don't honor <BASE href>, and since I don't use absolute URLs, my internal links now get crawled relative to www.example.com/typeOfdoc/. That still shouldn't be a problem in itself, except that all my links point to ASP pages.
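To illustrate the difference, here is a small Python sketch using urllib.parse.urljoin, which resolves relative references in the standard way. The relative link "SomeFile.asp" is an assumed example; the point is only where the same link ends up depending on whether a client honors <BASE href> or resolves against the page's own URL.

```python
from urllib.parse import urljoin

page_url = "http://www.example.com/articles/docID.htm"  # friendly URL of the page
base_href = "http://www.example.com"                     # value of <BASE href>
link = "SomeFile.asp"                                     # assumed relative link in the page

# A client that honors <BASE href> resolves the link against the base URL.
with_base = urljoin(base_href, link)
# A client that ignores <BASE href> resolves it against the page's own URL.
without_base = urljoin(page_url, link)

print(with_base)     # http://www.example.com/SomeFile.asp
print(without_base)  # http://www.example.com/articles/SomeFile.asp
```

The second result is a URL that doesn't exist on the server, which is what sends those requests into the 404 handling.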
It seems that when IIS 5.0 receives a request for an ASP page that doesn't exist, it issues a 302 redirect to the error handler, which then returns the 404.
Here's an example from my log file:
66.249.xx.xxx - 192.168.19.229 80 GET /articles/SomeFile.asp |-|0|404_Object_Not_Found 302 0 614 190 www.example.com
66.249.xx.xxx - 192.168.19.229 80 GET /articles/err-handler.asp 404;http://www.example.com/articles/SomeFile.asp 404 0 0 242 www.example.com
Notice the 404_Object_Not_Found and the 302 status in the first log entry above. This doesn't happen with an .htm extension; there I get only the second log entry.
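To see how many URLs are affected, a rough Python sketch like the one below could scan the log for this 302-plus-404_Object_Not_Found pattern. The field positions are an assumption inferred from the entries above (client IP, username, server IP, port, method, URI stem, URI query, status, ...); real IIS W3C logs usually start with date and time, and the actual order is given by the log's #Fields: line, so the indices would need adjusting. The function name find_redirected_404s is hypothetical.

```python
# Assumed positions, inferred from the posted entries (not confirmed):
# c-ip cs-username s-ip s-port cs-method cs-uri-stem cs-uri-query sc-status ...
URI_STEM, URI_QUERY, STATUS = 5, 6, 7


def find_redirected_404s(lines):
    """Return URI stems that IIS answered with a 302 flagged 404_Object_Not_Found."""
    hits = []
    for line in lines:
        if line.startswith("#"):          # skip W3C directives such as #Fields:
            continue
        fields = line.split()
        if len(fields) <= STATUS:         # malformed or truncated line
            continue
        if fields[STATUS] == "302" and "404_Object_Not_Found" in fields[URI_QUERY]:
            hits.append(fields[URI_STEM])
    return hits
```

Fed the two entries above, it would report /articles/SomeFile.asp and ignore the follow-up request to the error handler, which already carries a 404.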
I can't find anything about this 404_Object_Not_Found result other than the publicly exposed log files of other servers.
[edited by: Xoc at 1:50 am (utc) on Jan. 13, 2005]
[edit reason] changed to use example.com [/edit]