HTML Redirects To Domain Names

How is looping avoided?

         

celgins

1:45 am on May 12, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This question could probably be moved to another thread, but I'll ask it here.

I've been reading through some older threads about .html redirects:

[webmasterworld.com ]

[webmasterworld.com ]

But I'm looking at redirects on Windows 2003 Servers running IIS. Since no .htaccess file is involved, I employ ASP scripting for redirects. Currently, I'm redirecting "example.com" to "www.example.com", and I understand the need to do so.

But I fall into logical confusion when trying to understand redirecting "index.html" to "www.example.com".

I understand that search engines treat "index.html" and "www.example.com" as different URLs and split PageRank between them. But if "index.html" is the default file in your root directory and is what gets served when navigating to "www.example.com", won't redirecting "index.html" to "www.example.com" cause some sort of infinite loop and blow a hole in the universe or something?

jdMorgan

1:57 am on May 12, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, this can indeed cause a loop. The trick is to detect that it is the client requesting the URL /index.html, and not the server accessing the file index.html as a result of that request. If it is a direct client request for the /index.html URL, then you redirect. If it is simply a server-internal request for the index.html file, you don't.

I don't know about IIS, but on Apache, you just need to test the server variable "THE_REQUEST". This contains the full request line sent by the client (the first line of the request, not the headers that follow it). In the redirect case it would contain, for example, "GET /index.html HTTP/1.1", whereas in the no-redirect case, it would contain "GET / HTTP/1.1".

IIS probably has an analogous function and variable.
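To make the request-line test above concrete, here is a minimal sketch in Python. It is not Apache or IIS configuration, just the decision logic; the function name redirect_target and the constant INDEX_NAME are made up for illustration, and it assumes the default document is called "index.html".

```python
from typing import Optional

INDEX_NAME = "index.html"  # assumption: the site's default document name


def redirect_target(request_line: str) -> Optional[str]:
    """Return the URL to redirect to, or None if no redirect is needed.

    Only the client's request line is inspected, never the file the
    server ends up reading, which is what keeps a loop from forming.
    """
    parts = request_line.split()
    if len(parts) != 3:
        return None  # malformed request line: leave it to the server
    _method, url, _version = parts

    # Keep any query string so it survives the redirect.
    path, sep, query = url.partition("?")

    if path.endswith("/" + INDEX_NAME):
        # The client asked for ".../index.html" by name: send it to ".../".
        return path[: -len(INDEX_NAME)] + sep + query

    # The client asked for "/" (or anything else): serve normally.
    return None
```

After the redirect, the client comes back with "GET / HTTP/1.1", the function returns None, and the server quietly reads index.html from disk. The client never asked for /index.html the second time, so nothing loops.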

This problem illustrates an important concept: That URLs and filepaths are not the same thing, and in fact need not be related. They are two systems for locating information in two different spaces -- The URL-space on the Web, and the file-space in the server. It is the server's main job to translate URLs to filepaths, and return the appropriate content.

Jim

celgins

2:38 am on May 12, 2007 (gmt 0)

Thanks Jim. Your explanation makes sense, but I just have to find a way to wrap my brain around the concept.

In my mind, I have a hard time understanding how the search engines can view "index.html" and "www.example.com" differently and assign each its own PageRank. Granted, I see the same PageRank for both "index.html" and "www.example.com", so it appears that "index.html" and its PageRank represent both. (I think many webmasters assume this and don't realize the search engines are splitting PR between the two.)

But I have seen an "index.html" file rank higher (or lower) than the "www.example.com".

Since "index.html" is the default file in the root folder and "www.example.com" would not work if "index.html" were removed, it appears that they're one and the same.

Anyway, I trust your knowledge on this. You're right in your observation that IIS has several analogous server variables to play with. I'll work on piecing together another redirect and try to do it without losing brain cells.

jdMorgan

3:03 am on May 12, 2007 (gmt 0)

> Since "index.html" is the default file in the root folder and "www.example.com" would not work if "index.html" were removed, it appears that they're one and the same.

Ah, but no, www.example.com/ would work just fine if the file index.html were removed... As long as you told the server what file you wanted to serve when the www.example.com/ URL was requested. On Apache, this is done using the DirectoryIndex directive. This is a server configuration directive, and without a usable index file, it is actually the requests for www.example.com/ that would fail... Or rather, if directory listings are enabled, they would result in a listing of all files in the root directory, instead of a "Web page."

I think you'll find many sites that don't have an index.html file, and have never had one. Instead, they use default.cfm, or main.php, or service.asp. But it doesn't matter what the file is called, because the servers are configured to serve the correct file whenever a request for the URL "/" is received.

On any site I've ever touched, there is no index.html URL. There may be an index.html file, but that is a file, not a URL, and that file is served only in response to a request for the URL "/". Any direct request for a URL of "/index.html" is redirected to "/".

Again, we've bolted a lot of accessories onto servers in the past decade, but their essential function is to translate requests in the URL name-space into filesystem requests in the server file-space, and to return the appropriate file contents. This is why it makes no difference whether a server uses Windows file-naming conventions or *nix file-naming conventions: The requests are made using Web-standard URLs, and the server itself translates those into whatever file-naming system it uses internally. URLs work no matter what operating system the server uses, which is one of the main reasons why they have a different name and are not called "Web file locations" or something...
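As a rough illustration of that URL-to-file translation, here is a small Python sketch. The function name url_to_filepath and the document roots in the usage note are hypothetical, and a real server does far more (aliases, access checks, proper handling of ".." and so on). The point is only that the same URL maps onto whatever file naming the server happens to use.

```python
from pathlib import PurePath
from urllib.parse import unquote, urlsplit


def url_to_filepath(url: str, docroot: PurePath,
                    index_file: str = "index.html") -> PurePath:
    """Translate a URL into a path under docroot (simplified sketch)."""
    # Only the path part of the URL matters for locating the file.
    path = unquote(urlsplit(url).path) or "/"

    # A URL ending in "/" names a directory; the server's configuration
    # decides which file stands in for it.
    if path.endswith("/"):
        path += index_file

    # Crude safety measure for the sketch: drop empty, "." and ".."
    # segments so the result stays inside the document root.
    segments = [s for s in path.split("/") if s not in ("", ".", "..")]
    return docroot.joinpath(*segments)
```

Calling it with PurePosixPath("/var/www/html") gives a *nix-style path, and calling it with PureWindowsPath(r"C:\Inetpub\wwwroot") gives a Windows-style one, for exactly the same URL. Pass index_file="default.cfm" or "main.php" and the URL "/" still works, with no index.html anywhere.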

Hopefully, the IIS crew will show up here soon, so I can stop trying to answer your IIS questions with Apache directives as examples... :)

Jim

celgins

3:41 pm on May 12, 2007 (gmt 0)

> Ah, but no, www.example.com/ would work just fine if the file index.html were removed... As long as you told the server what file you wanted to serve when the www.example.com/ URL was requested.

I think what is a bit confusing about IIS is the fact that when you set up a Windows web server, the default directory parameters list certain files as the directory index. For example -- unless you specify another file, Windows looks for "default.htm". If it doesn't find it, it looks for "default.html"; then "index.htm"; then "index.html"; then "index.asp". If you do not place one of those files in your web root, you'll either get a directory listing or a "Page cannot be displayed" error.

Of course, if you want the default page to be "services.asp", you would either list that first, or have it as the only entry in IIS. A problem for many webmasters is the fact that their ISP has "default.htm, default.html, index.htm, index.html, index.asp" as the generic listing in IIS, so the webmaster must choose one of those files.
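The lookup order described above amounts to "first match wins". Here is a minimal Python sketch of that idea. It is not how IIS is actually implemented; the names DEFAULT_DOCUMENTS and pick_default_document are made up for illustration, and the list is simply the one quoted in this post.

```python
from typing import Iterable, Optional, Sequence

# Assumption: the generic default-document list quoted above, in order.
DEFAULT_DOCUMENTS = ["default.htm", "default.html",
                     "index.htm", "index.html", "index.asp"]


def pick_default_document(files_in_dir: Iterable[str],
                          defaults: Sequence[str] = DEFAULT_DOCUMENTS
                          ) -> Optional[str]:
    """Return the first configured default document present, else None."""
    # Windows file names are case-insensitive, so compare in lowercase.
    present = {name.lower() for name in files_in_dir}
    for candidate in defaults:
        if candidate.lower() in present:
            return candidate
    # Nothing matched: the server would fall back to a directory listing
    # or an error page, depending on its settings.
    return None
```

If both "default.htm" and "index.html" exist, "default.htm" wins because it comes first in the list. Putting "services.asp" at the front of the list (or making it the only entry) is what makes it the page served for "/".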

Some ISP services allow you to specify a directory index file. Many do not.

If the default page is set to "index.html", requesting "www.example.com/" would cause the web server to send the contents of the file "index.html" in the web root, the same content it would send for "www.example.com/index.html".

Though I've had a bit of experience with IIS, I've never run my own web server in a production environment, so maybe one of the Windows gurus knows how to implement a redirect without causing explosions.

On the Apache side, Jim's explanation is perfect.