Auto redirect to 'index' file destroys HTTP HOST

Catfluf

9:09 pm on Dec 13, 2011 (gmt 0)

10+ Year Member



Not sure if this can be solved via Apache or PHP, but here goes:

I have multiple domain names pointing to the same site.

This works fine in the root folder: mysite.com and myothersite.com begin at mysite.com/index.php and I can change site settings depending on HTTP_HOST (which is either set to mysite.com or myothersite.com).

It doesn't work when I go one folder deeper: mysite.com/start and myothersite.com/start are always seen as mysite.com/start/index.php - destroying the HTTP_HOST variable.

What can I do? I can't save the URL in a session variable because I can't get to it before it's changed back to mysite.com
Any help appreciated!

lucy24

11:26 pm on Dec 13, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What can I do?

Well, what are you doing right now? One thing I'm sure of is that the fairies aren't looking at your user requests and sending them to the right domain. You're doing something to make it happen.

What do you mean by "begin at"? Likewise what do you mean by "change site settings"? Are we talking about config files, php scripts, DNS problems, something else entirely?

phranque

1:23 am on Dec 14, 2011 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



welcome to WebmasterWorld, Catfluf!

what response are you getting from your initial request?
i'm guessing it is an external redirect - where is this being generated?

Catfluf

4:46 am on Dec 14, 2011 (gmt 0)

10+ Year Member



Thanks for your help guys :)
Normally I enter the site at mydomain.com and index.php in the root folder starts of course. My script in index.php then looks at HTTP_HOST and determines which domain the user entered with and changes site settings to accommodate.

What I mean by start, is use a nice looking simple URL like mydomain.com/go to access another part of my site.
I can use something like mydomain.com?gohere=1 but I'd like something simple.

It works fine for a single domain (mydomain.com/go starts the script at mydomain.com/go/index.php)

With alias domains however, HTTP_HOST always ends up containing mydomain.com and never myotherdomain.com before my index script sees it.

lucy24

8:56 am on Dec 14, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Please tell me you are not rewriting a whole bunch of domains to a single source of content while letting the address bar show whatever domain name the user originally requested. You need to redirect all the extra domains to the correct one, and this needs to happen in your config file or htaccess before the user ever sets foot on any page. It doesn't matter what page they originally requested; it's always the same two lines of code. In fact it's the identical code that you may already be using for the with-or-without-www redirect. It just has to do a little more work.

phranque

9:20 am on Dec 14, 2011 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



do you have a UseCanonicalName directive specified in your config?
http://httpd.apache.org/docs/current/mod/core.html#usecanonicalname
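For reference, a minimal sketch of where that directive would sit. It assumes the alias domain is set up with ServerAlias inside a virtual host, which nothing in this thread confirms. UseCanonicalName is only accepted in the server config, a virtual host or a Directory section, not in .htaccess, so on shared hosting it is normally the host's to change.

```apache
# Sketch only: ServerAlias is an assumption about how the host set up the alias.
<VirtualHost *:80>
    ServerName mysite.com
    ServerAlias myothersite.com

    # On:  self-referential URLs (for example in server-generated redirects)
    #      are built from ServerName.
    # Off: they are built from the hostname and port supplied by the client.
    UseCanonicalName Off
</VirtualHost>
```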

phranque

1:25 pm on Dec 14, 2011 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



try looking at the SERVER_NAME variable instead of HTTP_HOST.

Catfluf

5:10 pm on Dec 14, 2011 (gmt 0)

10+ Year Member



OK I must admit I'm new to this, and lost already!

Please tell me you are not rewriting a whole bunch of domains to a single source of content while letting the address bar show whatever domain name the user originally requested.
Can you explain to me why this is bad? It's worked fine for me apart from this issue.

Can I put something simple and specific in my htaccess? Rather than going into theory, my brain works better with specific examples that I can analyse :)

SERVER_NAME contains the same as HTTP_HOST

I did have a brief look at UseCanonicalName but can I change this on a commercial shared server?

lucy24

9:33 pm on Dec 14, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Can you explain to me why this is bad? It's worked fine for me apart from this issue.

Look up "duplicate content". What you need to do is pick one domain name and use it. Unless you are confident that you are, and will always remain, the sole occupant of your niche. In that case I guess it doesn't matter because everyone has to go to you. But it still seems like it would be cleaner and tidier if you started out by checking for

HTTP_HOST !^(the-exact-form-you-want)?$

and redirecting the rest. Otherwise every line of every php script has to detour into checking the current hostname and plugging that into whatever value it returns.
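As a rough sketch, a redirect of that kind in a root .htaccess might look like the following. The hostname www.example.com is a placeholder for whichever form is preferred, and mod_rewrite is assumed to be available. Note that this sends every other hostname to the one chosen name, so it would not suit a site that deliberately varies its content by hostname.

```apache
RewriteEngine On

# Match any Host header that is not exactly the preferred name.
# The trailing "?" also lets a blank Host (old HTTP/1.0 clients) through,
# which avoids a redirect loop for those requests.
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$

# Send everything else to the same path on the preferred host.
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```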

Catfluf

12:59 am on Dec 15, 2011 (gmt 0)

10+ Year Member



The website isn't totally the same content with multiple domains leading to it.

It caters for different groups within an organisation. I check the domain on entry and set variables accordingly: The logos and various settings change depending on what that is, and from then on I don't have to check the domain because the group name is stored in a session variable. Each variation has its own photo gallery and events board etc and my links are relative (or set using variables) so it automatically remains in the address bar as whichever one I enter the site from.

The alternative is to program multiple clone sites which would contain mostly the same content, and every time I update a script I'd have to update the same code across all sites.

I can't check HTTP_HOST because the only point I'm having trouble with is HTTP_HOST not containing the correct domain name under a particular circumstance. That is: Apache loses the value when the browser enters the site at a folder rather than a file.

lucy24

1:47 am on Dec 15, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The website isn't totally the same content with multiple domains leading to it.

Oh, lord, that sounds risky. Can't you do it the other way around? Treat them as separate sites, but call on some of the same include files for their content.

Apache loses the value when the browser enters the site at a folder rather than a file.

Apache never loses anything unless you've expressly told it to in your config file, htaccess or somewhere between. Then it's no use changing your mind a nanosecond later.

A user doesn't enter the site at a folder, a file or anything else. (There's really no such thing as a folder, anyway. What you see is the folder's index file.) A user enters the site at the server, and then gets passed down through a series of directories.

Like this: If you are going to an office in a building and you know it's located on the third-floor back, you are not allowed to shinny up the fire escape and climb in the window, no matter how much faster or easier it is. You have to come in the front door, show your credentials to the security guard and then wait while assorted people do assorted things before you get permission to get escorted inside.

If you are telling apache to redirect, then you need to store the original information in a query string. But you can't hide it from the user unless you put it somewhere else, like a fake directory name, and then you're right back where you started.

What's happening right now in your config and/or htaccess? Not the entire code ::shudder:: just the parts that pertain to host or domain name. Is it your own server or (second-best) do they all live in the same directory within a shared-hosting environment? You can't do anything at the apache level unless they all pass through the same htaccess or config immediately before branching out into the different domain names.

Catfluf

3:17 am on Dec 15, 2011 (gmt 0)

10+ Year Member



Thanks for your help on this! I have a reasonable knowledge of PHP but not the mechanics of servers (probably obvious by now. lol)

If I had separate sites and includes there would still be duplication that is unnecessary as basically all sites are the same except for details here and there.

Commercial shared server so no access to config. The only things I have in my htaccess start off my custom PHP.ini

So how does the redirecting work when you type mysite.com, and not have to type mysite.com/index.php to enter any site? The point I'm trying to stress is, HTTP_HOST is fine when this happens. It only seems to 'reset' when the URL points to a directory another level down.

Is there something I can place in htaccess that simply redirects anydomain.com/go to anydomain.com/index.php?page=value - and leave HTTP_HOST intact with the name of the domain that the user entered on?

This is seemingly such a small problem but there doesn't appear to be a straightforward answer. I may just give up on it and use a more complicated URL for what I wanted to do, as the site works fine apart from this anyway.

lucy24

6:11 am on Dec 15, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Commercial shared server so no access to config. The only things I have in my htaccess start off my custom PHP.ini

Not a problem, so long as you're allowed to maintain your own htaccess and make it do the various things an htaccess can do. Somewhere in your host's fine print you will find the details.

So how does the redirecting work when you type mysite.com, and not have to type mysite.com/index.php to enter any site?

That part is simple. Well, conceptually simple. The config file includes a list of possible index-file names:
index.html
index.htm
index.php
The exact list depends on the config file.

When a request comes in for a directory, several things happen. What follows is the default behavior. Some things can be overridden by explicit rules in htaccess files. When I say "Apache", I mean either the Apache core or a module whose job is to deal with some specific aspect of the request.

First, if the request is for a "naked" name (no extension or slash), Apache checks whether there is a directory by that name. If yes, Apache sticks on a slash. This is officially called the Directory Slash Redirect. If no, you are handed an immediate 404. Generally we think of 404s as being for nonexistent pages, but they are also used for nonexistent directories.

Next, Apache looks inside the directory to see if there exist files with any of the names in its "possible index file" list. If yes, you're taken to that file. If no, Apache checks whether auto-indexing is enabled for this directory. If yes, Apache runs up an index on the fly. If no, you get a 403 error. Yes, the very same 403 that evil robots get.
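The default behavior described above corresponds roughly to the directives below. This is a sketch rather than anyone's actual config: the index-file list is the one given earlier in this post, and whether a host lets you set these in .htaccess depends on its AllowOverride settings.

```apache
# Index-file names tried, in order, when a directory is requested (mod_dir).
DirectoryIndex index.html index.htm index.php

# When a directory is requested without its trailing slash, redirect to the
# same URL with the slash added. On is the default.
DirectorySlash On

# Turn off auto-generated listings (mod_autoindex). A directory with no
# index file then returns 403 instead of a listing.
Options -Indexes
```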

The point I'm trying to stress is, HTTP_HOST is fine when this happens. It only seems to 'reset' when the URL points to a directory another level down.

Who is reading the HTTP_HOST value? Your php file, or the .htaccess? Going back to the original post:

mysite.com/start and myothersite.com/start are always seen as mysite.com/start/index.php

This cannot happen by itself. Something is redirecting myothersite.com/start to mysite.com/start and from there, as spelled out above, it ends up in mysite.com/start/index.php. Redirects can be issued by php, or they can originate in htaccess. In fact that seems to be exactly what you said:
mysite.com and myothersite.com begin at mysite.com/index.php and I can change site settings depending on HTTP_HOST (which is either set to mysite.com or myothersite.com)

So the HTTP_HOST is getting read before the redirect to mysite.com

Did you write all your own php, or is some of it boilerplate that you inherited or copy-and-pasted? That is, do you know what every single line of your php does? (Say Yes, because I don't speak php ;))
____

OK, all of that was preliminary. I'm still trying to figure out what a human user sees when they go to your site. Is it the user's choice which domain they ask for, or does your php read some information and send them to the right place? Can they switch from one domain to another? What do they see in the address bar?

That's assuming I have understood correctly that you have more than one domain name. If so, there has to be some kind of DNS setup. Otherwise, people would never reach you in the first place. If there were no php scripts and no htaccess, where would the different domain requests end up?

phranque

6:34 am on Dec 15, 2011 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



do you have a .htaccess file on those subdirectories?

Catfluf

9:41 pm on Dec 15, 2011 (gmt 0)

10+ Year Member



No I haven't got as far as htaccess files on other directories apart from root yet. I'm new to htaccess.
I do write all my scripts from scratch to keep it simple and so I know what's going on :)

Let me try to make my examples clearer...


Say I have a site called jacksplace.com

It contains pages about Jack and pages for his photos etc.

Then Sally says she likes it and wants one for herself.

I register the domain sallysplace.com and through my host's control panel, make sallysplace.com an alias of jacksplace.com
I do this because I don't want to duplicate the whole site into a whole new directory.

My index.php at jacksplace.com looks at HTTP_HOST and the logic (I'll spare you the actual code!) goes like this:

If HTTP_HOST = jacksplace.com then User = jack
If HTTP_HOST = sallysplace.com then User = sally
This is saved in a session variable so it stays set until the visitor leaves the site.
I also set some variables for my links. For example:
PhotoDirectory = User/photos
My root contains a directory called 'jack' and a directory called 'sally' which contain their data.
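The same mapping could also be sketched at the Apache level instead of in PHP. This is a different technique from the one I actually use: it assumes mod_setenvif is available in .htaccess, and SITE_USER is a made-up variable name that PHP would typically see as $_SERVER['SITE_USER'].

```apache
# Set SITE_USER from the Host header sent by the browser.
# Optional "www." prefix and optional ":port" suffix are both allowed.
SetEnvIfNoCase Host ^(www\.)?jacksplace\.com(:[0-9]+)?$ SITE_USER=jack
SetEnvIfNoCase Host ^(www\.)?sallysplace\.com(:[0-9]+)?$ SITE_USER=sally
```

Like the PHP version, this only sees the hostname of the request that actually reaches it.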

Now I'm all set!
The entire site doesn't have to think about which user data to display because the links are preset. For example:
Photo file I want to display = PhotoDirectory/photo.jpg - which for jack equals jack/photos/photo.jpg and for sally equals sally/photos/photo.jpg
The domain in the address bar remains correct at all times without me doing a thing. At no time does Sally's site display the word 'jack'.
Even if Sally right-clicks on a photo and chooses Properties, the link displayed is sallysplace.com/sally/photos/photo.jpg

---

So now my problem comes when I want to give them a new entry URL to their editing login page.

I can use jacksplace.com/index.php?page=login or sallysplace.com/index.php?page=login (or something similar) but I'd like to simplify it to jacksplace.com/edit or sallysplace.com/edit

I put a directory in root called 'edit' with an index.php that examines HTTP_HOST the same way as I normally do.
The only difference is that it's one directory level down from normal.

The change now is that HTTP_HOST always contains jacksplace.com even when I enter via sallysplace.com/edit

lucy24

11:54 pm on Dec 15, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Got it. But I do hope you were simplifying for our benefit, since I don't suppose jack would like to know that "his" site was really the alias for a bunch of other people's sites :)

Is the aliasing part done via mod_alias or something else? This is potentially crucial, because mod_alias typically executes after mod_rewrite, even if it's located in a higher-level directory or config file. So there's the risk that anything you do in your own htaccess via mod_rewrite will get overridden and/or overwritten when it loops back outside and meets mod_alias.

This is where the analogy about entering a building via the front door comes in. No matter what URL the user requests, physically they will always pass through your top-level directory with its top-level htaccess before they get sent out to the appropriate subdirectory. So the "user" value has to be set at the beginning of a session, not in a specific page.

Does your htaccess look at the {HTTP_COOKIE} value at any point? I'm starting to get the nasty feeling that what you want to do can't be done using your current setup. I think the aliasing may have to be moved somewhere else, so it's under your control rather than the host's. Most importantly, you need to have complete control over what order things happen in, because you need to do a final rewrite after your final redirect.

Final obvious question that I almost overlooked. When you look at {THE_REQUEST}, does that give you the domain name that the user sees, or the "real" domain name as in {HTTP_HOST} ?

Catfluf

12:54 am on Dec 16, 2011 (gmt 0)

10+ Year Member



Hmmm... I have no idea how the aliasing is done. It's just a control panel option I used. All I know is they arrive at the same site! So, sorry, I can't answer some of your questions as they're currently over my head.

I can't find a server variable called THE_REQUEST ?

I've been researching mod rewrite but haven't been able to work out the syntax properly and got stuck.

So basically I don't have anything much in my htaccess at the moment, it's all done for me somewhere!

Catfluf

1:13 am on Dec 16, 2011 (gmt 0)

10+ Year Member



How would I word a mod rewrite to make jacksplace.com/edit go to jacksplace.com/index.php?page=edit using relative references?
I would place this in the root directory I take it?

lucy24

4:47 am on Dec 16, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This is where I step back and let the grownups take over.

:: looking vaguely around and wondering where everyone has gone ::

%{THE_REQUEST} is a mod_rewrite condition. It's the one that tells you what URL the user originally asked for, before their request got rewritten or auto-indexed. (But not redirected: that counts as a whole new request.)

I was trying to figure out if there is a quick-and-easy way to find out what THE_REQUEST is. But fortunately for you, I tested it on myself first-- and was treated to a nice array of 500 errors and/or the browser stepping in to prevent infinite redirect loops. Oops. Never mind, then.

The change from
jacksplace.com/edit
to
jacksplace.com/index.php?page=edit

is trivial. So trivial, in fact, that we're not going to tell you how to do it. (That's the Apache forums "we", not the editorial "we".)

[httpd.apache.org...]

Read the first part of the mod_rewrite docs, stopping when you get tired. Then read the first page or so of posts in this forum. Sometimes one of the Forums regulars forgets to put in the boilerplate about "We answer this identical question about 87,000 times per month" and instead absent-mindedly answers the question all over again ;)

If it weren't for the aliasing complication, your question really would be identical to those other 87,000 questions.
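For anyone comparing notes against those docs, the general shape of such a rule is sketched below. It assumes the rule lives in the root .htaccess, that mod_rewrite is enabled, and that no real directory named edit exists (a real directory could still trigger the server's own trailing-slash redirect).

```apache
RewriteEngine On

# Internally map /edit and /edit/ to index.php?page=edit.
# There is no R flag, so no redirect is sent to the browser: the address bar
# and the Host header stay exactly as the visitor typed them.
# QSA keeps any query string that was on the original request.
RewriteRule ^edit/?$ index.php?page=edit [L,QSA]
```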

phranque

10:24 am on Dec 16, 2011 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



i'm thinking lucy24 might be onto something with the cookie, but we're going in too many different directions already with this thread.
you need to get a tool working such as the Live HTTP Headers and/or Web Developer add-ons for Firefox so that you can viddy exactly what Request is being sent to the server and exactly what Response is returned to your browser.

right now you sound like the proverbial 5 blind guys describing an elephant.

Catfluf

6:28 pm on Dec 16, 2011 (gmt 0)

10+ Year Member



I installed Live HTTP Headers, and can see a difference between arriving at the root folder without specifying a file name, and arriving at a 'directory' a second level down without specifying a file name.

The latter contains this entry right before the host changes to the value I don't want: HTTP/1.1 301 Moved Permanently

Apache must be doing this before my index.php starts.

lucy24

8:21 pm on Dec 16, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Did you type the directory name with or without a slash? It's unnerving to watch, but the "directory slash redirect" is exactly that: a true 301 redirect, exactly as if there had been something in your htaccess or config file. Matter of fact, there would be two redirects in this situation. I just verified this by going to my own site and asking for

example.com/directory

with Live Headers running. First I get redirected to my preferred name form

www.example.com/directory

(stop weeping, g1, I'm taking the lazy route by letting the host do it) and then I'm redirected again to

www.example.com/directory/

Unfortunately Live Headers doesn't say exactly where the redirect originated. Just that there is one. It doesn't happen with the root folder, because there the trailing slash is appended by your browser itself.

You can save the Live Headers output as a text file and pore over its nuances.

phranque

5:43 am on Dec 17, 2011 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



The latter contains this entry right before the host changes to the value I don't want: HTTP/1.1 301 Moved Permanently

Apache must be doing this before my index.php starts.

you could turn on rewrite logging to see if the server is providing the redirect before the rewrite to your directory index script.
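a minimal sketch of what turning that on looks like. the log path is a placeholder, and these directives belong in the server config or a virtual host, not in .htaccess, so on a shared server it may be something only the host can do.

```apache
# Apache 2.2: rewrite logging has its own directives.
RewriteLog "/var/log/apache2/rewrite.log"
RewriteLogLevel 3

# Apache 2.4 and later dropped those two directives in favour of
# per-module LogLevel, with the output going to the error log:
# LogLevel alert rewrite:trace3
```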

you should also inspect the cookies sent with the request.

Catfluf

6:10 am on Dec 17, 2011 (gmt 0)

10+ Year Member



You've hit the nail on the head Lucy! The difference between the two situations, and the reason a redirect is there, is the lack of a trailing slash. Adding a trailing slash to the URL fixes my whole problem entirely!

I didn't realise the browser was adding it to jacksplace.com but NOT jacksplace.com/edit, thus causing Apache to add the slash and redirect to the new URL with it added.

Currently working out how to fix this...

Catfluf

7:17 am on Dec 17, 2011 (gmt 0)

10+ Year Member



Apologies Lucy and anyone else I've driven mad. I see you've covered the missing slash redirect previously in my topic but I didn't know enough at that stage to understand what you meant!


How's this for a clever fix? I can't claim full credit as I found the htaccess part browsing the net.

Instead of having a directory in the root named 'edit' containing a file called 'index.php':

I deleted the edit directory,

Put edit's index.php in the root and renamed it simply 'edit' (no extension),

And added this to my htaccess:
# Treat the extensionless file named "edit" as a PHP script
<Files edit>
ForceType application/x-httpd-php
</Files>

Now example.com/edit parses as php, with or without a trailing slash!

Any downsides you can see?

g1smd

7:47 am on Dec 17, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Glad you got it fixed!

example.com/page
is the URL for a page and

example.com/folder/
is the URL for a folder OR for the index page in a folder.

These conventions exist for a reason. The server will make assumptions about the request and can take action all by itself to "correct" incorrect requests. One of those is to add the slash for a folder requested without the slash.