Apache Web Server Forum

    
server config or .htaccess for 301 redirects?
hosting company prefers server config
Robert Charlton

WebmasterWorld Administrator robert_charlton us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 52 posted 8:44 am on Nov 24, 2002 (gmt 0)

I'm trying to get some 301 redirects put into the .htaccess file of a site I'm optimizing, and the guy I'm talking to at the hosting company wants to use server config instead.

He says that .htaccess should be used for controlling access only... not for redirects. Since it's his company and he's doing a favor for the site owner, and me, in handling this... and because I don't know much about Apache servers... I need to proceed very tactfully.

Also, I don't have the vocabulary or the knowledge to disagree. I suppose I could say that .htaccess is what works for Google... but I'd like to have more background info behind me. What's behind his server config preference, and will it work as well? For me, preserving links and PageRank are important factors.

Researching the Apache mod_alias docs a little, I see that the Apache version could be a factor (.htaccess redirect context "only available in versions 1.1 and later"). Since it's a fairly large hosting company, I'm assuming they have a recent version.

I'll post part 2 of this question, which is maybe much trickier, after getting feedback on server config vs. .htaccess. Let me just say that the site is a can of worms. ;)

 

bobriggs

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 52 posted 9:14 am on Nov 24, 2002 (gmt 0)

mod_alias directives will work in both situations, and I'm sure the host is running an Apache version later than 1.1. I think mine is 1.3.26 or .27, somewhere in there.

The disadvantage of using the directives in the server config files is that every time you want to make a change, the server will have to be restarted:
[httpd.apache.org...]

Everything you need to do can be done in .htaccess, and all you need is ftp.

Robert Charlton

Msg#: 52 posted 5:40 pm on Nov 24, 2002 (gmt 0)

bobriggs - Thanks. What is the best way to synchronize the two areas (I don't even know whether server config is a file), and which file takes precedence if there are conflicting instructions? If I move instructions to .htaccess, are there some instructions better left in server config?

Getting to the can of worms aspect of this site, the current home page, I'm told, is set up on the server as a directory... www.domain.com/index.htm/ --- with some sort of meta refresh from a default index.shtml page taking you to this page. Has to do with the way they thought they needed to preserve links way back when.

I'm still not sure what they did to get this arrangement. I'm now faced with the task, with my limited knowledge and vocabulary, of finding out what was done, setting things up to make index.html the default home page, and using 301s to redirect the old pages.

Among other things, index.htm/ is the landing "page" for a great many PPC links, and the site owner definitely needs to preserve these, and also to preserve the referrer strings. There are some other (mirror) landing pages as well that need redirection. For more on this latter problem of the referrer strings, see:

[webmasterworld.com...]

What I want to do is to use 301s in htaccess for these pages/directories, if 301s would work here. I'm guessing the syntax would be something like:

Redirect /index.htm/ [domain.com...]

I also need to untangle what was done to make index.htm/ work as a page, and to figure out whether I need to preserve the old index.htm/ directory with an empty page so the local redirect will work. For starters....

Site owner has a lot of PPC coming in, so we can't risk these referrers being down for any length of time. I'll need to try it all step by step... probably first on a little-used landing page/directory before we even tackle index.htm/

jdMorgan

WebmasterWorld Senior Member jdmorgan us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 52 posted 7:43 pm on Nov 24, 2002 (gmt 0)

RC,

Does your client pay this hosting company? If so, you are the customer, and the customer is always right.

I would insist on using .htaccess - and on being granted full privileges to do so. Tell the hosting company that you want to do it in .htaccess, get the bugs worked out, and then you'll consider moving it to httpd.conf (server configuration).

For the hosting company, this is likely a control and efficiency issue. Granting you .htaccess privileges may be seen as "dangerous" from their perspective. Also, using httpd.conf is much more efficient in terms of server processor usage. As a policy, there is merit in doing as much as you can at the higher httpd.conf level. But if you can talk them into letting you use .htaccess, there is one huge advantage - both you and the hosting company personnel can access and modify it at any time.

You imply that the hosting company is willing to help you, and so this set-up would be ideal for initial development of the redirects, allowing cooperative work between you and the hosting co. As bobriggs points out, changes made in httpd.conf take effect only when the server is restarted. This will materially slow down your work progress and - unless you write perfect code with perfect foresight - make debugging very difficult.

Your life will be much easier if you have full .htaccess privileges (AllowOverride All), mod_rewrite available for use at your discretion in .htaccess, and mod_alias.

<rant>With hosting rates as low as they are, there is no reason to put up with mickey-mouse restrictions on access to basic tools needed to maintain and secure your site.</rant>

httpd.conf is higher-priority than .htaccess, and can override it by not allowing .htaccess to change the rules at a lower level. There is no need to "synchronize" the two files per se, but httpd.conf must be set up to allow .htaccess to do what you need to do, as listed above.
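
For illustration, a minimal sketch of what the httpd.conf side might look like. The directory path is a made-up placeholder, not taken from the site in question, and the narrower override list in the comment is just one possible alternative to AllowOverride All.

```apache
# httpd.conf (server config). "/home/example/public_html" is a hypothetical
# document root used only for this sketch.
<Directory "/home/example/public_html">
    # Lets .htaccess files under this directory use every overridable directive.
    AllowOverride All
    # A narrower alternative that should still cover the directives discussed
    # in this thread (Redirect and RewriteRule fall under FileInfo,
    # DirectoryIndex under Indexes, and the Options directive under Options):
    # AllowOverride FileInfo Indexes Options
</Directory>
```

With something like this in place, edits to .htaccess are picked up on the next request, with no server restart.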

Bottom line - develop your rewrites and get everything working in .htaccess, then migrate parts or all of it to httpd.conf if you really need to.

HTH,
Jim

Robert Charlton

Msg#: 52 posted 3:21 am on Nov 25, 2002 (gmt 0)

Jim - I get the feeling the hosting company has been very nice to my client, who has a rat's nest of a site, and I want to preserve all possible good will. Also, the worst thing that could happen is that they do exactly what I say, since I really don't know my way around a server.

>>httpd.conf is higher-priority than .htaccess, and can override it by not allowing .htaccess to change the rules at a lower level. There is no need to "synchronize" the two files per se, but httpd.conf must be set up to allow .htaccess to do what you need to do, as listed above.<<

Thanks. I thought there might be some priorities. Any idea what they might have done to have directory urls on pages? And will this redirect statement I suggested work, or does something more need to be done?

Redirect /index.htm/ [domain.com...]

And, do I need to preserve index.htm/ as a directory containing an empty page if I'm to do a local redirect (I remember reading this in another thread on the forum)?

The hosting company would rather have us just dump the old site and put up a new one. I want to maintain a bunch of the links that are coming in. I don't think they've given this much thought in the past. All help would be appreciated. I'm trying to come in with some knowledge of what's needed, so things go smoothly both politically and at the operational level.

jdMorgan

Msg#: 52 posted 3:45 am on Nov 25, 2002 (gmt 0)

RC,

>>I get the feeling the hosting company has been very nice to my client, who has a rat's nest of a site, and I want to preserve all possible good will. Also, the worst thing that could happen is that they do exactly what I say, since I really don't know my way around a server.<<

I know, I get off on a tear sometimes... Above all, a provider should want your business and to keep you happy - and so on down the chain...

>>httpd.conf is higher-priority than .htaccess, and can override it by not allowing .htaccess to change the rules at a lower level. There is no need to "synchronize" the two files per se, but httpd.conf must be set up to allow .htaccess to do what you need to do, as listed above.

Thanks. I thought there might be some priorities. Any idea what they might have done to have directory urls on pages? And will this redirect statement I suggested work, or does something more need to be done?

Redirect /index.htm/ [domain.com...]<<

I'll be honest - I've never seen such a directory name! It's not against the rules on a Unix box, it's just "highly irregular, harumph!" from a URL viewpoint.

The original URL,
[domain.com...] in all probability resolves (internal to the server) to
[domain.com...] (or ...index.htm)

Otherwise, the server would produce either an error page or a directory listing in response to that request. Given that, your redirect should work - and if you do have .htaccess privileges, go ahead and try it.

>>And, do I need to preserve index.htm/ as a directory containing an empty page if I'm to do a local redirect (I remember reading this in another thread on the forum)?<<

I don't think so. The original directories in and below the "web-visible root" and all files can be deleted entirely, as long as you have their replacements available and a redirect in place before you delete them. You can view these redirects as a sort of "run-time renaming process". What is "renamed" is the URL in the incoming request, not the files.

>>The hosting company would rather have us just dump the old site and put up a new one. I want to maintain a bunch of the links that are coming in. I don't think they've given this much thought in the past. All help would be appreciated. I'm trying to come in with some knowledge of what's needed, so things go smoothly both politically and at the operational level.<<

Take it slow, one step at a time, and test as you go. Get the replacement directories/files/links in place before you delete the old ones, and start with the most important pages first. That way, if you get exasperated or run out of funding, you'll have the important stuff done first.

And perhaps most importantly, get a second opinion! :)

Jim

Robert Charlton

Msg#: 52 posted 4:09 am on Nov 25, 2002 (gmt 0)

Jim - Thanks...

>>I'll be honest - I've never seen such a directory name!<<

I'll be honest. I wish I never had. ;)

Robert Charlton

Msg#: 52 posted 8:10 am on Nov 25, 2002 (gmt 0)

>>The original URL,
[domain.com...] in all probability resolves (internal to the server) to
[domain.com...] (or ...index.htm)<<

Yes... what I see happening, now that I have some full urls to enter, is that the default page:

www.domain.com/index.shtml

...meta refresh redirects to:

www.domain.com/index.htm/index.html

If I actually type the full urls in, they stay in the address bar. Otherwise I don't see the page names, just the page-like directory names, which is what was throwing me.

The page index.shtml is just a bare-bones static redirect page... probably the shtml extension was used because there weren't any others left. ;)

Where are all these defaults set on the server? Eventually, we'll want the default page to be www.domain.com/index.html

Would the correct procedure be to reset the root directory default page (however we reset it) to index.html and then also to apply the following redirects?

Redirect /index.shtml [domain.com...]

Redirect /index.htm/index.html [domain.com...]

...OR, should the latter be?
Redirect /index.htm/ [domain.com...]

jdMorgan

Msg#: 52 posted 4:35 pm on Nov 25, 2002 (gmt 0)

RC,

Ugh... With all those pathnames, I'm lost. I also have to go deal with an insurance claim. I will get back to you here, but in the meantime, try to figure out what each file is, rather than just its name. Then maybe draw a map of the current file structure next to your desired file structure, and then draw arrows across the diagram to show the redirects. If you can classify files/pages into groups, you can redirect whole subdirectories at a time, rather than going file-by-file.

Search the Apache documentation for mod_dir for info on the DirectoryIndex directive.

I'll be back late this afternoon (local time), and will check back.

Jim

Robert Charlton

Msg#: 52 posted 8:30 pm on Nov 25, 2002 (gmt 0)

JD - Thanks...

Maybe this can help clarify. My questions break down into several parts.

FIRST PART - DEFAULT INDEX PAGE:

I'm going to want to have the default be a normal setup... index.html in the root directory of www.domain.com --

Something has made the default for the main domain be www.domain.com/index.shtml -- but index.html is the default page in the subdirectories.

From what I can find, the index.shtml probably has been accomplished by the following line, in either (or both) httpd.conf and srm.conf, OR in htaccess:

DirectoryIndex index.shtml index.html

As I understand it, if the line is

DirectoryIndex index.html
...then index.html will be the default.

But, since the default is index.shtml in the root, and it is index.html everywhere else on the site, I'm not exactly sure what's happening. I thought DirectoryIndex was global.

SECOND PART - REDIRECTS:

There are lots of PPC landing "pages"/mirror "pages" on the site that currently have inbound links that I want to preserve. It turns out that these actually aren't pages, but directories named in the form of pages... as landingpage.htm/ -- (more on this in a moment).

The PPC links also contain query strings that are necessary for tracking, as described in:

[webmasterworld.com...]

Eventually, if we can preserve the query strings, I plan to redirect *all* the landing "pages" to the new main default page of the site, www.domain.com/index.html .

Before I do this, I will run a test with an unimportant landing page to a dummy test page. The directories containing these pages all have the landingpage.htm names, I believe because the site owner wanted once upon a time to preserve his old PPC urls. So, the current link to a landing page is of the form:

www.domain.com/landingpage.htm

The actual page called up by this url is:

www.domain.com/landingpage.htm/index.html

Suppose I want to redirect all queries to this page to:

www.domain.com/testpage.html --

Should the redirect read?:

(a)
Redirect /landingpage.htm/ www.domain.com/testpage.html

(b)
Redirect /landingpage.htm/index.html www.domain.com/testpage.html

(c)
Redirect www.domain.com/landingpage.htm/index.html www.domain.com/testpage.html

...etc? Or, should I use both
/landingpage.htm/
...and
/landingpage.htm/index.html
...to cover all bases?

REDIRECTING TO THE NEW INDEX PAGE:

Once I get this generic redirect scheme working, I'll want to:
- get rid of index.shtml as the default page for the site.
- set up index.html as the default page for the site and test it.
- point all the landing page redirects to www.domain.com/index.html

The devil is in the details... and I don't really understand the server commands or files.

I'd appreciate your filling in whatever you can do without a huge amount of trouble... at least not too much more than what you've already gone to. Thank you again.

jdMorgan

Msg#: 52 posted 8:41 pm on Nov 25, 2002 (gmt 0)

RC,

OK, I think I get it now.

Declare your default index file with:
DirectoryIndex index.html

Redirect all requests for /domain/index.htm/ to /domain/
(In other words, your default "home" page URL will now be [domain...] and the file index.html will be served because of the DirectoryIndex directive.)
Then redirect all other requests for domain/index.htm/(anything) to domain/(anything)

I believe you will want to use mod_rewrite to do this, since RedirectPermanent only does prefix pattern-matching. That is, it matches from left to right only, and doesn't provide much flexibility because of that. For example, there's no way I know of to conditionally redirect with RedirectPermanent based on the presence or absence of any filename following the domain name, as required for the "home page" redirect above.

If I understand the problem, this code should do the trick in your top-level .htaccess file:

DirectoryIndex index.html
Options +FollowSymlinks
RewriteEngine on
RewriteRule ^index\.htm/index\.s?html?$ http://www.domain.com/ [R=301,L]
RewriteRule ^index\.htm/(.*)$ http://www.domain.com/$1 [R=301,L]

The above should leave you with a nice clean "pageless" URL (http://www.domain.com/) for your home page, and make it appear that all pages have moved up one directory level, since the /index.htm/ subdirectory name following your domain name will now be gone.

You may also want to modify the 2nd line above to use
Options +FollowSymlinks -Indexes
This will "turn off" the ability of a visitor to view directory listings of your subdirectories - actually, of any directory not mapped to an html page by a DirectoryIndex directive. The server response is typically "You are not authorized to view directories on this server."

I sure hope I got this right! :o

Jim

jdMorgan

Msg#: 52 posted 8:43 pm on Nov 25, 2002 (gmt 0)

Oops! We cross-posted!

Still doing the insurance claim. Back in one hour...

Jim

jdMorgan

Msg#: 52 posted 10:43 pm on Nov 25, 2002 (gmt 0)

RC,

>>As I understand it, if the line is

DirectoryIndex index.html
...then index.html will be the default.<<

Correct.
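
One possible explanation for index.shtml winning in the root while index.html is served everywhere else, sketched under the assumption that only the root directory contains an index.shtml file: DirectoryIndex accepts a list and serves the first listed file that exists in the requested directory.

```apache
# Assumed current setting (a guess based on the behavior described, not
# confirmed by the host): index.shtml is tried first, then index.html.
DirectoryIndex index.shtml index.html

# Desired setting once index.shtml is retired:
# DirectoryIndex index.html
```

Under that assumption a request for / returns index.shtml, while a request for /landingpage.htm/ falls through to index.html because no index.shtml exists there.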

>>Before I do this, I will run a test with an unimportant landing page to a dummy test page. The directories containing these pages all have the landingpage.htm names, I believe because the site owner wanted once upon a time to preserve his old PPC urls. So, the current link to a landing page is of the form:

www.domain.com/landingpage.htm

The actual page called up by this url is:

www.domain.com/landingpage.htm/index.html

Suppose I want to redirect all queries to this page to:

www.domain.com/testpage.html --

Should the redirect read?:

(a) Redirect /landingpage.htm/ www.domain.com/testpage.html

(b) Redirect /landingpage.htm/index.html www.domain.com/testpage.html

(c) Redirect www.domain.com/landingpage.htm/index.html www.domain.com/testpage.html<<

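(a), with these changes: RedirectPermanent /landingpage.htm http://www.domain.com/testpage.html
(That is, use RedirectPermanent, drop the trailing slash from the old path, and give the target as a full URL starting with http://.)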

>>Or, should I use both
/landingpage.htm/
...and
/landingpage.htm/index.html
...to cover all bases?<<

No, no need, since you're replacing or moving the page and its implied redirect, and these won't exist anymore.

However, I don't think Redirect or RedirectPermanent will preserve your query strings. It has never worked for me, but the only time I've tried it was on a "mickey-mouse" hosting service. Try it, but this may be another reason to use mod_rewrite. The RewriteRules in my earlier post will preserve the query strings. You can also add to the pre-existing query strings if you need to "tag" rewritten URLs:

RewriteRule ^index\.htm/index\.s?html?$ http://www.domain.com/ [R=301,L]
RewriteRule ^index\.htm/(.*)$ http://www.domain.com/$1?Rewritten_URL=true [R=301,QSA,L]

Here the arbitrary query string "Rewritten_URL=true" will be combined with the existing query string of requests for anything except the site "home" page - by virtue of the [QSA] flag appearing in the second rule.
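
As a sketch of the effect, here is the second rule again with a sample request traced in comments. The page name page.html and the query string src=ppc are invented for illustration.

```apache
RewriteEngine on
# Hypothetical request:  /index.htm/page.html?src=ppc
# Redirect target:       http://www.domain.com/page.html?Rewritten_URL=true&src=ppc
# QSA joins the query string given in the rule with the one on the request;
# without QSA, the rule's own query string would replace the original.
RewriteRule ^index\.htm/(.*)$ http://www.domain.com/$1?Rewritten_URL=true [R=301,QSA,L]
```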

I think that ought to cover it - Sorry for the cross-post confusion!

Jim

Robert Charlton

Msg#: 52 posted 11:07 pm on Nov 25, 2002 (gmt 0)

Jim - Rushed response, as I need to make a phone call and then pick up my car and do shopping before the Niners game. I think there's a partial miscommunication....

>>The above should leave you with a nice clean "pageless" URL (http://www.domain.com/) for your home page...<<

This part seems fine, at least at first reading, but...

>>...and make it appear that all pages have moved up one directory level, since the /index.htm/ subdirectory name following your domain name will now be gone....<<

...is where I think there's a miscommunication.

There aren't "all pages" in the index.htm subdirectory. The only page in the index.htm subdirectory is index.html, and it's returned in such a way that you're not aware of index.html... ie, you think you've gotten index.htm. I think the reason for this was that when the site owner moved to this particular hosting company, his home page, which had a lot of incoming PPC links, was index.htm.

In addition, he had other landing pages, also htm. Let's say that one of them was snafu.htm.

What he or the hosting company did was to create a subdirectory named /snafu.htm/ -- and the previous snafu.htm page became index.html in the snafu.htm subdirectory. Without access to the server, you would never know the index.html page was there... because when you linked to snafu.htm, the url you got was www.domain.com/snafu.htm/ ---

When you enter www.domain.com/snafu.htm without the trailing slash, the url that appears in the address bar contains the trailing slash.

When you enter just the domain name, what happens is that you get www.domain.com/index.shtml, but the page name doesn't show up in the address bar. Only the domain name shows up... and the page then immediately redirects, via a meta refresh, to
www.domain.com/index.htm/ , which is actually
www.domain.com/index.htm/index.html ---

Are you following? ;)

The only way I found out about the index.shtml and index.html filenames was to talk to the hosting company... and now that I know them, if I type in the full filenames, they do appear in the address bar.

Whatever I do to straighten this out, I must make sure I preserve the spidering path for Googlebot in order to get the benefit of links to the various landing pages, including to www.domain.com/index.htm/ --

It's unlikely that there are external links to index.shtml, because the page only hangs there for a fraction of a second, but I'd like to be safe if all other things are equal.

And it's also very important to preserve the PPC query strings (which I gather are sent by the browser) that go back to the server.

jdMorgan

Msg#: 52 posted 1:37 am on Nov 26, 2002 (gmt 0)

RC,

Oh, OK... I thought that everything on the site was under this /index.htm/ directory.

Maybe what you want is something like this:

RewriteRule ^(.*)\.htm/$ /$1.html [R=301,L]

This will rewrite [domain.com...] to [domain.com...]
while

RewriteRule ^.*\.htm/$ / [R=301,L]

Will rewrite any of those pages-posing-as-directories to your home page, assuming you already have the DirectoryIndex index.html directive set up.

In both cases, query strings will be preserved - I tested that on one of my sites earlier today. And you can append further tracking info (if needed to unravel this fur-ball) to the existing query string by putting the additional query string at the end of the rewritten URL in the RewriteRule, and using the [QSA] flag.

Jim

Robert Charlton

Msg#: 52 posted 6:57 am on Nov 26, 2002 (gmt 0)

>>Will rewrite any of those pages-posing-as-directories to your home page<<

For a whole bunch of reasons, I want to do it on a page-by-page basis, and if we find that RedirectPermanent preserves the query strings, I'd prefer to do it that way. If it doesn't work, I will suggest mod_rewrite, but I'm in way way over my head with this.

To double check, the correct form for the redirects would be?...

RedirectPermanent /landingpage.htm www.domain.com/test.html

and eventually:

RedirectPermanent /landingpage.htm /domain/

Are the above correct?

Since I've found no evidence of any links to the index.shtml page, we're going to drop that and set up index.html as the default... but for the time being retain index.htm until we can test the redirects.

Thanks. Please send more thoughts if you have them.

jdMorgan

Msg#: 52 posted 2:33 pm on Nov 26, 2002 (gmt 0)

RC,

>>To double check, the correct form for the redirects would be?...

RedirectPermanent /landingpage.htm www.domain.com/test.html

and eventually:

RedirectPermanent /landingpage.htm /domain/

Are the above correct?<<

These look OK to me, as long as each target is written out as a full URL starting with http://. Just remember that with Redirect, the entire prefix (left side) of the pattern must match. That is, in the second line above, /landingpage.htm would have to be in the top-level directory. On Apache 1.3 or later, you could use RedirectMatch if this is a problem, since it allows the use of regular expressions.
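
To illustrate the prefix behavior, a minimal sketch using the names from the question, with the target written as a full URL (which is what the mod_alias documentation asks for). Because Redirect appends whatever follows the matched prefix to the target, an anchored RedirectMatch may be the safer choice when the old "page" is really a directory.

```apache
# Prefix match: a request for /landingpage.htm/index.html would be sent to
# http://www.domain.com/test.html/index.html, because the leftover path is
# appended to the target.
# RedirectPermanent /landingpage.htm http://www.domain.com/test.html

# Anchored alternative: matches /landingpage.htm, /landingpage.htm/ and
# /landingpage.htm/index.html, and redirects each to exactly the target URL.
RedirectMatch permanent ^/landingpage\.htm(/(index\.html)?)?$ http://www.domain.com/test.html
```

Only one of the two directives should be active at a time.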

Good luck with the clean-up!

Jim

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved