newbie problem with mod_rewrite and virtual hosting

mod_rewrite rewrites *.html and other files, but not dir/

         

Paymaan

8:35 am on Apr 12, 2005 (gmt 0)

10+ Year Member



Hi everybody,

I am new here, and I apologize if my problem seems too simple.

I am on virtual hosting, so my only option for rewriting URLs is .htaccess. As my hosting space is far more than I need, I am trying to use it to host my other domain names.

I am trying with a few scripts and currently this is what I use:

---
Options +FollowSymlinks
RewriteEngine On
RewriteBase /

#RewriteLog rewrite.log
#RewriteLogLevel 9

RewriteCond %{HTTP_HOST} ^(www\.)?domain\.org [NC]
RewriteCond %{REQUEST_URI} !^/computer/
RewriteRule (.*) /computer/$1 [L]

# Prevent direct 'computer' type-in or link access
rewritecond %{THE_REQUEST} ^[A-Z]+\ (http://(www\.)?domain\.(org)(:[0-9]+)?)/(computer)/(.*)\ HTTP
rewriterule .* %1/%6 [R=301,L]

---

Yes I have copied it from somewhere in this list, but I have tried other types and got almost the same result.

First, I cannot see any rewrite log on my server to track the problem, so I've commented those lines out.

Then links are rewritten with these results (I am using domain.org, which should be rewritten to the domain.com/computer folder; there are html pages there, including index.html, and the domain everything is hosted on is domain.com):

1) [domain.org...] : browser shows the 404 page not found, address bar remains unchanged (http://www.domain.org)

2) [domain.org...] same as above.

3) [domain.org...] shows the content of /computer folder correctly, as I desire.

4) [domain.org...] shows the content of domain.com, images are not shown, URL in address bar remains unchanged. (This might be due to my ISP's cache server, as another html file shows correctly.)

I have read mod_rewrite guide and documents, 100s of this forum's posts, regex tutorial etc. but I am somehow mixed up as I can not log what is happening (any help on rewrite log?), so I appreciate if you help to understand the problem (most important to me) and then to resolve it?!

Paymaan.

jd01

9:33 am on Apr 12, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I am not completely sure you can do what you are attempting, without having access to the httpd.conf file. (Have not tried that one myself). My recommendation is to work through one step at a time...

1st start with a simple rewrite:
#RewriteEngine ON
#RewriteRule ^.*$ /computer/index.html [L]

Then request any page except /computer/index.html

If you are getting a 404 and both pages are there, the engine is working correctly, but your path is incorrect. You might use [R,L] if you can't get your log to work, this should show you the address you are trying to access in the browser, so you will know where your path is wrong.

Then add a simple condition:

#RewriteEngine ON
#RewriteCond %{REQUEST_URI} !^(.*)/computer/(.*)
#RewriteRule ^.*$ /computer/index.html [R,L]

Then request any file that is not in /computer

When you get the 'blanket' rules/conditions working, then you can add specifics one at a time. Not only will this let you see where it breaks, so you can find the correct syntax, path, etc. for each area of your file, it will teach you how to create your own rules.

Remember the 1st part of the RewriteRule is from where you have your .htaccess file, the second is the full path.

Justin

sitz

1:11 pm on Apr 12, 2005 (gmt 0)

10+ Year Member



This likely goes without saying, but you'll want to strip the leading '#' characters from the above example; commented out directives won't do you any good. =)

jd01

1:33 pm on Apr 12, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



sitz,

Uh, yeah, that's what I meant...

Good catch.

Thanks,

Justin

jdMorgan

2:59 pm on Apr 12, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



%{THE_REQUEST} holds only the request line (for example "GET /computer/ HTTP/1.1"), so it will not normally contain the scheme (http://) or the domain name, and the second rule will never be invoked. You'll need to use HTTP_HOST to capture the requested domain name. There is also an unbalanced parenthesis in that RewriteCond.

RewriteCond %{HTTP_HOST} ^(www\.)?domain\.org [NC]
RewriteCond %{REQUEST_URI} !^/computer/
RewriteRule (.*) /computer/$1 [L]
#
# Prevent direct 'computer' type-in or link access
RewriteCond %{HTTP_HOST} ^(www\.)?domain\.org
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /computer/([^\ ]*)\ HTTP
RewriteRule . http://%{HTTP_HOST}/%1 [R=301,L]

Or an even shorter version of the second ruleset:

# Prevent direct 'computer' type-in or link access
RewriteCond %{HTTP_HOST}<->%{THE_REQUEST} ^((www\.)?domain\.org[^<]*)<->[A-Z]+\ /computer/([^\ ]*)\ HTTP
RewriteRule . http://%1/%3 [R=301,L]

Note that the "<->" character sequence is arbitrary. While I use it to imply concatenation, it has no special function, and only serves to delimit the two variables so that the combined compare is unambiguous.
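
To make the back-references concrete, here is a hedged walk-through of that combined condition. The request line and Host header in the comments are assumed purely for illustration.

```apache
# Assumed request line: GET /computer/page.html HTTP/1.1
# Assumed Host header:  www.domain.org
#
# The test string %{HTTP_HOST}<->%{THE_REQUEST} expands to:
#   www.domain.org<->GET /computer/page.html HTTP/1.1
#
# Back-references captured by the pattern:
#   %1 = www.domain.org   (host, plus any :port, up to the delimiter)
#   %2 = www.             (the optional prefix; unused in the rule)
#   %3 = page.html        (everything after /computer/ up to the next space)
RewriteCond %{HTTP_HOST}<->%{THE_REQUEST} ^((www\.)?domain\.org[^<]*)<->[A-Z]+\ /computer/([^\ ]*)\ HTTP
RewriteRule . http://%1/%3 [R=301,L]
# Redirect target for the assumed request: http://www.domain.org/page.html
```

Because this pattern carries no [NC] flag, a Host header sent in upper case would not match.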

Jim

Paymaan

9:11 pm on Apr 12, 2005 (gmt 0)

10+ Year Member



Thanks for your replies, I have used Jim's script (the short version) on site but there are problems.

Rewrites to html pages take a long time and finally give a page not available error.

[*.com...] is available, along all subfolders and files.

[*.org...] gives page not available error.

every other html page at *.org version gives the same error!

current script is this:
----
RewriteCond %{HTTP_HOST} ^(www\.)?example\.org [NC]
RewriteCond %{REQUEST_URI} !^/computer/
RewriteRule (.*) /computer/$1 [R,L]
#
# Prevent direct 'computer' type-in or link access
RewriteCond %{HTTP_HOST}<->%{THE_REQUEST} ^((www\.)?example\.org[^<]*)<->[A-Z]+\ /computer/([^\ ]*)\ HTTP
RewriteRule . [%1...] [R=301,L]
----

[edited by: jdMorgan at 12:57 am (utc) on April 13, 2005]
[edit reason] Removed specifics. [/edit]

jdMorgan

12:59 am on Apr 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Get rid of the [R] in the first rule. You've created an 'infinite' rewrite loop.

RewriteCond %{HTTP_HOST} ^(www\.)?example\.org [NC]
RewriteCond %{REQUEST_URI} !^/computer/
RewriteRule (.*) /computer/$1 [L]
#
# Prevent direct 'computer' type-in or link access
RewriteCond %{HTTP_HOST}<->%{THE_REQUEST} ^((www\.)?example\.org[^<]*)<->[A-Z]+\ /computer/([^\ ]*)\ HTTP
RewriteRule . http://%1/%3 [R=301,L]
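
For readers wondering where the loop comes from, the comments below sketch the sequence, assuming a browser request for /page.html on www.example.org with the [R] flag still present on the first rule. The page name is made up for illustration.

```apache
# 1. Browser requests /page.html
#      Rule 1 matches (URI does not start with /computer/) and, because of
#      [R], answers with an external redirect to /computer/page.html
# 2. Browser requests /computer/page.html
#      Rule 1 is skipped, but THE_REQUEST now really does contain /computer/,
#      so rule 2 answers with a 301 back to /page.html
# 3. Go to step 1.
#
# With plain [L], step 1 is an internal rewrite: THE_REQUEST keeps the
# original request line, which never contains /computer/, so rule 2 does
# not fire.
RewriteCond %{HTTP_HOST} ^(www\.)?example\.org [NC]
RewriteCond %{REQUEST_URI} !^/computer/
RewriteRule (.*) /computer/$1 [L]
```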

Jim

Paymaan

8:36 am on Apr 13, 2005 (gmt 0)

10+ Year Member



Thanks Jim,

Removing R helps to see html files again, but [example.org...] and [example.org...] still give the 404 page not found error. Also any subfolders give the same result, just that if I included trailing slash, it gives another type of 404 error.

Any hints? Does it have anything with IndexAuto (AutoIndex?!) etc.? If so, I might be able to change it through my website's control panel, if it helps.

jdMorgan

1:48 pm on Apr 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Let's go back to basics for a second: Have you ever had pages from www.example.org or example.org show up when requested from this server?

If not, then it's likely that either your DNS is not set up to point these requests to your server, or that the server is not set up to recognize them, or both. The DNS is set up using the 'zone file' for the example.org domain, and the server is set up either by modifying httpd.conf, or possibly by using your 'control panel.'

Jim

Paymaan

3:15 pm on Apr 13, 2005 (gmt 0)

10+ Year Member



Well, let me get to the actual domain names to make it easier for you.

<Let's not -- It's against the WebmasterWorld Terms of Service [webmasterworld.com]>

My main site is example.com, there are a few domain pointers set to the same virtual hosting, and I am trying to experiment on one of least used domain pointers (example.org) to see if I can host other low traffic websites of mine there, as I am tired of using low quality free hostings etc. By the way, why use free hostings or pay more money for hosting when my major site has 900MB unused space and lots of free bandwidth?

So to answer you: yes, the org domain is set up correctly. Before I played with mod_rewrite, it was pointing to the com version. There are no problems with the .com version; it works as usual and shows indexes and other files correctly (with or without trailing slashes, with or without www. etc.).

[edited by: jdMorgan at 4:12 pm (utc) on April 13, 2005]
[edit reason] Removed specifics per TOS. [/edit]

jdMorgan

4:27 pm on Apr 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, to further simplify, you can delete the second ruleset for now, and test only:

RewriteCond %{HTTP_HOST} ^(www\.)?example\.org [NC]
RewriteCond %{REQUEST_URI} !^/computer/
RewriteRule (.*) /computer/$1 [L]

That is a very simple ruleset, and examples of it are posted all over the place here. Get that ruleset working, and then add the second ruleset in order to prevent direct type-in access of the alternate-domain subdirectories and to prevent them from getting accidentally listed in search results. Be sure to flush your browser cache (delete your Temporary Internet Files) after making any change to your .htaccess file.

I doubt this problem has to do with autoindex, unless the alternate domain subdirectory does not contain an 'index.html' or other index file.

When you get the 404 error, what does your access log file say? And what does your error log say?

jd01 supplied a useful debugging tip above: Temporarily using an external redirect should make the rewritten URL visible in your browser address bar. However, this can't be tested with the second ruleset in place, because as previously mentioned, the two rules working against each other will put your server into a loop if an external redirect is used in the first rule.
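
A minimal sketch of that temporary debugging setup, assuming the second ruleset has been removed as described:

```apache
# Debugging only: do not leave [R] in place once the path is confirmed
RewriteEngine On
RewriteCond %{HTTP_HOST} ^(www\.)?example\.org [NC]
RewriteCond %{REQUEST_URI} !^/computer/
# [R] with no status code sends a 302 (temporary) redirect, so the
# rewritten URL shows up in the browser address bar
RewriteRule (.*) /computer/$1 [R,L]
```

Once the address bar shows the expected /computer/ path, change [R,L] back to [L] before restoring the second ruleset.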

Jim

Paymaan

6:40 pm on Apr 13, 2005 (gmt 0)

10+ Year Member



I removed the second ruleset first, then cleared my browser cache.

example.org/index.html (*.html) file works ok both when the R flag exists and when it does not.

also www.example.org/index.html

but when I remove R flag, example.org and www.example.org give 404 errors.

when R flag exists, both www.example.org and example.org are correctly redirected to their index.html files. It shows the correct path to www.example.org/computer/ and example.org/computer/ and shows their index.html

What's wrong with internal rewrites?!

jdMorgan

1:09 am on Apr 14, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I wonder if your server is configured with "UseCanonicalName on" (see the Apache documentation at httpd.apache.org). If so, ask your host to turn it off.

Other than that, code in httpd.conf added by your host or by a "control panel" may be interfering with your new rules.

Jim

Paymaan

8:27 pm on Apr 15, 2005 (gmt 0)

10+ Year Member



I finally got to my access log and separated out the lines that seem to come from my tests on the URIs mentioned, but I cannot easily see where the problem is, although some things look wrong, such as double slashes (say //computer etc.) where some of the 404s are. So how can I pass on this part of my log for interpretation?

I noticed that in my .htaccess I also have two redirect lines before the mod_rewrite code; could those matter for this problem?

Here are those lines:
redirect /wwwboard [example.com...]
redirect /example/ [example.com...]

3rd, the .htaccess I am using is the one in the main www directory, is that ok? I do not use mod_rewrite on any subfolders (yet).

jdMorgan

8:42 pm on Apr 15, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The following line will result in double slashes if "/wwwboard/" is requested:

redirect /wwwboard http://www.example.com/store/

The result would be "/store//"
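
One way to avoid the doubled slash, sketched here on the assumption that /store/ is the intended destination, is to leave the trailing slash off the target so that the leftover part of the requested path supplies it:

```apache
# Redirect appends whatever follows the matched prefix to the target URL.
#   /wwwboard/         -> http://www.example.com/store/
#   /wwwboard/faq.html -> http://www.example.com/store/faq.html
#   /wwwboard          -> http://www.example.com/store
Redirect /wwwboard http://www.example.com/store
```

The faq.html name is hypothetical and only shows how the remainder of the path is carried over.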

Jim

Paymaan

3:37 pm on Apr 16, 2005 (gmt 0)

10+ Year Member



I resolved the problem with //store, thanks to Jim. I have also sent my logs to you through sticky mail. But let's also try something else: is it possible to use rewrite and regex to recognize that a request does not mention index.html (like [example.org)...] and then use a rewrite to add index.html to resolve the matter?!

Any hints anybody?

jdMorgan

5:08 pm on Apr 16, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Do you have a DirectoryIndex directive in your .htaccess or httpd.conf file?

You can try adding:


DirectoryIndex index.html

to your .htaccess, and that might help. It would be quite odd to need to add this though, since it's usually part of a standard server configuration.

Jim

Paymaan

8:47 pm on Apr 16, 2005 (gmt 0)

10+ Year Member



I do not have it in .htaccess, but I am sure I have it in the httpd.conf file, as referring to directories in other conditions leads them to index.html

I can also stop showing folder contents through control panel, can it be a source of problem? I personally doubt it.

jdMorgan

9:21 pm on Apr 16, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Control panels are a constant source of problems, because they modify your server configuration by placing code in the various configuration files. Because this code is machine-generated, it is often very badly-coded, or at least what I'd politely call 'non-optimal.'

Some of the worst mod_rewrite code I've ever seen is generated by cpanel.

In order to find this problem, you may have to fully evaluate all of the server config files and all of the .htaccess files in the directory path used to reach the pages that are having problems.

Jim

Paymaan

3:25 pm on Apr 17, 2005 (gmt 0)

10+ Year Member



Normally, I do not try to use control panel features which modify any config files. Unfortunately, I do not have access to most config files other than .htaccess

A mod_rewrite question, how can I redirect pages with [example.org...] or [example.org...] to [example.org...] , and how can I combine such with the following? (considering every following subdirectory will also be directed to its own index.html, say example.org/dir1/ will be redirected to example.org/dir1/index.html)

---
RewriteCond %{HTTP_HOST} ^(www\.)?example\.org [NC]
RewriteCond %{REQUEST_URI} !^/computer/
RewriteRule (.*) /computer/$1 [L]
---

I guess if I combine such rulesets, my problem will be resolved completely: html pages will be rewritten correctly, and when a folder, being root or a subfolder is in the URL, it adds the index.html to the URL.

jdMorgan

4:22 pm on Apr 17, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> when a folder, being root or a subfolder is in the URL, it adds the index.html to the URL.

In order to do this, you'd need to use RewriteCond with the -d flag to check that the requested URL is a directory. This means an extra filesystem access for each and every HTTP request, and is inefficient.
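
For completeness, a minimal sketch of what such a -d test could look like. It is standalone and hypothetical: it is not combined with the /computer/ rewrite, it has not been tried on the poster's server, and it carries the per-request filesystem check mentioned above.

```apache
# Only act when the requested path maps to an existing directory
RewriteCond %{REQUEST_FILENAME} -d
# Capture the path without its trailing slash, then append index.html.
# The bare root URL (an empty path in .htaccess context) is not matched.
RewriteRule ^(.*[^/])/?$ /$1/index.html [L]
```

After the rewrite the target is a file rather than a directory, so the condition fails on the next pass and the rule does not apply to its own output.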

You'd be far better off to figure out why accesses to any_dir/ don't automatically return the index file, as they should, even without mod_rewrite. If you add more complication to something that is broken, you often get more complication but still broken.

If you put


DirectoryIndex index.html
Options -Indexes

in your .htaccess file, does that change anything?

Having done that, any request for / should return /index.html and any request for directory/ should return directory/index.html

If not, you need to have your host fix your account.

SEO note: For purposes of SEO and usability, you want your main index page to be example.com/ and not example.com/index.html. Otherwise, you will split the PageRank between the / and /index.html URLs. In addition to that, what if you later decide to change your index page to index.php? Then you'd lose all the PageRank for the main page of your site, until Google got around to fully indexing it again... :(

If you have UseCanonicalName off, and Apache mod_dir is correctly installed, then any request for directory_name should be redirected to directory_name/ without you having to add any code anywhere. It should just work (see mod_dir documentation).

If not, you need to have your host fix your account.

You may also wish to try adding


Options -MultiViews

if you don't use content negotiation.

Even if that doesn't fix anything, turning off content negotiation can speed up your server significantly.

Please try these various tests and let me know what happens in detail. Otherwise, it will be up to your hosting company to help you unless someone else here has any more ideas. On Apache, .htaccess is subordinate to httpd.conf, and it is rare that you can 'fix' a problem in httpd.conf by using .htaccess. The power of .htaccess is intentionally limited for security and server configuration control reasons, and that means that errors in configuration cannot usually be fixed by individual server users.

Jim

Paymaan

8:35 pm on Apr 17, 2005 (gmt 0)

10+ Year Member



I tried DirectoryIndex and both Options lines, with no success. Something seems to be wrong: everything works OK if I add the R flag ([R,L]) to the rewrite rule, but directory indexes for this rewrite don't work if I remove the R flag ([L]). Can you please guess what the source of this could be?!

Everything remains ok in the example.com and other domain pointers in both cases, so something related to internal rewrites should be wrong.

With an external redirect, a request for [example.org...] is redirected to [example.org...] and it works correctly; I mean it shows index.html

I am quite mixed up, working with such a big hosting account and having such a stupid problem :)

Paymaan

6:27 am on Apr 25, 2005 (gmt 0)

10+ Year Member



It seems nobody is interested in resolving my problem anymore! Heeeeelp please!

jd01

9:47 pm on Apr 25, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Not sure I understand your last 'this is what works and doesn't' post.

I think Jim's idea of contacting your hosting co. to determine the current configuration of the httpd.conf file is very wise, otherwise you may be compounding a problem, which could create undesired results.

My advice... Contact your host for the httpd.conf file, then generalize and post that with the generalized portion of your .htaccess file that you are using to deal with the directory vs directory/index.html situation.

In giving people the opportunity to look at both simultaneously, you should be able to get some solid feedback/direction for your situation.

Justin

Paymaan

10:59 am on Apr 26, 2005 (gmt 0)

10+ Year Member



As you suggested, I contacted the admins and they simply refused to give me the httpd.conf files, for security reasons. I am working with them to see if they can find the source of the internal rewrite problem with a subdirectory and no file name.

I am still curious to see if there is any mod_rewrite solution to separate URLS containing no file name and adding an index.html file to them. I know it may not be efficient, but it is still interesting to me.

Thanks.
Paymaan.

jdMorgan

2:43 pm on Apr 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> separate URLS containing no file name and adding an index.html

You could use a construct such as:


RewriteCond $1 ![^.]+\.[^/]+$
RewriteRule (.*)/?$ http://www.example.com/$1/index.html [R=301,L]
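
A hedged reading of how those two lines behave, using assumed request paths as seen from an .htaccess file in the document root:

```apache
# The RewriteRule pattern is matched first, so $1 is already set when the
# RewriteCond is evaluated.
#
#   dir/page.html -> $1 ends in "name.ext", the negated condition fails,
#                    no redirect
#   dir           -> no dot in $1, redirect to /dir/index.html
#   dir/          -> (.*) is greedy and keeps the trailing slash, so the
#                    target becomes /dir//index.html
#
# Possible variant (an assumption, not tested on the poster's server) that
# keeps the trailing slash out of $1; it does not match the bare root URL:
RewriteCond $1 ![^.]+\.[^/]+$
RewriteRule ^(.*[^/])/?$ http://www.example.com/$1/index.html [R=301,L]
```

The redirected request for .../index.html ends in "name.ext", so the condition fails on it and the redirect is not repeated.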

Jim

Paymaan

3:37 pm on Apr 28, 2005 (gmt 0)

10+ Year Member



Thanks Jim,

Finally somebody at my hosting company worked on the problem and changed the script to the one below, and it works almost OK. I would like to thank everyone who gave me hints. After studying this working version, please let me know what the problem was, and whether we all missed something in the old non-working script.

here it is, to make Jim's life easier, I change the actual domain name to example.org:

Options +FollowSymlinks
RewriteEngine On
#RewriteBase /

## prevent direct '/computer' requests on example.org
RewriteCond %{HTTP_HOST} ^(www\.)?example\.org [NC]
RewriteCond %{THE_REQUEST} ^GET\ /computer
RewriteRule ^computer/(.*) [%{HTTP_HOST}...] external redirect

## rewrite everything else into /computer/
RewriteCond %{HTTP_HOST} ^(www\.)?example\.org [NC]
RewriteCond %{REQUEST_URI} !^/$
RewriteCond %{REQUEST_URI} !^/computer/
## internal redirect
RewriteRule ^(.*)$ /computer/$1 [L]

Paymaan.

[edited by: jdMorgan at 7:08 pm (utc) on April 28, 2005]
[edit reason] Examplified. [/edit]

Paymaan

10:44 pm on Apr 28, 2005 (gmt 0)

10+ Year Member



I have to add that the main problem with the new script is that the old problem happens with subdirectories, they don't show indexes.
Say example.org/test/ again gives 404.

Any ideas?!

Paymaan.

jd01

12:34 am on Apr 29, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi again,

It looks like Jim's example will fit your needs

RewriteCond $1 ![^.]+\.[^/]+$
RewriteRule (.*)/?$ [example.com...] [R=301,L]

This externally rewrites everything to the /file/index.html file...

Or, could be adjusted to something like this, depending on exactly what you need:

RewriteCond $1 ![^.]+\.[^/]+$
RewriteRule (.*)/?$ /computer/index.html [L]

This is a silent version that will take any file (directory) EG /file or /file/ and write the contents of computer/index.html to it...

You might need to adjust it a little more depending on your specific purposes, but I think the idea should give you some direction.

Your complete file would look like this:

Options +FollowSymlinks
RewriteEngine On
#RewriteBase /

## prevent direct '/computer' requests on example.org
RewriteCond %{HTTP_HOST} ^(www\.)?example\.org [NC]
RewriteCond %{THE_REQUEST} ^GET\ /computer
RewriteRule ^computer/(.*) [%{HTTP_HOST}...] external redirect

RewriteCond $1 ![^.]+\.[^/]+$
RewriteRule (.*)/?$ [example.com...] [R=301,L]

## rewrite everything else into /computer/
RewriteCond %{HTTP_HOST} ^(www\.)?example\.org [NC]
RewriteCond %{REQUEST_URI} !^/$
RewriteCond %{REQUEST_URI} !^/computer/
## internal redirect
RewriteRule ^(.*)$ /computer/$1 [L]

I would also recommend changing this:
[%{HTTP_HOST}...]

to this:
[%{HTTP_HOST}...]

The difference is a permanent move vs the temporary one you are using currently.

With all the 'drama' surrounding the 302's of late, I would be safe rather than sorry, unless you know you need to use a temporary move.

The reason for adding in the middle of your current file is so we make sure to catch requests properly and in order.

The first 'set' checks to see if the original request is for /computer and responds accordingly.

Then the second 'set' (added) checks to make sure there is a file appended to the directory (/anything/anything.any), and if not rewrites to the correct format (/anything/index.html)

Finally, the third 'set' checks to see if /computer is already being used and if not serves (internally redirects) to /computer/anything.any.

Please, notice the order is important, and can be the reason a rule or 'set' will fail...

If this does not work, you might try some different ordering, but I think this is the correct version of what you are looking for.

Hope it finally does the trick.

Justin

Paymaan

11:15 am on May 1, 2005 (gmt 0)

10+ Year Member



Thanks for all the input. As you all predicted, something seems to be wrong with my hosting. They gave me the following answer, which I believe is not quite correct, as this is not what a default Apache setup should do:

"Subdirectories do not work in this case because, in the sequence of processing steps that take place within the server for each web request, mod_rewrite is invoked long after the module that turns subdirectory requests into requests for the "index.html" file within the requested subdirectory. The reasons for this are a little bit complicated, but are explained in the mod_rewrite documentation, which is available here:

[httpd.apache.org...]"

Such a delay should not exist, I guess. Please tell me whether this is true, so that I can try to convince them to fix it.

Paymaan.
