

Apache Web Server Forum

    
add index.htm after trailing slash
omoutop




msg:4560877
 8:17 am on Apr 3, 2013 (gmt 0)

I need a way to add index.htm to the end of a URL.
For example:
www.example.com/folder1/folder2 AND
www.example.com/folder1/folder2/ should both point to www.example.com/folder1/folder2/index.htm (and preferably be redirected there with a 301 redirect).

The folders do not physically exist.
The /folder1/folder2 is the result of a rewrite rule:

RewriteRule ^([^/]+)/([^/]+)/index.htm$ path/to/file.php?var1=$1&var2=$2 [NC,L]

My first thought was to add a trailing slash if it did not exist, so I added the following above the other rewrite rules, and it seems to work.

RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*[^/])$ /$1/ [L,R=301]

EDIT-1:
I have come closer to a solution to my problem by using the following rule:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !/index\.htm$ [NC]
RewriteRule ^ %{REQUEST_URI}/index.htm [L,R=301]

but it does not work properly. It adds /index.htm to all URLs, although I thought that using RewriteCond %{REQUEST_URI} !/index\.htm$ [NC] would avoid that.

Any help?
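One possible reading of the EDIT-1 behaviour, sketched in Python rather than Apache: the condition only exempts URLs that end in /index.htm, so every other non-existent path still gets the suffix. The request paths are made up for illustration, and the !-f and !-d conditions are assumed to pass because the thread says the folders do not physically exist.

```python
import re

# RewriteCond %{REQUEST_URI} !/index\.htm$ [NC]
# Python's re module is used here as a stand-in for Apache's regex engine.
ENDS_IN_INDEX = re.compile(r"/index\.htm$", re.IGNORECASE)

def edit1_target(uri):
    """Return the redirect target the EDIT-1 rule would build, or None if it is skipped."""
    if ENDS_IN_INDEX.search(uri):
        return None                # condition fails, rule does not fire
    return uri + "/index.htm"      # RewriteRule ^ %{REQUEST_URI}/index.htm

# Hypothetical request paths, none assumed to exist on disk.
for uri in ["/folder1/folder2", "/folder1/folder2/", "/contact.htm",
            "/folder1/folder2/index.htm"]:
    print(uri, "->", edit1_target(uri))
```

In this sketch /contact.htm also receives /index.htm, and a request that already ends in a slash ends up with a double slash before index.htm.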

 

omoutop




msg:4560896
 9:51 am on Apr 3, 2013 (gmt 0)

OK, it seems I was able to solve my problem with the following rule.

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !\.htm$ [NC]
RewriteRule ^(.*?)/?$ /$1/index.htm [L,R=301]

lucy24




msg:4560912
 10:47 am on Apr 3, 2013 (gmt 0)

What do you need all those conditions for? If you already know that the directories don't physically exist, there's no need to test for them.

RewriteRule ^([^./]+/[^/.]+)/?$ http://www.example.com/$1/ et cetera
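A quick way to see what the pattern above accepts is to try it on a few sample paths. This is only a sketch in Python, whose re module is assumed to behave like Apache's regex engine for this pattern; the sample paths are invented, and they are written without a leading slash because that is how a RewriteRule in .htaccess sees them.

```python
import re

# Pattern copied from the rule above; only the capture is examined here,
# the redirect target is left out because the post elides it.
TWO_LEVELS = re.compile(r"^([^./]+/[^/.]+)/?$")

def capture(path):
    """Return what $1 would hold for this path, or None if the rule does not match."""
    m = TWO_LEVELS.match(path)
    return m.group(1) if m else None

for path in ["folder1/folder2", "folder1/folder2/", "folder1",
             "contact.htm", "folder1/folder2/index.htm", "a/b/c"]:
    print(repr(path), "->", capture(path))
```

Only paths that are exactly two levels deep and contain no period are matched, with or without the trailing slash, and the capture never includes that slash.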

I see what you're trying to do here
^(.*?)/?$

but it won't work. You want the RegEx to stop before the final slash, if there is one, and omit the slash from the capture. But Regular Expressions don't work that way.

How many of these nonexistent directories are there? It may be quicker to list them by name in your Rule.

But now for the real question:

Why on earth do you want to do this? If they were real files and directories, you'd be redirecting to get rid of the trailing "index.htm". If the directories don't exist, then their index.htm files don't either. You're just taking an extra detour before the final rewrite.

omoutop




msg:4560974
 1:50 pm on Apr 3, 2013 (gmt 0)

Not my choice. The client wants his pages to have an extension,
so index.htm, contact.htm, etc.

There are more than 120 basic folders (subfolders may be well over 1000, in any combination 2, 3 or 4 levels deep),
so it's out of the question to write that many rules.

As for the regex, so far it seems to work in the project. No strange links, no double slashes, everything looks fine.
example.com/folder1 AND
example.com/folder1/ get properly rewritten to example.com/folder1/index.htm. I don't see why the regex will fail, but I have very limited experience with the htaccess/Apache environment. Care to explain a little more?

g1smd




msg:4561001
 2:27 pm on Apr 3, 2013 (gmt 0)

Clients don't always know best.

Google prefers to index URLs without the index filename in them.

You should redirect to remove the index name and link to URLs without the index name.

lucy24




msg:4561131
 8:55 pm on Apr 3, 2013 (gmt 0)

I don't see why the regex will fail, but I have very limited experience with the htaccess/Apache environment.

Same here, but this one is purely about Regular Expressions.

^(.*?)/?$
There are two pieces to this pattern:
(.*?) "Capture the content, if any"
/? "There might be a / at the end"
Since the slash is not mandatory, the capture does not have to stop. It might, depending on RegEx dialect, but it doesn't have to.

I detoured to check in my text editor. There the pattern worked as intended. But I have met very similar situations where the .+? plus {blahblah}? pattern didn't do what I wanted it to. So I wouldn't rely on it. Remember that Regular Expressions, like computers as a whole, will misunderstand you whenever they possibly can. So you have to make a rule that leaves absolutely no wiggle room or space for ambiguity.
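The text-editor check described above can be repeated in a few lines. This is a sketch in Python, whose re module is a backtracking engine like the PCRE library Apache uses; that the two agree on this pattern is an assumption, not something tested against a live server. The greedy variant is added only for comparison.

```python
import re

LAZY = re.compile(r"^(.*?)/?$")    # pattern from the thread
GREEDY = re.compile(r"^(.*)/?$")   # same pattern without the lazy "?"

for path in ["folder1/folder2", "folder1/folder2/", "folder1/"]:
    lazy_capture = LAZY.match(path).group(1)
    greedy_capture = GREEDY.match(path).group(1)
    print(repr(path), "lazy:", repr(lazy_capture), "greedy:", repr(greedy_capture))
```

In this engine the lazy capture stops before a single trailing slash, while the greedy one swallows it and leaves the optional slash matching nothing.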

The one situation where a *? or +? construction will probably work as intended is if you had something like

^(.+?)/(.+)$
applied to a request containing more than one slash:
dir1/dir2/dir3
You should then get
(dir1)/(dir2/dir3)
instead of the default
(dir1/dir2)/(dir3)

But, again, it's safer not to depend on the ? question mark.
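The dir1/dir2/dir3 example can be reproduced as follows. It is a Python sketch (the re module standing in for Apache's engine), and the third pattern is an added illustration of a way to get the same split with a negated character class instead of the question mark.

```python
import re

LAZY_SPLIT = re.compile(r"^(.+?)/(.+)$")      # stops at the first slash
GREEDY_SPLIT = re.compile(r"^(.+)/(.+)$")     # default: stops at the last slash
CLASS_SPLIT = re.compile(r"^([^/]+)/(.+)$")   # no "?" needed: first group cannot cross a slash

path = "dir1/dir2/dir3"
print(LAZY_SPLIT.match(path).groups())
print(GREEDY_SPLIT.match(path).groups())
print(CLASS_SPLIT.match(path).groups())
```

The character-class version leaves the engine no choice about where the first group ends, which is the "no wiggle room" approach recommended above.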

client wants his pages to have extension
so.. index.htm, contact.htm, etc.

Nothing wrong with extensions. It's only the specific filename "index.xtn" that's a problem. Ask your client if he wants requests for
directory/
to be explicitly redirected (not just silently rewritten) to
directory/index.htm

If he says "No, either way is fine", make up some scary stuff about Duplicate Content. If he says yes, give up and humor him.

omoutop




msg:4561217
 6:01 am on Apr 4, 2013 (gmt 0)

lucy, I will keep a note about the regex behavior and monitor it for a while to check how it performs.

omoutop




msg:4561277
 10:10 am on Apr 4, 2013 (gmt 0)

OK, after a short talk with the customer he agreed to a non-index.htm approach.
So the "new" approach must satisfy these conditions:
- non-www URLs redirected to www
- index.htm/index.html redirected to the / URL
- trailing slash added if needed

The new block of code is modified and looks like:


# remove index.htm/html from url
# rewrite non-www into www
# RULE A = combine both lack of www and presence of index
# RULE B = only lack of www
# RULE C = presence of index (htm or html)
# RULE D = add trailing slash if it is missing, for folders only
#rule A
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{REQUEST_URI} ^(.*/)(index\.html|index\.htm)$ [NC]
RewriteRule . [%{HTTP_HOST}%1...] [R=301,NE,L]
#rule B
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule . [%{HTTP_HOST}%{REQUEST_URI}...] [NE,R=301,L]
#rule C
RewriteCond %{REQUEST_URI} ^(.*/)(index\.html|index\.htm)$ [NC]
RewriteRule . %1 [R=301,NE,L]
#rule D
RewriteCond %{REQUEST_URI} !(\.[a-zA-Z0-9]{1,5}|/)$
RewriteRule (.*)([^/])$ /$1$2/ [R=301,L]


Can this be improved? My knowledge ends here.
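To see which requests rules C and D would touch, their patterns can be tried outside Apache. This is a rough Python sketch: rules A and B are left out because their targets are truncated in the post, the sample URIs are invented, and the leading slash is stripped by hand to mimic what a RewriteRule pattern sees in .htaccess.

```python
import re

# Rule C condition: RewriteCond %{REQUEST_URI} ^(.*/)(index\.html|index\.htm)$ [NC]
INDEX_FILE = re.compile(r"^(.*/)(index\.html|index\.htm)$", re.IGNORECASE)
# Rule D condition (negated in the rule): RewriteCond %{REQUEST_URI} !(\.[a-zA-Z0-9]{1,5}|/)$
EXT_OR_SLASH = re.compile(r"(\.[a-zA-Z0-9]{1,5}|/)$")
# Rule D pattern: RewriteRule (.*)([^/])$ /$1$2/
NO_TRAILING_SLASH = re.compile(r"(.*)([^/])$")

def rule_c(uri):
    """Target of rule C (%1), or None if the condition does not match."""
    m = INDEX_FILE.match(uri)
    return m.group(1) if m else None

def rule_d(uri):
    """Target of rule D (/$1$2/), or None if the rule is skipped."""
    if EXT_OR_SLASH.search(uri):
        return None
    m = NO_TRAILING_SLASH.search(uri[1:])   # per-directory path has no leading slash
    return "/" + m.group(1) + m.group(2) + "/" if m else None

for uri in ["/folder1/index.html", "/index.htm", "/folder1", "/folder1/",
            "/contact.htm", "/folder.v2"]:
    print(uri, "C:", rule_c(uri), "D:", rule_d(uri))
```

One thing the sketch shows is that rule D treats any name ending in a period plus one to five alphanumerics as a file, so a folder such as the made-up /folder.v2 would not get its slash.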

g1smd




msg:4561282
 10:18 am on Apr 4, 2013 (gmt 0)

Is there just one site involved here?

In other words, there are no other sites sharing the same folder?

[edited by: g1smd at 10:29 am (utc) on Apr 4, 2013]

omoutop




msg:4561283
 10:20 am on Apr 4, 2013 (gmt 0)

only one site
no other sites, no other subdomains

why?

g1smd




msg:4561284
 10:29 am on Apr 4, 2013 (gmt 0)

You need different (much more complicated) code for multi-site scenarios.

For single site, this is quite easy.

The index redirects must be before the non-www/www redirects. This avoids a double redirect for some requests.

The index redirects should include the canonical hostname in the rule target. This avoids a double redirect for some requests.
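The double-redirect point can be illustrated with a toy simulation. It is a Python sketch, not Apache: each function is a hypothetical stand-in for one redirect rule, and the loop applies the first matching rule per request, which is roughly what [R=301,L] does.

```python
import re

CANONICAL = "www.example.com"
INDEX = re.compile(r"^((?:[^/]+/)*)index\.html?$")

def index_with_host(host, path):
    """Index redirect whose target names the canonical host."""
    m = INDEX.match(path[1:])
    return (CANONICAL, "/" + m.group(1)) if m else None

def index_same_host(host, path):
    """Index redirect with a host-relative target (stays on the requested host)."""
    m = INDEX.match(path[1:])
    return (host, "/" + m.group(1)) if m else None

def www_rule(host, path):
    """Non-www to www redirect."""
    return (CANONICAL, path) if host != CANONICAL else None

def hops(rules, host, path):
    """Follow redirects until no rule fires; return (redirect count, final host, final path)."""
    count = 0
    while True:
        for rule in rules:
            result = rule(host, path)
            if result is not None:
                host, path = result
                count += 1
                break
        else:
            return count, host, path

print(hops([www_rule, index_same_host], "example.com", "/dir/index.html"))
print(hops([index_with_host, www_rule], "example.com", "/dir/index.html"))
```

With the www rule first, the request needs two redirects to reach its final URL; with the index rule first and the hostname in its target, it needs one.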

Never begin a RegEx pattern with .* as this means "capture the remaining input all the way to the end", e.g. as used in ^(.*/)(index....

(index\.html|index\.htm) simplifies to index\.html? and you must escape literal periods.
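Both claims in the line above are easy to check. The sketch below uses Python's re module, which is assumed to agree with Apache's engine on these simple patterns; the test names are made up, and the anchors are added only so that each name is matched as a whole.

```python
import re

LONG = re.compile(r"^(index\.html|index\.htm)$")
SHORT = re.compile(r"^index\.html?$")
UNESCAPED = re.compile(r"^index.html?$")   # the period now matches any character

names = ["index.htm", "index.html", "index.htmx", "indexshtml", "index-htm"]
for name in names:
    print(name, bool(LONG.match(name)), bool(SHORT.match(name)),
          bool(UNESCAPED.match(name)))
```

The long and short forms agree on every name, while the unescaped version also accepts names such as indexshtml.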

Why do you need the NE flag?

This is the usual method:

# Index redirect
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html?\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html? http://www.example.com/$1 [R=301,L]
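The condition on THE_REQUEST looks at the raw request line sent by the client, so, as far as I understand it, an internal rewrite to an index file does not trigger the redirect again. A Python sketch of how the two patterns behave, with invented request lines and the re module standing in for Apache's engine:

```python
import re

# RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html?\ HTTP/
THE_REQUEST = re.compile(r"^[A-Z]{3,9}\ /([^/]+/)*index\.html?\ HTTP/")
# RewriteRule ^(([^/]+/)*)index\.html? http://www.example.com/$1
RULE = re.compile(r"^(([^/]+/)*)index\.html?")

def index_redirect(request_line):
    """Return the redirect target for a raw request line, or None."""
    if not THE_REQUEST.search(request_line):
        return None
    path = request_line.split(" ")[1][1:]   # URL path without its leading slash
    m = RULE.search(path)
    return "http://www.example.com/" + m.group(1) if m else None

for line in ["GET /folder1/folder2/index.html HTTP/1.1",
             "GET /index.htm HTTP/1.1",
             "GET /folder1/ HTTP/1.1",
             "GET /folder1/index.html?x=1 HTTP/1.1"]:
    print(line, "->", index_redirect(line))
```

Note that in this sketch a request with a query string is not matched by the condition as written, because the pattern expects a space directly after the file name.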

#Slashes
When are slashes added?
Trailing slash denotes a folder or the index page in a folder.
The DirectorySlash directive takes care of that automatically.
URLs for pages should not have a trailing slash. This is in the HTTP specs.

# Non-www/www redirect
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
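The host condition is written as a negation: redirect unless the Host header is exactly the canonical name or is empty. A small Python sketch of that logic follows; the host values are examples, and treating an empty string as "no Host header sent" is an assumption about how the variable is populated.

```python
import re

# RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
CANONICAL_OR_EMPTY = re.compile(r"^(www\.example\.com)?$")

def host_redirect(host, path):
    """Return the redirect target for this host and path, or None if no redirect is needed."""
    if CANONICAL_OR_EMPTY.search(host):
        return None
    # RewriteRule (.*) http://www.example.com/$1  ($1 has no leading slash in .htaccess)
    return "http://www.example.com/" + path[1:]

for host in ["www.example.com", "", "example.com", "WWW.EXAMPLE.COM"]:
    print(repr(host), "->", host_redirect(host, "/folder1/"))
```

Because the condition carries no [NC] flag, an upper-case host name is also redirected to the lower-case canonical form in this sketch.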



Never use (.*) at the beginning of a RegEx pattern, as in (.*)([^/])$

lucy24




msg:4561504
 8:45 pm on Apr 4, 2013 (gmt 0)

The [NE] flag is only needed when your target contains a literal character that must not be percent-encoded. For example:

RewriteRule ^paintings/rats/(kabloona|yesno)\.html http://www.example.com/hovercraft/caribou.html#$1 [R=301,L,NE]

(This is a rule from my own htaccess. It's also the example Apache uses in their docs for the [NE] flag. I mean, ahem, the process, not the literal rule ;))
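What percent-encoding does to the # in such a target can be shown with Python's urllib.parse.quote. This only illustrates escaping in general and is not Apache's own routine; the target is the one from the rule above with kabloona substituted for $1.

```python
from urllib.parse import quote

target = "/hovercraft/caribou.html#kabloona"

# Roughly what happens to the fragment marker when the target is escaped
# (the situation the [NE] flag is meant to prevent).
escaped = quote(target)

print("with escaping:   ", escaped)
print("without escaping:", target)
```

Once the # has become %23 it is no longer a fragment marker, so the browser would ask for a file whose name contains a literal # instead of jumping to the anchor.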

omoutop




msg:4561632
 8:22 am on Apr 5, 2013 (gmt 0)

Thank you both for your advice. I will try to implement what you suggest.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved