Forum Moderators: phranque


redirect folder URL to file

         

ChanandlerBong

1:07 pm on Jun 17, 2012 (gmt 0)

10+ Year Member Top Contributors Of The Month



I am trying to handle a few bad links that point to /folder when they should be pointing to /page.html

There are also many other pages in that folder

I tried


RedirectMatch 301 ^/folder http://www.example.com/page.html


and when you try to open up /folder/page5.html, you get redirected to /page.html every time

and I tried


redirect 301 /folder http://www.example.com/page.html


and got something similar to /folderpage.html. I understand this last one is because as soon as it matches /folder, it just adds the page URL onto the end of it.

I need something that redirects:

/folder to page.html
/folder/ to page.html

but leaves

/folder/page5.html unmolested

g1smd

1:26 pm on Jun 17, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Use a RewriteRule and add an end anchor to the RegEx pattern.

^folder/
means "begins example.com/folder/"

^folder/?$
means "is exactly example.com/folder or example.com/folder/"

It's unusual for a folder to redirect elsewhere whilst pages inside the folder do not.
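Put together as a complete rule, the anchored pattern above might look like the following. This is a minimal sketch, assuming the rule lives in the .htaccess file in the site root and that www.example.com/page.html is the intended target, as in the question.

```apache
RewriteEngine On

# Matches a request for exactly "folder" or "folder/".
# In .htaccess the leading slash is not part of the path the pattern sees.
# Requests such as folder/page5.html do not match, so they are left alone.
RewriteRule ^folder/?$ http://www.example.com/page.html [R=301,L]
```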

ChanandlerBong

1:56 pm on Jun 17, 2012 (gmt 0)

10+ Year Member Top Contributors Of The Month



thanks very much, I actually intend to start learning this stuff now because I realise after reading one or two rewrite/reg exp tutorials that it was a painfully basic question.

it is an unusual situation but out of my control, I have 2 sites linking to /folder instead of page.html and I've never been able to get them to change the links so I need to do it this way.

Is there any difference between using RewriteRule and RedirectMatch because, before seeing your answer, I tried RedirectMatch and added the $ end anchor and it seems to be working.
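For reference, a RedirectMatch with an end anchor would presumably look something like this. It is only a sketch of the idea; the exact pattern used is not shown in the thread. Note that mod_alias patterns test the URL path including its leading slash.

```apache
# Redirects /folder and /folder/ only; /folder/page5.html does not match
RedirectMatch 301 ^/folder/?$ http://www.example.com/page.html
```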

g1smd

2:01 pm on Jun 17, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Redirect and RedirectMatch come from mod_alias and RewriteRule comes from mod_rewrite.

If some directives come from one module and other directives come from the other module, you cannot guarantee the order in which they will be processed. htaccess is processed in "per module" order, not in the order the directives are listed in htaccess.

Redirect A
RewriteRule B
Redirect C

Some servers will process those rules in order A, C, B while others will process in order B, A, C. If there's a redirect of any type processed after an internal rewrite, the rewritten path will be exposed back on to the web as a new URL. That's usually a disaster.


If you use RewriteRule for any of your rules you must then use it for all of your rules. The rules will then be processed in the order you list them in the htaccess file.

Once you have converted all of the directives to use mod_rewrite syntax, in general you should list rules that block access first (no point redirecting a request only to then block it), rules that redirect next (so the user is actually requesting the correct URL), and rules that rewrite last (so that only correctly requested URLs are rewritten).
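As a rough illustration of that ordering, a skeleton htaccess might be laid out as below. All paths and filenames here (private/, old-page.html, product.php) are made up for the example.

```apache
RewriteEngine On

# 1. Rules that block access come first
RewriteRule ^private/ - [F]

# 2. External redirects come next, so the visitor ends up requesting the correct URL
RewriteRule ^old-page\.html$ http://www.example.com/new-page.html [R=301,L]

# 3. Internal rewrites come last, so only correctly requested URLs are rewritten
RewriteRule ^products/([0-9]+)$ /product.php?id=$1 [L]
```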

RewriteRule can be configured to deliver external redirects or to process internal rewrites with only minor syntax changes. Make sure you fully understand these two processes.
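To show how small the syntax difference is, here is a sketch of one rule of each kind. The filenames are hypothetical, and the two rules are independent illustrations.

```apache
# External redirect: full URL as the target plus the R flag.
# The browser receives a 301 and makes a new request for the new URL.
RewriteRule ^old-name\.html$ http://www.example.com/new-name.html [R=301,L]

# Internal rewrite: local path as the target and no R flag.
# The browser still shows /friendly-name; the server quietly runs script.php.
RewriteRule ^friendly-name$ /script.php?page=friendly-name [L]
```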

ChanandlerBong

2:59 pm on Jun 17, 2012 (gmt 0)

10+ Year Member Top Contributors Of The Month



right, so I think I'll do everything through RewriteRule. At the moment, my htaccess file has RewriteRule for www/non-www and Redirect to catch a lot of old incoming links.

If I have


redirect 301 /page2.html http://www.example.com/page2.shtml
redirect 301 /page3.html http://www.example.com/page3.shtml
redirect 301 /page4.html http://www.example.com/page4.shtml


would that be equally easily done with RewriteRule? I suppose I always had in my mind that RewriteRule was when you needed to do something complicated with regex whereas Redirect was for simple a > b switches.

g1smd

3:07 pm on Jun 17, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



mod_rewrite is a newer Apache module. Using RewriteRule you can perform the same functions as Redirect and RedirectMatch (albeit with different syntax) as well as a whole lot of new stuff that the old ones cannot do.

RewriteRule ^page([234])\.html$ http://www.example.com/page$1.shtml [R=301,L]


Use the power of Regular Expressions!

List your rules from most specific to most general. Rules affecting single pages are first. The non-www/www redirect should be the last of the redirects.

ChanandlerBong

3:23 pm on Jun 17, 2012 (gmt 0)

10+ Year Member Top Contributors Of The Month



excellent, I'll go and get my hands dirty with this and come back if I have problems. Many thanks.

ChanandlerBong

12:01 am on Jun 18, 2012 (gmt 0)

10+ Year Member Top Contributors Of The Month



here is what I have so far, I'm having one issue I'll mention afterwards.


Options -Indexes

ExpiresActive on
ExpiresByType image/gif "access plus 1 month"
ExpiresByType image/jpeg "access plus 1 month"
ExpiresByType image/png "access plus 1 month"
ExpiresByType text/css "access plus 1 month"
ExpiresByType text/javascript "access plus 1 month"
ExpiresByType application/x-javascript "access plus 1 month"

# phprint file can't be cached
<Files phprint.php>
ExpiresDefault A0
</Files>

# Set up rewriting
Options +FollowSymLinks
RewriteEngine On

# specific redirects for old links and 404s

RewriteRule ^page\.s?html$ http://www.example.com/new-page.php [R=301,L]

# redirect all .html and .shtml requests to .php

RewriteRule ^(([^/]+/)*[^/.]+)\.s?html?$ http://www.example.com/$1.php [R=301,L]

# set all URLs to www versions

RewriteCond %{HTTP_HOST} ^[0-9]+(\.[0-9]+){3} [OR]
RewriteCond %{HTTP_HOST} ^example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

# add utf-8 to all text file types

<FilesMatch "\.(htm|html|shtml|css|js|php)$">
AddDefaultCharset UTF-8
</FilesMatch>

# custom 404 page

ErrorDocument 404 /404.php


the problem I'm having is even when I click on a "/" or "/subfolder/" link, the URL is coming out:

example.com/index.php or example.com/subfolder/index.php

I'm guessing that's because when a / link is clicked, the server serves up index.html which is then turned into index.php. Does that sound right? I added a rewrite rule which caused a loop:


RewriteRule ^index\.php$ http://www.example.com/ [R=301,L]


I know this wouldn't have caught the subfolder occurrences but just wanted to test it, browser freaked out and said it couldn't be resolved. I guess it was stuck in an index.html to index.php to / to index.html to index.php loop, no?

lucy24

10:12 am on Jun 18, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm guessing that's because when a / link is clicked, the server serves up index.html which is then turned into index.php. Does that sound right?

Close, but not exactly. When you request a directory, mod_dir kicks in and checks the server's list of possible names for index files: for example index.html, index.htm, index.php etc. It then checks those against the contents of the directory, and uses the first match. (If there isn't anything, there's a further detour to see whether auto-indexing is enabled for that directory.)

The default is only index.html. You mention htaccess so you are probably on shared hosting. They will typically include a longer list to allow for the most common filenames; if you've got something really unusual you can specify it in your own htaccess.
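A line like the following is the sort of thing that goes in your own htaccess to set the list. This is a sketch that assumes your host allows DirectoryIndex to be overridden there; the filenames are just common examples.

```apache
# mod_dir tries these names from left to right and serves the first one that exists
DirectoryIndex index.php index.html index.htm
```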

The file-- index.php or whatever-- is functionally a rewrite, not a redirect. So you should never see /index.php at the end of the URL unless you've goofed elsewhere in your htaccess.

It's unusual for a folder to redirect elsewhere whilst pages inside the folder do not.

I've got a fistful of these. For various historical reasons, some directories contain named files but no index file. So rather than throw around 404s-- or, worse, 403s-- for humans who make the wrong guess, there are individual redirects. I send them either to the page that's the "flagship" of the directory, or to the page that contains the material you would expect to find in /index.html. For example:

RewriteRule ebooks/\w+(/(index\.html)?)?$ http://www.example.com/ebooks/ [R=301,L]
(That falls under "historical reasons": Each e-book has a name, and is in its own directory, so the only appropriate redirect is to the ebooks uber-directory.)

or
RewriteRule paintings/(\w+)(/(index\.html)?)?$ http://www.example.com/paintings/$1.html [R=301,L]
(Gallery-type pages, where for example /paintings/cats.html does the job that might otherwise be done by /paintings/cats/index.html.)

or
RewriteRule hovercraft/nunangat(/(index\.html)?)?$ http://www.example.com/hovercraft/nunangat/UraniumSplash.html [R=301,L]
(A pre-existing page, so I didn't want to rename it.)

Conversely ::cough-cough:: I've got a

RewriteRule games/index\.html http://www.example.com/games/ [R=301,L]

thanks to a scattering of outside links, generally* outside my control.

:: frantic detour here to add {THE_REQUEST} line before something rises up and bites me ::

At least I know that no evil hacker will ever run wild in my htaccess. I change it myself every other day ;)


* One of them is on my son's site. He finally got assertive and set up his own hosting plan so I can't just sneak in and change it, but, ahem, he'll fix it if I ask him.

ChanandlerBong

11:46 am on Jun 18, 2012 (gmt 0)

10+ Year Member Top Contributors Of The Month



haha, I've just found out what it was. Really dumb! I changed the site to php last night and uploaded hundreds of php files, but left all the shtml files too...you know, just for safety, my emergency parachute in case I had to go back and revert everything.

so in my root folder, I had index.php and index.shtml.

Obviously when there are two index files, the full filepath is shown with the /index.php appended, to show you which one is being served up. Doh!

I presume this is very well known behavior to anyone not green behind the apache ears? :o|

lucy24

4:40 pm on Jun 18, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Ooh, I didn't know that, so I have now learned something. (I also don't know how you get access to the extra smileys, darn it.)

:: detour to Apache ::

What Apache version are you on? I couldn't find anything to the point until I got to the docs for 2.3/2.4 which includes a new DirectoryIndexRedirect Directive [httpd.apache.org] within mod_dir.

By default, the DirectoryIndex is selected and returned transparently to the client. DirectoryIndexRedirect causes an external redirect to instead be issued.
...
A request for http://example.com/docs/ would return a temporary redirect to http://example.com/docs/index.html if it exists.

Ugh. Why would anyone want that? You can turn it off in htaccess.
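Going by that documentation, turning it off would presumably be a single line. This only applies to Apache 2.3/2.4 servers where the directive exists, and it assumes your host lets you override it in htaccess.

```apache
# Serve the index file transparently instead of redirecting to its full name
DirectoryIndexRedirect off
```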

g1smd

5:20 pm on Jun 18, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm guessing that's because when a / link is clicked, the server serves up index.html which is then turned into index.php.

Solve this by adding:
DirectoryIndex index.php


Obviously when there are two index files, the full filepath is shown with the /index.php appended, to show you which one is being served up. Doh!

That doesn't seem right. Requesting / will silently serve the first file listed in the DirectoryIndex list. A rewrite might then force a name change and a redirect expose that name. That would be a coding error. The index filename shouldn't be seen.

I added a rewrite rule which caused a loop:
RewriteRule ^index\.php$ http://www.example.com/ [R=301,L]

That code will result in a loop. You need to ensure that the pointer is set to "index" because "index" appeared in an external request and only redirect those requests and not redirect where the pointer is set to "index" as a result of a previous internal rewrite:

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.(php|s?html?)
RewriteRule ^(([^/]+/)*)index\.(php|s?html?)$ http://www.example.com/$1? [R=301,L]


This redirect goes before your non-www/www redirect.

In your non-www/www redirect, change the two conditions to one:

RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
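With that change, the complete non-www/www ruleset would look roughly like this, reusing the RewriteRule line from the htaccess posted earlier in the thread.

```apache
# Redirect any request whose Host header is something other than www.example.com.
# The "(...)?" also lets a blank Host header (old HTTP/1.0 clients) through
# without redirecting, since those clients would send the same blank header again.
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```

This single condition also covers requests made by IP address, which the first of the two original conditions was there to catch.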

ChanandlerBong

6:21 pm on Jun 18, 2012 (gmt 0)

10+ Year Member Top Contributors Of The Month



lucy, I'm on 1.3.41 (seems a bit old, no?)

As soon as I removed all the .shtml files, the behavior returned to normal. Maybe it's a correlation/causation thing, but it is the only thing I changed between seeing /index.php appended and then it returning to normal with just the / shown. Maybe there's something quirky higher up the hierarchy on the server that my host has put which means the index filename shows when there are more than one present.

g1smd, the /index.php problem seems to have gone away now and I've simplified the non-www redirect too.

a question on your rewrite regex near the end of your message:

what is the difference between THE_REQUEST and REQUEST_URI? I have a lot of 404s to old /php-ads-new/ URLs. I want to send them to a black hole (403), would this do it?


RewriteCond %{REQUEST_URI} ^php-ads-new/[a-z-]\.php [NC]
RewriteRule .* - [F,L]

g1smd

6:29 pm on Jun 18, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



REQUEST_URI holds the current path and file, and the value of this pointer is changed by mod_rewrite processing.

THE_REQUEST holds the actual request header as sent by the browser in the form:
GET /somefile HTTP/1.1




RewriteCond %{REQUEST_URI} ^php-ads-new/[a-z-]\.php [NC]
RewriteRule .* - [F,L]


Four errors. Use this:

RewriteRule ^php-ads-new/[a-z-]+\.php - [F,NC]


You don't need a separate RewriteCond for REQUEST_URI; just test the requested URL path in the rule's RegEx pattern. The extra condition would never have matched anyway, as its value always begins with a leading slash, which you had omitted. Note the + after the character class: without it, [a-z-] matches exactly one character, so only single-letter filenames would be blocked.

ChanandlerBong

6:35 pm on Jun 18, 2012 (gmt 0)

10+ Year Member Top Contributors Of The Month



great, sooooo why would you ever use a RewriteCond for a URL match when you could just do it with one line, as you did there and as I've done with the rest of my htaccess?

g1smd

6:44 pm on Jun 18, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You'd use a RewriteCond when you needed to test:
- HTTP_HOST for which domain was requested e.g. in non-www to www redirect,
- SERVER_PORT testing for "443" or for "not 443" to find out if the current request was for a http or for a https URL,
- QUERY_STRING for query string parameters when redirecting to extensionless or other URL,
- REQUEST_URI with ! for "not" when you wanted to exclude some paths from being redirected and the RegEx pattern in the rule part was already capturing path parts for re-use,
- THE_REQUEST to ensure that the matched part was in the original external request and not in a rewritten pointer (use this to prevent an infinite redirect-rewrite loop),
and a whole host of other scenarios.
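Two of those scenarios, sketched as independent examples. The product.php script, the id parameter and the /product/ URL scheme are hypothetical names used only for illustration.

```apache
# SERVER_PORT: redirect any request that did not arrive on port 443 to https
RewriteCond %{SERVER_PORT} !^443$
RewriteRule (.*) https://www.example.com/$1 [R=301,L]

# QUERY_STRING: redirect product.php?id=123 to the extensionless URL /product/123
# %1 is the value captured by the RewriteCond; the trailing ? drops the old query string
# If /product/123 is later rewritten back to product.php internally, add a
# THE_REQUEST condition as well to avoid a redirect-rewrite loop
RewriteCond %{QUERY_STRING} ^id=([0-9]+)$
RewriteRule ^product\.php$ http://www.example.com/product/%1? [R=301,L]
```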

lucy24

10:03 pm on Jun 18, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



why would you ever use a RewriteCond for a URL match

Because it saves you the alarming and time-consuming experience of fixing the boilerplate htaccess that came with your CMS installation ;)

I detoured to my own htaccess to see if the %{REQUEST_URI} element ever occurs. I found four-- all !negative as noted in g1's post. If you are very careful you can write a rule containing ! but it's definitely not for the faint of heart.

ChanandlerBong

8:11 am on Jun 19, 2012 (gmt 0)

10+ Year Member Top Contributors Of The Month



another dumb question for the thread: can you not do negative matching using just a one-line RewriteRule?

I know using ^ and ! within a regex is not the same, is it?

phranque

8:53 am on Jun 19, 2012 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



"is a perl compatible regular expression with some additions"
... including using the '!' as a negation character to specify a non-matching pattern...

so the ! is a "prefix to", not "within", a regex.

and in a PCRE, '^' typically is used as the start anchor for a pattern.
only when used first in a character class specification (first character after the opening square bracket) does it mean "not" this set of characters.

and yes in some cases you can use negation in a RewriteRule pattern - see the note in the documentation for exceptions to this.
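A quick side-by-side of the three uses may help. These are separate illustrations with made-up filenames, not a ruleset meant to be used as a whole.

```apache
# "^" outside square brackets is the start anchor:
# this matches only the path "old.html"
RewriteRule ^old\.html$ http://www.example.com/new.html [R=301,L]

# "^" as the first character inside square brackets negates the character class:
# [^/]+ means "one or more characters that are not a slash"
RewriteRule ^([^/]+)\.htm$ http://www.example.com/$1.html [R=301,L]

# "!" in front of the pattern negates the whole match:
# the condition is true when the path does NOT begin with /images/
RewriteCond %{REQUEST_URI} !^/images/
RewriteRule \.bak$ - [F]
```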


@lucy24:
Using Faces:
http://www.webmasterworld.com/help.cgi?cat=ubbcodes [webmasterworld.com]

phranque

11:36 am on Jun 19, 2012 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



i just realized i missed part of the beginning of my previous post which should have read:
the RewriteCond condition pattern "is a perl compatible regular expression with some additions":
http://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewriterule
... including using the '!' as a negation character to specify a non-matching pattern...

g1smd

7:24 pm on Jun 19, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Negative match patterns cannot be used for capturing backreferences.

Let's say you wanted to redirect requests for all files other than robots.txt to a different domain.

You might be tempted to try this:
RewriteRule !^(robots\.txt)$ http://www.example.com/$1 [R=301,L]


The above code will NEVER capture anything in $1.

This is a case where a negative match RewriteCond is the right thing to do, as follows:
RewriteCond %{REQUEST_URI} !^/robots\.txt 
RewriteRule (.*) http://www.example.com/$1 [R=301,L]


Be aware that the rule pattern is evaluated first and the conditions after that. So, the above ruleset is for "all URLs" "except robots.txt".
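One consequence of that evaluation order is that whatever the rule pattern captured is already available to the conditions as $1. A variant of the ruleset above could therefore be sketched like this, assuming it sits in the htaccess file in the site root, where the captured path has no leading slash.

```apache
# The rule pattern runs first and captures the whole path into $1;
# the condition then tests that captured value rather than REQUEST_URI
RewriteCond $1 !^robots\.txt$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```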