Regular expressions in .htaccess

         

wener

8:50 pm on Aug 16, 2005 (gmt 0)

10+ Year Member



Hi,
Can anyone help me out with correcting this line in my .htaccess file?

We just redesigned our site, so some file names have changed. I want to use redirect 301 to redirect all old files named *_basket.htm to a new page named gift_baskets.htm. I wrote this line in the .htaccess file, but it didn't work.

redirect 301 /\w+_basket.htm http://example.com/gift_baskets.htm

Thank you!

[edited by: jdMorgan at 1:50 pm (utc) on Aug. 17, 2005]
[edit reason] Examplified. [/edit]

ChadSEO

9:46 pm on Aug 16, 2005 (gmt 0)

10+ Year Member



wener,

You can use RedirectMatch instead to match several pages. Try this:

RedirectMatch 301 .*_basket.htm http://example.com/gift_baskets.htm

Chad

jdMorgan

10:17 pm on Aug 16, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If the intent is to require some characters before "_basket" then ".+" would be more appropriate. Also, you should escape the literal period and end-anchor the pattern:

RedirectMatch 301 .+_basket\.htm$ http://example.com/gift_baskets.htm
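To see what escaping the period and end-anchoring change, here is a minimal sketch in Python. It is only an approximation: Python's `re` module is not the regex engine Apache uses, although patterns this simple should behave the same way in both, and the sample paths are made up for illustration.

```python
import re

# Patterns taken from the two RedirectMatch lines above.
loose = re.compile(r".*_basket.htm")      # unescaped dot, no end anchor
strict = re.compile(r".+_basket\.htm$")   # escaped dot, end-anchored

# Hypothetical request paths, chosen only to exercise the differences.
paths = [
    "/fruit_basket.htm",    # the kind of old page that should be redirected
    "/fruit_basket.html",   # different extension
    "/fruit_basket_htm",    # no literal period at all
]

# The pattern may match anywhere in the path, so re.search is the closest analogue.
results = {p: (bool(loose.search(p)), bool(strict.search(p))) for p in paths}

for path, (hit_loose, hit_strict) in results.items():
    print(f"{path:22} loose={hit_loose!s:5} strict={hit_strict}")
```

Only the first path matches the stricter pattern; the looser one matches all three.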

See the regular-expressions tutorial cited in our forum charter for more information.

Jim

wener

10:44 pm on Aug 16, 2005 (gmt 0)

10+ Year Member



Thank you so much! It works!

wener

7:50 am on Aug 17, 2005 (gmt 0)

10+ Year Member



Hi,
Thank you for your help. One more question: I want to add another redirect, but it goes into an infinite loop, and I don't know how to fix it. Can you help me out? I read the tutorial, but still can't figure it out. :)
RedirectMatch 301 /sibling_.+\.htm$ http://example.com/sibling_gifts.htm

I know it goes into an infinite loop because the first part of the old file names and that of the new file name are exactly the same. But I don't know how to fix the problem. Thank you very much in advance.

[edited by: jdMorgan at 1:51 pm (utc) on Aug. 17, 2005]
[edit reason] Examplified. [/edit]

jdMorgan

1:49 pm on Aug 17, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



For a case like this, you need to use mod_rewrite [httpd.apache.org] instead of mod_alias [httpd.apache.org]. This is because mod_rewrite supports conditional action, whereas mod_alias acts unconditionally. In this case, you want to do the redirect unless the requested page is already "sibling_gifts.htm":

Options +FollowSymLinks
RewriteEngine on
#
RewriteRule ^.+_basket\.htm$ http://example.com/gift_baskets.htm [R=301,L]
#
RewriteCond %{REQUEST_URI} !^/sibling_gifts\.htm$
RewriteRule ^sibling_[^.]+\.htm$ http://example.com/sibling_gifts.htm [R=301,L]

Rather than mixing mod_alias and mod_rewrite code, I have also shown the mod_rewrite replacement for your existing "_basket.htm" RedirectMatch.
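The loop and the fix can be imitated with a small Python sketch. This is an emulation under stated assumptions, not how Apache is implemented: `follow`, `redirect_unconditional`, and `redirect_conditional` are hypothetical helpers, the request path `/sibling_toys.htm` is invented, and the hop limit of 5 simply stands in for a browser giving up.

```python
import re

TARGET = "/sibling_gifts.htm"

# The RedirectMatch pattern from the question; it also matches its own target.
unconditional = re.compile(r"/sibling_.+\.htm$")

# RewriteRule pattern and RewriteCond exclusion from the rules above.
# In .htaccess the leading slash is stripped before RewriteRule sees the path,
# while %{REQUEST_URI} keeps it.
rule = re.compile(r"^sibling_[^.]+\.htm$")
exclusion = re.compile(r"^/sibling_gifts\.htm$")


def follow(path, redirect, limit=5):
    """Follow redirects until none applies or the hop limit is reached."""
    hops = 0
    while hops < limit:
        nxt = redirect(path)
        if nxt is None:
            return path, hops
        path, hops = nxt, hops + 1
    return path, hops  # limit reached: treat this as a redirect loop


def redirect_unconditional(path):
    return TARGET if unconditional.search(path) else None


def redirect_conditional(path):
    if exclusion.search(path):      # the negated RewriteCond fails, so no redirect
        return None
    local = path[1:]                # assumes the path starts with "/"
    return TARGET if rule.search(local) else None


loop_result = follow("/sibling_toys.htm", redirect_unconditional)
fixed_result = follow("/sibling_toys.htm", redirect_conditional)
print(loop_result)   # still redirecting when the hop limit is hit
print(fixed_result)  # one redirect, then it stops at the target
```

The unconditional version keeps redirecting the target page to itself, while the conditional version stops after a single hop.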

See the reference documents cited in our forum charter for more information.

Jim

wener

7:44 pm on Aug 17, 2005 (gmt 0)

10+ Year Member



Thank you.
But I have a lot of lines using a simple "Redirect 301" to redirect a single old page to a single new page. Is it okay to mix these simple Redirect 301 lines and mod_rewrite in the same .htaccess file? I am sorry, I know nothing about this stuff. Thank you so much for your help!

Webdetective

9:48 pm on Aug 17, 2005 (gmt 0)

10+ Year Member



Do I have my .htaccess file set up correctly?

ErrorDocument 404 [mydomain.com...]
AddType text/x-server-parsed-html .html .htm
RewriteEngine On
RewriteCond %{HTTP_HOST} ^mydomain\.com$ [NC]
RewriteRule ^(.*)$ [mydomain.com...] [R=301,L]

Googlebot has been frequently spidering my homepage, after I removed a large number of doorway pages.

jdMorgan

11:41 pm on Aug 17, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



wener,

You can mix mod_alias and mod_rewrite redirects if you wish. I just showed the example in case you wanted to use it.

Jim

jdMorgan

11:53 pm on Aug 17, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Webdetective,

This belongs in its own thread, but you have several problems in that code.
The first and most dangerous is that your site will not return a 404-Not Found response; it will return a 302-Found for all missing resources. Try it here [webmasterworld.com] with a made-up (missing) page URL.

See the notes about the ErrorDocument directive [httpd.apache.org] in the Apache documentation to learn why.

Another problem is that you've end-anchored your domain name, which will cause the match to fail if a user or caching proxy appends a port number to it, e.g. http://www.example.com:80
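A quick Python check illustrates the port-number problem. This is a sketch only: the host values are invented, `re.IGNORECASE` stands in for the [NC] flag, and Python's regex engine is assumed to agree with Apache's for patterns this simple.

```python
import re

# The end-anchored condition from the question, and the same pattern
# with the trailing "$" removed.
anchored = re.compile(r"^mydomain\.com$", re.IGNORECASE)
open_ended = re.compile(r"^mydomain\.com", re.IGNORECASE)

# Hypothetical Host header values.
hosts = ["mydomain.com", "MyDomain.com", "mydomain.com:80", "www.mydomain.com"]

host_results = {
    h: (bool(anchored.search(h)), bool(open_ended.search(h))) for h in hosts
}

for host, (hit_anchored, hit_open) in host_results.items():
    print(f"{host:18} anchored={hit_anchored!s:5} open_ended={hit_open}")
```

With the end anchor in place, `mydomain.com:80` is not matched, so that request would slip past the redirect to the www hostname.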

I'd rewrite the whole thing as follows:


ErrorDocument 404 /
AddType text/x-server-parsed-html .html .htm
RewriteEngine on
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

I would also suggest that for best SE results, you should not link all missing resources to your home page. For one thing, what if the missing resource is an image? For failed page requests, I'd suggest making a custom error page with links to your home page and site map.

Jim

wener

11:55 pm on Aug 17, 2005 (gmt 0)

10+ Year Member



jdMorgan, Thank you very much!

jbgilbert

12:24 am on Sep 3, 2005 (gmt 0)

10+ Year Member



Why is it that RedirectMatch is so seldom mentioned, when (to me) it is much easier to understand?

So for an example, what is the difference between using the two "redirects" (so to speak) below:

RedirectMatch 301 (.*) [anotherserver.com...]

and

RewriteRule / [anotherserver.com...] [R=301,L]

And, if you (jdMorgan) answer this question, remember I get a free copy of your book!

jdMorgan

2:28 am on Sep 3, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No book, but most of it's posted here... ;)

There is no difference in the results of the two directives, provided the RewriteRule is tweaked so that both lines of code do the same thing:


RedirectMatch 301 (.*) http://www.anotherserver.com/$1
#
RewriteRule (.*) http://www.anotherserver.com/$1 [R=301,L]

This thread started with a question about RedirectMatch, but the problem is that the Redirect family of directives in mod_alias has no capability to act conditionally. The RewriteCond directive of mod_rewrite adds this capability to RewriteRule, so mod_rewrite is better suited to complex rewriting requirements.

Sometimes it is this conditional-control advantage that causes mod_rewrite to be selected instead of mod_alias. In other cases, it's simply that the person who started the thread is asking a specific question about mod_rewrite, and we like to stay on-topic and not divert the conversation into other possible implementations.

However, simplicity is sometimes an advantage, and I see no reason to avoid Redirect and RedirectMatch if those directives are sufficient for the job.

Jim

jbgilbert

2:57 am on Sep 3, 2005 (gmt 0)

10+ Year Member



Excellent answer!

But it raises another question from me.

RedirectMatch 301 (.*) [anotherserver.com...]
RewriteRule / [anotherserver.com...] [R=301,L]

(I thought both of these, as written, said: "redirect" a request for any file on this domain to the same named file on anotherserver.com)

Is that not right?

If not, then that tells me there is a difference between
RewriteRule / [anotherserver.com...] [R=301,L]
and
RewriteRule (.*) [anotherserver.com...] [R=301,L]

?

jdMorgan

3:14 am on Sep 3, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If not, then that tells me there is a difference...

Yes, there is. And a rather big one, too.

It's time to hit the books [httpd.apache.org] on "back-references."

Redirect uses prefix-matching -- If the requested URL matches the given prefix, then the redirect is invoked. Prefix-matching is also used by robots when comparing URLs in their databases to filename prefixes specified in robots.txt.

RedirectMatch and the RewriteCond and RewriteRule directives of mod_rewrite use the far-more-powerful regular expressions pattern-matching, and also include the ability to create and use back-references. Parts of the URL matching a parenthesized subpattern in the regular-expressions pattern can be "remembered" and reused in the substitution URL.

Check out the documentation cited in our forum charter, and the threads in the Apache Forum section of the WebmasterWorld Library for more info.

Jim

jbgilbert

4:11 pm on Sep 3, 2005 (gmt 0)

10+ Year Member



Boy... blew me away with that one...

I'll do the research, but before I do please confirm what I think you said...

RewriteRule / [anotherserver.com...] [R=301,L]
AND
RewriteRule (.*) [anotherserver.com...] [R=301,L]

Are different and do not accomplish the same thing... That is what you said, right?

jdMorgan

6:35 pm on Sep 3, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No, they are not at all alike in operation.

The first one redirects any URL with a slash in it (except for the initial one, if this code is in .htaccess) to a single page -- All requests for all resources in any subdirectory on the first server will therefore get redirected to one page - the default index page on the other server. However, requests for resources in the root directory won't be redirected, because (in .htaccess) the leading slash will be stripped, and there will be no slash to match. Therefore the rule won't be applied to those requests.

The parenthesized pattern and the $1 back-reference in the second rule are what make it possible to redirect any page on one server to the same page on another server. The parentheses tell the regex parser to "save" the contents that match the parenthesized pattern into a variable named $1, and the contents of that variable are then inserted in the substitution URL. If additional parenthesized subpatterns are present in the RewriteRule, they are saved into sequentially-numbered variables $2 through $9.

Note that back-references can also be created in RewriteConds, and are assigned to variables %1 through %9. RewriteConds and RewriteRules can back-reference each other's variables.
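The difference between the two rules can be imitated in Python. This is a rough emulation, not mod_rewrite itself: `rewrite` is a hypothetical helper, `http://other.example/` is a placeholder host, the request paths are invented, and the first rule is assumed to have no back-reference in its substitution.

```python
import re


def rewrite(path, pattern, substitution):
    """Rough emulation of a RewriteRule in .htaccess that issues a redirect."""
    # In .htaccess, the leading slash is stripped before the pattern is tested.
    local = path[1:] if path.startswith("/") else path
    match = re.search(pattern, local)
    if match is None:
        return None  # rule not applied

    def expand(ref):
        # Replace $N with the Nth parenthesized subpattern, or nothing if absent.
        n = int(ref.group(1))
        return (match.group(n) or "") if n <= match.re.groups else ""

    return re.sub(r"\$([1-9])", expand, substitution)


# Pattern "/" with a fixed target: only paths that still contain a slash match.
root_file = rewrite("/about.htm", "/", "http://other.example/")
sub_file = rewrite("/docs/about.htm", "/", "http://other.example/")

# Pattern "(.*)" with $1: every path matches and is carried over.
root_kept = rewrite("/about.htm", "(.*)", "http://other.example/$1")
sub_kept = rewrite("/docs/about.htm", "(.*)", "http://other.example/$1")

for result in (root_file, sub_file, root_kept, sub_kept):
    print(result)
```

The first pair shows a root-level file left alone and a subdirectory file sent to the other server's index page; the second pair shows both paths preserved through $1.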

The initial learning curve on regular expressions and mod_rewrite is steep. There are a lot of new terms and concepts to absorb. But it's largely a matter of pattern-matching, text substitution, and server variable definitions (e.g. knowing the meanings of %{REMOTE_ADDR} and %{HTTP_HOST}, etc.). After enlightenment, it then becomes a search for coding elegance and avoidance of typos... :)

Jim

jbgilbert

7:51 pm on Sep 3, 2005 (gmt 0)

10+ Year Member



Thanks, this discussion may have also provided me with an explanation for what went wrong in the following case.

Hosting company screwed up things. Ended up with 2 hosted domains with "exact" same pages and content (original domain1 and duplicate domain2). To fix the indexing in the SEs I used the following on domain2 thinking all pages on duplicate domain2 would be redirected to the home page on original domain1

(on domain2)
Redirect permanent / [domain1.com...]

Appeared to work (as far as helping get the dupe pages removed from the SE indexes), BUT (even after 3 months) the index page of domain2 (www.domain2.com/ as shown with the site: command) IS STILL INDEXED in the SEs!

Am I interpreting what happened properly?

I think I should have used:

RedirectMatch 301 (.*) [domain1.com...]

OR

RewriteEngine ON
RewriteRule (.*) [anotherserver.com...] [R=301,L]

But then would these last two also have taken care of any directories as well as files?

Please tell me I am starting to get this...lol

jdMorgan

8:32 pm on Sep 3, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Your Redirect directive was correct, but would redirect each page on the duplicate domain to the same-named page on the "correct" domain. I think you're confusing the syntax used by Redirect versus RewriteRule.

Redirect uses prefix-matching, while RedirectMatch and RewriteRule use regular-expressions pattern-matching.
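As a rough illustration of prefix-matching, the Python sketch below imitates how Redirect appends whatever follows the matched prefix to the target URL. It is a simplification under stated assumptions: `redirect_prefix` is a hypothetical helper, `http://domain1.example/` is a placeholder host, and matching on whole path segments reflects my reading of the mod_alias documentation rather than its source.

```python
def redirect_prefix(path, prefix, target):
    """Imitate Redirect: match a URL-path prefix, then append the remainder."""
    if prefix.endswith("/"):
        matched = path.startswith(prefix)
    else:
        # Match whole path segments only: "/old" should not match "/older.htm".
        matched = path == prefix or path.startswith(prefix + "/")
    if not matched:
        return None
    return target + path[len(prefix):]


# A prefix of "/" matches every request, and the rest of the path is kept,
# so each page goes to the same-named page on the other domain.
home = redirect_prefix("/", "/", "http://domain1.example/")
page = redirect_prefix("/page.htm", "/", "http://domain1.example/")
nested = redirect_prefix("/dir/page.htm", "/", "http://domain1.example/")

# A narrower prefix only affects requests beneath it.
moved = redirect_prefix("/old/page.htm", "/old", "http://domain1.example/new")
untouched = redirect_prefix("/older.htm", "/old", "http://domain1.example/new")

for result in (home, page, nested, moved, untouched):
    print(result)
```

So a Redirect from "/" does not funnel everything to the home page; it maps each requested path onto the same path at the target.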

The major problem with search engines is that they like to see incoming links updated -- a 301 redirect from one domain to another will "take effect" faster in the search engines if some/most of the incoming links to the old domain are updated to point to the new domain. This process takes some time and effort, but a 301 by itself isn't the fastest way to get your rankings straightened out.

Please review the documentation for Redirect and RedirectMatch in mod_alias [httpd.apache.org], and RewriteRule and RewriteCond in mod_rewrite [httpd.apache.org]. This will clarify your questions about basic issues such as the matching methods used by the various directives (prefix-matching versus regular-expressions pattern matching) much faster and better than we can do it here.

Jim