Simple mod rewrite change not so simple!

ichthyous

4:05 pm on Oct 21, 2006 (gmt 0)

I am trying to get the following rule to work, and for some reason it isn't. I have URLs that are missing the trailing slash and need to have it added, for example:

/photos/new-york/architecture/page-name

to:

/photos/new-york/architecture/page-name/

The rewrite rule is adding an extra trailing slash to the end of the URL:

/photos/new-york/architecture/page-name//

Here is the code:

RewriteRule ^(photos/)?new-york/architecture/([^.]+)$ http://example.com/photos/new-york/architecture/$2/ [R=301,L]

Am I missing something?

jdMorgan

4:36 pm on Oct 21, 2006 (gmt 0)

Yes. After this rule is executed once, the redirected URL still matches, because "[^.]+" also matches the slash that was just added, so another slash will be added on the next request.

RewriteRule ^(photos/)?new-york/architecture/([^.]+)$ http://example.com/photos/new-york/architecture/$2/ [R=301,L]
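A quick way to see the repeated matching is to replay the rule against its own output. The Python sketch below is a simplified model, not Apache itself: it assumes Python's `re` treats this pattern the same way Apache's PCRE does (true for plain character classes and anchors), assumes the .htaccess file sits in the document root so the pattern sees the path without a leading slash, and uses hypothetical helper names `apply_rule` and `replay`. It stops after three hops, where a real browser would keep following redirects until it hit its own limit.

```python
import re

# Original rule from the first post. In per-directory (.htaccess) context the
# pattern is tested against the URL path without its leading slash; this sketch
# assumes the .htaccess file lives in the document root.
PATTERN = re.compile(r"^(photos/)?new-york/architecture/([^.]+)$")
HOST = "http://example.com/"


def apply_rule(path):
    """Return the 301 Location for path, or None if the rule does not match."""
    m = PATTERN.match(path)
    if m is None:
        return None
    return HOST + "photos/new-york/architecture/" + m.group(2) + "/"


def replay(path, max_steps=3):
    """Follow up to max_steps redirects and collect each Location seen."""
    seen = []
    for _ in range(max_steps):
        location = apply_rule(path)
        if location is None:
            break
        seen.append(location)
        # The browser re-requests the new URL; drop the scheme, host and
        # leading slash before the rule sees it again.
        path = location[len(HOST):]
    return seen


for url in replay("photos/new-york/architecture/page-name"):
    print(url)
# http://example.com/photos/new-york/architecture/page-name/
# http://example.com/photos/new-york/architecture/page-name//
# http://example.com/photos/new-york/architecture/page-name///
```

Each hop captures the previous trailing slash into $2 and then appends one more, so the URL never settles.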

Two solutions are possible. The first is to add a slash to the negative character group; the second is to use a RewriteCond to specifically exclude URLs that already have a trailing slash.

I'm suspicious of your existing negative group "[^.]", as it will reject any URL with a filetype (extension) on it. But leaving that aside for the moment, you can add the slash to the group, so it will also reject any URL with additional slashes following "architecture/":


RewriteRule ^(photos/)?new-york/architecture/([^./]+)$ http://example.com/photos/new-york/architecture/$2/ [R=301,L]

To explicitly exclude URLs that already have a trailing slash, use:

RewriteCond %{REQUEST_URI} !/$
RewriteRule ^(photos/)?new-york/architecture/([^./]+)$ http://example.com/photos/new-york/architecture/$2/ [R=301,L]
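The Python sketch below models the two fixes side by side so you can check that the second request no longer redirects. It is a simplified model under stated assumptions: the function name `redirect_for` and its `char_class`/`use_cond` switches are hypothetical, `request_uri` stands in for %{REQUEST_URI} (leading slash included), and the rule is assumed to live in a document-root .htaccess so its pattern sees the path without the leading slash.

```python
import re

HOST = "http://example.com/"


def redirect_for(request_uri, char_class="[^./]", use_cond=True):
    """Return the 301 Location for request_uri, or None if nothing fires.

    request_uri plays the role of %{REQUEST_URI} (leading slash included).
    char_class is the negated class used for the last path segment;
    use_cond toggles "RewriteCond %{REQUEST_URI} !/$".
    """
    # RewriteCond %{REQUEST_URI} !/$ : the leading "!" negates the test,
    # so the condition passes only when the URI does NOT end in a slash.
    if use_cond and re.search(r"/$", request_uri):
        return None
    pattern = re.compile(
        r"^(photos/)?new-york/architecture/(" + char_class + r"+)$"
    )
    # Per-directory context: the rule pattern sees the path without the
    # leading slash (assumes the .htaccess file is in the document root).
    m = pattern.match(request_uri.lstrip("/"))
    if m is None:
        return None
    return HOST + "photos/new-york/architecture/" + m.group(2) + "/"


first = redirect_for("/photos/new-york/architecture/page-name")
print(first)                                    # http://example.com/photos/new-york/architecture/page-name/
print(redirect_for("/" + first[len(HOST):]))    # None: no second hop
```

Either fix alone is enough to stop the loop: with `use_cond=False` the slashed URL still gets `None` because `[^./]+` cannot match `page-name/`, and with `char_class="[^.]"` the condition blocks it instead. Only with both removed does the second hop come back.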

Jim

ichthyous

8:47 pm on Oct 21, 2006 (gmt 0)

Thanks Jim, that worked fine...although I'm still not sure I fully grasp why. Can you explain what role the ([^.]+) negative condition plays?

jdMorgan

9:19 pm on Oct 21, 2006 (gmt 0)

Well, no I can't -- I assumed it was there because you needed it there.

In regular expressions, "[^.]+" means, "match one or more characters not equal to a period."

I added the "/", changing the pattern to "[^./]+" and making it mean, "match one or more characters not equal to a period or a slash."
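The difference between the two classes can be checked directly. In this Python sketch, `accepted` is a hypothetical helper and the sample segments are made up; `fullmatch` stands in for the part of the rule after "architecture/", where the captured group has to run all the way to the `$` anchor. It assumes Python's `re` handles these negated classes the same way Apache's PCRE does.

```python
import re


def accepted(char_class, samples):
    """Return the samples that char_class+ matches from start to end.

    fullmatch stands in for "...architecture/(<class>+)$" in the rule,
    where the captured group must reach the end of the path.
    """
    pattern = re.compile(char_class + "+")
    return [s for s in samples if pattern.fullmatch(s)]


# Hypothetical last-segment values, chosen only to illustrate the two classes.
SAMPLES = ["page-name", "page-name/", "page.html", "sub/page", "2"]

print(accepted("[^.]", SAMPLES))   # ['page-name', 'page-name/', 'sub/page', '2']
print(accepted("[^./]", SAMPLES))  # ['page-name', '2']
```

Note that a bare page number such as `2` passes both classes, since digits are neither periods nor slashes.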

That is why it's critical to understand regular expressions and all of the Apache directives you use: Every single character in your .htaccess file can potentially shoot down your site or cause subtle but dangerous (to your income) problems with search engine spiders if it is incorrect. And only you can 'sign off' on the code as being 100% correct for your needs, based on your URLs, on your site. There is no such thing as 'one size fits all' code, and all we can really do here is talk in generalities about the most common applications.

Jim

ichthyous

9:49 pm on Oct 22, 2006 (gmt 0)

I see. I was reading it correctly, but didn't really understand why it was being excluded. I usually test each page to see how it is reacting and whether a proper 301 followed by a 200 OK is returned. The sheer number of pages I have needed to redirect recently has necessitated adding a large number of these rules. It can be very easy to lose track!

ichthyous

5:46 pm on Oct 23, 2006 (gmt 0)

Jim,

Unfortunately I didn't discover a major error that the above code caused until this morning. It also adds the trailing slash to page numbers which shouldn't have them...for example it converted:

http://example.com/photos/architecture/2
to
http://example.com/photos/architecture/2/

I understand why it's doing that: ([^./]+) will match any run of characters, digits included, that contains no period or slash. How would I rewrite this to say "match any character NOT a period, slash, or number"? Very few of my page names have a number in them, so that would correct the first problem and also prevent the slash from being added after numbers. I have given it a shot here:

([^./[0-9]]+)

Can this be nested in this way?

Thanks!

jdMorgan

6:12 pm on Oct 23, 2006 (gmt 0)

> ([^./[0-9]]+)

> Can this be nested in this way?

No, but it need not be nested at all...

([^./0-9]+)

should work.
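To see why the nested form fails and the flat one works, the two patterns can be compared in Python. This assumes Python's `re` parses these classes the same way Apache's PCRE does (in both, a `[` inside a class is just a literal character); `fullmatch` stands in for the anchored capture group in the rule, and the sample strings are made up.

```python
import re

# Inside [...], "[" is an ordinary literal, so the attempted nesting parses as
# one character not in { . / [ 0-9 } followed by one or more literal "]".
NESTED_ATTEMPT = re.compile(r"[^./[0-9]]+")
# Jim's version: a single negated class listing the period, the slash and
# the digit range 0-9 side by side.
FLAT = re.compile(r"[^./0-9]+")

for s in ["page-name", "2", "page2", "page-name/", "a]"]:
    print(repr(s), bool(NESTED_ATTEMPT.fullmatch(s)), bool(FLAT.fullmatch(s)))
```

`page-name` passes only the flat class, while `2`, `page2` and `page-name/` fail both; the nested attempt only accepts oddities such as `a]`. The `page2` case is the trade-off mentioned above: any page name containing a digit will no longer get a slash added.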

Jim

ichthyous

2:03 pm on Oct 24, 2006 (gmt 0)

Thanks Jim, that did work...but now I have to correct for all the page numbers that were being requested in error and getting a 404. I want to check this code before I post it. What I want the code below to say is "take any URL that ends with a number AND a slash and redirect it to the same URL without the slash." However, I am still uncertain where the slash should go in the second group: inside or outside the parentheses?

RewriteCond %{REQUEST_URI} !/$
RewriteRule ^(photos/)?cityscapes-skylines/([0-9]+)/$ http://example.com/photos/cityscapes-skylines/$2 [R=301,L]

Also I was wondering...wouldn't it be better if the page numbers DID end in a slash? I have read that URLs not ending in trailing slashes can cause problems sometimes. Thanks!

jdMorgan

2:21 pm on Oct 24, 2006 (gmt 0)

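You don't need the RewriteCond, which was the opposite of what you wanted anyway: "!" means NOT, so "!/$" is only true when the URI does not end in a slash, while the rule only matches URIs that do. The slash goes outside the parentheses, so it is matched but not captured into $2.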

# Strip trailing slash from /cityscapes-skylines/<number>/ and /photos/cityscapes-skylines/<number>/ URLs,
# and redirect to example.com/photos/cityscapes-skylines/<number> as the canonical URL
RewriteRule ^(photos/)?cityscapes-skylines/([0-9]+)/$ http://example.com/photos/cityscapes-skylines/$2 [R=301,L]
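Here is a small Python model of this rule, including a switch for the RewriteCond that was dropped, to show why it would have blocked every redirect. The function name `strip_slash` and its `with_user_cond` flag are hypothetical; the sketch assumes Python's `re` matches this pattern like Apache's PCRE and that the .htaccess file sits in the document root, so the rule sees the path without its leading slash.

```python
import re

# Jim's rule: ([0-9]+) captures the page number; the trailing slash sits
# outside the parentheses, so it is matched but not copied into $2.
RULE = re.compile(r"^(photos/)?cityscapes-skylines/([0-9]+)/$")
HOST = "http://example.com/"


def strip_slash(request_uri, with_user_cond=False):
    """Return the 301 Location for request_uri, or None if it is left alone.

    with_user_cond adds the dropped "RewriteCond %{REQUEST_URI} !/$",
    which only passes when the URI does NOT end in a slash.
    """
    if with_user_cond and re.search(r"/$", request_uri):
        return None
    # Assumes the .htaccess file is in the document root, so the rule sees
    # the path without its leading slash.
    m = RULE.match(request_uri.lstrip("/"))
    if m is None:
        return None
    return HOST + "photos/cityscapes-skylines/" + m.group(2)


print(strip_slash("/cityscapes-skylines/2/"))
print(strip_slash("/photos/cityscapes-skylines/2/", with_user_cond=True))
```

The first call yields the canonical slashless URL and the second yields `None`: every URI this rule can match ends in a slash, so the negated condition always fails. The slashless target does not match the rule again, so there is no loop.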

> Also I was wondering...wouldn't it be better if the page numbers DID end in a slash? I have read that URLs not ending in trailing slashes can cause problems sometimes. Thanks!

Piffle. A URL not ending in a slash refers to a file. A URL ending in a slash refers to the index page of the specified (sub)directory. But once you start rewriting URLs, it's six of one, half-dozen of the other, and largely a matter of style.

The only reason that the slashless URLs might be preferred is that --in a non-rewritten environment-- the slash approach would require that each slashed 'page' be the index file of its own directory, thus requiring a whole lot of directories with (possibly) only the index file in them. But if you're rewriting URLs, this no longer applies, since URLs and filepaths then become largely independent. Other than that, I prefer the slashless page URLs only because they are one character shorter.

Jim

ichthyous

3:50 pm on Oct 24, 2006 (gmt 0)

This worked great...thanks for the clarification on the trailing slash issue.