Enforce trailing slash on only certain urls?

         

ichthyous

9:48 pm on Nov 6, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I currently have the following rewrite rule in place to enforce a trailing slash on old URLs, but I need certain URLs excluded from the pattern:

### Enforce trailing slash
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !(.*)/$
RewriteRule ^(.*)$ http://examplesite.com/$1/ [QSA,L,R=301]


I need to exclude all URLs from a paginated series, i.e. those that end in a slash followed by one or two digits, for example: examplesite.com/category/subcategory/12
URLs like this, with pagination running up to about page 25, should be excluded from the rule. Thanks for any help!

robzilla

12:28 pm on Nov 7, 2016 (gmt 0)

Not sure how your pagination works, but this modification also excludes URIs that end with digits after the trailing slash:

RewriteCond %{REQUEST_URI} !(.*)/\d*$

ichthyous

7:36 pm on Nov 11, 2016 (gmt 0)

Thanks Robzilla, that worked. However, I discovered that it also added trailing slashes to any file names ending in .html, so those pages started to break. How can I exclude certain file names ending in .htm, .html, .pdf, etc.?

robzilla

9:59 pm on Nov 11, 2016 (gmt 0)

To exclude all files with an extension, you could use another condition, like so:
RewriteCond %{REQUEST_URI} !\.[a-z0-9]*$ [NC]

i.e. do not match (!) any request that ends ($) with a dot (\.) followed by zero or more (*) instances of a letter or number ([a-z0-9]), and ignore case differences ([NC]).

Hmm. Actually, you already have a condition that does the same in the form of:
RewriteCond %{REQUEST_FILENAME} !-f

i.e. do not match if the file exists on the server.

Turns out that as of Apache 2.2 (I'm an nginx user myself), you need to write this like so:
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} !-f

So try that first.
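
A caveat worth checking before trying that: as far as the mod_rewrite documentation goes, REQUEST_FILENAME already holds the full filesystem path when the rules run in .htaccess or a <Directory> block, and only falls back to the URL path in server or virtual-host context. A minimal sketch of the two forms, assuming the rules in this thread live in .htaccess (the second form is only an assumption about what a main-config setup would need):

```apache
# .htaccess / <Directory> context: REQUEST_FILENAME is already a
# filesystem path, so no DOCUMENT_ROOT prefix is needed.
RewriteCond %{REQUEST_FILENAME} !-f

# Server or <VirtualHost> context (hypothetical, if the rules were moved
# to the main config): the URL has not been mapped to a file yet, so the
# path is built explicitly from the document root and the URL path.
# RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
```

Prefixing DOCUMENT_ROOT in .htaccess would produce a path that does not exist, so the "not a file" test would pass for every request.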

ichthyous

5:37 pm on Nov 12, 2016 (gmt 0)

Something is not working. If I change
RewriteCond %{REQUEST_FILENAME} !-f

to
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} !-f

Then everything breaks and all pages redirect to the home page

If I use this
RewriteCond %{REQUEST_URI} !\.[a-z0-9]*$ [NC]

instead of this
RewriteCond %{REQUEST_URI} !(.*)/$


I get double trailing slashes at the end of normal URLs that I don't want affected at all, i.e. URLs that already end in a trailing slash, as opposed to .html, .pdf, or a number like /15.

Can you post the entire code? Maybe I'm missing something. Thanks!

robzilla

9:19 pm on Nov 12, 2016 (gmt 0)

It's not a replacement of the other condition but an additional one:
RewriteCond %{REQUEST_URI} !\.[a-z0-9]*$ [NC]
RewriteCond %{REQUEST_URI} !(.*)/$
RewriteRule ^(.*)$ http://example.com/$1/ [QSA,L,R=301]

ichthyous

10:13 pm on Nov 12, 2016 (gmt 0)

Ok, I thought that might be it. I merged them from the previous posts, and this works to exclude pages ending in numbers and pages ending in .html, while adding the trailing slash to URLs that don't have one:

### Enforce trailing slash for urls missing one, except page numbers and .html
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !\.[a-z0-9]*$ [NC]
RewriteCond %{REQUEST_URI} !(.*)/\d*$
RewriteRule ^(.*)$ http://example.com/$1/ [QSA,L,R=301]


Thanks for the help, all working now!

robzilla

9:52 am on Nov 13, 2016 (gmt 0)

Right, sorry, I unthinkingly copied the RewriteCond from your last post, which was your original condition, when I should have copied the corrected one from my first post here. Can't edit it anymore, unfortunately, but I'm glad you caught that yourself :-)

lucy24

9:15 pm on Nov 16, 2016 (gmt 0)

Tangentially... If your URLs never contain literal periods (hostname and extension don't count), you can trim this down a good deal by expressing the rule as something like
RewriteRule ^([^.]+[^/.])$ http://example.com/$1/ [QSA,L,R=301]

with no conditions except the one about URLs ending in /\d+ (slash plus numbers). Notably you can get rid of the CPU-greedy !-f test, because real files in practice will always contain a period. (They're not required to, but who in their right mind would use extensionless names for real, physical files?)

Though periods in URLs are perfectly legal, many things become vastly easier if you don't use them. That's assuming your name is not apache.org or similar.

robzilla

10:27 pm on Nov 16, 2016 (gmt 0)

Good point about the unnecessary file check, and the ^(.*)$ rule does seem a bit silly now.

RewriteCond %{REQUEST_URI} !.*/\d*$
RewriteRule ^([^\.]+)$ http://example.com/$1/ [QSA,L,R=301]

I took the liberty of escaping your periods, and removed the second clause(?) from the RewriteRule referring to a period or forward slash at the end of the URL because the RewriteCond already checks the forward slash and who in their right mind would put a period at the end of a URL? ;-)

lucy24

7:18 pm on Nov 17, 2016 (gmt 0)

Periods inside grouping brackets don't need to be escaped :) (neither do parentheses, and possibly a couple of other things that I've gone blank on* at the moment) though the escape isn't actively harmful. Otherwise you'd be saying "a group of anything" which is obviously pointless.

The final [^./] is essential because a forward-slash meets the condition [^.] and the rule is intended to match only requests that don't already end in a slash. Never put something in a condition that can go in the body of a rule (because it saves the server from having to evaluate conditions in the first place). Well, almost never.

It later occurred to me that if pagination is the only situation where your URLs would ever end in a numeral, you could then eliminate conditions entirely (yay!) by expressing the pattern as
^([^.]+[^./\d])$
But this is only safe if "no final numerals" is reliably true for your whole site all the time.
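
As a rough sketch, the condition-free version might look like this in .htaccess. The host example.com is a stand-in, as elsewhere in the thread, and the flags are simply carried over from the earlier rules:

```apache
RewriteEngine On
# No RewriteCond: the pattern itself rejects any path that contains a
# period, or that ends in a slash, a period, or a digit.
RewriteRule ^([^.]+[^./\d])$ http://example.com/$1/ [QSA,L,R=301]
```

With this pattern, category/subcategory would be redirected to category/subcategory/, while category/subcategory/12, category/subcategory/ and page.html would be left alone. The side effect is that a hypothetical URL such as product-2 would never get its slash either, which is the "no final numerals" assumption in action.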

"who in their right mind would put a period at the end of a URL? ;-)"

Nobody, I hope ;) but the issue here is about ending requests with a period. If they do, it's a garbage request that's best handled as-is rather than being gratuitously redirected. The likeliest scenario is when someone else's site auto-generates links from quoted URLs, and the whole thing comes at the end of a sentence, whose final . is then included in the <a href> blahblah--as can happen on this very site. If this happens a lot, you're better off expressing the pattern as ^([^.]+[^./])\.?$ But that takes us into the realm of coding for every conceivable bad request that could ever happen anywhere, which most people don't need to do.


* I first wrote "that escaped me" but fortunately the brain stepped in to intervene.

robzilla

8:14 pm on Nov 17, 2016 (gmt 0)

"Periods inside grouping brackets don't need to be escaped"

Didn't know that, but makes sense. Saves me from adding unnecessary complexity!

"Never put something in a condition that can go in the body of a rule (because it saves the server from having to evaluate conditions in the first place)."

If you don't use a RewriteCond for a RewriteRule, won't the server still have to evaluate the conditions of the RewriteRule (that you would otherwise put in a RewriteCond) for each request? What's the difference?

What I think you propose (and assuming "no final numerals" whatsoever is not reliably true):
RewriteCond %{REQUEST_URI} !.*/\d+$
RewriteRule ^([^.]+[^/.])$ http://example.com/$1/ [QSA,L,R=301]

What I propose:
RewriteCond %{REQUEST_URI} !.*/\d*$
RewriteRule ^([^.]+)$ http://example.com/$1/ [QSA,L,R=301]

The latter makes more sense to me because the RewriteRule will not be evaluated for requests ending in a forward slash; i.e. it's more restrictive in what it passes on to RewriteRule processing.

lucy24

6:20 pm on Nov 18, 2016 (gmt 0)

"What's the difference?"

One evaluation vs. two or more. No matter what, the server always has to check the body of the rule. If the pattern given in the body doesn't match the request, then the server doesn't even look at conditions, thanks to mod_rewrite's unique and distinctive* "two steps forward, one step back" structure.

The main exception is when you've got a complicated capture like
((blahblah/)*somespecificblahblah)$
especially when coupled with one or more conditions that will usually fail. In this case, it may be less server-intensive to express the rule as "somespecificblahblah$" and defer the capture to the last condition, making it %1 instead of $1 in the target. That's especially true if the whole thing is happening in htaccess, where Regular Expressions have to be recompiled on every request. (I actually use this pattern for index redirects, so the server doesn't have to waste time on captures that will end up being thrown away.) I can't remember if the original query in this thread was htaccess or config.
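
A rough sketch of that deferred-capture idea, using an index redirect as the example. The file name index.html, the host example.com, and the exact THE_REQUEST pattern are illustrative assumptions, not necessarily the rule actually in use:

```apache
# The rule pattern only checks the cheap, specific tail of the path.
# The directory prefix is captured in the condition instead, so the
# target uses %1 (condition capture) rather than $1 (rule capture).
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/((?:[^/?\s]+/)*)index\.html[?\s]
RewriteRule index\.html$ http://example.com/%1 [L,R=301]
```

For a request line such as "GET /a/b/index.html HTTP/1.1" the condition captures "a/b/", so the redirect goes to http://example.com/a/b/. Requests that do not end in index.html never reach the condition at all.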

"the RewriteRule will not be evaluated for requests ending in a forward slash"

Why not? Where were the slashes excluded?

Incidentally ... initial .* or .+ is only necessary when there's to be a capture. So it's sufficient to say
RewriteCond %{REQUEST_URI} !/\d+$

unless the site has URLs in the form "example.com/123" and-that's-all. And if it did, you'd have to say . (better yet [^/] to exclude erroneous double-slashes) rather than .* in the condition.

Oh, and finally: The flag [QSA] is only needed when the target itself includes a query string. Otherwise the original query string is passed through unchanged by default.


* Euphemism for PITA.
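
Pulling the points from this post together, the rule might end up looking something like the sketch below. It assumes .htaccess context and uses example.com as a stand-in host, as elsewhere in the thread; the comments mark the order in which mod_rewrite evaluates each part.

```apache
RewriteEngine On
# Evaluated second, and only when the RewriteRule pattern below has matched.
# No leading .* is needed because nothing is captured here.
RewriteCond %{REQUEST_URI} !/\d+$
# Evaluated first: no periods anywhere in the path, and no final slash.
# QSA is omitted because the target carries no query string of its own.
RewriteRule ^([^.]+[^/.])$ http://example.com/$1/ [L,R=301]
```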

robzilla

11:25 pm on Nov 18, 2016 (gmt 0)

That's interesting. I assumed the RewriteCond served as a kind of if() statement. If these conditions are met, (only) then consider this RewriteRule. I would be inclined to think the most selective condition ought to be applied first, but I recognize the risk of unnecessary repetition.

"Why not? Where were the slashes excluded?"

In !/\d*$ (that's a regex, not a curse word), i.e. do not match requests ending in a trailing slash or a trailing slash followed by any number of digits, but that was assuming the conditions would be processed top-to-bottom.

lucy24

7:43 pm on Nov 19, 2016 (gmt 0)

"I assumed the RewriteCond served as a kind of if() statement. If these conditions are met, (only) then consider this RewriteRule."

Well, that's what any reasonable person would assume ;) But it happens not to be the way mod_rewrite works. (Another place you see the "two steps forward, one step back" is in inheritance from one directory/htaccess to another. Rules from higher/earlier directories are only invoked if not superseded by rules from later/deeper directories.)

Edit: That's assuming 2.2. In 2.4 there's also an InheritBefore option. But "after" is still the default.

"In !/\d*$ (that's a regex, not a curse word),"

:: insert witticism about lack of clear difference between the two ::

Oh, I see, that's why the condition said \d* rather than \d+. D'oh! But if you put the non-slash part into the body of the rule, it's still more expeditious.

robzilla

8:45 pm on Nov 19, 2016 (gmt 0)

"But if you put the non-slash part into the body of the rule, it's still more expeditious."

Fair enough.

Good talk! ;-)