Forum Moderators: phranque

Excluding directories and files from anything


csdude55

6:56 pm on Apr 9, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The logic here seems right to me, but I wanted to double check...

Am I correct that I could list directories to be excluded from all RewriteRules at the top of my .htaccess (after RewriteEngine on, of course), and then not need a RewriteCond later to exclude them? E.g.,

# I never want anything to match example.com/includes, etc
RewriteCond %{REQUEST_URI} ^/(includes|images|cgi-bin)/
RewriteRule ^ - [L]

lucy24

8:18 pm on Apr 9, 2020 (gmt 0)


Yes, that is conceptually appropriate, provided you
(a) don’t need to apply mod_rewrite-based access controls to any of these directories (or you place this new rule after all the [F] rules, before the [R] rules)
and
(b) want to exempt all these directories from canonicalization redirects

I say “conceptually appropriate” because I really don’t see the need for a RewriteCond at all. It could just as well be
RewriteRule ^(includes|images|cgi-bin)/ - [L]
listing the three directories in order of frequency of access if there's a significant difference among the three.

Edit: Is it too late to rename your /includes/ directory? One of my abiding regrets is that I didn’t think to give it some weird one-off name that malign robots would never request as a matter of routine.

csdude55

5:35 am on Apr 10, 2020 (gmt 0)


Good point on the RewriteCond! I guess I really only need one when I'm using a negative, an [OR], or need to test something other than REQUEST_URI.
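
For my own future reference, those three cases would look something like this (a sketch -- the directory names, hostnames, and parameters are made up):

```apache
# Negative test: everything except /static/ goes through the front controller
RewriteCond %{REQUEST_URI} !^/static/
RewriteCond %{REQUEST_URI} !^/index\.php
RewriteRule ^ /index.php [L]

# [OR] across two conditions
RewriteCond %{HTTP_HOST} ^old-example\.com$ [OR]
RewriteCond %{HTTP_HOST} ^www\.old-example\.com$
RewriteRule ^ https://www.example.com%{REQUEST_URI} [R=302,L]

# Testing something other than REQUEST_URI
RewriteCond %{QUERY_STRING} (^|&)debug=1(&|$)
RewriteRule ^ - [F]
```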

Is it too late to rename your /includes/ directory?

Nope, not too late at all. Thanks for the tip, that never even crossed my mind!

w3dk

7:54 am on Apr 10, 2020 (gmt 0)


Good point on the RewriteCond! I guess I really only need those when I'm using a negative, ....


You can use "a negative" with the RewriteRule pattern as well.
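
For example (a sketch; the directory names are made up):

```apache
# A negative match in the RewriteRule pattern itself, via a lookahead:
# rewrite everything under /app/ EXCEPT /app/static/
RewriteRule ^app/(?!static/) /index.php [L]
```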

csdude55

8:25 pm on Apr 10, 2020 (gmt 0)


I was thinking more along these lines:

RewriteCond %{REQUEST_URI} !^/example/(foo|bar) [NC]
RewriteRule ^example/([a-z-]+)/?([a-z-]+)?/?$ /example/list.php?var1=$1&var2=$2 [NC,QSA,NE,L]

I'm not sure how to write it without the RewriteCond... this is what I originally thought would work, but doesn't match:

RewriteRule ^example/(!(foo|bar))/?([a-z-]+)?/?$ /example/list.php?var1=$1&var2=$2 [NC,QSA,NE,L]

After giving it some thought, though, this almost works:

RewriteRule ^example/(?!foo|bar)/?([a-z-]+)?/?$ /example/list.php?var1=$1&var2=$2 [NC,QSA,NE,L]

This captures $1 the way I'm expecting, but not $2. So this:

blah.com/example/foo

rewrites as expected, to:

blah.com/example/list.php?var1=foo

But this:

blah.com/example/foo/blip

doesn't match at all. Where am I messing up?

I guess that's not "exactly" the same, anyway... the original would only match $1 if the string is a-z or -, while the second would also match "12345".

lucy24

9:36 pm on Apr 10, 2020 (gmt 0)


If you’re capturing AND you need a negative, then yes, a RewriteCond is definitely warranted. But in the OP it looked like a flat-out [L], target - (nothing), take-no-more-action.

RewriteRule ^example/(?!foo|bar)/?([a-z-]+)?/?$ /example/list.php?var1=$1&var2=$2 [NC,QSA,NE,L]

This captures $1 the way I'm expecting, but not $2.
That's because, in the rule as written, there is no $2. A lookahead (or lookbehind) doesn’t capture. At this point it's probably a little less confusing if you start by writing the rule without thinking of the exceptions, and then once you've got the rule formulated properly--right now it’s got altogether too many ? for my taste--you can then reinsert the (?!foo|bar) lookahead if desired.
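
To sketch what I mean (assuming the URL shape you're after is /example/cat or /example/cat/subcat):

```apache
# Step 1: formulate the rule with no exceptions
RewriteRule ^example/([a-z-]+)(?:/([a-z-]+))?/?$ /example/list.php?var1=$1&var2=$2 [NC,QSA,NE,L]

# Step 2: reinsert the lookahead -- it asserts without consuming,
# so the capturing groups still land in $1 and $2
RewriteRule ^example/(?!foo|bar)([a-z-]+)(?:/([a-z-]+))?/?$ /example/list.php?var1=$1&var2=$2 [NC,QSA,NE,L]
```

The (?:...) around the second path segment is non-capturing, which is why the second value still comes out as $2.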

Will this rule ultimately, when everything is sorted out, be in htaccess or config? It may make a difference in speed and efficiency, since Regular Expressions in htaccess have to be compiled over again on every single request, while in config the server learns them and remembers them.

csdude55

10:37 pm on Apr 10, 2020 (gmt 0)


But in the OP it looked like a flat-out [L], target - (nothing), take-no-more-action.

THAT part is just a target -, yes. But w3dk replied to my later comment that I would need a RewriteCond for a negative match... so the discussion evolved a little and is now more or less off topic :-(

That's because, in the rule as written, there is no $2. A lookahead (or lookbehind) doesn’t capture.

After much poking and prodding, I've found this to work in the tester:

RewriteRule ^example/(?!foo|bar)/?([a-z-]+)?/? /example/list.php?cat=$1&subcat=$3 [NC,QSA,NE,L]

I'm confused by it because (a) I thought it would match with (?!(foo|bar)) but not (?!foo|bar); and (b) why $3 instead of $2?

Will this rule ultimately, when everything is sorted out, be in htaccess or config? It may make a difference in speed and efficiency, since Regular Expressions in htaccess have to be compiled over again on every single request, while in config the server learns them and remembers them.

Right now I'm running my server in the red all day long, and my storage is at around 96%! So as soon as I get the rebuild finished I plan to set up a new server. Once I do that I'll put it in config, but for now it'll be in the htaccess. I'm hoping that it will be a mostly seamless copy-and-paste to the config, with the exception of adding the / to the beginning of the rules.

lucy24

1:30 am on Apr 11, 2020 (gmt 0)


This isn't the same tester that thinks mod_rewrite behaves like mod_alias, is it?

I would test it by making it into a redirect--no need for [R=301] flag, just put the full protocol-and-domain in the target--and see what you get in the address bar. (In this situation, a temporary redirect is what you'd want anyway, to make sure the browser doesn't remember it.)

From my test site, using rule
RewriteRule ^example/(?!foo|bar)/?([a-z-]+)?/? https://www.example.com/example/list.php?cat=$1&subcat=$3 [NC,QSA,NE,L]

Request for /example/foolist >> rule does not deploy (the "foo" inside the lookahead matches, so the negative assertion fails)

Request for /example/otherlist
>>
long pause ending up in browser error because there's nothing to prevent the request from getting redirected over and over. If this were instead done as an internal rewrite, there would be a server error on essentially the same grounds, except it would happen faster.

The obvious cause is that the pattern in the rule has no closing $ anchor, so it also matches the rewritten /example/list.php itself, and the redirect fires again on every round trip. With the closing anchor
Request for /example/otherlist
>>
https://www.example.com/example/list.php?cat=otherlist&subcat=
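
Even with the anchor in place, an explicit guard is cheap insurance whenever the rewrite target lives under the same directory (a sketch):

```apache
# Refuse to touch the rewrite target itself, whatever else matches
RewriteCond %{REQUEST_URI} !^/example/list\.php
RewriteRule ^example/([a-z-]*)/?$ /example/list.php?cat=$1 [QSA,L]
```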

Incidentally, why is it
([a-z-]+)?
rather than
([a-z-]*)
? Seems like they would be identical, at a savings of one byte ;)

There remains the issue of
RewriteRule ^example/(?!foo|bar)/?([a-z-]+)?/?
which, omitting the lookahead, has the pattern
^example//?([a-z-]+)?/?
in other words, an optional second slash--which wouldn't be recognized in a pattern anyway, only in a RewriteCond. (I know this because I once made a mistake in a link, and had to figure out how to fix it because the Googlebot discovered the erroneous URLs before I did.)
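
Concretely, putting the lookahead flush against the capture and dropping that stray /? gives (a sketch):

```apache
# No optional slash between the lookahead and the capture
RewriteRule ^example/(?!foo|bar)([a-z-]*)/?$ /example/list.php?cat=$1 [NC,QSA,NE,L]
```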

Along the way, I also learned that one effect of [QSA] is that the server will happily append multiple parameters with the same name. This is probably not a problem in real life.

csdude55

7:37 pm on Apr 11, 2020 (gmt 0)


This isn't the same tester that thinks mod_rewrite behaves like mod_alias, is it?

Haha, yeah... do you know a better tester?

You can see what I tested with, though:

[htaccess.madewithlove.be?share=b3eed0af-c3be-5a14-8306-da20953cb677...]

The obvious cause is that the pattern in the rule does not have a $ closing anchor, making it match all requests all the time.

I'm not 100% on this, but I think that the lack of a $ was the problem (or "one" problem) causing the infinite loop I had before. But that stemmed from my misconception that this:

RewriteRule ^foo /bar [L]

would effectively rewrite as:

example.com/foo
=> example.com/bar

example.com/foo/blah.php
=> example.com/bar/blah.php

example.com/foo/blah.php?var=whatever
=> example.com/bar/blah.php?var=whatever
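
What I was imagining would actually need a capture to carry the rest of the path across, something like this (a sketch):

```apache
# Capture the remainder of the path and re-append it
RewriteRule ^foo(/.*)?$ /bar$1 [L]
# /foo          -> /bar
# /foo/blah.php -> /bar/blah.php
# the query string is carried over automatically when the target has no "?"
```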

Incidentally, why is it
([a-z-]+)?
rather than
([a-z-]*)
? Seems like they would be identical, at a savings of one byte ;)

Haha, just a leftover from one of my attempts that didn't include the ? :-P Good catch, eagle eye!

lucy24

8:15 pm on Apr 11, 2020 (gmt 0)


do you know a better tester?
Well, yes: Your own site. There’s no substitute for trying things out on your own server, with your own configuration, your own filenames. I maintain a test site for just this reason: for the cost of a domain-name registration, I can experiment fully and at length, with no risk of harming any “real” site.

w3dk

12:01 am on Apr 12, 2020 (gmt 0)


...and (b) why $3 instead of $2?


Bug in the tester. It also "works in the tester" with $2 (as it does with $6, $7 and $8!), which just highlights more bugs: $2 to $9 should always be empty, as there is at most one capturing group in the preceding regex.

EDIT: It looks like lucy24 already covered this in an earlier post.

csdude55

4:20 am on Apr 12, 2020 (gmt 0)


Well, yes: Your own site. There’s no substitute for trying things out on your own server, with your own configuration, your own filenames. I maintain a test site for just this reason: for the cost of a domain-name registration, I can experiment fully and at length, with no risk of harming any “real” site.

There are two reasons why I hate doing that:

1. I don't always trust the results. For example, with the redirect issue I was having earlier (where I expected a rewrite), even after I uploaded what I thought was a fix, it didn't work. But I found that if I added a dummy variable to the address then it worked fine. For whatever reason, my browser had cached the R=301, so it was still doing it without even looking at the htaccess... but sending the dummy variable made it refresh.

So whenever I upload a new htaccess, I have to add ?z=1, ?z=2, ?z=3, etc, just to make sure I'm getting the real results.

2. I'm sure most of you code directly via SSH, but I still struggle with that. So for me, I code in Notepad++, then upload the file to the server via FTP. This is normally fine, but for whatever reason my computer won't save a file named .htaccess because of the leading dot.

Instead, I end up coding in Notepad++, copy the entire file, then go to Filezilla, right-click .htaccess, View/Edit, wait for it to open, select all, paste the new, then save, then upload. Then go back to the browser, refresh, get an ISE, then back to Notepad++ to see what's up.

It would be a hundred times easier if there was a well-functioning site where I could just copy in the code, click a button, and it tells me what errors are there. Like this, but for Apache:

[sandbox.onlinephpfunctions.com...]

not2easy

4:35 am on Apr 12, 2020 (gmt 0)


There are options in most OSes and apps to show and/or save "hidden" or "system" files such as .htaccess. I'd check the options in Notepad++ because I am certain I was able to work on htaccess files from within that free app on Windows machines and use FTP to replace the previous version. (Saving a copy just in case, of course.)

lucy24

5:16 am on Apr 12, 2020 (gmt 0)


For whatever reason, my browser had cached the R=301
Yes, browsers cache permanent redirects. They’re supposed to. That’s why experiments should be done with a temporary redirect (R=302, R alone, or no flag at all).
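
Any of these forms is temporary, i.e. not cached permanently (pick one; the paths here are placeholders):

```apache
RewriteRule ^old-page$ /new-page [R=302,L]
RewriteRule ^old-page$ /new-page [R,L]                      # R alone defaults to 302
RewriteRule ^old-page$ https://www.example.com/new-page [L] # a full URL in the target forces a redirect, 302 by default
```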

I'm sure most of you code directly via SSH
Nope, not me. I do all my text editing, for all purposes--including HTML--in SubEthaEdit. htaccess files are saved locally with names like htaccess_sitename, and then get renamed .htaccess with leading dot when uploading. Locally I do have a file that is actually called .htaccess, one for each site, but that's only for use with MAMP; it doesn’t have all the content of my actual htaccess file, since it obviously doesn’t need things like access control.

As not2easy says, there are platform-specific variations. I do uploading/downloading via Fetch (SFTP), which has an option for making .files visible, and includes an Edit command where you specify what application to use. So it’s SubEthaEdit either way. But the only .htaccess I edit dynamically in this way is the one for my test site; for everything else there's a permanent copy on my HD that is identical to the real one in everything but name.