htaccess filesmatch - can't figure out expression

htaccess regex filesmatch noindex nofollow

     
7:47 pm on Apr 22, 2015 (gmt 0)

Preferred Member

10+ Year Member

joined:May 21, 2004
posts: 385
votes: 5


noindex and nofollow on all php, htm, and html files :

<FilesMatch "\.(php|html?$">
<IfModule mod_headers.c>
Header set X-Robots-Tag "noindex, nofollow"
</IfModule>
</FilesMatch>


But how can I exclude /index.php and /forgot.php, in my public_html folder only?

I came up with something like this on regexr.com, but I don't think the domain is part of what FilesMatch tests, so I am at a loss as to how to accomplish this.

(?!www.mysite.com\/((index\.php)|(forgot\.php)))(^.*\.(php|html?))$


The reason for this is that I found Google was indexing files I do not want indexed, aside from the two I want to exclude, as this is a user-login type of site. Password protecting the directory is not an option.
9:33 pm on Apr 22, 2015 (gmt 0)

Full Member

Top Contributors Of The Month

joined:Apr 11, 2015
posts: 328
votes: 24


What you could do is set an environment variable if one of your "excluded" files is requested (using mod_rewrite) and only set the header if that environment variable is not set.

For example:


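# Enable the rewrite engine (harmless if it is already on earlier in the file)
RewriteEngine On
# Flag the two root-level files; "-" means the URL itself is left unchanged
RewriteRule ^(index|forgot)\.php$ - [E=exclude:1]
<FilesMatch "\.(php|html?)$">
# Send the header unless the "exclude" variable was set above
Header set X-Robots-Tag "noindex, nofollow" env=!exclude
</FilesMatch>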


(Incidentally, you were missing a closing parenthesis on your FilesMatch regex and I don't think you really need the IfModule test?)

[edited by: whitespace at 10:24 pm (utc) on Apr 22, 2015]

10:02 pm on Apr 22, 2015 (gmt 0)

Preferred Member

10+ Year Member

joined:May 21, 2004
posts: 385
votes: 5


I was just looking at the environment stuff before dinner... looks like a great solution! Now, that rewrite rule would be :

example.com/index.php and example.com/forgot.php only, correct? Not example.com/test/index.php?

I have always added the IfModule out of habit, but yeah, I guess you are right that it is not needed, regardless of whether the module is loaded or not.
10:44 pm on Apr 22, 2015 (gmt 0)

Full Member

Top Contributors Of The Month

joined:Apr 11, 2015
posts: 328
votes: 24


example.com/index.php and example.com/forgot.php only, correct? Not example.com/test/index.php?


Yes, providing these rules are in the .htaccess file in your document root. The ^ indicates the start of the string/URL. (Note that it doesn't start with a slash in .htaccess files.)

(BTW, I just tidied that RewriteRule regex slightly and took out the repeated ".php" from the group.)
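
As a rough illustration of the anchoring described above, this is how the pattern would be tested against a few request paths when the rule sits in the document-root .htaccess. The /test/ directory is only the hypothetical example from the question.

```apache
# Per-directory (.htaccess) context: the leading slash is stripped before matching.
#   /index.php        is tested as "index.php"        -> matches
#   /forgot.php       is tested as "forgot.php"       -> matches
#   /test/index.php   is tested as "test/index.php"   -> no match, because ^ anchors the start
RewriteRule ^(index|forgot)\.php$ - [E=exclude:1]
```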
10:59 pm on Apr 22, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15944
votes: 890


RewriteRule ^(index|forgot)\.php$ - [E=exclude:1]

Is there a solid reason for using mod_rewrite here? mod_setenvif executes just as early, and you don't have to deal with inheritance:
SetEnvIf Request_URI (index|forgot)\.php exclude

<IfModule> is generally a hallmark of CMS boilerplate. There are rare situations where you might need it-- "If module A exists, then do this stuff involving module B"-- but even then it's only applicable if your server changes so often that you can't keep track. If you haven't got module B in the first place, its directives will trigger a 500 error instead of being quietly skipped, so at least you find out right away that something is missing.
11:41 pm on Apr 22, 2015 (gmt 0)

Full Member

Top Contributors Of The Month

joined:Apr 11, 2015
posts: 328
votes: 24


Is there a solid reason for using mod_rewrite here?


Good point; no solid reason, just... habit (that ol' chestnut!).

When using "SetEnvIf Request_URI" would you not need to prefix the pattern with ^/ to only match files in the document root?
3:22 am on Apr 23, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15944
votes: 890


would you not need to prefix the pattern with ^/ to only match files in the document root?

Oh, er, yes, if the same filename can occur elsewhere. I shouldn't think there would be more than one /forgot.xtn, but there are presumably lots of index.php. Equally important, if you use an opening anchor the server can test more efficiently: "First character neither f nor i? OK, we're outta here."
4:11 am on Apr 23, 2015 (gmt 0)

Preferred Member

10+ Year Member

joined:May 21, 2004
posts: 385
votes: 5


Thanks everyone! I never really took the time to learn these as I don't use them often, but learning more each time.

whitespace - your method works exactly as I intended it to.

lucy24 - I tried yours and had some issues getting it to work properly. I ended up tweaking yours to :

SetEnvIf Request_URI ^/((index|forgot)\.php)?$ exclude


I found that without the ^/ it was not working properly, and then I also realized that it was not working for the index because it would be rewritten to domain.com. I am not as knowledgeable about these as you guys, but what I listed above works: the initial slash, then the filename made optional. So it would be true for /, /index.php, or /forgot.php... or at least that is what I was going for. I did some tests and it seems to work this way.
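
A sketch of how that pattern behaves, assuming Request_URI holds the URL path without the query string. The non-matching paths in the comments are made-up examples.

```apache
# ^/((index|forgot)\.php)?$  -- the whole filename group is optional
#   /                  -> matches (group absent)
#   /index.php         -> matches
#   /forgot.php        -> matches
#   /test/index.php    -> no match (something other than the filename follows ^/)
#   /index.php5        -> no match ($ requires the path to end after ".php")
SetEnvIf Request_URI ^/((index|forgot)\.php)?$ exclude
```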
6:50 am on Apr 23, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15944
votes: 890


it was not working for the index because it would be rewritten to domain.com

D'oh! Or rather, the other way around: requests for index.php get redirected to / alone, after which there's an internal request (probably mod_dir, not mod_rewrite) for the file index.php. So the pattern might really be
SetEnvIf Request_URI ^/(forgot\.php)?$ exclude
if you're looking at the user request rather than the physical file served.

In some situations, such as setting expiration times, what matters is the real, physical file, not what the user "thinks" they're getting. If I remember, I will do some experimenting later and see which version applies when setting environment variables-- the user request or the physical file.
8:50 pm on Apr 23, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15944
votes: 890


Follow-up: I spent some time playing around on my test site. For mod_setenvif purposes, Request_URI means the currently requested physical file. So that's
SetEnvIf Request_URI ^/(index|forgot)\.php$ exclude
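
Putting the thread together, a minimal sketch of the whole .htaccess block might look like the following. It assumes mod_setenvif and mod_headers are both loaded and that the file sits in the document root.

```apache
# Flag the two root-level files that should stay indexable.
# With no explicit value, SetEnvIf sets the variable to "1".
SetEnvIf Request_URI ^/(index|forgot)\.php$ exclude

<FilesMatch "\.(php|html?)$">
    # Send the header for .php, .htm and .html files unless "exclude" is set.
    Header set X-Robots-Tag "noindex, nofollow" env=!exclude
</FilesMatch>
```

With this in place, /index.php and /forgot.php should be served without the X-Robots-Tag header, while other .php, .htm, and .html files should get it. Checking the response headers for one URL of each kind is a quick way to confirm.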
2:56 am on Apr 24, 2015 (gmt 0)

Preferred Member

10+ Year Member

joined:May 21, 2004
posts: 385
votes: 5


Thanks again!