Apache Web Server Forum

    
.htaccess block and redirect
Blekfis



 
Msg#: 4615280 posted 8:45 am on Oct 8, 2013 (gmt 0)

If .htaccess looks like this:

SetEnvIfNoCase User-Agent .*rogerbot.* bad_bot

<Limit GET POST HEAD>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>

Redirect 301 / http://domain.com


Will rogerbot read/see the whole file or stop at </Limit>?

 

wilderness

WebmasterWorld Senior Member wilderness us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4615280 posted 12:15 pm on Oct 8, 2013 (gmt 0)

The very basic understanding of SetEnvIf requires the use of anchors

Begins with (^)
Ends with ($)
Contains ( )
Exactly as ("")

Using wildcards will produce less than desired results and may not function as you intended at all.

You'll also need to add Error docs, or else a loop will be created.

Blekfis



 
Msg#: 4615280 posted 12:55 pm on Oct 8, 2013 (gmt 0)

Just copied a bit of code I found = Apache-noob ;)

Would this better do what I'm looking for?


RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^.*rogerbot.*$ [NC,OR]
RewriteRule ^.*.* http://www.googlehammer.com/ [L]

Redirect 301 / http://domain.com



I'd like the bot(s) not to see the 301, is it even possible with just the htaccess..?

[edited by: phranque at 8:37 am (utc) on Oct 10, 2013]
[edit reason] unlinked urls [/edit]

penders

WebmasterWorld Senior Member penders us a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



 
Msg#: 4615280 posted 2:07 pm on Oct 8, 2013 (gmt 0)

What exactly are you trying to do? Are you wanting to redirect this bot or simply block it?

Regarding your regex, you have a lot of superfluous...
RewriteCond %{HTTP_USER_AGENT} rogerbot [NC]
RewriteRule .* http://example.com/ [R=301,L]


Assuming example.com is another domain then that will be an external redirect, as opposed to an internal rewrite (as suggested by your code).
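
If the goal is to block the bot rather than redirect it, a minimal mod_rewrite-only sketch might look like the following. It assumes the rules sit in the document root's .htaccess, that mod_rewrite is enabled, and that example.com is a placeholder for a different host than the one being redirected. The [F] flag makes Apache answer 403 Forbidden.

```apache
RewriteEngine On

# Block: any User-Agent containing "rogerbot" (case-insensitive) gets 403 Forbidden.
RewriteCond %{HTTP_USER_AGENT} rogerbot [NC]
RewriteRule ^ - [F]

# Everyone else: external 301 redirect, keeping the requested path.
# example.com is a placeholder target, assumed to be another domain.
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]
```

Note that [OR] is left off: it only has meaning when another RewriteCond follows.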

wilderness




 
Msg#: 4615280 posted 3:43 pm on Oct 8, 2013 (gmt 0)

RewriteCond %{HTTP_USER_AGENT} ^.*rogerbot.*$ [NC,OR]


NO. Eliminate all .*

Begins with:
RewriteCond %{HTTP_USER_AGENT} ^rogerbot [NC,OR]

Ends with:
RewriteCond %{HTTP_USER_AGENT} rogerbot$ [NC,OR]

Contains:
RewriteCond %{HTTP_USER_AGENT} rogerbot [NC,OR]

Exactly as:
RewriteCond %{HTTP_USER_AGENT} "rogerbot" [NC,OR]

Fundamental anchors.

Same anchors used with SetEnvIf.

penders




 
Msg#: 4615280 posted 6:49 pm on Oct 8, 2013 (gmt 0)

Exactly as:
RewriteCond %{HTTP_USER_AGENT} "rogerbot" [NC,OR]


Surrounding the CondPattern in double quotes does not result in an exact match, in this example the double quotes are superfluous and it will search for rogerbot anywhere in the string.

For an exact match, prefix the CondPattern with = (equals)
RewriteCond %{HTTP_USER_AGENT} =rogerbot [NC]

Or use start and end anchors (although the CondPattern is still a regex):
RewriteCond %{HTTP_USER_AGENT} ^rogerbot$ [NC]

Blekfis



 
Msg#: 4615280 posted 8:18 pm on Oct 8, 2013 (gmt 0)

I want to block certain bots so they don't see the 301 redirect.

Is it possible to do this with just htaccess or do I need to do a combo of htaccess to block bots and index.php to redirect visitors and allowed bots?

lucy24

WebmasterWorld Senior Member lucy24 us a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4615280 posted 9:45 pm on Oct 8, 2013 (gmt 0)

Will rogerbot read/see the whole file or stop at </Limit>?

Anything inside an envelope-- whether it's <Limit>, <Files(Match)> or (in config files) <Directory> --supersedes anything outside the envelope.

An htaccess file isn't read sequentially from top to bottom. Each module reads its own sections, followed by the core, and within those categories, anything inside an envelope is evaluated after anything lying around loose.

A common example:
<Files robots.txt>
Order allow,deny
Allow from all
</Files>

It doesn't matter whether you put this section before, after, or smack in the middle of other authorization directives. It will always override them.

In mod_setenvif, you can use quotation marks to "protect" literal spaces in a user-agent string. They're not useful or necessary for anything else I can think of.

When more than one directive could apply to a request-- for example a redirect issued by mod_rewrite followed by a flat-out denial issued by mod_authz-whatever --no response is sent out until all modules have had their chance. A 403 issued by one mod will override a 301 issued by another mod, regardless of which one is evaluated first.
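
A minimal sketch of that situation, assuming Apache 2.2-style access directives in an .htaccess file, with bad_bot and example.com as placeholder names:

```apache
# Flag any request whose User-Agent contains "rogerbot" (case-insensitive).
# No .* is needed around the pattern; an unanchored regex already matches anywhere.
SetEnvIfNoCase User-Agent rogerbot bad_bot

# Apache 2.2-style access control: allow everyone except flagged requests.
Order Allow,Deny
Allow from all
Deny from env=bad_bot

# Visitors who are not denied get the redirect.
# The trailing slash on the target keeps the appended path well-formed.
Redirect 301 / http://example.com/
```

With this in place, a request from the flagged user-agent should receive the 403 and never be shown the 301, while other visitors are redirected.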

g1smd

WebmasterWorld Senior Member g1smd us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 4615280 posted 10:35 pm on Oct 8, 2013 (gmt 0)

In blocking the bots they will see a 403 status code if you use the "deny from" syntax.

If you don't want them to "see" the redirect, what do you want them to see instead?

wilderness




 
Msg#: 4615280 posted 2:14 am on Oct 9, 2013 (gmt 0)

Exactly as:
RewriteCond %{HTTP_USER_AGENT} "rogerbot" [NC,OR]


Surrounding the CondPattern in double quotes does not result in an exact match,


nonsense.

in this example the double quotes are superfluous and it will search for rogerbot anywhere in the string.


agreed

wilderness




 
Msg#: 4615280 posted 2:18 am on Oct 9, 2013 (gmt 0)

In mod_setenvif, you can use quotation marks to "protect" literal spaces in a user-agent string. They're not useful or necessary for anything else I can think of.


"rogerbot 1.6.2"

or any other longer string

lucy24




 
Msg#: 4615280 posted 3:41 am on Oct 9, 2013 (gmt 0)

Are you translating or disagreeing? ;)

Blekfis



 
Msg#: 4615280 posted 6:40 am on Oct 9, 2013 (gmt 0)

If you don't want them to "see" the redirect, what do you want them to see instead?


...doesn't matter, I just don't want them to see the redirect...

This is done as an SEO test where I need to have some bots not see the redirect; I'm just not sure what the best way to do that would be...

penders




 
Msg#: 4615280 posted 8:29 am on Oct 9, 2013 (gmt 0)

Surrounding the CondPattern in double quotes does not result in an exact match,


nonsense.


@wilderness: Why "nonsense"? You agreed to the second part, "it will search for rogerbot anywhere in the string" - which isn't an exact match. Agreeing to one and not the other would seem to be a contradiction?

wilderness




 
Msg#: 4615280 posted 8:44 am on Oct 9, 2013 (gmt 0)

Agreeing to one and not the other would seem to be a contradiction?


Life is a contradiction, while Apache and regex are beyond multiple lives.
I gave some valid examples in the preliminary use of anchors and you chose to pick my examples apart, rather than assist the OP.

Are you translating or disagreeing?


Hey Lucy,
Disagreeing. There are applications of exactly as far beyond blank spaces. They're just not common.

lucy24




 
Msg#: 4615280 posted 9:26 am on Oct 9, 2013 (gmt 0)

But, but, but

:: splutter ::

This
"rogerbot 1.6.2"

or any other longer string

seems to be illustrating exactly what I meant. The UA string contains literal spaces. If you don't want to escape them you have to put the whole thing into quotation marks to prevent the space from taking on semantic meaning. You should also, ahem, escape the literal periods. Quotation marks don't turn off Regular Expressions.

There are applications of exactly as far beyond blank spaces.

I think your cat stepped on the keyboard.

...doesn't matter, I just don't want them to see the redirect...

They have to see something when they request the page. If you don't like or trust them, why don't you simply block them?

Edit:
Back to OP:
SetEnvIfNoCase User-Agent .*rogerbot.* bad_bot

If you're neither capturing nor anchoring, the formulation .* is never necessary. It simply means "there may or may not be more stuff here".

^.*rogerbot.*$ = ^.*rogerbot = rogerbot.*$ = rogerbot

Would this better do what I'm looking for?

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^.*rogerbot.*$ [NC,OR]
RewriteRule ^.*.* http://www.example.com/ [L]

Redirect 301 / http://example.com

NOOOO. If it weren't 2:30 AM, I would go into detail. Count your blessings.

Oh yes and: Read the Forums rules about using "example.com". Look at your own post and you'll understand why it's doubly important in the Apache subforum.

wilderness




 
Msg#: 4615280 posted 9:46 am on Oct 9, 2013 (gmt 0)

seems to be illustrating exactly what I meant. The UA string contains literal spaces. If you don't want to escape them you have to put the whole thing into quotation marks to prevent the space from taking on semantic meaning. You should also, ahem, escape the literal periods. Quotation marks don't turn off Regular Expressions.


lucy,
Jim and I had this disagreement many times (i. e., the use of quotes and exactly as) in the last years of his participation here.
This goes all the way back to the earliest versions of Apache (I've still lines in place from those earlier days) and they remain functional in the most current versions.

Jim kept quoting the Apache Docs and I kept telling him that in this specific instance the Apache Docs were full of beans (of which there are a few other examples).

penders




 
Msg#: 4615280 posted 10:00 am on Oct 9, 2013 (gmt 0)

Life is a contradiction...


What?! I was simply correcting a wholly incorrect statement you made in your example... which benefits the OP, you, and everyone else who happens to read this thread.

wilderness




 
Msg#: 4615280 posted 10:07 am on Oct 9, 2013 (gmt 0)



I gave some valid examples in the preliminary use of anchors


I was simply correcting a wholly incorrect statement you made in your example...

lucy24




 
Msg#: 4615280 posted 10:19 pm on Oct 9, 2013 (gmt 0)

Jim and I had this disagreement many times (i. e., the use of quotes and exactly as) in the last years of his participation here.
This goes all the way back to the earliest versions of Apache (I've still lines in place from those earlier days) and they remain functional in the most current versions.

Jim kept quoting the Apache Docs and I kept telling him that in this specific instance the Apache Docs were full of beans (of which there are a few other examples).

I'm sorry, Don, but I don't understand what you are saying. Specifically I don't understand what you're disagreeing with. And I don't see where "exactly as" enters into it at all, since I didn't say anything about that.

What _I_ said was: If the user-agent string-- or whatever other string you're testing in mod_setenvif-- contains literal spaces, one way to preserve those spaces is to put the test string in quotation marks. If you don't use quotation marks, the spaces acquire their usual semantic meaning.

BrowserMatch "rogerbot 1.6.2" keep_out
= If the UA string contains the element "rogerbot 1x6x2" then set the variable "keep_out" to its default value (1, or "true", or whatever it is)

BrowserMatch rogerbot 1.6.2 keep_out
= If the UA string contains the element "rogerbot" then set two variables, "1.6.2" and "keep_out"

Quotation marks don't cancel regular expressions and they don't create anchors.

BrowserMatch "Camino/2.1.2 (like"
= 500 error due to mismatched parenthesis

BrowserMatch "Camino/2.1.2 \(like"
= I am blocked

BrowserMatch "Camino/2...2"
= I am blocked
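
Putting those points together, a hedged sketch of a version-specific match, assuming the variable name keep_out from the examples above and Apache 2.2-style access directives:

```apache
# Quotes keep the space inside the pattern; \. matches a literal period.
# The pattern is still an unanchored regex, so it matches anywhere in the User-Agent.
BrowserMatchNoCase "rogerbot 1\.6\.2" keep_out

# Equivalent long form: BrowserMatchNoCase is shorthand for
# SetEnvIfNoCase applied to the User-Agent header.
SetEnvIfNoCase User-Agent "rogerbot 1\.6\.2" keep_out

# Deny flagged requests (Apache 2.2 syntax).
Order Allow,Deny
Allow from all
Deny from env=keep_out
```

Only one of the two matching lines is needed; both are shown to make the equivalence visible.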

wilderness




 
Msg#: 4615280 posted 11:59 am on Oct 11, 2013 (gmt 0)

lucy,
Despite your prolific use of BrowserMatch, it's a lame tool and overall provides less than desired results.
Adding that example and/or application to this thread merely confuses matters more.

User-Agent
or
%{HTTP_USER_AGENT}

are much more effective and focused.

Once again, there are effective uses for exactly as ("")

In any event, I've violated my commitment to discontinue posting in the Apache Forum and fear that you're just egging me on to jerk my chain ;)

Don

Blekfis



 
Msg#: 4615280 posted 6:23 am on Oct 21, 2013 (gmt 0)

Bumping this since I don't feel I really got the answer.

In short, I want to 301 a page/site and block certain bots from seeing this 301. Can this be done with just htaccess or do I need to make it a combo with index.php?

lucy24




 
Msg#: 4615280 posted 8:14 am on Oct 21, 2013 (gmt 0)

do I need to make it a combo with index.php

Say what now?

A visitor-- whether human or robot-- doesn't see any headers until all Apache mods have done their stuff. If any of those mods issues a lockout, the 403 is all the robot will ever see.

THIS >> "Sorry, you're not wanted here"

NOT THIS >> "Sorry, you're not wanted, but if I had let you in I would have sent you to otherpage.html instead".

Throughout this thread your wording has been a little bit odd. So the lack of an unambiguous answer is because it's not 100% clear that you are, in fact, blocking the robot. If you're blocking it by User-Agent or IP or simply because you don't like its face, your job is done. A blocked request will not see any redirects arising from the same request.

Blekfis



 
Msg#: 4615280 posted 9:57 am on Oct 21, 2013 (gmt 0)

Ok, then "mission accomplished" ;)

As I mentioned earlier, this is just a small SEO-test so no need for these bots to see a 403

Thanks!

g1smd




 
Msg#: 4615280 posted 9:57 pm on Oct 22, 2013 (gmt 0)

They'll see "something".

What do you want that something to be?

Blekfis



 
Msg#: 4615280 posted 4:05 am on Oct 23, 2013 (gmt 0)

Nothing ;)

I want to stop certain bots from crawling the page and seeing outbound links.

lucy24




 
Msg#: 4615280 posted 7:40 am on Oct 23, 2013 (gmt 0)

Ah, we're quibbling over the definition of "something". A blocked robot will see the 403 response. It may-- if it so chooses-- see the content of the 403 page that your server obligingly sent out. That's assuming the 403 was issued by the server in the first place.
