
Forum Moderators: Ocean10000 & incrediBILL & phranque


.htaccess block and redirect

8:45 am on Oct 8, 2013 (gmt 0)

New User

joined:Oct 8, 2013
posts:7
votes: 0


If .htaccess looks like this:

SetEnvIfNoCase User-Agent .*rogerbot.* bad_bot

<Limit GET POST HEAD>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>

Redirect 301 / http://domain.com


Will rogerbot read/see the whole file or stop at </Limit>?
12:15 pm on Oct 8, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5459
votes: 3


A basic understanding of SetEnvIf requires the use of anchors:

Begins with (^)
Ends with ($)
Contains ( )
Exactly as ("")

Using wildcards will produce less-than-desired results and may not function as you intended at all.

You'll also need to add ErrorDocument directives, or else a redirect loop will be created.
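As a rough sketch, the OP's block could drop the wildcards and add an ErrorDocument so the blocked request isn't itself caught by the redirect (the /403.html filename is an assumption, not from the thread):

```apache
# "Contains" match: no .* needed, NoCase makes it case-insensitive
SetEnvIfNoCase User-Agent rogerbot bad_bot

<Limit GET POST HEAD>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>

# Hypothetical error page, served locally so the 403 response
# doesn't bounce through the site-wide Redirect
ErrorDocument 403 /403.html
```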
12:55 pm on Oct 8, 2013 (gmt 0)

New User

joined:Oct 8, 2013
posts:7
votes: 0


Just copied a bit of code I found = Apache-noob ;)

Would this better do what I'm looking for?


RewriteEngine On 
RewriteCond %{HTTP_USER_AGENT} ^.*rogerbot.*$ [NC,OR]
RewriteRule ^.*.* http://www.googlehammer.com/ [L]

Redirect 301 / http://domain.com



I'd like the bot(s) not to see the 301, is it even possible with just the htaccess..?

[edited by: phranque at 8:37 am (utc) on Oct 10, 2013]
[edit reason] unlinked urls [/edit]

2:07 pm on Oct 8, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member penders is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2006
posts: 3123
votes: 0


What exactly are you trying to do? Are you wanting to redirect this bot or simply block it?

Regarding your regex, you have a lot of superfluous characters. Try:
RewriteCond %{HTTP_USER_AGENT} rogerbot [NC] 
RewriteRule .* http://example.com/ [R=301,L]


Assuming example.com is another domain, that will be an external redirect, as opposed to the internal rewrite your code suggests.
3:43 pm on Oct 8, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5459
votes: 3


RewriteCond %{HTTP_USER_AGENT} ^.*rogerbot.*$ [NC,OR]


NO. Eliminate all .*

Begins with:
RewriteCond %{HTTP_USER_AGENT} ^rogerbot [NC,OR]

Ends with:
RewriteCond %{HTTP_USER_AGENT} rogerbot$ [NC,OR]

Contains:
RewriteCond %{HTTP_USER_AGENT} rogerbot [NC,OR]

Exactly as:
RewriteCond %{HTTP_USER_AGENT} "rogerbot" [NC,OR]

Fundamental anchors.

Same anchors used with SetEnvIf.
6:49 pm on Oct 8, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member penders is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2006
posts: 3123
votes: 0


Exactly as:
RewriteCond %{HTTP_USER_AGENT} "rogerbot" [NC,OR]


Surrounding the CondPattern in double quotes does not result in an exact match; in this example the double quotes are superfluous, and it will match rogerbot anywhere in the string.

For an exact match, prefix the CondPattern with = (equals)
RewriteCond %{HTTP_USER_AGENT} =rogerbot [NC]


Or use start and end anchors (although the CondPattern is still a regex):
RewriteCond %{HTTP_USER_AGENT} ^rogerbot$ [NC]
8:18 pm on Oct 8, 2013 (gmt 0)

New User

joined:Oct 8, 2013
posts: 7
votes: 0


I want to block certain bots so they don't see the 301 redirect.

Is it possible to do this with just htaccess or do I need to do a combo of htaccess to block bots and index.php to redirect visitors and allowed bots?
9:45 pm on Oct 8, 2013 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13218
votes: 348


Will rogerbot read/see the whole file or stop at </Limit>?

Anything inside an envelope-- whether it's <Limit>, <Files(Match)> or (in config files) <Directory> --supersedes anything outside the envelope.

An htaccess file isn't read sequentially from top to bottom. Each module reads its own sections, followed by the core, and within those categories, anything inside an envelope is evaluated after anything lying around loose.

A common example:
<Files robots.txt>
Order allow,deny
Allow from all
</Files>

It doesn't matter whether you put this section before, after, or smack in the middle of other authorization directives. It will always override them.

In mod_setenvif, you can use quotation marks to "protect" literal spaces in a user-agent string. They're not useful or necessary for anything else I can think of.

When more than one directive could apply to a request-- for example a redirect issued by mod_rewrite followed by a flat-out denial issued by mod_authz-whatever --no response is sent out until all modules have had their chance. A 403 issued by one mod will override a 301 issued by another mod, regardless of which one is evaluated first.
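A sketch of that precedence, using rogerbot from the thread and example.com per forum convention: even though the Redirect line comes last, a request denied by the env-based block receives the 403, never the 301.

```apache
SetEnvIfNoCase User-Agent rogerbot bad_bot

Order Allow,Deny
Allow from all
Deny from env=bad_bot

# Handled by mod_alias, but a request denied above
# is sent the 403, not this redirect
Redirect 301 / http://example.com/
```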
10:35 pm on Oct 8, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


If you block the bots using the "deny from" syntax, they will see a 403 status code.

If you don't want them to "see" the redirect, what do you want them to see instead?
2:14 am on Oct 9, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5459
votes: 3


Exactly as:
RewriteCond %{HTTP_USER_AGENT} "rogerbot" [NC,OR]


Surrounding the CondPattern in double quotes does not result in an exact match,


nonsense.

in this example the double quotes are superfluous and it will search for rogerbot anywhere in the string.


agreed
2:18 am on Oct 9, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5459
votes: 3


In mod_setenvif, you can use quotation marks to "protect" literal spaces in a user-agent string. They're not useful or necessary for anything else I can think of.


"rogerbot 1.6.2"

or any other longer string
3:41 am on Oct 9, 2013 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13218
votes: 348


Are you translating or disagreeing? ;)
6:40 am on Oct 9, 2013 (gmt 0)

New User

joined:Oct 8, 2013
posts: 7
votes: 0


If you don't want them to "see" the redirect, what do you want them to see instead?


...doesn't matter, I just don't want them to see the redirect...

This is done as a SEO-test where I need to have some bots not see the redirect, just not sure what the best way would be to do so...
8:29 am on Oct 9, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member penders is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2006
posts: 3123
votes: 0


Surrounding the CondPattern in double quotes does not result in an exact match,


nonsense.


@wilderness: Why "nonsense"? You agreed to the second part, "it will search for rogerbot anywhere in the string" - which isn't an exact match. Agreeing to one and not the other would seem to be a contradiction?
8:44 am on Oct 9, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5459
votes: 3


Agreeing to one and not the other would seem to be a contradiction?


Life is a contradiction, while Apache and regex are beyond multiple lives.
I gave some valid examples in the preliminary use of anchors, and you chose to pick my examples apart rather than assist the OP.

Are you translating or disagreeing?


Hey Lucy,
Disagreeing. There are applications of "exactly as" far beyond blank spaces. They're just not common.
9:26 am on Oct 9, 2013 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13218
votes: 348


But, but, but

:: splutter ::

This
"rogerbot 1.6.2"

or any other longer string

seems to be illustrating exactly what I meant. The UA string contains literal spaces. If you don't want to escape them you have to put the whole thing into quotation marks to prevent the space from taking on semantic meaning. You should also, ahem, escape the literal periods. Quotation marks don't turn off Regular Expressions.

There are applications of exactly as far beyond blank spaces.

I think your cat stepped on the keyboard.

...doesn't matter, I just don't want them to see the redirect...

They have to see something when they request the page. If you don't like or trust them, why don't you simply block them?

Edit:
Back to OP:
SetEnvIfNoCase User-Agent .*rogerbot.* bad_bot

If you're neither capturing nor anchoring, the formulation .* is never necessary. It simply means "there may or may not be more stuff here".

^.*rogerbot.*$ = ^.*rogerbot = rogerbot.*$ = rogerbot

Would this better do what I'm looking for?

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^.*rogerbot.*$ [NC,OR]
RewriteRule ^.*.* http://www.example.com/ [L]

Redirect 301 / http://example.com

NOOOO. If it weren't 2:30 AM, I would go into detail. Count your blessings.

Oh yes and: Read the Forums rules about using "example.com". Look at your own post and you'll understand why it's doubly important in the Apache subforum.
9:46 am on Oct 9, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5459
votes: 3


seems to be illustrating exactly what I meant. The UA string contains literal spaces. If you don't want to escape them you have to put the whole thing into quotation marks to prevent the space from taking on semantic meaning. You should also, ahem, escape the literal periods. Quotation marks don't turn off Regular Expressions.


lucy,
Jim and I had this disagreement many times (i.e., over the use of quotes and "exactly as") in the last years of his participation here.
This goes all the way back to the earliest versions of Apache (I still have lines in place from those earlier days), and they remain functional in the most current versions.

Jim kept quoting the Apache Docs and I kept telling him that in this specific instance the Apache Docs were full of beans (of which there are a few other examples).
10:00 am on Oct 9, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member penders is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2006
posts: 3123
votes: 0


Life is a contradiction...


What?! I was simply correcting a wholly incorrect statement you made in your example... which benefits the OP, you, and everyone else who happens to read this thread.
10:07 am on Oct 9, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5459
votes: 3




I gave some valid examples in the preliminary use of anchors


I was simply correcting a wholly incorrect statement you made in your example...
10:19 pm on Oct 9, 2013 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13218
votes: 348


Jim and I had this disagreement many times (i.e., over the use of quotes and "exactly as") in the last years of his participation here.
This goes all the way back to the earliest versions of Apache (I still have lines in place from those earlier days), and they remain functional in the most current versions.

Jim kept quoting the Apache Docs and I kept telling him that in this specific instance the Apache Docs were full of beans (of which there are a few other examples).

I'm sorry, Don, but I don't understand what you are saying. Specifically I don't understand what you're disagreeing with. And I don't see where "exactly as" enters into it at all, since I didn't say anything about that.

What _I_ said was: If the user-agent string-- or whatever other string you're testing in mod_setenvif-- contains literal spaces, one way to preserve those spaces is to put the test string in quotation marks. If you don't use quotation marks, the spaces acquire their usual semantic meaning.

BrowserMatch "rogerbot 1.6.2" keep_out
= If the UA string contains the element "rogerbot 1x6x2" then set the variable "keep_out" to its default value (1, or "true", or whatever it is)

BrowserMatch rogerbot 1.6.2 keep_out
= If the UA string contains the element "rogerbot" then set two variables, "1.6.2" and "keep_out"

Quotation marks don't cancel regular expressions and they don't create anchors.

BrowserMatch "Camino/2.1.2 (like"
= 500 error due to mismatched parenthesis

BrowserMatch "Camino/2.1.2 \(like"
= I am blocked

BrowserMatch "Camino/2...2"
= I am blocked
11:59 am on Oct 11, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5459
votes: 3


lucy,
Despite your prolific use of BrowserMatch, it's a lame tool and overall provides less-than-desired results.
Adding that example and/or application to this thread merely confuses matters more.

User-Agent
or
%{HTTP_USER_AGENT}

are much more effective and focused.

Once again, there are effective uses for exactly as ("")

In any event, I've violated my commitment to discontinue posting in the Apache Forum, and I fear that you're just egging me on to jerk my chain ;)

Don
6:23 am on Oct 21, 2013 (gmt 0)

New User

joined:Oct 8, 2013
posts: 7
votes: 0


Bumping this since I don't feel I really got an answer.

In short: I want to 301 a page/site and block certain bots from seeing this 301. Can this be done with just .htaccess, or do I need to make it a combo with index.php?
8:14 am on Oct 21, 2013 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13218
votes: 348


do I need to make it a combo with index.php

Say what now?

A visitor-- whether human or robot-- doesn't see any headers until all Apache mods have done their stuff. If any of those mods issues a lockout, the 403 is all the robot will ever see.

THIS >> "Sorry, you're not wanted here"

NOT THIS >> "Sorry, you're not wanted, but if I had let you in I would have sent you to otherpage.html instead".

Throughout this thread your wording has been a little bit odd. So the lack of an unambiguous answer is because it's not 100% clear that you are, in fact, blocking the robot. If you're blocking it by User-Agent or IP or simply because you don't like its face, your job is done. A blocked request will not see any redirects arising from the same request.
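Pulling the thread's pieces together, a minimal .htaccess along these lines would block the named bot outright while redirecting everyone else (a sketch only, using rogerbot from the thread and example.com per forum convention):

```apache
RewriteEngine On

# The bot gets a 403 and never sees the redirect
RewriteCond %{HTTP_USER_AGENT} rogerbot [NC]
RewriteRule ^ - [F,L]

# Everyone else gets the permanent redirect
RewriteRule ^ http://example.com/ [R=301,L]
```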
9:57 am on Oct 21, 2013 (gmt 0)

New User

joined:Oct 8, 2013
posts: 7
votes: 0


Ok, then "mission accomplished" ;)

As I mentioned earlier, this is just a small SEO-test so no need for these bots to see a 403

Thanks!
9:57 pm on Oct 22, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


They'll see "something".

What do you want that something to be?
4:05 am on Oct 23, 2013 (gmt 0)

New User

joined:Oct 8, 2013
posts: 7
votes: 0


Nothing ;)

I want to stop certain bots from crawling the page and seeing outbound links
7:40 am on Oct 23, 2013 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13218
votes: 348


Ah, we're quibbling over the definition of "something". A blocked robot will see the 403 response. It may-- if it so chooses-- see the content of the 403 page that your server obligingly sent out. That's assuming the 403 was issued by the server in the first place.