Yahoo Slurp China htaccess block

cyberdyne

12:40 pm on Mar 6, 2008 (gmt 0)

10+ Year Member



Hi,
A little advice please.
Would the following block Slurp as well as Slurp China? I don't want to block Slurp (US).

RewriteCond %{HTTP_USER_AGENT} ^Slurp\ China [NC,OR]

Thank you.

wilderness

4:13 pm on Mar 6, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm not sure of the UA, as I haven't seen it.

I've denied the 202 range they crawl from ever since they began crawling from it.

Edited by wilderness:
I have the following in robots.txt:

User-agent: Yahoo! Slurp China
Disallow: /
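You can sanity-check a robots.txt record like the one above with Python's standard-library `urllib.robotparser`. This is only a sketch. Python's parser applies a record when its User-agent value appears, case-insensitively, in the crawler name you pass in. That approximates how Yahoo's crawlers read the file, but it may not match their behaviour exactly. The name "Yahoo! Slurp" for the US crawler is an assumption.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt record from the post above.
ROBOTS_TXT = """\
User-agent: Yahoo! Slurp China
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The record applies because "Yahoo! Slurp China" appears in the name passed in.
china_allowed = parser.can_fetch("Yahoo! Slurp China", "/index.html")
# Assumed name for the US crawler: no record applies to it, so it is allowed.
us_allowed = parser.can_fetch("Yahoo! Slurp", "/index.html")

print("Slurp China allowed:", china_allowed)  # Slurp China allowed: False
print("Slurp (US) allowed:", us_allowed)      # Slurp (US) allowed: True
```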

cyberdyne

6:18 pm on Mar 6, 2008 (gmt 0)

10+ Year Member


Thanks wilderness.

The log read:

T /robots.txt HTTP/1.0" 302 419 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp China; http://misc.yahoo.com.cn/help.html)"

I get hit by Yahoo! Slurp China quite a bit and would like to block it while permitting Slurp (US).

Thanks
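The Python sketch below shows why an unanchored "Slurp China" pattern separates the two crawlers. It uses Python's `re` module as a stand-in for Apache's regex engine, which should be safe for a pattern this simple. `re.IGNORECASE` plays the role of the `[NC]` flag, and `search()` looks anywhere in the string, as a RewriteCond does. The China UA is copied from the log above. The US Slurp UA string is an assumption for comparison.

```python
import re

# User-Agent copied from the log line above.
CHINA_UA = ("Mozilla/5.0 (compatible; Yahoo! Slurp China; "
            "http://misc.yahoo.com.cn/help.html)")
# Assumed US Slurp User-Agent, for comparison only.
US_UA = ("Mozilla/5.0 (compatible; Yahoo! Slurp; "
         "http://help.yahoo.com/help/us/ysearch/slurp)")

# Unanchored and case-insensitive, like: RewriteCond ... "Slurp China" [NC]
pattern = re.compile(r"Slurp China", re.IGNORECASE)

def is_blocked(user_agent: str) -> bool:
    """Return True if the condition would match this User-Agent."""
    return pattern.search(user_agent) is not None

print(is_blocked(CHINA_UA))  # True
print(is_blocked(US_UA))     # False
```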

wilderness

6:35 pm on Mar 6, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You may also use the following:

RewriteCond %{HTTP_USER_AGENT} "Slurp China" [NC,OR]

without escaping the space.
With the quotes, the pattern is taken "exactly as" written, space included.

Leaving off the leading ^ (starts with) lets the term match anywhere in the UA.

Don
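You can check the effect of the leading ^ against the UA from cyberdyne's log. The Python sketch below assumes that Python's `re` agrees with Apache's regex engine on anchors, literal text and escaped spaces. `re.IGNORECASE` stands in for `[NC]`. The last test string is a made-up UA that does begin with the words.

```python
import re

CHINA_UA = ("Mozilla/5.0 (compatible; Yahoo! Slurp China; "
            "http://misc.yahoo.com.cn/help.html)")

anchored = re.compile(r"^Slurp\ China", re.IGNORECASE)  # pattern from the first post
unanchored = re.compile(r"Slurp China", re.IGNORECASE)  # no leading ^

# ^ ties the match to the start of the string, but this UA begins with
# "Mozilla/5.0", so only the unanchored pattern finds "Slurp China".
anchored_hit = anchored.search(CHINA_UA) is not None
unanchored_hit = unanchored.search(CHINA_UA) is not None
# A hypothetical UA that really does start with the words.
anchored_on_prefix = anchored.search("Slurp China/1.0") is not None

print(anchored_hit, unanchored_hit, anchored_on_prefix)  # False True True
```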

cyberdyne

6:42 pm on Mar 6, 2008 (gmt 0)

10+ Year Member



Great, I'll use that.
Thank you Don

keyplyr

11:00 am on Mar 7, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



cyberdyne, if using mod_rewrite you'll need to escape the space as you did in your first post, but leave off the starting anchor:

RewriteCond %{HTTP_USER_AGENT} "Slurp\ China" [NC,OR]

Actually, you could just block anything with "China" in the UA:

RewriteCond %{HTTP_USER_AGENT} "China" [OR]
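This last line has no `[NC]` flag, so the match is case-sensitive. It also catches any UA containing "China", not only Yahoo's. The Python sketch below shows both points, again using `re` as a stand-in for Apache's engine. The two "examplebot" UAs are hypothetical.

```python
import re

CHINA_UA = ("Mozilla/5.0 (compatible; Yahoo! Slurp China; "
            "http://misc.yahoo.com.cn/help.html)")

# No re.IGNORECASE: mirrors a RewriteCond without the [NC] flag.
china = re.compile(r"China")

logged = china.search(CHINA_UA) is not None                   # UA from the log
lowercase = china.search("examplebot (china)") is not None    # hypothetical, lowercase
other_bot = china.search("ExampleBot China/1.0") is not None  # hypothetical non-Yahoo UA

print(logged, lowercase, other_bot)  # True False True
```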

cyberdyne

11:18 am on Mar 7, 2008 (gmt 0)

10+ Year Member



OK keyplyr, I'll amend it :)
Thank you.

wilderness

1:25 pm on Mar 7, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



keyplyr wrote:
cyberdyne, if using mod_rewrite you'll need to escape the space as you did in your first post, but leave off the starting anchor:

RewriteCond %{HTTP_USER_AGENT} "Slurp\ China" [NC,OR]

Hey keyplyr,
that's not correct!
With quotes, the pattern is taken "exactly as" written, including spaces (or any other character), so the space does not need escaping.

Additionally, when using the single word China, adding quotes is redundant.

As an aside, do you recall another member who came into this forum some time ago and was putting quotes on every line because that was the example given in the Apache documentation?
It's simply not required on every line.

Don
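The escaped and quoted forms behave the same once the pattern reaches the regex engine. To that engine, an escaped space (`\ `) and a plain space match the same character. The Python check below uses `re` as a stand-in for Apache's engine. The US Slurp UA and the "SlurpChina" string are assumed examples.

```python
import re

escaped = re.compile(r"Slurp\ China")  # space escaped with a backslash
plain = re.compile(r"Slurp China")     # plain space

samples = [
    # UA from the log earlier in the thread
    "Mozilla/5.0 (compatible; Yahoo! Slurp China; http://misc.yahoo.com.cn/help.html)",
    # assumed US Slurp UA
    "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
    # made-up string with no space between the words
    "SlurpChina",
]

# For each sample: (escaped pattern matched?, plain pattern matched?)
results = [(escaped.search(ua) is not None, plain.search(ua) is not None)
           for ua in samples]
print(results)  # [(True, True), (False, False), (False, False)]
```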

cyberdyne

2:32 pm on Mar 7, 2008 (gmt 0)

10+ Year Member



Wilderness, can you confirm that the original:

RewriteCond %{HTTP_USER_AGENT} "Slurp\ China" [NC,OR]

is indeed correct?

Thank you :)

wilderness

3:37 pm on Mar 7, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



cyberdyne,
That will work.

The escape however is redundant and not necessary due to use of quotes. (the escape is necessary without the use of quotes)

Your original inquiry included the leading ^ which would not function because the UA does not begin with those terms.

Please don't be confused by this.
It's merely a matter of clarifying practices, which helps provide insight into more complicated regex, as well as keeping your patterns consistent for your own clarity (a single syntax error can and will stop both your htaccess and your website(s) from functioning).

RewriteCond %{HTTP_USER_AGENT} "Slurp\ China" [NC,OR]
OR
RewriteCond %{HTTP_USER_AGENT} "Slurp China" [NC,OR]
OR
RewriteCond %{HTTP_USER_AGENT} Slurp\ China [NC,OR]
OR
RewriteCond %{HTTP_USER_AGENT} China [NC,OR]

All three (four, edited to add the lone China) above accomplish the same desired effect.
However, I would caution you to choose one method and then stick to it consistently.
Inconsistency in regex may result in you spending days tracking down a syntax error (should your htaccess ever grow large).

keyplyr is far more adept at writing extensive expressions than I am, however I've benefited from "KISS" in my use of htaccess.

Don
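The Python sketch below runs the four variants against the logged UA and an assumed US Slurp UA. It uses the regexes the lines test once the quotes are removed, with `re.IGNORECASE` for their `[NC]` flag. It assumes Python's `re` matches these simple patterns the way Apache does.

```python
import re

CHINA_UA = ("Mozilla/5.0 (compatible; Yahoo! Slurp China; "
            "http://misc.yahoo.com.cn/help.html)")
# Assumed US Slurp User-Agent, for comparison only.
US_UA = ("Mozilla/5.0 (compatible; Yahoo! Slurp; "
         "http://help.yahoo.com/help/us/ysearch/slurp)")

# Pattern as written in .htaccess -> regex the engine actually tests.
variants = {
    r'"Slurp\ China"': r"Slurp\ China",
    r'"Slurp China"': r"Slurp China",
    r'Slurp\ China': r"Slurp\ China",
    r'China': r"China",  # broadest: matches any UA containing "china"
}

results = {}
for label, regex in variants.items():
    pattern = re.compile(regex, re.IGNORECASE)
    results[label] = (pattern.search(CHINA_UA) is not None,
                      pattern.search(US_UA) is not None)
    print(f"{label:18} China UA: {results[label][0]}  US UA: {results[label][1]}")
```

Each variant should report True for the China UA and False for the US UA. Only the lone China would also catch unrelated UAs that happen to contain the word.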

keyplyr

2:20 am on Mar 8, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Don, it was very late when I posted. I just cut'n pasted the example without even noticing those quotes - duh. Ever since I started running dual monitors juggling multiple windows, I've noticed I make more mistakes as the night grows long.

Thanks for correcting me. I hate it when misinformation is posted as valid.

wilderness

3:36 am on Mar 8, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



keyplyr wrote: Ever since I started running dual monitors juggling multiple windows

keyplyr,
I've a hard enough time keeping track of what comes across one...do the dual monitors affect your ability to focus?

There are occasional days when my eyes are simply out of focus and I'm not able to function with a computer screen.

Don

cyberdyne

9:02 am on Mar 8, 2008 (gmt 0)

10+ Year Member



Thank you very much guys.
I've learnt a lot from this post.
Appreciated.