
Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
sub-semalt
wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4694297 posted 2:25 am on Aug 10, 2014 (gmt 0)

Anybody have a clue if this is correct syntax?

# any two digits
RewriteCond %{HTTP_REFERER} ^http://[0-9]{2}\.semalt\.com/

 

iamzippy

5+ Year Member



 
Msg#: 4694297 posted 11:03 am on Aug 10, 2014 (gmt 0)

It works in Regex Buddy.
So does:

RewriteCond %{HTTP_REFERER} ^http://\d\d\.semalt\.com/

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4694297 posted 1:01 pm on Aug 10, 2014 (gmt 0)

Many thanks

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4694297 posted 1:53 pm on Aug 10, 2014 (gmt 0)

There are assorted subdomains (e.g.: http://semalt.semalt.com/) thus --

RewriteCond %{HTTP_REFERER} semalt
RewriteRule .* - [F]

-- works for me. (Ditto for fellow pest kambasoft.)

ronin

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4694297 posted 2:24 pm on Aug 10, 2014 (gmt 0)

I'm still learning (and improving) my regex skills, but I am using this:

RewriteCond %{HTTP_REFERER} ^https?://([a-z0-9-]+\.)?semalt\.com [NC]
RewriteRule .* - [F]

not2easy

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



 
Msg#: 4694297 posted 3:50 pm on Aug 10, 2014 (gmt 0)

Couldn't these be combined like we do for UAs?

RewriteCond %{HTTP_REFERER} (kambasoft|semalt|whatever) [NC]
RewriteRule .* - [F]

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4694297 posted 4:27 pm on Aug 10, 2014 (gmt 0)

Certainly.

dupres01



 
Msg#: 4694297 posted 5:45 pm on Aug 10, 2014 (gmt 0)

Which is the better form to use (and, to help with my education, why)?
This one:
RewriteCond %{HTTP_REFERER} (kambasoft|semalt|whatever) [NC]
RewriteRule .* - [F]

or this one:
RewriteCond %{HTTP_REFERER} kambasoft [NC,OR]
RewriteCond %{HTTP_REFERER} semalt [NC,OR]
RewriteCond %{HTTP_REFERER} whatever [NC]
RewriteRule .* - [F]

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4694297 posted 5:50 pm on Aug 10, 2014 (gmt 0)

The combined line puts less strain on the server and is slightly faster, since it is one pattern test instead of up to three.

Both do, however, work.

I've one for "crawler" as well; however, for simplicity's sake, and to possibly stop another stray bot, you could use "crawl".
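
A rough sketch of what such a rule might look like. It assumes the "crawl" test is meant for the User-Agent header rather than the referer, which the post does not actually say, and that mod_rewrite is already switched on:

```apache
# Assumption: "crawl" is matched against the User-Agent header, not the referer.
# [NC] makes the match case-insensitive, so "Crawler", "crawl" and "WebCrawl" all hit.
RewriteCond %{HTTP_USER_AGENT} crawl [NC]
# Leave the URL unchanged ("-") and answer 403 Forbidden.
RewriteRule .* - [F]
```

Bear in mind that an unanchored "crawl" also catches any welcome robot that happens to have the string in its name, so it is worth checking your logs before relying on it.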

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributors Of The Month



 
Msg#: 4694297 posted 7:05 pm on Aug 10, 2014 (gmt 0)

The form
# any two digits
RewriteCond %{HTTP_REFERER} ^http://[0-9]{2}\.semalt\.com/

is syntactically correct, but I suspect it's easier on the server if you simply say
^http://[0-9][0-9]\.semalt\.com
That's assuming it will always be exactly two. Otherwise of course you'd go to
[0-9]+
Or-- my preference-- \d for a savings of three bytes ;)
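
Put together as a complete rule, the one-or-more-digits variant might look roughly like this. It is only a sketch: the RewriteEngine line and the [F] rule are assumptions borrowed from the other rules in this thread, and it presumes a per-directory .htaccess with mod_rewrite available:

```apache
# Assumed boilerplate; skip it if RewriteEngine is already on higher up in the file.
RewriteEngine On

# One or more digits, then .semalt.com, at the start of the referer.
# Only plain http:// is covered here, as in the original condition.
RewriteCond %{HTTP_REFERER} ^http://\d+\.semalt\.com
# Leave the URL unchanged ("-") and answer 403 Forbidden.
RewriteRule .* - [F]
```

Swap `\d+` for `\d\d` or `[0-9]{2}` if you want exactly two digits.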

fwiw, mine simply says

SetEnvIf Referer semalt keep_out

It's in mod_setenvif because this rule is in my shared htaccess used by all sites. If it were for a single site it would be expressed as a RewriteCond along with assorted other referer-based lockouts.
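
On its own the SetEnvIf line only sets a flag; something else has to act on it. A minimal sketch of the second half, assuming Apache 2.2-style access directives (the post does not show this part, so treat the last three lines as a guess at a typical setup):

```apache
# Flag any request whose Referer header contains "semalt" (mod_setenvif).
# The match is case-sensitive; SetEnvIfNoCase is the case-insensitive form.
SetEnvIf Referer semalt keep_out

# Assumed Apache 2.2 access control: let everyone in except flagged requests.
Order Allow,Deny
Allow from all
Deny from env=keep_out
```

On Apache 2.4 the rough equivalent would be `Require all granted` plus `Require not env keep_out` inside a `<RequireAll>` block. The advantage of the environment-variable approach is that several SetEnvIf lines can set the same keep_out flag and share one Deny line.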

My impression is that semalt works 100% via infected human browsers, because they always ask for favicon and stylesheet. Robots normally don't. Did anyone ever figure out what they want?

