
Forum Moderators: Ocean10000 & incrediBILL & phranque


How do I block these scrapers?

IP blocking doesn't work

     

tangster

9:47 pm on Feb 26, 2013 (gmt 0)



So far I have tried IP blocking, and even put the words "torkaland" and "streamica" in the referrer and user-agent block lists. None of it works! Please help.

torkaland.blogspot.com and streamica.com

Frank_Rizzo

10:10 pm on Feb 26, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Did you restart Apache after each change?

tangster

10:31 pm on Feb 26, 2013 (gmt 0)



The blocks are in the .htaccess file. I didn't know you had to restart Apache for them to take effect?

The problem is that I can't block the IP of the blogspot site because it's owned by Google; I am afraid Google might use the same IP to crawl my site and get blocked.

Whereas streamica seems to be pulling RSS feeds from a different IP than the one it's hosted on, and I don't know which IP they are using to scrape the site.

lucy24

11:29 pm on Feb 26, 2013 (gmt 0)

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributors of the Month



Anything in htaccess takes effect immediately. The only exception is that if your browser has already cached the page, it may not know that there have been changes.

It is trivial to make a conditional block to say, for example,

RewriteCond %{REMOTE_ADDR} {give the numerical IP here}
RewriteCond %{HTTP_USER_AGENT} !Googlebot


... and then take it from there. Currently all the googlebot variants such as the imagebot and the three-or-more mobiles contain the element "Googlebot" (capitalized) in their User-Agent string.
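For readers following along, the two conditions above might be completed roughly as follows. This is only a sketch, not a tested rule set: the address 192.0.2.10 is a placeholder from the documentation range, not the scraper's real IP, and the closing RewriteRule with the [F] flag is an assumed way to "take it from there".

```apache
RewriteEngine On

# Placeholder address: substitute the numerical IP you actually see in your logs
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.10$
# Consecutive conditions are ANDed: only block when the visitor does NOT claim to be Googlebot
RewriteCond %{HTTP_USER_AGENT} !Googlebot
# Match any request, leave the URL unchanged, and answer 403 Forbidden
RewriteRule ^ - [F]
```

Keep in mind that the User-Agent header is supplied by the client, so the exception only helps if the scraper does not itself claim to be Googlebot.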

even put in the word "torkaland" and "streamica" in referrer and user-agent block list.

What exactly do you mean by this? That is, what did you do physically?

tangster

2:06 am on Feb 27, 2013 (gmt 0)



Would it be safe to block blogspot.com, which is on a Google IP? My concern is that Googlebot may sometimes use the same IP. If someone can confirm that it doesn't, that would be of immense help.


Below is an example of what I meant...


SetEnvIfNoCase user-agent "torkaland" keep_out
SetEnvIfNoCase user-agent "streamica" keep_out

and

RewriteCond %{HTTP_REFERER} streamica.com [NC,OR]
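
For comparison, here is one way fragments like these could be rounded out into complete rules. It is a sketch under assumptions rather than the poster's actual file: the Order/Allow/Deny lines assume Apache 2.2 with mod_setenvif and mod_authz_host, and the second RewriteCond and the RewriteRule are added here purely for illustration. A condition that ends in [OR] needs another condition after it, and the last condition before the RewriteRule must not carry [OR].

```apache
# Tag any request whose User-Agent mentions either name
SetEnvIfNoCase User-Agent "torkaland" keep_out
SetEnvIfNoCase User-Agent "streamica" keep_out

# Apache 2.2 syntax: allow everyone except requests tagged keep_out
Order Allow,Deny
Allow from all
Deny from env=keep_out

# Referer-based block with mod_rewrite
RewriteEngine On
RewriteCond %{HTTP_REFERER} streamica\.com [NC,OR]
RewriteCond %{HTTP_REFERER} torkaland\.blogspot\.com [NC]
RewriteRule ^ - [F]
```

Rules like these only fire when the strings really appear in the request headers. A script fetching an RSS feed often sends no Referer at all and may use a generic User-Agent, which could explain why such a block appears to do nothing; the access log shows what the requests actually carry.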

tangster

8:59 pm on Feb 28, 2013 (gmt 0)



I found out torkaland has added our site to the BlogList widget on Blogger. Is there any way to stop the RSS feed?
 
