


How do I block these scrapers?

IP blocking doesn't work



9:47 pm on Feb 26, 2013 (gmt 0)

So far I tried IP blocking, even put in the word "torkaland" and "streamica" in referrer and user-agent block list. None of it works! Please help.

torkaland.blogspot.com and streamica.com


10:10 pm on Feb 26, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

Did you restart Apache after each change?


10:31 pm on Feb 26, 2013 (gmt 0)

The blocks are in the .htaccess. I didn't know you had to restart Apache for them to take effect?

The problem is I can't block the IP of the blogspot site because it's owned by Google. I am afraid they might use the same IP to crawl my site, and Googlebot would get blocked.

Whereas streamica seems to be pulling RSS feeds from a different IP than the one it's hosted on, and I don't know which IP they are using to scrape the site.


11:29 pm on Feb 26, 2013 (gmt 0)

lucy24 (WebmasterWorld Senior Member, Top Contributor of All Time)

Anything in htaccess takes effect immediately. The only exception is that if your browser has already cached the page, it may not know that there have been changes.

It is trivial to make a conditional block to say, for example,

RewriteCond %{REMOTE_ADDR} {give the numerical IP here}
RewriteCond %{HTTP_USER_AGENT} !Googlebot

... and then take it from there. Currently all the Googlebot variants, such as the image bot and the three-or-more mobile crawlers, contain the element "Googlebot" (capitalized) in their User-Agent string.
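Spelled out in full, that conditional block might look like the sketch below. The address 192.0.2.10 is only a placeholder (substitute whatever turns up in the access logs), and the closing RewriteRule is an assumption about where "take it from there" leads, namely answering with 403 Forbidden.

```apache
RewriteEngine On

# Placeholder address, not a real scraper IP: replace with the one from your logs.
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.10$
# Exempt any request whose User-Agent contains "Googlebot" (case-sensitive match).
RewriteCond %{HTTP_USER_AGENT} !Googlebot
# Both conditions matched: refuse the request with 403 Forbidden.
RewriteRule ^ - [F]
```

Keep in mind that a User-Agent string can be faked, so this exemption lets through anything from that address that merely claims to be Googlebot.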

even put in the word "torkaland" and "streamica" in referrer and user-agent block list.

What exactly do you mean by this? That is, what did you do physically?


2:06 am on Feb 27, 2013 (gmt 0)

Would it be safe to block blogspot.com, which is on a Google IP? My concern is that Googlebot may also use the same IP sometimes. If someone can confirm it doesn't, that would be of immense help.

Below is an example of what I meant...

SetEnvIfNoCase user-agent "torkaland" keep_out
SetEnvIfNoCase user-agent "streamica" keep_out


RewriteCond %{HTTP_REFERER} streamica.com [NC,OR]
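On their own, the SetEnvIfNoCase lines only set the keep_out variable; nothing is refused until an access rule tests it, and a RewriteCond does nothing without a RewriteRule after it. A minimal sketch of a complete version follows, assuming Apache 2.2 style access directives (on 2.4 the equivalent would use Require directives) and assuming the goal is a 403 response. The second referrer condition is an illustrative guess at what followed the trailing OR.

```apache
# Flag requests whose User-Agent mentions either site (case-insensitive).
SetEnvIfNoCase User-Agent "torkaland" keep_out
SetEnvIfNoCase User-Agent "streamica" keep_out

# Apache 2.2 syntax: allow everyone except flagged requests.
Order Allow,Deny
Allow from all
Deny from env=keep_out

RewriteEngine On
# Refuse requests that arrive with either site as the referrer.
# The dots are escaped so they match a literal "." only.
RewriteCond %{HTTP_REFERER} streamica\.com [NC,OR]
RewriteCond %{HTTP_REFERER} torkaland\.blogspot\.com [NC]
RewriteRule ^ - [F]
```

These rules only match if the scraper actually sends those strings. A script fetching pages or feeds server-side often sends no referrer at all and a generic User-Agent, which may be why the block list appears to do nothing.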


8:59 pm on Feb 28, 2013 (gmt 0)

I found out torkaland has added our site to the BlogList widget on Blogger. Is there any way to stop the RSS feed?
