Home / Forums Index / Code, Content, and Presentation / Apache Web Server
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL & phranque

Apache Web Server Forum

Bot hits RSS feed every 15 minutes
How do I block the Windows 98 bot?

 12:06 am on May 20, 2013 (gmt 0)

I have had this bot visiting my site for months. It uses the Opera proxy with multiple IPs (hostname: z07-13.opera-mini.net), so I can't just block its IP.
I had already blocked it using this code in .htaccess:
SetEnvIf user-agent "^Windows 98" ban #Bad bot
Order allow,deny
allow from all
deny from env=ban

But after I changed hosts, the bot came back.
I contacted their support, but they gave me the same code I already had:
SetEnvIf User-Agent Mozilla/4.0 (Windows 98; US) Opera 10.00 [en] GoAway=1
Order allow,deny
Allow from all
deny from env=GoAway

This has no effect on this annoying bot.
I have also tested this code:
RewriteCond %{HTTP_USER_AGENT} "^windows 98"
RewriteRule ^.* - [F,L]

This works on me when I change my browser's UA with the Firefox User-Agent add-on, but it doesn't work on the bot, and I don't know why.

The bot has gotten worse and is now grabbing content from my site.



 12:41 am on May 20, 2013 (gmt 0)

Here is the correct code for all three options. The second is the most precise: it matches only that exact user-agent string. The first and last will ban any user-agent containing "Windows 98".

SetEnvIf User-Agent Windows\s98 ban
Order allow,deny
allow from all
deny from env=ban

SetEnvIf User-Agent "^Mozilla/4\.0 \(Windows 98; US\) Opera 10\.00 \[en\]$" GoAway
Order allow,deny
allow from all
deny from env=GoAway

RewriteCond %{HTTP_USER_AGENT} Windows\s98
RewriteRule .* - [F]


 2:25 am on May 20, 2013 (gmt 0)

SetEnvIf User-Agent Mozilla/4.0 (Windows 98; US) Opera 10.00 [en] GoAway=1

Ouch. I hope that wasn't a literal quote from htaccess. What mod_setenvif thinks this means is:

If the user-agent contains "Mozilla/4.0"
then set this list of environment variables:
(Windows
98;
US)
Opera
10.00
[en]
and finally
GoAway, which gets the specific value of 1. Frankly you're lucky this didn't result in locking out all UAs containing the string "Mozilla/4.0" (robots plus all but the latest versions of MSIE). For that matter, maybe you did lock them out, you just didn't notice :)

Remember that in Apache, a literal space very often has semantic meaning. So it needs to be either escaped or hidden inside quotation marks. Some mods let you go either way; some are more particular. In mod_setenvif, quotation marks are enough.
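
To make the quoting point concrete, here is a rough sketch rather than a tested config. The variable name ban is simply reused from earlier in the thread, and I'm assuming the rules live in .htaccess with mod_setenvif and mod_rewrite available and RewriteEngine already switched on.

```
# mod_setenvif: quotation marks keep the space inside the pattern
SetEnvIf User-Agent "Windows 98" ban

# mod_setenvif, unquoted: \s matches the space (or any other whitespace)
SetEnvIf User-Agent Windows\s98 ban

# mod_rewrite: quotation marks work here as well
RewriteCond %{HTTP_USER_AGENT} "Windows 98"
RewriteRule .* - [F]
```

Without the quotes, everything after the first space is read as a separate argument, which is what happened to the line quoted at the top of this post.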

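Incidentally, there is a useful shorthand in mod_setenvif. When matching against the user-agent, you can say BrowserMatch in place of SetEnvIf User-Agent.

As for the pattern in your RewriteCond:

^windows 98

With the opening anchor, this rule, in any module, will only work on user-agents that begin with "windows 98". And with a lower-case w at that, because Regular Expressions are case sensitive unless you've specifically told them not to be. If the bot really sends the user-agent your host quoted, it begins with "Mozilla/4.0", so an anchored pattern like this never matches it.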
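
For comparison, here is a sketch of the same match with the anchor dropped and case ignored. Treat it as an illustration under assumptions, not a drop-in fix: the variable name ban is borrowed from earlier in the thread, the Order/Allow/Deny lines are the ones already posted above, and RewriteEngine is assumed to be on. You would use one of the three variants, not all of them.

```
# mod_setenvif shorthand: BrowserMatchNoCase always tests the User-Agent header
# and ignores case; no ^ anchor, so the text may appear anywhere in the string
BrowserMatchNoCase "windows 98" ban

# The long form of the same thing
SetEnvIfNoCase User-Agent "windows 98" ban

# mod_rewrite version: [NC] makes the condition case-insensitive
RewriteCond %{HTTP_USER_AGENT} "windows 98" [NC]
RewriteRule .* - [F]
```

Any of these will also catch genuine visitors whose user-agent mentions Windows 98, so the exact-string version posted earlier in the thread is the safer choice if you only want this one bot.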


 2:12 pm on May 21, 2013 (gmt 0)

I know this is a dumb question, considering you've already decided on a solution. But why block the bot? The bot is reading your rss feed, and aggregating your content for you, and puts links to your content in more places than it would be normally. This helps you. Why would you want to stop it?


 8:02 pm on May 21, 2013 (gmt 0)

Best guess: because it isn't putting links anywhere, it's just scraping the content for other sites' benefit.

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved