
Forum Moderators: Ocean10000 & phranque


hitbot redirection with htaccess help needed

4:18 pm on Jan 7, 2004 (gmt 0)

New User

10+ Year Member

joined:Jan 7, 2004
posts:2
votes: 0


Hello everyone! This is my first post over here. I was referred here to pose my questions to the experts on this board.

My issue is hitbots on my site. I am trying to redirect them to Google with htaccess and need some advice. Below is a snippet of my htaccess file for the hitbots.

################################
RewriteOptions inherit
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^Anon.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Avant.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Alligator [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^attach [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^BackWeb [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Bandit [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^BatchFTP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Buddy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Collector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Copier [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^DA [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^DISC.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo\ Pump [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^DLExpert [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Download.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Master [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Ninja [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Wonder [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Downloader [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Drip [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^FileHound [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^FlipDog [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^FreshDownload [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GetSmart [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GornKer [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^gotit [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Grabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^HiDownload [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InternetSeer.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Iria [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Irvine [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^iwantmy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^JustView [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^lftp [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^likse [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Link [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Magnet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mag-Net [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Memo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^MetaProducts\ Download\ Express [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mirror [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^MyGetRight [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetButler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetPumper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Ninja [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Nitro\ Downloader [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Ping [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Pockey [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Pump [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^PuxaRapido [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Reaper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Recorder [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Siphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Snake [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SpaceBison [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SpeedDownload [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Stripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Sucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Vacuum [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Webdup [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Go [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebPictures\ Downloader [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Webster [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWasher [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Whacker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ZyBorg [NC]
RewriteRule .* [google.com...] [R,L]
###########################################

My question is on the syntax I am using to identify these bots. I have been told the following:

"when you only use ^Something as your wildcard htaccess only looks for a useragent that starts with ^Something

you could use a wildcard like that:
.*Something.*

^ and $ are there to figure out if the subject starts or ends with a certain phrase. you don't need to use it."

I agree that I do need to use a wildcard on these, but I am not sure where to place it for each bot. Can I just change all of them to read:

^*.*botname.*

Or, is the placement of the * and . going to be different depending on the bot?

Does anyone have any experience with this?

Thanks, and regards,
Patrick

5:18 pm on Jan 7, 2004 (gmt 0)

Senior Member

jdmorgan (WebmasterWorld Top Contributor of All Time, 10+ Year Member)

joined:Mar 31, 2002
posts:25430
votes: 0


Patrick,

Welcome to WebmasterWorld [webmasterworld.com]!

My advice would be to use that list exactly as you have it, and not change the anchoring until you have reason to do so.

Your question concerns Regular Expressions [etext.lib.virginia.edu] anchoring. Regex is a precise pattern-matching language, and mod_rewrite is a very powerful module. Together, they can be extremely helpful. Or harmful -- a single typo can render your server unreachable (from HTTP). Study is advised.

Note that "^.*" and ".*$" are completely redundant, and can be omitted.
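
As a rough sketch of what that means in practice (the user-agent names are taken from your list; pairing them with anchored and unanchored forms here is only an illustration, not a recommendation for those particular agents):

```
# Start-anchored: matches only if the user-agent string BEGINS with "Wget"
RewriteCond %{HTTP_USER_AGENT} ^Wget [NC,OR]
# Unanchored: matches if "HTTrack" appears ANYWHERE in the user-agent string
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
# Matches exactly the same requests as the line above; "^.*" and ".*$" add nothing
RewriteCond %{HTTP_USER_AGENT} ^.*HTTrack.*$ [NC]
```

So if you do decide to match a name anywhere in the string, drop the "^" and leave the name by itself; there is no need to wrap it in ".*". These conditions still need a RewriteRule after them, as in your file.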

If you wish to spend a few hours, you can compare your list to the ones in our "Close to perfect ban list [webmasterworld.com]" thread (now in three parts) and look at the start- and end-anchoring of the user-agent strings listed there. There are anchoring errors in some of those posts, but you might want to go with the majority vote on each user-agent string.

Jim

5:33 pm on Jan 7, 2004 (gmt 0)

New User

10+ Year Member

joined:Jan 7, 2004
posts:2
votes: 0


Thanks JDMorgan. I have been picking the best of the best like you suggest from several boards and going with the most popular version of what is correct or not. :-) That's definitely good advice!

I already read that post earlier today but have not made any adjustments to my htaccess yet, so I posted it as is. I have been reading up on regex too but still am no pro.

I appreciate the feedback!