hit every page on my smaller site. Around 150 pages.
On my larger site: 38 pages, less than 5% of total pages.
On an additional note, I have this bot in my robots.txt as defined on their website, and it failed to slow them down one bit.
Of course, that was prior to the name change of their bot ;)
Must have been a long day, can't believe I typed 'sight' instead of site :).
Did it look like this?
220.127.116.11 - - [25/Nov/2004:16:35:29 -0800] "GET /robots.txt HTTP/1.1" 200 1705 "-" "Mozilla/4.7 [en](BecomeBot'at'exava.com)"
11/26/04 14:26:20 IP block 18.104.22.168
Trying 22.214.171.124 at ARIN
Trying 64.124.85 at ARIN
Abovenet Communications, Inc ABOVENET (NET-64-124-0-0-1)
126.96.36.199 - 188.8.131.52
Exava MFN-B753-64-124-85-0-24 (NET-64-124-85-0-1)
184.108.40.206 - 220.127.116.11
# ARIN WHOIS database, last updated 2004-11-25 19:10
# Enter ? for additional hints on searching ARIN's WHOIS database.
dat be da critter
Im gettin dizzy...
I just blocked these IPs in cPanel, is that enough to do the trick? The last octet on the second IP I put a * in there, but it saved it as blank. I really, really want to keep it out on the next sweep it attempts because it did some major damage, including submitting 1024 blank help desk tickets and executing a credit card transaction. Anyone know if the settings below in cPanel's IP Deny Manager will keep them out?
Been watching this little guy for a few days. Apparently Exava has become Become. Looks like they are in beta mode (still) under a new name. Definitely the most aggressive bot in our logs. They seem to take every page every day, occasionally more than once per day.
>just blocked these IPs in cPanel, is that enough to do the trick? The last octet on the second IP I put a * in
Not sure if your cPanel uses rewrites or sets.
Did you view your .htaccess file afterwards?
The "*" is not a wildcard that can be used in such numeric expressions for either sets or rewrites.
The first line, a full address, denies precisely that SOLITARY IP only (if sets are used, or if cPanel creates a valid rewrite).
Whether the second line works depends on whether rewrites or sets are used.
If rewrites, the line is invalid on two counts: your attempt at a wildcard, and not omitting the ending period.
The ending period takes out everything below 64.124.85.
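For anyone unsure what the ending period does, here is a rough Python sketch of how a partial-address "deny from" entry is generally understood to behave: the octets listed in the entry are compared against the leading octets of the visitor's address. The function name denied_by and the test address 64.124.85.7 are made up for illustration. This is not Apache's actual code, and it only covers plain full or partial IPv4 entries (no CIDR, no hostnames).

```python
def denied_by(rule: str, client_ip: str) -> bool:
    """Return True if client_ip falls under a full or partial IPv4 deny entry.

    Hypothetical helper: it mimics prefix matching on whole octets only.
    """
    # "64.124.85." and "64.124.85" both become ["64", "124", "85"].
    rule_octets = [part for part in rule.split(".") if part != ""]
    ip_octets = client_ip.split(".")
    if len(ip_octets) != 4 or not 1 <= len(rule_octets) <= 4:
        return False
    # Assumption: "*" gets no special treatment here, so an entry containing it
    # never equals a numeric octet and therefore matches nothing.
    return ip_octets[:len(rule_octets)] == rule_octets


if __name__ == "__main__":
    print(denied_by("64.124.85.", "64.124.85.7"))   # partial entry, same /24
    print(denied_by("64.124.85.", "64.124.86.7"))   # partial entry, other /24
    print(denied_by("64.124.85.7", "64.124.85.8"))  # full address, other host
```

Under these assumptions a three-octet entry covers the whole block, while a full four-octet entry covers exactly one address.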
Not sure if sets or rewrites but after what you said I contacted my host and according to them the wildcard should work. I tested it on my own internal network for a dummy domain and that worked ok too, so I guess in my case it works for me. Thanks for the input!
Did you view your htaccess as I previously advised?
The examples provided on this page are still functioning today and were functioning at least two years prior to when this thread was created:
The only lines which contain a *
are either in the referrer based denies or in the action closing lines.
BTW, what my host knows about htaccess wouldn't fill the head of a needle ;)
It is entirely possible to create a line in htaccess which does not result in a 500 (taking the site down), yet the line fails to function as you intended when implemented.
Three good examples are multiple "||", lack of a closing parenthesis, and lack of a closing [OR].
In addition, these invalid lines have an effect on the other lines in the htaccess file.
Please see this page:
Under the heading of "quantifiers"
Looks like they updated the bot to: BecomeBot/2.0beta
18.104.22.168 - - [14/Dec/2004:17:06:45 -0500] "GET /robots.txt HTTP/1.1" 200 3774 "-" "Mozilla/5.0 (compatible; BecomeBot/2.0beta; +http://www.become.com/webmasters.html)"
These are the mySimon and Wisenut guys:
I wonder if this will turn into anything decent...
Sorry for the delay, just got back from a trip up north...
First off, thanks so much for the help and valuable information. Those links are truly helpful for anyone trying to learn htaccess (such as myself).
Actually, what I meant by using the wildcard is putting it in the last octet field in the cPanel IP Deny Manager module. The output htaccess file from the previously mentioned blocked IP range is this:
IndexIgnore .htaccess */.?* *~ *# */HEADER* */README* */_vti*
<Limit GET POST>
#The next line modified by DenyIP
#The next line modified by DenyIP
#deny from all
allow from all
<Limit PUT DELETE>
deny from all
allow from all
deny from 22.214.171.124
deny from 64.124.85.
So, it removes the wildcard. I guess you could just forget the wildcard altogether and leave the field blank. As someone who knows little about htaccess (and I'm sure my host doesn't know much about it either; he he), I was glad IP Deny Manager took care of it. BTW, the above htaccess seems to keep them out well. I see exabot/becomebot attempting a crawl with 1 hit every so often, but it's halted.
Thanks again wilderness... :)
Is there any way that I can block this from robots.txt? I do not have access to htaccess.
It is consuming 2/3 of my bandwidth from one of my sites.
Oh, never mind... I found the line for exclusion.
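For anyone else without htaccess access, a quick way to sanity-check a robots.txt exclusion before uploading it is Python's standard urllib.robotparser. The sketch below is only an illustration: the robots.txt content and the "BecomeBot" token are assumptions based on the user-agent string in the logs posted in this thread, so check the bot's own webmaster page for the exact line. It also only tells you what a well-behaved crawler should do; it cannot make a bot obey.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content. "BecomeBot" is taken from the user-agent
# string seen in the logs, not from the bot's official documentation.
ROBOTS_TXT = """\
User-agent: BecomeBot
Disallow: /
"""


def allowed(agent: str, path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a compliant crawler named `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    print(allowed("BecomeBot", "/"))
    print(allowed("Googlebot", "/index.html"))
```

Pass the short bot name rather than the full "Mozilla/5.0 (compatible; ...)" string, since the parser matches on the part of the name before the first slash.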
>I wonder if this will turn into anything decent...
Not at the rate this thing is going at present.
This is all I see, each time. No crawl, just a link from which they might have come...because I'm listed on that page. <shrug> That should eliminate Log Spamming.
126.96.36.199 - - [21/Dec/2004:11:40:03 -0800] "GET /robots.txt HTTP/1.1" 200 1727 "-" "Mozilla/5.0 (compatible; BecomeBot/2.0beta; +http://www.become.com/webmasters.html)"
188.8.131.52 - - [21/Dec/2004:11:40:17 -0800] "GET / HTTP/1.1" 200 20407 "http:///PAGE_REMOVED.html" "Mozilla/5.0 (compatible; BecomeBot/2.0beta; +http://www.become.com/webmasters.html)"
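If you want to pull these hits out of a raw access log rather than eyeball them, something like the following Python sketch may help. It assumes the combined log format shown in the lines above; the names LOG_PATTERN, parse_line and is_becomebot are made up for this example, and the regex is deliberately simple, so it may not cope with every log variant.

```python
import re

# Assumed layout: Apache "combined" format, as in the log lines quoted above.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def is_becomebot(line: str) -> bool:
    """True if the line parses and its user-agent mentions BecomeBot."""
    entry = parse_line(line)
    return entry is not None and "becomebot" in entry["agent"].lower()


if __name__ == "__main__":
    sample = (
        '188.8.131.52 - - [21/Dec/2004:11:40:17 -0800] "GET / HTTP/1.1" '
        '200 20407 "http:///PAGE_REMOVED.html" "Mozilla/5.0 (compatible; '
        'BecomeBot/2.0beta; +http://www.become.com/webmasters.html)"'
    )
    print(parse_line(sample)["request"], is_becomebot(sample))
```

Matching on the user-agent rather than the IP keeps working if they crawl from a different address, though of course a user-agent can be changed just as easily as the bot's name was.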