
Home / Forums Index / Search Engines / Search Engine Spider and User Agent Identification
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

New Exava bot?

 8:55 pm on Nov 25, 2004 (gmt 0)

Just started seeing a new bot, "BecomeBot@exava.com", in my logs today. Couldn't find anything on Google or on exava.com. I have already blocked Exabot due to its aggressiveness, and because my sight isn't "shopping" related.

Anyone else seen this one?



 5:23 am on Nov 26, 2004 (gmt 0)

It hit every page on my smaller site (around 150 pages).

On my larger site it took 38 pages, less than 5% of the total.


 7:32 am on Nov 26, 2004 (gmt 0)

On an additional note, I have this bot in my robots.txt as defined on their website, and it didn't slow them down a bit.

Of course that was prior to the name change of their bot ;)
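For illustration, a robots.txt that lists both the old and the new user-agent tokens would survive the rename; the exact tokens the crawler matches against ("Exabot" and "BecomeBot") are an assumption based on the names reported in this thread:

```text
# Old token, from before the rename (assumed)
User-agent: Exabot
Disallow: /

# New token seen in the logs above
User-agent: BecomeBot
Disallow: /
```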


 6:14 pm on Nov 26, 2004 (gmt 0)

Must have been a long day, can't believe I typed 'sight' instead of site :).


 9:20 pm on Nov 26, 2004 (gmt 0)

Did it look like this?

 - - [25/Nov/2004:16:35:29 -0800] "GET /robots.txt HTTP/1.1" 200 1705 "-" "Mozilla/4.7 [en](BecomeBot'at'exava.com)"

11/26/04 14:26:20 IP block
Trying at ARIN
Trying 64.124.85 at ARIN
Abovenet Communications, Inc ABOVENET (NET-64-124-0-0-1) -
Exava MFN-B753-64-124-85-0-24 (NET-64-124-85-0-1) -

# ARIN WHOIS database, last updated 2004-11-25 19:10
# Enter ? for additional hints on searching ARIN's WHOIS database.


 9:56 pm on Nov 26, 2004 (gmt 0)

dat be da critter


 9:57 pm on Nov 26, 2004 (gmt 0)

Im gettin dizzy...



 4:47 pm on Dec 13, 2004 (gmt 0)

I just blocked these IPs in cPanel; is that enough to do the trick? For the last octet of the second IP I put a * in there, but it saved it as blank. I really, really want to keep it out on its next sweep, because it did some major damage, including executing 1024 blank help desk tickets and a credit card transaction. Does anyone know if the settings below in cPanel's IP Deny Manager will keep them out?


 5:00 pm on Dec 13, 2004 (gmt 0)

Been watching this little guy for a few days. Apparently Exava has become Become. Looks like they are (still) in beta mode under a new name. Definitely the most aggressive bot in our logs. They seem to take every page every day, occasionally more than once per day.


 8:13 pm on Dec 13, 2004 (gmt 0)

just blocked these IPs in cPanel, is that enough to do the trick? The last octet on the second IP I put a * in

Not sure if your cPanel uses rewrites or sets.
Did you view your htaccess file afterwards?

The "*" is not a wildcard that can be used in such numeric expressions, for either sets or rewrites.

The first full range denies precisely that SOLITARY IP only (if sets are used, or if cPanel creates a valid rewrite).

The second line depends on whether rewrites or sets are used.
If rewrites, the line is invalid on two counts: the attempted wildcard, and the trailing period that should have been omitted.

If sets?
The trailing period takes out everything below 64.124.85.
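A sketch of the two styles being contrasted, in .htaccess syntax. The IP range is the one from the whois lookup above; everything else is illustrative, not the thread's actual configuration:

```apache
# "Set" style (mod_access): a partial IP address, with no wildcard,
# matches the entire range beneath it (here, 64.124.85.0-255).
order allow,deny
allow from all
deny from 64.124.85

# "Rewrite" style (mod_rewrite): the range is a regex anchored to the
# start of the client address, with the dots escaped.
RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^64\.124\.85\.
RewriteRule .* - [F]
```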


 12:34 am on Dec 14, 2004 (gmt 0)

Not sure if it's sets or rewrites, but after what you said I contacted my host, and according to them the wildcard should work. I tested it on my own internal network with a dummy domain and that worked OK too, so I guess in my case it works. Thanks for the input!


 1:19 am on Dec 14, 2004 (gmt 0)

Did you view your htaccess, as I previously advised?

The examples provided on this page are still functioning today, and were functioning for at least two years prior to when this thread was created:


The only lines which contain a * are either in the referrer-based denies or in the action closing lines.

BTW, what my host knows about htaccess wouldn't fill the head of a needle ;)

It is entirely possible to create a line in htaccess which does not result in a 500 (taking the site down), yet fails to function as you intended.
Three good examples are multiple "", a missing closing parenthesis, and a missing closing [OR].
In addition, these invalid lines have an effect on the other lines in the htaccess file.

Please see this page:

under the heading of "Quantifiers".
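As a hypothetical example (not from the thread) of a line that parses without a 500 yet silently fails: mod_access treats a non-numeric token after "deny from" as a partial hostname rather than an IP pattern, so a shell-style wildcard never matches by address.

```apache
# Looks like a wildcard block, but "64.124.85.*" is read as a
# (partial) hostname, so it never matches an IP address and the
# deny silently does nothing; no 500 is raised.
deny from 64.124.85.*

# What was intended: a partial IP address matches the whole range.
deny from 64.124.85
```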


 12:14 am on Dec 15, 2004 (gmt 0)

Looks like they updated the bot to BecomeBot/2.0beta:

 - - [14/Dec/2004:17:06:45 -0500] "GET /robots.txt HTTP/1.1" 200 3774 "-" "Mozilla/5.0 (compatible; BecomeBot/2.0beta; +http://www.become.com/webmasters.html)"


 6:59 pm on Dec 15, 2004 (gmt 0)

These are the mySimon and Wisenut guys:


I wonder if this will turn into anything decent...


 2:24 pm on Dec 21, 2004 (gmt 0)


Sorry for the delay, just got back from a trip up north...

First off, thanks so much for the help and valuable information. Those links are truly helpful for anyone trying to learn htaccess (such as myself).

Actually, what I meant by using the wildcard was putting it in the last octet field of the cPanel IP Deny Manager module. The output htaccess file from the previously mentioned blocked IP range is this:

# -FrontPage-

IndexIgnore .htaccess */.?* *~ *# */HEADER* */README* */_vti*

<Limit GET POST>
#The next line modified by DenyIP
order allow,deny
#The next line modified by DenyIP
#deny from all
allow from all
order deny,allow
deny from all
AuthName www.mydomain.com
AuthUserFile /home/user/public_html/_vti_pvt/service.pwd
AuthGroupFile /home/user/public_html/_vti_pvt/service.grp
</Limit>

<Files 403.shtml>
order allow,deny
allow from all
</Files>

deny from
deny from 64.124.85.

So, it removes the wildcard. I guess you could forget the wildcard altogether and just leave the field blank. For someone who knows little about htaccess, like myself (and I'm sure my host doesn't know much about it either, heh), I was glad IP Deny Manager took care of it. BTW, the above htaccess seems to keep them out well. I see exabot/becomebot attempting a crawl with one hit every so often, but it's halted.

Thanks again wilderness... :)


 3:39 pm on Dec 21, 2004 (gmt 0)

Is there any way that I can block this from robots.txt? I do not have access to htaccess.

It is consuming 2/3 of my bandwidth on one of my sites.


 3:40 pm on Dec 21, 2004 (gmt 0)

Oh, never mind... I found the line for exclusion.

User-agent: BecomeBot
Disallow: /
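If an outright ban is more than you want, some crawlers also honor the non-standard Crawl-delay directive in robots.txt; whether BecomeBot respected it is an assumption here, not something confirmed in this thread:

```text
# Throttle instead of ban; only works if the crawler honors Crawl-delay
User-agent: BecomeBot
Crawl-delay: 60
```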


 5:57 am on Dec 22, 2004 (gmt 0)

>I wonder if this will turn into anything decent...

Not at the rate this thing is going at present.

This is all I see, each time. No crawl, just a link from which they might have come... because I'm listed on that page. <shrug> That should eliminate log spamming.

 - - [21/Dec/2004:11:40:03 -0800] "GET /robots.txt HTTP/1.1" 200 1727 "-" "Mozilla/5.0 (compatible; BecomeBot/2.0beta; +http://www.become.com/webmasters.html)"
 - - [21/Dec/2004:11:40:17 -0800] "GET / HTTP/1.1" 200 20407 "http:///PAGE_REMOVED.html" "Mozilla/5.0 (compatible; BecomeBot/2.0beta; +http://www.become.com/webmasters.html)"
