
How to Stop BecomeBot?

     
3:29 am on June 23, 2005 (gmt 0)

New User

10+ Year Member

joined:Mar 31, 2004
posts:33
votes: 0


Hi,

Recently, BecomeBot has been very actively crawling my sites, eating a lot of bandwidth.

I've created a robots.txt with content like this:

# Disallow BecomeBot
User-agent: BecomeBot
Disallow: /

But the robot seems to ignore it.

Is there a way to stop BecomeBot completely from crawling my sites?

Thanks.

Regards,
Sjarief

9:41 am on June 23, 2005 (gmt 0)

Full Member

joined:Jan 12, 2004
posts:334
votes: 0


If you are sure that's its correct name and it's still not honoring it, you'll have to find the IP of the bot(s), and if you use cPanel, put their IPs in the "IP Deny" area. Or else contact them and ask them about it.

I just checked my access logs and I see I have the Become bot visiting, but only one entry. (Its IP is 64.124.85.78 if you want to block it in cPanel.) The entry references this page [become.com...] (Authoritative URL). According to that page, you're using the correct robots.txt lines, so it looks like you're going to have to ask them, or else IP Deny 64.124.85. since they use that ENTIRE IP range!
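If you're not on cPanel, the equivalent Apache rules in an .htaccess file would be something like this (just a sketch using the 64.124.85. range from above, and assuming your host lets .htaccess override access settings):

# Deny the whole 64.124.85.* range the Become bot crawls from
Order Allow,Deny
Allow from all
Deny from 64.124.85.

cPanel's IP Deny tool writes essentially the same Deny lines into your .htaccess for you.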

Why do you want to block their bot? Have you heard something bad about them? I just went to become.com and I can't find myself in their index, which is strange since their bot is visiting my site. So I was wondering about submitting to them, but I don't even see where you can submit a site. Ahh, I see now they say this: "At this time, we do not accept submissions from webmasters." I did some searches to test it, and in my case I got nothing but non-relevant results, which looks pretty bad.

9:51 am on June 23, 2005 (gmt 0)

Full Member

joined:Jan 12, 2004
posts:334
votes: 0


Interesting. The Become bot only visited ONE of my pages. It first hit my robots.txt file, then only ONE webpage. I searched their SE for the product on that page, and I was 1st on the 1st page. I searched for other specific products I carry, and I'm also on the first page for them. Strange that in one search it chose one of my FORWARDED domains that point to my MAIN domain for the 1st spot! AFAIK, that could coincidentally be the very first time their bot has ever visited. I don't know why it chose that one page and no others to visit, though! It wasn't even an entry page.

Their SE doesn't seem to do very well with the generic searches I first tried, like "[blue widget] sales" or dealers; all I saw were non-relevant results. But when searching for a specific product, it does better. I like the fact that they have a rating system where you can rate each search, and I like their "auto complete" of searches. I've never seen that before with any SE.

9:59 am on June 23, 2005 (gmt 0)

New User

10+ Year Member

joined:Mar 31, 2004
posts:33
votes: 0


Thanks. If robots.txt cannot stop them, I will use cPanel to block it.

The only reason I want to stop it is that it crawls heavily and deep into my sites, eating a lot of bandwidth, and from my logs I haven't seen any visitors referred by them. So I think it's just wasted bandwidth.

Regards,
Sjarief

12:00 pm on June 23, 2005 (gmt 0)

Full Member

joined:Jan 12, 2004
posts:334
votes: 0


You know, if I were you, and this is just me, I'd be GLAD the bot is crawling your site. They are brand new from what I saw (still in Beta), and with their new kind of "indexing intelligence" and the features I mentioned, they could be a big hit. It's just going to take some time after they are out of Beta for their name to get around and the referrals to start. Something to think about. Looks like they came out in March.

With Google "freaking out" every few months, we need ALL the SE's we can find.

(From March) "Last week, Become.com sent out a press release that talked about its patent-pending ranking algorithm dubbed AIR. AIR stands for Affinity Index Ranking and, based on claims from Become, is the next generation search engine ranking algorithm."

Go to G and enter "become search engine" in quotes, and look at the hit on marketingshift.com; it's the 4th hit in my area.

3:41 pm on June 23, 2005 (gmt 0)

New User

10+ Year Member

joined:Mar 31, 2004
posts:33
votes: 0


Maybe you are right; we need more SEs so we don't depend only on the Big G.

The problem with BecomeBot on my site is that it eats about 100MB of bandwidth in a snap. It crawls heavy and deep.

You know, my sites use a script to showcase products from Amazon. If it is not stopped, can you imagine a bot crawling the whole Amazon store at once?

I'm a newbie with robots.txt. Is there a way to limit it so the bot does not crawl the whole Amazon store at once?

4:51 pm on June 23, 2005 (gmt 0)

Full Member

joined:Jan 12, 2004
posts:334
votes: 0


You may want to start a new topic for that. I don't know if you can set time limits that small in the robots.txt file, but I think you can set periodic crawl dates, or just block it from accessing the Amazon pages. It's not like they need the help. A bot accessing another site isn't going to affect your bandwidth, since it's not on your server, unless the Amazon pages ARE ON your server, as in MyDomain.com/amazon/whatever.
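For example, if those Amazon showcase pages all live under one directory (the /amazon/ path here is just a stand-in; use whatever directory your script actually serves from), you could block the bot from just that section instead of the whole site:

User-agent: BecomeBot
Disallow: /amazon/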

5:39 pm on June 23, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 30, 2004
posts:712
votes: 0


You can control the rate at which your site is crawled by using the Crawl-Delay directive, which lets you specify the number of seconds between the bot's requests. Note that it may take quite a long time to crawl a site if there are many pages and the Crawl-Delay is set high. You could specify an interval of 30 seconds between requests with an entry like this:

User-agent: BecomeBot
Crawl-Delay: 30
Disallow: /cgi-bin

5:48 pm on June 23, 2005 (gmt 0)

Preferred Member

joined:Apr 22, 2004
posts:528
votes: 0


Are you kidding?

The engine is spam-infested.

5:50 pm on June 23, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


If the bot is ignoring robots.txt, then you should contact the bot developer or ban it, period.
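
If you do decide to ban it and you're on Apache, a mod_rewrite rule in .htaccess along these lines will turn it away by user-agent (a sketch, assuming mod_rewrite is available; user-agent strings can be faked, so blocking the IP range mentioned earlier is the stronger ban):

RewriteEngine On
# Send a 403 Forbidden to anything identifying itself as BecomeBot
RewriteCond %{HTTP_USER_AGENT} BecomeBot [NC]
RewriteRule .* - [F]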

6:41 pm on June 23, 2005 (gmt 0)

Preferred Member

joined:Apr 22, 2004
posts:528
votes: 0


You could log in as root and try this command:
/etc/init.d/httpd stop

3:09 pm on June 29, 2005 (gmt 0)

New User

10+ Year Member

joined:May 21, 2005
posts:21
votes: 0


Hey guys,

I recently had this problem too. They spidered thousands of my pages in one day - totally took down the dedicated server!

I found this piece of code on their site, and it has worked well for me. It tells the bot to wait at least 30 seconds in between page requests:

User-agent: BecomeBot
Crawl-Delay: 30
Disallow: /cgi-bin