Recently, BecomeBot has been very active crawling my sites and eating a lot of bandwidth.
I've created a robots.txt with content like this:
# Disallow BecomeBot
But the robot seems to ignore it.
Is there a way to stop BecomeBot completely from crawling my sites?
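Only the comment line of that robots.txt survives above. Below is a minimal sketch of a record that shuts BecomeBot out of the whole site. It assumes "BecomeBot" is the user-agent token the crawler matches on, and it uses example.com as a placeholder host. The file is checked locally with Python's standard urllib.robotparser before you upload it.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt contents: one record aimed only at BecomeBot,
# disallowing every path on the site.
ROBOTS_TXT = """\
# Disallow BecomeBot
User-agent: BecomeBot
Disallow: /
"""

def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# example.com is a placeholder; only the URL path is matched against the rules.
for agent in ("BecomeBot", "Googlebot"):
    allowed = parser.can_fetch(agent, "http://example.com/products/page1.html")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

This only confirms that the file says what you intend. Whether a crawler actually obeys it is up to the crawler, which is why the next reply suggests an IP-level block as a fallback.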
I just checked my access logs and I do see BecomeBot visiting, but only one entry. (Its IP is 188.8.131.52, if you want to block it in cPanel.) The entry references this page: [become.com...] (Authoritative URL). According to that page, you're using the correct robots.txt line, so it looks like you'll have to contact them, or else use IP Deny on 64.124.85. (every address from 64.124.85.0 to 64.124.85.255), since they use that ENTIRE IP range!
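A partial address such as "64.124.85." is treated as a prefix, so it covers the whole CIDR block 64.124.85.0/24. The following sketch uses Python's ipaddress module to check which client addresses that rule would catch, for example when scanning a log. The sample addresses are made up for illustration.

```python
import ipaddress

# The partial address "64.124.85." from the reply above, written as a /24 network.
BLOCKED = ipaddress.ip_network("64.124.85.0/24")

def is_blocked(addr: str) -> bool:
    """Return True if addr falls inside the denied 64.124.85.x range."""
    return ipaddress.ip_address(addr) in BLOCKED

# Hypothetical client addresses, as they might appear in an access log.
for ip in ("64.124.85.17", "64.124.86.17", "203.0.113.5"):
    print(ip, "denied" if is_blocked(ip) else "allowed")

# First and last address in the block, plus its size.
print(BLOCKED[0], "to", BLOCKED[-1], f"({BLOCKED.num_addresses} addresses)")
```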
Why do you want to block their bot? Have you heard something bad about them? I just went to become.com and my site doesn't show up in their results, which is strange since their bot is visiting it. So I'm wondering about submitting to them, but I don't even see where you can submit a site. Ahh, I see now they say this: "At this time, we do not accept submissions from webmasters." I ran some test searches, and in my case I got nothing but non-relevant results; it looks pretty bad.
Their SE doesn't seem to do very well with the generic searches I tried first, like "[blue widget] sales" or "dealers"; all I saw were non-relevant results. But when searching for a specific product, it does better. I like that they have a rating system where you can rate each search, and I like their "auto complete" of searches. I've never seen that before with any SE.
The only reason I want to stop it is that it crawls my sites heavily and deeply, eating a lot of bandwidth, and in my logs I haven't seen any visitors referred by them. So I think it's just wasted bandwidth.
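One way to put numbers on "wasted bandwidth" is to total the bytes served to the bot and count the visits referred from the engine. This is a minimal sketch that assumes Apache's "combined" log format. The bot name, the referrer domain, and the sample log lines (including the user-agent string) are hypothetical and only illustrate the parsing.

```python
import re

# Apache "combined" log format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def tally(lines, bot="BecomeBot", engine="become.com"):
    """Return (bytes served to the bot, requests referred by the engine)."""
    bot_bytes = 0
    referrals = 0
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that are not in combined format
        if bot.lower() in m["agent"].lower():
            # "-" means no response body was sent, so count it as zero bytes
            bot_bytes += 0 if m["bytes"] == "-" else int(m["bytes"])
        elif engine in m["referer"]:
            # only non-bot requests count as referred visitors
            referrals += 1
    return bot_bytes, referrals

# Hypothetical log lines for illustration only.
SAMPLE = [
    '203.0.113.7 - - [10/Oct/2005:13:55:36 -0700] "GET /item?id=1 HTTP/1.1" 200 51200 "-" "Mozilla/5.0 (compatible; BecomeBot)"',
    '203.0.113.7 - - [10/Oct/2005:13:55:40 -0700] "GET /item?id=2 HTTP/1.1" 200 48800 "-" "Mozilla/5.0 (compatible; BecomeBot)"',
    '198.51.100.9 - - [10/Oct/2005:14:02:11 -0700] "GET / HTTP/1.1" 200 9000 "http://www.google.com/search?q=widgets" "Mozilla/4.0"',
]
print(tally(SAMPLE))
```

To use it on a real log, pass an open file instead of the sample list, e.g. `tally(open("access_log"))`, where the path is whatever your host uses.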
With Google "freaking out" every few months, we need ALL the SEs we can find.
(From March) "Last week, Become.com sent out a press release that talked about its patent-pending ranking algorithm dubbed AIR. AIR stands for Affinity Index Ranking and, based on claims from Become, is the next-generation search engine ranking algorithm."
Go to G, enter "become search engine" in quotes, and look at the hit on marketingshift.com; it's the 4th hit in my area.
The problem with BecomeBot on my site is that it eats about 100MB of bandwidth in a snap. It crawls heavily and deeply.
You know, my sites use a script to showcase products from Amazon. If it isn't stopped, can you imagine a bot crawling the whole Amazon store at once?
I'm a newbie with robots.txt. Is there a way to limit the bot so it doesn't crawl the whole Amazon store at once?
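Besides slowing the bot down, robots.txt can also keep it out of just one part of a site. This sketch assumes, hypothetically, that the Amazon showcase script lives under /amazon/. BecomeBot is kept out of that directory while the rest of the site stays crawlable, and urllib.robotparser confirms which URLs the record covers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: the Amazon showcase script lives under /amazon/.
# BecomeBot is kept out of that directory only; other paths stay open.
ROBOTS_TXT = """\
User-agent: BecomeBot
Disallow: /amazon/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# example.com is a placeholder host.
for path in ("/index.html", "/amazon/", "/amazon/item.php?id=1"):
    ok = parser.can_fetch("BecomeBot", "http://example.com" + path)
    print(path, "allowed" if ok else "disallowed")
```

Disallow matches by path prefix, so every URL the script generates under /amazon/ is covered by that single line. Other crawlers are unaffected because the record names only BecomeBot.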
You can control the rate at which your site is crawled by using the Crawl-Delay feature. Crawl-Delay lets you specify the number of seconds the bot should wait between successive requests to your site. Note that it may take quite a long time to crawl a site if there are many pages and the Crawl-Delay is set high. You could specify an interval of 30 seconds between requests with an entry like this:
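The "quite a long time" warning is plain arithmetic. If the bot waits d seconds between consecutive requests, then n pages take at least (n - 1) × d seconds, before counting download time. The following sketch computes that lower bound for a few hypothetical site sizes, using the 30-second delay from the reply.

```python
from datetime import timedelta

def min_crawl_time(pages: int, crawl_delay_s: float) -> timedelta:
    """Lower bound on a full crawl when the bot waits crawl_delay_s seconds
    between consecutive requests (download time itself is ignored)."""
    if pages <= 0:
        return timedelta(0)
    return timedelta(seconds=(pages - 1) * crawl_delay_s)

# Hypothetical page counts with the 30-second delay from the reply above.
for pages in (1_000, 10_000, 100_000):
    t = min_crawl_time(pages, 30)
    print(f"{pages:>7} pages: {t}  (~{t.total_seconds() / 86400:.1f} days)")
```

At 30 seconds, for instance, 10,000 pages come to 299,970 seconds, or roughly three and a half days.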
I recently had this problem too. They spidered thousands of my pages in one day, which took down the whole dedicated server!
I found this piece of code on their site, and it has worked well for me. It tells the bot to wait at least 30 seconds between page requests:
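The snippet from Become's page did not come through. If it was an ordinary Crawl-delay record addressed to BecomeBot, a sketch like the one below writes it out and reads it back with Python's urllib.robotparser to confirm the value is picked up. `crawl_delay()` needs Python 3.6 or later. Crawl-delay is not part of the original robots exclusion standard, so support varies from crawler to crawler.

```python
from urllib.robotparser import RobotFileParser

# Assumed snippet: a record for BecomeBot asking for 30 seconds between requests.
# Crawl-delay is a non-standard extension; each crawler decides whether to honor it.
ROBOTS_TXT = """\
User-agent: BecomeBot
Crawl-delay: 30
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.crawl_delay("BecomeBot"))  # delay requested from BecomeBot
print(parser.crawl_delay("Googlebot"))  # no matching record, so None
# The record has no Disallow line, so pages remain fetchable, just more slowly.
print(parser.can_fetch("BecomeBot", "http://example.com/page.html"))
```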