Bot is killing my site, is it worth it?
URGENT: to block or not to block
"Jeeves/Teoma" is spidering my site right now. I have many, many dynamic pages. That go like this - /country/state/city/ and I have almost every city in US and Canada. That makes ~ 20,000 pages. I have little content on those pages yet thou. And ask is getting *every single page*. I already reached 80% of my bandwidth usage just overnight.
Is it worth it? I didn't pay for inclusion, so I might not even get listed, right? And if I will get listed, will amount of pages reflect ranking?
And the same question for other SE, should I let them or it's not worth it?
Please respoint quick, because bot is still out there and I don't know what to do.
We've had more than 10,000 visits from AJ so far this month; it seems like they've really ramped it up.
I have found that it is likely the pages will be indexed after free spidering - it just takes a while. I don't think the number of pages spidered has anything to do with ranking.
You could wind up with a good amount of traffic depending on the niche you're in.
Just a thought - what will you do when Googlebot comes calling for a deep crawl?
That is what I am worried about too... if Googlebot comes and does the same thing... hm. It has come 5 times so far and just grabbed the first page. It came the last 3 days in a row, taking only the main page and nothing else.
I don't know if I should let them spider it. Maybe once or twice.
Ask took 174MB in one visit and didn't even finish spidering.
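If you want to measure exactly how much of your bandwidth a given bot is eating, you can total the bytes column of your access log per user-agent. A minimal Python sketch, assuming Apache-style "combined" log lines - the sample lines, paths, and the Teoma user-agent string below are illustrative, not exact:

```python
import re
from collections import defaultdict

# Apache "combined" format: request in quotes, status, bytes sent, then the
# quoted referer and user-agent. This regex is a simplified sketch, not a
# full log parser.
LINE_RE = re.compile(r'"\S+ \S+ \S+" \d{3} (\d+|-) ".*?" "(.*?)"')

def bandwidth_by_agent(lines):
    """Sum bytes sent, grouped by user-agent string."""
    totals = defaultdict(int)
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        size, agent = m.groups()
        if size != "-":  # "-" means no body was sent
            totals[agent] += int(size)
    return totals

# Two made-up log lines for illustration
sample = [
    '1.2.3.4 - - [08/Aug/2003:03:12:01 +0000] "GET /us/ny/albany/ HTTP/1.0" 200 10240 "-" "Mozilla/2.0 (compatible; Ask Jeeves/Teoma)"',
    '5.6.7.8 - - [08/Aug/2003:03:12:05 +0000] "GET / HTTP/1.1" 200 2048 "-" "Mozilla/4.0"',
]
totals = bandwidth_by_agent(sample)
print(totals["Mozilla/2.0 (compatible; Ask Jeeves/Teoma)"])  # -> 10240
```

Run it over the real log file with `bandwidth_by_agent(open("access_log"))` and you'll see per-bot totals instead of guessing from the host's bandwidth meter.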
Ask yourself one question: do you want the pages to be indexed?
If the answer is no - robots.txt the pages you don't want crawled.
If the answer is yes - then you will have to up your bandwidth to deal with it.
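For the robots.txt route, a minimal sketch - assuming the dynamic pages all live under a hypothetical /country/ directory and that Ask's crawler honors the "Teoma" user-agent token:

```
# Keep Ask's crawler out of the dynamic pages only
User-agent: Teoma
Disallow: /country/

# Let everyone else crawl everything
User-agent: *
Disallow:
```

Keep in mind a crawler may cache robots.txt for a while, so the change won't necessarily stop requests immediately.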
If you have any way to monetize the traffic (AdSense especially, I would think, for a site like that), the crawls may be well worth it.
Well, I ask myself and get a "don't know" answer :)
That is why I asked here. If the pages get indexed, how will that affect my placement? There is not much content on those pages.
There is a plus side. Each page has the country, state, and city name on it, as well as the "widgets" keyword. Some people search for "widgets state", but I'm still not sure if that will help...
It's not only an Ask problem. Hopefully other bots will show up, and I have no idea how they will act. Especially Googlebot. It might come every once in a while and grab all the pages, or come once a month and take one. It's kind of unpredictable.
Ahhh so much confusion :)
peterdaly: at the moment it's a new site, and I cannot monetize the traffic in any way. I get 20 visitors/day. I was getting some anyway, but now Google has dropped me from the index, and I get almost none, except from the people I email for link exchanges :)
Teoma took up 199MB of bandwidth from my site earlier this month. I tried a robots.txt file to disallow it, but it still kept coming. I finally had to block it with an IP filter.
I made the personal decision that bandwidth preservation was more important than the chance that Teoma would bring any significant number of visitors to my site. My site can be compared to yours in that mine doesn't really have true original content at the moment. It's a bunch of affiliate links, and my objective was to control cost.
You should be able to purchase additional bandwidth from your host "temporarily". In my case, I did that at the end of last month and don't want to do it again. My site shuts down if the bandwidth limit for the month is exceeded. This temporary bandwidth can add up to a big expense.
I am saving my bandwidth for Googlebot. Google has already brought in over 1,300 referrals since the beginning of the month, while Teoma brought in just one! In the past, Teoma hasn't referred much, if at all, to my site (unless it doesn't give out its referrer info).
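The IP-filter approach can be done at the web-server level rather than the firewall. A sketch for an Apache .htaccess file - the address range below is a placeholder, so substitute the actual source IPs you see in your own access log:

```
# Deny the crawler's address range; everyone else gets through.
# 192.0.2.0/24 is a placeholder, not Teoma's real range.
Order Allow,Deny
Allow from all
Deny from 192.0.2.0/24
```

Unlike robots.txt, this stops the requests at the server whether or not the bot chooses to behave.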
Yes, the same thing happened to me. My site got shut down with a message that I had exceeded the bandwidth limit. It all happened overnight. I actually got an automatic notification from my host when I reached 80% of my bandwidth limit, but I was sleeping.
It was partially my fault anyway. I have a reseller account, and I can assign bandwidth limits myself, so I assigned a 100MB limit to that account. Who knew? I had 7MB of transfer in 2 weeks; 100MB looked like enough.
Well, I will see. If Ask is going to come frequently and grab *everything* it finds, I will just limit access to that specific folder, I guess... and let it crawl the rest of the site.
Obviously I don't know what your site is about - but if the 20k pages are built around a product database, with 'manufacturer', 'type', 'model no.' and other characteristics, then I would suggest yes - especially if the content is unique from page to page.
If, however, the content is not for commercial advertising, then the cost of not having some kind of income from the site may outweigh the cost of upping your upstream throughput.
Unique commercial content - as long as the site is built well - is worth allowing all bots to crawl.
If not commercial - then I would look at restrictive measures on pages that probably don't need to be crawled.
Can somebody help me pay for inclusion into teoma.com? Please, just one URL? firstname.lastname@example.org
In our country we have no payment systems. Please!