
Ask eating bandwidth

Any way to limit it?

     
2:40 pm on Jul 7, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Without completely stopping it :(

In a way this is a follow-on from this thread, which made me look at my Ask traffic more carefully:
[webmasterworld.com...]

This from my stats for this month:

Jeeves              123872 pages   2.32 GB
Inktomi Slurp         2129 pages  41.75 MB
Googlebot (Google)     800 pages  12.43 MB

Last month it indexed 159129 pages and used 2.04 GB, so the pace seems to be accelerating if anything. It is there all the time.
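For scale, a back-of-the-envelope check of those numbers (reading the Jeeves line as 123872 pages and 2.32 GB, and treating GB as binary gigabytes - both assumptions):

```python
GIB = 1024 ** 3  # binary gigabyte in bytes

# This month so far: 2.32 GB over 123872 pages
this_month_kb = 2.32 * GIB / 123872 / 1024
# Last month: 2.04 GB over 159129 pages
last_month_kb = 2.04 * GIB / 159129 / 1024

print(round(this_month_kb, 1))  # 19.6 KB per page
print(round(last_month_kb, 1))  # 13.4 KB per page
```

So it is not only fetching a lot of pages, the average transfer per page has grown too.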

Google has it about right - I have somewhere between 900 and 1000 pages in the site proper.

What Ask is so busily indexing is basically Amazon. I have an Amazon shop that is really only intended to have about 5-6 sections of books relevant to my topic, organised in a way very different from the way Amazon does it, but Ask is following everything - every obscure author you (n)ever heard of.

It does send traffic in reasonable numbers, but it's mostly irrelevant traffic looking for obscure authors for whom I now rank very highly in Ask - without ever meaning to! They do not buy much, maybe a couple of books a month - I have checked the pages they land on, and mostly the books are not available - and the AdSense income from those pages is not brilliant.

I am caught between disallowing Ask from that directory altogether and leaving well enough alone - in a way I hate to lose rankings, even relatively useless ones!

I could plaster AdSense all over it in a very aggressive way, but I really do not want to do that for the pages that have value to my users.

Has anyone else experienced this? Does it eventually slow down? What would you do?

9:38 pm on Jul 7, 2005 (gmt 0)

You could slow it down by setting a high Crawl-delay for that bot in robots.txt. You could also set expiry dates for pages well into the future, so it might crawl those pages less often.
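For the expiry suggestion, one way to do it on Apache would be something like the following in .htaccess (assuming mod_expires is available; whether Teoma actually honours Expires headers is another question):

```
# Far-future Expires headers for HTML pages - may reduce re-crawling
<IfModule mod_expires.c>
  ExpiresActive On
  ExpiresByType text/html "access plus 30 days"
</IfModule>
```

Note this affects ordinary visitors' caches too, so it is only sensible for pages that rarely change.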
10:51 pm on Jul 7, 2005 (gmt 0)

Ask has been a real pain in the rear for us too. We were forced to put the crawl delay in because they were requesting pages several times a second and yet providing <1% of our traffic. I'd be more sympathetic if they were as good as Google.
11:39 pm on Jul 7, 2005 (gmt 0)

I have tried to add the crawl delay, but I am obviously making a pig's dinner of it, because I keep on blocking everyone from my site.

I added this:


User-agent: teoma
Crawl-Delay: 240

At various points of this:


# Keep .htaccess itself from being served
<Files .htaccess>
order allow,deny
deny from all
</Files>

# Custom error pages
ErrorDocument 403 http://www.example.com/errordocs/403.html
ErrorDocument 404 http://www.example.com/errordocs/404.html

# Redirect the bare domain to www
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

and can't make it work.

Where should it be?

11:49 pm on Jul 7, 2005 (gmt 0)

In your robots.txt, do you also have the following after all the other records?

User-agent: *
Disallow:

Always validate your robots.txt file - there is a link to a validator in the robots.txt forum.
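You can also sanity-check a robots.txt before uploading it. A quick sketch using Python's standard urllib.robotparser (the file contents below are just an example - substitute your own):

```python
from urllib import robotparser

# Example robots.txt contents, one record per user-agent
rules = """\
User-agent: Teoma
Crawl-delay: 240

User-agent: *
Disallow:
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Nothing is blocked; Teoma just gets a delay
print(rp.can_fetch("Teoma", "http://www.example.com/shop/"))  # True
print(rp.crawl_delay("Teoma"))  # 240
```

If can_fetch() starts returning False for ordinary agents, you have blocked everyone again.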

The second snippet belongs in your .htaccess file, not in robots.txt. Of course, that only applies to Apache web servers. If you need help with it, post in that forum.
[webmasterworld.com ]
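Putting it together, the whole robots.txt (a plain text file at the site root, completely separate from .htaccess) would look something like this - the delay value is whatever you choose:

```
# http://www.example.com/robots.txt
User-agent: teoma
Crawl-Delay: 240

# All other bots: no restrictions
User-agent: *
Disallow:
```

An empty Disallow line means "nothing is disallowed"; leaving out the final record entirely also works, but being explicit makes mistakes easier to spot.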

11:54 pm on Jul 7, 2005 (gmt 0)

My only excuse is that it is very late here, it's been a long, hard day, and I am exhausted. Other than that, I can think of no reason why I was adding that to my .htaccess file rather than to robots.txt.

I need a brain transplant.... or sleep ...... or something!

11:29 pm on Jul 17, 2005 (gmt 0)

Now I have a similar problem with it. It's eating loads of bandwidth and crawling loads of pages.

I've had 24,000 hits in four days, and it's used almost 3 GB of bandwidth. I wouldn't mind if it indexed some of the pages it's cached; currently I have only a few pages in the index, and our home page was last updated about a year ago!

I can only conclude that either it has imposed some sort of penalty on us - still caching pages but not including them in the index - or it updates its index at a snail's pace.

Either way, I think we need to block it, for all the good it does. We see next to no traffic from Ask anyway.

11:39 pm on Jul 17, 2005 (gmt 0)

I had to block it to stop it - the crawl delay did not work. However a day later I set up the crawl delay and removed the block - now it is there all the time but playing nicely. I set the delay to 180, and it is duly getting about 500 pages a day.
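That figure lines up with the arithmetic: a Crawl-delay of 180 seconds allows at most one request every three minutes, which over a full day is:

```python
delay = 180                       # Crawl-delay value in seconds
seconds_per_day = 24 * 60 * 60
max_requests = seconds_per_day // delay
print(max_requests)               # 480 requests per day
```

About 480 a day, so "about 500 pages a day" means the bot is respecting the delay almost exactly.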
 
