
Ask - Teoma Forum

    
Ask eating bandwidth
Any way to limit it?
abbeyvet (WebmasterWorld Senior Member 10+ Year Member)
Msg#: 504 posted 2:40 pm on Jul 7, 2005 (gmt 0)

Without completely stopping it :(

In a way this is a follow on from this thread which made me look at my ASK traffic more carefully:
[webmasterworld.com...]

This from my stats for this month:

Bot                  Hits     Bandwidth
Jeeves               123872   2.32 GB
Inktomi Slurp        2129     41.75 MB
Googlebot (Google)   800      12.43 MB

Last month it indexed 159129 pages and used 2.04 GB, so the pace seems to be accelerating if anything. It is there all the time.

Google has it about right - I have somewhere between 900 and 1000 pages in the site proper.

What Ask is so busily indexing is basically Amazon. I have an Amazon shop that is really only intended to have about 5-6 sections of books relevant to my topic, organised in a way very different from the way Amazon does it, but Ask is following everything - every obscure author you (n)ever heard of.

It does send traffic in reasonable numbers but it's mostly irrelevant traffic looking for obscure authors for whom I now rank very highly in Ask - without ever meaning to! They do not buy anything much, maybe a couple of books a month - I have checked the pages they land on and mostly the books are not available - and the Adsense income from those pages is not brilliant.

I am caught between disallowing Ask from that directory altogether and leaving well enough alone - in a way I hate to lose ranking, even relatively useless ones!
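
If I do disallow it, I assume the record would be something like this, where /amazon-shop/ is just a stand-in for my actual shop directory:

# Hypothetical: keep Ask/Teoma out of the shop section only
User-agent: teoma
Disallow: /amazon-shop/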

I could plaster Adsense all over it in a very aggressive way, but I really do not want to do that for the pages that have value to my users.

Has anyone else experienced this? Does it eventually slow down? What would you do?

 

Dijkgraaf (WebmasterWorld Senior Member 5+ Year Member)
Msg#: 504 posted 9:38 pm on Jul 7, 2005 (gmt 0)

You could slow it down by setting a high Crawl-Delay for that bot in robots.txt. You could also set expiry headers for pages well into the future, so it might re-crawl those pages less often.
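
For example, a minimal robots.txt record along these lines (the delay value is just illustrative; Teoma reads Crawl-Delay in seconds):

# Slow Ask/Teoma down without blocking it
User-agent: teoma
Crawl-Delay: 10

And for the expiry idea, a sketch assuming Apache with mod_expires enabled:

<IfModule mod_expires.c>
ExpiresActive On
# Tell caches and well-behaved crawlers that pages stay fresh for a week
ExpiresByType text/html "access plus 7 days"
</IfModule>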

diamondgrl (WebmasterWorld Senior Member 10+ Year Member)
Msg#: 504 posted 10:51 pm on Jul 7, 2005 (gmt 0)

Ask has been a real pain in the rear for us too. We were forced to put the crawl delay in because they were requesting pages several times a second and yet providing <1% of our traffic. I'd be more sympathetic if they were as good as Google.

abbeyvet (WebmasterWorld Senior Member 10+ Year Member)
Msg#: 504 posted 11:39 pm on Jul 7, 2005 (gmt 0)

I have tried to add the crawl delay, but I am obviously making a pig's dinner of it, because I keep blocking everyone from my site.

I added this:


User-agent: teoma
Crawl-Delay: 240

At various points in this:


# Prevent anyone from fetching the .htaccess file itself
<Files .htaccess>
order allow,deny
deny from all
</Files>

# Custom error pages
ErrorDocument 403 http://www.example.com/errordocs/403.html
ErrorDocument 404 http://www.example.com/errordocs/404.html

# Redirect example.com to www.example.com (dot escaped, host anchored)
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

and can't make it work.

Where should it be?

Dijkgraaf (WebmasterWorld Senior Member 5+ Year Member)
Msg#: 504 posted 11:49 pm on Jul 7, 2005 (gmt 0)

In your robots.txt, do you also have the following after all the other records?

User-agent: *
Disallow:
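
Putting that together with your crawl delay, the whole file would be just this, with the catch-all record last:

# Slow Teoma down; let every other bot crawl everything
User-agent: teoma
Crawl-Delay: 240

User-agent: *
Disallow: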

Always validate your robots.txt file; there is a link to a validator in the robots.txt forum.

The second snippet belongs in your .htaccess file, not in robots.txt. Of course, this only applies to Apache web servers. If you need help with that, post in that forum.
[webmasterworld.com ]

abbeyvet (WebmasterWorld Senior Member 10+ Year Member)
Msg#: 504 posted 11:54 pm on Jul 7, 2005 (gmt 0)

My only excuse is that it is very late here, it's been a long hard day, and I am exhausted. Other than that, I can think of no reason why I was adding that to my .htaccess file rather than to robots.txt.

I need a brain transplant.... or sleep ...... or something!

RichTC (WebmasterWorld Senior Member 5+ Year Member)
Msg#: 504 posted 11:29 pm on Jul 17, 2005 (gmt 0)

Now I have a similar problem with it. It's eating loads of bandwidth and crawling loads of pages.

I've had 24,000 hits in four days and it's used almost 3 GB of bandwidth. I wouldn't mind if it indexed some of the pages it's cached; currently I've only a few pages in the index, and our home page was last updated about a year ago!

I can only conclude that it has either imposed some sort of penalty on us, still caching pages but not including them in the index, or it updates its index at a snail's pace.

Either way, I think we need to block it, for all the good it does. We see next to no traffic from ASK anyway.

abbeyvet (WebmasterWorld Senior Member 10+ Year Member)
Msg#: 504 posted 11:39 pm on Jul 17, 2005 (gmt 0)

I had to block it to stop it - the crawl delay alone did not work. However, a day later I set up the crawl delay again and removed the block - now it is there all the time but playing nicely. I set the delay to 180, and it is duly getting about 500 pages a day.
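
That rate makes sense: at one fetch every 180 seconds, the most it can request is 86400 / 180 = 480 pages a day. For anyone else hitting this, the record that is working for me is simply:

# Throttle Ask/Teoma to one request every three minutes
User-agent: teoma
Crawl-Delay: 180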
