Ask Jeeves-Teoma

Cutbacks to cut crawling?

         

Pfui

2:08 am on Nov 10, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Featured Home Page Discussion: the "Ask.com Ends Search Effort" thread [webmasterworld.com...], about Bloomberg's "IAC's Diller Surrenders to Google, Cuts Jobs, Ends Ask.com Search Effort" [bloomberg.com...]

On a technical note, I wonder if/when they'll retire their toolbar or stop crawling. Their bots hit ~10 times a day using:

Mozilla/5.0 (compatible; Ask Jeeves/Teoma; http://about.ask.com/en/docs/about/webmasters.shtml)

Referrers are few and far between, and from old pages.

Pfui

8:21 pm on Nov 13, 2010 (gmt 0)

Apparently there's been at least one (temporary?) change, but it's an iffy one in my book...

I first saw the following version yesterday, as did [botsvsbrowsers.com...] (they list it as not a known bot; oops). It requested robots.txt and the root page in the same second, on a site where everything except robots.txt, including root, is disallowed:

crawler9047.ask.com
Teoma/Nutch-1.2 ( Question and Answer Search; Mozilla/2.0 (compatible; Ask Jeeves/Teoma; +http://sp.ask.com/docs/about/tech_crawling.html); bot@afarm.com)

11/12 14:09:29 /robots.txt
11/12 14:09:29 /

Note the stray space after the first opening parenthesis. Also, the page linked in the string doesn't mention a Nutch variant, and afarm.com is registered (in Australia) but has no site.

It's been ages (years?) since I've seen anything from .ask.com other than the crawler in the OP. And a Nutch? Nah. Nutch bots are so prolific and typically so rude that on my sites they always get robots.txt or bust. So far, this one's no different.
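For anyone wanting to confirm from their own logs that a hit like the second one above was off limits, Python's standard `urllib.robotparser` can replay the logged paths against a robots.txt. This is a minimal sketch: the robots.txt below is a hypothetical stand-in for the "everything but robots.txt is disallowed" setup described above, not the actual file, and `disallowed_hits` is an illustrative helper name.

```python
from typing import List
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: only /robots.txt itself may be fetched.
# Python's parser applies the first matching rule, so Allow must precede Disallow.
ROBOTS_TXT = """\
User-agent: *
Allow: /robots.txt
Disallow: /
"""

# The two requests quoted above (MM/DD HH:MM:SS path).
LOG = [
    "11/12 14:09:29 /robots.txt",
    "11/12 14:09:29 /",
]

UA = ("Teoma/Nutch-1.2 ( Question and Answer Search; Mozilla/2.0 "
      "(compatible; Ask Jeeves/Teoma; "
      "+http://sp.ask.com/docs/about/tech_crawling.html); bot@afarm.com)")


def disallowed_hits(robots_txt: str, user_agent: str, log_lines: List[str]) -> List[str]:
    """Return the log lines whose request path robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    hits = []
    for line in log_lines:
        path = line.split()[-1]  # last field is the request path
        if not parser.can_fetch(user_agent, path):
            hits.append(line)
    return hits


if __name__ == "__main__":
    for hit in disallowed_hits(ROBOTS_TXT, UA, LOG):
        print("disallowed:", hit)
```

Run against the two logged lines, only the request for `/` is flagged. Because both requests carry the same timestamp, the bot could not have read robots.txt before asking for root.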

dstiles

9:16 pm on Nov 13, 2010 (gmt 0)

Began getting ^Teoma/Nutch yesterday (Friday the 12th). I don't think I've seen it before. It got access because of the way my filters trap the Ask bot.

When Jade was taken over, many years ago, some really nasty things began happening, including spam to registered emails. I wonder if something similar is happening to Ask now. The Ask web site insists on opening via a JavaScript redirect, which Firefox/NoScript usually blocks.

dstiles

10:22 pm on Nov 13, 2010 (gmt 0)

I wonder if the Nutch bot is actually just for the Q&A. It has a Mozilla/2.0 UA and a different http:// link, although that link redirects to the page given in the "real" bot's UA. I haven't seen the real bot in the past few days.

The Nutch bot is not intrusive: a single hit, then it goes away for a while (at the moment!).

If Ask is no longer an SE, then the switch to an alternative UA would make sense. I do let the occasional Nutch through (currently only cabot/nutch), but I agree it's a poor choice.

Pfui

10:55 pm on Nov 13, 2010 (gmt 0)

Ask is reformatting, so to speak, into a Q&A-specific kind of SE. From the article in the OP:

"The search unit will consolidate its engineering operations at its headquarters in Oakland, California, and focus its resources on developing its online question-and-answer service. ...

"The new Ask.com Q&A platform also provides answers to questions asked using natural language. The answers are provided from links to relevant Web sites, and also from hand-crafted answers from members of an Ask.com community. ..."

Continued [bloomberg.com]

tangor

11:24 am on Nov 14, 2010 (gmt 0)

Who allows "Nutch" these days?

Teoma/Jeeves/Ask is down. Deal with it in .htaccess; doing that saves me six-ish hits per week.
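Blocking by User-Agent, whether in .htaccess or in a script, comes down to a case-insensitive pattern match. Here is a minimal Python sketch of that logic; the deny and allow patterns are hypothetical, built from the UAs quoted in this thread plus a single Nutch exception of the kind mentioned above. In .htaccess the same kind of pattern would typically go in a mod_rewrite condition such as `RewriteCond %{HTTP_USER_AGENT} nutch [NC]`.

```python
import re

# Hypothetical deny pattern: any Nutch-based crawler, plus both Ask UAs
# quoted in this thread (each contains "Ask Jeeves/Teoma").
DENY = re.compile(r"nutch|ask jeeves/teoma", re.IGNORECASE)

# Hypothetical exception list: Nutch-based bots you choose to let through.
ALLOW = re.compile(r"cabot/nutch", re.IGNORECASE)


def should_block(user_agent: str) -> bool:
    """Return True if the UA matches DENY and is not excepted by ALLOW."""
    if ALLOW.search(user_agent):
        return False
    return bool(DENY.search(user_agent))


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; Ask Jeeves/Teoma; "
        "+http://about.ask.com/en/docs/about/webmasters.shtml)",
        "Teoma/Nutch-1.2 ( Question and Answer Search; Mozilla/2.0 "
        "(compatible; Ask Jeeves/Teoma; "
        "+http://sp.ask.com/docs/about/tech_crawling.html); bot@afarm.com)",
    ]
    for ua in samples:
        print(should_block(ua), ua)
```

Keep in mind that UA strings are trivially spoofed, so this only stops bots that identify themselves honestly.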

Pfui

2:07 pm on Nov 14, 2010 (gmt 0)

FWIW, as of this morning, 05:20:37 (-0700), the 'regular' UA was/is still at it:

crawler5106.ask.com
Mozilla/5.0 (compatible; Ask Jeeves/Teoma; +http://about.ask.com/en/docs/about/webmasters.shtml)

robots.txt? Yes
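Since a UA string proves nothing on its own, a hostname like crawler5106.ask.com is only worth trusting if it is forward-confirmed: the IP's reverse (PTR) name ends in .ask.com, and that name resolves back to the same IP. Below is a minimal Python sketch of that check. The function name and the `.ask.com` suffix are assumptions for illustration; the resolvers default to the standard `socket` calls but can be swapped out so the logic can be exercised offline.

```python
import socket
from typing import Callable, List, Tuple

Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def is_verified_crawler(
    ip: str,
    domain_suffix: str = ".ask.com",
    reverse: Resolver = socket.gethostbyaddr,
    forward: Resolver = socket.gethostbyname_ex,
) -> bool:
    """Forward-confirmed reverse DNS: the PTR name must end in domain_suffix
    and must resolve back to the same IP address."""
    try:
        hostname, _, _ = reverse(ip)  # e.g. "crawler5106.ask.com"
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False
    if not hostname.lower().endswith(domain_suffix):
        return False
    try:
        _, _, addresses = forward(hostname)
    except OSError:
        return False
    return ip in addresses
```

A quick offline check can pass fake resolvers that map a documentation-range address (192.0.2.10, not a real Ask IP) to crawler5106.ask.com and back. A PTR name such as crawler5106.ask.com.example.net, or a name that resolves to a different IP, fails the check.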

dstiles

8:08 pm on Nov 14, 2010 (gmt 0)

Yes, I have some proper Ask bot accesses today. Scattered across several sites and only one or two hits at a time, not usually to home pages.

incrediBILL

7:32 pm on Nov 18, 2010 (gmt 0)

I'm still seeing about 10 pages crawled per day from Teoma. Sad, really sad.