Forum Moderators: Ocean10000 & incrediBILL

Twiceler/cuil.com craziness (FWIW)

     
7:16 am on Dec 6, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Anyone else seeing odd things with Cuil's crawler in recent days? All hits are always for robots.txt but today -- 36 times in ~12 hours?! Both with the usual UA --

Mozilla/5.0 (Twiceler-0.9 http://www.cuil.com/twiceler/robot.html)

-- and with no UA at all (scroll down to see differently-named servers). Usually Twiceler visits a few times a day. Never, ever like this:

[11:23:29] crawl-14c.cuil.com
[12:11:36] crawl-4c.cuil.com
[19:21:00] crawl-1c.cuil.com
[19:24:52] crawl-1c.cuil.com
[20:41:26] crawl-14c.cuil.com
[20:45:52] crawl-14c.cuil.com
[20:57:06] crawl-15c.cuil.com
[21:01:28] crawl-15c.cuil.com
[21:06:45] crawl-12c.cuil.com
[21:09:12] crawl-17c.cuil.com
[21:09:33] crawl-19c.cuil.com
[21:11:17] crawl-12c.cuil.com
[21:13:37] crawl-17c.cuil.com
[21:13:58] crawl-5c.cuil.com
[21:14:07] crawl-19c.cuil.com
[21:14:23] crawl-16c.cuil.com
[21:14:53] crawl-7c.cuil.com
[21:17:45] crawl-4c.cuil.com
[21:18:16] crawl-5c.cuil.com
[21:18:55] crawl-16c.cuil.com
[21:19:14] crawl-7c.cuil.com
[21:20:32] crawl-9c.cuil.com
[21:24:47] crawl-9c.cuil.com
[21:28:59] crawl-18c.cuil.com
[21:29:49] crawl-2c.cuil.com
[21:30:11] crawl-3c.cuil.com
[21:33:45] crawl-18c.cuil.com
[21:34:18] crawl-8c.cuil.com
[21:34:20] crawl-2c.cuil.com
[21:35:14] crawl-3c.cuil.com
[21:38:36] crawl-8c.cuil.com
[21:43:49] crawl-6c.cuil.com
[21:47:55] crawl-6c.cuil.com

And these were without any UA at all. At least they did read/heed robots.txt --

ramp2b.cuil.com
ramp1hq.cuil.com
ramp1hq.cuil.com
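
For anyone who wants to tally this sort of burst, here is a minimal sketch that counts hits per host and per hour. It assumes the log has already been reduced to the "[HH:MM:SS] host" shape used in the listing above; the sample is four lines copied from that listing, and the function names are made up for illustration.

```python
import re
from collections import Counter

# Four lines copied from the listing above, in "[HH:MM:SS] host" form.
SAMPLE = """\
[21:09:12] crawl-17c.cuil.com
[21:09:33] crawl-19c.cuil.com
[21:11:17] crawl-12c.cuil.com
[21:13:37] crawl-17c.cuil.com
"""

# Captures the hour and the host name; minutes and seconds are matched but unused.
LINE_RE = re.compile(r"^\[(\d{2}):\d{2}:\d{2}\]\s+(\S+)$")


def parse(text):
    """Yield (hour, host) for every line that matches the expected shape."""
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            yield int(m.group(1)), m.group(2)


def hits_per_host(text):
    """Count hits per crawler host name."""
    return Counter(host for _, host in parse(text))


def hits_per_hour(text):
    """Count hits per clock hour."""
    return Counter(hour for hour, _ in parse(text))


if __name__ == "__main__":
    print(hits_per_host(SAMPLE).most_common())
    print(hits_per_hour(SAMPLE))
```

Lines that do not match the pattern are skipped silently, which may or may not be what you want for a real log.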

9:45 pm on Dec 6, 2009 (gmt 0)

keyplyr (WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month)



Cuil's stealthy behavior also comes from 67.218.99.195 (ramp1hq.cuil.com) at Layer42; they have earned themselves a ban.

2:21 pm on Dec 7, 2009 (gmt 0)

wilderness (WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month)



I've had this thing requesting robots.txt and not proceeding any further (it wouldn't get in anyway) for more than a few days now.

These IPs make multiple requests in no specific order, but quite close together:
216.129.119.zz
67.218.116.zzz
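
A range check like this can be sketched in a few lines. This is only an illustration and not how the poster blocks them (that is not stated). It assumes the masked last octets ("zz", "zzz") stand for the whole /24 in each case, and the names BLOCKED_NETWORKS and is_blocked are hypothetical.

```python
import ipaddress

# Assumption: the masked octets above cover each entire /24 block.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("216.129.119.0/24"),
    ipaddress.ip_network("67.218.116.0/24"),
]


def is_blocked(remote_addr):
    """Return True if remote_addr falls inside one of the blocked networks."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a parseable IP address; this sketch treats it as not blocked.
        return False
    return any(addr in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    for ip in ("216.129.119.7", "67.218.116.200", "67.218.99.195"):
        print(ip, is_blocked(ip))
```

Note that 67.218.99.195, mentioned earlier in the thread, sits outside both ranges and would need its own entry.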

10:45 pm on Dec 8, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



<speculation>

Such behavior can happen in search engines when they are cleaning up a large list of URLs to eliminate those that have been disallowed. Because this process is likely to be done in parallel, it can manifest itself as described above.

</speculation>
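
To make the speculation concrete, here is a rough simulation of why a parallel cleanup would show up as a burst of robots.txt hits: if the URL list is split into shards and each worker keeps its own robots.txt cache, every worker requests robots.txt once per site. Nothing here reflects Cuil's actual design. The fetch is faked with an in-memory dictionary, and FAKE_ROBOTS, fetch_robots, clean_shard, and the agent name ExampleBot are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt bodies standing in for real HTTP fetches.
FAKE_ROBOTS = {
    "example.com": "User-agent: *\nDisallow: /private/\n",
}

fetch_log = []  # one (worker, host) entry per simulated robots.txt request


def fetch_robots(host, worker):
    """Simulate fetching and parsing robots.txt for one host."""
    fetch_log.append((worker, host))
    parser = RobotFileParser()
    parser.parse(FAKE_ROBOTS.get(host, "").splitlines())
    return parser


def clean_shard(worker, urls, agent="ExampleBot"):
    """Drop disallowed URLs from one shard; the cache is private to this worker."""
    cache = {}
    kept = []
    for url in urls:
        host = urlsplit(url).netloc
        if host not in cache:
            cache[host] = fetch_robots(host, worker)
        if cache[host].can_fetch(agent, url):
            kept.append(url)
    return kept


def clean_in_parallel(shards):
    """Run one worker per shard and return the surviving URLs in shard order."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        results = pool.map(clean_shard, range(len(shards)), shards)
        return [url for kept in results for url in kept]


if __name__ == "__main__":
    shards = [
        ["http://example.com/a", "http://example.com/private/b"],
        ["http://example.com/c"],
        ["http://example.com/private/d", "http://example.com/e"],
    ]
    print(clean_in_parallel(shards))
    print(len(fetch_log), "robots.txt requests for one site")
```

With three shards the site sees three robots.txt requests and no page requests at all, which is at least consistent with the pattern reported in this thread.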

3:09 am on Dec 9, 2009 (gmt 0)

jdmorgan (WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member)



All requests on my sites look normal today -- and that's actually a new thing, because prior to last month, Twiceler apparently did not understand multi-user-agent policy records in robots.txt, and as a result didn't crawl the sites. That's changed now, and they're crawling away (at a normal rate).
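
For readers who have not met the term, a multi-user-agent policy record is one where several User-agent lines share a single set of rules. The sketch below is a made-up robots.txt (not the one from my sites) checked with Python's standard urllib.robotparser; ExampleBot and OtherBot are placeholder names.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the first record names two robots that share one
# policy; every other robot falls through to the catch-all record.
ROBOTS_TXT = """\
User-agent: Twiceler
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Return True if the named robot may fetch the given path."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent in ("Twiceler", "ExampleBot", "OtherBot"):
        print(agent, allowed(agent, "/index.html"), allowed(agent, "/private/x"))
```

A robot that only honors the first User-agent line of a record, or none of them, would end up under the catch-all Disallow and stay away from the site entirely.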

Combined with Lord Majestic's speculation above and the "ramp" hosts with no UA, it wouldn't surprise me if they're preparing to roll out a new index some time soon.

Jim

1:47 am on Dec 18, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



FWIW... Twiceler's still hammering away at the same site, every single day, usually in the late afternoon/early evening (Pacific). The next time I'm procrastinating something dreadful, I'll e-mail them about their overkill hits to robots.txt:

[17:12:36] crawl-2c.cuil.com
[17:12:37] crawl-6c.cuil.com
[17:12:57] crawl-7c.cuil.com
[17:12:59] crawl-8c.cuil.com
[17:13:05] crawl-14c.cuil.com
[17:13:30] crawl-4c.cuil.com
[17:17:55] crawl-5c.cuil.com
[17:17:58] crawl-17c.cuil.com
[17:17:58] crawl-12c.cuil.com
[17:18:05] crawl-19c.cuil.com
[17:18:11] crawl-9c.cuil.com
[17:18:22] crawl-16c.cuil.com
[17:18:23] crawl-3c.cuil.com
[17:18:34] crawl-1c.cuil.com

Do any of you ever get any traffic from them? I don't.

7:05 am on Dec 29, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Despite what Cuil's PR people would claim, in Irish, "Cuil" means fly or bug. Despite the claims of genius made about its founders, Cuil is a pest and sends zero traffic to two of my sites. One of them is one of the largest Irish web directories and the other is a very large domain history and domain statistics website. They've been hammering away for months, but normally when they start getting problematic they automatically get slapped with a 503. They've been 403ed on the web directory site for not following robots.txt.
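
The automatic 503 can be done in many ways. As a minimal sketch only, and not the setup running on these sites, here is a sliding-window counter that answers 503 once a client exceeds a request limit. The class name Throttle, the limit, and the window length are all assumptions for illustration.

```python
from collections import defaultdict, deque


class Throttle:
    """Sliding-window limiter: at most `limit` accepted hits per window per client."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.hits = defaultdict(deque)  # client -> timestamps of accepted hits

    def status_for(self, client, now):
        """Return the HTTP status to send for a request arriving at time `now`."""
        recent = self.hits[client]
        # Forget accepted hits that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return 503  # over the limit; the rejected hit is not recorded
        recent.append(now)
        return 200


if __name__ == "__main__":
    throttle = Throttle(limit=3, window_seconds=60)
    for second in (0, 1, 2, 3, 61):
        print(second, throttle.status_for("crawl-1c.cuil.com", second))
```

Keying on the host name, as in the usage lines, treats each crawl-NNc machine separately; keying on the domain or the network block would throttle the crawler as a whole.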

Lord Majestic's speculation is a possibility. The last I heard of Cuil was that it was trying some social search engine experiments and some Twitter stuff was being integrated.

Regards...jmcc

 
