Twiceler/cuil.com craziness (FWIW)

7:16 am on Dec 6, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


Is anyone else seeing odd things with Cuil's crawler in recent days? Its hits are always for robots.txt only, but today there were 36 of them in ~12 hours?! Both with the usual UA --

Mozilla/5.0 (Twiceler-0.9 http://www.cuil.com/twiceler/robot.html)

-- and with no UA at all (scroll down to see differently-named servers). Usually Twiceler visits a few times a day. Never, ever like this:

[11:23:29] crawl-14c.cuil.com
[12:11:36] crawl-4c.cuil.com
[19:21:00] crawl-1c.cuil.com
[19:24:52] crawl-1c.cuil.com
[20:41:26] crawl-14c.cuil.com
[20:45:52] crawl-14c.cuil.com
[20:57:06] crawl-15c.cuil.com
[21:01:28] crawl-15c.cuil.com
[21:06:45] crawl-12c.cuil.com
[21:09:12] crawl-17c.cuil.com
[21:09:33] crawl-19c.cuil.com
[21:11:17] crawl-12c.cuil.com
[21:13:37] crawl-17c.cuil.com
[21:13:58] crawl-5c.cuil.com
[21:14:07] crawl-19c.cuil.com
[21:14:23] crawl-16c.cuil.com
[21:14:53] crawl-7c.cuil.com
[21:17:45] crawl-4c.cuil.com
[21:18:16] crawl-5c.cuil.com
[21:18:55] crawl-16c.cuil.com
[21:19:14] crawl-7c.cuil.com
[21:20:32] crawl-9c.cuil.com
[21:24:47] crawl-9c.cuil.com
[21:28:59] crawl-18c.cuil.com
[21:29:49] crawl-2c.cuil.com
[21:30:11] crawl-3c.cuil.com
[21:33:45] crawl-18c.cuil.com
[21:34:18] crawl-8c.cuil.com
[21:34:20] crawl-2c.cuil.com
[21:35:14] crawl-3c.cuil.com
[21:38:36] crawl-8c.cuil.com
[21:43:49] crawl-6c.cuil.com
[21:47:55] crawl-6c.cuil.com

And these were without any UA at all. At least they did read and heed robots.txt --

ramp2b.cuil.com
ramp1hq.cuil.com
ramp1hq.cuil.com
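One way to see the pattern in the first list is to group the hits by host. Most crawl-Nc machines show up twice, roughly four to five minutes apart (crawl-4c and crawl-14c also have gaps of hours). A minimal Python sketch, using a few lines copied from that list; the function name `repeat_gaps` is just illustrative, and it assumes the `[HH:MM:SS] host` line format shown above.

```python
from collections import defaultdict
from datetime import datetime

# A few lines copied from the log excerpt above (all on the same day).
LOG = """\
[21:06:45] crawl-12c.cuil.com
[21:09:12] crawl-17c.cuil.com
[21:09:33] crawl-19c.cuil.com
[21:11:17] crawl-12c.cuil.com
[21:13:37] crawl-17c.cuil.com
[21:14:07] crawl-19c.cuil.com
"""


def repeat_gaps(log_text):
    """Group hits by host and return the seconds between consecutive hits."""
    times = defaultdict(list)
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        stamp, host = line.split()
        # "[21:06:45]" -> "21:06:45"; only the time of day is parsed.
        times[host].append(datetime.strptime(stamp.strip("[]"), "%H:%M:%S"))
    return {
        host: [int((b - a).total_seconds()) for a, b in zip(ts, ts[1:])]
        for host, ts in times.items()
    }


print(repeat_gaps(LOG))
# {'crawl-12c.cuil.com': [272], 'crawl-17c.cuil.com': [265], 'crawl-19c.cuil.com': [274]}
```

Run over the whole list, the same grouping shows 16 distinct crawl hosts for the 33 UA-bearing hits.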

9:45 pm on Dec 6, 2009 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:5806
votes: 64


Cuil's stealthy behavior also comes from 67.218.99.195 (ramp1hq.cuil.com) at Layer42, and it has earned them a ban.

2:21 pm on Dec 7, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5408
votes: 2


I've had this thing requesting robots.txt and going no further (it wouldn't get in anyway) for more than a few days now.

These IPs make multiple requests in no particular order, but quite close together:
216.129.119.zz
67.218.116.zzz

10:45 pm on Dec 8, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 8, 2004
posts:1679
votes: 0


<speculation>

This kind of behavior can happen when a search engine cleans up a large list of URLs to eliminate those that have been disallowed. Since that process is likely to run in parallel, it can show up as described above: many robots.txt requests from different crawl hosts within a short window.

</speculation>
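To make the speculation concrete, here is a toy model, not anything Cuil has described: a URL list is dealt out to several workers, and each worker keeps its own robots.txt cache, so every worker holding a URL from a host fetches that host's robots.txt once. The functions `shard` and `robots_fetches_during_cleanup`, the example.com URLs, and the choice of 16 workers (matching the 16 distinct crawl-Nc hosts in the first log) are all assumptions for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


def shard(urls, workers):
    """Deal the URL list round-robin across a number of hypothetical workers."""
    return [urls[i::workers] for i in range(workers)]


def robots_fetches_during_cleanup(urls, workers):
    """Count robots.txt fetches per host when each worker re-checks its own shard.

    The robots.txt cache is local to each worker (nothing is shared), so every
    worker that holds at least one URL from a host fetches that host's
    robots.txt once.
    """
    fetches = Counter()
    for batch in shard(urls, workers):
        seen_hosts = set()  # per-worker cache only
        for url in batch:
            host = urlsplit(url).netloc
            if host not in seen_hosts:
                fetches[host] += 1  # stands in for GET http://<host>/robots.txt
                seen_hosts.add(host)
    return fetches


urls = [f"http://example.com/page{i}.html" for i in range(100)]
print(robots_fetches_during_cleanup(urls, workers=1))   # Counter({'example.com': 1})
print(robots_fetches_during_cleanup(urls, workers=16))  # Counter({'example.com': 16})
```

With a cache shared across workers the count would stay at one per host; the per-worker cache is what multiplies the hits.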

3:09 am on Dec 9, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


All requests on my sites look normal today -- and that's actually a new thing, because prior to last month, Twiceler apparently did not understand multi-user-agent policy records in robots.txt, and as a result didn't crawl the sites. That's changed now, and they're crawling away (at a normal rate).
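A multi-user-agent policy record is one group with several `User-agent` lines sharing the same rules. The sketch below feeds a made-up robots.txt to Python's standard `urllib.robotparser` (a compliant parser, not Twiceler's own code) to show how such a record is meant to be read; the file contents, paths, and the example.com host are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one record naming two crawlers, then a catch-all record.
ROBOTS_TXT = """\
User-agent: Twiceler
User-agent: Slurp
Disallow: /private/

User-agent: *
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Ask the parser whether `agent` may fetch `path` on the example host."""
    return parser.can_fetch(agent, "http://example.com" + path)


for agent in ("Twiceler", "Slurp", "Googlebot"):
    print(agent, allowed(agent, "/private/page.html"), allowed(agent, "/cgi-bin/x"))
# Twiceler False True
# Slurp False True
# Googlebot True False
```

The named crawlers follow only their own record, so `/cgi-bin/` stays open to them even though the `*` record blocks it. Also note that `can_fetch` should be given the product token ("Twiceler"), not the full `Mozilla/5.0 (Twiceler-0.9 ...)` string, because robotparser only compares the part before the first "/".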

Combined with Lord Majestic's speculation above and the "ramp" hosts with no UA, it wouldn't surprise me if they're preparing to roll out a new index some time soon.

Jim

1:47 am on Dec 18, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


FWIW... Twiceler's still hammering away at the same site, every single day, usually in the late afternoon/early evening (Pacific). The next time I'm procrastinating something dreadful, I'll e-mail them about their overkill hits to robots.txt:

[17:12:36] crawl-2c.cuil.com
[17:12:37] crawl-6c.cuil.com
[17:12:57] crawl-7c.cuil.com
[17:12:59] crawl-8c.cuil.com
[17:13:05] crawl-14c.cuil.com
[17:13:30] crawl-4c.cuil.com
[17:17:55] crawl-5c.cuil.com
[17:17:58] crawl-17c.cuil.com
[17:17:58] crawl-12c.cuil.com
[17:18:05] crawl-19c.cuil.com
[17:18:11] crawl-9c.cuil.com
[17:18:22] crawl-16c.cuil.com
[17:18:23] crawl-3c.cuil.com
[17:18:34] crawl-1c.cuil.com

Do any of you ever get any traffic from them? I don't.

7:05 am on Dec 29, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Aug 30, 2002
posts:2415
votes: 24


Despite what Cuil's PR people would claim, in Irish, "Cuil" means fly or bug. Despite the claims of genius made about its founders, Cuil is a pest and sends zero traffic to two of my sites. One of them is one of the largest Irish web directories and the other is a very large domain history and domain statistics website. They've been hammering away for months, but normally, when they start getting problematic, they automatically get slapped with a 503. They've been 403ed on the web directory site for not following robots.txt.
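jmcc doesn't say how his automatic 503 works. As a hedged sketch of one common approach, here is a per-client sliding-window limiter in Python; the class name `CrawlerThrottle`, the limits (`max_hits`, `window`), and keying on the crawler's reverse-DNS host name are all hypothetical.

```python
from collections import defaultdict, deque


class CrawlerThrottle:
    """Hypothetical limiter: at most max_hits accepted requests per client per window seconds."""

    def __init__(self, max_hits=10, window=60.0):
        self.max_hits = max_hits
        self.window = window
        self.hits = defaultdict(deque)  # client -> timestamps of accepted requests

    def status_for(self, client, now):
        """Return the HTTP status to send for a request from `client` at time `now` (seconds)."""
        q = self.hits[client]
        # Forget accepted hits that are at least `window` seconds old.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.max_hits:
            return 503  # over the limit: Service Unavailable
        q.append(now)
        return 200


throttle = CrawlerThrottle(max_hits=3, window=60.0)
print([throttle.status_for("crawl-2c.cuil.com", t) for t in (0, 1, 2, 3, 61)])
# [200, 200, 200, 503, 200]
```

A real server would normally add a `Retry-After` header to the 503 so a well-behaved crawler knows when to come back. Rejected requests are deliberately not counted here, so a client that backs off gets in again once the window drains.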

Lord Majestic's speculation is a possibility. The last I heard of Cuil, it was trying some social search experiments and integrating some Twitter content.

Regards...jmcc