Twiceler/cuil.com craziness (FWIW)

     
7:16 am on Dec 6, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


Anyone else seeing odd things with Cuil's crawler in recent days? All hits are always for robots.txt but today -- 36 times in ~12 hours?! Both with the usual UA --

Mozilla/5.0 (Twiceler-0.9 http://www.cuil.com/twiceler/robot.html)

-- and with no UA at all (scroll down to see differently-named servers). Usually Twiceler visits a few times a day. Never, ever like this:

[11:23:29] crawl-14c.cuil.com
[12:11:36] crawl-4c.cuil.com
[19:21:00] crawl-1c.cuil.com
[19:24:52] crawl-1c.cuil.com
[20:41:26] crawl-14c.cuil.com
[20:45:52] crawl-14c.cuil.com
[20:57:06] crawl-15c.cuil.com
[21:01:28] crawl-15c.cuil.com
[21:06:45] crawl-12c.cuil.com
[21:09:12] crawl-17c.cuil.com
[21:09:33] crawl-19c.cuil.com
[21:11:17] crawl-12c.cuil.com
[21:13:37] crawl-17c.cuil.com
[21:13:58] crawl-5c.cuil.com
[21:14:07] crawl-19c.cuil.com
[21:14:23] crawl-16c.cuil.com
[21:14:53] crawl-7c.cuil.com
[21:17:45] crawl-4c.cuil.com
[21:18:16] crawl-5c.cuil.com
[21:18:55] crawl-16c.cuil.com
[21:19:14] crawl-7c.cuil.com
[21:20:32] crawl-9c.cuil.com
[21:24:47] crawl-9c.cuil.com
[21:28:59] crawl-18c.cuil.com
[21:29:49] crawl-2c.cuil.com
[21:30:11] crawl-3c.cuil.com
[21:33:45] crawl-18c.cuil.com
[21:34:18] crawl-8c.cuil.com
[21:34:20] crawl-2c.cuil.com
[21:35:14] crawl-3c.cuil.com
[21:38:36] crawl-8c.cuil.com
[21:43:49] crawl-6c.cuil.com
[21:47:55] crawl-6c.cuil.com

And these were without any UA at all. At least they did read/heed robots.txt --

ramp2b.cuil.com
ramp1hq.cuil.com
ramp1hq.cuil.com
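
For anyone who wants to total up a list like the one above instead of eyeballing it, here is a minimal Python sketch. It assumes the same "[hh:mm:ss] hostname" layout used in this post (not a standard server log format), and the function name tally_hits is made up for illustration. It counts hits per host and records the gap in seconds between repeat visits from the same host.

```python
import re
from collections import Counter

# A few lines copied from the list above. The "[hh:mm:ss] host" layout is
# just how the post shows them; a real access log will look different.
LOG = """\
[21:06:45] crawl-12c.cuil.com
[21:09:12] crawl-17c.cuil.com
[21:09:33] crawl-19c.cuil.com
[21:11:17] crawl-12c.cuil.com
[21:13:37] crawl-17c.cuil.com
"""

LINE_RE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]\s+(\S+)$")


def tally_hits(text):
    """Return (hits per host, seconds between repeat visits per host)."""
    counts = Counter()
    last_seen = {}
    gaps = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank or differently formatted lines
        hours, minutes, seconds, host = match.groups()
        stamp = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
        counts[host] += 1
        if host in last_seen:
            gaps.setdefault(host, []).append(stamp - last_seen[host])
        last_seen[host] = stamp
    return counts, gaps


if __name__ == "__main__":
    counts, gaps = tally_hits(LOG)
    for host, n in counts.most_common():
        print(host, n, gaps.get(host, []))
```

Run over the full list, this makes it easy to see that most of the crawl hosts show up in pairs a few minutes apart. It assumes all timestamps fall on the same day.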

9:45 pm on Dec 6, 2009 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:10433
votes: 602


Cuil's stealthy behavior also comes from 67.218.99.195 (ramp1hq.cuil.com at Layer42), and they have earned themselves a ban.

2:21 pm on Dec 7, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5463
votes: 3


I've had this thing requesting robots.txt and not proceeding any further (it wouldn't get in anyway) for more than a few days now.

These IPs make multiple requests, in no specific order but quite close together:
216.129.119.zz
67.218.116.zzz

10:45 pm on Dec 8, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 8, 2004
posts:1679
votes: 0


<speculation>

Such behavior can happen in search engines when they are cleaning up a large list of URLs to eliminate those that have been disallowed. As this process is likely to be done in parallel it can manifest itself as described above.

</speculation>
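
To picture the speculation above: if each parallel worker checks robots.txt on its own, one cleanup pass produces one robots.txt request per worker. The Python sketch below is a toy simulation of that idea only. Nothing here is known about how Cuil actually works; RobotsFetcher and run_workers are hypothetical names, and the "fetch" is a counter rather than a real network request.

```python
import threading


class RobotsFetcher:
    """Stand-in for a network fetch; it only counts how often it is called."""

    def __init__(self):
        self.calls = 0
        self._lock = threading.Lock()

    def fetch(self, site):
        with self._lock:
            self.calls += 1
        return "User-agent: *\nDisallow: /private/\n"


def run_workers(n_workers, site, fetcher, shared_cache=None):
    """Run n_workers threads that each need robots.txt for the same site.

    With shared_cache=None every worker fetches for itself. With a dict,
    the first worker fetches and the others reuse the cached copy.
    """
    cache_lock = threading.Lock()

    def worker():
        if shared_cache is None:
            fetcher.fetch(site)
            return
        with cache_lock:
            if site not in shared_cache:
                shared_cache[site] = fetcher.fetch(site)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return fetcher.calls


if __name__ == "__main__":
    print("no shared cache:", run_workers(16, "example.com", RobotsFetcher()))
    print("shared cache:", run_workers(16, "example.com", RobotsFetcher(), {}))
```

Without the shared cache the request count equals the worker count. With it, the count drops to one, which is the behavior a site owner would normally expect to see.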

3:09 am on Dec 9, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


All requests on my sites look normal today -- and that's actually a new thing, because prior to last month, Twiceler apparently did not understand multi-user-agent policy records in robots.txt, and as a result didn't crawl the sites. That's changed now, and they're crawling away (at a normal rate).

Combined with Lord Majestic's speculation above and the "ramp" hosts with no UA, it wouldn't surprise me if they're preparing to roll out a new index some time soon.

Jim
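
For readers unsure what a "multi-user-agent policy record" looks like: it is a single robots.txt record with several User-agent lines sharing the same rules. The sketch below uses Python's standard urllib.robotparser to show how a parser that understands such records treats them. The robots.txt content, the ExampleBot and OtherBot names, and the example.com URL are made up for illustration; this says nothing about Twiceler's own parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one record, two user-agents, one shared rule.
ROBOTS_TXT = """\
User-agent: Twiceler
User-agent: ExampleBot
Disallow: /private/
"""


def build_parser(text):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    url = "http://example.com/private/page.html"
    for agent in ("Twiceler", "ExampleBot", "OtherBot"):
        verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
        print(agent, verdict)
```

Both agents named in the record are blocked from /private/, while an agent that is not named (and has no "*" record to fall back on) is allowed. A crawler that only honors the first User-agent line of a record would get the second agent wrong.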

1:47 am on Dec 18, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


FWIW... Twiceler's still hammering away at the same site, every single day, usually in the late afternoon/early evening (Pacific). The next time I'm procrastinating something dreadful, I'll e-mail them about their overkill hits to robots.txt:

[17:12:36] crawl-2c.cuil.com
[17:12:37] crawl-6c.cuil.com
[17:12:57] crawl-7c.cuil.com
[17:12:59] crawl-8c.cuil.com
[17:13:05] crawl-14c.cuil.com
[17:13:30] crawl-4c.cuil.com
[17:17:55] crawl-5c.cuil.com
[17:17:58] crawl-17c.cuil.com
[17:17:58] crawl-12c.cuil.com
[17:18:05] crawl-19c.cuil.com
[17:18:11] crawl-9c.cuil.com
[17:18:22] crawl-16c.cuil.com
[17:18:23] crawl-3c.cuil.com
[17:18:34] crawl-1c.cuil.com

Do any of you ever get any traffic from them? I don't.

7:05 am on Dec 29, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Aug 30, 2002
posts:2620
votes: 93


Despite what Cuil's PR people would claim, in Irish, "Cuil" means fly or bug. Despite the claims of genius made about its founders, Cuil is a pest and sends zero traffic to two of my sites. One of them is one of the largest Irish web directories and the other is a very large domain history and domain statistics website. They've been hammering away for months, but normally when they start getting problematic they automatically get slapped with a 503. They've been 403ed on the web directory site for not following robots.txt.

Lord Majestic's speculation is a possibility. The last I heard of Cuil was that it was trying some social search engine experiments and some Twitter stuff was being integrated.

Regards...jmcc
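
The post above does not say how the automatic 503 is done. One common way is a sliding-window counter per client, sketched below in Python. The class name CrawlerThrottle, the limit of 10 hits, and the 60-second window are all assumptions for illustration, not anyone's actual setup.

```python
import time
from collections import defaultdict, deque


class CrawlerThrottle:
    """Sliding-window limiter: answer 503 once a client exceeds max_hits
    requests within window_seconds."""

    def __init__(self, max_hits=10, window_seconds=60.0, clock=time.monotonic):
        self.max_hits = max_hits
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self.hits = defaultdict(deque)

    def status_for(self, client):
        """Record one request from client and return the HTTP status to send."""
        now = self.clock()
        recent = self.hits[client]
        # Forget requests that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        # Refused requests are recorded too, so a client that keeps
        # hammering stays throttled until it actually slows down.
        recent.append(now)
        return 503 if len(recent) > self.max_hits else 200


if __name__ == "__main__":
    throttle = CrawlerThrottle(max_hits=3, window_seconds=60.0)
    for _ in range(5):
        print(throttle.status_for("crawl-1c.cuil.com"))
```

The client key could be an IP address, a reverse-DNS hostname, or a user-agent string, whichever identifies the crawler best on a given server.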

 
