Twiceler/cuil.com craziness (FWIW)


Pfui - 7:16 am on Dec 6, 2009 (gmt 0)


Anyone else seeing odd things from Cuil's crawler in recent days? Its hits here are always for robots.txt only, but today there were 36 of them in ~12 hours?! Both with the usual UA --

Mozilla/5.0 (Twiceler-0.9 http://www.cuil.com/twiceler/robot.html)

-- and with no UA at all (those came from differently named servers, listed further down). Usually Twiceler visits a few times a day. Never, ever like this:

[11:23:29] crawl-14c.cuil.com
[12:11:36] crawl-4c.cuil.com
[19:21:00] crawl-1c.cuil.com
[19:24:52] crawl-1c.cuil.com
[20:41:26] crawl-14c.cuil.com
[20:45:52] crawl-14c.cuil.com
[20:57:06] crawl-15c.cuil.com
[21:01:28] crawl-15c.cuil.com
[21:06:45] crawl-12c.cuil.com
[21:09:12] crawl-17c.cuil.com
[21:09:33] crawl-19c.cuil.com
[21:11:17] crawl-12c.cuil.com
[21:13:37] crawl-17c.cuil.com
[21:13:58] crawl-5c.cuil.com
[21:14:07] crawl-19c.cuil.com
[21:14:23] crawl-16c.cuil.com
[21:14:53] crawl-7c.cuil.com
[21:17:45] crawl-4c.cuil.com
[21:18:16] crawl-5c.cuil.com
[21:18:55] crawl-16c.cuil.com
[21:19:14] crawl-7c.cuil.com
[21:20:32] crawl-9c.cuil.com
[21:24:47] crawl-9c.cuil.com
[21:28:59] crawl-18c.cuil.com
[21:29:49] crawl-2c.cuil.com
[21:30:11] crawl-3c.cuil.com
[21:33:45] crawl-18c.cuil.com
[21:34:18] crawl-8c.cuil.com
[21:34:20] crawl-2c.cuil.com
[21:35:14] crawl-3c.cuil.com
[21:38:36] crawl-8c.cuil.com
[21:43:49] crawl-6c.cuil.com
[21:47:55] crawl-6c.cuil.com
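A few lines of script make the pattern in a list like this easier to see. This is a minimal Python sketch, assuming the log has already been reduced to the "[HH:MM:SS] hostname" form shown above and that every entry falls on the same day (the timestamps carry no date). The names `parse` and `summarize` are only illustrative:

```python
from collections import defaultdict
from datetime import datetime

# The robots.txt hits listed above, one "[HH:MM:SS] hostname" entry per line.
LOG = """\
[11:23:29] crawl-14c.cuil.com
[12:11:36] crawl-4c.cuil.com
[19:21:00] crawl-1c.cuil.com
[19:24:52] crawl-1c.cuil.com
[20:41:26] crawl-14c.cuil.com
[20:45:52] crawl-14c.cuil.com
[20:57:06] crawl-15c.cuil.com
[21:01:28] crawl-15c.cuil.com
[21:06:45] crawl-12c.cuil.com
[21:09:12] crawl-17c.cuil.com
[21:09:33] crawl-19c.cuil.com
[21:11:17] crawl-12c.cuil.com
[21:13:37] crawl-17c.cuil.com
[21:13:58] crawl-5c.cuil.com
[21:14:07] crawl-19c.cuil.com
[21:14:23] crawl-16c.cuil.com
[21:14:53] crawl-7c.cuil.com
[21:17:45] crawl-4c.cuil.com
[21:18:16] crawl-5c.cuil.com
[21:18:55] crawl-16c.cuil.com
[21:19:14] crawl-7c.cuil.com
[21:20:32] crawl-9c.cuil.com
[21:24:47] crawl-9c.cuil.com
[21:28:59] crawl-18c.cuil.com
[21:29:49] crawl-2c.cuil.com
[21:30:11] crawl-3c.cuil.com
[21:33:45] crawl-18c.cuil.com
[21:34:18] crawl-8c.cuil.com
[21:34:20] crawl-2c.cuil.com
[21:35:14] crawl-3c.cuil.com
[21:38:36] crawl-8c.cuil.com
[21:43:49] crawl-6c.cuil.com
[21:47:55] crawl-6c.cuil.com
"""


def parse(log_text):
    """Return (time, host) tuples from '[HH:MM:SS] host' lines, skipping blanks."""
    entries = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        stamp, host = line.split("] ", 1)
        entries.append((datetime.strptime(stamp.lstrip("["), "%H:%M:%S"), host))
    return entries


def summarize(entries):
    """Map host -> (hit count, seconds between consecutive hits from that host)."""
    by_host = defaultdict(list)
    for when, host in entries:
        by_host[host].append(when)
    summary = {}
    for host, times in by_host.items():
        times.sort()
        gaps = [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]
        summary[host] = (len(times), gaps)
    return summary


if __name__ == "__main__":
    for host, (hits, gaps) in sorted(summarize(parse(LOG)).items()):
        print(f"{host:22} hits={hits} gaps_s={gaps}")
```

Running it on this excerpt gives 33 hits from 16 crawl-Nc hosts. Every host shows up twice except crawl-14c, which shows up three times. Setting aside the morning hits from crawl-14c and crawl-4c, each host fetched robots.txt a second time 232 to 303 seconds after its first fetch.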

And these were without any UA at all. At least they did read/heed robots.txt --

ramp2b.cuil.com
ramp1hq.cuil.com
ramp1hq.cuil.com
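Because these hits sent no User-Agent, the hostname is the only thing linking them to Cuil. A common way to make sure a crawler's hostname is genuine, and not just a spoofed PTR record, is forward-confirmed reverse DNS: look up the IP's hostname, check that the name sits under the expected domain, and then confirm that the name resolves back to the same IP. Below is a minimal Python sketch of that check, with these assumptions:

* You have the requesting IP from your access log.
* The function name `verify_crawler_ip` and the `.cuil.com` suffix check are illustrative; Cuil did not document either one.
* It handles IPv4 only, because `socket.gethostbyname_ex` does not return IPv6 addresses.

```python
import socket


def verify_crawler_ip(ip, allowed_suffix=".cuil.com",
                      reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check (hypothetical helper).

    Returns the PTR hostname if it ends in allowed_suffix AND resolves back
    to the same IP; otherwise returns None. The resolvers are parameters so
    the check can be exercised without touching real DNS.
    """
    try:
        host, _aliases, _addrs = reverse(ip)        # PTR lookup
    except OSError:                                 # socket.herror subclasses OSError
        return None
    if not host.lower().rstrip(".").endswith(allowed_suffix):
        return None                                 # not under the expected domain
    try:
        _name, _aliases, addrs = forward(host)      # A lookup on the PTR name
    except OSError:                                 # socket.gaierror subclasses OSError
        return None
    return host if ip in addrs else None            # must round-trip to the same IP
```

If the IP has no PTR record, if the name is outside the expected domain, or if the name does not map back to the IP, the function returns `None`. Any one of those is a good reason to treat an anonymous hit as not really coming from the crawler.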


Thread source: http://www.webmasterworld.com/search_engine_spiders/4038054.htm