---- -very- much the same robot


lucy24 - 11:29 pm on Sep 22, 2012 (gmt 0)


"My data" may be putting it strongly ;) As of yesterday they've requested:

-- two copies of robots.txt
-- two directories in the form /name/index.html with no follow-up of resulting 301
-- two .sit (StuffIt) files of Mac games, datestamped 2004 but really at least 5 years older
-- two ditto, only these are patches for game files that they haven't got
-- one homemade MiSTing of similar vintage
-- two further pages also dating from around 2005
-- one random gallery page
-- one full-size jpg linked from a different gallery page

:: further detour to previous batch of visits in June ::

-- three requests for robots.txt
-- one for front page
-- two for one of the same directories as above-- only this time called correctly /name/ even though, ahem, I wasn't redirecting "index.html" at the time
-- three requests for a different directory, all three stopping short at the directory-slash redirect for the form /name
-- three for a different MiSTing, probably left over from when I had a very large file with this name

Before that, an even longer gap. Patterns like this make me think they've got to have collaborators: other robots with different UAs, operating from different IPs (I checked both ways), that tell them what files to ask for. The alternative is that they're working through shopping lists from 2007.
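For anyone who wants to run the same "who else asked for these files" check against their own logs, here is a rough sketch in Python. It assumes the Apache/NGINX "combined" log format, and the function names (requesters_by_path, shared_paths) are made up for illustration. The only detail taken from the post is the "38." prefix; everything else would need adjusting to your own setup.

```python
import re
from collections import defaultdict

# Assumption: logs are in the "combined" format; adjust the pattern otherwise.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

def requesters_by_path(lines):
    """Map each requested path to the set of (ip, user_agent) pairs that asked for it."""
    seen = defaultdict(set)
    for line in lines:
        m = LOG_LINE.match(line)
        if m:  # silently skip lines that do not fit the assumed format
            seen[m.group("path")].add((m.group("ip"), m.group("ua")))
    return seen

def shared_paths(lines, ip_prefix):
    """Return paths requested by clients inside ip_prefix AND by at least one other client.

    The value for each path is the sorted list of outside (ip, user_agent) pairs,
    i.e. the candidate "collaborators" worth a closer look.
    """
    result = {}
    for path, clients in requesters_by_path(lines).items():
        inside = {c for c in clients if c[0].startswith(ip_prefix)}
        outside = clients - inside
        if inside and outside:
            result[path] = sorted(outside)
    return result
```

Calling shared_paths(open("access.log"), "38.") would list, for each file the robot requested, which other IP/UA pairs also requested it. A shared path only suggests a link, since old or obscure files can be reached independently from stale link lists.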

:: insert "noidea" emoticon here ::

I can block 38.something, but not the whole aaa.
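To check which logged addresses a candidate block would actually catch before putting it in the server config, something like the following sketch may help. It uses Python's standard ipaddress module. The network shown is a documentation placeholder (192.0.2.0/24), not the real range, since the post doesn't give one, and in_blocked_range is a hypothetical helper name.

```python
import ipaddress

def in_blocked_range(ip, blocked_networks):
    """Return True if ip falls inside any of the CIDR ranges in blocked_networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(net) for net in blocked_networks)

# Placeholder range for illustration only; substitute the range you intend to block.
BLOCKED = ["192.0.2.0/24"]

for candidate in ["192.0.2.55", "198.51.100.7"]:
    print(candidate, in_blocked_range(candidate, BLOCKED))
```

Running the logged IPs through this first shows whether a narrow range is enough or whether the visits are spread too widely for one block to cover.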


Thread source: http://www.webmasterworld.com/search_engine_spiders/4498453.htm