---- Mr.Carlito


idiotgirl - 7:51 am on Jul 6, 2008 (gmt 0)


Here's one I haven't seen before:
64.237.57.*** - - [05/Jul/2008:20:28:36 -0400] "GET / HTTP/1.1" 200 7643 "-" "Mozilla/5.0 (MrCarlito-0.1 http://www.mrcarlito.com/spider.html)"
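For anyone who wants to pull entries like this out of their own logs, here is a rough Python sketch. It assumes the Apache combined log format shown above; the names `LOG_PATTERN`, `parse_log_line` and `fetched_robots` are made up for illustration. It splits a line into fields and checks whether a given user agent ever asked for /robots.txt.

```python
import re

# Assumed layout (Apache "combined" format):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def fetched_robots(lines, agent_substring):
    """True if any request from a matching user agent was for /robots.txt."""
    for line in lines:
        entry = parse_log_line(line)
        if entry is None or agent_substring not in entry["agent"]:
            continue
        parts = entry["request"].split()
        # parts is normally [method, path, protocol]; ignore any query string
        if len(parts) >= 2 and parts[1].split("?")[0] == "/robots.txt":
            return True
    return False


SAMPLE = (
    '64.237.57.*** - - [05/Jul/2008:20:28:36 -0400] "GET / HTTP/1.1" 200 7643 '
    '"-" "Mozilla/5.0 (MrCarlito-0.1 http://www.mrcarlito.com/spider.html)"'
)

entry = parse_log_line(SAMPLE)
print(entry["agent"])
print(fetched_robots([SAMPLE], "MrCarlito"))
```

Feed `fetched_robots` every line of the log rather than just the one sample, since a bot may have requested robots.txt in an earlier hit.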

It didn't request robots.txt first. The reference page says:
MrCarlito-0.1 is an experimental spider that collects header & link information from web pages. The spider is written in PERL (Practical Extraction and Report Language), and uses the LWP::UserAgent Class. Currently this spider does not delve into websites, it simply obtains the headers & hostnames contained in your web page index.

IMHO, it would be more polite if Mr.Carlito bothered to check robots.txt to see whether he's welcome. I guess that's not Carlito's Way.
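For comparison, this is roughly what the polite version looks like. It is a Python sketch using the standard library's `urllib.robotparser`, not the spider's actual Perl code. The robots.txt rules and the example.com URLs are invented for the example, and the rules are parsed from a string so nothing touches the network.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out MrCarlito entirely,
# keep every other bot out of /private/ only.
ROBOTS_TXT = """\
User-agent: MrCarlito
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_welcome(agent, url):
    """Ask the parsed rules whether this user agent may fetch this URL."""
    return parser.can_fetch(agent, url)


print(is_welcome("MrCarlito", "http://example.com/"))
print(is_welcome("SomeOtherBot", "http://example.com/"))
print(is_welcome("SomeOtherBot", "http://example.com/private/page.html"))
```

A real crawler would load the live file with `set_url()` and `read()` before its first request to a host, then skip any URL for which `can_fetch()` returns False.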

[edited by: incrediBILL at 8:12 pm (utc) on July 6, 2008]
[edit reason] fixed formatting and link [/edit]


Thread source: http://www.webmasterworld.com/search_engine_spiders/3691492.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com