Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
One for the UA profilers
Bewenched




msg:4533177
 8:10 am on Jan 4, 2013 (gmt 0)

Apache-HttpClient/UNAVAILABLE+(java+1.4)

The "java" got them blocked... very imaginative with the UNAVAILABLE tag there. LOL

 

dstiles




msg:4533395
 10:24 pm on Jan 4, 2013 (gmt 0)

httpclient also gets blocked here, and there are other traps the UA would have fallen into if that block had failed. :)

lucy24




msg:4533624
 11:33 pm on Jan 5, 2013 (gmt 0)

I like UNAVAILABLE. It suggests that they were using an off-the-rack program to generate their robot's name, and the function they needed was unavailable so that's what got plugged into Slot B of the name :)

I'm narrower on java. It has to be Java/ like that. Probably because I accidentally locked out the wrong person.
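
To make the difference between the two rules concrete, here is a minimal Python sketch. It assumes the broad rule is a case-insensitive match on "java" anywhere in the UA and the narrow rule is a case-sensitive match on the literal token "Java/"; the function names are hypothetical, and the real rules on either site may well live in server config rather than code.

```python
import re

# Broad rule (assumption): block any UA that mentions "java" anywhere,
# regardless of case.
BROAD_JAVA = re.compile(r"java", re.IGNORECASE)

# Narrow rule (assumption): block only UAs containing the literal,
# case-sensitive token "Java/", e.g. "Java/1.6.0_26".
NARROW_TOKEN = "Java/"


def blocked_broad(user_agent: str) -> bool:
    """True if the broad, case-insensitive rule would block this UA."""
    return BROAD_JAVA.search(user_agent) is not None


def blocked_narrow(user_agent: str) -> bool:
    """True if the narrow, literal-token rule would block this UA."""
    return NARROW_TOKEN in user_agent


if __name__ == "__main__":
    samples = [
        "Apache-HttpClient/UNAVAILABLE+(java+1.4)",  # UA from this thread
        "Java/1.6.0_26",                             # illustrative UA
    ]
    for ua in samples:
        print(ua, blocked_broad(ua), blocked_narrow(ua))
```

Under these assumptions the UA in the first post is caught by the broad rule only, so a site using the narrow rule would need some other trap (or an IP block) to stop it.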

Apache-HttpClient really? Quick detour to raw logs reveals a solid block of them from early July to early September last year, with a scant handful earlier. Maybe they sold their robots to a poorer country.

:: closer inspection of some random dates ::

Oh, will you look at that.

98.139.243.64 - - [07/Sep/2012:02:23:46 -0700] "GET /paintings/sparerats/blowups/largepinkriver.jpg HTTP/1.1" 403 1423 "-" "Apache-HttpClient/4.1 (java 1.5)"

Apparently once a day for two solid months, and I never noticed because they were already blocked by IP.
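
For anyone who wants to repeat this kind of raw-log detour, the following is a rough Python sketch of pulling one UA out of logs. It assumes the logs are in Apache's "combined" format, as the line above appears to be; the regex and the names `parse_line` and `hits_for_agent` are illustrative, not any tool actually used here.

```python
import re

# Pattern for the Apache "combined" log format (assumed layout):
# ip ident user [time] "request" status size "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


def hits_for_agent(lines, needle: str):
    """Collect (ip, time, status, request) for lines whose UA contains needle."""
    hits = []
    for line in lines:
        fields = parse_line(line)
        if fields and needle in fields["agent"]:
            hits.append(
                (fields["ip"], fields["time"], fields["status"], fields["request"])
            )
    return hits


if __name__ == "__main__":
    sample = (
        '98.139.243.64 - - [07/Sep/2012:02:23:46 -0700] '
        '"GET /paintings/sparerats/blowups/largepinkriver.jpg HTTP/1.1" '
        '403 1423 "-" "Apache-HttpClient/4.1 (java 1.5)"'
    )
    for hit in hits_for_agent([sample], "Apache-HttpClient"):
        print(hit)
```

Grouping the resulting tuples by date or IP would show a once-a-day pattern like the one described, though the sketch stops short of that.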

:: further shuffling of papers ::

Oh. Yahoo Cache System. Imagine that. Wonder how they got fixated on this file? It's so obscure, it doesn't even have a page to go with it, just a click-for-blowup link.

keyplyr




msg:4533663
 5:01 am on Jan 6, 2013 (gmt 0)

Apache-HttpClient really? Quick detour to raw logs reveals a solid block of them from early July to early September last year, with a scant handful earlier. Maybe they sold their robots to a poorer country.

Apache-HttpClient is not a robot in itself. It's a Java HTTP client library from the Apache HttpComponents project (not part of the Apache web server), although it is most often used as a document retrieval tool, just as a bot is.

incrediBILL




msg:4533672
 6:38 am on Jan 6, 2013 (gmt 0)

Not starting with Opera or Mozilla gets them blocked here, if it even gets past my "browser header check", which it probably wouldn't based on that UA.
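
A rough Python sketch of that two-stage idea follows. The prefix test comes from the description above; what the "browser header check" actually inspects is not stated in this thread, so `passes_header_check` is purely a hypothetical stand-in that requires Accept and Accept-Language headers to be present.

```python
# UA prefixes that are allowed through (from the post above).
ALLOWED_PREFIXES = ("Mozilla", "Opera")

# Hypothetical stand-in for the header check: headers a mainstream
# browser would normally send. The real check is not described here.
REQUIRED_HEADERS = {"accept", "accept-language"}


def passes_header_check(headers: dict) -> bool:
    """Assumed check: all required header names are present (case-insensitive)."""
    names = {name.lower() for name in headers}
    return REQUIRED_HEADERS <= names


def passes_prefix_check(user_agent: str) -> bool:
    """True if the UA starts with one of the allowed prefixes."""
    return user_agent.startswith(ALLOWED_PREFIXES)


def allow_request(headers: dict) -> bool:
    """Run the header check first, then the UA prefix check."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if not passes_header_check(headers):
        return False
    return passes_prefix_check(lowered.get("user-agent", ""))


if __name__ == "__main__":
    bot = {"User-Agent": "Apache-HttpClient/UNAVAILABLE+(java+1.4)"}
    print(allow_request(bot))  # blocked at the (assumed) header check
```

Running the header check first means a client that fakes a Mozilla UA but sends bare headers is still stopped, which seems to be the point of layering the two.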


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved