
Search Engine Spider and User Agent Identification Forum

    
anyone know this UA?
all the spaces are +'s :?
wkitty42




msg:3602445
 1:25 am on Mar 17, 2008 (gmt 0)

been a while since i've been by but thought i'd run this by you and see what you can tell me about it...

this thing, obviously a spider of some kind, shows up on my system from the same appsitehosting address each time... i'm trying to find out what it is, who owns it and who is running it...

"Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.0;)"

i'm unable to find anything via google because it eats the +'s :? :(

anyone?

advTHANKSance

 

incrediBILL




msg:3603422
 10:08 pm on Mar 17, 2008 (gmt 0)

Most likely a scraper or email harvester script.

I've seen stuff like this before from someone who just cut and pasted the user agent string from a Windows (IIS) log file, which writes a "+" where the spaces used to be. They think that's how a UA really looks, but it's not valid whatsoever, so it's pretty safe to block anything with "MSIE+" in the user agent string.
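A minimal Python sketch of the mechanism described above, assuming an IIS W3C extended log. That format is space-delimited, so spaces inside the User-Agent field are written out as "+". The function names `iis_logged_form` and `is_pasted_from_log` are hypothetical, and the check only looks for the "MSIE+" marker suggested above.

```python
def iis_logged_form(user_agent: str) -> str:
    # IIS W3C extended logs are space-delimited, so spaces inside the
    # User-Agent field are written out as "+".
    return user_agent.replace(" ", "+")


def is_pasted_from_log(user_agent: str) -> bool:
    # A genuine browser sends spaces, so "MSIE+" only shows up when someone
    # pasted the logged form back into a script's UA setting.
    return "MSIE+" in user_agent


real_ua = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0;)"
pasted_ua = iis_logged_form(real_ua)
print(pasted_ua)                      # Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.0;)
print(is_pasted_from_log(real_ua))    # False
print(is_pasted_from_log(pasted_ua))  # True
```

Running the real UA through the logging transformation reproduces exactly the string from the first post.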

wkitty42




msg:3603525
 11:39 pm on Mar 17, 2008 (gmt 0)

thanks, bill...

while i tend to agree with the overall assessment, i must point out that this particular appsitehosting PITA has been hitting me for years and not just with this UA... so i'm a bit "speculative" on the "cut and paste and think that's how it really looks" portion of your assessment ;)

unfortunately, though, i think we've just purged our old logs or i'd offer up more UAs from whoever this is... the really ugly part is that they are still around and no one seems to know anything about them :?

incrediBILL




msg:3603579
 1:17 am on Mar 18, 2008 (gmt 0)

It could be a proxy IP with multiple people using it to hit your site, or a single person running multiple scraper apps; it's really hard to say without more details.

At least you can block "Mozilla/4.0+" or "MSIE+" and get rid of whatever that one is.
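If the site happened to run a Python WSGI application, the block could be sketched as middleware that answers 403 to any UA containing "Mozilla/4.0+" or "MSIE+". This is an illustrative assumption: on a typical Apache or IIS setup the same substring test would go into the server configuration instead. `block_plus_encoded_uas`, `hello_app` and `status_for` are hypothetical names.

```python
from wsgiref.util import setup_testing_defaults

# Substrings from the thread; a genuine browser sends spaces at these positions.
BLOCKED_UA_MARKERS = ("Mozilla/4.0+", "MSIE+")


def block_plus_encoded_uas(app):
    """Wrap a WSGI app so matching UAs get a 403 instead of the page."""
    def wrapper(environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "")
        if any(marker in ua for marker in BLOCKED_UA_MARKERS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return wrapper


def hello_app(environ, start_response):
    # Stand-in for the real site.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


app = block_plus_encoded_uas(hello_app)


def status_for(user_agent):
    """Call the wrapped app with a fake request and return its status line."""
    environ = {"HTTP_USER_AGENT": user_agent}
    setup_testing_defaults(environ)  # fills in the other required WSGI keys
    seen = []
    app(environ, lambda status, headers: seen.append(status))
    return seen[0]


print(status_for("Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.0;)"))  # 403 Forbidden
print(status_for("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0;)"))  # 200 OK
```

Matching on plain substrings keeps the rule cheap and avoids touching legitimate UAs, which carry spaces in those positions.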

wilderness




msg:3603605
 1:57 am on Mar 18, 2008 (gmt 0)

> while i tend to agree with the overall assessment, i must point out that this particular appsitehosting PITA has been hitting me for years and not just with this UA... so i'm a bit "speculative" on the "cut and paste and think that's how it really looks" portion of your assessment ;)

"appsitehosting" implies some sort of hosting farm, and best practice for those is to deny the uppermost range of the backbone.

> unfortunately, though, i think we've just purged our old logs or i'd offer up more UAs from whoever this is... the really ugly part is that they are still around and no one seems to know anything about them :?

Don't you keep any records of website abuses of protocol outside of your visitor logs?

Just because a dictionary hasn't been updated to include an abusive word, doesn't imply that the word is any less abusive!
The same goes for IPs and backbones that "harbor" these pests.
If the backbone is a provider to less-than-reputable websites that offer no benefit to your own website(s), simply take out the entire range of the backbone.
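Denying a provider's whole range rather than individual IPs comes down to a network-membership test. A minimal sketch using Python's standard `ipaddress` module; the two networks are RFC 5737 documentation ranges standing in for a provider's real allocations (which you would look up via WHOIS), not appsitehosting's actual blocks, and `is_denied` is a hypothetical helper.

```python
import ipaddress

# Hypothetical deny list. RFC 5737 documentation ranges stand in for a
# provider's real allocations, which you would look up via WHOIS.
DENIED_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_denied(client_ip: str) -> bool:
    """True if the address falls inside any denied range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in DENIED_RANGES)


print(is_denied("203.0.113.45"))  # True: inside the first range
print(is_denied("192.0.2.10"))    # False: not listed
```

One entry per allocation catches the scraper no matter which address inside the range it moves to next.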

wkitty42




msg:3603719
 5:49 am on Mar 18, 2008 (gmt 0)

i was able to locate an archive of logs going back about 10 years... these guys showed up on my site, coming from appsite, back in Aug of 2006 with the same UA... my logs show that they've had two IPs in that time, and i was actually able to locate at least one (very recent) access entry, apparently by a human, where they were reading a forum that i participate in and pulled a graphic off of my site from a thread that i had posted it in...

generally speaking, though, all they do is scarf up the entries for my file areas and "validate" that all the listed zip/archive files still exist... i have a few HEAD entries but almost all are GET entries... so far as i can tell, the same UA has been used in all of these appsite-related accesses except a few which do not contain the +'s for spaces... those entries i also suspect are actual humans, and that does lend more credence to bill's original hypothesis that the UA might be copied from an IIS log and pasted into the spider's UA field by a st00pid human thinking they're being "smart" :lol:

in more recent appearances, it almost looks like there's a human visiting with a background downloader following up to pull the files... i say this because i get a "normal" UA with spaces requesting the directory and then another UA with the +'s pulling each of the filenames...
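Separating the "normal" UA from the "+" UA in the logs can be sketched as a small summarizer. This assumes the server writes Apache "combined" format logs (webalizer and awstats commonly read that format, but adjust the pattern to whatever your server actually writes). The sample lines are made up, the 203.0.113.7 address is a documentation IP, and `ua_kind`/`summarize` are hypothetical helpers.

```python
import re
from collections import Counter

# Apache "combined" log format; adjust the pattern if your server logs differently.
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)


def ua_kind(ua: str) -> str:
    # Legit UAs may contain "+" (e.g. "(+http://...)") but they also contain
    # spaces; a UA with "+" and no spaces at all looks log-pasted.
    return "plus-encoded" if "+" in ua and " " not in ua else "normal"


def summarize(lines):
    """Count requests per (ip, UA kind, method), skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m:
            counts[(m["ip"], ua_kind(m["ua"]), m["method"])] += 1
    return counts


NORMAL = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0;)"
PLUS = NORMAL.replace(" ", "+")
sample = [
    f'203.0.113.7 - - [18/Mar/2008:05:00:00 +0000] "GET /files/ HTTP/1.1" 200 5120 "-" "{NORMAL}"',
    f'203.0.113.7 - - [18/Mar/2008:05:00:03 +0000] "GET /files/tool.zip HTTP/1.1" 200 90000 "-" "{PLUS}"',
    f'203.0.113.7 - - [18/Mar/2008:05:00:04 +0000] "HEAD /files/util.zip HTTP/1.1" 200 0 "-" "{PLUS}"',
]
for key, n in sorted(summarize(sample).items()):
    print(key, n)
```

Grouping by IP, UA kind and method makes the "human browses the directory, downloader fetches the files" pattern stand out, along with the HEAD-versus-GET split.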

like it or not, i do recall conversing with some entity that i believe (without further research into my old sent emails) was appsitehosting, and all they would do was confirm that it was one of their hosted sites; they would not give me any more information... i may be confusing this with another site, though, as i also recall a discussion about a specific spider id in the UA, but nothing shows up when i search the logs for appsite...

oh well, i could simply lock out all of appsite's ip blocks... it isn't like anyone's gonna tell me anything, and if my data goes missing, well, that's their effin' loss until such time as they decide to go legit and let me know about their activities :? ;) no skin off my back... now, if their cohorts, vericenter, sungard, and sgns want to go fess up, that's ok, too ;)

wkitty42




msg:3603724
 5:55 am on Mar 18, 2008 (gmt 0)

> Don't you keep any records of website abuses of protocol outside of your visitor logs?

in what way? no, i don't sit and spend hours upon hours digging thru my logs any more... i used to several years back but that was before i discovered other things more important to do with my time ;)

my "visitor logs" contain everything that i need to know and, as i recently discovered, i do have all of them going back at least 10 years... this is also why i run webalizer and awstats ;)

wilderness




msg:3603728
 6:04 am on Mar 18, 2008 (gmt 0)

Keeping notations of denial changes (whether additions or removals), as well as the reasons for the changes, may provide valuable insights upon reflection.

Those reflections also tend to make future decisions easier, especially regarding what I refer to as "short-denials", where I've tried to keep the denied IP ranges to a minimum in order to leave as many innocent visitors as possible outside the denial.

"Short-denials" nearly always come back to bite one in the backside.
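One way to keep the notations described above is a dated change log that can be replayed to see which denials are in force and why each one was made. A minimal sketch, assuming a simple in-memory list; the `DenialChange` record, the sample entries and their reasons are all hypothetical, and the 203.0.113.x ranges are documentation addresses.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DenialChange:
    when: date
    action: str  # "add" or "remove"
    target: str  # an IP range or UA pattern
    reason: str


def active_denials(changes):
    """Replay the change log in date order to get the denials now in force."""
    active = set()
    for change in sorted(changes, key=lambda c: c.when):  # stable for same-day entries
        if change.action == "add":
            active.add(change.target)
        elif change.action == "remove":
            active.discard(change.target)
    return active


def history_of(changes, target):
    """Every recorded change (with its reason) for one target, oldest first."""
    return [c for c in sorted(changes, key=lambda c: c.when) if c.target == target]


# Hypothetical history: a short-denial that later had to be widened.
changes = [
    DenialChange(date(2007, 5, 1), "add", "203.0.113.0/25", "scraper hitting /files/ (short-denial)"),
    DenialChange(date(2008, 3, 18), "remove", "203.0.113.0/25", "replaced by the full range"),
    DenialChange(date(2008, 3, 18), "add", "203.0.113.0/24", "same scraper reappeared at .200"),
    DenialChange(date(2008, 3, 18), "add", "MSIE+", "UA pasted from an IIS log"),
]
print(sorted(active_denials(changes)))
for c in history_of(changes, "203.0.113.0/25"):
    print(c.when, c.action, c.reason)
```

The history of the /25 entry shows at a glance that the narrow denial did not hold, which is the kind of reflection that makes the next decision easier.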

wilderness




msg:3603731
 6:14 am on Mar 18, 2008 (gmt 0)

I have a notation on this provider's range from 2003; however, the range itself is unrestricted on my sites.

Either they have avoided me or they are getting caught by a UA denial.

wkitty42




msg:3604172
 4:05 pm on Mar 18, 2008 (gmt 0)

FWIW: for now, i've developed a rule for my IDS which, when coupled with an IDS alert analyzer, will drop their connection and keep it dropped for a specific time period... if they come back within that time period, the block will be extended for the original time period... the more they come back during the blocking period, the longer they will be blocked for ;)

now, to wait and see if they actually trigger the rule ;)
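The escalating block described above can be sketched on its own, independent of any particular IDS. The IDS and alert analyzer aren't named in the post, so this is not their configuration: `EscalatingBlocker` and its methods are hypothetical, and "extended for the original time period" is read here as adding one more base period to the current expiry on each return.

```python
class EscalatingBlocker:
    """Drop an address for `period` seconds; every return during an active
    block pushes the expiry out by another `period`."""

    def __init__(self, period: float) -> None:
        self.period = period
        self.expires = {}  # ip -> timestamp at which the block ends

    def is_blocked(self, ip: str, now: float) -> bool:
        return self.expires.get(ip, float("-inf")) > now

    def rule_triggered(self, ip: str, now: float) -> None:
        """The IDS rule fired: start a block, or extend one already running."""
        if self.is_blocked(ip, now):
            self.expires[ip] += self.period
        else:
            self.expires[ip] = now + self.period

    def connection_attempt(self, ip: str, now: float) -> bool:
        """Return True if this connection should be dropped."""
        if self.is_blocked(ip, now):
            self.expires[ip] += self.period  # came back while blocked
            return True
        return False


blocker = EscalatingBlocker(period=3600)     # one-hour base block
blocker.rule_triggered("203.0.113.7", now=0)  # blocked until t=3600
print(blocker.connection_attempt("203.0.113.7", now=1000))   # True, now until 7200
print(blocker.connection_attempt("203.0.113.7", now=5000))   # True, now until 10800
print(blocker.connection_attempt("203.0.113.7", now=11000))  # False, block has lapsed
```

Because each return adds a full period, a client that keeps hammering during the block keeps itself locked out, while one that goes away is released once the expiry passes.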

wkitty42




msg:3604173
 4:07 pm on Mar 18, 2008 (gmt 0)

> Keeping notations of denial changes (whether additions or removals), as well as the reasons for the changes, may provide valuable insights upon reflection.

to an extent, yes... i do actually have something like that... but it is more like code comments in the configs :)

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved