WHAT is this critter?
I found a good 40-50 hits from www.yournetdetective.com in my logs.
Each one requested the same 3rd level page (some map with a little text)
and every last one had a different user agent.
Yournetdetective really rang the changes.
Every version of MSIE, Windows, Linux, Macintosh, Konqueror ... even Commodore 64!
Each UA was different, one hit each. Now what is the purpose of all that?
I Googled up www.yournetdetective.com expecting to find some discussion here, but nothing.
Instead, I find all sorts of 'Affiliate Programs'.
Is THAT what they are all about? If so, I want no part of it.
The question remains: What possible purpose is a long crazy string of hits like that?
Is somebody trying to see if my pages render differently? If I'm cloaking?
They certainly won't get me to sign up. Anybody else see this recently? -Larry
Well, they are back again, on today's access logs.
Same exact method of operation, only now they picked another 3rd level page,
as always without the accompanying image, and with the same stew of unique user agents.
Before I 86 them via .htaccess, I would like to know if this pest is unique to my site.
Any info at all much appreciated. -Larry
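For anyone else wanting to "86" this visitor, a block by referrer in .htaccess might look like the sketch below. This is an assumption-laden example, not a tested recipe: it assumes Apache with mod_setenvif enabled, and it matches on the yournetdetective.com referrer string reported in this thread (adjust the pattern to whatever your own logs show):

```apache
# Tag any request whose Referer mentions the suspect domain...
SetEnvIfNoCase Referer "yournetdetective\.com" block_bot

# ...then refuse those requests while letting everyone else through.
<Limit GET POST>
    Order Allow,Deny
    Allow from all
    Deny from env=block_bot
</Limit>
```

Note this only stops requests that keep sending that referrer; since the user agent changes on every hit, matching on the UA string instead would be a losing game.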
Looking at the site, it looks like they offer some sort of net monitoring services.
When I get hit by bots from people like this, especially when they lie in their user agent string, I block them first and (sometimes) ask questions later.
I'm curious. Why block them?
|cgrantski asked: I'm curious. Why block them? |
What good reason would they have for violating robots.txt and then lying about who and/or what they are?
I'd be interested in seeing the exact string that included "Commodore 64" within it, because the web browser I wrote for the 64 includes "Commodore 64" within the user-agent string.
Anyone who abuses "Commodore 64" is unlikely to honor Good Friday and Chocolate Saturday.
PS: The creep is back again. - Larry
Here's the exact Commodore user agent given:
"http://www.yournetdetective.com" "Mozilla/4.0 (compatible; X 10.0; Commodore 64)"
There were others for Amiga, Macintosh, all sorts of stuff. -Larry
cgrantski asked: I'm curious. Why block them?
stapel said: What good reason would they have for violating robots.txt and then lying about who and/or what they are?
me: I too would like to know why they should be blocked, other than the reason of disliking their tactics or thinking they are suspicious. I'm serious. What tangible harm could they do?
I'm not defending them and I don't think cgrantski is either based on his past posts about ethical subjects. I'm just wondering in general about actual bad effects to the site of this kind of visitor. A little bandwidth usage, yes. The question of "what if everybody did this," okay. General principles, sure. But other than those generalities, is there a tangible threat, any real consequences?
|McElvoy asked: What tangible harm could they do? |
Since most of us pay for our bandwidth, bandwidth theft isn't a "hypothetical" "generality". Since most of us don't prefer to be hacked or scraped, or to face the dangers and difficulties of hack attempts, most of us view these issues as ones involving "tangible harm".
I'm curious as to why you aren't leery of people/bots/etc who intentionally break the rules. Why would you assume violators to be benign?
First, I'm not convinced anybody is "lying." They could be using an emulator program to see how a certain kind of page displays in a lot of browsers. Maybe they are about to try a page that contains the same map technology and they want to see on an existing page whether that map technology displays correctly under all circumstances. Or maybe the page has another bit of code that they want to check for compatibility before using it. Emulators are legitimate tools used by developers.
Yes, it's using another site for their own purposes, but who hasn't gotten ideas from another site, looked at source code on another site to help with something we're working on, looked at links to another site as a way of finding advertisers, or checked out a company before doing business with them? Or, for that matter, looked closely at a competitor's site or business?
Regarding bandwidth theft - 40 or 50 pages a day with no images is not going to cost anybody anything.
Regarding scraping, I don't see how this could be scraping. And I can't think how this is hacking or preparation for hacking, i.e. stealing somebody's data or bringing down a site. I guess anything's possible.
It's anybody's choice to block anybody for any reason, but in this case it seems to me like an overreaction. If somebody skillful was planning on hacking the site, blocking the IP won't stop them at all.
It could be referrer spam. They just hit your site with a weird referrer and you go visit their site. Looks like it worked.
|First, I'm not convinced anybody is "lying." |
OK, it doesn't directly cost me anything, but it doesn't benefit me either, so in the filter they go.
>I'm curious. Why block them?
If you run a PPC site then you have to block all bots that fail to obey robots.txt, or you end up letting bots generate false clicks for your customers.
Paying for the bandwidth they consume is an issue, them generating false clicks can be a killer!
There are spam bots that can easily generate 20,000+ false clicks per day. They must be blocked.
I hate all bots that fail to obey robots.txt. I block them as fast as I can catch them. But some of my more savvy clients still get click-throughs from their crawling ... never a good scene!
Good points for a lot of situations. My question was about this particular situation. Nobody mentioned PPC clicks happening because of this entity. 40-50 page views is the situation here and its bandwidth cost is trivial. And so forth. In this case, it seems to me that the minutes spent fussing with this one entity could be better spent improving the site or its marketing.

My main action in this particular case would be to filter them from the stats for the main report, and filter them into the stats for the spiders/bots report. I'm looking for balance, you're looking for the 100% solution and a litmus test. That's fine. I just want to clear up the impression that I was saying that nothing ever needed to be blocked. I've been running and analyzing web sites since Mosaic 1.0 and, believe me, I would not be that naive.
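That kind of stats filtering is a few lines of script. A minimal sketch, assuming a combined-format access log where the referrer appears somewhere on each line (the marker list and sample lines are made up for illustration; the `split_log` helper is not from any real stats package):

```python
# Split access-log lines into "human" and "bot" buckets based on
# whether the line mentions a known pest domain, so the bot traffic
# lands in the spiders/bots report instead of the main report.
BOT_MARKERS = ("yournetdetective.com",)  # extend as new pests show up

def split_log(lines):
    humans, bots = [], []
    for line in lines:
        bucket = bots if any(m in line.lower() for m in BOT_MARKERS) else humans
        bucket.append(line)
    return humans, bots

log = [
    '1.2.3.4 - - [...] "GET /map.html HTTP/1.0" 200 512 '
    '"http://www.yournetdetective.com" "Mozilla/4.0 (compatible; X 10.0; Commodore 64)"',
    '5.6.7.8 - - [...] "GET /index.html HTTP/1.0" 200 1024 "-" "Mozilla/5.0"',
]
humans, bots = split_log(log)
print(len(humans), len(bots))  # prints "1 1"
```

Matching on the whole line rather than parsing out the referrer field keeps it crude but dependable against log-format quirks.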