| 2:25 pm on Apr 14, 2006 (gmt 0)|
Well, they are back again, on today's access logs.
Same exact method of operation, only now they picked another 3rd level page,
as always without the accompanying image, and with the same stew of unique user agents.
Before I 86 them via .htaccess, I would like to know if this pest is unique to my site.
Any info at all much appreciated. -Larry
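For readers in the same boat, here is a minimal .htaccess sketch of the kind of block Larry describes, using Apache's mod_rewrite. The referrer is the one quoted later in the thread; the user-agent pattern is just one example of matching the odd platform strings these bots send. Adjust both to whatever your own logs show:

```apache
# Requires mod_rewrite. Returns 403 for requests that either carry the
# suspicious referrer or claim an implausible platform in the User-Agent.
# Patterns are illustrative; match them to your own access logs.
RewriteEngine On
RewriteCond %{HTTP_REFERER} yournetdetective\.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Commodore.64 [NC]
RewriteRule .* - [F,L]
```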
| 3:19 pm on Apr 14, 2006 (gmt 0)|
Looking at the site, it looks like they offer some sort of net monitoring services.
When I get hit by bots from people like this, especially when they lie in their user agent string, I block them first and (sometimes) ask questions later.
| 5:30 pm on Apr 14, 2006 (gmt 0)|
I'm curious. Why block them?
| 8:36 pm on Apr 14, 2006 (gmt 0)|
|cgrantski asked: I'm curious. Why block them? |
What good reason would they have for violating robots.txt and then lying about who and/or what they are?
| 9:58 pm on Apr 14, 2006 (gmt 0)|
I'd be interested in seeing the exact string that included "Commodore 64" within it, because the web browser I wrote for the 64 includes "Commodore 64" within the user-agent string.
| 10:37 pm on Apr 14, 2006 (gmt 0)|
Anyone who abuses "Commodore 64" is unlikely to honor Good Friday and Chocolate Saturday.
PS: The creep is back again. - Larry
| 10:27 pm on Apr 15, 2006 (gmt 0)|
Here's the exact Commodore user agent given:
"http://www.yournetdetective.com" "Mozilla/4.0 (compatible; X 10.0; Commodore 64)"
There were others for Amiga, Macintosh, all sorts of stuff. -Larry
| 4:59 pm on Apr 16, 2006 (gmt 0)|
|cgrantski asked: I'm curious. Why block them?|
|stapel said: What good reason would they have for violating robots.txt and then lying about who and/or what they are?|
me: I too would like to know why they should be blocked, other than the reason of disliking their tactics or thinking they are suspicious. I'm serious. What tangible harm could they do?
I'm not defending them and I don't think cgrantski is either based on his past posts about ethical subjects. I'm just wondering in general about actual bad effects to the site of this kind of visitor. A little bandwidth usage, yes. The question of "what if everybody did this," okay. General principles, sure. But other than those generalities, is there a tangible threat, any real consequences?
| 5:28 pm on Apr 16, 2006 (gmt 0)|
|McElvoy asked: What tangible harm could they do? |
Since most of us pay for our bandwidth, bandwidth theft isn't a "hypothetical" "generality". Since most of us don't prefer to be hacked or scraped, or to face the dangers and difficulties of hack attempts, most of us view these issues as ones involving "tangible harm".
I'm curious as to why you aren't leery of people/bots/etc who intentionally break the rules. Why would you assume violators to be benign?
| 1:35 pm on Apr 17, 2006 (gmt 0)|
First, I'm not convinced anybody is "lying." They could be using an emulator program to see how a certain kind of page displays in a lot of browsers. Maybe they are about to try a page that contains the same map technology and they want to see on an existing page whether that map technology displays correctly under all circumstances. Or maybe the page has another bit of code that they want to check for compatibility before using it. Emulators are legitimate tools used by developers.
Yes, it's using another site for their own purposes, but who hasn't gotten ideas from another site, looked at source code on another site to help with something we're working on, looked at links to another site as a way of finding advertisers, or checked out a company before doing business with them? Or, for that matter, looked closely at a competitor's site or business?
Regarding bandwidth theft - 40 or 50 pages a day with no images is not going to cost anybody anything.
Regarding scraping, I don't see how this could be scraping. And I can't think how this is hacking or preparation for hacking, i.e. stealing somebody's data or bringing down a site. I guess anything's possible.
It's anybody's choice to block anybody for any reason, but in this case it seems to me like an overreaction. If somebody skillful was planning on hacking the site, blocking the IP won't stop them at all.
| 1:46 pm on Apr 17, 2006 (gmt 0)|
It could be referrer spam. They just hit your site with a weird referrer and you go visit their site. Looks like it worked.
| 6:40 am on Apr 19, 2006 (gmt 0)|
|First, I'm not convinced anybody is "lying." |
OK, it doesn't directly cost me anything, but it doesn't benefit me either, so in the filter they go.
| 7:35 am on Apr 19, 2006 (gmt 0)|
>I'm curious. Why block them?
If you run a PPC site then you have to block all bots that fail to obey robots.txt or you end up allowing bots to generate false clicks to your customers.
Paying for the bandwidth they consume is an issue, them generating false clicks can be a killer!
There are spam bots that can easily generate 20,000+ false clicks per day. They must be blocked.
I hate all bots that fail to obey robots.txt. I block them as fast as I can catch them. But some of my more savvy clients still get click-throughs from their crawling... never a good scene!
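To make the "block robots.txt violators" idea concrete, here is a rough Python sketch that flags IPs which request many pages without ever fetching /robots.txt. It assumes Apache combined-format log lines; the threshold and field layout are assumptions, not anything from this thread:

```python
import re
from collections import defaultdict

# Apache combined log format (a common default; adjust if yours differs).
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d+) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def find_robots_violators(lines, min_hits=20):
    """Return IPs that made many page requests but never fetched robots.txt."""
    hits = defaultdict(int)
    fetched_robots = set()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip malformed lines rather than guessing
        ip, path = m.group("ip"), m.group("path")
        if path == "/robots.txt":
            fetched_robots.add(ip)
        else:
            hits[ip] += 1
    return sorted(ip for ip, n in hits.items()
                  if n >= min_hits and ip not in fetched_robots)
```

Anything this returns is a candidate for the .htaccess filter, not proof of bad intent; legitimate one-page visitors never fetch robots.txt either, which is why the hit threshold matters.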
| 10:05 am on Apr 20, 2006 (gmt 0)|
Good points for a lot of situations. My question was about this particular situation. Nobody mentioned PPC clicks happening because of this entity. 40-50 page views is the situation here, and its bandwidth cost is trivial. And so forth. In this case, it seems to me that the minutes spent fussing with this one entity could be better spent improving the site or its marketing.

My main action in this particular case would be to filter them out of the stats for the main report, and filter them into the stats for the spiders/bots report.

I'm looking for balance; you're looking for the 100% solution and a litmus test. That's fine. I just want to clear up the impression that I was saying that nothing ever needed to be blocked. I've been running and analyzing web sites since Mosaic 1.0 and, believe me, I would not be that naive.
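For what it's worth, the stats-filtering approach described above can be sketched in a few lines of Python. The marker strings are placeholders for whatever user agents you have decided are bots; the code simply partitions combined-format log lines by the quoted User-Agent field:

```python
# Split access-log lines into a "main report" list and a "spiders/bots
# report" list by substring-matching the User-Agent (the last quoted
# field in Apache combined format). Markers are illustrative.
BOT_MARKERS = ("Commodore 64", "bot", "spider", "crawl")

def split_for_reports(lines, markers=BOT_MARKERS):
    main, spiders = [], []
    for line in lines:
        parts = line.rsplit('"', 2)
        ua = parts[-2] if len(parts) == 3 else ""
        bucket = spiders if any(m.lower() in ua.lower() for m in markers) else main
        bucket.append(line)
    return main, spiders
```

Nothing gets blocked this way; the suspect traffic just stops polluting the human-visitor numbers while remaining visible in its own report.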