[webmasterworld.com...]
I just sent them an email complaining that their bot does not identify itself as such, throwing off my stats--in two visits it racked up 6,000 pageviews and 60MB of bandwidth. Not a huge amount, but I'd still like to get it out of my stats.
For anyone who is interested, the two IPs I have for them are:
72.20.99.**
72.20.99.**
I haven't blocked them yet, but am considering it.
What are others' feelings about this sort of practice: scraping a site's content to report to manufacturers what websites are saying about them?
[edited by: volatilegx at 3:14 am (utc) on Nov. 26, 2006]
[edit reason] obfuscated ip addresses [/edit]
What are others' feelings about this sort of practice: scraping a site's content to report to manufacturers what websites are saying about them?
Personally, I find it annoying, and I think it is understandable why webmasters would want to ban bots like this.
I can also understand why companies might find this sort of intelligence useful.
Thanks for the welcome.
I find it especially annoying that their bot doesn't identify itself as such. I saw the term UA (user agent) in other threads. Is this how bots usually identify themselves?
Other people have reported seeing the Brandimensions bot using this UA: "BDFetch".
I'm interested to know how you positively identified your bot as coming from Brandimensions. What UA did it provide?
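For anyone new to UAs: a well-behaved bot puts a recognizable token in the User-Agent header of each request, and that string ends up in the server log. Here is a rough Python sketch of checking for such a token. "BDFetch" is the only token reported in this thread; the other entries in `BOT_UA_TOKENS` and the function name are my own assumptions for illustration, not a complete or authoritative list.

```python
# Hypothetical list of lowercase substrings to look for in a User-Agent string.
# Only "bdfetch" comes from this thread; the others are generic guesses.
BOT_UA_TOKENS = ("bdfetch", "bot", "crawler", "spider")


def looks_like_declared_bot(user_agent: str) -> bool:
    """Return True if the User-Agent string contains one of the bot tokens."""
    ua = user_agent.lower()
    return any(token in ua for token in BOT_UA_TOKENS)


if __name__ == "__main__":
    print(looks_like_declared_bot("BDFetch"))
    print(looks_like_declared_bot("Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/2.0"))
```

A bot that sends a browser-style UA, as the one in the original post apparently did, slips straight past a check like this, which is why people fall back on IP ranges and request counts.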
These are the IP ranges that have been reliably reported for BDFetch bot so far:
64.26.128.*
70.25.237.*
204.92.59.*
216.183.91.*
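If you want to test logged addresses against the ranges above, something like this Python sketch would do it. I am assuming each `x.y.z.*` entry means the whole /24 block, and the function name and sample addresses are made up for the example.

```python
import ipaddress

# The four ranges reported above, written as /24 networks.
# Treating "x.y.z.*" as a /24 is an assumption about what the wildcard means.
REPORTED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "64.26.128.0/24",
        "70.25.237.0/24",
        "204.92.59.0/24",
        "216.183.91.0/24",
    )
]


def in_reported_range(ip: str) -> bool:
    """Return True if the address falls inside one of the reported ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in REPORTED_RANGES)


if __name__ == "__main__":
    # Sample addresses are invented for illustration only.
    for sample in ("64.26.128.17", "64.26.129.17"):
        print(sample, in_reported_range(sample))
```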
[edited by: Mokita at 4:50 am (utc) on Nov. 26, 2006]
Anyway, I had to put 2 and 2 together. I just use awstats, which I suppose is probably the worst such program. But on the 24th I noticed a spike in my 404 errors. Looking at the details, awstats reported that www.brandimensions.com had made 1488 requests for www.truedelta.com/www.trudelta.com. Why their software would do this, I have no idea. But it got my attention.
I then looked at the list of top hosts, where bots that identify themselves as bots don't show up. There I saw nearly 3000 pageviews attributed to one of the IPs I listed above for the same date as the huge set of 404 errors. I also then noticed that a very similar IP (just last digit different) was responsible for a similar number of pageviews back on the 11th.
No human racks up 3,000 pageviews in a day, so this was clearly a bot. And one of the IPs last visited on the same day that Brandimensions generated the large number of 404s.
What I don't know is how awstats traced the 404s to Brandimensions, since a lookup of the IPs leads to Bay Area Internet Solutions, a commercial ISP that I suppose serves Brandimensions. I have no idea how awstats did a more complete trace.
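The same "too many requests from one address in one day" check can be run against the raw access log rather than through a stats package. A minimal Python sketch follows, assuming the log is in the common/combined log format; the regular expression, the function names, and the threshold of 1,000 requests are my own choices, not anything awstats uses.

```python
import re
from collections import Counter

# Matches the start of a common/combined log format line and captures
# the client IP, the day (dd/Mon/yyyy), and the HTTP status code.
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[(\d{2}/\w{3}/\d{4}):[^\]]*\] "[^"]*" (\d{3}) '
)


def requests_per_ip_per_day(lines):
    """Count requests for each (ip, day) pair; lines that do not parse are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            ip, day, _status = match.groups()
            counts[(ip, day)] += 1
    return counts


def heavy_hitters(lines, threshold=1000):
    """Return the (ip, day) pairs whose request count reaches the threshold."""
    counts = requests_per_ip_per_day(lines)
    return {key: n for key, n in counts.items() if n >= threshold}
```

To use it, pass an open log file, for example `heavy_hitters(open("access.log"))`, where the file name is whatever your server writes to.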
Is this considered acceptable, to use an IP for such activities that WHOIS traces to an ISP? Or to use so many different IPs, which suggests that they might be trying to get around blocks?
BDFetch is not among the listed bots. Nor have I logged any of the IPs you listed.
I did, however, have an "Unknown robot (identified by crawl)" visit on the 25th. It was responsible for 3,707+18 hits, so it doesn't match up with the hits for the two IPs I originally reported. Most likely not the same agent.
One thing I do not get is why the pageviews and hits for bots are nearly identical for both the "unknown robot" and for the IPs I listed. Hits on my site are about 3x pageviews for normal visitors. I assume that bots just request text somehow, even when they don't identify themselves as bots.
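A likely explanation is that a browser fetches each page plus its images, stylesheets, and scripts, while a scraper usually fetches only the HTML, so its hits and pageviews come out nearly equal. Here is a rough Python sketch of that comparison. The extension list and the rule "anything that is not a static asset counts as a pageview" are my assumptions and only approximate how a log analyzer separates pages from hits.

```python
from collections import defaultdict

# Assumed list of static-asset extensions that count as hits but not pageviews.
ASSET_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


def hits_and_pageviews(requests):
    """Take (ip, path) pairs and return {ip: (hits, pageviews)}."""
    totals = defaultdict(lambda: [0, 0])
    for ip, path in requests:
        totals[ip][0] += 1  # every request is a hit
        path_only = path.lower().split("?", 1)[0]  # ignore any query string
        if not path_only.endswith(ASSET_EXTENSIONS):
            totals[ip][1] += 1  # non-asset requests count as pageviews
    return {ip: (hits, pages) for ip, (hits, pages) in totals.items()}
```

An address whose hits-to-pageviews ratio sits close to 1 while normal visitors run around 3 is a reasonable candidate for a closer look, though some text-only browsers and feed readers will show the same pattern.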
If my logic is faulty, then I apologize. To date I haven't done any tracing/blocking of bots, but I fear I'm going to have to start paying more attention to them. This forum seems a good place to do some learning.