joined:Nov 2, 2006
First of all, I'll admit that I'm a bit overwhelmed by this particular forum. We have a fair-to-middling robots.txt file, and ban the occasional bot that goes way overboard in violating it, but for the most part we're pretty hands-off. It hasn't really become a huge problem for us in terms of bandwidth or content scraping. We're strictly ecommerce and yeah, we get scraped from time to time, but we haven't seen anything that makes us think it affects our bottom line.
However, we're in an industry where the biggest of big ecomm sites has started showing some significant interest. I know that this particular company has a price matching program that crawls a list of competitor sites and automatically matches the lowest price. I also heard through the grapevine that our site is on that list of sites being price matched.
I'd love nothing more than to ban their bot. However, I've looked through our logs and I don't see anything that looks suspicious. They must be masking their user agent and who knows what else, and I just don't know enough about how to proceed.
Does anyone have any tricks for identifying these camouflaged bots? Based on anecdotal information and gut feelings, I think there may be several competitors scraping our prices.
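To make the question a bit more concrete, here is a minimal sketch of the kind of log check I have in mind. It assumes our access logs are in the Apache/Nginx "combined" format, and it leans on one guess: a real browser pulls down CSS, JS and images along with the pages, while a simple scraper only fetches the pages. The function name, the asset extension list and the 50-page threshold are all made up for illustration, and a bot that rotates IPs or fetches assets would slip right past this.

```python
import re
from collections import defaultdict

# Assumption: "combined" log format. Adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical list of static-asset extensions; tune for the actual site.
ASSET_SUFFIXES = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico")


def suspicious_ips(lines, min_pages=50):
    """Return IPs that requested many pages but never a static asset."""
    pages = defaultdict(int)
    assets = defaultdict(int)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        ip = match.group("ip")
        # Drop the query string before checking the file extension.
        path = match.group("path").split("?", 1)[0].lower()
        if path.endswith(ASSET_SUFFIXES):
            assets[ip] += 1
        else:
            pages[ip] += 1
    flagged = [
        ip for ip, count in pages.items()
        if count >= min_pages and assets.get(ip, 0) == 0
    ]
    # Heaviest page fetchers first.
    return sorted(flagged, key=lambda ip: -pages[ip])
```

Something like `suspicious_ips(open("access.log"))` would give a short list of addresses to look at by hand. It is only a starting point for eyeballing, not proof that any given IP is a price bot.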
*** SIDE NOTE ***
About a year ago, we built a similar program to scrape pricing from some competitors who always seemed a step ahead of us on the pricing of competitive products. I felt that if they were doing it to us it was fair game. I started a discussion here about the ethics of what we were doing, and learned that it maybe wasn't as white hat as I'd thought. FWIW, we've shelved that project based on the feedback I got here, particularly that of one of this forum's mods. We really are trying to be one of the good guys, but it's getting flippin' hard.