On one of my sites, I have a page which contains money-off discount codes for various etailers. Some of the codes are only available on my site due to an exclusive deal with the etailers.
I run adverts for related but non-competing businesses on the same page as the codes, and until recently this was working well. The problem is that someone is distributing the codes when they get published (they are published approximately once a week at a random time, and the codes are only valid for the first x shoppers). People don't have to visit my site to get the codes anymore, so traffic and therefore revenue has dropped to about 40% of what it was. My hosting provider has been very helpful. We've tried blocking and filtering traffic based on IP and request headers, but he says the most likely suspect is a screen scraper, and it's hiding itself well. I suspect it's automated software of some kind too, because the codes start appearing online within minutes of being published (it used to be hours). That can only happen if someone or something is constantly monitoring the page. The software is probably distributed amongst several people, because there's no easily identifiable IP address which is hitting the server more than others.
I experimented with publishing the codes as captchas but it hasn't made much difference. It slowed them down but the codes were still getting posted online within an hour or so. My regular visitors also hate the captchas because it means typing out a lengthy code by hand, and of course they're not accessible which I hate.
So you see my predicament. I need a way to post the codes online which will require a person to visit the web page, without p***ing off my visitors. Am I trying to skate uphill?
Any input appreciated.
If it was an automated scraper there are lots of things you could do to block it (search these forums for blocking bots), but for a single page where all the person has to do is click 'save as' to get your data, I don't see what you can do to stop it.
If it's automated, all you have to do is build a bunch of spider traps around and on the page, but it sounds to me like they're using proxy servers to access your pages, so blocking by IP won't work [all anyone has to do anyway is get a dialup account: new IP every time]. Sorry, too bad.
Or use the image method that is used on godaddy, netsol, etc. Generate the codes and put them in an IMAGE that can't be read by a program.
That's known as a captcha, which he has already tried.
I suspect that it is users and not scrapers which are the problem. After all, what incentive do they have to NOT share, once they have their own code?
If the codes are only valid for x number of shoppers, and x is a small number, could they not each have a unique code?
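A minimal sketch of what per-shopper codes could look like, assuming the etailer can accept a list of single-use codes and that visitors can be told apart somehow (a login, a session, etc.). The names `generate_codes`, `CodePool` and `visitor_id` are hypothetical, not anything from the original poster's setup.

```python
import secrets

# Letters and digits that are hard to confuse when typed by hand (assumption).
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"


def generate_codes(count, length=8):
    """Return `count` distinct random codes of `length` characters."""
    codes = set()
    while len(codes) < count:
        codes.add("".join(secrets.choice(ALPHABET) for _ in range(length)))
    return sorted(codes)


class CodePool:
    """Hands out each code at most once, one code per visitor."""

    def __init__(self, codes):
        self._unissued = list(codes)
        self._issued = {}  # visitor_id -> code already given to that visitor

    def issue(self, visitor_id):
        # A returning visitor sees the same code again, not a fresh one.
        if visitor_id in self._issued:
            return self._issued[visitor_id]
        # The first x shoppers have been served; nothing left to give out.
        if not self._unissued:
            return None
        code = self._unissued.pop()
        self._issued[visitor_id] = code
        return code
```

With this, a code that gets reposted elsewhere is only good for one purchase, and the `_issued` mapping shows which visitor it was originally given to.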
If the codes are valuable and you don't spam the users, people generally don't mind registering. You may experience a little backlash in the beginning from some of the people who are accustomed to getting the information anonymously, but they'll get over it if the information is valuable.
You should be able to track down the culprit pretty quickly.
Then adjust one (or more) of the codes you are giving away in a simple way.
Give each visitor a cookie with a unique value and add that value to the code; this way it is possible to find out which visitor a specific code came from.
Make sure that the businesses that accept the codes know they can ignore the extra numbers.
After you post the codes on your own website, go check the site that scraped the results and you'll know which visitor got the codes.
This way you know where the requests are coming from and you may see a pattern.
If they're not too smart, you should be able to give them fake codes by checking for their cookie.
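The cookie idea above could be sketched roughly like this. It assumes the etailers really do ignore the extra digits at the end of a code, and that the tag is a short number stored in the visitor's cookie. All the function names, the tag format and the decoy code are made up for illustration.

```python
import secrets
from typing import Optional


def new_visitor_tag(digits=4):
    """Random numeric tag to store in a new visitor's cookie."""
    return "".join(secrets.choice("0123456789") for _ in range(digits))


def tag_code(base_code, visitor_tag):
    """Append the cookie value to the real code shown to this visitor."""
    return base_code + visitor_tag


def trace_leak(leaked_code, base_code) -> Optional[str]:
    """Given a code found on another site, recover the visitor tag, if any."""
    if leaked_code.startswith(base_code) and len(leaked_code) > len(base_code):
        return leaked_code[len(base_code):]
    return None


def code_for_visitor(base_code, visitor_tag, flagged_tags, decoy="EXPIRED"):
    """Serve a fake code to cookies already identified as the leak."""
    if visitor_tag in flagged_tags:
        return decoy + visitor_tag
    return tag_code(base_code, visitor_tag)
```

For example, if the real code is `SPRING` and the scraped copy reads `SPRING0421`, `trace_leak("SPRING0421", "SPRING")` gives back `0421`, the cookie value of whoever fetched it. This only works as long as the scraper keeps its cookie; a client that drops cookies gets a fresh tag on every request, which is itself a pattern you can look for in the logs.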