|Site cloned by proxy scraper?|
| 11:54 am on Sep 11, 2012 (gmt 0)|
Rather interesting method. We were up in arms all morning checking server security, trying to work out how they were doing this.
We've noticed a different domain making some requests for our js files and imagery.
On closer inspection we found there's a site out there which is an EXACT duplicate of ours, except that the logo is different, every mention of our business name has been replaced with theirs, and our phone numbers have been swapped for different ones. Apart from that, the content, layout, text and imagery are absolutely the same. Google had indexed the site and it was ranking higher than ours, which is worrying.
What was weird is that when we changed something on our homepage (in the database), we could see the change on the cloned site immediately, which made us think the attacker had access to our database, our DNS server or something.
For some reason I kept thinking he was piping HTTP responses from our server through a proxy, replacing certain keywords in the response text and showing the output on his cloned site.
So I checked our server logs, and lo and behold: if I request a nonexistent file on their domain, like /somethingtest.html, I can see that request in OUR server logs, coming from an IP in Ukraine, which proves my proxy theory.
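The check described above can be automated. What follows is only a rough sketch, assuming the server writes Apache common/combined format access logs: it generates a random path to request on the suspect domain, then scans the log lines for that path and collects the client IPs that asked for it. The names make_probe_path and find_probe_sources are made up for illustration, and the IP in the usage note is a documentation address, not the real one.

```python
import re
import secrets

# Assumes Apache common/combined log format: the client IP is the first
# field and the request line is the first quoted string.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*"'
)


def make_probe_path():
    """Return a random path that no real visitor would ever request."""
    return "/probe-" + secrets.token_hex(8) + ".html"


def find_probe_sources(log_lines, probe_path):
    """Return the set of client IPs that requested exactly probe_path."""
    sources = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match.group("path") == probe_path:
            sources.add(match.group("ip"))
    return sources
```

Usage would be along these lines: call make_probe_path(), request that path on the cloned domain in a browser, then pass the lines of the access log to find_probe_sources(). Any IP it returns (say 203.0.113.7) is a machine relaying requests to the real server.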
So I blocked that IP's access to our server, and his site went down immediately (it now shows a redirect loop error).
Is there anything we can do to prevent this kind of thing from happening? It seems it's quite easy to set up once you think about it, and the worst thing is that you don't even have to compromise the attacked server.
I could add htaccess rules which replace anything sent to that IP with some nice porn imagery or something, or at least a message warning users that the site is a clone. However, since he is just making ordinary HTTP requests, I'm not sure how we can block them long term. Isn't it easy for them to just move the proxy to another IP?
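To illustrate the "warning message" idea, here is the same logic written as Python WSGI middleware rather than htaccess rules. That is a swap of technique, and it assumes the site runs behind a WSGI server. BLOCKED_IPS, WARNING and block_proxies are hypothetical names, the address is a documentation IP, and example.com stands in for the real domain. It has the weakness already mentioned: the list has to be updated by hand whenever the proxy moves.

```python
# Hypothetical blocklist; in practice this would hold the proxy's address
# as found in the access logs.
BLOCKED_IPS = {"203.0.113.7"}

# Plain-text notice served in place of the real page.
WARNING = (
    b"This page is an unauthorized copy. "
    b"The original site is at example.com."
)


def block_proxies(app, blocked_ips=BLOCKED_IPS):
    """Wrap a WSGI app so that blocklisted clients get a warning instead."""

    def middleware(environ, start_response):
        # REMOTE_ADDR is the address of the machine that connected to us,
        # which for a proxied clone is the proxy itself.
        if environ.get("REMOTE_ADDR") in blocked_ips:
            headers = [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(WARNING))),
            ]
            start_response("403 Forbidden", headers)
            return [WARNING]
        return app(environ, start_response)

    return middleware
```

Whether the clone's visitors would actually see the text depends on how the proxy treats non-200 responses, so treat this as a starting point only.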
| 9:26 am on Sep 14, 2012 (gmt 0)|
if you search on webmasterworld for proxy scraper site [google.com] or something similar you may get some usable ideas.
one suggestion i didn't see mentioned is:
you might consider using a link rel canonical with the entire url percent-encoded, making it more difficult to translate to another domain.
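a rough sketch of what that could look like, under some assumptions: the scheme and the "/" separators are left alone so the value still parses as an absolute url, while every character of the host and path is percent-encoded so a plain search-and-replace on the domain name no longer matches. the function names are made up, the query string and port are ignored, and whether search engines honor a canonical written this way is not something this sketch can confirm.

```python
from urllib.parse import urlsplit


def percent_encode_all(text):
    """Percent-encode every byte of text, including letters and dots."""
    return "".join("%{:02X}".format(byte) for byte in text.encode("utf-8"))


def obfuscated_canonical(url):
    """Return url with its host and path segments fully percent-encoded.

    Assumes an absolute http(s) URL; query string and port are dropped.
    """
    parts = urlsplit(url)
    encoded_host = percent_encode_all(parts.hostname)
    # Encode each path segment separately so the "/" separators survive.
    encoded_path = "/".join(
        percent_encode_all(segment) for segment in parts.path.split("/")
    )
    return "{}://{}{}".format(parts.scheme, encoded_host, encoded_path)


def canonical_tag(url):
    """Build the link element to place in the page head."""
    return '<link rel="canonical" href="{}">'.format(obfuscated_canonical(url))
```

a scraper that decodes urls before rewriting them would get around this, so it only raises the bar for naive keyword replacement.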