System: The following message was spliced to this thread by robert_charlton - 2:37 pm on Nov 19, 2017 (PDT -8)
Mod's note: This is a continuation of a previous thread and an ongoing part of the diagnosis of the originally observed problem, so I've spliced it onto that discussion to maintain continuity.
Poster's title for the new thread was:
Website cloning and Google issues - incredible
---
I know this has already been written about, but I need to add a few things, because what I've seen in the last 24 hours is almost beyond belief.
Long post, but I'm trying to cover it all...
Cloning method:
1. Hackers don't need access to your site at all.
2. A simple script is installed on some-clone-site.com (it is scary how simple the script is; I even found it and tested it).
3. Every instance of yoursite.com gets rewritten to some-clone-site.com; your URL is nowhere to be found on the clone site.
4. YOU are actually hosting the clone site. Your site is NOT scraped (copy-pasted); it is simply served under another URL, with whatever rewrites the hacker puts in.
5. To explain it simply, the script has FIND and REPLACE parameters (for example, to turn bits of your code into their banners, or simply to add even more keywords wherever they want).
6. Voila, they now have an exact replica of your site, ready for Googlebot to come and index it as if it were something new.
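The FIND and REPLACE step in points 3 and 5 can be illustrated with a minimal sketch, assuming the script treats each page's HTML as plain text. The hostnames, the ad-slot markup, and the `rewrite_page` name are hypothetical placeholders; this only shows why your URL vanishes from the copy and is not the actual script the poster found.

```python
def rewrite_page(html: str, replacements: dict[str, str]) -> str:
    """Apply each hypothetical FIND -> REPLACE pair to the page, in order."""
    for find, replace in replacements.items():
        html = html.replace(find, replace)
    return html


# Hypothetical page as served by the original site.
original = (
    '<a href="https://yoursite.com/widgets/">Widgets</a>'
    '<div class="ad-slot"></div>'
)

# Hypothetical rules: swap the hostname everywhere, then inject a banner.
rules = {
    "yoursite.com": "some-clone-site.com",
    '<div class="ad-slot"></div>': '<div class="ad-slot">THEIR BANNER</div>',
}

cloned = rewrite_page(original, rules)
print(cloned)
```

Because the hostname swap runs over every occurrence on every page, nothing pointing back to yoursite.com survives in the copy.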
Usual targets: big sites with a huge number of subpages (thousands).
With such big sites, it's next to impossible to detect that you're being cloned: every reference to yoursite.com is rewritten as clone-site.com, and traffic fluctuations are normal anyway. You'll usually realize you've been cloned only when it's already too late and the effect on your Google rankings is already drastic.
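Since the clone is your own page with the hostname swapped, one possible check is to undo the swap on a suspected clone page and compare the result with your own page. This is a sketch under stated assumptions and not the method the poster offers by private message below: the function name, the 0.95 threshold, and the sample HTML are hypothetical, and fetching the pages is left out.

```python
from difflib import SequenceMatcher


def looks_like_rewritten_copy(
    my_html: str,
    suspect_html: str,
    my_host: str,
    suspect_host: str,
    threshold: float = 0.95,  # assumed cut-off; tune it on real pages
) -> bool:
    """Return True if suspect_html matches my_html once the host swap is undone."""
    # Map the clone's hostname back to ours so the two pages line up again.
    restored = suspect_html.replace(suspect_host, my_host)
    # ratio() is 1.0 for identical strings and drops as they diverge.
    ratio = SequenceMatcher(None, my_html, restored).ratio()
    return ratio >= threshold


mine = '<a href="https://yoursite.com/widgets/">Widgets</a><p>Original text.</p>'
suspect = '<a href="https://some-clone-site.com/widgets/">Widgets</a><p>Original text.</p>'
print(looks_like_rewritten_copy(mine, suspect, "yoursite.com", "some-clone-site.com"))
```

A plain text search for your domain would miss this copy entirely; reversing the swap is what makes the match visible. The catch, as the poster says, is that you first have to know which domain to look at.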
What can you do to defend yourself?
First, the problems with defending:
1. Fixing your site (JavaScript redirects, .htaccess, blocking IPs...): yes, you can do all that. The problem is that the hacker can simply tweak his cloning method, your effort is wasted, and the clone keeps running.
2. Takedown notices, DMCA, spam reports, abuse reports...: yes, you can do all that too. But by the time you're done, your business has taken a huge hit in Google and you've spent valuable resources (time and money) getting his site removed. And a fun fact: the hacker can get a new clone running in 5 minutes (literally; once again, I've seen that script and it is scary how simple it is).
Nevertheless, here are a few things you CAN do:
1. Code that checks whether your site is actually your site. It needs to be done in a specific way; feel free to send me a private message and I'll explain it. I'm not writing it here to avoid the chance that hackers see this and start working on a counter-solution (and no, I'm not charging money for it).
2. Your host can block their server IPs, or you can do it yourself via .htaccess and similar. This is a short-term fix, because it's pretty much tilting at windmills (the hacker can change his IP and server or, better yet, simply launch a few other clones pointed from elsewhere).
3. Google DMCA/spam reports: good luck with that if you're running sites with thousands of subpages like I am. Same problem, tilting at windmills: you'll spend a week listing URLs, or hundreds of dollars for an agency to do it, and chances are you'll just get a response from Google's DMCA team saying they need "further proof..." (oh, and did I mention that Googlers are crazy slow to respond? Yep, they are).
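Blocking the clone's server IPs, as in point 2 of the second list, comes down to refusing requests whose source address falls in a known range. A minimal sketch of that check in application code, assuming you already know the offending addresses; the ranges below are documentation-only example networks (RFC 5737), not real clone servers, and `is_blocked` is a hypothetical helper name.

```python
import ipaddress

# Hypothetical blocklist using documentation-only ranges, not real clone servers.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]


def is_blocked(remote_addr: str) -> bool:
    """Return True if the client IP falls inside any blocked network."""
    try:
        ip = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Malformed address: this sketch does not block on it.
        return False
    # Membership is simply False when IPv4 and IPv6 versions differ.
    return any(ip in net for net in BLOCKED_NETWORKS)


print(is_blocked("203.0.113.45"))  # inside 203.0.113.0/24
print(is_blocked("192.0.2.10"))    # not on the list
```

If your site sits behind a CDN or reverse proxy, the address your application sees may be the proxy's rather than the clone server's. And as noted above, the list goes stale as soon as the hacker moves to a new server.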
Regarding why I wrote "incredible" in the thread title...
The magnitude of this kind of hacking / mirroring / cloning is off the charts. Thousands of (big) sites have clones in Google's index, and they all certainly suffer to some degree because of it, with two differences:
- some suffer a smaller ranking drop, while some virtually disappear from Google's index (a life- and business-ruining kind of thing)
- some sites know about it, some have no idea (funny as it may sound, it isn't easy to discover a clone)
What really puzzles me is Google's role in all of this. They are either pretending the problem does not exist or they are completely clueless about how to fix it.
Yes, I know algorithms are not simple stuff and so on, but... what the hell:
YOURSITE.com - thousands of indexed pages, good ranks, original content, brand established
CLONE-SITE.com - oh, an identical clone site just appeared, let's deindex YOURSITE.com and index that one instead, despite the fact that it's 99% identical (I say 99% because chances are the hacker "glued" 1% of his own keywords / code into yours).
I strongly believe Google needs to do something about this. Here is my recommendation to Google, as a FIRST STEP in assisting webmasters: some kind of sitewide spam report.
For example:
INPUT YOUR SITE: yoursite.com
INPUT CLONE SITE YOU'RE REPORTING: clone-site.com
The algorithm should be able to do two simple things:
1. compare the sitewide data of both sites (% similarity)
2. compare the actual indexing dates and the age of both sites
If the algorithm is in any doubt, pass the case along to manual review.
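The two checks in this proposal could be combined roughly like this. It is purely a sketch of the poster's suggestion, not anything Google actually offers: sitewide similarity is measured here as Jaccard overlap of 5-word shingles across all pages (an assumed choice), and the `Site` structure, the 0.9 cut-off, and the dates are hypothetical. Anything that is not a clear-cut case goes to manual review, as proposed.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Site:
    host: str
    first_indexed: date  # hypothetical: when the search engine first saw the site
    pages: list[str]     # page texts, assumed already normalized for hostnames


def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """All overlapping k-word sequences in a page."""
    words = text.split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def sitewide_similarity(a: Site, b: Site) -> float:
    """Check 1: Jaccard similarity of all shingles on both sites (0.0 to 1.0)."""
    sa = set().union(*(shingles(p) for p in a.pages))
    sb = set().union(*(shingles(p) for p in b.pages))
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def triage(reporter: Site, reported: Site, clone_cutoff: float = 0.9) -> str:
    """Deindex only clear cases; send everything else to manual review."""
    similar = sitewide_similarity(reporter, reported) >= clone_cutoff
    # Check 2: the reporting site must have been indexed first.
    reporter_is_older = reporter.first_indexed < reported.first_indexed
    if similar and reporter_is_older:
        return "deindex clone"
    return "manual review"


text = "widgets for sale in every size and colour today"
original = Site("yoursite.com", date(2010, 1, 1), [text])
clone = Site("clone-site.com", date(2017, 11, 1), [text])
print(triage(original, clone))  # deindex clone
print(triage(clone, original))  # manual review
```

Requiring both signals before acting, and routing every other case to a human, matches the "if in doubt, manual review" rule and keeps a clone from reporting the original.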
Voila: instant deindexing of clones can commence, and the internet becomes one hell of a nicer place to be.
--------
I'm not saying all of this just because my sites are affected by this malpractice; I'm saying it because I know how devastating it is, and I KNOW it's happening to thousands of websites and webmasters.
P.S. If by any wild chance anyone can put me in contact with a Googler or anyone similar, I can provide all the info needed to make this malpractice go away from the internet for good. Links, proofs, code, methods... I can show it all and explain it all.
P.S. 2: I'm dead tired. I've been reading, researching, testing and figuring stuff out for almost 24 hours with barely any sleep. If I've left out any piece of info, please ask me anything.
[edited by: Robert_Charlton at 11:06 pm (utc) on Nov 19, 2017]
[edit reason] added notes re splicing threads [/edit]