
Forum Moderators: Robert Charlton & goodroi

Another website has cloned my site

     
8:50 pm on Sep 23, 2019 (gmt 0)

New User

Top Contributors Of The Month

joined:Apr 28, 2019
posts: 22
votes: 8


My website has been live since the start of January last year and has grown steadily over the last 6-10 months, ranking in the top three for a few thousand keywords.

My issue is that another website has registered an unrelated domain and has cloned my entire website on there. My website is news, so it’s updated daily, and the other website seems to get all the changes in almost real-time. The entire site and its functionality are cloned.

The website shows up on Google if you search for my articles long enough, although Chrome shows an ‘unsecured website’ warning because the SSL certificate they’re trying to use is mine.

My biggest worry is that Google might penalise me, because according to Semrush I’m now getting 22,000 backlinks from this website that appears to be mine.

Ideally I would just disavow the domain; however, I’ve read conflicting things about using the disavow tool. Is it best to just leave it alone in case disavowing hurts my current rankings? Or should I disavow the links so Google knows I’m not trying to deploy any black hat techniques?

Any advice would be greatly appreciated. I’ve contacted the host, but they seem to be making it uber difficult to do anything about it.
10:18 pm on Sept 23, 2019 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4558
votes: 363


Before doing anything, determine whether they have a copy of your site or are displaying it via hotlinks and iframes. Look at your access logs: they will show whether the other site is displaying your pages remotely, and that is fairly simple to prevent. Look at the source code on their site and you may see it there as well.

Don't disavow things that you didn't cause; it just makes more work that you shouldn't need to bother with.
10:36 pm on Sept 23, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:10563
votes: 1122


@nmbrsk ... welcome to the web where bad actors do the least amount of work (using other people's creativity). No hard and fast answers ... and if you want to deal with this and squash it you will have to invest time and perhaps some money filing DMCA notices (or the equivalent in other countries) to "cease and desist" those infringing your site/copyright.

Consider it a cost of doing business.

Whether hotlinked, copied, etc. this is a common ugly on the web and...

There's only so many ways to combat it.

Make sure your site is locked down against hotlinking---then go from there.
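
On Apache, locking down hotlinking usually comes down to a Referer check in .htaccess. What follows is only a minimal sketch, assuming mod_rewrite is enabled; example.com is a placeholder for your own domain and the list of file extensions is an assumption to adjust.

```apache
RewriteEngine On
# Let through requests with no Referer at all (direct visits, privacy tools)
RewriteCond %{HTTP_REFERER} !^$
# Let through requests referred by your own pages (example.com is a placeholder)
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com(/|$) [NC]
# Anything else asking for these static files gets a 403
RewriteRule \.(gif|jpe?g|png|webp|css|js)$ - [F]
```

The Referer header is easy to omit or fake, so this only stops casual hotlinking. It would not stop a clone that is served by your own server under another hostname.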
12:21 am on Sept 24, 2019 (gmt 0)

New User from CN 

Top Contributors Of The Month

joined:Mar 20, 2019
posts:9
votes: 0


As a fresher, I would first disavow the whole domain, and then do the next thing. Does anyone want to help?
12:48 am on Sept 24, 2019 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member redbar is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Oct 14, 2013
posts:3371
votes: 564


@nmbrsk

Is yours a business or personal site?

Is this your primary living income?

Yes, it does make a difference to my opinion/advice.
1:30 am on Sept 24, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15932
votes: 887


"determine whether they have a copy of your site or if they are displaying it via hotlinks and iframes"
Or, option C, they can simply point their DNS to your physical files. If you make a change and it shows up instantly on the offending site, that's the likeliest explanation. It is the easiest way to do it from their end--but fortunately also the easiest thing to prevent at yours.
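
One way to check for option C, assuming Apache and access to the main server configuration (LogFormat and CustomLog cannot be set in .htaccess), is to log the Host header of every request. The nickname hostcombined and the log path are placeholders, not anything from this thread.

```apache
# The standard "combined" format with the requested Host header added in front
LogFormat "%{Host}i %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" hostcombined
CustomLog "logs/access_log" hostcombined
```

If log lines begin with the offending domain rather than your own, your server is answering requests made under that hostname.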
11:40 am on Sept 24, 2019 (gmt 0)

Preferred Member

10+ Year Member Top Contributors Of The Month

joined:Feb 5, 2004
posts: 618
votes: 105


RSS feeds are another way they can get your content instantly.

Not sure what @not2easy means by "Don't disavow things that you didn't cause". While disavowing in this instance may not be useful (if they are not linking to your site), I think most people's disavow file is filled with links that they had absolutely nothing to do with. My site has thousands of links that are the result of Negative SEO or whatever (mostly Russian links) that point back to my pages. They look like spam links from bad neighborhoods but Google still has huge problems telling the difference. If a page drops in the SERPs I check the backlinks in the Search Console and most likely I will find a whole pile of new spam links pointing back to it.
1:35 pm on Sept 24, 2019 (gmt 0)

New User

Top Contributors Of The Month

joined:Apr 28, 2019
posts: 22
votes: 8


They aren't using iFrames. I've seen their source code, and they're using all my files, scripts & css to load their file. The internal linking structure directs to their domain, however all the links placed by me in the template file etc. point to my website, resulting in me getting 20k+ backlinks from them.

The rel=canonical goes to the copycat website, as does og:content etc. It seems like an obvious way to try and get me penalised?

In answer to the question: it's a personal site, but for all intents and purposes acts as a 'business' site - delivering news updates and content. Up to a few months ago, it was my primary living income.
2:35 pm on Sept 24, 2019 (gmt 0)

Preferred Member

10+ Year Member Top Contributors Of The Month

joined:Feb 5, 2004
posts: 618
votes: 105


@nmbrsk still not 100% clear.

Have they scraped your site, and are they just hosting the scraped HTML content (with the domain in the URLs changed in the code, as you said about rel=canonical, etc.)?

OR

Do they actually have your source files that you are hosting? (html or php?)

If you make a change in an old article does it replicate to the other website?
3:39 pm on Sept 24, 2019 (gmt 0)

New User

Top Contributors Of The Month

joined:Apr 28, 2019
posts: 22
votes: 8


"If you make a change in an old article does it replicate to the other website?"

I've just tested this and yes, it does.
3:55 pm on Sept 24, 2019 (gmt 0)

Preferred Member

10+ Year Member Top Contributors Of The Month

joined:Feb 5, 2004
posts: 618
votes: 105


While I am in no way an expert in this issue (or Wordpress) after looking at the sites in question it looks like it mirrors your site exactly (except for some missing graphics in the menu and google ads not showing (but the code is there for the ads)). The other site even seems to function as yours allowing me to subscribe to a newsletter and perform searches which returns the same results as your site.

So either he has somehow obtained your files and database, or he is running software that forwards every request the cloned site receives to your site and then rewrites the URLs in the source in real time before displaying the content.

You could also do a reverse IP lookup of your website's IP to make sure no other domains (like the clone website) are pointing at your IP. (This is to check lucy24's suggestion.)

One hint as to how the other site may be doing this: I noticed some HTML comments at the bottom of its pages, one of them being:

"Cached page generated by WP-Super-Cache on …"

which does not appear on your site.

I would approach this problem in 2 ways. First figure out who is hosting the website (do a whois lookup) and then approach them with the complaint to find out how to proceed in getting the website removed (most likely a DMCA request).

Second figure out how the other website is returning real time results from your website and block his requests. Can this WP-Super-Cache plugin actually cache other domains? Do you have a security hole in your website setup that allows for this? Maybe someone who knows more about Wordpress can answer this?
7:32 pm on Sept 24, 2019 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11869
votes: 244


They might be using a proxy server to request and modify your content on the fly.
8:04 pm on Sept 24, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15932
votes: 887


"They might be using a proxy server to request and modify your content on the fly."
Wouldn't this be deducible from logs? From the offending site, request several obscure interior pages that aren't closely linked, make a note of the times, check your logs to see if some mysterious IP requested those same pages at those same times.

nmbrsk, did you at some point say that you have a domain-name canonicalization redirect in place? It always helps to eliminate the easiest solutions first.
8:23 pm on Sept 24, 2019 (gmt 0)

New User

Top Contributors Of The Month

joined:Apr 28, 2019
posts: 22
votes: 8


Thanks for your answers. Thanks Lucy and Jester, you put me on the right track, and I did some more research and came across this article: [webmasters.stackexchange.com...]

The first answer says: "The good news is this. These are not clones or copies of your site, they are your site. Each sub-domain points to your IP address specifically." So I used his suggestion and put this in the .htaccess file:

"RewriteCond %{HTTP_HOST} ^copycatsite\.com$ [NC]
RewriteRule .* - [F,L]"

I didn’t have much luck when I had it towards the bottom of the file, but I moved it to the top and it has worked perfectly. The cloned website now displays this on all the pages:

"Forbidden
You don't have permission to access this resource.
Additionally, a 403 Forbidden error was encountered while trying to use an ErrorDocument to handle the request."

Thanks once again. My only question is now this:

Now that the site is 'gone', how does this affect me in the SERPs? As I was saying, I had over 22,000 backlinks from this cloned site (far more than from any other), so surely this would have some impact? I wanted to sort this out so Google wouldn’t penalise me, but perhaps it will, now that the backlink profile of my website has suddenly dropped?
11:29 pm on Sept 24, 2019 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11869
votes: 244


The first answer says: "The good news is this. These are not clones or copies of your site, they are your site. Each sub-domain points to your IP address specifically." So I used his suggestion and put this in the .htaccess file:

"RewriteCond %{HTTP_HOST} ^copycatsite\.com$ [NC]
RewriteRule .* - [F,L]"

this is possible if your server has a virtual host configuration for wildcard hostnames.
this could instead be solved by a hostname canonicalization redirect ruleset.
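
a hostname canonicalization ruleset could look roughly like the sketch below. it assumes mod_rewrite in .htaccess, and www.example.com is a placeholder for whichever hostname you treat as canonical.

```apache
RewriteEngine On
# Match any request whose Host header is not exactly the canonical name
# (www.example.com is a placeholder)
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
# Send it to the same path on the canonical host with a permanent redirect
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]
```

with this in place a request arriving under any other hostname is redirected to the real site rather than served a duplicate. as with the "Forbidden" ruleset, it should sit near the top of the file so it runs before other rewrite rules.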

"Additionally, a 403 Forbidden error was encountered while trying to use an ErrorDocument to handle the request."

if you decide to keep that "Forbidden" ruleset you will need to add an exception for the specified custom 403 document.
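# REQUEST_URI is the URL path with a leading slash; REQUEST_FILENAME would hold a filesystem path here and never match
RewriteCond %{REQUEST_URI} !^/custom-403-document\.html$
RewriteCond %{HTTP_HOST} ^copycatsite\.com$ [NC]
RewriteRule .* - [F,L]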
1:36 am on Sept 25, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15932
votes: 887


"you will need to add an exception for the specified custom 403 document"
I was thinking that in this specific situation--unlike most--the recursive 403 is just what you want: “Nope, you’re not going to see one single solitary thing on my site, not even my 403 page!” :)

But yeah, in general you do need to poke holes for your error documents.

Edit: With the [F] flag, the [L] is redundant. It will do no harm; it just isn't needed.
4:41 am on Sept 25, 2019 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11869
votes: 244


"I was thinking that in this specific situation--unlike most--the recursive 403 is just what you want: “Nope, you’re not going to see one single solitary thing on my site, not even my 403 page!”"

in this case it would be better to not specify the custom error document(s) in a wildcard virtual host configuration.
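
as a rough sketch of that idea, assuming Apache 2.4 and access to the server configuration (not .htaccess): a catch-all virtual host that refuses every unknown hostname and defines no ErrorDocument, followed by the real site. the hostnames and DocumentRoot are placeholders.

```apache
# The first vhost listed for an address:port is the default for unmatched hostnames
<VirtualHost *:80>
    ServerName catchall.invalid
    # Refuse everything; no custom ErrorDocument is configured here
    <Location "/">
        Require all denied
    </Location>
</VirtualHost>

# The real site answers only for its own names (placeholders)
<VirtualHost *:80>
    ServerName www.example.com
    ServerAlias example.com
    DocumentRoot "/var/www/example"
</VirtualHost>
```

a matching pair would be needed on port 443 if the site is served over HTTPS.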
12:27 pm on Sept 25, 2019 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2000
posts:12402
votes: 410


nmbrsk, all those links to your site, combined with the cloned pages, suggest to me that someone might be intending to hijack your site as part of a churn and burn spam network. Your site could serve either as a relay station or as a possible destination for many links, some of which may also come from hijacked sites, and the payload could be anything from malware to pharma to phishing.

I would fetch those new subdomain pages as Googlebot, and see if there's any content in them which suggests spam targets or links to other sites in the network: content that is cloaked so that Googlebot sees it but the webmaster browsing by eye does not.

Possibilities as I see them, and I'm not an expert in this area, but I see a lot of this as a Google mod...

- proxy hijack, perhaps spoofing as Googlebot...
- dns hijacking, though I'm less sure of that
- multiple potential reasons for the hijack.

This thread below was written for another situation, so things are in a different order, but it's one of our most thorough threads on hijacks. Read all the reference threads.

My site's being de-indexed and replaced by others
Feb, 2016
https://www.webmasterworld.com/google/4790240.htm [webmasterworld.com]

The reference(s) to proxy hijacking is old, but a classic WebmasterWorld thread.

Much... too much, actually... to be said about hacked and/or hijacked sites, if that's what's going on here, as there's infinite variation possible in how they hide their tracks. Hijacking is the only motive I can think of, aside from just defacing your site.

What's your site platform/CMS? You should definitely check it out if it's more than just a wildcard DNS hijack. Frankly, I'd get rid of the wildcard DNS, as it's very vulnerable to lots of subdomain hacks.

Please keep us posted on how it goes. We get very little feedback on how these are resolved.