
Forum Moderators: Robert Charlton & goodroi


Hacked websites displaying my site

     
4:21 pm on Mar 2, 2017 (gmt 0)

New User

5+ Year Member

joined:Oct 11, 2013
posts: 39
votes: 5


Hello fellow webmasters,

I'm going straight to the case:

In the GSC "Links to Your Site" page, some weird domains have begun to appear. They are totally unrelated to my site, totally random, and I have nothing to do with them. At the moment at least 100 different domain names are pointing to my website.

Now the interesting part: clearly all these websites have been hacked at some point, because all of them are hosting a (fully or partly) copied version of my website. It looks something like this:

    random-hacked-domain-1.com/mbt/mbt-144723.html
    random-hacked-domain-2.com/mbt/mbt-158745.html
    random-hacked-domain-3.com/mbt/mbt-187451.html
    ...


At this point you can imagine that each of these domains has hundreds and hundreds of links pointing to my website. I have a canonical meta tag on each of my website's pages, but of course it was removed before the copy was uploaded to these hacked websites. I'm checking these hacked websites one by one now and noticing that some of them have already deleted all hacked pages (including the copy of my site).

1) It looks like an attempt to attack my website (both SERP and Trust), doesn't it?
2) Should I be worried that Google won't be able to dismiss these links?
3) What could I do on my part to reduce the (possible) damage?

Thank you
11:07 pm on Mar 2, 2017 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2000
posts:12362
votes: 403


Often, many such links pointing to your site can indicate that you're being set up to rank for something that you don't want to rank for, or to get knocked out of the results and have the copied content rank in your place. Use Fetch as Googlebot to view your site to make sure that you haven't been hacked as well.

As you describe it, hijacking seems more likely.

The hacks would be visible to Google and to visitors who come to your site via Google, but not to you if you look at your site directly or if you check it via another search engine.

See this thread as a good overview...

Understanding hacked sites that rank in Google
April, 2013
https://www.webmasterworld.com/google/4561487.htm [webmasterworld.com]

Also see...

Google launches new tool to identify site security issues
Oct 31, 2013
https://www.webmasterworld.com/google/4620501.htm [webmasterworld.com]

...and use site search at WebmasterWorld for hacked sites, and for hijacked sites.

----

For hijacked sites, also see...

My site's being de-indexed and replaced by others
Feb, 2016
https://www.webmasterworld.com/google/4790240.htm [webmasterworld.com]

and, possibly most descriptive of what's happening to you, but the variations are many...

Google Result Hijacking
April, 2016
https://www.webmasterworld.com/google/4800812.htm [webmasterworld.com]


(Edited to change date on one of the threads posted.)

[edited by: Robert_Charlton at 7:38 am (utc) on Mar 3, 2017]

12:56 am on Mar 3, 2017 (gmt 0)

Junior Member

joined:Sept 25, 2016
posts:62
votes: 19


3) What could I do on my part to reduce the (possible) damage?

Try to work out where they are coming from and get them taken down. Find the ISP, file the complaints, etc etc.

I suffered a similar cloning issue a year ago, although it was a bit simpler since it was one domain, and I dealt with an ISP that was responsive and morally sound.

While there were a lot of people who tried to convince me that Google could see through such shenanigans, I watched in horror as our traffic took a dive and stayed that way for months last year. Our traffic started recovering when we got the site shut down and pages slowly.... so very slowly... started dropping from Google. Was it a coincidence? Were we hit by something unrelated and then recovered? Maybe, but it sure as heck didn't look like that to me.

Throughout the ordeal Google ignored all attempts for us to have all of the pages dropped. Their DMCA process wanted one page submitted at a time (we had over 400,000 cloned). Their abuse, spam, etc email addresses completely ignored us, even though it was completely obvious to anyone what was going on. Their clone-detection is a total fail. It has been 8 months since the cloned website was taken offline completely, and Google *STILL* has 1000 of our pages indexed under the clone domain. These are pages that (a) are cloned, (b) have inserted malware, (c) have been inaccessible (404) for 8 months. And they are still indexed!

So, please do what you can to try to stop it yourself. People here generally say that you can't get hurt through such tactics, but I don't believe it. This is one area Google absolutely needs to do better at. They need to have a dedicated, responsive team to investigate this sort of garbage and correct it immediately.
4:57 am on Mar 3, 2017 (gmt 0)

New User

5+ Year Member

joined:Oct 11, 2013
posts: 39
votes: 5


Oh, this looks nasty.

If I try to search "exact page title in quotes", I get tens of indexed websites with my content. What's even more interesting, I've found that some of my main competitors are targeted the exact same way too (competition in my niche is huge, my site ranks high for some very important keywords).

I use "Fetch as Google" almost every day, my site is OK and not hacked.

Found and disavowed 104 different domain names already. Switched the canonical meta tag to an HTTP header - I have high hopes for this one (as per Google [support.google.com...]). Now I need to identify the scraper/hacker.
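
For readers who have not used the header form: a canonical sent over HTTP is a Link header with rel="canonical". Below is a minimal Python sketch of building that header value. The function name and the my-site.example domain are placeholders invented for illustration, not the poster's actual setup, and how the header gets attached to responses depends on your server.

```python
from typing import Tuple
from urllib.parse import urlsplit, urlunsplit


def canonical_link_header(canonical_url: str) -> Tuple[str, str]:
    """Return a (name, value) pair for an HTTP canonical Link header."""
    parts = urlsplit(canonical_url)
    # A canonical must be an absolute http(s) URL to be unambiguous.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"canonical URL must be absolute: {canonical_url!r}")
    # Drop any fragment: it is never sent to a server.
    clean = urlunsplit(
        (parts.scheme, parts.netloc, parts.path or "/", parts.query, "")
    )
    return "Link", f'<{clean}>; rel="canonical"'


if __name__ == "__main__":
    # my-site.example is a placeholder domain.
    name, value = canonical_link_header("https://my-site.example/widgets/blue.html")
    print(f"{name}: {value}")
```

Note that this only marks the original pages. A copy served from another host sends whatever headers that host chooses.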

Thanks Robert and westcoast
6:50 am on Mar 3, 2017 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4454
votes: 330


To identify the scrapers/site owner(s) do a whois lookup. You can get the registered owners' name(s) if they are published and host(s) which is where the removal requests and DMCAs should be filed. The DMCA is a USA law so if you or the copied domains are outside the US you may need to use a form that is recognized where they are hosted. The EU has a different type of copyright law. Not all hosts and countries support your rights, it depends on where those sites are hosted.

Please realize that this is not legal advice which is not available from forums. If you need legal advice you should contact a lawyer.
7:11 am on Mar 3, 2017 (gmt 0)

New User

5+ Year Member

joined:Oct 11, 2013
posts: 39
votes: 5


Thanks not2easy.

As I was going through the logs (found and blocked some IPs and user agents) I realized that this is when the bad boys win - they take my time away from actually working on/improving my website and spend it on looking for hacked websites and IPs, investigating, filing DMCA claims etc. Sure, I'm still going to do all the important tasks (like monitoring logs, blocking overly aggressive IPs etc), but I won't fall into their game. My biggest worry from today will be sending as many strong signals about my website/content and server (!) to Google and others as possible (I will still keep the 104 blocked domain long disavow file just to see where it leads, but I won't add any new domains. It's clearly a waste of time).
2:33 pm on Mar 3, 2017 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4454
votes: 330


I will still keep the 104 blocked domain long disavow file just to see where it leads
- don't give up, but please don't block by domain.

There is an entire forum here dedicated to helping people learn what works, what doesn't and how to do it best: [webmasterworld.com...] you can see a list of the discussions there and see if some of those topics give you more efficient ways to keep track of who is doing what in your logs. It can take months of research, but all webmasters can do more to protect their hard work. Especially helpful topics are found in the Charter area and Library of each forum, they are found in the drop-down menu under the "Forum Options" button on every thread.
8:01 pm on Mar 3, 2017 (gmt 0)

New User

5+ Year Member

joined:Oct 11, 2013
posts: 39
votes: 5


OK, I have a follow-up post. All these sites displaying my content are hacked WordPress sites (again, my site is not hacked, nor is it WordPress based). This is clearly not an attack against my site; it's rather a poor attempt to promote some particular widgets brand/widgets store. It works something like this:

1) Attacker collects links to the sites/pages mentioning his/similar widgets;
2) Attacker has access to hundreds of poorly maintained WordPress sites;
3) Attacker creates pages on hacked WordPress sites, includes his widgets name in the link and on the actual page he displays contents of '1)'.

Displayed pages are live (so no copying/scraping); that is, if I change anything on my site and refresh the hacked one, I see all changes immediately. But it's not an iframe, it's done in PHP code. The source code on the hacked website looks 100% like the original source code. In my server access log I see it as a direct access (without a referrer) from an IP that is not the same as the IP of that hacked site (this bugs me quite a bit: the host and network prefixes do not match! For example, the IP in the access log is aaa.bbb.179.148, while the hacked site's domain is hosted on aaa.bbb.139.147).

Maybe someone has a clue how to prevent those sites from displaying my content? I doubt it's possible, but that may well be the limit of my knowledge.

Thanks again.
2:18 am on Mar 4, 2017 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2000
posts:12362
votes: 403


sangi, forgive this very rushed note, as I was just walking out the door, but this might help....

Displayed pages are live (so no copying/scraping); that is, if I change anything on my site and refresh the hacked one, I see all changes immediately.

Great bit of information to have. If the site is live, then it's most likely a proxy server hijack, where the proxy is spoofing Googlebot. What you think is going to Googlebot is actually going to the hijacker.

See...

Proxy Server URLs Can Hijack Your Google Ranking - how to defend?
June, 2007
https://www.webmasterworld.com/google/3378200.htm [webmasterworld.com]

Also, you might want to read what various of us say about proxy hijacking and that thread in this discussion....

My site's being de-indexed and replaced by others
Feb, 2016
https://www.webmasterworld.com/google/4790240.htm [webmasterworld.com]

If not on those two threads, Google's got documentation for verifying Googlebot. There are various combinations of techniques as well, to confuse you about what's really happening. Sorry to rush, but that should get you going.
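
As a rough illustration of the Googlebot verification mentioned above, the usual approach is a reverse DNS lookup on the requesting IP, a check that the hostname ends in googlebot.com or google.com, then a forward lookup to confirm the hostname resolves back to the same IP. The Python sketch below assumes exactly that procedure. The function names are invented for illustration, and the resolvers are injectable so the logic can be tried without network access.

```python
import socket
from typing import Callable, List

# Hostname suffixes treated as Google's; check Google's current docs before relying on this.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse(ip: str) -> str:
    """Reverse (PTR) lookup: IP -> hostname."""
    return socket.gethostbyaddr(ip)[0]


def _forward(host: str) -> List[str]:
    """Forward lookup: hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def is_real_googlebot(
    ip: str,
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], List[str]] = _forward,
) -> bool:
    """Return True only if reverse and forward DNS both agree for this IP."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # no PTR record, lookup failure, etc.
        return False
    # endswith with a leading dot rejects names like "googlebot.com.evil.example".
    if not host.endswith(GOOGLE_SUFFIXES):
        return False
    try:
        return ip in forward(host)
    except OSError:
        return False
```

A request whose user agent says Googlebot but which fails this check is not coming from Google, whatever it claims. DNS lookups are slow, so in practice results would need caching.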

2:22 am on Mar 4, 2017 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2000
posts:12362
votes: 403


PS: Could conceivably also be a DNS hijack, but I don't even have time to think that one through.
6:06 am on Mar 4, 2017 (gmt 0)

New User

5+ Year Member

joined:Oct 11, 2013
posts: 39
votes: 5


Dear Robert,

I'm pretty confident it's not a DNS hijack, because if I take the link from my OP random-hacked-domain-1.com/mbt/mbt-144723.html and delete the bolded part (the path after the domain), I get the actual site of that domain.

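# Count lines that mention Googlebot but do not contain the 66.249. prefix (-F matches it literally)
grep Googlebot my-site.com_access_log | grep -vF '66.249.' | wc -l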


Returns fewer than 200 lines, most of which are unsophisticated POST attempts to 'hack' WordPress code (which I don't have). At the same time this log file contains over 67 000 lines of requests from the real Googlebot (IP starts with 66.249.).
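
The same count can be sketched in Python, which makes it easier to split "claims to be Googlebot" lines by source IP. This is only an illustration: it assumes the client IP is the first whitespace-separated field of each log line (as in the common/combined log formats), and the 66.249. prefix is the heuristic used above, not proof that a request really came from Google.

```python
from typing import Iterable, Tuple


def count_googlebot_claims(
    lines: Iterable[str], prefix: str = "66.249."
) -> Tuple[int, int]:
    """Return (from_prefix, from_elsewhere) for log lines mentioning Googlebot."""
    from_prefix = 0
    from_elsewhere = 0
    for line in lines:
        if "Googlebot" not in line:
            continue
        # Assumption: the client IP is the first field of the log line.
        fields = line.split(None, 1)
        ip = fields[0] if fields else ""
        if ip.startswith(prefix):
            from_prefix += 1
        else:
            from_elsewhere += 1
    return from_prefix, from_elsewhere


if __name__ == "__main__":
    # Same file name as in the shell command above.
    with open("my-site.com_access_log", encoding="utf-8", errors="replace") as log:
        print(count_googlebot_claims(log))
```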

If I try to Google "site:random-hacked-domain-1.com/mbt/mbt-144723.html", I see it's being indexed with the exact same title and description as the original one. Only the domain name is different (duh).

At this point I can find hundreds of such cases and not only for my site. Looks like it's kind of a big problem for many of us (including confused Google).
7:46 am on Mar 4, 2017 (gmt 0)

New User

5+ Year Member

joined:Oct 11, 2013
posts: 39
votes: 5


Sorry for double posting, for some reason I'm not allowed to edit my previous post anymore.

I've found out more:

1) With each refresh of a hacked page, I see a different user agent in my access log, but the IP is the same.
2) The hacked pages are using cloaking; that is, if I curl them as Googlebot, I see my content. If I curl with any other UA, I get redirected to some widgets store.

At this point it looks like it's impossible to defend yourself against this kind of attack. Sending the canonical URL via an HTTP header and blocking the hacked sites' IPs is the best I can think of right now.
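
The curl check described above can be approximated in Python for anyone who wants to repeat it across many URLs. This is a sketch under assumptions: the helper names are invented, the browser UA string is just an example, and a differing body can also come from ordinary dynamic content, so a mismatch is a hint rather than proof. urllib follows HTTP redirects but not JavaScript or meta-refresh ones.

```python
import hashlib
import urllib.request
from typing import Tuple

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# Example desktop browser UA; any ordinary browser string would do.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/56.0 Safari/537.36"
)


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a GET request that presents the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url: str, user_agent: str, timeout: float = 10.0) -> Tuple[str, str]:
    """Return (final URL after HTTP redirects, SHA-256 of the body)."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read()
        return response.geturl(), hashlib.sha256(body).hexdigest()


def looks_cloaked(as_bot: Tuple[str, str], as_browser: Tuple[str, str]) -> bool:
    """True if the two fetches ended at different URLs or returned different bodies."""
    return as_bot != as_browser
```

Usage would be along the lines of looks_cloaked(fetch(url, GOOGLEBOT_UA), fetch(url, BROWSER_UA)) for each suspect URL.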
8:12 am on Mar 4, 2017 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2000
posts:12362
votes: 403


sangi - Again, since you're seeing the changes live, as well as also seeing changes on your pages, it sounds like some sort of proxy hijacking that's then run through a script and making changes to your content.

I'm stressed for time, and I'm not going to be able to add much anyway, but some basic questions that should be asked...
- was your site ranking well?
- are the mirrored pages replacing you in the serps? In the same position, or different?

I'm assuming, btw, that you've checked your own server carefully, but it's worth mentioning.



8:28 am on Mar 4, 2017 (gmt 0)

New User

5+ Year Member

joined:Oct 11, 2013
posts: 39
votes: 5


Robert, my poor English misled you, sorry. No one is able to change content on my website except me. What I meant was that if I change something on one particular page of my site, I see those changes on the hacked sites right after a refresh (I did that to check whether my pages are scraped or it's a proxy hijack, which at this point I'm 100% confident it is).
9:42 am on Mar 4, 2017 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2000
posts:12362
votes: 403


if I change something on one particular page of my site, I see those changes on the hacked sites right after a refresh (I did that to check whether my pages are scraped or it's a proxy hijack, which at this point I'm 100% confident it is).
Good! I think we're saying the same thing. If the changes appear in more or less real time, we are both thinking it's a proxy hijack.

But there are two (or three) sets of things going on with this hijack all at once.

- one is that the proxy is preventing Google from seeing your content on your domain, so I assume you are dropping out of the serps...
- another is that the hijackers, once they have your content, can play with it and cloak it, etc. They've got hijacked sites pointing at copies of your content and outranking your weakened site.
- - additionally, I think you're saying that they're using copies of your content with your canonical tags removed and titles changed, etc... which also creates canonical confusion for Google. Possibly, they've got a script they run their hijacked content through. The hackers have set up multiple ways to try to outrank you. It may take a while to clear up.

You should be able to get rid of the proxy, as discussed in this thread from 2006...

How to Verify Googlebot and Avoid Rogue Spiders
Sept 2006
https://www.webmasterworld.com/google/3092423.htm [webmasterworld.com]

Also, definitely reread the proxy thread I linked to earlier, from 2007.

Even with the proxy removed, though, the hackers have your content and the hacked sites may hang on for a while. The fourth thread (April 2016) I cite in my first post above goes into that at length. My issue with what Andy says, btw, is simply with the word "error". A churn and burn situation is way outside the normal range of algorithmic consideration, and I don't expect the algo to get the combination of things being thrown at your site right the first time, so I wouldn't call a problem in that range an "error". Otherwise, I feel, he nails it.

Hope this helps. Again, I need to disappear for a while. Good luck.

9:58 am on Mar 4, 2017 (gmt 0)

New User

5+ Year Member

joined:Oct 11, 2013
posts: 39
votes: 5


It's good to have you, Robert. I'm reading through all the threads you've cited. Also I came up with this idea (perhaps it's already mentioned somewhere in threads you've cited):

I will check each and every IP in my access log, and if it belongs to some site/hosting company/etc I will block it right away. With fail2ban it shouldn't be too hard, but I need to think it through and test it first; I do not want to get myself into more trouble.

To answer your previous questions:

1) yes, some of my rankings are dropping;
2) if I search for "exact_unique_page_title" in Google, I see that some hacked sites rank higher than mine.
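
The blocking idea above (check each logged IP against known hosting ranges) could be prototyped before wiring anything into fail2ban. The Python sketch below is only an illustration: the function name is invented, the CIDR ranges in the usage line are RFC 5737 documentation blocks standing in for real hosting ranges, and the allow list is there so that beneficial crawlers are never blocked by accident.

```python
import ipaddress
from typing import Iterable, List, Sequence


def ips_to_block(
    ips: Iterable[str],
    blocked_cidrs: Sequence[str],
    allowed_cidrs: Sequence[str] = (),
) -> List[str]:
    """Return the IPs that fall inside a blocked range and no allowed range."""
    blocked = [ipaddress.ip_network(cidr) for cidr in blocked_cidrs]
    allowed = [ipaddress.ip_network(cidr) for cidr in allowed_cidrs]
    result = []
    for raw in ips:
        try:
            ip = ipaddress.ip_address(raw)
        except ValueError:
            continue  # skip malformed log fields
        # The allow list wins over the block list.
        if any(ip in net for net in allowed):
            continue
        if any(ip in net for net in blocked):
            result.append(raw)
    return result


if __name__ == "__main__":
    # Placeholder ranges, not real hosting companies.
    print(ips_to_block(["192.0.2.10", "203.0.113.5"], ["192.0.2.0/24"]))
```

Reviewing the resulting list by hand for a while before blocking automatically is one way to avoid getting into more trouble.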
10:25 am on Mar 4, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


Blocking the server farm IPs in your logs is a full time effort for anyone running a site.

But what you describe, especially the instant mirroring, is being done through transmission. A proxy hijack is a strong possibility; another is a "man-in-the-middle" hijack. Using HTTPS will stop that.
10:38 am on Mar 4, 2017 (gmt 0)

New User

5+ Year Member

joined:Oct 11, 2013
posts: 39
votes: 5


keyplyr, thanks. I'm planning to fully automate blocking (fail2ban is a perfect tool for many things including this). My site is on HTTPS, so it's safe to rule out the "man-in-the-middle".
10:45 am on Mar 4, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


There's a lot of info on IP ranges and User Agents here: [webmasterworld.com...]

Hope your IP blocking tool allows you to make exceptions for the beneficial agents.
12:18 pm on Mar 4, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


Another way the perps can be getting your content is by using one of the many RSS type feed readers (including Google's feed-fetcher) then displaying the content on their page using iframes or similar techniques. This would account for the instant updates.
5:45 pm on Mar 4, 2017 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4454
votes: 330


You may be able to determine how they are showing the site by clearing your browser cache and visiting with a header tool such as the FF "Live HTTP Headers" enabled. That will show you just what files are being requested and served and from where.

You could also use reverse scraper methods such as saving their "Web Page, Complete" to a folder where you can view their coding and files. That method may have barriers in place but usually not.
4:56 am on Mar 5, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator martinibuster is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 13, 2002
posts:14922
votes: 491


Found and disavowed 104 different domain names already.


In my opinion, disavows are a last resort solution, especially if there are easier and better solutions. Your problem is that these pages exist. So in my opinion it makes sense if you take actions to make those pages disappear.

1. Contact the site owners (by phone AND email) and tell them their sites have been hacked. They probably don't know. Best case scenario is that they are grateful you contacted them and they take down the offending pages. End of story. Submitting a DMCA does not solve the problem as directly as the publisher taking down the offending web pages, so in my opinion you may wish to consider putting your energy into contacting the site owners to get the offending content removed.

2. If that fails, and if their site is hosted in the USA or their domain name is registered in the USA, you may have an option to file a DMCA complaint. Consult with an attorney to understand your rights and if the DMCA provides a way for you to address this issue. This is a legal matter so you should consult with an attorney to find out what your rights are.

Good luck,
;)

Roger Montti
6:13 am on Mar 5, 2017 (gmt 0)

New User

5+ Year Member

joined:Oct 11, 2013
posts: 39
votes: 5


keyplyr, I tried to change something in my HTML output, not in the content itself, so it's pretty much clear the attackers are using PHP code (php Curl would be my guess). For the sake of test I tried to change the URL of my original-but-copied page - hacked websites displayed my (!) 404 error page immediately.

not2easy, yes, it's a must to find out the invisible information too. I use curl for that, because with curl you can change your UA easily, and that was quite handy: I was able to find out that the hacked websites act differently depending on the visitor's UA (they manage to game poor Google this way).

Roger, as a long time lurker here I have a huge respect for you, but this time I completely disagree: there's simply no time for these steps. It's been more than a month now since I first started to notice some strange things happening (losing some important keywords that I had held at #1 for many years, some weird backlinks appearing in my GSC account etc), but because I am a fool I chose to ignore all these signals at first. But I learned my lesson:

1) analyze access logs every day, you have to notice weird things happening first!
2) block every single suspicious IP or even IP ranges (my case)

That's basically it. Yes, you can use third party tools for keyword monitoring; even the GSC is good enough for this. But if you choose to rely on some third party app to learn what's happening on your site/server, then you're doomed to lose in this thing we know as The Internet.
6:21 am on Mar 5, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


For the sake of test I tried to change the URL of my original-but-copied page - hacked websites displayed my (!) 404 error page immediately.
Yup, sounds like they are iframing your page. You can block that with x-header rules or a script that only allows parent (your site) to display your content.
7:08 am on Mar 5, 2017 (gmt 0)

New User

5+ Year Member

joined:Oct 11, 2013
posts: 39
votes: 5


You can block that with x-header rules or a script that only allows parent (your site) to display your content.


Is this even possible? I mean, they're getting output of the page and I don't have any control over this (hacked websites act like a browser). To my knowledge the only way to prevent this is to block IPs.
7:34 am on Mar 5, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts: 12913
votes: 893


sangi - Absolutely.

This page discusses the various methods available to block your page content from being displayed remotely using an iframe: [javascript.info...] Scroll down to "X-Frame-Options"

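I use 2 methods concurrently. A script:
<script type="text/javascript">
// If this page is not the top-level window, it is being framed: break out.
if (window.top !== window.self) {
window.top.location.href = window.self.location.href;
}
</script>
and the header directive in htaccess (requires mod_headers):
Header always set X-Frame-Options "DENY"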


they're getting output of the page and I don't have any control over this
This stops the *browser* from displaying the remote content in an iframe. Your logs may still show the attempts, but none of the major browsers will display your content *if* this is actually what they are doing :)
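
To confirm the header rule actually took effect, the response headers can be inspected. Below is a small Python sketch of that check; the function name is invented, and it looks only at X-Frame-Options. Keep in mind it says nothing about server-side fetching (the PHP cURL scenario suspected earlier in the thread), because a script that downloads the page and re-serves it never asks a browser to frame anything.

```python
from typing import Mapping


def framing_blocked(headers: Mapping[str, str]) -> bool:
    """True if X-Frame-Options tells browsers not to frame the page cross-site."""
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    raw = normalized.get("x-frame-options", "")
    # Duplicate headers may be joined with commas (e.g. "deny, deny").
    values = {part.strip().lower() for part in raw.split(",") if part.strip()}
    return bool(values) and values <= {"deny", "sameorigin"}


if __name__ == "__main__":
    print(framing_blocked({"X-FRAME-OPTIONS": "deny"}))
```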
 
