Forgive the long post in advance.
I doubt the homepage hijack is possible without it already being penalised.
Not necessarily penalized, but we're thinking along similar lines. The big question for me has been why the hijacked page disappeared. I'm very impressed by Andy's sleuthing, and if he says it's not a proxy I trust that. But the diagnosis that "Google has totally screwed up" and that this is a canonical error leaves me not fully satisfied. Following up on his description of what he did see, here's some speculation about what might be happening....
First, just to emphasize... this isn't simply a case of scraped content replacing the original. This is churn and burn spam doing something outside the range of normal algo. Even though the original hasn't been touched, there's a network of hacked and cloaked sites carrying the results and then redirecting searchers to nefarious targets. There might also be a bunch of hacked sites supplying the link juice to augment these rankings, as that's how it's usually done, albeit in this case maybe not all that extra link juice was necessary.
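To make the cloak-and-redirect pattern described above concrete, here's a minimal sketch of how such a hacked page might behave. This is purely illustrative; the function, user-agent tokens, and target URL are all hypothetical, and real hacks vary in how they fingerprint crawlers:

```python
# Hypothetical sketch of the cloaking pattern: the hacked page serves the
# scraped copy to Googlebot but bounces human searchers to the spam target.
# All names and URLs here are illustrative, not from any real hack.

GOOGLEBOT_TOKENS = ("Googlebot", "Googlebot-Image")

def handle_request(user_agent, referer):
    """Return (status, body/location) a cloaked page might serve."""
    if any(token in user_agent for token in GOOGLEBOT_TOKENS):
        # The crawler sees the scraped copy, so the dupe gets indexed
        # and can rank in place of the original.
        return ("200 OK", "scraped copy of original page")
    if "google." in referer:
        # A searcher clicking through from Google is redirected
        # to the nefarious target instead.
        return ("302 Found", "http://spam-target.example/")
    # Anyone else (including the hacked site's owner) sees a normal
    # page, which helps the hack stay unnoticed.
    return ("200 OK", "normal page")
```

The point of the three-way split is that each audience sees something different, which is why these hacks can persist: the owner sees nothing wrong, Google sees indexable content, and only searchers get redirected.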
Here's a thread that's a good overview of Google and hacking...
Understanding hacked sites that rank in Google April, 2013 https://www.webmasterworld.com/google/4561487.htm [webmasterworld.com]
So, I wouldn't say that Google is screwing up when it's got to deal with a bunch of hacked sites. If Google used the normal algo to deal with this general type of spam, which throws a lot of link juice at target pages for very competitive queries, we would see rankings getting distorted in ways that we don't like.
Here, I think that the hacker/spammer is doing something ingeniously different from the normal hacked site approach, possibly accidentally so. In this case, the nature of the queries themselves suggests that on vordmeister's site, it's the rarity of these pages that's being gamed.
A query for [product + model number] taken from a manuals site isn't likely to be very competitive. As vordmeister describes his page, it was ranking #2 without much in the way of backlinks. All this suggests that it's not in a very competitive area. I don't know exactly what he was targeting, but apparently it fits what the scraper is targeting.
In the case of dupe content, the page with the most PageRank usually wins. The scraped directories we often see generally rank for highly focused searches or for exact quotes. Most of the directories like this I've seen have been essentially scavengers, going after what's already been killed off... like penalized sites, Pandalized sites, or sites with dupe content. And yes, the rankings probably are mostly random.
But the particular difference here, which intrigued me a lot, is that it sounds like vordmeister's site had been replaced precisely, as if it were a proxy hijacking or some kind of redirect hijack, which we now know it isn't.
What I now think might be enabling the apparent page by page replacement is roughly as follows...
- the likely queries are very specifically focused on the content of the scraped pages...
- chances are that competing relevant pages are comparatively rare...
- the original pages generally don't have much in the way of inbound links...
- and the resulting scraped sites are highly focused.
I'm theorizing that when Google chooses between the original page and the duplicate pages, these scraped pages in a highly focused site (ie, a dupe of vordmeister's but with a bit more link juice) are probably the only decently linked pages that come close to matching the queries. That's why it may seem that they're replacing the originating site on a page by page basis. There may not be enough competition to offer other good alternative choices, so it can look like a page by page replacement even though, mechanically, it isn't one in the way a canonical replacement would be.
I don't know how competitive vordmeister's target phrases are, so this is all a guess, but in this case, from what's been described, this is how I think it might be working.
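The selection step I'm speculating about can be sketched as a toy model. To be clear, this is not Google's algorithm; the threshold, the scoring tuple, and the tie-break rule are all invented for illustration. The point is only that when the candidate pool is thin, a dupe with slightly more link juice can edge out the original:

```python
# Toy model (NOT Google's actual algorithm): among pages that closely
# match a narrow query, assume the best-linked one wins. The 0.9
# relevance threshold and the scores below are made up for illustration.

def pick_result(candidates):
    """candidates: list of (name, relevance, link_juice). Returns the winner's name."""
    # Keep only pages that closely match the focused query...
    close_matches = [c for c in candidates if c[1] >= 0.9]
    # ...then, among those, pick the one with the most link juice.
    return max(close_matches, key=lambda c: c[2])[0]

serp_candidates = [
    ("original page", 0.95, 1.0),   # few inbound links, per vordmeister
    ("scraped dupe",  0.95, 1.5),   # same content, a bit more link juice
    ("weak match",    0.30, 9.0),   # well linked but barely relevant
]
print(pick_result(serp_candidates))  # prints "scraped dupe"
```

In a competitive niche the candidate list would be long and the dupe would get lost in the crowd; in a rarity niche like this one, the only close matches are the original and its copy, so the copy's small link-juice edge decides everything.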
I think the hacker's overall approach, as Andy describes it, is actually fairly clever, as it systematically identifies targets that will rank easily and might survive for a while. In very competitive areas, like payday loans, Google was motivated enough to eventually come up with a special algo that went after high profile spammy queries and spammy links.
What I'm thinking spammers might be doing here is turning that spam pattern around... going after very non-competitive queries that Google isn't looking for, and winning on the basis of easy relevance without a huge amount of link juice.
Lengthy speculation... but more satisfying to me than a canonical screw-up.
[edited by: Robert_Charlton at 10:00 am (utc) on Apr 23, 2016]