
Forum Moderators: Robert Charlton & goodroi


site: query shows wrong domain in some results

     
6:05 pm on Nov 12, 2017 (gmt 0)

New User

Top Contributors Of The Month

joined:May 22, 2017
posts: 28
votes: 5


Mod's note: I've needed to remove specifics from this post to publish. I'll try to note what they are...

---

I recently launched a new website (NSFW - voyeur theme), and check this out..
I underlined the intruder.

Mod's note: This refers to a jpg serp printout of the site: command for the poster's site. Seven results are shown, three of which belong to the intruder...

IMPORTANT TO SAY:
- underlined (intruder) site is NOT mine
- (intruder) is NOT hosted on my server
- intruder does not share any content with my site,
- - Mod's note: but urls are set up to look as if they are part of poster's site

ONLY related thing these two sites share is the same platform (script) that powers it. Whole script is self hosted and designs/templates are totally different.

Any idea how such an intruder can pop up in site query?

P.S. If you try to do the search and click on the intruder pages (ie, on results taken over by intruder)... result, you'll end up on my site. If you check cache, you'll end up on intruder site.

<snipped jpg with serps, which showed too many specifics>

[edited by: Robert_Charlton at 1:03 pm (utc) on Nov 13, 2017]

1:00 pm on Nov 13, 2017 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2000
posts:12338
votes: 400


sikosaurus - Very briefly, as it's late, I've removed the diagram with the serps pages, including titles, urls, and specifics about the particular flavors of adult content involved. WebmasterWorld doesn't offer site reviews anywhere in the public areas of our site. (We do have reviews in the paid supporters section, which is blocked from the web).

I think I've conveyed in my edits the basic parts of what you see with regard to the distribution of results... seven results on a site: display of your site, three of which belong to the intruder. I'm guessing that you have been hi-jacked and perhaps hacked. Because hi-jacked sites often carry malware as a payload, I've not tried these searches... and because of all sorts of exposure which can be dangerous to you and to others, we cannot make that information public.

You mention
...click on the intruder pages (ie, on results taken over by intruder)... result, you'll end up on my site. If you check cache, you'll end up on intruder site.
Just guessing, as I'd need more information to really sort it out... this suggests that the cache is cloaked for Googlebot, and that the other content might be framed. 100% frames are often used in spam sites. Occasionally, framed content is redirected by javascript with mouse-over. The idea is to cloak for Googlebot, and perhaps also confuse everyone about the sources of the hi-jack using redirects, etc... while collecting visitors to payloads. Can you describe more specifically the nature of the payloads (without getting into specifics of the niches)?

Have you fetched your content as Googlebot, either with GSC or with a user-agent switcher? That might provide a clearer picture of what's going on, as pages on your server might look perfectly normal to you but changes will be seen by anyone coming in via Google.
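As a rough illustration of the user-agent check described above, here is a minimal Node.js sketch that requests the same URL once with Googlebot's published user-agent string and once with a browser-like one, then reports whether the two responses differ. The function name `compareAsGooglebot`, the shortened browser user-agent, and the injectable `fetchImpl` parameter are assumptions made for this example. It only detects cloaking keyed on the user-agent header; cloaking keyed on Googlebot's IP addresses would not show up this way.

```javascript
// User-agent string published for Googlebot's classic desktop crawler.
const GOOGLEBOT_UA =
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
// Shortened browser-like user agent, used only as a comparison baseline.
const BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)";

// Fetch `url` once per user agent and report whether the bodies differ.
// `fetchImpl` defaults to the global fetch (available in Node 18+).
async function compareAsGooglebot(url, fetchImpl = fetch) {
  const bodies = [];
  for (const ua of [GOOGLEBOT_UA, BROWSER_UA]) {
    const response = await fetchImpl(url, { headers: { "User-Agent": ua } });
    bodies.push(await response.text());
  }
  return {
    differs: bodies[0] !== bodies[1],
    googlebotLength: bodies[0].length,
    browserLength: bodies[1].length,
  };
}
```

A call such as `compareAsGooglebot("https://www.yoursite.com/some-url-here")` returns a promise for the comparison. Dynamic pages can differ between two requests for harmless reasons (timestamps, tokens, rotating ads), so a reported difference is a prompt to inspect both responses by hand, not proof of cloaking.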

Note that in order for a hi-jack scheme to work, either your site needs to be de-indexed in some way or weakened greatly, or some vulnerability needs to be exploited. Often, hacked pages/sites are used as targets, or used to relay link-juice to a page/site that is the target. The other part is cloaking for Googlebot. DNS vulnerabilities and hosting vulnerabilities can also be exploited. Note that it may be possible to be hijacked with no hacks to your site itself.

I can't comment on the themes or scripts you're using as a source of vulnerability... and we don't want to implicate specific themes publicly unless we're certain about them, but these are possibilities.

Please also expand on your thoughts about Google's view of the script you've been using, which may well be it.

For now, here are several threads with different scenarios, as well as a bunch of reference links. I hope others jump in here, as I'm going to be very scarce for a while, and really, this is not an area of specialty for me.

sikosaurus, please provide feedback on what you see when you explore further. Viewing as Googlebot will help you to see some things you wouldn't see otherwise.

Here are a few threads possibly of interest. Even if they don't get to your precise problem, you'll get a sense of how hackers work....

My site's being de-indexed and replaced by others
Feb, 2016
https://www.webmasterworld.com/google/4790240.htm [webmasterworld.com]

Google Result Hijacking
April, 2015
https://www.webmasterworld.com/google/4800812.htm [webmasterworld.com]

Proxy Server URLs Can Hijack Your Google Ranking...
https://www.webmasterworld.com/google/3378200.htm [webmasterworld.com]

4:36 pm on Nov 13, 2017 (gmt 0)

New User

Top Contributors Of The Month

joined:May 22, 2017
posts: 28
votes: 5


First of all, thanks to Robert for spending the time and effort on editing my initial post.

Firstly, the issue itself

After doing a site:mydomain.com search, several results appeared from another site:

Other site page title
Other site page summary
www.mydomain.com/some-url-here <------ only this part had MY stuff in it (url)

When clicking on result, it properly leads to my site. When examining cache, it shows intruder site url and data.

IMPORTANT NOTE:
- my site is 100% on my dedicated server, with nothing shared with the intruder site (script included), no relation whatsoever
- only similarity between sites is the fact they both use the same licensed script, which is secure and doesn't come cheap (it is relatively widely used)

Secondly, regarding HACK diagnosis...

Thanks to Robert once again for his effort, but it is NOT a hack, and I'm 100% sure of that.
Site is freshly published, still being worked heavily on, and I know every byte of it.
Other than that, it would also make no sense to hack it because it's freshly launched and pretty much without traffic :)
(if it's even possible to hack this kind of site, they would target much bigger fish than my new site..)

Thirdly, why I'm especially interested in this "silly issue"...

Normally, I would just let google "fix itself", which it most likely will, probably very soon.

However, I have 2 more flagship sites of my business, based on the very same script.

On May 17th, both of my sites suffered identical google ranking hit and traffic fell down drastically, roughly 80-90% down.
Considering both of my sites are similar-ish (url structure, principle of use, etc), I figured there is some penalty involved.
Important note - content wise, sites are completely different, 100% unique stuff on both, not linked between, different designs etc.

Six months and counting, I'm trying to diagnose the issue, with no luck so far. I've done a lot of tweaks along the way and many experiments. Nothing helps, for either site. Sites are down in the gutter but traffic is very stable, barely 1000 google hits daily on each (no further downfall it seems).

What I've started to believe is that google, somehow and for some reason, punished ALL sites based on the script I'm using.
The issue I described back in my original post is, I believe, simple confusion in the google engine, considering the affected urls are the same on both sites, like: domain.name/category-list/. That wouldn't even be surprising if google internally groups all new and old sites built on the same backend script.

Like I said, script is widely used, probably on some very spammy sites as well, which could have been a signal for google to punish all of the kind.

So.. yeah, sounds like a conspiracy theory, but don't know what else to think at the moment.

Feel free to ask anything if you're interested in helping or just theories about this, I hope the initial part was clear enough :)
8:26 pm on Nov 13, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Sept 14, 2011
posts:1045
votes: 132


Just out of interest are all the sites under the same webmaster search console account?
9:10 pm on Nov 13, 2017 (gmt 0)

New User

Top Contributors Of The Month

joined:May 22, 2017
posts: 28
votes: 5


Hmm, yes, new site with the above mentioned site query issue and 2 of my biggest sites, they are all on my own (same) search console account.

Think that is related to penalties of some sort?
9:28 pm on Nov 13, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Sept 14, 2011
posts:1045
votes: 132


Yep! [seroundtable.com...]
11:29 pm on Nov 13, 2017 (gmt 0)

New User

Top Contributors Of The Month

joined:May 22, 2017
posts: 28
votes: 5


Wow, I just realized I forgot to mention one big piece of info in my second part of the post....

Other than my two bigger sites, I've also checked more than a few OTHER people's sites, built on the same platform. Some bigger than mine, some smaller - all suffered the same fate on pretty much identical dates (traffic spike in March-April and drastic downfall in May).

@seoskunk - it is possible that same account of webmaster tools helps them diagnose "issues" but I refuse to believe my sites are spammy in any way (only thing I can think of that "smells like spam" is the fact that sites update often, every 5 or so hours, every single day, but it's not any scraped content but unique pic + text + video, 100% unique, self hosted)
6:22 pm on Nov 19, 2017 (gmt 0)

New User

Top Contributors Of The Month

joined:May 22, 2017
posts: 28
votes: 5


System: The following message was spliced to this thread by robert_charlton - 2:37 pm on Nov 19, 2017 (PDT -8)

Mod's note: This is a continuation of previous thread, an ongoing part of the diagnosis of the originally observed problem, and I've spliced it on to that discussion to maintain continuity.

Poster's title for the new thread was:
Website cloning and google issues - incredible


---

I know this was already written about, but I need to add a few things.. because what I've seen in last 24 hours is next to incredible...

Long post, but I'm trying to cover it all...

Cloning method:
1. hackers don't need access to your site at all
2. simple script is put in on some-clone-site.com (it is scary how simple the script is - I even found the script and I tested it)
3. yoursite.com gets rewritten to some-clone-site.com in every instance, your url is nowhere to be found on the clone site
4. YOU are actually hosting the clone site. Your site is NOT scraped (copy pasted), just served to another url, with rewrites that hacker puts in
5. the script has, to explain simply, FIND and REPLACE parameters in it (example - to turn bits of your code into their banners or to simply add even more keywords to whatever they want)
6. voila, they now have an exact replica of your site, ready for googlebot to come and index it as if it is something new

Usual targets:
Big sites with huge amount of subpages (thousands).
With such big sites, it's next to impossible to detect you're being cloned - any reference to yoursite.com is being rewritten as clonesite.com, and traffic fluctuations are a normal thing. You'll pretty much realize you've been cloned when it's already too late and effects on your google ranks are already drastic.

What can you do to defend yourself
Firstly, problems with defending:
1. Fixing your site (javascript redirects, htaccess, blocking IPs..) - yes, you can do all that - problem is, hacker can simply tweak his cloning method and your effort is wasted, clone keeps running
2. Takedown notices, dmca, spam report, abuse.. - yes, you can do all that too - by the time you're done with it, your business suffered a huge hit in google and you've spent valuable resources (time and money) to get his site removed.. but a fun fact: hacker can get a new clone running in 5 minutes (literally, once again, I've seen that script and it is scary how simple it is)

Nevertheless, a quick few things you CAN do:
1. code that checks if your site is your site (needs to be done in a specific way, feel free to send me a private msg and I'll explain it - not writing it here to avoid chances that hackers are seeing this too and potentially working on counter solution for it, and no, I'm not charging money for it)
2. your host can block their server IPs or you can do it via htaccess and such - this is short term stuff, because it's pretty much fighting the windmills (hacker can change his ip and server or better yet, he can simply launch a few other clones that are pointed from elsewhere)
3. google dmca/spam reports.. good luck with that if you're running sites with 1000s of subpages like I am - same problem, fighting windmills - you'll spend a week on listing urls, or 100s $ for some agency to do it, and chances are you'll just get a google dmca team's response saying they need "further proof.." (oh and, did I mention googlers are crazy slow to respond? yep, they are)


Regarding why I wrote "incredible" in the thread title..

Magnitude of this kind of hacking / mirroring / cloning is off the charts.
Thousands of (big) sites have their clones in google index and they all definitely suffer to a degree because of it, with two differences:
- some suffer less ranking drop, some virtually disappear from google index - life and business ruining type of thing
- some sites know about it, some have no idea (as funny as it may be, it isn't easy to discover a clone)

What really puzzles me is google's role in all of this. They are either pretending the problem does not exist or they are completely clueless on how to fix it.
Yes, I know algorithms are not simple stuff and so on but... what the hell:

YOURSITE.com - thousands of indexed pages, good ranks, original content, brand established
CLONE-SITE.com - oh, an identical clone site just appeared, lets deindex YOURSITE.com and index that one, despite the fact it's 99% identical (I'm saying 99% because chances are that hacker "glued" 1% of his own keywords / code inside yours).

I strongly believe google needs to do something about this.

Here is my recommendation to google, as FIRST STEP on how to assist webmasters:
Some kind of spam report that is sitewide.
Example being:
INPUT YOUR SITE: yoursite.com
INPUT CLONE SITE YOU'RE REPORTING: clone-site.com
Algorithm should be able to do two simple things:
1. compare sitewide data of both sites, % similarity
2. compare actual indexing dates and age of both sites

If algorithm is in doubt of any kind --> pass along to manual review.

Voila - instant deindexing of clones can commence and internet becomes one hell of a nicer place to be in.
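
To make the two proposed checks concrete, here is a minimal sketch, assuming word-shingle Jaccard similarity as the "% similarity" measure and a plain date comparison for site age. Nothing here reflects how Google actually works: the function names, the 3-word shingle size, and the 0.9 and 0.5 thresholds are all hypothetical, and a real sitewide comparison would run over many pages instead of one block of text per site.

```javascript
// Split text into a set of lowercase word shingles (runs of k consecutive words).
function shingles(text, k) {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  const set = new Set();
  for (let i = 0; i + k <= words.length; i++) {
    set.add(words.slice(i, i + k).join(" "));
  }
  return set;
}

// Jaccard similarity: shared shingles divided by all distinct shingles (0..1).
function jaccard(a, b) {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  a.forEach((s) => { if (b.has(s)) shared++; });
  return shared / (a.size + b.size - shared);
}

// Hypothetical triage of one report: compare content, then compare ages.
// The thresholds are illustrative assumptions, not known values.
function triageReport(report) {
  const similarity = jaccard(
    shingles(report.originalText, 3),
    shingles(report.reportedText, 3)
  );
  const originalIsOlder =
    report.originalFirstIndexed.getTime() < report.reportedFirstIndexed.getTime();
  if (similarity >= 0.9 && originalIsOlder) return "likely clone";
  if (similarity >= 0.5) return "manual review";
  return "no action";
}
```

In this sketch, a near-identical site whose first-indexed date is later than the reporter's is flagged as a likely clone, while a near-identical site that appears to be older falls through to manual review, matching the "if in doubt, pass along" step above.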

--------

I'm not just saying all of this because my sites are affected by this malpractice - I'm saying it because I know how devastating it is and I KNOW it's happening to thousands of websites and webmasters.

P.S. If by any wild chance anyone can get me a contact of some googler or anything such, I can provide all info needed to make this malpractice go away from internet for good. Links, proofs, codes, methods.. I can show it all and explain it all.

P.S. 2 I'm dead tired - I've been reading, researching, testing and figuring stuff out for almost 24 hours with barely any sleep. If I omitted a piece of info - please ask me anything.

[edited by: Robert_Charlton at 11:06 pm (utc) on Nov 19, 2017]
[edit reason] added notes re splicing threads [/edit]

11:44 pm on Nov 19, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Sept 14, 2011
posts:1045
votes: 132


I am adding this for the forum (I have already PM'd you): 2 things that together should cure the problem.

As I said in my email though, site hijacks are normally a symptom of underlying issues with a website.


1. Prevent direct access to your ip (add to header)

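<?php
// Hostnames this site should answer to; anything else is rejected.
$allowed_hosts = array('yoursite.com', 'www.yoursite.com');
$servername = strtolower($_SERVER['SERVER_NAME']);

if (!in_array($servername, $allowed_hosts, true)) {
    // Must run before any output so the status code can still be set.
    http_response_code(403);
    die("Direct ip access not allowed!");
}
?>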

2. Show Google when it renders the page it's a fake site (add to footer)

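<script type="text/javascript">
// Hostnames this site is legitimately served from (with and without www).
var myhosts = ["yoursite.com", "www.yoursite.com"];
var currenthost = window.location.hostname;
// On any other hostname (e.g. a proxy clone), replace the rendered page.
if (myhosts.indexOf(currenthost) === -1) {
document.body.innerHTML = 'Fake Site';
}
</script>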
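
One weakness of a plain hostname check: earlier in the thread the cloning script is said to rewrite yoursite.com to the clone's domain "in every instance", which would presumably also rewrite the hostname literal inside the check itself. The sketch below is one possible variation, not a method anyone in this thread has confirmed: it stores the allowed hostnames as character codes so the literal text never appears in the page source. The helper names are hypothetical, and a determined attacker can still strip or patch the script.

```javascript
// Decode a hostname stored as an array of character codes.
function decodeHost(codes) {
  return String.fromCharCode.apply(null, codes);
}

// Produce the character-code array for a hostname. Intended to be run once
// offline; only the resulting array is pasted into the page.
function encodeHost(host) {
  return host.split("").map(function (ch) { return ch.charCodeAt(0); });
}

// Returns true when `hostname` matches one of the encoded allowed hosts.
function isOwnHost(hostname, encodedHosts) {
  var current = String(hostname).toLowerCase();
  return encodedHosts.some(function (codes) {
    return decodeHost(codes) === current;
  });
}

// Precomputed codes for the placeholder hostnames (bare domain and www).
var allowedHosts = [
  [121, 111, 117, 114, 115, 105, 116, 101, 46, 99, 111, 109],
  [119, 119, 119, 46, 121, 111, 117, 114, 115, 105, 116, 101, 46, 99, 111, 109]
];

// In a browser, run the check against the hostname the page is served from.
if (typeof window !== "undefined" &&
    !isOwnHost(window.location.hostname, allowedHosts)) {
  document.body.innerHTML = "Fake Site";
}
```

Only the precomputed arrays and the check need to ship in the page; `encodeHost` is included to show how the arrays were produced and can be left out of the deployed script.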

Enjoy :)
5:26 am on Nov 20, 2017 (gmt 0)

New User

Top Contributors Of The Month

joined:May 22, 2017
posts: 28
votes: 5


I would like to thank Robert for publishing this, but as I already wrote to him personally, my two posts talk about different issues on different sites. Second post, second issue, is something that is very tricky and needs special resolutions. If possible, Robert, please at least change the title of this discussion to "site cloning and google issues" or such.

Two solutions that seoskunk gave are valid, and I do indeed recommend affected webmasters to apply them - but they are short term solutions.

During my research of this, I've also discovered the method that is widely used for website cloning (one of the methods at least).
I mentioned it in my longer post to a degree and all I can say is, a determined hacker can easily jump over the fixes that seoskunk suggested.
The hack method I found would enable me to skip those fixes with ease, few lines of code and it's out, cloning continues (and I am not even a coder or programmer).
6:49 pm on Nov 20, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Sept 14, 2011
posts:1045
votes: 132


You could also try forward/reverse DNS verification of Googlebot, the traditional proxy hijack breaker, in .htaccess

[webmasters.googleblog.com...]

But I am not sure that will work for chained proxies.
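
For readers who want to see the forward/reverse lookup logic spelled out, here is a minimal sketch in Node.js (not the .htaccess form mentioned above). It follows the verification steps Google documents: reverse-resolve the requesting IP, check that the hostname is under googlebot.com or google.com, then forward-resolve that hostname and confirm it maps back to the same IP. The function name `isRealGooglebot` and the injectable `resolver` parameter are assumptions made for this example.

```javascript
const dns = require("dns");

// Domains Google documents for genuine Googlebot reverse-DNS names.
const GOOGLE_SUFFIXES = [".googlebot.com", ".google.com"];

// Returns true only if the reverse lookup of `ip` yields a Google hostname
// AND the forward lookup of that hostname includes the same IP.
// `resolver` defaults to Node's dns.promises and can be swapped for testing.
async function isRealGooglebot(ip, resolver = dns.promises) {
  let hostnames;
  try {
    hostnames = await resolver.reverse(ip);
  } catch (err) {
    return false; // no PTR record, so the claim cannot be confirmed
  }
  for (const hostname of hostnames) {
    const name = hostname.toLowerCase();
    if (!GOOGLE_SUFFIXES.some((suffix) => name.endsWith(suffix))) continue;
    try {
      const records = await resolver.lookup(name, { all: true });
      if (records.some((record) => record.address === ip)) return true;
    } catch (err) {
      // forward lookup failed; try the next hostname, if any
    }
  }
  return false;
}
```

A request that claims to be Googlebot in its user-agent but fails this check is likely a proxy or scraper and can be refused. The address comparison is a plain string match, so IPv6 addresses written in different notations would need normalizing first. DNS lookups are slow, so results would normally be cached per IP.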
10:33 pm on Nov 22, 2017 (gmt 0)

New User

Top Contributors Of The Month

joined:May 22, 2017
posts: 28
votes: 5


Well.. just stumbled upon the following little article that kind of goes hand in hand with what I wrote above..

“Important: The Lowest rating is appropriate if all or almost all of the MC (main content) on the page is copied with little or no time, effort, expertise, manual curation, or added value for users. Such pages should be rated Lowest, even if the page assigns credit for the content to another source.“ – Google Search Quality Evaluator Guidelines March 2017

TLDR: ‘Duplicate content‘ is NOT mentioned **once** in the recently published Search Quality Raters Guidelines. ‘Copied content’, is. Semantics aside, duplicate content is evidently treated differently by Google than copied content, with the difference being the INTENT and nature of the duplicated text.
Duplicated content is often not manipulative and is commonplace on many websites and often free from malicious intent. Copied content can often be penalised algorithmically or manually. Duplicate content is not penalised, but this is often not an optimal set-up for pages, either. Be VERY careful ‘spinning’ ‘copied’ text to make it unique!


Not like I didn't know it until now, this was more the final official confirmation I needed.

So when you combine this piece of the puzzle with the site clones running loose all over google and getting indexed like never before (coz, google is fast now), you get a VERY DEAD business from affected webmasters.



TLDR: Your original site gets cloned => Copied / Spun content issue => Google Penalty => Dead website with no traffic



I guess black hat webmasters don't need to think much on how to beat the competition. Just clone and devalue to hell and back. Rinse and repeat.
Sad truth :(
11:37 pm on Nov 22, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Sept 14, 2011
posts:1045
votes: 132


Sikosaurus I have given you three solutions. Have you actually implemented them?

1. Protect Direct access to your ip
2. Rewrite the rendering content
3. Reverse and forward dns lookup of googlebot to ban proxies when crawled by googlebot

All three work (for 3, I'm not sure about chained proxies). Please try implementing the code; if you need help with the .htaccess code, just ask.
11:43 pm on Nov 22, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Sept 14, 2011
posts:1045
votes: 132


I will also repeat what I told you upfront: no one gets cloned or copied without underlying problems on their site. It's a symptom of other problems.
6:45 am on Nov 23, 2017 (gmt 0)

New User

Top Contributors Of The Month

joined:May 22, 2017
posts: 28
votes: 5


I have done all the methods to protect myself seoskunk, that is not an issue.
I have also taken down all clones that I know of at this moment.

That doesn't negate the fact that I still have:
1. over 50 000 copies of my pages indexed within google, from multiple domains
2. hard to guess, but it isn't hard to believe that some xy site has restarted the cloning.

Underlying problems - I honestly believe I have none of those. My sites were NEVER hacked. Sites NEVER had a manual action of any kind.