
Forum Moderators: ergophobe


Website has been copied.

What has taken place in terms of CMS/server security?

     
6:38 am on Jun 3, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Feb 25, 2004
posts:994
votes: 47


I recently became aware of a domain that has copied what seems to be my entire site (100% of pages, graphics, everything).

When I view the source code of the pages on the infringing site I see references to Drupal, my CMS.

Does that mean they too are using Drupal and have a copy of my original database, or have they just copied the HTML of my pages and are serving them as static HTML pages, or something in between?

I know how to pursue the infringing domain, I just don't know how much of a security lapse what they have done is.
7:02 am on June 3, 2017 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:3719
votes: 205


They may be serving it from your own host via framing. If you can view the source code and it shows links to your images, your navigation, it is probably framed. If you update your site or change a sentence and it shows on their site - it is very likely to have been framed. If you look through your access logs you may find evidence there too.
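One way to look for that access-log evidence: when a page is framed, the browser's request for your page usually carries the framing site's address (or at least its origin, depending on referrer policy) in the Referer field. The sketch below assumes Apache/Nginx "combined" log format, and the hostnames in the example are placeholders. It tallies Referer hostnames that are not your own.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumes the Apache/Nginx "combined" log format; adjust the pattern if yours differs.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "[^"]*"'
)

def foreign_referers(lines, own_hosts):
    """Count Referer hostnames that are not one of your own hostnames."""
    own = {h.lower() for h in own_hosts}
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m.group("referer") in ("", "-"):
            continue  # unparseable line or no Referer sent
        host = (urlsplit(m.group("referer")).hostname or "").lower()
        if host and host not in own:
            counts[host] += 1
    return counts
```

For example, `foreign_referers(open("access.log"), ["example.com", "www.example.com"]).most_common(10)`. A copycat domain near the top of that list, referring visitors to many different pages, is a strong hint of framing.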

You can add a header to your pages via htaccess that prevents their ability to frame your site:
Header append X-Frame-Options SAMEORIGIN

You could use a line or so of JavaScript instead, but that has to be added page by page.
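After adding the header (note that the `Header` directive needs Apache's mod_headers to be enabled), it can be worth confirming what your server actually sends. The sketch below is a hypothetical helper, not part of any library. It decides from a set of response headers whether foreign framing should be refused. It also checks the newer Content-Security-Policy `frame-ancestors` directive, which current browsers honour ahead of X-Frame-Options when both are present.

```python
def framing_blocked(headers, own_origin):
    """Return True if these response headers should stop other sites from framing the page.

    headers: mapping of header name -> value (names compared case-insensitively).
    own_origin: your own origin, e.g. "https://www.example.com" (placeholder).
    """
    h = {k.lower(): v for k, v in headers.items()}

    # CSP frame-ancestors takes precedence over X-Frame-Options in current browsers.
    for directive in h.get("content-security-policy", "").split(";"):
        parts = directive.split()
        if parts and parts[0].lower() == "frame-ancestors":
            # An empty list, 'none', 'self', or only your own origin refuses foreign framing.
            return all(s in ("'none'", "'self'", own_origin) for s in parts[1:])

    # X-Frame-Options: accept a single DENY or SAMEORIGIN value
    # (repeated identical values such as "SAMEORIGIN, SAMEORIGIN" collapse to one here).
    values = {v.strip().upper() for v in h.get("x-frame-options", "").split(",") if v.strip()}
    return len(values) == 1 and values <= {"DENY", "SAMEORIGIN"}
```

To test a live page you could pass in the headers from the standard library, e.g. `dict(urllib.request.urlopen("https://www.example.com/").headers)`.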
7:58 am on June 3, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:11443
votes: 686


@Broadway. - I agree with not2easy. Framing is the most logical reason for seeing your entire site "copied" remotely (at another location) so there likely wasn't any hack or security failure at your end.

After you install the htaccess code not2easy suggested, remember to clear cache and close/relaunch your browser before going to the offending site to check.
4:49 pm on June 3, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14707
votes: 613


Don't overlook the third possibility: The DNS for their domain name is pointed to your physical space. That means they haven't “copied” anything--they’ve found an even simpler way to swipe the whole thing. On rare occasions this can even happen by accident when someone changes DNS. (It happened to someone I know. He was understandably wigged-out, because his site involves HUGE files; it seemed as if it would be impossible for anyone to scrape the whole thing--and in fact they hadn't. Whew.) It's one more thing to investigate.
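A quick first check for this scenario is whether their hostname resolves to the same address as yours. This is only a hint: on shared hosting many unrelated domains share one IP, so a match matters only if your server also answers that hostname with your site. The sketch below uses IPv4 lookups via the standard library. The `resolve` parameter is a hypothetical hook so the logic can be tried without network access.

```python
import socket

def _resolve_ipv4(name):
    """Return the set of IPv4 addresses a hostname resolves to."""
    # gethostbyname_ex returns (canonical_name, aliases, ipv4_addresses)
    return set(socket.gethostbyname_ex(name)[2])

def shared_addresses(domain_a, domain_b, resolve=None):
    """Return the IP addresses both hostnames resolve to (empty set if none)."""
    resolve = resolve or _resolve_ipv4
    return resolve(domain_a) & resolve(domain_b)
```

For example, `shared_addresses("www.example.com", "copycat.example")`, where both names are placeholders for your domain and theirs.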
6:49 pm on June 3, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:11443
votes: 686


The DNS for their domain name is pointed to your physical space
If the DNS was pointed to Broadway's account, then Broadway couldn't view his content at the other site; the browser would be taken to his own site unless he used his IP address.
That means they haven't “copied” anything
If Broadway's site is being framed, there isn't any copying going on. His site is just being displayed at the remote address.
7:12 am on June 4, 2017 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11276
votes: 133


yet another possibility is them serving your live content on their urls through a proxy server.
i would study the source in a few of those documents to determine whether the content was scraped in whole or in part, or framed.
try looking for clues in the differences between your sites rather than the similarities.
do the request paths of their internal urls look like yours?
does their internal navigation link to your domain or theirs?
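Answering that last question by hand is tedious on a large site. Below is a minimal sketch, using only Python's standard `html.parser`, that counts which hostnames a saved copy of one of their pages points to through links, images, scripts, stylesheets and frames. The example URL is a placeholder. Mostly your own domain (or an `iframe` pointing at it) suggests framing or hotlinking. Mostly their domain suggests a scrape or a proxy that rewrites URLs.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# (tag, attribute) pairs whose values are URLs worth inspecting
URL_ATTRS = {("a", "href"), ("img", "src"), ("script", "src"),
             ("link", "href"), ("iframe", "src"), ("frame", "src")}

class _LinkCollector(HTMLParser):
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.hosts = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value and (tag, name) in URL_ATTRS:
                # Resolve relative URLs against the page's own address first.
                host = urlsplit(urljoin(self.base_url, value)).hostname
                if host:
                    self.hosts[host.lower()] += 1

def link_hosts(html, page_url):
    """Count the hostnames referenced by URLs in an HTML document."""
    parser = _LinkCollector(page_url)
    parser.feed(html)
    parser.close()
    return parser.hosts
```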
10:24 am on June 5, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2000
posts:12072
votes: 331


I think an early distinction you need to make to help in the diagnosis is whether changes you make in your site are reflected "live" in a captured or hijacked site... or whether your content has been scraped and someone is now using a static capture of your site on their dupe site. That should be fairly easy for you to check out.
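One low-effort way to run that check is a canary: put a unique, unguessable marker on one of your pages (an HTML comment works), then look for it in their copy. If their page is framed, look in the framed document rather than in their outer page source. The function names below are hypothetical. The marker comes from the standard `secrets` module.

```python
import secrets

def make_canary(prefix="canary"):
    """Return a unique marker to paste into one of your own pages."""
    return f"{prefix}-{secrets.token_hex(8)}"

def classify_copy(their_html, canary):
    """Report whether the marker has shown up in the copied page."""
    return "live" if canary in their_html else "static or not yet refreshed"
```

If the marker appears on their side within minutes, the copy is live (framed, proxied or DNS-pointed). If it never appears, you are most likely looking at a one-time scrape.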

If the changes are live... ie, changes you make would be immediately reflected in the dupe site... then framing is probably the easiest way for them to accomplish it. It's also the easiest to break with frame busting code.

The DNS hijacking, or proxy hijacking, as lucy, keyplyr and phranque describe, might be a little harder for you to deal with.

Have you noticed any drop in traffic? With proxy hijacking, the proxy server is most likely spoofing Googlebot... in which case it's unlikely you would be experiencing the same problem, eg, on Bing. There are also ways to detect and block this spoofing, to be discussed if that's your problem.
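The check Google itself documents for separating real Googlebot from impostors is a reverse DNS lookup on the requesting IP, a check that the name ends in googlebot.com or google.com, and then a forward lookup of that name that must return the same IP. Below is a minimal sketch of that check with the standard `socket` module (IPv4 only as written). The `reverse`/`forward` parameters are hypothetical hooks for testing without network access.

```python
import socket

GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def _reverse_lookup(ip):
    return socket.gethostbyaddr(ip)[0]            # PTR name for the IP

def _forward_lookup(host):
    return set(socket.gethostbyname_ex(host)[2])  # IPv4 addresses for the name

def is_real_googlebot(ip, reverse=None, forward=None):
    """True only if ip reverse-resolves to a Google crawler name that resolves back to ip."""
    reverse = reverse or _reverse_lookup
    forward = forward or _forward_lookup
    try:
        host = reverse(ip).lower().rstrip(".")
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False
    if not host.endswith(GOOGLE_SUFFIXES):
        return False
    try:
        return ip in forward(host)
    except OSError:
        return False
```

Requests that claim a Googlebot user agent but fail this check are candidates for the proxy traffic described above.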

Also, there are a huge number of uses to which the bad guys can put your hijacked site, some of which will involve modifying your pages in ways only visible to Googlebot.

If you used canonical tags on your own site, check to see whether these have been rewritten on the hijacking sites to reflect their new domain. Use either a user agent checker running as Googlebot, or use Fetch as Googlebot in your Google Search Console.
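Once you have the HTML a copy serves (ideally as Googlebot sees it), pulling out the canonical host takes only a few lines. A caution: a spoofed Googlebot user agent will not fool a site that cloaks by IP address, which is why Fetch as Googlebot can be the more reliable view. The sketch below uses the standard `html.parser` and returns the host of the first `<link rel="canonical">` it finds.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

class _CanonicalFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        rel = (a.get("rel") or "").lower().split()  # rel can hold several tokens
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = a.get("href")

def canonical_host(html):
    """Return the hostname of the page's canonical URL, or None if there is none."""
    finder = _CanonicalFinder()
    finder.feed(html)
    finder.close()
    if not finder.canonical:
        return None
    return urlsplit(finder.canonical).hostname
```

If this returns their domain on a page whose text is yours, the canonical has been rewritten.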

11:14 am on June 5, 2017 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11276
votes: 133


Don't overlook the third possibility: The DNS for their domain name is pointed to your physical space.

I think that could only work if the server was configured to accept requests for wildcard hostnames and the wildcard configuration used the virtual server for Broadway's site.
1:34 pm on June 5, 2017 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Feb 10, 2002
posts:936
votes: 4


I think a response from Broadway might help in deciding what has happened to his site and focus the educated 'guessing' that's going on..... ;)
7:52 pm on June 5, 2017 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11276
votes: 133


My focus was on getting an educated response and narrowing Broadway's focus on how to recognize the most likely scenarios.
8:38 pm on June 5, 2017 (gmt 0)

New User

Top Contributors Of The Month

joined:June 5, 2017
posts:2
votes: 0


It is possible when both sites are hosted with the same hosting company and pointed at the same DNS. Or he might have copied the HTML and CSS files and built his own product. Why don't you switch to HTTPS to protect your website from this kind of infringement?
9:41 pm on June 5, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14707
votes: 613


if the server was configured to accept requests for wildcard hostnames

Well, that's one reason we tell everyone to have a domain-name-canonicalization redirect. It's not just for the wrong www; sometimes it can be an entirely wrong name.
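The logic of such a redirect is small. On Apache it is usually a mod_rewrite rule keyed on `%{HTTP_HOST}`. Here it is sketched as a plain Python function to make the decision explicit. `www.example.com` stands in for your preferred hostname, and the function name is hypothetical. Any request arriving under a different name, including a stranger's domain pointed at your server, gets a 301 to the canonical name.

```python
def canonical_redirect(host_header, path, query="", canonical_host="www.example.com", scheme="https"):
    """Return the URL to 301-redirect to, or None if the request already uses the canonical name."""
    # Drop any port and trailing dot, and compare case-insensitively.
    host = (host_header or "").split(":")[0].rstrip(".").lower()
    if host == canonical_host:
        return None
    url = f"{scheme}://{canonical_host}{path}"
    return f"{url}?{query}" if query else url
```

With this in place, visitors arriving via the foreign hostname would simply land on your own domain.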

when both sites are hosted in a same server company and pointing towards same DNS

In the real-life case I mentioned above, everyone was using the same host and the same DNS. So it turned out to be a reasonably benign and innocent--but still alarming--mistake. Whew!

HTTPS won't prevent individual humans--or individual robots--from downloading your content and re-uploading it somewhere else. But that may or may not be what happened here.

Broadway? Still with us?
8:25 pm on June 8, 2017 (gmt 0)

Preferred Member from US 

5+ Year Member

joined:June 14, 2010
posts: 606
votes: 4


I have a similar situation, but not quite the same.

We have a handful of aged and tightly focused consumer-info sites that have stood the test of time over the last 6-8 years. These are small sites, only 8-10 pages each, on a specific consumer topic. I won't kid myself that they are the best, but we have detailed industry knowledge, and regardless of who has tried to replicate them over the years, we have always come out on top in search (top 1-3 results, answer boxes, great user engagement, etc.).

After noticing a sudden traffic drop, we checked the indexing at Google and found that while OUR URLs and pages showed as we expected, the "cached copy" of the site in Google's index showed the URL of the copied website.

After digging, we found that someone had simply copied our HTML source code, generated a separate HTML page, and started placing it across several newly purchased or dropped domains. They were not trying to be quiet about it, leaving links to our sites, images, etc. The only thing they changed on these HTML sites was the canonical URL, which they pointed to their new domain. Clicking the incorrect cached URL FROM Google redirected right back to Google; pasting it into the browser worked fine.

A few DMCA requests and a "Fetch and Index" in WMT later, the issue went away quickly. For the time being...

Fast forward 3-4 months and the same person/company is coming after us again. This time, however, they are using hacked websites to post the static HTML pages within the file structure without the owners knowing about it. It might be a static HTML page one day, a Joomla site the next and WP the third. Again, the only change is the canonical URL. This is obviously some kind of negative SEO tactic and, unfortunately, it's working against us.

The result is that we're constantly (once a week now) forcing GWT to "Fetch and Index" our homepages, to force the cache to update and get our traffic back. Some cache updates are instant, others take several days while we bleed off our daily income losses.

We've tried, without success, to get the privacy-protected registrant details uncovered for the domains where the page sat at the root, or that were obviously controlled by the person/company behind it all.

Not trying to hijack this thread - but is there anything that can be done to avoid this from happening?

How can I get a registrar to give me privacy enabled whois info? We've considered using our Atty to handle this, but by the time they get involved, we've already used the DMCA system to get a site removed, etc.

Thanks in advance
4:30 am on June 9, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2000
posts:12072
votes: 331


mhansen, to answer your last question....
How can I get a registrar to give me privacy enabled whois info?

IANAL, but I assume that they legally can't, as their registrants have contracted for private registration. Possibly you can manage legal action, but I think you'd be better off talking with your hosting company and a security professional than with a lawyer.

What you are describing is essentially a hijacked site, with a network of hacked and/or hijacked sites, generally cloaked for Googlebot, some of which have scraped content, and some of which are pointing links to that scraped content. Some might be carrying malware. Some might be redirecting in order to obscure to users what they're doing.

DMCA is not an efficient way of controlling hijackers, but that really depends on the particular circumstances. There's a huge variety of ways that hijacking and hacking can happen, with many combinations and permutations. Pages or sites you see with your content may be innocent victims, and some of these may or may not be "privately" registered.

Private registration doesn't automatically suggest criminality. It's safer for hijackers to rely on hacked sites than to use their own domains, but the domains that belong to the criminals who are doing the hijacking are most likely registered anonymously in third-world countries where it's not practical to pursue them. Various tricks are used to obscure the setup and to hide the hijacker domains.

Any detailed discussion of your problem would inevitably derail this thread for the OP. The admin who sent you here probably did so because members have posted suggestions or diagnostics to cover some of the most likely hijacking scenarios. I don't think it was assumed that you'd repost your entire original thread here.

Unfortunately, the OP of this thread, Broadway, hasn't gotten back to it. I do suggest you read through what's been posted here and see if any of it might apply to you. Right now, you've posted many more observations than Broadway has, but this thread for the moment has the most feedback... only it's all speculative.

For other aspects of your problem, I hope members posting here will jump over to your discussion and comment. Thread here....

Homepage Being Copied - Google Cache Showing Different URL Than Ours
https://www.webmasterworld.com/webmaster/4853054.htm

9:13 pm on June 14, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Feb 25, 2004
posts:994
votes: 47


I apologize for my absence. I've had the worst type of family event, the kind all of us eventually face. Now I'm back to work and looking to get things back to normal.

I did an exceptionally poor job of explaining my situation.
I'm still under the impression that the infringer has duplicated my entire site.
However they have replaced all instances of my domain name with theirs.
My site uses Drupal CMS, so I thought they had possibly gotten a copy of my database somehow and "search and replaced" it, replacing my name with theirs.
On a single graphic (in the header) they have edited the picture to show their domain name.

After I posted I realized that a Drupal website should have a user sign in page at:
example.com/user
When I go to that URL I get a 404. I thought that meant it was most likely not a Drupal website but instead cached static pages.

But some of the pages have functionality that I would have thought could only be provided by Drupal modules.

Things like the Views module that dynamically queries the database and creates pages from that information. Many of these pages seem to work.

Also I had used the Flexslider module for some slideshows, and those pages seem to work.

I found out about this infringement via AdSense (my code was showing on an unauthorized site).
It makes no sense that they would not change my AdSense code to theirs.
Also, according to AdSense reporting, this website is serving fewer than 10 pages a month (possibly I'm their biggest visitor).

I guess the main thing here is so far I'm lucky and need to get on with squelching this site.
2:02 pm on June 15, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Feb 25, 2004
posts:994
votes: 47


Ok, I've finally thought clearly enough to understand at least what hasn't been done.
The infringers do not have an exact copy of my database that is then used by them via Drupal to create their output.

Evidence:
As mentioned above, there is no Drupal sign in page at URL: example.com/user

Above I mentioned that my site serves some dynamically created web pages. My pages take the format:
example.com/topic-page/31,32,33
(The numbers 31,32,33 in this case being the variables that define what search of the database is performed.)

Their pages take the form:
example.com/topic-page/31,32,33.html
When I introduce additional variables into their URL I get a 404 error.
(Also, the 404 page is a standard unformatted one, not the one from my website.)
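The same probes can be scripted so they are easy to repeat against each copy you find. In the minimal sketch below, `fetch(url)` is a hypothetical function you supply that returns `(status_code, body)`, and `own_404_marker` is some text unique to your own custom 404 page. All three results coming back `True` is consistent with a static mirror. It is not proof, since a live Drupal site could also be configured to hide `/user`.

```python
def mirror_evidence(fetch, base_url, dynamic_path, extra_arg, own_404_marker):
    """Run the probes described above against a suspected copy.

    fetch(url) -> (status, body) is supplied by the caller (hypothetical helper).
    """
    checks = {}

    # 1. A live Drupal site normally answers /user with a login page.
    status, _ = fetch(f"{base_url}/user")
    checks["drupal_login_missing"] = status == 404

    # 2. A new combination of view arguments only works if a database is behind the site.
    status, body = fetch(f"{base_url}{dynamic_path},{extra_arg}.html")
    checks["new_query_fails"] = status == 404

    # 3. A generic 404 (not your custom one) means your CMS is not rendering errors there.
    checks["foreign_404_page"] = status == 404 and own_404_marker not in body
    return checks
```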

That puts my mind at ease. At least I don't have a security problem on top of the infringement problem.

I assume that software that can capture the output of a website for this type of purpose is commonplace, but I don't know, so if anyone has anything to say there I'd be interested.
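Such tools are indeed common. For example, GNU Wget run with `--mirror` and `--adjust-extension` saves HTML pages whose URLs lack an `.html` suffix with one appended, so `/topic-page/31,32,33` would be stored as `31,32,33.html`. That matches what you observed, though it is one possibility rather than proof. A bulk copy like that usually leaves a trail in the access log: one client requesting a large share of your distinct URLs. The sketch below assumes the combined log format, and the threshold is an arbitrary placeholder to tune for your site's size.

```python
import re
from collections import defaultdict

# Assumes the Apache/Nginx "combined" log format.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def bulk_fetchers(lines, min_distinct_paths=200):
    """Map each client IP that fetched many distinct URLs to (count, user agents seen)."""
    paths = defaultdict(set)
    agents = defaultdict(set)
    for line in lines:
        m = LOG_RE.match(line)
        if m:
            paths[m["ip"]].add(m["path"])
            agents[m["ip"]].add(m["agent"])
    return {
        ip: (len(p), sorted(agents[ip]))
        for ip, p in paths.items()
        if len(p) >= min_distinct_paths
    }
```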
5:24 pm on June 15, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14707
votes: 613


Many suggestions will fall under the head of locking the barn door after et cetera. But have you any indication that the scraping is ongoing? That is, if you create a new page, will it appear on the offending site? You might try setting up a honeypot: a page that robots will discover but humans don't see. And then keep track of who visits.
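A common honeypot setup is a link humans never see (hidden with CSS, for instance) to a page that is also disallowed in robots.txt, so well-behaved crawlers skip it. Anything that requests it is then worth a closer look. Below is a minimal sketch for pulling those visitors out of a combined-format access log. `/trap-page.html` is a hypothetical path.

```python
import re

# Assumes the Apache/Nginx "combined" log format.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]*)\] "\S+ (?P<path>\S+) [^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def honeypot_visitors(lines, trap_path="/trap-page.html"):
    """Return (ip, timestamp, user_agent) for every request to the trap page."""
    hits = []
    for line in lines:
        m = LOG_RE.match(line)
        # Ignore any query string when comparing paths.
        if m and m["path"].split("?")[0] == trap_path:
            hits.append((m["ip"], m["time"], m["agent"]))
    return hits
```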
 
