Forum Moderators: Robert Charlton & goodroi
I am really PO'd and worried right now.
Someone in China is duplicating my page and hundreds of others and spoofing the URL like this:
www.mydomainname.theirdomainname.com/
What's up with this?
I am about to email a bunch of sites he has done this to, but I want to make sure this is not some legit thing that I don't understand (newbie).
I tried running a whois on the domain in question and can't get results.
Please help,
Thanks
Scott
[edited by: trillianjedi at 10:09 am (utc) on Jan. 16, 2005]
[edit reason] TOS - no specifics or URL's please [/edit]
Anyway, back to the topic:
Recap:
This is not new.
Google knows all about them.
This is the largest scale in a while (they must have gotten a new, bigger hard drive; more than likely there is only one machine involved here, on a modest connection).
To the best of my knowledge, this is legal under Chinese law and Google may not do anything.
You are the master of your domain - take control of your own site - the power is yours.
I had no idea that they store your site on their servers. I thought it was just a server trick and no information was stored on their server.
"To the best of my knowledge, this is legal under chinese law and Google may not do anything."
Sure Google can. They can add a line that blocks *thief's-domain.com. Google has the right (and the obligation in this case) to block them. Most of us know what domain we're talking about, so search Yahoo for it and notice how only their homepage is shown, none of the stolen domains. In Google, you'll see close to 150,000 domains.
This hurts the sites, but also the Google SERPs, since sites with good content suffer from the dupe penalty and aren't ranked on their merits. This is interfering with Google's business, and there's no question about who owns the content.
And this very well may be a state-sponsored item in Beijing.
China, if you have any difficulty accessing websites due to filters, you can try this: add "example url" to any URL this way:
your.url.example.url
This is an anonymizer that was designed specifically to get around Chinese filters, though it works from here as well. It is VERY slow, but seems to work.
</end quote>
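For what it's worth, the URL pattern in that quote can be taken apart mechanically. Here is a minimal Python sketch, assuming the proxy simply appends its own domain to the original hostname. The name `proxy.example` and the function `split_proxied_host` are made up for illustration and are not the real service.

```python
def split_proxied_host(host, proxy_suffix):
    """Return the original hostname embedded in a proxied hostname.

    Assumes the pattern <original host>.<proxy domain>, as in
    your.url.example.url. Returns None if the host does not fit it.
    """
    host = host.lower().rstrip(".")
    suffix = "." + proxy_suffix.lower().strip(".")
    # The host must end with the proxy domain and have something in front of it.
    if host.endswith(suffix) and len(host) > len(suffix):
        return host[: -len(suffix)]
    return None


# Hypothetical example: a cloned hostname seen in a search result.
print(split_proxied_host("www.mydomainname.com.proxy.example", "proxy.example"))
```

Run over a list of hostnames from an inurl: search, something like this would show which of your own hostnames are being served through the proxy domain.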
Some forums are getting very heated about it, some are looking at it as an opportunity.
"I had no idea that they store your site on their servers."
It's way over my head. I've thrown it at my techies, who are calling it "real time proxying" or "forwarding sites via subdomains" without creating the sites on their servers.
I have five IP addresses now; let me know if you need them.
Thanks for the heads up webmasterworld.
I imagine a lot of webmasters would appreciate the heads up on this one.
Question: Will an IP ban be enough to keep them off your back? Or are they getting your site from the search engine caches?
Several recent examples unfortunately seem to indicate that the search engines are not much concerned whether users get your information from yourdomain.com, via chinaxyz.cn, or via some other type of redirection like a 302, as long as the search finds the information you have put together.
[edited by: geekay at 10:12 am (utc) on Jan. 30, 2005]
"Several recent examples unfortunately seem to indicate that the search engines are not much concerned whether users get your information from yourdomain.com, via chinaxyz.cn, or via some other type of redirection like a 302, as long as the search finds the information you have put together."
Yes, users are the constituency that search engines should cater to. Which is also why they should try to link to original sources of information. The source of the information is as important as the information itself. As a user, I want to know what is said AND who said it. That is why search engines should try to list the original sources (original websites) not scraped versions or cached versions.
And besides, as I stated above, this kind of operation has huge potential for phishing. You think you are on the original site, but you are on their clone, and you input your valuable personal information such as passwords and credit card numbers. Maybe the site that we were talking about is just a caching service. I really don't know. But the next one that comes along might be even more sinister. As a model, this kind of site is dangerous to the security and integrity of the web.
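# Apache 2.2-style access rules: allow everyone except the listed addresses
order allow,deny
# one "deny from" line per proxy IP address
deny from aaa.bbb.ccc.ddd
deny from aaa.bbb.ccc.ddd
deny from aaa.bbb.ccc.ddd
deny from aaa.bbb.ccc.ddd
deny from aaa.bbb.ccc.ddd
allow from all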
Where aaa.bbb.ccc.ddd are the IP addresses that OptiRex will give you if you ask nicely. It stops them dead (for now).
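If the list of addresses keeps changing, the block above can be generated rather than typed by hand. This is a rough Python sketch. The function name `htaccess_deny_block` is made up, and the addresses are documentation-only placeholders, not the real proxy IPs.

```python
import ipaddress


def htaccess_deny_block(addresses):
    """Build an 'order allow,deny' block with one deny line per address."""
    lines = ["order allow,deny"]
    for addr in addresses:
        # Raises ValueError on a malformed address, so a typo never
        # ends up in the live .htaccess file.
        parsed = ipaddress.ip_address(addr)
        lines.append(f"deny from {parsed}")
    lines.append("allow from all")
    return "\n".join(lines)


# Placeholder addresses from the reserved documentation ranges.
print(htaccess_deny_block(["192.0.2.10", "198.51.100.7"]))
```

The output is plain text that can be pasted into .htaccess, so nothing about how the server handles the rules changes.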
Sadly, new ones will pop up. They need to be banned by the search engines.
I think you're missing the real problem. Traffic is not it. If Google indexes yoursite-com-chinese-url-com (because someone linked to it or something), your rankings are out the window. You have a serious dupe problem, and by the time you find out, your rankings will most likely be drastically down.
I run my site with some CMS software. It has lots of neat features. One feature is an external link tracking system, so that whenever anybody clicks on a link off my site it gets recorded in my database. I can later look at a stats page and see just how many times each of the outgoing links were clicked. This is useful to me. Then I can fire off an email to one of the destination websites and say "Hey, did you notice I sent you 592 clicks last month? Maybe you could put a recip link on your site back to me? Thanks." So it's a useful feature that is not nefarious.
Guess how it works?
To track the clicks, it makes the actual href a link to a script on my site. That script notes the click in my database, and then issues a 302 redirect to the real site in order to get the user where they wanted to go. I don't think the people who wrote the CMS software that I use had any ill intentions when they wrote this software. I don't think they considered search engines AT ALL when they wrote it. I doubt very much they have any clue how search engines work; they just wanted to write some software that tracks clicks.
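For anyone who hasn't seen one of these scripts, here is roughly what the mechanism looks like. This is a minimal Python (WSGI) sketch, not the actual CMS code. The link table, the `id` query parameter and the in-memory counter are all assumptions standing in for the real database.

```python
from collections import Counter
from urllib.parse import parse_qs

# Hypothetical table of outgoing links; a real CMS keeps this in its database.
LINKS = {"1": "https://www.example.com/", "2": "https://www.example.org/"}
CLICKS = Counter()  # stand-in for the click-count table


def click_tracker(environ, start_response):
    """WSGI app: record the click, then 302-redirect to the real site."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    link_id = query.get("id", [""])[0]
    target = LINKS.get(link_id)
    if target is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Unknown link\n"]
    CLICKS[link_id] += 1
    # The on-page href points at this script; the real destination
    # only ever appears in the Location header of the 302 response.
    start_response("302 Found", [("Location", target)])
    return [b""]
```

A link on the page would then look something like `/out?id=1` instead of the destination URL itself, which is why a crawler sees a 302 from my URL rather than a plain link.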
Now, if I understand correctly, this has the potential to trigger a bug in Google: under some circumstances this 302 redirect will wind up making *my* script replace *your* page in Google's directory?
That's a ridiculous bug if that is the case.
I did some searching and I am pretty sure that my site has not "page jacked" any of the people I made links to. I made those links to build good will with other sites in the hopes of getting recips.
So here's my question:
Should I shut this sucker off now before (a) someone thinks I'm trying to page-jack them, and (b) Google thinks I'm doing something nefarious, and (c) one of you guys goes ballistic and rains down some team of lawyers on my head thinking it's some DMCA violation?
There is very clearly a Google bug here, and it looks like Google should fix this. I would like to be able to track clicks off my site. I'd like NOT to cause anybody else any problems when I do that!
Can anyone clarify if I have understood this all correctly?
.. and the other guy links back to you, the same way you linked to him .. php script or whatever.
Would that be what you wanted, or are you hoping for a straight <a> href= type link? - Larry
Should be easy for a bot to follow, forward the PageRank, and so on. So if Google isn't doing this, whine to them.
I guess the main point I'm making here, or trying to make, is that a lot of people have link systems like this and they are just honest people who are trying to do another site a favour by linking. There seems to be a thread here that anybody who uses a PHP link is trying to cheat someone, or page-jack, and I am just pointing out that thousands of people use these systems without ever having heard of page-jacking or SEO.
Webmasters shouldn't have to go read up at webmasterworld.com to find out what's the current recommended way of linking. Especially if they are using off-the-shelf software to build their site, they aren't even going to know or care how it is working. They click on "add link" on their site and, so far as they are concerned, they are done.
Linking this way is rapidly becoming a common practice, and it is *useful* to know where you are sending traffic. My opinion is that Google can and should count these links.
Google definitely has the bug here. It needs to be fixed. That's just all there is to it.
The only thing I found that helps stop your site from getting a penalty caused by the 302 redirects is to put randomly changing content on every one of your pages.
I did this to a new site I built 6 months ago. I still see the hijacker URLs when I do an inurl: search, but the site still gets a nice flow of traffic from Google. No penalties.
So my guess is that the hijacker URLs are causing a dupe content penalty, and my random content is keeping the pages from being exact duplicates of the hijacker URLs.
I simply used a random quote generator and called it into the page using SSI. 5 quotes per page seems to keep it totally random so that no page has the same quotes. And the database has 100 quotes to rotate.
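The same idea in script form, for anyone not using SSI. This is only a sketch in Python with placeholder quotes. The function name `pick_quotes` is made up and it is not the generator I actually run.

```python
import math
import random

# Placeholder pool; the real database holds 100 actual quotes.
QUOTES = [f"Placeholder quote #{n}" for n in range(1, 101)]


def pick_quotes(quotes, k=5, rng=random):
    """Pick k distinct quotes at random for one page view."""
    return rng.sample(quotes, k)


for quote in pick_quotes(QUOTES):
    print(quote)

# Number of different 5-quote sets that can be drawn from 100 quotes.
print(math.comb(len(QUOTES), 5))  # 75287520
```

With about 75 million possible sets, two fetches of the same page are very unlikely to carry identical quotes, which fits what I see with 5 per page.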
I have an older site, 4 years old, that was completely dropped from Google because of the 302 redirects. When I added the random stuff to the pages, it took about 4 weeks and that site began to get traffic again.
The funny thing is that I had random stuff on that site since the beginning and it always did great in Google. I had built a website for my Mom that did well too for about 3 months until it got hijacked.
And every site I built after October 2003 had the same problem. All hijacked. I didn't realize what was going on for over a year.
I noticed the hijacker URLs on inurl: searches for all of the websites I built. So I just figured it was a normal thing, because I had one site that was fine and getting tons of traffic from Google, which was the site with random content on all the pages.
That lasted until that website crashed the server it was on and my host shut it down. I moved it to a dedicated server but didn't put the random content on the pages.
Within a month that site was blocked from Google, and now I realize it was because of the 302 redirects and the duplicate content filter. That old website did well all that time because it had random content on the pages.
All the other websites I built didn't have the random content, and they all were blocked from Google's results.
Now I have random content on all of my websites and they are all starting to do great. It's only been 1-2 months for some of my sites, so we'll see what happens.
Finally, I got fed up and blocked their entire network. Since I don't do any business with them, it had no impact on my site. I don't know if this is the same group you guys are talking about, but they were pretty persistent, with a new IP every other day or so before I blocked their network.
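Blocking a whole network instead of chasing single addresses comes down to a range check. Here is a small Python sketch of that check. The 203.0.113.0/24 range is a reserved documentation block used as a placeholder, not their real network, and `is_blocked` is a made-up helper name.

```python
import ipaddress

# Placeholder range; substitute the network you actually want to block.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def is_blocked(client_ip, networks=BLOCKED_NETWORKS):
    """Return True if client_ip falls inside any blocked network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)


print(is_blocked("203.0.113.45"))  # inside the blocked range
print(is_blocked("198.51.100.7"))  # outside it
```

In .htaccess the equivalent is a single `deny from` line with the network in CIDR form, so a new IP from the same range every other day no longer needs a new rule.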