If you need help on banning IPs - here you go:
Anyway, back to the topic:
This is not new.
Google knows all about them.
This is the largest scale in a while (they must have gotten a new, bigger hard drive. More than likely - there is only one machine involved here on a modest connection).
To the best of my knowledge, this is legal under Chinese law and Google may not do anything.
You are the master of your domain - take control of your own site - the power is yours.
"This is the largest scale in awhile (they must have gotten a new bigger hard drive. More than likely - there is only one machine involved here on a modest connection)."
I had no idea that they store your site on their servers. I thought it was just a server trick and no information was stored on their server.
"To the best of my knowledge, this is legal under chinese law and Google may not do anything."
Sure Google can. They can add a line that blocks *thief's-domain.com. Google has the right (and the obligation in this case) to block them. Most of us know what domain we're talking about, so search Yahoo for it and notice how only their homepage is shown, none of the stolen domains. In Google, you'll see close to 150,000 domains.
This hurts the sites, but also the G serps, since sites with good content suffer from the dupe penalty, and aren't ranked on their merits. This is interfering with Google's business and there's no question on who owns the content.
They are doing both, walkman.
>Sure Google can
We don't know what Google's agreement with the Chinese govt is - and this very well may be a state sponsored item in Beijing.
My hunch is that it is the same crew that was ripping pages via the MSN cache in late November.
Well, I've been reading all sorts of different opinions etc:
|and this very well may be a state sponsored item in Beijing. |
china, if you have any difficulty accessing websites due to filters, you can try this: add "example url" to any url this way:
this is an anonymizer that was designed specifically to get around chinese filters, though it works from here as well. it is VERY slow, but seems to work.
Some forums are getting very heated about it, some are looking at it as an opportunity.
|I had no idea that they store your site on their servers. |
It's way over my head, I've thrown it at my techies who are calling it "real time proxying" or "forwarding sites via subdomains" without creating the sites on their servers.
I have five IP addresses now; let me know if you need them.
Wow! I had never heard of this Chinese site but after reading this thread I went and checked. Sure enough three sites I monitor are there.
Thanks for the heads up webmasterworld.
I imagine a lot of webmasters would appreciate the heads up on this one.
Question: Will an IP ban be enough to keep them off your back? Or are they getting your site from the search engine caches?
What a great (terrible) phishing scam. I tested the url out on webmasterworld and forgot. Later, when I went back to 'webmasterworld' to post a topic, I didn't look closely at the url, but I noticed I wasn't logged on, so I typed my name and password into their version of webmasterworld (and quickly changed it when I did notice).
Search engines are not for webmasters, they are for Internet users.
Several recent examples unfortunately seem to indicate that it is not SE's main concern if users get your information from yourdomain.com or via chinaxyz.cn, or via some other type of redirection like 302, as long as the search finds the information you have put together.
[edited by: geekay at 10:12 am (utc) on Jan. 30, 2005]
Is there anyone who uses the internet more than webmasters?
|Search engines are not for webmasters, they are for Internet users. |
"Search engines are not for webmasters, they are for Internet users."
how about... Search engines are not for webmasters, they are for Internet spenders.
& webmasters are a tiny subset of internet spenders?
|Several recent examples unfortunately seem to indicate that it is not SE's main concern if users get your information from yourdomain.com or via chinaxyz.cn, or via some other type of redirection like 302, as long as the search finds the information you have put together. |
Yes, users are the constituency that search engines should cater to. Which is also why they should try to link to original sources of information. The source of the information is as important as the information itself. As a user, I want to know what is said AND who said it. That is why search engines should try to list the original sources (original websites) not scraped versions or cached versions.
And besides, as I stated above, this kind of operation has huge potential for phishing. You think you are on the original site but you are on their clone and you input your valuable personal information such as passwords and credit card numbers. Maybe, the site that we were talking about is just a caching service. I really don't know. But the next one that comes along might be even more sinister. As a model, this kind of site is dangerous to the security and integrity of the web.
I've investigated it some more. It does seem to grab your pages in real time (it's not coming from a cache). And if you want to block them, add the following to your .htaccess file:
# "order allow,deny" makes the deny lines win over "allow from all"
order allow,deny
allow from all
deny from aaa.bbb.ccc.ddd
deny from aaa.bbb.ccc.ddd
deny from aaa.bbb.ccc.ddd
deny from aaa.bbb.ccc.ddd
deny from aaa.bbb.ccc.ddd
Where aaa.bbb.ccc.ddd are the IP addresses that OptiRex will give you if you ask nicely. It stops them dead (for now).
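If the list of addresses keeps growing, it may be easier to generate the block than to edit it by hand. Here is a minimal Python sketch, assuming the older Apache Order/Allow/Deny directives are what your server uses; the function name and the sample addresses (documentation-only ranges) are made up for illustration.

```python
import ipaddress

def build_deny_block(ip_list):
    """Build .htaccess lines that deny each listed IP and allow everyone else."""
    # "Order Allow,Deny" means a matching Deny overrides "Allow from all".
    lines = ["Order Allow,Deny", "Allow from all"]
    for ip_text in ip_list:
        # ip_address() raises ValueError on a malformed address,
        # so a typo never ends up in the generated file.
        ip = ipaddress.ip_address(ip_text)
        lines.append(f"Deny from {ip}")
    return "\n".join(lines)

if __name__ == "__main__":
    # Hypothetical addresses from the reserved documentation ranges.
    print(build_deny_block(["203.0.113.7", "198.51.100.2"]))
```

Paste the output into .htaccess in place of the hand-written block.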
"It stops them dead (for now)."
sadly new ones will pop up. They need to be banned by the search engines.
|sadly new ones will pop up |
True. How true. But I don't really care about the ones with small time traffic. If they get big enough for us to take notice, then we shut 'em down. The fact that they grab the pages real time makes it easy to determine their IPs. It is scary though.
"But I don't really care about the ones with small time traffic. If they get big enough for us to take notice, then we shut 'em down"
I think you're missing the real problem. Traffic is not it. If Google indexes yoursite-com-chinese-url-com (because someone linked it or something), your rankings are out the window. You have a serious dupe problem. By the time you find out, your rankings are most likely drastically down.
Good point. Except I'm not THAT worried because for my main bread-and-butter sites, I'm pretty sure it is they who will get the duplicate content penalty and not me. I've had people pilfer before and they always got the lower rankings. But you're right, it is a potential problem.
So now I have read all of these "page jacking" threads and I am sure that some people do this intentionally, but I think it is worth pointing out to everyone that there is an actual innocent explanation for this.
I run my site with some CMS software. It has lots of neat features. One feature is an external link tracking system, so that whenever anybody clicks on a link off my site it gets recorded in my database. I can later look at a stats page and see just how many times each of the outgoing links were clicked. This is useful to me. Then I can fire off an email to one of the destination websites and say "Hey, did you notice I sent you 592 clicks last month? Maybe you could put a recip link on your site back to me? Thanks." So it's a useful feature that is not nefarious.
Guess how it works?
In order to track the clicks what it does is make the actual href a link to a script on my site. That script notes the click in my database, and then issues a 302 redirect to the real site in order to get the user where they wanted to go. I don't think the people who wrote the CMS software that I use had any ill intentions when they wrote this software. I don't think they considered search engines AT ALL when they wrote it. I doubt very much they have any clue how search engines work--they just wanted to write some software that tracks clicks.
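For anyone who hasn't seen one of these scripts, here is a rough sketch of the idea in Python. It is not the code from my CMS (that is PHP, and I haven't read it closely); the `url` query parameter, the in-memory counter and the function name are all assumptions made up for illustration.

```python
from urllib.parse import parse_qs

# Hypothetical in-memory click log; a real CMS would write to its database.
click_counts = {}

def track_click(query_string):
    """Record one outbound click and return (status, headers) for the redirect."""
    params = parse_qs(query_string)
    targets = params.get("url")
    if not targets:
        # No destination supplied, so there is nothing to redirect to.
        return "400 Bad Request", [("Content-Type", "text/plain")]
    target = targets[0]
    click_counts[target] = click_counts.get(target, 0) + 1
    # The 302 sends the visitor on to the real site after the click is counted.
    return "302 Found", [("Location", target)]

if __name__ == "__main__":
    print(track_click("url=http%3A%2F%2Fexample.com%2Fpage"))
    print(click_counts)
```

A real script should only redirect to links it already knows about, otherwise anyone can use it to bounce visitors to any address.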
Now if I understand correctly this has the potential to trigger a bug in Google: under some circumstances this 302 redirect will wind up making *my* script replace *your* page in Google's index?
That's a ridiculous bug if that is the case.
I did some searching and I am pretty sure that my site has not "page jacked" any of the people I made links to. I made those links to build good will with other sites in the hopes of getting recip's.
So here's my question:
Should I shut this sucker off now before (a) someone thinks I'm trying to page-jack them, and (b) Google thinks I'm doing something nefarious, and (c) one of you guys goes ballistic and rains down some team of lawyers on my head thinking it's some DMCA violation?
There is very clearly a google bug here and it looks like Google should fix this. I would like to be able to track clicks off my site. I'd like NOT to cause anybody else any problems when I do that!
Can anyone clarify if I have understood this all correctly?
OK. Let's say you send the message:
"Hey, did you notice I sent you 592 clicks last month? Maybe you could put a recip link on your site back to me? Thanks."
.. and the other guy links back to you, the same way you linked to him .. php script or whatever.
Would that be what you wanted, or are you hoping for a straight <a> href= type link? - Larry
I realize the PR implications of this. When I really care about PR I don't link this way (and shouldn't you be complaining to Google to get them to count these links too? Why don't they?).
But let me point out that not every link on the web was created for Google.
Should be easy for a bot to follow, forward the PageRank, and so on. So if Google isn't doing this, whine to them.
I guess the main point I'm making here, or trying to make, is that a lot of people have link systems like this and they are just honest people who are trying to do another site a favour by linking. There seems to be a thread here that anybody who uses a PHP link is trying to cheat someone, or page-jack, and I am just pointing out that thousands of people use these systems without ever having heard of page-jacking or SEO.
Webmasters shouldn't have to go read up at webmasterworld.com to find out what's the current recommended way of linking--especially if they are using off-the-shelf software to build their site, they aren't even going to know or care how it is working. They click on "add link" on their site and so far as they are concerned they are done.
Linking this way is rapidly becoming a common practice, and it is *useful* to know where you are sending traffic. My opinion is that Google can and should count these links.
I have used CGI scripts to redirect users and Google never indexed those urls. Then I tried it with a PHP click tracking script and even banned the bots from that script in robots.txt, but Google still indexed the pages.
Google definitely has the bug here. It needs to be fixed. That's just all there is to it.
The only thing I found that helps stop your site from getting a penalty caused by the 302 redirects is to put randomly changing content on every one of your pages.
I did this to a new site I built 6 months ago and I still see the hijacker urls when I do an inurl: search, but the site still gets a nice flow of traffic from Google. No penalties.
So my guess is that the hijacker urls are causing a dupe content penalty, and my random content is keeping the pages from being exact duplicates of the hijacker urls.
I simply used a random quote generator and called it into the page using SSI. 5 quotes per page seems to keep it totally random so that no page has the same quotes. And the database has 100 quotes to rotate.
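My own setup is SSI calling a quote generator, but the idea is simple enough to sketch in Python. This is only an illustration: the placeholder quote pool, the function names and the plain-text output are assumptions, not what runs on my sites.

```python
import random

# Placeholder pool standing in for the 100-quote database.
QUOTES = [f"Quote number {i}" for i in range(1, 101)]

def pick_quotes(pool, count=5, rng=random):
    """Return `count` distinct quotes chosen at random from the pool."""
    return rng.sample(pool, count)

def render_quote_block(quotes):
    """Join the chosen quotes into a text block to drop into the page."""
    return "\n".join(quotes)

if __name__ == "__main__":
    # Each page load gets a different set of five quotes.
    print(render_quote_block(pick_quotes(QUOTES)))
```

With 100 quotes and 5 per page there are millions of possible combinations, so two fetches of the same page rarely carry the same block.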
eyez, there's a quote plugin for the software i use.. i always thought it was a gimmick, but now, hey!
did it matter where the quotes were on the page? did it make your site uglier? maybe i can think of some valuable content to rotate through. hmm.
The random stuff only keeps the pages from getting a dupe penalty from the 302 redirects.
I have an older site, 4 years old, that was completely dropped from Google because of the 302 redirects. When I added the random stuff to the pages it took about 4 weeks and that site began to get traffic again.
The funny thing is that I had random stuff on that site since the beginning and it always did great in Google. I had built a website for my Mom that did well too for about 3 months until it got hijacked.
And every site I built after October 2003 had the same problem. All hijacked. I didn't realize what was going on for over a year.
I noticed the hijacker urls on inurl searches for all of my websites I built. So I just figured it was a normal thing because I had one site that was fine and getting tons of traffic from google. Which was the site with random content on all the pages.
That lasted until that website crashed the server it was on and my host shut it down. So I moved it to a dedicated server but didn't put the random content on the pages.
Within a month that site was blocked from google and now I realize it was because of the 302 redirects and duplicate content filter. That old website did good all that time because it had random content on the pages.
All the other websites I built didn't have the random content and they all were blocked from google's results.
Of course now I have random content on all of my websites and they are all starting to do great. Of course it's only been 1-2 months for some of my sites so we'll see what happens.
I've had more than a few instances of someone from a specific network in China trying to offload my entire web site, which would be about 40,000 pages or more. I catch them easily (so far) because each time they come they set off a bandwidth alarm which I have set to go off when the site exceeds 50% of average traffic. Whoever they are, they are greedy pigs and hit the site hard. Since the alarm goes off within 5 minutes of extended traffic excess, I just go look to see who's being bad and block their IP.
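My alarm is part of my hosting setup, so I can't post it, but the logic is roughly what this Python sketch does. Everything here is an assumption for illustration: I read "exceeds 50%" as "50% above the average", the input is a list of (ip, bytes) pairs for one time window, and the addresses come from the reserved documentation ranges.

```python
from collections import Counter

def check_window(hits, average_bytes, excess_ratio=0.5):
    """Check one time window of traffic against the usual average.

    hits: list of (ip, bytes_sent) pairs seen in the window.
    Returns (alarm, heaviest_ips); heaviest_ips is empty when there is no alarm.
    """
    per_ip = Counter()
    for ip, sent in hits:
        per_ip[ip] += sent
    total = sum(per_ip.values())
    # Alarm when the window's traffic is more than 50% above the average.
    alarm = total > average_bytes * (1 + excess_ratio)
    # On alarm, list up to three IPs by bytes sent, heaviest first.
    heaviest = [ip for ip, _ in per_ip.most_common(3)] if alarm else []
    return alarm, heaviest

if __name__ == "__main__":
    window = [("203.0.113.7", 900), ("198.51.100.2", 50), ("203.0.113.7", 800)]
    print(check_window(window, average_bytes=1000))
```

The first address in the returned list is the one to go look at in the logs.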
Finally, I got fed up and blocked their entire network. Since I don't do any business with them, it had no impact on my site. Don't know if this is the same group you guys are talking about, but they were pretty persistent, with a new IP every other day or so before I blocked their network.
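Blocking a network instead of single addresses just means matching on a range. A small Python sketch of the check, with the caveat that the range below is a reserved documentation range and not the real network, and the function name is made up:

```python
import ipaddress

# Documentation-only example range, not the real network from this thread.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]

def is_blocked(ip_text, networks=BLOCKED_NETWORKS):
    """Return True if the address falls inside any blocked network."""
    address = ipaddress.ip_address(ip_text)
    return any(address in net for net in networks)

if __name__ == "__main__":
    for ip in ("203.0.113.45", "198.51.100.1"):
        print(ip, is_blocked(ip))
```

In .htaccess the same thing is a single line with the range in CIDR form, for example "deny from 203.0.113.0/24", so a new IP every other day from the same network stays blocked.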
I've just had this problem with a new site!
I couldn't figure out why on earth it wasn't being indexed. Even tried new domains on the site but of course dupe content prevented anything happening.
The IP addresses are banned... hopefully this will fix it.