| This 57 message thread spans 2 pages: 57 (  2 ) > > || |
|HIJACKED! Exact Content and Ranking|
Dear Google Support, Jan. 27, 2011 7:30 PM
To wit, my popular #*$!#*$!xx page -
has been hijacked by -
Whereas, my page has come up #1 for the Google search #*$!#*$!xx FOR YEARS,
my EXACT content (with a few omissions) is now being presented by Google
as being the property of bbbbbbbb.net, IN THE #1 SPOT!
How is it that you are displaying MY CONTENT as theirs?
This is my most popular, and highest-earning web page, for perhaps 5 years now.
To prove my point -->
Here is a summary of my recent aaaaaaaa page popularity in Google -
42.2% of all my impressions - all of 2010
41.5% of all my impressions - January 3-10, 2011
41.4% of all my impressions - January 11-17, 2011
40.3% of all my impressions - January 18-24, 2011
36.7% of all my impressions - January 25, 2011
15.6% of all my impressions - January 26, 2011
12.5% of all my impressions - January 27, 2011
I trust that Google will be able to punish these b@st@rds, post haste.
Since AdSense is not being displayed, NEITHER YOU NOR I are getting paid.
Would you also please provide a brief explanation of how this has happened, and suggest how I might avoid this type of HEIST in the future? Please provide guidance for me.
Thank you very much for your time.
#*$!#*$!xx, webmaster for #*$!#*$!xx
(To moderator - Also sent to AdSense support.)
A DMCA to their host might be in line also.
The January 26 date jumps out at me. Note this current discussion...
January 26 2011 Traffic Change - Back to "Zombie Traffic"
From that thread...
|In a nutshell, the first 26 days of January were GREAT! |
It's likely that the scraper was around for a while and came to the surface because of some Google changes on the 26th.
I'm wondering how many of the problems reported on that thread were due to scraped content problems. Possibly, Google is surfacing the spam to make it visible enough to skim off. Or possibly they've filtered the wrong sites.
Sally, how would you describe the kinds of queries for which this page brought in traffic? Take a look at the Jan 26 thread noted, on which I'm trying to get some thoughts about what Google's trying to do.
Yes, a DMCA would be in order. Tomorrow though, it's getting late.
What kind of query?
Google returns the scraper's page for the "3-word-term" that the page was designed for. I feel that is still arguably the best page on the subject, and best result for the 3-word-term. This is evidenced by the fact that Google is serving my identical content, but displayed by ANOTHER ENTITY, in the #1 spot.
If there is too much of this, the system will eventually self-destruct, as publishers call it quits, as the thieves take over. What's the point of trying to piff up a rope?
[edited by: Sally_Stitts at 7:01 am (utc) on Jan 28, 2011]
Same old Google, different day... As said in another thread:
It's only those of us without a doctorate of some kind who think the original content should be displayed in the results and the duplicate version tanked... If we were more educated I'm sure we'd all see the faulty logic in showing the original source at the top of the results and not displaying the duplicates / near duplicates... Somehow it doesn't make sense to the people who do the programming at Google though. I wish I could understand their position, but I really don't get it.
Hope you get the issue resolved.
To all those affected by this recent change: I feel it will be corrected soon. I say this because it used to be a common trend in Google. G would cut off a filter and all kinds of junk would take over the spots; G would study the results, tweak the filter, and cut it back on. Then study the results and do it over again, or leave it alone. This is not something G wants to happen, and I do expect it to be resolved. These are just my thoughts though. Please let us know when you get the spot back.
Today, my page described above, is down to 11.1% of my impressions, due to my scraped data being displayed by Google in the #1 spot. The r@pe continues.
Seriously folks, how many webmasters are going to continue to create content that is scraped, and awarded the 1st SERP spot by Google? In the past, I have been scraped plenty, but the offenders' pages were deeper down in the SERPs. This is the first time that my page has been wiped out of the #1 position by my own scraped content!
IMHO, THE MOST IMPORTANT JOB OF GOOGLE IS TO KEEP INTERNET CONTENT STRAIGHT. If everyone's data is scraped, and awarded the number 1 SERP spot, then the entire Internet goes straight to he||. Who will want to contribute any longer, when all your efforts are for naught? Many Google issues are discussed here, but there is none more important than this. The Google machine will eventually fail, if it continues this type of carnage.
In my case, the offender URL is in the form of #*$!xx-#*$!#*$!.net. In these cases, it would be highly advantageous to be able to "out" the scum entirely, and not just hint. If I could, I would, right here.
No response from Google AdSense yet. Only my email to AdSense support went through. I don't even know how to contact the Google search people, since my email to them was bounced, using a previously good email address.
I have decided to "OUT" them myself, on my own site. It's my only option, right? ALL names and data will be exposed, and never taken down. This will be a PERMANENT nasty record of this large scraper. I'm tired of effing around. Anyone who PM's me will be given the URL. I will be back in a couple of hours.
Your best chance of getting someone in search's attention would probably be to post it on the Google Help forum for Search, where some Google types hang out.
And I would jump on that DMCA like a duck on a june bug. Send it to their ISP.
Hmm, sounds like you might be the collateral damage of what Matt Cutts just announced on his blog.
OK, my initial cut at exposing this not-small scraper is ready, and online now.
I really prefer not to expose my identity here, but I am shooting at the greater evil, for the common good. IF I am going down, I may as well go down SCREAMING.
Anyone here who wants to see the data, just PM me, and I will send you the link. I will be updating it throughout the day with more info. Does anybody know why whois doesn't seem to work anymore? I provide the URL, so maybe someone can tell me how to get the whois info.
This is war.
EDIT - whois is OK now. I was so piffed, I couldn't even type in the URL properly.
|I don't even know how to contact the Google search people... |
The best / only way to do it is to start a thread in the google webmaster tools forum (if you haven't done so already).
I would name the thread something like: Scraped Content Ranking #1 - My site dropped to #XYZ
I would give as many specifics as possible.
You would most likely have to include the link to your site and the link to their site as well to get any sort of meaningful advice. Note that even then, you might not have a google employee comment on your site or that of the scraper.
Also, there is another thread on this forum by an alorentz (I think that is the poster's handle), where he had experienced something similar. However, this happened to him in November. It might be a good idea to contact him and see if he has any suggestions.
I hope this helps.
I posted on Google Webmasters Tools Forum as suggested.
The title is - "My Scraped Content Ranking #1 - What Should I Do?"
It is already on page 2 - that doesn't look too promising.
Here today, gone in an hour.
My full description details are now indexed on a new page on my site, in the Internet category, and will remain there FOREVER, whether it gets linked to, or not. I'm sick of this cr@p. There will also be a brand new page, for every scraper I find in the future. Enough is enough.
Several folks have PM'ed me for the link. If interested in the details, please do the same.
I'll say this. This case seemingly proves that duplicate stolen content can lower your rankings in Google: content on a two-month-old PR0 web site with no links is displacing that of a PR5 site which is obviously the original. Plus, I don't see the stolen content ranked in Bing/Yahoo. Any thoughts as to why? Google just has too much self-interest in stolen content.
When contacting adsense/Google I find that short and to the point works best. You could have condensed your message into 2-3 lines and said the same thing and the person reading it would appreciate it since they probably read far too many similar emails a day already.
Anyway... if your content is stolen verbatim, a DMCA is in order, and perhaps a phone call to the offending site's host.
I took a look and for sure your content has been stolen!
One thing I noticed.
I did the supplemental test:
Most of your site's index pages are supplemental. I believe they are indexed by Google as both a subdirectory and index.htm file:
show up in Google's results. Google was really complaining about my searches.
If these pages are the source of links to the hijacked page, Google may not handle this well. Don't get me wrong they should, I think other engines do. I think the thief linked directly from the home page, and perhaps this is a secondary factor.
In fact I'm not sure I'd try to correct this now until you resolve the theft. Just trying to provide any hints I can.
Webmaster Tools might provide a clue if you look at your internal link report?
To all of those helpful folks who PM'ed me with advice, thank you very much.
Since I have decided to go full "open kimono" with this, I have added a link to my initial DMCA in doc format to my Stolen Content page, for those who would like to see it, to get some feedback.
Also, I have not filed a DMCA before, and I thought that there are others here who may be in the same boat.
If you have not PM'ed me on this, and would like to see the DMCA that I prepared, please feel free to PM me.
In all likelihood, Google will index the DMCA in about 7 days, so that everyone can have YET ANOTHER free online example. There seems to be a lot of variation in online example DMCAs. I selected one form, and used it. Whether it is the best or not, I will let you decide.
Thank you all very much for all your help.
Something else I noted.
The thief not only linked to the copied contents from the home page. There is a second link with different anchor text and this link 301 redirects to the copied page. I hope something like this isn't fooling Google.
Something is fooling Google.
|I did the supplemental test: |
Is this test still working with reliable results?
|If these pages are the source of links to the hijacked page, Google may not handle this well. |
I'm not following this...
Are you saying that the original poster's (Sally Stitts) site was hacked / compromised and that after they scraped the content, links were placed FROM the original site TO the scraped content?
Or am I just confused again?
I was confused by that too.
I don't believe that my site has been hacked. I just checked the source code, from my downloaded page. It looks OK.
It was hacked a few years ago, but that was by some foreigners.
No problems from them since - they did it for ego reasons - "look what I can do" type of thing.
Hey Sally, I find your page to be no. 1 for the three-word term, and I don't see the scraper site anywhere near.
But I find a PDF page on your site ranking, and not the one you had shared. I guess Google got confused by the two versions, as both (the HTML and the PDF) seem to be identical, so it may have temporarily brought the scraper site up to the top.
Did you add the PDF version just now, or has it been there for some time?
I just checked, and I find no change. The scraped page still comes up #1 for the 3-word search, using both Firefox and Safari.
Perhaps it is getting fixed, and the change has not rippled over to my results/ area/ whatever. I have great confidence that this SERP will be fixed soon, because it is so obviously wrong.
I will report here as soon as I see any change at my end.
The English .PDF version has been online, almost as long as the .htm page (~5 years?). According to one of my stats packages, it is my #4 most viewed page this month. It consists of the entire page.
I also have a Spanish version .PDF; the link to it was REMOVED from the scraped page. It consists of the entire page translated into Spanish (poorly).
I also have a Portuguese version .PDF, which link was NOT removed by the scraper. It consists of ONLY the large graphic, translated and created extremely well by a very knowledgeable Portuguese fan.
Let me add that I have sent a STRONG Cease and Desist message to the host, to be followed by a DMCA. I had to use the email from whois, since it is the only one available. The scraper site has no contact page, no privacy page, no about page, etc.
I still find yours on top. Guess it is coming back.
I am not sure whether the same three-word term will hold good for the Spanish language as well, but using it in Google Spain, I see you as no. 3 while your scraper is no. 4.
Check out the mail I sent for the results I see.
Another day, another drop. My page is now only 8.6% of my total impressions (8AM PDT), due to the scraper continuing to hold the top spot. Normally, for years, my page has been over 40% of all my impressions.
I just tried again, and they are still in the #1 spot, searching from California using Firefox and Safari.
However, another forum member reports seeing my page at the top again, with the scraper not to be seen. Can this be a regional thing, although there is nothing "Local" about the page? Even if the fix is rippling through the data centers, I should start to see some change in my page impression rate, but the impressions of my page continue to go DOWN.
Google continues to display the newly canonized scraper page, and ignores my de-canonized original page, which has had the top spot for 7 years. AdSense and I continue to lose money, since the scraper has removed the ads.
I have added a GIF of the Cease and Desist email I sent to the host, to my scraped content discussion web page. Full open kimono. Any other approach will not be effective, IMHO. I have nothing to hide, and much to say.
|I was confused by that too. |
I don't believe that my site has been hacked. I just checked the source code, from my downloaded page. It looks OK.
No I don't believe your site was hacked either.
I looked at your problem from two perspectives:
1. What about the thief's site could possibly fool Google into ranking the stolen copy higher than the original.
2. What about your site might bother Google and its newest algos.
So for number 1:
The thief has two links on his home page: one goes directly to the stolen copy of your page, and another goes indirectly to the stolen copy through a 301 redirect. Why he/she would do that I have no idea, other than that Google has had problems with 301 and 302 redirects for as long as I can remember.
For number 2:
Your index pages like:
This page has a cascading 301 redirect. If I type your domain www.example.com/medicine with no trailing slash, the browser (and Google) will see three redirects in a row before it reaches the actual medicine/index.htm page. Google likes one 301 redirect, but I think it has a problem with 3. Each redirected path should go directly to the final page. Perhaps these redirects have been added over the years. This redirect cascade may upset Google's new algos, since 301 redirects used to be a major problem for the Google crawler. So this could be a very long path for Google to take to get to your page that was stolen. (The thief's website has a much shorter path.)
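For anyone who wants to count the hops in a cascade like this themselves, here is a minimal sketch in Python. It is only an illustration: the names `head` and `trace_redirects` are made up for this example, and it assumes the server answers HEAD requests the same way it answers GET. It requests a URL without following redirects, records the status, and repeats with the `Location` header until it gets a non-redirect response.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def head(url):
    """Send one HEAD request without following redirects.

    Returns (status_code, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace_redirects(url, fetch=head, max_hops=10):
    """Return a list of (url, status) hops, ending at the first
    non-redirect response or after max_hops requests."""
    hops = []
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return hops
```

Calling `trace_redirects("http://www.example.com/medicine")` and printing each hop shows the chain. More than one 301 before the final 200 means the redirects are stacked rather than pointing straight at the final URL. The `fetch` parameter is there only so the function can be tried against canned responses instead of a live server.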
This is very difficult to demonstrate but Google has indexed each of your index.htm pages twice. Once through the path (for example):
and directly through this path
This is usually not good.
Perhaps the 301 redirects and the two indexed paths to the index.htm pages have triggered a quirk in Google's algo?
Really it would be great if they'd just keep a real good history of the page's life. (but I think this is harder than it sounds)
I'll bet some of the webmasters here at WW can explain this much better.
[edited by: bumpski at 9:00 pm (utc) on Jan 30, 2011]
|I did the supplemental test: |
Is this test still working with reliable results?
This test seems to have gone away for about a year and I'd say about six months ago it started working again. It's a very useful test to find pages that may be indexed more than once by Google. Please see this thread
In a recent thread here,
I was advised to change ALL of these -
so I did exactly that.
I still come up #1 in Bing and Yahoo for the search phrase.
Did I "fool" Google?
|so I did exactly that. |
I say it's a good idea, especially since you said that you had internal links formed both ways:
If making that particular change has any relation to the ranking hiccups you're experiencing (and I'm not convinced that it would) it should recover as soon as the flow of PR within your site gets fully recalculated.
Even if that is part of the mix, you are wise to go on the warpath about the stolen content. That is a separate problem that calls for action, the strongest action you know how to take.
Sally... first, if you've made the changes correctly, it's very unlikely that they caused the page hijacking to replace you. I think you were very wise to make the changes, and I think the hijacking is a Google problem.
I should add, btw, that I've had a page that I wrote personally and that I know is original, which was on the first page last week as a nested #2 result, zapped by the update. The page now doesn't show up for many exactly quoted searches. Google IMO has not solved the problem of scraping... and there's been some major collateral damage in this update. I see these in other sites I've looked at as well. I hope to have time to post about this.
With regard to the canonical changes, to double-check that you interpreted various recommendations correctly....
In addition to changing the nav links, you also need to do a 301 redirect from index.html to "/". A less preferable alternative would be to use the rel="canonical" tag on your affected pages.
Re the 301, g1smd explains further along in the post that I quoted above why the redirect is necessary....
|...The 301 redirect ensures that anyone that does try to access the other three URLs is redirected to the correct URL before the content is served to them. |
Note that "anyone" here includes Googlebot. The concern here is that if you have inbound links going to different urls for the page, Googlebot needs to be redirected from each of them to the canonical url, so that all the link votes combine and are credited to the chosen canonical version.
Also, I should add, the redirect of "index.html" to "/" cannot be done with a simple "Redirect" directive; it requires mod_rewrite.
Code examples are in this thread, which is also referenced in Hot Topics....
Merging www.example.com/ and www.example.com/index.htm
Also, take a look at this thread for the discussion on why mod_rewrite is needed in this case. It might be helpful....
Split pagerank on index.htm
You should have no problems if you've done the above correctly. You should be observing the url changes in your address bar and also be able to check them with a server header checker.
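To make the expected outcome concrete, here is a rough Python sketch of the mapping the redirect is supposed to produce. It is not the mod_rewrite code from the threads above, just the logic written out so you can compare it against what a header checker reports. The function names, the preferred host `www.example.com`, and the list of index file names are assumptions for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed default document names; adjust to whatever the server really uses.
INDEX_NAMES = ("index.htm", "index.html")


def canonical_url(url, host="www.example.com"):
    """Map a URL variant to a single canonical form:
    preferred host, and folder URL instead of the index file."""
    parts = urlsplit(url)
    path = parts.path or "/"
    folder, _, last = path.rpartition("/")
    if last.lower() in INDEX_NAMES:
        path = folder + "/"
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


def redirect_for(url, host="www.example.com"):
    """Return (301, target) when the URL is non-canonical, else (200, None)."""
    target = canonical_url(url, host)
    if target == url:
        return 200, None
    return 301, target
```

For example, `redirect_for("http://example.com/medicine/index.htm")` gives a single 301 straight to `http://www.example.com/medicine/`, while the canonical URL itself gives `(200, None)`. The sketch does not cover the missing trailing slash on a folder name, because only the server knows whether a given path is a directory.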
The use of the rel="canonical" tag, IMO, is iffier, partially because you can't check Google's implementation, and partially because, if you had non-canonical links to your home page, they may possibly have affected the urls in links to other pages on your site... so you might need to add a correct canonical to every page. I feel there's much more chance of error, and the rel="canonical" tag should be used only if you don't have adequate access to your server.
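If you do go the rel="canonical" route, you can at least check what your own pages declare, even though you can't check what Google does with it. Below is a small Python sketch using only the standard library's `html.parser`. The class and function names are made up for this example. It collects the `href` of every `<link rel="canonical">` in a page's source, so a page with none, or with more than one, stands out.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel can hold several space-separated tokens; compare case-insensitively.
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.hrefs.append(attr["href"])


def find_canonicals(html):
    """Return the list of canonical hrefs declared in the given HTML source."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.hrefs
```

Run it over the saved source of each affected page. Each page should return exactly one URL, and that URL should be the same canonical form your internal links use.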
Here's Google's help page on Google Webmaster Central...
The hijacked content, I believe, is a separate issue. Is this the page that you'd posted about earlier, btw, that wasn't coming up for its exactly quoted title? I'm seeing similar issues with this update, which is why I ask.