| This 89 message thread spans 3 pages |
|My attempt to Recover from Panda|
In full disclosure, I am an adult webmaster. I had two profitable sites get hit. I haven't recovered nearly enough, but I am documenting what I have found.
First site: it has been running for 5 years and is very popular among people who enjoy this umm... activity. It is 100% unique and runs reviews and contests, as well as being a one-stop source for updates in the industry. It is heavy with affiliate links, which I am becoming convinced is why it got hit in Farmer.
Solution: I found some JavaScript code to encrypt the links. Google sees a random string being passed in the onclick event. Short of being able to execute it, it won't see the link.
Result: Tested it on the text links on the home page. Saw some very nominal recovery (could be noise). Began the first posting under this method today and will test for a couple more days to make sure everything is fine with the click-through. Then I will roll it out slowly to the rest of the site.
Site #2. Niche site that has been 1-3 for its main keyword (a competitive one) for 3 to 4 years. New posts every day, 100% unique, written by me. Lost 60% of its traffic when Panda rolled out. For its main keyword it dropped from position 2 to around the 5th page.
Solution: I did an analysis of the pages hardest hit. One I thought was interesting: I had done a parody of made-for-TV commercials, where "widget" was mentioned multiple times in almost all the paragraphs. More interesting, I was not trying to sell said "widget" (it appeared in the picture set). So apparently Panda doesn't enjoy sarcasm; any human reader would have thought it was funny. In any case, I rewrote the page and it recovered OK.
Second page: admittedly thin content. Added one paragraph. It recovered.
A number of my pages that got hit had duplicate titles and meta tags. I have been cleaning them up as I go (or deleting them). So far no change, but Google has not yet seen them (as confirmed in Webmaster Tools).
Also, no changes for the main keyword. If anything, it has gotten worse. I will probably also try the link encryption, since I am convinced it has something to do with affiliate links.
Anyone else have any ideas or findings they want to share?
@danijelzi Unfortunately most of that advice on how to stop scrapers isn't realistic or practical.
In my niche the scrapers do it manually with cut and paste. They hire teams in other countries to create content, and those teams mostly manually copy and paste from all over.
The secret sauce seems to be you can copy anything from 3-4 years or older and re-publish it and rank for it. It works for our competitors and Google ignores all spam reports citing such examples.
So have at it, everyone!
Yes, Google seems to ignore scraped content published years ago. It's the same for us!
@tedster said: >> @shatner If all or part of your thick, 1000+ word articles are available anywhere else and you are not the #1 rank (to check: google them, sentence by sentence, in quotes), then google quite possibly considers them #*$!ty pages. It doesn't matter how original they were when you wrote them and posted them (or how unfair it may be if someone scraped them or copied them, etc.), all that matters is: how original are they TODAY and does google see them as YOURS?
That's what I'm thinking too. Which perhaps suggests that one of the main reasons I am pandalized is that my content has been so widely scraped, since my most heavily penalized pages have the most content?
If that's the case, what do I do? I can't stop the scrapers. This seems like a question Google really needs to answer.
@shatner It was actually I, not tedster, who said that original quote above.
The solution is ugly. You need to redo your content. From what I've seen, it truly is a percentage thing. So, you belly up, and start fixing it, page by page. You won't need to fix them all before you cross back over the threshold and get ranking back. But to be safe, before they increase the threshold again, don't stop when you get ranking back; you'd better just keep going and fix all the content. I also see that once google labels you as using dupe content, they will be more likely to assume all of your content is duped. So, as you start fixing it, once you cross the threshold and come back, I have the impression they will be less likely to assume all of your content is duped, and they will be more likely to give you credit for more of your content.
You need to redo any content that you don't "own" in google's eyes. You said you have 100,000 pages. It sucks. I know what you're going through. I have 2,500,000 pages to fix. I threw 2.4 million under the bus onto new domains. That leaves about 100,000 that are the most business-critical to me. I've got a lot of work to do, and the 10 minutes I spent in this post is maybe 1 or 2 more pages I personally could have fixed. ; )
Again, I said 'using dupe content', not meant to grate on your nerves, but the reality is, how your content is dupe no longer matters. Be pragmatic. The goal is to get ranking back. The debate on what is fair or unfair is moot. Be pragmatic. Go get your ranking back. Redo your content. And this time, as you redo it, try to use some tricks to establish this content as yours before it gets scraped again - twitter it at least. Maybe google will fix this, maybe they don't know how. So, start again, and this time, try to establish credit for each new page of content as you go, rather than wait for a google crawl that may come tomorrow or in 2 weeks, and maybe someone scrapes it and by fluke they get crawled before you do and they get credit for it.
It ain't fair. But fair and a buck gets you no more than a coffee. Screw the coffee, you want your ranking back. Go get it back. Redo your content. Screwing around with ads and color of the background and reading level and a spelling mistake, that's all red herring stuff. If your content is yours, you will get away with any color background you want, and you can spell it 'color', 'colour' or 'kolor' if you want. Be pragmatic. Fix your content. That you can control. How fair google has been or has not been, you can't control that. Content you can control.
Just be pragmatic. Go get your ranking back.
/ end of well-meaning rant.
@helpnow said: >>The solution is ugly. You need to redo your content. From what I've seen, it truly is a percentage thing. So, you belly up, and start fixing it, page by page.
Not really a solution for me, on a 100,000+ page site built up over half a decade. That is literally impossible. To even affect 10% of my content would be a monumental task beyond all measure. You say you have 2.5 million pages, I'm not sure how you can even follow your own advice. Doesn't make sense to me.
How do you propose to basically rewrite that many pages? Any insight there? It's not something you can automate.
Not sure how that helps with scrapers anyway, they'll just scrape the content I replace it with. How does that solve the scraper problem? Confused by your post.
Sorry about confusing you with tedster. :)
>>>And this time, as you redo it, try to use some tricks to establish this content as yours before it gets scraped again - twitter it at least.
Twitter as in tweet links to it as soon as it's posted? Doing that. Have been for years. No help. Any other realistic suggestions on how to establish your "scraped" content as yours?
I'm not worried about fair or unfair, just your advice doesn't seem practical or doable. What am I missing?
@shatner What choice do you have? It is a threshold. So you don't have to fix 100,000. Maybe fixing 10,000 will be enough. Dunno. And of course, for some of it, google got it right, and has given you credit. Those are fine. And for some, maybe no one scraped them - those are fine. So maybe on 100,000, you've got issues on 20,000. Dunno. And maybe all you need to do is fix 25%, or 5,000. Dunno. But what choice do you have? You get into the trenches, and you start slugging it out, step by step, page by page. You did 100,000 in 5 years. So, 5,000 will take you a lot less time. You have no choice. This will not fix on its own.
Again, I am speaking frankly, and as a friend. 2 guys over a beer, no BS, no waxing poetic, no tiptoeing around egos. It is the brutal truth. I don't care if it is a popular thing to say, it is the truth. Be pragmatic. Fix your content. It is your ranking, go get it back. The solution is easy, the effort is huge.
There are two types of scraping: scraping using your RSS feed, and screen scraping, where they download the HTML after they hit your page.
The RSS scraping is easy to solve - just turn the feed off, or at least set it to "short" so that they only get the first paragraph.
Regarding screen scraping - take a look at this page for info on how to deal with it
P.S. The IP address ban method described is slow (scrapers use proxies, and it's only after a period of several years that you nab most of them).
So you still have to do what helpnow said and start changing and amending the content on your site. Don't put it off, otherwise you'll be there for years.
In addition to tweeting your new content URL you should use PubSubHubbub as suggested in the video [youtube.com...] and file a bunch of DMCAs with Google. I know they are not very effective, but they are easier than creating new content. In particular, make sure whatever were your biggest landing pages have all-new content that was tweeted and pushed via PubSubHubbub as soon as it was posted.
If that's the solution, don't you think it more practical to simply start completely over? Better to move forward with new content on a different site than to waste time endlessly trying to fix old content. Don't you think?
That's why I'm saying your solution isn't really a solution at all. Actually it's kind of a waste of time.
Again, no offense. Not trying to bash your idea. Just talking. I mean you may well be right that the only way to re-rank is to completely rewrite 100,000 old pages, but my point is that if that's the only solution, then the truth is that there IS no solution and you should give up and start over.
Don't you agree?
I'm not faulting your analysis of the problem, but to me your solution isn't a solution. It's basically the same as saying to do something else for a living.
You say you have millions of pages you have to rewrite, how are you going to approach this? Obviously you cannot rewrite two million pages.
>>So you still have to do what helpnow said and start changing and amending the content on your site. Don't put it off, otherwise you'll be there for years.
Why is pulling it off worse than changing it?
Also again, I maintain that anyone who is in this position and thinks the only way to recover from the Panda penalty is to rewrite 10,000 pages by hand... is better off quitting, getting a completely new domain and starting a completely new site.
It's not a logical solution. I don't understand why anyone thinks it is. Am I being pranked? A solution which is impossible to execute is, by definition, not a solution.
Shatner, I'd rather just delete those 10k pages and slowly build new, content-rich, quality pages. Too many people start new domains/websites like that's the answer. Meanwhile they stick with the same practices that got them penalized to begin with, and all they did was waste a bunch of time.
|Also again, I maintain that anyone who is in this position and thinks the only way to recover from the Panda penalty is to rewrite 10,000 pages by hand... is better off quitting, getting a completely new domain and starting a completely new site. |
You could always hire someone to help you (or persuade your spouse, siblings and parents to help). Or you could start a new site. Or do both.
What you shouldn't do is simply wait and while away your time on forums complaining about G. I mean that kindly. Since Panda, you say you've made changes in the first week, and then? Don't waste any further time. Pick a course of action and stick to it.
Given how fed up you seem with the state of your current site, perhaps work on a new site and then come back to your old site in two months time with fresh eyes. But do something.
@Shatner probably my last post.
First, it is a threshold. You won't have to fix 100% of your problems to recover.
Second, by fixing these issues, you will be stronger than before. Your whole site wasn't punished before, but these dupe pages never ranked before anyway. Fix them, they will rank where they once never did, and you will rescue all the pages that did rank before. You will be stronger.
Third, for myself, I threw away 2.4 million URLs by removing them from my coveted domain onto a couple other domains, and no-indexing the whole lot. They were all scraped anyway from more than 10 years ago when I sold books. I walked away from those 2.4 million URLs, they haven't put serious bread on my table anyway for more than 7 years. So, I don't have 2.5 million URLs to worry about, my new starting position is 100,000, and tons of that is safe (unique products, unique content, ranking #1 for the content). But I still have lots of dupes in there too, some my fault, some unfair, but who cares - I have some problems. They will be fixed.
Fourth, your domain may have intrinsic value. It may be pandalized, but it may still have value beyond what a new domain may have. This is not a penalty per se, this is recoverable - it is re-ranking. The value of your old domain far outweighs the value of any new start-again domain, especially if you have 100,000 URLs on it. Links, age, branding, etc.
Fifth, I am not pranking you. I am paying it forward. I have been helped in the past by members of this forum way beyond the help I could ever repay to anyone. But my altruism only runs so deep. ; ) I too must be pragmatic. ; ) I have to feed my family, and it is not lost on me that the more people who take my advice to heart, the more competition I will eventually have. ; ) Half joking, but, half serious too... ;) So, I have paid a tiny fraction of the forward I need to pay here by opening my mouth now, but... you can lead a horse to water, but you cannot force it to drink. ; )
Sixth, you say it took you 5 years to do 100,000 pages. Surely, you do not need to redo all 100,000 pages? Meanwhile, some of those pages are still creating some revenue for you. You are not down to 0%. I would rather start from 50% down, and work my way back up, than try to restart at 0% and try to work my way back up to my former 100%. I also find it easier to rework content than to start with a completely blank canvas. I do not relish the thought of rewriting 5,000 pages of content, but I cannot fathom the thought of recreating 100,000 pages of content.
Seventh, everyone's situation is different, but my domain is more than 10 years old, and it is a "brand". I am also insulated in that I have multiple domains older than 10 years. As much as the task before me is daunting, killing this particular domain is not an option when I weigh the pros against the cons and consider all the value this domain has. Much easier to rework some content than start from scratch. My infrastructure is awesome, I just need to do a search-and-destroy on dupe content - this is a much narrower task. I'd rather spend 3 months rewriting 5,000 pages on an established domain than spend 5 years recreating 100,000 pages from scratch AND start a domain from scratch.
In summary, I've done a small pay-it-forward here. A tiny, minute fraction of what I owe to all the white knights who've helped me in the past. ; ) For you, and for the multitudes of lurkers. For some I will prove to be the voice of reason, they will listen and consider, take a long walk to sort out their thoughts, then return and get organized and get down to work, and then thank me for it silently weeks/months from now. Others, well... I cannot save the world.
I wish you and everyone else all the best.
As for myself, I need to get back to work.
Very nice post, CJH - thank you. I hope we do continue to see you around.
|Surely, you do not need to redo all 100,000 pages? |
Exactly. You can use your stats to locate the URLs that are hurting you the most and address them first.
I believe, unless all the content is completely mash-up and spun database garbage anyway, that the threshold can be reached a lot sooner than you might imagine. And for most people who are posting here, I'm pretty sure that their content isn't spun-up garbage, or else there would be little problem in just forgetting about the loss and moving on.
@helpnow Please don't think I do not appreciate your post and the well stated thoughts. I really do. I'm just trying to point out the real world application of them. :)
@tedster >>You can use your stats to locate the URLs that are hurting you the most and address them first.
I think for a lot of people here that's part of the problem, it's really hard to identify the worst URLs. As I've said before here, there doesn't seem to be any pattern to it, which makes it really difficult to identify the worst hit. If I just go by the ones which were de-ranked the worst, that doesn't help, because I can't find any problem with those pages. It's not like the worst hit are thin content, you know?
So Ok, I rework the worst 1000 pages. But if the worst 1000 pages are all pages with 2000 word articles and no ads on them, then that's where it becomes very confusing to follow this advice. So what is my goal then? Just randomly reword those articles? That doesn't make sense and seems unlikely to help.
Hope I'm explaining myself there. :)
@Alyssa >>>What you shouldn't do is simply wait and while away your time on forums complaining about G.
Please don't assume just because people are here discussing things and asking questions that they're doing nothing.
I'm pretty sure that's not the case.
|But if the worst 1000 pages are all pages with 2000 word articles and no ads on them, then that's where it becomes very confusing to follow this advice. So what is my goal then? Just randomly reword those articles? That doesn't make sense and seems unlikely to help. |
Is that really your situation, or is it just a "what if" hypothetical? If it really looks like that, then it doesn't sound very promising, I must agree.
I assume you do have at least some pages that fit your description anyway. I'd just put them to the side and start in on the pages that do have obvious problems. I'd make sure the pages have good semantic mark-up too, to signal the quality of the writing and give it strong relevance signals.
|I think for a lot of people here that's part of the problem, it's really hard to identify the worst URLs |
Using analytics to compare a month before and after Panda easily shows the traffic drops highlighting which pages have problems - kind of web 101 stuff really.
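A minimal sketch of that before/after comparison, assuming you have already exported landing-page visits for two equal-length periods into plain dicts. The function name, the `min_visits` cutoff and the sample URLs and numbers are all hypothetical, made up for illustration:

```python
def traffic_drops(before, after, min_visits=50):
    """Return (url, visits_before, visits_after, pct_change), worst losses first.

    `before` and `after` map landing-page URL -> visits for two
    equal-length periods (e.g. a month pre-Panda and a month post-Panda).
    `min_visits` is an arbitrary cutoff (an assumption) that skips pages
    with too little traffic to judge.
    """
    rows = []
    for url, old in before.items():
        if old < min_visits:
            continue
        new = after.get(url, 0)  # a page missing afterwards counts as zero visits
        pct = (new - old) / old * 100.0
        rows.append((url, old, new, round(pct, 1)))
    rows.sort(key=lambda r: r[3])  # most negative change first
    return rows


if __name__ == "__main__":
    before = {"/widgets": 1000, "/gadgets": 400, "/about": 20}
    after = {"/widgets": 400, "/gadgets": 380}
    for row in traffic_drops(before, after):
        print(row)
```

Sorting by percentage change rather than absolute visits is a design choice: it surfaces pages that lost most of their traffic even if they were never top pullers, which is the case described later in the thread where the top 25 to 50 pages barely change order.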
IncrediBill, in my case, analytics doesn't help a lot. Even after Panda, there is no big change in the top 25 to 50 pages. They are still the major traffic pullers. That was a surprise to me. They don't rank as well as they used to and they are getting much less traffic now.
But I used GWT to find out the top pages that had gone down. Though the site was affected on Feb 23, I chose dates between April 11 and 17, as April 12 was the international rollout and there was a slope between April 11 and 12.
One page is showing a decline in avg. position of -300 and it is followed by pages with -200 (1), -80 (1), -60 (2), -50 (4), -40 (2), -30 (1), -20 (8) and so on. The number within brackets represents the number of pages.
Interestingly, many of those pages which showed a decline of -300, -200 etc., have been copied by many sites and forums. I confirmed this by just picking a sentence and doing a search in google without quotes. There are several sites that show the same text. Many of these sites look to be spammy and a few are set up on free hosting platforms like blogspot, wordpress etc.
Rant - Free hosting platforms like blogspot should be immediately closed if google is really serious about web spam.
For most of those searches, I do see my site to be on top, though several have copied it. But there are a few posts for which copycats rank higher!
Is google classifying those pages as poor quality, because the content is found at many other places? Or is it those few pages where the copycats rank higher the real problem?
For one page I noticed this too.
I had used a sentence, which starts like this.
|However, here is a free.... |
When I did a search google spell checker suggested a correction.
|Did you mean: However, there is a free... |
Anyone can easily simulate the above by just using those five words.
Could these be triggering quality issues? (Does google even suggest grammatical corrections?)
Can you all please share your thoughts on this?
|Interestingly, many of those pages which showed a decline of -300, -200 etc., have been copied by many sites and forums. |
Exactly the same thing happens to me. Now I'm modifying and adding content to some of the articles that were scraped, maybe it will help.
Shatner, suppose you start with a new domain. By the time it's indexed Google might slap it with the same penalty, so you will have lost a lot of time, and by then you might want to jump off a bridge ;)
Google is not reacting on Panda 1 victims as far as I can tell, so many problems may have been fixed but we don't know yet.
It suggested an alternative; "here is a free" and "there is a free" are both grammatically correct.
The "did you mean" tends to push whichever similar search term or phrase is currently most commonly used, which is why it can take a while to get it to stop suggesting alternative domain names if you launch one and Google has more searches for a similar-sounding or similarly spelled one. You have to train it ;-) and if you can get others to do so, that helps too.
But if you are using a really uncommon spelling because of "branding", or because the one you wanted is taken, sometimes it will never "take" (unless you make superhuman efforts, throw a great deal of money at the problem, or your spelling or phrase goes "viral"). It is like trying to push water uphill, or herd cats, or even like telling webmasters for months that too many ads and intrusive ad placements and types of ads were playing a large part in Panda ;-)
Leosghost, thanks. I mentioned it as a "correction" as Google usually does it with typos. I am surprised that a common phrase like "here is a free" triggers an alternative suggestion. As spelling and grammar are both related to quality, I always thought Google might use the same tool to check pages for these kinds of errors.
What are your thoughts on the first question?
|Is google classifying those pages as poor quality, because the content is found at many other places? Or is it those few pages where the copycats rank higher the real problem? |
|about 4.000 pages on one site and 2.000 on the other! we are going page by page. I personally removed about 20 already with thin content. |
I am planning on doing some Panda Purification and I am sure I will be needing to remove pages. Normally I would 301 them or meta refresh.
Can I ask which would be best:
Just delete and pick up with 404
.htaccess and 301 it somewhere/homepage
Also, may I ask what would be an ideal number of words in a page/article? I don't want to write too little and, by the same token, not too much.
Thank you all
301 redirect means you have a replacement page for the one they ended up on, it shouldn't be used to send people to the homepage because there is no more content on that page. That's just confusing for visitors and search engines. That's what 404 is for.
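As a rough sketch of that rule (not a server configuration), the decision for a removed URL could be written as below. The function name and the `replacements` mapping are hypothetical; in practice the same logic would live in your server or .htaccess rules:

```python
def response_for_removed_page(path, replacements):
    """Pick an HTTP status for a URL whose page has been removed.

    `replacements` maps an old path to the path of a page that genuinely
    replaces it (hypothetical data you maintain yourself). Only those get
    a 301; everything else gets a 404 instead of a redirect to the homepage.
    """
    target = replacements.get(path)
    if target:
        return 301, target  # permanent redirect to the real replacement
    return 404, None        # no replacement exists: say so honestly


if __name__ == "__main__":
    replacements = {"/old-widget-review": "/widget-review-2011"}
    print(response_for_removed_page("/old-widget-review", replacements))
    print(response_for_removed_page("/thin-page-17", replacements))
```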
On the above comments re: blocking the IP of scrapers...
There are two primary scrape methods. A "lazy" scraper will attempt to scrape using a single (maybe two or three) IPs. Since most of this class are likely to be from a server farm, blocking the complete server IP range will kill the spammer (repeat for all known server farms). In some cases scrapers may operate from a broadband IP, in which case you can block the offending IP (assuming the IP is "static") or lay a complaint to the broadband provider stating thus and thus; if that fails, block the complete IP range (which may or may not result in lost "real" visitors). If your site is big then broadband scrapers may not be practical due to the ISP's bandwidth restrictions.
The second class of scrape is run through a botnet. These are cheap to rent (though illegal to run). The advantage to the bot runner is that the scraper takes each page using a different IP. I don't KNOW for certain but I think I've had a few attempts at this recently, starting the same day as Panda 2 hit the UK (coincidence? Maybe). This kind of scrape is more difficult to block unless you know how to detect bots; more info on this in the WebmasterWorld forum Search Engine Spider and User Agent Identification, which EVERYONE with a web site should read and digest! It doesn't tell you everything (otherwise scrapers would know how it's done) but it does point you in the right direction.
How you detect the IPs to block is up to you; there are a few methods in common use.
None of this will guarantee scrape-free sites but it should reduce scraping. You just have to pay attention.
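One possible way to spot the "lazy" single-IP scraper described above is to count requests per client IP in your access log and then check addresses against the ranges you have decided to block. This is only a sketch: it assumes the client IP is the first whitespace-separated field on each log line (as in the common/combined log formats), the threshold is arbitrary, and the addresses are reserved documentation ranges, not real scrapers. It will not catch a botnet scrape, where each page is taken from a different IP.

```python
import ipaddress
from collections import Counter


def heavy_hitters(log_lines, threshold=100):
    """Return {ip: request_count} for IPs with at least `threshold` requests.

    Assumes the client IP is the first field of each log line.
    """
    counts = Counter(
        line.split(None, 1)[0] for line in log_lines if line.strip()
    )
    return {ip: n for ip, n in counts.items() if n >= threshold}


def is_blocked(ip, blocked_ranges):
    """True if `ip` falls inside any CIDR range in `blocked_ranges`."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(r) for r in blocked_ranges)


if __name__ == "__main__":
    sample = ['203.0.113.5 - - [x] "GET /page HTTP/1.1" 200 512'] * 150
    sample.append('198.51.100.7 - - [x] "GET / HTTP/1.1" 200 512')
    print(heavy_hitters(sample))
    print(is_blocked("203.0.113.5", ["203.0.113.0/24"]))
```

Raw request counts will also flag legitimate crawlers, so treat the output as a list of candidates to investigate rather than a block list.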
Please note that SCRAPERS are people or bots that steal your content. SCRAPPERS, on the other hand, are people who fight. :)
>>>Is that really your situation, or is it just a "what if" hypothetical? If it really looks like that, then it doesn't sound very promising, I must agree.
That's really my situation. No hyperbole.
I went through the Webmaster Tools reports again last night, just to re-verify. Of the top 100 pages most heavily deranked, many of which were -300, 95 had more than 500 words on them. Many had large numbers of inbound links as well, when I spot-checked. Some were heavily scraped, others were not (which kills the duplication theory).
Ironically, when I look at the handful of pages which gained, a large percentage of them are the least original, least content rich pages on the site.
ok, A little update
Site Number 1 (hit by Farmer). Did 3 full posts with the new encrypted links, plus all affiliate links on the home page. Result: no change. Will start rolling out the changes site-wide this week.
Site Number 2
The rewritten page definitely came back and is holding strong across two sample periods. Further changes
Changed duplicate titles for another bunch of hit pages (all had the same title) Result - Nominal recovery
Added a paragraph to one hard hit page Result - Nada
Rewrote another hard hit page Result - Nominal recovery
And the saga goes on
|The rewritten page definitely came back and is holding strong across two sample periods. Further changes |
Hey buddy - excuse my ignorance, but is there a method you use to find which pages may need complete revamping vs. others? I have so many pages on so many sites I just don't even know where to begin.
thanks in advance...
Your friend in being raped by panda
There was this Excel add-in I downloaded that allows you to download Google Analytics data. So I take a snapshot of the keyword, source, and landing pages for a particular period of time pre-Panda and compare them to a like time period post-Panda.
Then I go page by page and compare. Some I can't figure out, but for a considerable number of them it's obvious why the page lost its ranking.
I would give you the name of the tool but I think it's against the TOS here. If you search a bit you will find it. Handy little thing.
@lty83 - If you are not getting credit from google for the content, it is as good as not yours anymore. Take every sentence of content, sentence by sentence, and search them in quotes at google. If you come up #1, it's yours. If you are not #1, it's not yours, and you need to change it. Some content may have been scraped in snippets, so you may find, for example, that the first 4 sentences are lost, but the rest of the article is still "yours". So, you just need to change the first 4 sentences, and the rest is OK to leave as you are getting credit for it. Bloody brutal process, but there is no fail-safe substitute. I don't believe in using WMT data, as my experience has been that it is FUD and incomplete and deliberately misleading and out-of-date. The data in WMT simply does not match up to the SERPs, in my experience - your experience may differ. So depending on WMT data to find pages that dropped the most is a red herring in my opinion. Go to the source - go to the actual content, and check it.
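To take some of the drudgery out of the sentence-by-sentence check, a small helper can at least prepare the quoted queries for you to paste into a search by hand. This is only a sketch: the sentence splitting is naive (it breaks on ".", "!" or "?" followed by whitespace), the `min_words` cutoff is an arbitrary assumption to skip sentences too short to be distinctive, and it deliberately does not query any search engine.

```python
import re


def quoted_queries(text, min_words=6):
    """Split `text` into sentences and return each as a quoted search phrase.

    Sentences shorter than `min_words` words are skipped because they are
    unlikely to be unique to one page.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    queries = []
    for sentence in sentences:
        cleaned = sentence.strip().replace('"', "")  # inner quotes would break the phrase
        if len(cleaned.split()) >= min_words:
            queries.append(f'"{cleaned}"')
    return queries


if __name__ == "__main__":
    article = (
        "Short one. This sentence has more than six words in it. "
        "Another long sentence that is easily checked by hand!"
    )
    for query in quoted_queries(article):
        print(query)
```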