| 11:52 pm on Mar 23, 2011 (gmt 0)|
Just looked at my raw logs, and Google is trying to fetch pages that were deleted long ago. This, to me, is a sign of a deep, deep crawl. We should know soon whether this data is for Panda II or comes after it.
| 3:52 am on Mar 24, 2011 (gmt 0)|
Well it is the one month anniversary. Let's hope they are doing a second round of crawling and re-shuffling/penalizing. We have made some significant improvements and would welcome a new deep crawl.
| 1:30 pm on Mar 24, 2011 (gmt 0)|
|We have made some significant improvements and would welcome a new deep crawl. |
How did you know what improvements to make? Since Panda is far from anything logical, you could fall even deeper because of what you think is an improvement.
| 1:58 pm on Mar 24, 2011 (gmt 0)|
We're all about deep backlinks, it's essentially the only kind we attract. I've worried that this is part of the problem, since our home pages are basically just place-holders with links to major sections. Since we never had any interest in brand building, all of our pages have simple html file names, and all of them are stand-alone subjects or solutions. Between our two sites that drew around 10,000 a day from search and got hit, I think there was exactly one page that terminated with "click here to continue" simply because it was so long with so many photos I was worried it would take too long to load.
So our material always drew deep links, and I believe that's one reason we did so well in our subject areas. The smaller site has interior pages with PR=6, while the home page is PR=4. But it's nonstandard design for modern websites. And it may be that Google is now looking at links to the home page of a site, like Amazon or Sears, as a sign that those are successful brands of "quality".
| 4:59 pm on Mar 24, 2011 (gmt 0)|
I see a number of people reporting a return to normal SERPs or better in BLF today. I'm not seeing it at all though. Anyone seeing improvement today?
| 5:06 pm on Mar 24, 2011 (gmt 0)|
Of the 111 keywords I track, 77 are up from a week ago and 34 are down. I've had some server issues over the past 24 hours, so it's hard to tell if there is any improvement in traffic yet.
| 5:22 pm on Mar 24, 2011 (gmt 0)|
Chrisv - We made improvements that were mostly technical/indexing issues. Some we know we'll take a short-term hit on; others, we have no clue about the impact.
These were all also aimed at improving the user experience when and where visitors land on our site.
So given the option of doing nothing and accepting our penalty or making the site better... I'll choose option B.
| 5:31 pm on Mar 24, 2011 (gmt 0)|
|I see a number of people reporting a return to normal SERPs or better in BLF today. I'm not seeing it at all though. Anyone seeing improvement today? |
PjMan, looks like we had a sort of shuffling yesterday that went unnoticed for many. My changes were to add some content to most pages and delete all the thin ones.
| 6:31 pm on Mar 24, 2011 (gmt 0)|
We saw some slight improvements yesterday. But oddly, even sites unaffected by Panda saw the same improvement.
| 6:40 pm on Mar 24, 2011 (gmt 0)|
I haven't seen any real change in what I look at.
| 6:41 pm on Mar 24, 2011 (gmt 0)|
|My changes were to add some content to most pages and delete all the thin ones. |
I blocked all the thin pages while I review and delete/add new content. I still have 29,000+ indexed pages. I have seen zero improvement on my site-wide Panda penalty. That was 3 weeks ago.
Keywords across the board are down 3-7 positions, mostly from page 1 to page 2. My WMT account has looked like red death since Feb. 24.
| 7:14 pm on Mar 24, 2011 (gmt 0)|
Pjman, if your site is the teaching one, then if I had to bet, I'd say Google hit you for thin content. So your guess is probably the right one.
To speed things up, maybe you should send Google the sitemap again and LET them crawl the noindexed pages. Blocking them via robots.txt takes much longer, IMO.
| 7:21 pm on Mar 24, 2011 (gmt 0)|
Pjman, I looked at BLF and I would take those threads with a grain of salt.
I am seeing the spam pages that everyone is talking about on the 2nd page for many terms; yesterday Google burped up some nasty results all propped up by spam links.
| 8:22 pm on Mar 24, 2011 (gmt 0)|
Thanks for the advice on blocking robots. I'll remove it and go straight "no index".
Should I also go "no follow" too?
I think you're right about the BLF threads. They are usually black hatters over there, so their sites will probably be far from the norm.
| 8:41 pm on Mar 24, 2011 (gmt 0)|
|Thanks for the advice on blocking robots. I'll remove it and go straight "no index". Should I also go "no follow" too? |
I would leave it as follow. Right now Google follows the links but ignores the content in scoring. Also, you may want to ping some pages to Google -- send a sitemap, link to the noindexed pages, or something to speed it up.
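For anyone setting this up, here's a rough sketch of a checking script (plain Python, just my own idea, not anything Google provides) for verifying that pages actually carry a noindex,follow meta tag before you ask Google to re-crawl them:

```python
# Minimal sketch: confirm pages are set to noindex but still allow link
# following, before pinging Google to re-crawl them. Standard library only.
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the content of any <meta name="robots"> tag."""
    def __init__(self):
        super().__init__()
        self.robots_content = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.robots_content = (attrs.get("content") or "").lower()

def is_noindex_follow(html):
    """True if the page is noindexed but links are still followable."""
    parser = RobotsMetaParser()
    parser.feed(html)
    directives = {d.strip() for d in (parser.robots_content or "").split(",")}
    return "noindex" in directives and "nofollow" not in directives

page = '<html><head><meta name="robots" content="noindex,follow"></head></html>'
print(is_noindex_follow(page))  # True
```

Run it over the pages you're about to resubmit; anything that comes back False still needs the tag fixed.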
| 9:49 pm on Mar 24, 2011 (gmt 0)|
A thousand thanks. Love this forum; I've found a new home. BLF, DP, and WF were all just people looking for shortcuts. I see here the culture is about being proactive and actually helping one another out. I'll pay it forward, I promise.
| 10:37 pm on Mar 24, 2011 (gmt 0)|
OK, I'm having a hard time deciding what to do. I've spent an entire day writing about 900 words on the features of Acme widgets, including an explanation of how the much-praised mechanism works. I included detailed photographs of the inside of the widget, showing the mechanism. I think it's good quality work.
I've added it to the main Acme Widgets page. Considering what I found with pages I have for another brand of widgets--the pages I mentioned in my second-last post--I'm seriously debating putting this little article at the bottom of each of the ~100 pages of individual models. That's what the pages for the other brands of widgets have, and they've done much better than the rest of my site.
By adding 900+ words to thin content pages, the duplicate content ratio goes up to 90+%. The widgets pages I mentioned in the earlier post are 80+% similar.
Would you do it?
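For what it's worth, here is roughly how I put a number on "percent similar" between two of my own pages -- a crude sketch using Python's standard difflib, certainly not how Google measures it:

```python
# Rough sketch of the "percent similar" figure mentioned above, comparing
# word sequences with difflib. Real duplicate detection at search-engine
# scale surely works differently; use this only to compare your own pages.
import difflib

def similarity(text_a, text_b):
    """Return a 0-100 similarity score between two pages' visible text."""
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    # autojunk=False so frequently repeated boilerplate words still count
    matcher = difflib.SequenceMatcher(None, words_a, words_b, autojunk=False)
    return round(matcher.ratio() * 100, 1)

boilerplate = "Acme widgets feature a patented internal mechanism. " * 50
page_one = boilerplate + "Model X100 holds two liters."
page_two = boilerplate + "Model X200 holds five liters."
print(similarity(page_one, page_two))  # well above 90 for these pages
```

Dropping the same 900-word block onto 100 model pages would push every pair's score up like this, which is exactly the worry.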
| 10:55 pm on Mar 24, 2011 (gmt 0)|
@dickbaker If I understand you correctly, you want to take a 900 word article and drop it onto 100 of your pages? You want to duplicate the same 900 words onto 100 pages? If so, no, not in a million years, don't do it!
| 12:30 am on Mar 25, 2011 (gmt 0)|
Why are people posting about so much non-Panda related stuff in here? Really making a mess of what was a useful thread. Any chance the mods can clean it up, move some of this stuff out?
| 2:47 am on Mar 25, 2011 (gmt 0)|
Shatner, I don't know if you're referring to my post as "non-Panda" or not, but they're definitely Panda-related.
It's tempting to see what, if anything, would happen if I gave those hundred or so pages a ton of content, albeit duplicate content, but it's too risky.
What's funny is that the pages I mentioned in the earlier thread have tons of dupe content that make up the "non-thin" content. Those pages are doing twice as well as hand-written thin pages.
| 3:10 am on Mar 25, 2011 (gmt 0)|
If duplicate content is a part of the Panda update, then we are in big trouble.
I was at the SES New York conference today and attended a session about duplicate content. One of the speakers was a Google engineer. The advice she gave to let Google know that you are the original owner of the content was that you should ask the site that copies your content to link to the original article on your site. That way Google would know what the original is that is supposed to rank higher.
Problem 1: Doesn't this mean that Google is not able to identify the original on its own? Scary! It would explain why this Panda update is such a mess.
Problem 2: I don't think we will be very successful when we ask copyright violators, scrapers, thieves, ... to link to the original on our site.
She seemed very uncomfortable when the Panda update was being mentioned. I don't think they are very happy with the new algo and hopefully we will see some changes soon. The Google engineers are smart people and I'm sure that sooner or later they will realize that Panda is far from an improvement.
| 3:26 am on Mar 25, 2011 (gmt 0)|
Duplicate content comes in two very different flavors: cross-domain and same-domain. Then those two flavors each have sub-flavors.
Cross-domain we've got syndication (including quotation) and scraping.
Same-domain we've got intentional duplication and technical accidents, such as canonical issues.
So we need to think very clearly in this area and not just talk about "duplicate content". You can bet that Google doesn't do that.
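To make the "technical accidents" flavor concrete, here's a sketch of normalizing URL variants so you can spot accidental same-domain duplicates in your own logs. The normalization rules are illustrative assumptions on my part, not anything Google has published:

```python
# Sketch: collapse URL variants (www vs non-www, trailing slash, tracking
# parameters) to one canonical form, to find accidental duplicate URLs.
# The rule set here is an assumption for illustration, not Google's.
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}

def canonicalize(url):
    scheme, netloc, path, query, _fragment = urlsplit(url.lower())
    if netloc.startswith("www."):
        netloc = netloc[4:]
    path = path.rstrip("/") or "/"
    kept = [(k, v) for k, v in parse_qsl(query) if k not in TRACKING_PARAMS]
    return urlunsplit((scheme, netloc, path, urlencode(sorted(kept)), ""))

urls = [
    "http://www.example.com/widgets/",
    "http://example.com/widgets?utm_source=feed",
    "http://example.com/widgets",
]
print(len({canonicalize(u) for u in urls}))  # 1: all three are the same page
```

If a crawl of your own site produces fewer canonical forms than raw URLs, that gap is your accidental same-domain duplication.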
This new algo is not something that Google built once and will live with from here on. They NEVER work that way; they iterate and iterate and iterate, rather than aiming for perfection right at launch. Google is guided by long-term vision, not short-term actions that merely favor the immediate or expedient. In this case, they ran the algorithm, found it agreed 86% or whatever with their human input, and decided that was good enough for a first step. And yes, then they were talking about "layer 2" almost immediately.
If your site took a hit and you are really confident that you have an excellent offering that fell into that 14% mis-match area, then I'd say keep improving for your users, and not for Google. It is Google's job to recognize what your visitors already see in your site.
But if you don't have that certainty, then I'd focus on the areas that Matt and Amit described [webmasterworld.com] - even telegraphed - to us. I would not chase after Panda based on anything else right now. Not any article from any industry "authority", and certainly not just any old post on a forum somewhere.
Why do I say this? Because it's clear to me that no one actually knows anything for sure right now. If we chase after what we think Panda is in this moment, then very soon it will have shifted... and shifted again. Better to understand what Google wants the algorithm to be measuring and do that. And what Google is describing sounds to me like what visitors want, too. This is how I am guiding my own work and my clients.
[edited by: tedster at 3:54 am (utc) on Mar 25, 2011]
| 3:29 am on Mar 25, 2011 (gmt 0)|
|She seemed very uncomfortable when the Panda update was being mentioned. |
Maybe because she doesn't know yet what the Panda algo is all about ;)
| 3:48 am on Mar 25, 2011 (gmt 0)|
@tedster: for same-domain, I'd add thin content to intentional duplication and technical accidents such as canonical issues.
To a bot, a bunch of pages that differ by just a few words are basically the same, and thus duplicate content.
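A sketch of what "basically the same" might look like to a bot, using the textbook shingling idea (Google's actual method is unknown to us; this is just the standard technique from the literature):

```python
# Near-duplicate check via overlapping 4-word shingles: pages that differ
# by only a few words share almost all their shingles. Illustrative only.
def shingles(text, k=4):
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(text_a, text_b):
    """Jaccard overlap of two pages' shingle sets, 0.0 to 1.0."""
    a, b = shingles(text_a), shingles(text_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

words = ["word%d" % i for i in range(100)]
page_a = " ".join(words)
words[50] = "changed"
page_b = " ".join(words)
print(round(jaccard(page_a, page_b), 2))  # 0.92: one word changed, near-duplicate
```

With a hundred-word page and a single word swapped, the overlap stays above 0.9 -- which is why a template page with only the model number changed looks like a duplicate.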
| 5:07 am on Mar 25, 2011 (gmt 0)|
Tedster, I understand that Panda right now is just a snapshot of the moment, and not a picture of the future.
However, in my case, I have dozens of pages of content that I scraped from the manufacturer a few years ago, content that's been copied by others all over the internet, and I've repeated that content several times across those dozens of pages. If I measured all of them for dupe content, I'd bet that 60-70% of them are duplicates of other pages in the same group.
The bizarre thing is that the duplicate content doesn't seem to matter. These pages were hit only half as hard as pages with original content. The difference is that there's more content on the scraped pages.
This is cross-domain as well as same domain duplicate "fat" content, and it's being favored over original but thin content.
| 5:19 am on Mar 25, 2011 (gmt 0)|
I have seen many examples of that kind of thing, Dick. Whatever role dupe/scraped content may be playing in Panda, it seems to be quite minor.
From what I've been reading, there is little mention of any actual text analysis - semantic richness, reading level, even spelling and grammar. It seems to me that any attempt to measure quality would be using at least some of this kind of measurement.
We know Google has reading level in place - it appeared within the past year, while Panda was already in development. We know they have a highly refined phrase-based indexing system. Could this be one reason why so many sites with UGC have been dinged?
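For anyone curious, the reading-level idea can be approximated with the classic Flesch reading-ease formula. The syllable counter below is a rough heuristic, and none of this claims to be Google's implementation:

```python
# Crude stand-in for a reading-level signal: Flesch reading ease.
# Syllables are estimated by counting vowel groups, a rough heuristic.
import re

def count_syllables(word):
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text):
    """Higher scores mean easier text; classic Flesch formula."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    n = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (n / sentences) - 84.6 * (syllables / n)

simple = "The cat sat on the mat. The dog ran."
jargon = "Multisyllabic terminology diminishes comprehensibility unquestionably."
print(flesch_reading_ease(simple) > flesch_reading_ease(jargon))  # True
```

If text analysis is in the mix at all, something like this would separate a plainly written page from a wall of jargon -- or from badly mangled UGC.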
| 5:37 am on Mar 25, 2011 (gmt 0)|
|It seems to me that any attempt to measure quality would be using at least some of this kind of measurement. |
Remember the discussions of people getting away with gazillions of ROS links while others got slammed by Google? The same thing might be at play here. Some sites (with 'immunity') get away with dupes and thin content; others do not.
A lot of sites with user content (forums, Q&A) have been hit. But they also had thin pages, multiple tags, empty searches that Google followed, many ads, etc.
On the other hand, even sites with perfect English have been slammed.
| 7:13 am on Mar 25, 2011 (gmt 0)|
>>>One of the speakers was a Google engineer. The advice she gave to let Google know that you are the original owner of the content was that you should ask the site that copies your content to link to the original article on your site. That way Google would know what the original is that is supposed to rank higher.
@chris thanks for reporting back on this info. That's really... strange. As you say, content thieves are by definition not ethical, and they are never, ever going to link to the place they stole the content from. The idea that Google even thinks this is a possibility is pretty scary, makes them seem really out of touch.
Also, even if they do link to you, I've seen firsthand that it doesn't make any difference. There are many sites which "excerpt" a portion of our content and link back to us. And they are frequently ranked higher than the actual content from us that they excerpted and are linking to. So clearly that makes no difference at all.
| 11:08 am on Mar 25, 2011 (gmt 0)|
@dickbaker, as I understand it, you've broken your pages into two groups -- 1) hit hardest by Panda and 2) hit hard by Panda. You said #2, which is doing relatively better post-Panda, has more scraped (can't think of a better word for what you've got) content but the word count is greater because you augmented scraped content with additional unique content.
Although Dan01 is correct that those pages will have more diverse keywords because they are longer and will naturally get more long-tail traffic, it could also be that Group 2, because it's longer, is more attractive to Google on these metrics (which many speculate are important in a post-Panda world):
- Ads on page divided by words on page
- Affiliate links on page divided by words on page
Food for thought anyway. Thanks for sharing your analysis.
| 11:25 am on Mar 25, 2011 (gmt 0)|
|Here is why I think they can rank higher: Google says that their robots revisit sites more frequently if content is added quickly. If all I had to do was scrape content, I can get tons of content up quickly. If Google believes they were the first to post the content, they must have created it. |
I agree with this. As aggressively as Google might crawl a site, many scrapers are more aggressive. After all, scrapers don't care if they crash your server (Google does) and they don't have to crawl the entire web like Google does. So, when you post a new page of content, there's an extremely high likelihood that a scraper will get your content before Google sees it.
From that point, the question is whether Google crawls the scraper site's page (with your content on it) before they crawl yours. If they do, they may erroneously assume that the scraper site wrote the content and you copied it from them (!).
To get their scraped page crawled before yours, the scraper site just has to be a little more sophisticated than you are -- e.g. they submit the scraped page to Google via RSS feed, XML sitemap, Twitter tweet, etc.
In contrast, if you are just hoping that Google will deep crawl your site and find your original content, before it gets to the scraper site's page, that's not a good bet.
For me, a takeaway from Panda is that I need to get my original content in front of Googlebot as fast as possible in order to make the record clear and stake a claim that it's my content, and doesn't originate from the many scrapers that can quickly grab it.
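One concrete way I'm trying to do that: ping Google's sitemap endpoint (the one documented on google.com at the time of writing) whenever new content goes live. The sketch below just builds the ping URL; the actual request is left commented out so it runs offline:

```python
# Sketch: build the sitemap ping URL for Google's documented
# google.com/ping endpoint, to nudge a crawl right after publishing.
from urllib.parse import quote
# from urllib.request import urlopen  # uncomment to actually send the ping

def sitemap_ping_url(sitemap_url):
    """URL that tells Google to re-fetch the given sitemap."""
    return "http://www.google.com/ping?sitemap=" + quote(sitemap_url, safe="")

ping = sitemap_ping_url("http://example.com/sitemap.xml")
print(ping)
# urlopen(ping)  # fire once the new page is listed in the sitemap
```

Wiring this into the publish step means Googlebot hears about the page from you first, before any scraper can re-post it.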