|Duplicate Sentences For User Benefit - Will Panda Eat Me?|
I'm about to start working on a site with a colleague. It is educational in nature. We provide some exercises for college students to help them in class.
It would be really helpful to college students and professors to know which learning standard each exercise met or aligned with. This way they could quickly identify which exercises worked on things they needed to learn for their class.
The learning standards appear on over 35,000 pages, according to Google. Each learning standard would add about 1-2 sentences per page of our site. Since our work is mostly interactive exercises, we might have 200 words a page, so 10-20% of each page would be duplicated content in Google's eyes.
Here are my options:
1) Post the Learning Standards verbatim on each page.
Makes it great for our users, but Google will see that we have mass-duplicated sentences all over our site.
2) Re-write wording of the Learning Standard for each page.
We could leave the numbering the same as the real standard and just rewrite the wording (1-2 sentences) of the standard.
3) Leave the Learning Standards Off
Bad for users, good for Google.
I feel like #2 is the safe play. What do you guys think?
I would say option 1 because you believe that's best for your users.
But how about putting all the learning standards on one central page and providing a link which takes users straight to the position on the central page where the relevant learning standard is. Just a thought.
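That deep link needs nothing more than a fragment identifier pointing at an anchor on the central page. A minimal sketch, where the URL and the id are placeholders I've invented, not the OP's actual standard numbers:

```html
<!-- On the exercise page: link straight to the relevant standard -->
<a href="/standards#std-6-ns-1">See the learning standard for this exercise</a>

<!-- On the central standards page: each standard gets its own anchor id -->
<h3 id="std-6-ns-1">Standard 6.NS.1</h3>
<p>(verbatim wording of the standard goes here, in one place only)</p>
```

The browser scrolls directly to the matching id, so users still get one-click access while the verbatim wording lives on a single URL.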
But really, the only correct answer is to try each one in a split test and see which one your users really prefer. The winner is the one Google will like best, regardless of what that might be.
You might also put each learning standard on its own URL and then import that URL into a div on the exercise page using AJAX.
That way, the sentences are not being truly duplicated on many pages, but only exist in one location. At the same time, you are showing the appropriate standard to inform each exercise to the visitor.
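A minimal sketch of that AJAX approach, assuming each standard's snippet lives at its own URL (the path here is invented for illustration). Since the wording is pulled in client-side, crawlers fetching the raw HTML see only the placeholder, not the repeated sentences:

```html
<!-- Placeholder that will receive the standard's text -->
<div id="standard"></div>
<script>
// Fetch the one canonical copy of the standard and inject it.
// '/standards/std-6-ns-1.html' is a hypothetical snippet URL.
fetch('/standards/std-6-ns-1.html')
  .then(function (response) { return response.text(); })
  .then(function (html) {
    document.getElementById('standard').innerHTML = html;
  });
</script>
```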
I don't really know the nature of your site. But in your situation, I would always want to consider what's best for the users first. I would also ask myself if all 35,000 of those pages really need to be indexed in Google. How much quality traffic would they be likely to bring in on their own, even under the best circumstances? If the answer is not a significant amount, I might consider blocking a good portion of them and optimizing / improving pages that will really bring me the traffic I want.
I really believe that Google doesn't need to see *everything* we have, and sometimes what you keep out is as important as what you want in.
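For anyone taking the keep-it-out-of-the-index route, the usual mechanism on a per-page basis is a robots meta tag, something like:

```html
<!-- In the <head> of a page you don't want indexed;
     "follow" still lets link equity pass through -->
<meta name="robots" content="noindex, follow">
```

Whole directories can also be blocked in robots.txt, though noindex is the surer way to keep a page out of the results.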
Can I say that option 2 is worrying?
Rewriting a "standard" because of the G?
Par for the course where folk feel compelled to rewrite manufacturers' standard descriptions to please G.
What is a standard?
How important is the "standard"?
How important is the "wording of the standard"?
What happens if...
|I would also ask myself if all 35,000 of those pages really need to be indexed in Google. |
You can create an image out of the Learning Standards verbatim and post this image instead of the text itself.
|You can create an image out of the Learning Standards verbatim and post this image instead of the text itself. |
That is something I never thought of, and I have no idea why. How simple and perfect for my need. A thousand thanks!
@tedster - Do you think that will stand the test of time, or will G start finding duplicates between video, text, and pics? I know they are working on it heavily for YouTube, since scrapers are just making sites out of text versions of videos.
What a bunch of great ideas, guys! This was a huge help.
Google doesn't need to look for duplicate anything, and I seriously doubt they do anymore. User metrics (how people react to your site) handle that for them. Humans are very good at spotting duplicated images, text, videos, etc., and the more we see of the same stuff, the worse our reaction. Google just looks at how people react, and that tells them all they need to know about your site.
The way to stand the test of time, in my view, is to focus all your efforts on thinking about your users and not to worry about how Google might interpret that.
|The learning standards appear on over 35,000 pages as per Google. Each learning standard would be about 1-2 sentences per page of our site. Since our work is mostly interactive exercises, we might have 200 words a page. So 10-20% of each page would be duplicated content in Google's Eyes. |
I actually think Google is pretty good at sussing out site overhead and boilerplate, especially if it appears at the same spot on each page.
Think about it: how much duplicate content between pages does a templated site have? Quite a bit, often many times more overhead than actual content, and Google has no problem segmenting the page. My opinion only, but one or two boilerplate sentences should be fine.
|How much duplicate content between pages does a templated site have? Quite a bit, many times more overhead than actual content and Google has no problem in segmenting the page. My opinion only, but one or two boilerplate sentences should be fine. |
Bingo. If the duplicated part is always in the same place, same format, exactly the same in every way, it isn't really duplicate content. It's simply repeated content. Like, ahem, all the bits around the edges of your typical search-engine results screen that remain the same on all pages. Or recurring headers: same address bar, title graphic, and so on.
It may even be worse to fiddle with the wording, because then it looks like intentional duplication rather than unavoidable mechanical repetition.
I'm not too wild about the text-as-image approach. Fine for small bits of decorative text like headers, but you need everyone to be able to read it, regardless of individual visual acuity.
I appreciate everyone's thoughts and ideas here. They bring up great points.
Although it will be more work, I will be putting the 1-2 sentences in an image. The end-user experience is the same. So much for not designing sites to G's liking.
Doing a little research, I found that the sites reusing the standards as text can't rank for anything. According to third-party stats, I can see a number of sites that were hit by Panda 1 and never recovered.
How about an iframe?
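For reference, the iframe version would look something like this, with the src URL being a made-up placeholder. The framed snippet page can carry its own noindex, so the standard's wording exists at one URL rather than on every exercise page:

```html
<!-- Embed the single canonical copy of the standard -->
<iframe src="/standards/std-6-ns-1.html"
        title="Learning standard for this exercise"
        width="100%" height="60" style="border:none"></iframe>
```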
Hmm, I've learned something about Google here. Good luck with the image idea!