No - the UK BBC website still has hi-graphics and lo-graphics pages both in the index, even though the HTML is pretty much identical.
Indeed, that site has thousands of duplicated pages. Both sets are indexed.
It's not as simple as a fixed threshold.
Unfortunately Derek, we're not the BBC so we have to watch our step a little more carefully.
bakedjake, it seems there's been an on-and-off run on duplicates for over a year, and there were discussions only a few months back about template-based sites having some problems. It does seem to have gotten a little more stringent, but I haven't noticed anything unusual recently.
I don't have duplicates, but I have noticed a problem repeatedly when the very top of pages is identical across sites. The last time I did a dup check, a few months ago, the person's pages got hit with being 80% unique.
I'd like to test this myself. Is there a tool somewhere for comparing dupe pages? How does one come up with these percentages?
Thanks in advance.
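One rough way to come up with such a percentage yourself is to compare the two pages' text as overlapping word groups ("shingles") and measure how many they share. The sketch below is only an illustration: the function names, the shingle size of 3, and the sample sentences are all made up for the example, and nobody in this thread knows how Google actually computes similarity.

```python
import re

def shingles(text, size=3):
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Too short for a full shingle: treat the whole text as one unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def similarity_percent(text_a, text_b, size=3):
    """Jaccard similarity of the two shingle sets, as a percentage."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 100.0  # two empty texts are trivially identical
    return 100.0 * len(a & b) / len(a | b)

# Hypothetical "country page" style sample texts, differing by one word.
page_a = "Buy blue widgets online. We ship blue widgets to France every day."
page_b = "Buy blue widgets online. We ship blue widgets to Spain every day."
print(f"{similarity_percent(page_a, page_b):.1f}% similar")
```

Feed it the visible text of two pages (with the markup stripped) rather than raw HTML if you want to see how much the content itself differs; feed it the raw HTML if you want to see how much a shared template dominates.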
<Anyone see a radical dupe filter change this weekend>
One of my competitors, who owns three 100% identical sites (sic!), reached a top position last week with one of them; the other two are also in the top 30.
So would an RSS feed probably trigger the dupe filter now, even if it's only 1/3 of the page's content (for example)?
Anyone know what percentage of similar content is dangerous?
[edited by: ciml at 8:42 pm (utc) on Sep. 30, 2004]
[edit reason] Please see StickyMail. [/edit]
I watch an industry where everybody makes tons of pages, one for each country, and there is no change there. There is very little difference between the country pages. They often make several pages per country because people use all kinds of adjectives to find their products. I bet these pages are 1% different at most, if that. The only difference is a few words.
It is impossible to make an algo that can guarantee each page will be a certain percentage different from the others when you have thousands of pages, but you can make them different enough to be sure each one is at least over a certain percentage. I try for 15%, which is not that hard if you know what you are doing.
|Pass the Dutchie|
Not much to base this on, as I had not ventured into this realm until a few weeks back.
While attempting to beat the sandbox I placed a 'doorway' of sorts on the back of an existing established site, designed to look very similar to the home page of the site I am trying to drive the traffic to.
Result was the page rocketed to the top of the SERPs in 48 hours for the targeted keywords.
To avoid the dupe checker I hosted the page on separate IP, re-named images, body text altered 10%, headers and title altered, nav links altered to image map with JS links. This week when the doorway page fell from grace and got buried at the bottom of the pile I removed it. If my established site is now banned altogether then it will be a harsh lesson.
If this was a result of the dupe filter then in my case it took G less than 2 weeks to notice the similarity, and it acted faster than sh*t off a brick.
It would appear to me that large template based sites took another hit.
I am not seeing this in my sector.
You mean to say your site took a hit today or on 22 sept?
|large template based sites |
I often see this term and wonder what it means. Does it mean a text template, in the sense that there are 'items', each with a set of information pretty much standard, like price, description, dimension, number of fuzzy buttons, etc? Or that a common template is used for the HTML formatting across pages? Or that it's pretty much standard text with just a few text variations from page to page? Like a text 'We all love [thing]. Buy [thing] since having [thing] is a must for succeeding in business and once Molly bought her [thing], she saw her business double overnight, and now she visits her [thing] customers in her own pink learjet! Hoorah [thing]! Buy [thing]!' - that sort of site?
I'm finding on my site that Google will only go 101k deep on my pages now. I have had a rather large header file with navigation for my site on every page. So I'm sure that this header file is taking up a good bit of what Google indexes, so it may consider it duplicate content. I changed it last night to a really small file and put the navigation on the right side. We'll see how that does. I have noticed on a lot of pages that I have AdSense on that the ads were based on words in my header file and not the content of the pages. I'd rather have my navigation on the top; it's easier for humans to use, which, according to Google, is what we are supposed to make our pages for, but now it seems that Google doesn't like that.
"large template based sites" means that every page has a certain amount of html that is exactly the same. No matter how much different each page is it starts off with most of the pages looking almost the same. Every page has the same header and footer.
A lot of people make a blank site template, and then every new page starts from this template. Some templates are quite elaborate, with tons of code. If you use a template, try to keep the code as small as possible. CSS is the best way to do this if you want a fancy-looking page.
If your template is 75K and all you do is add 5K of content to each new page, then G thinks your pages are only about 6% different from each other.
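The 6% figure is just the content's share of the whole page: 5K out of 75K + 5K = 80K. As a back-of-the-envelope sketch (the function names are made up here, and it assumes difference is measured purely by size, which is this thread's guess and not a documented Google rule):

```python
def unique_share_percent(template_kb, content_kb):
    """Share of the page that is page-specific content, by size."""
    return 100.0 * content_kb / (template_kb + content_kb)

def content_needed_kb(template_kb, target_percent):
    """Content size needed for the content share to reach target_percent."""
    # Solve c / (template + c) = target / 100 for c.
    return template_kb * target_percent / (100.0 - target_percent)

print(unique_share_percent(75, 5))  # 6.25
print(round(content_needed_kb(75, 15), 1))  # KB of content for a 15% share
```

By the same arithmetic, shrinking the template helps far more than padding the content: with a 10K template, the same 5K of content is a third of the page.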
it's been grabbing only 101k for a long long time.
Jake not sure how related this is, and we only started seeing it today, but there's evidence of duplicated/templated pages and/or tons of repetitive internal backlinks doing better today...at least where we are looking.
Not sure if this is related, but I have today seen a couple of my AFFILIATES websites replace the URL of my own website in Google.
They appear to be identical content, but they are really just the same page. I give a special URL to my affiliates to track their sales. Previously, Google could tell the difference, but it seems something has changed. I've had this problem for a long time with Inktomi. I just thought Google was smarter.
|"large template based sites" means that every page has a certain amount of html that is exactly the same. No matter how much different each page is it starts off with most of the pages looking almost the same. Every page has the same header and footer. |
Like sites that use content management systems?
>>>Not sure if this is related, but I have today seen a couple of my AFFILIATES websites replace the URL of my own website in Google.<<<
I don't like to start a controversy, but if a site owner wants affiliates then he has got to expect a good affiliate to replace him in the SERPs. A smart SEO can easily make Google rank their affiliate referrer code in place of the sponsor's page.
If you want affiliates then let them do what they do best and you concentrate on what you do best and supply the goods bought.
And don't think that just because the aff ref code is scoring in the serps that you would've ranked in the same place for the same keyword.
|They appear to be identical content, but they are really just the same page. I give a special URL to my affiliates to track their sales. Previously, Google could tell the difference, but it seems something has changed. I've had this problem for a long time with Inktomi. I just thought Google was smarter. |
We also had this problem with our affiliate system, as each page looked like an ordinary .html page (i.e. with no special characters), which resulted in quite a bit of what google thought was duplicate content being indexed. The good news is that it was easily solved by having our system add a no-index robots meta-tag to all affiliate-referred pages.
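For anyone wanting to try the same fix, here is a minimal sketch of the idea, not our actual system. It assumes your page-serving code already knows whether the request came in through an affiliate URL; the `render_head` function and the `is_affiliate_request` flag are hypothetical names for illustration. The only standard piece is the robots meta tag itself.

```python
from html import escape

# Standard robots meta tag asking search engines not to index the page.
NOINDEX_TAG = '<meta name="robots" content="noindex">'

def render_head(title, is_affiliate_request):
    """Build the <head> markup, adding noindex only for affiliate-referred views."""
    lines = ["<head>", f"<title>{escape(title)}</title>"]
    if is_affiliate_request:
        # Hypothetical flag: set by whatever code recognizes an affiliate URL.
        lines.append(NOINDEX_TAG)
    lines.append("</head>")
    return "\n".join(lines)

# Same page, two views: the regular URL stays indexable, the affiliate one does not.
print(render_head("Blue Widgets", is_affiliate_request=False))
print(render_head("Blue Widgets", is_affiliate_request=True))
```

The regular URL is served without the tag, so only the affiliate-referred copies drop out of the index.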
>>>The good news is that it was easily solved by having our system add a no-index robots meta-tag to all affiliate-referred pages. <<
All I can say is that you have cost yourself a heap of search engine traffic and probably some good affiliates.
epic - if you mean 101k is all you see in the cache - that has always been the case - however the bot will go deeper than that when actually following links - it's a misconception to think that 101k is a limit to bot deepness. Is your header file graphic intensive - just want to make sure you are not including graphics in that number.
I think you are misunderstanding me.
The page that has replaced mine IS MY PAGE. In this case, the affiliate did not build his own page that ranks better. It is just a tracking URL that i give to some of my affiliates. It just forwards them to certain pages of my website, and tracks it so i can pay them. For some of my pages, Google has replaced my normal website URL with the tracking URL for the affiliate. The affiliate in this case doesn't have any way to change the page to make it rank higher.
I have worked very hard over the years to have my pages rank high in Google so i can get a fair amount of "FREE" traffic. It hurts to see my URL's replaced with the tracking URL's, which i have to pay a 16% commission on. I'm sure it thrills my affiliates, but not me.
Here is an example.
when i search for "Blue Widgets", my site used to come up in the #8 position in Google as such:
Now, it is showing one of my affiliate tracking URL's as such:
They are both the identical page on my site. Previously, Google has been able to tell the difference, and has kept my affiliate tracking URL's out of the index, but something has recently changed.
I guess I'm going to have to change how my affiliate program works, and cancel all affiliate tracking links.
an interesting side note...
The affiliate link has a cache date of Sep. 29th, and my normal pages all have a cache date of Sep. 18th.
My normal site seems to update its cache on Google about every two weeks. I'm crossing my fingers that my normal page will replace the affiliate page when it is cached next.
I'm also crossing my fingers that Google doesn't see all my affiliate tracking links as Duplicate Content.
Another interesting note.
Last night when i searched google for the url of the affiliate tracking link, it was showing up in Google.
Now, it doesn't show up when i search directly for the URL, but it does show up when i search for "Blue Widgets"
|All I can say is that you have cost yourself a heap of search engine traffic and probably some good affiliates. |
On the contrary, our organic traffic (especially from Yahoo) has returned and/or begun returning.
I think you may misunderstand. Our system creates links for the affiliates that look like this:
Which, of course, makes them look like a normal page, but also (unfortunately) caused some duplicate content issues, as the regular page looks like:
While we do everything we can to support our affiliates, you can't possibly expect us to accept a duplicate content penalty just so a few can have their affiliate ID'd pages in the SERPs.
At any rate...we promote our program MUCH differently than most merchants in that we forbid search engine spamming, and actually limit who we accept into our program. Considering our conversion rates exceed 2.5%, our affiliates who do "play by our rules" are VERY happy ;-)
You need to tell your affiliates that they cannot duplicate your content or act like you are them. I have received emails from the sites I affiliate for, and they asked me to make changes and I did.
sounds like webfusion and I have the same problem.
ogletree, you are misunderstanding.
Our affiliates aren't duplicating our content.
It's just two different URL's pointing to the SAME PAGE. One URL belongs to us, and the other is just a special tracking URL to point to the same page. The affiliates use this tracking URL to point to a certain page of our site, and they get paid if someone buys a product. In this case, the affiliate isn't creating any content at all, just pointing to our page.
I hope this makes sense now.
The bad news for me is that I can't do what Webfusion suggests above with the no-index robots meta-tag, as mine is a Yahoo Store, and Yahoo creates the links for me. I am just going to have to cancel all my affiliates, and create some kind of new program external of Yahoo Store. Unless someone else has any suggestions on how to make this work with a Yahoo Store. I guess maybe i could create some landing pages that were strictly for my affiliates. Well, off to think about it.
<OT>Ummm, ever consider building a separate site for the aff's?</OT>