Duplicate Content Going to hit me for this?
luke175 · msg:3169604 · 5:23 pm on Nov 27, 2006 (gmt 0)

I use 3 versions of the same sales page on my site. The reason I do this is because each one has a different affiliate program tied to it.

I don't combine them into one because that would not be fair to affiliates who promote one page but then lose their commission because the customer paid through another method.

This system has worked great for me but now I'm wondering if I'll get penalized for this since I have 3 pretty much identical pages on one site.

For example, I have mydomain.com, mydomain.com/affiliate1.htm, and mydomain.com/affiliate2.htm

Am I going to get penalized for this and is there anything I can do?

 

g1smd · msg:3169787 · 8:51 pm on Nov 27, 2006 (gmt 0)

If the content is very similar then you may have a problem coming.

Swanny007 · msg:3169800 · 9:08 pm on Nov 27, 2006 (gmt 0)

You could get into trouble depending on a few things. Do you have a link somewhere on the site to each of the affiliate pages?

If you want to get around the problem, you can use robots.txt to block search engines from indexing the affiliate pages and just have them index the main product page.
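A minimal robots.txt sketch of what Swanny007 describes, using the affiliate file names from the first post (the exact paths are assumptions; since robots.txt matching is by prefix, query-string variants of those URLs are covered too):

    User-agent: *
    Disallow: /affiliate1.htm
    Disallow: /affiliate2.htm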

luke175 · msg:3169865 · 10:22 pm on Nov 27, 2006 (gmt 0)

I didn't really want to block one with robots.txt, but if I'm going to get knocked for it, then perhaps that's what I need to do.

How about sites that have one page and then a "printable" version of the same page? Aren't they essentially doing the same thing?

g1smd · msg:3169867 · 10:26 pm on Nov 27, 2006 (gmt 0)

Yes, they are - and the print-friendly page should also be excluded from being indexed.
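As a sketch, the usual way to do that is a robots meta tag in the <head> of the print-friendly page; "noindex, follow" keeps the page out of the index while still letting spiders follow any links it does have:

    <meta name="robots" content="noindex, follow">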

LifeinAsia · msg:3169906 · 11:05 pm on Nov 27, 2006 (gmt 0)

Look at it from the user's perspective. If you have 3 pages with identical content, why should more than 1 of them be indexed?

g1smd · msg:3169916 · 11:26 pm on Nov 27, 2006 (gmt 0)

Since the "print friendly" page is likely to have no navigation links to the rest of the site, it is a very user unfriendly place to land directly from a search engine results page.

luke175 · msg:3170196 · 6:58 am on Nov 28, 2006 (gmt 0)

Ok, this is interesting...

All of the pages are indexed in Google. In fact, a couple of the affiliate pages are indexed with an affiliate's username (e.g., mydomain.com/affiliate1.htm?-affiliatedude)

So if I go excluding those pages, I knock out a potential revenue source and tick off an affiliate.

Should I just leave things be?

helpnow · msg:3170529 · 2:35 pm on Nov 28, 2006 (gmt 0)

"All of the pages are indexed in Google."

Just because they are indexed does not mean you are not being hurt by it.

Different URLs -> same content = duplicate content.

You need to move on this fast, or you are going to lose your rankings. You've got 4 weeks tops before you lose your rankings.

Trust me - I speak from experience. I had a variation on what you have; it took me a long time to figure out what happened, and then I had to wait weeks after I fixed it before my rankings were restored. The net result is that 2006 will end up being one of our worst years. Everything is fine now, thank goodness...

"All of the pages are indexed in Google."

In my case, I sometimes had more than 30 (!) URLs all pointing to the same content, all of them happily indexed by google, and my rankings sank like a stone. Just because they are indexed does not mean you do not have a problem.

You have duplicate content issues. Fix it now. The preferred fix is to use your httpd.conf and .htaccess files. robots.txt is a blunt tool for a fine-tuning job, but you might get away with it in your case.

** And with all due respect, you need to look at the rest of your site/URLs and make sure you don't have other dup content issues. You seem to have the presence of mind to recognize this as dup content - but you may have inadvertently done this elsewhere and still not realize it. Leave no stone unturned in your search.

Trust me - fix this now.
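As a rough sketch of the kind of fix helpnow describes, assuming a hypothetical duplicate URL /old-dupe.htm whose content now lives only at /page.htm; this is mod_alias syntax for httpd.conf or .htaccess (a mod_rewrite version appears later in the thread):

    # Permanently redirect the duplicate URL to the one canonical URL
    Redirect 301 /old-dupe.htm http://www.example.com/page.htm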

europeforvisitors · msg:3170562 · 3:06 pm on Nov 28, 2006 (gmt 0)

Why not simply replace the duplicate pages with newly written pages? Does the exact same text need to be on all three pages?

LifeinAsia · msg:3170681 · 4:24 pm on Nov 28, 2006 (gmt 0)

I'm still trying to figure out why you have to have 3 different pages, 1 for each affiliate program. Can you explain this in more detail? I find it difficult to believe that there is not some way to get around that problem, and I'm sure we can provide several different solutions if we knew what the underlying block is.

Jordo needs a drink · msg:3170744 · 5:05 pm on Nov 28, 2006 (gmt 0)

"How about sites that have one page and then a 'printable' version of the same page? Aren't they essentially doing the same thing?"

I use nofollow and noindex on my printable version pages...
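For reference, a sketch of what that looks like in the <head> of each printable page; unlike the "noindex, follow" variant above, "noindex, nofollow" also tells spiders not to follow the page's links:

    <meta name="robots" content="noindex, nofollow">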

luke175 · msg:3170950 · 7:33 pm on Nov 28, 2006 (gmt 0)

"I'm still trying to figure out why you have to have 3 different pages, 1 for each affiliate program. Can you explain this in more detail? I find it difficult to believe that there is not some way to get around that problem, and I'm sure we can provide several different solutions if we knew what the underlying block is."

Each page is the same sales page with a different payment processor.

As you may or may not know, there are various affiliate programs out there for those that sell software: Clickbank, Paydotcom, and Sharesale, among hundreds of others.

I have multiple setups because some of my best affiliates are in countries not supported by one or another. For example, some affiliates are in Singapore and an affiliate processor won't pay there, etc.

It exposes me to a lot of places and has worked well for me so far.

JoeHouse · msg:3170984 · 7:55 pm on Nov 28, 2006 (gmt 0)

I have a question and there are many mixed feelings from the experts on this.

It's regarding homepages that read like this: "mydomain.com/index.html"

Question: Let's say you start on the homepage, which initially reads "http://www.mydomain.com"

Then you go to a section page on that same site and later decide to return back to the homepage. However, this time the homepage reads: "http://www.mydomain.com/index.html"

Does google see this as duplicate content because there are two URLs, "http://www.mydomain.com" and "http://www.mydomain.com/index.html", now having the same content?

Does google look at this as dup content? Will I be penalized for this?

Should I change it so they all read "http://www.mydomain.com"? If I should change it, what would be the best way?

There appear to be differing opinions on this subject, and I would like to hear what the best way is to handle this to ensure no penalties, no supplemental pages, and better rankings on Google.

g1smd · msg:3171081 · 8:48 pm on Nov 28, 2006 (gmt 0)

Yes, that is yet another form of duplicate content.

You'll need a 301 redirect from the index file name to "/" for each folder and the root.

You should update all internal links to no longer include the actual index file name in the link.

Make sure that when you link to a folder-based URL, you always end with a trailing / on the very end of the URL.

URLs that you redirect will show up as Supplemental for a while. You can safely ignore those.
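A common .htaccess sketch of that redirect (the THE_REQUEST condition makes the rule fire only on the URL the client actually requested, not on Apache's internal DirectoryIndex subrequest, which would otherwise cause a redirect loop):

    RewriteEngine On
    # Redirect /index.html and /folder/index.html to / and /folder/
    RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^?\ ]*/)?index\.html[?\ ]
    RewriteRule ^(.*/)?index\.html$ /$1 [R=301,L]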

dibbern2 · msg:3171089 · 8:53 pm on Nov 28, 2006 (gmt 0)

Helpnow has spelled it out. It looks like you are risking a big penalty against the smaller convenience of keeping those pages alive. That's a bad gamble: big loss vs. small win.

I would NOINDEX those puppies NOW!

JoeHouse · msg:3171100 · 8:59 pm on Nov 28, 2006 (gmt 0)

g1smd or anybody who would like to jump in

Among other things, can this be the cause of all my pages (except the homepage) being supplemental?

Also, if I correct everything that is causing supplemental issues, is there any hope I will get back into the main index any time soon?

How long does the process of getting from supplemental back to the main index take?

JoeHouse · msg:3171105 · 9:01 pm on Nov 28, 2006 (gmt 0)

dibbern2

Can you elaborate on this?

I tend to agree with g1smd's take on this.

Please Advise.

Thanks!

purplekitty · msg:3171115 · 9:07 pm on Nov 28, 2006 (gmt 0)

"How about sites that have one page and then a 'printable' version of the same page? Aren't they essentially doing the same thing?"

I use a pdf file as a printable version on a site I created at the beginning of the year, and I had gotten all of the html pages indexed in the main index. Eventually, Google latched onto the pdf files, since it can read the text, and started indexing those instead, sending my previously indexed html pages into supplemental.

I researched what the problem might be, immediately added a disallow of pdfs to my robots.txt, and added noindex tags for my links to the pdfs. The pdfs fell out of the main index slowly. My html pages are still in supplemental, but my homepage is finally showing up on the 2nd page of the SERPs for some competitive terms I was shooting for.
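A sketch of that robots.txt disallow; note that the * and $ wildcards are a Googlebot extension rather than part of the original robots.txt standard, which is why the rule is scoped to Googlebot here:

    User-agent: Googlebot
    Disallow: /*.pdf$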

dibbern2 · msg:3171222 · 10:08 pm on Nov 28, 2006 (gmt 0)

Joe, I was addressing the OP. G1's advice is always expert. Good luck.

helpnow · msg:3171425 · 1:01 am on Nov 29, 2006 (gmt 0)

It can take 2-6 weeks to get your rankings back, all things being equal - though I do not know what may happen with this being the xmas season; I do not know how google will behave in the next 6 weeks with SERPs...

Don't worry about supplementals. Those in and of themselves are not a problem. Some pages that are "supp" are not necessarily dup content. Supplementals are simply supplemental content, for "whatever" reason. Maybe dupes, maybe just because they are of no extra value and google doesn't want them in the main index. Whatever. Doesn't matter.

Do not worry too much about how many pages are or are not in supplemental. Yes, some dup content pages can end up in the supp index, which makes sense, because they are supplemental pages now... But really, do not get too caught up about what you see there.

The possibility that any URLs you have put 'out there' are dup content is your big problem. You need to use httpd.conf to repair any damage that has been done, i.e. you need to permanently move - read: 301-redirect - all duplicate URLs (except for 1, keep reading) to the 1 stable URL you are going to go with. So, take the set of URLs that point to one page of content. Pick 1 that you want google to keep. Then, for all the rest of the URLs, use rewrite rules in httpd.conf to permanently redirect them to the 1 URL you want google to know about.

Then, when google crawls your site and hits one of those URLs you are now trying to kill, your rewrites will intercept it and hand back the 1 URL you want to keep, so google will understand that there are, say, 3 URLs that have permanently moved to one URL now. So theoretically, the 1 URL will stay in the main index, and the others will get moved to supplemental.

So what will happen is, slowly, depending on how often google crawls your site, they will begin hitting those "bad" URLs, run into your rewrite, and be given the 1 URL you want google to have. So, over time google will begin to understand those bad URLs are no good and which URL you say is OK. Then, when they do their next data refresh, your new set of URLs, with the dupes now dropped out, will come in, and your rankings will / should come back.

It will take time, again, a few weeks at least. So, patience is needed at that point. Make the fixes, then leave it alone.

Meanwhile, from what I understand, those dup URLs may actually remain in the supp index for 1 year, and then after that, they will finally drop out. This is why it doesn't matter if they are in the supp index; that won't affect your rankings.

Hope this helps more...
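A sketch of the many-URLs-into-one consolidation helpnow describes, written for a root .htaccess (in httpd.conf server context the patterns would need a leading slash); all the file names are hypothetical stand-ins for a set of duplicate URLs and the one canonical URL chosen to keep:

    RewriteEngine On
    # Each duplicate URL is permanently moved to the single URL to keep
    RewriteRule ^widgets-old\.htm$ /widgets.htm [R=301,L]
    RewriteRule ^widgets2\.htm$ /widgets.htm [R=301,L]
    RewriteRule ^print/widgets\.htm$ /widgets.htm [R=301,L]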

helpnow · msg:3171427 · 1:05 am on Nov 29, 2006 (gmt 0)

P.S. In my opinion, and I may be wrong ; ), noindex is great if you do that _before_ the damage is done. But now that the damage has been done, and google knows about the bad dupe URLs, you need to do more than noindex -> you need to rewrite those bad URLs into the good URLs using rewrite rules and your httpd.conf. Kinda crappy, but you have to. Your httpd.conf is the road map you now need to provide google with so they know what the hell is going on and how to navigate their way, on your behalf, out of this duplicate content tar baby you got yourself into.

JoeHouse · msg:3171506 · 2:23 am on Nov 29, 2006 (gmt 0)

helpnow

That's great advice. Seems like a ton of work. My question is this: has anybody gotten out of supplemental and into the main index without having to rewrite all URLs?

Unless I am mistaken, I do believe many have gotten out without rewriting URLs. Can anybody confirm this?

My major problem was dup content from manufacturers' product descriptions, which I have now corrected.

Since the removal of dup content, Google has picked up a couple thousand new pages but has placed them all into supplemental, except for the homepage, which is in the main index.

It appears Google has recognized my efforts and is now preparing to let me out into the main index.

Traffic has picked up a lot. In fact, Google loves to display my results in the highly visible "One Box" results.

I could be wrong, but I think you can get out without an entire rewrite of URLs.

luke175 · msg:3172435 · 6:32 pm on Nov 29, 2006 (gmt 0)

"P.S. In my opinion, and I may be wrong ; ), noindex is great if you do that _before_ the damage is done. But now that the damage has been done, and google knows about the bad dupe URLs, you need to do more than noindex -> you need to rewrite those bad URLs into the good URLs using rewrite rules and your httpd.conf. Kinda crappy, but you have to. Your httpd.conf is the road map you now need to provide google with so they know what the hell is going on and how to navigate their way, on your behalf, out of this duplicate content tar baby you got yourself into."

Couldn't a person use Google's "remove a page from the index" feature to remove an offending page once it has been incorrectly indexed?

I did this before when a members-only area of a site was crawled and it was delisted in about 48 hours.

helpnow · msg:3172498 · 7:19 pm on Nov 29, 2006 (gmt 0)

"Couldn't a person use Google's "remove a page from the index" feature to remove an offending page once it has been incorrectly indexed? "

The problem with that is, it isn't just about google. What about Yahoo? About.com? Etc. And, even worse, what about other webmasters who may have linked to you - some you may be aware of, some you never dreamed of? What if they are using a URL you want eradicated? These other sources will subsequently get crawled, and the URL you are trying to get rid of will once again resurface, and you'll be back to Square 1. It'll be deja vu all over again. <grin>

luke175 · msg:3172537 · 7:46 pm on Nov 29, 2006 (gmt 0)

I was referring to doing the above after writing robots.txt to exclude the page so it won't be reindexed.

g1smd · msg:3172674 · 9:35 pm on Nov 29, 2006 (gmt 0)

You can try to remove the page from view, but the removal tool does not actually remove the page from the index at all.

It merely removes the page from view for 90 or 180 days and then it reappears with exactly the same status that it had when it was "removed".
