Content Management Forum
Duplicate content dilemma
BrianHayesMusic




msg:4591666
 11:37 pm on Jul 9, 2013 (gmt 0)

Hello all. I'm a newbie at SEO and am working with a few sites that I've built myself.

I'm a classical guitarist with two almost identical sites: one has my contact info so people can get hold of me directly, and the other is "agent friendly," meaning it has none of my contact info. Still, quite a bit of content is duplicated across the two sites, such as my bio, and I've been told that duplicate content is a big no-no with Google.

Is there a way of keeping Google from seeing the content on one of the sites? Some sort of <# no robots.txt> tag that has to be inserted, or something? Or am I just forced to rewrite my bio for the second site to prevent Google from ignoring both pages?

Any help is greatly appreciated, and sorry I'm such a newbie.

Brian Hayes

 

lorax




msg:4591696
 3:43 am on Jul 10, 2013 (gmt 0)

Welcome to WebmasterWorld Brian!
What you're really asking for is the noindex directive [google.com].

The noindex directive tells the bots not to index the content of the webpage you use the directive on - so in a sense, you can tell the bots to ignore the pages that are duplicated on one of your sites. This will avoid the duplicate content issue.

phranque




msg:4591718
 5:55 am on Jul 10, 2013 (gmt 0)

welcome to WebmasterWorld, Brian!

since you mentioned "robots.txt", you should understand that robots.txt is actually a protocol for excluding crawlers such as googlebot. if you want to use the meta robots noindex element instead, you must allow the url to be crawled, or the bot will never see the noindex directive.

to prevent indexing of an HTML document you put this in the <head> of the document:
<meta name="robots" content="noindex">.
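for placement, that tag belongs inside the document's <head>. a minimal skeleton (the title and body content are placeholders, not anything from your site):

```html
<!DOCTYPE html>
<html>
<head>
  <title>Bio</title>
  <!-- tells compliant bots not to add this page to their index -->
  <meta name="robots" content="noindex">
</head>
<body>
  ...
</body>
</html>
```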

ergophobe




msg:4591947
 8:16 pm on Jul 10, 2013 (gmt 0)

Brian, I've been working with a lot of entertainers lately and this seems ubiquitous. Some have five or six websites targeting different niches with the same content over and over again.

We usually try to identify their strongest domain in terms of rankings and suggest they consolidate on that domain. Sometimes there are compelling reasons to consolidate on a weaker domain (part of a branding strategy or larger marketing strategy). But we rarely see cases where entertainers truly benefit from having multiple, nearly identical sites.

So I know you said that one is "agent friendly", but how can you control who goes to which site? If someone searches on your name or what have you, how will they get from one site to the other?

I would suggest that if most of the content is duplicated, you really should have just one site. If only a bit of it is duplicated, you should rewrite that content, interlink the sites, and claim both sites under your Google authorship.

Finally, the other thing I see in entertainer sites is terrible URL canonicalization.

Say your domain is example.com. If I type

www.example.com/page
www.example.com/page/
example.com/page
example.com/page/
www.example.com/page?q=random_string

And so forth, do those all resolve? Only one should resolve and the rest should redirect to that single page.
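The rule "only one form should resolve" can be expressed as a tiny function mapping every variant to one canonical URL. This is a toy sketch for illustration only (the real fix is server-side 301 redirects); choosing "www.example.com" as the canonical host and dropping the trailing slash are assumptions here, not requirements:

```python
# Toy sketch of URL canonicalization -- NOT a server config.
from urllib.parse import urlsplit

CANONICAL_HOST = "www.example.com"

def canonical(url: str) -> str:
    """Map any variant of a URL to the single form that should resolve."""
    if "//" not in url:                # handle bare "example.com/page" input
        url = "//" + url
    parts = urlsplit(url, scheme="http")
    path = parts.path.rstrip("/") or "/"   # drop any trailing slash
    # force the canonical host and discard the query string
    return f"{parts.scheme}://{CANONICAL_HOST}{path}"

variants = [
    "www.example.com/page",
    "www.example.com/page/",
    "example.com/page",
    "www.example.com/page?q=random_string",
]
# every variant collapses to one address
assert {canonical(u) for u in variants} == {"http://www.example.com/page"}
```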

Another simple test. Do a site:example.com search

How many pages has Google indexed? If that number is much, much higher than the number of pages you have, you have a dupe content problem of some sort.

I've seen entertainers with 4-year-old blogs showing 18,000 indexed pages for their site. That would mean writing 18 posts per day, five days a week, for four years... or having a major dupe content issue.

BrianHayesMusic




msg:4592053
 3:06 am on Jul 11, 2013 (gmt 0)

Thanks for the responses! There's a lot to digest here but I will start by responding to some of ergophobe's questions.

Ergophobe, thank you for your detailed response and questions.

The purpose for having an agent friendly web site, of course, is so that agents can share my web site with their clients without giving them direct access to me. It's essential to have a web site like this when working with agents. The main way they will access this web site is by me sharing the domain with them directly. I was also considering having an "Agent Friendly Website" link in my main menu on my other site so that it could be accessed that way too. The site that I need to be recognized by Google, though, is not the agent friendly one but the other site, since with that site, I'm trying to promote myself without the help of agents.

Everything you see on these two sites I built myself, but since I'm fairly new to all of this, I'm still quite unfamiliar with much of the terminology. I don't know what you meant by "resolve" in the example you gave. (You had asked, "Do those all resolve?") You'd also asked, "How many pages has Google indexed?" I'm sorry to be such a newbie, but I don't know what that means either. I've also never "blogged" and am not really clear on that.

Is there hope for me or am I pretty much screwed?

[edited by: ergophobe at 5:31 am (utc) on Jul 11, 2013]
[edit reason] removed personal URL [/edit]

ergophobe




msg:4592063
 5:47 am on Jul 11, 2013 (gmt 0)

Ah...

Resolve.

"Resolve" means that if I type that URL into the browser, I get a webpage at that address (essentially, the process of figuring out that a URL points to a given place on a given machine is called "resolving" the URL).

Ideally, you want one and only one address to resolve for any given page. Any variations of that URL should redirect (see below). So a URL with and without a 'www' is TWO different URLs. Only one should resolve. The other should redirect.

Redirect.
Redirects come in different forms. The most important thing is that you want to use a Permanent Redirect. This tells Google that the URL that the user typed in is *never* going to be a valid address for this page. These are also called "301 Redirects" because that is the numerical code for a Permanent Redirect.

Use the Firefox Live HTTP Headers extension to see what's happening on the server level in terms of redirects.

As a general rule, if you have www.example.com on your marketing materials, you want anyone who types in example.com to get redirected to www.example.com
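Assuming an Apache server where .htaccess files and mod_rewrite are enabled (a common shared-hosting setup, but check with your host; nginx and other servers use different syntax), that non-www-to-www redirect looks roughly like this:

```apache
RewriteEngine On
# permanently (301) redirect example.com/... to www.example.com/...
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

You can verify the result from a command line with `curl -I http://example.com/page` and check for a `301` status plus a `Location: http://www.example.com/page` header.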

How many pages are indexed in Google?

Searching on "site:example.com" will tell you.

Blogging
That was just what that guy was doing. It has nothing to do with you. My only point was that no one is going to write 18,000 pages of content in 4 years by himself, so he must have a dupe content problem.

Is there hope?
We all knew nothing about the internet at one point. Keep asking questions. When you see a question that you can answer (maybe here: [webmasterworld.com...] ), pitch in and help out so other people with more knowledge will have time to answer your questions. That's how it works!

ergophobe




msg:4592065
 5:48 am on Jul 11, 2013 (gmt 0)

PS... and once you have some terminology to search with, Google is your friend

https://www.google.com/search?q=301+redirect&oq=301+redirect&aqs=chrome.0.57j62l3.3453j0&sourceid=chrome&ie=UTF-8

BrianHayesMusic




msg:4592069
 6:56 am on Jul 11, 2013 (gmt 0)

Thank you so much for your time. The whole thing seems frightening to me right now but I'm sure I won't continue to feel so buried as long as I work really hard.

One more question (actually, it's from before; it just must be hard to believe that I'm this ignorant.)... What does "indexed" mean?

ergophobe




msg:4592134
 3:26 pm on Jul 11, 2013 (gmt 0)

Indexed: Google knows about that page and has it in their index.

Being indexed doesn't mean you rank for anything, but a page that's not indexed can't possibly rank and a page that's indexed under several different URLs (say with and without 'www') will have trouble ranking because its potential authority is being split multiple ways.

Think of the Aesop's fable where the father gives his sons a bundle of sticks and asks each son in turn to break it. None can. Then the father unbinds the bundle, gives each son one stick, and asks him to break it, which each son does easily. (http://mythfolklore.net/aesopica/milowinter/13.htm )

When you have multiple URLs for each page, each URL is like one thin stick. When you solve the dupe content problems, it's like binding those sticks into a single bundle.

So you want to look at the indexed pages for your site both to be sure your pages that count *are* indexed AND to be sure that you don't have a lot of junk URLs that should not be getting indexed.

lucy24




msg:4592222
 8:08 pm on Jul 11, 2013 (gmt 0)

The purpose for having an agent friendly web site, of course, is so that agents can share my web site with their clients without giving them direct access to me. It's essential to have a web site like this when working with agents. The main way they will access this web site is by me sharing the domain with them directly. I was also considering having an "Agent Friendly Website" link in my main menu on my other site so that it could be accessed that way too. The site that I need to be recognized by Google, though, is not the agent friendly one but the other site, since with that site, I'm trying to promote myself without the help of agents.

What's to stop a potential client from looking up your name in the search engine of their choice and finding your direct site? Doesn't have to be from innate nastiness or let's-cut-out-the-middleman thriftiness; it can also happen when they misplace the link the agent gave them.

This is a webmaster site and not a music business site, so take this with a grain of salt, but it may be simpler to take a position of "All contract negotiations MUST go through my agent". Then it wouldn't matter what URL people use to find you, because the money can only flow along one path.

ergophobe




msg:4592237
 9:09 pm on Jul 11, 2013 (gmt 0)

That's what I was driving at above. You have a basic paradox:

1. You're trying to get your main site to rank

2. You have a class of visitors you don't want to see that site.

Those goals are fundamentally at odds. At best, you can offer a site that is agent friendly and have the entire site as "noindex", but as Lucy24 says, there's nothing to keep those people there.

Indeed, I would say that anyone who is doing their due diligence is going to search on your name. When they do so, the non-agent-friendly site will come up in the search results. So I would guess that the vast majority of people who see the agent-friendly site will also see the other one.

This is why we overwhelmingly try to get entertainers to consolidate on one site.

Like Lucy24 said, we're not in the music biz, so this could be bad advice. However, it seems to me that what I see with models is that they have their personal site and their agent has an agency site that has a page for each model. To me, it's up to the agent to create the agent-friendly pages which should be consolidated on the agency site. But again, I know nothing about the music biz.

BrianHayesMusic




msg:4592348
 3:46 am on Jul 12, 2013 (gmt 0)

Those are very interesting points you're all making. I might look into changing my opinion on having an agent friendly site. As far as my agent's clients being able to find me by searching the web... yes, absolutely they could. But people who choose to work with agents typically want to work with agents. The agents would just rather not put my contact info right into the palm of their hands, is all. But I will be looking into all of this further and asking other people in my biz as well. Thank you all for your input!

lorax




msg:4592436
 1:52 pm on Jul 12, 2013 (gmt 0)

I'm curious: which site do you want to get indexed and ranked by the search bots?

You could simply add a line to your robots.txt file to tell the bots not to visit/index an entire site. [robotstxt.org...]
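A minimal robots.txt for that, placed at the root of the site you want kept away from the bots (note the caveat that comes up later in this thread: this blocks crawling, which is not the same thing as blocking indexing):

```
User-agent: *
Disallow: /
```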

ergophobe




msg:4592472
 3:43 pm on Jul 12, 2013 (gmt 0)

>>tell the bots not to visit/index

lorax - robots.txt does not tell SEs not to index a site. Even worse, if the site is already indexed and you then block the crawl with robots.txt, the bots can no longer see a meta robots noindex tag, so at that point you can't deindex it that way.

If you don't want a page/site indexed, use the meta robots noindex tag. Note that the page will still get crawled.

If you don't want a page/site crawled, use robots.txt. Note that the page may still get indexed.

If you don't want a page crawled or indexed, use both.

phranque




msg:4592483
 4:03 pm on Jul 12, 2013 (gmt 0)

If you don't want a page crawled or indexed, use both.


that dog won't hunt.
(see my post above)

BrianHayesMusic




msg:4592513
 5:16 pm on Jul 12, 2013 (gmt 0)

This is awesome and thanks so much for all of your comments.

Phranque,

You're saying this would be the solution, then? (BELOW) Placing that code you posted at the very beginning? Or does it need to be placed somewhere else in my code? This is the code for one of the pages in my site.


<meta name="robots" content="noindex">

[edited by: ergophobe at 6:03 pm (utc) on Jul 12, 2013]
[edit reason] No need to post huge amounts of code [/edit]

BrianHayesMusic




msg:4592514
 5:17 pm on Jul 12, 2013 (gmt 0)

I'm gonna come back and delete all of that later, once you've had a chance to see it.

BrianHayesMusic




msg:4592516
 5:20 pm on Jul 12, 2013 (gmt 0)

And I guess the verdict is still out as far as whether or not to use both. Ergophobe said that you can so I'm waiting to see how this discussion resolves!

BrianHayesMusic




msg:4592524
 5:31 pm on Jul 12, 2013 (gmt 0)

lorax, you're right on the money. That's exactly what I was trying to do, initially: have my main site indexed and my agent friendly site not, to avoid duplicate content.

ergophobe




msg:4592538
 6:01 pm on Jul 12, 2013 (gmt 0)

that dog won't hunt.
(see my post above)


Yeah, I made the same mistake I was trying to correct in lorax's post! And that statement is in direct conflict with my initial comments in that same post. Zoiks!

What I meant to say is that you need to use the noindex until it's removed from the index and Google has recorded it as a noindex, then block the crawl.

But yes, blocking the crawl is for blocking the crawl (saving bandwidth), and noindexing is for noindexing.

And I guess the verdict is still out as far as whether or not to use both. Ergophobe said that you can


Cause ergophobe is an idiot and types faster than he thinks. In the past month I've lectured two clients on exactly why what I posted above is wrong and why phranque is right. I should know better!

lucy24




msg:4592583
 8:23 pm on Jul 12, 2013 (gmt 0)

Cause ergophobe is an idiot and types faster than he thinks.

Memorize this locution: "When my {fingers typed|mouth said} A, my brain meant B."

lorax




msg:4593015
 11:27 am on Jul 14, 2013 (gmt 0)

you need to use the noindex until [the page or resource] is removed from the index and Google has recorded it as a noindex

Absolutely. I'd neglected to illuminate this point, so thank you.
