
Google SEO News and Discussion Forum

This 142 message thread spans 5 pages; this is page 5.
WordPress And Google: Avoiding Duplicate Content Issues
What about posts in few different categories?
manca




msg:3097708
 3:50 pm on Sep 26, 2006 (gmt 0)

Hey guys,

I was wondering what you think about blogs and WordPress. As you know, WordPress can have categories in which it'll show certain posts.
So now I can have 3 categories, A, B and C, and then make a post which will be posted in all 3 cats. It'll show in each category, as well as on the main page and in the archives. As you can see, there are many places on the site where that one post shows.

What do you think, is this duplicate content or not? How does Google treat such behaviour?

Any clues?

Thanks,
Manca

 

skweb




msg:3151383
 3:13 pm on Nov 9, 2006 (gmt 0)

Our company has six blogs using WP, and we have never tinkered with the settings. Indeed, only a site: query shows the RSS, trackback, "email this" and comments RSS feed URLs; they generally don't appear in an actual Google search.

I have been following the whole discussion on duplicate content, and for us it is simply not an issue, even though we often assign multiple categories to each post. Google relies heavily on what you actually link to and what anchor text you use to serve the results.

So for all of you who believe that we are in business for our readers rather than bots, relax - just create good content and life will be good.

johne




msg:3154038
 3:47 am on Nov 12, 2006 (gmt 0)

My RSS comment feed is showing up in supplemental results. It seems to be the last little bit that I need to clean up. Anybody have any ideas? This one seems tougher in that the comments RSS feed directory "/feed/" comes at the end of the permalink URL.

Is there a way I can remove it via robots.txt using a wildcard? I think I heard this on a webmasterradio show but I can't recall the specifics.

Thank you.

tedster




msg:3154043
 3:55 am on Nov 12, 2006 (gmt 0)

Here's the Google reference on their support for pattern matching / wildcards in the robots.txt file [google.com].

Also note that Yahoo's Slurp now also supports wildcards in the robots.txt file.
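
For the trailing /feed/ case described above, here is a rough Python sketch of how this style of pattern matching behaves. It assumes Google's documented semantics: `*` matches any run of characters, a trailing `$` anchors the end of the URL, and a pattern without `$` is a prefix match. The function name `rule_matches` and the sample permalinks are made up for illustration; this is not Google's actual code.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt Disallow pattern would match the URL path.

    Assumed semantics: '*' matches any sequence of characters, a trailing '$'
    anchors the end of the URL, and a pattern without '$' is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal chunk and join the chunks with '.*' for the wildcards.
    regex = ".*".join(re.escape(chunk) for chunk in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start of the path, giving the prefix behaviour.
    return re.match(regex, path) is not None


# Hypothetical permalinks, with and without the comments feed suffix.
for path in ["/2006/11/test-post/", "/2006/11/test-post/feed/", "/feed/"]:
    print(path, rule_matches("/*/feed/$", path))
```

Under these assumptions, `Disallow: /*/feed/$` catches the per-post comment feeds but not the site-wide `/feed/`, which would need its own line.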

Gumball Monkey




msg:3158083
 2:24 am on Nov 16, 2006 (gmt 0)

I implemented the changes recently and my pages have slowly disappeared from the supplemental index. However, there is still duplicate content because both my main page and the single posts are indexed with the same content. Is it wise to disable indexing of the main page? Is there a better solution?

victorP




msg:3158093
 2:35 am on Nov 16, 2006 (gmt 0)

1. Use the "more" tag.

2. Do not allow categories to get indexed.

3. Generate unique titles and meta descriptions for the single posts.

4. Drop the meta description from index2, index3, index4, etc. Google will generate its own unique description based on the content.

5. Generate unique titles for index2, index3, index4, etc.

I did this and the results are rather good!

Gumball Monkey




msg:3158682
 4:16 pm on Nov 16, 2006 (gmt 0)

Awesome VictorP, but could you point me in the right direction as far as coding?

MrBlack




msg:3162395
 1:12 pm on Nov 20, 2006 (gmt 0)

<This message was spliced on to this thread from another location>

I have a WordPress blog, around a month or two old, and after a slow start it was very well indexed by Googlebot and started to do very well in the SERPs.

However, I noticed that my WP feed started to rank higher than individual posts or my home page. To counter this I placed some disallow rules in my robots.txt, but I have just seen a big drop in the number of pages listed in G's index for my blog.

Have I made a mistake in my robots.txt?

User-agent: *
Disallow: /wp-
Disallow: /search
Disallow: /feed
Disallow: /comments/feed
Disallow: /feed/$
Disallow: /*/feed/$
Disallow: /*/feed/rss/$
Disallow: /*/trackback/$
Disallow: /*/*/feed/$
Disallow: /*/*/feed/rss/$
Disallow: /*/*/trackback/$
Disallow: /*/*/*/feed/$
Disallow: /*/*/*/feed/rss/$
Disallow: /*/*/*/trackback/$

[edited by: tedster at 5:33 pm (utc) on Nov. 20, 2006]

Craig_F




msg:3172073
 2:32 pm on Nov 29, 2006 (gmt 0)

Just noticed that an old blog has both the dynamic PHP URLs and the static URLs listed for some pages. Any ideas how to properly redirect *all* of those at once to the static version?

g1smd




msg:3172681
 9:37 pm on Nov 29, 2006 (gmt 0)

Since * is a wildcard, I don't think you can have multiple wildcards in a disallow line.

tedster




msg:3172691
 9:41 pm on Nov 29, 2006 (gmt 0)

Here's one example of two wildcards on Google's own support pages:

To block access to all URLs that include a question mark (?), you could use the following entry:

User-agent: *
Disallow: /*?*

[google.com...]
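
To see how several wildcards in one line behave, here is a small Python sketch under the documented semantics (`*` matches any run of characters, a trailing `$` pins the end of the URL). The helpers `to_regex` and `blocked` are hypothetical names, and the sample URLs are invented; treat it as an approximation of the matching rules, not as Google's implementation.

```python
import re


def to_regex(pattern):
    """Translate a Disallow pattern into a compiled regex (assumed semantics:
    '*' is any run of characters, a trailing '$' pins the end of the URL)."""
    end = "$" if pattern.endswith("$") else ""
    body = pattern[:-1] if end else pattern
    return re.compile(".*".join(map(re.escape, body.split("*"))) + end)


def blocked(pattern, path):
    # re.match anchors at the start, which gives the usual prefix behaviour.
    return to_regex(pattern).match(path) is not None


# Two wildcards in one rule, as in Google's example.
print(blocked("/*?*", "/index.php?p=12"))   # URL with a question mark
print(blocked("/*?*", "/test-post/"))       # URL without one

# A single '*' already spans several directory levels, so the deeper
# '/*/*/feed/$' style rules cover nothing that '/*/feed/$' misses.
deep = "/2006/11/test-post/feed/"
print(blocked("/*/feed/$", deep), blocked("/*/*/*/feed/$", deep))
```

If these assumptions hold, the longer `/*/*/...` variants in a rule set are harmless but redundant.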


john5000




msg:3183358
 12:24 pm on Dec 9, 2006 (gmt 0)

Do these links result in duplicate content problems?

http://www.example.com/test-post/

http://www.example.com/test-post/#comments

They are both the same page, but the second link jumps down to the comments section.

tedster




msg:3183743
 7:30 pm on Dec 9, 2006 (gmt 0)

No duplicate trouble there -- the "named anchor" part of a URL (everything after the #) is not spidered.

john5000




msg:3186139
 12:14 pm on Dec 12, 2006 (gmt 0)

Great, thanks!

Now how can I say noindex for numbered pages that follow the home page?
So that the home page is indexed, but the 2nd, 3rd, 4th... etc pages are not?

g1smd




msg:3187305
 11:47 am on Dec 13, 2006 (gmt 0)

Add <meta name="robots" content="noindex"> on each one that should not be indexed.

john5000




msg:3191396
 11:42 pm on Dec 17, 2006 (gmt 0)

>>Add <meta name="robots" content="noindex"> on each one that should not be indexed.<<

Thanks, but I don't think that works with WordPress. I have one header template file for the whole site. All the pages are dynamically generated.

In my header I have this:


<?php
// Index single posts, static pages and the home page; noindex everything else.
if (is_single() || is_page() || is_home()) {
    echo "<meta name=\"robots\" content=\"index,follow\"/>\n";
} else {
    echo "<meta name=\"robots\" content=\"noindex,follow\"/>\n";
}
?>

I want to say noindex for all the pages that are after the homepage, which are just a chronological ordering of posts as they're bumped off the homepage.

iridiax




msg:3191438
 12:23 am on Dec 18, 2006 (gmt 0)

I have this in my home.php to make sure that only the first page is indexed:

<?php
if (is_home() && ($paged <= "1")) {
echo "<meta name=\"robots\" content=\"index,follow\"/>\n";
} else {
echo "<meta name=\"robots\" content=\"noindex,follow\"/>\n";
}
?>

john5000




msg:3191521
 2:21 am on Dec 18, 2006 (gmt 0)

Oh sweet, it worked! Thanks iridiax!

I added that little bit && ($paged <= "1") to my code above and it worked like a charm.

Thanks a million :)

PaulPA




msg:3193255
 1:55 pm on Dec 19, 2006 (gmt 0)

I would say that Adam's post [googlewebmastercentral.blogspot.com] makes this topic even more important:

Understand your CMS: Make sure you're familiar with how content is displayed on your Web site, particularly if it includes a blog, a forum, or related system that often shows the same content in multiple formats.

purple




msg:3207349
 10:32 pm on Jan 3, 2007 (gmt 0)

Just read the thread; this is a great one!

Can someone summarise what code I should put in my header.php so that only the index page and the single post pages are cached by Google?

Also, what do I need to add to robots.txt to stop feeds being cached?

purple




msg:3216147
 4:10 pm on Jan 11, 2007 (gmt 0)

I have this in my home.php to make sure that only the first page is indexed

Does Home.php = index.php or header.php?

I still can't get this code to work

<?php if(is_home() is_single() is_page()){
echo <meta name="robots" content="index,follow">;
} else {
echo <meta name="robots" content="noindex,follow">;
}?>

Can I paste it into header.php as is? I have read something about pipes. What do I need to change, in layman's terms?

Patrick Taylor




msg:3216261
 5:24 pm on Jan 11, 2007 (gmt 0)

"home" refers to index.php, and as far as I can see, that code should work if pasted into header.php (but with pipe symbols, not broken pipe symbols).

PaulPA




msg:3216269
 5:28 pm on Jan 11, 2007 (gmt 0)

After looking at all the posts and knowing the need to prevent dup content, I'm wondering why there isn't a greater push to just stop this in robots.txt. I would think most people are using the custom URI option, and it seems that stopping the indexing of categories, archives, extra pages and feeds could be accomplished pretty easily that way. Am I missing something?

< continued here: [webmasterworld.com...] >

[edited by: tedster at 5:44 am (utc) on Mar. 11, 2007]
