Forum Moderators: Robert Charlton & goodroi
I was wondering what you think about blogs and WordPress. As you know, WordPress can have categories in which it'll show certain posts.
So now I can have 3 categories: A, B, C and then make a post which will be posted in all 3 cats... it'll show in each category, as well as on the main page and in the archives. As you can see, there are many places on the site where that one post shows.
What do you think, is this duplicate content, or not? How does Google treat such behaviour?
Any clues?
Thanks,
Manca
As I see it, the individual pages are the "money pages". There used to be a great illustration on this page [searchengineworld.com...] but it seems to have gone AWOL.
Basically, the "more" tag lets you decide where the home and category pages cut off a post. If you write 4 paragraphs, put the "more" tag at the end of the first. This way the remaining content (the majority) appears only on the individual page and is not duplicated anywhere else.
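To make the split concrete, here is a minimal Python sketch (not WordPress code) of what the "more" tag does to a post. It assumes the tag is written as <!--more--> in the post body; the function name split_at_more and the sample post are made up for illustration.

```python
MORE_TAG = "<!--more-->"  # the marker WordPress looks for in the post body


def split_at_more(post_body):
    """Return (teaser, remainder) for a post body.

    The teaser is what the home and category pages would show; the
    remainder appears only on the individual post page. If there is
    no more tag, the whole post counts as the teaser.
    """
    teaser, found, remainder = post_body.partition(MORE_TAG)
    if not found:
        return post_body.strip(), ""
    return teaser.strip(), remainder.strip()


# Hypothetical four-paragraph post with the tag after the first paragraph.
post = "Para one.<!--more-->\n\nPara two.\n\nPara three.\n\nPara four."
teaser, remainder = split_at_more(post)
print("Shown on home/category pages:", teaser)
print(len(remainder.split("\n\n")), "paragraphs stay unique to the post page")
```

With the tag after the first paragraph, only that paragraph repeats across listing pages; the other three live on the permalink alone.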
Graywolf, thanks for the tip.
Do you find most of your search results in Google go directly to an individual post, or do the related categories show up in the index?
For example, for some of my keywords I get the category indexed as a result and for other keywords the individual post gets indexed.
Include all the stuff for Googlebot in that section because if there is a User-agent: Googlebot section, then Google totally ignores the User-agent: * section of the robots.txt file.
Just read your remark: I cannot confirm your statement; I have seen that Googlebot has always taken into account the * section if there is also a Googlebot section.
Or are there other conditions I do not know?
Why Google Might "Ignore" a robots.txt Disallow Rule [webmasterworld.com]
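For anyone who wants to see the rule being argued about, here is a small Python sketch using the standard library's urllib.robotparser. That parser is not Googlebot, so this only demonstrates the convention described above (a crawler with its own User-agent group uses that group and nothing else), not what Google actually does. The robots.txt content, the "OtherBot" name and the yoursite.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: feeds are blocked only in the * group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /feed/
Disallow: /category/

User-agent: Googlebot
Disallow: /category/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot has its own group, so only that group's rules apply to it.
googlebot_feed = parser.can_fetch("Googlebot", "http://www.yoursite.com/feed/")
# A crawler without its own group falls back to the * group.
otherbot_feed = parser.can_fetch("OtherBot", "http://www.yoursite.com/feed/")

print("Googlebot may fetch /feed/:", googlebot_feed)
print("OtherBot may fetch /feed/:", otherbot_feed)
```

Under this convention the /feed/ rule has to be repeated inside the Googlebot group if it is meant to apply to Googlebot too.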
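<?php
// The forum displays the pipe character as ¦; use real pipes (||) in your theme file.
// The echoed tags must also be quoted strings, or PHP reports a parse error.
if (is_home() || is_single() || is_page()) {
    echo '<meta name="robots" content="index,follow">';
} else {
    echo '<meta name="robots" content="noindex,follow">';
}
?>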
Is there anything else I need to change apart from the ¦¦ to get it working? :(
TD
Why Google Might "Ignore" a robots.txt Disallow Rule
Many thanks for referencing that brain washing robots.txt thread! For many years I have seen the specifications the other way round!
BUT just now I am checking one simple and very specific example (disallowing only one page of a site) which does not fit the "new rules": since the page is only mentioned in the * section, it should be indexed by G. In earlier times a SITE: query showed only its URL, as usual, but now the page is not listed at all (all other pages of the site are indexed).
Every time I post to any of my blogs now, I create a unique excerpt for each post. In doing this I get a "snippet" of content that only goes to one category page and a full "page" of content that goes to the permalink page. Of course, I only select one category per post. This really splits up the content nicely.
Like others I would love some recommendations however on how to keep google bot from indexing my various rss feeds.
Any ideas?
Brian
Like others I would love some recommendations however on how to keep google bot from indexing my various rss feeds.
You can then configure FeedBurner to use just a summary for each post. The way I have things set up, FeedBurner uses the first 250 words of a post, or, if I've manually entered a meta description in the excerpt field (with help from the head-meta description plugin), the description/excerpt gets shown in the feed instead.
Unfortunately, I believe FeedBurner pages can still be indexed, and partial feeds may be unwanted for some publishers. But at least the feed has clickable links back to your website, and I would imagine GOOG and other SEs might recognize FeedBurner feeds as feeds and not penalize for duplicate content (any thoughts on this?).
I thought about just adding <meta name="robots" content="noindex,follow"> to the top of all the wordpress feed template files, but then I read on one of google's Q/A pages that this might interfere with indexing the site in google's blog search.
I thought about just adding <meta name="robots" content="noindex,follow"> to the top of all the wordpress feed template files, but then I read on one of google's Q/A pages that this might interfere with indexing the site in google's blog search.
So you think it might be a bad idea to disallow */feed pages in robots.txt?
I'm very troubled by this problem. Around 80% of my Google hits are coming in to the /post/feed pages.
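If you do try a robots.txt rule for feeds, it helps to check which URLs a pattern would actually cover. The sketch below is a simplified Python model of the wildcard matching Google documents for Googlebot (* matches any run of characters, a trailing $ anchors the end of the URL, anything else is a prefix match). It is not Google's code, and the function name and sample paths are made up.

```python
import re


def rule_matches(pattern, path):
    """Simplified model of Google-style robots.txt path matching."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then turn each * into "any characters".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, so an un-anchored rule is a prefix match.
    return re.match(regex, path) is not None


# Two hypothetical rules: one for the site feed, one for per-post feeds.
for path in ["/feed/", "/my-post/feed/", "/my-post/", "/category/a/feed/"]:
    blocked = rule_matches("/feed", path) or rule_matches("/*/feed", path)
    print(path, "blocked" if blocked else "allowed")
```

In this model /*/feed on its own does not cover the top-level /feed/, so both rules are needed. Also note that a prefix rule like /feed would catch a page such as /feedback/ as well.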
Here's what I read from google:
What if I don't want to be listed [In Google Blog Search]? If you do not publish a site feed for your blog, it will not be included in Blog Search. However, if you previously published a site feed that was included, the old posts will remain in the index, even though new ones are not added.
Blog Search will also respect robots.txt files and NOINDEX, NOFOLLOW meta tags, as described here.
[google.com...]
I don't know if "disallow" is treated differently from "noindex". I also don't know how these tags would affect other blog search engines like Technorati.
I'm still learning all this stuff on the fly. Hopefully some others can chime in on the feed issue :)
Yep, disallowing the feeds is a tricky one. I've only managed to add a nofollow to the links that point to feeds.
Here goes:
edit /wp-includes/feed-functions.php
find function comments_rss_link
add the nofollow: echo "<a rel='nofollow' href='$url'>$link_text</a>";
The only problem is that WP will parse double quotes as single quotes.
Things I've done:
- Basic 301 from all non-www pages to www pages
- Same 301 if the page is called without / at the end, to redirect it to the same page with / at the end.
- Unique meta description tags using the head-meta plugin
- Testing if page is archive or page>1 and adding noindex,follow meta tag
- Optimized title for page, so it looks: "Name of Post: Name of blog" for perm. posts and "Name of blog" for main page. Also for categories: "Name of category Category: Name of blog" (unique titles)
things are all moving out of the similar pages section into pages of their own right. nothing is supplemental. in fact, it's all looking really good now.
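The noindex and title rules from the list above boil down to a couple of small decisions. Here is a rough Python sketch of that logic, just to spell it out. In a real theme this would be PHP built on WordPress conditional tags; the function and parameter names below are invented for the example.

```python
def robots_meta(is_archive, page_number):
    """noindex,follow for archives and for page 2 onwards; index,follow otherwise."""
    if is_archive or page_number > 1:
        return '<meta name="robots" content="noindex,follow">'
    return '<meta name="robots" content="index,follow">'


def page_title(blog_name, post_name=None, category_name=None):
    """Build a unique title per page type, following the list above."""
    if post_name:
        return f"{post_name}: {blog_name}"
    if category_name:
        return f"{category_name} Category: {blog_name}"
    return blog_name  # main page


print(robots_meta(is_archive=False, page_number=1))
print(robots_meta(is_archive=False, page_number=2))
print(page_title("Name of blog", post_name="Name of Post"))
print(page_title("Name of blog", category_name="Name of category"))
```

Every permalink, category and main page ends up with a distinct title, and only first pages outside the archives stay indexable.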
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.yoursite\.com$ [NC]
RewriteRule ^(.*)$ [yoursite.com...] [R,L]
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
Replace 'yoursite.com' with your own URL - obviously! ;)
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.yoursite\.com$ [NC]
RewriteRule ^(.*)$ http://www.yoursite.com/$1 [R=301,L]
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
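Before relying on redirects like these, it can help to write down which requests you expect to be redirected and where. The Python sketch below is only a rough model of the two 301 rules mentioned earlier in the thread (non-www to www, and adding the trailing slash). It is not how Apache evaluates the rules, and the "looks like a file" check is an assumption made for this example.

```python
CANONICAL_HOST = "www.yoursite.com"  # placeholder, as in the rules above


def redirect_target(host, path):
    """Return the URL to 301 to, or None if the request is already canonical.

    Rough model: force the www host, and add a trailing slash to paths
    whose last segment has no dot (assumed not to be a file).
    """
    new_path = path
    last_segment = path.rsplit("/", 1)[-1]
    if not path.endswith("/") and "." not in last_segment:
        new_path = path + "/"
    if host.lower() == CANONICAL_HOST and new_path == path:
        return None
    return f"http://{CANONICAL_HOST}{new_path}"


for host, path in [
    ("yoursite.com", "/my-post/"),
    ("www.yoursite.com", "/my-post"),
    ("www.yoursite.com", "/my-post/"),
]:
    print(host + path, "->", redirect_target(host, path))
```

A list of expected pairs like this makes it easy to compare against what the live server actually returns.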