Forum Moderators: Robert Charlton & goodroi

Google is indexing my RSS, should I block them?

Should I block RSS content at robots.txt?

pmurillo

3:22 pm on Mar 23, 2007 (gmt 0)

10+ Year Member



I'm the webmaster of < example.com >. Today I ran a site:example.com query on Google.com and was very pleased to discover that around 3,600 pages are indexed. Previously there were around 360.

However, I later discovered that most of the indexed content is in the form of RSS files, which are not directly viewable when you click on them, at least not as HTML pages, which is what I wanted.

My site is principally a forum, and each post has a permalink and an RSS feed. Should I block the RSS files using robots.txt? Is Google going to see my content as duplicate and penalize me for it?

I'm looking forward to your suggestions. I'm clueless about what to do now.

<Sorry, no specific domain names.
See Forum Charter [webmasterworld.com]>

[edited by: tedster at 4:00 pm (utc) on Mar. 23, 2007]

jimbeetle

4:40 pm on Mar 23, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I truly doubt that Google would see RSS feeds as duplicate content (or at least I hope it wouldn't). If it did, it could literally knock millions upon millions of pages out of its index. I would hope the PhDs at G are a bit brighter than that.

But not all that bright. I block my feeds simply because I can't stand Google including RSS feeds in its main index. In my opinion, one of the worst user experiences is clicking on a search result only to find it's a feed. A literal waste of the user's time and effort. Why G doesn't break feeds out in a onebox feature I just can't comprehend.

Either way, blocking your feeds is probably the safest bet.

pmurillo

9:04 am on Mar 25, 2007 (gmt 0)

10+ Year Member



Thanks for the reply. I also think that clicking on an RSS file because it was listed on the SERP is a very bad user experience, so I have decided to block access to the RSS files. Would this robots.txt file do the trick?

User-agent: *
Disallow: /*.rss$

panos

11:59 am on Mar 25, 2007 (gmt 0)

10+ Year Member



Why don't you dress up the RSS output with CSS?

You can then specify that it's an RSS feed (with some explanation about what RSS is) and put a link that leads your visitors to the HTML pages.

pmurillo

1:24 pm on Mar 25, 2007 (gmt 0)

10+ Year Member



Interesting observation. I'm not sure how to dress up the RSS with CSS. Can you please elaborate on that?

panos

1:49 pm on Mar 25, 2007 (gmt 0)

10+ Year Member



If you already know CSS, setting up a stylesheet for your RSS feed is trivial.

Start by adding an xml-stylesheet tag to your RSS feed:

<?xml-stylesheet type="text/css" href="http://www.example.com/rss.css"?>
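In case the placement isn't obvious: the xml-stylesheet line goes after the XML declaration and before the opening rss element. Here is a rough sketch of how the top of the feed might look. The channel and item values are placeholders I made up, so substitute whatever your feed actually contains.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The stylesheet reference sits between the XML declaration and the root element. -->
<?xml-stylesheet type="text/css" href="http://www.example.com/rss.css"?>
<rss version="2.0">
  <channel>
    <!-- Placeholder channel data, for illustration only -->
    <title>Example forum</title>
    <link>http://www.example.com/</link>
    <description>Placeholder feed description</description>
    <item>
      <title>Placeholder post title</title>
      <link>http://www.example.com/post/1</link>
    </item>
  </channel>
</rss>
```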

The next step is to create the CSS file.

Inside this file you can define how each RSS tag is displayed.

The following works with RSS 2.0 and only styles some tags; you can add styles for other RSS tags too.

rss {
    display: block;
    font-family: arial, sans-serif;
}
title {
    display: block;
    margin: 10px;
    padding: 4px;
    color: black;
    border-bottom: 1px solid gray;
}
link {
    display: block;
    font-size: small;
    padding-left: 10px;
}
item {
    display: block;
    padding: 4px 25px 4px 25px;
}

To make links clickable you have to use an XSL stylesheet (just google xsl for more info).
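To give an idea of what that involves, here is a minimal sketch of an XSL stylesheet that turns the feed into a simple HTML page with clickable links. It assumes a plain RSS 2.0 structure (rss/channel/item, each item with a title and a link), and the file name rss.xsl is just a name I picked for the example. Treat it as a starting point rather than a finished template.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: assumes an RSS 2.0 feed with rss/channel/item elements. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" encoding="UTF-8"/>

  <xsl:template match="/">
    <html>
      <head>
        <title><xsl:value-of select="rss/channel/title"/></title>
      </head>
      <body>
        <h1><xsl:value-of select="rss/channel/title"/></h1>
        <!-- Tell visitors what they are looking at and point them to the HTML site -->
        <p>
          This is an RSS feed.
          <a href="{rss/channel/link}">Visit the HTML version of the site.</a>
        </p>
        <ul>
          <!-- One clickable entry per feed item -->
          <xsl:for-each select="rss/channel/item">
            <li>
              <a href="{link}"><xsl:value-of select="title"/></a>
            </li>
          </xsl:for-each>
        </ul>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

To use it, the feed would reference the XSL file instead of the CSS file, with the type changed to text/xsl:

```xml
<?xml-stylesheet type="text/xsl" href="http://www.example.com/rss.xsl"?>
```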

pmurillo

3:05 pm on Mar 25, 2007 (gmt 0)

10+ Year Member



Many thanks for your reply. The CSS works like a charm. I will try the XSL to make the content clickable, as you suggest.

My RSS feed contains HTML tags like <p> and <BR>. Is there any way to get them rendered as a browser would render them?

Currently Internet Explorer displays them as raw text, the way a text editor would.

I'm looking forward to your reply.

followgreg

1:57 am on Mar 26, 2007 (gmt 0)

10+ Year Member



Well, concerning the Google PhDs mentioned above, I unfortunately see that on one of our blogs the RSS is indexed while the ACTUAL pages are not. Moreover, the RSS feeds have more PR than the actual pages. That's a little silly of Google, IMO.
Anyway, I don't really care, but it hurts to see nice unique pages with no PR while a comment feed has PR. Anyways :)