Why does Google return RSS feeds in the SERPs?

Forum Moderators: Robert Charlton & goodroi

This is very annoying.

mikomido

3:29 am on Sep 4, 2007 (gmt 0)

Why do they do this? I click on a SERP link and expect a Web page, but get asked to subscribe to a feed! There isn't even any indication that it's an RSS feed, like they give you with PDFs.

tedster

4:26 pm on Sep 4, 2007 (gmt 0)

I'm with you, it is frustrating and it seems to me to be a waste of real estate on the SERP.

Badger37

4:46 pm on Sep 4, 2007 (gmt 0)

I also agree - it looks like a "school boy error" to me...

jimbeetle

4:59 pm on Sep 4, 2007 (gmt 0)

Yeah, RSS in the SERPs is among my biggest peeves. A complete waste of time for the user.

WiseWebDude

5:03 pm on Sep 4, 2007 (gmt 0)

Amen! Another thing that is irritating is YouTube in SERPS, when I am at work, we have YouTube blocked due to some goofballs abusing it...now all we see is a blank spot with a red x in it. All of that junks up the user's experience. YouTube, PDF, RSS Feeds, and even the darn Google News is crapping up the results now. Like some of you said, looks like some schoolboy did it.

Bones

5:04 pm on Sep 4, 2007 (gmt 0)

I'd be happy to see them go too.

I'm sure it just confuses anyone who has no clue what an RSS feed is. (Most people I'd imagine!)

Samizdata

6:54 pm on Sep 4, 2007 (gmt 0)

Between Wikipedia, YouTube, Google News, indented results - and unwanted images on top of the lot - there is often nothing left above the fold worth bothering with these days. Including RSS and PDF (not to mention MFA) is just further evidence of Google's sad decline in quality.

A "Google Page Two" search engine might prove to be a winner...

WiseWebDude

4:15 pm on Sep 5, 2007 (gmt 0)

A "Google Page Two" search engine might prove to be a winner...

Hm, interesting idea! Maybe have one Google for the goofballs who like myspace crap and don't mind cluttered, glittery crap on a site and one for the intellectual type who likes his/her results nice, neat, clean and free from pics/junk/rss/videos/news in results. I never thought I would see the day when Google would clutter up like they have, but I guess I was mistaken. It's almost as if they are trying to become a portal like Yahoo now. Just MHO.

phantombookman

5:05 pm on Sep 5, 2007 (gmt 0)

>>A "Google Page Two" search engine might prove to be a winner

Somebody I know recently told me
"I don't bother with the first page of results I just go to the second page and onwards"
He's just a regular user - a very interesting observation.

As was this from a 14 year old girl
"never click on the results without a www in front of it or anything with loads of numbers and letters at the end, they're all dodgy sites - everyone knows that!"

I learn a great deal talking with people who have no knowledge of SEO or websites. It's easy to forget that they vastly outnumber us.

Reno

5:57 pm on Sep 5, 2007 (gmt 0)

Perhaps it's time for Google to expand the Advanced Search page, by allowing the user to select the kind of returns they want (or do NOT want), as in:

Only display the following types of search results:

[] HTML
[] PHP
[] ASP
[] PDF
[] RSS
etc etc

............................

koan

8:20 pm on Sep 5, 2007 (gmt 0)

I never thought I would see the day when Google would clutter up like they have

You would think they forgot what partly made them so popular in the beginning, an uncluttered interface and results.

I'm with you all, I hate the extra RSS/PDF/YouTube/Image results. If I wanted those, I would be searching for those. Or, at least, make it a "selectable" option in a search.

vincevincevince

11:18 am on Sep 6, 2007 (gmt 0)

Most real browsers do a good job of rendering an RSS feed, and an RSS feed when well rendered is no different to the front page of a newspaper - apart from the fact that it has fewer adverts and clutter.

Wlauzon

2:23 pm on Sep 6, 2007 (gmt 0)

Another problem is if you are using Google alerts, you can get 10-20 alerts on the exact same thing as all the various automated MFA blogs feed off of each other.

I just got an alert a few minutes ago that listed 7 alerts - and every one was an exact duplicate

[edited by: Wlauzon at 2:25 pm (utc) on Sep. 6, 2007]

g1smd

8:08 pm on Sep 7, 2007 (gmt 0)

I hate those RSS feeds in the results. They almost never have what I was looking for in the feed page that was delivered.

Marcia

8:51 pm on Sep 7, 2007 (gmt 0)

Bad, bad user experience. They should know better.

g1smd

9:23 pm on Sep 7, 2007 (gmt 0)

[] HTML
[] PHP
[] ASP

Err. Those are all "web pages", built using HTML code. The difference is only in how they were produced.

The others are entirely different types of documents.

Reno

10:03 pm on Sep 7, 2007 (gmt 0)

web pages", built using HTML code

I do realize that. The point is to give the user a choice, so they could just as easily say:

Do NOT Show:
[] PDF
[] RSS
[] Pages with embedded video
etc

.....................................

encyclo

1:18 am on Sep 8, 2007 (gmt 0)

To answer the original question: RSS feeds are indexed only because Google can't tell what format the page is in.

All indexed RSS feeds are served with the mime-type

text/xml

- an outdated generic XML mimetype which covers not just RSS but also generic XML. Google indexes XML as plain text.

If users (and tools - WordPress serves RSS feeds as

text/xml

for example) actually used the recommended mime type

application/rss+xml

then they would not be indexed. Atom feeds are never indexed because they can't be served as

text/xml

only as

application/atom+xml
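
A minimal sketch of the distinction described above, assuming an indexer that looks only at the Content-Type header. This is an illustration of the argument, not a description of how Google actually works; the function name `classify_content_type` and the category labels are hypothetical.

```python
from email.message import Message

# Feed-specific MIME types named in the post above.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}
# Generic XML types that say nothing about whether the document is a feed.
GENERIC_XML_TYPES = {"text/xml", "application/xml"}


def classify_content_type(header_value: str) -> str:
    """Classify a Content-Type header value (hypothetical helper).

    Returns "feed", "generic-xml", or "other".
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_type() lowercases the type and drops parameters
    # such as "; charset=utf-8".
    mime = msg.get_content_type()
    if mime in FEED_TYPES:
        return "feed"
    if mime in GENERIC_XML_TYPES:
        return "generic-xml"
    return "other"
```

Under this assumption, a feed served as text/xml lands in the "generic-xml" bucket and is indistinguishable from any other XML document unless the body is inspected as well.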

GoogleGuy

2:54 am on Sep 16, 2007 (gmt 0)

I think that we've been getting better recently about not showing RSS/Atom feeds in the search results.

Brett_Tabke

1:21 pm on Sep 17, 2007 (gmt 0)

> about not showing RSS/Atom feeds in the search results.

I like them there, but wish there were some indication that they were RSS feeds (so I know what I am clicking on).

jtara

4:38 pm on Sep 17, 2007 (gmt 0)

Why aren't webmasters excluding RSS feeds using robots.txt? Don't they realize the potential for a duplicate-content penalty if they don't?
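
For reference, a rough sketch of what such an exclusion would look like and how a well-behaved crawler would read it. The `/feed/` path, the example.com URLs, the `ExampleBot` user agent and the helper `is_allowed` are all assumptions for illustration; whether excluding feeds is a good idea at all is debated in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all crawlers out of a /feed/ directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /feed/
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; nothing is fetched over the network.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed(ROBOTS_TXT, "https://example.com/feed/rss.xml"))
    print(is_allowed(ROBOTS_TXT, "https://example.com/about.html"))
```

Note that a Disallow rule stops compliant crawlers from fetching the feed at all, so they also cannot use it to discover new URLs.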

RSS feeds shouldn't be difficult to identify. While there are multiple formats and versions, they should still be easily-identified from specific XML tags.
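
A minimal sketch of that idea, assuming the document is well-formed XML and that checking the root element is enough. The function name `detect_feed_format` and the returned labels are hypothetical, and real-world feeds are often malformed, so treat this as an illustration only.

```python
import xml.etree.ElementTree as ET
from typing import Optional

ATOM_NS = "http://www.w3.org/2005/Atom"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS10_NS = "http://purl.org/rss/1.0/"


def detect_feed_format(xml_text: str) -> Optional[str]:
    """Guess the feed format from the root element, or return None."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Not well-formed XML, so this sketch cannot classify it.
        return None
    # RSS 0.9x and 2.0 use an un-namespaced <rss> root.
    if root.tag == "rss":
        return "rss"
    # Atom 1.0 uses <feed> in the Atom namespace.
    if root.tag == f"{{{ATOM_NS}}}feed":
        return "atom"
    # RSS 1.0 is RDF-based: an <rdf:RDF> root containing an RSS 1.0 <channel>.
    if root.tag == f"{{{RDF_NS}}}RDF" and root.find(f"{{{RSS10_NS}}}channel") is not None:
        return "rss-1.0"
    return None
```

A check like this works regardless of the MIME type the server sends, which is why the root element is a more reliable signal than the Content-Type header alone.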

On the other hand, it could be USEFUL to search for RSS feeds SPECIFICALLY. But, like so many other things, Google doesn't give us the ability to do that, because, we, as mere users, aren't so very smart, and need the PhDs at Google to decide what to shove down our throats.

So, that throws open the question of excluding with robots.txt again. Really, what is needed is for the search industry to decide what to do with RSS feeds and standardize on it. Ideally, RSS feeds SHOULD be indexed, but only shown when a user specifically requests a search that includes RSS feeds. (Note that I said INCLUDES RSS feeds - not Google's lame way of partitioning the world into black-and-white either-or categories that you have to go to different Google pages to search for - e.g. Academic Search, etc.).

An aside, as an earlier poster noted, PHP, ASP, etc. are not "content types". The confusion over this is just one reason why I recommend that webmasters NOT use .php, .asp, etc. extensions. Just use .html or no extension at all. There is no need to clue-in hackers as to what technology you are using, and, at the same time, confuse users.

jimbeetle

4:49 pm on Sep 17, 2007 (gmt 0)

I think that we've been getting better recently about not showing RSS/Atom feeds in the search results.

Better, but not yet good ;-)

Brett_Tabke

4:52 pm on Sep 17, 2007 (gmt 0)

> Why aren't webmasters excluding RSS feeds using robots.txt?

Why would you tell a spider to go away on your most important content!? That rss feed is seo gold. It is one of the main entry points and spider discovery pages on your entire site. I think it is as important as your homepage itself.

> Don't they realize the potential for a duplicate-content penalty if they don't?

There is less than zero risk.

There are two risks from excluding spiders from your RSS feeds:

a- One of the zillion rss scrapers will snag your page content and republish it before the search engines get it. Thus, you become the dupe content spammer on your own site!
b- The engines don't realize what your freshest and most important content is. That is a long term problem with "ever fresh" google.

You want Google and every other engine to munch your RSS feeds as fast as they can. Like you said jtara, I personally recommend feeding your rss feeds to engines a few hours before you feed them to the general public. That will stop all the page scraping/dupe content nonsense that is going on with the rss-discovery based content scrapers.

> "content types".

If you have ever programmed any type of rss/atom/xml aggregator, you know that there is so much total JUNK for content types and formats out there, that it is a wonder anything works at all. Hats off to Google for trying to sort the junk pile out.

GaryTheScubaGuy

5:03 pm on Sep 17, 2007 (gmt 0)

This is part of their 'Universal Search' and MSN is also showing similar results in their Live 2.0

This will become a growing issue that you will hear more and more complaints about, because showing all of these types of results in the top ten is pushing a few of the longstanding sites onto the second page.

GaryTheScubaGuy

jtara

6:28 pm on Sep 17, 2007 (gmt 0)

> Why aren't webmasters excluding RSS feeds using robots.txt?
Why would you tell a spider to go away on your most important content!?

The original post was a complaint that RSS feeds appear in search-engine results. As I pointed out in an earlier response, though, wanting or not wanting RSS to appear in SERPs is really a user-specific preference. It is one, like many, that Google doesn't let the user choose.

I should have been more specific (really, didn't thoroughly think it through...) about "excluding RSS feeds using robots.txt."

Unfortunately, current Internet standards make a mess of this situation.

If you'd like search engines to use your RSS feed to discover new content, but don't want the RSS feed itself indexed, the only way to do it today is with a META tag, which, unfortunately, isn't universally recognized by search engines.

noindex,follow would seem appropriate.

This still doesn't put control in the hands of users, though, where it belongs. Users should be able to decide whether they want to see RSS feeds in results or not. That means either the search engines have to figure out whether it's an RSS feed or not, or the standards (and adherence to standards) need to be improved.

> Don't they realize the potential for a duplicate-content penalty if they don't?
There is less than zero risk.

Mea culpa. See above. That is, unless your RSS feed doesn't point to your main URLs for the content, but to special URLs for RSS readers (perhaps to provide different formatting, etc.). In that case, you certainly do have a duplicate-content risk.

iamlost

6:50 pm on Sep 17, 2007 (gmt 0)

But now that side order of ads looks so very much more relevant...

blend27

7:20 pm on Sep 17, 2007 (gmt 0)

[] PDF - Click - Contacts Adobe Server, Tries to Launch an Adobe Acrobat Upgrade = Cancel, Close

[] RSS - Looks like complete nonsense when viewed in IE = Back(where was I?.. side order looks good) or Close(most of the time)

Video? don't even go there...

Added:
BTW, One of the large Shopping Comparison sites just dumped over 335,000 feeds, reviews of the products/services dating back to Feb 2000 - so much for ever fresh, and it is all indexed!

rohitj

1:26 am on Sep 18, 2007 (gmt 0)

It's important to note that a lot of browsers have built-in RSS feed readers that will present these pages in a useful manner.

blend27

1:56 am on Sep 18, 2007 (gmt 0)

--lot of browsers have built-in RSS--

Not in IE6 as of today, most USERS have NO CLUE what RSS is.

kamikaze Optimizer

2:58 am on Sep 18, 2007 (gmt 0)

IE 7.0 reads RSS and of course FireFox. I am with Brett, I like them there but wish that Google labeled them like Yahoo did until a year or so ago.
