Forum Moderators: Robert Charlton & goodroi
A "Google Page Two" search engine might prove to be a winner...
Hm, interesting idea! Maybe have one Google for the goofballs who like myspace crap and don't mind cluttered, glittery crap on a site and one for the intellectual type who likes his/her results nice, neat, clean and free from pics/junk/rss/videos/news in results. I never thought I would see the day when Google would clutter up like they have, but I guess I was mistaken. It's almost as if they are trying to become a portal like Yahoo now. Just MHO.
Somebody I know recently told me
"I don't bother with the first page of results I just go to the second page and onwards"
He's just a regular user - a very interesting observation.
As was this from a 14 year old girl
"never click on the results without a www in front of it or anything with loads of numbers and letters at the end, they're all dodgy sites - everyone knows that!"
I learn a great deal talking with people who have no knowledge of SEO or websites. It's easy to forget that they vastly outnumber us.
I never thought I would see the day when Google would clutter up like they have
You would think they forgot what partly made them so popular in the beginning, an uncluttered interface and results.
I'm with you all, I hate the extra RSS/PDF/YouTube/Image results. If I wanted those, I would be searching for those. Or at least make it a "selectable" option in a search.
I just got an alert a few minutes ago that listed 7 alerts - and every one was an exact duplicate
[edited by: Wlauzon at 2:25 pm (utc) on Sep. 6, 2007]
All indexed RSS feeds are served with the mime-type text/xml - an outdated generic XML mimetype which covers not just RSS but also generic XML. Google indexes XML as plain text. If users (and tools - WordPress serves RSS feeds as text/xml, for example) actually used the recommended mime type application/rss+xml then they would not be indexed. Atom feeds are never indexed because they can't be served as text/xml, only as application/atom+xml.
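To make the mime-type point concrete, here is a minimal sketch of how a crawler-side check could separate feeds from generic XML using only the Content-Type header. The function name and the category labels are my own invention for illustration, not anything Google has documented.

```python
# Hypothetical helper: classify a response by its Content-Type header alone.
# The labels ("rss", "atom", "generic-xml", "other") are assumptions.
FEED_TYPES = {
    "application/rss+xml": "rss",
    "application/atom+xml": "atom",
}
GENERIC_XML = {"text/xml", "application/xml"}


def classify_content_type(header_value):
    # Drop parameters such as "; charset=UTF-8" and normalise case.
    media_type = header_value.split(";", 1)[0].strip().lower()
    if media_type in FEED_TYPES:
        return FEED_TYPES[media_type]
    if media_type in GENERIC_XML:
        # A feed served this way cannot be told apart from any other XML
        # document by the header alone.
        return "generic-xml"
    return "other"
```

A feed sent as text/xml lands in the "generic-xml" bucket, which is exactly the ambiguity described above.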
RSS feeds shouldn't be difficult to identify. While there are multiple formats and versions, they should still be easily identified from specific XML tags.
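As a rough sketch of what "identify from specific XML tags" could look like, the snippet below inspects only the root element using Python's standard library. The function name is hypothetical, and it only knows about RSS 0.9x/2.0, RSS 1.0 (RDF) and Atom 1.0; older Atom drafts and malformed feeds would need more work.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS10_NS = "http://purl.org/rss/1.0/"


def detect_feed_format(xml_text):
    """Return a short label for the feed format, or None if not a known feed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None  # not well-formed XML at all
    if root.tag == "rss":
        # RSS 0.9x and 2.0 use an un-namespaced <rss version="..."> root.
        return "rss " + root.get("version", "unknown")
    if root.tag == "{%s}feed" % ATOM_NS:
        return "atom"
    if root.tag == "{%s}RDF" % RDF_NS:
        # RSS 1.0 is RDF; require an RSS 1.0 <channel> so that arbitrary
        # RDF documents are not mistaken for feeds.
        if root.find("{%s}channel" % RSS10_NS) is not None:
            return "rss 1.0"
    return None
```

Anything that is not well-formed XML, or whose root is not one of these three, comes back as None.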
On the other hand, it could be USEFUL to search for RSS feeds SPECIFICALLY. But, like so many other things, Google doesn't give us the ability to do that, because, we, as mere users, aren't so very smart, and need the PhDs at Google to decide what to shove down our throats.
So, that throws open the question of excluding with robots.txt again. Really, what is needed is for the search industry to decide what to do with RSS feeds and standardize on it. Ideally, RSS feeds SHOULD be indexed, but only shown when a user specifically requests a search that includes RSS feeds. (Note that I said INCLUDES RSS feeds - not Google's lame way of partitioning the world into black-and-white either-or categories that you have to go to different Google pages to search for - e.g. Academic Search, etc.).
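For anyone weighing the robots.txt route, here is a small sketch of what such an exclusion does, checked with Python's standard urllib.robotparser. The /feed/ path and the example.com URLs are assumptions for illustration only. Note that a Disallow stops the feed being crawled at all, so it also stops the engine using the feed for discovery.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule: every feed on the site lives under /feed/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /feed/
"""


def build_parser(robots_txt):
    # Parse robots.txt rules from a string instead of fetching them.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical URLs: the feed is blocked, ordinary pages are not.
feed_allowed = parser.can_fetch("Googlebot", "http://www.example.com/feed/rss.xml")
page_allowed = parser.can_fetch("Googlebot", "http://www.example.com/articles/one.html")
```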
An aside: as an earlier poster noted, PHP, ASP, etc. are not "content types". The confusion over this is just one reason why I recommend that webmasters NOT use .php, .asp, etc. extensions. Just use .html or no extension at all. There is no need to clue in hackers as to what technology you are using, and, at the same time, confuse users.
Why would you tell a spider to go away on your most important content!? That rss feed is seo gold. It is one of the main entry points and spider discovery pages on your entire site. I think it is as important as your homepage itself.
> Don't they realize the potential for a duplicate-content penalty if they don't?
There is less than zero risk.
There are two risks from excluding spiders from your RSS feeds:
a- One of the zillion rss scrapers will snag your page content and republish it before the search engines get it. Thus, you become the dupe content spammer on your own site!
b- The engines don't realize which content is your freshest and most important. That is a long term problem with "ever fresh" google.
You want Google and every other engine to munch your RSS feeds as fast as they can. Like you said Jlara, I personally recommend feeding your rss feeds to engines a few hours before you feed them to the general public. That will stop all the page scraping/dupe content nonsense that is going on with the rss-discovery based content scrapers.
> "content types".
If you have ever programmed any type of rss/atom/xml aggregator, you know that there is so much total JUNK for content types and formats out there, that it is a wonder anything works at all. Hats off to Google for trying to sort the junk pile out.
This will become a growing issue that you will hear more and more complaints about, because showing all of these types of results in the top ten is pushing a few of the longstanding sites onto the second page.
GaryTheScubaGuy
> Why aren't webmasters excluding RSS feeds using robots.txt?
> Why would you tell a spider to go away on your most important content!?
The original post was a complaint that RSS feeds appear in search-engine results. As I pointed out in an earlier response, whether RSS should appear in SERPs is really a user-specific preference. It is one of many choices that Google doesn't give the user.
I should have been more specific (really, I didn't thoroughly think it through...) about "excluding RSS feeds using robots.txt."
Unfortunately, current Internet standards make a mess of this situation.
If you'd like search engines to use your RSS feed to discover new content, but don't want the RSS feed itself indexed, the only way to do it today is with a META tag, which, unfortunately, isn't universally recognized by search engines.
noindex,follow would seem appropriate.
This still doesn't put control in the hands of users, though, where it belongs. Users should be able to decide whether they want to see RSS feeds in results or not. That means either the search engines have to figure out whether it's an RSS feed or not, or the standards (and adherence to standards) need to be improved.
> Don't they realize the potential for a duplicate-content penalty if they don't?
> There is less than zero risk.
Mea culpa. See above. That is, unless your RSS feed doesn't point to your main URLs for the content, but to special URLs for RSS readers (perhaps to provide different formatting, etc.). In that case, you certainly do have a duplicate-content risk.
RSS – Looks like complete nonsense when viewed in IE = Back (where was I?.. side order looks good) or Close (most of the time)
Video? don't even go there...
Added:
BTW, one of the large shopping comparison sites just dumped over 335,000 feeds, with reviews of the products/services dating back to Feb 2000 - so much for ever fresh, and it is all indexed!