|Google won't index YahooGroups messages?|
Anyone know why or how to resolve it?
I have a long-established (Oct 1999) YahooGroup with 31392 messages full of relevant and meaningful content. Many of those messages have URLs in them that link back to my company's web site. Obviously, this is a trove of content and potential search benefit.
However, despite having the YahooGroup archive set to "open", allowing non-members and casual browsers to see the messages, Google never seems to index the messages themselves. It has picked up the pages that list the message subjects and have links _to_ the messages, but it refuses to index the messages themselves.
This is, of course, vexing to me. I've tried finding an explanation of the cause (and hopefully a solution) numerous times. It's hard finding the info you want when the relevant keywords are mostly "Google" and "Yahoo" and "Groups".
I've seen a few other people reporting the same problem in various places, but never an explanation or solution. I'm assuming it must be a Google policy rather than a Yahoo policy, since Yahoo would definitely want Google to direct traffic to them if it were their decision.
All insight welcomed.
BTW, I know Brett Tabke's nickname from the 80's. Hi Brett! ;)
Are Yahoo Groups messages even indexed by Yahoo in their regular search?
Why don't you move your group to Google Groups?
|BTW, I know Brett Tabke's nickname from the 80's. Hi Brett! ;) |
Very interesting indeed. What was it ;)
I don't know of any way to import my 32000 messages of important content into Google Groups, so that doesn't exactly solve the problem of content-not-being-indexed.
I have a tool for batch-downloading the historical YahooGroups data, but without being able to batch load it into Google Groups, GG is not a solution. If necessary I could create a "mirror" of the content on my own website, but then that needs to be maintained and updated as new content is created on the YahooGroup. Plus, it's a messy workaround for the problem of content that ought to be indexed simply not being indexed for no obviously good reason.
reseller: I'm keeping that as a highly-guarded secret. You're welcome to ask Brett if he wants it disclosed. :)
[edited by: XenonofArcticus at 11:20 pm (utc) on Jan. 9, 2007]
Well, dang it. I must have hit the wrong dang button when I replied to this earlier.
|Are Yahoo Groups messages even indexed by Yahoo in their regular search? |
From what I found earlier using Yahoo Site Explorer [siteexplorer.search.yahoo.com], basically no.
I checked a few very large, seemingly quite active public groups and the most pages I found was for one large group -- 18. Most had only 2 or 3 pages, a couple had 8 and 11, or something along those lines.
Does anyone have any idea why Google would choose to deliberately black out this large and fairly rich body of information from their index? I can see possibly applying some form of uniform link-nofollow policy to prevent gobs of link spam from influencing sites' rankings, but the content of many of these lists is valuable information. Dare I say, better than the informational level of the larger Internet as a whole -- especially a list like ours, which is strictly members-post-only and closely policed for topic relevance.
|From what I found earlier using Yahoo Site Explorer, basically no. |
That's what I thought. Perhaps the question should actually be why isn't Yahoo indexing their own content? If they've not got confidence in it, it's not surprising Google don't index it either. Too spammy overall perhaps?
I think Matt Cutts has stated several times that he hates signature links in forums, so maybe that is why your content isn't spidered.
[edited by: Pirates at 12:24 am (utc) on Jan. 10, 2007]
I see some Yahoo Groups that have most of their messages indexed.
This might be a "duplicate content" question because Yahoo makes each message available at multiple URLs (some with extra subdomain information). Maybe that has triggered something for some groups.
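One way to test the duplicate-URL idea is to collect every address at which a message can be reached and group them by path, ignoring the host. The sketch below assumes the only difference between duplicates is the subdomain. The URLs are made up for illustration (they are not real Yahoo addresses), and `message_key` and `group_duplicates` are hypothetical helper names.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def message_key(url):
    """Reduce a URL to its path so that host/subdomain variants compare equal."""
    return urlsplit(url).path.rstrip("/")


def group_duplicates(urls):
    """Return {path: [urls]} for every path reachable at more than one URL."""
    groups = defaultdict(list)
    for url in urls:
        groups[message_key(url)].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}


# Illustrative URLs only; the hosts and paths are assumptions.
urls = [
    "http://groups.example.com/group/poodles/message/42",
    "http://tech.groups.example.com/group/poodles/message/42",
    "http://groups.example.com/group/poodles/message/43",
]

for path, variants in group_duplicates(urls).items():
    print(path, "is reachable at", len(variants), "URLs")
```

If a group whose messages are indexed turns out to have fewer URL variants per message than one that is not, that would lend some weight to the duplicate-content theory. It would not prove it.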
|Does anyone have any idea about why Google would choose to deliberately black out this large and fairly rich body of information from their index? |
As said, it ain't only Google, Yahoo appears to do a very poor job of it also (and a quick check shows MSN Live results are similar).
Have you run your group's main URL through Yahoo Site Explorer?
g1smd, yeah dupe content can be a problem here. I also noticed that most of the groups I checked have only one or two backlinks, one of which is from the groups directory page. So not enough link love might also come into play.
Pirates: I can see Matt's viewpoint on that, as signature links can be abused. I don't think it's improper to have my company's URL in the signature of on-topic messages I post to various forums though. If I'm an expert on pink poodles, and I post a lot of useful pink poodle info to a Poodle Forum, I think that there is a defensible association between the content I posted and my web page.
Forum spamming and signature spamming are just taking the same thing much further -- too far.
As far as duplicate content goes, while you can _reach_ the messages through several avenues, I'm aware of only one, unique URL for each message. I could be wrong. Either way, Google should show _something_ for it, though I would expect the ranking could be reduced if the content were mirrored through multiple Yahoo subdomains.
> I see some Yahoo Groups that have most of their messages indexed.
Do you? I couldn't find any. Is there a way you could send me a URL somehow of a group that has at least _one_ actual message indexed by Google? Maybe I can figure out what they do differently.
>As said, it ain't only Google, Yahoo appears to do a very poor job of it also (and a quick check shows MSN Live results are similar).
Yeah, but Google is the only one I care about. I know Yahoo's search results are useless. :)
>So not enough link love might also come into play.
Possibly. My group is linked directly from my company's main website navbar, so it's quite exposed there. And, as I said, Google did spider the group page, just not the actual _messages_. So it seems like a deliberate policy, though it still could be a very specific accidental oversight.
|And, as I said, Google did spider the group page, just not the actual _messages_. So it seems like a deliberate policy, though it still could be a very specific accidental oversight. |
You seem to keep wanting to think that this is just Google not indexing the Yahoo groups. What I'm trying to say is that both Yahoo and MSN are also not doing a good job of this. There is something, somewhere, that seems to stump the bots.
Have you checked your groups pages in Yahoo Site Explorer?
I guess I have no specific reason to assume it's a policy decision by Google, other than the fact that it would be counter to Yahoo's goals to block other search engines from indexing their content. Y! makes money running ads on their YahooGroups content, so I can't conceive of a reason why they would want to block access to it to search engines. Therefore, my assumption was that the blocking was being done by Google. However, as has been noted, MSN and Yahoo themselves seem blind to this content too, so it's an open question.
I am not very familiar with Yahoo Site Explorer, but I punched in my group's URL. The only useful information that I got from it was:
Pages (1) ¦ Inlinks (1)
and an offer to authenticate the site (which I can't do, because it's a YahooGroup forum, not a real site that I can upload arbitrary files to).
Sort of indicates that Yahoo doesn't have much in its index either, which comes back to the original question. Why, and is it because of someone's decision (who? and why?), something I did wrong (but it seems to afflict almost every YahooGroup) or some technical HTML mishap preventing spidering?
What a tangled web...
Regrettably, there really isn't anyone at either YahooGroups or Google whom one could ask for an informed opinion on the matter.
He, he, he. Yep.
I just looked again at the source for the message index page and an individual message page to see if there was something in the code that was blocking links from being followed. Unfortunately, Y! has so much stuff going on at the top of the source I really can't make much sense of anything.
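Rather than reading the page source by eye, a small script can pull out the markup that most often keeps crawlers away: a robots `<meta>` tag and `rel="nofollow"` on links. This is only a sketch using Python's standard `html.parser`. The sample HTML is invented for illustration and is not Yahoo's actual markup, and `RobotsHintParser`, `scan` and `SAMPLE` are hypothetical names.

```python
from html.parser import HTMLParser


class RobotsHintParser(HTMLParser):
    """Collects markup that commonly stops a crawler indexing or following links."""

    def __init__(self):
        super().__init__()
        self.meta_robots = []       # content of <meta name="robots"> / "googlebot"
        self.nofollow_links = []    # hrefs of links carrying rel="nofollow"
        self.followable_links = []  # hrefs of ordinary links

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        attrs = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attrs.get("name", "").lower() in ("robots", "googlebot"):
            self.meta_robots.append(attrs.get("content", "").lower())
        elif tag == "a" and "href" in attrs:
            if "nofollow" in attrs.get("rel", "").lower().split():
                self.nofollow_links.append(attrs["href"])
            else:
                self.followable_links.append(attrs["href"])


def scan(html):
    """Parse an HTML string and return the populated parser."""
    parser = RobotsHintParser()
    parser.feed(html)
    parser.close()
    return parser


# Invented sample page; paste the real page source here to check it.
SAMPLE = """
<html><head>
<meta name="ROBOTS" content="NOINDEX, FOLLOW">
</head><body>
<a href="/group/example/message/1">Message 1</a>
<a rel="nofollow" href="http://example.com/">Signature link</a>
</body></html>
"""

result = scan(SAMPLE)
print("meta robots:", result.meta_robots)
print("nofollow links:", result.nofollow_links)
print("followable links:", result.followable_links)
```

A `noindex` on the message pages, or `nofollow` on the links from the subject list to the messages, would explain the symptoms. Links generated by script would not show up here at all, which would be a finding in itself.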
It might be an interesting test if you throw a link from your main site to a couple of individual messages. Give it a couple of weeks, then go back and see what pops up in siteexplorer.
I had a suspicion that Y!G's occasional interstitial ads might somehow throw spiders off. If you notice, when you try to read a message, sometimes you get an ad from that URL instead with a "click to go on" link. I wondered if somehow they were using some "go on" technique that was impassable to spiders.
But then I thought -- surely Yahoo can't be so incompetent as to accidentally exclude all search engines from seeing and directing traffic to their ad-supported content?
I suspect only Google or Yahoo know the whole answer. One would expect Google to try to dig out all the content it can, within technical feasibility, but perhaps there's a hang-up in navigating Y!G's message pages. I can't think of a good way to quickly prove or disprove this, not having access to a spider of my own.
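Short of running a full spider, two cheap checks are possible with Python's standard library: see whether a robots.txt rule covers the message URLs, and fetch a message page while announcing a crawler-like User-Agent to see whether the interstitial or the message comes back. The sketch below is built on assumptions. The robots.txt text and URLs are invented, `allowed` and `fetch_as` are hypothetical helpers, and a site may treat a real crawler differently from a script that merely borrows its User-Agent string.

```python
import urllib.request
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def fetch_as(user_agent, url, timeout=10):
    """Fetch url with a chosen User-Agent header and return the body as text.

    Not called below because it needs network access.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Invented rules for illustration; substitute the site's real robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /group/example/message/
"""

subject_list = "http://groups.example.com/group/example/messages"
one_message = "http://groups.example.com/group/example/message/123"

print("subject list allowed:", allowed(ROBOTS_TXT, "Googlebot", subject_list))
print("message allowed:", allowed(ROBOTS_TXT, "Googlebot", one_message))
```

A rule like the invented one here would produce exactly the pattern described in this thread: subject lists indexed, individual messages not. Comparing the output of `fetch_as` for a browser-style and a crawler-style User-Agent would show whether the "click to go on" page is served to one and not the other.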