However, despite having the YahooGroup archive set to "open", allowing non-members and casual browsers to see the messages, Google never seems to index the messages themselves. It has picked up the pages that list the message subjects and have links _to_ the messages, but it refuses to index the messages themselves.
This is, of course, vexing to me. I've tried finding an explanation of the cause (and hopefully a solution) numerous times. It's hard to find the info you want when the relevant keywords are mostly "Google" and "Yahoo" and "Groups".
I've seen a few other people reporting the same problem in various places, but never an explanation or solution. I'm assuming it must be a Google policy rather than a Yahoo policy, since Yahoo would definitely want Google to direct traffic to them if it were their decision.
All insight welcomed.
BTW, I know Brett Tabke's nickname from the 80's. Hi Brett! ;)
I have a tool for batch-downloading the historical YahooGroups data, but without being able to batch load it into Google Groups, GG is not a solution. If necessary I could create a "mirror" of the content on my own website, but then that needs to be maintained and updated as new content is created on the YahooGroup. Plus, it's a messy workaround for the problem of content that ought to be indexed simply not being indexed for no obviously good reason.
reseller: I'm keeping that as a highly-guarded secret. You're welcome to ask Brett if he wants it disclosed. :)
[edited by: XenonofArcticus at 11:20 pm (utc) on Jan. 9, 2007]
Are Yahoo Groups messages even indexed by Yahoo in their regular search?
From what I found earlier using Yahoo Site Explorer [siteexplorer.search.yahoo.com], basically no.
I checked a few very large, seemingly quite active public groups and the most pages I found was for one large group -- 18. Most had only 2 or 3 pages, a couple had 8 and 11, or something along those lines.
From what I found earlier using Yahoo Site Explorer, basically no.
That's what I thought. Perhaps the question should actually be why isn't Yahoo indexing their own content? If they've not got confidence in it, it's not surprising Google don't index it either. Too spammy overall perhaps?
[edited by: Pirates at 12:24 am (utc) on Jan. 10, 2007]
This might be a "duplicate content" question because Yahoo makes each message available at multiple URLs (some with extra subdomain information). Maybe that has triggered something for some groups.
Does anyone have any idea about why Google would choose to deliberately black out this large and fairly rich body of information from their index?
As said, it ain't only Google, Yahoo appears to do a very poor job of it also (and a quick check shows MSN Live results are similar).
Have you run your group's main URL through Yahoo Site Explorer?
g1smd, yeah dupe content can be a problem here. I also noticed that most of the groups I checked have only one or two backlinks, one of which is from the groups directory page. So not enough link love might also come into play.
Forum spamming and signature spamming are just taking the same thing much further -- too far.
As far as duplicate content goes, while you can _reach_ the messages through several avenues, I'm aware of only one, unique URL for each message. I could be wrong. Either way, Google should show _something_ for it, though I would expect the ranking could be reduced if the content were mirrored through multiple Yahoo subdomains.
> I see some Yahoo Groups that have most of their messages indexed.
Do you? I couldn't find any. Is there a way you could send me a URL somehow of a group that has at least _one_ actual message indexed by Google? Maybe I can figure out what they do differently.
>As said, it ain't only Google, Yahoo appears to do a very poor job of it also (and a quick check shows MSN Live results are similar).
Yeah, but Google is the only one I care about. I know Yahoo's search results are useless. :)
>So not enough link love might also come into play.
Possibly. My group is linked directly from my company's main website navbar, so it's quite exposed there. And, as I said, Google did spider the group page, just not the actual _messages_. So it seems like a deliberate policy, though it still could be a very specific accidental oversight.
And, as I said, Google did spider the group page, just not the actual _messages_. So it seems like a deliberate policy, though it still could be a very specific accidental oversight.
You seem to keep wanting to think that this is just Google not indexing the Yahoo groups. What I'm trying to say is that both Yahoo and MSN are also not doing a good job of this. There is something, somewhere, that seems to stump the bots.
Have you checked your group's pages in Yahoo Site Explorer?
I am not very familiar with Yahoo Site Explorer, but I punched in my group's URL. The only useful information that I got from it was:
Pages (1) | Inlinks (1)
and an offer to authenticate the site (which I can't do, because it's a YahooGroup forum, not a real site that I can upload arbitrary files to).
That sort of indicates that Yahoo doesn't have much in its index either, which comes back to the original question: why? Is it because of someone's decision (whose, and why?), something I did wrong (but it seems to afflict almost every YahooGroup), or some technical HTML mishap preventing spidering?
What a tangled web...
Regrettably, there really isn't anyone at either YahooGroups or Google whom one could ask for an informed opinion on the matter.
What a tangled web...
He, he, he. Yep.
I just looked again at the source for the message index page and an individual message page to see if there was something in the code that was blocking links from being followed. Unfortunately, Y! has so much stuff going on at the top of the source that I really can't make much sense of anything.
It might be an interesting test if you throw a link from your main site to a couple of individual messages. Give it a couple of weeks, then go back and see what pops up in siteexplorer.
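For anyone who wants to repeat the source check without reading through all the markup by eye, here is a minimal sketch. It assumes the blocking, if there is any, would show up as a robots meta tag or as rel="nofollow" on the message links, which nobody in this thread has confirmed. The sample page and its URLs are made up for illustration; paste in the real page source to test an actual group.

```python
from html.parser import HTMLParser


class RobotsHintParser(HTMLParser):
    """Collects robots meta directives and nofollow links from one HTML page."""

    def __init__(self):
        super().__init__()
        self.meta_directives = []   # e.g. ["noindex", "nofollow"]
        self.nofollow_links = []    # hrefs of <a rel="nofollow" ...>
        self.followable_links = []  # hrefs of every other <a href=...>

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        attr = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr.get("name", "").lower() in ("robots", "googlebot"):
            content = attr.get("content", "")
            self.meta_directives += [
                d.strip().lower() for d in content.split(",") if d.strip()
            ]
        elif tag == "a" and "href" in attr:
            rels = attr.get("rel", "").lower().split()
            if "nofollow" in rels:
                self.nofollow_links.append(attr["href"])
            else:
                self.followable_links.append(attr["href"])


def scan(html_text):
    """Parse html_text and return the parser holding the collected hints."""
    parser = RobotsHintParser()
    parser.feed(html_text)
    parser.close()
    return parser


# Hypothetical page source, NOT copied from Yahoo Groups.
SAMPLE_PAGE = """
<html><head>
<meta name="robots" content="noindex, nofollow">
</head><body>
<a href="/group/example/message/1">Message 1</a>
<a rel="nofollow" href="/group/example/message/2">Message 2</a>
</body></html>
"""

if __name__ == "__main__":
    result = scan(SAMPLE_PAGE)
    print("meta directives:", result.meta_directives)
    print("nofollow links:", result.nofollow_links)
    print("followable links:", result.followable_links)
```

If the directive and nofollow lists both come back empty for the real message index page, the cause is probably somewhere other than the page markup.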
But then I thought -- surely Yahoo can't be so incompetent as to accidentally exclude all search engines from seeing and directing traffic to their ad-supported content?
I suspect only Google or Yahoo know the whole answer. One would expect Google to try to dig out all the content it can, within technical feasibility, but perhaps there's a hang-up in navigating Y!G's message pages. I can't think of a good way to quickly prove or disprove this, not having access to a spider of my own.
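One thing that can be checked without a spider is whether a robots.txt rule would produce exactly this symptom: index pages crawlable, message pages not. The sketch below uses Python's standard urllib.robotparser. The rules and URLs in it are invented for illustration; I have not looked at what Yahoo actually serves, so substitute the real robots.txt text before drawing any conclusions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, NOT the real Yahoo Groups robots.txt.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: *
Disallow: /group/example/message/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Made-up host and paths standing in for a group's index and one message.
    index_url = "http://groups.example.com/group/example/messages"
    message_url = "http://groups.example.com/group/example/message/1"
    for url in (index_url, message_url):
        print(url, "->", allowed(HYPOTHETICAL_ROBOTS_TXT, "Googlebot", url))
```

With these made-up rules the index page is allowed and the individual message is blocked, which is the pattern described in this thread. That only shows such a rule could explain it, not that it does.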