Forum Moderators: Robert Charlton & goodroi
Many times they appear on larger corporate sites, but at times they show up on niche sites.
Has anyone come close to breaking the algo on this one?
I am not asking anyone to give it up, rather I want to find out how many people have figured it out.
minnapple
To begin with, I'd say the Sitelinks algorithm is still a work in progress. At the end of February, the criteria seemed to get a lot easier - see: Google Adds Sitelinks for a New Batch of Websites [webmasterworld.com]. But I will share some observations.
If your domain is NOT made of generic keywords, then you've got an easier road. As I see it, the more disambiguation the search query needs, the less likely it is to show Sitelinks. But it's still very possible. I've got one client whose business name and domain name are made up of three one-syllable words, concatenated. They now get Sitelinks on a search for those three very common words, even with spaces.
Home pages with a short main menu seem to do better. The simple site structure makes life easier for the algo, I assume. However, Sitelinks are not restricted just to the menu links. Recently a different client published an article that got significant direct backlinks - and that article now appears as a Sitelink.
That last observation points either to backlinks or traffic data, I'd say.
[edited by: tedster at 5:00 am (utc) on Mar. 13, 2008]
one of the sitelinks used "XYZ" for the anchor text and another used "Clients" for anchor text.
this wasn't semantically clients as in paying customers but rather server clients.
everything that i could find on the site or inbound anchor text used "Example Clients" and "Example XYZ".
never could find the source...
i'm sure all three of us (minnapple, tedster and certainly myself) have been through the matt cutts sitelink video numerous times.
if i missed something specific that answers any of the questions above, please point out where in the video you found it.
i can assure you matt isn't giving up the algo to anything...
I will try and find it [but need sleep first]. I saw it a few weeks ago on YouTube and Matt was talking about sitelinks. He didn't go too deep, but did give an idea of how to get it to work. Kinda.
[edited by: McMohan at 3:57 am (utc) on Mar. 14, 2008]
Now the season is beginning, and so is their traffic. And Sitelinks just reappeared this week!
anyone figured it out?
I've described the qualifier for Sitelinks in general terms as "when a site is sufficiently dominant for a given search," but I've never been able to pin down the specifics. There were a few times when I thought I had figured it out... things like enough multiple pages ranking in the top ten or twenty or fifty when you turned off the dupe filter... but I always found exceptions to whatever pattern I tried.
I have found the links Google chooses tell me a little bit about what Google might look at when it looks at links....
Eg, it's been said that, in case of multiple links on page A to page B, Google will only look at the first link it finds on page A. At least in the Sitelinks, though, I see that the anchor text to page B from our second link on our home page is what Google is using in one of the Sitelinks. It ignores the first.
On the same results, Google has used the alt text from an image link for Sitelink text...
...and for another of the links it has edited the anchor text of a text navigation link, dropping an adjective and using a word in the nav anchor which overlaps with a word in the filename.
I don't know, though, that Google weights these things for rankings, but you can see a lot of factors at work.
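To compare what Google displays against what a page actually offers, it can help to list every label each link carries. Here is a minimal sketch, assuming you have the home page HTML in hand; the class name, function name and sample markup are made up for illustration, and this says nothing about how Google itself picks Sitelink text.

```python
from collections import defaultdict
from html.parser import HTMLParser


class LinkLabelCollector(HTMLParser):
    """Collect, per target URL, each label a link offers (anchor text or image alt)."""

    def __init__(self):
        super().__init__()
        self.labels = defaultdict(list)  # href -> labels in document order
        self._href = None                # href of the <a> currently open, if any
        self._text = []                  # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self._href = attrs["href"]
            self._text = []
        elif tag == "img" and self._href and attrs.get("alt"):
            # An image inside a link contributes its alt text as a label.
            self.labels[self._href].append(("alt", attrs["alt"].strip()))

    def handle_data(self, data):
        if self._href:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href:
            text = " ".join("".join(self._text).split())  # collapse whitespace
            if text:
                self.labels[self._href].append(("anchor", text))
            self._href = None


def collect_link_labels(html):
    """Return {href: [(kind, label), ...]} for every link in the HTML."""
    parser = LinkLabelCollector()
    parser.feed(html)
    parser.close()
    return dict(parser.labels)


# Hypothetical markup: one image link, and two text links to the same page.
SAMPLE_HTML = """
<a href="/widgets.html"><img src="w.png" alt="Widget Gallery"></a>
<a href="/info.html">Info</a>
<a href="/info.html">Widget Information</a>
"""
LABELS = collect_link_labels(SAMPLE_HTML)
```

With the labels listed in document order, it is easy to check whether a displayed Sitelink matches the first link to a page, a later one, an alt attribute, or none of them.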
Can i throw in an example here which might help...
< On one particular search > there is a sitelink showing for "d=19&Itemid=36". Now it's safe to say this 'phrase' won't appear as link text on the site itself.
What is also interesting is that the sitelink in question links to a page that isn't even cached.
[edited by: tedster at 7:25 pm (utc) on Mar. 14, 2008]
One of the areas that Google seems to be working on here is 1) how to label those sitelinks as well as 2) how to choose the links and 3) when to award them. Using the query string as anchor text is certainly not user-friendly!
Co-occurrence data.
Interesting thought. A client has Sitelinks for a search that contains a collective noun which semantically encompasses all of the links. It's as if, say, the Sitelinks displayed include the words Red, Green, Blue, Yellow, Orange, etc, and we get Sitelinks on a search that includes the word "colors". I hadn't thought the site was particularly well optimized for the collective noun, and was surprised at the Sitelinks on this search.
What is also interesting is that the sitelink in question links to a page that isn't even cached.
That seems to point to traffic data, doesn't it? I know that some ISPs (many?) are now selling traffic data, and I wonder if Google is buying. Of course, one isolated case could just be a hiccup - I'll definitely be watching for other examples.
(WebmasterWorld search):
co-occurrence data [google.com]
how many people have figured it out
1) The site links seem to be predictive of what people doing the search might be expected to be looking to find on the site, based on query analysis using historical query statistics.
2) The site links don't always stay the same in subsequent searches, apparently depending on the user's other queries, query expansions and refinements during their "searching sessions."
(#1 and #2 are mentioned in another patent I'd have to dig out.)
3) The choice of phrases in the site links doesn't appear to depend on their presence on the page, in the site's navigation, or in inbound links (tf/idf-style signals), but their relevance could be assessed with a high degree of accuracy and predictability by analyzing the co-occurrence data.
An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are identified that predict the presence of other phrases in documents. Documents are then indexed according to their included phrases. Related phrases and phrase extensions are also identified. Phrases in a query are identified and used to retrieve and rank documents. Phrases are also used to cluster documents in the search results, create document descriptions, and eliminate duplicate documents from the search results, and from the index.
<side issue>
4) Experimenting with #1 and/or #2 in one of their "betas" that they do could theoretically have had some glitch resulting in that unexpected #6 "penalty" mess.
</side issue>
Take a look at the search results for just the one word search chrysler [google.com]. Aha, but then take a look at the search results for chrysler truck [google.com] (or trucks).
Chrysler doesn't make trucks under their brand name, but Dodge is their brand, owned by them, and Dodge does. But I couldn't find references on the Chrysler site to trucks, and didn't find links to the Dodge site anywhere either.
Even though the terms (including brands) aren't at all prominent in the sites' navigation, there's a statistically large co-occurrence, both first-order and second-order, all over the web; like with dealers selling both brands, those brands co-occur on their sites, so the instances of occurrence of both Chrysler and trucks together on sites/pages would be high in number.
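For anyone unsure about the first-order versus second-order distinction, here is a toy sketch. The documents are invented and the function names are my own; it only illustrates the idea that two terms can be linked through a shared neighbor even when they never appear on the same page.

```python
from collections import defaultdict
from itertools import combinations


def first_order(docs):
    """Map each term to the set of terms it shares at least one document with."""
    neighbors = defaultdict(set)
    for terms in docs:
        for a, b in combinations(set(terms), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def second_order(neighbors, a, b):
    """Terms that co-occur with both a and b, linking them indirectly."""
    return (neighbors[a] & neighbors[b]) - {a, b}


# Hypothetical pages, each reduced to the set of terms it mentions.
DOCS = [
    {"chrysler", "dodge", "dealer"},
    {"dodge", "trucks"},
    {"dodge", "muscle cars"},
]
NEIGHBORS = first_order(DOCS)

# "chrysler" and "trucks" never share a page here (no first-order link),
# but both co-occur with "dodge" (a second-order link).
LINKING_TERMS = second_order(NEIGHBORS, "chrysler", "trucks")
```

On real data you would count how often each pairing occurs rather than just record that it exists, but the structure is the same.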
What really caught my interest was the site link for muscle cars in the Dodge listing (in the search result for chrysler truck), which really doesn't pertain to trucks. But - I doubt if it's possible to find a website about muscle cars that doesn't make mention, probably prominently, of Dodge and/or the individual models that qualify as muscle cars. Then, there's some reference on the Dodge site to Chrysler, LLC and Chrysler financial - but not any noticeable linking going on.
Looking at the results page itself, the one word search has related searches on bottom (query refinement), and a couple of results for news. But the two word, more specific search, has all the bells and whistles, like product search and photos. Another thing is that the results don't remain the same. Many hours ago when I checked, I got a site link for a financial term for Chrysler search that isn't there any more when I return.
They're also redirecting and tracking clicks for each and every one of the links for those extra bells and whistles, including the site links, and with a big surge of increased appearance of sitelinks following not long after that #6 "bug" it isn't hard to figure that usage statistics are and were in play in a big way.
As far as this specific patent is concerned (in which co-occurrence is prominently made reference to throughout - as in all of that group of related patents), it gets kind of specific toward the end of the description section and gives a fairly thorough explanation of a few things.
[Note: Ordinarily I wouldn't post specific search terms or results, if only for privacy considerations, but this is a highly visible, internationally known major corporation that's been around for about 100 years (and I checked with the powers that be beforehand).]
[edited by: Marcia at 9:13 am (utc) on Mar. 16, 2008]
i wonder how much chrysler really appreciates that sitelink that just says "Town".
In the example I cite above where Google edited the anchor text of one of our links, the change was essentially changing something like "Widget Information" to "Information," and in that case, we don't mind. ;)
I'm wondering in a case like Chrysler's, where they potentially have many different sets of site links, what their options are in WMT to set preferences.
Marcia alludes to Google's tracking and redirecting the links. Mouse over them, if you haven't, and take a look at the query strings on the urls. They are tracking a lot of information.
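If you want to pull those query strings apart rather than squint at the status bar, Python's standard library will do it. This is only a sketch: the URL below is hypothetical, and its host and parameter names (target, kind, pos) are invented for illustration, not Google's actual parameters.

```python
from urllib.parse import parse_qs, urlsplit


def tracking_params(url):
    """Return the query-string parameters of a URL as a {name: [values]} dict."""
    return parse_qs(urlsplit(url).query)


# Hypothetical redirect URL; host and parameter names are made up.
EXAMPLE_URL = (
    "http://redirect.example.com/url"
    "?target=http%3A%2F%2Fwww.example.com%2Ftrucks&kind=sitelink&pos=3"
)
PARAMS = tracking_params(EXAMPLE_URL)
```

Copy the link location of a real result into EXAMPLE_URL to see which parameters accompany each click.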
I've now got some myself, but those came in the recent tranche. The site that has them is not my most popular, nor the one with the most links, nor the one with the best build and structure. It does have the best quality natural links, though, and once again, for the term it gets the sitelinks for, it is so obviously the right site to be at #1.
There are bound to be several factors, but I believe the above to be the crux of the matter.
They got beaten out for town & country [google.com] by far when there had to be a choice made. Is that because it appears more together with magazine than car words, numerically in a co-occurrence matrix?
And then, there's the town car [google.com], which is Lincoln (Ford), also co-occurring with cars, vehicles, autos, automobiles, etc. Even the photos for image search are the Lincoln, not Chrysler, though in a search for Lincoln town car there aren't any site links - but then, is that specific enough not to need any assists to further narrow it down for users?
What's interesting is that even though it's Lincoln that's the right choice for relevance for "town car," in those search boxes on top with choices for model, make and location - I see chrysler entered as the default choice even though it isn't Chrysler, it's Lincoln. Is that because I have a Google-tracked history of looking at and for Chrysler vehicles in current/recent searches? Or are they just giving an alternative because the Lincoln search result is right under it?
_________________________________________
Afterthought:
Translated into Google patent-speak, there seems to be a predictive element involved with choice of terms for site links. When the secondary terms (that appear as anchor text in the site links) have co-occurred enough times in a co-occurrence matrix together with the search term that triggered the listing with the site links in the search results, it can be statistically deduced that the search term used is predictive of those secondary search terms. In other words, users who did the search could be predicted as being likely to do searches on those secondary terms when refining and narrowing down their searches to be more specific.
For example, when we saw muscle cars as a secondary site link term for Dodge, even though we did a search for Chrysler trucks, couldn't that be because there's so much usage of Dodge and muscle cars together that it's established that they go together? And can't a high percentage of users searching for "dodge" be predicted as likely to also search for muscle cars?
If those are feasible possibilities, there's no way to do that statistically without using co-occurrence data, because there's no human-like reasoning or logic in a straight mathematical algo.
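To make "predictive" concrete, here is a rough sketch that estimates, from co-occurrence counts, how often a secondary term shows up given that the search term did. The sessions are invented, the function name and the min_prob threshold are assumptions of mine, and nothing here is taken from the patent's actual formulas.

```python
from collections import Counter
from itertools import permutations


def predicted_terms(sessions, query_term, min_prob=0.5):
    """Return {term: P(term in session | query_term in session)} at or above min_prob."""
    term_count = Counter()
    pair_count = Counter()
    for terms in sessions:
        unique = set(terms)
        term_count.update(unique)                   # sessions containing each term
        pair_count.update(permutations(unique, 2))  # ordered pairs seen together
    base = term_count[query_term]
    if base == 0:
        return {}
    return {
        other: count / base
        for (first, other), count in pair_count.items()
        if first == query_term and count / base >= min_prob
    }


# Hypothetical search sessions, each a list of terms the user searched on.
SESSIONS = [
    ["dodge", "muscle cars"],
    ["dodge", "muscle cars", "trucks"],
    ["dodge", "trucks"],
    ["dodge", "muscle cars"],
    ["chrysler", "minivan"],
]
DODGE_PREDICTS = predicted_terms(SESSIONS, "dodge", min_prob=0.6)
```

In this toy data "muscle cars" appears in three of the four sessions containing "dodge", so it clears the 0.6 threshold, while "trucks" (two of four) does not.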
[edited by: Marcia at 10:49 pm (utc) on Mar. 16, 2008]
I have an example that I've been bleating about on other threads. The site in question has bought hundreds of inbounds with the main target terms in anchor text and has done affiliate deals with firms that are just starting out offering payments for clicks and/or conversions. The outbound URLs are fairly simple in the form http://www.example.com/landingpage.html?aff=123 so they could look to Googlebot like cleanish links.
The result of this is all of the inbounds tell Google that hundreds of sites think it is about "search term" and the fact that it has one way outbounds to some sites that are absolutely on topic makes it the authority for that term.
On Google.com it is #2 for the term and for allinanchor, but on Google.co.uk it is #1 with site links.
It could be just a glitch (it only started on Thursday), or it could be indicative of a formula that works. Or I guess it could be something in the interaction of the UK filters with the main algo.
Cheers
Sid
yet a search for google does not show google with site links, for a search it's totally authoritative for
On the Google Webmaster Help Center page that talks about Sitelinks...
How does Google compile the list of links shown below some search results? [google.com]
...Google says...
We only show sitelinks for results when we think they'll be useful to the user. If the structure of your site doesn't allow our algorithms to find good sitelinks, or we don't think that the sitelinks for your site are relevant for the user's query, we won't show them.
I don't think that Google's site is well-structured for Sitelinks. It may also be that Google doesn't want deep links to its site in its main listing.