Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

Sitelinks Algo - anyone figured it out?

 11:39 pm on Mar 12, 2008 (gmt 0)

I see site links on certain search results in google.
Multiple links showing categories within a site.

Many times they appear on larger corporate sites, but at times they show up on niche sites.

Has anyone come close to breaking the algo on this one?

I am not asking anyone to give it up, rather I want to find out how many people have figured it out.




 11:53 pm on Mar 12, 2008 (gmt 0)

I sure haven't figured it.

To begin with, I'd say the Sitelinks algorithm is still a work in progress. At the end of February, the criteria seemed to get a lot easier - see: Google Adds Sitelinks for a New Batch of Websites [webmasterworld.com]. But I will share some observations.

If your domain is NOT made of generic keywords, then you've got an easier road. The more disambiguation the search query needs, the less likely it is to see Sitelinks, as I see it. But it's still very possible. I've got one client whose business name and domain name is made up of three one syllable words, concatenated. They now get Sitelinks on a search for those three very common words, even with spaces.

Home pages with a short main menu seem to do better. The simple site structure makes life easier for the algo, I assume. However, Sitelinks are not restricted just to the menu links. Recently a different client published an article that got significant direct backlinks - and that article now appears as a Sitelink.

That last observation points either to backlinks or traffic data, I'd say.

[edited by: tedster at 5:00 am (utc) on Mar. 13, 2008]


 12:16 am on Mar 13, 2008 (gmt 0)

i would like to know how they determine the anchor text for sitelinks.
we had a customer last year that showed sitelinks for a search on "example.com", where "Example" was the open source project name (at example.net), the commercial entity name (Example, Inc), the protocol name (Example XYZ) and the client program name.

one of the sitelinks used "XYZ" for the anchor text and another used "Clients" for anchor text.
this wasn't semantically clients as in paying customers but rather server clients.
everything that i could find on the site or inbound anchor text used "Example Clients" and "Example XYZ".
never could find the source...


 3:08 am on Mar 13, 2008 (gmt 0)

There was an answer on YouTube about this from Matt Cutts.

google... matt cutts site:youtube.com

and you should find it.


 3:32 am on Mar 13, 2008 (gmt 0)

i'm sure all three of us (minnapple, tedster and certainly myself) have been through the matt cutts sitelink video [mattcutts.com] numerous times.
if i missed something specific that answers any of the questions above, please point out where in the video you found it.
i can assure you matt isn't giving up the algo to anything...


 8:59 am on Mar 13, 2008 (gmt 0)

i'm sure all three of us (minnapple, tedster and certainly myself) have been through the matt cutts sitelink video numerous times.
if i missed something specific that answers any of the questions above, please point out where in the video you found it.
i can assure you matt isn't giving up the algo to anything...

I will try and find it [but need sleep first]. I saw it a few weeks ago on youtube and Matt was talking about sitelinks. He didn't go too deep, but did give an idea of how to get it to work. Kinda.


 3:57 am on Mar 14, 2008 (gmt 0)

Perhaps it uses the toolbar data to some extent? Such as the amount of traffic a site gets by direct URL type-in. How often people search for the company name in Google also seems to have some influence on whether a site will get sitelinks. These two factors point to a company whose name and URL are familiar to a large audience, which speaks of authority.

[edited by: McMohan at 3:57 am (utc) on Mar. 14, 2008]


 4:18 am on Mar 14, 2008 (gmt 0)

Co-occurrence data.


 4:28 am on Mar 14, 2008 (gmt 0)

Interesting observation here - this site represents a seasonal business that just started warming up. Last year, they had Sitelinks for some pretty juicy generic terms, but that was during "the season". Those Sitelinks disappeared along with the warm weather.

Now the season is beginning, and so is their traffic. And Sitelinks just reappeared this week!

Robert Charlton

 8:01 am on Mar 14, 2008 (gmt 0)

anyone figured it out?

I've described the qualifier for Sitelinks in general terms as "when a site is sufficiently dominant for a given search," but I've never been able to pin down the specifics. There were a few times when I thought I had figured it out... things like enough multiple pages ranking in the top ten or twenty or fifty when you turned off the dupe filter... but I always found exceptions to whatever pattern I tried.

I have found the links Google chooses tell me a little bit about what Google might look at when it looks at links....

Eg, it's been said that, in case of multiple links on page A to page B, Google will only look at the first link it finds on page A. At least in the Sitelinks, though, I see that the anchor text to page B from our second link on our home page is what Google is using in one of the Sitelinks. It ignores the first.

On the same results, Google has used the alt text from an image link for Sitelink text...

...and for another of the links it has edited the anchor text of a text navigation link, dropping an adjective and using a word in the nav anchor which overlaps with a word in the filename.

I don't know, though, that Google weights these things for rankings, but you can see a lot of factors at work.


 2:31 pm on Mar 14, 2008 (gmt 0)

Hi, I am new to WebmasterWorld, but you might recognise me from the other seo forums maybe...

Can i throw in an example here which might help...

< On one particular search > there is a sitelink showing for "d=19&Itemid=36". Now it's safe to say this 'phrase' as link text won't be on the site itself.

What is also interesting is that the sitelink in question links to a page that isn't even cached.

[edited by: tedster at 7:25 pm (utc) on Mar. 14, 2008]


 7:27 pm on Mar 14, 2008 (gmt 0)

Welcome to the forums, soxos. Interesting report - looks like the Sitelinks algo still has some rough edges.

One of the areas that Google seems to be working on here is 1) how to label those sitelinks as well as 2) how to choose the links and 3) when to award them. Using the query string as anchor text is certainly not user-friendly!


 9:06 pm on Mar 14, 2008 (gmt 0)

It's not my site btw!

All else I can say is that I've searched "the site" in G and the backlinks in Y and I can't find the strange sitelink text "d=19&Itemid=36" anywhere.

Robert Charlton

 9:23 pm on Mar 14, 2008 (gmt 0)

Co-occurrence data.

Interesting thought. A client has Sitelinks for a search that contains a collective noun which semantically encompasses all of the links. It's as if, say, the Sitelinks displayed include the words Red, Green, Blue, Yellow, Orange, etc, and we get Sitelinks on a search that includes the word "colors". I hadn't thought the site was particularly well optimized for the collective noun, and was surprised at the Sitelinks on this search.


 9:34 pm on Mar 14, 2008 (gmt 0)

What is also interesting is that the sitelink in question links to a page that isn't even cached.

That seems to point to traffic data, doesn't it? I know that some ISPs (many?) are now selling traffic data, and I wonder if Google is buying. Of course, one isolated case could just be a hiccup - I'll definitely be watching for other examples.


 11:29 pm on Mar 14, 2008 (gmt 0)

I own blue-widgets.uk.com and have recently been given sitelinks for a three phrase term.

However in the more results link it points to site:uk.com producing results from all other domains on the .uk.com extension.

So like you say, it's still a bit buggy!
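
One way a "more results" link could end up at site:uk.com is if the host name is trimmed to its last two labels. This is only a guess at the cause, not something Google has confirmed. A minimal sketch, where the suffix set and both function names are made up for illustration:

```python
# Hypothetical, deliberately incomplete set of two-label suffixes under
# which unrelated parties register their own names.
MULTI_LABEL_SUFFIXES = {"co.uk", "uk.com"}

def naive_site(host):
    """Keep only the last two labels: fine for example.com, wrong for uk.com."""
    return ".".join(host.split(".")[-2:])

def suffix_aware_site(host):
    """Keep one extra label when the last two labels form a known suffix."""
    labels = host.split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])

print(naive_site("blue-widgets.uk.com"))         # the whole uk.com namespace
print(suffix_aware_site("blue-widgets.uk.com"))  # just the one registered site
```

A real implementation would need a maintained suffix list rather than a hand-written set like this one.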


 12:04 am on Mar 15, 2008 (gmt 0)

welcome to WebmasterWorld [webmasterworld.com], soxos!

(WebmasterWorld search):
co-occurrence data [google.com]


 6:25 am on Mar 16, 2008 (gmt 0)

how many people have figured it out

I wouldn't even consider presuming to have it figured out, but there are some clues out there and I've got a few well-developed suspicions and theories.

1) The site links seem to be predictive of what people doing the search might be expected to be looking to find on the site, based on query analysis using historical query statistics.

2) The site links don't always stay the same in subsequent searches; they seem to shift based on the user's other queries, query expansions and refinements during their "searching sessions."

(#1 and #2 are mentioned in another patent I'd have to dig out.)

3) The connection of the phrases in the site links doesn't appear to be dependent on presence on the page or in the site's navigation (or inbound links) - tf/idf, but the relevance could be assessed with a high degree of accuracy/predictability by analyzing the co-occurrence data.

An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are identified that predict the presence of other phrases in documents. Documents are then indexed according to their included phrases. Related phrases and phrase extensions are also identified. Phrases in a query are identified and used to retrieve and rank documents. Phrases are also used to cluster documents in the search results, create document descriptions, and eliminate duplicate documents from the search results, and from the index.

Phrase-based generation of document descriptions

<side issue>
4) Experimenting with #1 and/or #2 in one of their "betas" that they do could theoretically have had some glitch resulting in that unexpected #6 "penalty" mess.
</side issue>


 8:01 am on Mar 16, 2008 (gmt 0)

4) Experimenting with #1 and/or #2 in one of their "betas" that they do could theoretically have had some glitch resulting in that unexpected #6 "penalty" mess.

This is the first plausible theory i've heard that seems to make sense of the #6 issue.
Nice work!


 9:11 am on Mar 16, 2008 (gmt 0)

I think a good example can be seen with a search on a major corporate site that has very little on their site or navigation as far as keywords go, though they would have the inbound linkage and authority status for the company name.

Take a look at the search results for just the one word search chrysler [google.com]. Aha, but then take a look at the search results for chrysler truck [google.com] (or trucks).

Chrysler doesn't make trucks under their brand name, but Dodge is their brand, owned by them, and Dodge does. But I couldn't find references on the Chrysler site to trucks, and didn't find links to the Dodge site anywhere either.

Even though the terms (including brands) aren't at all prominent in the sites' navigation, there's a statistically large co-occurrence, both first-order and second-order, all over the web; like with dealers selling both brands, those brands co-occur on their sites, so the instances of occurrence of both Chrysler and trucks together on sites/pages would be high in number.
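
To make the first-order vs second-order distinction concrete, here is a rough sketch on a toy corpus. The pages, terms and function names are all invented for illustration, and nothing here is claimed to be how Google computes it. First-order means two terms appear on the same page; second-order means they share neighbours even if they never appear together.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy corpus: each set holds the terms found on one page.
pages = [
    {"chrysler", "dodge", "dealer"},
    {"dodge", "trucks", "dealer"},
    {"dodge", "trucks", "muscle cars"},
    {"chrysler", "dodge", "financing"},
]

def cooccurrence_counts(docs):
    """Count, for every unordered pair of terms, how many docs contain both."""
    counts = defaultdict(int)
    for terms in docs:
        for a, b in combinations(sorted(terms), 2):
            counts[(a, b)] += 1
    return counts

def first_order(counts, a, b):
    """Number of docs in which a and b appear together."""
    return counts.get(tuple(sorted((a, b))), 0)

def second_order(counts, a, b):
    """Terms that co-occur with both a and b (their shared neighbours)."""
    def neighbours(t):
        return {x if y == t else y for (x, y) in counts if t in (x, y)}
    return (neighbours(a) & neighbours(b)) - {a, b}

counts = cooccurrence_counts(pages)
print(first_order(counts, "chrysler", "trucks"))   # never on the same page here
print(second_order(counts, "chrysler", "trucks"))  # but linked through shared terms
```

In this toy data "chrysler" and "trucks" never share a page, yet both sit next to "dodge" and "dealer", which is the kind of indirect link described above.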

What really caught my interest was the site link for muscle cars in the Dodge listing (in the search result for chrysler truck), which really doesn't pertain to trucks. But - I doubt if it's possible to find a website about muscle cars that doesn't make mention, probably prominently, of Dodge and/or the individual models that qualify as muscle cars. Then, there's some reference on the Dodge site to Chrysler, LLC and Chrysler financial - but not any noticeable linking going on.

Looking at the results page itself, the one word search has related searches on bottom (query refinement), and a couple of results for news. But the two word, more specific search, has all the bells and whistles, like product search and photos. Another thing is that the results don't remain the same. Many hours ago when I checked, I got a site link for a financial term on a Chrysler search that isn't there any more when I return.

They're also redirecting and tracking clicks for each and every one of the links for those extra bells and whistles, including the site links, and with a big surge of increased appearance of sitelinks following not long after that #6 "bug" it isn't hard to figure that usage statistics are and were in play in a big way.

As far as this specific patent is concerned (in which co-occurrence is prominently made reference to throughout - as in all of that group of related patents), it gets kind of specific toward the end of the description section and gives a fairly thorough explanation of a few things.

[Note: Ordinarily I wouldn't post specific search terms or results, if only for privacy considerations, but this is a highly visible, internationally known major corporation that's been around for about 100 years (and I checked with the powers that be beforehand).]

[edited by: Marcia at 9:13 am (utc) on Mar. 16, 2008]


 9:30 am on Mar 16, 2008 (gmt 0)

ok this is probably a pretty close example of the weirdness i was referring to in my first post in this thread.
the search in this case would be for chrysler.com [google.com].
i wonder how much chrysler really appreciates that sitelink that just says "Town".
google still has some work to do before they can claim sitelinks have the most relevant anchor text.
(or maybe the ampersand messed things up in this case.)

Robert Charlton

 6:03 pm on Mar 16, 2008 (gmt 0)

i wonder how much chrysler really appreciates that sitelink that just says "Town".

In the example I cite above where Google edited the anchor text of one of our links, the change was essentially changing something like "Widget Information" to "Information," and in that case, we don't mind. ;)

I'm wondering in a case like Chrysler's, where they potentially have many different sets of site links, what their options are in WMT to set preferences.

Marcia alludes to Google's tracking and redirecting the links. Mouse over them, if you haven't, and take a look at the query strings on the urls. They are tracking a lot of information.


 6:34 pm on Mar 16, 2008 (gmt 0)

I believe the main criterion is that the site appears to be the 'natural home' of the query.
That's why so many corporates etc originally got them.
I always assumed it was based on links, PR, traffic etc, but I know one site that had them quite a while ago with not that much traffic and only PR4, yet it was so obviously the site to return for the 2 word search!

I've now got some myself but those came in the recent tranche. The site that has them is not my most popular nor the one with the most links etc, nor the best build and structure - it does have the best quality natural links though and once again the term it gets the sitelinks for is so obviously the right site to be at #1

There are bound to be several factors but I believe the above to be the crux of the matter


 8:49 pm on Mar 16, 2008 (gmt 0)

I wonder if the ampersand in "Town & Country" triggered a bug in the code that creates the link text.
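
Purely as speculation, here is the kind of bug that could turn "Town & Country" into "Town". Both function names are hypothetical and this is not taken from anything Google has published. The sketch assumes the anchor text arrives with the ampersand HTML-encoded.

```python
import html
import re

def naive_label(anchor_html):
    """Take the leading run of letters and spaces, so the label stops at '&'."""
    return re.match(r"[A-Za-z ]*", anchor_html).group(0).strip()

def safer_label(anchor_html):
    """Decode HTML entities first, then keep the whole text."""
    return html.unescape(anchor_html).strip()

print(naive_label("Town &amp; Country"))  # truncated at the ampersand
print(safer_label("Town &amp; Country"))  # full label survives
```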


 10:13 pm on Mar 16, 2008 (gmt 0)

Town is ambiguous, they were lucky to get it at all.

They got beaten out for town & country [google.com] by far when there had to be a choice made. Is that because it appears more together with magazine than car words, numerically in a co-occurrence matrix?

And then, there's the town car [google.com], which is Lincoln (Ford), also co-occurring with cars, vehicles, autos, automobiles, etc. Even the photos for image search are the Lincoln, not Chrysler, though in a search for Lincoln town car there aren't any site links - but then, is that specific enough not to need any assists to further narrow it down for users?

What's interesting is that even though it's Lincoln that's the right choice for relevance for "town car," in those search boxes on top with choices for model, make and location - I see chrysler entered as the default choice even though it isn't Chrysler, it's Lincoln. Is that because I have a Google-tracked history of looking at and for Chrysler vehicles in current/recent searches? Or are they just giving an alternative because the Lincoln search result is right under it?


Translated into Google patent-speak, there seems to be a predictive element involved with choice of terms for site links. When the secondary terms (that appear as anchor text in the site links) have co-occurred enough times in a co-occurrence matrix together with the search term that triggered the listing with the site links in the search results, it can be statistically deduced that the search term used is predictive of those secondary search terms. In other words, users who did the search could be predicted as being likely to do searches on those secondary terms when refining and narrowing down their searches to be more specific.

For example, when we saw muscle cars as a secondary site link term for Dodge, even though we did a search for Chrysler trucks, couldn't that be because there's so much usage of Dodge and muscle cars together that it's established that they go together? And can't a high percentage of users searching for "dodge" be predicted as likely to also search for muscle cars?

If they're feasible possibilities, there's no other way to do that statistically without using co-occurrence data because there's no human-like reasoning or logic in a straight mathematical algo.
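
The "predictive" idea can be sketched with nothing more than counting. The session data and the function name below are made up, and the score (observed co-occurrence rate divided by the rate expected by chance) is just one simple way to measure it, not necessarily what the patent or Google uses.

```python
# Hypothetical search sessions: each list is the queries one user ran in a sitting.
sessions = [
    ["dodge", "muscle cars"],
    ["dodge", "dodge trucks"],
    ["dodge", "muscle cars", "dodge charger"],
    ["chrysler", "town & country"],
    ["lincoln", "town car"],
]

def predictive_score(sessions, query, follow_up):
    """Observed co-occurrence rate divided by the rate expected by chance.

    Values well above 1.0 suggest that `query` predicts `follow_up`.
    """
    n = len(sessions)
    has_q = sum(1 for s in sessions if query in s)
    has_f = sum(1 for s in sessions if follow_up in s)
    both = sum(1 for s in sessions if query in s and follow_up in s)
    if has_q == 0 or has_f == 0:
        return 0.0
    expected = (has_q / n) * (has_f / n)
    return (both / n) / expected

print(predictive_score(sessions, "dodge", "muscle cars"))  # above 1.0
print(predictive_score(sessions, "dodge", "town car"))     # never seen together
```

With real query logs, the follow-up terms scoring highest for a query would be the natural candidates for site link text under this theory.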

[edited by: Marcia at 10:49 pm (utc) on Mar. 16, 2008]


 4:02 pm on Apr 6, 2008 (gmt 0)

yet a search for google does not show google with site links, for a search it's totally authoritative for.


 7:09 pm on Apr 6, 2008 (gmt 0)


I have an example that I've been bleating about on other threads. The site in question has bought hundreds of inbounds with the main target terms in anchor text and has done affiliate deals with firms that are just starting out offering payments for clicks and/or conversions. The outbound URLs are fairly simple in the form http://www.example.com/landingpage.html?aff=123 so they could look to Googlebot like cleanish links.

The result of this is all of the inbounds tell Google that hundreds of sites think it is about "search term" and the fact that it has one way outbounds to some sites that are absolutely on topic makes it the authority for that term.

On Google.com it is #2 for the term and for allinanchor, but on Google.co.uk it is #1 with site links.

It could be just a glitch (it only started on Thursday), or it could be indicative of a formula that works. Or I guess it could be something in the interaction of the uk filters with the main algo.




 8:36 pm on Apr 6, 2008 (gmt 0)

Forgot to mention. If I search for their domain name they don't get site links but are #1.



Robert Charlton

 9:34 pm on Apr 6, 2008 (gmt 0)

yet a search for google does not show google with site links, for a search it's totally authoritative for

On the Google Webmaster Help Center page that talks about Sitelinks...

How does Google compile the list of links shown below some search results? [google.com]

...Google says...

We only show sitelinks for results when we think they'll be useful to the user. If the structure of your site doesn't allow our algorithms to find good sitelinks, or we don't think that the sitelinks for your site are relevant for the user's query, we won't show them.

I don't think that Google's site is well-structured for Sitelinks. It may also be that Google doesn't want deep links to its site in its main listing.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved