|Why are links from "related" sites more valuable?|
It doesn't make sense, and I don't think they are.
| 9:48 pm on Mar 15, 2005 (gmt 0)|
Everywhere you look you hear people saying how it's so important to get incoming links from other related or "themed" sites. This doesn't make any sense to me, and I don't believe it for a second.
When in doubt, I just look at this type of thing logically. If I have a travel site and I link to a great dog training site, why is my "vote" for the dog training site any less important than a link/vote from another dog related site? Unless a link is coming from a known authority/hub, I think all links are and should be pretty much an equal vote.
If you think about it, the whole thing just doesn't make any sense. The entire web is nothing but links, and who says a site can or should only link to other related sites?
It doesn't make any sense to me. If this is true, it can only be viewed as a ruthless tactic by G to increase people's dependence on their search engine in the future. But it still doesn't make any sense to me ...
Yet another reason is that there's no way G can classify certain types of sites - it would just be a huge hole in their algorithm if they gave too much weight only to "themed" links.
For example, we offer a free service that 99% of the users on the Internet might find useful. And thus, we would be happy to have any site on the net link to us, and if they linked to us it would be logical and make sense.
Yes you could attempt to categorize our site into one of a few narrow categories, but there is no "correct" way to categorize our site (or 1000s of other important sites).
Your thoughts? ...
| 10:44 pm on Mar 15, 2005 (gmt 0)|
Ideally, get links from related pages.
| 12:37 am on Mar 16, 2005 (gmt 0)|
[speculation] The theme of pages that link to you is one factor among many that SEs might look at. That wouldn't mean all your links had to come from similar pages, or even a majority. It would be enough that some of them did.
If a thousand sites linked to you from totally random topic areas, the algo would notice, "People seem to like this page." However, it wouldn't give you any support for, "We think this page is about widget-making supplies, but is it really?"
On the other hand, if the SE saw that a more-than-random percentage of the pages linking to you had something to do with widgets, the SE might think, "Hey, there's some clustering here, and it fits what we already thought the page was about." Bingo, you get another point.
As long as you had enough widget-related links for the SEs to spot some clear patterns, the rest of your links could come from anywhere. [/speculation]
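The clustering speculation above can be sketched in a few lines of Python. To be clear, this is a made-up illustration: the function name, the baseline rate, and the threshold are all invented here; nobody outside the search engines knows the real algorithm.

```python
# Hypothetical sketch of the "topic clustering" idea: does the share of
# inbound links matching the page's presumed topic exceed what random
# linking would produce? All names and numbers are invented.

def topic_clustering_bonus(link_topics, page_topic, baseline=0.05):
    """Return True if the share of inbound links on the page's presumed
    topic is well above a random-linking baseline."""
    if not link_topics:
        return False
    matching = sum(1 for t in link_topics if t == page_topic)
    share = matching / len(link_topics)
    # "More-than-random percentage": compare against a multiple of baseline.
    return share > 3 * baseline

# 250 widget links out of 1000 total: a clear cluster, the rest can be anything.
links = ["widgets"] * 250 + ["travel"] * 400 + ["cooking"] * 350
print(topic_clustering_bonus(links, "widgets"))  # 250/1000 = 0.25 > 0.15
```

The point of the sketch is the same as the post's: the off-topic links don't hurt; only the presence of a cluster matters.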
More is always better, of course. The best thing about well-themed links is that they can send well-targeted visitors.
| 1:00 am on Mar 16, 2005 (gmt 0)|
|However, it wouldn't give you any support for, "We think this page is about widget-making supplies, but is it really?" |
How does the subject matter of one website give any insight, confirmation, etc. as to the subject matter of another site that it links to?
Isn't that what anchor text is for?
Unless of course you assume that websites only link to other related sites, which is not true and makes no sense.
| 1:16 am on Mar 16, 2005 (gmt 0)|
You're right that one link by itself would give little or no confirmation about a page's topic.
But if, say, 200 or 300 out of 1000 links came from pages/sites that had something to do with your topic, the SEs would notice a pattern in your links. It's the pattern, more than any particular link, that might help to confirm the page's topic and relevance. Relevant links help to strengthen your overall pattern. Off-topic links aren't going to hurt you, though.
| 1:34 am on Mar 16, 2005 (gmt 0)|
I guess what I don't understand is how a search engine can infer the content of another site/page based on the content of the linking site? If 300 travel sites link to my dog training site, are the SEs going to assume my site is travel related? I just don't see how this concept can be given any weight ...
| 1:42 am on Mar 16, 2005 (gmt 0)|
A lot of very very smart people at Google have been devoting a lot of effort to do exactly what you say makes no sense. Believe what you will about how successful they are now, but they clearly will get better at semantic parsing.
| 4:07 am on Mar 16, 2005 (gmt 0)|
If 300 random people tell you John Smith makes the best widgets, it means one thing. If 300 widget experts tell you John Smith makes the best widgets, that usually carries more weight.
| 4:12 am on Mar 16, 2005 (gmt 0)|
|If 300 travel sites link to my dog training site, are the SEs going to assume my site is travel related? |
Are those travel sites the only links you have? If yes, that would not be a natural pattern for a site about dogs, and would certainly give the search engines mixed messages. "Why are so many travel sites linking to this one? We thought it was about dogs. Why do no other dog sites like this one? What are we missing?"
(Yes, I do have a tendency to anthropomorphize the search engines ... to think of them in almost human terms!)
| 5:34 am on Mar 16, 2005 (gmt 0)|
Doesn't it say so on Google's website? I'm 99.99999% sure it does. (Ugh, please don't make me find it; look around on their technology page.)
Here's an interesting discussion that veers off into this theme.
Also, Marissa Mayer from Google said they were deprecating inbound links from irrelevant originating pages (in a link dev forum at SES SJ 2003). That is, the inbound still has some power, just not as much.
| 6:26 am on Mar 16, 2005 (gmt 0)|
It still doesn't make much sense to me. I must really be missing something. If that's the case then "general" sites would have low link popularity and low PR.
Take Slashdot for example. Out of the gazillions of sites that link to Slashdot, how many do you think are "technology news" sites? Maybe 1% or less? If you removed the other 99% would slashdot still have massive link popularity and be a PR9 site?
I can think of millions of examples ...
And back to my site which is also a "general" site. 99% of people in the US have the potential to be interested in our service, and any site on the web could/would logically link to us if they liked our site. How is G going to fairly treat our incoming links when literally any link to our site is a "quality" link? Are they only going to give weight to links to our site from other sites that are dedicated solely to "cool free services"?
| 7:05 am on Mar 16, 2005 (gmt 0)|
A lot of this depends on just what the search engine uses from a page to give semantic vector values to the hyperlink itself: which semantic parts of the linking page are associated with the link. In one paper I read, I think it was 50 bytes before and after the hyperlink, without actual reference to the anchor text. 50 bytes is a pretty good approximation for long anchor text. Of course, the search engines could use 75 bytes, or 100, or only 25. If the anchor text were only 15 bytes, then 35 more bytes of text could be read. Also, if the anchor text leads, and there was a link above your link to a totally unrelated site, then some of that above link's anchor and/or descriptive text could be associated with your link.
How would the addition of these other components dilute the link? Well, if in the past 100% of the semantic value came from the 50 characters before and after the link itself, then you would get a vector value for each of those terms, and for term combinations that exist in that string. If the title and H1 headings are thrown into the fray, it could be that these would simply be additional component vectors, which could give added weight to terms that might be found there; the original vector components would remain intact, but they would be diluted by the increase in the additional component vectors.
It may also be that the search engines will use other components of the linking page to give added semantic vectors to the text immediately before and after the hyperlink. They could incorporate the page title and H1 headings, for instance. In this case, link category pages would more likely give you better semantic vector values than you would get from general link pages. It may even be possible that an additional component (or components) would be added, from the general semantic/topological classification of the site as a whole. (I am pretty certain this is not being done and may not be done for quite a while [years], but I do believe this will be the tendency of algorithmic progression.)
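A minimal sketch of the context-window idea described above, assuming the speculated scheme: grab a fixed byte window on either side of the anchor, build a term-frequency vector, and optionally fold in the page title as extra components (which dilutes the originals). The function name, tokenizer, and window default are my own invention for illustration.

```python
# Illustrative sketch only: the 50-byte window and title-as-extra-component
# idea come from the speculation above; everything else here is invented.
import re
from collections import Counter

def link_context_vector(page_text, anchor_text, window=50, title=None):
    """Build a term vector from the text surrounding a hyperlink's anchor."""
    i = page_text.find(anchor_text)
    if i == -1:
        return Counter()
    before = page_text[max(0, i - window):i]
    after = page_text[i + len(anchor_text):i + len(anchor_text) + window]
    terms = re.findall(r"[a-z]+", (before + " " + anchor_text + " " + after).lower())
    vec = Counter(terms)
    if title:  # additional component vectors, diluting the originals
        vec.update(re.findall(r"[a-z]+", title.lower()))
    return vec

page = "Our guide to widget-making supplies: see the best widget lathes here."
print(link_context_vector(page, "widget lathes", title="Widget Supplies"))
```

Notice how a link on a category page (title "Widget Supplies") pushes more on-topic terms into the vector than the same link buried on a general links page would.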
I do believe that you should put your efforts towards getting links from pages that contain as much related-to-your-site content as you can get, but I would recommend that you not limit yourself to only getting such links. Just lean in that direction.
A SPRINGTIME ASIDE:
Let's say you had an Easter Egg Hunt, with prizes given out for whoever scored the most egg-points:
Yellow eggs = 1 point
Red eggs = 2 points
Blue eggs = 3 points
There are 200 Blue eggs, totaling 600 points
There are 300 Red eggs, totaling 600 points
There are 600 Yellow eggs, totaling 600 points
The blue eggs would represent related sites
The red eggs would represent semi-related sites
The yellow eggs would represent non-related sites
The eggs are hidden randomly about a proper Easter Egg Hunt venue, under leaves, behind bushes, in the mailbox, in the crook of a tree, on bunny websites, on garden websites, on egg pages, on children's pages and even perhaps on religious pages. Every participant starts from a different position.
Now, if you knew where all the eggs were before you started, you could calculate (it would be an extremely difficult calculation) an optimum trajectory through the hunt venue to obtain the most eggs. However, since there are other participants and you do not know their paths, you are likely to just miss out on snatching some of the eggs up. It would be worth 3 times as much effort and time to obtain a blue egg as a yellow egg, but if you found yourself in the midst of a bunch of yellow eggs, it would still be worthwhile to snatch them up.
If you went after only blue eggs, there's a good chance you would not win the prize, because a lot of others will be going for these also, passing up yellows and reds to get to them. Of course, loading up with yellow eggs just might slow you down, since you would have to carry a lot more.
Most likely, the winner will have a judicious assortment of various colored eggs, and would very likely be the one who never passed up any egg.
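For what it's worth, the egg arithmetic above checks out. A toy tally (point values from the aside; the bag sizes in the comparison are invented):

```python
# Point values from the Easter-egg analogy above.
POINTS = {"blue": 3, "red": 2, "yellow": 1}  # related / semi-related / non-related

def score(bag):
    """Total egg-points for a bag mapping color -> count."""
    return sum(POINTS[color] * count for color, count in bag.items())

# Grabbing every egg you pass can beat holding out for blue only:
picky = {"blue": 60, "red": 0, "yellow": 0}     # 180 points
mixed = {"blue": 40, "red": 50, "yellow": 120}  # 340 points
print(score(picky), score(mixed))
```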
| 8:01 am on Mar 16, 2005 (gmt 0)|
Ok, I told you where to look but it seems you didn't bother, and forced me to do it for you. If you go to the New Orleans pubcon, buy me a couple top shelf drinks and I'll forget about it. ;)
Here it is, straight from Google [google.com]:
|Important, high-quality sites receive a higher PageRank, which Google remembers each time it conducts a search. Of course, important pages mean nothing to you if they don't match your query. |
Ok, the above is a reference to the votes a website receives (quantity). Take note of that, especially the note that quantity (link popularity) isn't necessarily the arbiter of relevance by itself.
|So, Google combines PageRank with sophisticated text-matching techniques to find pages that are both important and relevant to your search. Google goes far beyond the number of times a term appears on a page and examines all aspects of the page's content (and the content of the pages linking to it) to determine if it's a good match for your query. |
There you have it, the other half of the vote coin (quality), the relevance.
|It still doesn't make much sense to me. I must really be missing something. If that's the case then "general" sites would have low link popularity and low PR. |
Take Slashdot for example....
How often do you see Slashdot in the serps for anything?
Re-read my post. There are two components: The vote and the relevance deprecation. If your "general" vote counts for like 30% as much as a "relevant" vote, then you have to get X times as many votes to make up for it. Does that make sense? But that is just PART of the equation of what makes a site jump up in the serps.
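To make that arithmetic concrete, here's a toy sketch of the deprecated-vote idea. The 30% weight is this post's own hypothetical figure, not anything Google has published, and the function is invented for illustration.

```python
# Hedged sketch: a "general" (off-topic) vote counting ~30% as much as a
# "relevant" one. The weights are the post's hypothetical, not published fact.
RELEVANT_WEIGHT = 1.0
GENERAL_WEIGHT = 0.3

def link_score(relevant_links, general_links):
    """Combined vote score under the hypothetical relevance deprecation."""
    return relevant_links * RELEVANT_WEIGHT + general_links * GENERAL_WEIGHT

print(link_score(100, 0))   # 100 relevant links
print(link_score(0, 334))   # roughly the same score from 334 general links
```

So the general link still has power; you just need more of them to match a smaller number of relevant ones.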
Ok, so there's the vote threshold. Then there's the relevance threshold part of the algo that comprises the h1 tags (haha), bold tags, proximity, hilltop, localrank, neighborhood that your link partners put you in, blobity blob and whatnot. Whew!
The last thing I wanted to say is that Google has come a long way from anchor text analysis and simple vote counting. It was Daniel Dulitz, a Google engineer, who explained fairly well that there are many many sophisticated things going on beyond the simple process of gathering links and subsequently making the toolbar light up. Here is the interview in case you want to read it [e-marketing-news.co.uk].
| 8:42 am on Mar 16, 2005 (gmt 0)|
"why are related sites more valuable"
I would say because they give Google a context within which to put the anchor text ...
Consider a link where the anchor text is "java"
The link appears on two pages: one where the page title is "Software" and the other where it is "Coffee".
Where does a user searching for "java virtual machine" probably want to go, the software or the coffee page?
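The "java" example can be sketched as a tiny disambiguation routine: identical anchor text, resolved by overlap between the query and the linking page's topic. The term sets and function name are made up for the sketch.

```python
# Toy illustration of disambiguating identical anchor text ("java") by the
# topic of the page it appears on. Term lists are invented for the example.
CONTEXT_TERMS = {
    "software": {"software", "programming", "virtual", "machine", "code"},
    "coffee": {"coffee", "beans", "roast", "brew", "espresso"},
}

def best_context(query_terms, page_titles):
    """Pick the linking page whose topic terms overlap the query most."""
    def overlap(title):
        return len(CONTEXT_TERMS[title] & set(query_terms))
    return max(page_titles, key=overlap)

print(best_context(["java", "virtual", "machine"], ["software", "coffee"]))
# → software
```

Same anchor, different neighborhoods, different meaning: which is the whole argument for giving related-page links a little extra say.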