From my travels, it seems that analysis of links in a semantic environment is where it's going.
Other things include Markov chains, block level link analysis (semantic stuff), and Link Analysis Ranking (LAR).
I also found these particularly relevant:
[informatics.indiana.edu...]
[informatics.indiana.edu...]
Have you any thoughts on how links are going to affect SERPs in the future & how the SEs operate?
J
Could "block level link analysis" be part of the new MSN preview?
Google seems to focus more on meaning:
"We want to be able to search and find these [entities] and the relationships between them, rather than you typing in the words specifically," Peter Norvig said recently [eweek.com]
Call me crazy but this is what I think I am seeing.
I was always wondering what the 'related:' search command was good for, since it mainly seemed to mirror ODP structures. Now, with Menczer basing the "semantic-similarity" coefficient on distance within the ODP tree (which is not a tree), I get the feeling that such issues have been incorporated in some way into the ranking algos, thereby lowering the impact of backlinks and PR. This would explain why, as pointed out elsewhere, so many dmoz dupes and pseudo-directories polluted recent search results on less competitive terms. If this is the case, I would regard the current state of Google's results as a first approach to such cluster analysis: imperfect, and to be improved in the near future.
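To make "distance within the ODP tree" concrete, here is a minimal sketch of a path-based similarity between two directory categories. It is a simplified stand-in, not Menczer's actual coefficient: it assumes categories are written as slash-separated paths, treats the directory as a strict tree (which, as noted above, the ODP is not), and the function names and example categories are purely illustrative.

```python
def shared_depth(path_a, path_b):
    """Number of leading path segments the two categories have in common."""
    depth = 0
    for seg_a, seg_b in zip(path_a, path_b):
        if seg_a != seg_b:
            break
        depth += 1
    return depth


def tree_similarity(cat_a, cat_b):
    """Assumed measure: 2 * depth of the common ancestor / sum of both depths.

    Gives 1.0 for identical categories and shrinks as the categories
    branch apart closer to the root.
    """
    path_a = cat_a.strip("/").split("/")
    path_b = cat_b.strip("/").split("/")
    return 2 * shared_depth(path_a, path_b) / (len(path_a) + len(path_b))


# Illustrative category paths (hypothetical examples)
searching = "Top/Computers/Internet/Searching"
web_design = "Top/Computers/Internet/Web_Design"
music = "Top/Arts/Music"

print(tree_similarity(searching, web_design))  # siblings under the same parent
print(tree_similarity(searching, music))       # only the root in common
```

Because every path starts at the same root, this particular measure never reaches zero. A real implementation would also have to decide how to handle the cross-links that make the directory a graph rather than a tree.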
What I found unsatisfactory was that, in my opinion, it was not always clear whether Menczer's analysis refers to domains or single URLs, e.g. whether link-distance refers to domain or URL. For example, it sounded to me as if in the first paper they investigated only the starting index-page of the 100,000 URLs (domains?) found in the ODP. But maybe I just didn't understand correctly.
Considering what has been observed so far in other threads about recent shifts in the SERPs, it seems quite likely that Menczer's content-similarity coefficient, or a similar measurement presumably based on probabilistic lexical analysis of page pairs, is now added to the ranking algos and perhaps even TPR. It seems as if content in some manner now influences the way backlinks are valued.
From a linguistic point of view I find it quite problematic that probabilistic issues creep in through the backdoor 50 years after Chomsky's attacks on Skinner. This can only be an intermediate state, and as soon as possible it should be replaced by integrating analysis of syntactic and semantic features of the natural language of the body texts right into the HTML-parsing algorithms.
Such a shift towards cluster-analysis of lexical distance, ODP-structure distance, link-distance (and maybe even other features), in combination with doc_z's hints on continuous crawls (see [webmasterworld.com...] ), might also indicate that it is a first workaround for the performance and capacity issues of crawling and backlink-evaluating the whole internet, which becomes more and more problematic.
If analysis of the relevance of a website is from now on performed on such a high level of cluster-analysis, I'd come to a conclusion, which goes well in accordance with other observations:
size DOES matter!
If you promote your own website, make sure you begin to cooperate as soon as possible with those you have so far called your "competitors". Build larger units until your cluster becomes an authority of its own or you will vanish. If you write or promote different websites for different customers make sure you concentrate on topic-related parts of your customers until you - as connecting them all - will be viewed as an authority in your region of the net.
I think the good people @ GOOG are learning a lot from other fields than semantics, some very remotely connected to the concept of "search as we know it" eg. behavioral sciences, mechanics, etc. <pun>perhaps even rocket science</pun>
Let's not forget that it's a very academic environment they're working in, so they do actively pursue a lot of research, of which some proportion is applied. Still, it's also very much "hands on", so my best guess when confronted with two ranking methods is: Simplicity wins.
Semantics and linguistics are of course interesting when considering the topic of a page, and i'm sure you can also derive quite complicated algorithms to determine if a page is really about something or just mumbo-jumbo. Still, there are differences: Topic extraction should be orders of magnitude simpler than "natural text" validation, especially as what's natural text on the web would not be natural text in a textbook, newspaper, or post-it note. And then, there are natural differences between info sites, e-shops, artistic sites, entertainment sites, news sites, and the lot.
Where semantics fail is in the search box. Enter one to three words, some perhaps even ambiguous: I enter "orange widgets" meaning widgets for oranges, and i get widgets for apples that are orange coloured. Then again, i might be totally wrong here. I'm quite sure that "they're doing something" here as well. Without that, we would never have gotten the define tool or the calculator.
>> Have you any thoughts on how links are going to affect SERPs in the future & how the SEs operate?
That's a very very ....i'd say extremely broad question. Anyway, that's just semantics, as the saying goes. Personally, i've got a lot of thoughts about this, but most will be off topic for the thread headline, so i'll stick with links, SERPS and the present, as that's a more limited scope.
I think most readers of this forum can agree that "links are not just links", ie. some links are worth more than others. I'm not thinking about high-PR versus low-PR links here, rather it's more along the thread topics of (all fictional):
- "does links from links.html count?"
- "i see pr0n sites in my log stats"
- "does it pay to pay for links"
- "sitewide links vs. one high PR link"
- "are links s*ndb*xed?"
- "my 73,000 backlinks doesn't show anymore"
- "does Amaz*n benefit from aff links?"
- "anybody know a good recip/directory/GB/[insert word here] script?"
- "how to find good link partners?"
- "how to write a javascript link?"
- "what's the current rate for a PR "X" link?"
I'm sure you've seen something similar to the above somewhere close recently. If that massive interest in linking is not a (very very... i'd say extremely) big red flag to a business based on the value of links, well...
So, i think it would be very stupid for the good people @GOOG Corp not to consider making some differences to the way links are treated and assigned importance. I also think.. that is, i don't think, that these people are stupid.
I recall that some months ago i complained publicly in these forums about how easy "it" had become (and i wasn't the only one to do that). There was a period in which anchor text was essentially all it took to get rankings, and it was so obvious that in hindsight i think it was "too obvious" ie. something was brewing in the back office.
I'm probably totally wrong about this, as it would imply that they had a period where something else on the inside had very much focus, and demanded some attention shifts away from serps for a while.
So, to add insult to injury, here's some wild speculation: I think we see the first weak signs that links (and pages/sites that give them as well as receive them) are not treated totally equal these days. Also, i think this trend will continue.
---
Added: Markov chains? That reminds me so much of some of the best classes i had back in the school days. Haven't seen them applied anywhere since then, though.
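For what it's worth, the usual way Markov chains show up in link analysis is the "random surfer" picture: each page is a state, each link a transition, and a page's score is the long-run probability of the surfer being there. The following is only a toy sketch of that idea under stated assumptions: the four-page link graph is made up, the damping value of 0.85 is the commonly cited default rather than anything confirmed in this thread, and the function name is hypothetical. It says nothing about how any search engine actually computes rankings.

```python
def stationary_distribution(links, damping=0.85, iterations=50):
    """Power iteration on a random-surfer Markov chain.

    links maps each page to the list of pages it links to; every linked
    page must itself be a key of the dict.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # With probability (1 - damping) the surfer jumps to a random page.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # A page without outlinks passes its weight to all pages equally.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Made-up link graph: "d" links out but nobody links to it.
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
toy_rank = stationary_distribution(toy_links)
for page in sorted(toy_rank, key=toy_rank.get, reverse=True):
    print(page, round(toy_rank[page], 4))
```

In this toy graph the page with the most incoming links ends up on top, and the page nobody links to keeps only the baseline share from random jumps.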
FACK. Nevertheless we should keep in the back of our minds that Google claims to work heavily on such issues as automated translation, and I suppose the insights gained in those fields will from now on continuously improve the ranking algos.
> I think we see the first weak signs that links ..are not treated totally equal these days. Also, i think this trend will continue.
Why are you calling this a "wild speculation?"
> simplicity wins
Yo. So I'd regard the "lexical similarity" - issue a good place to start with.
Does anyone know whether any source code executing that vector space analysis and cosine similarity calculation is available somewhere on the net? I don't think it would be wasted time to do some empirical research on whether such an algo might explain the reported shifts in ranking.
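As a starting point for that kind of experiment, here is a minimal sketch of lexical similarity between two page texts in a vector space model. It assumes the simplest possible setup: raw term counts as vector components, a crude letters-only tokenizer, and no stop-word removal or term weighting (a serious test would probably want something like TF-IDF). The function names and sample texts are invented for illustration and do not come from the papers discussed.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-case the text, split on non-letters, and count each term."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def lexical_similarity(text_a, text_b):
    """Cosine of the angle between the term-count vectors of two texts."""
    vec_a, vec_b = term_vector(text_a), term_vector(text_b)
    # Only terms present in both texts contribute to the dot product.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


page_1 = "orange widgets for peeling oranges"
page_2 = "widgets for oranges and orange juice"
page_3 = "cheap flights to london"

print(lexical_similarity(page_1, page_2))  # several shared terms
print(lexical_similarity(page_1, page_3))  # no shared terms
```

Running this over pairs of linking and linked pages would give one number per link, which could then be compared against observed ranking changes.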
As to the paper synergy pointed to:
Reading about the analogies between growing social and internet networks, I immediately had to think of what happens to the brain in the first two years of a child's life. It is a period of massive growth of synapses, which almost comes to an end around the second birthday. From then on new links still emerge, but only at the cost of others. I think the internet linkomania will also come to an end in the next few years.
If Google now switches over to an added lexical analysis, this would correspond perfectly to what research on first language acquisition calls the "first word spurt", which is said to happen between the 18th and 24th month of life.
Another good place to start, to me, would thus be to swallow the results of the keyword-suggestion tool as long as we can, because I think in the near future it will be faked like the link: command and TPR.
> Markov chains? .. back in the school days. Haven't seen them applied anywhere since then, though.
I saw them applied in Shea/Wilson's "Illuminatus". Don't spit on the floor. lol. And reading that was the reason why I paid no attention to vector spaces and statistics back in those schooldays, so:
How the hell again do you calculate the cosine value between two vectors in more than three dimensions? I stared at these formulas and felt like an idiot. For dummies, please.
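The recipe is the same in any number of dimensions: multiply the vectors component by component and add up the products (the dot product), then divide by the product of the two vector lengths, where each length is the square root of the sum of the squared components. A minimal sketch, with an illustrative function name and made-up five-dimensional vectors:

```python
import math


def cosine(a, b):
    """Cosine of the angle between two vectors with the same number of dimensions."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))       # a1*b1 + a2*b2 + ... + an*bn
    length_a = math.sqrt(sum(x * x for x in a))  # sqrt(a1^2 + ... + an^2)
    length_b = math.sqrt(sum(y * y for y in b))
    if length_a == 0 or length_b == 0:
        return 0.0  # convention chosen here: a zero vector matches nothing
    return dot / (length_a * length_b)


u = [1, 2, 0, 0, 2]
v = [2, 4, 0, 0, 4]  # same direction as u, twice as long
w = [0, 0, 3, 1, 0]  # shares no non-zero dimension with u

print(cosine(u, v))  # 1.0
print(cosine(u, w))  # 0.0
```

A value of 1 means the vectors point the same way regardless of length, 0 means they have nothing in common, and everything else falls in between (for vectors without negative components, as with word counts).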
Seems as if this thread has been too academic from the start. PDFs are dead ends. What a pity.