Forum Moderators: open
Maybe some of the PhDs at Google should brush up on Heisenberg.
[google.com...]
It is simply not possible to make an objective ranking of sites, however fancy an algorithm Google uses, without expecting commercial sites (and non-commercial sites) to manipulate the algorithm to boost their rankings.
Think about it: top spots convert to more traffic, more sales, and dollars in the bank. For Google to then call such attempts "spam", or imply that there is something evil about them, is the height of arrogance or naive stupidity.
No matter how fancy the algorithm, it can still be tested and worked. It is only a matter of time before Google gives up and switches to paid positions.
Excellent point. If this were possible, a competitor could sabotage competition by arranging for links to that competitor's site that would lower its ranking. At most, Google should just give no benefit for some links. The problem with theming is that any algo will sometimes fail to recognize the theme connection between a linking page and the linked-to page. Theming in theory is nice, but doing it well in practice is far more difficult.
Surely, it would be straightforward to allow users to choose. Google could place a slider on the front end of the search engine. They could then monitor what users found to be the best setting (because users would tend to reuse a setting that worked for them before). Having discovered what users feel are the best settings, the slider could be gradually recalibrated so that the default position (center) was the users' favorite.
Does this make sense to people?
"Google meets Gödel"
Great alliteration, tedster
Formally, Gödel's incompleteness theorem is an answer to Hilbert's second problem: can the axioms of arithmetic be proven consistent? Or, in popular form: is mathematics "complete", in the sense that every statement can be either proved or disproved? Gödel showed that the answer is "no", in the sense that any formal system interesting enough to formulate its own consistency can prove its own consistency iff it is inconsistent. Informally, this means that every sufficiently powerful consistent system includes undecidable propositions. Obviously, an undecidable proposition can be decided only in terms of a more powerful system that includes the original system as a special case.
Apparently, Google PageRank is an example of an undecidable proposition, because:
1. It's not easy to calculate a conclusive and final PR. Indeed, every month for many years Google has tried to calculate it, and every month for many years there have been different results!
2. It's too difficult (because no one knows where exactly it is located in the distributed system) and too dangerous (because of High Voltage! 110V) to adjust PR by hand.
Frankly, the PR problem can be reduced to a 1-bit (yes/no) question: is it better for the average customer to have the PR system than to have no PR at all? So, to be or not to be, that is the question about Larry Page's Rank (sorry, William & Larry).
Alas, according to Gödel, none of Google's top executives, including the distinguished GoogleGuy, can answer this 1-bit question, because they are 100% inside Google. The same goes for the noble knights of the SEO road, because they are 99% soaked in Google. Nor can there be any help from professional Gödelists (logicians), because their activity is focused only around three points of great theoretical significance:
1. That's impossible!
2. That's ridiculous!
3. That's cool!
Fortunately, there are pragmatic Gödelists, who are concerned with finding a sufficiently powerful system that includes Google as a special case and is capable of answering the main PR question. So, to be pragmatic, you should find or create such a system. The handy checklist given below may be used in such an endeavour.
A preliminary list of bodies and organs possibly capable of answering the main Gödelian proposition about Google PR:
1. Supreme Court
2. US Congress
3. Elizabeth II
4. Wall Street
5. Investors who hold Google stocks
6. Investors who don't hold Google stocks
7. #5 together with #6
8. Subway passengers
9. UFO passengers
10. #8 together with #9
11. WebmasterWorld
12. Other. Please, specify: ...
I'm anxious to see what's coming next: the KAM theorem, Hopf algebras, supermembrane theory, warp drive, ...
How about osmosis, the gas laws and semi-permeable membranes?
PS
For those of us who have been out of academia for 20 years or more: when quoting laws by number, could you summarise the law in brackets? I can still remember Newton's laws of motion, but I have no idea which one is the first, etc.
Note that I am differentiating PageRank from result ranking. Result ranking is far more important than PageRank.
If the forum moderator permits, I'll post more details....
I'm anxious to see what's coming next: the KAM theorem, Hopf algebras, supermembrane theory, warp drive, ...
There's always the Law of Reciprocity. That entails mutual effort and benefit, as in we pay a fee and we get advertising or listings. Google's application of the Law of Reciprocity is called AdWords.
(1 joule = 1 newton x 1 metre, or 1 J = 1 N x 1 m = 1 Nm)
So in Google terms, PR5 = 1,000,000 guestbook links, because the weight of the object being shifted is directly proportional to the energy of the work being done ;-)
I think ...
I am in two minds whether to post the methodology or not. If it's an inherent limitation of the PageRank algorithm, then Google might not be able to fix it. Then again, with 50+ PhDs, if they can't fix it or haven't already fixed it, then something is seriously wrong.
I would recommend publishing the technique.
If there is currently a bug, then you would force Google to fix it. It seems complicated to build the necessary structures (pages/sites). Also, it would take time until there is an effect in the ranking, especially because it is related to off-page factors (PR and/or anchor text). Therefore, there would be enough time for Google to solve the problem. I doubt that it would be impossible to fix.
Also, if there is a serious problem, I'm sure that someone else will discover it or is already influencing the results.
I still doubt that there is a problem. However, I'm interested in seeing the method you used.
Here are another couple of thoughts..
"Chaos Theory" - although *small* changes usually get dampened down, when large sites go offline the results can be profound. Update Cassandra appeared to be missing the "ODP effect" due to server problems at the ODP. Changes in other high PageRank sites can cause substantial ripples. Or is PageRank more robust than that?
"The Traveling Salesman Problem" - it's well known that it becomes computationally infeasible to generate an optimum route between many points in a sufficiently complex system. PageRank is a little like that. If you have to work out three billion pages and their relationships to one another, you're looking at 9,000,000,000,000,000,000 (9 quintillion) complex calculations, which presumably you would need to run multiple times to eventually end up with a stable figure. Is this feasible for Google?
The Heisenberg comparison is interesting, though, but I don't think it's accurate. It's not the act of measuring PageRank that changes it, it's the actions of those who analyse PageRank. Which leads me to..
"Evolution Theory". For many web sites, Google is the predominant environmental factor causing them to succeed or fail. Websites that are not optimised for Google fail; those that are may survive. As Google changes its algorithm, it will shake up the process, but surviving sites will be more responsive to Google's changes. However, there are other evolutionary niches, such as sites optimised for Inktomi. A sudden shift from a Google environment to an Inktomi environment could sideline those sites purely designed to live in a Google world. Think about small marsupials.. perfectly adapted to live in their native environment.. until someone introduces cats. ;)
Chaos Theory
PR calculation isn't a chaotic system (a non-linear dynamical system) - it's a linear system without dynamics.
you have to work out three billion pages and their relationships to one another, you're looking at 9,000,000,000,000,000,000 (9 quintillion) complex calculations, which presumably you would need to run multiple times to eventually end up with a stable figure.
No, I already said that you can get the exact solution in at most three billion steps.
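For concreteness, here is a minimal sketch of that kind of iterative PR calculation in Python. The toy link graph, d = 0.85 and the convergence tolerance are assumptions for illustration only, not Google's actual setup; the point is that each pass costs work proportional to the number of links, not to the square of the number of pages.

# Minimal PageRank power iteration over a toy link graph (illustrative only;
# the graph, damping factor d and tolerance are assumptions, not Google's).
links = {
    'A': ['B', 'C'],   # page A links to B and C
    'B': ['C'],
    'C': ['A'],
    'D': ['C'],
}

def pagerank(links, d=0.85, tol=1e-8, max_iter=200):
    pr = {p: 1.0 for p in links}                  # initial guess: PR = 1 everywhere
    for _ in range(max_iter):
        new = {p: 1.0 - d for p in links}         # the (1-d) "base" share
        for page, outlinks in links.items():
            share = d * pr[page] / len(outlinks)
            for target in outlinks:
                new[target] += share              # each link passes on a share of PR
        if max(abs(new[p] - pr[p]) for p in links) < tol:
            return new                            # converged
        pr = new
    return pr

print(pagerank(links))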
Details came from one of the programmers who worked for the SEO company that did the test, which has since shut down.
Approximately 50 top-level domains were used, with around 100 subdomains on each one.
Each subdomain was set up as a unique web site with a random number of pages. Each subdomain was internally linked to maximise PageRank on its index page. All subdomains were randomly interlinked.
The actual pages were generated on the fly using templates, English Markov chain text generators and random image generators, producing fresh, thematic, grammatical English content.
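(As an aside, here is a minimal sketch of the kind of Markov chain text generator described here, as a word-bigram chain in Python. The tiny corpus is an arbitrary placeholder; the generators used in the test were presumably far more elaborate.)

import random
from collections import defaultdict

# Toy word-bigram Markov chain text generator (illustrative only).
def build_chain(text):
    words = text.split()
    chain = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        chain[prev].append(nxt)          # record which words follow which
    return chain

def generate(chain, length=25):
    word = random.choice(list(chain))
    out = [word]
    for _ in range(length):
        followers = chain.get(word)
        if not followers:                # dead end: restart from a random word
            word = random.choice(list(chain))
        else:
            word = random.choice(followers)
        out.append(word)
    return ' '.join(out)

corpus = "the quick brown fox jumps over the lazy dog while the quick dog sleeps"
print(generate(build_chain(corpus)))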
Every subdomain linked to a target webpage using the same target keyword phrase.
Now you can see from the structure that something like 5000+ links would have been added to the target web page pushing its search results through the roof for the target search phrase.
The result was that the target webpage disappeared from page 1 down to some lower page. When the target links were changed to a different site, at the next update the original site returned to the first page and the new target site dropped from page 1 like a lead balloon.
While I don't think it was the result they were after, it certainly has disturbing implications, does it not?
Now, one would think that Google might be interested in this effect and investigate, but I was told that Google simply denied it was possible and PR0'ed the test pages and the site that did the test.
I am not privy to the actual ranking movements, but from the above orders of magnitude it looks to me like the effect is that a tiny negative penalty is somehow transmitted to the target page from each spam site, and that the cumulative effect of a large number of 'spam sites' was a downward ranking movement for the target phrase. I didn't think to ask whether anything happened to the actual PageRank; it's pretty irrelevant given the actual result.
Obviously this is a serious flaw in Google's structure. If Google hasn't fixed it, then maybe making these details public might prod them to do so.
Does anyone actually have the resources or inclination to test this?
doc_z, your statement, "PR calculation isn't a chaotic system (a non-linear dynamical system) - it's a linear system without dynamics," ignores a fundamental point. The data set is dynamic. The system is chaotic. Actual PR is changing by the second, and sometimes in very significant ways. When the PR is calculated it may be on a fixed data set, but that data set is already invalid. That is one of the problems with estimated PR.
Does the measurement perturb the system? Probably not in the quantum sense, but certainly knowledge of the measurement does, so the system is altered in a sociological sense.
WBF
Adding to the list in post #34, I submit "Superstring Theory"
A professor of theoretical physics said of it:
"Ironically, the superstring equations stand before us in perfectly well-defined form, yet we are too primitive to understand why they work so well and too dim witted to solve them. The search for the theory of the universe is perhaps finally entering its last phase, awaiting the birth of a new mathematics powerful enough to solve it."
It's a bit like that with Google; we are too dim-witted to solve their algo.
The data set is dynamic. The system is chaotic.
Even if the system were dynamic, it wouldn't be a chaotic system. There is just a set of linear equations. Perhaps it looks chaotic, but from a mathematical point of view it isn't. Also, just because the links are changing, I wouldn't call it a dynamic system. (Of course, the system has changed, but nothing more.)
random walk
The result was that the target webpage disappeared from page 1 down to some lower page. When the target links were changed to a different site, at the next update the original site returned to the first page and the new target site dropped from page 1 like a lead balloon.
As said before, you didn't decrease PR, but you changed the ranking. However, even this shouldn't be possible. Obviously, it's hard to reproduce the results one by one. Perhaps the main effect can be achieved with some simpler structure, e.g. without different domains. (One has to try it ...)
deus777, I still have a question: was there an outgoing link from the target page to one of those domains? If yes, that could be part of the problem.
The following question is not directly connected, but does anybody have an explanation for why a search for BestBBS yields www.bestbbs.com only at #2? It seems that anchor text makes the difference, even though BestBBS has lots of it.
Great philosophical discussion here.
Dynamic System
Linear/Nonlinear System
Chaotic System
Functional Equation/System
When the target links were changed to a different site, at the next(!) update the original site returned to the first page
Obviously deus777 (unconsciously) uses a wrong functional equation, PR(m) = F(m), and as a result draws wrong conclusions from his/her data. Knowing the time lag (3 months) in the functional equation, one may draw the opposite conclusion from the same data.
Q: Why do we need all those academic definitions?
A: The theory of systems is a well-developed and well-described science. If you know exactly what sort of system you are dealing with, you can just open a textbook at the proper page and learn 90% of the system's properties in 'ready to eat' form. The remaining 10% you already know from your own experience. No guessing, no reverse engineering. Cool!
There's been some speculation that this type of penalty is being applied with the current update. When were these tests performed?
Also, is there any way the penalized domains could be associated with the domains linking to them? doc_z asked about links back; what about registration details for the domains?
A model which would describe the time evolution of PR would be a dynamic system. Such a model doesn't exist so far. (Although, it could be interesting to develop such a model.) This system is probably non-linear and might show chaotic behavior.
Maybe cellular automata are a better comparison?
(By the way, a thermodynamic equilibrium doesn't mean that the system is static.)
PR calculation isn't a dynamic system...
(no time derivative).
...but the dynamics aren't part of the PR
In principle, you can do the calculation within one step: PR = M^-1 * (1-d).
In particular, the time development of PR over the fictitious time k (the calculation process) isn't chaotic.
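Here is a minimal sketch of that "one step" solve in Python with numpy, where M stands for (I - d*L^T) with L a row-stochastic link matrix. The toy 4-page graph and d = 0.85 are assumptions for illustration only; the point is that the direct linear solve and the usual fixed-point iteration give the same PR.

import numpy as np

d = 0.85
# Row-stochastic link matrix of a toy 4-page graph (A->B,C; B->C; C->A; D->C).
L = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
n = L.shape[0]

# "One step": solve the linear system (I - d*L^T) PR = (1 - d) * 1 directly.
pr_direct = np.linalg.solve(np.eye(n) - d * L.T, (1 - d) * np.ones(n))

# The usual fixed-point iteration reaches the same answer.
pr_iter = np.ones(n)
for _ in range(100):
    pr_iter = (1 - d) + d * L.T @ pr_iter

print(pr_direct)
print(pr_iter)   # matches the direct solve to within rounding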
I find the Page Rank scoring system perfectly compatible with the Second Law of Thermodynamics, which simply means that energy never completely dissipates or disappears...So from that aspect, PR seems to have harnessed the power of the Second Law and has a firm foundation in scientific theory.
...But there's no way we can argue it doesn't have rational foundations.
If the iteration process starts with initial values of PR = 1 for all pages, and uses the equation:
PR = a + b * (PR ...)
then the initial 'energy' (total PR) is simply N, where N is the number of pages, and three scenarios are possible:
1. if a + b < 1, the final energy < N; energy disappears from the system, against the 1st law of thermodynamics
2. if a + b = 1, the final energy = N
3. if a + b > 1, the final energy > N; additional energy appears in the system, against the 1st law of thermodynamics
The Founding Fathers selected the second case, or more precisely:
a = (1 - d), b = d, so a + b = 1.
In other words, they selected conservation of energy. Actually, the energy in this case may sometimes be lost (due to pages with inbound links and no outbound links). However, this effect is small and the system is rather close to the conservative case. So, the laws of equilibrium thermodynamics (the 1st, the 2nd and many others) may be applied, with some restrictions, to the PR system. Thus, our intuitive feeling of a relation between the Second Law of Thermodynamics and the PR system does have a firm rational scientific foundation, based on the choice:
(1 - d) + d = 1
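A quick numerical check of this conservation argument in Python (the random link graph, d = 0.85 and the absence of dangling pages are assumptions made purely for the sketch): starting from PR = 1 everywhere, the total PR stays equal to N at every iteration.

import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 0.85

# Random link graph; make sure every page has at least one outbound link
# so there are no dangling pages (i.e. the conservative case).
adj = (rng.random((n, n)) < 0.1).astype(float)
np.fill_diagonal(adj, 0)
adj[adj.sum(axis=1) == 0, 0] = 1           # crude fallback link for empty rows
L = adj / adj.sum(axis=1, keepdims=True)   # row-stochastic

pr = np.ones(n)                            # initial "energy" = N
for k in range(20):
    pr = (1 - d) + d * L.T @ pr
    print(k, round(pr.sum(), 6))           # stays at 50.0 (i.e. N) each step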
In the experiment there was absolutely no link back from the target web page to the spam farm. In the second exercise the new target web page was a competitor's site, and again there was absolutely no connection between the target page and the farm.
I am told there were some minor issues around the competitor overreacting while in a state of deep shock giving way to profound panic.
My understanding is that the issue was never even addressed by Google; the SEO in question went away, so the problem has only been known to a small number of people.
Without being able to test it, I suspect the effect still works. Having thought about it for the last few days, my feeling now is that the effect isn't entirely in the PageRank algorithm but rather in the rankings algorithm.
AFAIK the rankings algorithm primarily uses the PageRank of pages whose inbound links contain the search phrase text to determine ranking. Should a tiny negative penalty be carelessly introduced, then a very large number of cumulative negatives might have caused a ranking slip for the search terms.
My guess is that Google deemed the websites spam because the subdomains were interlinked. If the subdomains had been set up to transfer maximum PageRank to the top-level domains, and only those top-level domains had been interlinked and pointed to the target web page, then they might not have tripped the spam filter. A second guess is that the rate of introduction of the domains was too fast, or that the templates were not varied enough.
Anyway, that's all speculation.
As far as I know, the top-level domains were registered under the same owner name; the target and the farm were on separate IP blocks with different owners, although all the TLDs were on the same IP. All IPs in the 1st experiment were on the same ASN.
However, in the second experiment there was no relation whatsoever in IPs, domain owners or ASNs between the target page and the farm. It could have made an interesting experiment to see precisely which parts of the setup were mandatory, and how far it could be scaled back and still move the target web page. It appeared the 2nd target page owner became, perplexingly, somewhat less than cooperative.
In the experiment there was absolutely no link back from the target web page to the spam farm. In the second exercise the new target web page was a competitor's site, and again there was absolutely no connection between the target page and the farm.
Thanks for answering my question.
Indeed, it looks as if it is possible to hurt the ranking (not PR).
Without being able to test it, I suspect the effect still works.
One should check whether the problem already appears under simpler conditions, e.g. for a large number of identical links from the same site (like the BestBBS link at the bottom of this page). Perhaps one can fix the cause.
AFAIK the rankings algorithm primarily uses the PageRank of pages whose inbound links contain the search phrase text to determine ranking.
Currently anchor text seems to be the most important factor.
---------------------------------------------------------
Time derivative is necessary in a continuous dynamic system.
I didn't necessarily refer to a continuous system. The time derivative can be understood as a continuous derivative as well as in a discretized version. Anyhow, there is no time derivative (not even a discretized version) in the PR calculation.
If you stop the iterations at k=10, you will see some values of PR ...
Apart from the fact that Google isn't showing these values, I already explained that this is a fictitious (or artificial) time, which depends on the algorithm and has no meaning. These are just numerical methods to invert a matrix, which can also be done (in principle) in one step.
In this case there is indeed no dynamic. But if you have merely two steps - it'll be already dynamic.
Obviously, there is just one step: PR = M^-1 * (1-d). There is no initial guess for PR, there is just the solution within one step. Therefore, in this case there is no (pseudo) dynamic at all.
Second Law of Thermodynamics
I took this as a metaphor (I never saw this as a real model) and I like this metaphor.
If you want a serious comment:
Even if you neglect the dead ends, there is no conservation law, neither for the pseudo-dynamics during the iterations nor for the sequence of updates.
For the update process you can start with nearly any initial PR you want. (Strictly speaking, the scalar product of the initial PR vector and the final solution must be non-zero; for example, this is easily fulfilled by choosing arbitrary, positive initial values.) Therefore, you can choose whatever initial total PR you want; the final solution will be the same. (In the case of no dead ends, the total PR equals the number of pages.)
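A small numerical illustration of that independence from the initial guess, in Python (the toy 4-page graph and d = 0.85 are assumptions): two very different starting vectors converge to the same final PR.

import numpy as np

d = 0.85
# Row-stochastic link matrix of a toy 4-page graph (A->B,C; B->C; C->A; D->C).
L = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])

def iterate(pr0, steps=200):
    pr = np.array(pr0, dtype=float)
    for _ in range(steps):
        pr = (1 - d) + d * L.T @ pr       # standard PR update
    return pr

print(iterate([1, 1, 1, 1]))              # initial total PR = 4
print(iterate([100, 0, 0, 0]))            # very different start, same final PR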
The sequence of updates is not a closed system. You can simply add pages and change the total PR.
Although I like this discussion, I think it is indeed like a random walk (not the most efficient way to go). Perhaps one should stop here.
...this is a fictitious (or artificial) time...
100% correct. In the language of discrete dynamical systems, the term 'time' means 'an independent variable of a discrete system', and this fictitious short term is commonly used instead of the long definition.
>> Second Law of Thermodynamics
> I took this as a metaphor...
Again 100% correct. Thermodynamics is the science of temperature, or more precisely of heat energy. So, we would have to invent a new term, 'Page-Rank-o-Dynamics'. However, it is generally accepted in the language of the interdisciplinary sciences that each new law is named after the science that discovered it first. So, the term 'Second Law of Thermodynamics' is generally accepted in sociology, neural networks, cellular automata and many other fields.
The sequence of updates is not a closed system.
Frankly, we are speaking about different things: I am speaking about the professional language generally accepted in the theory of complex systems and the interdisciplinary sciences; you are speaking about the emotions of a specialist in linear matrix algebra who encounters the interdisciplinary border for the first time. I remember this experience: everything is strange and unusual over there. However, if you survive, you'll forget about your emotions in a few weeks; if you don't, then stay within linear algebra. It's a lovely science, by the way. I still cannot forget it!
I think we should indeed stop the discussion here... or just follow:
Amazon > Textbooks > 'Discrete-Time Dynamic Models'
AND 'An Interdisciplinary Introduction to... Series'