Forum Moderators: Robert Charlton & goodroi
Fact #1
The Recent Directory Whack
You have to wonder why Google took so long to whack these directories. Their backlinks were in the worst neighborhoods, loaded with paid dreck. It looks like the same directory dreck Google manually whacked three years ago (Bluefind, etc.). If Google's algorithm were so powerful, it would have whacked those directories a while ago: detect the crap linkage automatically, run the site through a test of ten or twenty parameters to decide whether it's a crap site, and pull the plug automatically if it is. It takes most of us two to fifteen seconds to accurately determine whether a site is crap. But Google can't. It takes a hand tweak to do it.
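To make the "test of ten or twenty parameters" concrete, here is a minimal sketch of that kind of automated check. Every parameter name and threshold below is invented for illustration; nothing here is Google's actual method.

```python
# Hypothetical sketch: score a site on a handful of link-quality parameters
# and flag it automatically if too many checks fail. All field names and
# thresholds are invented for illustration only.

def looks_like_link_spam(site: dict, max_failures: int = 3) -> bool:
    """Return True if the site fails at least max_failures quality checks."""
    checks = [
        site.get("paid_link_ratio", 0.0) > 0.5,       # mostly paid backlinks
        site.get("bad_neighborhood_links", 0) > 100,  # links from known spam hubs
        site.get("unique_linking_domains", 0) < 5,    # links all from one clique
        site.get("reciprocal_link_ratio", 0.0) > 0.8, # link-exchange pattern
        site.get("content_originality", 1.0) < 0.2,   # scraped or boilerplate text
    ]
    return sum(checks) >= max_failures

# A hypothetical directory with the profile described above
directory = {
    "paid_link_ratio": 0.9,
    "bad_neighborhood_links": 450,
    "unique_linking_domains": 3,
    "reciprocal_link_ratio": 0.95,
    "content_originality": 0.1,
}
print(looks_like_link_spam(directory))  # → True
```

The point of the sketch: a rule this simple would already catch the directories in question, which is exactly the poster's complaint.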
Fact #2
The Nofollow Tag
For all the talk about graphing link neighborhoods and identifying sites likely to be manipulating links, in the real world Google should be able to identify crap links to crap neighborhoods and devalue them right away. Instead, Google needs the nofollow tag the way a paraplegic needs a wheelchair.
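For anyone unfamiliar with what nofollow actually asks of a crawler, here is a small sketch using only Python's standard library: extract the links from a page, but discard any marked rel="nofollow" so they pass no link credit. This is an illustration of the mechanism, not Google's crawler.

```python
# Minimal sketch of nofollow handling: keep normal links, drop any link
# whose rel attribute includes "nofollow" from the link graph.
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.followed = []  # links that would pass link credit
        self.ignored = []   # nofollow links, excluded from the link graph

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if href is None:
            return
        rel = (attrs.get("rel") or "").lower().split()
        if "nofollow" in rel:
            self.ignored.append(href)
        else:
            self.followed.append(href)

parser = LinkExtractor()
parser.feed('<a href="/good">ok</a> <a href="/paid" rel="nofollow">ad</a>')
print(parser.followed)  # → ['/good']
print(parser.ignored)   # → ['/paid']
```

The poster's argument is that this annotation shifts the detection work onto webmasters, when the crap links were identifiable without it.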
Fact #3
Paid links
They work. They work. And they work like crazy. Google's algorithm loves them and gives websites with paid backlinks two thumbs up. There is no denying that Google's algo cannot spot paid links, even when they all come from the same neighborhood of radio stations. Those links keep on working.
Fact #4
Webmaster Snitch Central
If Google's algorithm was so smart it wouldn't need to solicit webmasters to rat out other webmasters for buying text links.
Fact #5
Human Review
Google pays humans to review its SERPs by hand. The fact that Google hired an army of hand checkers is tacit admission that their algorithm cannot cope with spammy sites and regularly rewards them with high positions in the SERPs.
Fact #6
Wikipedia
Is Wikipedia really the best answer to most questions posed on Google? Either the Internet is broken or Google has to try harder.
Fact #7
Duplicate Content
Webmasters have been battling Google over its inability to detect which site has the original content and which is an imposter spoofing, cloaking, or reprinting it. Worse yet, Google sometimes ranks the duplicated content above the original.
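Detecting near-duplicate text is a solved problem in principle, which is what makes the complaint sting. Here is a hedged sketch of one standard technique, w-shingling with Jaccard similarity; Google's actual method is not public, and this example is illustration only.

```python
# Sketch of near-duplicate detection via w-shingling + Jaccard similarity.
# This is a textbook technique, not Google's actual (non-public) method.

def shingles(text: str, w: int = 4) -> set:
    """Set of all w-word windows in the text, lowercased."""
    words = text.lower().split()
    return {" ".join(words[i:i + w]) for i in range(len(words) - w + 1)}

def jaccard(a: set, b: set) -> float:
    """Overlap of two shingle sets: 1.0 = identical, 0.0 = disjoint."""
    return len(a & b) / len(a | b) if a | b else 0.0

original  = "the quick brown fox jumps over the lazy dog near the river"
scraped   = "the quick brown fox jumps over the lazy dog near the bank"
unrelated = "completely different text about cooking pasta at home tonight"

print(jaccard(shingles(original), shingles(scraped)) > 0.5)    # → True (near-duplicate)
print(jaccard(shingles(original), shingles(unrelated)) < 0.1)  # → True (unrelated)
```

Finding the copies is the easy half; deciding which copy is the original is where the thread's argument really lies.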
I'd like to expand on the Wikipedia entry. Wikipedia, contrary to popular notions, is not written by experts. Much of Wikipedia is copied from elsewhere. A true authority site is written by the experts themselves. Yes, there are places where experts have written entries, but Wikipedia is not an authoritative website; it is a well-linked website. There are many sites more authoritative, such as the original websites from which the wiki editors cribbed their content, and sites written by experts.
That Google's algo defaults to Wikipedia is a fault of the algorithm, and not a good response to so many queries.
#5 Human review? I expect nothing less. This is the nature of AI: it needs calibration and constant reality checks. This isn't a sign of weakness. It's just good quality control.
#6a Wikipedia isn't a "default"; it's merely the most optimized site there is, for reasons we've gone over ad infinitum here at WW. Sure, it's a problem when stubs appear in the SERPs, and yes, it's anonymously written. However, it's often excellent, and it's very successful. Wikipedia is a complex phenomenon, not completely black/white or good/bad. Simplistic opinions about Wikipedia are worse than useless.
#6b "Webmasters have been battling Google" over duplicate content? Webmasters are battling content thieves, not Google. Assign blame where it belongs. Spam's a huge problem, and we all (webmasters, search engines, all legitimate operators) are trying to find ways to deal with it.
But why do you frame this around Google? Why not "search"? Why must Google be held accountable for the actions of every spammer, scammer, and shady operator out there? And can you name a search engine that has a shorter list of problems?
As for dupes, it works simply on who has the most powerful domain. This "first to publish" stuff is just rubbish that people have made up; Google has said nothing. I can copy your content, put it on a PR7 domain, and whoop your ass.
Webmaster Snitch Central
I do not like the snitching either. It is us against Google, but some crybabies get upset because someone gets ahead of them. It is about marketing, not snitching. Look at what the guy above you is doing and use the techniques yourself instead of being hell-bent on revenge. One day you will be zapped by G after following their guidelines; that will be the day that changes you, or you will give up.
Paid links:
At the end of the day Google is just a computer, not some voodoo machine. Links work and are the basis of their algo, so go on and milk it for all it is worth.
I really wonder whether the term "algorithm," as defined in the early 20th century, is still appropriate for what computer clusters of the size of Google's are currently doing. The whole is more than the sum of its parts.
For instance, we had this discussion on error probability quite some time ago here at WW. Given that Google is meanwhile running almost a million PCs (that is ten to the power of six), this has become a very, very relevant factor. You never know what some of these machines are doing shortly before they go insane. Of course Google's engineers are presumably doing what they can to minimize the worst effects, but they are only human beings. And in addition, as a CEO, you never know what some of your programmers and administrators are doing ;)
I have grown up with the definition of "mythos" as the counterpart of "logos," thus denoting what you do not know (yet), with "enlightenment" being the (historical) process of bringing light to this fog.
In the eighties, computers were viewed as somewhat "mystic," because many people didn't know about computer languages. This has changed considerably in the meantime, but I think we are currently entering an era where we should accept that even our most brilliant experts no longer have any idea what these machine clusters are precisely doing. No, I was NOT talking about everflux ;)
All joking apart: the best source we have to demystify Google's algo is its patent applications. Tedster has done the best he could to compile a list, e.g. here [webmasterworld.com] and in some other threads, which I didn't flag. It seems only a very few of us found the time to have a closer look. It is a really hard job to keep up with research on Google's algo nowadays, and I admittedly include myself among the lazy ones.
We seem to have rather more time to amplify the mystification, IMHO.
As for dupes, it works simply on who has the most powerful domain. This "first to publish" stuff is just rubbish
Bingo. And if you want to see absolute proof of this in real time, get your site hooked up as a Google News source site and write a few articles about a high-value commercial subject. Your content will be stolen within minutes of being published (they scrape the GNews feed) and wind up benefiting the PR7-8 sites that stole it.
Oh come on. Show me one person who ever thought it was.
That's just too blatant a straw man not to respond to. I'll leave aside the rest of your post, since even the points on which you're correct contain more rant than fact.
----------------------------------------------------------------
What I can say is that their pharmacology, biochemistry, and toxicology content is written by experts. In fact, many of the pharma companies have writers whose job is to manage Wikipedia content.
I am talking top-notch data here, complete with 3-D chemical structures.
Google's algorithm is an attempt to give search users a set of results that they like, on a massive and essentially automated scale. The algorithm, for all its complexity (and at times literal stupidity), is measured by how effective it is at getting that end result: user satisfaction.
Google's algorithm is NOT an attempt to judge websites according to the webmaster guidelines. The algo comes first, and the guidelines come later - an attempt to communicate to the public a bit about how the algo is intended to work, whether it does work that way in reality or not.
Yes, there has been some public talk from Google about "artificial intelligence" and all that. But quality is still not something that can be measured computationally. AI has not come anywhere near that far.
Ranking high by looking at the algo might give a better SERP position for a while. But in the end it's user satisfaction that decides how long you can stay at the top (or nearby). I believe user satisfaction accounts for more than 50% of the ranking.
Google is driven by results and user satisfaction. If the users are satisfied they stay with Google. If they stay, Google remains a major search engine. If it is a major search engine, more companies are willing to pay for its ads.
But like all other companies they make mistakes from time to time. This means sites that are popular with end users fall back in the SERPs. But most often they come back, if they follow the rules.
That being said, you can have the cure for cancer, and if it is on the 20th page nobody will see it. You have to think like a machine when you are reaching out to a machine.
Ask yourself - who makes the algorithms?
If I were a male, PhD, nerd computer scientist who spent much time on the Internet and in research libraries reading reams of information looking for pearls, how would I know what is fluff and what is meat?
Here's how scientists read journals:
Perfunctory scan - sometimes called "reading through the cover," since in scientific journals the cover often contains the headlines of the research papers (check out the New England Journal of Medicine to see what I mean) <This is where your title tag comes in>
Exploratory skim - turn to the article, read the headings, read the abstract, and see if it is worth your time <this is where the first few sentences come in>
Appendix - what are the sources for the article; is this article cited in other articles? <this is where backlinks come in>
All of the other search engine criteria are checks and balances relating to the above IMHO.
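The three reading stages above can be sketched as a toy scoring function: title match for the perfunctory scan, lead-paragraph match for the exploratory skim, and backlink count for the appendix check. All weights and signal names below are invented for illustration and are in no way Google's actual values.

```python
# Toy ranking sketch: combine title, lead-paragraph, and backlink signals.
# Weights and the log damping are invented for illustration only.
import math

def score(query_terms: set, title: str, lead: str, backlinks: int) -> float:
    """Higher is better; title matches weigh most, backlinks are damped."""
    title_hits = len(query_terms & set(title.lower().split()))
    lead_hits = len(query_terms & set(lead.lower().split()))
    authority = math.log10(backlinks + 1)  # damp raw counts so popularity
                                           # cannot swamp relevance entirely
    return 3.0 * title_hits + 1.0 * lead_hits + 2.0 * authority

q = {"cancer", "treatment"}
a = score(q, "New cancer treatment results",
          "This trial of a new cancer treatment was promising", 50)
b = score(q, "Cooking pasta", "A recipe blog post about pasta", 50000)
print(a > b)  # → True: the on-topic page outranks the merely popular one
```

The design choice worth noting is the logarithm on backlinks: without it, the 50,000-link pasta page would bury the relevant one, which is the "cure for cancer on page 20" failure mode described earlier in the thread.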