First, love the site. It's one of my first stops every morning, afternoon, and evening.
Normally, TheRegister [theregister.co.uk] is great because you have tech reporters that actually have a clue about most of this net stuff. However, a couple of articles on Google that The Reg has run in the last week haven't entirely been on the mark.
So let's back up:
Last week there was the now reported and re-reported article on a Google News phenomenon that allowed pure company press releases [theregister.co.uk] to slip into the mainstream news. We thought it was such a good story that we covered it ourselves [webmasterworld.com].
While we are at it, let's back up a couple of months to when Google launched the news feature. At the time, much noise was made about the fact that Google News is produced by machine algos alone. No human editors are involved in headline clippings. Obviously, there must be some editorial decisions involved in the process, or they would include WebmasterWorld headlines in the mix as well.
So when Google says press release inclusion into the database was a bug, I completely understand it as a valid explanation. Not only is it valid, it is quite common. Bug spotting is one of the prime motivations for Googlers to read here regularly. We have been here picking out Google bugs for many years. We just found another one yesterday [webmasterworld.com].
That is not to insinuate that I don't agree with you on some finer points:
Why secretive? The company refuses to publish its News Policy - and it maintains the fiction that the selection and composition of stories on its "News section" was "determined by a computer".
Derived queries, blind queries, or query-free content generation is a pretty sophisticated art. So sophisticated that it has landed a paper [cs.berkeley.edu] at this year's WWW12 conference, co-authored by Google's own Sergey Brin and Monika Henzinger.
That's as true as the assertion that the selection and composition of the story you're reading now was "determined by a computer", too.
I can understand how you would view headline generation as such magic that it would look like sleight of hand. Granted, there must be editorial decisions made, but even those editorial decisions are arrived at mathematically. This is the same reason why the algo interpreted a press release as a story. It is impossible at this point for a computer to determine the difference between a press release and a "news story".
Google has stated several times that the news option may be experimented with [webmasterworld.com] as a revenue option - this is not news - nor is the backhanded comment about payola.
Aside from that, the entire follow-up set of comments made on The Reg is without merit. I feel they are uninformed about how a complex system such as Google works day-to-day.
In Andrew's latest installment [theregister.co.uk] of his Google saga, he insists that somehow Google doodled with his search results on being googlewashed.
Google updates its full index once a month [webmasterworld.com]. During that time, it reworks part of its algo, and puts a fresh database online that is built from the previous month's full web crawl. We have been here waiting [webmasterworld.com] with about 100,000 webmasters for the last week for this month's update, Cassandra [webmasterworld.com].
In between those big super updates, we have what we call FreshBot. Freshie was so named because Google puts FRESH! tags next to any listing indexed within the last 72 hours. FreshBot indexes newly discovered and newly updated pages almost continuously. It then adds them to the main Google index and gives each page a little boost [webmasterworld.com]. We call it the FreshBot Sweepstakes because FreshBot-indexed pages can jump to the top 5 of any keyword at any given time. That boost lasts about 72 hours.
So a story that coins a phrase (googlewashed), and that dozens of others link to, should be pretty near the top.
Not until Google performs a full update and the page is actually in the full index. Inbound links to it have to be accounted for before it can rise again in the rankings.
- You run a story on GoogleWashing. It gets picked up by FreshBot.
- It gets added to the index with a bit of a boost in the algo.
- 72 hours later, that page loses its "fresh" status and drops in the rankings. The theory is that pages that really are fresh or updated will have a higher value to visitors. That page will not rank well again until Google performs the full update. (eg: googlewash became googlewashed in 72 hours and basically kicked itself out of the rankings)
Other pages - such as blogs - that use the term GoogleWash are boosted because they came after your story. They are now in the middle of their own "freshbot" cycle. Many of those pages have higher PR than The Register's. Since the term is new, and there were no other pages in the index with "googlewash" in them, I would expect the pages that rank high right now for GoogleWash to drop after their "Fresh Period" is over (same as The Reg's story dropped).
After that, I predict that The Reg's story on GoogleWashing will eventually rise back into a top position.
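For anyone who finds it easier to follow in code, here is a toy sketch of the cycle described above. It is only an illustration of my interpretation, not Google's actual algo: the `Page` class, `toy_score`, the boost size and the link weight are all made-up assumptions. The only number taken from the discussion is the roughly 72 hour fresh window.

```python
from dataclasses import dataclass

FRESH_WINDOW_HOURS = 72  # from the discussion: the boost lasts about 72 hours
FRESH_BOOST = 5.0        # hypothetical size of the temporary boost
LINK_WEIGHT = 1.0        # hypothetical weight per counted inbound link


@dataclass
class Page:
    name: str
    hours_since_fresh_crawl: float
    inbound_links: int
    in_full_index: bool = False  # True only after a full monthly update


def toy_score(page: Page) -> float:
    """Toy ranking score: a temporary fresh boost plus link credit.

    Inbound links only count once the page is in the full index,
    which is the assumption behind the drop after 72 hours.
    """
    score = 0.0
    if page.hours_since_fresh_crawl <= FRESH_WINDOW_HOURS:
        score += FRESH_BOOST
    if page.in_full_index:
        score += LINK_WEIGHT * page.inbound_links
    return score


# The original story while it is still fresh.
story_fresh = Page("reg-story", hours_since_fresh_crawl=24, inbound_links=30)
# The same story after the fresh window has closed, before the full update.
story_stale = Page("reg-story", hours_since_fresh_crawl=96, inbound_links=30)
# A blog that picked up the term later and is in its own fresh cycle.
blog_fresh = Page("blog", hours_since_fresh_crawl=24, inbound_links=3)
# Both pages after the full update, when inbound links are counted.
story_indexed = Page("reg-story", 96, 30, in_full_index=True)
blog_indexed = Page("blog", 96, 3, in_full_index=True)

for page in (story_fresh, story_stale, blog_fresh, story_indexed, blog_indexed):
    print(page.name, page.hours_since_fresh_crawl, page.in_full_index, toy_score(page))
```

In this toy model the stale story scores below the fresh blog until the full update, and then moves back ahead once its inbound links are counted, which is the pattern I would expect to see.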
That's my interpretation of what Andrew was seeing.
Looking at the readers' letters about the inclusion of PR releases in the news search, they struck a chord with the searching public. The Reg got a story and Google got some good feedback, almost a win-win.
I must agree though, Andrew's latest installment just looks like a guy after a story and to hell with the truth.
It's not as if the research would be hard to do.
[regarding an insignificant case of Googlebombing, discussed, explained, and replicated by WW - the register claimed that by googlebombing a keyphrase with no competition, the meaning of the word had changed. But if no-one typed it anyway, it never meant anything to most people in the first place...]
What word will change next?
I could almost have been reading The Sun over here in the UK...
And the article about press releases in Google news:
[because their press releases show up in Google News]
There you have it: the RIAA is a bona fide news organization. Maybe it deserves to join a guild, or qualify for 'embedded status' with the military in Iraq?
This seems to be a pointless trifle about semantics. If Microsoft release an important press release about their new version of Windows, this is clearly news that people will be interested in. If Google gets it straight from the horse's mouth, then what's the problem? It doesn't mean it's not news because it doesn't come from a recognised news organisation. Or is Andrew Orlowski getting some journalistic jealousy?
At the very least, the labeling should be clear. You can't say it's okay because the bot is too dumb -- this is an important public policy question about the role of the Web in public discourse. If the algo can't hack it, then pull the plug on the algo. Don't expect the public to study Google for months so that they can determine the reasons why the rankings shift the way they do.
If Microsoft release an important press release about their new version of Windows, this is clearly news that people will be interested in. If Google gets it straight from the horse's mouth, then what's the problem? It doesn't mean it's not news because it doesn't come from a recognised news organisation.
The problem is that it doesn't mean it is news either. To present press releases as "news" is fraught with danger. That's why real news organizations use human beings to apply some level of editorial decision-making to press releases and, if they decide to publish the information, to present it with some balance, if necessary.
Algo explanations don't excuse the fact that a) certain select bloggers have disproportionate power due to the freshbot, and b) if Google's bot can't tell real news from corporate press releases and industry astroturf, then it shouldn't be called Google News.
Kackle, why do those two things mean that Google News is not a news site? The most you could say is that it isn't a very accurate news site, or that it's biased, and that would only be your opinion anyway. What is 'real news' and why is Google's definition any worse than your own?
Don't expect the public to study Google for months so that they can determine the reasons why the rankings shift the way they do
The general public have no need whatsoever to concern themselves with the algorithm. They use Google News to look up news just the same as any other news site. They don't need complex explanations about how Google works any more than they need complex explanations of how the news programs they watch on TV are put together and produced. They're only interested in the final product, and rightly so. If the results aren't good enough because of press releases or whatever else, people will stop using it.
I'm sure he meant Google-worshipping sycophants... ;)
Since when is an op/ed piece considered news? Editorials aren't news. Opinions aren't news. News must be fact.
Sorry, The Register produced an op/ed piece, called it news then blasted Google for not sticking with real news? Please...
<edit>lack of caffeine</edit>
[edited by: digitalghost at 11:32 pm (utc) on April 10, 2003]
The other issue is Google's use of press releases on the Google News search.
Let's not confuse the two issues, because at Google, different algos are at work for the main index crawls (monthly and freshbot) and the Google News crawl.
With respect to what I call "news," I believe that it is socially irresponsible to suggest that the entire history of professional journalism can be replaced by a handful of software engineers writing some new algo. I expect any algo to be more responsive to the public interest than Google's algo seems to be. Professional journalism includes an entire range of rules that have evolved over time: journalists announce who they are and where they're from; journalists respect the on-record, not-for-attribution, background, or off-the-record requirements of their sources, in order to develop the sort of trust that expands their information base; journalists generally try to check out their facts; journalists are expected to be free of conflicts of interest; news publications label ads, when necessary, to make them clearly distinct from news content; journalists generally try to solicit comments from both sides of an issue, etc.
It's not so much that Google cannot use the word "news." The problem is that without clear and distinct labeling, the fact that corporate public relations material is intermixed with material written by professional journalists, serves to give the corporate material a "not guilty by association" imprimatur that is unearned and undeserved.
Having said that, there's a huge gray area in high-tech news, particularly in the Internet marketing field, where you see semi-journalists who work for companies with a vested interest in promoting commercialism on the web. These semi-journalists go out and collect corporate press releases, and basically repackage them. Some of them are closer to being real journalists than others. It's not always an easy call.
The Register is "gonzo high-tech news." Opinionated as hell. Cute headlines. Entertaining to read. And by the way, Orlowski had two important pieces on Google in one week that came close to being "scoops." Scoops are news.
The Register is fairly irreverent, which is its journalistic style. It tends to get scoops other publications don't, and it's one of Slashdot's favorite sources for information. I read it at least once every day, including weekends.
I think it's a storm in a teacup really. I *do* use Google news now and again, but for real content I'll always go to the BBC, Ananova or CNN first. Google News is only a beta. In other words.. it doesn't matter.
Long live El Reg and Google :)
I believe that it is socially irresponsible to suggest that the entire history of professional journalism can be replaced by a handful of software engineers writing some new algo.
Hmmm... that's a bit like saying it's irresponsible to suggest that years of history of manually calculating range tables could be replaced by a tiny bit of silicon, or that millions of years of evolution could be replaced by some fancy rendering algos...
Just because it's not perfect doesn't mean it won't get there... or at least as "perfect" as the human, "historical" equivalent...
The problem is that without clear and distinct labeling...
Actually, Google is placing clear and distinct labels on the press releases.
I found the RegCo article to have a breathlessly hysterical tone that undermined some interesting points they had to make. I don't agree with much of what they said, or share their chest thumping "outrage" either.
Orlowski had two important pieces on Google in one week that came close to being "scoops." Scoops are news
That's where our definition of news differs then, Kackle. I see news as a report of recent or important events, and the source of that news does not bother me a great deal. Going back to my earlier example, if Microsoft issue a press release, this will be the source of all the journalists' articles I read about it anyway. The announcement from Microsoft is news, whether it is filtered by a reporter or not.
What I don't think is news is one person's anti-google rant peppered with misinformation. There is nothing new or even particularly interesting about the register articles in question, which may be why they rely so much on hyperbole and tabloidese.
As an intelligent adult, I am able to look at a source of information and decide how relevant it is and how much trust I will place in the source of that article, whether it is Microsoft or the New York Times.
Am I the only one who has a sense that the article isn't about Google at all, but about the fact that some people object to writers who aren't professional journalists getting their material on news sites too? I notice that the two criticisms of Google stem from sources of news other than professional journalists - weblogs and press releases.
it is socially irresponsible to suggest that the entire history of professional journalism can be replaced by a handful of software engineers writing some new algo
You have definitely caught the tone of Andrew Orlowski's articles. Who said that Google are trying to replace the entire history of journalism? Where are these claims coming from? So far I've heard that a handful of bloggers are able to change the meaning of words at will (I'm waiting for the dictionary rewrite) and that Google is trying to take down the entire history of professional journalism. Can we try and keep this discussion realistic?
I have long suspected it had already been replaced by a handful of simians using stochastic keyboards.
But software engineers would certainly be a step even further up the evolutionary scale.
Yes, the Guardian, Deutsche Welle, Radio Netherlands and a few other independent voices show up in an unburied form, but Google News certainly does not reflect the views of the 95% of the planet's population that aren't Americans.
I can in no way believe that some independent algo is deciding on what they present as the truth.
It may be like PR. English language sites tend to dominate the high PR ones, because there are more potential pages to link to them. Deutsche Welle and Radio Netherlands tend not to get so many English language pages linking to them:
PR6. Less than the Webmasterworld home page.
Google News certainly does not reflect the views of the 95% of the planet's population that aren't Americans.
Google News does not claim to represent the views of the world's population.
Google News' "About [news.google.com]" page does not claim to represent anything but an aggregation of news sources worldwide.
They don't claim to be a reflection of world opinion. Only a collection of worldwide news stories.
As it says on their "About [news.google.com]" page, "You pick the item that interests you, then go directly to the site which published the account you wish to read."
No, you're not. Depending on the angle from which you view it, the Internet is so many different things. And from one particular angle it's a fight. A fight between traditional power structure from the last couple of hundred years and people ridding themselves of that power structure. Traditional journalistic media are a part of that fight.
And some of them are desperate. Not without a reason: They are losing.
You and I and many others are more and more becoming our own journalists, both as writers and as readers. With the help of Google and other search engines.
Anyway, surely what is news is a matter of opinion. News is also events which happen. In theory my birth is news - it was to my family anyway, but the fact that it did not make the newspapers or the TV news is because someone (an editor) decided that it was not 'news' to enough people. Surely the google computer is just acting as an editor.
It's (good) news to me when nvidia release their next generation graphics card - I often read press releases about things like that.
I think the only thing we should do is sit and wait to see how google handles press releases. I think everyone is jumping the gun and shouting their heads off about nothing.
Should we be surprised? Google only answers a few emails too, as has been noted here at WW many times. It's a pattern.
As of this morning I noticed ZDNet has picked up the story, so it's off to make its rounds in the online tech media.
Still like TheReg, still read it every day. I have to admire the way The Register was willing to point out what they thought was something dodgy about Google instead of the usual puff-pieces about Google-and-the-chef-and-the-food at the GooglePlex.