The user comments are pretty negative, so I'll try to pull the balance back the other way. I always appreciate hearing Everyman's perspective, even though we've got different views of some things, e.g. how Google ranks internal pages from a site; I think we do a good job of that. If you haven't read Everyman's "search engines and responsibility" thread and his google-watch.org site, I encourage you to. That said, I do disagree with statements like "Eventually, a FAST-type engine should be administered by a consortium of librarians who are protected civil servants of a world government." :)
Anybody have thoughts on the Salon article?
If Google were ever compelled to open its log file on you, it could easily track 100% of the searches associated with your tracking code.
I totally agree, and there is no doubt about that.
But just as others have said, this has always been trackable through the ISP. Just look at FBI tracking work on the internet. Everything is trackable on the internet. Dial-up or not.
As many others have said, the whole cookie issue is a nonissue to me. What I do and how I use Google's services at Google.com is 100% Google's business. If they can use the data to build a better search engine, so much the better.
I also totally agree.
It's once you leave Google.com and start surfing the web, with the toolbar spying on every click - that's a bit too much. Especially given both IE's and the toolbar's problematic security history.
Just to be clear, I wasn't trying to argue that Google doesn't track users' information, i.e. previous searches, visited URLs, etc.
I was discussing the technology: storing this data in client-side cookies is not practical and is not how it is done. :)
Ok, some facts:
I've now traced a Microsoft IE session with the Googlebar advanced feature [-> PR] enabled.
1] I open the browser and start browsing sites OUTSIDE of whatyouwant.google.whatyouwant.
2] When I type/click any URL in the main bar, the Googlebar phones home with the URL I've typed/clicked.
Of course Google-home also knows my IP and the current time of my request.
3] I've not seen any cookie sent out.
Maybe my logging software doesn't see the cookies sent in 3].
I'm not that familiar with Microsoft tools.
But points 1] and 2] are 100% certain.
PaulPaul, I don't think anyone has suggested Google is storing searches or previous searches. What is suggested is that there is a tracking code in the cookie. That code could be matched to a cookie/log file on Google. A cookie I just got from Google:
See that Pref ID? That's your personal tracking code. The TM and LM look to be Unix time, while S could equal a hashed search value.
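For anyone who wants to poke at such a cookie themselves, here is a rough sketch of splitting the fields apart and reading TM as a Unix timestamp. The cookie value below is made up for illustration (it is not the cookie from the post above), the "ID=...:TM=...:LM=...:S=..." layout is assumed from the field names mentioned, and parse_pref_cookie is a hypothetical helper name.

```python
from datetime import datetime, timezone


def parse_pref_cookie(value):
    """Split an 'ID=...:TM=...:LM=...:S=...' style value into a dict."""
    fields = {}
    for part in value.split(":"):
        # partition splits on the first '=' only, so values may contain '='
        key, _, val = part.partition("=")
        fields[key] = val
    return fields


# Made-up value for illustration only; NOT a real Google cookie.
example = "ID=0123456789abcdef:TM=1030687200:LM=1030687200:S=abcDEFghiJKL"

fields = parse_pref_cookie(example)

# Interpret TM as seconds since 1970-01-01 UTC (the "Unix time" guess).
tm = datetime.fromtimestamp(int(fields["TM"]), tz=timezone.utc)

print("ID:", fields["ID"])
print("TM as UTC date:", tm.isoformat())
```

If TM and LM really are Unix time, they should decode to plausible dates near when the cookie was set or last modified; that is only a sanity check, not proof of what the fields mean.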
Assign and Store a unique S/N (license), for each Google toolbar d/led, in a database stored on a google server. Then every time the toolbar is online, and data starts transmitting (IP address, searches, etc), go to the google DB and write any data you want. And we all know how fast google DB operations are ;)
Ok, let's have a vote on the future of WebmasterWorld. Everyone gets one vote for every post they've left.
That's the way PageRank works. It disproportionately favors the large sites.
Wrong, it proportionally favors sites it sees as important. The democratic model is a semi-valid analogy, but it essentially works. The idea is not just that sites vote for you, but that sites relevant to yours vote for you. If a site is considered an important site in your sector or theme, and they vote for you, that means that a site that knew what they were talking about thinks you know what you're talking about too.
It wouldn't work in a national democracy, but it's perfect where content is concerned, especially on the web. Take the SEO world as an analogy, and particular methods to be things people are "voting" for. Consensus Gentium may be that a particular method works (e.g., putting a link to a search engine helps your ranking) but if a resource that's considered important or an authority, like Danny Sullivan, says it's untrue, then his "vote" on that counts for more. Shouldn't "important" or "authority" sites hold the same ability?
Now, let's take a look at the alternative: every site's vote is equal. Then we get into the idea that quantity, not quality, matters most. So any spammer could make their PageRank anything they wanted just by making mini-site after mini-site and putting links up to the site they want to rank. They rank above quality sites, even though those quality sites were recommended by Slashdot and WebmasterWorld and the Mercury News and whoever else Google may deem as important.
Of course, it's not just size/age that determines the importance of a site. That's silly. The PageRank system may be simple in concept, but it's not in practice. A site's importance is a mixture of numerous factors. As Google says in their explanation of PageRank, they base a vote's weight on the importance of a site, which is determined by the site's content and quality - size and age are no indication of quality.
Google's saying (using your webmasterworld analogy): We don't want to give the people with the most posts a stronger vote, we want to give people with the smarter, better, more uniquely insightful posts the stronger vote.
P.S. - Please don't delete my account Brett!
[edited by: Filipe at 6:00 am (utc) on Aug. 30, 2002]
Then we make people deliberately opt in for the advanced options. At that point, you pretty much have to assume that full disclosure + opt-in = informed consent from the user. Other than another window ("are you rilly rilly sure?"), there's not a lot else that we could do. The privacy people that I've talked to (GRC folks, etc.) seem to agree that's the right way to inform the user and get their consent.
Just for a moment - stand back. Forget about Cookies and Google's policies.
Look what Everyman has just done in the space of 48 hours. He has just invented a whole new online marketing strategy! He now has traffic to burn (with links from Slashdot, Salon, and every webmaster forum on the planet) - and think of all the extra PageRank he's getting from all those links.
Not to mention the ego from having so many people talking about him online!
So that's the deal - I'm now going to think who I can blame for something really obscure... to get more PageRank and inbound links and traffic!
But you'll also admit that, because of this feature, Google will have the world's biggest database of "browsing-preferences-usability-etc" if the number of Googlebar-PR users grows dramatically.
But this is Google's job, and that's ok.
The problem I'm trying to point out is that all this stuff comes with a big privacy issue [unless you throw away the client IP logging, or your Googlebar users connect via proxy..].
And it seems that people don't know all that.
In other words [as I've stated in a previous post in this thread]:
We hope Google will not share this new, maybe growing database with any company/government.
But we fear it a little, because of its enormous value..
The idea is not just that sites vote for you, but that sites relevant to yours vote for you. If a site is considered an important site in your sector or theme, and they vote for you, that means that a site that knew what they were talking about thinks you know what you're talking about too.
I don't think so. "Importance" is defined relative to PageRank. The formula feeds on itself recursively.
Here's an example of how PageRank works:
Salon - PR of 9. Yellow liberal journalism at its worst. I never mentioned the words "Rumsfeld" or "United Airlines." I actually told Mr. Manjoo, the interviewer and author of the Salon piece, that I thought my site did okay in Google, and that I was speaking as a representative of the public interest. The "royal we" he refers to is because I speak for Public Information Research, Inc. (www.pir.org). By the way, the www.namebase.org site he linked in the article is not the site in Google. That's an unadorned CGI site that's frequently disallowed from Google because I tend to get penalized when it isn't. If you want to slam me for my crummy little site, then go to the www.pir.org site that I told Mr. Manjoo was our main site, not the www.namebase.org site that he decided to link in the article.
Slashdot - PR of 9. Mindless, reactionary blogging at its worst from script-kiddies who can't spell. They accuse me of technical incompetence for not knowing how to disable cookies, and then add that there's no way that little cookie could store 36 years of data in any case. What complete idiots.
WebmasterWorld - PR of 8. Folks are getting smarter here. Only half of the comments are idiotic, and the other half are well-informed.
www.pir.org - PR of 7. Smarter still. That's my site. Full of essays and book reviews. Lots of names. For serious researchers. Been around for a long, long time. The only charge that Oliver North was convicted on came from an obscure name linked to North that was discovered by a journalist in 1987 using my database. With results like this, repeated many times over many years, I can survive without Slashdot's approval (or even without Google's approval). Let's see now, what was PageRank doing in 1987?
www.cia-on-campus.org - PR of 6. Only site in the world that deals with the history of the CIA's involvement on U.S. campuses. Most of the material was painstakingly transcribed from old yellow articles and documents.
www.google-watch.org - Too new to be ranked, but I predict a 5 for it on the next cycle. Yes, I know how to delete my cookies. There's a 25 percent chance that I can even steal your Explorer cookie in my Google cookie demo (depends on whether your version of Explorer is vulnerable to this exploit). I hope GoogleGuy likes my anonymous Google proxy and doesn't block us, otherwise I'll have to write it up and switch to an Inktomi proxy. Proxy use is limited to 10 searches per hour per IP number. There are also load monitors that kick in because we don't have 10,000 servers.
Most of the work on this google-watch site was done to highlight my concern over Google's cookies and the privacy issues they raise. Mr. Manjoo mentioned cookies in the article, but not very prominently, given that half of the interview was about cookies. Sheesh, I have just one short essay on PageRank!
I was going to add an essay about Google's cache and why I don't like it. But so many people think it's so cool from a user's point of view (it's webmasters who lose here, not the user), that I didn't think I'd get anyone to agree with me. Now it seems like that was a silly reason not to write the essay! (I need an essay on the toolbar too. I'm brimming with opinions on the toolbar.)
So you see, PageRank allows mediocrity to rise to the top. Once on top, a site's power to suppress anything outside of "conventional wisdom" due to its PageRank ensures that mediocrity prevails.
To quote CmdrTaco at Slashdot: "Google's system seems to work the best if you ask me but, on the other hand, link popularity may not provide the most intelligent top rankings."
You got the last part right, Commander with a PR of 9. Just click and read the comments below from your assorted Slashdotters.
Filipe says: The idea is not just that sites vote for you, but that sites relevant to yours vote for you. If a site is considered an important site in your sector or theme, and they vote for you, that means that a site that knew what they were talking about thinks you know what you're talking about too.
I don't think so. "Importance" is defined relative to PageRank. The formula feeds on itself recursively.
I'm not sure I quite see how what you just said contradicts what I'm saying. Are you suggesting that Google doesn't look at themes at all and bases their algorithm entirely on PageRank? That they don't look to see if sites that link to you have anything to do with your site's field? GoogleGuy will attest that PageRank is just a single aspect of their algorithm, not the only factor.
Now this guy has a point. For example, a number of sites I work with are quite varied. Google gives the BBC website a PageRank of 9 or 10, I think. Bottom line, webmasters at the BBC can put just about any keyword in their title tag and take the listings. I've seen queries where the results pages are completely made up of these.
The problem comes if you happen to be a webmaster with a site that crosses with a site that has a big PR value. And on this I am completely with him. The fact that one article can cause such a big stir is, I think, a fair reflection of the stagnation that is taking place as Google keeps the SEO community moist with new tools and features, which are good, but so is keeping your head above water from a privacy point of view. It's all well and good being "touchy-feely" in your online communication, but no one ever complained about content that was written to inform. I think this guy is coming from the point of view that Google now seems to hide a little behind "we are a dotty com" whereas they are actually collecting data and using it in the same way as any serious online business. Deception, I get the feeling, is his view?
Surely people here understand his point of view? Stop being so protective, it's only a search engine!
Every other form of media on the planet is totally dominated by a handful of people, and the beauty of the internet is that everyone can be heard, so I fully understand Everyman's view on Google becoming dangerously powerful.
The fact remains though that IMHO it still returns the best, most relevant results and until the day another engine returns better results, I for one will continue to use it.
That said and done:
"It's also important to emphasize that page rank is only one of more than one hundred different factors we use to determine the relevancy of a page for a search query,"
Which must leave us with at least 75 other more important things to talk about?
If my daughter would complain in such a way, I would say she is spoiled.
What is the alternative at the moment? It may not be perfect, but it could only improve; why else would Google keep these PhDs employed? If it does not improve [webmasterworld.com], turn elsewhere.
On Google and democracy..
On paying for Pagerank, there would be many ways of largely discounting this effect, some are mentioned here:
If you publish information on the WWW, you are dealing with conventions of the WWW. One of them is that you can add a no-cache tag.
If you say you want to help people find information (Google), you should also abide by the general conventions on information (copyright).
A grey, middle way may be for Google to only show the cache of Fresh sites (they only last a day or two and not one or two months), but I would dread the day Google stopped showing the cache version of pages.
jdMorgan's earlier observation on the opt-in, instead of the opt-out, would be most correct.
Everyman- (as you probably know) your experience with the media not getting the story right happens all the time :(. It's not that they are trying to twist the story, but it's human nature to only hear what you want to hear and remember what you want to remember, although sometimes there is intentional twisting. The best example I can give is news headlines: if you look at a news headline and then read the story, rarely does your initial impression of the headline actually turn out to be the way it really is, or as important as the headline made it out to be. Extrapolate this to the entire story representing the facts the way the headline was supposed to, and you can see that news reports must always be taken with a grain of salt. http://www.counterpunch.org/ is a good website that uncovers inaccuracy in news reports, and there is also a great site that constantly critiques the New York Times' stories.
Sorry for the rant :P
GoogleGuy- You say that the toolbar isn't required, and you're right. But what about after the MSFT DOJ settlement, when computers can come off the shelf customized a la Yahoo's toolbars/instant messengers/IE themes? At that point Google partners with Dell (or someone) to put the Google Toolbar on every IE along with other Google-branded devices, just like Yahoo is working to do...
Then the toolbar, while not required, will already be installed, and the users will probably never realize exactly what data is possibly collected. (Although, to be fair, you could turn off the more extreme things by default, like the PR display, etc.)
I love Google though, I depend on it to do my job - it's my knowledge base for developing intranet applications and internet websites. Google Cache is important in that regard as well!
Keep up the great work, everyone!
I disagree with the prevailing view that page rank favors large sites. Even on huge keywords like "car insurance" there is one pr 4 in the top ten, and there are no pr8's! Most of these are smaller (and maybe a little spammy) sites. Yes, big companies tend to have lots of incoming links and high pr, but because of the way that google assigns internal pr, many of their pages are 3's, 4's and 5's. Basically, all a little guy has to do is get into dmoz and he has a pr 4 or even 5.
OK, I'll ring in on the caching too... If I write a book, nobody is going to ask me if it's ok to put it in a library! At least with google I can write a robots.txt and keep my pages out if I want!
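To make the robots.txt point concrete, here is a small sketch using Python's standard urllib.robotparser to check what a given set of rules would allow. The rules and the www.example.com URLs are invented for illustration; how any particular crawler actually honors a file is up to that crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot out of /private/, allow everyone else.
rules = [
    "User-agent: Googlebot",
    "Disallow: /private/",
    "",
    "User-agent: *",
    "Disallow:",
]

parser = RobotFileParser()
parser.parse(rules)  # parse() accepts an iterable of lines

blocked = parser.can_fetch("Googlebot", "http://www.example.com/private/page.html")
allowed = parser.can_fetch("Googlebot", "http://www.example.com/public/page.html")

print("private page fetchable by Googlebot:", blocked)
print("public page fetchable by Googlebot:", allowed)
```

Note that this only covers crawling. Keeping an already crawlable page out of the cache is a separate matter, handled by the no-cache tag mentioned earlier in the thread.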
Wrong, it proportionally favors sites it sees as important.
I don't want to speak for Brett, but I think that misses his valid point:
Ok, let's have a vote on the future of WebmasterWorld. Everyone gets one vote for every post they've left.
That's the way PageRank works. It disproportionately favors the large sites.
The point is: PageRank is allocated per page, not per site. Larger sites have a built-in advantage in this "democracy" over smaller sites simply because they have more votes.
(It's my point, anyway, whether it's Brett's or not!)
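A toy sketch of the "per page, not per site" point, using the simplified PageRank iteration from the published Brin/Page description (damping factor 0.85). This is an assumption-laden model, not Google's production algorithm: the two sites, their page names, and the link structure are all invented, and there are no links between the sites.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated iteration.

    links maps each page to the list of pages it links to.
    Every link target must also appear as a key in links.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new = {p: (1.0 - damping) / n for p in pages}
        for page, outs in links.items():
            if outs:
                # A page splits its current score evenly among its outlinks.
                share = damping * rank[page] / len(outs)
                for target in outs:
                    new[target] += share
            else:
                # A page with no outlinks spreads its score over all pages.
                for target in pages:
                    new[target] += damping * rank[page] / n
        rank = new
    return rank


# Hypothetical big site: a home page plus nine deep pages linking back to it.
big = [f"big{i}" for i in range(10)]
links = {"big0": big[1:]}
for page in big[1:]:
    links[page] = ["big0"]

# Hypothetical small site: two pages linking to each other.
links["small0"] = ["small1"]
links["small1"] = ["small0"]

rank = pagerank(links)
big_total = sum(rank[p] for p in big)
small_total = rank["small0"] + rank["small1"]

print("big site total:  ", round(big_total, 4))
print("small site total:", round(small_total, 4))
```

In this isolated setup each site's total score simply tracks its page count, which is the "more pages, more votes" effect. Once sites link to each other the picture changes, since incoming links from outside move score between sites.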
I used site:namebase.org inurl:* [google.com] to discover this fact.
Perhaps part of the problem is that a massive amount of Namebase.Org's pages are deep/invisible web material served from a cgi script and Googlebot has been unable to discover this content.
Example of a Namebase page [namebase.org]
Methods are easily available to help Google find/crawl this material.
You'll see different results trying rubble88's query with the other site [google.com].
[edited by: JayC at 3:06 pm (utc) on Aug. 30, 2002]
Of course, he may have been reading this forum and come up with an alternative strategy of publicity to get his site more widely known. If lots of us visit the site and link to him then he should improve his PR6 and 88 back links
In fact, he is directly quoted on the topic.
From the Salon article,
"My problem has been to get Google to go deep enough into my site," he says. In other words, Brandt wants Google to index the 100,000 names he has in his database, so that a Google search for "Donald Rumsfeld" will bring up NameBase's page for the secretary of defense.
The point is: PageRank is allocated per page, not per site. Larger sites have a built-in advantage in this "democracy" over smaller sites simply because they have more votes.
Yes, except that instead of "larger sites" I'd say "sites with higher home page PageRank." They can cast their votes externally (linking to outside domains), or they can use their votes internally (linking to deeper pages within that same site). My point in the essay, to the extent that my concern with PageRank is "personal" as Mr. Manjoo claims, is simply that www.pir.org is an example of a site where the links coming into the site rarely come into the deep pages. They come into the home page, for the most part. So I have PR of 7 to spread around to 100,000 deep pages, and this 7 doesn't go very far. Most of the deep pages end up with PR 0 or PR 1. That makes them uncompetitive in many cases, depending on the keywords being used.
If I started with a PR 9 instead of a 7, the inside pages would probably be 3 or 4. If you think there's little difference between a 0 and a 3, when everything else is equal, you should try tracking how well the 0 page competes. It can easily get buried in the SERPs, despite excellent on-page characteristics that would have Inktomi, for example, putting that same page in the top ten.
Thus, only big guys can have big databases due to the way PageRank works. This was explained to Mr. Manjoo, and this is why he came up with the example of searching for "Rumsfeld." He was trying to illustrate this point. But how he got from "Rumsfeld" to "United Airlines," I'll never know!
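The dilution described above can be put in rough numbers. In the simplified published PageRank formula, a page passes on its score split evenly across its outgoing links, scaled by a damping factor. The sketch below assumes an arbitrary raw home page score of 1.0 and a damping factor of 0.85, and pretends the home page links directly to every deep page; none of that is a claim about pir.org's real structure, and the toolbar's 0-10 figure is a different scale from these raw scores.

```python
def share_per_link(page_score, outlink_count, damping=0.85):
    """Score passed along each outgoing link in the simplified formula."""
    return damping * page_score / outlink_count


home_score = 1.0  # arbitrary raw score, for comparison only

for deep_pages in (100, 10_000, 100_000):
    share = share_per_link(home_score, deep_pages)
    print(f"{deep_pages:>7} deep pages -> {share:.8f} passed to each")
```

The same home page score buys a thousand times less per page at 100,000 deep pages than at 100, which is the sense in which a fixed home page PageRank "doesn't go very far" on a large database site.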
What makes Google so good are the SERPs; the reason we're having this debate is because Google has the best SERPs of all the search engines.
We can moan and groan about why this and that site isn't ranked higher and why all the big companies have high PR, but at the end of the day all that matters is how good the SERPs are, and at the moment Google wins hands down.
PS: I don't think they are trying to take over the world or spy on us all with their cookies either!
I can easily generate 40,000 pages from the data I've got about music - and so can everybody else - and it will all end up in cloaking, spamming, and doorways again to try and get high search rankings.
I think Google's PR lessens the effect of on-page techniques (which often means tricks), and searchers get more relevant results.
Teoma is a real world application of some of the concepts and ideas from the often discussed but never publicly released IBM search engine, Clever. More about Clever and link analysis in a 1999 Scientific American [cs.cornell.edu] article.
This 2001 Science [cs.cornell.edu] article is also worthy of your attention.
[edited by: ciml at 3:57 pm (utc) on Aug. 30, 2002]
[edit reason] Quote Snipped [/edit]
If I started with a PR 9 instead of a 7, the inside pages would probably be 3 or 4. If you think there's little difference between a 0 and a 3, when everything else is equal, you should try tracking how well the 0 page competes.
That is the crux of it IMO.
IMO Page Rank is intended to court the big players. From a business (Google's) point of view it makes sense. It creates in the internet world the same big company culture that exists in the real world, with the same partnership and funding opportunities. If all the big companies got together to create a search engine with the aim to create a power structure long enjoyed by big business in the real world, it would work a lot like Google.
I think it is very hard to appreciate Everyman's point of view unless you compete or investigate in a category where the big PR folks live, in Google or in the real world.
I'm just not sure it is Google that should be the target. Remember, the other search engines largely failed by not adopting a business approach paralleling the real world. Maybe we should just be annoyed at having a world that is what it is.