216.239.46.118 - - [06/May/2002:23:51:06 -0700] "GET /location_that_I_made_to_test_googlebot/ HTTP/1.0" 401 46 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
Please set traps and prove me wrong, but I am positive that they do this! The theory used to discount this was that the location was crawled because it appeared in a referral log somewhere. Well, that is not the case, because I only refer to myself and my logs are private.
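For anyone who wants to set such a trap, here is a minimal sketch of how the access log could be checked. It assumes the log is in Apache "combined" format, like the line quoted above; the function names `parse_line` and `trap_hits` are made up for illustration. Note that the User-Agent string can be forged, so a match is suggestive, not proof.

```python
import re

# Apache "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one combined-format log line as a dict, or None."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def trap_hits(lines, trap_prefix, agent_substring="Googlebot"):
    """Yield parsed entries where the named crawler requested a path under trap_prefix."""
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skip lines that are not in combined format
        if entry["path"].startswith(trap_prefix) and agent_substring in entry["agent"]:
            yield entry


# The log line quoted at the top of the thread.
sample = (
    '216.239.46.118 - - [06/May/2002:23:51:06 -0700] '
    '"GET /location_that_I_made_to_test_googlebot/ HTTP/1.0" 401 46 '
    '"-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"'
)

hits = list(trap_hits([sample], "/location_that_I_made_to_test_googlebot/"))
```

An empty referer field ("-") on a hit like this is what rules out the "leaked through a referral log" explanation for that particular request; it says nothing about how the crawler learned the URL.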
However, I think the PR idea is wrong.
If no one is linking to the page - IT HAS NO PR. If it has no PR - then it can't give PR to other pages. If it can't give PR to other pages, then it won't be used in any calculation.
Google does not need the data from the toolbar for the purpose of finding new sites.
They use it to see where you are going. This will allow them to do the custom SERPs that I am confident they will offer in the future. It allows them to figure out which sites are more popular - this also could be entered into the algo. It allows them to see patterns of similar interests. The toolbar is a goldmine.
THE MOST USELESS THING WOULD BE TO FIND NEW SITES.
I agree with GoogleGuy that their policy doesn't prevent this. Their policy doesn't prevent them from coming over to your house and burning it down either.
That being said - it just would not be a good idea for Google to do this. Google knows this. If some recognized computer expert (other than Lisa :) ) set up a controlled test to prove Lisa's theory - and was successful - it would be bad PR for Google. It doesn't matter what their policy is - look at Intel with their serial numbers and other things that come along now and then.
Hopefully Lisa will let us know what the response is from Google.
And I think we should be thankful GG is here in the capacity he is. We can't expect anything more than what we have gotten. There isn't any Yahoo Yokel or Inktomi Ignoramus.
This thread... it started out talking about the toolbar and privacy, the two main topics. The first privacy "barrier" we came to, IMO, was the fact that Google DOES find these "hidden URLs" one way or another (perhaps doing Lisa's experiment a few dozen times could confirm it either way).
I think what GG was getting at first was that the information that Google gathers has no affiliation with the person it's being gathered from.
The thing is, who was the toolbar for? If you say Joe Public, then this isn't really an issue for them. If the toolbar was for us, why did Google make it for us???? ;)
And yet, there's nothing in your profile that could be used to identify you.
Well, I'm not running the world's biggest search engine either. Your right to privacy is inversely proportional to the amount of power you exercise over others.
Close readers know who I am: [webmasterworld.com...]
I started out with info in my profile a year ago. Then I got nervous because I realized how much power Google had over my site. My reasons are no different from many others who have thin profiles on WebmasterWorld.
Now it hardly matters, because Google knows much more about me than I know about them. I'm still waiting for the axe to fall....
Also, when I look at those private pages with the PR feature on, it gives the private pages the same PR as the site's public home page, even though there are no inbound links to the private pages. PR would therefore seem to be imputed from a site's overall PR, no?
This is my second or third posting in WebmasterWorld and I feel compelled because of the issues that Lisa has raised.
Lisa's test is interesting to say the least, and Google using the toolbar to "find" interesting pages does make sense. I read an interview a few months ago (can't seem to find it, sorry) in which a representative from Google stated that their main goal was to index and make searchable ALL the world's information. A big job and pretty much impossible.
With Google having said that, it only makes sense that they would use the toolbar to find pages to spider and determine whether each one is worthy of being in the index.
Just because Googlebot visited the page doesn't mean that it will be indexed. With Googlebot being as hungry as it is, it would not surprise me if Lisa's test page is available in an upcoming index.
Mack mentioned that Google isn't interested in interesting pages. Mack, I'm sorry but I have to differ. I think Google is at least interested in taking a look at every available document that they can get their hands on. They may not make it available through their index, but Google sure does want to know about it. Just my two cents.
For example, I set up a web site that nobody was linking to that included some very detailed research papers that I had written in school. And they were indexed.
People can believe what they want.
PR HAS NOTHING TO DO WITH HOW INTERESTING A PAGE IS.
Google doesn't care. They specifically state they will not crawl every page and tons of people try and get their pages into google without success. Content will only get you so far - to believe otherwise relegates you to the bottom of the SERPs.
IR 100 x PR 0 = 0
To think that all you need to get your site into the index is to visit with the toolbar defies logic.
I think it is possible google visited Lisa's page. Other people with thousands of pages have not confirmed this. Why just Lisa?
There is simply no way they would include pages based on toolbar visits only. If it turns out that google actually is including pages in this manner - I will tattoo their logo on my butt.
Why am I so special? Beats me. Perhaps because the domain is not known to Google and, moreover, they saw me looking at that particular URL on my domain a lot. Who knows why they visited it. They don't index 401s. But I am 100% sure the only thing that communicated the URL to Google was the toolbar. There are no links to it, as the page is two layers deep and I made the page in April.
It is funny; I can see some of you guys guessing at the location of the secret page in my log files. ;) Now stop that. You are guessing on the wrong domain anyway.
From a webmaster's point of view, we need as much data as possible to run an effective site. To assume that Google wouldn't use every scrap of data available to them is somewhat naive. I know the ethical debate is another discussion altogether, but no one can say Google hasn't fully disclosed in big red letters what the toolbar can and can't do.
Back up top to the original proposition, does this really surprise anyone?
Does it surprise me that they collect that data?
No, I always assumed they have a file somewhere with EVERY SITE I visit. There is no reason for the toolbar to find out what sites are not listed and then not communicate the data.
In fact, we know it happens the other way around. Therefore - we ALWAYS knew google received this data. To think otherwise means the toolbar works by magic.
So it is only a question of what they do with the data.
Do they use it to list sites?
NO. This still would not make any sense. I guess it does to those that believe submitting their sites helps - I don't - and google has said as much (except for foreign sites).
Do they collect the data? Of course - why would they not? THEY GET THE DATA - why not keep it? We knew (or should have known) what we were getting in for when we signed up for the toolbar.
Could Google be visiting sites for other reasons? Sure - they are on the edge of technology - it would not surprise me that they visit sites based on the toolbar, but there would be no reason to list them.
I have always believed the toolbar holds out great promise. I strongly believe google is working on and will release a custom SERP type system. They HAVE NO CHOICE.
This will be the next generation and the toolbar will play a great part. There will always be privacy concerns - and I think google should NOT KEEP individual data past some point - or allow people to delete it.
The toolbar is worth billions. Just wait and see.
GoogleBot visits what you visit if you have the toolbar, but I am 100% sure that GG is a Google rep. and the PageRank is the best bait that a search engine has ever created.
What about outgoing links?
I am 100% sure that GG is a Google rep. and the PageRank is the best bait that a search engine has ever created
GG IS a google rep - he/she/them hasn't claimed anything differently. Google doesn't need to study the habits of those that rank high - they know how the ranking system works.
I think google is busted on this one. They can always change habits (just turn the function off for a while while WMW is not looking), which will make it look like they never did it before, so that there's no proof one way or another.
('busted') - BTW, I don't care where they get the URLs, and I don't think it's a privacy issue either. If you need a URL that's private, make sure you need password access to it.
I think it would be interesting to do a survey to see the correlation between two questions about individual beliefs regarding Google:
1) I submit pages to Google (TRUE/FALSE)
2) I think Google is visiting sites based on Toolbar data (TRUE/FALSE)
If there was some advantage to doing what they are suspected of doing, then maybe they could be "busted". I just don't see it - and am still willing to put my butt on the line :)
I agree that it isn't against their privacy policy, but that isn't saying much.
Amazon subsidiary Alexa settled a class-action lawsuit last year because their privacy policy was inadequate. It failed to state that personally-identifiable information was being collected. Alexa agreed to pay out up to $1.9 million to users whose personal information was in their database, at $40 for each user. Alexa admitted no wrongdoing, but their privacy policy is now very detailed and explicit. The FTC was considering action, and then took no action once the class-action settlement was reached.
Since there have been reports that Google, on occasion, indexes pages that no one expected or intended to be in their index, the fact that the toolbar is used in this way becomes a potential liability. Not so much because Google has fetched pages from the public Web, but more because users of the toolbar have not been informed that they should surf semi-private sites only with the advanced features turned off. Most such semi-private sites or directories have no links coming in, and the webmasters do not anticipate any interest from bots at all. Therefore, they aren't excluded in robots.txt, or password protected.
If Google were to admit that its toolbar is finding new sites, it means that anyone who can prove damages from Google's aggressive crawling and indexing would have a much easier task when it comes to proving that Google's toolbar is responsible.
Therefore, Google is much better advised, from a legal point of view, to claim that referrer information is leaking out of browsers, and that their toolbar is blameless and would never do such a thing.
A series of careful tests, with witnesses, would be needed to convince a jury that the toolbar is being used in this way. It would have to be done quietly, before Google turns off the toolbar-fed crawling. Most lawyers would not be up to the task, so the situation remains at a stalemate.
At the same time, to aggressively deny that this is happening, thereby drawing attention to the issue and encouraging webmasters to demonstrate that it is in fact happening, would undermine Google's credibility.
Much better to remain silent; it solves the problem from both ends.
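As an aside on the robots.txt exclusion mentioned above: a minimal sketch of what excluding a semi-private directory looks like to a crawler that honors the file, using Python's standard `urllib.robotparser`. The `/private/` path and the `example.com` URLs are made up for illustration. Keep in mind that robots.txt is only a request to well-behaved bots; it does not protect anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; "/private/" is an illustrative path only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, url):
    """Return True if a crawler honoring robots.txt may fetch the URL."""
    return parser.can_fetch(agent, url)


private_ok = allowed("Googlebot", "http://example.com/private/notes.html")
public_ok = allowed("Googlebot", "http://example.com/index.html")
```

With this file in place, a compliant crawler would skip anything under `/private/` even if it learned the URL from somewhere, but the pages would still be readable by anyone who has the address.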
In my own personal view--note me distancing this post from being official ;)--security through obscurity is a bad idea. In general, any time you can type a URI into a browser and get confidential data, there's a risk that the URI will leak to other people or to a search engine. I know Lisa handles this fact properly by using a password on her sensitive pages--every webmaster should do that. Every six months or so, some cyberstore or bank (!) leaves customer data or even (heaven forbid!) credit card data lying out on a "secret" url that no one could ever guess. Invariably, the secret url shows up in a referrer log or some similar list of links. Then some poor webmaster gets the shock of his year, and we have to hustle to take down a page.
My takehome message is: Security through obscurity == bad. If someone could type your url into a browser and find something sensitive, protect it with authentication. Search engines get better at finding pages over time--so play it safe and protect confidential data with passwords. The toolbar is an optional program we provide, with opt-in required for the advanced features. If you don't feel comfortable with the toolbar, don't opt-in for advanced features, or don't install the toolbar at all.
And just to make it an even fourth time, let me clarify that I'm just giving my personal opinion here--I'm not speaking in an official capacity. :)
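To make the "protect it with authentication" advice concrete, here is a minimal sketch of an HTTP Basic authentication check, written as plain functions so it does not depend on any particular web server. The user name, password, realm, and the function names `credentials_ok` and `respond` are all assumptions for illustration; in practice most webmasters would simply use their server's built-in password protection.

```python
import base64
import hmac

# Hypothetical credentials for illustration only; a real site would store a
# salted hash rather than a plain-text password.
EXPECTED_USER = "webmaster"
EXPECTED_PASSWORD = "not-a-real-password"


def credentials_ok(authorization_header):
    """Return True if the header carries the expected Basic credentials."""
    if not authorization_header or not authorization_header.startswith("Basic "):
        return False
    try:
        raw = base64.b64decode(authorization_header[len("Basic "):], validate=True)
        decoded = raw.decode("utf-8")
    except ValueError:  # covers malformed base64 and invalid UTF-8
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # Constant-time comparisons avoid leaking how much of the value matched.
    user_ok = hmac.compare_digest(user.encode(), EXPECTED_USER.encode())
    pass_ok = hmac.compare_digest(password.encode(), EXPECTED_PASSWORD.encode())
    return user_ok and pass_ok


def respond(authorization_header):
    """Return (status, headers) for a request to a protected page."""
    if credentials_ok(authorization_header):
        return 200, {}
    # 401 plus a challenge: a visitor or bot without credentials stops here.
    return 401, {"WWW-Authenticate": 'Basic realm="private"'}
```

A crawler that somehow learns the URL gets the 401 and nothing else, which matches the 401 status in the log line quoted at the top of the thread.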
You're saying, GoogleGuy, that you don't know if the toolbar data is used to detect new sites or new pages. You think it isn't used this way, but you don't know. In other words, your bosses at Google will not tell you. Or if you do know, they won't let you tell us. Or you don't want to tell us. Or you don't think we have a right to know. One or more of the above.
But if the toolbar does do this, then your unofficial understanding is that this is already permitted by the clause in your privacy policy that data can be used to "improve your crawls."
And therefore, the argument that Google's privacy policy should warn toolbar users who opt to use advanced features, that unwanted, unexpected, and invasive indexing may result from such use, is a specious argument. If privacy breaches occur, it's the fault of sloppy webmasters who should use password protection.
And finally, readers of WmW can't take anything you say seriously, because it's all unofficial, and Google has no intention of making their official position known, or of changing its behavior. And the behavior in question may or may not be occurring, and that's unofficial also.
Is this perfectly clear to you now, Lisa?
I gave up holding credit card data on our servers as it was too much of a risk; the hackers are better than us and have more time to break in. If we were big enough to put a dedicated team of about three people on security issues then I might think about it, but as it stands, not a chance.
In all cases you need to think about information and how sensitive it is if it is on a website.
Do you want people trawling through your log files? Do you want your webstats to be publicly viewable? If not, password-protect them.
Yes, this is what any responsible Webmaster will tell you.
> And finally, readers of WebmasterWorld can't take anything you say seriously, because it's all unofficial, ...
I doubt that Google gives many official replies (checked, of course, by marketing and legal staff) at 10:30pm PST on the July 4 weekend. :) GoogleGuy's always been clear: he's here on his personal time to talk to us. If you want an official response then you should look elsewhere [google.com].
PsychoTekk, that should work. Alternatively you can turn off 'advanced features' and achieve the same thing.