18.104.22.168 - - [06/May/2002:23:51:06 -0700] "GET /location_that_I_made_to_test_googlebot/ HTTP/1.0" 401 46 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
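For anyone who wants to pick a line like this apart programmatically, here is a minimal sketch in Python. It assumes the server writes the common Apache "combined" log format, which is what the line above appears to be; the regular expression and the function name parse_log_line are my own illustration, not anything from the original post.

```python
import re

# Assumed layout (Apache "combined" format):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return the fields of one combined-format log line as a dict, or None."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# The log line quoted in the post, split only for readability.
line = (
    '18.104.22.168 - - [06/May/2002:23:51:06 -0700] '
    '"GET /location_that_I_made_to_test_googlebot/ HTTP/1.0" 401 46 '
    '"-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"'
)

entry = parse_log_line(line)
print(entry["status"], entry["referer"], entry["agent"])
```

The three fields that matter for this discussion are the status (401, so the request was refused), the referer ("-", so none was sent) and the user agent.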
Please set traps and prove me wrong, but I am positive that they do this! The theory to discount this used to be that the location was crawled because it was in a referral log somewhere. Well, that is not the case, because I only refer to myself and my logs are private.
I hope they stick with their "needs a few links to show up in listings" motto because that could create some problems.
Maybe Google is compartmentalized and each section only knows about certain things so that they don't get leaked out.
Either way, when a SE rep talks, I only listen to what they have to say about something new or bug-fixing.
When they say what they do and don't do, I just let it go out the other ear.
It says they only report back for figuring out what PR to report to you and if it is in the directory.
If their explanation is misleading, we might have a case of creeping privacy-violation at Google, or a case of the right hand not knowing what the left hand is doing. It's been 17 months since they chose their privacy wording so carefully with respect to the toolbar, and something may have changed.
I can forgive the CIA for setting illegal cookies after they apologize and stop doing it -- the CIA is pretty big and their document search site was outsourced. (Still, you'd think the spooks would double-check their contractor's work.)
But Google with 300 or so employees? It would be hard for them to say "Oops!"
Here is a quote from GoogleGuy
Sorry, chris_f, your url leaked out some other way.
... installing the toolbar didn't make googlebot crawl your page
Scenario: If you check out your best friend's private collection of revealing photos of you, that he put up in a private, unlinked, special, new directory on his website for a photography-class assignment, Google won't tell anyone that you went to this site.
GoogleGuy, please get an opinion on this from someone responsible for privacy at Google -- someone willing to go on the public record with a real name. And while you're at it, do something about those ridiculous 36-year cookies! I still haven't heard from your Director of Corporate Communications, even though he said on March 22 that someone would get back to me "shortly."
Google is NOT INTERESTED IN FINDING NEW PAGES THAT ARE NOT LINKED FROM OTHER PLACES.
Google doesn't care about "Interesting Pages". They don't work that way. If you are interested in a page - chances are you found it from a website. If that is correct - google will find it on its own.
They want pages with high PR. A page with no links to it has 0 PR.
It is just a waste of resources for them to do this for anything but testing.
It is no different than people submitting their pages to google. It is useless. Google works by PR+IR. If you have no PR - then you have nothing as far as they are concerned.
I have always loved google, but their privacy policy basically says "we collect everything we can about your searching history, but won't give it to anyone else without a court order."
Therefore - there is no privacy with google itself - just third parties.
Google probably has one of the top 10 most valuable databases of information in the world. This is one of the reasons I valued them so highly for their IPO.
I am not doubting what Lisa says, just doubting that it is being used to produce a listing for SERP. It defies logic.
I would think there is enough info for them to prove where this came from if Lisa is wrong. Just a guess on that though.
Interesting nonetheless.
I am not concerned with it showing up in SERPs. With a natural PR of 1 and no links to it, it would not show up in regular searches. But inside my page I list lots of valuable information (not that they could get past my 401). Well, if you search for item 3 and item 489 listed on my secret page, then maybe it is the only URL on the Internet that contains that combination of two or three keywords. So in less common searches it would show up (had it not been 401 protected).
I encourage you other webmasters to test this out. Make a webpage. Turn the advanced toolbar on. Only visit the page by typing the address in your address bar. Log every request to that one page. Wait until the next crawl and see what happens. Make sure not to link to this page, make sure the page is not a default document, make sure not to tell anyone the location, and make sure the page does not link to outside URLs.
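If you run the test, something has to watch the log for you. Here is a rough sketch in Python of what that check could look like, assuming a combined-format access log. TRAP_PATH, MY_IPS and the three sample log lines are made-up placeholders (the IPs come from the reserved documentation ranges); substitute your own secret path, your own addresses and your real log.

```python
import re

# Placeholders for illustration only: use your own secret path and addresses.
TRAP_PATH = "/secret_test_page.html"
MY_IPS = {"192.0.2.10"}

# Assumed combined log format; only the fields needed for the check are named.
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def find_trap_hits(lines, trap_path=TRAP_PATH, my_ips=MY_IPS):
    """Return (time, host, agent) for every request to the trap page
    that did not come from one of the webmaster's own addresses."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        if match.group("path") == trap_path and match.group("host") not in my_ips:
            hits.append((match.group("time"), match.group("host"), match.group("agent")))
    return hits


# Invented sample lines, standing in for a real access log.
sample_log = [
    '192.0.2.10 - - [06/May/2002:10:00:00 -0700] "GET /secret_test_page.html HTTP/1.1" '
    '200 512 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"',
    '198.51.100.7 - - [06/May/2002:23:51:06 -0700] "GET /secret_test_page.html HTTP/1.0" '
    '200 512 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '198.51.100.7 - - [06/May/2002:23:52:00 -0700] "GET /index.html HTTP/1.0" '
    '200 2048 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
]

for when, host, agent in find_trap_hits(sample_log):
    print(when, host, agent)
```

Against a real log you would pass an open file instead of sample_log. Any hit reported here is a visit that did not come from you, which is exactly what the test is looking for.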
Are we supposed to believe that in a corporation of 300 employees, the managers who set policy also write code? If the answer is no, then are we supposed to believe that management tells their software engineers everything? Does a software engineer writing for one application know what's happening with the software engineers writing for a different application?
The fact that GoogleGuy is indeed a Googler, and he writes code, is not completely satisfying. The whole issue of anonymity bothers me. I fail to understand why Google doesn't have an official ombudsman on staff who is not anonymous, and who has the authority to go around to anyone at Google and dig out answers, and publish what he knows on Webmasterworld or anywhere else, without fear of immediate retribution, according to the terms of his contract. The Washington Post has one. Who can claim that Google is so much less important than the Washington Post these days, in terms of international impact? That's the bottom-line problem with Webmasterworld and an unidentified GoogleGuy. (I don't like the Washington Post; I'm just saying that an ombudsman is not a bad idea in principle, and would be a lot better than an unidentifiable GoogleGuy.)
This issue that Lisa brought up is a perfect example of how a Google ombudsman could function in a very helpful way without revealing trade secrets. The issue I addressed to Google's Director of Corporate Communications is another example.
Chris_R: I agree that it doesn't make sense. But it only doesn't make sense within the narrow confines of what we presume Google is all about, and within our understanding of the role of PageRank. What if Google has ulterior motives?
Privacy is a more fundamental ethic than PageRank. (Actually, PageRank is not an ethic at all -- it's an excuse for not having anything better.) If I'm making a claim based on notions of privacy, then yes, you can say that it makes no sense for Google to be doing this in terms of PageRank. But the first question must be, "Is Google indeed doing this?" The question that follows must be, "If so, and if not for reasons of PageRank, then why are they doing this?"
In other words, the facts about what's happening are most important. Next, the justification from GoogleGuy is important. The least important thing is why this is smart or stupid in terms of PageRank.
For all we know, PageRank could be a CIA cover story! (Hey, I'm just kidding, tofu!)
The problem is this: Librarians have been around for over 100 years. Journalists have been around for over 100 years. Over the course of decades, a consensus of public accountability evolves in professions such as these. Lobbying organizations emerge to do things like protect First Amendment rights. Chinese Walls are erected between news departments and op-ed departments, and advertising is clearly marked as such. Conflicts of interest are fair game for exposure. By now, in these 100-year-old professions, everyone has a sense of where the line is drawn in terms of the public interest.
Then along comes the Internet, at warp speed. In four years, Google owns the most important database on earth. And here we're all sitting at our keyboards, begging for scraps of information from some anonymous "GoogleGuy."
It shouldn't be that way; the public interest deserves more respect than that. It's no one's fault, and I'm not accusing Google of doing anything bad. It's just that it all happened so fast that the public sector hasn't had the time it needs to assert itself.
Another perfect example of this is Microsoft. They aren't self-consciously "bad," it could be argued. It just evolved that way because it happened so fast.
Google became important much faster than Microsoft. That means we have to be more alert about what Google is doing. Everyone can see by now where we failed with Microsoft.
Perhaps Google will follow the toolbar to pages with the intent of counting links rather than actually indexing the page. It does make sense, to an extent, to gather this data from the point of view of giving the user what they want. One question... did the page location_that_I_made_to_test_googlebot actually end up in the index, or was it just the recent crawl? Search results...
Your search - location_that_I_made_to_test_googlebot - did not match any documents.
No pages were found containing "location_that_i_made_to_test_googlebot".
very interesting find Lisa :)
That was not the actual page name nor path. I have omitted it for security reasons. Every time someone visits that page I receive an email if the visit is not coming from my IP. The page would not be indexed because it is password protected. And if it was indexed (impossible because it is protected) it would not show up in the SERPs for like another month. Remember they still have to index and calculate all those pages.
I've been using the advanced toolbar to navigate through private webpages since it was made available for download and have never had a Google IP hit any one of them. Not even links that contain the word google in the URL.
Maybe your ISP, or perhaps some spyware already installed on your computer, is maintaining logs of your Internet activity and saving them offsite on some server somewhere (cia.gov?), and Google just happened to run across one of them.
For one, I am my own ISP.
Two, I monitor very closely. No Spyware makes it past me. But I allow the GG-toolbar because I value it. :)
Three, my server farm with that website is located in the same building as me.
Four, my personal traffic to and from this website doesn't leave my building.
I think most people know all this, but I just wanted to emphasize it. If you want to keep something private on the web, .htaccess and passwords are your friends. If you want to keep something out of Google (or any other search engine), robots.txt and meta tags are your friends. If someone can type a url into a browser and find your page, don't count on a secret url remaining secret. Use passwords or robots.txt to protect data.
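To see what a robots.txt rule does from the crawler's side, here is a small sketch using Python's standard urllib.robotparser module. The robots.txt content, the /private/ directory and the example.com URLs are assumptions made up for illustration. Keep in mind that this only shows how a well-behaved crawler interprets the file; it does not stop anyone from requesting the page, which is why passwords are the tool for actual privacy.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; /private/ is an assumed directory name.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, url):
    """Ask whether a crawler that honors robots.txt may fetch the URL."""
    return parser.can_fetch(agent, url)


# The disallowed directory is off limits to a compliant crawler...
print(allowed("Googlebot", "http://www.example.com/private/notes.html"))
# ...while the rest of the site remains crawlable.
print(allowed("Googlebot", "http://www.example.com/index.html"))
```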
That's my public service announcement for the day. :)
I have worked as a coder for about 18 years and, believe me, I have never come across a fellow programmer who can spell even easy words, let alone some of the complicated ones GG has used.
He is an obvious charlatan and exhibits an education which disbars him from the noble art of coding. ;)
GET /search?client=navclient-auto&ch=52387311422&q=info:http%3A%2F%2Fwww%2Etest%2Ede%2F HTTP/1.1
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
google gets the data, sends the response, then closes the connection.
HTTP/1.0 200 OK
Date: Wed, 08 May 2002 07:30:40 GMT
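To make it easier to read what that request carries, here is a short Python sketch that decodes the query string of the captured request line using only the standard library. The function name decode_request is my own, and the sketch makes no claim about what the ch value means, since the capture does not say.

```python
from urllib.parse import urlsplit, parse_qs

# The request line captured above.
request_line = (
    "GET /search?client=navclient-auto&ch=52387311422"
    "&q=info:http%3A%2F%2Fwww%2Etest%2Ede%2F HTTP/1.1"
)


def decode_request(line):
    """Split an HTTP request line and return its query parameters,
    with percent-encoding decoded, as a plain dict."""
    method, target, protocol = line.split(" ")
    query = parse_qs(urlsplit(target).query)
    # parse_qs returns a list per parameter; each appears once here.
    return {name: values[0] for name, values in query.items()}


params = decode_request(request_line)
for name, value in params.items():
    print(name, "=", value)
```

Once decoded, the q parameter is the full address of the page being viewed, prefixed with info:, so the URL of each visited page is what gets sent.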
what would google need that data for, if not to become aware of brand-new sites as fast as possible?
this of course only happens if one loads documents that are not located on the computer itself; documents on your hd would, i suppose, be considered private *G*
i don't like the msie that much anymore anyway, i rather use opera now,
much more secure.
about the privacy stuff... i don't care either, if there is a docu that
i don't want to be spidered i set up a robots.txt file
btw, with m$, spyware and other apps, is there any privacy left on the net? ;)
I can understand people's fear about private areas being indexed - so why don't people password protect them and make sure in the robots file Google doesn't crawl the directory?
In some ways this WOULD be a nifty feature. I am about to launch a new site of mine that's taken bloody weeks to finish and I would love it if I can get the bugger into the next Google update... So I will visit the URL a few times with my advanced toolbar activated before Friday ;)
To the others,
Personally I find robots.txt a big red flag for evil bots. It says, Hey Bot! Come look at what I don't want you to look at. It is right here. I hate spamcrawlers. But for real search engines it can be useful for sales prices and other data you don't want cached in an engine. Things you want to have change based on the user or the special. Never hide things with robots.txt. Hide with a 401 status code and prompt for username and password.
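For readers who have not seen what "hide with a 401" means at the HTTP level, here is a minimal sketch using Python's built-in http.server. It is an illustration only, not a replacement for your web server's own password protection (.htaccess or similar). The USERNAME and PASSWORD values and the realm name are hypothetical, and Basic authentication sends credentials merely base64-encoded, so it should be combined with an encrypted connection in practice.

```python
import base64
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical credentials, for illustration only.
USERNAME = "user"
PASSWORD = "change-me"


def is_authorized(header):
    """Check an Authorization header value against the expected Basic credentials."""
    if not header or not header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header[len("Basic "):]).decode("utf-8")
    except ValueError:  # covers malformed base64 and undecodable bytes
        return False
    return decoded == f"{USERNAME}:{PASSWORD}"


class ProtectedHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if is_authorized(self.headers.get("Authorization")):
            body = b"private page\n"
            self.send_response(200)
        else:
            # No valid credentials: refuse and ask the client to authenticate.
            body = b"authorization required\n"
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="private"')
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


# To try it locally, uncomment the next line and visit http://127.0.0.1:8000/
# HTTPServer(("127.0.0.1", 8000), ProtectedHandler).serve_forever()
```

A crawler that requests the page without credentials gets the 401 and nothing else, which matches the 401 in the log line at the top of this thread.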
The fact that GoogleGuy is indeed a Googler, and he writes code, is not completely satisfying. The whole issue of anonymity bothers me. I fail to understand why Google doesn't have an official ombudsman on staff who is not anonymous
if i remember correctly when GoogleGuy first showed up he announced clearly that although he worked for google, he was visiting wmw in an "unofficial" capacity.
it cannot be another way - even though he is clearly sanctioned by Google - unless he was to only post text that had been carefully thought through and given official clearance at the plex, because posting in a forum is too open to misinterpretation and if it was official it could be quoted all over the net as google policy.
i think we should be grateful that google even bothers at all, check out the regular posts from the other search engines (not)
and like all information - even company policy statements - it's up to the reader to make a judgement on the truth/agenda attached to it.
re. anonymity. i'm happy to be completely anonymous here, as are others, it doesn't devalue my/their contributions, you must judge them for yourself as to their worth, others are not anonymous (possibly) and equally their posts are not extra valid per se and one should judge their posts equally.
Absolutely right, Lisa.
I've been waiting for this thread for some time now; plenty of people here have visited addresses with the Toolbar advanced features on to check for when Google starts using the information. GoogleGuy's direct response makes me suspect strongly that there was some other way of finding the address, but surely they're going to use the data some day.
I take it that your Web stats are password protected and that you haven't given the secret address to other people?
<added>Thanks, Iguana, you brightened up a rainy morning in Scotland:)</added>