If so, it seems that people may unknowingly reveal pages that are intended to be private. While "security by obscurity is no security", the unintended consequences could be bad.
I have the same question about the Alexa toolbar, if anyone knows the answer to that too.
>> using toolbar data in the future to augment our crawl is not only a good idea
So what you are saying, off the record, is that the toolbar may be used to do just this in the future, and that this is covered by your policy.
I really don't see a big problem with Google sending toolbar data for crawls. I just hope it is not used to punish SEOs. I think that by putting the PR on the toolbar, Google is luring SEOs. For what reason is another story.
It's my personal, unofficial belief that using toolbar data in the future to augment our crawl is not only a good idea, but specifically allowed by the original policies we posted.
I think it would be a great way to determine by another means how popular among surfers a URL really is. The PR algorithm now is mostly concerned with other webmasters. But perhaps what surfers like is just as (if not more) important than what webmasters want?
Richard Lowe
Forgive me if this question has been asked... What happens if you simply hide the toolbar? Searches run on Google with the toolbar hidden do not show up in the toolbar search history, so it seems reasonable to assume the toolbar is inactivated when it doesn't display.
While this experiment is encouraging, it doesn't answer the concern about auto updates.
Here's an interesting tidbit. When my toolbar phoned home, it connected to Google as 216.239.*.* and then Google tried to set a cookie for the google.com domain. On my IE 5.5, which has all cookies enabled and has no options for third-party cookies in any event, the google.com cookie attempt didn't take!
I think it's because of the mismatch between the 216.239.*.* and the google.com -- obviously, IE isn't going to do an IP lookup (even if it could) just to check on cookie rights. All it does is parse the info available for a tail match of the domain.
So whenever Google tries to use their IP number instead of the domain (like with the toolbar phone-home and also with cache copies, apparently to increase speed), their cookie appears to go nowhere when they try to set it. That means they can't read whatever cookie was previously set by google.com either!
Big gun jams on little bullet -- a tiny, tiny victory in the cookie wars.
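To make the tail-match idea above concrete, here is a minimal sketch of a purely string-based cookie domain check. It is an illustration only, not IE 5.5's actual logic; the function names and the 192.0.2.10 address (a reserved documentation address standing in for the Google IP) are assumptions.

```python
import ipaddress


def is_ip_literal(host: str) -> bool:
    """Return True if host is a numeric IP address rather than a domain name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def cookie_domain_allowed(request_host: str, cookie_domain: str) -> bool:
    """Hypothetical tail-match check: may a response from request_host
    set a cookie for cookie_domain? No DNS lookup is performed."""
    host = request_host.lower().rstrip(".")
    domain = cookie_domain.lower().lstrip(".")
    if is_ip_literal(host):
        # An IP address has no domain suffix, so only an exact match can pass.
        return host == domain
    # Tail match: the host is the domain itself or a subdomain of it.
    return host == domain or host.endswith("." + domain)


# A page served from a google.com host may set a google.com cookie...
print(cookie_domain_allowed("www.google.com", ".google.com"))
# ...but the same page served from a bare IP address may not.
print(cookie_domain_allowed("192.0.2.10", ".google.com"))
```

Under these assumptions the browser only compares strings, so a request made to a numeric address can never tail-match a named domain, which would be consistent with the cookie "going nowhere".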
hehe, I brought it up just for you :)
Let's say, for example, that the domain I just purchased was owned at some time in the past. How long ago it had an owner is unknown, and like I said, it is highly unlikely that it ever had one. Wouldn't Google only check up on this domain during the full crawl mode during pre/post update?
Or is it just a coincidence that Google checks the new pages/sites soon after someone visits it?
The toolbar doesn't worry me; actually, I like the idea of crawling pages that someone visits, based on data collected by the toolbar. You saw my thoughts in another post that discussed Outride and Google. I'm all for it. I know the risks of running something that phones home, but I also know the benefits.
So... is this something Norvig has in store with his machine learning skills? ;) Is a new AI project coming out soon? Are you guys hoarding and testing toolbar data for page modifications, popularity, and surfing patterns?
As of this posting, however, it is a coincidence if a googlebot crawled a site/page soon after someone visited it. When I was in school they discussed types of logical fallacies, such as "post hoc." A Google search returns fun results, but it basically just means that an event follows another event in time, so you assume that the first event caused the second event. So if it rained the last three times after I washed my car, it's wrong to assume that washing my car caused it to rain. Right now, the same thing goes for toolbar and googlebot. :)
Is it possible that you are also crawling all the sites of some popular web hosts, so that whenever a new site is uploaded, Googlebot goes for a bite? Is this one of the many ways you mentioned, GoogleGuy?
>> So if it rained the last three times after I washed my car, it's wrong to assume that washing my car caused it to rain.
Nice example, GoogleGuy, but to look at it a bit differently: let's say I buy a new car, drive it home in the middle of the night, and am very sure that no one has seen it. Still, the next day, my friend from the neighbouring town calls me up and congratulates me on my new car. I have no clue how my friend knows about it. Was Googlebot there in the middle of the night, and did it see me driving my new car? ;)
My problem with this is that PHP appends a ?PHPSESSID=xxxx to the end of URLs, and Googlebot is likely to view these as duplicate pages and omit them, sigh!
Is it that Googlebot 'now' truncates URLs at the "?", or has this never been, and is never going to be, the case?
Does anyone have any experience with this please?
Thanks.
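For what it's worth, the duplicate-page worry above can be illustrated with a short sketch. This is not how Googlebot handles such URLs (the thread does not say); it only shows, under the assumption that the session ID arrives as a PHPSESSID query parameter, how two visits to the same page end up with different URLs and how stripping that one parameter collapses them again. The function name and the example.com URLs are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_session_id(url: str, param: str = "PHPSESSID") -> str:
    """Return url with the session parameter removed and other parameters kept."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Two visits to the same page, each with its own session ID (made-up values).
first = "http://example.com/page.php?id=7&PHPSESSID=abc123"
second = "http://example.com/page.php?id=7&PHPSESSID=def456"

print(first == second)                                       # False
print(strip_session_id(first) == strip_session_id(second))   # True
```

Note that this removes only the session parameter. Truncating everything after the "?" would also throw away id=7, which is presumably why blanket truncation would be a problem for dynamic pages.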