Funny, a co-worker mentioned this same issue to me yesterday.
I didn't have a good answer. So, anyone know the real deal?
Welcome to WMW Steve!
I do not believe this is true at all. In fact, I have seen evidence to the contrary. I believe someone has assumed this, or has visited a site which isn't listed, seen that it has a PR, and believed Google had just crawled it. There is no evidence supporting it, but I can give you loads of evidence against it.
I would have to say that I do believe that the GG toolbar phones home with URLs of sites that it has been to
I have seen instances of small, new sites with absolutely no links being found in the Google index - after I've viewed them using IE w/toolbar
This would be a huge security violation.
I seriously doubt google would do this. Several pages I visit have passwords in them such as:
I know this isn't a good method of security, but I have no choice - as those are the way those companies operate.
There is no reason for google to visit sites you are visiting with the toolbar. Google grades sites based on the number and quality of links to it.
Why would they waste resources going to these sites - when they could just follow links off the web? They won't list sites that don't have at least one link to it anyway.
I can understand that the Toolbar 'phones home'; however, I don't believe it causes Google to crawl them. I have visited several thousand sites that are not in Google. Also, the maintenance area of one of my client's sites is not excluded in the robots.txt file, yet it is not crawled.
Actually I take everything I have said back. I now believe the toolbar is crawling sites. One of my sites which no one, and I mean no one, knows about has been crawled by Google. This would explain it.
I don't believe that Google use the Toolbar for finding new sites. We would have noticed. Maybe they'll use it to supplement PageRank sometime though, who knows?
I don't go along with the whole obscurity thing, though. There are just too many ways for URLs to leak.
The anecdotal evidence is useful, but I'd like an authoritative answer from someone at Google. Does anybody there read this?
Alexa is more specific (http://www.alexa.com/help/webmasters/index.html), and I'm adding a robot exclusion rule for them. (I don't necessarily want to add robot exclusion for all search engines.)
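For anyone wanting to try the same thing, here is a rough sketch of a robot exclusion rule that shuts out only Alexa's crawler and leaves everyone else alone. It assumes the crawler identifies itself as ia_archiver (check Alexa's webmaster help page before relying on that), and the example.com URLs are made up. Python's standard urllib.robotparser is used only to check that the rules do what is intended.

```python
from urllib.robotparser import RobotFileParser

# Contents of a robots.txt that blocks one crawler and allows the rest.
# "ia_archiver" is assumed to be the user-agent token of Alexa's crawler.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

# Hypothetical page used only for the check.
page = "http://www.example.com/private/index.html"

alexa_allowed = rules.can_fetch("ia_archiver", page)
google_allowed = rules.can_fetch("Googlebot", page)

print("ia_archiver allowed:", alexa_allowed)
print("Googlebot allowed:", google_allowed)
```

Keep in mind that robots.txt is only a request. A well-behaved crawler honors it, but it does nothing to hide the URL itself.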
Yes, I too was hoping GoogleGuy might chime in.
>> It doesn't say that it uses them for spidering
What else is it going to use them for? Is Google going to print them out and paper the walls with them or something? Some members of the board have a theory that the toolbar data can be used to ID SEO types by finding unusual activity patterns.
I am now 100% convinced that Google does use the Toolbar to get site addresses to spider. I have just developed a new site. No one knows about it, and as such there are no links in. Yet I have visited it in my browser with the toolbar. This site is in Google. If it doesn't use the toolbar to get sites for crawling, then how did it get listed?
Google's toolbar advanced options need to get your URL to do the page ranking and category. It's not obvious that they want to do anything else with it, and doing something that would violate privacy seems to fly in the face of Google's good-folks image. So I don't want to speculate any further until we hear from them or someone who knows for sure.
They can't spider pages this way. It would be a security violation and THEY DON'T WANT PAGES THAT DON'T HAVE LINKS TO THEM.
It would make NO SENSE.
I put up new sites a few times a week. There are plenty of people that come there with no links to it.
If you haven't gotten an email from someone spamming you to help you place your new site in the search engines, then you aren't making enough sites :)
MAYBE it is some sort of test thing. It would make no sense, and would be dangerous, for them to add pages this way. I could see maybe visiting the root page for some sort of test thing.
This would be a huge waste of resources. I would be amazed if this were the case.
Shoot me down if you will, but I suspect that Google is gathering site-visit data through the toolbar. It is one thing to have great link popularity, but links don't directly equate to visits. With the data being presented to them through the toolbar they can trace how the user arrived at a site. Visits to a given site from links on an "on topic" site would possibly boost the relevance of the site, i.e. PageRank.
... I also doubt Google would do that. I had another doorway to one of my websites, such as hosting.company.com/~sitename, and I usually checked the site through that doorway during development so as not to have any effect on the actual web site logs and to keep the logs accurate (checking the site this way affects another log file on the system).
Nobody ever knew about this (except for the hosting company with a lot of customers, so they won't ask google to index that), there's no inbound links to that site and the site is not listed anywhere. Google's last update shows that in the search results (which would be considered as SPAM I guess, since the sites are exact duplicates)..
PageRank does not work like that. It is a known equation and the number of visits has no bearing on it.
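For reference, the equation as published in the original Brin and Page paper is PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where T1..Tn are the pages linking to A, C(T) is the number of links going out of T, and d is a damping factor. Below is a small sketch of that published formula, not of whatever Google actually runs in production. The three-page graph and the 0.85 damping value are just illustrative choices. Notice the only input is the link graph; visit counts never appear.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate the published PageRank formula over a small link graph.

    links maps each page to the list of pages it links to.
    Only links are used as input; traffic is not part of the formula.
    """
    pages = list(links)
    pr = {page: 1.0 for page in pages}  # arbitrary starting values
    for _ in range(iterations):
        updated = {}
        for page in pages:
            # Sum PR(T)/C(T) over every page T that links to this page.
            incoming = sum(
                pr[other] / len(links[other])
                for other in pages
                if page in links[other]
            )
            updated[page] = (1 - d) + d * incoming
        pr = updated
    return pr


# Hypothetical graph: a and b both link to c, and c links back to a.
example_links = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(example_links)

for page, score in sorted(ranks.items()):
    print(page, round(score, 3))
```

In this toy graph c ends up highest because two pages link to it, and b ends up lowest because nothing links to it, regardless of how many people visit either one.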
A page's ranking (which is what I suspect you meant anyway) may very well have extra emphasis put on it based on the amount of traffic it receives.
Google did mention this in one of their papers.
However, google is smart enough to know the difference between one webmaster working on their pages and legitimate traffic. They would most likely only count each toolbar once.
This will probably be used to some extent in the future - especially after google starts customizing their pages for specific users.
I am sure you are right that google is collecting this data, but I don't think they are using it yet.
And your point about PR is well taken in that a page with good page rank doesn't mean it gets many visits - such as the yahoo privacy pages. Great PageRank, but who cares about those pages.
The toolbar, I am sure, holds great promise for the future.
Why is Googlebot downloading information from our "secret" web server?
It is almost impossible to keep a web server secret by not publishing any links to it. As soon as someone follows a link from your "secret" server to another web server, it is likely that your "secret" URL is in the referer tag, and it can be stored and possibly published by the other web server in its referer log. So, if there is a link to your "secret" web server or page on the web anywhere, it is likely that Googlebot and other "web crawlers" will find it.
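To make the referer leak concrete, here is a rough sketch of what the other server ends up holding. The host names and the record layout are invented for illustration. The point is only that the full "secret" URL sits in the other site's log as soon as someone clicks an outbound link from it.

```python
from urllib.parse import urlparse


def leaked_referers(log_records, own_host):
    """Return referring URLs from other hosts found in a server's log.

    log_records is a list of dicts with a "referer" key (a made-up layout
    for this sketch); "-" means the browser sent no Referer header.
    """
    leaked = []
    for record in log_records:
        referer = record.get("referer", "-")
        host = urlparse(referer).hostname
        if host and host != own_host:
            leaked.append(referer)
    return leaked


# Hypothetical log of stats.example.org, the site the "secret" page links to.
other_server_log = [
    {"path": "/", "referer": "-"},
    {"path": "/tools.html", "referer": "http://stats.example.org/"},
    {"path": "/", "referer": "http://secret.example.com/hidden/page.html"},
]

found = leaked_referers(other_server_log, own_host="stats.example.org")
print(found)
```

If that other site publishes its referer statistics as a web page, the "secret" URL becomes an ordinary link that any crawler can follow.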
I will ask a few questions regarding this:
Is it technically possible to do that? (Not only programming it, but also managing all this info.)
What is Google doing with the votes (in the toolbar)?
What happens if I vote for a site that doesn't exist?
>> THEY DON'T WANT PAGES THAT DON'T HAVE LINKS TO THEM.
>> It would make NO SENSE.
Actually, that makes perfect sense: they want to find all pages on the Internet. Think of the Internet as a big web; well, sometimes it is multiple webs, and one web does not touch another. This is a way for them to find non-touching webs. It is also a way to locate content from a naive or lazy webmaster. They are tilling the earth and they will leave no stone unturned.
I know for a fact that if you visit a page with GGbar that GGbot will visit it too!
There is a simple test. Make a webpage, place no links to it. Visit while viewing with GGbar, wait 3 months. And check if your page is in. I did this. And guess what! That page is now in Google.
Lisa, x2r's quote may be an alternative explanation of your quasi-isolated page experiment.
I've also put up a page with no inbound AND no outbound links; 6 months later there have been NO visits from either spiders or humans (except myself, with the toolbar installed, advanced options activated).
Sorry, chris_f, your url leaked out some other way. :)
> We don't use toolbar data in our
> crawl/indexing, but that would be
The thing everyone who uses any kind of toolbar add-on needs to keep in mind is that they all phone home, and some (Alexa) do crawl the URLs they collect.
And as Googleguy pointed out, the privacy statements almost always include language that makes it O.K. for them to implement such a policy in the future.
Having had Alexa show up and begin tearing through a not-for-the-public site in the past, I've made it a standard policy to not use any toolbar equipped browser when doing any kind of sensitive work.
Even though I believe that Google isn't currently using their toolbar for crawling purposes, I'm also not real confident that they'll send me a personal email if/when they change their mind. :)
What I find a bit strange about the ranking is that pages I created and uploaded seconds ago are already ranked.
To add to the discussion, though, and to see if any new info has come to light: it would not surprise me if Google added visited URLs to their database of pages to crawl, skipping a URL if it already exists and crawling it if it doesn't.
It's an easy way to get links that they may not know about otherwise, and as it says in the privacy statement, if you do not want to be tracked, turn it off.
I agree with the above post...
Links don't always mean popular, whereas if a site is getting a million visits a day, that means popular. If Google notices that a high percentage of visits go to foo.com, then foo.com is popular and deserves to be ranked high in the SERPs.
Perhaps they are using a combination of PR and (real) popularity to rank sites.
I just bought a domain about a week ago. It could have had an owner before, but I highly doubt it. The domain has been available for a few months, so I'm not sure about before that time, and I can't find anything referring to it (i.e. old links).
I'm still building the site so I have not uploaded any files except for the index page last week to test something. I viewed the page once with the toolbar active and then removed the page when I finished my test.
I checked the log files recently and what do you know? Googlebot came sniffing around soon after I visited the page.
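For anyone who wants to run the same check, here is a rough sketch of pulling Googlebot requests out of an access log. It assumes the server writes the common "combined" log format, and the two sample lines (addresses, dates, user agents) are made up. Matching on the user-agent string is only a first pass, since anyone can send a fake Googlebot user agent.

```python
import re
from datetime import datetime

# Pattern for the "combined" access log format (assumed for this sketch).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Return (timestamp, request) pairs whose user agent mentions Googlebot."""
    hits = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match and "googlebot" in match.group("agent").lower():
            when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
            hits.append((when, match.group("request")))
    return hits


# Made-up sample: one browser visit, then one crawler visit.
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Mar/2003:14:02:11 +0000] "GET /index.html HTTP/1.1" '
    '200 512 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    '192.0.2.77 - - [10/Mar/2003:16:45:03 +0000] "GET /index.html HTTP/1.0" '
    '200 512 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
]

hits = googlebot_hits(SAMPLE_LOG)
for when, request in hits:
    print(when.isoformat(), request)
```

A crawler hit showing up after your own visit tells you the order of events, not the cause. The referer field of earlier requests is worth a look too, since it can show another way the URL got out.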
I have been hearing from others that this has really kicked in these past couple weeks.
Yup, I've had a site with no incoming links get spidered shortly after I uploaded the last page.
I did check the site out with IE on a machine with the toolbar, which I believe is the only way it could have been found that quickly.
>>The thing everyone who uses any kind of toolbar add-on needs to keep in mind is that they all phone home, and some (Alexa) do crawl the URLS they collect.<<
Forgive me if this question has been asked... What happens if you simply hide the toolbar? Searches run on Google with the toolbar hidden do not show up in the toolbar search history, so it seems reasonable to assume the toolbar is inactivated when it doesn't display.
Oy, it's the thread that wouldn't die. ;)
To sum up from my point of view:
- There are many many ways a url can become known and then crawled.
- To the best of my knowledge, the toolbar is not currently one of those ways a url becomes known.
- To the best of my knowledge, anyone who is certain that the toolbar has caused their page to be crawled is therefore mistaken.
- If you turn off the advanced features, the toolbar is completely inert and does not report any info to Google.
And these two I would add on an unofficial basis:
- It's my personal, unofficial belief that using toolbar data in the future to augment our crawl is not only a good idea, but specifically allowed by the original policies we posted.
- To the best of my knowledge, no one has ever been forced to install the toolbar. If the toolbar worries you, then just don't install it.