Google Desktop Tools and Google Labs Projects Forum

This 38-message thread spans 2 pages; this is page 1.
Google toolbar causes page indexing?
Does visiting a page cause it to be indexed?
Steve_Yost




msg:1105644
 1:01 pm on Apr 10, 2002 (gmt 0)

Someone told me that with the Google toolbar installed and Advanced Options on (page ranking and categories), Google will pick up any page you visit for crawling and indexing. Is that true?

If so, it seems that people may unknowingly reveal pages that are intended to be private. While "security by obscurity is no security", the unintended consequences could be bad.

I have the same question about the Alexa toolbar, if anyone knows the answer to that too.

 

Craig_F




msg:1105645
 1:13 pm on Apr 10, 2002 (gmt 0)

Funny, a co-worker mentioned this same issue to me yesterday.

I didn't have a good answer. So, anyone know the real deal?

Welcome to WMW Steve!

chris_f




msg:1105646
 1:14 pm on Apr 10, 2002 (gmt 0)

I do not believe this is true at all. In fact, I have seen evidence to the contrary. I believe someone has assumed this, or has visited a site which isn't listed, seen that it has a PR, and believed Google had just crawled it. There is no evidence supporting it, but I can give you loads of evidence against it.

TallTroll




msg:1105647
 1:33 pm on Apr 10, 2002 (gmt 0)

I would have to say that I do believe that the GG toolbar phones home with URLs of sites that it has been to.

I have seen instances of small, new sites with absolutely no links being found in the Google index - after I've viewed them using IE w/toolbar.

Chris_R




msg:1105648
 1:44 pm on Apr 10, 2002 (gmt 0)

This would be a huge security violation.

I seriously doubt Google would do this. Several pages I visit have passwords in them, such as:

www.example.com/aws/432/password/

I know this isn't a good method of security, but I have no choice, as that is the way those companies operate.

There is no reason for Google to visit sites you are visiting with the toolbar. Google grades sites based on the number and quality of links to them.

Why would they waste resources going to these sites when they could just follow links off the web? They won't list sites that don't have at least one link to them anyway.

chris_f




msg:1105649
 1:44 pm on Apr 10, 2002 (gmt 0)

I can understand that the Toolbar 'phones home'; however, I don't believe it causes Google to crawl those pages. I have visited several thousand sites that are not in Google. Also, the maintenance area of one of my client's sites is not excluded in the robots.txt file, yet it is not crawled.

chris_f




msg:1105650
 1:46 pm on Apr 10, 2002 (gmt 0)

Actually, I take back everything I have said. I now believe the toolbar is crawling sites. One of my sites, which no one, and I mean no one, knows about, has been crawled by Google. This would explain it.

ciml




msg:1105651
 1:56 pm on Apr 10, 2002 (gmt 0)

I don't believe that Google use the Toolbar for finding new sites. We would have noticed. Maybe they'll use it to supplement PageRank sometime though, who knows?

I don't go along with the whole obscurity thing, though. There are just too many ways for URLs to leak.

Steve_Yost




msg:1105652
 2:04 pm on Apr 10, 2002 (gmt 0)

The anecdotal evidence is useful, but I'd like an authoritative answer from someone at Google. Does anybody there read this?

Google's privacy policy (http://toolbar.google.com/privacy.html) says it grabs the URLs (obviously). It doesn't say that it uses them for spidering, but it doesn't specifically say that it doesn't.

Alexa is more specific (http://www.alexa.com/help/webmasters/index.html), and I'm adding a robot exclusion rule for them. (I don't necessarily want to add robot exclusion for all search engines.)
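
A robot exclusion rule of this kind can be checked before it goes live. The following is only a rough sketch: the robots.txt body, the example URL, and the helper name `is_allowed` are made up for illustration, and `ia_archiver` is assumed to be the user-agent token that Alexa's crawler honors. It uses Python's standard `urllib.robotparser` to confirm that the rule blocks that one crawler while leaving other search engines alone.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one crawler, leave everyone else alone.
# "ia_archiver" is an assumption about the token Alexa's crawler obeys.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://www.example.com/private/page.html"
    for agent in ("ia_archiver", "Googlebot"):
        print(agent, "allowed" if is_allowed(agent, page) else "blocked")
```

Keep in mind that robots.txt is only a request to well-behaved crawlers; it does not hide the page from anyone who already has the URL.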

Craig_F




msg:1105653
 2:19 pm on Apr 10, 2002 (gmt 0)

Yes, I too was hoping GoogleGuy might chime in.

TallTroll




msg:1105654
 2:35 pm on Apr 10, 2002 (gmt 0)

>> It doesn't say that it uses them for spidering

What else is it going to use them for? Would Google print them out and paper the walls with them or something? Some members of the board have a theory that the toolbar data can be used to ID SEO types by finding unusual activity patterns.

chris_f




msg:1105655
 2:46 pm on Apr 10, 2002 (gmt 0)

I am now 100% convinced that Google does use the Toolbar to get site addresses to spider. I have just developed a new site. No one knows about it, and as such there are no links in. Yet I have visited it in my browser with the toolbar, and this site is in Google. If Google doesn't use the toolbar to get sites for crawling, then how did it get listed?

Steve_Yost




msg:1105656
 2:47 pm on Apr 10, 2002 (gmt 0)

Google's toolbar advanced options need to get your URL to do the page ranking and category. It's not obvious that they want to do anything else with it, and doing something that would violate privacy seems to fly in the face of Google's good-folks image. So I don't want to speculate any further until we hear from them or someone who knows for sure.

Chris_R




msg:1105657
 4:05 pm on Apr 10, 2002 (gmt 0)

They can't spider pages this way. It would be a security violation and THEY DON'T WANT PAGES THAT DON'T HAVE LINKS TO THEM.

It would make NO SENSE.

I put up new sites a few times a week. There are plenty of people that come there with no links to it.

If you haven't gotten an email from someone spamming you to help you place your new site in the search engines, then you aren't making enough sites :)

MAYBE it is some sort of test thing. It would make no sense, and would be dangerous, for them to add pages this way. I could see maybe visiting the root page for some sort of test thing.

This would be a huge waste of resources. I would be amazed if this were the case.

TomA




msg:1105658
 5:33 pm on Apr 10, 2002 (gmt 0)

Shoot me down if you will, but I suspect that Google is gathering site-visit data through the toolbar. It is one thing to have great link popularity, but links don't directly equate to visits. With the data being presented to them through the toolbar, they can trace how the user arrived at a site. Visits to a given site from links on an "on topic" site would possibly boost the relevance of the site, i.e. PageRank.

ahmad




msg:1105659
 5:34 pm on Apr 10, 2002 (gmt 0)


... I also doubt Google would do that. I had another doorway to one of my websites, such as hosting.company.com/~sitename, and I usually checked the site through that doorway during development so as not to have any effect on the actual website logs and to keep the logs accurate (checking the site this way affects another log file on the system).

Nobody ever knew about this (except for the hosting company, which has a lot of customers, so they won't ask Google to index that), there are no inbound links to that site, and the site is not listed anywhere. Google's last update shows it in the search results (which would be considered SPAM, I guess, since the sites are exact duplicates).

Chris_R




msg:1105660
 6:10 pm on Apr 10, 2002 (gmt 0)

PageRank does not work like that. It is a known equation and the number of visits has no bearing on it.

A page's ranking (which is what I suspect you meant anyway) may very well have extra emphasis put on it based on the amount of traffic it receives.

Google did mention this in one of their papers.

However, google is smart enough to know the difference between one webmaster working on their pages and legitimate traffic. They would most likely only count each toolbar once.

This will probably be used to some extent in the future - especially after google starts customizing their pages for specific users.

I am sure you are right that Google is collecting this data; I just don't think they are using it yet.

And your point about PR is well taken, in that a page having good PageRank doesn't mean it gets many visits, such as the Yahoo privacy pages. Great PageRank, but who cares about those pages?

The toolbar, I am sure, holds great promise for the future.
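
To make the "known equation" point concrete, here is a minimal sketch of the textbook PageRank iteration, in the normalized form where the ranks sum to 1. It is a simplification for illustration, not a claim about what Google runs in production. The function name `pagerank`, the toy link graph, the damping factor of 0.85, and the even redistribution of rank from pages with no outbound links are all assumptions of this sketch. Note that the only input is the link graph; visit counts do not appear anywhere.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iterate the simplified PageRank equation over a link graph.

    links maps each page to the set of pages it links to.
    Traffic is deliberately absent: only links feed the result.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Rank held by pages with no outbound links is spread evenly
        # (one common convention; an assumption of this sketch).
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {}
        for page in pages:
            inbound = sum(
                rank[q] / len(links[q])
                for q in pages
                if page in links.get(q, ())
            )
            new_rank[page] = (1 - damping) / n + damping * (inbound + dangling / n)
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical three-page web: a and b both link to c, c links back to a.
    graph = {"a": {"c"}, "b": {"c"}, "c": {"a"}}
    for page, score in sorted(pagerank(graph).items()):
        print(page, round(score, 4))
```

In this toy graph the page with two inbound links ends up ahead of the page with one, which in turn is ahead of the page with none, regardless of how often anyone visits them.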

x2r




msg:1105661
 6:24 pm on Apr 10, 2002 (gmt 0)

From [google.com...]

Why is Googlebot downloading information from our "secret" web server?
It is almost impossible to keep a web server secret by not publishing any links to it. As soon as someone follows a link from your "secret" server to another web server, it is likely that your "secret" URL is in the referer tag, and it can be stored and possibly published by the other web server in its referer log. So, if there is a link to your "secret" web server or page on the web anywhere, it is likely that Googlebot and other "web crawlers" will find it.
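
The referer leak described in that answer is easy to see from the receiving server's side. The sketch below assumes the other server writes its access log in the common Apache "combined" format; the sample log lines, the host names, and the helper name `external_referers` are invented for illustration. It pulls out every referer URL that points at some other host, which is exactly where a "secret" URL would show up.

```python
import re
from urllib.parse import urlsplit

# Apache "combined" format: the last two quoted fields are referer and user agent.
COMBINED_LOG = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "[^"]*"'
)


def external_referers(log_lines, own_host):
    """Return referer URLs whose host differs from own_host."""
    found = []
    for line in log_lines:
        match = COMBINED_LOG.match(line)
        if match is None:
            continue  # not a combined-format line
        referer = match.group("referer")
        if referer in ("", "-"):
            continue  # the browser sent no referer
        if urlsplit(referer).hostname != own_host:
            found.append(referer)
    return found


# Hypothetical log from www.example.org, the site the "secret" page links to.
SAMPLE_LOG = [
    '10.0.0.1 - - [10/Apr/2002:18:24:00 +0000] "GET / HTTP/1.0" 200 512 '
    '"http://secret.example.com/hidden/page.html" "Mozilla/4.0"',
    '10.0.0.2 - - [10/Apr/2002:18:25:00 +0000] "GET /about.html HTTP/1.0" 200 734 '
    '"http://www.example.org/" "Mozilla/4.0"',
    '10.0.0.3 - - [10/Apr/2002:18:26:00 +0000] "GET / HTTP/1.0" 200 512 "-" "Mozilla/4.0"',
]

if __name__ == "__main__":
    for url in external_referers(SAMPLE_LOG, "www.example.org"):
        print(url)
```

If the receiving site publishes its referer statistics as a web page, any crawler that reads that page can pick up the URL from there.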

French_cnx




msg:1105662
 6:31 pm on Apr 10, 2002 (gmt 0)

I will ask a few questions regarding this:
Is it technically possible to do that (not only programming it, but also managing all this info)?
What is Google doing with the votes (in the toolbar)?
What happens if I vote for a site that doesn't exist?

Lisa




msg:1105663
 7:56 pm on Apr 10, 2002 (gmt 0)

>> THEY DON'T WANT PAGES THAT DON'T HAVE LINKS TO THEM.

>> It would make NO SENSE.

Actually, that makes perfect sense: they want to find all pages on the Internet. Think of the Internet as a big web; sometimes it is multiple webs, and one web does not touch another. This is a way for them to find non-touching webs. It is also a way to locate content from a naive or lazy webmaster. They are tilling the earth and they will leave no stone unturned.

I know for a fact that if you visit a page with GGbar that GGbot will visit it too!

There is a simple test. Make a webpage and place no links to it. Visit it with the GGbar and wait 3 months. Then check if your page is in. I did this. And guess what! That page is now in Google.

starec




msg:1105664
 8:12 pm on Apr 10, 2002 (gmt 0)

Lisa, x2r's quote may be an alternative explanation of your quasi-isolated page experiment.
I've also put up a page with no inbound AND no outbound links; 6 months later, there have been NO visits from either spiders or humans (except myself, with the toolbar installed and advanced options activated).

GoogleGuy




msg:1105665
 1:48 am on Apr 11, 2002 (gmt 0)
Hey, our privacy policy says that we won't give personally identifiable information outside of Google. We don't use toolbar data in our crawl/indexing, but that would be allowed by our privacy policy. The toolbar does go to great lengths to avoid returning personal info. Right now, we strip out username/password from http://user:password@host/path. We also truncate dynamic urls at the ? mark. Finally, we try to avoid any intranet urls (that's hard to do exactly, but we do our best).

Sorry, chris_f, your url leaked out some other way. :)
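
The three clean-up steps listed above (drop the username/password, cut the URL at the ? mark, skip intranet addresses) can be sketched in a few lines. This is a guess at the general shape only, not the toolbar's actual code. The function names `sanitize_url` and `looks_like_intranet` are hypothetical, and the intranet test (a host name without a dot, or a private IP address) is an assumed heuristic.

```python
import ipaddress
from urllib.parse import urlsplit, urlunsplit


def looks_like_intranet(host: str) -> bool:
    """Heuristic (an assumption): dotless host names and private IPs are internal."""
    if not host or "." not in host:
        return True
    try:
        return ipaddress.ip_address(host).is_private
    except ValueError:
        return False  # an ordinary dotted host name, treated as public


def sanitize_url(url: str):
    """Return url without credentials or query string, or None for intranet hosts."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if looks_like_intranet(host):
        return None
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Rebuild with an empty query and fragment: everything from "?" on is dropped.
    return urlunsplit((parts.scheme, netloc, parts.path, "", ""))


if __name__ == "__main__":
    for raw in (
        "http://user:password@www.example.com/path?session=abc123",
        "http://intranet/wiki/Home",
        "http://192.168.1.5/admin",
    ):
        print(raw, "->", sanitize_url(raw))
```

Under this sketch, a secret that sits in the path itself, as in the www.example.com/aws/432/password/ example earlier in the thread, would pass through unchanged, since only the credentials and the query string are removed.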

Steve_Yost




msg:1105666
 2:21 am on Apr 11, 2002 (gmt 0)

> We don't use toolbar data in our
> crawl/indexing, but that would be
> allowed by our privacy policy.

Thanks, GoogleGuy. That's exactly the conclusive answer I wanted. I read your policy as not saying anything about this (I don't consider a plain URL to be personally identifiable info), so your statement is what I need. Would you consider adding that to your stated privacy policy?

WebGuerrilla




msg:1105667
 4:59 am on Apr 11, 2002 (gmt 0)

The thing everyone who uses any kind of toolbar add-on needs to keep in mind is that they all phone home, and some (Alexa) do crawl the URLS they collect.

And as Googleguy pointed out, the privacy statements almost always include language that makes it O.K. for them to implement such a policy in the future.

Having had Alexa show up and begin tearing through a not-for-the-public site in the past, I've made it a standard policy to not use any toolbar equipped browser when doing any kind of sensitive work.

Even though I believe that Google isn't currently using their toolbar for crawling purposes, I'm also not real confident that they'll send me a personal email if/when they change their mind. :)

Visit Thailand




msg:1105668
 5:35 am on Jul 4, 2002 (gmt 0)

What I find a bit strange about the ranking is that pages I created and uploaded seconds ago are already ranked.

To add to the discussion, though, and to see if any new info has come to light: it would not surprise me if Google added visited urls to their database to crawl, skipping any that already exist and crawling any that don't.

It's an easy way to get links that they may not know about otherwise, and as it says in the privacy statement, if you do not want to be tracked, turn it off.

mack




msg:1105669
 6:09 am on Jul 4, 2002 (gmt 0)

I agree with the above post...

Links don't always mean popular, whereas if a site is getting a million visits a day, that means popular. If Google notices that a high percentage of visits go to foo.com, then foo.com is popular and deserves to be ranked high in the SERPs.

Perhaps they are using a combination of PR and (real) popularity to rank sites.

msgraph




msg:1105670
 3:34 pm on Aug 9, 2002 (gmt 0)

I just bought a domain about a week ago. It could have had an owner before, but I highly doubt it. The domain has been available for a few months, so I'm not sure about before that time, and I can't find anything referring to it (i.e. old links).
I'm still building the site so I have not uploaded any files except for the index page last week to test something. I viewed the page once with the toolbar active and then removed the page when I finished my test.

I checked the log files recently and what do you know? Googlebot came sniffing around soon after I visited the page.

I have been hearing from others that this has really kicked in these past couple weeks.

Drastic




msg:1105671
 4:03 pm on Aug 9, 2002 (gmt 0)

Yup, I've had a site with no incoming links get spidered shortly after I uploaded the last page.

I did check the site out with IE on a machine with the toolbar, which I believe is the only way it could have been found that quickly.

Robert Charlton




msg:1105672
 12:01 am on Aug 10, 2002 (gmt 0)

>>The thing everyone who uses any kind of toolbar add-on needs to keep in mind is that they all phone home, and some (Alexa) do crawl the URLS they collect.<<

Forgive me if this question has been asked... What happens if you simply hide the toolbar? Searches run on Google with the toolbar hidden do not show up in the toolbar search history, so it seems reasonable to assume the toolbar is inactivated when it doesn't display.

GoogleGuy




msg:1105673
 1:58 am on Aug 10, 2002 (gmt 0)

Oy, it's the thread that wouldn't die. ;)

To sum up from my point of view:
- There are many many ways a url can become known and then crawled.
- To the best of my knowledge, the toolbar is not currently one of those ways a url becomes known.
- To the best of my knowledge, anyone who is certain that the toolbar has caused their page to be crawled is therefore mistaken.
- If you turn off the advanced features, the toolbar is completely inert and does not report any info to Google.

And these two I would add on an unofficial basis:
- It's my personal, unofficial belief that using toolbar data in the future to augment our crawl is not only a good idea, but specifically allowed by the original policies we posted.
- To the best of my knowledge, no one has ever been forced to install the toolbar. If the toolbar worries you, then just don't install it.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved