Breaking Google TOS?

My IT man says it's not

         

SlyOldDog

9:16 pm on Aug 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



We are writing some software to crawl the web for sites which would be good matches for us as link partners.

Now all this talk of bad neighbourhoods got me thinking - what's the best way to ensure a site isn't part of one? Well, in the absence of a Badrank toolbar, I think the best way is to target sites with some PageRank.

So here is the question. If, in order to determine the PageRank, we automatically open a browser window with the toolbar installed for each site we are interested in, is that a violation of the TOS? We are not querying Google directly; we are just browsing. Of course we know the toolbar will query Google for the PageRank of the page, but that is Google's doing - not mine.

And how many queries per hour would be deemed automation?

Any views?

Thanks

[edited by: Marcia at 9:28 pm (utc) on Aug. 24, 2003]

GoogleGuy

9:59 pm on Aug 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think any programmatic queries like that would be unwelcome--the toolbar is intended for personal use.

dmorison

10:48 pm on Aug 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Unless you're planning on intercepting the Toolbar queries (are you?) it sounds like you are just having the crawling process open up a browser window with interesting URLs.

If it then requires a human to look at the Googlebar and note down the PageRank then I would say "why bother with that automated step of opening a browser window" if you think that is going to upset Google?

Just have your crawling process output a list of interesting URLs for a human to assess later, then you're not breaking anyone's Terms of Service.
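A minimal sketch of that idea in Python, assuming the crawler already produces a list of candidate URLs. The function name `build_review_page` and the file name `review.html` are made up for illustration and are not part of anyone's actual software. It only writes a local HTML page of links for a person to click through later; it makes no requests to Google or to any other site.

```python
import html
from pathlib import Path


def build_review_page(urls, out_path="review.html"):
    """Write candidate URLs to a plain HTML page for a human reviewer.

    Hypothetical helper: blank entries and exact duplicates are skipped,
    and the number of links written is returned.
    """
    seen = set()
    items = []
    for url in urls:
        url = url.strip()
        if not url or url in seen:
            continue
        seen.add(url)
        # Escape the URL so characters like & and " cannot break the markup.
        safe = html.escape(url, quote=True)
        items.append(f'<li><a href="{safe}">{safe}</a></li>')

    page = (
        "<html><body><h1>Candidate link partners</h1><ol>\n"
        + "\n".join(items)
        + "\n</ol></body></html>\n"
    )
    Path(out_path).write_text(page, encoding="utf-8")
    return len(items)


if __name__ == "__main__":
    # Placeholder URLs; a real run would take these from the crawler's output.
    count = build_review_page(["http://example.com/", "http://example.org/links"])
    print(f"{count} URLs written for manual review")
```

The reviewer opens the page in an ordinary browser and works through the list by hand, so the assessment step stays manual.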

Having said that, I don't think what you've described could be considered automation any more than your browser opening a default "home page" automatically could be; so I sorta see where your IT guy is coming from. GG mentioned personal use, but every SEO in the land is using the toolbar for "business purposes"...

SlyOldDog

10:56 pm on Aug 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks. I see the point. That's why I asked the question.

Any non-Google checks we could run to ID a bad neighbourhood?

It's hard to resist the temptation to automate this part of our business. At the moment we pay 2 people to do the same work and they do it very slowly. Google themselves automate everything and anything, and, I might point out, perform automated queries on my website, so it is a case of the pot calling the kettle black :)

I assume the problem is that we would consume too much of Google's bandwidth. As others have pointed out in the past, we would gladly pay for the right to run automated queries, but Google have steadfastly refused such a service, even for the API service.

SlyOldDog

11:00 pm on Aug 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sorry dmorison - didn't see your reply before I finished writing.

You don't need to intercept the toolbar or have a human look at the green bar. It's enough to check the temporary internet files, where Google places a file for each PageRank query. This can be done automatically.

So there is no need to fiddle with any of Google's software and break their TOS in this regard.

I am just looking for an acceptable way of crawling the web and eliminating bad neighbourhoods without a human check. Any ideas would be welcome.

dmorison

11:02 pm on Aug 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You didn't mention whether you were planning on automatically intercepting the Toolbar communications to extract the PageRank.

That is all this hinges on (IMHO). If you are planning on probing the Toolbar communication to extract PR automatically, then that is a definite no-no.

However, if all you are doing is firing up a browser window automatically in order to prompt a human to look at it (and visually extract PR as part of their review process) then I don't think you have a problem.

Google will ban you if they want anyway; that has nothing to do with the ToS. The ToS stand only as "Exhibit A" should Google decide to try and sue you for breaking them.

[edited by: dmorison at 11:13 pm (utc) on Aug. 24, 2003]

dmorison

11:06 pm on Aug 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Woooozers, I didn't know Toolbar did that.

OK, in that case, I am going to side with Google, as that is akin to automating HTTP requests and screen scraping the results.

A human actually looking at the Toolbar was critical in my defence, so I am afraid I must stand down from this case!

Cheers!

SlyOldDog

11:21 pm on Aug 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Perhaps if I changed my WebmasterWorld handle to NiceGuy then I would get a different perspective on this? ;))

plumsauce

7:09 am on Aug 25, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




First of all, if you decide to proceed, turn off all cookies and do it from, say, EarthLink or AOL. They can ban to their heart's content. Of course, AOL paid revenue might drop some. Make sure that there is some appreciable delay used. Give no http-referer, or give the home page as the referer.

>>That is all this hinges on (IMHO). If you are planning on probing the Toolbar communication to extract PR automatically then a definite no no.

Huh? It's my connection, and I run a sniffer.

Now, if you want to *manually* do this, then

1/ have your program build you a page of links for each search

2/ turn on a sniffer with logging (this might be easier to parse than individual temp files)

3/ bring up the page in a browser

4/ *manually* click on each link

5/ autoproc *your* temp or sniffer dump files on *your* disk

+++

dmorison

7:33 am on Aug 25, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Huh? It's my connection, and I run a sniffer.

Sorry - of course you can run a sniffer if you want to - network traffic is "sniffed" all over the place, but that is not the point.

The point is that when running that sniffer is part of a process that has been intentionally designed to automate PageRank lookups, then you are in contravention of Google's ToS.

Powdork

7:42 am on Aug 25, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Isn't this the type of thing the Google API is for?

dmorison

7:47 am on Aug 25, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Isn't this the type of thing the Google API is for?

Would be nice, but as far as I know PR (as indicated by the Toolbar) is not accessible via the API.

percentages

8:34 am on Aug 25, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Burn my bandwidth with silly games and I'll burn you....expect the same from Google! Even though GG was far more diplomatic ;)

Page Rank is almost meaningless now.....why the heck make an enemy of someone who can be your best friend for no good reason?

MonkeeSage

8:53 am on Aug 25, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's not a bandwidth issue, that I can see, since the same bandwidth gets used whether or not a human is sitting there in front of the screen...

If Google stores their files on your computer, then once they hit your disk--Google loses all *ownership* of them (they still have the right to use them via their TOS / EULA for the toolbar, they just don't have ownership of them)--they are now owned by YOU--even if you are trying to use them to sabotage Google and burn Rome with them--they are still yours.

As long as you don't violate any laws by your use of the information, do what you want with it, it's yours after all, Google gave it to you fair and square. :)

Jordan

SlyOldDog

11:21 am on Aug 25, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi Percentages

Page Rank is almost meaningless now.....why the heck make an enemy of someone who can be your best friend for no good reason?

PageRank isn't meaningless. It's a good means of ensuring the site hasn't been penalized and isn't part of a bad neighbourhood, or at least that is what Google thinks.

This all came up because of a quote from the Google Search Department where they say linking to 20 bad neighbourhoods could well be a problem. These days linking is an essential part of a web site, so we need to regularly check we don't have bad neighbourhoods in our outgoing links and sites we potentially would link to.

Sometimes I have looked at a site and could find nothing wrong, but SEOs have sworn it was full of dirty tricks. I'd like the Google seal of approval.

eztrip

11:32 am on Aug 25, 2003 (gmt 0)

10+ Year Member



>>If Google stores their files on your computer, then once they hit your disk--Google loses all *ownership* of them (they still have the right to use them via their TOS / EULA for the toolbar, they just don't have ownership of them)--they are now owned by YOU--even if you are trying to use them to sabotage Google and burn Rome with them--they are still yours.

Hmm. I'm not so sure about that one. If you've ever read a Microsoft or Adobe or any other large software company's license agreement, they generally say that the software is in fact still owned by them and they have the right to take it away from you at any time. I'm not a lawyer, but reading the EULA for the toolbar on Intellectual Property Rights, I'd say that what MonkeeSage says is not true.

-snip-
You acknowledge that Google or third parties own all right, title and interest in and to the Google Toolbar, portions thereof, or software provided through or in conjunction with the Google Toolbar, including without limitation all Intellectual Property Rights.
-snip-

MonkeeSage

11:50 am on Aug 25, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



An XML file with PR information (how PR is stored on disk, AFAIK) is none of the following:

-Google Toolbar
-Portions thereof
-Software provided through or in conjunction with the Google Toolbar

Google would have to specify that any *information* and / or that any *file format* is also theirs, I believe. And even if they did specify it, local property law would take precedence over third-party contractual / consensual obligation. I'm no lawyer either, but I've seen a number of cases where people were charged with possession of illegally obtained software, even though someone else had put it on their computer, simply because it was on their computer.

Also, they do not specify (that I could find, but perhaps I missed it?) that PR information may *only* be accessed through the PR indicator on the toolbar, rather than through the file where the information is stored, which is the real question in this particular issue.

Maybe GoogleGuy can clear up the matter for us. I would personally use the information myself, but I advise everyone to make their own decision in the matter and do the research on what constitutes ownership / possession in their own locale, as well as try to determine if Google officially forbids using the PR information unless it is accessed through the toolbar.

Jordan

plumsauce

3:10 am on Aug 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




The point really is: how much can you get away with?

The sad fact of the matter is lots of "new age enterprises" post terms and shrink wrap agreements that make the eyes glaze over. Observing a TOS is all about one's own conscience and risk tolerance.

I *have* seen spider software that needed to be tuned for various factors until the spider was indistinguishable from a random browser at the level that the target webmaster was willing to invest in reliably detecting anomalies. At that point it was in the door. But not before multiple ban/tune cycles.

Personally, if it comes over the wire, I'll use it with whatever viewing software is useful to *me*, including automated. If they want to restrict it, they can use a subscription model and quit selling ad space. Follow the money.

+++

kpaul

4:23 am on Aug 26, 2003 (gmt 0)

10+ Year Member



PageRank isn't meaningless. It's a good means of ensuring the site hasn't been penalized and isn't part of a bad neighbourhood, or at least that is what Google thinks.

Maybe I'm wrong on this, but hasn't PR seemed a little behind (weeks, not months ;) where the page actually is (and, more importantly, how it ranks in the SERPs)? Ditto with backlinks.

While I still look at backlinks, more interesting to me of late is the rate at which new pages are added to their index. Check out how many pages are in the various servers over a week or so and you can see variances. Not sure if there's anything there, but it's always nice to have a few new pages of content in the index.

Back to topic-topic, though, I would say you're breaking the TOS having something automatically open browser windows. Google is queried at that point, afaik, to grab the PR. Automating that would, imho, be clearly against the TOS.

My two SERPs,
kpaul

SlyOldDog

7:33 am on Aug 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



he he

And if I put Internet Explorer in my list of programs to launch when my PC boots? Isn't that automation too?

I don't want to get into an argument about semantics, because in the end if Google doesn't like what I'm doing they will "ex-communicate" me anyway, and any discussion of what defines automation would be null and void.

I think Plumsauce is right, but I would like to play the game within the rules wherever I can.

plumsauce

8:05 pm on Aug 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



automated queries over http, is this not a spider?

processing results of said queries, is this not a search engine?

so who is calling the kettle black? could it be the frying pan?

+++