Forum Moderators: goodroi


Thumbnails to be clipped, US judge rules.

         

engine

2:39 pm on Feb 22, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Internet giant Google infringed copyright rules by posting thumbnail-size photos from other websites on its search-results pages, a US judge said in a ruling issued yesterday.

[odt.co.nz...]

john_k

1:16 am on Feb 25, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ok, surely you have to submit your site to a search engine to get listed in the first place.
That is not correct. Unless a site and its pages are blocked, or the engine's indexing algorithms/rules filter the site out, Google and most other search engines follow the links they find to other sites and index the pages they reach. They don't need an invitation from you; a single link on some obscure page is enough.
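To make that concrete, here is a toy sketch (Python, standard library only) of how a link-following crawler ends up indexing a site nobody submitted. It assumes an in-memory dict stands in for fetching pages over HTTP; the names crawl, LinkExtractor, WEB and the example.* hosts are hypothetical, and a real crawler adds politeness delays, robots.txt handling, duplicate detection and a great deal more.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag in a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed, fetch, blocked=lambda url: False, limit=100):
    """Breadth-first crawl from seed; returns URLs in the order indexed.

    fetch(url) returns HTML or None; blocked(url) stands in for robots.txt
    or any other exclusion rule the engine applies.
    """
    seen = {seed}
    queue = deque([seed])
    indexed = []
    while queue and len(indexed) < limit:
        url = queue.popleft()
        if blocked(url):
            continue  # excluded pages are never fetched
        html = fetch(url)
        if html is None:
            continue  # dead link
        indexed.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return indexed

# Hypothetical three-page "web": nobody submitted obscure.example anywhere.
WEB = {
    "https://big.example/": '<a href="/links.html">Links</a>',
    "https://big.example/links.html": '<a href="https://obscure.example/">A small site</a>',
    "https://obscure.example/": "<p>Never submitted to any search engine.</p>",
}

print(crawl("https://big.example/", WEB.get))
```

The obscure site is reached purely through the link on big.example's links page. Passing a blocked function that matches obscure.example leaves it out, which is the "unless a site and its pages are blocked" case.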

john_k

1:34 am on Feb 25, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The fundamental problem is that the structure and usage of robots.txt are rooted in the idea of an internet that is non-commercial and where everyone plays fair and nice. Unfortunately that internet only existed for a few years (if at all).

In my opinion (worth what you have paid for it) snippets with search results constitute fair use because they present some context (most of the time) in which the searched terms are used.

It seems that a more robust version of robots.txt (e.g. a robotsII.txt) might give website owners the ability to allow indexing but also to:

- set snippet length limits,
- block snippets entirely,
- specify copyright tags,
- indicate watermark overlays for images,
- many others I'm sure.
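Purely as an illustration of that idea: no crawler reads a robotsII.txt, and the directive names Snippet-length and Copyright below are invented for this sketch. Assuming such a file existed, a search engine honoring it might parse the declared limits and trim its result snippets roughly like this (Python):

```python
from dataclasses import dataclass
from typing import Optional

# HYPOTHETICAL: "robotsII.txt" and its directives do not exist; they only
# illustrate the proposal above.
EXAMPLE_ROBOTS_II = """\
User-agent: *
Snippet-length: 20        # trim snippets to 20 characters (0 = no snippet)
Copyright: Example Corp   # notice to show alongside the result
"""

@dataclass
class SnippetPolicy:
    max_length: Optional[int] = None  # None: the site declared no limit
    copyright: Optional[str] = None

def parse_policy(text: str) -> SnippetPolicy:
    """Read the hypothetical Snippet-length / Copyright directives."""
    policy = SnippetPolicy()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "snippet-length":
            policy.max_length = max(0, int(value))
        elif key == "copyright":
            policy.copyright = value
    return policy

def render_snippet(text: str, policy: SnippetPolicy) -> str:
    """Apply the declared limit: no limit, no snippet at all, or truncation."""
    if policy.max_length is None:
        return text
    return text[: policy.max_length]  # a limit of 0 yields an empty snippet
```

Watermark overlays and the other wish-list items would need their own (equally hypothetical) directives. The point is only that such limits are cheap for an engine to honor once they are machine-readable.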

Similarly, legal penalties should be put in place to prevent rogue spiders from violating the limits laid out in robots.txt.

In fact, it seems that if a large search engine entity were to lay out such guidelines and follow them, they would gain further advantage over their competitors while also regaining some respect from the web development community that helped lift them to prominence.

paulroberts3000

11:06 pm on Feb 25, 2006 (gmt 0)

10+ Year Member



robots.txt won't be much help: these images were already stolen or hotlinked on sites not controlled by Perfect 10, then indexed by Google.

Clark

11:45 pm on Feb 25, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Let's face it. The Internet is broken. It needs to be rebuilt from the ground up, with a new protocol designed with spammers and thieves in mind. You can't trust senders, so all data needs to be double-checked. Each packet that goes out should be checked by pinging the sender with a hash or something similar.

The protocol needs to build in some verbiage whereby you accept certain policies vis-à-vis a search engine protocol. Search engines would also be bound by it if they wanted to play in that space. A new, enforceable robots.txt could be introduced.

I'm sure there are lots of holes in this idea, but there are valid complaints by webmasters who don't want their content "borrowed" and from webmasters who don't want to see search engines blocked from sending them traffic.

We need a protocol where such issues have a chance to be sorted out better.

HughMungus

12:34 am on Feb 26, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You could, but it would reduce SE usability by somewhere between 75 and 99 percent. It would make it vastly harder to find anything and would dramatically increase time wastage for every person on Earth with an Internet connection.
I'm certain that you're just making this misguided argument to be mischievous.

It would also be dependent upon a clear ruling that the page TITLE tag is not protected by copyright. If snippets were ever really ruled to be a violation of copyright, then this wouldn't exactly be a slam dunk. The only pieces of information you could use with any certainty would be the site name and the URL.

I'm referring to snippets being displayed. I've posted in the past about how snippets can provide me information I want without ever having to go visit the website. As soon as some webmasters and/or companies realize this, they'll sue (e.g., directory websites that list phone numbers, addresses, etc.).

jomaxx

5:03 am on Feb 26, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You don't think they already realize this? Anyway I think you're describing a situation that can hypothetically happen but does not usually happen, even when searching for small, discrete bits of information.

I have to track down this kind of data (e.g. tel. numbers, birth & death dates) quite frequently, and usually a snippet is not contiguous enough for me to have any degree of certainty that the data in the snippet is actually associated with the name I am searching for. Google helps me narrow down the search and find the website, but hardly replaces the website.

There are also several aspects of this scenario which are not at all analogous to the case under discussion, including the solution. Do you think the sites you are describing really want to be left out of Google's search results? Unlikely.

legalgeek

9:12 pm on Mar 3, 2006 (gmt 0)



Lawyer and recovering web geek here. I'm working on a hypothetical based on the Perfect 10 v. Google case, and I want to make sure that I get the technical facts right.

I know that Google permits exclusion of images by targeting the Googlebot-Image user-agent. But is there some way to prevent indexing of images by *all* bots?

Does the following do the trick, at least for gifs and jpegs?

User-agent: *
Disallow: /*.gif$
Disallow: /*.jpg$

Any better way to do it?
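A rough way to check what those two rules actually catch: the sketch below (Python; the function names are hypothetical) interprets Disallow paths the way Google documents its wildcard extension, where * matches any run of characters, a trailing $ anchors the end of the URL, and matching is a case-sensitive prefix match. It ignores Allow lines and the precedence rules between them, so treat it as an approximation of a wildcard-aware crawler, not Googlebot's actual code.

```python
import re

def rule_to_regex(path_pattern: str) -> "re.Pattern[str]":
    """Translate a Disallow path using Google-style wildcards into a regex.

    '*' matches any run of characters, a trailing '$' pins the end of the
    URL (path plus query string), and everything else is literal. re.match
    anchors at the start, so unanchored rules act as case-sensitive prefixes.
    """
    anchored = path_pattern.endswith("$")
    if anchored:
        path_pattern = path_pattern[:-1]
    body = ".*".join(re.escape(part) for part in path_pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

# The two rules from the question above.
DISALLOW = ("/*.gif$", "/*.jpg$")

def is_blocked(url_path: str, rules=DISALLOW) -> bool:
    """True if any Disallow rule matches (Allow lines are ignored here)."""
    return any(rule_to_regex(rule).match(url_path) for rule in rules)

for path in ("/photos/a.gif", "/a.jpg", "/a.jpeg", "/a.JPG", "/a.jpg?size=small"):
    print(path, "blocked" if is_blocked(path) else "allowed")
```

Under these assumptions the rules block /photos/a.gif and /a.jpg but let through .jpeg, upper-case .JPG, and any image URL carrying a query string, and bots that don't understand wildcards at all would ignore them entirely.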

What about preventing thumbnailing? I know there isn't a NOTHUMBNAILS counterpart to NOARCHIVE, but would something else do the trick?

TIA.

whoisgregg

10:08 pm on Mar 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Any better way to do it?

Not all robots support wildcards. However, if you place all your images in a single directory ("/images/") and disallow that directory in robots.txt, well-behaved bots won't fetch your images.
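The directory approach is easy to verify with Python's standard urllib.robotparser, which implements the basic robots.txt rules; the robots.txt content and URLs below are hypothetical. As far as I know the standard-library parser does plain prefix matching and doesn't interpret * or $, which is a handy reminder that a directory prefix is the form every compliant bot can honor.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every image lives under /images/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
"""

def allowed(url: str, user_agent: str = "AnyBot") -> bool:
    """Return True if a crawler that honors ROBOTS_TXT may fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

for url in ("https://example.com/images/photo.jpg",
            "https://example.com/images/thumbs/logo.gif",
            "https://example.com/gallery.html"):
    print(url, "->", "allowed" if allowed(url) else "blocked")
```

Both image URLs come back blocked for any user-agent and the HTML page stays crawlable. This only binds crawlers that choose to read robots.txt, and it does nothing about copies of the images hosted elsewhere.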
