Forum Moderators: open
I find it hypocritical that the NYT, after using DMOZ (and hence Google) to promote its commercial products FOR FREE, wants to hassle Google over its temporary caches. I say, remove all NYT pages from DMOZ and Google directories. Let them pay $299/year for "review" by a well-known directory for each of those pages.
If they don't want it cached, then put the tags on the page.
Problem solved. But that's too easy, and nobody can sue anybody over it.
People put stuff on the WORLD WIDE WEB and then get upset when someone actually finds it.
For crying out loud, have all the sane people left the planet?
[edited by: mrguy at 11:30 pm (utc) on July 9, 2003]
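For reference, the tags in question are the standard robots meta directives; a page that shouldn't show up in Google's cache would carry something like this in its head:

```html
<!-- tells all engines not to show an archived/cached copy -->
<meta name="robots" content="noarchive">
<!-- or target Googlebot specifically -->
<meta name="googlebot" content="noarchive">
```

Either line leaves the page indexed and listed in the SERPs; it only removes the "Cached" link.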
Actually there has been a lot of discussion in blog circles about the NYTimes losing out on Google traffic because of hiding their content, and about how blogs and other sites offer news-related content (flame away), and that the NYT is shooting itself in the foot, opening the door for independent journalists/news sites and *cough* bloggers to fill the gap. They didn't like the NYT hiding its content and then blaming the blog community for filling in the SERPs.
What's weird is that I thought ALL of the NYT stuff was behind a password anyway. I remember signing up for a username the week it became possible, and my cookies always let me in.
GG, maybe it's a situation where there's a 7-day free window that gets cached, and then the article goes pay-only but stays in the Google cache for a while? Anyone who DOESN'T have an NYT username/cookies know how NYT handles this for the outside world?
>>If they don't want it cached, then put the tags on the page.
Sorry, doesn't work for me. What if I came by and "cached" all your content? And then 50 more people came by and "cached" your content? Are you going to ask people to start loading up their pages with different "nocache" tags?
What if I decided to "cache" Google's SERPS and make them available on my site? No difference that I can see. In fact, what if people suddenly decided to "cache" all the content here at WebmasterWorld and slap it up on their sites? I think Brett and the people here would throw a fit.
There's a huge difference between indexing content and copying the content. Google is copying your content and allowing people to view it from their site. The snippet they provide is enough and falls under fair use. I shouldn't have to take an extra step to keep anyone from copying my content and making it accessible on their site. After all, it is my content, what in the hell is it doing on their site?
Also, I think it is wrong that sites with the noarchive tag do not get a fresh date; after all, the date reflects the content, not the cache.
Plus, the cache often shows outdated information, which is no longer what your site is about.
But every smart webmaster knows that when you put something on the web, it can get grabbed by a search engine, unless you use robots.txt or the nocache and noarchive meta tags.
I was a consultant for one of the biggest independent newspapers here last year, and we managed the site that way: a short summary of each article, tagged noarchive and nocache, to give readers a reason to subscribe. In August 2002 (before the makeover) they received about 3k visits/month from Google. In January 2003 it was 100k/month ;-)
GoogleGuy: the only thing bugging me (and good ammo for lawyers in this case) is why Google doesn't appear in the WayBackMachine anymore in 2003?
I don't know the legality of it at all, but I hope it's not stopped.
So it's okay if I jack all your content as long as I make sure it doesn't blend in with my site? I just need to add some disclaimer?
This is DigitalGhost's cache of ht*tp://www.yoursite.com/.
DigitalGhost's cache is the snapshot that he took of the page as he crawled the web.
If you'd like to link to the original site, please, use my cache link.
Not to mention that sites using absolute positioning look like merde in Google's cache.
I don't know how to make this any clearer, but I'll certainly give it a shot. It doesn't matter if the copyright information is left intact or not and referencing the original doesn't give anyone permission to reproduce the content. Permission must be sought after and received.
Allowing anyone to reproduce your content without your permission sets a dangerous precedent. If you allow Google to reproduce your content without your permission, how can you protect your copyright against others?
If there are people that conclude that merely keeping the copyright information intact and providing a link to the original frees the content "borrower" from copyright constraints then I'd like to know how they arrived at that conclusion. The courts certainly don't agree with that line of reasoning.
There is a huge difference between an ISP caching a website and Google's cache. One thing is plumbing, the other is REpublishing. I say plumbing because digital data is not like buying a book. You don't physically take ownership of the electrons when you view a website. You are always looking at a cache on your computer when you deal with digital content.
But with an ISP cache, you type xyz.com and you receive the same content you would if they didn't cache. Google's site does NOT look the same. At an ISP, if you try to buy a product, it isn't through a cache. If you use JavaScript in the Google cache, it won't even work...
As SE-aware people, we love the feature. I really enjoyed seeing a potential date's blog. It was fascinating. But the injured party in a case of copyright infringement doesn't always feel as good as the party enjoying the benefit, just as my would-be date would not have been happy that I found out how much she liked Tom Green.
And we have to be fair to the owner of the content. That's why I say the default should be caching turned off and you can put in a tag to request caching.
Just my opinion, and it's one I came to AFTER reading DH's post. I will personally lose out if the cache goes away but I think it's the right thing to do....
>>people can sometimes call up snapshots of archived stories at NYTimes.com and other registration-only sites.
If that's true, where the heck did the bot get a password to access the registration only section?
If a bot can read it, humans can read it without the bot.
The Google cache sort of bothers me as well. At least it gets updated pretty frequently but it is still copying my page. I don't see that the rare occasion when a site is down is worth having the cache. OTOH I won't block it. I just wonder if Google should rethink it.
>>Permission must be sought after and received.
Permission is granted or denied in the robots.txt and page meta tags right? I'm not a robots.txt attorney, but it seems to me that Google's database archive of their old stuff, of our old stuff, that was allowed via robots.txt, is forever harmless unless it can be proven that they violated the robots.txt.
And password-protecting the news is a stupid business model that's costing them more. I sort of feel sorry for the NYT SEO staff if there are any.
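For what it's worth, robots.txt and the meta tag aren't interchangeable: robots.txt keeps the bot away from the pages entirely, while noarchive only suppresses the cached copy. A hypothetical robots.txt for a registration-only archive (the /archive/ path is made up for illustration) might look like:

```
# keep Googlebot out of the paid archive altogether
User-agent: Googlebot
Disallow: /archive/

# robots.txt has no cache-only directive; for that,
# the noarchive meta tag has to go in each page's head
```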
I really hate the WayBackMachine. I don't want anyone ever to see my early attempts at HTML
It's a neat idea - kind of nostalgic to see what Yahoo looked like in 1996. But yes, my site sucked when I first wrote it, too.
Fortunately it's pretty hard to find anything on the WBM.
Only because most people don't know about it.
>>And password-protecting the news is a stupid business model that's costing them more.
I don't mind that they want to charge people for reading articles. But it seems futile. If I can't get free information from them, I'll just get the same info somewhere else.
>>is forever harmless unless
Yes, if you don't want it to be seen, then you shouldn't let the bots get to it.
But I also understand DigitalGhost's opinion. If Google is allowed to cache your website and make that cache available to the public, why shouldn't others be allowed to?
The noarchive meta tag is working the wrong way around. There should be a tag like archive="yes", so that every site with this tag in the head gets cached and all other sites don't. That would be clear permission from the owner that Google is allowed to cache the site.
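As a sketch of the difference, here is today's opt-out tag next to the opt-in tag being proposed (the archive value is hypothetical; no engine actually supports it):

```html
<!-- today (opt-out): a page WITHOUT this tag gets cached -->
<meta name="robots" content="noarchive">

<!-- proposed (opt-in, hypothetical): only pages WITH this tag would be cached -->
<meta name="robots" content="archive">
```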
By the way, what do you think about archive.org? Copyright violation?
If the cache shows "pay for" content, then that is cloaking, since Google can see it but a user can't. Anyone doing so is breaking Google's TOS and should be banned, not complaining about copyright infringement. Besides, in that case people have SPECIFICALLY taken an action to defraud Google, and then want to complain about copyright theft. SHEESH!
Secondly, if you make money on ads, then you are likely rotating them, and they still probably show up in the cache; again, no loss of income. NOTE: Imagine getting paid for a clickthrough from a Google AdSense link on a cached copy of your page. Is that even possible? Tripped out if it is!
Thirdly, why the fuss? This topic is dead and buried!
In any case, my personal hope is that this eventually ends with the idea that content posted in a public space, like an unrestricted website, is a lot like making a speech on a street corner and is public domain. I don't think it's particularly dangerous to other copyright issues either.
Isn't there a site republishing a lot of GoogleGuy's comments from this forum? Maybe we'll get to see if these copyright concerns hold water at all. I don't think they do, but it'd be interesting to see who owns GoogleGuy's public comments-- the forum operator or Google? Maybe I can put a chalkboard by the sidewalk for people to write poetry on and then defend my copyright on it.
>If Google's wrong then so is just about every ISP
First, what google does is NOT caching and does not meet the definition of caching anywhere on the net. Placing a branding ad at the top of 3+ billion pages is not caching. Google has perverted the use of the word "cache".
We have looked at this issue in depth. I have asked every net attorney I know about it. At least a dozen knowledgeable internet and tech attorneys have told me flat out that Google's cache would not hold up in court. They have no legal leg to stand on. The "safe harbor" ISP exception of the DMCA is not applicable to Google, since Google does not meet any definition of caching.
>What's wrong with meta "NOARCHIVE"?
Because it is opt-out. You can't opt out of illegal matters. It's like saying that if you don't have a sign in your front yard saying stealing is not OK, then anyone can help themselves to your stuff.
>has a robots.txt
It is not a web standard accepted by any standards body. It has never been used in court, nor is it even admissible as evidence. Again, it is opt-out.
> not only do they make abundantly clear the location
> of the original, they also leave the
> copyright information intact
Yes, they put a black and white Google branding ad at the top of the page, where the insinuation is that it is Google's content.
> cache issue has been discussed (here)
> time & again for at least three years.
Yep, the first time I saw Google in '98 I asked Page **** about the page jacking.
"Many of us copyright lawyers have been waiting for this issue to come up: Google is making copies of all the Web sites they index and they're not asking permission," said Fred Lohman, an attorney at the Electronic Frontier Foundation. "From a strict copyright standpoint, it violates copyright."
In other words, Google has had 5 years to explain how they feel the cache is legal and why it is really used - they never have.
The Google cache is what built Google. No cached pages - no Google.
There are lots of services that share your information without you opting in. For instance, a phone book will display your name, address, and phone number unless you opt out.
I like Google caching my sites: it saves on bandwidth and provides a copy of the site just in case the server goes down temporarily. If you don't want your information "cached", then either 1) take it off the internet, 2) use the noarchive tag, or 3) use robots.txt.
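Since several posts above boil down to "check the page for a noarchive tag", here is a minimal sketch, in Python, of how a crawler might make that decision. The parsing is deliberately simplified (attribute order is assumed, and a real crawler would use a proper HTML parser plus a robots.txt library):

```python
import re

def may_archive(html: str) -> bool:
    """Return False if the page carries a noarchive robots meta tag."""
    # Look for <meta name="robots" content="..."> or name="googlebot",
    # case-insensitively; assumes name comes before content.
    pattern = r'<meta\s+name=["\'](?:robots|googlebot)["\']\s+content=["\']([^"\']*)["\']'
    for match in re.finditer(pattern, html, re.IGNORECASE):
        directives = [d.strip().lower() for d in match.group(1).split(",")]
        if "noarchive" in directives:
            return False
    return True

page_ok = '<html><head><title>news</title></head><body>story</body></html>'
page_no = '<html><head><meta name="robots" content="index, noarchive"></head></html>'
print(may_archive(page_ok))   # True
print(may_archive(page_no))   # False
```

The point is just that the opt-out check is trivial for a crawler to honor; whether opt-out is the right default is the whole argument of this thread.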