Forum Moderators: Robert Charlton & goodroi


Any advantage to allowing site to be cached by Google?

         

jmorgan

8:24 pm on Sep 29, 2018 (gmt 0)

10+ Year Member Top Contributors Of The Month



I'm seriously considering removing my cached pages from Google by using the robots noarchive tag.

But before I pull the trigger, does anyone actually know of any advantages to having your page cached by Google? I can't think of any.

keyplyr

10:23 pm on Sep 29, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have blocked caching for about 15 years with absolutely no ill effect on index ranking or traffic.

The reason I started blocking Google, Yahoo, Bing, etc from caching my content is...

• I never gave permission for them to use my content as a feature of their search service (copyright)

• When they cache my pages, my ads are stripped.

• When they cache my pages, my server can't control the many security functions

• When they cache my pages, I lose that user data

• When they cache my pages, the user no longer needs to come to my site, visit other pages, buy products, click ads, etc.

• When they cache my pages, my branding is lessened

• When they cache my pages, scrapers have access to my content without the defenses I have on my server to block them

• When they cache my pages, it tells people "hey Google can copy someone's page, I can too then."

Leosghost

10:37 pm on Sep 29, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Word..

keyplyr

3:18 am on Sep 30, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Translators are also a scraper's haven. Stop both at the same time:
X-Robots-Tag: notranslate, noarchive
(add to server header via htaccess using your host's code recommendation.)
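
For readers on Apache, here is a minimal sketch of how that header might be added in .htaccess. It assumes mod_headers is available; the IfModule wrapper is an addition for safety, not something from the post above, and your host's recommended syntax may differ.

```apache
# Sketch only: assumes Apache with mod_headers enabled.
<IfModule mod_headers.c>
    # Attach the header to every response served from this directory tree.
    Header set X-Robots-Tag "notranslate, noarchive"
</IfModule>
```

One way to confirm the header is being sent is to inspect the response headers for a page in your browser's developer tools.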

aristotle

10:39 am on Sep 30, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



X-Robots-Tag: notranslate, noarchive

This doesn't specify which file types the tag applies to. Or does it implicitly apply to all file types?

lucy24

6:44 pm on Sep 30, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This doesn't specify which file types the tag applies to.
Detailed questions probably belong in a server-specific subforum. In the case of Apache: a directive lying loose in htaccess* will apply to all requests (or all responses, as the case may be). A directive tucked inside a <FilesMatch> envelope will apply to requests (or responses, which is a whole nother question) of the specified filetype. It is not easy to see how a “notranslate” directive can apply to anything other than a page containing text, but it may be less work for the server to slap the header on all files rather than take the time to evaluate filetypes.


* If you control your own server, you already know this stuff.
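
As a rough illustration of the <FilesMatch> envelope described above, the following sketch limits the header to HTML files. The filename pattern is an assumption chosen for the example; note that FilesMatch tests the file actually served, so pages generated by scripts or served via rewrites may need a different pattern.

```apache
# Sketch only: assumes Apache with mod_headers enabled.
<IfModule mod_headers.c>
    # Only responses whose filename ends in .html or .htm get the header.
    <FilesMatch "\.html?$">
        Header set X-Robots-Tag "notranslate, noarchive"
    </FilesMatch>
</IfModule>
```

Without the <FilesMatch> wrapper, the same Header line would apply to every response, as described in the post.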

Wilburforce

6:47 pm on Sep 30, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



does anyone actually know of any advantages to having your page cached by Google


Not by Google, no, but I have been assisted by evidence from Wayback Machine in two trade mark disputes (and it could also be useful in copyright or other intellectual property cases).

If you're going to do it I would therefore recommend excluding Google specifically, rather than excluding robots in general.
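
One possible way to target Google alone, sketched for Apache: Google documents a form of X-Robots-Tag in which the directive is prefixed with a crawler name. This example assumes mod_headers and makes no claim about how crawlers other than Google treat the prefixed value.

```apache
# Sketch only: assumes Apache with mod_headers enabled.
<IfModule mod_headers.c>
    # The "googlebot:" prefix addresses the directive to Google's crawler,
    # rather than to robots in general.
    Header set X-Robots-Tag "googlebot: noarchive"
</IfModule>
```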

Leosghost

7:05 pm on Sep 30, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



and it could also be useful in copyright or other intellectual property cases

Bear in mind that the wayback machine ( archive.org ) has no legal standing as an adjudicator on copyright or trademark cases, and, by copying websites and displaying them without written authorisation from those websites, is in complete breach of copyright law and copyright conventions..the same goes for their copying and displaying of any trademarks of yours that might be on your sites: that is unauthorised use of trademarks and, legally, trademark abuse..

If your site uses ads the wayback machine strips them..But allows the visitor to see the pages anyway..

Because your site ( images , text , pdfs etc ) can be viewed by a visitor to the wayback machine without them having to visit your site, allowing them can be a very bad idea..

Personally, I block them, began doing so as soon as I discovered what they did..

Wilburforce

8:02 pm on Sep 30, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



the wayback machine ( archive.org ) has no legal standing as an adjudicator on copyright or trademark cases


At the time of the cases I mentioned (both infringement of my registered trade mark, and both settled out of court) the UK Patents County Court would have been the arbiter. In the UK it would now be the Intellectual Property Enterprise Court (IPEC).

Wayback Machine archives are evidence. In a trade mark case, specifically, they are evidence:

1. that a trade mark was used at an earlier time (so that, where it is disputed, prior use can be established);
2. that the mark had been in the public domain;
3. that a website (advertising expenditure) had been employed in advertising the mark.

If you want to show those things clearly and simply, Wayback Machine is very useful. Other types of evidence lead the arguments down a rabbit warren of what was spent where, when and on what, and who could have seen it, and the more detail there is, the more room your opponent has to bury you in detail. If you want to show that your sign was on a hoarding in 1998, a time-stamped photo is about as good as it gets.

If there is any disadvantage to the fact of those archives, I have yet to experience it.

Google's cache, on the other hand, shows snapshots of content that are much too recent to carry any historical weight, so are useless for IP, or for any other benefit to the webmaster that I can think of.

keyplyr

8:34 pm on Sep 30, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The Wayback Machine infringes copyright by caching your web property without permission, the same as any other caching, period.

Whether you allow that or not is up to each webmaster. I do not.

You don't need a remote copy of your intellectual property to prove ownership. Letting Archive.org (Wayback Machine) scrape your content and put it on their server causes numerous problems (listed above.)

I've been blocking Archive.org for years. They keep changing IP ranges and User Agents attempting to sneak past my defenses, but they're pretty stupid about it so they're always caught.

Leosghost

8:41 pm on Sep 30, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If the trademark was registered, which it should be, you'll have the registration documents, which are indisputable legal proof , with the date of registration..and the areas of registration ( both geographic and type of goods or services ).. Allowing the wayback machine to use your trademark on an unauthorised copy of your site could ( if those disputes had gone to court, and the opposing lawyers were competently doing the job that they were being paid for ) be given as indisputable evidence that you did not protect the use of your trademark..

That is a reason ( as any trademark lawyer worthy of the name would tell you ) to block the wayback machine..

Registering a trademark ( where I am ) is not expensive, around €300.00 ..I have several..and had several when I lived in the UK..
I have had to defend some of them in court ( in both jurisdictions, and others covered by the registrations ). In each case the fact of having legal, dated registration documents was the only thing taken into account by the courts..I won each dispute; some were not disputes, they were myself claiming damages directly via the courts in question.."Out of court" via the wayback machine may work, but legal, dated registration documents win every time over a "no legal standing" website that copies other websites, however much they may dress it up as some sort of "public service"..

They ( internet archive ) are the Internet's "self appointed librarians", except they pay nothing to the authors for the books that they copy without permission..They are in fact a scraper with good PR..

Wilburforce

9:45 pm on Sep 30, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If the trademark was registered, which it should be, you'll have the registration documents, which are indisputable legal proof , with the date of registration


Which is fine unless - as in my first case - the infringer claims to have used the mark before you registered it. It then comes down to who first used the mark.

Leosghost

10:05 pm on Sep 30, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The infringer would have to have proof of "prior registration" ..or have their case thrown out of court..
Possibly a use of the trademark on correspondence sent to a govt agency ( like tax authority ) might count as "use" ( and in some cases use without registration can be accepted ), but archive.org is legally just a website like any other..It is not equivalent to a "public notary"... or a "huissier".. Again a competent lawyer would have won the case without any involvement of what was on archive.org..

btw..here one can have use of a trademark ( prior to registration ) done online via a huissier ( French official court officer / bailiff ) for less than €25.00...This is legal proof of "use" valid for all EU countries and AFAIK also for USA and Canada and various other countries..ie, those countries accept the document as legal proof of use ..Registration is a separate procedure, which, as it can take a while for the searches to be done, is usually best preceded by registration via an huissier..

Agreed that Google cache is completely worthless and counterproductive, but in my opinion ( and that of a friend and trademark lawyer ) allowing crawling and caching by archive.org is not a good idea.. for the reasons given re "non defence of trademark" if they are allowed to use it on their cache..We'll have to agree to disagree :)

Wilburforce

10:35 pm on Sep 30, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I think we're getting drawn into a discussion on IP that the OP did not intend. All I wished to point out is that there may be valid reasons to allow archiving of your pages by other organisations than Google.

For the record - and for the purposes of copyright and trade mark law in the UK - making a work available to the public via access in an archive does not constitute publication, and so does not infringe the IP rights of the owner.

Leosghost

11:31 pm on Sep 30, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Would depend on how, and who ( such as courts ) gets to define what is an "archive"..many people ( including some non specialist judges ) would probably confuse "cache" and "archive" when related to the internet..A lot of webmasters don't seem to understand "cache", and don't realise quite what the "Google cache" is. Plus it is a variable item ( or at least was ) at one time the images shown on a complete webpage in Google's "cache" were not "cached" but were hotlinked directly from the original website..Now they are stored ( if one allows Google to index the images ) on Google's own servers..

If Google or anyone else chose to call their website or its "cache" an "archive" ( are there restrictions on the use of the word archive? ) where would that leave webmasters and IP and TMs etc? ..I tend to think of "organisations" as being governmental entities, or at least non-commercial entities..

Returning to Google cache in particular, I notice that when using Google for search , during the past year or so, fewer pages in any SERP have the "cache" available, and many of those that Google do indicate in SERP that they have a "cached copy" if the "cache" link is clicked upon actually lead to Google's "we don't have that " page..If you load the "true page" from SERP, and look at the source code , there is no mention of "no-cache"..It seems that Google are conserving resources on many pages and not keeping a cache, even when not blocked from doing so..

NickMNS

1:01 am on Oct 1, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Some small side points...
Plus it is a variable item ( or at least was ) at one time the images shown on a complete webpage in Google's "cache" were not "cached" but were hotlinked directly from the original website..Now they are stored ( if one allows Google to index the images ) on Google's own servers..

I recently noticed that images in image search are also no longer hot-linked but stored on their servers. My images are SVG and appear in image search as PNG format.

... and many of those that Google do indicate in SERP that they have a "cached copy" if the "cache" link is clicked upon actually lead to Google's "we don't have that " page.

This is likely due to a bug discovered and discussed in another thread, where Google has an issue resolving the parameters in its links if the parameter is for a site that is https. Simply modifying the parameter by deleting the "s" of https will show the correct page. Details here: [webmasterworld.com...]

tangor

1:42 am on Oct 1, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I just keep it simple. Crawl all you like, but you can't keep it.

It is mine, not yours.

YMMV.

As for copyrights/trademarks--you do that the right way, correct? Registered the content with the appropriate bureaucracies, right?

Leosghost

2:10 am on Oct 1, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This is likely due to a bug discovered and discussed in another thread where Google has an issue resolving the parameters in its links if the parameter is for a site that is https, simply modifying the parameter by deleting the "s" of https will show the correct page

Just tested this..here, Google's cache is resolving parameters on https sites fine..I'll watch out for the next cache that 404s to see what is in the address bar, but just tested about 10 or so and they didn't 404..could be they fixed it, could be that what I see sometimes is not due to a bug..or is a different bug..

keyplyr

2:15 am on Oct 1, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I wouldn't call the Mobile-first Index Update a bug.

lucy24

2:19 am on Oct 1, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



My images are SVG and appear in image search as PNG format.
They've been doing that for ages. It was very tricky to spot, because what they'd do is request your image file when it came up in a search, but what they actually showed the searcher was a png version. Almost impossible to see what's going on unless you've got an extremely small (but fully indexed) site.

NickMNS

12:36 pm on Oct 1, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I wouldn't call the Mobile-first Index Update a bug.

What does mobile index have to do with anything?

justpassing

3:37 pm on Oct 1, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



Any advantage to allowing site to be cached by Google?

Your content can still be accessed when your site is down, but is it an advantage...

Leosghost

5:08 pm on Oct 1, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Any advantage to allowing site to be cached by Google?

Your content can still be accessed when your site is down, but is it an advantage...

Imagine that put in other, only slightly different ways..
People can still visit your home when you are out, but is it an advantage ...
Or..
Your content can still be accessed, and copied, when your site is down, but is it an advantage...

See where I'm going with this ... ;)

justpassing

5:15 pm on Oct 1, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



See where I'm going with this ... ;)

Don't you say in France "Tous les chemins mènent à Rome"? :)

Leosghost

5:39 pm on Oct 1, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Practically ..in France ..the roads, all go to Paris.. ;)

tangor

10:06 pm on Oct 1, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Last time I looked Paris wasn't Google...