
Webmaster General Forum

Lawyers Using Your Own Web Site Against You
Your Website May Incriminate You
incrediBILL




msg:3729636
 11:13 pm on Aug 23, 2008 (gmt 0)

PITFALLS OF SAVING YOUR SITE FOR POSTERITY

Search engines automatically cache your pages and something called the Internet Archive, or Wayback Machine, also comes along and makes a permanent copy of your site for "posterity". The problem starts when you realize you may have content on your web site that could result in legal issues. You may act quickly to resolve those issues yet the problems still remain without your knowledge because you didn't act as quickly as all the robots crawling your site.

Unfortunately, legal beagles love that your site was saved for "posterity" when gearing up to file a lawsuit so although you've already done the right thing by cleaning potentially harmful things off your site, the tireless automatons crawling the internet have made sure there's plenty of evidence and the next thing you know, you're about to get hung out to dry.

If you think the lawyers aren't technically savvy, think again:

Browsing a party's Web site will only show the information that the Web site owner currently wants visitors to see. Sometimes, the most valuable information about an opposing party is the information that has been changed or removed. Fortunately, there are ways to see older versions of Web pages. Pages that were changed recently can be viewed through Google's cache feature. Pages that were changed months or years ago may be available through the Internet Archive, also known as the Wayback Machine.

[law.com...]

Not only can they find your content, they do it under cloak without your knowing about it!

Viewing these older versions of Web pages avoids the privacy risks discussed above: The copied pages are not on the company's Web site, so the company has no record of the researcher's activities.

You can forget your rights, just throw them out the window, because the history of your website is already busy squealing on you without your knowledge or permission.

HOW DO YOU PROTECT YOUR SITE FROM HISTORICAL SNOOPING?

Obviously the simplest way is to keep your nose clean so nobody has a reason to be snooping in the first place.

However, this is the internet and you have to OPT-OUT of things to protect your rights.

Here are a few preventive ways to stop your website from being archived and used as a snitch:

USE NOARCHIVE

Make sure you include the NOARCHIVE meta tag in each web page so that there is no cache in any of the major search engines.

<META NAME="ROBOTS" CONTENT="NOARCHIVE">
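To confirm the tag actually made it into every template, a quick check can be scripted. This is only a minimal sketch using Python's standard library; the names `RobotsMetaParser` and `has_noarchive` are made up for illustration, and it looks only at the robots meta tag, not at what any particular search engine does with it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noarchive(html_text):
    """Return True if the page carries a robots meta tag with NOARCHIVE."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return "noarchive" in parser.directives
```

Feed it the HTML of each page you care about; a False result means that page is missing the tag.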

USE ROBOTS.TXT

Block all of the archive site spiders, such as used by the Internet Archive, in your site's robots.txt file with an entry as follows:

User-agent: ia_archiver
Disallow: /
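If you want to sanity-check the entry before uploading it, Python's standard `urllib.robotparser` can read the same rules. This is a rough sketch, and the helper name `is_blocked` is made up for illustration. It only shows how a well-behaved parser reads the file, not what any given crawler will actually do.

```python
from urllib.robotparser import RobotFileParser

# The same two lines as the robots.txt entry above.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_blocked(robots_txt, user_agent, url="/"):
    """Return True if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)
```

With this entry alone, `is_blocked(ROBOTS_TXT, "ia_archiver", "/any/page.html")` comes back True while other user agents remain unaffected.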

The Heritrix software [crawler.archive.org] used by the Internet Archive is Open Source, which means there are more archives out there, possibly using variations of Heritrix that ignore robots.txt and cloak their access to your site.

HELP FOR HOSTED BLOGGER ISSUES

If you're running a blog hosted on a 3rd party service like Blogger or WordPress, your options may be limited to embedding NOARCHIVE, which the Internet Archive ignores, meaning anyone running stock Heritrix code would also ignore it by default.

The only way you can exclude your site, according to their site [archive.org], is to contact them directly. Obviously not enough businesses and sites in general are aware of the perils posed by the Internet Archive, or it would honor the NOARCHIVE tag for those sites with limited access and no robots.txt just to avoid a flood of emails.

OTHER POTENTIAL RISKS

Snap.com has taken screen shots of every web page, then Ask started taking limited screen shots, as have some new completely graphical search engines like SearchMe. Some screen shots have minimal resolution and are too tiny to read, but others, like those from Snap and SearchMe, are big enough to read, and these too can be called as evidence in a lawsuit. Even the tiniest thumbnail can still show a licensed trademark being used without permission.

Some of the social bookmarking sites, such as Kaboodle, Jeteye and Eurekster, allow large chunks of content to be copied, some using tools like Heritrix (see above) to make small archive copies of specific content.

SUMMARY

Obviously there's no way you can completely stop anyone from making copies of your site, but it may pay to be diligent in keeping technologies that provide any form of archive off your site.

This is just another form of insurance that could, in the end, save your business, your house, your car, your family...

 

Quadrille




msg:3729884
 9:28 am on Aug 24, 2008 (gmt 0)

Independent archives of your site can be useful supporting evidence - for example, evidence that you published something before a copyright thief did.

I know it's fashionable to live in fear - and 'fight for your rights' - but think before you dispense with every archive; for most people and sites, there is little to fear (no guarantees, of course!), and the likelihood of fighting to get thieves and frauds off your pages may be a much bigger issue.

I have lived the situation of having content stolen, and the thief claim he published first, and I copied him. Archive.org proved him a cheat and a liar - and got his site removed from Google within 24 hours, and removed from the web within 72 hours.

I know it's also fashionable to treat Google et al (but especially Google) like Big Brother or the Devil himself. For most of us, that is just not the reality.

It's good to have this advice available, but before use, decide if it really applies to you and your sites. Fighting every 'Google Bashing Campaign' for the sake of it may prove to be a mistake, as well as providing potential camouflage for some cloakers.

And for the record, yes, I'd much prefer these archives were not so public, as they do get abused.

Lord Majestic




msg:3729887
 9:58 am on Aug 24, 2008 (gmt 0)

You can forget your rights, just throw them out the window, because the history of your website is already busy squealing on you without your knowledge or permission.

The accused have a fair few rights in decent countries, but this does not include destruction of evidence - effectively this is obstruction of justice.

This stuff is only useful to those who know they are doing something wrong and want to avoid having 3rd party evidence of their actions - be it the Internet Archive or Google's cache.

I think this paranoia is totally unnecessary: if you are that paranoid don't put up public content. It's like shouting something on the street that you don't want people to have a record of, and then trying to confiscate people's recording devices to avoid being incriminated - the solution in this case is not to do the wrong thing in the first place.

skipfactor




msg:3729938
 3:09 pm on Aug 24, 2008 (gmt 0)

not to do the wrong thing in the first place...

It's not always intentional and most are forever maturing in their online activities. I think no archive is sound advice for any webmaster "talking" online. One day you'll be glad you did with like the biggest sigh ever...

thecoalman




msg:3729965
 4:42 pm on Aug 24, 2008 (gmt 0)

In this day and age, with the sue-everyone-for-any-reason mentality, I don't think it's paranoia but being realistic. I've been considering doing this for some time now so I'm glad this topic came up.

My question other than users no longer being able to view the cached version which can be beneficial in a lot of cases are there any other "gotchas"?

incrediBILL




msg:3730017
 7:20 pm on Aug 24, 2008 (gmt 0)

Archive.org proved him a cheat and a liar - and got his site removed from Google within 24 hours, and removed from the web within 72 hours.

See? Had he listened to my advice he would still be online! ;)

What would you have done if Archive.org had indexed your content on his site prior to re-indexing YOUR site and by the nature of the crawl patterns shown the infringer to have posted it first?

Then you would be the one in trouble possibly losing your site if he made a counter-claim.

Archive.org can be a double-edged sword.

My question other than users no longer being able to view the cached version which can be beneficial in a lot of cases are there any other "gotchas"?

I've been running a high traffic site (all organic traffic) without cache or internet archives for 3 years now without issue, just like WebmasterWorld does as well.

but this does not include destruction of evidence - effectively this is obstruction of justice

There's no destruction of evidence, only the lack of evidence in the first place.

The site owner doesn't have to be the wrong doer, it could be a comment left in a forum or a blog slandering someone or some company that the moderators later erased, yet it gets cached nearly as fast and possibly archived. Depending on who was being slandered, the reputation management surveillance sites could pick up on this very quickly and the next thing you know you're caught in the middle supplying subpoenas for IPs and user accounts simply because the DELETE COMMENT didn't really delete the comment!

Lots of scenarios are possible, and disallowing cache and archives means DELETE is really DELETE!

Fighting every 'Google Bashing Campaign' for the sake of it

First, it's not paranoia or Google bashing, it's content control. Copyright was never an OPT-OUT process to protect before the internet came along and SE's introduced cache and then the archives followed. Now you lose control of your copyright the minute your content goes online unless you tell everyone to back off.

This particular thread is about data mining by lawyers but cache and archives have been data mined by scrapers, hackers, competitors, snooping SEOs, surveillance and reputation monitors, and a lot of other companies that want to get information about you without you knowing they're even looking.

In the past I might've called some of this paranoia too but I've seen a few automated internet surveillance sites show up reading my site daily. When I blocked them they started reading Google cache. I know this because they loaded my web tracker built into the page. Don't know what they were looking for, don't care, because it's out of their reach now.

Besides, you never know WHO might be looking at your archives.

How about Picscout and Getty? They crawl the web looking for infringing images on web sites and then sue for massive usage fees. If you had already cleaned these images off your site would you want Google Image search or the Internet Archive to still have a cached copy that could result in costing you thousands for a past infringement? In the past Getty has claimed simply removing the images from your site wasn't enough [webmasterworld.com], they wanted the money if they found proof you infringed, simple as that!

Could you imagine being turned down for cheap life insurance because of some old sky diving photos you posted 5 years ago?

Worse yet, being turned down for a job because of something your potential employers found in the internet archive that used to be posted on your site?

It could be as simple as something you post today, text or images, turning out to be a tad embarrassing somewhere down the road, such as when TV Polonia [archive.org] lost a case against Echostar thanks to the Internet Archive, and I'll bet they wish they had opted out! ;)

If you're a bit radical in your online youth (or old age in some of our cases) beware, because the FBI recently issued a demand for records [nytimes.com] to the Internet Archive and God only knows what they wanted, and he's not talking.

You never know how your website today could impact your future, so my advice is to make sure that what happens on your website stays on your website ;)

[edited by: tedster at 11:47 pm (utc) on Oct. 3, 2008]
[edit reason] fixed a link [/edit]

DavidKeffen




msg:3730026
 7:44 pm on Aug 24, 2008 (gmt 0)

I'm inclined to agree that we do need to be careful. For instance one or two of my businesses require occasional changes to our Terms and Conditions, Privacy and various business procedures pages.

Although not high on my list of worries (that's a long list BTW) I've always wondered if a client might print off an archived copy of out-dated pages such as these and then find a way to use them against us.

Probably will never happen, but then again in the UK the whole nation is being actively encouraged by ambulance chasing lawyers to look for any excuse to bring litigation against all and sundry. Day time television is littered with ads by these guys.

Whilst I'm not at the tinfoil hat stage of paranoia, I'm starting to take such things more seriously.

Quadrille




msg:3730062
 10:19 pm on Aug 24, 2008 (gmt 0)

See? Had he listened to my advice he would still be online!

What would you have done if Archive.org had indexed your content on his site prior to re-indexing YOUR site and by the nature of the crawl patterns shown the infringer to have posted it first?

Then you would be the one in trouble possibly losing your site if he made a counter-claim.

Archive.org can be a double-edged sword.

Actually, no.

I can always prove my ownership of my material; but archive.org can provide a quick and easy way to shortcut having to fax multiple documents to Google and the cheat's host.

Without archive.org, or in the scenario you've imagined, proving it would certainly have been more difficult.

I suspect my actual experience is much more common than your possible one.

Paranoia is fine, and can be a wise precaution - but you can take it too far, and I suspect this archive witch hunt has some other motive, though I have no idea what that might be.

I remain convinced that for the vast majority of sites, the ability to rescue lost copy, and deal easily with stupid copyright thieves much outweighs any civil liberties arguments. But then, I've never been impressed by civil liberties arguments, so maybe I'm missing something. :)

incrediBILL




msg:3730094
 12:46 am on Aug 25, 2008 (gmt 0)

It's not really a civil liberty issue IMO, it's like I said, when you delete something it should simply stay deleted and not come back to haunt you in the future.

Ever have a customer look up your cache page to ask if he could get yesterday's (or last week's) price?

If you don't have the cache you don't have the potential problem.

As far as backups are concerned, try burning a CD of your content and make a copy to a 3rd party online storage company, lots of options exist. As a matter of fact, most websites will zip down to less than 10MB which you can email to yourself on Yahoo Mail or Gmail and use their unlimited email accounts for backup storage purposes.

I remain convinced that for the vast majority of sites, the ability to rescue lost copy, and deal easily with stupid copyright thieves

The Internet Archive doesn't always index all pages for a site, nor does it seem to archive all the images, for whatever reason. Additionally, it doesn't work well at all for dynamic sites or for sites with massive amounts of content, so relying on it for anything isn't really advised.

Making your own copies and storing them yourself with regularly dated archives of your own are your best bet.
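For the do-it-yourself route, a dated zip of the site directory is easy to script. The following is just a sketch under assumptions: the function name `make_dated_backup` and the `site-YYYY-MM-DD.zip` naming are made up for illustration, and it assumes the backup folder lives outside the site folder so the archive doesn't try to include itself.

```python
import datetime
import pathlib
import shutil


def make_dated_backup(site_dir, backup_dir, today=None):
    """Zip site_dir into backup_dir as site-YYYY-MM-DD.zip and return its path."""
    today = today or datetime.date.today()
    backup_dir = pathlib.Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    # make_archive appends the ".zip" extension to the base name itself.
    base_name = backup_dir / f"site-{today.isoformat()}"
    archive_path = shutil.make_archive(str(base_name), "zip", root_dir=str(site_dir))
    return pathlib.Path(archive_path)
```

Run it from a scheduled job and copy the resulting file to whatever offsite storage you prefer.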

[edited by: incrediBILL at 12:50 am (utc) on Aug. 25, 2008]

Lord Majestic




msg:3730095
 1:09 am on Aug 25, 2008 (gmt 0)

If you are so paranoid about these things then you should consider that the course of action that you suggest taking is highly abnormal. It effectively shows the intent to hide something, when you do it then you won't be able to quickly resolve any matter by saying it was a mistake - such deliberate highly unusual acts would show you in a very bad light. Any decent lawyer would point to these things as a sign that you knew that you were doing something wrong and took action to prevent it being discovered.

You are effectively trying to defend yourself using "security through obscurity" hoping that you "plug leaks" of some data that can be stored in IA or Google cache. If you did something in the past that you are not very proud of now then the solution is not to "learn your lesson" and avoid such things being known, but not to do these things in the first place.

Quadrille




msg:3730111
 1:49 am on Aug 25, 2008 (gmt 0)

Can you imagine a newspaper publisher trying to retrieve and destroy every single issue they printed?

Yes, it's the Internet, and "you can" destroy archived copies, or prevent them; looks like the death knell for "Publish And Be Damned".

Strikes me (as an ex journalist) that it's tragic if news becomes 100% ephemeral, because publishers are scared to let their words be preserved.

And of course it's all pointless; the one person who feels he was libelled (or just wants to make trouble), doesn't need an archived copy - he will download and preserve his own.

Is this really just to give cloakers some long grass in which to hide, or is there an entirely different agenda here? It's hard to believe that a thread per week needs to be created just out of archive-phobia!

[edited by: Quadrille at 1:59 am (utc) on Aug. 25, 2008]

incrediBILL




msg:3730151
 4:33 am on Aug 25, 2008 (gmt 0)

Can you imagine a newspaper publisher trying to retrieve and destroy every single issue they printed?

Funny you mention newspapers, as a Belgian court got Google to remove all news and photos, including from "cache":

[chillingeffects.org...]

Find that the activities of Google News and the use of the "Google cached" violate in particular the laws on copyright and ancillary rights (1994) and the law on data bases (1998);

It effectively shows the intent to hide something

I know you're a search engine guy so stop trying to vilify my actions just because I don't like search engines playing fast and loose with my content. For some reason you think that my wanting to help people control content they own is a bad thing and it's not. It's called copyright, that amazing little thing SE's would like to wish didn't exist.

You want to make a search engine and show snippets, fine.

You want to publish every page I own and claim it's "fair use" as "cache" so you can retain the customer longer so they don't go to the actual webmasters site and are prone to click on the search engine's own ads more, no fine.

Besides, I have nothing to hide, just visit my site because that's where you'll find my content, not in cache, not in an archive, so no skulking around snooping, if you want to snoop, I get to watch you do it.

However, this is drifting off topic because the point was and still is, why allow anything that ever happened on your site to sit out there for anyone to make a case out of?

Things can be taken out of context so there's no way to know how someone might twist something to fit their own agenda down the road so the best defense in protecting your own property and rights is no cache and no archives.

incrediBILL




msg:3730154
 4:37 am on Aug 25, 2008 (gmt 0)

And of course it's all pointless; the one person who feels he was libelled (or just wants to make trouble), doesn't need an archived copy - he will download and preserve his own

Not if he never saw it in the first place and it was removed before he ever found it.

Remember what I said about libel in blogs and forums?

If you delete it, it should be GONE, not lingering elsewhere to cause problems later long after it was already removed.

Thanks for making my case!

Quadrille




msg:3730219
 6:29 am on Aug 25, 2008 (gmt 0)

Sure, but you are taking a special case and treating it like a general case again.

How many webmasters want to post a libel and remove it before anyone sees it? If it's there long enough for *anyone* to see, then it's there long enough for the *wrong* person to see it and archive it for their lawyer.

Most of us want long term human and SE value; most of us would argue against removing stuff without good reason.

Instead of all the paranoia, wouldn't you be better just not posting in the first place? - or just not posting libel, risky stuff and cloaking.

If the Internet is that frightening, there's plenty of Earth-bound jobs with no risk ;)

BeeDeeDubbleU




msg:3730227
 7:02 am on Aug 25, 2008 (gmt 0)

I believe that this is a matter of personal choice, but most people would agree that if you are covering your tracks then you think that there is a need to do so. Archive.org can be used by both sides. As someone said earlier it is a double edged sword.

How about Picscout and Getty? They crawl the web looking for infringing images on web sites ...

Correct.

... and then sue for massive usage fees.

Wrong (in all but one or two much-vaunted cases).

Getty, Corbis and now Masterfile all use similar tactics, which have not as yet involved taking any small businesses or individuals to court. They only threaten to do this to frighten people into parting with the extortionate sums they demand. I am one of their targets. I have paid them nothing and I have no intention of paying them anything. Clearly archive.org would be an ideal tool for them to use, but I have no knowledge of them having used it.

In December 2006 when I received my Corbis demand I blocked Archive.org on all of my websites. Obviously I was in a defensive situation. After I had checked them all to ensure that there was no possibility of any further attempts to legally(?) extort money from me I started allowing it back in again.

The bottom line is that I have much more trouble with people stealing my content than anything else and I need to have evidence that it is mine. The archive helps me with this.

If you think you may be infringing someone's copyright or if there is a possibility of you saying something slanderous or potentially damaging then by all means block it but the reality is that the vast majority of websites are clean and unlikely to see any problems.

incrediBILL




msg:3730228
 7:03 am on Aug 25, 2008 (gmt 0)

How many webmasters want to post a libel and remove it before anyone sees it?

OK, we're going in circles and not getting anywhere.

I gave a lot of examples besides libel, which would more likely be caused by members of a forum or comments on a blog. Other examples included removing items that infringed on trademarks (logos and such) or copyrights, such as images used by web designers: when the owner finds out an image infringes and removes it, there is already a cache and maybe a permanent archived record, and nowhere to run from copyright tracking services like Picscout.

And you ignored the issue with pricing changes, etc.

Perhaps you've never been on the receiving end of a CACHE problem, but libel is just one small issue you keep needling, hardly the only issue, as I've now mentioned multiple times.

Besides, if you want long term SE value, why in the heck would you let the SE display your cache page at any point?

The cached pages give the search engine more opportunity to hold the visitor longer and the SE and their PPC advertisers get a better shot at the visitor retention if the visitor never has to actually visit your site since the cache page(s) are readily available.

Whether the content on that page is current or not is another story. Remember they're cached, so the visitor may see you're selling what they want but doesn't know it's currently on sale, because they see the CACHED version that is days or weeks old. They then back out of your cache page and pick someone else's site or PPC ad (benefiting the SE) and go somewhere else where they find a better price or sale, all because your cache page was old.

Conversely, the opposite is also true: a customer sees a sale or discount in the cache and runs to your store just to see the price has changed, and is now annoyed and bails.

The point is why are you letting the SE set the customers expectations without more tightly controlling what you present to the visitors at all times?

Most scenarios and they all end the same:

Cache bad, Noarchive good, learn it, love it, live it, get it? got it? GOOD!

[edited by: incrediBILL at 7:05 am (utc) on Aug. 25, 2008]

Quadrille




msg:3730272
 8:38 am on Aug 25, 2008 (gmt 0)

The point is why are you letting the SE set the customers expectations without more tightly controlling what you present to the visitors at all times?

I'm not. I'm building and operating my sites, the SEs are doing their thing. There is no valid argument against them, save the civil liberties ones you've already distanced yourself from. Customers do not see caches by accident, and very, very few see them on purpose. It really is not a problem for me, or most other webmasters. Honest.

Most scenarios and they all end the same:

No, that simply is not true. All the ones you cite are either special cases or people who have something to hide.

That simply does not apply to 95%+ of all webmasters.

I agreed (waaaay back ...) that there may sometimes be a case for precautions, even a certain paranoia.

But this "rule" that no-one should entertain the notion of a cache (which as I've pointed out, doesn't stop your enemies doing it anyway, even if you opt out), is simply unnecessary, and possibly a serious mistake.

I don't cloak, I stand by my insults, I'm not scared of shyster lawyers, I don't steal content, and I'm more than happy to have people comparing my present sites with their older versions; indeed, I find it quite entertaining myself.

I have much more to fear from cloaking rivals than I ever did from caches, so I'm unwilling to assist in their camouflage.

I can follow your arguments, I just cannot see why it's such a big deal that you are investing so much in campaigning (obviously in vain!).

Why the fear, that's the unanswered question, why the fear?

Lord Majestic




msg:3730343
 11:34 am on Aug 25, 2008 (gmt 0)

I know you're a search engine guy so stop trying to vilify my actions just because I don't like search engines playing fast and loose with my content.

I am not trying to vilify your actions - I'd post exactly the same response if it was someone else in your place saying the same stuff or I was not doing search engine work.

As I said before - such active actions clearly designed to eliminate 3rd party evidence will reflect badly on your intent in case of you getting into copyright trouble - you won't be able to say you were simply mistaken because your actions by disabling IA or Google cache would be seen in court as clear intent on trying to prevent being caught. So even if you were making totally innocent error your own actions will make it much worse.

You want to publish every page I own and claim it's "fair use" as "cache" so you can retain the customer longer so they don't go to the actual webmasters site and are prone to click on the search engine's own ads more, no fine.

We don't implement cache - it takes way too much storage (I'd rather have bigger index) and also it is almost never used - only when site is down. As I said above - your behavior is of no relation to my work on search engine.

This paranoia is not healthy.

swa66




msg:3730344
 11:38 am on Aug 25, 2008 (gmt 0)

I've seen idiots (well I guess now I'd better not name them) go after my sites out of jealousy (for lack of a better motive) by publicly posting utterly incorrect information, such as pointing to porn archived on a URL I now own from before I owned the domain.

As a result I *always* add the robots.txt blocker for ia_archiver on any domain I own. I don't need to like archive.org, I feel it is copyright infringement to start with (they might have their local law loophole, but I don't live in their jurisdiction). And I certainly don't need what some domain parking "service" put on there long before I had anything to do with the domain being (ab)used against me.

Google caches do fade out, so thus far I've left them alone, but I'm seriously considering denying them access to e.g. forums and other places where users can generate content (that gets moderated after it gets posted).

Similarly I've lost control over some URLs I used to have content on, and I'm happy now archive.org doesn't have a record of me having had in the past anything to do with the crap that's now on there.

Lord Majestic




msg:3730351
 11:41 am on Aug 25, 2008 (gmt 0)

That simply does not apply to 95%+ of all webmasters.

You are exactly right - only I think the percentage is probably closer to 99.9% - what incrediBILL suggests is a very extreme point of view. It's not correct either in my view - if everyone acted like that then we would not have the search engines as we know them today, and that in my view would make the Net a lot poorer - this will include 99.9% of those webmasters that won't get traffic as only top known sites would benefit from it.

maximillianos




msg:3730355
 11:51 am on Aug 25, 2008 (gmt 0)

If I remove a comment or page on my site, and someone else caches it and is displaying it... They are the responsible party for that content now. I cannot control what others steal from my site (and sites like SearchMe and Snap ARE stealing). Let them get burned for their mistakes.

If anything, it demonstrates my site's ability to take action on items deemed inappropriate/illegal content, and it shows the carelessness and lack of control of sites like Snap and SearchMe.

I've been dealing with lawyer requests for content removal for over 8 years with my website (user generated content). Not once have I been targeted for someone else's site not removing illegal content, even if it came from my site.

frontpage




msg:3730358
 11:59 am on Aug 25, 2008 (gmt 0)

Nice write up bill, except there is one problem.

I already opted out of the Wayback Machine for my websites years ago. When you go to their website you can see the disclaimer that my sites are opted out. However, I still get crawled by ia_archiver occasionally.

I know this because I set up a Mod_security 2.5 rule to catch them.

Archive.org Filter

SecRule HTTP_User-Agent "ia_archiver" "deny,log,status:404"

Heritrix Filter

SecRule HTTP_User-Agent "heritrix" "deny,log,status:404"

Hopefully, for webmasters' sake, they are not also indexing with just a browser UA instead of their own user agent.

Lord Majestic




msg:3730362
 12:04 pm on Aug 25, 2008 (gmt 0)

SecRule HTTP_User-Agent "ia_archiver" "deny,log,status:404"

OK, so this will cause their bot to get a 404 error when it requests robots.txt? According to the robots.txt standard, that would be a clear sign that the site can be crawled. If anything, a deny rule like this should either let robots.txt through or issue a 403 Forbidden status code: the standard recommends assuming that the whole site is off limits to crawlers when robots.txt returns that response.
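To spell out the difference between the two status codes, here is a rough sketch of how a crawler might act on the response to its robots.txt request. It is only an illustration of the reading of the convention given above; the function name and the policy labels are made up, and real crawlers may behave differently.

```python
def crawl_policy_for_robots_status(status_code):
    """Map the HTTP status of a robots.txt request to a crawl policy.

    Assumption: 401/403 means the whole site is off limits and 404 means
    there are no restrictions, as described in the post above.
    """
    if 200 <= status_code < 300:
        return "parse_rules"   # robots.txt exists: read and obey it
    if status_code in (401, 403):
        return "disallow_all"  # access restricted: stay away from the site
    if status_code == 404:
        return "allow_all"     # no robots.txt: nothing is forbidden
    # Anything else is not covered by the post; deferring is a
    # conservative choice made for this sketch only.
    return "defer"
```

So a rule that answers the bot with 404 on every request, robots.txt included, reads as "crawl away" under this convention, while 403 reads as "keep out".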

frontpage




msg:3730363
 12:04 pm on Aug 25, 2008 (gmt 0)

Another thing I have done is add a global No Archive X-Robots tag via .htaccess.

Example:


<Files ~ >
header append X-robots-tag "noarchive"
</Files>

Yes, you are correct. It should be a 403 error; that was a mistake from writing without a cup of coffee.
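Once the header rule is in place, it is worth checking that responses really carry it. A minimal sketch, assuming the response headers are available as a list of (name, value) pairs; the helper name `header_has_noarchive` is made up for illustration.

```python
def header_has_noarchive(headers):
    """Return True if any X-Robots-Tag header includes the noarchive directive.

    headers is a list of (name, value) pairs.
    """
    for name, value in headers:
        if name.lower() != "x-robots-tag":
            continue
        # One header value may hold several comma-separated directives,
        # optionally prefixed with a bot name such as "googlebot: noarchive".
        for part in value.split(","):
            directive = part.split(":")[-1].strip().lower()
            if directive == "noarchive":
                return True
    return False
```

Feed it the headers from a request to any page on the site; False means the .htaccess rule is not being applied to that response.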

Lord Majestic




msg:3730364
 12:09 pm on Aug 25, 2008 (gmt 0)

It's okay, I need coffee too :(

[edited by: Lord_Majestic at 12:09 pm (utc) on Aug. 25, 2008]

zomega42




msg:3730406
 12:44 pm on Aug 25, 2008 (gmt 0)

I block the Archive crawler, but for another reason: competitors. I don't want them to be able to look back at my site and see what kind of promotions I've offered, what kind of content has been posted over time, how fast the site has grown, etc.

incrediBILL




msg:3730791
 8:24 pm on Aug 25, 2008 (gmt 0)

clear intent on trying to prevent being caught

You miss the point in that I'm not trying to prevent being caught doing anything so my intent is squeaky clean. However, I run a site full of 3rd party contributions, so if I find one of them has misrepresented themselves and I kick them to the curb, my involvement with them is instantly ended. I simply don't need the hassle of being in the middle of some fishing expedition to figure out who that person was down the road. When the edit is made, it's final, no trail, gone, poof.

Also, try finding WebmasterWorld in any SE cache or archive. You won't.

Why the fear, that's the unanswered question, why the fear?

You confuse prevention and copyright control with fear.

They have no right to my content in the first place, end of story.

If I remove a comment or page on my site, and someone else caches it and is displaying it... They are the responsible party for that content now.

Not entirely true.

One of my old customers was reamed over a trademark that showed up on his site (he arguably had the right to use it, but that's another story). After he removed it, they came after him again because they found it still associated with his site in other places.

The lawyers didn't go after the other places, they went after my client because he was the one that used the trademark and they made him clean up the mess.

incrediBILL




msg:3730804
 8:45 pm on Aug 25, 2008 (gmt 0)

if everyone acted like that then we would not have the search engines as we know them today

That's the lamest argument I've ever heard, because the cache doesn't add any value to the quality of the search itself. It only helps if the site is down, and it's not the SE's job to be a surrogate site. The Internet Archive certainly has no value for search whatsoever.

koan




msg:3730883
 10:15 pm on Aug 25, 2008 (gmt 0)

I think it's weird that some people in this thread are trying to spin incrediBILL's posts to make him look on the fringe, paranoid or having something to hide. I disagree, and I actually think they're the ones who are overreacting.

I block the Wayback Machine because it is none of their business to have copies of my content, but I still let Google Cache through, as it is not permanent. However, these days, I'm kinda deliberating. On one hand, I do use it sometimes myself when another site is slow to respond; on the other hand, again, it's none of their business to host my copyrighted content.

I already got some grief from copyright infringers who stole my content and said it was fair game since I allowed Google. Yeah, that's twisted, but I didn't need the aggravation of providing extra ammunition to these lowlifes.

Anyway, the day I do decide to block Google Cache, I do hope no one will tell me I am being fearful, paranoid, or have something to hide. It's just not the case. I just appreciate having the extra control on my sites against something I never opted in.

Quadrille




msg:3730898
 10:39 pm on Aug 25, 2008 (gmt 0)

Anyway, the day I do decide to block Google Cache, I do hope no one will tell me I am being fearful, paranoid, or have something to hide. It's just not the case. I just appreciate having the extra control on my sites against something I never opted in.

Why would they?

There are reasons (no one's ever denied it) to stop caching, and anyway, what you do is your business.

I don't agree with your reasoning, but that's my problem, not yours.

It's the campaigning that makes Bill stand out from the ordinary cache blockers :)

[edited by: Quadrille at 10:58 pm (utc) on Aug. 25, 2008]

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved