Lawyers Using Your Own Web Site Against You

PITFALLS OF SAVING YOUR SITE FOR POSTERITY

Search engines automatically cache your pages, and the Internet Archive, also known as the Wayback Machine, comes along and makes a permanent copy of your site for "posterity". The problem starts when you realize you have content on your web site that could cause legal trouble. You may act quickly to remove it, yet the problem remains without your knowledge because you didn't act as quickly as the robots crawling your site.

Unfortunately, legal beagles gearing up to file a lawsuit love that your site was saved for "posterity". Although you've already done the right thing by cleaning potentially harmful things off your site, the tireless automatons crawling the internet have made sure there's plenty of evidence, and the next thing you know, you're about to get hung out to dry.

If you think the lawyers aren't technically savvy, think again:

Browsing a party's Web site will only show the information that the Web site owner currently wants visitors to see. Sometimes, the most valuable information about an opposing party is the information that has been changed or removed. Fortunately, there are ways to see older versions of Web pages. Pages that were changed recently can be viewed through Google's cache feature. Pages that were changed months or years ago may be available through the Internet Archive, also known as the Wayback Machine.

[law.com...]

Not only can they find your content, they do it under cloak without your knowing about it!

Viewing these older versions of Web pages avoids the privacy risks discussed above: The copied pages are not on the company's Web site, so the company has no record of the researcher's activities.

You can forget your rights, just throw them out the window, because the history of your website is already busy squealing on you without your knowledge or permission.

HOW DO YOU PROTECT YOUR SITE FROM HISTORICAL SNOOPING?

Obviously the simplest way is to keep your nose clean so nobody has a reason to be snooping in the first place.

However, this is the internet and you have to OPT-OUT of things to protect your rights.

Here are a few preventive ways to stop your website from being archived and used as a snitch:

USE NOARCHIVE

Make sure you include the NOARCHIVE meta tag in each web page so that there is no cache in any of the major search engines.

<META NAME="ROBOTS" CONTENT="NOARCHIVE">
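
Since the tag has to be on every page, it is easy to miss one. Here is a minimal sketch, in Python, of a check you could run over your own pages to confirm the tag is really there. The names `RobotsMetaParser` and `has_noarchive` are made up for this example, and it only looks at the ROBOTS meta tag, not at anything a search engine actually does with it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for item in attr.get("content", "").split(","):
            item = item.strip().lower()
            if item:
                self.directives.add(item)


def has_noarchive(html_text):
    """Return True if the page carries a robots meta tag with NOARCHIVE."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return "noarchive" in parser.directives


if __name__ == "__main__":
    page = '<html><head><META NAME="ROBOTS" CONTENT="NOARCHIVE"></head></html>'
    print(has_noarchive(page))
```

Feed it the HTML of each page on your site (saved to disk or fetched however you like) and fix any page where it returns False.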

USE ROBOTS.TXT

Block all of the archive site spiders, such as the one used by the Internet Archive, in your site's robots.txt file with an entry as follows:

User-agent: ia_archiver
Disallow: /
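
A typo in robots.txt fails silently, so it is worth testing the entry. Here is a rough sketch using the `urllib.robotparser` module from Python's standard library. The helper name `is_blocked` and the "Googlebot" comparison are just for illustration, and it only tells you what a well-behaved robot should do, not what a crawler that ignores robots.txt will do.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt, user_agent, url="/"):
    """Return True if robots_txt tells user_agent not to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# The same entry shown above.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""

if __name__ == "__main__":
    for agent in ("ia_archiver", "Googlebot"):
        print(agent, "blocked:", is_blocked(ROBOTS_TXT, agent))
```

With the entry above, ia_archiver should come back blocked while other robots, which have no matching entry, are still allowed in.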

The Heritrix software [crawler.archive.org] used by the Internet Archive is open source, which means there are other archives out there, possibly running modified versions of Heritrix that ignore robots.txt and cloak their access to your site.

HELP FOR HOSTED BLOGGER ISSUES

If you're running a blog hosted on a 3rd party service like Blogger or WordPress, your options may be limited to embedding NOARCHIVE, which the Internet Archive ignores. That means anyone running stock Heritrix code would ignore it by default as well.

The only way you can exclude your site, according to their site [archive.org], is to contact them directly. Obviously not enough businesses and site owners are aware of the perils posed by the Internet Archive, or the Archive would already honor the NOARCHIVE tag for sites with limited access and no robots.txt just to avoid a flood of emails.

OTHER POTENTIAL RISKS

Snap.com has taken screen shots of every web page, then Ask started taking limited screen shots, as have some new completely graphical search engines like SearchMe. Some screen shots have a resolution too tiny to read, but others, like those from Snap and SearchMe, are big enough to read, and these too can be called evidence in a lawsuit. Even the tiniest thumbnail can still show a licensed trademark being used without permission.

Some of the social bookmarking sites, such as Kaboodle, Jeteye and Eurekster, allow large chunks of content to be copied, and some use tools like Heritrix (see above) to make small archive copies of specific content.

SUMMARY

Obviously there's no way you can completely stop anyone from making copies of your site, but it may pay to be diligent in keeping the technologies that provide any form of archive off your site.

This is just another form of insurance that could, in the end, save your business, your house, your car, your family...


incrediBILL
