Forum Moderators: open
I saw a method one time that sent only a JavaScript decoding function plus a huge encoded string. It decoded the text on the fly and document.write'd it back to the screen. Definitely a good way to have Google forget all about you.
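A minimal sketch of that kind of script (this is an assumed implementation, not the original - here the "encoding" is just a character-code shift; real scripts used hex or fancier scrambles):

```javascript
// Decoder shipped with the page: shifts each character code back by 3
// to recover the original markup.
function decode(encoded) {
  var out = "";
  for (var i = 0; i < encoded.length; i++) {
    out += String.fromCharCode(encoded.charCodeAt(i) - 3);
  }
  return out;
}

// "<b>Hello</b>" with every character code shifted up by 3:
var payload = "?eAKhoor?2eA";

// In the page itself this is written straight into the document,
// so spiders that don't run JavaScript never see the markup.
if (typeof document !== "undefined") {
  document.write(decode(payload));
}
```

The spider receives only `decode` and `payload`; the browser sees the real HTML.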
Also, if you take advantage of the noarchive feature, people won't necessarily be able to get the same page that Google got.
Combining the two does put you under more scrutiny, as Google will review pages that use noarchive more often than regular pages, but - if you are worried about somebody stealing your hard work, it's a good way to go.
One trick is to serve a visitor a 'poison' version, with some special tags in it.
Then, if anybody steals and doesn't do a thorough job of find / replacing, you'll know that they got the page from you. :)
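A 'poison' marker can be as small as an invisible element carrying a meaningless token (the class name below is made up for illustration):

```html
<!-- Hypothetical watermark: a unique token baked into each served copy. -->
<span class="wx-7f3a9c" style="display:none"></span>
```

If a stolen copy still contains the token, you know exactly where it came from.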
Hope that helps.
I just found out one of my sites had been hijacked, and I'm still puzzling over the best course of action. Had I taken the measures I described above, I wouldn't have the issue I'm wrestling with now, after my stuff has been stolen.
If I saw a page that had 'protection', I'd crack whatever 'protection' was in place, just because I'm like that :D
Anyway, the best HTML encryption I've seen is encoding the HTML in hex with a decoding function on top of it. You can decode some of it, but you need to know JavaScript to get all the HTML. For kicks, I ran that page through SimSpider [searchengineworld.com]. Guess what: only the non-encrypted parts got crawled.
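The hex scheme looks roughly like this (function names are my own, not from the page in question):

```javascript
// Turn a string of hex digit pairs back into markup.
function hexDecode(hex) {
  var out = "";
  for (var i = 0; i < hex.length; i += 2) {
    out += String.fromCharCode(parseInt(hex.substr(i, 2), 16));
  }
  return out;
}

// "<i>hi</i>" stored as hex:
var encoded = "3c693e68693c2f693e";

// In the real page the result is written out at load time:
if (typeof document !== "undefined") {
  document.write(hexDecode(encoded));
}
```

Anyone who can read the little decoder gets the whole page back, which is why this only slows people down rather than stopping them.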
There are several methods of decrypting the source for this page; easiest is Mozilla's trusty DOM inspector. Just view the page in Mozilla, open the DOM inspector, select <HTML> as the node you wish to view, right-click and select "Copy XML." Open a text editor and paste the source into a new document. The page source is now yours to do with as you see fit.
If you want the link to the site where it is explained in detail, sticky me. (If that is OK - otherwise I'll delete this sentence if you let me know why...)
There are several JavaScript solutions that will scramble the HTML and write the content out to the browser. I have used these scripts when I had text that I didn't want visitors to be able to copy and paste, or where we didn't want the links to be easy to extract - but we also didn't want those pages indexed by search engines.
The original question was whether there is a potential Google penalty. This really depends on what you're doing, and what Google decides to do. If you send Google some HTML that isn't the same thing that's displayed in the visitor's browser, you're taking a risk.
I've seen one script that is used to hide affiliate links, that hijacks the merchant's page content for display and allows hidden text to be placed on the page. I would have to guess that they would penalize this if they caught it.
The Mozilla DOM Inspector makes it pretty easy. I haven't found a Javascript scheme that it can't unravel. ruserious already posted the basic instructions, here's a more step by step version.
1. Get Mozilla, make sure you install the Inspector (full or custom install).
2. Go under the Tools menu; Web Development; DOM Inspector.
3. In the DOM Inspector, go under File; Inspect a URL. Type in the URL you want to inspect.
4. After it loads, right-click on the "HTML" node, select "Copy XML"
5. Paste the clipboard contents into a text editor - there's the "hidden" source code.
> 1) Can google-bot still spider the site and retrieve
> the actual verbiage for listing?
It depends: if your software uses Java applets or pure JavaScript to encrypt the whole HTML code, then Googlebot will see only the submitted result, and that is nothing like HTML. In effect, Googlebot will not spider your page and will not include it in the index.
If your server submits HTML (in one-line form, for example) or HTML with just a few JavaScripts, the spider will have no problem.
> 2) Does google penalize for using protective measures?
No, but not being in the Google index is also a sort of penalty.
Chances are, you'll get one stolen eventually - I've had this happen several times.
The thing I mentioned, regarding 'feeding a bad page to the user', is just that: cloak the page, and make sure you can tell whether the visitor is a human or well-known pagejacking / downloading software.
Then, when a person comes calling, do things to the page that would make it rank differently - but still appear to be the same.
Eg - multiple title tags.
Meta info - stuff it silly.
<!-- comment out a bunch of stuff that people might think was 'optimized' content -->
Change up the presentation - eg, serve an H1 tag where the visitor gets a <p> tag, both disguised with CSS so the appearance stays the same.
Things along those lines. Between that, and custom CSS files for different visitors, you should be fine. Oh, and of course the obfuscated javascript.
If you do all those things, you can be assured that anybody who pagejacks you will have a tough time indeed making heads or tails of your source code.
Thus giving you more time to spend on making money in the long run, and your competitors ample opportunity to 'chase their tails'.
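The H1-versus-&lt;p&gt; disguise can be sketched like this (the class name and text are made up): both versions render identically, but carry different weight in the markup.

```html
<style>
  /* Hypothetical rule: make the H1 render like ordinary body text. */
  h1.plain, p.plain { font-size: 1em; font-weight: normal; margin: 0; }
</style>

<!-- Version served to a suspected pagejacker: -->
<h1 class="plain">Widget guide</h1>

<!-- Version served to ordinary visitors: -->
<p class="plain">Widget guide</p>
```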
But neither matters... the first is noticeable, and the second can be gleaned by visiting the various search engines and seeing what they grab from the tags - using a few smart but simple queries to narrow down the keywords - or by using one of the submission-related tools to fetch the page's head section and display it to you.
- Rob