Forum Moderators: open
I saw a method one time that sent only a JavaScript decoding function plus a huge encoded string. It decoded the text on the fly and document.write'd it back to the screen. Definitely a good way to have Google forget all about you.
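A minimal sketch of that kind of script (this is an assumed implementation, not the original - here the "encoding" is just a character-code shift; real scripts used hex or fancier scrambles):

```javascript
// Decoder shipped with the page: shifts each character code back by 3
// to recover the original markup.
function decode(encoded) {
  var out = "";
  for (var i = 0; i < encoded.length; i++) {
    out += String.fromCharCode(encoded.charCodeAt(i) - 3);
  }
  return out;
}

// "<b>Hello</b>" with every character code shifted up by 3:
var payload = "?eAKhoor?2eA";

// In the page itself this is written straight into the document,
// so spiders that don't run JavaScript never see the markup.
if (typeof document !== "undefined") {
  document.write(decode(payload));
}
```

The spider receives only `decode` and `payload`; the browser sees the real HTML.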
Also, if you take advantage of the noarchive feature, people won't necessarily be able to get the same page that Google got.
Combining the two does put you under more scrutiny, as Google will review pages that use noarchive more often than regular pages, but - if you are worried about somebody stealing your hard work, it's a good way to go.
One trick is to serve a visitor a 'poison' version, with some special tags in it.
Then, if anybody steals and doesn't do a thorough job of find / replacing, you'll know that they got the page from you. :)
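A 'poison' marker can be as small as an invisible element carrying a meaningless token (the class name below is made up for illustration):

```html
<!-- Hypothetical watermark: a unique token baked into each served copy. -->
<span class="wx-7f3a9c" style="display:none"></span>
```

If a stolen copy still contains the token, you know exactly where it came from.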
Hope that helps.
I just found out one of my sites had been hijacked, and I'm still puzzling over the best course of action. Had I taken the measures I described above, I wouldn't have the issue I'm wrestling with now, after my stuff has been stolen.
If I saw a page that had 'protection', I'd crack whatever 'protection' was in place, just because I'm like that :D
Anyway, the best HTML encryption I've seen is encoding the HTML in hex with a decoding function on top of it. You can decode some of it, but you need to know JavaScript to get all the HTML. For kicks, I ran that page through SimSpider [searchengineworld.com]. Guess what: only the non-encrypted parts got crawled.
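The hex scheme looks roughly like this (function names are my own, not from the page in question):

```javascript
// Turn a string of hex digit pairs back into markup.
function hexDecode(hex) {
  var out = "";
  for (var i = 0; i < hex.length; i += 2) {
    out += String.fromCharCode(parseInt(hex.substr(i, 2), 16));
  }
  return out;
}

// "<i>hi</i>" stored as hex:
var encoded = "3c693e68693c2f693e";

// In the real page the result is written out at load time:
if (typeof document !== "undefined") {
  document.write(hexDecode(encoded));
}
```

Anyone who can read the little decoder gets the whole page back, which is why this only slows people down rather than stopping them.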
There are several methods of decrypting the source for this page; easiest is Mozilla's trusty DOM inspector. Just view the page in Mozilla, open the DOM inspector, select <HTML> as the node you wish to view, right-click and select "Copy XML." Open a text editor and paste the source into a new document. The page source is now yours to do with as you see fit.
If you want the link to the site where it is explained in detail, sticky me. (If that is OK - otherwise I'll delete this sentence if you let me know why...)
There are several JavaScript solutions that will scramble the HTML and write the content out to the browser. I have used these scripts when I had text that I didn't want visitors to be able to copy and paste, or where we didn't want the links to be easy to extract - but we also didn't want those pages indexed by search engines.
The original question was whether there is a potential Google penalty. This really depends on what you're doing, and what Google decides to do. If you send Google some HTML that isn't the same thing that's displayed in the visitor's browser, you're taking a risk.
I've seen one script that is used to hide affiliate links, that hijacks the merchant's page content for display and allows hidden text to be placed on the page. I would have to guess that they would penalize this if they caught it.
The Mozilla DOM Inspector makes it pretty easy. I haven't found a Javascript scheme that it can't unravel. ruserious already posted the basic instructions, here's a more step by step version.
1. Get Mozilla, make sure you install the Inspector (full or custom install).
2. Go under the Tools menu; Web Development; DOM Inspector.
3. In the DOM Inspector, go under File; Inspect a URL. Type in the URL you want to inspect.
4. After it loads, right-click on the "HTML" node, select "Copy XML"
5. Paste the clipboard contents into a text editor - there's the "hidden" source code.
> 1) Can google-bot still spider the site and retrieve
> the actual verbiage for listing?
It depends: if your software uses Java applets or pure JavaScript to encrypt the whole HTML code, then Googlebot will see only the submitted result, and that is nothing like HTML. In effect, Googlebot will not spider your page and will not include it in the index.
If your server submits HTML (in one-line form, for example) or HTML with just a few JavaScripts, the spider will have no problem.
> 2) Does google penalize for using protective measures?
No, but not being in the Google index is also a sort of penalty.
Chances are, you'll get one stolen eventually - I've had this happen several times.
The thing I mentioned, regarding 'feeding a bad page to the user', is just that: cloak the page, and make sure you can tell whether the visitor is a human or well-known pagejacking / downloading software.
Then, when a person comes calling, do things to the page that would make it rank differently - but still appear to be the same.
Eg - multiple title tags.
Meta info - stuff it silly.
<!-- comment out a bunch of stuff that people might think was 'optimized' content -->
Change up the presentation - eg, serve an H1 tag where the visitor gets a <p> tag, both disguised with CSS so the appearance stays the same.
Things along those lines. Between that, and custom CSS files for different visitors, you should be fine. Oh, and of course the obfuscated javascript.
If you do all those things, you can be assured that anybody who pagejacks you will have a tough time indeed making heads or tails of your source code.
Thus giving you more time to spend on making money in the long run, and your competitors ample opportunity to 'chase their tails'.
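The H1-versus-&lt;p&gt; disguise can be sketched like this (the class name and text are made up): both versions render identically, but carry different weight in the markup.

```html
<style>
  /* Hypothetical rule: make the H1 render like ordinary body text. */
  h1.plain, p.plain { font-size: 1em; font-weight: normal; margin: 0; }
</style>

<!-- Version served to a suspected pagejacker: -->
<h1 class="plain">Widget guide</h1>

<!-- Version served to ordinary visitors: -->
<p class="plain">Widget guide</p>
```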
But neither matters... the first is noticeable, and the second can be gleaned by visiting the various search engines and seeing what they grab from the tags - using a few smart but simple queries to narrow down the keywords - or by using one of the submission-related tools to fetch the page's head section and display it to you.
- Rob