A "Repulsive" Metatag?

Can I hide certain text on a webpage from the search engines

         

peted

2:28 am on Jan 28, 2005 (gmt 0)

10+ Year Member



There may be frames or parts of frames that I do NOT want a search engine to "find" and make available. Is there some sort of a tag that will tell the webcrawler to "skip over this", with the result that I can be relatively sure the skipped text never ends up in the database of the search engines? One thing I'd like to make the object of the skip would be an email address, so email spammers won't find it on the webpage. Any ideas?

tedster

3:41 am on Jan 28, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



One way is to use a JavaScript document.write() call and define the text it writes out in an external .js file. The downside: if your visitor has JavaScript turned off, they won't see it either.

If there were an HTML tag or attribute to do this (and there isn't), it would only be obeyed by spiders that obeyed the standard, and email address farming bots are not that well behaved.

peted

10:57 pm on Jan 30, 2005 (gmt 0)

10+ Year Member



Thanks. I was afraid that would be the answer.

Jon_King

10:59 pm on Jan 30, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Tedster, I am under the impression that Google parses .js files?

tedster

5:33 am on Jan 31, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google reads .js files as plain text; as far as I've ever heard, they do not actually parse them. (What a mess they could get into if they did, with people running exploits on their servers!)

So if your js breaks the parts you don't want seen into a few variables, and then the document.write() statement concatenates those variables into the full character string - I think that would do it.
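Here is a rough sketch of that idea, not a tested anti-spam solution. The file name, the variable names and the example.com address are all made up for illustration. The address is stored in pieces and only assembled when the script runs, so the full string never appears in the page source or in the .js file.

```javascript
// Hypothetical external file, e.g. contact.js (the name is an assumption).
// The address is split into pieces so the full string never appears in the source.
var user = "info";
var domain = "example";
var tld = "com";

// Concatenate the pieces into a mailto link at run time.
function buildMailLink(u, d, t) {
  var address = u + "@" + d + "." + t;
  return '<a href="mailto:' + address + '">' + address + "</a>";
}

// Only write into the page when running in a browser.
if (typeof document !== "undefined") {
  document.write(buildMailLink(user, domain, tld));
}
```

The page would pull this in with a script tag at the spot where the link should appear. A bot that actually executes JavaScript would still see the address, so this only helps against harvesters that read the raw text.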

kaled

11:12 am on Jan 31, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Spambots are unlikely to play by the rules. Nothing you do by way of special tags, robots.txt, iframes, etc. is likely to help in this regard.

Kaled.

Jon_King

12:12 pm on Jan 31, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, big difference between reading and parsing. Thanks Tedster.

valder

1:39 am on Feb 3, 2005 (gmt 0)

10+ Year Member



For hiding email addresses, perhaps post no. 14 in this thread [webmasterworld.com] could be something to look into?

If you want to disallow bots from accessing certain parts of your server, there are at least two methods:
robots.txt and the link attribute rel="nofollow".

<a href="/" rel="nofollow">home</a>

rel="nofollow" works only with <a> elements, and is used only by search engines. Not all search engines support it yet, but some of the biggest do, and I think more will follow with time.

A much more effective way to prevent bots from accessing certain areas is to use robots.txt.
While most bots will obey robots.txt, spambots most likely will not. There are ways to fight those that don't, though. For instance, you could have a 1x1 px transparent image link to a page that generates random fake email addresses, and list that location in robots.txt under "Disallow". This way, regular bots won't index it, while any bots that ignore robots.txt will crawl it.

I have an entire section for bots that don't follow robots.txt on my site: 13 pages that generate 1,000 fake email addresses each, and they all link to each other. Hopefully, this will teach bot-makers to obey robots.txt.
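To give an idea of what such a trap page could produce, here is a minimal sketch of a fake address generator. It is not the script from my site; the function names are invented, and using the reserved .invalid top-level domain is my own assumption, chosen so that no real mailbox can ever be hit.

```javascript
// Build a random lowercase word of the given length.
function randomWord(length) {
  var letters = "abcdefghijklmnopqrstuvwxyz";
  var word = "";
  for (var i = 0; i < length; i++) {
    word += letters.charAt(Math.floor(Math.random() * letters.length));
  }
  return word;
}

// Generate `count` fake addresses under the reserved .invalid TLD,
// which never resolves to a real mail server.
function fakeAddresses(count) {
  var list = [];
  for (var i = 0; i < count; i++) {
    list.push(randomWord(8) + "@" + randomWord(10) + ".invalid");
  }
  return list;
}

// Example: the batch for one trap page.
var batch = fakeAddresses(1000);
console.log(batch.length + " fake addresses generated");
```

On a real server this would run on the page that robots.txt disallows, with the addresses printed as mailto links along with links to the other trap pages.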

But robots.txt works only for entire documents, not for parts of a page's text, which is what you're after.

PatrickDeese

2:07 am on Feb 3, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



How about using an iframe to insert the content you don't want spidered, and simply blocking its source file in robots.txt?
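Roughly, the two pieces would look like the strings built below. This is only a sketch: the /private/contact.html path and the iframe dimensions are placeholders, and in practice you would type both fragments straight into the page and into robots.txt rather than generate them with a script.

```javascript
// Hypothetical file name; use whatever path holds the content you want skipped.
var hiddenSource = "/private/contact.html";

// Returns the two fragments involved: the iframe markup for the visible page
// and the robots.txt rule that asks crawlers to stay out of the source file.
function iframeSetup(src) {
  return {
    html: '<iframe src="' + src + '" width="300" height="80"></iframe>',
    robots: "User-agent: *\nDisallow: " + src
  };
}

var setup = iframeSetup(hiddenSource);
console.log(setup.html);   // goes into the page that visitors see
console.log(setup.robots); // goes into robots.txt at the site root
```

As with anything based on robots.txt, this only keeps out spiders that choose to obey it.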

tedster

4:58 am on Feb 3, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Patrick, that's a nice elegant solution. Well done.

peted

6:49 pm on Feb 11, 2005 (gmt 0)

10+ Year Member



I studied robots.txt in the site glossary and I very generally understand that concept. But what is an "iframe" and what is its significance?

PatrickDeese

8:40 pm on Feb 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> But what is an "iframe" and what is its significance?

iframe is a tag in HTML that allows an inline frame to be loaded into a page.

In other words, you can make a "window pane" in one page that shows another.

Amazon uses iframe in their code generator for their affiliates.

Here's the W3 entry on it:

[w3.org...]