Google News Archive Forum

Hidden Text
Is this webmaster paranoia or a real problem?
kaled




msg:144913
 12:44 pm on May 27, 2003 (gmt 0)

First, a little about myself :-

I'm a programmer first and a reluctant webmaster second. I'm used to having to find workarounds for bugs in Windows. Complaining to Microsoft is truly a waste of time.

It seems to me that there has been much discussion lately about penalties arising from hidden text - even to the point of pages being removed from the Google index. Since I don't use hidden text, I cannot comment from my own experience, but I can look at this issue objectively.

There are three possible policies a search engine can take when encountering hidden text/links, etc.

1: Take no special actions.
2: Ignore all such text, etc. Quite literally, pretend it isn't there.
3: Apply some sort of penalty.

Now let's look at how/why hidden text might be created.
1: As a deliberate attempt to spam search engines.
2: By accident. (Probably applies mostly to zero-length links).
3: Deliberately but without any intention to spam. For instance, a Webmaster might leave himself a todo list in white text in an empty table cell.

Now let's look at how a spammer might fool a search engine.
1: Use very light gray text against a white background, etc. (Presumably search engines that worry about hidden text implement thresholds).
2: Use javascript to change background/text colors. (I imagine this can be done with tables but I've never tried it.)

Let's be realistic about this. There is not a snowball's chance in hell of a search engine being able to analyse all the javascript on a page to see if it is being used to hide text. It therefore follows that a spammer will be able to fool search engines using hidden text with no difficulty at all.
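To make that concrete, here is a minimal sketch (purely illustrative, with a made-up element id) of the sort of colour-swap trick being described: the HTML source shows ordinary text in a visible colour, and only a browser that actually runs the script sees it vanish.

// Illustrative sketch only - hypothetical page, made-up id "kw".
// The markup ships the keyword block in a normal, visible colour so a
// static colour check passes; the script hides it as soon as the page loads.
window.onload = function () {
  var block = document.getElementById("kw");   // <div id="kw">keyword keyword ...</div>
  block.style.color = "#ffffff";               // now matches the white background
  block.style.backgroundColor = "#ffffff";     // invisible to visitors, present for crawlers
};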

It therefore follows that applying penalties where hidden text is found is a total waste of time (if the intention is to combat spam). This just leaves two sensible policies for hidden text.

1: Take no special actions.
2: Ignore all such text, etc. Quite literally, pretend it isn't there.

Now none of this is rocket science. Indeed, if the boffins at Google can't work this out for themselves, they must be suffering from a combined IQ of a retarded chimpanzee.

So, let's see what GG has to say. Do Google boffins shuffle around on their knuckles? Judging from the mess Google is in right now, it's easy to laugh and just say yes, but in reality the answer is probably no. Nevertheless, given that it is child's play to use hidden text in a manner that can't be detected by search engines, I think it behoves GoogleGuy to say definitively what the policy is. After all, as I have explained, penalties are more likely to catch the innocent than the spammer.

 

mil2k




msg:144914
 1:19 pm on May 27, 2003 (gmt 0)

Kaled, Google has the necessary algos to detect hidden text. They actually do it automatically, and penalties have been applied in the very recent past for using hidden text on a webpage. Do a site search and you will find the necessary info. In fact, I read a thread just last week that talked about hidden text and the penalty applied to a webmaster.

kaled




msg:144915
 1:48 pm on May 27, 2003 (gmt 0)

mil2k,

I'm certainly not arguing that penalties do not exist. I'm simply explaining that if penalties do exist for hidden text, then there must exist within Google a degree of mind-numbing stupidity bordering on the incomprehensible.

As I explained, a spammer can use javascript to spam with hidden text without fear of automatic penalties by algo.

Of course, penalties could be applied manually, but I believe that GG has already said that Google prefer to automate all of this.

No matter how you cut it, applying automatic penalties for hidden text is incredibly stupid if that means removing legitimate pages from the index. Why is it stupid? Because real spammers can easily get round it.

Care to comment GG?

Jimwp




msg:144916
 1:51 pm on May 27, 2003 (gmt 0)

Over the weekend we saw the entire elimination of one of our competitors' sites from Google for what seemed to be hidden text. They had 5 hidden links at the bottom of their index page. They went from a PR6 to a grey bar, and from many top-5 positions to now being non-existent. This is a major site with hundreds of pages, all of which are now a grey bar. They have also been removed from the Google directory; again, tens if not hundreds of pages once in the directory are now gone. They still show up in DMOZ. Today the hidden text is gone, but I hope it takes a while for them to get back in.

shaadi




msg:144917
 1:54 pm on May 27, 2003 (gmt 0)

There is no penalty for hidden text imilap.
And no use reporting it.

4eyes




msg:144918
 1:59 pm on May 27, 2003 (gmt 0)

There's a recent discussion here [webmasterworld.com] which touches on this.

It is not as easy to identify hidden text as it may first seem, as there are many legitimate reasons for making use of DHTML that will end up looking like hidden text.

Hidden text is a real problem for Google to detect automatically. They have said they are going to tackle it, but we are still waiting to see how, and to what extent.

luuna




msg:144919
 1:59 pm on May 27, 2003 (gmt 0)

I think you are wrong when you say that Javascript can safely be used to hide text. All Google has to do is run the Javascript to check the color of the text. Not impossible IMHO.

kaled




msg:144920
 2:30 pm on May 27, 2003 (gmt 0)

Think on this for a moment.

Dodgy sites may well include hidden text. They may be removed by Google for any one of at least four reasons.

1: They triggered a hidden-text alarm.
2: Someone complained about them.
3: Cockup at Google.
4: Transient update problems at Google.

Just because a site that has hidden text in it goes missing from the Google index, it doesn't mean it was removed by algo because of an alarm that was triggered.

For all my criticisms of Google and GG, I still think that Google is a very good search engine and I don't think there can be many idiots working there, therefore I think it unlikely that a severe hidden-text penalty exists.

However, I do believe that Google would like to see hidden text vanish from the internet (no pun intended) so I'm sure that they are happy to see a rumour of this sort spin as far out of control as possible.

As I said previously, I have no hidden text in my website, yet in the last couple of weeks I have seen parts of my site drop like stones (seems OK now). If I did have hidden text, as a result of a guilty conscience I might have panicked, removed the hidden text and then miraculously seen those parts of my website bounce back. But it would have been pure coincidence. Of course, I could have patted myself on the back and said "Gosh, that was a close-run thing. Aren't Google lovely people for scanning my site again so quickly and putting me back in the index".

Does that fiction sound familiar?

PS
My page rank is still showing as zero across the entire site but on one common two-word phrase, I've moved up to #3. I think Google transients may be settling down, but it's not working quite right yet. Until it is, speculation is a waste of time and energy.

Also, I've said it before but I'll say it again. I don't think Google will be fully stable for months yet. Clearly there are big things happening and, given the nature of the beast, it will undoubtedly take time to settle. Oh dear god, I sound like GG - time to do some proper work.

4eyes




msg:144921
 2:31 pm on May 27, 2003 (gmt 0)

luuna,

Unless I missed something, I don't think anybody is suggesting that it is 'safe' to hide text using javascript.

However, merely reading the javascript will not tell them whether text is hidden with intent to deceive or as an innocent design feature.

For example, using css and dhtml you can have a complicated page with expanding navigation systems and text colour controlled by css.

You could have any number of events that trigger the visibility of <div> content areas, or even trigger switching from one style sheet to another. You can also vary the z-index of positioned <div>s, resulting in layers moving behind other content, etc.
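For instance, a perfectly innocent expand/collapse menu looks exactly like hidden text to anything that only reads the source. A minimal sketch (element ids invented for illustration):

// Legitimate DHTML: a submenu that starts hidden and is shown on demand.
// A source-only scan sees a block of invisible links; a visitor sees a menu.
function toggleMenu(id) {
  var menu = document.getElementById(id);
  menu.style.display = (menu.style.display === "none") ? "block" : "none";
}
// Example markup:
// <a href="#" onclick="toggleMenu('products'); return false;">Products</a>
// <div id="products" style="display:none"> ...submenu links... </div>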

As I said, 'hidden text' is a real problem for Google.

It is less of a problem for any competitor who decides to spam report the site.

kaled




msg:144922
 2:44 pm on May 27, 2003 (gmt 0)

I think you are wrong when you say that Javascript can safely be used to hide text. All Google has to do is run the Javascript to check the color of the text. Not impossible IMHO.

I don't know anything about the internal workings of search engines, but I absolutely guarantee you that automatically running the javascript (on every indexed page) to detect final text color is TOTALLY impractical. I doubt that even the supercomputers of the CIA, etc. (and they tend to buy their computers by the acre) could do this.

Manually looking at a page is the only practical way of detecting hidden text if javascript is used.

Don't be tempted into thinking you can simply test for the odd javascript keyword. You could easily use the eval function to bypass any semi-intelligent algo.
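To illustrate the eval point, a filter that simply greps the source for strings like "style.color" can be sidestepped by assembling the statement at run time. A hypothetical sketch (again with a made-up element id):

// The tell-tale property name never appears literally in the source, so a
// keyword scan finds nothing; only executing the script reveals the effect.
var part1 = "document.getElementById('kw').sty";
var part2 = "le.color = '#ffffff';";
eval(part1 + part2);   // hides the text at run time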

I may not know how search engines work but I do know what the limits of current technology are. And I do know how the minds of programmers work - I am one.

dwilson




msg:144923
 2:58 pm on May 27, 2003 (gmt 0)

No filter can be expected to get ALL the variations of whatever it's supposed to find. But a hidden-text filter that flags just the obvious stuff would make a lot of us happier. If 5% of the spammers still get by, that's still 95% who don't.

And as the spammers adjust their methods, Google can adjust its methods to recognize them. For every measure there is a counter-measure. This is a war in which Google will probably never be able to claim complete victory, but if it can cut down this kind of spam it will improve results for searchers and help webmasters who practice legitimate SEO.

kaled




msg:144924
 3:07 pm on May 27, 2003 (gmt 0)

No filter can be expected to get ALL the variations of whatever it's supposed to find. But a hidden-text filter that flags just the obvious stuff would make a lot of us happier. If 5% of the spammers still get by, that's still 95% who don't.

I don't know if it is true or not, but I once read that more US soldiers were killed in the Vietnam war by friendly fire than by hostile fire.

In order to remove this mythical 95% of spammers, how many innocent casualties are acceptable?

kaled




msg:144925
 3:20 pm on May 27, 2003 (gmt 0)

And as the spammers adjust their methods, Google can adjust its methods to recognize them. For every measure there is a counter-measure. This is a war in which Google will probably never be able to claim complete victory, but if it can cut down this kind of spam it will improve results for searchers and help webmasters who practice legitimate SEO.

Just exactly what is 'legitimate SEO'?

From the user's point of view, this is virtually an oxymoron. For instance, how many keywords should be allowed in a title (repeated or not) before this is classified as spam?

Suppose Google suddenly decided that titles with more than 10 words were spam. How many of your pages might then be classed as spam rather than legitimate?

If Google were worried about title-spam, all they would need to do is ignore all words beyond, say, the tenth. You, on the other hand, seem to think that arbitrarily moving the goalposts and removing pages from the index, thereby punishing legitimate sites, is OK - presumably until you suffer as a result.

Bernard




msg:144926
 3:24 pm on May 27, 2003 (gmt 0)

Kaled,

I find your assumption that there is a large percentage of innocent webmasters using hidden text (or hidden links) for reasons other than fooling search engines laughable.

If you want to leave a to-do list in your code, the logical approach would be to use a comment - not abuse fonts to hide text from surfers.

I do not see any problem with algorithmically penalizing sites using hidden text / links. The "colossal stupidity" you are claiming is based upon a false assumption IMHO.

Any webmaster who is interested in appearing in a search engine can certainly read and understand its guidelines. If hidden text is declared forbidden and you still use it, you either don't care about appearing in search results or are not so innocent in your motivations.

div01




msg:144927
 3:31 pm on May 27, 2003 (gmt 0)

3: Deliberately but without any intention to spam. For instance, a Webmaster might leave himself a todo list in white text in an empty table cell.

That is why "comments" are part of html.

GoogleGuy




msg:144928
 3:35 pm on May 27, 2003 (gmt 0)

Hi kaled. We've got automatic algorithms that detect hidden text, despite a number of tough cases. Besides Jimwp above, and bodybuilding, and rnrtvb, several other webmasters have verified this. You might want to check these for background:
[webmasterworld.com...]
[webmasterworld.com...]

We give hidden text an automatic penalty that expires after a few weeks, so if the site owner cleans up his site, he can get back into the index.

dwilson




msg:144929
 4:08 pm on May 27, 2003 (gmt 0)

Just exactly what is 'legitimate SEO'?

Legitimate SEO makes it obvious what a page is about.

I could write a great page about the US tank-killing aircraft, the A-10, for example, and hardly ever use "A-10" in my text. I could call it "Warthog", "Thunderbolt II", "tank-killer", "awesome fighting machine", "Gatling gun and a pair of engines", etc. And it might be a good informational page. But it will never come up in searches for "A-10". It's not optimized.

A legitimately optimized page will tell a dumb robot what it's actually about. And the ways of speaking the dumb robot's language are what keeps WW interesting.

batdesign




msg:144930
 4:09 pm on May 27, 2003 (gmt 0)

I've reported a site repeatedly for using CSS divs with display:none that are stuffed with keyword-rich text, and Google has missed this completely.
The algorithm can't be that hot when stuff like this slips through.
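For what it's worth, once a page is actually rendered in a browser engine, this particular trick is cheap to flag from the live DOM. A rough sketch (the function name and the 200-character threshold are invented for illustration):

// Flag elements that are not rendered but still carry a lot of text.
function findHiddenTextBlocks(minChars) {
  var hits = [];
  var nodes = document.getElementsByTagName("*");
  for (var i = 0; i < nodes.length; i++) {
    var style = window.getComputedStyle(nodes[i], null);
    var text = (nodes[i].textContent || "").replace(/\s+/g, " ");
    if ((style.display === "none" || style.visibility === "hidden") &&
        text.length >= minChars) {
      hits.push(nodes[i]);
    }
  }
  return hits;
}
findHiddenTextBlocks(200);   // e.g. blocks holding 200+ characters of unrendered text

Of course, an innocent expand/collapse menu like the one described earlier in the thread would be flagged by exactly the same check, which is the false-positive problem being debated.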

kaled




msg:144931
 4:41 pm on May 27, 2003 (gmt 0)

GoogleGuy, let me quote you from a thread you mentioned:
I'll be happy to give a very definite answer. Looking at the cache of your site, I think we did the right thing. On your root home page, I see a black background, with no white imagery anywhere in sight, and black text (Okay, text with 000001 as the color--is that supposed to fool people?) at the bottom that says "bodybuilding bodybuild body building weightlifting muscle supplements sports nutrition vitamins flex fitness bodybuilding." Sorry to get into specifics, but you did ask.
What part of that is not hidden text? You've taken it off your home page now, but do you think we can't check cached pages? So yes, I think you did have hidden text on your main page, and yes, I think you got caught. I would go over everything with a fine-toothed comb to make sure you've removed every bit of hidden text from your site. Within a month or so, if all of the hidden text is gone, you should show back up in the index again.

I have a similar list of keywords at the bottom of each index page of all parts of my website. I have not hidden any of them, in fact, there is a small heading above each that reads 'Search Engine Keywords'. I also have 'Search Engine Links' to make sure that any search engine that catches one of my pages is likely to catch the whole site. (This was primarily for Inktomi and Teoma).

Now that's my way of doing things, honest and open. There are better ways to combat spam than to simply penalise a site for a few hidden keywords.

Frankly, I think Google's power is going to its corporate head.

In the future, is Google going to penalise me for cross-linking my own site (of a couple of dozen pages)? If so, when will Google issue its warnings - before or after the change of policy?

We've got automatic algorithms that detect hidden text, despite a number of tough cases.

I don't consider myself to be any sort of expert in javascript, or the like, but I'll make a bet with you on any reasonable terms, that I can get a page into the Google index that is absolutely full of hidden text.

So, do you accept the challenge? If yes, what do I win?

kaled




msg:144932
 4:54 pm on May 27, 2003 (gmt 0)

A legitimately optimized page will tell a dumb robot what it's actually about. And the ways of speaking the dumb robot's language are what keeps WW interesting.

Some people would say that using hidden text is simply speaking the dumb robot's language and is legitimate.

Frankly, I think the use of hidden text is downright underhand and I wouldn't do it myself. But that is just my opinion. I don't believe that I have the right to force my beliefs on other people. Google, apparently, do feel that they have this right.

I live in the UK and am certainly no expert in US constitutional law. However, Google's policy on hidden text seems to be in breach of the US constitution on freedom of expression. (Is that the first amendment? I can never remember.)

So, GoogleGuy, what do your lawyers have to say?

webdoctor




msg:144933
 5:32 pm on May 27, 2003 (gmt 0)

I live in the UK and am certainly no expert in US constitutional law. However, Google's policy on hidden text seems to be in breach of the US constitution on freedom of expression. (Is that the first amendment? I can never remember.)

Kaled,

Remember, no-one is forced to use Google. Read Google's terms of service page:

Personal Use Only
The Google Services are made available for your personal, non-commercial use only. You may not use the Google Services to sell a product or service, or to increase traffic to your Web site for commercial reasons, such as advertising sales.

As far as freedom of expression goes, basically Google can do what they like - after all, it's their index.

Webdoctor

webdoctor




msg:144934
 5:35 pm on May 27, 2003 (gmt 0)

I have a similar list of keywords at the bottom of each index page of all parts of my website. I have not hidden any of them, in fact, there is a small heading above each that reads 'Search Engine Keywords'. I also have 'Search Engine Links' to make sure that any search engine that catches one of my pages is likely to catch the whole site. (This was primarily for Inktomi and Teoma).

Now that's my way of doing things, honest and open. There are better ways to combat spam than to simply penalise a site for a few hidden keywords.

Kaled,

You are, of course, completely entitled to stuff your pages full of keywords. You can hide them too, if you like. After all, it's your website.

Google are also completely entitled to penalise your site for this - it's their search engine!

:-)

Webdoctor

BigDave




msg:144935
 5:41 pm on May 27, 2003 (gmt 0)

kaled,

Finding hidden text programmatically is conceptually easy; it would just be expensive in programmer time and computation time. All you have to do is render the page and look at the results, having the software go through all the iterations of options on the page to see if the text appears.
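As a rough illustration of the render-and-check idea (a sketch only - the function names and threshold are invented, and it assumes plain solid-colour backgrounds rather than images or layered divs):

// After rendering, compare each element's computed text colour with the
// colour actually painted behind it.
function parseRgb(s) {
  var m = /rgb\((\d+),\s*(\d+),\s*(\d+)\)/.exec(s);
  return m ? [+m[1], +m[2], +m[3]] : null;
}
function effectiveBackground(el) {
  // walk up until an ancestor actually paints a background colour
  while (el) {
    var bg = parseRgb(window.getComputedStyle(el, null).backgroundColor);
    if (bg) return bg;
    el = el.parentElement;
  }
  return [255, 255, 255];   // assume a white canvas by default
}
function looksInvisible(el, threshold) {
  var fg = parseRgb(window.getComputedStyle(el, null).color);
  var bg = effectiveBackground(el);
  var dist = Math.abs(fg[0] - bg[0]) + Math.abs(fg[1] - bg[1]) + Math.abs(fg[2] - bg[2]);
  return dist < threshold;   // e.g. threshold = 30 for "close enough to be unreadable"
}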

The difficulty is in deciding where to apply this computationally expensive algorithm. The obvious first place is on sites that are reported with the hidden-text box checked in the spam report. As it proves itself, they might start going after some other areas. I doubt that it will ever be applied to the entire index.

As for those that are using hidden text "innocently", Google has stated for a long time that having hidden text is a reason to be removed from the index. Therefore the "innocence" of the use has nothing to do with it. Use hidden text, risk not showing up in google. It's that simple.

As for your Viet Nam reference, I suggest that you go talk to some front line vets about friendly fire. Much of that friendly fire was called in intentionally on their own location in the hopes of saving some of the squad. They knew that some of them would die, but if they didn't call in artillery or air, they would all die.

In a way it is a more appropriate analogy than you realized. Though I do find it to be rather offensive to compare the rights of a web page to be listed in Google to those that died in Viet Nam.

Oh yeah, lose the keyword stuffing at the bottom of your page. It's bad design. If you can't fit the words into the content, then I don't want your site to come up in the results. I think it would be a good thing for Google to remove pages that use that strategy too. I don't care if it's visible or not, it's not quality.

BigDave




msg:144936
 5:51 pm on May 27, 2003 (gmt 0)

I live in the UK and am certainly no expert in US constitutional law. However, Google's policy on hidden text seems to be in breach of the US constitution on freedom of expression. (Is that the first amendment? I can never remember.)

The government cannot abridge your freedom of speech. That does not mean that I cannot kick you out of my house for saying that you think disco is the music of the gods.

In fact, freedom of speech is one of the best protections for google's right to exclude any sites that they wish for any reason that they wish. They are expressing their opinions about which sites are good.

So, GoogleGuy, what do your lawyers have to say?

You might want to read the charter again, especially the part about "pleas to specific users such as a mod or GoogleGuy" being a no-no.

heini




msg:144937
 6:08 pm on May 27, 2003 (gmt 0)

> It would be expensive in programmer time and computation time

That's why all that "we algorithmically detect/penalize" stuff is mostly PR. Social engineering is much cheaper.

BigDave




msg:144938
 6:20 pm on May 27, 2003 (gmt 0)

That's why all that "we algorithmically detect/penalize" stuff is mostly PR. Social engineering is much cheaper.

Yeah, social engineering has the best ROI. But for that to keep working, they need to make some sort of progress against spam. Algorithmically catching spam is still cheaper than paying 100 people to sit and stare at the source of pages all day long.

If they hit hidden text with a high degree of accuracy, it might give pause to someone caught for it the next time they're told not to cloak. Social engineering backed up with real penalties is going to be a lot more effective.

tbear




msg:144939
 6:27 pm on May 27, 2003 (gmt 0)

Putting text with the same colour as a background image is not easy to find with an algo (I think). That has to be found manually....
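It is harder, though not unthinkable: one crude approach would be to sample the image pixels behind the text and compare their average colour with the text colour. A sketch, assuming the image is same-origin so its pixel data can be read back (the function name and box argument are invented for illustration):

// Average colour of the region of a background image that sits behind a text box.
function averageColourUnder(img, box) {   // box = {x, y, w, h} in image pixels
  var canvas = document.createElement("canvas");
  canvas.width = img.naturalWidth;
  canvas.height = img.naturalHeight;
  var ctx = canvas.getContext("2d");
  ctx.drawImage(img, 0, 0);
  var data = ctx.getImageData(box.x, box.y, box.w, box.h).data;
  var r = 0, g = 0, b = 0, n = data.length / 4;
  for (var i = 0; i < data.length; i += 4) {
    r += data[i]; g += data[i + 1]; b += data[i + 2];
  }
  return [Math.round(r / n), Math.round(g / n), Math.round(b / n)];
}

Averaging is crude - it says nothing about busy images where the text blends in some places and not others - but it shows the comparison can at least be mechanised.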

bobnew32




msg:144940
 6:47 pm on May 27, 2003 (gmt 0)

Yeah, cuz I had hidden text on my site that said "Rendered in this amount of seconds". I quickly removed that tho...

cromo




msg:144941
 7:31 pm on May 27, 2003 (gmt 0)

Hi all,
I think the main point is not how to get through the Google algos.
I agree that there will probably always be a trick (but for how long? And is it worth the daily paranoia of checking the index and the logs again and again to see whether your sites have vanished into the fog?).

I'm very interested in the hidden text chapter, and my doubt is: are those algos smart enough to distinguish what is "legitimate" hidden text and what is not?

Moreover, I think that sometimes it is not the way you hide text that proves a spammy, tricky approach, but rather what you hide.

If I design a server-side application, for instance, it's not rare to use some hidden text as a way of transmitting variables. This is certainly functionality, not spamming. But what if a form contains a hidden field full of keywords?

From a technical point of view, the structure is the same... who can judge which one adds functionality and which one adds spam to the Google index?

Another example: some time ago I did a one-page microsite. That page was completely empty of visible text and images: just a few microlinks fluctuating and fading on the screen. In fact, the page had a relatively large amount of content, but the content was hidden: all positioned inside many layers visible on click or on rollover, and so on.

Fortunately the site had no purpose other than experimenting with DHTML possibilities.... ;)

BigDave




msg:144942
 7:59 pm on May 27, 2003 (gmt 0)

Putting text with the same colour as a background image is not easy to find with an algo (I think). That has to be found manually....

And that is why it took them a long time to get the new test program just right. There is a lot of difficulty involved in doing this but it is not impossible.

The hard part is figuring out where to set the thresholds.

Putting white text on a picture of a polar bear in a snowstorm would be enough to get you in trouble, and easy to spot (yes, even with an algo). But what about blue text on a picture of macaws in the forest? You might have one letter over the blue tail feathers, and the rest over the green leaves. Is that enough to get you banned? This would be a case of the algo being too aggressive.
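One way to frame that threshold question, as a sketch only (the function and the 0.9 cut-off are invented): over the pixels behind the text, what fraction are close enough to the text colour to hide it? A polar bear in a snowstorm scores near 1.0 for white text; blue text over mostly green foliage scores far lower.

// pixels: RGBA data as returned by getImageData; textRgb: [r, g, b]
function hiddenFraction(pixels, textRgb, perPixelThreshold) {
  var close = 0, total = pixels.length / 4;
  for (var i = 0; i < pixels.length; i += 4) {
    var dist = Math.abs(pixels[i]     - textRgb[0]) +
               Math.abs(pixels[i + 1] - textRgb[1]) +
               Math.abs(pixels[i + 2] - textRgb[2]);
    if (dist < perPixelThreshold) close++;
  }
  return close / total;   // flag only if, say, more than 0.9 of the pixels match
}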

Being able to catch just about any sort of spam algorithmically is not that tough. Programming in the limits to reduce collateral damage - that is the tough part.

Google doesn't *want* to penalize anybody. They would much rather we never tried to play them and just stick to the SEO that actually improves the site for all concerned.
