Black & White Background Image - Hidden Text? - (deprecated) Google News Archive forum at WebmasterWorld

Black & White Background Image - Hidden Text?

Will Google think my black text is on a black background?

bodybuilding

7:00 am on May 3, 2003 (gmt 0)

On our site we use a background image that is half white and half black. We use tables to align the text inside the white and inside the black. On one side we have black text on a white background and the other is white text on a black background. Will Google be able to "see" the page to see that it is not hidden text? I can't understand how it would be able to do this!

Needless to say, our site was recently banned. Very frustrating since our site has 7000 pages of PURE content and we never spammed or did anything to purposely trick the search engines. We worked hard over 5 years to get the site where it is for the VISITOR to love it, and they do love it, but now it is much harder for people to find... even though we have the content that many people are looking for and can't find elsewhere. (Ugh. Can't sleep...)

Thanks,
Ryan
<...>

[edited by: ciml at 10:58 am (utc) on May 3, 2003]
[edit reason] No .sigs please. [/edit]

msgraph

2:33 pm on May 7, 2003 (gmt 0)

msgraph,
GG has said elsewhere that at present hidden text detection is invoked as the result of spam reports.

I'm just going by what has been said in this thread.

Within a month or so, if all of the hidden text is gone, you should show back up in the index again.

It would be easier if people would just not try to hide text in the first place.. I don't like having penalties in Google, even if they're automatic.

He basically said that if the hidden text was still present when the filter went live, the site would be removed.

I wonder if someone heard it at pubcon and passed that message on? I know that when the filter went checking on that site (and it did), the hidden text had already been removed.

Maybe I'm wrong, but to me this all sounds like these filters are live now, and that they are now applied to the monthly update rather than to some kind of special crawler that is sent out when someone fills out a spam report. I'm just looking for clarification.

BigDave

7:03 pm on May 7, 2003 (gmt 0)

It is part of the spam report investigation process at present. There were a lot of spam reports on Jakob Nielsen's site, so it definitely hit that site.

If there is no spam report on your site, it should not hit your site at this time.

warrenk

8:15 pm on May 23, 2003 (gmt 0)

The website that is being discussed in this topic definitely had hidden text that wasn't put in by mistake. Every item page had the hidden text. This website was only removed from Google for approximately 3 weeks, which brings up an interesting question... how long should a penalty be for deliberate attempts to increase rankings? The websites that compete with this website (including mine) resort to buying advertising because the website in question has the #1 rankings for most of the keywords (due to hidden text). Has Google changed the amount of time they ban websites?

More Traffic Please

9:51 pm on May 23, 2003 (gmt 0)

Hello all,

I've lurked for quite some time and figured I'd finally chime in.

I have noticed some sites trying to get around the penalty by using a white .gif background and white text on top of that. In the past only a hand check could pick it up. Is there any opinion on whether Google can now pick this up through a filter?

GoogleGuy

10:12 pm on May 23, 2003 (gmt 0)

Hi More Traffic Please--welcome to WebmasterWorld. I wouldn't advise people to try that either. :)

xMadx

10:16 pm on May 23, 2003 (gmt 0)

Some of my competitors are very vicious, yes yes.
They use a gradient background image and they place text at the bottom of the page in the same color as the last color of the gradient.
Will Google detect this kind of spam?
Or does Google detect when someone uses a background image and places text of the same color on it?
If yes, it means that Google downloads the background image, analyses its color, and compares it with the text color (or something like that). Great job, but a huge amount of data to analyse.
If not, in a few days we will see a lot of sites with background images! :-(

More Traffic Please

10:25 pm on May 23, 2003 (gmt 0)

Wow! One post and one reply from GG, I'm on a roll!

xMadx

10:27 pm on May 23, 2003 (gmt 0)

Yep, and we had the same thought about background images.

shawn

1:07 am on May 24, 2003 (gmt 0)

Greetings,

I have read various posts on hidden text but I have not really read a clear answer on text that uses a CSS color behind it. I have text at the top of my page with a contact number that is white and the background is green (which is part of the style sheet).

My question is 2 part -

1. Would the filter detect this as hidden text, or can it tell that the style sheet is providing a background color?

2. It sounds like the only way to have this filter visit you is if someone submits a spam report.

I can supply the site URL via stickymail if anyone needs to view the site to give an accurate answer.

Shawn

LowLevel

3:02 am on May 24, 2003 (gmt 0)

If the objective is to find text that users can't see, then I think the definitive algorithm should be able to render the page and search for the text with an OCR-based method.

I read that Matt Cutts has experience in "recognition of objects in compressed images".

Do you think that this type of "object recognition" experience would be useful for designing an OCR-based algorithm?

I mean: could it be technically possible to obtain something not too "CPU hungry"?

BigDave

3:26 am on May 24, 2003 (gmt 0)

LowLevel,

You are correct that google has to render the page to properly detect the hidden text. Yet they do not need OCR or anything like that to detect the hidden text, they just need to put a few hooks into the rendering engine.

The background would be rendered first, then as the text is rendered over it, it would be trivial to compare the color of the text with the color of the pixels that the text is being rendered on.

If some text is rendered off the page, then it would also trigger an alarm.

This rendering engine would load CSS and JavaScript. It would trigger any JS events to see if the hidden elements appear. If they appear in any case then the page should not be penalized.

At least that is how I would do it. It should minimize the false positives, while catching most hidden text spammers.
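A minimal sketch of the comparison described above, assuming a rendering hook can report each piece of text with its position and color, and that the background has already been rendered to a pixel grid. `TextRun`, `is_suspicious`, `split_background` and the `tolerance` value are hypothetical names and numbers chosen for illustration; the JavaScript event step is not modelled.

```python
from dataclasses import dataclass
from typing import List, Tuple

Color = Tuple[int, int, int]  # (R, G, B), each 0-255


@dataclass
class TextRun:
    """Hypothetical record a rendering hook might emit for each piece of text."""
    text: str
    x: int        # left edge in page pixels
    y: int        # top edge in page pixels
    width: int
    height: int
    color: Color


def color_distance(a: Color, b: Color) -> int:
    """Largest per-channel difference between two colors."""
    return max(abs(p - q) for p, q in zip(a, b))


def is_suspicious(run: TextRun, background: List[List[Color]],
                  tolerance: int = 16) -> bool:
    """Flag text drawn entirely off the page, or entirely over pixels of
    (nearly) its own color. `background` is a list of pixel rows."""
    page_height = len(background)
    page_width = len(background[0]) if page_height else 0

    # Clip the text box to the page.
    left, top = max(run.x, 0), max(run.y, 0)
    right = min(run.x + run.width, page_width)
    bottom = min(run.y + run.height, page_height)
    if left >= right or top >= bottom:
        return True  # nothing of the text lands on the page

    # Suspicious only if every covered pixel matches the text color.
    return all(
        color_distance(run.color, background[row][col]) <= tolerance
        for row in range(top, bottom)
        for col in range(left, right)
    )


def split_background(width: int, height: int) -> List[List[Color]]:
    """Background image that is white on the left half, black on the right."""
    half = width // 2
    row = [(255, 255, 255)] * half + [(0, 0, 0)] * (width - half)
    return [list(row) for _ in range(height)]
```

With the half-white, half-black background from the opening post, black text on the white half and white text on the black half both pass, while black text placed over the black half is flagged.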

A rather humorous side effect of this algorithm is that it will probably end up improving the SERPs on other search engines as spammers rush to clean up all their sites.

Chris_D

4:31 am on May 24, 2003 (gmt 0)

Hi GG

Well done. I just reread this whole thread - your response to the opening post was absolutely brilliant.

I just checked a few sites which I knew had hidden text - this is a huge leap forward.

The only thing I'm sad about, is that Hormel - the people who have been making canned luncheon meat for over 60 years - which was immortalised in a Monty Python sketch in the 1970s - and who had hidden text on the homepage of one of their UK sites - have cleaned it up!

If the makers of SPAM can't even spam - then I'd have to ask - what's the world coming to? :)

Looks like I need to redo one of my client presentations 'the what not to do presentation' - one of my favourite demos was to surf to the homepage of the world's most famous luncheon meat - and press 'Ctrl A' - always cracked the room up!

: )

Well, there's egg and bacon; egg sausage and bacon; egg and spam; egg bacon and spam; egg bacon sausage and spam; spam bacon sausage and spam; spam egg spam spam bacon and spam; spam sausage spam spam bacon spam tomato and spam....

Chris_D

nanocet

5:04 am on May 24, 2003 (gmt 0)

>The only thing I'm sad about, is that Hormel - the people who have been making canned luncheon
>meat for over 60 years <snip> have cleaned it up!
>
>If the makers of SPAM can't even spam - then I'd have to ask - what's the world coming to? :)

You can still show them by going to archive.org.

LowLevel

5:53 am on May 24, 2003 (gmt 0)

BigDave,

I really like the idea of hooks attached to text writing functions. :)

But I think that comparing the color of the text with the color of the background pixels under the text is not enough.

Someone could just create a background with a "negative" image of the text, positioning (via CSS) the text exactly on its negative.

Technically, the color of the text would be different from the color of the underlying background, if you do a pixel-by-pixel (text/background) comparison, but the text would be invisible to the users.
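The "negative image" trick can be illustrated with a toy example. This is only a sketch under stated assumptions: a 3x3 single-channel "glyph", a hypothetical naive check that looks only at the background directly under the glyph pixels, and a second check that looks at the final composited block. None of the names below come from any real rendering engine.

```python
BLACK, WHITE = 0, 255

# Hypothetical 3x3 glyph: "X" marks pixels the text actually paints.
GLYPH = [
    "X.X",
    "XXX",
    "X.X",
]


def negative_background(glyph, text_color):
    """Background that differs from the text under the glyph pixels
    and equals the text color everywhere else."""
    other = WHITE if text_color == BLACK else BLACK
    return [[other if ch == "X" else text_color for ch in row] for row in glyph]


def naive_check_passes(glyph, background, text_color):
    """Naive hook: text counts as visible if every painted pixel differs
    from the background pixel underneath it."""
    return all(
        background[r][c] != text_color
        for r, row in enumerate(glyph)
        for c, ch in enumerate(row)
        if ch == "X"
    )


def composite(glyph, background, text_color):
    """What the user sees: text painted over the background."""
    return [
        [text_color if ch == "X" else background[r][c] for c, ch in enumerate(row)]
        for r, row in enumerate(glyph)
    ]


def visible_after_render(rendered):
    """The block shows something only if it contains more than one value."""
    return len({value for row in rendered for value in row}) > 1


background = negative_background(GLYPH, BLACK)
rendered = composite(GLYPH, background, BLACK)
```

Here the naive check passes, because every painted pixel sits on a white background pixel, yet the rendered block is uniformly black, so the text cannot be read. A check would have to look at the finished render, not only at the pixels under the glyph.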

Further, if the background is an image and not a "solid" color, the only safe method to recognize the letters would be a "classic" OCR scan, searching for the contours of the letters or the shape of the letters on a pre-rendered image.

It is interesting to note that this type of algorithm would be different from a normal OCR scan, since OCR algorithms usually search for "any text" in the page, while Google-OCR just needs to search the page for the text found in the HTML code.

BigDave

6:05 am on May 24, 2003 (gmt 0)

Obviously it would have to be more sophisticated than my example. You could do things like check for sufficient contrast to the background in the block of character space for a start. You would also have to look for a large group of characters that are questionable. One hard to read character out of a paragraph over an image should not be enough to nail you.
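As a rough sketch of those two refinements: measure the contrast between a character's color and the average brightness of the block it sits on, and only treat a page as questionable when a long run of consecutive characters fails. The brightness weights are the standard Rec. 601 luma coefficients; `min_diff` and `min_run` are made-up thresholds, not values anyone in this thread has confirmed.

```python
from typing import Iterable, List, Tuple

Color = Tuple[int, int, int]  # (R, G, B), each 0-255


def luma(color: Color) -> float:
    """Perceived brightness using Rec. 601 weights (0.0 to 255.0)."""
    r, g, b = color
    return 0.299 * r + 0.587 * g + 0.114 * b


def low_contrast(text_color: Color, cell_pixels: List[Color],
                 min_diff: float = 40.0) -> bool:
    """True if the character's brightness is too close to the average
    brightness of the background pixels in its character cell."""
    average = sum(luma(p) for p in cell_pixels) / len(cell_pixels)
    return abs(luma(text_color) - average) < min_diff


def longest_questionable_run(flags: Iterable[bool]) -> int:
    """Length of the longest streak of consecutive low-contrast characters."""
    longest = current = 0
    for flagged in flags:
        current = current + 1 if flagged else 0
        longest = max(longest, current)
    return longest


def page_is_questionable(flags: Iterable[bool], min_run: int = 20) -> bool:
    """One hard-to-read character is ignored; a long streak is not."""
    return longest_questionable_run(flags) >= min_run
```

A single low-contrast character in the middle of a readable paragraph gives a run length of 1 and is ignored, while 25 low-contrast characters in a row would trip the hypothetical `min_run` of 20.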

Depending on OCR would greatly complicate things because you would have to have a camera actually pointing at the screen, then communicate between the rendering engine and the OCR application. It's possible, but not too feasible. Everything that OCR could do could be done algorithmically without the extra steps.

Chris_D

6:05 am on May 24, 2003 (gmt 0)

Hey Nanocet

Good call!

And there it is - still in the wayback machine.... scroll down, 'ctrl A' ..

.. H1 'Chopped pork and ham'......

LOL

Thanks Nanocet - you've made my day!

Best

Chris_D

LowLevel

6:36 am on May 24, 2003 (gmt 0)

BigDave,

I think I have caused a misunderstanding! :) With "OCR" I was referring to an OCR software application, able to "algorithmically scan" image files (produced by a visual browser) with no hardware support.

Another improvement would be to convert color images to grey images before the "file scan", which could make the process easier and faster.
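A small sketch of that conversion, assuming the rendered page is available as rows of (R, G, B) pixels. It uses a common integer approximation of the Rec. 601 luma weights (77, 150 and 29 sum to 256), so each pixel becomes a single value and later comparisons touch one channel instead of three. Whether this is actually fast enough at crawl scale is not something the sketch can show.

```python
from typing import List, Tuple

Color = Tuple[int, int, int]  # (R, G, B), each 0-255


def grey_value(pixel: Color) -> int:
    """Integer-only grey level: (77*R + 150*G + 29*B) / 256."""
    r, g, b = pixel
    return (77 * r + 150 * g + 29 * b) >> 8


def to_grey(image: List[List[Color]]) -> List[List[int]]:
    """Convert a color image (list of pixel rows) to single-channel grey."""
    return [[grey_value(pixel) for pixel in row] for row in image]
```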

I think that it can be done, but I'm uncertain about the speed of the algorithm. :-/
