Needless to say, our site was recently banned. Very frustrating since our site has 7000 pages of PURE content and we never spammed or did anything to purposely trick the search engines. We worked hard over 5 years to get the site where it is for the VISITOR to love it, and they do love it, but now it is much harder for people to find... even though we have the content that many people are looking for and can't find elsewhere. (Ugh. Can't sleep...)
Thanks,
Ryan
<...>
[edited by: ciml at 10:58 am (utc) on May 3, 2003]
[edit reason] No .sigs please. [/edit]
msgraph,
GG has said elsewhere that at present hidden text detection is invoked as the result of spam reports.
I'm just going by what has been said in this thread.
Within a month or so, if all of the hidden text is gone, you should show back up in the index again.
It would be easier if people would just not try to hide text in the first place.. I don't like having penalties in Google, even if they're automatic.
He basically said that if the hidden text was still present when the filter went live, the site would be removed.
I wonder if someone heard it at pubcon and passed that message on? I know that when the filter went checking on that site (and it did), the hidden text had already been removed.
Maybe I'm wrong, but to me this all sounds like these filters are live now, and that they are applied during the monthly update rather than by some kind of special crawler that is sent out when someone fills out a spam report. I'm just looking for clarification.
I've lurked for quite some time and figured I'd finally chime in.
I have noticed some sites trying to get around the penalty by using a white .gif background and white text on top of that. In the past only a hand check could pick it up. Is there any opinion on whether Google can now pick this up through a filter?
I have read various posts on hidden text, but I have not really read a clear answer on text that has a CSS color behind it. I have text at the top of my page with a contact number that is white, and the background is green (which is set in the style sheet).
My question is 2 part -
1. Would the filter detect this as hidden text, or can it tell that the style sheet is providing a background color?
2. It sounds like the only way to have this filter visit you is if someone submits a spam report. Is that correct?
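On the first question: a check that compared only an element's own text color with its own `background-color` would miss Shawn's case, because a transparent element shows whatever its ancestors paint. A minimal sketch, assuming a simplified, hypothetical element model (`Element`, `effective_background` and `looks_hidden` are illustrative names, not anything Google has described), walks up the tree to the nearest ancestor that actually paints a background before comparing colors. It ignores background images entirely, so it could not catch the white .gif trick; that needs an actual render.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

RGB = Tuple[int, int, int]

# Assumed defaults when nothing on the page sets a color (typical browser defaults).
DEFAULT_BG: RGB = (255, 255, 255)
DEFAULT_FG: RGB = (0, 0, 0)


@dataclass
class Element:
    """Hypothetical, stripped-down DOM node with only the fields this check needs."""
    color: Optional[RGB] = None        # text color; None means inherit from parent
    background: Optional[RGB] = None   # background color; None means transparent
    parent: Optional["Element"] = None


def effective_background(el: Element) -> RGB:
    """Background that shows through: the nearest non-transparent one up the tree."""
    node: Optional[Element] = el
    while node is not None:
        if node.background is not None:
            return node.background
        node = node.parent
    return DEFAULT_BG


def effective_color(el: Element) -> RGB:
    """Text color after inheritance from ancestors."""
    node: Optional[Element] = el
    while node is not None:
        if node.color is not None:
            return node.color
        node = node.parent
    return DEFAULT_FG


def looks_hidden(el: Element, tolerance: int = 30) -> bool:
    """True if text and effective background differ by at most `tolerance` per channel."""
    fg, bg = effective_color(el), effective_background(el)
    return max(abs(a - b) for a, b in zip(fg, bg)) <= tolerance


body = Element(background=(255, 255, 255))
header = Element(background=(0, 128, 0), parent=body)   # green, from the style sheet
phone = Element(color=(255, 255, 255), parent=header)   # white contact number
print(looks_hidden(phone))  # False: white on green is readable
```

Under these assumptions, white text on a green header is not flagged, while white text sitting directly on a white body would be.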
I can supply the site URL via stickymail if anyone needs to view the site to give an accurate answer.
Shawn
I read that Matt Cutts has experience in "recognition of objects in compressed images".
Do you think that this type of "object recognition" experience would be useful for designing an OCR-based algorithm?
I mean: would it be technically possible to get something that is not too "CPU hungry"?
You are correct that Google has to render the page to properly detect the hidden text. Yet they do not need OCR or anything like that to detect the hidden text; they just need to put a few hooks into the rendering engine.
The background would be rendered first, then as the text is rendered over it, it would be trivial to compare the color of the text with the color of the pixels that the text is being rendered on.
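The hook idea can be sketched as follows, assuming the renderer exposes the background painted so far as a pixel grid and hands the hook the exact pixels each glyph is about to cover. `on_text_draw`, `visible_fraction` and the 30-level contrast threshold are hypothetical choices for illustration, not a known implementation. Because it compares against pixels already on the canvas, a white background image under white text would be caught the same way as a white background color.

```python
from typing import Iterable, List, Tuple

RGB = Tuple[int, int, int]
Canvas = List[List[RGB]]  # canvas[y][x]: background as painted before the text


def color_distance(a: RGB, b: RGB) -> int:
    """Largest per-channel difference between two colors (0 means identical)."""
    return max(abs(x - y) for x, y in zip(a, b))


def visible_fraction(canvas: Canvas, text_color: RGB,
                     glyph_pixels: Iterable[Tuple[int, int]],
                     min_contrast: int = 30) -> float:
    """Share of on-canvas glyph pixels that contrast with what is underneath."""
    height = len(canvas)
    width = len(canvas[0]) if height else 0
    total = visible = 0
    for x, y in glyph_pixels:
        if 0 <= x < width and 0 <= y < height:
            total += 1
            if color_distance(canvas[y][x], text_color) >= min_contrast:
                visible += 1
    # No on-canvas pixels at all means nothing the user could see.
    return visible / total if total else 0.0


def on_text_draw(canvas: Canvas, text_color: RGB,
                 glyph_pixels: Iterable[Tuple[int, int]],
                 threshold: float = 0.5) -> bool:
    """Hypothetical renderer hook: True if the text about to be drawn looks hidden."""
    return visible_fraction(canvas, text_color, glyph_pixels) < threshold


white, green = (255, 255, 255), (0, 128, 0)
canvas = [[green, green, white, white] for _ in range(2)]
print(on_text_draw(canvas, white, [(0, 0), (1, 1)]))  # False: white on green
print(on_text_draw(canvas, white, [(2, 0), (3, 1)]))  # True: white on white
```

Using the fraction of glyph pixels, rather than a single sample, keeps a partly visible line of text from being flagged just because one letter crosses a matching stripe.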
If some text is rendered off the page, then it would also trigger an alarm.
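A minimal sketch of the off-page alarm, assuming the renderer reports each text box's position and size in document coordinates. "The page" is taken here to mean the whole scrollable document, not just the first screen, since text below the fold is still reachable by scrolling while text pushed to negative coordinates is not. `Box` and `is_off_page` are hypothetical names.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical layout box for a run of text, in document pixels."""
    x: float
    y: float
    width: float
    height: float


def is_off_page(box: Box, doc_width: float, doc_height: float) -> bool:
    """True if the box does not overlap the scrollable document area at all."""
    overlaps = (box.x < doc_width and box.x + box.width > 0 and
                box.y < doc_height and box.y + box.height > 0)
    return not overlaps


print(is_off_page(Box(-9999, 10, 200, 20), 800, 2000))  # True: pushed off to the left
print(is_off_page(Box(10, 1500, 100, 20), 800, 2000))   # False: below the fold, but scrollable
```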
This rendering engine would load CSS and JavaScript. It would trigger any JS events to see if the hidden elements appear. If they appear in any of those cases, the page should not be penalized.
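The "penalize only if it never appears" rule can be sketched like this, assuming the rendering engine can report which elements are visible in the initial state and after firing each event. Events are modeled as hypothetical Python functions that each start from the initial state; a real page might need sequences of events (open a menu, then hover a sub-item), which this simplification does not try.

```python
from typing import Callable, Dict, List

Visibility = Dict[str, bool]           # element id -> visible in this state?
Event = Callable[[Visibility], None]   # hypothetical stand-in for a JS event handler


def snapshots(initial: Visibility, events: List[Event]) -> List[Visibility]:
    """Initial state plus the state after firing each event once, independently."""
    states = [dict(initial)]
    for fire in events:
        state = dict(initial)  # each event starts from the page as first loaded
        fire(state)
        states.append(state)
    return states


def elements_to_penalize(initial: Visibility, events: List[Event]) -> List[str]:
    """Elements that stay hidden in every observed state."""
    states = snapshots(initial, events)
    return sorted(eid for eid in initial
                  if not any(s.get(eid, False) for s in states))


def show_menu(state: Visibility) -> None:
    state["menu"] = True  # e.g. a DHTML menu revealed on mouseover


initial = {"intro": True, "menu": False, "keywords": False}
print(elements_to_penalize(initial, [show_menu]))  # ['keywords']
```

The menu is hidden at load time but appears after an event, so only the block that never becomes visible is reported.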
At least that is how I would do it. It should minimize the false positives, while catching most hidden text spammers.
A rather humorous side effect of this algorithm is that it will probably end up improving the SERPs on other search engines as spammers rush to clean up all their sites.
Well done. I just reread this whole thread - your response to the opening post was absolutely brilliant.
I just checked a few sites which I knew had hidden text - this is a huge leap forward.
The only thing I'm sad about, is that Hormel - the people who have been making canned luncheon meat for over 60 years - which was immortalised in a Monty Python sketch in the 1970s - and who had hidden text on the homepage of one of their UK sites - have cleaned it up!
If the makers of SPAM can't even spam - then I'd have to ask - what's the world coming to? :)
Looks like I need to redo one of my client presentations 'the what not to do presentation' - one of my favourite demos was to surf to the homepage of the world's most famous luncheon meat - and press 'Ctrl A' - always cracked the room up!
: )
Well, there's egg and bacon; egg sausage and bacon; egg and spam; egg bacon and spam; egg bacon sausage and spam; spam bacon sausage and spam; spam egg spam spam bacon and spam; spam sausage spam spam bacon spam tomato and spam....
Chris_D
I really like the idea of hooks attached to text writing functions. :)
But I think that comparing the color of the text with the color of the background pixels under the text is not enough.
Someone could just create a background with a "negative" image of the text, positioning (via CSS) the text exactly on its negative.
Technically, the color of the text would be different from the color of the underlying background, if you do a pixel-by-pixel (text/background) comparison, but the text would be invisible to the users.
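One way to sidestep the question of which color to compare against is to compare final renders instead: render the page once with the text and once with it removed, and count how many pixels differ. If almost nothing changes, the text cannot be seen, whatever caused it (matching colors, a background image, or another element layered over the text afterwards). This is a swapped-in technique, not something described in the thread; `changed_pixels`, `text_is_invisible`, `glyph_area` (the pixel count the rasterizer says the glyphs cover) and the thresholds are hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
Image = List[List[RGB]]  # image[y][x], final composited render


def changed_pixels(with_text: Image, without_text: Image, min_diff: int = 30) -> int:
    """Count pixels that differ noticeably between the two renders."""
    count = 0
    for row_a, row_b in zip(with_text, without_text):
        for a, b in zip(row_a, row_b):
            if max(abs(x - y) for x, y in zip(a, b)) >= min_diff:
                count += 1
    return count


def text_is_invisible(with_text: Image, without_text: Image,
                      glyph_area: int, min_visible_ratio: float = 0.05) -> bool:
    """True if removing the text changes almost none of the pixels it should cover."""
    if glyph_area <= 0:
        return True
    return changed_pixels(with_text, without_text) / glyph_area < min_visible_ratio


background = [[(255, 255, 255)] * 4 for _ in range(4)]
drawn = [row[:] for row in background]
drawn[1][1] = drawn[1][2] = (0, 0, 0)  # two dark glyph pixels
print(text_is_invisible(drawn, background, glyph_area=4))  # False
```

The cost is a second full render per suspicious text run, which bears on the CPU concern raised earlier in the thread.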
Further, if the background is an image and not a "solid" color, the only safe method to recognize the letters would be a "classic" OCR scan, searching for the contours of the letters or the shape of the letters on a pre-rendered image.
It is interesting to note that this type of algorithm would be different from a normal OCR scan, since OCR algorithms usually search for "any text" in the page, while Google-OCR just needs to search the page for the text found in the HTML code.
Depending on OCR would greatly complicate things because you would have to have a camera actually pointing at the screen, then communicate between the rendering engine and the OCR application. It's possible, but not very feasible. Everything that OCR could do could be done algorithmically without the extra steps.
I think I have caused a misunderstanding! :) With "OCR" I was referring to an OCR software application, able to "algorithmically scan" image files (produced by a visual browser) with no hardware support.
Another improvement would be to convert color images to grayscale before the "file scan"; it could make the process easier and faster.
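A grayscale pass could look like the sketch below. The weights are the ITU-R BT.601 luma coefficients, a common choice; the thread does not say which conversion would be used. One caveat worth testing: colors that look clearly different can collapse to the same gray, so a gray-only comparison could flag readable text (pure red on a mid green, for example).

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def to_gray(pixel: RGB) -> int:
    """Luma using ITU-R BT.601 weights, rounded to an 8-bit value."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def to_gray_image(image: List[List[RGB]]) -> List[List[int]]:
    """Convert a whole render to one channel, cutting comparison work to a third."""
    return [[to_gray(p) for p in row] for row in image]


print(to_gray((255, 0, 0)), to_gray((0, 130, 0)))  # both 76: red on green looks "hidden"
```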
I think that it can be done, but I'm uncertain about the speed of the algorithm. :-/