Forum Moderators: open
[webmasterworld.com...]
>>new hidden text algo
Anyone who's got any hidden text really should get rid of it now. It'll hit by surprise, and making it through an update is no indication of safety. Also, it wouldn't be a good idea to still have it in place if the pages get dropped and it's still there for a re-check.
[edited by: Marcia at 10:54 pm (utc) on April 29, 2003]
Do you think hidden divs would be as easy to detect as hidden text?
Probably easier...
div.hidden { position: absolute; top: -800px; left: -800px; }
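As a rough sketch of how a rule like that could be flagged automatically (a hypothetical heuristic, not Google's actual algorithm; the function name and threshold are my own invention):

```python
import re

# Hypothetical heuristic: flag CSS rules that combine absolute positioning
# with a large negative top/left offset -- a common way to push text
# off-screen while keeping it in the HTML.
OFFSCREEN = re.compile(
    r"position\s*:\s*absolute[^}]*(?:top|left)\s*:\s*-(\d+)px", re.S
)

def flags_offscreen(css: str, threshold: int = 500) -> bool:
    """Return True if any rule moves an element more than `threshold`
    pixels off-screen, a hidden-text candidate."""
    m = OFFSCREEN.search(css)
    return bool(m) and int(m.group(1)) >= threshold

print(flags_offscreen("div.hidden{position:absolute;top:-800px;left:-800px;}"))
print(flags_offscreen("div.menu{position:absolute;top:10px;left:10px;}"))
```

Of course, a regex pass over stylesheets would only catch the crude cases; it says nothing about stylesheets blocked by robots.txt, styles set via JS, or legitimate off-screen positioning used by DHTML menus, which is exactly the crossfire problem discussed below.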
It is a tough call with dynamic menus that use a hidden div to hide content until a user activates that content and then the div becomes visible. This same process is used to hide text and that is what I'm thinking Google is referring to. There may be many people who get caught in the crossfire if this is something automated.
I have purposely stayed away from using those types of navigation menus just because of the "hidden" aspects of the <div>s. I have many clients asking me why I won't set up menus like that. Well, maybe I can show them after a few updates. I'll take them back to the same sites they showed me as an example and we'll see if they were affected.
I must say, I am extremely curious to see how this filter is applied... there are going to be lots of newbie threads in this forum asking what happened to their site.
Hide the div with CSS and voilà!
Actually I think this is what Google is referring to. I don't see much hidden text anymore, you know, white on white or whatever colors are used.
Some of the more advanced on the edge marketers might be employing the above hidden <div> strategy. There are different ways to go about this, all with the same results, hidden text.
I've reviewed quite a few sites over the past year and have seen this in use and it was working then. Sounds like G has received too many complaints and is now going to take action. The problem is, there are situations where hidden <div>s are used from a design standpoint such as the dhtml menus that I refer to above.
Matt (apparently) stated that just because you make it through an update, it doesn't mean that you are safe. This tells me that the filter is not applied to all the sites crawled at the time of the crawl. I suspected that this would be the case, as that would be very computationally intensive.
If they are not going to be hitting every site at the time of the crawl, they could very easily be checking sites constantly from systems with user agents and IPs other than those used by googlebot.
They do not necessarily have to honour robots.txt exclusion protocol if they do not use a bot. It could very well be a "browser" that renders the entire page, CSS, JS and images. This browser could be monitored by a person and fed the URLs to check out automatically. One person could "check" thousands of pages an hour this way. Google never claims that there will not be a person involved, they just say they are implementing a filter that checks for hidden text algorithmically.
If the hidden text filter is fetching new copies of the page instead of working from the cache, they could easily add an additional check against what is in the cache to look for cloaking.
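A toy illustration of that cache-versus-fresh-copy idea (purely an assumption about how such a cross-check might work; `fingerprint` and `looks_cloaked` are invented names):

```python
import hashlib
import re

def fingerprint(html: str) -> str:
    """Crude content fingerprint: strip markup, normalise whitespace
    and case, then hash. Equivalent copies hash identically."""
    text = re.sub(r"<[^>]+>", " ", html)
    text = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(text.encode()).hexdigest()

def looks_cloaked(cached_html: str, fresh_html: str) -> bool:
    """Flag a page whose freshly fetched copy differs from the copy
    the crawler cached -- a classic cloaking signal."""
    return fingerprint(cached_html) != fingerprint(fresh_html)
```

A real system would need fuzzier matching (legitimate pages change between crawls), but the basic comparison is cheap once you're re-fetching the page anyway.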
The point I was trying to make is that you should not consider robots.txt exclusions, CSS or JS tricks to keep your site safe.
To my thinking it would be relatively easy to detect hidden divs automatically but other techniques might be harder unless they are actually going to use a human to review sites that have been reported.
[fireflyfans.net...]
Is a classic example.
Also - there are tons of ways webmasters make sites. Some use Dreamweaver and FrontPage.
You can have cells inside columns inside other tables with different backgrounds and the like. Any change to one of these can make or break whether a word is visible (in some browsers).
They don't work the same all the time. Imagine a website where the webmaster has black text in a table. He links some of the text and now it is blue [because it is linked].
He then changes the background to black - he now has black-on-black text - but doesn't realize it because his links are blue. He then unlinks one of the words as he is no longer interested in it, but forgets to delete the word - as it is now invisible to him.
Sure - people can say this is unlikely - I see stuff like this all the time - people forget things they can't see. Google will take thousands of innocent sites - all so the people that whine and complain will shut up.
Soon these people will fix their sites - and the whiners and complainers will be back, but with nothing to whine about....
And of course - it doesn't take a genius to put black text on top of a black gif - no filter is going to be able to catch that. This is just a tremendous waste of time to appease those that think invisible text is important to begin with.
Text on top of an image is easy. You just render the page and compare the color of the text bits as they are placed in relation to the background that they are placed on.
This is the same way that you look at text with CSS inside tables inside divs ......
Putting checks on a rendering engine is a lot of work, but it would be the most accurate way to do this.
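The color-comparison idea can be sketched like this (a hypothetical check on already-resolved colors; a real rendering engine would first have to resolve the full CSS cascade, background images and z-order, which is the hard part):

```python
def contrast_ratio(fg, bg):
    """WCAG-style relative-luminance contrast between two RGB colors."""
    def lum(rgb):
        def chan(c):
            c /= 255.0
            return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        r, g, b = (chan(c) for c in rgb)
        return 0.2126 * r + 0.7152 * g + 0.0722 * b
    l1, l2 = sorted((lum(fg), lum(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

def is_invisible(fg, bg, threshold=1.05):
    """Text whose color barely contrasts with its effective background
    is a hidden-text candidate (white-on-white, black-on-black, etc.)."""
    return contrast_ratio(fg, bg) < threshold
```

Note this only works once you know the *effective* background at the text's position - which is exactly why text over a same-colored .gif defeats any check that doesn't actually render the image.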
As people have mentioned, outside a hand check, who can tell the difference between a background .gif and text?
How does your bot read a CSS file called by JS? (common DHTML)
How does your bot deal with alternate style sheets? (common DHTML)
How does your bot deal with @import? (very common DHTML)
How does your bot deal with sliding menus? (common DHTML)
IMO it's far too much effort for far too little effect. Far more effective to follow the usual Google methodology and blow smoke...
It makes sense to suggest to a robot that it should not index your CSS, CGI, admin or other directories; that should be seen as helpful.
But robots.txt-ing out CSS directories is useless if you are only trying to "hide" CSS invisible-text tricks from search engines.
[edited by: chiyo at 5:57 am (utc) on April 30, 2003]
js can also be used to "hide links" if not the text itself. Was any comment made on js as a "hidden text" or "hidden links" method?
[edited by: chiyo at 6:00 am (utc) on April 30, 2003]
All the text I want Google to see, absolutely positioned.
An image, table, anything solid placed over the top using z-index and absolute positioning.
It would never be able to be detected. Even a hand check would be hard pressed to see the problem.
Much ado about nothing if you ask me. Google would never do anything for a few hundred (thousand) people who complain that would create several thousand (million) complaints that perfectly innocent sites were banned.
Spam in one form or another is here to stay. You just have to beat it. And remember the common SEO saying that it's spam if your competitor is ahead of you; it's great SEO if you're ahead of your competitor.
This is what I tell the guys who work for me. Stop crying, start optimizing. Good sites rise to the top... eventually.
Yes, it's common, and a good way to motivate staff and get them concentrating on positives and negatives - but wrong.
It could be that the site is actually better in non-SEO areas like content, usability and value.
we all have a responsibility as people who make a living out of the web to ensure that what the actual searchers see in SERPS are useful and as devoid of spam as possible. Otherwise people will just stop visiting and using the Web and nobody will read your stuff anyway.
Trying to out-spam or out-SEO your competitors may work for a short time, but not for the long term. The logical long-term result is more spam, or more sites that are there because of good SEO rather than good utility, value, quality or content.
That's why I report what I see as spam - but leave it up to the SE to decide, for their purposes, whether it really is. Yes, I'm a white hat, but for good reason other than being a complainer. We have to approach it both ways.