Detection Of Hidden Stuff

Forum Moderators: open

Message Too Old, No Replies

Detection Of Hidden Stuff

seeking pub conference feedback

austtr

10:26 pm on Apr 29, 2003 (gmt 0)

From a post in another thread:

<For those who get to the pubconference and talk about this new hidden text algo, please post it here<

OK party people, anyone sober enough yet to write an intelligible response on this?

DaveN

8:49 am on Apr 30, 2003 (gmt 0)

personally I think the problem with googlebot and hidden text is that there are so many ways of hiding text of real purposes not seo,

we used a css to change the colour of the text on one site it started off the same colour as the background and in a layer then the style sheet would throw the layer to the front and change the text to be visible, it was part of the design the site as over 200 pages index will this get burnt?

DaveN

DrOliver

10:09 am on Apr 30, 2003 (gmt 0)

I used to block the css directory to make sure no SE accidentally indexes these files and they would show up in the SERPs. I have no problem taking them out of the robots.txt.

So far so good.

But what is with the text I am hiding from modern browsers, something like:

"This site's design is only visible in graphical browsers that support web standards...."

I do hide text like that in browsers that can handle @import, and it will show up in browsers that don't. So I have a linked CSS file with something like:

div#note{color:red;}

and an imported CSS file with something like:

div#note{display:none;}

There it is: "display:none;". Well great, there must be a thousand other legitimate reasons for "display:none;".

Now what?

robertito62

11:24 am on Apr 30, 2003 (gmt 0)

"...Google would never do anything for a few hundred (thousand) people who complain that would create several thousand (million) complaints that perfectly innocent sites were banned..."

Not sure.
I saw suspicious activity on my counter stats last week.

On a few pages I had this:
<DIV style="LEFT: 0px; VISIBILITY: hidden; WIDTH: 650px; POSITION: absolute; TOP: 0px; HEIGHT: 1px">some copy</DIV>

When checking counter stats I saw the pages that carried this code (about 5, for 5 different keywords) being hit by 4000 clicks, one per day. Usually, hits max at 200/300.

On a closer look, there was only one referral URL for the hits. It was the URL for the search result page for each specific keyword (all page one results).

Counter company killed the referrals. After I complained, they came back as 'bookmarks'. The URL in question was nowhere around.

I have never had 4000 hits from just one single referring URL. Could have been a hitbot, but too much of a coincidence (time of occurrence and pages using Div:hidden).

Needless to say, the style is gone.

vitaplease

11:33 am on Apr 30, 2003 (gmt 0)

Concerning hidden text, Matt Cutts, during the questions and answers afterwards, seemed rather confident that both css and layers would be taken care of in decent way.

He said things would already be starting to happen as of a week ago. (would that be immediately after the deep crawl?).

I am sure they will start out with the obvious stuff and see how the web world reacts.

Basically it was proposed the penalty will be a 30 day thing, until correction.
Pagerank and crawling will not be effected by the hidden text penalty (as I see it, they have to crawl to see your corrections, and once corrected normal PR is back).

WebGuerrilla

9:52 pm on Apr 30, 2003 (gmt 0)

According to inside sources, the hidden text filter will be an automated tool used to process spam reports.

That means that rather than dedicated a bunch of humans to look at all the sites that are using hidden text manually, they will dispatch a separate bot (more than likely posing as a browser) to crawl the sites in question. A separate algo will be used to parse the pages. (including external style sheets).

If a violation is found, the site will automatically be dropped. Once it is dropped, the bot will revisit on occasion to check and see if the offending content has been removed. If it has, the site will be reincluded automatically.

piskie

10:21 pm on Apr 30, 2003 (gmt 0)

IMHO Google is entitled to download (by bot or human) any external file .css .js or anything else that affects the visual display of content as presented to the end user, whether or not these external files are robots.txt protected or not.

I have nothing to hide on my clients sites after reworking to my criteria. If a potential client has these "attributes" on their site and insist on keeping them, then I decline the contract.

I welcome the new algo (as rumoured) providing Google safeguards the innocent better than they have when introducing previous innovative changes.

johnser

10:30 pm on Apr 30, 2003 (gmt 0)

Presumably WebGuerilla this would only be used on sites which have been reported for spam?

It would not be used on sites unless they are reported. Is this correct?

If this is true then User Agent & IP cloaking goes out the window :(

Does this mean there'll be no manual spam checks by Google?

WebGuerrilla

11:02 pm on Apr 30, 2003 (gmt 0)

Presumably WebGuerilla this would only be used on sites which have been reported for spam?

Yes, that is how it was explained to me. (Although I do seem to remember the words "for now" being thrown in).

It would not be used on sites unless they are reported. Is this correct?

I wouldn't go quite that far. There are certain high crime neighborhoods on the web that could certainly see some randome patrols. But for the most part, I think the goal is to combine the power of human spam reports and automated technology.

If this is true then User Agent & IP cloaking goes out the window :(

I'm not sure I follow what you mean. But I think the thing to keep in mind is that the goal really isn't to completely prevent any pages using shady techniques from entering the database. The goal is to lessen the negative impact those pages may have on the user experience.

The biggest single thing that contributes to meeting that goal is to reduce the average shelf-life of content that doesn't meet your standards.

So even if the automation is only applied to content listed in a spam report, it will still have a huge impact because the turnaround time from spam being reported and spam being removed will be dramtically shortend.

Does this mean there'll be no manual spam checks by Google?

No. It just means that humans who are checking will have some powerful tools at their disposal to speed up the process.

rmjvol

2:07 am on May 1, 2003 (gmt 0)

Matt used the name "Spambot" as in "we'll set the Spambot loose on it." My inside info would lead me to believe that SpamBot could be directed both manually and automagically.

GoogleBot (both male & female) has now begotten FreshBot & SpamBot. Can't wait for a family portrait.

rmjvol

Eric_Lander

3:33 am on May 1, 2003 (gmt 0)

It would be great if the google bot could parse js

According to Matt at the conference, you shouldn't think that they cannot parse it. Well, relative to links anyways.

From what I can recall, he mentioned that Gbot will read through it all as if it were plain text. Obviously, anytime you'd see "http://" you'd know that if not a link, there's referencee to something.

Should be interesting how all of this plays out though. Online time will, and should, tell.

mayor

5:29 am on May 1, 2003 (gmt 0)

I don't understand why Google or anyone would care if you hide links in javascript. These links would be hidden from spiders, not humans. Being invisible to spiders, they could not be used for link spamming the SE algorithms.

What am I missing?

mayor

5:40 am on May 1, 2003 (gmt 0)

rmjvol >> GoogleBot has now begotten FreshBot & SpamBot. Can't wait for a family portrait.

With SpamBot pigging out on spam reports, how long before SnitchBot enters the clandestine SEO scene, to spam SpamBot with competitors' pages? It's the community portrait that's gonna be the most interesting.

mil2k

6:27 am on May 1, 2003 (gmt 0)

we all have a responsibility as people who make a living out of the web to ensure that what the actual searchers see in SERPS are useful and as devoid of spam as possible.

Very well said. My thoughts exactly. Very good thread. Please keep the knowledge flowing.

Yidaki

10:22 am on May 1, 2003 (gmt 0)

Forgive me, if my question doesn't fit into the hidden stuff thing but i don't want to start a new thread ... since many here are talking about google rep's answering questions at the last pub con, did matt or somebody else from google say something about their future handling of the usage of guestbook submissions to manipulate pr? Don't get me wrong - i don't want to discuss again the pro's and con's of guestbooks! I'm just curious if the officials explained their opinion about them.

Who talked to matt about guestbooks?

vitaplease

10:25 am on May 1, 2003 (gmt 0)

Maybe another thread subject indeed, but:

I think Matt said something about a guestbook sauce already having been put into place (into the algo) in January of this year, with more to come in the following months.

DaveN

10:37 am on May 1, 2003 (gmt 0)

personally I think they are PR0 Guestbooks at an amazing rate.

DaveN

Yidaki

11:16 am on May 1, 2003 (gmt 0)

>personally I think they are PR0 Guestbooks at an amazing rate.

Could be true but i can't see any changes to the positions of the linked pages yet. Because of my observation that some sites are continiously doing amazing fine with guestbook only links and hidden text / links to a max, i'm really curious to see the spam detection plans going reality. After the last update i observed a big drop in guestbook backlinks for the above mentioned sites - but no drop in postions at all (even the oposite)... this could mean that many guestbooks are now pr0 but the full impact to the positions of the linked pages needs some more update(s) ...

vitaplease and webguerilla said that some filters (hidden text / css / layer) are allready active, if i understand correct, right!? Is this confirmed by others? And, again, did matt or any other google rep say something about guestbooks? vitaplease, did matt explain, what the changes were in January?

vitaplease

11:30 am on May 1, 2003 (gmt 0)

>>vitaplease, did matt explain, what the changes were in January?

No, he only noted that it seemingly came about unnoticed.

I can still see sites with 900 plus guestbookish backlinks doing well (even after the last update with general Pagerank drop), but then there could always be other normal links in play as well.

>>vitaplease and webguerilla said that some filters (hidden text / css / layer) are allready active

Only relaying what I thought Matt said, something to the effect of:
"And you will see these effects taking place as of approx. a week ago".

mipapage

11:31 am on May 1, 2003 (gmt 0)

Hey Yidaki,

I saw a few sites drop from the serps last week, real spammers. They also got greybarred.

As I have mentioned in other threads, I have been seeing different results on www-sj serps for my keywords than all of the other datacenters since the last dance (interestingly, these serps only appear on Yahoo! and Alexa).

The main difference that I can see between these serps and the normal google.com serps is that the blatent spammers are nowhere to be seen. I have a feeling that they may have unleashed spambot on these serps (at least the ones for my keyphrases!), or at least something quite remarkable was done to them wrt spam.

GlynMusica

2:09 pm on May 1, 2003 (gmt 0)

I'd quite like to see Google have a page where webmasters can signal that there site has switched server locations.

I've found in the past that Google has been slack at updating sites and has cached the old versions of websites when moved from a fixed IP to a virtually hosted server. In this respect Google could simply check for the existence of the URL in it's database (thereby filtering out all the spammers that would no doubt see this as a quick route in to the Google index) and, if present reset, whatever data it has in it's database to look for the new DNS data.

If I have made a mistake in the terms above I've used you'll note that it is because it is getting nearer server/dns maintenance and I'm no pro on that.

Glyn.

egomaniac

4:52 pm on May 1, 2003 (gmt 0)

At the conference, I asked Matt Cutts a few questions about the hidden text filter. He told me that the filters are intended to catch things like invisible pixel links. I asked about punctuation links (e.g. a "text" link attached to a single period), and he said that he thought that they already had something in place to catch those. I asked about javascript, and he said not to worry about that. Javascript links are fine, whether they are on page or off-page in remote javascript files.

I did not discuss hidden css layers with Matt.

During his panel comments to the entire audience he said that the filters would start hitting a week to week and a half after the conference.

A note about punctuation links: I don't believe Google has dealt with these previously as Matt thought. I know of a few well-ranked sites that are still using this old technique. Frankly I think the technique offers no value these days due to the importance of anchor text in the link. Nonetheless, some people haven't gotten around to changing it.

My conversation with Matt was very brief. WebGuerilla appears to have had a much more in depth discussion (see his comments earlier in this thread).

gniland

6:18 pm on May 1, 2003 (gmt 0)

In addition to hidden text, the 1x1 pixel image is also supposed to be targeted.

I think the spam report page is going to see an increase in traffic. Instead of someone knowing of a real abuse and reporting it, I can imagine some people submitting all of their competitors' websites just to have the google spambot visit them maybe get lucky and find something.

fashezee

8:06 pm on May 1, 2003 (gmt 0)

If a HTML page does not contain any scripts but yet has content within the <noscript> tag,
will this be considered spam?

RawAlex

8:12 pm on May 1, 2003 (gmt 0)

fashezee, that technique is very similar to people to pile links in between <style> and </style> tags... browsers don't show them, they are invalid style items, so they are not considered in page display, but googlebot sure does love to slurp those things up.

I have had people submit sites for my directory that have index pages over 800 lines long, jammed full of this type of stuff. Technically, the stuff isn't HIDDEN, just placed not to display.... sort of like stuffing the <noframes> section of a frames page.

In the end, it is an attempt to make googlebot see things that the public doesn't see, and should be punished.

Alex

djmad_wax

8:19 pm on May 1, 2003 (gmt 0)

pardon my ignorance but this doesnt have anything to do with ALT tags does it?

mdub

Craig_F

8:58 pm on May 1, 2003 (gmt 0)

djmad_wax, no this is about text the same color as your background so users can't see it. and also 1px links.

fyi -- I just checked the a couple sites I keep tabs on the hidden text guy is PR0 (that's new). The hidden links guy is still good.

ominous

9:11 pm on May 1, 2003 (gmt 0)

What about text that is hidden because it is only of use to a certain segment of the population? For instance, on my sites I use hidden text links that allow people who use screen readers to jump straight to my menu (located at the end of my pages). Will my sites be penalized for attempting to accommodate the handicapped?

WebGuerrilla

9:40 pm on May 1, 2003 (gmt 0)

ominus,

There really isn't any way to know at this point. However, I think this thread shows how difficult building such a tool actually is. There are many different ways to hide stuff. And there are also many different legitimate design elements that could easily get mistaken for an attempt to spam.

We won't really know if it will be a problem or not until we have some examples of sites that have been evaluated by the SpamBot.

Now if you'll excuse me, I'm off to build a site with every hidden text trick I know of so that I can report it and see what happens!

DaveN

7:33 am on May 2, 2003 (gmt 0)

WG it will most probably rank #1 for the next year ;)

Dave

Yidaki

2:09 pm on May 3, 2003 (gmt 0)

>Now if you'll excuse me, I'm off to build a site with every hidden text trick I know of so that I can report it and see what happens!

Seriously, this idea isn't so bad. Better than switching completely over to the dark hats and change all your sites to beat (what i call) spammers by using their tactitcs. I said multiple times, i'd start to spam if spam seems to win over nospam. Allthough i really don't mean this 100% seriously i'd try to build a test site instead (something like sj, lol) and see if this really is the solution. Sadly but one possible way ... i guess that's how the today #1 (in my field, not in general!) started. Build a site, see how far you can go with google and how long you can survive with it. Than go for it or don't. Sigh.

fyi: what really frustrates me after more than 12 month observing the results within my field and putting a terrible huge effort into building quality sites, is the fact that building a site that get's #1 without guidelines in mind takes 0,1% of the time and 0,1% of the work and obviously doesn't bring you so much in trouble ... i don't speak about the sj results and no, i'm not a looser - sometimes my site's listed above spammers. But it took me much much more work than just putting my knowledge into a dark hat and building a #1 with a few clicks and a nice guestbook signing script.

Despite all the discussion about the definition of spam - i see people that are really skilled in seo things, building quite "intelligent" artificial web site structures that are pure spam, stay for a very long time and beat quality sites ... *and* receive traffic that should go to others. My further fear is that if google can't successfully fight spam, they'll someday change to pfi and cut all free traffic. You can see this changes at all of the other (formerly major) se's.

So i'm asking myself again and again (even after a year of WebmasterWorld'ing): should i or shouldn't i? Will google nevertheless stop free traffic one day - so should i take all i can get now? I'll bite myself in the a** if that'd become true ...

please, veterans, tell me to cool down and gimme some of your patience.

This 117 message thread spans 4 pages: 117