[edited by: DaveAtIFG at 11:04 pm (utc) on May 22, 2004]
[edit reason] No specifics please [/edit]
When you think about it, actually, it's not that challenging to simply have many computers running IE browser controls.
Once the page is rendered by IE, pull out the resulting document object model with colors/font sizes/positioning all settled and base your banning conclusions on that.
Of course, Google might not do this every time Googlebot hits your website, but I can imagine that they hit at least a couple of random webpages from at least 20%-30% of all domains.
The question is - do you feel lucky, punk? Do ya?
(please note this was a sorry attempt at humour and not meant to be specific to you or anyone else)
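For what it's worth, the "render it, then inspect the settled styles" check described above could be sketched roughly like this. It is only an illustration: `RenderedNode`, `hidden_reasons`, and the thresholds are made-up names and numbers, and it assumes some browser control has already handed back the computed color, size, and position for each text node. Nothing here reflects how any search engine actually does it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RenderedNode:
    """Computed style of one text node after rendering (hypothetical structure)."""
    text: str
    color: str            # computed text color, e.g. "#ffffff"
    background: str       # computed background color behind the text
    font_size_px: float
    display: str = "block"
    visibility: str = "visible"
    left_px: float = 0.0
    top_px: float = 0.0


def hidden_reasons(node: RenderedNode,
                   min_font_px: float = 4.0,
                   offscreen_px: float = -1000.0) -> List[str]:
    """Return the reasons a text node looks hidden; an empty list means it looks visible.

    The thresholds are arbitrary assumptions chosen for illustration.
    """
    reasons: List[str] = []
    if not node.text.strip():
        return reasons  # nothing to hide, so nothing to flag
    if node.display == "none":
        reasons.append("display:none")
    if node.visibility == "hidden":
        reasons.append("visibility:hidden")
    if node.color.strip().lower() == node.background.strip().lower():
        reasons.append("text color matches background")
    if node.font_size_px < min_font_px:
        reasons.append("tiny font")
    if node.left_px < offscreen_px or node.top_px < offscreen_px:
        reasons.append("positioned off-screen")
    return reasons
```

A real check would also have to deal with background images, overlapping layers, and text revealed by scripts, which is a big part of why this is harder than it sounds.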
I thought about using CSS to hide it, but if Google can tell that I'm hiding it anyway, what's gonna happen?
The question is - do you feel lucky, punk? Do ya?
I got it, Clint! Uh, I mean...blaze.
I actually HAVE to use hidden text.
If I don't have at least 2 lines of text on a page, the template kind of implodes and looks horrid. So for my pages which don't have much text (only images, nested tables, etc.) I HAVE to put some hidden text at the bottom of the page.
Try <p>&nbsp;</p>
I HAVE to put some hidden text at the bottom of the page
What about putting some visible text there? I have loads of include files on my site saying things like 'if you can't find the information you are looking for please contact us at blah blah' or 'do you know about our widget installation service?' which I slot in at the bottom of various pages.
Put it in an external CSS file, and then disallow that file in robots.txt.
that's very funny, how would a search engine get around that minor problem?
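User-agent: *
# Paths must start with "/"; the * and $ wildcards are an extension that Googlebot honors, not part of the original robots.txt convention
Disallow: /*.css$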
they'd have to either violate robots.txt or.. what?
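To make the dilemma concrete, here is a minimal sketch of what a rule-following crawler does before fetching a stylesheet. It uses Python's standard `urllib.robotparser`, which does plain prefix matching and does not interpret wildcard patterns, so the sketch assumes the stylesheets sit under a hypothetical `/css/` directory; the `www.example.com` URLs and the `may_fetch` helper are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed layout: all stylesheets live under /css/ (hypothetical).
ROBOTS_TXT = """\
User-agent: *
Disallow: /css/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://www.example.com/index.html",
                "http://www.example.com/css/style.css"):
        print(url, may_fetch(ROBOTS_TXT, "Googlebot", url))
```

A crawler that honors the file never sees the rules that hide the text, so it can only judge the raw HTML or treat the blocked stylesheet itself as a signal.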
Out of about 2500 hits from Googlebot during the last few weeks, there were exactly 0 requests for my external style sheet, style.css.
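Anyone who wants to run the same check on their own logs could do it with something like the sketch below. It assumes the server writes the common "combined" log format with the user agent in the last quoted field; the regular expression and the `googlebot_css_hits` name are illustrative and would need adjusting for other log layouts.

```python
import re
from typing import Iterable, Tuple

# Matches the request, status, referrer and user-agent fields of a
# combined-format access log line (assumed format).
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_css_hits(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (total Googlebot requests, Googlebot requests for .css files)."""
    total = 0
    css = 0
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None or "googlebot" not in match.group("agent").lower():
            continue
        total += 1
        # Ignore any query string when checking the file extension.
        path = match.group("path").split("?", 1)[0]
        if path.lower().endswith(".css"):
            css += 1
    return total, css
```

Usage would be along the lines of `googlebot_css_hits(open("access.log"))`, where the file name is whatever your server uses. Matching on the user-agent string alone does not prove a hit really came from Google, since anyone can send that header.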
This is what I would expect. I'm sure Google is working on this capability, but I'm suspicious of claims that they already have it in place. That sounds like something it would be useful to tell people in order to keep abuse down while search engines get the capability working.
So I figure it's likely that rather than violate the robots.txt protocol, there may be some minor penalty for disallowing access to the css link rel of a page. That seems to make the most sense to me..
After I read this thread a couple days ago I got rid of all of my hidden text (which was only an h1 tag) and simply made a minor change to the site's structure. Better safe than sorry.
Ultimately, only above board, squeaky clean methods are advisable....
with that in mind... does anyone else have info about this?
Shannon
How about putting the CSS file on another site and disallowing Google from visiting it there?
That's an excellent idea. Since excluding CSS files through robots.txt would become a surefire red flag to search engines, this would leave your robots.txt file clean.
I still have serious doubts about how far search engines can go with CSS analysis. That's such a huge jump in how they treat the page data that I don't know whether they could do it, but who knows?
I wanted to add some HTML ads to an old (since 1995), big (>20,000 pages) website with lots of real unique content (it's a newspaper archive) without the HTML ad text getting indexed by SEs.
So I put them in an iframe loaded from another host (same domain), and I disallowed that iframe file via robots.txt.
Google has so far dropped, as far as I can tell, all those pages from its "visible" cache (if I try via the G Toolbar). PR remained unaffected. GBot crawls the site regularly; logs show Googlebot gets HTTP 304 codes 95% of the time, i.e. it has stored the previous version.
But it won't show most of those pages in SERPs, and logs show G referrals are 1/4 of the number they were 20 days ago.
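For anyone unsure what those 304 codes mean: the crawler sends the date of its stored copy in an If-Modified-Since header, and the server answers 304 (Not Modified) when the page has not changed since then. A minimal sketch of that decision, assuming a made-up `conditional_status` helper and ignoring ETags and other validators that real servers also use:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_status(last_modified: datetime,
                       if_modified_since: Optional[str]) -> int:
    """Return 304 if the page is unchanged since the client's copy, else 200.

    last_modified must be a timezone-aware datetime.
    """
    if if_modified_since is None:
        return 200  # no cached copy announced: send the full page
    try:
        cached = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: ignore it and send the page
    if cached.tzinfo is None:
        cached = cached.replace(tzinfo=timezone.utc)
    # HTTP dates only have one-second resolution.
    if last_modified.replace(microsecond=0) <= cached:
        return 304
    return 200
```

So a high share of 304s only says the pages have not changed since Googlebot last stored them; on its own it says nothing about why they dropped out of the SERPs.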