<a href="javascript:void" onclick="window.open('http://www.somesite.com/');
"...would the robots follow this link..."
Of course, for well-behaved bots, you could ask them not to spider it in your robots.txt file. And there is a monster thread somewhere here on the perfect .htaccess file to ban badly behaved bots.
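To make the robots.txt suggestion concrete, here is a minimal sketch of how a well-behaved bot might check a URL before fetching it. It uses Python's standard `urllib.robotparser`. The `/popup/` path, the `ExampleBot` user agent, and the `is_allowed` helper are assumptions made up for illustration, not anything from the site under discussion.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the /popup/ directory is an assumed example path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /popup/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.somesite.com/popup/page.html",
        "http://www.somesite.com/index.html",
    ):
        print(url, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

This only covers bots that choose to honour robots.txt. Badly behaved ones ignore it, which is where the .htaccess approach comes in.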
Here's a <snip> Google cache image of #9 in the SERPs for "window.open". Please consider this: "These terms only appear in links pointing to this page: window open".
I'm not all that sure that this text can be taken at face value. But in this case it might be accurate.
You can conclude what you like. Personally, I think it proves that Google actually indexes parts of a page that are not visible to a person using a browser.
Here's another link to the G cache: <snip>
It's #8 in the SERPs for "TABLE START": the Smithsonian Institution. The page is not about tables, neither the wooden kind nor the HTML kind. Do you see TABLE START anywhere on that page? Well, it's got a high PR, and the TABLE HTML tag is used 42 times (21 start tags plus 21 end tags) on that page. HTML code is not visible to people viewing the page in a browser, but it is visible to Gbot.
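The difference between what a visitor sees and what a bot receives can be sketched with Python's standard `html.parser`. The sample markup and the `split_page` helper are hypothetical, and this says nothing about how Gbot actually weighs tag names. It only shows that the word "table" can be in the source without being in the rendered text.

```python
from html.parser import HTMLParser

# Hypothetical page: it uses a table for layout but never mentions the word.
SAMPLE_HTML = (
    "<html><body><h1>Museum hours</h1>"
    "<table><tr><td>Open daily</td></tr></table>"
    "</body></html>"
)


class PageSplitter(HTMLParser):
    """Collects text nodes and start-tag names separately."""

    def __init__(self):
        super().__init__()
        self.text_chunks = []
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_data(self, data):
        self.text_chunks.append(data)


def split_page(html: str):
    """Return (visible_text, list_of_start_tag_names) for html."""
    splitter = PageSplitter()
    splitter.feed(html)
    splitter.close()
    visible = " ".join(" ".join(splitter.text_chunks).split())
    return visible, splitter.start_tags


if __name__ == "__main__":
    visible, tags = split_page(SAMPLE_HTML)
    print("visible text:", visible)
    print("'table' in visible text:", "table" in visible.lower())
    print("table start tags in source:", tags.count("table"))
```

Note that this simple splitter would also treat script and style contents as text, so it is only suitable for markup like the sample above.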
I find it hard to believe that a lot of people would use the link text "table start" when pointing to the SI. A "link:www.si.edu" search reveals that they have around 7,800 backlinks, but I did not find an anchor text of "table" on the ones I tried.
I have not yet found solid evidence for commented-out text. I have tried. It does not seem to be indexed, but I am still not 100% sure.
[edited by: claus at 1:56 pm (utc) on July 24, 2003]
Claus, I think we are agreeing with each other ;) My post was just explaining the possible mechanism, and why there are inconsistencies in the reported behaviour.
I've read the thread (previously and now), and I'm not convinced they demonstrate anything definitive, except to confirm that the mechanism described by your post above, and my post above that, is feasible. Many posts in the Google News forum are just conjecture, and some of those you list admit to being that. Then again, many posts are absolute gold.
"...Heres a Coogle cache ..."
I'd really request you remove the URLs. Posting URLs or search terms which can identify specific sites is against the terms of service. At any rate, the issue this thread is addressing is not what is on those pages, but how Google got to those pages. (How Google ranks a page's relevance to search terms is a big discussion that can't be covered by this thread, but yes, many factors are taken into account which are not visible, such as alt tags, file names, URLs, etc.)
Personally I don't think this is something you can rely on, although perhaps it is true for some bots now.
Three options / suggestions:
>> I'm not convinced they demonstrate anything definitive, except
Yes, I agree, we'll have to watch developments before we conclude 100%, but there is evidence that G is including more than "the visible parts of a page". Comments are a "high risk zone" regarding spam, so I'd be very careful about indexing those. On the other hand, the bot lives and thrives off links, so I'd go a long way to make it able to identify more of these.
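As a rough illustration of how a bot could "identify more" links without running any JavaScript, here is a sketch that scans event-handler attributes such as onclick for anything that looks like an absolute URL. It is an assumption about one possible mechanism, not a description of what Gbot does. The `LinkFinder` class, the `find_urls` helper, the regex, and the sample markup are all made up for the example.

```python
import re
from html.parser import HTMLParser

# Rough pattern for absolute http(s) URLs embedded in script text.
URL_RE = re.compile(r"""https?://[^\s'"()<>]+""")

# Hypothetical markup: one ordinary link and one window.open link.
SAMPLE_HTML = (
    '<a href="/about.html">About</a> '
    '<a href="javascript:void(0)" '
    "onclick=\"window.open('http://www.somesite.com/');\">Open</a>"
)


class LinkFinder(HTMLParser):
    """Collects URLs from href attributes and from on* event handlers."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue
            if name == "href":
                # Skip javascript: pseudo-links; they carry no target URL.
                if not value.lower().startswith("javascript:"):
                    self.urls.append(value)
            elif name.startswith("on"):
                # No script execution, just pattern matching on the text.
                self.urls.extend(URL_RE.findall(value))


def find_urls(html: str):
    """Return every URL found in html, in document order."""
    finder = LinkFinder()
    finder.feed(html)
    finder.close()
    return finder.urls


if __name__ == "__main__":
    print(find_urls(SAMPLE_HTML))
```

Pattern matching like this would miss URLs that the script builds from variables, which may be one reason reports of bots following such links are inconsistent.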
>> I'd really request you remove the urls
- done, no problem :)
>> how google got to those pages
I think this speaks for itself, although I'm still uncertain that this sentence can always be trusted to mean exactly what it says: "These terms only appear in links pointing to this page: window open".