Unspiderable JavaScript link (same window)


Patrick Taylor

6:13 am on Jul 24, 2003 (gmt 0)

I've tried using this for an unspiderable link that can't be followed by a search engine robot:

... but the URL opens in a new window, and I want an unspiderable link to open the new URL in the same window. Could anyone tell me if this is possible in JavaScript and what the code is?

Many thanks.

Patrick Taylor

10:05 am on Jul 24, 2003 (gmt 0)

I could do it like this:

<a href='javascript:void window.open("http://www.somesite.com/", "_self");'
title="Somesite">SOMESITE</a>

... but does anyone know, would the robots follow this link?
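
For the same-window part of the question, a plain assignment to location.href is another option besides window.open(url, "_self"). Here is a minimal sketch, assuming a hypothetical helper named goTo (not from the posts above). The optional second argument is only there so the function can be exercised outside a browser. Whether a robot follows such a link is a separate question that this sketch doesn't settle.

```javascript
// Hypothetical helper: load `url` in the current window.
// `win` is an assumption added for testing; in a browser, leave it out
// and the global `window` object is used.
function goTo(url, win) {
  var target = win || window;
  target.location.href = url;
}
```

In a page this could be called from an onclick handler, for example onclick="goTo('http://www.somesite.com/'); return false;".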

claus

10:45 am on Jul 24, 2003 (gmt 0)

There's some evidence in this thread that at least Gbot might be able to follow both:

[webmasterworld.com...]

/claus

ShawnR

10:47 am on Jul 24, 2003 (gmt 0)

Another way is:

"...would the robots follow this link..."
Currently most/all robots don't parse javascript. But I recall a post from GoogleGuy to the effect that 'just because googlebot doesn't parse javascript now, don't assume that will always be the case; they are working on it'

Of course, for well behaved bots, you could ask them not to spider it in your robots.txt file. And there is a monster thread somewhere here on the perfect htaccess file to ban badly behaved bots.

Shawn

ShawnR

11:01 am on Jul 24, 2003 (gmt 0)

Here is the post by GoogleGuy I was thinking of: [webmasterworld.com...]

Regarding the thread that claus showed, one thing that can cause this may be that the javascript is visible to the bot, so if it is not commented out, the bot can read it just like a very old browser might display the raw code on screen. So I can understand why there were so many varied experiences in that thread; there are different ways to write the same thing in javascript...

Shawn

claus

12:25 pm on Jul 24, 2003 (gmt 0)

Please re-read these posts from the thread i pointed to: #3, #10, #11, #18, #25. These all confirm that on-page javascript links can be, and have been, read.

Msg #29 speaks of off-page javascript links, embedded in a file outside the page that is read. Of course these are another matter.

ShawnR:
>> one thing that can cause this may be that the javascript is visible to the bot, so if it is not commented out, the bot can read it

Even though text is commented out, it's still visible to the bot - the bot retrieves the whole html document with a GET request; there's no filtering of JavaScript, styles, comments, or whatever at that stage. Everything that you serve, the Gbot will read (perhaps, in some cases, only the first few hundred KB; i'm not sure about that).

After retrieving the document, it is parsed. In this parsing process, the parsing rules may decide that comments, html code, javascript, or whatever should not be indexed. Right now it seems that comment text is ignored, but such rules may change over time, as they apparently have with javascript.
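
To illustrate the point about retrieval versus parsing: a program that only scans the raw text of a page can pick up URLs without understanding JavaScript or comments at all. The sketch below is purely an assumption about how a simple crawler could work, not a description of how Gbot does it. The function name findUrls, the pattern and the sample markup are made up for illustration.

```javascript
// Scan raw page source for anything that looks like an absolute URL.
// No HTML or JavaScript parsing happens here; it is plain text matching.
function findUrls(rawSource) {
  var pattern = /https?:\/\/[^\s"'<>)]+/g;
  return rawSource.match(pattern) || [];
}

// Sample markup (hypothetical): one javascript: link and one commented-out URL.
var page =
  "<a href='javascript:void window.open(\"http://www.somesite.com/\", \"_self\");'>SOMESITE</a>\n" +
  "<!-- http://www.somesite.com/hidden.html -->";

var found = findUrls(page);
// `found` contains both URLs, including the one inside the comment.
```

A scanner like this would explain reports of on-page javascript links being picked up, since the URL is sitting in the source as ordinary text.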

>> I recall a post from GoogleGuy to the effect that 'just because googlebot doesn't parse javascript now

Msg #19 and #20 of that thread may imply this, but it's not the exact wording. GG speaks of "hoarding PR" as a general phenomenon that can be identified and even used as a scoring factor itself. It is not specific to javascript, quote: "You can try all sorts of stuff to "conserve PageRank,"" (all sorts, msg #20)

<edit>links in the following removed, as suggested by ShawnR below</edit>

Here's a <snip> Google cache image of #9 in the SERPS for "window.open". Please consider this: "These terms only appear in links pointing to this page: window open".

I'm not all that sure that this text can be taken at face value. But in this case it might be so.

This is not a page on JavaScript. It's not a tutorial or an explanation of the "window.open" method. Far from it. Here's the url of the page: <snip>

Try searching the page for "window". Nada. Then "View Source" -> Search for "window" -> #2 result says bingo! This site apparently uses a javascript dropdown for navigation.

You can conclude what you like. Personally, i think it proves that Google actually indexes parts of the text on a page that is not visible to a person using a browser.

It could also very well indicate that Google indexes JavaScript links.

Here's another link to the G cache: <snip>

It's #8 in the SERPS for "TABLE START". The Smithsonian Institution. It's not about tables, neither wooden ones nor the HTML kind. Do you see TABLE START anywhere on that page? Well, it's got a high PR and the TABLE html-tag is used 42 times (21 start+end) on that page. HTML code is not visible to persons looking at the page by means of a browser, but it's visible to Gbot.

I find it hard to believe that a lot of people would use the link text "table start" when pointing to the SI. A "link:www.si.edu" search reveals that they have around 7,800 backlinks, but i did not find an anchor text of "table" on the ones i tried.

I have not yet found solid evidence for commented-out text. I have tried. It does not seem like it's being indexed, but i am still not 100% sure.

/claus

[edited by: claus at 1:56 pm (utc) on July 24, 2003]

Patrick Taylor

12:38 pm on Jul 24, 2003 (gmt 0)

Thanks for the replies. The first example I posted was given to me recently as an example of how to write a JavaScript link that would not be followed by a robot, compared to other JavaScript link codes that would be spidered - and some do, allegedly. Exactly why this one wouldn't be spidered wasn't mentioned so I assume it's the "void" word or something... I'm not a programmer by any means. The second example I posted is quite similar in structure (and it does open the URL in the same window) so I'm assuming the effect on a robot would be the same as the first example. I wondered if anyone knows for sure, but this might be quite difficult.

Incidentally, the effect I want to achieve is to control the flow of Google PageRank within my own site rather than to prevent any pages being indexed or to "hide" outgoing links. All the pages are linked in some way or another by pure HTML, but according to my strategy, my index page and a couple of others will have a much higher "raw" PageRank than some of the less important ones, which is the effect I want, but of course depends on whether my JavaScript links are actually unspiderable.

ShawnR

1:47 pm on Jul 24, 2003 (gmt 0)

"...Please re-read these posts..."

Claus, I think we are agreeing with each other ;) My post was just explaining the possible mechanism, and why there are inconsistencies in the reported behaviour.

I've read the thread (previously and now), and I'm not convinced they demonstrate anything definitive, except to confirm that the mechanism described by your post above, and my post above that, is feasible. Many posts in the Google News forum are just conjecture, and some of those you list admit to being that. Then again, many posts are absolute gold.

"...Heres a Coogle cache ..."
I'd really request you remove the urls. Posting urls or search terms which can identify specific sites is against the terms of service. At any rate, the issue this thread is addressing is not what is on those pages, but how google got to those pages. (How Google ranks a page's relevance to search terms is a big discussion that can't be covered by this thread, but yes, many factors are taken into account which are not visible, such as alt tags, file names, urls, etc.)

"...an example of how to write a JavaScript link that would not be followed by a robot, compared to other JavaScript link codes that would be spidered..."

Personally I don't think this is something you can rely on, although perhaps it is true for some bots now.

claus

2:30 pm on Jul 24, 2003 (gmt 0)

Patrick Taylor
- i think the void makes it a "useless" link, as it is in the "href" - on the other hand, the bot probably grabs the url off the "onClick" instead. One of the first msg numbers i posted above states that the Gbot read a window.open link.
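
On the "void" question: in JavaScript, void evaluates the expression after it and then discards the result, so the call still runs; the link just has no return value for the browser to display. A small sketch with a made-up function name (fakeOpen stands in for window.open). It says nothing about what a robot does with the href text.

```javascript
var calls = 0;

// Stand-in for window.open: counts calls and returns a value.
function fakeOpen(url) {
  calls += 1;
  return "window handle for " + url;
}

// `void` runs fakeOpen but turns its return value into undefined.
var result = void fakeOpen("http://www.somesite.com/");
// Here calls is 1 and result is undefined.
```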

Three options / suggestions:

1. Put your links in a javascript function, and embed the function in comment tags as usual.
2. Embed the function in another (.js) page, and include it using <script src="...">.
3. Simply don't show all links to all pages on all pages. Build a pyramid structure, as proposed in (one of) "Bretts finest" [searchengineworld.com] - and link only within each theme (do remember the breadcrumb trail).

Following the last option, you don't have to worry, as the links that you don't want followed simply are not there, neither in html nor javascript.

ShawnR:
>> I'm not convinced they demonstrate anything definitive, except

Yes i agree, we'll have to watch developments before we conclude 100%, but there is evidence that G is including more than "the visible parts of a page". Comments are a "high risk zone" regarding spam, so i'd be very careful on indexing that one. On the other hand, the bot lives and thrives off links, so i'd go a long way to make it able to identify more of these.

>> I'd really request you remove the urls

- done, no problem :)

>> how google got to those pages

I think this speaks for itself, although i'm still uncertain that this sentence can always be trusted to mean exactly what it says: "These terms only appear in links pointing to this page: window open".

/claus

gph

10:33 pm on Jul 24, 2003 (gmt 0)

I've never tried this but I'd think something like it would require a bot to read external files and parse js.

in the page:

onclick="myFunc('example')"

in an external file:

function myFunc(domain) {
    // The full URL is only assembled when the function runs.
    location.href = 'http://www.' + domain + '.com/';
}
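
One thing this approach changes is that the full URL never appears as a literal string anywhere; it only exists once the function runs. Here is a sketch of that, with the URL building split from the navigation so it can be run outside a browser (buildUrl and the loc parameter are assumptions, not part of the code above):

```javascript
// Build the address from its parts, as in myFunc above.
function buildUrl(domain) {
  return 'http://www.' + domain + '.com/';
}

// `loc` is a hypothetical parameter for testing; in a browser, omit it
// and the global `location` object is used.
function myFunc(domain, loc) {
  var target = loc || location;
  target.href = buildUrl(domain);
}

// The source text of the function holds only fragments of the URL.
var sourceText = buildUrl.toString();
var hasFullUrl = sourceText.indexOf('http://www.example.com/') !== -1;
// hasFullUrl is false, while buildUrl('example') gives the full address.
```

So a bot would have to fetch the external file and actually execute (or at least evaluate) the script to recover the address; simple text matching on the source would not be enough.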

claus

11:55 am on Jul 25, 2003 (gmt 0)

This thread suggests that Gbot now reads external js: [webmasterworld.com...]

Oh, found this one as well: [webmasterworld.com...]

/claus

Patrick Taylor

2:25 pm on Jul 25, 2003 (gmt 0)

Thanks! Interesting, helpful, but inconclusive. I might just try to bury my I-don't-want-to-be-spidered links deep inside a .swf - several movieclips deep.