Can Google Follow JavaScript?

To reduce the number of links from a page, I want to introduce JavaScript links

MrRoy

4:56 am on Jul 18, 2003 (gmt 0)

Hello,

I want "SOME OF THE LINKS" from my home page not to be followed by Google (to reduce the number of links on the home page).

Can I use JavaScript for this purpose? Will Google follow JavaScript links?

All responses will be highly appreciated.

Thanks in advance.

Made In Sheffield

8:35 am on Jul 23, 2003 (gmt 0)

Nobody's talking about using Java for links! (bang, bang again, see post 27 :-)

The 100k limit, AFAIK, is only a limit on the page size stored by the cache. I believe a number of sources have confirmed that the information not cached in pages > 100k is still indexed.

Although it's obviously a huge Usability blunder to have a page that size and very very bad practice.

Cheers,
Nigel

claus

8:39 am on Jul 23, 2003 (gmt 0)

Oh, of course, an effective way to hide a link would be to incorporate it in a Java-applet. But it just might be overkill for a link.

A piece written in Flash could also do the trick.

/claus

<added>Made_In_Sheffield watch that head of yours, it might start to hurt ;)</added>

Made In Sheffield

8:53 am on Jul 23, 2003 (gmt 0)

Agreed Java would work, Flash not so sure, I don't use it but I believe Google is starting to index it? Can anyone confirm?

Cheers,
Nigel

P.S. It's a soft wall :)

Imaster

9:25 am on Jul 23, 2003 (gmt 0)

Hi Claus,

You mentioned that the best way to hide links from Google would be to block cgi-bin through robots.txt and then use a script which gives URLs of the format

http://www.domain.com/cgi-bin/redir.pl?id=00012345

where the id represents a URL

I have 2 questions

1) How to implement this? Could you give me links to such scripts as I am unable to find any?

2) What happens in general when Google sees a link but cannot follow it. Do such links contribute to PR leak?

Suppose I have 10 links on my page of which 4 are hidden via such a script. Would google distribute the PR to all the 10 links or only to those 6 visible links?

Thanks,
Internet Master

claus

10:50 am on Jul 23, 2003 (gmt 0)

Imaster, there are a lot of such scripts. I recently gave some advice on installing one in another thread (http://www.webmasterworld.com/forum13/3142.htm) - the thread might not be relevant to you but the product works off the cgi-bin and counts clicks on links that it redirects to. You'll find it here: [aardvarkind.com...]

I did not pick this one but it seems to do the job just as well as others i've seen. You just have to look at "-Shortened URLs" in the "readme.txt" file for guidance on using IDs.
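For illustration only, here is a minimal sketch of the redirect idea, written in JavaScript rather than as a Perl CGI script like the one in the example URL. The lookup table, the destination URL and the function name `resolveRedirect` are hypothetical and are not taken from the product mentioned above. The robots.txt lines in the comment are the usual way to ask crawlers to stay out of /cgi-bin/.

```javascript
// robots.txt (served from the site root) asks crawlers not to fetch the script:
//   User-agent: *
//   Disallow: /cgi-bin/

// Hypothetical lookup table: short ID -> real destination URL.
const REDIRECTS = {
  "00012345": "http://www.example.com/partner-page.html",
};

// Given the query string of a request such as "id=00012345",
// return the status code and Location header a redirect script would send.
function resolveRedirect(queryString, table) {
  const id = new URLSearchParams(queryString).get("id");
  if (id !== null && Object.prototype.hasOwnProperty.call(table, id)) {
    return { status: 302, location: table[id] };
  }
  // Unknown or missing ID: there is nothing to redirect to.
  return { status: 404, location: null };
}

console.log(resolveRedirect("id=00012345", REDIRECTS));
```

The page itself then only ever shows the /cgi-bin/ URL with an ID; the real destination lives in the table on the server.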

2) PR leak.

First, there's an understanding that a PR Leak is not a leak. Some PR is passed on from your page to the pages you link to, but your own page does not lose any PR in doing this. I've seen this statement in a few threads, here are two examples:

1) [webmasterworld.com...]

2) [webmasterworld.com...]

A link like that would be a link to an internal part of your own site. A part which is access restricted. I would assume that this makes it quite impossible to decide where the 4/10 of the PR should go to. In order to "leak" it must leak to "somewhere" - at least that is the way i understand it.

So, your 4 links will not leak. That is: they will not pass any PR on, as there is nothing to pass it on to. Your six remaining links will each pass on 1/6 of the total amount passed on. That is, the total amount "passed on" will be the same; it will only benefit 6 pages instead of 10. As 1/6 is greater than 1/10, this is good for the pages that get some.
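The arithmetic in that paragraph can be sketched as follows. This only illustrates the simplified view described here (a fixed amount passed on, split evenly among the links that can be followed). It is not Google's actual PageRank computation, and the function name is made up for the example.

```javascript
// Simplified model from the paragraph above: whatever a page passes on is
// split evenly among the links a crawler can actually follow.
function sharePerLink(totalPassedOn, followableLinks) {
  if (followableLinks <= 0) {
    throw new RangeError("need at least one followable link");
  }
  return totalPassedOn / followableLinks;
}

const allTenVisible = sharePerLink(1, 10); // each of 10 links gets 1/10
const fourHidden = sharePerLink(1, 6);     // each of 6 links gets 1/6

console.log(fourHidden > allTenVisible);   // true: 1/6 is greater than 1/10
```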

/claus

<edit>typos, clarified text</edit>

Herenvardo

2:57 pm on Jul 29, 2003 (gmt 0)

Made_In_Sheffield: Watch your head, you could break the wall! ;)

Everybody: Let me remind all of you that in post 23 I explained what Java and JS are. Please call things by their proper names before I join Made_In_Sheffield's self-destructive behaviour (and my walls won't last long, so I'll break my neighbours' ones!)

claus: You said that a link can be hidden in a Java applet, but I'm not sure: is linking outside the applet's home domain allowed under Java's security policies?

Imaster: I'll give you a quick definition of PR leak:
A page gives PR to the pages it links to without losing its own. If the other page links back, then it'll give more PR to the 1st page. If your 1st page has more links, each one will carry less PR, so the 2nd pages that link back will return less. The page never loses PR by linking out, but it might happen that it gains less than before: this is what is called PR leak.

Regards,
Herenvardo

kamikaze Optimizer

3:20 pm on Jul 29, 2003 (gmt 0)

Google definitely has followed the java links on my site. I have text links on the bottoms of the pages now, but Google indexed the pages before the text links were there.
The weird thing is, I have a client's site that uses the same type of java scripting, and the Goo didn't make it past the first page the first time it got crawled.
baffled,
Chris

GoogleBot might have visited the pages in question, if you visited the page and had the Google ToolBar turned on to send data to Google.

That's how Google finds new sites/pages...

Made In Sheffield

4:28 pm on Jul 29, 2003 (gmt 0)

I disagree, I don't think Google do find pages in this way.

If a page is not linked to by another page then it is not part of the web as Google sees it and will therefore not be added to the index by my understanding. I think the only way Google finds pages is by following links from other sites. I'm pretty sure that submitting your site to Google does no good unless you have a link from another site (which kind of makes it a pointless exercise apart from making you feel better). Anyone want to correct me on any of that?

I do however think Alexa does find sites in this way.

Cheers,
Nigel

steve128

9:28 pm on Jul 29, 2003 (gmt 0)

I agree with mr. sheffield wednesday -;
No links in = no google recognition.
Maybe JS links can be followed, in fact I'm fairly sure they can, the point is what weight is given. IMHO none, many linkers like it that way -;))

Visit Thailand

12:23 am on Jul 30, 2003 (gmt 0)

Well, someone visited our test page, which was only online about 10 minutes. Unfortunately I did not keep the 404, but it was not linked to anything and nothing on it was clicked, so no referrals. Toolbar? Perhaps.

Mind you that is off topic - Sorry

Made In Sheffield

7:14 am on Jul 30, 2003 (gmt 0)

cabbie,

Nobody's saying Google can't find a page without links. What I am saying is that they won't add it to the index until they find some links.

This is because of the way the PageRank algorithm works.

Why do you think pages get dropped from the index once you take them out of your site navigation but leave the pages there? Because they are no longer linked.

It takes a while but it does happen in time.

Visit_Thailand

Without looking in your log file and seeing what the User Agent was, you can't blame that on Google. Do you have the Alexa toolbar installed?

Thanks
Nigel

Andinio

8:02 am on Jul 30, 2003 (gmt 0)

Hi everyone!

This question about hiding links from Google was discussed some months ago and Googleguy made his comments clear about this topic:

Googleguy comments about hidden links [webmasterworld.com]
his comments are page1, fifth comment

Personally, I use a PHP redirect script hidden from the bot using robots.txt. Google is gracious enough not to visit pages disallowed by robots.txt.

It's great for hiding repeated links to terms and conditions, privacy statements or adverts.

Read what Googleguy said, hope this helps.

mayday9

8:20 am on Jul 30, 2003 (gmt 0)

A guy said:

"4. Use a redirect script and use robots.txt to disallow your script. (RECOMMENDED!) googlebot won't even touch your script."

then GoogleGuy replied:

"Yah, robots.txt or meta tags are your friends.. Those should both work fine."

Here's what I'm thinking. You can maybe prevent the link from being spidered and indexed, but does it matter PR-wise? I mean, even if the link isn't followed, it's still there and Googlebot still saw it. For those who need to hide the link to reduce PR leakage, that method won't work. Since there has been talk about Google spidering JavaScript, that method is out too. So it leaves us with <form> redirection, but that shouldn't be very hard for Google to implement, and since GoogleGuy follows this forum it's probably already done. If someone thinks of a cool method to make Googlebot ignore a link completely (as if it's not a part of the HTML) and posts it here, that will be the end of that method :) I'd still love to have it though.

regards,
Darko

SlowMove

8:58 am on Jul 30, 2003 (gmt 0)

Suppose that you could hide the links with JavaScript or other methods. I know it wasn't your intention, but wouldn't link farms be using this same technique to build link popularity for their own sites? I'd like to hear what GoogleGuy has to say on this.

If you just want your users to find the links, you could just put them on another page and setup a prominent link to that page.

Imaster

11:32 am on Aug 2, 2003 (gmt 0)

Thanks GrinninGordon, Claus, and everyone for your help. I will check it up and post my observations soon :)

Ciao...

Patrick Taylor

11:52 am on Aug 2, 2003 (gmt 0)

To make a link totally unspiderable (not the same as hiding a link), make a small Flash5 file with a movieclip that calls up dynamically-loading text (from an external .txt file) containing the html link. Surely this will do it! And if that doesn't seem to do it, put the Flash file in an external JavaScript file. That should nail it. And I do not agree that a normal html outgoing link (per se) doesn't leak PR... I can't prove it - I just disagree.

kaled

12:32 pm on Aug 2, 2003 (gmt 0)

My understanding (though I could be wrong) is that you will benefit more from a backlink on a PR5 page with 10 other links than you will from a similar PR5 page with 100 links. I do NOT recall reading anything authoritative that said something along the lines "The more links you have on your page the lower will be its PR."

To make a link totally unspiderable, simply create it using the document.write method in javascript. If the required link is a named image rather than text, I think you can assign its HREF field. I did something similar to this in an html help file a while back but my recollection of this is rather hazy.
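A minimal sketch of the document.write approach, assuming a hypothetical destination URL and a made-up helper name `linkHtml`. The link only exists once the script has run, so the raw HTML contains no <a> element for it. Whether a particular spider runs the script or reads the URL out of the script source is not something this example settles.

```javascript
// Return the markup for an ordinary HTML link.
function linkHtml(url, text) {
  return '<a href="' + url + '">' + text + "</a>";
}

// Hypothetical destination used only for this example.
const markup = linkHtml("http://www.example.com/", "Link");

if (typeof document !== "undefined") {
  // In a browser, write the link into the page as it loads.
  document.write(markup);
} else {
  // Outside a browser (e.g. Node), just show what would be written.
  console.log(markup);
}
```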

Kaled.

Patrick Taylor

12:55 pm on Aug 2, 2003 (gmt 0)

I've used js links also, and when I check them in a search engine spider simulator they show as [javascript:void...]

The code is:
<a href='javascript:void window.open("http://www.site.com", "_self");'>Link</a>

mayday9

1:05 pm on Aug 2, 2003 (gmt 0)

I've used js links also, and when I check them in a search engine spider simulator they show as [javascript:void...]
The code is:
<a href='javascript:void window.open("http://www.site.com", "_self");'>Link</a>

I think that as long as google recognizes it as a link of any kind (even "http://javascript:void/") it is considered in the PR calculations algo. inbound links increase PR, outbound links decrease it..

Patrick Taylor

1:19 pm on Aug 2, 2003 (gmt 0)

To whom is it passing PR?

Imaster

1:52 pm on Aug 2, 2003 (gmt 0)

I think that as long as google recognizes it as a link of any kind (even "http://javascript:void/") it is considered in the PR calculations algo. inbound links increase PR, outbound links decrease it..

I would like a confirmation on this observation. Can this really be true?

kaled

2:07 pm on Aug 2, 2003 (gmt 0)

In ensuring that the link cannot be followed, you have to ensure that the text that forms the url cannot be recognised. If the url text remains contiguous it may be recognised. If it is an absolute url (beginning http:// or whatever), then recognising the url is easy. Simply using a javascript link will not ensure that the link url is not followed (though it is unlikely that it would be followed).

Think of it this way. Forget HTML, consider the page to be simply text. Finding absolute urls on the page is easy, and if the search engine spiders wish to do so they can follow those urls whether or not they are links.
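To make the "page as plain text" point concrete, here is a rough sketch of such a scan. The regular expression is a deliberately simple assumption, not a description of what any real spider does, and `findAbsoluteUrls` is a name invented for the example. The sample markup is the JavaScript link quoted earlier in the thread.

```javascript
// Treat the page as plain text and pull out anything that looks like an
// absolute http(s) URL, whether or not it sits inside an <a href>.
function findAbsoluteUrls(pageText) {
  const matches = pageText.match(/https?:\/\/[^\s"'<>)]+/g);
  return matches === null ? [] : matches;
}

// The JavaScript link quoted earlier in the thread still exposes its URL.
const source =
  '<a href=\'javascript:void window.open("http://www.site.com", "_self");\'>Link</a>';

console.log(findAbsoluteUrls(source)); // [ 'http://www.site.com' ]
```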

So, the only way to ensure that such urls are not followed is to use obfuscation and this is most easily done by creating the links using javascript document.write statements to simply create plain old HTML links.

Of course, technically, spiders could run the javascript but this is unlikely. However, you could go one step further by using the OnClick event to launch a javascript function that creates a url on the fly (from say a domain name and a page address) and launches it. There is absolutely no way that a spider will cope with this. BUT the key to this is still to consider the page as plain text. If a simple text search yields urls then those urls could, theoretically, be followed by spiders.
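A rough sketch of the on-the-fly idea, with a hypothetical domain, page name and function names. The scheme is assembled from fragments so that no contiguous "http://" string appears in the function that builds the address. A scan that looks for bare "www." host names could still pick up the domain, so treat this as an illustration of the principle rather than a guarantee.

```javascript
// Hypothetical pieces of the destination; neither is a complete URL.
const DOMAIN = "www.example.com";
const PAGE = "links.html";

// Assemble the address only at click time.
function buildUrl(domain, page) {
  return "ht" + "tp" + ":/" + "/" + domain + "/" + page;
}

// Intended for use as: <a href="#" onclick="go(); return false;">Link</a>
function go() {
  window.location.href = buildUrl(DOMAIN, PAGE);
}

console.log(buildUrl(DOMAIN, PAGE)); // http://www.example.com/links.html
```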

Hope this helps,

Kaled.

mayday9

2:34 pm on Aug 2, 2003 (gmt 0)

Kaled,

someone recently posted that Googlebot took his external .js file along with the other pages. This wasn't a common practice in the past as I understand. So it might indicate that google now takes and examines javascript too. Should be a piece of cake for google to handle javascript.

I think Patrick_Taylor's Flash example would do the trick though. I wonder how long that will last (since Google and Macromedia already have some kind of Google Flash spidering thing going on).

Darko

SlowMove

2:34 pm on Aug 2, 2003 (gmt 0)

If links can be hidden from the search engines, I'm going to start exchanging links with any site, including link farms. I don't recall anyone that wanted to trade links ever asking if the link would be picked up by Google and the other search engines.

kaled

4:31 pm on Aug 2, 2003 (gmt 0)

Mayday9 wrote

Should be a piece of cake for google to handle javascript.

That rather depends on what is meant by "handle javascript". Interpreting a scripted language requires a lot of CPU power. I very much doubt that any spiders currently operating have sufficient CPU cycles to spare to do anything much with javascript. The most rudimentary obfuscation of the urls should be more than enough to defeat any spider. With a little imagination I think you could come up with a system that would defeat spider technology for the next ten years or so (unless the spider specifically targeted your algos).

I've used javascript to create HTML code. It really is not that difficult.

Kaled.

PS
There are a lot of cgi links pointing to my site. These are not read by spiders (drat, drat and double drat).

ogletree

4:44 pm on Aug 2, 2003 (gmt 0)

Could you put all your links in an Access db or any db and have them pulled out by some client-side method that Google cannot follow? If you just have it in a text file, Google can look at it. I guess you could disallow the text file. Google can't parse some scripts, but they can look at the text and glean URLs.

mayday9

6:38 pm on Aug 2, 2003 (gmt 0)

SlowMove wrote

If links can be hidden from the search engines, I'm going to start exchanging links with any site, including link farms. I don't recall anyone that wanted to trade links ever asking if the link would be picked up by Google and the other search engines.

That won't help you with linkfarms cause they have bots that check if you're linking back to them etc.. and the bots won't see the link. They usually specifically require <a href="URL">DESC</a> link.

kaled,

I guess it's all guessing until we test it. But since ppl are reporting about Googlebot taking their .js files, Google must have some agenda with them.

Darko

Imaster

12:07 pm on Aug 30, 2003 (gmt 0)

I am not sure if this has been posted before, but apart from following javascript links, Google also is following and adjusting page ranks for urls such as : http://www.xyz.com/dir/s.asp?l=111

i.e. the url http://www.xyz.com/dir/s.asp?l=111 actually leads to abc.com and if we check the backlinks for abc.com, the link http://www.xyz.com/dir/s.asp?l=111 does appear.

Examples used above are fictitious ;)

seofreak

2:52 pm on Aug 30, 2003 (gmt 0)

i have tested the above successfully, it isn't detected by google

rainborick

2:30 pm on Jul 20, 2003 (gmt 0)

I just spotted the googlebot doing something I hadn't seen before and was wondering if it was really new or just my inexperience showing. It picked up a JavaScript file I use on all my pages for common functions - rollovers, EMail address obfuscation, and such. Based on some recent comments here, I'd guess its most likely just looking for links, but it could be interesting if it was looking for other things, too - like various CSS manipulations.

This 80-message thread spans 3 pages.