Forum Moderators: open
Other possibilities include Google checking for hidden text, which can of course be achieved through JavaScript.
The consensus seems to be that CSS files are ignored. If this is still the case, it seems unlikely that JavaScript files are being scanned by hidden-text algorithms.
Kaled.
PS
Given that AllTheWeb follows JavaScript URLs, I imagine Google will catch up eventually, unless, of course, they choose not to as a deliberate policy decision.
64.68.88.38 - - [22/Jul/2003:02:36:50 -0400] "GET /style/MYMENU.js HTTP/1.0" 200 25772 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
Is this normal? New?
I don't particularly mind, just curious because it is certainly new for our site, even though we've had javascript menus for almost a year now.
Thanks!
The basic assumption is that they're just looking for URLs within the JavaScript text. That doesn't require the ability to parse it; you just do a regular-expression search for anything that smells like a URL.
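As a sketch of that idea (purely hypothetical, since nobody outside Google knows what Googlebot actually does), a crawler could pull URL-shaped strings out of a raw .js file with one regular expression and no parsing at all:

```javascript
// Hypothetical sketch: extract anything URL-shaped from raw JavaScript
// source text with a regular expression, without parsing the script.
const jsSource = `
var url1 = 'http://example.com/page_1.html';
function go() { window.location = "http://example.com/page_2.html"; }
`;

// Match http:// or https:// followed by non-whitespace, non-quote characters.
const urlPattern = /https?:\/\/[^\s'"<>]+/g;
const found = jsSource.match(urlPattern) || [];

console.log(found);
// ['http://example.com/page_1.html', 'http://example.com/page_2.html']
```

Both URLs come out whether they sit in a variable assignment or inside a function body, which would explain log entries for pages only linked from script.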
The other point is that JavaScript can be used to create hidden text (via CSS manipulation). I've heard that a hidden-text enquiry only happens in response to a spam complaint, so I'm not sure about this; however, it might be a spot check or something like that. Perhaps they're being thorough with top results for popular queries, and automatically adding frequent top results to a "have a closer look" queue.
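For illustration (the element object here is faked so the snippet runs outside a browser; in a real page it would be something like document.getElementById), one line of script is enough to make indexed text invisible to human visitors:

```javascript
// Illustrative sketch of hidden text via JavaScript/CSS manipulation.
// A plain object stands in for a real DOM element so this runs anywhere;
// the principle is identical in a browser.
const element = {
  textContent: 'cheap widgets cheap widgets cheap widgets',
  style: { display: 'block' }
};

// A regex scan of the raw .js sees the keyword string, but only by
// actually executing the script would a bot learn the text ends up hidden.
element.style.display = 'none';

console.log(element.style.display); // 'none'
```

That gap between scanning a script and executing it is why simply fetching .js files wouldn't by itself catch this kind of trick.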
Remember you can always use robots.txt to keep G'bot off your JavaScript.
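Something like this in robots.txt would do it (assuming your script files live under a /scripts/ directory; adjust the path to match your own site):

User-agent: Googlebot
Disallow: /scripts/

You can also disallow a single file, e.g. Disallow: /scripts/menu.js, if you want the rest of the directory crawlable.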
Anyway, that is my understanding. Happy to be proved wrong.
I want an answer to this: even though I'm relatively new as a webmaster, I have some experience in programming and, after all, JS is a programming language and any robot is a program. So here it goes:
- If a robot is able to crawl through function calls in JavaScript, it will be able to find ALL links made that way.
- The implementation to crawl through JS calls would be similar to the one that crawls <a> links, so it could be done.
So, this is my opinion: Google is beginning to follow JS. The cheaters will need to find another way to cheat, and I won't need to remake my menus to get them spidered and crawled.
Herenvardo
Which brings me to the next point: seeing that the JS is delivered to a page that isn't blocked by robots.txt, could Google or any other robot claim that it is quite legitimate to read it, as it actually appears on a legitimate page and they didn't have to crawl it independently?
It may be a fine line here. For example, when they spider SSI pages, the content may come from many original directories if the page has server-side includes. The text and code in this case is "on the page" as far as the robot is concerned, since processing is done before it is crawled, but in reality some of it is fetched from other directories. Now JS, at least the types I know of, isn't server-side, but the output still appears on the page... and it may be argued to be fair game for indexing as part of the page, but not as a separate file if it's been robots.txt'd in its source directory.
[edited by: chiyo at 4:56 pm (utc) on July 24, 2003]
I'm saying Google should scan the file if it's linked to an indexed file.
Have you any evidence yet that Googlebot has requested a page that could only have been discovered by parsing your .js?
I've got standard links on all my pages as well as the JavaScript links, so I can't tell whether Googlebot is following the JavaScript.
Anyone out there that has had Googlebot follow their external .js file to a page that wasn't linked anywhere else?
64.68.88.30 - - [21/Jul/2003:03:06:29 -0500] "GET /XXX.js HTTP/1.0" 200 520 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
Google has requested my JavaScript file on two occasions over the past month. I have added the following to my JS file, but it may take a while to see the outcome (if any). Anybody getting more hits on a JS file could try the same thing:
var url1 = 'http://SITENAME.COM/page_1.html';           // plain absolute URL
var url2 = 'SITENAME.COM/page_2.html';                  // domain without scheme
var url3 = '/page_3.html';                              // root-relative path
var url4 = 'http://' + 'SITENAME' + '.com/page_4.html'; // URL assembled by concatenation
function test()
{
    var url5 = 'http://SITENAME.COM/page_5.html';       // URL inside a function body
}
var url6 = 'href=http://SITENAME.COM/page_6.html';      // mimics an unquoted href attribute
var url7 = 'href="http://SITENAME.COM/page_7.html"';    // mimics a quoted href attribute
var url8 = "http://SITENAME.COM/page_8.html";           // double-quoted string
I read that it might be initiated as some kind of spam check after a spam report, looking for hidden text... but my site in question is under 6 weeks old and is very unlikely to have been reported for spam.
But you are definitely wrong on this one. This number on www was about 15,000 this morning and it is now 30,000 twelve hours later. Look at all the URLs that Google has listed that end in .txt
Google's index of JavaScript links [google.com]
Ted