Forum Moderators: open


Google now indexes JavaScript

proof


claus

7:29 pm on Aug 19, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just got proof five minutes ago; there's been talk about this for a while, I know.

Take a look at the pages in these serps: [google.com...]

/claus

Marcia

3:51 am on Aug 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Good!

Dolemite

4:03 am on Aug 20, 2003 (gmt 0)

10+ Year Member



OK... back up the bus. Those pages just have JavaScript mistakenly placed where it could only be indexed as text, rather than ignored as it would be inside a valid tag - like within <title> tags.

That's not indexing JavaScript, that's indexing people's mistakes.
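A toy sketch (my own, entirely hypothetical - nothing here is Google's actual code) of why those mistakes leak into the index: a naive title extractor that returns whatever sits between <title> tags will happily pass script source through as "title text":

```javascript
// Hypothetical illustration: a naive extractor that grabs whatever sits
// between <title> tags -- including script source someone put there by
// mistake -- and treats it as indexable text.
function naiveTitleText(html) {
  const match = html.match(/<title>([\s\S]*?)<\/title>/i);
  return match ? match[1] : '';
}

// A broken page with a script jammed inside the title:
const broken = '<title><script>document.write("Widgets")</script></title>';
console.log(naiveTitleText(broken));
// -> '<script>document.write("Widgets")</script>'
```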

chiyo

4:04 am on Aug 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This is going to be interesting. On a couple of sites we deliver short ads with urls for around half a dozen advertisers. It is a "rotating" external (off page but same site) javascript that delivers one of the six ads in rotation every time a page is loaded.

I wonder whether Google would see this as a link to each site on each page, or just to the one that is displayed. The former is very bad, of course; the latter is much less of a problem.

[edited by: chiyo at 4:08 am (utc) on Aug. 20, 2003]

Key_Master

4:05 am on Aug 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That's not really proof that Googlebot parses JavaScript. Many of those results are the result of Webmaster errors (e.g. the use of JavaScript in the Title).

<What Dolemite said :)>

Marcia

4:35 am on Aug 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You mean I got all excited for nothing?

I can't wait 'til they start indexing JavaScript. It was reported a few weeks ago that Googlebot was seen grabbing JS files; I imagine it'll be any day now.

TheDave

7:32 am on Aug 20, 2003 (gmt 0)

10+ Year Member



I don't think we'll see JS indexed as such; they're probably just parsing the files for links.

claus

7:34 am on Aug 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> just have javascript mistakenly in places where it could only be indexed

Yes, there's some in titles, but it's still within <script> tags, so script tags do not exactly mean "no index". Here's another: [google.com...]

No. 4 is "Monster on NME.COM" - look at the excerpt from the page below the title, then look at the page, and finally look at the page source.

The "document.cookie" is found in the excerpt in the SERPs; it is not in any visible element of the page, only in the code, and within these tags:

<script language="JavaScript1.2">
<!--
//-->
</script>

I'd still say that this is proof that JavaScript will get read and indexed, and can make it to the SERPs.

/claus

RonPK

7:55 am on Aug 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I suspect the script on the page Claus mentioned got indexed because Googlebot didn't see the page as proper HTML. Look at the notice in the SERPs: "File Format: Unrecognized"

Looks like a buggy server that doesn't always send the correct Content-Type header. According to Sam Spade, sometimes it doesn't send any headers at all...

[edited by: RonPK at 8:10 am (utc) on Aug. 20, 2003]

humpingdan

8:03 am on Aug 20, 2003 (gmt 0)

10+ Year Member



Really doesn't look like Google is indexing any JavaScript at present! There's been no indication in any of my logs to even entertain that idea. Keep wishing!

HitProf

9:29 am on Aug 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This doesn't look like indexing JavaScript to me.

Indexing JavaScript would mean indexing the pages the JavaScript redirects to, when a page can only be reached via JavaScript.

That will be the first application of Google reading JavaScript. Other things, like hiding text, will be far behind.

claus

9:55 am on Aug 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> javascript would mean: indexing the pages the javascript redirects to

No, that's collecting links from within a JavaScript; that's something else.

Indexing JavaScript means indexing the parts of page content that are located in the <script> container (with an optional internal comment tag). This is being done. It is not only being read, it is also being parsed, and it is searchable, so you can see it in the SERPs.

>> got indexed because Googlebot didn't see the page as proper HTML

Point taken. These examples are buggy pages, I recognize that. It still shows clearly that the Gbot can and does index such stuff, although only pages using bad HTML seem to make it to the SERPs.

Sorry if you feel it's a false alert, I was perhaps a bit fast, but I still think it's significant. It narrows the discussion on JavaScript indexing down to a matter of choice (by Google), as it's clearly technically possible - so now it's no longer a "can they" question, it's a "(when) will they".

/claus

MonkeeSage

10:11 am on Aug 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



claus:

[...] it's no longer a "can they" question, it's a "(when) will they".

Good point! At the very least we know they can now...hadn't thought about it like that. :)

Jordan

driesie

10:26 am on Aug 20, 2003 (gmt 0)

10+ Year Member



They might be indexing JavaScript, but they're definitely not parsing it.
If you look at the pages, it's actually indexing the JS code, not the effects of it, which is plain wrong in my opinion.

What's the point of indexing the "location.href" string? It should follow the redirect, not index that string.

As a user, if I search for "location href" then I want to see pages that explain what it means, how to implement it etc. I don't want to see links to pages that use that javascript.

This is a bug, not a feature if you ask me.
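For illustration only (my own hypothetical code, not any real page), the kind of snippet at issue: a crawler that merely indexes the source sees the tokens "location" and "href", while one that executed the script would land on the target page instead.

```javascript
// Hypothetical sketch of a JS-only redirect. A fake `win` object stands in
// for the browser's `window` so the idea can run outside a browser.
function jsRedirect(win, url) {
  win.location.href = url; // a browser navigates; an indexer only sees text
  return win.location.href;
}

const fakeWindow = { location: { href: '/entry-page' } };
console.log(jsRedirect(fakeWindow, '/real-content.html'));
// -> '/real-content.html'
```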

claus

11:02 am on Aug 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



driesie, they are parsing it. They are not executing it. To parse just means to split it up into elements: the JavaScript is not considered one big black block, as in "the JavaScript container"; the individual elements of the block (like "location" and "href") are identified separately.

What you want them to do is to execute it. Personally I don't want this, and I doubt they ever will, as it opens too many easy opportunities for manipulation.

I just want them to index and parse it so that I can search for it, and then I can execute it myself if I want to. And even if this option were an "advanced search", so that it's not part of the normal SERPs - no problem, I say.
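To make the distinction concrete, here's a minimal sketch of "parsing without executing" (my own illustration, not anything Google has published): split script text into identifier-like tokens so they become searchable terms.

```javascript
// Hypothetical tokenizer: "parsing" in the sense used here just means
// splitting script text into separate elements like "location" and "href",
// without running any of it.
function tokenizeScript(js) {
  // Identifier-like runs of letters, digits, _ and $ (a deliberate
  // oversimplification of real JS tokenization).
  return js.match(/[A-Za-z_$][\w$]*/g) || [];
}

console.log(tokenizeScript('window.location.href = "/new-page";'));
// -> [ 'window', 'location', 'href', 'new', 'page' ]
```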

/claus

driesie

1:18 pm on Aug 20, 2003 (gmt 0)

10+ Year Member



Whatever you want to call what they do, it's still just indexing it as if it were normal text on a page, which is wrong.
Maybe it'd be alright if it was a search on something like javascript.google.com.

I would say that in 99% of cases people don't want those results.
It's like saying that Google should index HTML tags, so if you search for </body> you get every website in the world.
JavaScript is part of the page code, not the content.
It would be different if Google could "understand" JavaScript. That way, they could rightfully index content that is, for example, dynamically put on a page by JavaScript, or follow JavaScript links (for example from dynamic menu navigation), etc.

George Abitbol

1:26 pm on Aug 20, 2003 (gmt 0)

10+ Year Member


Indexing JavaScript code that has been written as text within an HTML document - in order not to be executed, but to show the code to users - is not indexing JavaScript code! Sites listed with the searches described above all have the code written as body text. For instance: http://216.239.51.104/search?q=cache:6-Zp8sHy-e8J:penguin.wpi.edu:4546/course/087254/lab1/fsstuff/advancedJS.html+var+monster+%3D+document.cookie&hl=en&ie=UTF-8 shows that the query terms are in example code written as text. If Google indexed JavaScript, it would index "Veuillez saisir des mots clés, Merci" (roughly, "Please enter some keywords, thank you") on this page: http://www.allhtml.com/
But it doesn't:
http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=%22Veuillez+saisir+des+mots+cl%E9s%2C+Merci%22&btnG=Google+Search

These words are part of the JavaScript code used by this page, and not shown on the page the way the sites listed above show theirs. And they are not indexed.

What is more, what would be the point of Google indexing JavaScript code? What would it bring Google users? It is as pointless as indexing HTML comments or the <table> attributes used on a page! Only cheaters could benefit from this, by putting keywords in JavaScript comments or in function and variable names...

The only interesting feature with Javascript and google would be to follow javascript links... Indexing code snippets would be useless.

George Abitbol

(edit : i hadn't seen driesie's post)

MonkeeSage

1:45 pm on Aug 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As claus said, indexing JS in this conversation means parsing it and understanding it as separable entities that Google can distinguish from HTML entities. There is no more question that Google can (has the ability to) do this. The question now becomes if / why / how they should, since we know it is possible.

This is a step forward in the discussion, as it was formerly stalled at "Does Google even see anything inside of <script> tags?"

Also, the arguments I'm seeing against indexing JS could be equally made for excluding links from indexing (no one wants to see link titles in a search!, no one cares about URLs just show us the content of the page the URL points to instead!, &c.), hence they are not good ones (assuming we all want Google to index links and show them in SERPs).

Jordan

tribal

1:50 pm on Aug 20, 2003 (gmt 0)

10+ Year Member



"The only interesting feature with Javascript and google would be to follow javascript links... Indexing code snippets would be useless."

For Google, the interesting feature of parsing JS (not the parsing done for indexing, but truly EXECUTING code) would be for detecting JS-abusing cheaters.

For users I can think of at least one use which might be interesting: users seeking examples for creating scripts.

However, I do agree that this should be a feature that users deliberately activate - not something that just pops up now and then in the SERPs. As already said, it - like the HTML a page is built from - has nothing to do with content.

creative craig

1:56 pm on Aug 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



rcjordan pulled a gem out of the bag the other day: scrolling text delivered by JavaScript, indexed and coming up number 1 for a competitive search term. Can't find the thread now though :( It was on Google, and that was the first time I had seen it.

Craig

George Abitbol

2:10 pm on Aug 20, 2003 (gmt 0)

10+ Year Member



MonkeeSage : I think that, in a document, links are much much more relevant than html comments or pieces of javascript code.

tribal : you're right, my "only" was a bit restrictive ;-)

creative_craig: it would be interesting to find that topic, but still, if the scrolled text is contained within a DIV (for instance), this is no surprise at all, since the text is part of the document and can be seen by text browsers like Lynx (and so Google can see it too). JavaScript-generated text would be a greater surprise ;)

George Abitbol

MonkeeSage

2:20 pm on Aug 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



George:

I think links are more relevant too, because they are usually more closely related to the content. But they are, amazingly, not actually part of the document content. They are prompts telling the client to perform a certain action... like JS.

As with others, I would also like to see a special search option for JS snippets, speaking personally.

Jordan

bcolflesh

2:24 pm on Aug 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As with others, I would also like to see a special search option for JS snippets, speaking personally.

I'd love to have a tab called "Code" that just returned prog lang snippets!

HitProf

2:50 pm on Aug 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>No, that's collecting links from within a javascript

That's what Google is about.

>For Google, the interesting feature of parsing JS (not the parsing done for indexing, but truly EXECUTING code) would be for detecting JS-abusing cheaters.

There is a connection between the two. How often is JavaScript used to redirect the user to a different page while feeding the search engine a spammy, keyword-stuffed page?

If Google would parse/execute JavaScript, skip refreshing/redirecting pages, and index the target pages instead, this would cause a revolution in the SERPs.

claus

4:18 pm on Aug 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



George_Abitbol, I understand what you are saying, but I'm afraid you're missing the point. I'll try to explain it thoroughly.

First, i am not referring to this tag-lookalike used in html:

&lt;script&gt;

- i am referring to content inside the <script> container and in post #8 even content that is commented out using comment tags.

Now, I know the example pages are not the general rule. I know that these are exceptions that only show up because they are badly formatted. I know those from the first post use JavaScript in a place where JavaScript is not supposed to be used. That's just not the point. The point derives from the fact that the pages do use properly formatted JavaScript, even JavaScript that is commented out. But that's not even the point itself.

The point is that these odd results indicate that Googlebot does not follow a rule similar to this pseudocode:

Read until "<script" or "<!--"
if ("<script" or "<!--" is found) {
skip all characters until after "</script>" or "-->"
}

Instead, this finding shows that it does not skip things between script and comment tags. These things can be read and parsed, and they are in fact being read and parsed (although I have only seen it in these odd cases). That's the real point here.
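The skip rule in the pseudocode above can be sketched as runnable code (my own approximation, assuming the stripping is a simple text operation) - the behaviour the odd SERP results suggest Googlebot does NOT follow:

```javascript
// Hypothetical implementation of the skip rule described in the pseudocode:
// drop everything between script tags and between comment markers.
function stripScriptsAndComments(html) {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, '') // drop <script>...</script>
    .replace(/<!--[\s\S]*?-->/g, '');           // drop <!-- ... -->
}

const page = '<p>visible</p>' +
  '<script language="JavaScript1.2"><!-- var monster = document.cookie; //--></script>';
console.log(stripScriptsAndComments(page));
// -> '<p>visible</p>'
```

A bot following this rule could never surface "document.cookie" in a snippet; the NME.COM example suggests something less strict is going on.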

The Gbot is fully capable of reading and parsing javascript, there's no need to add this to the bot skill set, as it can clearly do it already. It's as simple as that. It's not odd or spectacular technically, as it also handles certain far more complex file formats, but it is interesting from a webmaster point of view.

Simply put, you can no longer be sure that what you put inside an on-page JavaScript is virtually invisible to Gbot - not even if you add comment tags. On-page JavaScript does not make it to the SERPs for normal HTML pages, but what we now know for certain is that it's no problem for Gbot to read it, and that it has in fact been done.


As for the use of it in SERPs... well, that's something else than the indexing, but I would like it, I need it, and I often curse the SEs of this world for not being able to show it. Comments even more so - there are certain generic comments that it would be of great value to me to be able to search for.

My searching needs are not like those of the average surfer, but I do not mind if this were part of an advanced search instead of the general one. Even a paid one, I'd say.

TallTroll

4:28 pm on Aug 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Try searching Google using the filetype: switch set to "js"....

But look *really hard* at the returned URLs

Dolemite

5:44 pm on Aug 20, 2003 (gmt 0)

10+ Year Member



Sorry, Claus, I'm still waiting for proof. It sounds like we caught you trying to blow the whistle early and you just can't deal with being wrong.

If you can find me one example of indexed JavaScript that isn't poorly formed HTML, uses comment tags properly, sends the right headers, and has no other indexing problems, then that's something.

Otherwise, keep wishing.

MonkeeSage

6:01 pm on Aug 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Dolemite:

If you can find me one example of indexed JavaScript that isn't poorly formed HTML, uses comment tags properly, sends the right headers, and has no other indexing problems, then that's something.

Indeed...something other than what was pointed out. Something that has nothing to do with the conversation.

(claus pre-emptively responded):

[...] To parse just means to split it up into elements: the JavaScript is not considered one big black block, as in "the JavaScript container"; the individual elements of the block (like "location" and "href") are identified separately.

[...]

Now, I know the example pages are not the general rule. I know that these are exceptions that only show up because they are badly formatted. I know those from the first post use JavaScript in a place where JavaScript is not supposed to be used. That's just not the point. The point derives from the fact that the pages do use properly formatted JavaScript, even JavaScript that is commented out. But that's not even the point itself.

The point is that these odd results indicate that Googlebot does not follow a rule similar to this pseudocode:

Read until "<script" or "<!--"
if ("<script" or "<!--" is found) {
skip all characters until after "</script>" or "-->"
}

Instead, this finding shows that it does not skip things between script and comment tags. These things can be read and parsed, and they are in fact being read and parsed (although I have only seen it in these odd cases). That's the real point here.


Google IS indexing JavaScript -- just not on purpose, it seems, which was freely admitted -- but this fact alone proves two things:

1. They are ABLE to index it, because
2. They are already parsing it.

Jordan

claus

6:18 pm on Aug 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



TallTroll:
The filetype: switch does not get JS files, although it does get files with the .js extension; it's the same with CSS.

Dolemite:
>> sounds like we caught you trying to blow the whistle early and you just can't deal with being wrong

I've got no problem with that, never had. I did write this in post #12: "Sorry if you feel it's a false alert, I was perhaps a bit fast, but I still think it's significant"

And i do think so. Still. As MonkeeSage said in post #18:

"This is a step forward in the discussion, as it was formerly stalled at 'Does Google even see anything inside of <script> tags?'"

Before we had assumptions; now we have knowledge. The SERPs seem the same, but we know a little bit more about what's going on behind them. At least that's my humble opinion.

/claus

[edited by: claus at 7:42 pm (utc) on Aug. 20, 2003]

Dolemite

7:02 pm on Aug 20, 2003 (gmt 0)

10+ Year Member



Jordan/MonkeeSage just seems to want to pick a fight with me in any possible thread, so I won't gratify him here.

This is a step forward in the discussion, as it was formerly stalled at "Does Google even see anything inside of <script> tags?"

It doesn't see inside my <script> tags, nor anyone else's who knows how to use HTML.

The discussion was never stalled on this point. Google needs to be able to handle a large number of errors, due to general incompetence and the lack of strict standards in browsers; for this reason complete validation may never truly be a factor. However, there are some errors that it doesn't "handle": <script> tags inside <title> tags being somewhat of an extreme example.

Any language is "parsable" by Googlebot by this broken definition. I can link to a Prolog file, insert Prolog randomly in my HTML, or display a few lines of Prolog in a normal/proper way on a webpage. In all cases it might be searchable, and one might say Google has indexed my Prolog... but put it inside <HEAD><SCRIPT><!-- //--></SCRIPT></HEAD> or in an external file using <SCRIPT SRC=file> (where, if a browser were to do anything with arcane logic languages besides display them, it ought to be), and that Prolog disappears.