Forum Moderators: open
Take a look at the pages in these serps: [google.com...]
/claus
I wonder whether Google would see this as a link to each site on each page, or just to the one that is displayed. The former is very bad, of course; the latter is much less of a problem.
[edited by: chiyo at 4:08 am (utc) on Aug. 20, 2003]
Yes, there's some in titles, but it's still within <script> tags, so script tags do not exactly mean "no index". Here's another: [google.com...]
No. 4 is "Monster on NME.COM" - look at the excerpt from the page below the title, then look at the page, and finally look at the page source.
The "document.cookie" is found in the excerpt in the SERPS; it is not in the visible elements of the page, only in the code, within these tags:
<script language="JavaScript1.2">
<!--
//-->
</script>

I'd still say that this is proof that JavaScript will get read, indexed, and make it to the SERPS.
/claus
Looks like a buggy server that doesn't always send the correct Content-type header. According to Sam Spade, sometimes it doesn't send any headers at all...
[edited by: RonPK at 8:10 am (utc) on Aug. 20, 2003]
No, that's collecting links from within a javascript; that's something else.
Indexing javascript means indexing the parts of page content that are located in the <script> container (with an optional internal comment tag). This is being done. It is not only being read, it is also being parsed, and it is searchable, so you can see it in the SERPS.
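As a rough sketch of what that means in practice - the script body split into separate searchable tokens rather than treated as one opaque block - here's a hypothetical tokenizer (my own illustration, not Google's code):

```javascript
// Hypothetical illustration only: splitting a script body into word-like
// tokens the way an indexer might. This is NOT Google's actual pipeline.
function tokenize(scriptBody) {
  // pull out identifier-like words; everything else is ignored
  return scriptBody.match(/[A-Za-z_][A-Za-z0-9_]*/g) || [];
}

const body = 'document.cookie; location.href = "index.html";';
console.log(tokenize(body));
// -> [ 'document', 'cookie', 'location', 'href', 'index', 'html' ]
```

Once "location" and "href" are separate tokens like this, a query for those words can match the page, which is exactly what the SERPS excerpts suggest is happening.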
>> got indexed because Googlebot didn't see the page as proper HTML
Point taken. These examples are buggy pages, i recognize that. It still shows clearly that the Gbot can and does index such stuff, although only pages using bad html seem to make it to the serps.
Sorry if you feel it's a false alert; i was perhaps a bit fast, but i still think it's significant. It narrows the discussion on javascript indexing down to a matter of choice (by Google), as it's clearly technically possible. So it's no longer a "can they" question, it's a "(when) will they" question.
/claus
What's the point of indexing the "location.href" string? It should follow the redirect, not index that string.
As a user, if I search for "location href" then I want to see pages that explain what it means, how to implement it etc. I don't want to see links to pages that use that javascript.
This is a bug, not a feature if you ask me.
What you want them to do is to execute it. Personally i don't want this, and i doubt they ever will, as it opens too many easy opportunities for manipulation.
I just want them to index and parse it so that i can search for it, and then i can execute it myself if i want to. And even if this option is an "advanced search" so that it's not part of the normal SERPS - no problem, i say.
/claus
I would say that in 99% of the cases people don't want those results.
It's like saying that google should index html tags, so if you search for </body> you get every website in the world?
Javascript is part of the page code, not the content.
It would be different if Google could "understand" javascript. Then they could rightfully index content that is, for example, dynamically put on a page by javascript, or follow javascript links (for example from dynamic menu navigation), etc.
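For contrast, here's roughly what a crawler could do today without "understanding" the script: pull URL-looking strings out of the code text. A hypothetical sketch - it collects link candidates, but it cannot know which links the script would actually produce when run:

```javascript
// Hypothetical: harvesting link candidates from script source text without
// executing it. A crawler doing this "collects links from within a
// javascript" but still can't tell which links the script would really emit.
function collectLinkCandidates(scriptBody) {
  const quoted = scriptBody.match(/["'][^"']*?\.html?["']/gi) || [];
  return quoted.map(u => u.slice(1, -1)); // strip the surrounding quotes
}

const menu = 'nav("home.html"); nav("products.html"); var x = 3;';
console.log(collectLinkCandidates(menu)); // -> [ 'home.html', 'products.html' ]
```

This is string matching, not understanding: a link built up as `"prod" + "ucts.html"` would slip straight past it.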
These words are part of the Javascript code used by this page, and are not shown on the page, unlike the sites listed above. And they are not indexed.
What is more, what would be the point for Google to index Javascript code? What would it offer Google users? It is as pointless as indexing html comments or <table> attributes used on a page! Only cheaters could benefit from this, by putting keywords in Javascript comments or in function and variable names...
The only interesting feature with Javascript and google would be to follow javascript links... Indexing code snippets would be useless.
George Abitbol
(edit : i hadn't seen driesie's post)
This is a step forward in the discussion, as it was formerly stalled at "Does Google even see anything inside of <script> tags?"
Also, the arguments I'm seeing against indexing JS could equally be made for excluding links from indexing (no one wants to see link titles in a search! no one cares about URLs, just show us the content of the page the URL points to! &c.), hence they are not good arguments (assuming we all want Google to index links and show them in SERPs).
Jordan
For Google, the interesting feature of parsing JS (not the parsing done for indexing, but truly EXECUTING code) would be for detecting JS-abusing cheaters.
For users I can think of at least one use which might be interesting: users seeking examples for creating scripts.
However, I do agree that this should be a feature that users deliberately activate - not something that just pops up now and then in the SERPs. As already said, it - like the HTML a page is built from - has nothing to do with content.
tribal : you're right, my "only" was a bit restrictive ;-)
creative_craig : it would be interesting to find the topic, but still, if the scrolled text is contained within a DIV (for instance), this is no surprise at all, since this text is part of the document and can be seen by text-browsers like Lynx (and so google can see it too). A Javascript-generated text would be a greater surprise ;)
George Abitbol
I think links are more relevant too because they are usually more closely related to the content. But they are, amazingly, not actually part of the document content. They are prompts to tell the client to do a certain action...like JS.
As with others, speaking personally, I would also like to see a special search option for JS snippets.
Jordan
That's what Google is about.
>For Google, the interesting feature of parsing JS (not the parsing done for indexing, but truly EXECUTING code) would be for detecting JS-abusing cheaters.
There is a connection between the two. How often is javascript used to redirect the user to a different page while the search engine is fed a spammy, keyword-stuffed page?
If Google would parse/execute javascript, skip refreshing/redirecting pages, and index the target pages instead, this would cause a revolution in the SERPS.
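A crude sketch of the check that implies - flagging a fetched page whose script assigns location.href - purely my own illustration, not anything Google is known to run:

```javascript
// Hypothetical cloaking check: does the page's first <script> block try to
// send the visitor somewhere else via location.href or location.replace()?
// A real crawler would need far more than this, but it shows the idea.
function looksLikeJsRedirect(html) {
  const m = html.match(/<script\b[^>]*>([\s\S]*?)<\/script>/i);
  if (!m) return false; // no script block at all
  return /location\s*\.\s*(?:href\s*=|replace\s*\()/.test(m[1]);
}

const cloaked =
  '<p>keyword keyword keyword</p>' +
  '<script>location.href = "target.html";</script>';

console.log(looksLikeJsRedirect(cloaked));        // -> true
console.log(looksLikeJsRedirect('<p>plain</p>')); // -> false
```

Pattern matching like this is easy to evade (build the assignment from string fragments, for instance), which is why actually executing the script would be the stronger move.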
First, i am not referring to this tag-lookalike used in html:

<script>

Now, i know the example pages are not the general rule. I know these are exceptions that only show up because they are badly formatted. I know the ones from the first post use javascript in a place where javascript is not supposed to be used. That's just not the point. The point derives from the fact that the pages do use properly formatted javascript, even javascript that is commented out. But that's not even the point itself.
The point is that these odd results indicate that Googlebot does not follow a rule similar to this pseudocode:
Read until "<script" or "<!--"
if ("<script" or "<!--" is found) {
    skip all characters until after "</script>" or "-->"
}

Instead, this finding proves that it does not skip things between script and comment tags. These things can be read and parsed, and they are in fact being read and parsed (although i have only seen it in these odd cases). That's the real point here.
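For comparison, here is what an indexer actually following that rule would look like - a minimal sketch (my own, purely illustrative) that throws away script containers and comments before anything is tokenized:

```javascript
// Hypothetical pre-processor implementing the skip rule above: discard
// everything inside <script>...</script> containers and <!-- ... -->
// comments before the remaining text is indexed. Not Google's code.
function stripScriptsAndComments(html) {
  return html
    .replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, '') // whole script containers
    .replace(/<!--[\s\S]*?-->/g, '');                   // html comments
}

const page =
  '<p>visible text</p>' +
  '<script language="JavaScript1.2"><!-- location.href="x.html" //--></script>';

console.log(stripScriptsAndComments(page)); // -> <p>visible text</p>
```

If Googlebot worked this way, "location" and "href" could never reach the index from such a page; the SERPS excerpts show that, at least for these malformed pages, they do.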
The Gbot is fully capable of reading and parsing javascript; there's no need to add this to the bot's skill set, as it can clearly do it already. It's as simple as that. It's not odd or spectacular technically, as it also handles certain far more complex file formats, but it is interesting from a webmaster's point of view.
Simply put, you can no longer be sure that what you put inside an on-page javascript is virtually invisible to Gbot - not even if you add comment tags. On-page JavaScript does not make it to the SERPS for normal html pages, but what we now know for certain is that it's no problem for Gbot to read it, and that it has in fact been done.
As for the use of it in SERPS... well, that's something other than the indexing, but i would like it, i need it, and i often curse the SE's of this world for not being able to show it. Comments even more so - there are certain generic comments that it would be of great value to me to be able to search for.
My searching needs are not like those of the average surfer, but i do not mind if this was part of an advanced search instead of the general one. Even a paid one, i'd say.
If you can find me one example of indexed javascript that isn't in poorly-formed HTML, uses comment tags properly, sends the right headers, and has no other indexing problems, then that's something.
Otherwise, keep wishing.
If you can find me one example of indexed javascript that isn't in poorly-formed HTML, uses comment tags properly, sends the right headers, and has no other indexing problems, then that's something.
Indeed...something other than what was pointed out. Something that has nothing to do with the conversation.
(claus pre-emptively responded):
[...] To parse just means to split it up into elements; the javascript is not considered one big opaque block as in "the javascript container", the individual elements of the block (like "location" and "href") are identified separately.
[...]
Now, i know the example pages are not the general rule. I know these are exceptions that only show up because they are badly formatted. I know the ones from the first post use javascript in a place where javascript is not supposed to be used. That's just not the point. The point derives from the fact that the pages do use properly formatted javascript, even javascript that is commented out. But that's not even the point itself.
The point is that these odd results indicate that Googlebot does not follow a rule similar to this pseudocode:
Read until "<script" or "<!--"
if ("<script" or "<!--" is found) {
skip all characters until after "</script>" or "-->"
}
Instead, this finding proves that it does not skip things between script and comment tags. These things can be read and parsed, and they are in fact being read and parsed (although i have only seen it in these odd cases). That's the real point here.
1. They are ABLE to index it, because
2. they are already parsing it.
Jordan
Dolemite:
>> sounds like we caught you trying to blow the whistle early and you just can't deal with being wrong
I've got no problem with that, never had. I did write this in post #12: "Sorry if you feel it's a false alert, i was perhaps a bit fast, but i still think it's significant"
And i do think so. Still. As MonkeeSage said in post #18:
"This is a step forward in the discussion as it was formerly stalled at "Does Google even see anything inside of <script> tags?" "
Before we had assumptions, now we have knowledge. SERPS are the same, it seems, but we know a little bit more about what's going on behind them. At least that's my humble opinion.
/claus
[edited by: claus at 7:42 pm (utc) on Aug. 20, 2003]
This is a step forward in the discussion as it was formerly stalled at "Does Google even see anything inside of <script> tags?"
It doesn't see inside my <script> tags, nor anyone else's who knows how to use HTML. The discussion was never stalled on this point. Google needs to be able to handle a large number of errors, due to general incompetence and the lack of strict standards in browsers... for this reason complete validation may never truly be a factor. However, there are some errors that it doesn't "handle" - <script> tags inside <title> tags being somewhat of an extreme example.

Any language is "parsable" by Googlebot by this broken definition. I can link to a Prolog file, insert Prolog randomly in my HTML, or display a few lines of Prolog in a normal/proper way on a webpage. In all cases it might be searchable, and one might say google has indexed my Prolog... but put it inside <HEAD><SCRIPT><!-- //--></SCRIPT></HEAD>, or in an external file using <SCRIPT SRC=file> (where it ought to be, if a browser were to do anything with arcane logic languages besides display them), and that Prolog disappears.