Forum Moderators: open
Other possibilities include Google checking for hidden text, which can of course be achieved through JavaScript.
The consensus seems to be that CSS files are ignored. If this is still the case, it seems unlikely that JavaScript files are being scanned by hidden-text algorithms.
Kaled.
PS
Given that AllTheWeb follows JavaScript URLs, I imagine Google will catch up eventually, unless, of course, they choose not to as a deliberate policy decision.
64.68.88.38 - - [22/Jul/2003:02:36:50 -0400] "GET /style/MYMENU.js HTTP/1.0" 200 25772 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
Is this normal? New?
I don't particularly mind, just curious because it is certainly new for our site, even though we've had javascript menus for almost a year now.
Thanks!
The basic assumption is that they're just looking for URLs within the JavaScript text. That doesn't require the ability to parse it; you just do a regular-expression search for anything that smells like a URL.
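As a sketch of that idea (purely hypothetical, since nobody outside Google knows what Googlebot actually does), a crawler could pull URL-shaped strings out of a raw .js file with one regular expression and no parsing at all:

```javascript
// Hypothetical sketch: extract anything URL-shaped from raw JavaScript
// source text with a regular expression, without parsing the script.
const jsSource = `
var url1 = 'http://example.com/page_1.html';
function go() { window.location = "http://example.com/page_2.html"; }
`;

// Match http:// or https:// followed by non-whitespace, non-quote characters.
const urlPattern = /https?:\/\/[^\s'"<>]+/g;
const found = jsSource.match(urlPattern) || [];

console.log(found);
// ['http://example.com/page_1.html', 'http://example.com/page_2.html']
```

Both URLs come out whether they sit in a variable assignment or inside a function body, which would explain log entries for pages only linked from script.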
The other point is that JavaScript can be used to create hidden text (via CSS manipulation). I've heard that a hidden-text enquiry only happens in response to a spam complaint, so I'm not sure about this; however, it might be a spot check or something like that. Perhaps they're being thorough with top results for popular queries, and automatically adding frequent top results to a "have a closer look" queue.
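For illustration (the element object here is faked so the snippet runs outside a browser; in a real page it would be something like document.getElementById), one line of script is enough to make indexed text invisible to human visitors:

```javascript
// Illustrative sketch of hidden text via JavaScript/CSS manipulation.
// A plain object stands in for a real DOM element so this runs anywhere;
// the principle is identical in a browser.
const element = {
  textContent: 'cheap widgets cheap widgets cheap widgets',
  style: { display: 'block' }
};

// A regex scan of the raw .js sees the keyword string, but only by
// actually executing the script would a bot learn the text ends up hidden.
element.style.display = 'none';

console.log(element.style.display); // 'none'
```

That gap between scanning a script and executing it is why simply fetching .js files wouldn't by itself catch this kind of trick.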
Remember you can always use robots.txt to keep G'bot off your JavaScript.
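Something like this in robots.txt would do it (assuming your script files live under a /scripts/ directory; adjust the path to match your own site):

User-agent: Googlebot
Disallow: /scripts/

You can also disallow a single file, e.g. Disallow: /scripts/menu.js, if you want the rest of the directory crawlable.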
Anyway, that is my understanding. Happy to be proved wrong.
I want an answer to this: even though I'm relatively new as a webmaster, I have some experience in programming and, after all, JS is a programming language and any robot is a program. So here it goes:
- If a robot is able to crawl through function calls in JavaScript, it will be able to find ALL links made that way.
- The implementation to crawl through JS calls would be similar to the one that crawls <a> links, so it could be done.
So, this is my opinion: Google is beginning to follow JS. The cheaters will need to find another way to cheat, and I won't need to remake my menus to get them spidered and crawled.
Herenvardo
Which brings me to the next point: seeing that the JS is delivered to a page that isn't blocked by robots.txt, could Google or any other robot claim that it is quite legitimate to read it, as it actually appears on a legitimate page and they didn't have to crawl it independently?
It may be a fine line here. For example, when they spider SSI pages, the content may come from many original directories if the page has server-side includes. The text and code in this case is "on the page" as far as the robot is concerned, since processing is done before it is crawled, but in reality some of it is fetched from other directories. Now JS, at least the types I know of, isn't server-side, but the output still appears on the page... and it may be argued to be fair game for indexing as part of the page, but not as a separate file if it's been robots.txt'd in its source directory.
[edited by: chiyo at 4:56 pm (utc) on July 24, 2003]
I'm saying Google should scan the file if it's linked to an indexed file.
Have you any evidence yet that Googlebot has requested a page that could only have been discovered by parsing your .js?
I've got standard links on all my pages as well as the JavaScript links, so I can't tell whether Googlebot is following the JavaScript.
Anyone out there that has had Googlebot follow their external .js file to a page that wasn't linked anywhere else?
64.68.88.30 - - [21/Jul/2003:03:06:29 -0500] "GET /XXX.js HTTP/1.0" 200 520 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
Google has requested my JavaScript file on two occasions over the past month. I have added the following to my JS file, but it may take a while to see the outcome (if any). Anybody getting more hits on a JS file could try the same thing:
var url1 = 'http://SITENAME.COM/page_1.html';           // plain absolute URL
var url2 = 'SITENAME.COM/page_2.html';                  // domain without scheme
var url3 = '/page_3.html';                              // root-relative path
var url4 = 'http://' + 'SITENAME' + '.com/page_4.html'; // URL assembled by concatenation
function test()
{
    var url5 = 'http://SITENAME.COM/page_5.html';       // URL inside a function body
}
var url6 = 'href=http://SITENAME.COM/page_6.html';      // mimics an unquoted href attribute
var url7 = 'href="http://SITENAME.COM/page_7.html"';    // mimics a quoted href attribute
var url8 = "http://SITENAME.COM/page_8.html";           // double-quoted string
I read that it might be initiated as some kind of spam check after a spam report, looking for hidden text... but my site in question is under 6 weeks old and is very unlikely to have been reported for spam.
But you are definitely wrong on this one. This number on www was about 15,000 this morning and it is now 30,000 twelve hours later. Look at all the URLs that Google has listed that end in .txt
Google's index of JavaScript links [google.com]
Ted