18.104.22.168 - - [22/Jul/2003:02:36:50 -0400] "GET /style/MYMENU.js HTTP/1.0" 200 25772 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
Is this normal? New?
Anyway, that is my understanding. Happy to be proved wrong.
I want to offer an answer to this. Even though I'm relatively new as a webmaster, I have some experience in programming and, after all, JS is a programming language and any robot is a program. So here it goes:
- The implementation to crawl through JS calls would be similar to the one to crawl <a> links, so it could be done.
So, this is my opinion: Google is beginning to follow JS. The cheaters will need to find another way to cheat, and I won't need to remake my menus to get them spidered and crawled.
Which brings me to the next point. Since the JS is delivered to a page that is not blocked by robots.txt, could Google or any other robot claim that it is quite legitimate to read it, as it actually appears on a legitimate page and they didn't have to crawl it independently?
It may be a fine line here. For example, when they spider pages that use server-side includes (SSI), the content may come from many original directories. To the robot, the text and code are "on the page", because processing is done before the page is crawled, but in reality some of it is fetched from other directories. Now JS, at least the types I know of, isn't server-side, but the output still appears on the page. It may be argued to be fair game for indexing as part of the page, but not as a separate file if it has been blocked by robots.txt in the source directory.
[edited by: chiyo at 4:56 pm (utc) on July 24, 2003]
I'm saying Google should scan the file if it's linked to an indexed file.
Have you any evidence yet that Googlebot has requested a page that could only have been discovered by parsing your .js?
Anyone out there that has had Googlebot follow their external .js file to a page that wasn't linked anywhere else?
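One way to look for that kind of evidence is to scan the access log for Googlebot requests to pages that are referenced only from the .js file. The following is a rough sketch, not a definitive tool: it assumes the Apache combined log format seen in the log lines quoted in this thread, and the names `parseLogLine` and `googlebotHitsOn` are made up for illustration. Matching on the user-agent string alone does not prove a request really came from Google.

```javascript
// Assumed layout (Apache combined log format):
// ip ident user [time] "METHOD path protocol" status bytes "referer" "user-agent"
const LOG_PATTERN =
  /^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+) "([^"]*)" "([^"]*)"$/;

// Hypothetical helper: turn one log line into an object, or null if it does not match.
function parseLogLine(line) {
  const m = LOG_PATTERN.exec(line.trim());
  if (!m) return null;
  return {
    ip: m[1],
    time: m[2],
    method: m[3],
    path: m[4],
    status: Number(m[5]),
    bytes: m[6],
    referer: m[7],
    userAgent: m[8],
  };
}

// Hypothetical helper: keep only entries whose user agent mentions Googlebot
// and whose path is one of the pages linked solely from the .js file.
function googlebotHitsOn(lines, jsOnlyPaths) {
  const targets = new Set(jsOnlyPaths);
  return lines
    .map(parseLogLine)
    .filter((e) => e !== null && e.userAgent.includes('Googlebot') && targets.has(e.path));
}
```

In practice you would read the log file into an array of lines and pass the paths of your test pages as the second argument. An empty result only means no such request has shown up yet.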
22.214.171.124 - - [21/Jul/2003:03:06:29 -0500] "GET /XXX.js HTTP/1.0" 200 520 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
var url1 = 'http://SITENAME.COM/page_1.html';
var url2 = 'SITENAME.COM/page_2.html';
var url3 = '/page_3.html';
var url4 = 'http://' + 'SITENAME' + '.com/page_4.html';
var url5 = 'http://SITENAME.COM/page_5.html';
var url6 = 'href=http://SITENAME.COM/page_6.html';
var url7 = 'href="http://SITENAME.COM/page_7.html"';
var url8 = "http://SITENAME.COM/page_7.html";
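To make clear what the eight variants above can tell us, here is a small sketch of the simplest thing a robot might do: scan the raw .js text for absolute http URLs without executing anything. This is only an assumption about one possible approach, not a description of what Googlebot actually does, and `findAbsoluteUrls` is a made-up name.

```javascript
// The eight test assignments from the post, kept as raw source text.
const testSource = `
var url1 = 'http://SITENAME.COM/page_1.html';
var url2 = 'SITENAME.COM/page_2.html';
var url3 = '/page_3.html';
var url4 = 'http://' + 'SITENAME' + '.com/page_4.html';
var url5 = 'http://SITENAME.COM/page_5.html';
var url6 = 'href=http://SITENAME.COM/page_6.html';
var url7 = 'href="http://SITENAME.COM/page_7.html"';
var url8 = "http://SITENAME.COM/page_7.html";
`;

// Assumed heuristic: "http://" or "https://" followed by at least one
// character that is not whitespace, a quote, or an angle bracket.
const ABSOLUTE_URL = /https?:\/\/[^\s'"<>]+/g;

// Hypothetical helper: return the distinct absolute URLs found in the text.
function findAbsoluteUrls(source) {
  return [...new Set(source.match(ABSOLUTE_URL) || [])];
}

console.log(findAbsoluteUrls(testSource));
```

Under this heuristic the scan picks up page_1, page_5, page_6 and page_7, and misses page_2 (no scheme), page_3 (relative path) and page_4 (built by concatenation). So a Googlebot request for page_4 in particular would hint at something more than a plain text scan, while requests for the others would not settle the question either way.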
I read that it might be initiated as some kind of spam check after a spam report, looking for hidden text... but my site in question is under 6 weeks old and is very unlikely to have been reported for spam.
But you are definitely wrong on this one. This number on www was about 15,000 this morning and it is now 30,000, 12 hours later. Look at all the URLs that Google has listed that end in .txt