I want "SOME OF THE LINKS" on my home page not to be followed by Google (to reduce the number of links on the home page).
All responses will be highly appreciated.
Thanks in advance.
184.108.40.206 - - [22/Jul/2003:02:36:50 -0400] "GET /style/MYMENU.js HTTP/1.0" 200 25772 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
Is this normal? New?
Assuming that Google are :-
B) doing this as part of their anti-seo/spam strategies
can we be certain that robots.txt will be respected?
Anyway, that is my understanding. Happy to be proved wrong.
respecting robots.txt standard does NOT mean that a robot should not read the file
In fact, that's exactly what it does mean by my reading:
It's interesting, though: if I were Google, I think I'd analyze the file anyway. It would certainly clean house in many SERPs.
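For what it's worth, here is roughly how I picture a compliant robot deciding whether it may fetch a script at all. This is only a sketch under my own assumptions: `isDisallowed` is a made-up name, the robots.txt content is an invented example, and the matching is deliberately simplified (plain prefix matching, no wildcards, no Allow lines). It is not a claim about how Googlebot actually works.

```javascript
// Hypothetical helper (not any real crawler's code): decide whether `path`
// is blocked for `userAgent` by the Disallow lines in `robotsTxt`.
// Simplifications: plain prefix matching, no Allow lines, no wildcards,
// and rules from every matching group are merged.
function isDisallowed(robotsTxt, userAgent, path) {
  const agent = userAgent.toLowerCase();
  const rules = [];
  let applies = false;       // does the current group apply to this agent?
  let inGroupHeader = false; // are we inside a run of User-agent lines?

  for (const raw of robotsTxt.split(/\r?\n/)) {
    const line = raw.replace(/#.*$/, '').trim(); // strip comments
    const colon = line.indexOf(':');
    if (colon === -1) continue;
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();

    if (field === 'user-agent') {
      if (!inGroupHeader) applies = false; // a new group starts here
      inGroupHeader = true;
      if (value === '*' || (value !== '' && agent.includes(value.toLowerCase()))) {
        applies = true;
      }
    } else {
      inGroupHeader = false;
      // An empty Disallow value means "nothing is disallowed".
      if (field === 'disallow' && applies && value !== '') rules.push(value);
    }
  }
  return rules.some((prefix) => path.startsWith(prefix));
}

// Example robots.txt (an assumption for illustration only).
const robotsTxt = [
  'User-agent: *',
  'Disallow: /style/',
].join('\n');

console.log(isDisallowed(robotsTxt, 'Googlebot/2.1', '/style/MYMENU.js'));
console.log(isDisallowed(robotsTxt, 'Googlebot/2.1', '/index.html'));
```

With a rule like this in place, a robot that honours the standard would skip the request for the .js file entirely. Whether it then reasons about the script some other way is a separate question.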
I want to answer this: even though I'm relatively new as a webmaster, I have some experience in programming and, after all, JS is a programming language and any robot is a program. So here it goes:
- The implementation to crawl through JS calls would be similar to the one to crawl <a> links, so it could be done.
So, this is my opinion: Google is beginning to follow JS. The cheaters will need to search for another way to cheat, and I won't need to remake my menus to get them spidered and crawled.
Which brings me to the next point. Seeing that the JS is delivered to a page that is not blocked by robots.txt, could Google or any other robot claim that it is quite legitimate to read it, since it actually appears on a legitimate page and they didn't have to crawl it independently?
It may be a fine line here. For example, when they spider pages that use SSI (server-side includes), the content may come from many original directories. In that case the text and code are "on the page" as far as the robot is concerned, because processing is done before the page is crawled, but in reality some of it is fetched from other directories. Now JS, at least the types I know of, isn't server side, but the output still appears on the page, and may be argued to be fair game for indexing as part of the page, but not as a separate file if it's been blocked by robots.txt in the source directory.
[edited by: chiyo at 4:56 pm (utc) on July 24, 2003]
I'm saying Google should scan the file if it's linked to an indexed file.
Have you any evidence yet that Googlebot has requested a page that could only have been discovered by parsing your .js?
Anyone out there that has had Googlebot follow their external .js file to a page that wasn't linked anywhere else?
220.127.116.11 - - [21/Jul/2003:03:06:29 -0500] "GET /XXX.js HTTP/1.0" 200 520 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
var url1 = 'http://SITENAME.COM/page_1.html';
var url2 = 'SITENAME.COM/page_2.html';
var url3 = '/page_3.html';
var url4 = 'http://' + 'SITENAME' + '.com/page_4.html';
var url5 = 'http://SITENAME.COM/page_5.html';
var url6 = 'href=http://SITENAME.COM/page_6.html';
var url7 = 'href="http://SITENAME.COM/page_7.html"';
var url8 = "http://SITENAME.COM/page_7.html";
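To show what a test file like this can tell you, here is a rough sketch of the simplest thing a robot could do with it: read the script as plain text and pull anything ending in `.html` out of the quoted strings, without running any JavaScript. The function name `extractUrlCandidates` and both regular expressions are my own assumptions for illustration. Nothing here is known to match what Googlebot really does.

```javascript
// The eight test declarations, kept as plain text and never executed.
const source = `
var url1 = 'http://SITENAME.COM/page_1.html';
var url2 = 'SITENAME.COM/page_2.html';
var url3 = '/page_3.html';
var url4 = 'http://' + 'SITENAME' + '.com/page_4.html';
var url5 = 'http://SITENAME.COM/page_5.html';
var url6 = 'href=http://SITENAME.COM/page_6.html';
var url7 = 'href="http://SITENAME.COM/page_7.html"';
var url8 = "http://SITENAME.COM/page_7.html";
`;

// Hypothetical static scanner: look inside every quoted string literal and
// keep the first run of characters that ends in ".html".
function extractUrlCandidates(js) {
  const literal = /'([^']*)'|"([^"]*)"/g; // single- or double-quoted string
  const candidate = /[^\s'"=]+\.html/;    // stops at quotes, spaces and "="
  const found = [];
  let m;
  while ((m = literal.exec(js)) !== null) {
    const text = m[1] !== undefined ? m[1] : m[2];
    const hit = text.match(candidate);
    if (hit) found.push(hit[0]);
  }
  return found;
}

for (const url of extractUrlCandidates(source)) {
  console.log(url);
}
```

A scanner like this picks up url1, url5 and url8 as full URLs, strips the `href=` wrapper from url6 and url7, and sees url2 and url3 as host-relative or path-only strings. For url4 it only finds the fragment `.com/page_4.html`, because the string is assembled by concatenation. So if page_4.html ever shows up in the logs, the robot is doing more than a plain text scan.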
I read that it might be initiated as some kind of spam check after a spam report, looking for hidden text... but my site in question is under 6 weeks old and is very unlikely to have been reported for spam.
But you are definitely wrong on this one. This number on www was about 15,000 this morning and it is now 30,000, 12 hours later. Look at all the URLs that Google has listed that end in .txt
"These terms only appear in links pointing to this page: allintext document write a href"