"Googlebot/Test" Pulling External JavaScript?

Google seems to be testing something related to JavaScript

jazzbo

3:16 pm on Mar 18, 2004 (gmt 0)

64.68.89.190 - - [18/Mar/2004:12:03:53 +0100] "GET /check-inkc.js HTTP/1.1" 200 463 "-" "Googlebot/Test"

Did anybody see something like that in the logs? I tried a search here at WebmasterWorld and didn't turn up anything. Nor did I see it before in our logs.

This happened a day after a full crawl of our sites, although we have had the respective *.js files in place since the beginning of the year.

I'm curious what 'Googlebot/Test' is up to.
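
For anyone who wants to check their own logs for the same user agent, here is a minimal Python sketch. It assumes the log uses the Apache "combined" format, as the line quoted above appears to; the function name `find_agent_hits` and the regular expression are illustrative, not part of any tool mentioned in this thread.

```python
import re

# Apache/NCSA "combined" log format (assumed from the quoted log line).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def find_agent_hits(lines, agent="Googlebot/Test"):
    """Return parsed log entries whose user-agent string equals `agent`."""
    hits = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match and match.group("agent") == agent:
            hits.append(match.groupdict())
    return hits


# The log line from the post above, used as sample input.
sample = [
    '64.68.89.190 - - [18/Mar/2004:12:03:53 +0100] '
    '"GET /check-inkc.js HTTP/1.1" 200 463 "-" "Googlebot/Test"',
]

for hit in find_agent_hits(sample):
    print(hit["ip"], hit["time"], hit["path"], hit["status"])
```

In practice `sample` would be replaced by the lines of a real access log, for example an open file object.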

mayor

4:58 pm on Mar 18, 2004 (gmt 0)

Me too. Posted a new thread too, but mods haven't released it. Maybe they won't now, since one is already started.

Anyway, I saw "Googlebot/Test" getting some javascript files and a few regular files as well.

However, its main appetite seems to be javascript files.

It does seem to obey the robots.txt file.

bull

5:02 pm on Mar 18, 2004 (gmt 0)

It does obey robots.txt as my javascripts are in a disallowed directory and were not spidered.

Dolemite

5:05 pm on Mar 18, 2004 (gmt 0)

It does obey robots.txt as my javascripts are in a disallowed directory and were not spidered.

So because a rarely-seen spider didn't pull your particular .js files, you assume it obeys robots.txt?

loganoski

5:10 pm on Mar 18, 2004 (gmt 0)

Yes, I have had Googlebot/Test spider my .js files several times.

bull

5:26 pm on Mar 18, 2004 (gmt 0)

So because a rarely-seen spider didn't pull your particular .js files, you assume it obeys robots.txt?

1. The bot is not at all rare, see several threads in f39, too.
2. Yes, from my observations I assume it obeys robots.txt
3. You, or anyone else, need to prove the contrary. Ignoring robots.txt is commonly regarded as, let's say, unethical for robots. In dubio pro reo (when in doubt, rule for the accused).
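
The observation behind point 2 can be sketched with Python's standard `urllib.robotparser`. This only shows what a compliant crawler would conclude from a Disallow rule; it says nothing about what Googlebot/Test actually does. The `/scripts/` directory and the file names are hypothetical stand-ins for a disallowed script directory.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all crawlers out of /scripts/.
robots_txt = """\
User-agent: *
Disallow: /scripts/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())


def allowed(path, agent="Googlebot/Test"):
    """Return True if a compliant crawler with this agent may fetch `path`."""
    return parser.can_fetch(agent, path)


print(allowed("/scripts/check-inkc.js"))
print(allowed("/index.html"))
```

If the logs show no requests for paths where `allowed` is False, that is consistent with the bot honoring robots.txt, although it does not prove it.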

sabai

6:08 pm on Mar 18, 2004 (gmt 0)

Are any of you using AdWords? It could be checking that you don't have popups...

bull

6:13 pm on Mar 18, 2004 (gmt 0)

no.

Powdork

6:16 pm on Mar 18, 2004 (gmt 0)

I wonder if this will put an end to the js redirected doorway pages that seem to be doing so well now.

mayor

6:47 pm on Mar 18, 2004 (gmt 0)

>> javascript redirected doorways

Hmmmm, how do you check to see if a site is using javascript redirected doorways?

bcolflesh

6:53 pm on Mar 18, 2004 (gmt 0)

how do you check to see if a site is using javascript redirected doorways?

Log all your browser requests with a proxy/sniffer.

nancyb

6:55 pm on Mar 18, 2004 (gmt 0)

This bot has come around 5 times since 3/4. It pulled robots.txt the first time only; the last visit was 3/16.

It requests just one page, the same one all 5 times.

The only JS in the file is a third-party counter.

This is a very old page that hasn't been changed/updated in over a year except for site wide navigation changes.

Odd...

GG did say he wasn't familiar with this bot and was going to check it out, but AFAIK he hasn't mentioned it again.

cashmere

6:55 pm on Mar 18, 2004 (gmt 0)

The ones I have seen use onmouseover, so load the page with the cursor outside the window.

bull

7:05 pm on Mar 18, 2004 (gmt 0)

I fear that my language-based redirect will be interpreted as spam, so I have disallowed the scripts. I hope there is no penalty for disallowed scripts.

mayor

7:12 pm on Mar 18, 2004 (gmt 0)

Maybe they're looking for those automatic popups used by frauds to set affiliate link cookies, even if the visitor doesn't click on the link.

But why would the search engine care if one affiliate steals commissions from another one? Wouldn't they say, "Haha, serves them right!"?

kaled

7:19 pm on Mar 18, 2004 (gmt 0)

I hate to burst bubbles, but this is not new and has nothing to do with the testbot. Google has been spidering and indexing my javascript files for maybe a year now.

Kaled.

mayor

7:21 pm on Mar 18, 2004 (gmt 0)

Yeah, I think a witch hunt about spiders visiting .js files would be a red herring. Maybe they're just looking for the deep web.

jazzbo

9:17 pm on Mar 18, 2004 (gmt 0)

Or they could be trying to improve their ability to check whether a browser gets different content than the bot. This could - as Powdork mentioned - very well put an end to js-redirected doorways and would in most cases improve the serps - which is the communicated goal, if I am not mistaken.

Just my thoughts

jazzbo

mayor

12:42 pm on Mar 20, 2004 (gmt 0)

Well, this bot keeps coming back and is bloating my log files, so I tried to exclude it in robots.txt. But it doesn't check robots.txt anymore (it did at first) and just keeps sucking up bandwidth.

Would someone please teach this bot some manners!

pontifex

1:21 pm on Mar 20, 2004 (gmt 0)

Hi folks,
I have the same JavaScript-hungry AI here, and it is following the links in functions() as well... I have one page that is loaded as a popup and is referenced only in one external .js file... Google picked it up yesterday :-/ Strange. Are they executing JavaScript, or just following everything that looks like a URL in any file they find?
P!

kaled

1:36 pm on Mar 20, 2004 (gmt 0)

Whilst I very much doubt it, they could be running the javascript to see what transpires. It is much more likely that they are simply finding urls.

If you want to find out which is happening, just place a few dummy functions in your js files. In those dummy functions, open a window using a URL that does not exist, built by joining two strings so that the complete URL never appears literally in the file.

If your error logs show the non-existent URLs, then the bot is running the javascript.

To be certain, it might be necessary to connect the dummy functions to a small link somewhere on the page.
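
A rough sketch of this test in Python, under stated assumptions: the decoy path halves (`/decoy-7f3a` and `-page.html`), the function names, and the log format are all hypothetical. `make_decoy_js` produces the dummy JavaScript function to paste into a .js file, and `decoy_requests` scans access or error log lines for the joined URL.

```python
def make_decoy_js(first_half, second_half, func_name="decoyPopup"):
    """Build a JavaScript function that opens a URL assembled at run time.

    The two halves are only joined when the script executes, so the full
    path never appears as a literal string in the .js file.
    """
    return (
        f"function {func_name}() {{\n"
        f"  window.open('{first_half}' + '{second_half}');\n"
        f"}}\n"
    )


def decoy_requests(log_lines, first_half, second_half):
    """Return the log lines that request the full (joined) decoy path."""
    full_path = first_half + second_half
    # In common log formats the path sits between the method and protocol,
    # separated by single spaces: "GET /path HTTP/1.1".
    return [line for line in log_lines if f" {full_path} " in line]


# Hypothetical decoy halves; pick something that cannot exist on the site.
FIRST, SECOND = "/decoy-7f3a", "-page.html"

print(make_decoy_js(FIRST, SECOND))
```

A hit from `decoy_requests` would suggest that something joined the two strings, most plausibly by running the script. A crawler that statically concatenates adjacent string literals could also produce a hit, so the result is a strong hint rather than proof.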

Hopefully, someone will be able to provide an answer rather than just speculate and ruminate.

Kaled.

Powdork

4:27 pm on Mar 20, 2004 (gmt 0)

I have reported some sites that would (I would think) warrant a manual penalty because they are polluting the serps with pages that don't exist. My guess is that they haven't been removed by hand because Google wants to catch them with their automated technology. That leads me to believe they may be planning on executing the js.

[edited by: Powdork at 4:37 pm (utc) on Mar. 20, 2004]

irishaff

4:34 pm on Mar 20, 2004 (gmt 0)

I don't know a lot about JavaScript, but I have a small question.

I use JS links to conserve PR (i.e. I don't want to leak it to the contact page, for example). The format is javascript:document.write....

Does this mean the pages will now be spidered with this new bot?

micahb37

4:47 pm on Mar 20, 2004 (gmt 0)

GG has made mention in the past that Googlebot will follow links in js.

- you might want to simplify your javascript. Google can often extract urls from javascript, but I can believe that doing utterly weird stuff might mess things up.

[webmasterworld.com...]

Yidaki

6:15 pm on Mar 20, 2004 (gmt 0)

>Google has been spidering and indexing my javascript files for maybe a year now.

Forgive me, but I simply don't believe you. I haven't seen this on any of the 50+ sites I manage for myself and my clients - many of these sites use javascript for many things - from popups to document.write. I have never had a single js file crawled by gbot or any other major bot.

Enough reasons for me to say that you're either wrong or simply just trying to blow another bubble.

Regarding GBot/Test ignoring robots.txt: it wouldn't surprise me. Doh! A bot that doesn't identify itself with a user agent that contains correct contact info and/or correct whois info, plus GoogleGuy claiming that he doesn't know anything about a Testbot ... sorry, but this stinks ...

cyberprosper

7:37 pm on Mar 20, 2004 (gmt 0)

Check the IP address... it is not Google. It is some guy who has named his browser "Googlebot". Ban him from your site and move on.
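
One way to check an IP rather than guess is a reverse DNS lookup followed by a forward lookup of the returned hostname. The sketch below uses Python's standard `socket` module. The trusted hostname suffixes are an assumption on my part, not something stated in this thread, and the function name `is_verified_crawler` is hypothetical.

```python
import socket

# Assumption: genuine Google crawlers resolve to hostnames under these domains.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_crawler(ip,
                        reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex,
                        suffixes=TRUSTED_SUFFIXES):
    """Forward-confirmed reverse DNS check for a single IPv4 address.

    `reverse` and `forward` default to the real resolver calls and can be
    swapped for stubs when testing without network access.
    """
    try:
        hostname = reverse(ip)[0]  # (hostname, aliases, addresses)
    except OSError:
        return False  # no reverse record, or lookup failed
    if not hostname.lower().endswith(suffixes):
        return False
    try:
        addresses = forward(hostname)[2]
    except OSError:
        return False
    # The hostname must resolve back to the IP we started from.
    return ip in addresses
```

Calling `is_verified_crawler("64.68.89.190")` would run the check against the address from the first post; the answer depends on live DNS, so none is claimed here.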

bull

8:07 pm on Mar 20, 2004 (gmt 0)

Check the IP address... it is not google

[webmasterworld.com...]

kaled

12:08 am on Mar 21, 2004 (gmt 0)

Yidaki,

If you've read many of my posts you would know that I very rarely make definitive statements. When I do, you can bet your life that I can back them up.

When you have read and verified the stickymail I am about to send you, I trust you will apologise for calling me a liar.

Kaled.

Chris_R

2:05 am on Mar 21, 2004 (gmt 0)

Now now, let's keep it NICE.

There are a number of reasons both of you could experience what you have experienced and neither one would be wrong.

Google is crawling javascript for me - not for a year I don't think but for a few months at least. It is in the index - with the reverse links pointing to the page with the javascript links.

Google has made some MAJOR changes in the last month or so in the links they crawl and index. Probably the biggest change in two years from what I monitor.

As with anything else Google does, they do it in steps. They probably started off crawling simple javascript stuff first and moved on from there. Whether they would crawl you would depend on your PR, JS config, and other factors.

borisbaloney

4:10 am on Mar 21, 2004 (gmt 0)

I've been pondering this for an hour or so but can't work it out - what use are indexed javascript files to searchers?
