Did anybody see something like that in the logs? I tried a search here at WebmasterWorld and didn't turn up anything. Nor did I see it before in our logs.
This happened a day after a full crawl of our sites, although we have had the respective *.js files in place since the beginning of the year.
I'm curious about what 'Googlebot/Test' is up to.
Well, I haven't seen GBot/Test ignoring robots.txt; on the contrary, my logs indicate it does obey robots.txt, but it doesn't read robots.txt often.
So when I just tried to give it the boot for bloating my logfiles, it was to no avail. I guess this bot figures if it's welcomed once, it's okay to come back and pig out and not ask again. And it does have a big appetite for .js files.
Maybe the bot is checking for that, among other things.
Kaled, I did not call you a liar. I apologise, though, for not saying: "you should be careful of making this definitive statement, since your or my experience is not enough to draw definitive conclusions."
- these files are named .jscr (is this valid?)
- the files could be linked by a href from somewhere
The only links to these files (on my site) are of the form <SCRIPT SRC="script.jscr">
Enough reasons for me to say that you're either wrong or simply just trying to blow another bubble
Kaled, I did not call you a liar.
So was I blowing another bubble? And just exactly what bubbles have I blown in the past?
It has been suggested that I move my js files into another directory and use robots.txt to disallow that directory (by Brett I think, but I could be mistaken).
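A minimal robots.txt for that arrangement, assuming the scripts get moved into a hypothetical /scripts/ directory (the directory name is only an example), would be just:

User-agent: *
Disallow: /scripts/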
I never heard of external js files with the suffix ".jscr". Didn't find anything about the suffix jscr [google.com] when searching Google. Even developer.netscape.com doesn't mention this suffix. And the only posts here at WebmasterWorld that mention jscr [google.com] are two of your posts. Oh well, I guess I've learned something new.
In future, when someone mentions anomalous behaviour, ask them to sticky you the relevant url(s) - that way you can investigate. Accusing people of being wrong or blowing bubbles does not progress discussions does it?
As for my use of the extension .jscr, if I were a cartoon fan I could use the extension .loonytunes and I believe browsers would still handle it just fine. I think I'm right in saying I could do the same for my html files but I'm not going to waste time trying it.
PS Yes, I have read your sticky mail.
>I could use the extension .loonytunes and I believe browsers would still handle it just fine.
I will do a test like this in the next few days:
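(The two test lines themselves weren't preserved here; presumably something along these lines, with made-up filenames:)

<SCRIPT SRC="test.js"></SCRIPT>
<SCRIPT SRC="test.goofy"></SCRIPT>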
.. both lines on the same page.
I tend to bet that the .goofy file might get indexed by google while the .js file won't.
>it doesn't read robots.txt often.
Afaik, robot indexing is all about MIME types. So in cases where a server isn't set up to return the correct MIME type for a file, obscure files (without appropriate handlers) get treated by robots/indexers as regular text files and therefore get indexed.
Those of you who see your js files requested by gbot: would you mind checking the content type of these files using the Server Header Checker [searchengineworld.com] and report back?
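For anyone who prefers the command line, a plain HEAD request shows the same thing; for example with curl (the URL is just a placeholder):

curl -I http://www.example.com/scripts/menu.js

A correctly served .js file should come back with a Content-Type along the lines of application/x-javascript or text/javascript; something like text/plain or text/html would explain the indexing behaviour described above.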
This would be the correct httpd.conf entry for your files, kaled:
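(The directive itself didn't make it into the thread; for Apache it would presumably be a one-line AddType mapping the extension to the JavaScript MIME type, with .jscr being the extension from kaled's earlier posts:)

AddType application/x-javascript .jscr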
Now that makes perfect sense. But I'm not sure my host lets me do anything about this. I'll investigate when I have time. Perhaps if I change from .jscr to .js my host will supply the desired headers.
PS I don't know if it's relevant, but whilst there aren't any/many .js files indexed by Google, there are many entries for www.....myscript.js?param=value
I didn't think to ask if that applied to on-page or external JS, or both. My bad.
This could - like powdork mentioned - very well put an end to js-redirected doorways:
- the redirect could be triggered by a series of events that the bot surely won't be able to simulate when executing the code.
- furthermore, in JS you can write code on the fly using eval() (see the sketch after this list)
- or think of document.write etc.
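As a purely hypothetical sketch of why that is hard to catch (the URL and the mousemove trigger are made up, not taken from any real site): the redirect target only exists once the string is assembled and eval()'d, and nothing fires until the visitor actually does something.

// hypothetical sketch - the redirect target is built at runtime
function go() {
    var pieces = ["loca", "tion.re", "place('http://www.example.com/target.html')"];
    eval("window." + pieces.join(""));   // becomes window.location.replace(...) only when run
}
// fires on a user event a crawler is unlikely to simulate
document.onmousemove = go;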
Similarly, document.write is not a major problem; however, all this will take a vast quantity of CPU time. It then takes even more CPU power to analyse what's happening, and perhaps more still to simulate user interaction. And then, if Google decide to make use of all this work, presumably they will ban some pages/sites, but because the algos will be less than perfect, innocent sites will suffer whilst guilty ones will find a way around all this.
Other travel sites appear to have various different techniques to get around this.
I refer to the spider sim here on WebmasterWorld & one called poooodle...
"GET /foresee/stdLauncher.js HTTP/1.1" 302 299 "-" "Googlebot/Test"
"GET /foresee/triggerParams.js HTTP/1.1" 302 299 "-" "Googlebot/Test"
There are many more. It may be programmed to look for scripts which are known violators.
I wonder if this will put an end to the js redirected doorway pages that seem to be doing so well now.
While this is the first thing one might think of, I am wondering if this has anything to do with another problem I have seen recently...
Where keyword-stuffed spider-food is swapped with the actual page contents, by selectively "commenting-out" the undesired version via a function in an external js file, based on the userAgent.
For example -
var reg1 = /.o.g.e.ot/i;
var reg2 = /sl.rp/i;
var reg3 = /i.k.o.i/i;
if (! reg1.test(navigator.userAgent) &&...
Where they're doing a pattern match on the userAgent looking for Googlebot.
However, unless someone has their browser's userAgent set to return "Googlebot", the code still has the desired effect. And because the changes are happening client-side, there are no differences in the page that could be spotted by typical anti-cloaking detection methods.
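A rough, purely illustrative guess at how the swap part works (the real external file was never posted in full): the script opens an HTML comment for ordinary visitors, so the keyword-stuffed block that follows in the page source never renders in a browser, while a spider that doesn't execute JS indexes it as-is.

// illustrative reconstruction only - not the actual script
var isBot = /googlebot/i.test(navigator.userAgent);   // crude UA sniff, like the regexes above
if (!isBot) {
    // ordinary visitors: open an HTML comment at the point where the
    // external script is included, hiding the spider-food that follows
    document.write("<!--");
}
// a matching call further down the page would write "-->" to close the comment again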
The site where I came across this is getting #1 listings in the SERPs above valid sites. Even worse, the browser version of these cloaked pages is nothing but affiliate spam!
Per the rules, I can't include specifics regarding the site in my post.
However, GoogleGuy, if you're still out there, I would be more than happy to send you the details so you can pass this on to the right people. I seem to recall you providing a specific addy/subject line during some of the updates for such purposes; just let me know where and what to title it and I'll be glad to get it to ya.
Within a few weeks the number of pages indexed by Google jumped to over a thousand. Many of them now ranked very high for fairly common words in their market.
Sometime over the last month or so everything has dropped: the number of pages and the rankings.
Perhaps googlebot is now following the .js links in the menu and nailing them for duplicate content? That would be a sad state of affairs.
Attempting to test for Googlebot as the user agent at the scripting level is close to barking mad.
kaled, I would agree; as I said, confused programming.
However, as I also indicated, the code still works, as the UA for most browsers will not identify them as Googlebot!
That is also confirmed by the fact that these cloaked pages are successfully fooling Google and enjoying many #1 listings above legitimate sites, based on contents which do not appear when viewed by a browser.
In fact, the page title and descriptive "snippet" in the Google listing are even different from what you see from a browser! (which is what first drew my attention to these pages)
The result is the entire cloaked page contents are swapped-out, including the title.
Spider sees one page, browser sees another. Classic cloaking, done client-side to evade detection.
Would love to get all the details to Google, so they can deal with this.
GoogleGuy, you still out there?