Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
Googlebot Processing Javascript Functions
engine

WebmasterWorld Administrator engine us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month Best Post Of The Month



 
Msg#: 4454580 posted 1:43 pm on May 17, 2012 (gmt 0)

Googlebot Processing Javascript Functions [arstechnica.com]
During the last quarter of 2011, Google finally started to figure out how to efficiently solve the problem from its end, and began to roll out bots that could explore the dynamic content of pages in a limited fashion—crawling through the JavaScript within a page and finding URLs within them to add to the crawl. This required Google to allow its crawlers to send POST requests to websites in some cases, depending on how the JavaScript code was written, rather than the GET request usually used to fetch content. As a result, Google was able to start indexing Facebook comments, for example, as well as other "dynamic comment" systems.
Now, based on the logs Pankratov has shown, it appears that rather than just mining for URLs within scripts, the bots are crawling even deeper than comments, processing JavaScript functions in a way that mimics how they run when users click on the objects that activate them. That would give Google search even better access to the "deep Web"—content hidden in databases and other sources that generally hasn't been indexable before.

 

tedster

WebmasterWorld Senior Member tedster us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 4454580 posted 6:58 pm on May 17, 2012 (gmt 0)

The big deal here - and Google's urgency - is indexing AJAX content, I assume.

onebuyone



 
Msg#: 4454580 posted 7:05 pm on May 17, 2012 (gmt 0)

Wasn't Googlebot adjusted to crawl only FB comments?

g1smd

WebmasterWorld Senior Member g1smd us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 4454580 posted 8:10 pm on May 17, 2012 (gmt 0)

Google's bots added a quarter of a million pounds' worth of products to the shopping basket of a site last week. They're now blocked.

Sgt_Kickaxe

WebmasterWorld Senior Member sgt_kickaxe us a WebmasterWorld Top Contributor of All Time



 
Msg#: 4454580 posted 9:34 pm on May 17, 2012 (gmt 0)

Google had switched to a visual method of recording page content a long time ago, before they launched page previews. They've been able to pick up textual comments loaded by javascript for a long time now. The real news is that they've begun trusting googlebot to dig deeper into code too.

I would NOT be surprised if your site (or you as a webmaster) needs to pass a *sniff test* with their visual methods first, or have a trustworthy history, before Googlebot starts sending POST requests through your JavaScript.

Sgt_Kickaxe

WebmasterWorld Senior Member sgt_kickaxe us a WebmasterWorld Top Contributor of All Time



 
Msg#: 4454580 posted 8:42 am on May 18, 2012 (gmt 0)

g1smd, was googlebot cookied through at least part of the checkout process?

rustybrick

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4454580 posted 11:21 am on May 18, 2012 (gmt 0)

They were doing this 6+ months ago with all that chatter around Disqus comments and Facebook comments.

Planet13

WebmasterWorld Senior Member planet13 us a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4454580 posted 1:44 pm on May 18, 2012 (gmt 0)

> Google's bots added a quarter of a million pounds' worth of products to the shopping basket of a site last week. They're now blocked.


Huh... on my shopping cart, I would get mysterious instances where SEVERAL visitors all arrived at the same time, added one product each to the shopping cart, and then left.

They were all added within the same minute. At first I thought it was a competitor trying to deplete my inventory, but now it seems more likely that it could be Googlebot (because they only ever ordered ONE of each item, while a competitor would order hundreds or thousands of an item to deplete the inventory).
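If you want to check your own cart logs for the pattern described above (several visitors each adding ONE item in the same minute and then leaving), here is a minimal Python sketch. It assumes you can export cart additions as (timestamp, session_id, quantity) records, approximates "then leave" as "made no other cart additions", and uses a hypothetical threshold of three sessions per minute. A match is only a hint; it does not prove the visitors were Googlebot.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Iterable, List, Tuple


def suspicious_minutes(
    events: Iterable[Tuple[datetime, str, int]], min_sessions: int = 3
) -> List[datetime]:
    """Return minutes in which several one-item-and-leave sessions added to the cart.

    events: (timestamp, session_id, quantity) cart-add records (assumed export format).
    min_sessions: hypothetical threshold; tune it against your own traffic.
    """
    events = list(events)
    # How many cart additions each session made in total.
    adds_per_session = Counter(session_id for _, session_id, _ in events)
    sessions_by_minute = defaultdict(set)
    for timestamp, session_id, quantity in events:
        # Keep only sessions that added exactly one unit, once, and nothing else.
        if quantity == 1 and adds_per_session[session_id] == 1:
            minute = timestamp.replace(second=0, microsecond=0)
            sessions_by_minute[minute].add(session_id)
    return sorted(m for m, s in sessions_by_minute.items() if len(s) >= min_sessions)
```

A competitor bulk-buying one product would not show up here, since their sessions add large quantities.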

dstiles

WebmasterWorld Senior Member dstiles us a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4454580 posted 7:51 pm on May 18, 2012 (gmt 0)

if not real user with real web browser then
do not display ANY forms
end if

Ditto javascript, css, whatever
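The pseudocode above could look something like this in Python. It's a rough sketch: the `looks_like_real_browser` heuristic (a keyword list matched against the UA, plus the assumption that real browsers send an Accept-Language header) and the `render_page` helper are hypothetical, and any UA-based check can be fooled by a bot that lies about its user agent.

```python
import re

# Hypothetical keyword list: UA substrings treated as "not a real browser".
BOT_UA_PATTERN = re.compile(r"bot|crawler|spider|preview|slurp", re.IGNORECASE)


def looks_like_real_browser(user_agent, headers):
    """Crude heuristic: no bot keyword in the UA, and an Accept-Language header present."""
    if not user_agent or BOT_UA_PATTERN.search(user_agent):
        return False
    # Assumption: mainstream browsers send Accept-Language; many scripted clients don't.
    header_names = {name.lower() for name in headers}
    return "accept-language" in header_names


def render_page(body_html, browser_only_fragments, user_agent, headers):
    """Always emit the main content; add form/JS/CSS fragments only for apparent browsers."""
    parts = [body_html]
    if looks_like_real_browser(user_agent, headers):
        parts.extend(browser_only_fragments)
    return "\n".join(parts)
```

Keeping the decision in one function means the same test can gate forms, scripts and stylesheets alike.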

koan

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4454580 posted 4:14 am on May 19, 2012 (gmt 0)

Would it run the JavaScript in a .js file located in a directory blocked by robots.txt?

lucy24

WebmasterWorld Senior Member lucy24 us a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4454580 posted 9:23 am on May 19, 2012 (gmt 0)

> Would it run the JavaScript in a .js file located in a directory blocked by robots.txt?

Preview definitely would if it could, but then, Preview isn't a robot. So would the plainclothes bingbot.

dstiles

WebmasterWorld Senior Member dstiles us a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4454580 posted 7:42 pm on May 19, 2012 (gmt 0)

koan - you cannot rely on any bot to obey robots.txt in all situations. They used to, but as lucy notes, Preview and other plainclothes bots can do anything.

Detect IP ranges, UAs, headers, whatever, either within an .htaccess file or within the page itself (I would have thought webmasters should be doing that anyway to identify real visitors). What you do with the catch when you get it is up to you. I generally throw it back with a 403 or, in the case of JS, do not load it within the page.
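For the specific case of something claiming to be Googlebot, one alternative to maintaining IP ranges by hand is the reverse-then-forward DNS check Google publishes for verifying its crawler: look up the PTR name for the IP, require it to end in googlebot.com or google.com, then resolve that name and confirm it maps back to the same IP. Below is a minimal Python sketch under those assumptions; `status_for_request` is a hypothetical hook that returns the 403 mentioned above, and the resolver functions are injectable so it can be tested without network access. In production you would want to cache results, since two DNS lookups per request is slow.

```python
import socket

# Domain suffixes that genuine Googlebot reverse-DNS names are expected to end with.
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=socket.gethostbyaddr,
                          forward_lookup=socket.gethostbyname_ex):
    """Reverse-then-forward DNS check for an IP whose UA claims to be Googlebot."""
    try:
        hostname = reverse_lookup(ip)[0]          # (hostname, aliases, addresses)
    except OSError:                               # socket.herror/gaierror subclass OSError
        return False
    if not hostname.endswith(GOOGLE_CRAWLER_SUFFIXES):
        return False
    try:
        addresses = forward_lookup(hostname)[2]   # (hostname, aliases, addresses)
    except OSError:
        return False
    # The name must resolve back to the same IP; a PTR record alone can be faked.
    return ip in addresses


def status_for_request(ip, user_agent, **lookups):
    """Hypothetical hook: 403 for fake 'Googlebot' requests, 200 for everything else."""
    if "googlebot" in user_agent.lower() and not is_verified_googlebot(ip, **lookups):
        return 403
    return 200
```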

lucy24

WebmasterWorld Senior Member lucy24 us a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4454580 posted 1:54 am on May 20, 2012 (gmt 0)

> I generally throw it back with a 403 or, in the case of JS, do not load it within the page.

Do you mean that you include a bit in the js itself to detect the UA and/or IP and act accordingly? So the page gets a little bit fatter but you're shifting the work from your server to the visitor's computer?

Kendo

5+ Year Member



 
Msg#: 4454580 posted 6:10 am on May 20, 2012 (gmt 0)

Well there goes another area of PRIVACY thanks to Google.

A common practice to protect links and content from being indexed is to use JavaScript to write the link into the page.

Lord, please forgive them for they do not know what they do... after all they are only criminally insane idiots.

tedster

WebmasterWorld Senior Member tedster us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 4454580 posted 6:40 am on May 20, 2012 (gmt 0)

The processing of JavaScript links should not be news to anyone who's been paying attention. It's been happening (and discussed here) for several years. Protecting those JavaScript-written links takes another step: Disallow googlebot from crawling your JS file, for instance.
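For anyone wanting to check what a well-behaved crawler should make of such a rule, Python's standard `urllib.robotparser` can evaluate a robots.txt against a URL. The rule set and URLs below are made-up examples, and, as Staffa points out later in the thread, this only tells you what a compliant bot *ought* to do, not what it will actually do.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt (hypothetical paths): keep Googlebot out of the script directory.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /js/
"""


def is_fetch_allowed(robots_txt, user_agent, url):
    """Evaluate robots.txt text locally, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Pass the product token ("Googlebot"), not the full UA string.
    return parser.can_fetch(user_agent, url)
```

Note that the rule only applies to the group it sits in: a bot with no matching `User-agent` line (and no `*` group) is allowed everything.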

This change is about being able to crawl AJAX content, presumably without the clunky hash-bang workaround. The sky is not always falling ;)
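For reference, the "hash-bang" workaround is Google's 2009 AJAX crawling scheme: a `#!` URL was fetched by the crawler with the fragment moved into an `_escaped_fragment_` query parameter, and the server was expected to answer that URL with an HTML snapshot. Google deprecated the scheme in 2015. A minimal Python sketch of the URL mapping, assuming that spec's escaping rules (control characters, space, `#`, `%`, `&`, `+` and non-ASCII bytes percent-encoded):

```python
def escape_fragment(fragment):
    """Percent-encode the characters the AJAX crawling scheme requires to be escaped."""
    out = []
    for ch in fragment:
        code = ord(ch)
        if code <= 0x20 or code >= 0x7F or ch in "#%&+":
            # Encode as UTF-8 bytes, each written as %XX.
            out.append("".join(f"%{byte:02X}" for byte in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)


def to_escaped_fragment_url(url):
    """Map a pretty #! URL to the ugly URL the crawler would actually request."""
    base, sep, fragment = url.partition("#!")
    if not sep:
        return url  # Not a hash-bang URL; nothing to rewrite.
    joiner = "&" if "?" in base else "?"
    return f"{base}{joiner}_escaped_fragment_={escape_fragment(fragment)}"
```

For example, `http://example.com/page#!key=value` becomes `http://example.com/page?_escaped_fragment_=key=value`.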

Staffa

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4454580 posted 8:57 am on May 20, 2012 (gmt 0)

> Disallow googlebot from crawling your JS file, for instance.

As if G will follow your robots.txt Disallow when it has a mind to crawl something.

I have caught G more than once disregarding robots.txt rules.

dstiles

WebmasterWorld Senior Member dstiles us a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4454580 posted 8:47 pm on May 20, 2012 (gmt 0)

Lucy - no, it's all done server-side as part of the page processing. JS never gets sent to bots.

Staffa - see my previous post re: not having to rely on the clunky and easily ignored robots.txt.

Staffa

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4454580 posted 9:31 pm on May 20, 2012 (gmt 0)

dstiles - I know, and I certainly do not rely on robots.txt; I'm just surprised that after all these years tedster still suggests it.

tedster - unless you meant Disallow in the broader sense and not necessarily via robots.txt, in which case "Disallow" threw me off track, as it is so specifically associated with robots.txt.

tedster

WebmasterWorld Senior Member tedster us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 4454580 posted 10:34 pm on May 20, 2012 (gmt 0)

No - I did mean robots.txt. I've had good luck with it, although I have heard that others ran into trouble. Do you have any idea about what the differences might be? I've mostly used it to keep affiliate links from "being counted."

Staffa

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4454580 posted 10:44 am on May 21, 2012 (gmt 0)

I have no idea whatsoever; the sites are as plain as they come, without any ads or other external input. The JavaScript and CSS are purely for visitors and are disallowed in robots.txt, and when G ignores this it gets whacked like any other rogue bot.

lucy24

WebmasterWorld Senior Member lucy24 us a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4454580 posted 4:25 pm on May 21, 2012 (gmt 0)

> The JavaScript and CSS are purely for visitors and are disallowed in robots.txt, and when G ignores this it gets whacked like any other rogue bot.

Do you have the googlebot itself reading and acting on javascript? See, I could have sworn I'd caught it myself. Many times. But I pored over logs and all I could find was Preview consistently misbehaving.

Staffa

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4454580 posted 5:43 pm on May 21, 2012 (gmt 0)

Yes, it's Gbot itself occasionally fetching css and js files.
Preview and translate are blocked as standard ;o)
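Following up on the log-poring above: a rough Python filter for an access log in the Apache/Nginx "combined" format (the format is an assumption; adjust the pattern for other layouts) that lists JS/CSS requests whose user agent says Googlebot. Google Web Preview's UA doesn't contain "Googlebot", so Preview hits are left out. A user agent is only a claim, though, so confirm a suspicious IP before blaming Googlebot itself.

```python
import re

# Apache/Nginx "combined" log format (assumed).
COMBINED_LOG = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)


def googlebot_asset_hits(lines, extensions=(".js", ".css")):
    """Return (ip, path) for requests whose UA mentions Googlebot and whose path is JS/CSS."""
    hits = []
    for line in lines:
        match = COMBINED_LOG.match(line)
        if not match:
            continue  # Skip lines that aren't in combined format.
        path = match.group("path")
        bare_path = path.split("?", 1)[0]  # Ignore cache-busting query strings.
        if "googlebot" in match.group("ua").lower() and bare_path.endswith(extensions):
            hits.append((match.group("ip"), path))
    return hits
```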


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved