Forum Moderators: Robert Charlton & goodroi

Googlebot crawls URLs dynamically created by JavaScript

         

OutdoorWebcams

10:12 am on Apr 30, 2008 (gmt 0)

10+ Year Member



Maybe this is old news, but yesterday Googlebot started crawling URLs on one of my sites that are created purely by JavaScript and are not linked anywhere else (they are now blocked in robots.txt).

Those URLs are web bugs used for tracking and logging. They are dynamic URLs built with the JavaScript expression 'escape(document.location.href)', not plain-text URLs written in the source code.

Googlebot requests those web bugs with sensible parameter values, so it isn't just guessing at the URLs.
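
For anyone who wants to check the same thing in their own logs, here is a minimal sketch of how the escaped page URL could be pulled back out of a logged web bug request. The path `/track/bug.gif` and the parameter name `page` are made up for illustration; the real tracking URLs in this thread are not shown.

```python
from urllib.parse import urlsplit, parse_qs

def referring_page(request_url, param="page"):
    """Return the decoded page URL carried in a web bug request, or None.

    `param` is a hypothetical parameter name; use whatever your own
    tracking script appends.
    """
    query = parse_qs(urlsplit(request_url).query)
    values = query.get(param)
    return values[0] if values else None

# Hypothetical log entry: escape() turns ':' into %3A but leaves '/' alone
logged = "/track/bug.gif?page=http%3A//www.example.com/cams.html"
print(referring_page(logged))
```

If the decoded value is a real page on the site, the crawler evaluated the expression rather than guessing. Note that escape() writes characters above code point 255 as %uXXXX, which parse_qs does not decode.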

Maybe this is a further effort to discover new web pages [webmasterworld.com], especially by parsing JavaScript menus.

Have I missed something?

Lightguy1

4:44 pm on Apr 30, 2008 (gmt 0)

10+ Year Member



Are you sure it is Googlebot crawling and that the requests are not coming from somewhere else? Google respects the robots.txt protocol, so if your robots.txt is configured properly, Googlebot should not be attempting to crawl those URLs...

OutdoorWebcams

4:53 pm on Apr 30, 2008 (gmt 0)

10+ Year Member



Yes, it's Googlebot (Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)) coming from 66.249.72.199 (crawl-66-249-72-199.googlebot.com).
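
The usual way to confirm this is a reverse DNS lookup on the IP followed by a forward lookup on the returned host name. Below is a rough sketch of that check. The function name and the list of accepted domain suffixes are my own assumptions, and it only handles IPv4.

```python
import socket

# Assumed set of host name suffixes for Google's crawlers
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")

def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """True if `ip` reverse-resolves to a Google crawler host that resolves back to `ip`."""
    # The resolvers are parameters so the logic can be tried without network access
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        host = reverse_lookup(ip)
    except OSError:  # no PTR record or lookup failure
        return False
    if not host.lower().rstrip(".").endswith(GOOGLE_CRAWLER_SUFFIXES):
        return False
    try:
        # The forward lookup must lead back to the same IP
        return ip in forward_lookup(host)
    except OSError:
        return False
```

Calling `is_verified_googlebot("66.249.72.199")` would run both lookups against the system resolver. A user agent string alone proves nothing, since anyone can send it.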

I added those URLs to robots.txt after I noticed that Googlebot was crawling them, and since it reread robots.txt it doesn't crawl them any longer.
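
To double-check that a rule really covers the tracking URLs, something like the following can be used. This is only a sketch: the `/track/` path and the example.com URLs are placeholders, not the real rules from my site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the /track/ path is an assumption
ROBOTS_TXT = """\
User-agent: *
Disallow: /track/
"""

def is_blocked(robots_txt, user_agent, url):
    """Return True if `robots_txt` disallows `url` for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)

# The web bug should be blocked, a normal page should not
print(is_blocked(ROBOTS_TXT, "Googlebot", "http://www.example.com/track/bug.gif?page=x"))
print(is_blocked(ROBOTS_TXT, "Googlebot", "http://www.example.com/index.html"))
```

Python's parser does simple prefix matching, so it may not agree with Googlebot on every rule, but for a plain Disallow path like this it is a reasonable sanity check.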