
Forum Moderators: goodroi

Googlebot Wildcard

8:26 pm on Nov 17, 2003 (gmt 0)

New User

10+ Year Member

joined:Nov 17, 2003
posts:3
votes: 0


Hi,

Will my robots.txt, which looks like this:

User-agent: Googlebot
Disallow: *.js
Disallow: *.css

prevent Google from spidering all *.js and *.css files on my server, including the ones in subdirectories? And is there any difference compared to:

User-agent: Googlebot
Disallow: /*.js
Disallow: /*.css

Will this prevent Google from spidering only the *.js and *.css files in my root directory?

4:18 pm on Nov 20, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 21, 1999
posts:2141
votes: 0


The only place wildcards are supported in the robots exclusion protocol [robotstxt.org] is in the User-agent variable.

"Disallow: *.js" will prevent only a file literally named "*.js" from being spidered.

"Disallow: /*.js" will prevent only a root-level file or directory literally named "*.js" from being spidered.
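To illustrate: a parser that follows the original robots exclusion standard treats each Disallow value as a literal path prefix, so "*" has no special meaning. A minimal sketch (the function name and rule list are illustrative, not from any real crawler):

```python
def is_disallowed(path, disallow_rules):
    # Original robots exclusion standard: each Disallow value is a
    # literal path prefix; "*" matches only a literal asterisk.
    return any(rule and path.startswith(rule) for rule in disallow_rules)

# "/*.js" only matches a path that literally begins with "/*.js":
print(is_disallowed("/*.js", ["/*.js"]))          # matches
print(is_disallowed("/scripts/app.js", ["/*.js"]))  # does not match
```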

<added>I nearly forgot! Welcome to WebmasterWorld, payday! :)</added>

8:05 pm on Nov 21, 2003 (gmt 0)

New User

10+ Year Member

joined:Nov 17, 2003
posts:3
votes: 0


Thanks for the reply, but I think Googlebot does support wildcards. What is the right syntax, for Googlebot only, to disallow indexing of all *.js files on my server, including those in all subdirectories? Is this possible?

11:43 pm on Nov 21, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 21, 1999
posts:2141
votes: 0


GoogleGuy posted this:
[webmasterworld.com...]
Google says this:
[google.com...]

Maybe someone else has more detailed info?

11:26 am on Nov 22, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 14, 2002
posts:1192
votes: 0


From another item [google.com] in the Google FAQ:

Googlebot also understands some extensions to the robots.txt standard. Disallow patterns may include * to match any sequence of characters, and patterns may end in $ to indicate the end of a name. For example, to prevent Googlebot from crawling files that end in .gif, you may use the following robots.txt entry:
User-Agent: Googlebot
Disallow: /*.gif$

In a previous post someone stated that Googlebot is the only robot to accept these extensions, so using them will not keep other bots out of these pages.
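Putting that together for the original question: based on the FAQ excerpt above, an entry along these lines should keep Googlebot away from all .js and .css files in every directory (a sketch using Google's documented extensions; other robots will not honor the wildcards):

```
User-agent: Googlebot
Disallow: /*.js$
Disallow: /*.css$
```

The "*" matches any sequence of characters, so paths in subdirectories are covered, and the "$" anchors the pattern to the end of the name so it matches only files actually ending in .js or .css.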