
Googlebot Wildcard

     
8:26 pm on Nov 17, 2003 (gmt 0)

New User

10+ Year Member

joined:Nov 17, 2003
posts:3
votes: 0


Hi,

Will my robots.txt, which looks like this:

User-agent: Googlebot
Disallow: *.js
Disallow: *.css

prevent Google from spidering all *.js and *.css files on my server? Does that include the ones in subdirectories? And is there any difference compared to:

User-agent: Googlebot
Disallow: /*.js
Disallow: /*.css

Will this version prevent Google from spidering only the *.js and *.css files in my root directory?

4:18 pm on Nov 20, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 21, 1999
posts:2141
votes: 0


The only place wildcards are supported in the robots exclusion protocol [robotstxt.org] is in the User-agent variable.

"Disallow: *.js" will prevent the file named "*.js" being spidered.

"Disallow: /*.js" will prevent the directory named "*.js" being spidered.

Added: I nearly forgot! Welcome to WebmasterWorld, payday! :)

8:05 pm on Nov 21, 2003 (gmt 0)

New User

10+ Year Member

joined:Nov 17, 2003
posts:3
votes: 0


Thanks for the reply, but I think Googlebot does support wildcards. What is the right syntax, for Googlebot only, to disallow indexing of all *.js files on my server, including those in all subdirectories? Is this possible?

11:43 pm on Nov 21, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 21, 1999
posts:2141
votes: 0


GoogleGuy posted this:
[webmasterworld.com...]
Google says this:
[google.com...]

Maybe someone else has more detailed info?

11:26 am on Nov 22, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 14, 2002
posts:1192
votes: 0


From another item [google.com] in the Google FAQ:

Googlebot also understands some extensions to the robots.txt standard. Disallow patterns may include * to match any sequence of characters, and patterns may end in $ to indicate the end of a name. For example, to prevent Googlebot from crawling files that end in .gif, you may use the following robots.txt entry:
User-Agent: Googlebot
Disallow: /*.gif$

In a previous post someone stated that Googlebot is the only robot to accept these extensions, so using them will not keep other bots out of these pages.
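
Applying that documented extension to the question above, something like the following should keep Googlebot away from .js and .css files anywhere on the server, including subdirectories (other robots do not understand this syntax and would ignore these rules):

User-Agent: Googlebot
# Googlebot-only extension: * matches any sequence of characters, $ anchors the end of the URL
Disallow: /*.js$
Disallow: /*.css$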

 
