
Sitemaps, Meta Data, and robots.txt Forum

    
Googlebot Wildcard
payday




msg:1526922
 8:26 pm on Nov 17, 2003 (gmt 0)

Hi,

Will a robots.txt that looks like this:

User-agent: Googlebot
Disallow: *.js
Disallow: *.css

prevent Google from spidering all *.js and *.css files on my server, including those in subdirectories? Is there any difference from:

User-agent: Googlebot
Disallow: /*.js
Disallow: /*.css

Will this prevent Google from spidering only the *.js and *.css files in my root directory?

 

DaveAtIFG




msg:1526923
 4:18 pm on Nov 20, 2003 (gmt 0)

The only place wildcards are supported in the robots exclusion protocol [robotstxt.org] is in the User-agent field.

"Disallow: *.js" will prevent the file named "*.js" from being spidered.

"Disallow: /*.js" will prevent the directory named "*.js" from being spidered.

<added>I nearly forgot! Welcome to WebmasterWorld, payday! :)</added>
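
To illustrate the literal reading described above: under the original robots exclusion standard, a Disallow value is compared as a plain prefix of the URL path, and "*" has no special meaning there. The sketch below assumes that reading; the function name is_blocked_literal is made up for this example and is not part of any library.

```python
def is_blocked_literal(path: str, disallow_value: str) -> bool:
    """Original-standard reading: block any URL path that starts with the value.

    The '*' character is treated as an ordinary character, not a wildcard.
    An empty Disallow value blocks nothing.
    """
    if not disallow_value:
        return False
    return path.startswith(disallow_value)


if __name__ == "__main__":
    # A real script in a subdirectory is NOT blocked by the literal rule.
    print(is_blocked_literal("/scripts/app.js", "/*.js"))
    # Only a path that literally begins with "/*.js" is blocked.
    print(is_blocked_literal("/*.js/readme.txt", "/*.js"))
    # URL paths start with "/", so a value like "*.js" never matches one.
    print(is_blocked_literal("/app.js", "*.js"))
```

Read this way, neither of the two robots.txt variants in the first post keeps a standard-only robot away from ordinary .js or .css files.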

payday




msg:1526924
 8:05 pm on Nov 21, 2003 (gmt 0)

Thanks for the reply, but I think Googlebot supports wildcards. What is the right syntax, for Googlebot only, to disallow indexing of all *.js files on my server, including those in all subdirectories? Is this possible?

DaveAtIFG




msg:1526925
 11:43 pm on Nov 21, 2003 (gmt 0)

GoogleGuy posted this:
[webmasterworld.com...]
Google says this:
[google.com...]

Maybe someone else has more detailed info?

Mohamed_E




msg:1526926
 11:26 am on Nov 22, 2003 (gmt 0)

From another item [google.com] in the Google FAQ:

Googlebot also understands some extensions to the robots.txt standard. Disallow patterns may include * to match any sequence of characters, and patterns may end in $ to indicate the end of a name. For example, to prevent Googlebot from crawling files that end in .gif, you may use the following robots.txt entry:
[pre]

User-Agent: Googlebot
Disallow: /*.gif$[/pre]

In a previous post someone stated that Googlebot is the only robot to accept these extensions, so using them will not keep other bots out of these pages.
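
As a rough way to check what such patterns would cover, here is a minimal sketch of the matching the FAQ text describes: "*" matches any sequence of characters and a trailing "$" marks the end of the name. It is an approximation based only on that description, not Googlebot's actual code, and the names pattern_to_regex and is_disallowed are invented for this example. The rules "/*.js$" and "/*.css$" are an assumed adaptation of the FAQ's .gif example to the original question.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn a Disallow pattern with '*' and optional trailing '$' into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    # '\Z' pins the match to the very end of the path when '$' was given.
    return re.compile(body + (r"\Z" if anchored else ""))


def is_disallowed(path: str, patterns: list[str]) -> bool:
    """True if any non-empty pattern matches from the start of the path."""
    return any(pattern_to_regex(p).match(path) for p in patterns if p)


if __name__ == "__main__":
    rules = ["/*.js$", "/*.css$"]  # assumed rules, modelled on the .gif example
    for path in ["/app.js", "/scripts/lib/app.js", "/style.css", "/index.html"]:
        print(path, is_disallowed(path, rules))
```

With the "$" anchor, a path such as /app.js?v=2 is not matched because it does not end in .js; dropping the "$" makes the rule behave as a prefix pattern, which would also catch names like /data.json.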

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved