Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
exclude subdirectories
seoerer

10+ Year Member



 
Msg#: 529 posted 9:24 pm on Jan 7, 2005 (gmt 0)

Hello,

Suppose I want to exclude the subfolder "/bad" in my robots.txt file, and suppose this subfolder can occur at different depths in my site's URLs.

e.g.:
mysite.com/bad
mysite.com/bad/zzz
mysite.com/abc/bad
mysite.com/abc/bad/xyz
mysite.com/abc/def/bad

I have put in:
"disallow: /bad"
in my robots.txt file, thinking it would keep bots out of all the instances above. But it doesn't seem to be working: according to my log files, Googlebot continues to crawl the disallowed folders and pages. I made this change almost a month ago, so I'm sure Googlebot must have refreshed its cached copy of my robots.txt file by now.

Am I doing something wrong?
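
For readers following along: under the original robots.txt convention, a Disallow value is compared against the start of the URL path, not against any folder name inside it. The sketch below is only an illustration of that prefix rule, not how any particular crawler is implemented; `is_disallowed`, `RULES` and `PATHS` are names made up for the example.

```python
def is_disallowed(path: str, rules: list[str]) -> bool:
    """Return True if the path begins with any Disallow value (plain prefix match)."""
    # An empty Disallow value blocks nothing, so skip empty rules.
    return any(rule and path.startswith(rule) for rule in rules)


# The single rule from the post, and the example paths from the post.
RULES = ["/bad"]
PATHS = ["/bad", "/bad/zzz", "/abc/bad", "/abc/bad/xyz", "/abc/def/bad"]

results = {path: is_disallowed(path, RULES) for path in PATHS}

if __name__ == "__main__":
    for path, blocked in results.items():
        print(f"{path:15} {'blocked' if blocked else 'allowed'}")
```

Under this reading, only the paths that begin with "/bad" are covered; the ones where "bad" appears deeper in the path are not, which would be consistent with the log entries described above.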

 

Staffa

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 529 posted 10:53 pm on Jan 7, 2005 (gmt 0)

If you want to exclude all bots from these sub-dirs, then use:

User-agent: *
Disallow: /bad/
Disallow: /bad/zzz/
Disallow: /abc/bad/
Disallow: /abc/bad/xyz/
Disallow: /abc/def/bad/

I'm sure that you know already that this only works with bots that do obey the robots.txt file protocol ;o)
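
One way to sanity-check a rule set like the one above is Python's standard `urllib.robotparser`, which applies the same plain prefix matching. This is a rough sketch, assuming the rules are served exactly as listed; `allowed_paths` and `PATHS` are hypothetical names, and a real crawler may differ in details.

```python
from urllib.robotparser import RobotFileParser

# The rule set from the reply above, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /bad/
Disallow: /bad/zzz/
Disallow: /abc/bad/
Disallow: /abc/bad/xyz/
Disallow: /abc/def/bad/
"""

# Example paths from the thread, plus one with a variable segment.
PATHS = [
    "/bad",
    "/bad/zzz",
    "/abc/bad",
    "/abc/bad/xyz",
    "/abc/def/bad",
    "/abc/variable/bad",
]


def allowed_paths(robots_txt: str, paths: list[str], agent: str = "Googlebot") -> dict[str, bool]:
    """Map each path to True if the parser says the agent may fetch it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {p: parser.can_fetch(agent, "http://mysite.com" + p) for p in paths}


results = allowed_paths(ROBOTS_TXT, PATHS)

if __name__ == "__main__":
    for path, allowed in results.items():
        print(f"{path:20} {'allowed' if allowed else 'blocked'}")
```

Two things show up with this parser: a rule ending in a slash does not cover the same path written without the trailing slash (so "/bad" itself stays allowed), and a path with an unlisted middle segment such as "/abc/variable/bad" is not covered at all.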

seoerer

10+ Year Member



 
Msg#: 529 posted 11:08 pm on Jan 7, 2005 (gmt 0)

Here's my problem with that solution though:

The URLs are dynamically generated, so there is no way for me to generate a complete list of disallow statements like the one you posted.

So, for example:

mysite.com/abc/variable/bad

I want this to be disallowed. But the "variable" part of the url could literally be anything - it's a unique id coming from a database.

If it helps you to understand my problem, I'll give you a specific example:

mysite.com/products_id/595/language/french

So, in this case, the "595" is the variable, and the "/language" is the subfolder that I want to disallow. There is no way for me to create a zillion disallow statements to cover every possible products_id in my system.

Hope that makes sense?

As you may be able to tell, the URL is actually generated through an Apache mod_rewrite rule. It is normally:

mysite.com?products_id=595&language=french
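
A possible direction, offered only as a sketch: some crawlers, Googlebot among them, document support for a `*` wildcard (and a trailing `$`) in Disallow paths. This is an extension beyond the original robots.txt convention, so other bots may ignore it. Assuming such support, a single rule like `Disallow: /products_id/*/language/` could stand in for the per-product list. The Python below imitates that matching so the rule can be tried against sample paths; `rule_to_regex`, `is_disallowed` and `RULE` are hypothetical names, and the translation to a regex is an approximation, not any crawler's actual code.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a Disallow path using '*' and an optional trailing '$' into a regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces, then join them with "any run of characters".
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str, rule: str) -> bool:
    """Return True if the rule matches from the start of the path."""
    # re.match anchors at the beginning, mirroring prefix-style matching.
    return rule_to_regex(rule).match(path) is not None


# Assumed wildcard rule for the rewritten URLs described above.
RULE = "/products_id/*/language/"

if __name__ == "__main__":
    for path in ["/products_id/595/language/french", "/products_id/595"]:
        state = "blocked" if is_disallowed(path, RULE) else "allowed"
        print(f"{path:40} {state}")
```

Note that `*` here matches across slashes as well, so the rule would also catch deeper paths that happen to contain "/language/" after "/products_id/".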
