
Forum Moderators: goodroi


Robots.txt Question

   

mud

11:00 pm on Jan 11, 2002 (gmt 0)

10+ Year Member



www.url.com/dir1/artist/old/

How would I go about blocking search engines from spidering www.url.com/dir1/artist/old/ but not www.url.com/dir1/artist/ ?

Could I do something like:
www.url.com/dir1/*/old/ ?

12:27 am on Jan 12, 2002 (gmt 0)

10+ Year Member



Welcome to Webmaster World, Mud.

Disallow: /dir1/artist/old/
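
One way to sanity-check that rule is with Python's standard `urllib.robotparser`, which does plain prefix matching on Disallow paths. This is only a sketch: the `User-agent: *` line, the www.url.com host, and the `page.html` file name are assumptions for illustration, and real crawlers may differ in the details.

```python
from urllib.robotparser import RobotFileParser

# Rules as they might appear in /robots.txt (the User-agent line is assumed).
rules = [
    "User-agent: *",
    "Disallow: /dir1/artist/old/",
]

parser = RobotFileParser()
parser.parse(rules)

def allowed(path):
    """Return True if a generic robot may fetch the given path."""
    return parser.can_fetch("*", "http://www.url.com" + path)

old_blocked = not allowed("/dir1/artist/old/")
old_page_blocked = not allowed("/dir1/artist/old/page.html")  # hypothetical file
artist_open = allowed("/dir1/artist/")

print(old_blocked, old_page_blocked, artist_open)
```

Because the match is by prefix, everything under old/ is covered while the parent artist/ directory stays open.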

mud

7:32 pm on Jan 12, 2002 (gmt 0)

10+ Year Member



Hmm.. that's not quite what I meant! :)

I mean I want to have one line of code in my robots.txt to include all my pages that have that pattern.

There are lots of artists:
www.url.com/dir/artist1/old/
www.url.com/dir/artist2/old/
www.url.com/dir/artist3/old/

Is there a way to do this with one line of code in the robots.txt, or should I give up looking? :p

7:44 pm on Jan 12, 2002 (gmt 0)

WebmasterWorld Administrator brett_tabke



Sure, just include that mid-level subdir and it will cover everything in the dirs below it.
[searchengineworld.com...]

8:08 pm on Jan 14, 2002 (gmt 0)

10+ Year Member



I'm having a hard time believing I read this here. See [robotstxt.org]:

Note that there can only be a single "/robots.txt" on a site. Specifically, you should not put "robots.txt" files in user directories, because a robot will never look at them. If you want your users to be able to create their own "robots.txt", you will need to merge them all into a single "/robots.txt". If you don't want to do this your users might want to use the Robots META Tag instead.

I read this as meaning you shouldn't expect any spider to ever look at robots.txt in subdirectories.

8:10 pm on Jan 14, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Scareduck - welcome to WmW :)

> I read this as meaning you shouldn't expect any spider to ever look at robots.txt in subdirectories.

You got it. Only /robots.txt ever gets requested.

The line required in the above example to do this is

Disallow: /dir/

No directory below /dir/ will be crawled by standards-compliant robots.
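
A quick check of that broader rule, again sketched with Python's standard `urllib.robotparser`. The `User-agent: *` line, the host name, and the `/other/` path are assumptions added for illustration. Note that the rule covers the artist directories themselves as well as their old/ subdirectories.

```python
from urllib.robotparser import RobotFileParser

# The single broad rule from the post above (User-agent line assumed).
broad = RobotFileParser()
broad.parse(["User-agent: *", "Disallow: /dir/"])

def may_fetch(path):
    """Return True if a generic robot may fetch the given path."""
    return broad.can_fetch("*", "http://www.url.com" + path)

# /other/ is a hypothetical path outside /dir/, included for contrast.
paths = ("/dir/artist1/", "/dir/artist1/old/", "/other/")
results = {p: may_fetch(p) for p in paths}

for path, ok in results.items():
    print(path, "allowed" if ok else "blocked")
```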

mud

3:21 am on Jan 17, 2002 (gmt 0)

10+ Year Member



However, I still want:
www.url.com/dir1/artist/
to be spidered, just not:
www.url.com/dir1/artist/old/

I know I can do the:
Disallow: /dir1/artist/old/

It would just be a lot of work to get a list of 6,000 artists and block /old/ for each one. But I guess it sounds like that's the way I'll have to go? :(
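
If it does come down to one line per artist, the list need not be typed by hand. A minimal sketch, assuming the artist directory names are available as a Python list (for example from a database or a directory listing): `build_robots_txt` is a hypothetical helper, and the three names below are placeholders.

```python
def build_robots_txt(artists, base="/dir1", blocked="old"):
    """Build robots.txt text with one Disallow line per artist directory."""
    lines = ["User-agent: *"]
    for name in artists:
        lines.append(f"Disallow: {base}/{name}/{blocked}/")
    return "\n".join(lines) + "\n"

# Placeholder names; in practice this would be the full artist list.
artists = ["artist1", "artist2", "artist3"]
robots_txt = build_robots_txt(artists)
print(robots_txt)
```

The output can be saved as /robots.txt at the site root, since that is the only location robots request.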