Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Robots.txt Question
mud




msg:1528705
 11:00 pm on Jan 11, 2002 (gmt 0)

www.url.com/dir1/artist/old/

How would I go about blocking search engines from spidering www.url.com/dir1/artist/old/ but not www.url.com/dir1/artist/?

Could I do something like:
www.url.com/dir1/*/old/ ?

 

mdharrold




msg:1528706
 12:27 am on Jan 12, 2002 (gmt 0)

Welcome to Webmaster World, Mud.

Disallow: /dir1/artist/old/

mud




msg:1528707
 7:32 pm on Jan 12, 2002 (gmt 0)

Hmm.. that's not quite what I meant! :)

I mean I want one line in my robots.txt that covers all my pages matching that pattern.

There are lots of artists:
www.url.com/dir/artist1/old/
www.url.com/dir/artist2/old/
www.url.com/dir/artist3/old/

Is there a way to do this with one line of code in the robots.txt, or should I give up looking? :p

Brett_Tabke




msg:1528708
 7:44 pm on Jan 12, 2002 (gmt 0)

Sure, just include that mid level subdir and it will get everything in the dirs below it.
[searchengineworld.com...]

scareduck




msg:1528709
 8:08 pm on Jan 14, 2002 (gmt 0)

I'm having a hard time believing I read this here. See [robotstxt.org]:

Note that there can only be a single "/robots.txt" on a site. Specifically, you should not put "robots.txt" files in user directories, because a robot will never look at them. If you want your users to be able to create their own "robots.txt", you will need to merge them all into a single "/robots.txt". If you don't want to do this your users might want to use the Robots META Tag instead.

I read this as meaning you shouldn't expect any spider to ever look at robots.txt in subdirectories.
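
To illustrate the quoted rule, here is a minimal Python sketch of how a robot might derive the one robots.txt location it checks for any page on a site. The helper name `robots_url` is hypothetical and the URL is the placeholder used earlier in this thread; real crawlers may differ in details.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the site-root robots.txt URL for the given page URL."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; the page's own path, query string,
    # and fragment are discarded, so subdirectories never matter here.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.url.com/dir1/artist/old/index.html"))
# prints http://www.url.com/robots.txt
```

However deep the page sits, the result is the same file at the site root, which is why a robots.txt placed in a subdirectory is never read.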

gethan




msg:1528710
 8:10 pm on Jan 14, 2002 (gmt 0)

Scareduck - welcome to WmW :)

> I read this as meaning you shouldn't expect any spider to ever look at robots.txt in subdirectories.

You got it. Only /robots.txt ever gets requested.

The line required in the above example to do this is:

Disallow: /dir/

Nothing below /dir/ will be crawled by standards-compliant robots.
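
As a rough check of what that rule does, the sketch below feeds it to Python's standard-library `urllib.robotparser`, which applies the simple prefix matching of the original robots.txt convention. The host name and paths are the placeholders from this thread, and the `User-agent: *` line is an assumption added so the rule applies to every robot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: one rule covering the mid-level directory.
rules = """\
User-agent: *
Disallow: /dir/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A Disallow value is matched as a path prefix, so everything under /dir/
# is blocked, including the artist pages themselves.
for path in ("/dir/artist1/", "/dir/artist1/old/", "/other/"):
    allowed = parser.can_fetch("*", "http://www.url.com" + path)
    print(path, "allowed" if allowed else "blocked")
```

Note that under this rule /dir/artist1/ is blocked along with /dir/artist1/old/, so it only fits if the artist pages should be kept out as well.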

mud




msg:1528711
 3:21 am on Jan 17, 2002 (gmt 0)

However, I still want:
www.url.com/dir1/artist/
to be spidered, just not:
www.url.com/dir1/artist/old/

I know I can do:
Disallow: /dir1/artist/old/

It would just be a lot of work to get a list of 6,000 artists and block /old/ for each one. But I guess it sounds like that's the way I'll have to go? :(
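
If the artist names are available as a list, the per-artist lines do not have to be typed by hand. The following is a minimal sketch, not a definitive tool: the function name `old_dir_rules`, the sample artist names, and the `User-agent: *` header are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def old_dir_rules(artists, base="/dir1"):
    """Build robots.txt text with one Disallow line per artist's old/ directory."""
    lines = ["User-agent: *"]
    for name in artists:
        lines.append(f"Disallow: {base}/{name}/old/")
    return "\n".join(lines) + "\n"


# Hypothetical sample; the real list would hold all 6,000 artist directories.
artists = ["artist1", "artist2", "artist3"]
robots_txt = old_dir_rules(artists)
print(robots_txt, end="")

# Sanity check with the standard-library parser: the artist page stays
# fetchable while its old/ subdirectory is blocked.
check = RobotFileParser()
check.parse(robots_txt.splitlines())
print(check.can_fetch("*", "http://www.url.com/dir1/artist1/"))      # True
print(check.can_fetch("*", "http://www.url.com/dir1/artist1/old/"))  # False
```

The artist list could come from a directory listing or a database query, whichever the site already has; the generated text is then saved as /robots.txt at the site root.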


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved