Robots.txt Question mud
How would I go about blocking search engines from spidering www.url.com/dir1/artist/old/ but not www.url.com/dir1/artist/ ?
Could I do something like:
Welcome to Webmaster World, Mud.
Hmm... that's not quite what I meant! :)
I mean I want to have one line of code in my robots.txt that covers all my pages matching that pattern.
There are lots of artists:
www.url.com/dir/artist1/old/
www.url.com/dir/artist2/old/
www.url.com/dir/artist3/old/
Is there a way to do this with one line of code in the robots.txt, or should I give up looking? :p
Sure, just include that mid-level subdir and the rule will cover everything in the dirs below it. [ ...] searchengineworld.com scareduck
I'm having a hard time believing I read this here. See robotstxt.org:
Note that there can only be a single "/robots.txt" on a site. Specifically, you should not put "robots.txt" files in user directories, because a robot will never look at them. If you want your users to be able to create their own "robots.txt", you will need to merge them all into a single "/robots.txt". If you don't want to do this your users might want to use the Robots META Tag instead.
I read this as meaning you shouldn't expect any spider to ever look at robots.txt in subdirectories.
Scareduck - welcome to WmW :)
> I read this as meaning you shouldn't expect any spider to ever look at robots.txt in subdirectories.
You got it. Only /robots.txt ever gets requested.
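To make that concrete, here is a small Python sketch of how a crawler might work out the single robots.txt location from any page URL. The helper name robots_url is made up for illustration, and real crawlers may differ in the details.

```python
from urllib.parse import urljoin

def robots_url(page_url):
    """Return the root-level robots.txt URL for the site hosting page_url."""
    # The leading slash makes urljoin discard the page's directory path.
    return urljoin(page_url, "/robots.txt")

print(robots_url("http://www.url.com/dir1/artist/old/index.html"))
# prints http://www.url.com/robots.txt
```

Whatever directory the page lives in, the result is the same root-level file, which is why a robots.txt placed in a subdirectory is never consulted.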
The line required in the above example to do this is
No directory below dir will be crawled by standards-compliant robots.
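The exact line from the post above did not survive here, so the following is only a sketch of the general behaviour: a Disallow value is matched as a path prefix. It uses Python's standard urllib.robotparser module. The rule `Disallow: /dir/` and the bot name ExampleBot are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule for illustration; not necessarily the line from the thread.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /dir/",
]

def can_crawl(url, agent="ExampleBot"):
    """Check a URL against ROBOTS_LINES using simple prefix matching."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_LINES)
    return parser.can_fetch(agent, url)

print(can_crawl("http://www.url.com/dir/artist1/old/"))  # False
print(can_crawl("http://www.url.com/dir/artist1/"))      # False
print(can_crawl("http://www.url.com/other/"))            # True
```

Note that a broad rule like this also blocks the artist pages themselves, not only their old/ subdirectories.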
However, I still want www.url.com/dir1/artist/ to be spidered, just not www.url.com/dir1/artist/old/
I know I can do the:
It would just be a lot of work to get a list of 6,000 artists to block the /old/ from. But I guess it sounds like that's the way I'll have to go? :(
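If a list of the artist directory names is available (from a directory listing or a database, say), the 6,000 lines need not be typed by hand. A minimal sketch, assuming the names are already in a Python list; build_rules, ExampleBot, and the sample names are hypothetical:

```python
from urllib.robotparser import RobotFileParser

def build_rules(artists, base="/dir1/"):
    """Build robots.txt lines that block only each artist's old/ subdirectory."""
    lines = ["User-agent: *"]
    for name in artists:
        lines.append(f"Disallow: {base}{name}/old/")
    return lines

rules = build_rules(["artist1", "artist2", "artist3"])
print("\n".join(rules))

# Sanity check with the standard-library parser.
parser = RobotFileParser()
parser.parse(rules)
print(parser.can_fetch("ExampleBot", "http://www.url.com/dir1/artist1/"))      # True
print(parser.can_fetch("ExampleBot", "http://www.url.com/dir1/artist1/old/"))  # False
```

The printed lines could be pasted into the site's single /robots.txt. Each artist page stays crawlable because only the longer old/ prefix is listed.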