Forum Moderators: goodroi
-masterdirectory
+subdir1
+subdir2
...
All of my websites and virtual domains (test.domain.com/somepage.asp or www.domain.com/loging.asp) map to some file within this directory structure. There are no actual files at the /masterdirectory/ level. I want to block spiders from having access to everything without having to add meta tags to every asp page. So do I place robots.txt at
/masterdirectory/robots.txt
or
/masterdirectory/subdir1/robots.txt (and then again for each subdir)?
Sorry if this doesn't make sense, but I really need help with this. Thanks a lot!
I don't think robots follow redirects for the robots.txt file. You have to place it in the root like this:
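For the record, the block-everything form is just this, as a plain text file at the root of each host (this is the standard `User-agent` / `Disallow` syntax, not anything site-specific):

```
User-agent: *
Disallow: /
```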
robots.txt has to be in the root. However, robots should understand redirects: when a redirect points to a different domain than the original one (say domain.com -> www.domain.com), robots.txt has to be re-requested for the new domain, and this can happen in the course of requesting a normal non-robots URL. I can't say how many robots actually follow that. I suppose the most correct ones should, even though it is a PITA to program that logic, and I know at least some robots don't support it. I can't speak for the top-tier engines, but I would imagine they have it sorted.
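The re-request logic above boils down to deriving the robots.txt URL from the *final* host after any redirect, not the host you started with. A minimal sketch (function name is mine, just for illustration):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url_for(final_url: str) -> str:
    """Given the URL a request actually ended up at after redirects,
    return the robots.txt URL for that host.

    E.g. if domain.com redirected to www.domain.com, the crawler must
    re-request http://www.domain.com/robots.txt before crawling there.
    """
    parts = urlsplit(final_url)
    # Keep scheme and host, replace the path with /robots.txt,
    # and drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))
```

A crawler would call this on the post-redirect URL and check whether it has already fetched (and cached) robots.txt for that host.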
Thanks for all the responses so far!
I doubt it, never tried it though.
I am inclined to agree, even though technically a request for robots.txt is a normal web request that can be subject to redirection. All the standard requires is to request it in the root, not to go hunting for it elsewhere (like in subdirectories). My bot won't be too happy, since it first checks for the existence of robots.txt using a HEAD request and only makes the full request if it gets a 200 response code :(
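That HEAD-then-GET scheme might look something like this. A sketch only, with names of my own choosing; a real bot would also want timeouts, redirect handling, and caching:

```python
import urllib.error
import urllib.request
from typing import Optional

def should_fetch_body(head_status: int) -> bool:
    """Only issue the full GET when the HEAD came back 200."""
    return head_status == 200

def fetch_robots(base_url: str) -> Optional[str]:
    """HEAD first for existence, then GET only on a 200 response.
    Returns the robots.txt body, or None if it isn't there."""
    robots = base_url.rstrip("/") + "/robots.txt"
    try:
        with urllib.request.urlopen(
            urllib.request.Request(robots, method="HEAD")
        ) as resp:
            if not should_fetch_body(resp.status):
                return None
        with urllib.request.urlopen(robots) as resp:
            return resp.read().decode("utf-8", "replace")
    except urllib.error.URLError:
        # Covers 404s, DNS failures, refused connections, etc.
        return None
```

Note the downside mentioned above: if the server redirects the robots.txt request itself, this simple version treats whatever the redirect resolves to as the answer, rather than re-deriving the root for the new host.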
All of my websites and virtual domains (test.domain.com/somepage.asp or www.domain.com/loging.asp) map to some file within this directory structure
Sub-domains are treated as separate domains for robots.txt purposes, so a correct crawler has to request robots.txt from each of them, i.e. test.domain.com/robots.txt. Since you can point each sub-domain at its own directory, you can place a unique robots.txt file in each of those directories without having to redirect anything; the web server does that mapping implicitly. This still leaves the issue of having lots of robots.txt files, but can't you use something like a symbolic link to point each one at a single real robots.txt somewhere else?
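The symlink idea sketched out, with the layout mirroring the /masterdirectory structure above but built in a temp directory so the example runs anywhere (the subdir names are just placeholders for your own):

```python
import os
import tempfile

# One real robots.txt, shared by every sub-domain's directory via symlinks.
root = tempfile.mkdtemp()
master = os.path.join(root, "robots.txt")
with open(master, "w") as f:
    f.write("User-agent: *\nDisallow: /\n")

for subdir in ("subdir1", "subdir2"):
    d = os.path.join(root, subdir)
    os.makedirs(d)
    # Each sub-domain's document root gets a link to the master file.
    os.symlink(master, os.path.join(d, "robots.txt"))

# Every sub-dir now serves identical contents; edit the master once
# and all sub-domains pick up the change.
shared = open(os.path.join(root, "subdir1", "robots.txt")).read()
```

(On a real setup you would of course point the links at the actual master file rather than a temp copy, and symlinks require a Unix-like filesystem or appropriate Windows privileges.)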
I recently dumped the meta robots tags for a robots.txt file thanks to this dedicated section, and it's sitting comfortably in my /mehere/robots.txt spot. It's valid.
My question now is what to do with that meta info I had:
meta content="FOLLOW,INDEX" etc etc ... should I just remove it totally now? Or is there some special meta, or do I leave it in...?
I'm not as clever as most here yet, and I'm oblivious to the obvious most times; any feedback would be great.
Thanks!