sub.example.com redirects to sub.example.com/sub/ via 301.
I want to make sure that no spiders get into
The problem is that the robots.txt is at the root. I assume Google and other crawlers will request robots.txt at sub.example.com/robots.txt, even though the real root for the subdomain is sub.example.com/sub/.
How can I set up a separate robots.txt for the subdomain?
The problem is the 301 redirect from sub.domain.com to sub.domain.com/sub/. This tells the robot that /sub/ is not a root-level directory, and robots only look for robots.txt at the root of a host.
You might want to use a transparent redirect instead (an internal rewrite, so the URL the robot sees does not change), and place your robots.txt for sub.domain.com in sub.domain.com/sub/robots.txt.
Using a transparent redirect means that the robot will see sub.domain.com as a domain in its own right, separate and distinct from domain.com or www.domain.com. Pages will be indexed under sub.domain.com (that host name will appear in the listed URL). If this is a problem, 301-redirect the pages using .htaccess in the /sub/ folder itself; robots will then reach that subdirectory and read robots.txt before they see the redirects for the other pages.
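For illustration only, a transparent redirect of this kind could look roughly like the sketch below. It assumes Apache with mod_rewrite enabled, that this .htaccess sits in the document root that sub.domain.com points at, and that the host name and the /sub/ folder are the ones used in this thread; adjust them to your own setup.

```apache
RewriteEngine On

# Only act on requests for the subdomain (host name taken from this thread)
RewriteCond %{HTTP_HOST} ^sub\.domain\.com$ [NC]

# Skip requests that already point inside /sub/ to avoid a rewrite loop
RewriteCond %{REQUEST_URI} !^/sub/

# Internally map /anything to /sub/anything.
# There is no R flag, so no redirect is sent and the visible URL stays the same.
RewriteRule ^(.*)$ /sub/$1 [L]
```

With rules along these lines, a request for sub.domain.com/robots.txt should be answered with the file stored at /sub/robots.txt, without the robot ever seeing /sub/ in the URL.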
My code is for .htaccess on Apache, and yours is for Perl. If you're on Apache, the code I provided would simply replace the Perl script. It would also be processed natively by Apache itself rather than by a separate script, and would therefore be more efficient.
You could ask for advice over in the PERL scripting forum if you are not hosted on Apache and can't set up something similar using your control panel (e.g. on an IIS server).