topr8

msg:4486409 | 7:29 am on Aug 20, 2012 (gmt 0) |
probably not the most elegant way, but what i do is have a rewrite rule in each virtual host that rewrites robots.txt to robots.php. i then have a robots.php in the root of each website that is empty except for an include, which calls the php file that builds the robots.txt (this is a generic file i use to build all the robots.txt files)
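A minimal sketch of the per-virtual-host rule described here, assuming a hypothetical host name and document root (neither comes from the post). In virtual-host context the pattern is matched against the full URL path, so it starts with a slash:

```apache
<VirtualHost *:80>
    # Hypothetical site details, for illustration only
    ServerName www.example.com
    DocumentRoot /var/www/example.com

    RewriteEngine On
    # Internally hand requests for /robots.txt to the PHP stub in this site's root.
    # [PT] passes the rewritten URL path back through normal URL-to-file mapping.
    RewriteRule ^/robots\.txt$ /robots.php [PT]
</VirtualHost>
```

The robots.php stub in each site root would then contain nothing but a single include of the shared builder script, wherever that lives on the server.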
|
g1smd

msg:4486416 | 8:19 am on Aug 20, 2012 (gmt 0) |
I posted a complete robots.txt file logger script here only a few days ago. It makes a new file once per week. It works via a RewriteRule. It also detects requested hostname and could be easily configured to generate a separate file per site.
|
rowan194

msg:4486463 | 12:11 pm on Aug 20, 2012 (gmt 0) |
I was hoping to do a global config. I've never used mod_rewrite, but I presume it should be like other Apache directives where you can either configure it within a <site ...> container, or globally?
|
phranque

msg:4486464 | 12:20 pm on Aug 20, 2012 (gmt 0) |
mod_rewrite directives are legal in server config context, so there's apparently no reason you can't have a server-wide RewriteRule configured there.
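One caveat worth checking against your Apache version: per the mod_rewrite documentation, rules in the main server config are not inherited by virtual hosts by default, so each virtual host has to opt in. A rough sketch, assuming a hypothetical host name and document root and a robots.php in each site root:

```apache
# Main server config, outside any <VirtualHost>
RewriteEngine On
RewriteRule ^/robots\.txt$ /robots.php [PT]

# Each virtual host that should pick up the server-wide rule
<VirtualHost *:80>
    # Hypothetical site details, for illustration only
    ServerName www.example.com
    DocumentRoot /var/www/example.com

    RewriteEngine On
    # Pull in the rewrite rules defined in the main server context
    RewriteOptions Inherit
</VirtualHost>
```

This still leaves two lines per virtual host, so it reduces the per-site configuration rather than removing it.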
|
rowan194

msg:4486465 | 12:30 pm on Aug 20, 2012 (gmt 0) |
I guess the question is whether Apache will permit me to rewrite to an absolute path on the server (one script file per server), or it has to be located within the director[y/ies] permitted for that site (one script file per site... back to square one, really). Will have to do some further investigating.
|
lucy24

msg:4486537 | 5:05 pm on Aug 20, 2012 (gmt 0) |
Oh, I see the problem. You want to rewrite to a location outside the domain where the original request took place. Have a look at how mod_rewrite handles proxies (flags [P] and [PT]). In simplistic terms, that's an external redirect that is made to look like an internal rewrite. Stop screaming, g1, I did say "simplistic". That's assuming you want the robot itself to be shipped off to some new location where you can Do Stuff on the fly. If all you need to do is get the information and process it for later use, that's a whole different question.
|
g1smd

msg:4486559 | 6:46 pm on Aug 20, 2012 (gmt 0) |
You don't need a proxy. Rewrite requests for robots.txt to robots.php and then "include" the rest of the scripting from there. The included files can be anywhere within the server filesystem.
|
lucy24

msg:4486570 | 7:53 pm on Aug 20, 2012 (gmt 0) |
If you're doing that, don't you have to put a separate "robots.php" file in each domain? I got the impression that's what the OP was trying to avoid.
|
rowan194

msg:4486571 | 7:53 pm on Aug 20, 2012 (gmt 0) |
I think I've figured it out. There is no need for per-site rewrite rules. All these lines are global in httpd.conf:
<Directory /var/www/robots-txt/>
    <Files handler.php>
        SetHandler application/x-httpd-php
    </Files>
</Directory>
ScriptAliasMatch ^/robots.txt$ /var/www/robots-txt/handler.php
This sets the Apache handler for the file /var/www/robots-txt/handler.php to PHP, and maps a request for robots.txt (for any site) to /var/www/robots-txt/handler.php.
And here's a sample handler which prints a smiley, then appends the site's original robots.txt file (if it exists):
<?php
header("Content-type: text/plain");
echo "# :)\n";
if (file_exists($_SERVER["DOCUMENT_ROOT"] . "/robots.txt"))
    readfile($_SERVER["DOCUMENT_ROOT"] . "/robots.txt");
?>
edit: updated config to specifically only enable PHP for the one specific file, rather than an entire directory.
|