Apache: execute "global" php script when robots.txt requested

Is there a way to make Apache run a script for a load of a particular file?

   
6:59 am on Aug 20, 2012 (gmt 0)



Hi all,

Wondering if it's possible to configure Apache to run a script when there is a request for a particular file. I guess one way would be a rewrite rule for the entire server (encompassing all hostnames), but I'm curious if there are any other solutions.

I want to log various details when robots.txt is fetched, but I don't want to have to configure each individual site to do it.

Thanks for any tips!
7:29 am on Aug 20, 2012 (gmt 0)

WebmasterWorld Senior Member topr8 is a WebmasterWorld Top Contributor of All Time 10+ Year Member



probably not the most elegant way but what i do is to have a rewrite rule in each virtual host, to rewrite robots.txt as robots.php.

i then have a robots.php in the root of each website that is empty except for an include, which pulls in the php file that builds the robots.txt (this is a generic file i use to build the robots.txt for every site)
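
For illustration only, a shared builder of this kind might look roughly like the sketch below. The actual file is not shown in this thread, so the function name `build_robots_txt`, its parameters, and the include path in the trailing comment are all hypothetical, and the code assumes PHP 7.1 or later.

```php
<?php
// Sketch only: a hypothetical shared builder; the real file is not shown in this thread.

/**
 * Build robots.txt content for one site.
 * $disallow is a list of path prefixes; $sitemapPath is optional.
 */
function build_robots_txt(string $host, array $disallow, ?string $sitemapPath = null): string
{
    $lines = ['User-agent: *'];
    foreach ($disallow as $path) {
        $lines[] = 'Disallow: ' . $path;
    }
    if ($disallow === []) {
        // An empty Disallow value means nothing is blocked.
        $lines[] = 'Disallow:';
    }
    if ($sitemapPath !== null) {
        $lines[] = '';
        $lines[] = 'Sitemap: http://' . $host . $sitemapPath;
    }
    return implode("\n", $lines) . "\n";
}

// Each site's robots.php would then only need something like:
//   require '/path/to/shared/build-robots.php';   // hypothetical path
//   header('Content-Type: text/plain');
//   echo build_robots_txt($_SERVER['HTTP_HOST'], ['/private/']);
```

The per-site stub stays tiny, and any change to the rules happens in one place.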
8:19 am on Aug 20, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I posted a complete robots.txt file logger script here only a few days ago. It makes a new file once per week.

It works via a RewriteRule. It also detects requested hostname and could be easily configured to generate a separate file per site.
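
The script referred to above is not reproduced in this thread. As a rough sketch of the two ideas mentioned (a new file once per week, and optionally a separate file per site), the log file name could be derived as below. The function name `robots_log_file`, the `robots-` prefix, and the log directory are assumptions made for this example, and the code assumes PHP 7.0 or later.

```php
<?php
// Sketch only: not the script referred to above, which is not shown in this thread.

/**
 * Build a log file path that changes once per ISO week and, optionally, per host.
 * $logDir and the "robots-" prefix are hypothetical names chosen for this example.
 */
function robots_log_file(string $logDir, string $host, int $time, bool $perHost = true): string
{
    // The Host header is client-supplied: drop any port, lowercase it,
    // and replace anything outside a-z, 0-9, dot and hyphen.
    $host = strtolower(preg_replace('/:\d+$/', '', $host));
    $host = preg_replace('/[^a-z0-9.-]/', '_', $host);
    if ($host === '') {
        $host = 'unknown';
    }

    // 'o' is the ISO-8601 year and 'W' the ISO week number, so the name rolls over weekly.
    $week = gmdate('o-\WW', $time);

    $name = $perHost ? "robots-$host-$week.log" : "robots-$week.log";
    return rtrim($logDir, '/') . '/' . $name;
}
```

For a request to Example.com:8080 made on Aug 20, 2012, `robots_log_file('/var/log/robots', 'Example.com:8080', $time)` gives `/var/log/robots/robots-example.com-2012-W34.log`. Passing `false` as the last argument gives one shared file per week instead.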
12:11 pm on Aug 20, 2012 (gmt 0)



I was hoping to do a global config. I've never used mod_rewrite, but I presume it should be like other Apache directives, where you can configure it either within a <VirtualHost ...> container or globally?
12:20 pm on Aug 20, 2012 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



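mod_rewrite directives are legal in server config context, so there's apparently no reason you can't have a server-wide RewriteRule configured there. One caveat: by default, virtual hosts do not inherit rewrite rules from the main server context, so each virtual host would still need RewriteEngine On and RewriteOptions Inherit.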
12:30 pm on Aug 20, 2012 (gmt 0)



I guess the question is whether Apache will permit me to rewrite to an absolute path on the server (one script file per server), or it has to be located within the director[y/ies] permitted for that site (one script file per site... back to square one, really). Will have to do some further investigating.
5:05 pm on Aug 20, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



Oh, I see the problem. You want to rewrite to a location outside the domain where the original request took place. Have a look at how mod_rewrite handles proxies (flags [P] and [PT]). In simplistic terms, that's an external redirect that is made to look like an internal rewrite. Stop screaming, g1, I did say "simplistic".

That's assuming you want the robot itself to be shipped off to some new location where you can Do Stuff on the fly. If all you need to do is get the information and process it for later use, that's a whole different question.
6:46 pm on Aug 20, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



You don't need a proxy. Rewrite requests for robots.txt to robots.php and then "include" the rest of the scripting from there. The included files can be anywhere within the server filesystem.
7:53 pm on Aug 20, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



If you're doing that, don't you have to put a separate "robots.php" file in each domain? I got the impression that's what the OP was trying to avoid.
7:53 pm on Aug 20, 2012 (gmt 0)



I think I've figured it out. There is no need for per-site rewrite rules. All these lines are global in httpd.conf:


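# Enable PHP for the one handler script only
<Directory /var/www/robots-txt/>
    <Files handler.php>
        SetHandler application/x-httpd-php
    </Files>
</Directory>
# Map /robots.txt on every host to the shared handler
ScriptAliasMatch ^/robots\.txt$ /var/www/robots-txt/handler.php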


This sets the Apache handler for the file /var/www/robots-txt/handler.php to PHP, and maps a request for robots.txt (for any site) to /var/www/robots-txt/handler.php.

And here's a sample handler which prints a smiley, then appends the site's own robots.txt file (if it exists):


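<?php
// Serve plain text, as robots.txt consumers expect
header("Content-Type: text/plain");
echo "# :)\n";
// DOCUMENT_ROOT is per-site, so this picks up the requesting site's own file
$robots = $_SERVER["DOCUMENT_ROOT"] . "/robots.txt";
if (file_exists($robots)) {
    readfile($robots);
}
?>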


edit: updated config to specifically only enable PHP for the one specific file, rather than an entire directory.
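
Since the original goal was to log details of each robots.txt fetch, here is a rough sketch of how the sample handler above could be extended to do that. The log path, the choice of fields, and the function name `robots_log_line` are assumptions for this example rather than anything tested on the setup above, and the code assumes PHP 7.0 or later.

```php
<?php
// Sketch only: extends the sample handler with logging.
// The log path and the logged fields are assumptions for this example.

/** Format one tab-separated log line from the request's $_SERVER values. */
function robots_log_line(array $server, int $time): string
{
    $fields = [
        gmdate('Y-m-d H:i:s', $time),
        $server['REMOTE_ADDR'] ?? '-',
        $server['HTTP_HOST'] ?? '-',
        $server['HTTP_USER_AGENT'] ?? '-',
    ];
    // Replace control characters so a crafted header cannot inject extra lines or columns.
    $fields = array_map(function ($field) {
        return preg_replace('/[\x00-\x1F\x7F]/', ' ', (string) $field);
    }, $fields);
    return implode("\t", $fields) . "\n";
}

// Only handle a request under a web SAPI, so the function above can be tried from the CLI.
if (PHP_SAPI !== 'cli') {
    // Hypothetical location; it must be writable by the user Apache runs as.
    $logFile = '/var/log/robots-txt/requests.log';
    // FILE_APPEND adds to the file; LOCK_EX avoids interleaved writes from concurrent requests.
    // Warnings are suppressed so a logging failure cannot break the robots.txt response.
    @file_put_contents($logFile, robots_log_line($_SERVER, time()), FILE_APPEND | LOCK_EX);

    header('Content-Type: text/plain');
    $robots = ($_SERVER['DOCUMENT_ROOT'] ?? '') . '/robots.txt';
    if (is_file($robots)) {
        readfile($robots);
    }
}
```

Each fetch then adds one line holding the UTC time, client IP, requested hostname, and user agent, and sites without their own robots.txt get an empty plain-text response.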
 
