Sitemaps, Meta Data, and robots.txt Forum

Apache: execute "global" php script when robots.txt requested
Is there a way to make Apache run a script for a load of a particular file?

 6:59 am on Aug 20, 2012 (gmt 0)

Hi all,

Wondering if it's possible to configure Apache to run a script when there is a request for a particular file. I guess one way would be a rewrite rule for the entire server (encompassing all hostnames), but I'm curious if there are any other solutions.

I want to log various details when robots.txt is fetched, but I don't want to have to configure each individual site to do it.

Thanks for any tips!



 7:29 am on Aug 20, 2012 (gmt 0)

Probably not the most elegant way, but what I do is put a rewrite rule in each virtual host that rewrites robots.txt to robots.php.

I then have a robots.php in the root of each website that is empty except for an include, which pulls in the PHP file that builds the robots.txt (a generic file I use to build the robots.txt for every site).


 8:19 am on Aug 20, 2012 (gmt 0)

I posted a complete robots.txt file logger script here only a few days ago. It makes a new file once per week.

It works via a RewriteRule. It also detects requested hostname and could be easily configured to generate a separate file per site.
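
The script referred to above is not reproduced in this thread. As a rough sketch of the idea only (one log file per hostname, rolled over weekly), something like the following could work. The function name robots_log_path, the directory, and the file naming scheme are all assumptions made for illustration, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper (not the script mentioned in the post): build a log
// file path of the form <dir>/<host>-<ISO year>-W<ISO week>.log
function robots_log_path(string $dir, string $host, int $timestamp): string
{
    // Lower-case the host and replace anything outside [a-z0-9.-], so a
    // forged Host header cannot introduce path separators.
    $safeHost = preg_replace('/[^a-z0-9.-]/', '_', strtolower($host));

    // 'o' is the ISO-8601 year and 'W' the ISO week number (01-53);
    // gmdate() keeps the result independent of the server's timezone.
    $week = gmdate('o-\WW', $timestamp);

    return rtrim($dir, '/') . '/' . $safeHost . '-' . $week . '.log';
}
```

A caller would pass the requested hostname and the current time, for example robots_log_path('/var/log/robots-txt', $_SERVER['HTTP_HOST'] ?? 'unknown', time()), and append to whatever path comes back.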


 12:11 pm on Aug 20, 2012 (gmt 0)

I was hoping to do a global config. I've never used mod_rewrite, but I presume it should be like other Apache directives where you can either configure it within a <site ...> container, or globally?


 12:20 pm on Aug 20, 2012 (gmt 0)

mod_rewrite directives are legal in server config context, so there's apparently no reason you can't configure a server-wide RewriteRule there.
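
A sketch of what that might look like in httpd.conf, with one caveat: according to the Apache documentation, virtual hosts do not inherit the main server's rewrite configuration by default, so each virtual host still needs RewriteEngine On and RewriteOptions Inherit. The /robots.php target follows the approach described earlier in the thread; the hostname and document root are placeholders.

```apache
# Main server context (outside any <VirtualHost>)
RewriteEngine On
RewriteRule ^/robots\.txt$ /robots.php [L]

<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot /var/www/example
    # Rules from the main server are not inherited unless requested.
    RewriteEngine On
    RewriteOptions Inherit
</VirtualHost>
```

So this still means two lines per site. On Apache 2.4.8 and later, RewriteOptions InheritDown in the main server context is documented as pushing the rules down to child configurations, which may avoid the per-site lines.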


 12:30 pm on Aug 20, 2012 (gmt 0)

I guess the question is whether Apache will permit me to rewrite to an absolute path on the server (one script file per server), or it has to be located within the director[y/ies] permitted for that site (one script file per site... back to square one, really). Will have to do some further investigating.


 5:05 pm on Aug 20, 2012 (gmt 0)

Oh, I see the problem. You want to rewrite to a location outside the domain where the original request took place. Have a look at how mod_rewrite handles proxies (flags [P] and [PT]). In simplistic terms, that's an external redirect that is made to look like an internal rewrite. Stop screaming, g1, I did say "simplistic".

That's assuming you want the robot itself to be shipped off to some new location where you can Do Stuff on the fly. If all you need to do is get the information and process it for later use, that's a whole different question.


 6:46 pm on Aug 20, 2012 (gmt 0)

You don't need a proxy. Rewrite requests for robots.txt to robots.php and then "include" the rest of the scripting from there. The included files can be anywhere within the server filesystem.


 7:53 pm on Aug 20, 2012 (gmt 0)

If you're doing that, don't you have to put a separate "robots.php" file in each domain? I got the impression that's what the OP was trying to avoid.


 7:53 pm on Aug 20, 2012 (gmt 0)

I think I've figured it out. There is no need for per-site rewrite rules. All these lines are global in httpd.conf:

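# Enable the PHP handler for this one file only.
<Directory /var/www/robots-txt/>
<Files handler.php>
SetHandler application/x-httpd-php
</Files>
</Directory>
# Map /robots.txt on every site to the shared handler.
ScriptAliasMatch ^/robots\.txt$ /var/www/robots-txt/handler.php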

This sets the Apache handler for the file /var/www/robots-txt/handler.php to PHP, and maps a request for robots.txt (for any site) to /var/www/robots-txt/handler.php

And here's a sample handler which prints a smiley, then adds the original robots.txt file (if it exists)

<?php
header("Content-Type: text/plain");
echo "# :)\n";
// Append the requesting site's own robots.txt, if it exists.
$path = $_SERVER["DOCUMENT_ROOT"] . "/robots.txt";
if (file_exists($path)) readfile($path);

Edit: updated the config to enable PHP for only that one file, rather than the entire directory.
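
Since the original goal was to log details whenever robots.txt is fetched, the handler could be extended along these lines. This is only a sketch: the function name robots_log_line, the choice of fields, the tab-separated layout, and the log path are all assumptions, and the type declarations and ?? operator assume PHP 7 or later.

```php
<?php
// Hypothetical helper: build one tab-separated log line from request data.
// $server is expected to look like $_SERVER; missing keys are logged as "-".
function robots_log_line(array $server, int $timestamp): string
{
    $fields = [
        gmdate('Y-m-d\TH:i:s\Z', $timestamp),
        $server['HTTP_HOST'] ?? '-',
        $server['REMOTE_ADDR'] ?? '-',
        $server['HTTP_USER_AGENT'] ?? '-',
    ];

    // Replace control characters (including tabs and newlines) with spaces
    // so a crafted header cannot inject extra fields or lines.
    $fields = array_map(function ($value) {
        return preg_replace('/[\x00-\x1F\x7F]/', ' ', (string) $value);
    }, $fields);

    return implode("\t", $fields) . "\n";
}

// Assumed log location; the directory must be writable by the PHP process.
$logFile = '/var/log/robots-txt/fetches.log';

// Only write when running under the web server, not from the command line.
if (PHP_SAPI !== 'cli') {
    // LOCK_EX serialises concurrent appends from simultaneous requests.
    file_put_contents($logFile, robots_log_line($_SERVER, time()), FILE_APPEND | LOCK_EX);
}
```

These lines would go in handler.php ahead of the output, so a single global script both logs the fetch and serves each site's own robots.txt.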

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved