Apache: execute "global" php script when robots.txt requested

Is there a way to make Apache run a script for a load of a particular file?

     
6:59 am on Aug 20, 2012 (gmt 0)

New User

5+ Year Member

joined:June 30, 2010
posts: 36
votes: 0


Hi all,

Wondering if it's possible to configure Apache to run a script when there is a request for a particular file. I guess one way would be a rewrite rule for the entire server (encompassing all hostnames), but I'm curious if there are any other solutions.

I want to log various details when robots.txt is fetched, but I don't want to have to configure each individual site to do it.

Thanks for any tips!
7:29 am on Aug 20, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member topr8 is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 19, 2002
posts:3206
votes: 13


probably not the most elegant way, but what i do is have a rewrite rule in each virtual host that rewrites robots.txt to robots.php.

i then have a robots.php in the root of each website that is empty except for an include, which pulls in the php file that builds the robots.txt (this is a generic file i use to build the robots.txt for every site)
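
For illustration, the per-site stub might look something like this. It's only a sketch: the shared path /var/www/shared/build-robots.php is a made-up location, the rewrite rule in the comment is one possible form, and the fallback output is an assumption rather than part of the setup described above.

```php
<?php
// robots.php, placed in each site's document root.
// Reached via a per-vhost rule along the lines of:
//   RewriteRule ^/robots\.txt$ /robots.php [L]

// Hypothetical location of the one shared generator script.
$generator = '/var/www/shared/build-robots.php';

if (is_readable($generator)) {
    // The shared script is expected to send its own headers and body.
    require $generator;
} else {
    // Assumed fallback: an empty Disallow permits all crawling.
    header('Content-Type: text/plain');
    echo "User-agent: *\nDisallow:\n";
}
```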
8:19 am on Aug 20, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


I posted a complete robots.txt file logger script here only a few days ago. It makes a new file once per week.

It works via a RewriteRule. It also detects requested hostname and could be easily configured to generate a separate file per site.
12:11 pm on Aug 20, 2012 (gmt 0)

New User

5+ Year Member

joined:June 30, 2010
posts: 36
votes: 0


I was hoping to do a global config. I've never used mod_rewrite, but I presume it works like other Apache directives, which you can configure either within a <VirtualHost ...> container or globally?
12:20 pm on Aug 20, 2012 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10562
votes: 14


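mod_rewrite directives are legal in server config context, so there's no reason you can't configure a server-wide RewriteRule there. one caveat: virtual hosts don't inherit the main server's rewrite rules by default, so each one would still need RewriteEngine On and RewriteOptions Inherit to pick the rule up.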
12:30 pm on Aug 20, 2012 (gmt 0)

New User

5+ Year Member

joined:June 30, 2010
posts: 36
votes: 0


I guess the question is whether Apache will permit me to rewrite to an absolute path on the server (one script file per server), or it has to be located within the director[y/ies] permitted for that site (one script file per site... back to square one, really). Will have to do some further investigating.
5:05 pm on Aug 20, 2012 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13218
votes: 348


Oh, I see the problem. You want to rewrite to a location outside the domain where the original request took place. Have a look at how mod_rewrite handles proxies (flags [P] and [PT]). In simplistic terms, that's an external redirect that is made to look like an internal rewrite. Stop screaming, g1, I did say "simplistic".

That's assuming you want the robot itself to be shipped off to some new location where you can Do Stuff on the fly. If all you need to do is get the information and process it for later use, that's a whole different question.
6:46 pm on Aug 20, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


You don't need a proxy. Rewrite requests for robots.txt to robots.php and then "include" the rest of the scripting from there. The included files can be anywhere within the server filesystem.
7:53 pm on Aug 20, 2012 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13218
votes: 348


If you're doing that, don't you have to put a separate "robots.php" file in each domain? I got the impression that's what the OP was trying to avoid.
7:53 pm on Aug 20, 2012 (gmt 0)

New User

5+ Year Member

joined:June 30, 2010
posts: 36
votes: 0


I think I've figured it out. There is no need for per-site rewrite rules. All these lines are global in httpd.conf:


<Directory /var/www/robots-txt/>
    <Files handler.php>
        SetHandler application/x-httpd-php
    </Files>
</Directory>
ScriptAliasMatch ^/robots\.txt$ /var/www/robots-txt/handler.php


This sets the Apache handler for the file /var/www/robots-txt/handler.php to PHP, and maps a request for robots.txt (for any site) to /var/www/robots-txt/handler.php

And here's a sample handler which prints a smiley, then adds the original robots.txt file (if it exists)


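<?php
// Serve the response as plain text, like a real robots.txt.
header("Content-Type: text/plain");
echo "# :)\n";
// Append the requesting site's own robots.txt, if it has one.
$robots = $_SERVER["DOCUMENT_ROOT"] . "/robots.txt";
if (file_exists($robots)) {
    readfile($robots);
}
?>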


edit: updated config to specifically only enable PHP for the one specific file, rather than an entire directory.
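
Since the original goal was logging, the sample handler above could be extended along these lines. This is only a sketch under stated assumptions: the log path, the choice of fields and the helper name robots_log_line are all made up for illustration, and it assumes PHP 7 or later.

```php
<?php
// Hypothetical logging variant of the global robots.txt handler.

// Build one tab-separated log line from the request details.
function robots_log_line(array $server): string
{
    $fields = [
        date('c'),                          // ISO 8601 timestamp
        $server['HTTP_HOST'] ?? '-',        // hostname that was requested
        $server['REMOTE_ADDR'] ?? '-',      // client IP address
        $server['HTTP_USER_AGENT'] ?? '-',  // crawler's user agent string
    ];
    // Collapse tabs and newlines so a crafted header cannot forge log lines.
    $clean = array_map(function ($value) {
        return preg_replace('/[\t\r\n]+/', ' ', (string) $value);
    }, $fields);
    return implode("\t", $clean) . "\n";
}

// Send the response first so a logging problem cannot affect the headers.
header('Content-Type: text/plain');
$robots = ($_SERVER['DOCUMENT_ROOT'] ?? '') . '/robots.txt';
if (is_file($robots)) {
    readfile($robots);
}

// Assumed log location; the directory must exist and be writable by the
// web server user. Failures are suppressed so no warning text ends up in
// the robots.txt response.
$logFile = '/var/log/robots-txt/requests.log';
@file_put_contents($logFile, robots_log_line($_SERVER), FILE_APPEND | LOCK_EX);
```

Because HTTP_HOST is part of each line, one log file covers every site on the server; splitting it per site would just mean building $logFile from a sanitised hostname instead.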
 
