

Sitemaps, Meta Data, and robots.txt Forum

    
Apache: execute "global" php script when robots.txt requested
Is there a way to make Apache run a script for a load of a particular file?
rowan194



 
Msg#: 4486406 posted 6:59 am on Aug 20, 2012 (gmt 0)

Hi all,

Wondering if it's possible to configure Apache to run a script when there is a request for a particular file. I guess one way would be a rewrite rule for the entire server (encompassing all hostnames), but I'm curious if there are any other solutions.

I want to log various details when robots.txt is fetched, but I don't want to have to configure each individual site to do it.

Thanks for any tips!

 

topr8




 
Msg#: 4486406 posted 7:29 am on Aug 20, 2012 (gmt 0)

probably not the most elegant way but what i do is to have a rewrite rule in each virtual host, to rewrite robots.txt as robots.php.

i then have a robots.php in the root of each website that is empty except for an include file, which calls the php file which builds the robots txt (this is a generic file i use to build all the robots txt's)
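For anyone wondering what such a shared builder could look like, here is a minimal sketch. It is not topr8's actual file: the function name `build_robots_txt`, the layout of the rules array, and the leading comment line are all assumptions made for illustration.

```php
<?php
// Hypothetical shared builder, meant to be included by each site's robots.php stub.
// The function name and the rule layout are illustrative assumptions.

/**
 * Build robots.txt text for one host.
 *
 * @param string $host  Requested hostname, e.g. taken from $_SERVER['HTTP_HOST'].
 * @param array  $rules Map of user-agent => list of disallowed path prefixes.
 */
function build_robots_txt(string $host, array $rules): string
{
    $lines = ["# robots.txt for " . $host];
    foreach ($rules as $agent => $paths) {
        $lines[] = "User-agent: " . $agent;
        if (count($paths) === 0) {
            // An empty Disallow value permits everything for this agent.
            $lines[] = "Disallow:";
        }
        foreach ($paths as $path) {
            $lines[] = "Disallow: " . $path;
        }
        $lines[] = ""; // a blank line separates groups
    }
    return implode("\n", $lines) . "\n";
}
```

Under these assumptions, each site's robots.php would only need to include the shared file, send a text/plain Content-Type header, and echo the result of `build_robots_txt()` for the requested hostname.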

g1smd




 
Msg#: 4486406 posted 8:19 am on Aug 20, 2012 (gmt 0)

I posted a complete robots.txt file logger script here only a few days ago. It makes a new file once per week.

It works via a RewriteRule. It also detects requested hostname and could be easily configured to generate a separate file per site.
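The script itself is not reproduced in this thread, so the following is only a rough sketch of the "one file per week, optionally one per site" idea. The function name `robots_log_path`, the file naming scheme, and the use of ISO week numbers are assumptions, not details of the original script.

```php
<?php
// Sketch only: the names and the file layout are assumptions, not the script referred to above.

/**
 * Return the log file path for a given host and time.
 * One file per ISO week, separated per hostname.
 */
function robots_log_path(string $dir, string $host, int $time): string
{
    // Keep only characters that are safe in a file name.
    $safeHost = preg_replace('/[^a-z0-9.-]/', '', strtolower($host));
    if ($safeHost === '') {
        $safeHost = 'unknown';
    }
    // 'o' is the ISO-8601 week-numbering year, 'W' the ISO week number.
    $week = gmdate('o-\WW', $time);
    return rtrim($dir, '/') . '/robots-' . $safeHost . '-' . $week . '.log';
}
```

Because the week is part of the file name, a new log file starts automatically with the first request of each week, with no cron job or rotation step required.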

rowan194



 
Msg#: 4486406 posted 12:11 pm on Aug 20, 2012 (gmt 0)

I was hoping to do a global config. I've never used mod_rewrite, but I presume it should be like other Apache directives where you can either configure it within a <site ...> container, or globally?

phranque




 
Msg#: 4486406 posted 12:20 pm on Aug 20, 2012 (gmt 0)

mod_rewrite directives are legal in server config context, so there's apparently no reason you can't have a server-wide RewriteRule configured there.

rowan194



 
Msg#: 4486406 posted 12:30 pm on Aug 20, 2012 (gmt 0)

I guess the question is whether Apache will permit me to rewrite to an absolute path on the server (one script file per server), or it has to be located within the director[y/ies] permitted for that site (one script file per site... back to square one, really). Will have to do some further investigating.

lucy24




 
Msg#: 4486406 posted 5:05 pm on Aug 20, 2012 (gmt 0)

Oh, I see the problem. You want to rewrite to a location outside the domain where the original request took place. Have a look at how mod_rewrite handles proxies (flags [P] and [PT]). In simplistic terms, that's an external redirect that is made to look like an internal rewrite. Stop screaming, g1, I did say "simplistic".

That's assuming you want the robot itself to be shipped off to some new location where you can Do Stuff on the fly. If all you need to do is get the information and process it for later use, that's a whole different question.

g1smd




 
Msg#: 4486406 posted 6:46 pm on Aug 20, 2012 (gmt 0)

You don't need a proxy. Rewrite requests for robots.txt to robots.php and then "include" the rest of the scripting from there. The included files can be anywhere within the server filesystem.

lucy24




 
Msg#: 4486406 posted 7:53 pm on Aug 20, 2012 (gmt 0)

If you're doing that, don't you have to put a separate "robots.php" file in each domain? I got the impression that's what the OP was trying to avoid.

rowan194



 
Msg#: 4486406 posted 7:53 pm on Aug 20, 2012 (gmt 0)

I think I've figured it out. There is no need for per-site rewrite rules. All these lines are global in httpd.conf:


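# Run PHP for this one file only, not for the whole directory.
<Directory /var/www/robots-txt/>
<Files handler.php>
SetHandler application/x-httpd-php
</Files>
</Directory>
# Map /robots.txt on every hostname to the shared handler (dot escaped so it matches literally).
ScriptAliasMatch ^/robots\.txt$ /var/www/robots-txt/handler.php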


This sets the Apache handler for the file /var/www/robots-txt/handler.php to PHP, and maps a request for robots.txt (for any site) to /var/www/robots-txt/handler.php

And here's a sample handler which prints a smiley, then adds the original robots.txt file (if it exists):


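<?php
// Serve as plain text, the way a static robots.txt would be.
header("Content-Type: text/plain");
echo "# :)\n";
// Append the site's own robots.txt, if one exists in its document root.
$robots = $_SERVER["DOCUMENT_ROOT"] . "/robots.txt";
if (file_exists($robots)) {
    readfile($robots);
}
?>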


edit: updated config to specifically only enable PHP for the one specific file, rather than an entire directory.
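Since the original goal was to log details of each robots.txt fetch, the handler could also append a line to a log file before sending its output. A rough sketch follows; the function names, the choice of fields, and the tab-separated format are assumptions for illustration, not something posted in this thread.

```php
<?php
// Sketch only: function names, field choice and log format are assumptions.

/** Format one tab-separated log line from a $_SERVER-style array. */
function format_robots_hit(array $server, int $time): string
{
    $fields = [
        gmdate('Y-m-d H:i:s', $time),
        $server['HTTP_HOST'] ?? '-',
        $server['REMOTE_ADDR'] ?? '-',
        $server['HTTP_USER_AGENT'] ?? '-',
    ];
    // Replace control characters so a crafted header cannot split the line or shift columns.
    $clean = array_map(
        fn($v) => preg_replace('/[\x00-\x1F\x7F]/', ' ', (string) $v),
        $fields
    );
    return implode("\t", $clean) . "\n";
}

/** Append the line to $file; LOCK_EX guards against interleaved writes. */
function log_robots_hit(string $file, array $server, int $time): bool
{
    $line = format_robots_hit($server, $time);
    return file_put_contents($file, $line, FILE_APPEND | LOCK_EX) !== false;
}
```

In handler.php this would be called as something like `log_robots_hit($logFile, $_SERVER, time())` ahead of the readfile() call, where `$logFile` is a path of your choosing that the web server user is allowed to write to.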

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved