Forum Moderators: phranque
I have the .php files rewritten to another extension, and if Googlebot showed up and ignored robots.txt (known to have happened), I'm in trouble. Google must've gotten them from the toolbar, since there are links and no one has any idea of the .php extension.
Can it be done and does anyone have any suggestions?
thanks again,
1. Redirect the php files to the static equivalents, so even if they are requested, any link or browser request will be redirected to the correct static page. (This is done with THE_REQUEST to avoid a loop.)
Option 1 needs *much* more information -- the entire .htaccess you are running, and all possible query_string patterns.
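For illustration only, here is a minimal sketch of Option 1, assuming each script maps to a same-named .html file and that no query strings are involved (the missing information mentioned above would change this):

```apache
RewriteEngine On
# Match only when the client's original request line asked for a .php
# URL. THE_REQUEST is not changed by internal rewrites, so a static
# page that is later rewritten back onto the script does not loop.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^\ ?]+)\.php[\ ?]
RewriteRule ^(.+)\.php$ /$1.html [R=301,L]
```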
2. Deny all requests to php pages, and do not worry about the redirect. (This is again done with THE_REQUEST, and is the one I personally use most of the time, because it offers some protection for the files that run my sites.)
Option 2 is easy:
RewriteCond %{THE_REQUEST} .
RewriteRule \.php - [F]
The rule in this case is not anchored, so any request containing .php will match the pattern. The condition just checks for a single character, so we can define it as an original request. If it is an original request (a link, typed into a browser, etc.), a forbidden error will be generated. If the request is secondary (internal, rewritten to, etc.), the condition will fail and the page will be served. This way, you can stop external access to your php files, but can still use them to serve the information to the static locations.
If you want to redirect the pages instead, please let us know.
Hope this helps.
Justin
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Googlebot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Msnbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Slurp [NC,OR]
RewriteCond %{THE_REQUEST} .
RewriteRule \.php - [F]
Not sure if the last two lines fit in.
thanks again,
RewriteCond %{THE_REQUEST} .
RewriteCond %{REQUEST_URI} !(thispage|anotherpage|somepage)\.php
RewriteRule \.php - [F]
I have left the second condition unanchored, because I do not know the path to the files, but you could use the full path like this:
RewriteCond %{REQUEST_URI} !^/(somedir/thispage|another/dir/anotherpage|somepage)\.php
I added the condition after the check for an original request, so we will not check all internal requests against the REQUEST_URI condition (IOW, internal requests will fail the first condition sooner and free up a little processing), but they can go in either order.
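Assembled into one block with the full paths (the directory and page names are just the placeholders from above), it would read like this -- note the THE_REQUEST pattern here is \.php, which only matches when the original request itself named a .php file:

```apache
RewriteEngine On
# Only original client requests whose request line contains ".php" are
# candidates; internally rewritten requests keep the original
# THE_REQUEST and fall through.
RewriteCond %{THE_REQUEST} \.php
# ...except the scripts that are allowed to be requested directly.
RewriteCond %{REQUEST_URI} !^/(somedir/thispage|another/dir/anotherpage|somepage)\.php
RewriteRule \.php - [F]
```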
Hope this helps.
Justin
I tried this:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Googlebot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Msnbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Slurp [NC,OR]
RewriteCond %{THE_REQUEST} .
RewriteRule \.php - [F]
but my entire site is 403'd (as a regular user, too).
any ideas?
thanks again
I tested with a tool that emulates a user agent, and it still shows 200 as opposed to 403. Tried several .php files and even replaced {THE_REQUEST} just in case. The same tool shows a ban (403) for the same bots (from an entire domain) with this code:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Googlebot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Googlebot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^FAST-Crawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mediapartners-Google [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Msnbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Msnbot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Teoma [NC]
RewriteRule .* - [F,L]
I will look at it again later on--with a clearer head I hope ;).
Should be this:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Googlebot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Msnbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Slurp [NC]
RewriteCond %{THE_REQUEST} .
RewriteRule \.php - [F]
OR
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Googlebot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Msnbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Slurp [NC]
RewriteCond %{THE_REQUEST} \.php
RewriteRule \.php - [F]
Justin
BTW, I find it easier to test with Firefox's user-agent switching than with most sites -- it allows you to set the user-agent to anything you want.
Added: This will still allow all regular users access to all php files; the best way to overcome that (if necessary) is to remove the user-agent conditions and use the specific files instead. Should have noted this before, sorry -- trouble communicating clear thought today =)
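For completeness, here is how a block like the one above is typically paired with the internal rewrites it is meant to protect; the page name is hypothetical:

```apache
RewriteEngine On
# Internally map the static URL onto the script. This is an internal
# rewrite, so THE_REQUEST still reads "GET /widgets.html HTTP/1.1" and
# the conditions below will not fire for it.
RewriteRule ^widgets\.html$ widgets.php [L]

# Forbid the listed bots when their original request names a .php URL.
RewriteCond %{HTTP_USER_AGENT} ^Googlebot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Msnbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Slurp [NC]
RewriteCond %{THE_REQUEST} \.php
RewriteRule \.php - [F]
```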