Page referrer = Page requested

Mokita

4:01 am on Nov 22, 2010 (gmt 0)

Within the last few months, there was a thread in the Search Engine Spiders forum where Pfui mentioned that she uses code provided by jdMorgan to thwart requests whose referrer is the very page being requested.

Try as I might (on several occasions spread over several weeks), I cannot find the thread to which she refers.

Please would some kind soul point me in the right direction?

Just yesterday, one of my sites had a request for robots.txt, ostensibly referred by robots.txt! That is an impossible scenario for a text-only file, which cannot contain links.

jdMorgan

4:33 am on Nov 30, 2010 (gmt 0)

I don't know where that other thread went, but it's easy enough to reproduce the technique:

Point solution:

RewriteCond %{HTTP_REFERER} /robots\.txt$
RewriteRule ^robots\.txt$ - [F]

Multiple URL-paths:

RewriteCond %{HTTP_REFERER} ^https?://www\.example\.com\.?(:[0-9]+)?/([^?#]+)
RewriteCond %2>$1 ^robots\.txt>robots\.txt$ [OR]
RewriteCond %2>$1 ^somedir/somepage\.html>somedir/somepage\.html$ [OR]
RewriteCond %2>$1 ^otherdir/otherpage\.php>otherdir/otherpage\.php$
RewriteRule ^(.+)$ - [F]

Note that the ">" character is arbitrary. It serves only to join the referrer's URL-path and the requested URL-path into a single test string while keeping it unambiguous where the first ends and the second begins. It need only be a character that has no special meaning to mod_rewrite where it is used, and that will never appear in one of your URLs. You could also use a different character or a string, for example, "~" or "<->".

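Note that the first RewriteCond extracts the URL-path (only) from the HTTP Referer header as "%2" ("%1" is the optional port number). The subsequent RewriteConds compare that value against "$1", the requested URL-path captured by the RewriteRule pattern.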

A more efficient and easier-to-maintain solution is possible using atomic back-references. But those are not supported by all servers, so I present only the 'universal' solutions here.
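
For readers whose servers do support them, here is a minimal sketch of what such a version might look like. It assumes that "atomic back-references" means a regex back-reference (\1) inside the RewriteCond pattern, which PCRE-based Apache 2.x accepts, and that the code lives in .htaccess like the examples above. It is an illustration under those assumptions, not the code from the missing thread:

```apache
# Sketch only: assumes Apache 2.x (PCRE regex) and per-directory context.
# Capture the referrer's URL-path as %2 (%1 is the optional port).
RewriteCond %{HTTP_REFERER} ^https?://www\.example\.com\.?(:[0-9]+)?/([^?#]+)
# \1 must repeat exactly what ([^>]+) matched, so this condition succeeds
# only when the referrer path (%2) and the requested path ($1) are identical.
RewriteCond %2>$1 ^([^>]+)>\1$
RewriteRule ^(.+)$ - [F]
```

Unlike the per-URL list, this catches every self-referred URL-path, so pages that legitimately refer to themselves (a form that posts to its own URL, or paging links that differ only in the query string) would have to be excluded with an additional RewriteCond.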

Jim