

Blocking FrontPage User-Agent

Please check if I am doing this correctly

         

mayest

7:47 pm on Jul 21, 2008 (gmt 0)

10+ Year Member



I'm on a shared host without access to server logs, so I wrote some PHP to log all accesses to my 404 error page. One disturbing thing I noticed is that a couple of visitors were apparently accessing my site with FrontPage. I assume they were scraping pages, and I want to block them from now on.

I've seen a couple of user agents:
MSFrontPage/5.0 and MSFrontPage/12.0

They are requesting files whose paths start with /_vti_ (e.g., /_vti_bin/shtml.exe/_vti_rpc). From searching this forum, I gather that FrontPage requests those files automatically, so I also want to block any request that contains /_vti_.

Here is the code I'm using. It seems to work, at least for blocking /_vti_ (I can't test the user-agent condition):

RewriteCond %{HTTP_USER_AGENT} MS.?Frontpage [NC,OR]
RewriteCond %{REQUEST_URI} /_VTI_ [NC]
RewriteRule .* - [F]
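A rough way to reason about what these three lines do is to mimic them in Python. This is only a sketch under stated assumptions: Python's `re` module stands in for Apache's regex engine (the two behave the same on these simple patterns), `re.IGNORECASE` plays the role of `[NC]`, and the `is_forbidden` helper and the `SomeBrowser/1.0` user agent are made up for illustration.

```python
import re

# Stand-ins for the two RewriteCond patterns; re.IGNORECASE mimics [NC].
# Neither pattern is anchored, so each can match anywhere in its string.
UA_PATTERN = re.compile(r"MS.?Frontpage", re.IGNORECASE)
URI_PATTERN = re.compile(r"/_VTI_", re.IGNORECASE)


def is_forbidden(user_agent: str, request_uri: str) -> bool:
    """Hypothetical helper: True if RewriteRule .* - [F] would fire.

    The [OR] flag on the first RewriteCond means either condition
    on its own is enough to trigger the 403 Forbidden response.
    """
    return bool(UA_PATTERN.search(user_agent) or URI_PATTERN.search(request_uri))


if __name__ == "__main__":
    print(is_forbidden("MSFrontPage/5.0", "/index.html"))                    # True (user agent)
    print(is_forbidden("SomeBrowser/1.0", "/_vti_bin/shtml.exe/_vti_rpc"))   # True (path)
    print(is_forbidden("SomeBrowser/1.0", "/index.html"))                    # False
```

Without the `[OR]` flag, consecutive RewriteCond lines are combined with AND, so a request would need both the FrontPage user agent and a /_vti_ path to be blocked.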

I think that the first condition will block any user agent string that contains "MS(anything)Frontpage." Is that correct?
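One way to check that reading is to try the pattern on a few strings. In regex syntax `.` matches any single character and `?` makes it optional, so `.?` allows at most one character between "MS" and "Frontpage", not an arbitrary run. The sketch below uses Python's `re` as a stand-in for Apache's matcher. The last two sample strings are hypothetical.

```python
import re

# ".?" means "zero or one character of any kind", not "any run of characters".
pattern = re.compile(r"MS.?Frontpage", re.IGNORECASE)

samples = [
    "MSFrontPage/5.0",                             # matches: nothing between MS and FrontPage
    "Mozilla/2.0 (compatible; MS FrontPage 5.0)",  # matches: one character (a space)
    "MS-FrontPage",                                # matches: one character (a hyphen)
    "MS Office FrontPage",                         # no match: more than one character
]

for ua in samples:
    print(ua, bool(pattern.search(ua)))
```

For "MS, then anything, then Frontpage", the quantifier would be `.*` rather than `.?`.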

Now, each of these "scrapers" got a 404 with the user agents above, but then came back immediately with a user agent like this one (exactly this in one case):

Mozilla/2.0 (compatible; MS FrontPage 5.0)

They all look like that, apart from the version numbers. So my final question: am I possibly blocking innocent users who just happen to have some version of FrontPage installed on their PC (but aren't browsing with it), or am I safe on that score?

I realize that they may still come back with a regular browser and do a copy and paste, so I'll probably block any IP that gets caught doing this.

Thanks,

Tim

Samizdata

8:20 pm on Jul 21, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I use these and would say they are perfectly safe:

RewriteCond %{HTTP_USER_AGENT} FrontPage [NC,OR]
RewriteCond %{REQUEST_URI} ^/(MSOffice|_vti) [NC]
RewriteRule .* - [F]
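The practical difference between the two REQUEST_URI conditions is anchoring. The `^` ties this pattern to the start of the path, and the alternation also covers /MSOffice, whereas the `/_VTI_` pattern from the first post matches anywhere in the path. A Python sketch, with `re` standing in for Apache's matcher; apart from the path taken from the 404 log, the paths are illustrative.

```python
import re

# Condition from the first post: "/_vti_" anywhere in the path.
anywhere = re.compile(r"/_VTI_", re.IGNORECASE)
# Condition above: the path must *start* with /MSOffice or /_vti.
anchored = re.compile(r"^/(MSOffice|_vti)", re.IGNORECASE)

paths = [
    "/_vti_bin/shtml.exe/_vti_rpc",  # both match (path seen in the 404 log)
    "/MSOffice/cltreq.asp",          # anchored only
    "/docs/_vti_cnf/page.htm",       # anywhere only
    "/index.html",                   # neither
]

for p in paths:
    print(p, bool(anywhere.search(p)), bool(anchored.search(p)))
```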

Note: if the second line shows a broken pipe (¦) between MSOffice and _vti, replace it with a solid pipe (|).

...

mayest

8:36 pm on Jul 21, 2008 (gmt 0)

10+ Year Member



Thanks. Your second condition is an improvement over mine. The first one will block any user agent that contains "frontpage" anywhere in the string, right? If so, that is another improvement.
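That reading can be checked the same way. With no `^` or `$` anchor and with `[NC]`, `FrontPage` matches the word anywhere in the string, in any letter case. A Python sketch using the user agents quoted in this thread; the two extra strings at the end are made up for illustration.

```python
import re

# The user-agent condition above: "FrontPage", unanchored, case-insensitive ([NC]).
frontpage = re.compile(r"FrontPage", re.IGNORECASE)

observed = [
    "MSFrontPage/5.0",
    "MSFrontPage/12.0",
    "Mozilla/2.0 (compatible; MS FrontPage 5.0)",
]
# Every user agent reported in this thread contains the word, so all are caught.
assert all(frontpage.search(ua) for ua in observed)

print(bool(frontpage.search("xx frontpage yy")))  # True: position and case don't matter
print(bool(frontpage.search("Front Page")))       # False: the space breaks the match
```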