Would anyone know why they visit my site, sometimes as often as four times a day, but do not actually pull any documents, pages or images?
I've emailed their support and abuse contacts but have had no reply in two weeks.
Many thanks
[edited by: phranque at 3:21 am (utc) on Dec. 11, 2009]
[edit reason] hosting specifics [/edit]
Here's a snippet of my logs with the IPs edited, which might explain my question a little better. No other file hits are omitted; the visits below are the only hits received.
Many thanks for your reply.
---.45.109.18 - - [08/Nov/2009:23:59:28 +0000] "GET / HTTP/1.1" 200 16632 "-" "MSIE 7.0"
---.45.109.18 - - [08/Nov/2009:23:59:28 +0000] "GET / HTTP/1.1" 200 16632 "-" "MSIE 7.0"
---.45.109.18 - - [11/Nov/2009:01:39:38 +0000] "GET / HTTP/1.1" 403 290 "-" "MSIE 7.0"
---.45.109.18 - - [11/Nov/2009:01:39:38 +0000] "GET / HTTP/1.1" 403 290 "-" "MSIE 7.0"
---.45.109.18 - - [11/Nov/2009:04:59:01 +0000] "GET / HTTP/1.1" 403 288 "-" "-"
---.45.109.18 - - [11/Nov/2009:04:59:01 +0000] "GET / HTTP/1.1" 403 288 "-" "-"
---.45.109.18 - - [27/Nov/2009:21:33:22 +0000] "GET / HTTP/1.1" 403 1240 "-" "-"
---.45.109.18 - - [27/Nov/2009:21:33:24 +0000] "GET / HTTP/1.1" 403 1240 "-" "-"
---.45.109.18 - - [27/Nov/2009:22:40:40 +0000] "GET / HTTP/1.1" 403 1240 "-" "-"
---.45.109.18 - - [27/Nov/2009:22:40:44 +0000] "GET / HTTP/1.1" 403 1240 "-" "-"
---.45.109.18 - - [28/Nov/2009:09:34:48 +0000] "GET / HTTP/1.1" 403 1240 "-" "-"
---.45.109.18 - - [28/Nov/2009:09:34:50 +0000] "GET / HTTP/1.1" 403 1240 "-" "-"
---.45.109.18 - - [28/Nov/2009:12:24:01 +0000] "GET / HTTP/1.1" 403 1240 "-" "-"
---.45.109.18 - - [28/Nov/2009:12:24:02 +0000] "GET / HTTP/1.1" 403 1240 "-" "-"
---.45.114.178 - - [28/Nov/2009:13:50:56 +0000] "GET / HTTP/1.1" 403 1241 "-" "-"
---.45.114.178 - - [28/Nov/2009:13:50:57 +0000] "GET / HTTP/1.1" 403 1241 "-" "-"
---.45.114.178 - - [29/Nov/2009:00:00:45 +0000] "GET / HTTP/1.1" 403 1241 "-" "-"
---.45.114.178 - - [29/Nov/2009:00:00:49 +0000] "GET / HTTP/1.1" 403 1241 "-" "-"
---.45.114.178 - - [29/Nov/2009:01:54:21 +0000] "GET / HTTP/1.1" 403 1241 "-" "-"
---.45.114.178 - - [29/Nov/2009:01:54:23 +0000] "GET / HTTP/1.1" 403 1241 "-" "-"
---.45.109.18 - - [29/Nov/2009:03:52:56 +0000] "GET / HTTP/1.1" 403 1240 "-" "-"
---.45.109.18 - - [29/Nov/2009:03:53:01 +0000] "GET / HTTP/1.1" 403 1240 "-" "-"
---.45.109.18 - - [29/Nov/2009:03:52:56 +0000] "GET / HTTP/1.1" 403 1240 "-" "-"
---.45.109.18 - - [29/Nov/2009:03:53:01 +0000] "GET / HTTP/1.1" 403 1240 "-" "-"
---.45.109.18 - - [29/Nov/2009:04:24:45 +0000] "GET / HTTP/1.1" 403 1240 "-" "-"
---.45.109.18 - - [29/Nov/2009:04:24:47 +0000] "GET / HTTP/1.1" 403 1240 "-" "-"
---.45.109.18 - - [30/Nov/2009:15:19:45 +0000] "GET / HTTP/1.1" 403 1240 "-" "-"
---.45.109.18 - - [30/Nov/2009:15:19:46 +0000] "GET / HTTP/1.1" 403 1240 "-" "-"
---.45.109.18 - - [30/Nov/2009:15:19:45 +0000] "GET / HTTP/1.1" 403 1240 "-" "-"
---.45.109.18 - - [30/Nov/2009:15:19:46 +0000] "GET / HTTP/1.1" 403 1240 "-" "-"
---.45.109.18 - - [30/Nov/2009:15:39:14 +0000] "GET / HTTP/1.1" 403 1240 "-" "-"
---.45.109.18 - - [30/Nov/2009:15:39:17 +0000] "GET / HTTP/1.1" 403 1240 "-" "-"
---.45.114.178 - - [30/Nov/2009:16:49:48 +0000] "GET / HTTP/1.1" 403 1241 "-" "-"
---.45.114.178 - - [30/Nov/2009:16:49:51 +0000] "GET / HTTP/1.1" 403 1241 "-" "-"
---.45.114.178 - - [30/Nov/2009:17:04:03 +0000] "GET / HTTP/1.1" 403 1241 "-" "-"
---.45.114.178 - - [30/Nov/2009:17:04:05 +0000] "GET / HTTP/1.1" 403 1241 "-" "-"
---.45.109.18 - - [09/Dec/2009:21:16:00 +0000] "GET / HTTP/1.1" 403 1240 "-" "-"
---.45.109.18 - - [09/Dec/2009:21:16:01 +0000] "GET / HTTP/1.1" 403 1240 "-" "-"
---.45.109.18 - - [10/Dec/2009:13:46:52 +0000] "GET / HTTP/1.1" 403 1240 "-" "-"
---.45.109.18 - - [10/Dec/2009:13:46:58 +0000] "GET / HTTP/1.1" 403 1240 "-" "-"
Sounds like you've already done what you can do.
If you are only getting one scraper from one hosting company, then you're quite lucky.
For the ones that hit you really hard, ask your host to block them at the firewall. If that's not an option, consider making your 403 error document much shorter, or serving a different 403 error document to 'servers' than you do to browsers (e.g. use mod_rewrite keyed on the User-Agent header).
Jim
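The idea of serving a shorter 403 body to 'servers' than to browsers can be sketched roughly as below. This is only an illustration of the decision logic in Python, not the mod_rewrite configuration itself. The names `pick_403_body`, `SHORT_403`, `FULL_403` and the `BROWSER_HINT` pattern are hypothetical, and treating "contains Mozilla or Opera" as "probably a browser" is an assumption that a real site would need to tune.

```python
import re

# Assumption: these tokens are a rough heuristic for "looks like a browser".
# Real browsers of that era, IE 7 included, send a string starting "Mozilla/".
BROWSER_HINT = re.compile(r"Mozilla|Opera", re.IGNORECASE)

# Hypothetical bodies: a tiny one for scripts, a friendlier one for people.
SHORT_403 = "Forbidden\n"
FULL_403 = (
    "<html><head><title>403 Forbidden</title></head>"
    "<body><h1>Access denied</h1>"
    "<p>If you believe this is a mistake, please contact the site owner.</p>"
    "</body></html>\n"
)


def pick_403_body(user_agent):
    """Return the 403 body to send for a given User-Agent header value."""
    ua = (user_agent or "").strip()
    # A missing header is logged as "-" in the access log excerpt above.
    if ua in ("", "-"):
        return SHORT_403
    # Anything that does not look like a browser gets the short body too.
    if BROWSER_HINT.search(ua):
        return FULL_403
    return SHORT_403
```

Under this heuristic, both the blank user-agent and the bare "MSIE 7.0" string seen in the log would get the short body, while a full browser string such as "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)" would get the longer page.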
Maybe it's a site that links to you that is verifying that the page is still valid. Maybe it's an enthusiastic individual who has "subscribed" to the page and gets a notification whenever it changes. There are scads of other innocent possibilities. All you can do is develop a thick skin and try not to micromanage how every single person interacts with your site.