I started noticing a lot of 404 errors (thousands) from this address. I looked at the log, and this person was scraping JPEGs from my site without going through the web page that links to them.
There is no robot listed in the log entries, just a standard-looking PC. He tried grabbing JPEGs sequentially. My web page generator renames pictures with a sequence number and increments it. EVERY request for a JPEG was OK, but each one also generated a page GET that returned a 404.
Is this some kind of home-brewed robot, or some page accelerator? I'd think a 'get site' tool would show up as going through the normal web pages.
And funny as it sounds, the scraper failed to get additional pages that weren't in decimal sequence (my web page generator doesn't count in decimal). A page-grabber package would have done a better job. Has this guy been plaguing anyone else, or is it just my content he was trying to grab? The IP resolved to a huge block of ISP addresses (Verizon, in GA I think), so I couldn't ban them all.
The IP resolved to a huge block of ISP addresses (Verizon, in GA I think), so I couldn't ban them all.
The 0-63 Class C ranges for this IP all belong to the same backbone provider.
1) Determine whether you derive any other traffic from the backbone range.
a) If you don't get traffic from the backbone ranges, then you alone must decide what is beneficial or detrimental to your own website(s).
2) Perhaps you can focus on the backbone's subnet ranges (however, from the few I looked at, most of the subnet ranges are operated by the backbone as well).
NOT VERIZON
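For step 1, one rough way to see how much traffic arrives from the backbone range is to count log entries per client IP. This is only a sketch: the function name `hits_from_range` is made up for illustration, and reading "the 0-63 Class C" as 216.237.0.0/18 is an assumption based on the IP prefix that appears later in the thread.

```python
import ipaddress
from collections import Counter

# Assumption: "the 0-63 Class C" is read here as 216.237.0.0 - 216.237.63.255,
# which is the single CIDR block 216.237.0.0/18.
BACKBONE_RANGE = ipaddress.ip_network("216.237.0.0/18")


def hits_from_range(log_lines, network=BACKBONE_RANGE):
    """Count requests per client IP for addresses inside `network`.

    Assumes the client IP is the first whitespace-separated field,
    as in Apache's common and combined log formats.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split(None, 1)
        if not fields:
            continue  # blank line
        try:
            ip = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # hostname or malformed entry; skip it
        if ip in network:
            counts[str(ip)] += 1
    return counts
```

If only one or two addresses in the range ever show up, and all of them are the scraper, blocking the range costs little; if ordinary visitors appear too, a narrower rule is safer.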
Is this some kind of home-brewed robot, or some page accelerator? I'd think a 'get site' tool would show up as going through the normal web pages.
You haven't provided a UA line from your logs.
Is there software named in the UA that is image-specific?
If you're unable to deny based on a well-known harvesting software name contained within the UA, you may be able to use multiple criteria (the IP range and a portion of the UA) to reduce the number of innocents affected.
Don
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)
was what appeared in the log records for all of them, right after the referring page.
# UA ends with "30)" and the request comes from the 216.237.31 Class C
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} 30\)$
RewriteCond %{REMOTE_ADDR} ^216\.237\.31\.
RewriteRule .* - [F]
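Before deploying a rule like this, it may help to check which visitors it would catch. The sketch below mirrors the two conditions in Python, with the closing parenthesis in the UA pattern escaped so the regex compiles. The function name `would_block` is hypothetical, and Python's `re` is only an approximation of the regex engine Apache uses.

```python
import re

# Same two patterns as the RewriteCond lines; both must match (implicit AND).
UA_PATTERN = re.compile(r"30\)$")             # UA ends with "30)"
IP_PATTERN = re.compile(r"^216\.237\.31\.")   # client is in 216.237.31.x


def would_block(remote_addr, user_agent):
    """Return True if a request with this IP and UA would get the [F] response."""
    return bool(UA_PATTERN.search(user_agent)) and bool(IP_PATTERN.search(remote_addr))
```

Requiring both conditions is what keeps other users of the same browser, or other customers on the same address block, from being denied.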
The 4th paragraph at this link provides an example of the "Combined Log Format" field data:
[httpd.apache.org...]
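To pull the UA and status out of your own log lines, something like the following may work. It is a minimal sketch that assumes the standard Combined Log Format (host, ident, user, time, request, status, size, referer, user agent) and does not handle escaped quotes inside fields; the name `parse_combined` is made up for illustration.

```python
import re

# One named group per Combined Log Format field.
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_combined(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None
```

The `agent` field is the last quoted string on the line, right after the referring page, which is where the UA quoted above came from.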