

Got a user agent of "-" pulling my RSS feed once/minute

Any ideas what it might be?

         

trillianjedi

12:33 pm on May 27, 2007 (gmt 0)




Would a valid feedreader give a blank UA?

It's pulling quite rapidly, which isn't a problem in itself; I just don't want to block something that's useful traffic-wise.

Any thoughts?

jdMorgan

2:33 pm on May 27, 2007 (gmt 0)




Where does the IP address lead?

Jim

trillianjedi

2:37 pm on May 27, 2007 (gmt 0)




Hi Jim,

Belgium, but not much info beyond that. Could be from a server somewhere or private.

jdMorgan

5:50 pm on May 27, 2007 (gmt 0)




Did you do an rDNS lookup on it and also ping and tracert it?

A full rDNS lookup will sometimes tell you more, and if it pings and tracert shows routers past the ISP, then it's even more likely to be a server.
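
If you'd rather script the lookup than do it by hand, here's a minimal Python sketch using only the standard library - the IP is just a placeholder for the offending address:

import socket

ip = "1.2.3.4"  # placeholder - substitute the offending IP from your logs

try:
    # gethostbyaddr() performs the reverse (PTR) lookup
    hostname, aliases, addresses = socket.gethostbyaddr(ip)
    print(f"{ip} resolves to {hostname}")
except socket.herror:
    print(f"no PTR record for {ip}")

A missing PTR record proves nothing on its own, but alongside the ping and tracert results it helps build the picture.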

Frankly, I block bandwidth hogs first and ask questions later... :)

Jim

trillianjedi

6:31 pm on May 27, 2007 (gmt 0)




Thanks, I'll try that.

I'm not worried about bandwidth - as long as it's sending me some traffic in return. Gut instinct says it's just a scraping crawler, though...

trillianjedi

6:49 pm on May 27, 2007 (gmt 0)




rDNS was a good tip, Jim - thanks. I looked it up and it led me to a website I didn't much like the look of ;)

Only issue now, and perhaps I should take this to the Apache forum, but my .htaccess blocking doesn't seem to be working:

Order Allow,Deny
Deny from 1.2.3.4
Allow from all

Where 1.2.3.4 is the offending IP. Is there something obvious I've got wrong? I'm sure the Allow,Deny is the right way around?

Thanks!

jdMorgan

7:32 pm on May 27, 2007 (gmt 0)




> but my .htaccess blocking doesn't seem to be working

Based upon what evidence, specifically?

Nothing wrong with your code on its face, but I find the following construct to be far more useful:


# Always let requests for the custom 403 page and robots.txt through
SetEnvIf Request_URI "/(custom_403\.html|robots\.txt)$" allowit
# Flag unwanted user-agents and referers
SetEnvIfNoCase User-agent "grabber" getout
SetEnvIfNoCase Referer "iaea\.org" getout
#
Order Deny,Allow
#
# Banned IP address, banned IP range, and anything flagged above
Deny from 127.0.0.199
Deny from 38.0.0.0/8
Deny from env=getout
Allow from env=allowit

This denies access to 127.0.0.199, to 38.x.x.x, and to the example referer and user-agent, unless the request is for robots.txt or for the custom 403 error page. The latter exclusion is required if you use a custom 403 page; without it, you'll get an 'infinite' loop on 403 responses. I also allow anyone to fetch robots.txt even if they're banned, because many robots interpret any error on a robots.txt fetch to mean "access to all is allowed." And basically, I give them fair warning in robots.txt that they're not welcome.


Jim

trillianjedi

7:37 pm on May 27, 2007 (gmt 0)




> Based upon what evidence, specifically?

Tailing the logfile, but you're right - I just realised it will still log the attempt ;)
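
For a quick sanity check, here's a rough Python sketch that counts the status codes that IP is now getting - it assumes the stock common/combined log format, and the log path and IP are just placeholders:

from collections import Counter

ip = "1.2.3.4"  # placeholder for the offending IP
statuses = Counter()

# In the default common/combined log format, the HTTP status code is the
# ninth whitespace-separated field on each well-formed line
with open("/var/log/apache2/access.log") as log:
    for line in log:
        fields = line.split()
        if len(fields) > 8 and fields[0] == ip:
            statuses[fields[8]] += 1

print(statuses)  # mostly 403s means the Deny rules are doing their job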

Thanks for the updated code - I'll test it this evening. I won't post anything else about .htaccess in here, though (apologies for dragging the forum OT).

Thanks Jim.