I block all libwww requests until I know who is behind them. This one has been very persistent for over a year: it hits several times a week, with several attempts each time, following links from other themed sites. I can't find any definitive info on it. The IP is Korean. Does anyone have anything else? Thanks.
We're seeing it on a fair number of very different sites - all with a valid referer address that does actually link to those pages.
;)
dcrombie - it's following links, but I'm not quite sure it's a legit link-checking tool. In my experience it has always been the same Korean IP, but the referrers are all different sites that link to mine. It could be anything from an indexing agent building a directory to an email harvester. I can't find anything beyond conjecture.
The behaviour of following _actual_ inbound links to actual pages, sometimes requesting just the HEAD, mirrors the behaviour of other link-checking utilities. Or it could be some kind of research project - there are a few university projects of a similar nature in that region.
You could block it on the basis that it doesn't fetch robots.txt, but otherwise I'd classify it as harmless.
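If you want to check those points against your own logs before deciding, here is a rough Python sketch. It assumes the server writes Apache-style combined log format; the function names (summarize_libwww, ignores_robots) are just made up for illustration. It groups libwww requests by IP and notes how many were HEAD, which referers were sent, and whether that IP ever asked for /robots.txt.

```python
import re
from collections import defaultdict

# Assumed input: Apache "combined" log format, i.e.
# ip ident user [time] "METHOD path proto" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize_libwww(lines):
    """Group requests whose user agent mentions libwww by client IP."""
    stats = defaultdict(
        lambda: {"hits": 0, "head": 0, "robots": False, "referers": set()}
    )
    for line in lines:
        m = LOG_LINE.match(line)
        # Skip unparseable lines and anything that is not a libwww client.
        if not m or "libwww" not in m.group("agent").lower():
            continue
        entry = stats[m.group("ip")]
        entry["hits"] += 1
        if m.group("method") == "HEAD":
            entry["head"] += 1
        # Ignore any query string when checking for robots.txt.
        if m.group("path").split("?")[0] == "/robots.txt":
            entry["robots"] = True
        if m.group("referer") not in ("", "-"):
            entry["referers"].add(m.group("referer"))
    return dict(stats)


def ignores_robots(stats):
    """Return the IPs that never requested /robots.txt, sorted."""
    return sorted(ip for ip, entry in stats.items() if not entry["robots"])
```

Feed it the lines of an access log (for example, an open file object) and look at the IPs returned by ignores_robots. A high HEAD count with many distinct, genuine referers would fit the link-checker theory, but it proves nothing about intent.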