Hit-and-run spidering?

         

jomaxx

5:20 pm on Aug 19, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In the past couple of months I've been hit by numerous cases where a user comes in, clicks every link on the page within a few seconds, and then leaves again. In other words, they download all the linked pages but do not follow the links on those pages.

They never download images. The referrer tends to be empty. The user agent varies but is always innocuous (Mozilla/Windows). A wide range of IPs are used, but they tend to trace back to Nigeria or to ISPs with names like "New Skies Satellites N.V." or "GILAT-SATCOM".

Anybody else seeing this or have any idea what this could be?
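
The pattern described above (many pages in a few seconds, no images, empty referrer) can be looked for in parsed access-log records. What follows is only a rough sketch: the function name `find_hit_and_run`, the tuple layout, the image extension list, and the thresholds (5 pages within 10 seconds) are all assumptions picked for illustration, not values anyone in this thread reported.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed list of image extensions; adjust for the site in question.
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png")


def find_hit_and_run(hits, min_pages=5, window_seconds=10):
    """Return IPs whose traffic matches the hit-and-run pattern.

    hits: iterable of (ip, timestamp, path, referrer) tuples, where
    timestamp is a datetime and referrer is "" or "-" when absent.
    """
    by_ip = defaultdict(list)
    for ip, ts, path, referrer in hits:
        by_ip[ip].append((ts, path, referrer))

    suspects = []
    for ip, records in by_ip.items():
        records.sort(key=lambda r: r[0])
        paths = [path for _, path, _ in records]
        # A normal browser fetches images; these visitors never do.
        if any(p.lower().endswith(IMAGE_EXTENSIONS) for p in paths):
            continue
        # Skip any client that sent a referrer at all.
        if any(ref not in ("", "-") for _, _, ref in records):
            continue
        span = (records[-1][0] - records[0][0]).total_seconds()
        if len(set(paths)) >= min_pages and span <= window_seconds:
            suspects.append(ip)
    return sorted(suspects)
```

This groups by IP only, so it would miss a visitor whose requests are spread over several addresses, and it would also flag a legitimate text-only crawler that sends no referrer.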

killroy

5:31 pm on Aug 19, 2003 (gmt 0)




Been seeing the same thing. One-off spiderings with generic UAs containing standard identifiers. Various geographical sources.

Could this be a malicious attempt to steal, repackage and use content for short term profits?

SN

dmorison

5:48 pm on Aug 19, 2003 (gmt 0)




Could they be from users browsing with web accelerator software installed?

These were all the rage a couple of years ago, but I admit I've not seen them advertised recently. They're the sort of thing that came on magazine cover CDs.

They worked as a local caching proxy: if you visit a page, they start downloading all linked pages for you in the background whilst you're reading the first page.
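
To make the accelerator idea concrete, here is a minimal sketch of how such a tool might decide what to prefetch: it collects only the `<a href>` targets of the page just loaded and ignores images, which would match the "never download images" observation. The class `LinkCollector` and the function `links_to_prefetch` are hypothetical names, and this is not a description of any particular accelerator product.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags, in page order."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return  # <img>, <script>, etc. are deliberately ignored
        for name, value in attrs:
            if name == "href" and value:
                url = urljoin(self.base_url, value)
                if url not in self.links:
                    self.links.append(url)


def links_to_prefetch(html, base_url):
    """Return the de-duplicated list of linked pages for one document."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

A prefetcher built this way fetches one level of links and stops, which is the same shape as the traffic in the original post.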

jomaxx

6:18 pm on Aug 19, 2003 (gmt 0)




It's never connected to an actual user session on the same IP address, but that's why I mentioned the odd satellite/Nigeria connection; I wonder if some satellite-connection ISP could be forward-caching pages for its users.

mcavic

8:30 pm on Aug 24, 2003 (gmt 0)




For the past several months, I've had a machine hitting my home page *every day*, and then following all of the links. Always from the same IP, same UA, etc.

What makes it strange is that it comes in from a Looksmart SERP as if it was a user searching for my domain name. Also, I used to have a GET form on my home page, and it even submitted the form each time.

dhcp065-025-124-121.neo.rr.com - - [24/Aug/2003:14:33:55 -0500] "GET / HTTP/1.1" 200 2392 "http*//rr.looksmart.com/r_search?l&iacs&key=www.+domain.com&search=0" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; H010818; .NET CLR 1.0.3705)"

It could be someone that has my page on auto-synchronize, but then why would it have submitted the form?
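
For anyone wanting to pull the referrer and user agent out of entries like the one above, a regular expression over the combined log format is usually enough. This is a sketch, assuming the server writes the common "combined" layout. `LOG_PATTERN`, `parse_line` and `SAMPLE` are illustrative names, and `SAMPLE` is the entry from this post written on one line.

```python
import re

# host ident user [time] "request" status size "referrer" "user agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


SAMPLE = (
    'dhcp065-025-124-121.neo.rr.com - - [24/Aug/2003:14:33:55 -0500] '
    '"GET / HTTP/1.1" 200 2392 '
    '"http*//rr.looksmart.com/r_search?l&iacs&key=www.+domain.com&search=0" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; '
    'H010818; .NET CLR 1.0.3705)"'
)

fields = parse_line(SAMPLE)
```

All values come back as strings. Lines with escaped quotes inside the referrer or user agent would need a more careful pattern.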

dmorison

9:19 pm on Aug 24, 2003 (gmt 0)




"It could be someone that has my page on auto-synchronize, but then why would it have submitted the form?"

I think there's a broken proxy server (MS, Inktomi) in the wild that, under the right set of conditions, will begin repeating requests over and over at regular intervals for weeks, months, or longer.

I have a whole list of IPs that are firewalled because something originating from them went mad. I have no idea what it was, other than the theory above.
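
One way to spot that kind of runaway client before firewalling it is to check whether the gaps between its requests are nearly constant. The sketch below is an assumption-laden heuristic, not a known detection method for any specific proxy: `looks_periodic`, the minimum of 5 hits, and the 10% tolerance are all hypothetical choices.

```python
from statistics import mean, pstdev


def looks_periodic(timestamps, min_hits=5, tolerance=0.1):
    """Return True if request times are spaced at near-regular intervals.

    timestamps: request times for ONE client, as seconds since the epoch.
    tolerance: maximum allowed ratio of gap standard deviation to mean gap.
    """
    if len(timestamps) < min_hits:
        return False
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average_gap = mean(gaps)
    if average_gap <= 0:
        return False  # all requests at the same instant; not a schedule
    return pstdev(gaps) / average_gap <= tolerance
```

A human visitor who returns daily will show far more variation in timing than a stuck machine, so a low ratio here is a hint rather than proof.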