FreeFind spidering - Website Analytics - Tracking and Logging forum at WebmasterWorld - WebmasterWorld

Forum Moderators: DixonJones


FreeFind spidering

popey

12:29 pm on Dec 31, 2003 (gmt 0)

10+ Year Member

I noticed today that a host is spidering my entire site. It's not being nice at all, just fetching every single page with no delay between requests.
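One way to spot this kind of behaviour is to count hits per host and measure the shortest gap between consecutive requests. The following is only a minimal sketch: the function name `request_stats` and the sample data are made up for illustration, and it assumes the host and timestamp have already been pulled out of the access log.

```python
from collections import defaultdict
from datetime import datetime


def request_stats(entries):
    """Return {host: (hit_count, smallest_gap_seconds)} for (host, datetime) pairs.

    smallest_gap_seconds is None when a host made only one request.
    """
    by_host = defaultdict(list)
    for host, when in entries:
        by_host[host].append(when)

    stats = {}
    for host, times in by_host.items():
        times.sort()
        # Gap between each request and the one before it, in seconds.
        gaps = [(later - earlier).total_seconds()
                for earlier, later in zip(times, times[1:])]
        stats[host] = (len(times), min(gaps) if gaps else None)
    return stats


# Hypothetical sample data, not taken from any real log.
sample = [
    ("adsl-xx-yyy-xx-yyy.example.com", datetime(2003, 12, 31, 12, 0, 0)),
    ("adsl-xx-yyy-xx-yyy.example.com", datetime(2003, 12, 31, 12, 0, 1)),
    ("adsl-xx-yyy-xx-yyy.example.com", datetime(2003, 12, 31, 12, 0, 1)),
    ("other.example.net", datetime(2003, 12, 31, 12, 0, 0)),
]

for host, (hits, gap) in request_stats(sample).items():
    print(host, hits, gap)
```

A host with many hits and a smallest gap at or near zero seconds is a reasonable candidate for a closer look.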

I have put the offending IP address in my .htaccess so they can't actually get the content and waste my bandwidth. However, I'm obviously still getting hits from them (albeit resulting in 401s for them :D)

I noticed something odd though. The useragent is

"Mozilla/4.0 (compatible; MSIE 4.01; Windows 95; FreeFind/1.0 (spider@freefind.com))"

Which made me think "home user" again.

I note the IP on adsl-xx-yyy-xx-yyy.example.com looks like a home ADSL user. However the IP appears to be owned by

Free Findcom SBC063203065216030908 (NET-63-203-65-216-1)

Does this mean anything? Is it that FreeFind are spidering my site for someone else? Has someone downloaded a spidering tool from them and is using it at home?

Any ideas?
Cheers,
Al.
(first post, be gentle) :D

[edited by: heini at 11:31 am (utc) on Jan. 2, 2004]
[edit reason] Hi, let's not use real IPs please, thanks! [/edit]

Receptional

1:11 pm on Dec 31, 2003 (gmt 0)

I honestly wouldn't be surprised if the FreeFind spider is still running on a Win95 machine. To me, FreeFind is a great little product which has been working well for years, so they probably never bothered to upgrade it.

As I recall, there is no reason why a FreeFind user couldn't point the tool at any site. Thinking about it, it would give the user an instant directory of pages if your site has relevant content. On the plus side, the only thing that would do is send traffic to your site (although they would get an email every time the number of pages on your site changes). If you see any traffic coming from freefind.com/find.html?id=lotsofnumbers, go and have a look at it and, if necessary, tell FreeFind about it; they would presumably shut down the account for you.

Dixon.

pendanticist

2:46 pm on Dec 31, 2003 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Welcome to WebmasterWorld popey. :)

Just Googling FreeFind [google.com] should tell you just about everything you might want to know.

popey

3:45 pm on Dec 31, 2003 (gmt 0)

10+ Year Member

Thanks.

Am I the only one who thinks this stinks?

Think about it: go to www.freefind.com and put in www.somewebsite.com, and they then crawl that site, getting page after page every 3-10 seconds.

Surely that's against spider rules? I thought it was considered good practice for a spider not to request pages from a site more often than once every 30-60 seconds.
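For what it's worth, the kind of spacing described above could be enforced on the crawler side roughly like this. It is a sketch only: the class name `PoliteThrottle` is made up, the 30-second default is simply the lower figure quoted above, and nothing here reflects how FreeFind's spider actually works.

```python
import time


class PoliteThrottle:
    """Enforce a minimum delay between successive fetches from one site."""

    def __init__(self, min_delay=30.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay  # seconds between requests (assumed value)
        self._clock = clock         # injectable so the logic can be tested
        self._sleep = sleep
        self._last = None           # time of the previous fetch, if any

    def wait(self):
        """Sleep until min_delay has passed since the last call.

        Returns the number of seconds slept (0.0 if no pause was needed).
        """
        pause = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            pause = max(0.0, self.min_delay - elapsed)
            if pause:
                self._sleep(pause)
        self._last = self._clock()
        return pause
```

A crawler would call `wait()` immediately before each request to the same site, so the first page is fetched straight away and every later page waits out the remainder of the delay.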

Cheers,
Al.

pendanticist

10:33 am on Jan 2, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Against our rules perhaps.

My first rule? Most spiders have no rules and therefore must be controlled on our respective ends.

The premise with Bricks-n-Mortar is that the 'building/site' is inherently hardened by the foundation, walls, etc., against low-level intrusions. In other words, you can't just walk up to the building and handily open a man door or overhead door.

On the other hand, the premise on the Internet has always been that you can go to a home page and gain entry to much of its contents very, very easily.

In the Bricks-n-Mortar World, we call intruders thieves.

On the Internet (to a great degree), they're called spiders/bots/harvesters.

popey

11:17 am on Jan 2, 2004 (gmt 0)

10+ Year Member

Good point.

I have actually had a rather speedy set of replies from the guys who run FreeFind. They're very helpful and have said that their spider/bot does comply with robots.txt, so I've added

user-agent: FreeFind
disallow: /

to mine. They also were kind enough to kill the spider that was currently running on my site.
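A quick way to sanity-check a rule like that is to feed it to Python's standard `urllib.robotparser` module. This is just a sketch: it shows how Python's parser reads the two lines, which is not necessarily how FreeFind's own spider interprets them, and the helper `is_allowed`, the page URL, and the name "SomeOtherBot" are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The same two lines that were added to robots.txt.
rules = RobotFileParser()
rules.parse([
    "user-agent: FreeFind",
    "disallow: /",
])


def is_allowed(agent, url):
    """Return True if the parsed rules let `agent` fetch `url`."""
    return rules.can_fetch(agent, url)


# The agent is given as the short product token, not the full header string.
print(is_allowed("FreeFind", "http://www.somewebsite.com/index.html"))      # False
print(is_allowed("SomeOtherBot", "http://www.somewebsite.com/index.html"))  # True
```

The rule shuts FreeFind out of the whole site while leaving other agents unaffected, which is the intent here.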

I'd like to take this opportunity to thank the guys at FreeFind publicly. That's the kind of service (as a non-paying non-customer) I'd like out of every company on the net.

Cheers,
Al.