I have put the offending IP address in my .htaccess so they can't actually get the content and waste my bandwidth. However, I'm obviously still getting hits from them (albeit resulting in 401s for them :D)
I noticed something odd though. The useragent is
"Mozilla/4.0 (compatible; MSIE 4.01; Windows 95; FreeFind/1.0 (spider@freefind.com))"
Which made me think "home user" again.
I note the IP on adsl-xx-yyy-xx-yyy.example.com looks like a home ADSL user. However the IP appears to be owned by
Free Findcom SBC063203065216030908 (NET-63-203-65-216-1)
Does this mean anything? Is it that FreeFind are spidering my site for someone else? Has someone downloaded a spidering tool from them and is using it at home?
Any ideas?
Cheers,
Al.
(first post, be gentle) :D
[edited by: heini at 11:31 am (utc) on Jan. 2, 2004]
[edit reason] Hi, lets not use real IPs please, thanks! [/edit]
As I recall, there is no reason why a FreeFind user couldn't use the tool on any site. Thinking about it, it would give the user an instant directory of pages if your site has relevant content. On the plus side, the only thing that would do is send traffic to your site (although they would get an email every time the number of pages on your site changes). If you see any traffic coming from freefind.com/find.html?id=lotsofnumbers, go and have a look at that and, if necessary, tell FreeFind about it; they would presumably shut down the account for you.
Dixon.
Just Googling FreeFind [google.com] should tell you just about everything you might want to know.
Am I the only one who thinks this stinks?
Think about it: go to www.freefind.com and put in www.somewebsite.com, and they then crawl that site, getting page after page every 3-10 seconds.
Surely that's against spider rules? I thought it was considered good practice not to visit a site more often than once every 30-60 seconds.
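For anyone wanting to check the crawl rate in their own logs, here is a rough Python sketch that measures the gaps between hits from one user agent. It assumes the access log is in Apache's Combined Log Format; the function name `request_gaps` and the sample lines (including the 192.0.2.1 address) are made up for illustration.

```python
import re
from datetime import datetime

# Pulls out the [timestamp] and the last quoted field (the user agent).
# Assumes Combined Log Format; adjust if your log layout differs.
LINE_RE = re.compile(r'\[(?P<ts>[^\]]+)\].*"(?P<ua>[^"]*)"\s*$')


def request_gaps(log_lines, agent_token):
    """Return the seconds between consecutive hits whose user agent
    contains agent_token (case-insensitive)."""
    times = []
    for line in log_lines:
        match = LINE_RE.search(line)
        if match and agent_token.lower() in match.group("ua").lower():
            stamp = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
            times.append(stamp)
    times.sort()
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


# Hypothetical log lines, not taken from a real log.
UA = "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95; FreeFind/1.0 (spider@freefind.com))"
SAMPLE = [
    '192.0.2.1 - - [02/Jan/2004:10:00:00 +0000] "GET / HTTP/1.0" 401 210 "-" "%s"' % UA,
    '192.0.2.1 - - [02/Jan/2004:10:00:04 +0000] "GET /a.html HTTP/1.0" 401 210 "-" "%s"' % UA,
    '192.0.2.9 - - [02/Jan/2004:10:00:05 +0000] "GET / HTTP/1.0" 200 512 "-" "SomeBrowser/1.0"',
    '192.0.2.1 - - [02/Jan/2004:10:00:11 +0000] "GET /b.html HTTP/1.0" 401 210 "-" "%s"' % UA,
]

gaps = request_gaps(SAMPLE, "FreeFind")
print(gaps)
```

If most of the gaps come out well under 30 seconds, the spider is visiting faster than the rule of thumb mentioned above.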
Cheers,
Al.
My first rule? Most spiders have no rules and therefore must be controlled on our respective ends.
The premise with Bricks-n-Mortar is that the 'building/site' is inherently hardened by the foundation, walls, etc., against low-level intrusions. In other words, you just can't go up to the building and handily open a man door or an overhead door.
On the other hand, the premise on the Internet has always been that you can go to a home page and gain entry to much of its contents very, very easily.
In the Bricks-n-Mortar World, we call intruders thieves.
On the Internet (to a great degree), they're called spiders/bots/harvesters.
I have actually had a rather speedy set of replies from the guys who run freefind. They're very helpful and have suggested that their spider/bot does comply with robots.txt, and as such I've added
user-agent: FreeFind
disallow: /
to mine. They also were kind enough to kill the spider that was currently running on my site.
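If you want to sanity-check rules like the two above before relying on them, Python's standard `urllib.robotparser` can evaluate them locally. This is only a sketch: the helper name `allowed` and the second bot name are made up, and it shows how Python's parser reads the rules, not how FreeFind's own spider matches its name.

```python
from urllib.robotparser import RobotFileParser

# The same two rules as in the post above.
ROBOTS_TXT = """\
user-agent: FreeFind
disallow: /
"""


def allowed(robots_txt, agent, path):
    """Return True if the given robots.txt text lets `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# "SomeOtherBot" is a hypothetical crawler name used for comparison.
freefind_ok = allowed(ROBOTS_TXT, "FreeFind", "/index.html")
other_ok = allowed(ROBOTS_TXT, "SomeOtherBot", "/index.html")
print("FreeFind allowed:", freefind_ok)
print("SomeOtherBot allowed:", other_ok)
```

With these rules the parser should report FreeFind as blocked everywhere while leaving other crawlers unaffected, since no other user-agent entry exists.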
I'd like to take this opportunity to thank the guys at FreeFind publicly. That's the kind of service (as a non-paying non-customer) I'd like out of every company on the net.
Cheers,
Al.