Kalooga

         

keyplyr

10:13 pm on Apr 12, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Requested robots.txt, which disallows it, then ignored the file. Now banned.

195.210.57.124 - - [12/Apr/2008:06:33:27 -0400] "GET /robots.txt HTTP/1.0" 200 243 "-" "kalooga/kalooga-4.0-dev-datahouse (Kalooga; http://www.kalooga.com; info@kalooga.com)"
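For anyone who wants to pull entries like the one above out of their own logs, here is a minimal sketch. It assumes the line follows the Apache combined log format, which it appears to; the names LOG_RE and parse_line are made up for illustration and are not from any particular tool.

```python
import re

# Pattern for the Apache "combined" log format (assumed from the sample line).
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None

# The log entry quoted in the post above.
SAMPLE = ('195.210.57.124 - - [12/Apr/2008:06:33:27 -0400] '
          '"GET /robots.txt HTTP/1.0" 200 243 "-" '
          '"kalooga/kalooga-4.0-dev-datahouse '
          '(Kalooga; http://www.kalooga.com; info@kalooga.com)"')

entry = parse_line(SAMPLE)
is_kalooga = entry is not None and entry["ua"].lower().startswith("kalooga/")
```

Matching on the leading "kalooga/" token rather than the full string should also catch the Nutch-based variants of the user agent, though user agents are trivially spoofed, so treat this as a first filter only.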

incrediBILL

10:27 pm on Apr 12, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It's also been spotted since '06 as:

"kalooga/Nutch-0.8.1"
"kalooga/Nutch-0.9"

Using all sorts of IPs:

82.150.138.*
85.17.184.*
193.138.250.*
195.210.57.124 (current)
213.132.171.*
213.132.175.*

Hobbs

11:07 pm on Apr 12, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks Bill & keyplyr

195.210.56.0/23
82.150.138.0/24
85.17.184.0/24
193.138.248.0/22
213.132.171.0/24
213.132.175.0/29

all in the Netherlands
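If you would rather test addresses against these ranges in a script than in htaccess, something along these lines should work. It is only a sketch: the ranges are copied as posted above and not independently verified, and the names KALOOGA_RANGES and in_listed_ranges are invented for the example.

```python
import ipaddress

# CIDR ranges exactly as listed in the post above (unverified).
KALOOGA_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "195.210.56.0/23",
        "82.150.138.0/24",
        "85.17.184.0/24",
        "193.138.248.0/22",
        "213.132.171.0/24",
        "213.132.175.0/29",
    )
]

def in_listed_ranges(ip):
    """True if the IPv4 address falls inside any of the listed ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in KALOOGA_RANGES)

# The address from the original log entry sits inside 195.210.56.0/23.
current_ip_listed = in_listed_ranges("195.210.57.124")
```

Note that the last range is a /29, so it covers only 213.132.175.0 through 213.132.175.7 rather than the whole 213.132.175.* block mentioned earlier.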

I'm still investigating why this UA and IP got 403 on all its 85 requests although neither is blocked fully or partially in my htaccess :-)

lammert

2:31 am on Apr 13, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



all in the Netherlands

This bot is from a Dutch company building an image search engine, and they are specifically targeting manually managed image galleries.

piran

11:44 am on Apr 27, 2008 (gmt 0)

10+ Year Member



lammert----
Please elaborate on "specifically targeting manually managed image galleries". I was unable to derive any such helpful information directly from the (apparent) source. Before I can whitelist them for my (dynamic) robots.txt file I must investigate the provenance of their operation, directives and aims.
----best wishes, Robert

incrediBILL

12:40 am on Apr 28, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Funny that this thread popped back to the top, as I just noticed that Kalooga even has reverse DNS set up, which makes it easy to validate their crawler.

195.210.57.124 -> ge0-v1017.cr1.sig-gro.nl.kalooga.com.

They host on Zylon:

inetnum: 195.210.56.0 - 195.210.57.255
descr: Zylon Internet Services VOF
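A rough sketch of how that validation could be scripted follows. The thread only mentions the reverse lookup; the forward-confirm step (resolving the returned hostname back to the same IP) is an addition here, commonly used because PTR records alone are controlled by whoever owns the IP block. The function name verify_crawler and its parameters are hypothetical, and the resolvers are injectable so the logic can be tried without touching the network.

```python
import socket

def verify_crawler(ip, expected_suffix=".kalooga.com", reverse=None, forward=None):
    """Reverse-resolve ip, check the hostname suffix, then forward-confirm.

    reverse and forward are optional resolver callables (an assumption of
    this sketch); they default to the standard socket lookups.
    """
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        # Strip the trailing dot of a fully qualified name before comparing.
        host = reverse(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record or lookup failure
    if not host.endswith(expected_suffix):
        return False
    try:
        # Forward-confirm: the hostname must resolve back to the same IP.
        return ip in forward(host)
    except OSError:
        return False
```

Called as verify_crawler("195.210.57.124") it would perform live DNS lookups; whether that address still resolves as shown above is not something this sketch can promise.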

keyplyr

7:18 am on Apr 28, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Unlike other image searches that may cache images, then display a link to the source, Kalooga hot-links to our images and includes no hyperlink to our web site. Very self-serving and bad netiquette IMO.

piran

7:29 am on Apr 28, 2008 (gmt 0)

10+ Year Member



incrediBILL----
Thank you but I had already derived every scrap of the information
you kindly appended. I reiterate my request for further elaboration:
"specifically targeting manually managed image galleries".
----best wishes, Robert

piran

7:34 am on Apr 28, 2008 (gmt 0)

10+ Year Member



keyplyr----
Interesting background data - I will be on the lookout 'should' I
ever release the proper contents of my dynamic robots.txt file
to that organisation. Meanwhile they will be receiving just what
all other uncooperative bots initially get (disallow everything)
which, on undue persistence, is followed by a masquerading block.
----best wishes, Robert

[PostEdit: minor typos]
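The general idea of a dynamic robots.txt like the one described above could look roughly like this. It is purely illustrative and is not the poster's actual setup: the function name robots_txt_for, the trusted_tokens list and the sample rules are all assumptions, and the follow-up block for persistent bots is left out.

```python
# Served to any bot that has not been explicitly trusted.
DISALLOW_ALL = "User-agent: *\nDisallow: /\n"

def robots_txt_for(user_agent, trusted_tokens, full_rules):
    """Return the real rules for trusted bots and disallow-everything otherwise.

    trusted_tokens is a hypothetical whitelist of user-agent substrings.
    """
    ua = user_agent.lower()
    if any(token.lower() in ua for token in trusted_tokens):
        return full_rules
    return DISALLOW_ALL

# Example: only a made-up "goodbot" is trusted, so Kalooga gets the default.
FULL_RULES = "User-agent: *\nDisallow: /private/\n"
served = robots_txt_for(
    "kalooga/kalooga-4.0-dev-datahouse (Kalooga; http://www.kalooga.com; info@kalooga.com)",
    trusted_tokens=["goodbot"],
    full_rules=FULL_RULES,
)
```

Since user agents can be faked, a whitelist like this is probably best combined with an IP or reverse DNS check before the full rules are handed out.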

incrediBILL

8:39 am on Apr 28, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thank you but I had already derived every scrap of the information
you kindly appended.

It was for the benefit of others that didn't know ;)

lammert

9:01 am on Apr 28, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



piran,
Consider yourself lucky that I am replying to this topic: I have already packed my suitcase to leave for a faraway destination and posted a farewell in the community forum here. But since you are specifically asking me to explain what this bot is targeting, I am willing to reply.

The Dutch FAQ of Kalooga states (translated and paraphrased):

Kalooga collects galleries compiled by passionate people (...) Unlike other search engines (Google, Yahoo) we do not collect individual images but galleries because people compile them with care, which delivers higher quality.

piran

9:21 am on Apr 28, 2008 (gmt 0)

10+ Year Member



lammert----
Noted... I am obliged. Yes, I do fit their remit
but I must judge their methods directly myself.
I used to be an expatriate employed overseas.
Good luck with your travels and may your
greetings exceed your farewells.
----best wishes, Robert

Pigeon

4:41 pm on Jun 1, 2008 (gmt 0)

10+ Year Member



Thanks Hobbs for that comprehensive list of IP ranges. This bot is a frequent nuisance and I have no interest in being crawled by a search engine that is not free for unrestricted public use (www.kalooga.com redirects to a login page). IPs now blocked at firewall.