BigPanda

new search site crawler

         

Mokita

1:26 am on Jan 30, 2011 (gmt 0)

10+ Year Member



Just found this in my logs:

User Agent: BigPandaExplorer/0.5 (http://www. bigpanda. com.au; crawl-team@bigpanda.com.au)
<spaces inserted to break link>

Asked for (and appears to have respected) robots.txt

There is nothing much to see at the URL except a form, but Facebook yields this info:

BigPanda, the new shopping search site for Australia - coming soon!

dstiles

10:20 pm on Jan 30, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Any idea of the IP, Mokita?

Mokita

10:36 pm on Jan 30, 2011 (gmt 0)

10+ Year Member



It has visited four times in January: the first visit was from 124.150.41.120, the rest from 96.9.169.218.

dstiles

9:51 pm on Jan 31, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The 124 one is an Australian broadband line so I doubt that is valid unless the bot is distributed.

The 96 IP belongs to US server farm Hostnoc so again it's looking dubious.

I can see a bot being run from a broadband line if the service is small and entirely new, although to stay within the supplier's T&C I would have thought it would need to be a restricted bot scan.

The URL in the bot's UA is rejected here by Firefox as being an auto-redirect to another page. I'm not happy about clicking through on that. :)

The web site's IP is in the USA: 50.16.199.nn, which is Amazon (including Cloud). The IP range is 50.16.0.0 - 50.19.255.255.
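
For anyone wanting to check addresses against a start-end range like the one quoted above, here is a minimal Python sketch using the standard `ipaddress` module. The helper names (`BLOCKS`, `in_quoted_range`, `subnet_in_quoted_range`) are made up for illustration, and the range endpoints are simply the ones from this post, not an authoritative list of Amazon's address space.

```python
import ipaddress

# Range endpoints exactly as quoted in the post (not verified against any registry).
FIRST = ipaddress.IPv4Address("50.16.0.0")
LAST = ipaddress.IPv4Address("50.19.255.255")

# Collapse the start-end range into the smallest set of CIDR blocks.
BLOCKS = list(ipaddress.summarize_address_range(FIRST, LAST))


def in_quoted_range(ip: str) -> bool:
    """Return True if a single address falls inside the quoted range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in block for block in BLOCKS)


def subnet_in_quoted_range(cidr: str) -> bool:
    """Return True if a whole subnet sits inside the quoted range."""
    net = ipaddress.ip_network(cidr)
    return any(net.subnet_of(block) for block in BLOCKS)


if __name__ == "__main__":
    print(BLOCKS)
    # The two crawler addresses reported earlier in the thread.
    for ip in ("124.150.41.120", "96.9.169.218"):
        print(ip, in_quoted_range(ip))
    # The web site's address was only given as 50.16.199.nn, so test the whole /24.
    print("50.16.199.0/24", subnet_in_quoted_range("50.16.199.0/24"))
```

The quoted range collapses to a single block, 50.16.0.0/14, which is handy if your firewall or server config takes CIDR notation rather than start-end pairs.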

Mokita

11:38 pm on Jan 31, 2011 (gmt 0)

10+ Year Member



I think you are being overly cautious in this instance.

First I saw of the bot was when it came from the AU ISP, in early January. That was most likely a test run.

Thereafter it moved to the host farm - so they either knew they were violating TOS or were warned. It still hasn't launched, so it appears to be in the testing/gathering phase.

The website redirects to a benign "listserv" page hosted by MailChimp. It is just a form where you can register to receive news about their eventual launch:

Thanks for your interest in our little part of the web.

If you want to be kept informed when we launch, you can signup using form below.

We hate spam as much as you do, so this information will only be used when we launch.


It also has links to a Twitter account and Facebook (which is where I found the quote in my OP).

So far, it is fine by me - it respects robots.txt (I whitelist) and that is all that matters.
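
To illustrate what a whitelisting robots.txt can look like, here is a rough Python sketch that feeds a sample file to the standard `urllib.robotparser` and asks what a well-behaved crawler would be allowed to fetch. The robots.txt content, the `/index.html` path, and the `SomeOtherBot` name are assumptions for the example, not the actual file from any site in this thread, and it only shows what the rules permit, not whether a given bot obeys them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical whitelist-style robots.txt: named bots get in, everyone else is shut out.
ROBOTS_TXT = """\
User-agent: BigPandaExplorer
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(user_agent: str, path: str = "/index.html") -> bool:
    """Return True if the sample rules let this user agent fetch the path."""
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    print(allowed("BigPandaExplorer/0.5"))  # whitelisted by name
    print(allowed("SomeOtherBot/1.0"))      # falls through to the catch-all block
```

An empty `Disallow:` line means "nothing is disallowed" for that user agent, while `Disallow: /` under `*` blocks everything for bots not listed by name.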

dstiles

7:30 pm on Feb 1, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If/when it goes legit and has a proper IP I will consider listing it - I have "customers" in the antipodes. If it hits me from a dynamic IP it will be blocked. Amazon is blocked anyway.