A quick whois-dot-sc check shows the same Toronto, Ontario-based company is still behind both portaljuice.com and nextopia.com.
FWIW, I have no data on pjspider, but I last saw NextopiaBOT in Jan., Feb., and April of 2004 -- and even then, no robots.txt calls -- running out of similar "toronto-hse-pppXXXXXXX.sympatico.ca" addresses, with this UA:
"NextopiaBOT (+http://www.nextopia.com) distributed crawler client beta v0.8"
Slow ramp-up, eh? :)
No clue where the data goes, or if it's analyzed or sold -- or both, because the company offers a range of search-related products and services.
Regardless, since respecting robots.txt doesn't appear to be part of their repertoire, I block their bots.
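For anyone who wants to do the same, here is a minimal sketch of blocking by User-Agent substring. It is only an illustration, not how I (or anyone in this thread) necessarily do it: the function name `is_blocked` and the `BLOCKED_UA_SUBSTRINGS` list are made up for the example, and the two substrings are simply the bot names mentioned above. You would call it from whatever request handler or filter your own setup provides.

```python
# Hypothetical block list: substrings taken from the bot names in this thread.
BLOCKED_UA_SUBSTRINGS = ("nextopiabot", "pjspider")


def is_blocked(user_agent):
    """Return True if the User-Agent header contains a blocked substring.

    The comparison is case-insensitive; a missing header (None) is
    treated as an empty string and therefore not blocked.
    """
    ua = (user_agent or "").lower()
    return any(token in ua for token in BLOCKED_UA_SUBSTRINGS)


if __name__ == "__main__":
    ua = ("NextopiaBOT (+http://www.nextopia.com) "
          "distributed crawler client beta v0.8")
    # A real handler would answer with a 403 here instead of printing.
    print("blocked" if is_blocked(ua) else "allowed")
```

Keep in mind that a UA string is trivially changed, so a check like this only stops bots that identify themselves honestly.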
On the other hand, that page has enough info to make me want to keep it off my playground. :)
The fact that it went for content without even looking for a robots.txt file certainly speaks volumes, though.
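If you want to check your own logs for that behavior, something along these lines might help. It is a rough sketch under stated assumptions: you have already parsed your access log into (client, path) pairs in chronological order, and the function name `clients_skipping_robots` is just something I made up for the example.

```python
def clients_skipping_robots(requests):
    """Return the set of clients that asked for content before robots.txt.

    `requests` is an iterable of (client, path) pairs in chronological
    order (an assumed, already-parsed form of an access log). A client
    is flagged if it requests any other path before it has requested
    /robots.txt, including clients that never request it at all.
    """
    seen_robots = set()
    offenders = set()
    for client, path in requests:
        if path == "/robots.txt":
            seen_robots.add(client)
        elif client not in seen_robots:
            offenders.add(client)
    return offenders


if __name__ == "__main__":
    # Made-up sample data; "polite" and "rude" are placeholder client names.
    sample = [
        ("polite", "/robots.txt"),
        ("polite", "/index.html"),
        ("rude", "/index.html"),
    ]
    print(clients_skipping_robots(sample))
```

Ordinary browsers never fetch robots.txt either, so you would want to combine this with the UA or request pattern before deciding something is a misbehaving bot.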