Web Extraction, crawling and scraping service
| 3:31 pm on Jul 30, 2013 (gmt 0)|
Have you guys seen PromptCloud? <snip>
Has anyone had any experience tracking and blocking it yet?
[edited by: incrediBILL at 6:03 pm (utc) on Aug 1, 2013]
[edit reason] Removed URL. No self-promo URLs please [/edit]
| 6:08 pm on Aug 1, 2013 (gmt 0)|
Hi Annie, and thanks for letting us know about this bot.
I haven't seen it yet, and I have a few questions; it would be great if you could answer them.
1. What's the User Agent String?
2. What's the IP range it operates from?
3. Does it honor robots.txt?
4. Is there a page on your site describing your bot? Normally a bot has its own page, and a link to it is typically included in the bot's User Agent string for webmasters to follow. I looked all over your site and couldn't find any reference to one.
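Once the User Agent string and IP range are known, answering questions 1 to 3 from your own logs is straightforward. Below is a minimal sketch, assuming an Apache/Nginx "combined" format access log. The UA substring `promptcloud` and the network `192.0.2.0/24` (a reserved documentation range) are hypothetical placeholders, since nobody in this thread has posted the bot's real UA or IPs. The sketch flags matching hits and uses Python's standard `urllib.robotparser` to report whether each fetched path was allowed by your robots.txt.

```python
import ipaddress
import re
from urllib.robotparser import RobotFileParser

# HYPOTHETICAL placeholders: the bot's real UA string and IP range are unknown.
SUSPECT_UA_SUBSTRING = "promptcloud"                     # matched case-insensitively
SUSPECT_NETWORK = ipaddress.ip_network("192.0.2.0/24")   # TEST-NET-1 documentation range

# "combined" log format: ip ident user [time] "request" status size "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def is_suspect(ip: str, agent: str) -> bool:
    """Flag a hit if its IP is in the suspect range or its UA contains the suspect substring."""
    try:
        in_range = ipaddress.ip_address(ip) in SUSPECT_NETWORK
    except ValueError:
        in_range = False  # malformed IP field in the log line
    return in_range or SUSPECT_UA_SUBSTRING in agent.lower()


def audit(log_lines, robots_txt: str):
    """Return (ip, path, agent, allowed_by_robots) for every suspect hit in the log."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse local robots.txt text, no network fetch
    results = []
    for line in log_lines:
        m = LOG_RE.match(line)
        if not m or not is_suspect(m["ip"], m["agent"]):
            continue  # unparseable line or ordinary visitor
        allowed = parser.can_fetch(m["agent"], m["path"])
        results.append((m["ip"], m["path"], m["agent"], allowed))
    return results


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    logs = [
        '192.0.2.10 - - [01/Aug/2013:18:00:00 +0000] "GET /private/page.html HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; ExampleBot/1.0)"',
        '203.0.113.5 - - [01/Aug/2013:18:01:00 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (compatible; PromptCloud/1.0)"',
    ]
    for hit in audit(logs, robots):
        print(hit)
```

Any hit reported with `allowed_by_robots` set to `False` is a request for a disallowed path, which answers question 3. One caveat: `urllib.robotparser` matches `User-agent:` groups against only the text before the first `/` in the UA, so a browser-style string such as `Mozilla/5.0 (compatible; SomeBot/1.0)` is usually evaluated under the `*` group.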
Please let us know about these items at your convenience.
| 7:53 pm on Aug 1, 2013 (gmt 0)|
FWIW, their domain sub-hosts with Bluehost, which hasn't been a problem for the regulars here.
PromptCloud operates on a “Data as a Service” (DaaS) model and deals with large-scale data crawling and extraction, using cutting-edge technologies and cloud computing solutions (Nutch, Hadoop, Lucene, Cassandra, etc.). This data could come from reviews, blogs, product catalogs, social sites, travel data: basically anything and everything on the WWW, and it can be useful across all verticals, including market research, travel, comparison shopping, deal aggregation, reputation management and more.
| 8:19 pm on Aug 1, 2013 (gmt 0)|
My stats show that their website is hosted on a shared hosting platform together with 450 other websites, so the crawling would be done from other IPs.