
Forum Moderators: Ocean10000 & incrediBILL


PromptCloud bot

Web Extraction, crawling and scraping service

     

AnnieRogers

3:31 pm on Jul 30, 2013 (gmt 0)



Have you guys seen PromptCloud? <snip>


Anyone had any experience tracking and blocking it yet?

[edited by: incrediBILL at 6:03 pm (utc) on Aug 1, 2013]
[edit reason] Removed URL. No self-promo URLs please [/edit]

incrediBILL

6:08 pm on Aug 1, 2013 (gmt 0)




Hi Annie and thanks for letting us know about this bot.

I haven't seen it yet and have a few questions it would be great if you could answer.

1. What's the User Agent String?
2. What's the IP range it operates from?
3. Does it honor robots.txt?
4. Is there a page on your site describing your bot? Normally there is a bot page, and a link to it is typically included in the bot's User Agent string for webmasters to follow. I looked all over your site and couldn't find any reference.

Please let us know about these items at your convenience.

Thanks!
Bill

wilderness

7:53 pm on Aug 1, 2013 (gmt 0)




FWIW, their domain sub-hosts with Bluehost, which hasn't been an issue for the regulars here.

PromptCloud operates on a “Data as a Service” (DaaS) model and deals with large-scale data crawl and extraction, using cutting-edge technologies and cloud computing solutions (Nutch, Hadoop, Lucene, Cassandra, etc.). This data could come from reviews, blogs, product catalogs, social sites, travel data, basically anything and everything on the WWW, and can be useful across all verticals: market research, travel, comparison shopping, deal aggregation, reputation management and more.

bhukkel

8:19 pm on Aug 1, 2013 (gmt 0)



My stats show that their website is hosted on a shared hosting platform together with 450 other websites, so crawling would be done from other IPs.
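
For anyone who wants to answer the User Agent, IP range and robots.txt questions from their own logs, here is a minimal Python sketch. It assumes the access log is in Apache/Nginx combined format and that the bot's User Agent contains the token "promptcloud"; that token is a guess, since nobody in this thread has posted the real string. The function name summarize_bot and its parameters are made up for illustration. Note that a fetch of /robots.txt only shows the bot asked for the file, not that it obeys it.

```python
import re
from collections import defaultdict

# Combined Log Format:
# IP identity user [time] "METHOD path protocol" status bytes "referer" "user agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def summarize_bot(lines, ua_token="promptcloud"):
    """Group hits whose User Agent contains ua_token (case-insensitive) by IP.

    ua_token is an assumed substring; replace it with whatever actually
    shows up in your logs.
    """
    hits = defaultdict(lambda: {"count": 0, "agents": set(), "robots_txt": False})
    token = ua_token.lower()
    for line in lines:
        m = LOG_LINE.match(line)
        # Skip lines that do not parse or that belong to other agents.
        if not m or token not in m.group("ua").lower():
            continue
        entry = hits[m.group("ip")]
        entry["count"] += 1
        entry["agents"].add(m.group("ua"))
        # Ignore any query string when checking for a robots.txt request.
        if m.group("path").split("?")[0] == "/robots.txt":
            entry["robots_txt"] = True
    return dict(hits)
```

Typical use would be something like summarize_bot(open("access.log")), with the path adjusted to your own setup. The result maps each IP to its hit count, the distinct User Agent strings seen, and whether that IP ever requested /robots.txt, which is enough to start working out the IP range to block.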
 
