
Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
PromptCloud bot
Web Extraction, crawling and scraping service
AnnieRogers




msg:4597693
 3:31 pm on Jul 30, 2013 (gmt 0)

Have you guys seen PromptCloud? <snip>


Anyone had any experience tracking and blocking it yet?

[edited by: incrediBILL at 6:03 pm (utc) on Aug 1, 2013]
[edit reason] Removed URL. No self-promo URLs please [/edit]

 

incrediBILL




msg:4598486
 6:08 pm on Aug 1, 2013 (gmt 0)

Hi Annie and thanks for letting us know about this bot.

I haven't seen it yet and have a few questions; it would be great if you could answer them.

1. What's the User Agent String?
2. What's the IP range it operates from?
3. Does it honor robots.txt?
4. Is there a page on your site describing your bot? Normally there is a bot page, and a link to it is typically provided in the bot's User Agent string for webmasters to follow. I looked all over your site and couldn't find any reference.
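For question 3, one practical test is to compare the paths a bot actually requested (from your access logs) against what your robots.txt disallows for it. Below is a minimal sketch using Python's standard `urllib.robotparser`. The user-agent token `PromptCloudBot` is a hypothetical placeholder, since the real UA string is exactly what question 1 asks for, and the logged paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "PromptCloudBot" is an ASSUMED token, not a confirmed one.
ROBOTS_TXT = """\
User-agent: PromptCloudBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def violations(parser: RobotFileParser, user_agent: str, logged_paths: list[str]) -> list[str]:
    """Return the logged paths that robots.txt disallows for this user agent."""
    return [path for path in logged_paths if not parser.can_fetch(user_agent, path)]


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    log = ["/", "/products/widget", "/private/report"]  # made-up log entries
    print(violations(rp, "PromptCloudBot/1.0", log))  # ['/', '/products/widget', '/private/report']
    print(violations(rp, "SomeOtherBot/2.0", log))    # ['/private/report']
```

A non-empty result for a bot's own UA means it fetched pages it was told to stay away from, i.e. it is not honoring robots.txt.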

Please let us know about these items at your convenience.

Thanks!
Bill

wilderness




msg:4598502
 7:53 pm on Aug 1, 2013 (gmt 0)

FWIW, their domain is sub-hosted with Bluehost, which hasn't been an issue for the regulars here.

PromptCloud operates on a “Data as a Service” (DaaS) model and deals with large-scale data crawling and extraction, using cutting-edge technologies and cloud computing solutions (Nutch, Hadoop, Lucene, Cassandra, etc.). This data could come from reviews, blogs, product catalogs, social sites, travel data: basically anything and everything on the WWW, and can be useful across all verticals: market research, travel, comparison shopping, deal aggregation, reputation management and more.

bhukkel




msg:4598504
 8:19 pm on Aug 1, 2013 (gmt 0)

My stats show that their website is hosted on a shared hosting platform together with 450 other websites, so any crawling would be done from other IPs.
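Because the site's shared-hosting IP says nothing about where the crawler runs, any IP-based block has to use the address ranges the bot is actually seen crawling from in your own logs. A minimal sketch using Python's standard `ipaddress` module; the two ranges below are RFC 5737 documentation placeholders, not PromptCloud's real ranges, which nobody in this thread has identified yet.

```python
import ipaddress

# PLACEHOLDER ranges (RFC 5737 documentation blocks), NOT PromptCloud's real ranges.
# Replace them with the CIDR blocks observed in your own access logs.
BLOCKED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(ip: str, ranges=BLOCKED_RANGES) -> bool:
    """True if the client IP falls inside any blocked CIDR range.

    An IPv6 address never matches an IPv4 network (and vice versa),
    so mixed-version checks simply return False.
    """
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ranges)


if __name__ == "__main__":
    print(is_blocked("192.0.2.17"))   # True
    print(is_blocked("203.0.113.5"))  # False
```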

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved