
Search Engine Spider and User Agent Identification Forum

    
Scraper Test Drive
Open Discussion of Scraper Tools and Success Rates
incrediBILL




 
Msg#: 4605894 posted 8:58 pm on Aug 29, 2013 (gmt 0)

Someone from Distil Networks posted a series of scraper test drives, running various tools and techniques against their technology.

Scrape Bot Protection Test:
[extract-web-data.com...]

You can obviously use their testing methodologies to validate your protection against bots.

You may find holes in your methods and decide either to switch to a service like theirs or to get a better script for your own hosting.

Scraping through a CAPTCHA:
[extract-web-data.com...]

A couple of other related scraper testing posts on that site are also worth a read.

I found it interesting, to say the least ;)

DISCLOSURE: I'm not affiliated with, and have no personal interest in, the site, service or links posted; they're presented here strictly for educational purposes.

 

wilderness




 
Msg#: 4605894 posted 11:35 am on Aug 30, 2013 (gmt 0)

Bill,
Personally, I'd be leery of running an outside product against my sites and/or htaccess. It may not fly anyway and I'd likely have to make exceptions for access.

This surely counts as a 3rd party product, and why would I/we invite the possibility of this org using creative solutions to increase their profits?

Don

incrediBILL




 
Msg#: 4605894 posted 3:41 am on Aug 31, 2013 (gmt 0)

The guy who wrote the article is actually from a hosting company that claims to have content protection built into their basic service. They aren't the only ones combining content protection and hosting, as it's becoming somewhat of a trendy thing.

However, I was more interested in all the tools and methods he used to test their service, including the CAPTCHA blow-through services.

I use a few programs to attack my own sites every now and then, just to see how well they stand up, and found some of his methods interesting as well.

Going to see how his stuff measures up to mine; should be amusing.
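
For anyone wanting to try the kind of self-test described above, here is a minimal sketch, not any tool from the linked articles. It requests a page on your own site with a few different User-Agent strings and lists which ones still got a 200. The User-Agent samples and the function names (fetch_status, probe, leaks) are assumptions made up for illustration; real scrapers do far more than change a header, so treat a clean result as a starting point only.

```python
import urllib.error
import urllib.request

# Hypothetical sample of User-Agent strings; swap in whatever your own rules target.
USER_AGENTS = {
    "browser": "Mozilla/5.0 (Windows NT 6.1; rv:23.0) Gecko/20100101 Firefox/23.0",
    "python-default": "Python-urllib/3",
    "curl": "curl/7.30.0",
}


def fetch_status(url, user_agent, timeout=10):
    """Return the HTTP status code the server gives this User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 403, 429 and similar arrive as exceptions; the code is what we want.
        return err.code


def probe(url, user_agents=USER_AGENTS, fetch=fetch_status):
    """Map each label to the status code returned for its User-Agent."""
    return {label: fetch(url, ua) for label, ua in user_agents.items()}


def leaks(results, allowed=("browser",)):
    """Labels that got a 200 even though they are not on the allowed list."""
    return sorted(label for label, code in results.items()
                  if code == 200 and label not in allowed)
```

To use it, call probe() with the URL of a page on a site you control and pass the result to leaks(); anything it returns is a client your rules let through. The fetch parameter is there so the logic can be checked with a stand-in function before pointing it at a live server.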


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved