
Forum Moderators: Ocean10000 & incrediBILL & keyplyr


Scraper Test Drive

Open Discussion of Scraper Tools and Success Rates

     
8:58 pm on Aug 29, 2013 (gmt 0)

Administrator from US 

incredibill

joined:Jan 25, 2005
posts:14650
votes: 94


Someone from Distil Networks posted a series of scraper test drives, running various tools and techniques against their technology.

Scrape Bot Protection Test:
[extract-web-data.com...]

You can use their testing methodology to validate your own protection against bots.

You may find holes in your methods and decide either to switch to a service like theirs or to get a better script for your own hosting.

Scraping through a CAPTCHA:
[extract-web-data.com...]

A couple of other related scraper-testing posts on that site are also worth a read.

I found it interesting, to say the least ;)

DISCLOSURE: I'm not related to, and have no personal interest in, the site, service, or links posted; they're presented here strictly for educational purposes.
11:35 am on Aug 30, 2013 (gmt 0)

Senior Member

wilderness

joined:Nov 11, 2001
posts:5459
votes: 3


Bill,
Personally, I'd be leery of running an outside product against my sites and/or htaccess. It may not fly anyway, and I'd likely have to make exceptions for access.

This is surely a 3rd-party product, so why would I (or we) invite the possibility of this org using creative solutions to increase its profits?

Don
3:41 am on Aug 31, 2013 (gmt 0)

Administrator from US 

incredibill

joined:Jan 25, 2005
posts:14650
votes: 94


The guy who wrote the article is actually from a hosting company that claims to have content protection built into its basic service. They aren't the only ones combining content protection and hosting; it's becoming something of a trend.

However, I was more interested in the tools and methods he used to test their service, including the CAPTCHA blow-through services.

I use a few programs to attack my own sites every now and then, just to see how well they stand up, and I found some of his methods interesting as well.
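For anyone wanting to try a self-test like that, here is a minimal sketch in Python. It is not one of the tools from the article. It only checks how a site you own responds to a handful of User-Agent strings. The agent strings, the set of "blocked" status codes, and the function names (`fetch_status`, `probe`) are all assumptions made for illustration, so adjust them to whatever your own protection actually returns.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

# Hypothetical User-Agent strings; replace with whatever your test tools send.
TEST_AGENTS: Dict[str, str] = {
    "browser": "Mozilla/5.0 (Windows NT 6.1; rv:23.0) Gecko/20100101 Firefox/23.0",
    "curl": "curl/7.31.0",
    "python": "Python-urllib/2.7",
    "empty": "",
}

# Assumption: the protection answers bots with one of these status codes.
BLOCKED_STATUSES = {403, 429, 503}


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Request url with the given User-Agent and return the HTTP status code."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; the code is what we want to see.
        return err.code


def probe(
    url: str,
    agents: Dict[str, str] = TEST_AGENTS,
    fetch: Callable[[str, str], int] = fetch_status,
) -> Dict[str, str]:
    """Return 'blocked' or 'allowed' for each labelled User-Agent."""
    results = {}
    for label, user_agent in agents.items():
        status = fetch(url, user_agent)
        results[label] = "blocked" if status in BLOCKED_STATUSES else "allowed"
    return results
```

Calling `probe("http://your-own-site.example/")` (a placeholder URL) returns a dict such as `{"browser": "allowed", "curl": "blocked", ...}`, depending on your rules. A status code is a crude signal: protection that serves a CAPTCHA page or a decoy with a 200 status would show up as "allowed" here, so checking the response body would be the next step.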

Going to see how his stuff measures up to mine; should be amusing.