Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

Scraper Test Drive
Open Discussion of Scraper Tools and Success Rates

 8:58 pm on Aug 29, 2013 (gmt 0)

Someone from Distil Networks posted a series of scraper test drives using various tools and techniques against their technology.

Scrape Bot Protection Test:

You can obviously use their testing methodologies to validate your protection against bots.

You may find holes in your own methods and decide either to switch to a service like theirs or to get a better script for your own hosting.

Scraping through a CAPTCHA:

A couple of other related scraper testing posts are also worth a read.

I found it interesting to say the least ;)

DISCLOSURE: I'm not affiliated with, and have no personal interest in, the site, service or links posted; they're presented here strictly for educational purposes.



 11:35 am on Aug 30, 2013 (gmt 0)

Personally, I'd be leery of running an outside product against my sites and/or htaccess. It may not fly anyway and I'd likely have to make exceptions for access.

This surely counts as a 3rd party product, so why would I (or we) invite the possibility of this org using creative solutions to increase their profits?



 3:41 am on Aug 31, 2013 (gmt 0)

The guy who wrote the article is actually from a hosting company that claims to have content protection built into their basic service. They aren't the only ones combining content protection and hosting, as it's becoming somewhat of a trendy thing.

However, I was more interested in all the tools and methods he used to test their service, including the CAPTCHA blow-through services.

I use a few programs to attack my own sites every now and then just to see how well they stand up and found some of his methods interesting as well.

Going to see how his stuff measures up to mine. Should be amusing.
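
For anyone who wants to poke at their own site the same way, here's a rough sketch of the simplest kind of self-test: request one page under several User-Agent strings and see which ones still get a 200. This is not the method from the linked articles, just an illustration. The User-Agent list, the function names (fetch_status, probe, leaks) and the "allowed" labels are all made up for the example, and it only checks the User-Agent, nothing like request rate or cookies. Point it at your own sites only.

```python
import urllib.error
import urllib.request

# Hypothetical User-Agent strings for illustration; swap in whatever shows up in your logs.
USER_AGENTS = {
    "browser": "Mozilla/5.0 (Windows NT 6.1; rv:23.0) Gecko/20100101 Firefox/23.0",
    "python-default": "Python-urllib/3",
    "curl": "curl/7.30.0",
    "empty": "",
}


def fetch_status(url, user_agent, timeout=10):
    """Return the HTTP status code the server sends back for this User-Agent."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # A 403 or 429 is a useful answer here, not a failure.
        return err.code


def probe(url, agents=USER_AGENTS, fetch=fetch_status):
    """Request url once per User-Agent and map each label to its status code."""
    return {label: fetch(url, ua) for label, ua in agents.items()}


def leaks(results, allowed=("browser",)):
    """Return the labels that got a 200 but are not on the allowed list."""
    return sorted(
        label for label, code in results.items()
        if code == 200 and label not in allowed
    )
```

Something like leaks(probe("http://your-own-site.example/")) then lists the agents that got through when they shouldn't have. Connection errors (urllib.error.URLError) are deliberately not caught, so a dead host shows up as an exception instead of a misleading result.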


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved