Custom Spider/Scraper - Help!

     

Ivan

4:34 pm on Dec 5, 2008 (gmt 0)

5+ Year Member



Hey!

I'm new here, so a short introduction: I'm from Toronto, Canada, and I run a small finance-focused website.

The problem: many financial institutions publish their data online and update it daily. There are over 60 institutions, and following each one is very challenging. I want to create a summary page with financial data from those institutions: release a spider once a day, get their updates, and then post them all together on the website.

Obviously, copy and paste is off the table, since it takes at least 1.5 hours to go through all the lenders and get their data. The only possible solution, it seems, is to set up a custom spider that crawls specific fields (div tags, table cells), extracts the data, and compiles it into one file. The question is: do you know of any software that is capable of doing this? I know there are plenty of scrapers out there, but the requirement is that the spider can extract data from specified table cells and, in some cases, div tags.
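
To make the requirement concrete, here is a minimal sketch of the extraction step only, using Python's standard html.parser module. It is an illustration under assumptions, not a recommendation of any particular product: the names FieldExtractor and extract_fields, the "rate-note" class, and the sample page with its values are all hypothetical, and the sketch assumes you already have the page HTML and permission to reuse it.

```python
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Collect text from every <td> and from <div> tags carrying a chosen class."""

    def __init__(self, div_class):
        super().__init__()
        self.div_class = div_class  # hypothetical class name marking the target div
        self.cells = []             # text of each table cell, in document order
        self.divs = []              # text of each matching div
        self._target = None         # list being filled, or None when outside a field
        self._tag = None            # tag name that opened the current field
        self._depth = 0             # nesting depth of that tag inside the field
        self._buffer = []

    def _open(self, tag, target):
        self._tag, self._target, self._depth, self._buffer = tag, target, 1, []

    def handle_starttag(self, tag, attrs):
        if self._target is not None:
            if tag == self._tag:
                self._depth += 1  # same tag nested inside the field
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == "td":
            self._open(tag, self.cells)
        elif tag == "div" and self.div_class in classes:
            self._open(tag, self.divs)

    def handle_data(self, data):
        if self._target is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._target is None or tag != self._tag:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse runs of whitespace so each field is one clean string.
            self._target.append(" ".join("".join(self._buffer).split()))
            self._target = None


def extract_fields(html, div_class):
    """Return the table-cell and div text found in one page of HTML."""
    parser = FieldExtractor(div_class)
    parser.feed(html)
    parser.close()
    return {"cells": parser.cells, "divs": parser.divs}


# Made-up sample page; the layout and values are placeholders, not real data.
SAMPLE_PAGE = """
<table>
  <tr><td>5-year fixed</td><td>5.79%</td></tr>
</table>
<div class="rate-note">Updated <b>daily</b></div>
"""

result = extract_fields(SAMPLE_PAGE, "rate-note")
```

Here result["cells"] holds the two table-cell strings and result["divs"] holds the text of the matching div. A daily run would repeat this for each institution's page and write the results to one file. Since each site lays out its pages differently, each one would likely need its own rule for which cells or divs to keep.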

I can't go to a data extraction company, since they charge too much (do they?). Please let me know if you're aware of any applications that can match those requirements.

Any help is appreciated, guys. Thanks!

LifeinAsia

5:00 pm on Dec 5, 2008 (gmt 0)

WebmasterWorld Administrator, 5+ Year Member



I think the bigger problem is the legality of what you want to do. Do you have permission to republish their information?

If so, why don't you ask them for RSS feeds or some other way of having them deliver the data to you in a more easily usable format?

ZydoSEO

4:36 pm on Dec 6, 2008 (gmt 0)

WebmasterWorld Senior Member, 5+ Year Member



Sounds like this post should be in the Content, Writing, and Copyrighting forum.

And I agree with LifeinAsia. If you don't have permission to scrape these sites and take their content, then you have much bigger issues with the law.

 
