
Databases Forum

Custom Spider/Scraper - Help!

5+ Year Member

Msg#: 3800983 posted 4:34 pm on Dec 5, 2008 (gmt 0)


I'm new here, so a small introduction: I am from Toronto, Canada, and run a small financially focused website.

The problem: many financial institutions publish their data online and update it daily. There are over 60 institutions, and following each one is very challenging. I want to create a summary page with financial data from those institutions: release a spider once a day, get their updates, and then post them all together on the website.

Obviously, copy and paste is off the table, since it takes at least 1.5 hours to go through all the lenders and get their data. The only possible solution, it seems, is to set up a custom spider that will crawl specific fields (div tags, table cells), extract the data, and compile it into one file. The question is: do you know any software that is capable of doing this? I know there are plenty of scrapers out there, but the requirement is that the spider be able to extract data from specified table cells and, in some cases, div tags.
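To make the requirement concrete, here is a minimal sketch of that kind of extraction in Python, using only the standard library. It assumes the cells and divs of interest carry `id` attributes; the ids (`rate-5yr`, `updated`), the sample page, and the helper names `FieldExtractor`, `extract_fields`, `fetch`, and `to_csv` are all made up for illustration, and real lender pages will need their own selectors and more tolerant handling of messy markup.

```python
import csv
import io
import urllib.request
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class FieldExtractor(HTMLParser):
    """Collect the text inside <td> or <div> elements whose id is wanted."""

    def __init__(self, wanted_ids):
        super().__init__()
        self.wanted_ids = set(wanted_ids)
        self.fields = {}       # id -> extracted text
        self._current = None   # id of the element being captured, if any
        self._depth = 0        # nesting depth inside the captured element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._current is not None:
            self._depth += 1   # nested tag inside the captured element
            return
        element_id = dict(attrs).get("id")
        if tag in ("td", "div") and element_id in self.wanted_ids:
            self._current = element_id
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if self._current is None or tag in VOID_TAGS:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self._chunks).split())
            self.fields[self._current] = text
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._chunks.append(data)


def extract_fields(html, wanted_ids):
    """Return {id: text} for the wanted ids found in the HTML string."""
    parser = FieldExtractor(wanted_ids)
    parser.feed(html)
    parser.close()
    return parser.fields


def fetch(url):
    """Download one page; assumes UTF-8 (a real page may declare otherwise)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def to_csv(rows):
    """Compile (institution, field, value) rows into one CSV document."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["institution", "field", "value"])
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical page fragment standing in for one institution's rate page.
SAMPLE_PAGE = """
<table><tr><td>5-year fixed</td><td id="rate-5yr"> 5.49% </td></tr></table>
<div id="updated">Updated <b>Dec 5</b>, 2008</div>
"""

fields = extract_fields(SAMPLE_PAGE, ["rate-5yr", "updated"])
rows = [("Example Bank", name, value) for name, value in fields.items()]
print(to_csv(rows))
```

In a daily run, `fetch` would be called once per institution (for example from a scheduled job) and the rows from all of them passed to `to_csv` together. Pages that mark their cells by position or class rather than by `id` would need a different matching rule in `handle_starttag`.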

I can't go to a data extraction company, since they charge too much (do they?). Please let me know if you're aware of any applications that can match those requirements.

Any help is appreciated, guys! Thanks!



lifeinasia (WebmasterWorld Administrator, Top Contributor of All Time, 5+ Year Member)

Msg#: 3800983 posted 5:00 pm on Dec 5, 2008 (gmt 0)

I think the bigger problem is the legality of what you want to do. Do you have permission to republish their information?

If so, why don't you ask them for RSS feeds or some other way of having them deliver the data to you in a more easily usable format?
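If an institution does agree to provide a feed, reading it is far less fragile than scraping its pages. As a rough sketch only, assuming a plain RSS 2.0 feed with `title`, `description`, and `pubDate` in each item (the feed content and the `read_items` helper below are invented for illustration):

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 document standing in for one institution's feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Bank rates</title>
    <item>
      <title>5-year fixed</title>
      <description>5.49%</description>
      <pubDate>Fri, 05 Dec 2008 16:00:00 GMT</pubDate>
    </item>
    <item>
      <title>1-year fixed</title>
      <description>4.75%</description>
      <pubDate>Fri, 05 Dec 2008 16:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>
"""


def read_items(feed_xml):
    """Return one dict per <item> in an RSS 2.0 feed string."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext returns the default when the child element is absent.
            "title": item.findtext("title", default="").strip(),
            "value": item.findtext("description", default="").strip(),
            "published": item.findtext("pubDate", default="").strip(),
        })
    return items


for entry in read_items(SAMPLE_FEED):
    print(entry["title"], entry["value"], entry["published"])
```

Each institution's feed may use different element names or an Atom format instead, so the field mapping would have to be agreed with whoever supplies the data.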


WebmasterWorld Senior Member 5+ Year Member

Msg#: 3800983 posted 4:36 pm on Dec 6, 2008 (gmt 0)

Sounds like this post should be in the Content, Writing, and Copyrighting forum.

And I agree with LifeInAsia. If you don't have permission to scrape these sites and take their content, then you have much bigger issues with the law.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved