Screen scraper, HTML parser, or custom spider tools

What are the free or pay options?

         

sun818

1:30 am on Oct 4, 2003 (gmt 0)



I'm looking to expand the range of widgets I sell online. I found a widget supplier who gave me permission to copy their product descriptions. They already provide the other information, like title, price, and inventory, but for whatever reason not the description.

Anyway, I looked at a tool called Web Visual Task ($149), a programmable spider that crawls a page and writes the data you need into a text file or database. Are there other tools that can do something similar, possibly for free? I looked at hotscripts and other script repositories, but the scripts there either don't do what I want or aren't flexible enough. I'm not a programmer, so I can't roll my own from scratch, but if a program has a "framework" to program from, I can learn that. I just hate the thought of having to copy and paste new product descriptions every day for the rest of my life. ;)

Storyteller

12:05 am on Oct 5, 2003 (gmt 0)



If a $150 piece of software does what you need, I'd suggest you buy it, since a custom-developed crawler will cost 1.5-2 times as much (I've written many; if you need one, stickymail me).

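If the program can't do it and you're willing to learn some Perl and do it yourself, there are modules that make the task considerably easier. Look for LWP (fetches pages over HTTP), WWW::Mechanize (automates browsing: following links, submitting forms) and HTML::TreeBuilder (parses HTML into a tree you can search).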
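
To give a feel for what the parsing step involves, here is a minimal sketch. It is in Python with only the standard library, not the Perl modules named above, so treat it as an illustration of the idea rather than how those modules work. The id "description" is a made-up assumption; the supplier's real pages will mark up the description some other way, and you would have to look at their HTML source to find out how.

```python
from html.parser import HTMLParser


class DescriptionExtractor(HTMLParser):
    """Collect the text inside the first element whose id matches target_id."""

    # Void elements never get a closing tag, so they must not change the depth.
    VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self, target_id="description"):  # "description" is hypothetical
        super().__init__()
        self.target_id = target_id
        self.depth = 0       # greater than 0 while inside the target element
        self.done = False    # set once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS or self.done:
            return
        if self.depth > 0:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in self.VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_description(html, target_id="description"):
    """Return the target element's text with runs of whitespace collapsed."""
    parser = DescriptionExtractor(target_id)
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())
```

You would call extract_description(page_html) on each product page after downloading it. Real-world HTML is often messier than this handles (unclosed tags, for example), which is a fair argument for the ready-made tools.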

sun818

3:35 am on Oct 6, 2003 (gmt 0)



After a full day of searching, I found DB Maker, a good alternative to the one I mentioned above. It doesn't have any command line options, but I can live with that. It's only $49! I can use an open source web crawler to download the files, then use DB Maker to parse the data and write it out to a CSV file. This is going to save me from so much grunt work! :)
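
For anyone who wants to try the same download-then-parse workflow without buying a tool, the second half can be sketched roughly as below. This is Python with only the standard library and has nothing to do with how DB Maker works internally. The names pages_to_csv and first_paragraph are made up for illustration, and first_paragraph is only a placeholder extractor that assumes the description is the first paragraph on the page.

```python
import csv
import re
from pathlib import Path


def pages_to_csv(html_dir, csv_path, extract):
    """Run extract() over every saved .html file and write one CSV row per file."""
    rows = []
    for page in sorted(Path(html_dir).glob("*.html")):
        text = page.read_text(encoding="utf-8", errors="replace")
        rows.append({"file": page.name, "description": extract(text)})
    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["file", "description"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


def first_paragraph(html):
    """Placeholder extractor (an assumption): text of the first <p> element."""
    match = re.search(r"<p\b[^>]*>(.*?)</p>", html, re.IGNORECASE | re.DOTALL)
    if not match:
        return ""
    # Drop any nested tags, then collapse runs of whitespace.
    return " ".join(re.sub(r"<[^>]+>", " ", match.group(1)).split())
```

Calling pages_to_csv("downloaded_pages", "descriptions.csv", first_paragraph) would produce a two-column file (file name, description) that a spreadsheet or shopping cart import can read. Both paths in that call are examples only.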