

Want to "data mine" my own site

Need HTML parsed into database



8:43 pm on May 10, 2003 (gmt 0)

10+ Year Member

I have a directory on my site that I maintain by hand as static HTML pages.

It started small about three years ago but now it's too big for one person to keep up with. I get submissions every day now and need to get this converted to something more manageable.

I downloaded one of those "data miner" programs that are supposed to be so obnoxious, but I figure that if I'm only using it on my own site, that's not so rude.

But, before I install and try to figure out how to use this thing, is this even what I need to do?


3:11 pm on May 11, 2003 (gmt 0)

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

Sounds like you need a custom app, because I don't think any off-the-shelf program is going to do what you want.


3:40 pm on May 11, 2003 (gmt 0)

10+ Year Member

I don't think my post was very clear.

The directory now has 52 pages. The index page has static links to the 51 "category" pages. The "categories" are the 50 states plus the District of Columbia.

I need to pull the listings into a database or spreadsheet of some type. There are only three parts to each listing: the URL, the site name, and the city. Here is a snippet of the code, with the real URLs removed:

<li><a href="http://www.somesite.org/">Some Site</a> - Some City</li>
<li><a href="http://www.anothersite/">Another Site</a> - Another City</li>
<li><a href="http://www.blahblah/">Blah Blah Site</a> - Blah Town</li>
<li><a href="http://yougettheidea.com">You Get the idea</a> - City</li>

I would also need it to pull the state from the page title or somewhere since it's not a part of each individual listing.
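For what it's worth, listings in that shape can be pulled apart with a short script. Below is a minimal sketch in Python using only the standard library. It assumes every listing follows the `<li><a href="...">Name</a> - City</li>` pattern shown above and that each page's `<title>` holds nothing but the state name; the real titles may need extra trimming. The names `ListingParser`, `parse_page`, and `write_csv` are made up for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collects the page <title> and one record per <li><a href>Name</a> - City</li>."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.listings = []
        self._in_title = False
        self._in_li = False
        self._in_a = False
        self._url = None
        self._name = ""
        self._tail = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "li":
            # Start a fresh record for each list item.
            self._in_li = True
            self._url, self._name, self._tail = None, "", ""
        elif tag == "a" and self._in_li and self._url is None:
            # Only the first link in a list item is treated as the listing.
            self._in_a = True
            self._url = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a":
            self._in_a = False
        elif tag == "li" and self._in_li:
            self._in_li = False
            if self._url:
                # The text after the link looks like " - Some City".
                city = self._tail.strip().lstrip("-").strip()
                self.listings.append(
                    {"name": self._name.strip(), "url": self._url, "city": city}
                )

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_a:
            self._name += data
        elif self._in_li:
            self._tail += data


def parse_page(html):
    """Return a list of dicts with state, name, url, and city for one category page."""
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    state = parser.title.strip()  # assumption: the title is just the state name
    return [dict(row, state=state) for row in parser.listings]


def write_csv(rows, out):
    """Write the rows to a file-like object as CSV that a spreadsheet can open."""
    writer = csv.DictWriter(out, fieldnames=["state", "name", "url", "city"])
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical page built from the snippet above; "Ohio" is a placeholder title.
SAMPLE_PAGE = """<html><head><title>Ohio</title></head><body><ul>
<li><a href="http://www.somesite.org/">Some Site</a> - Some City</li>
<li><a href="http://yougettheidea.com">You Get the idea</a> - City</li>
</ul></body></html>"""

if __name__ == "__main__":
    buffer = io.StringIO()
    write_csv(parse_page(SAMPLE_PAGE), buffer)
    print(buffer.getvalue())
```

Running `parse_page` over each of the 51 category files and appending the results to one CSV would give a single sheet with a state column, which can then be opened in a spreadsheet or imported into a database.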

Of course, I've no idea what to do with the database once I have it. I figure one thing at a time. Maybe editing static pages is the best way for me to go since I don't know what I'm doing.


4:57 am on May 14, 2003 (gmt 0)

10+ Year Member

I've not tried this myself, as I don't use the program much, but I noticed that when you go to File -> Open in Excel, the drop-down box for choosing the file type includes "Web Pages and Web Archives". I guess this means it can import HTML files into a spreadsheet of some sort. Might that do what you want?


8:06 am on May 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

If you want something to manage all these pages once you get them mined, send me a PM. I know a great little free program that would probably make keeping up with it all really easy! It's just the initial setup that is going to be a pain!



8:59 pm on May 14, 2003 (gmt 0)

10+ Year Member

Believe me, for what you are asking, a custom job is the best route, with PHP and MySQL handling the management. Go to the PHP forum; they will really help you out there.
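To give a rough idea of what the database side of such a custom job involves, here is a minimal sketch. Note that it uses Python with the built-in `sqlite3` module rather than the PHP and MySQL suggested above, purely to keep the example self-contained. The table layout, the column names, and the function names (`open_directory`, `add_listings`, `listings_for_state`) are assumptions for illustration, not anything from the original site.

```python
import sqlite3


def open_directory(path=":memory:"):
    """Open (or create) the database and make sure the listings table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS listings (
               id    INTEGER PRIMARY KEY,
               state TEXT NOT NULL,
               name  TEXT NOT NULL,
               url   TEXT NOT NULL UNIQUE,
               city  TEXT NOT NULL
           )"""
    )
    return conn


def add_listings(conn, rows):
    """Insert listing dicts; a URL that is already stored is skipped."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO listings (state, name, url, city) "
            "VALUES (:state, :name, :url, :city)",
            rows,
        )


def listings_for_state(conn, state):
    """Return (name, url, city) tuples for one state, sorted by name."""
    cur = conn.execute(
        "SELECT name, url, city FROM listings WHERE state = ? ORDER BY name",
        (state,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = open_directory()
    add_listings(conn, [
        # Placeholder rows; "Ohio" is a made-up state assignment.
        {"state": "Ohio", "name": "Some Site",
         "url": "http://www.somesite.org/", "city": "Some City"},
        {"state": "Ohio", "name": "Blah Blah Site",
         "url": "http://www.blahblah/", "city": "Blah Town"},
    ])
    for name, url, city in listings_for_state(conn, "Ohio"):
        print(f'<li><a href="{url}">{name}</a> - {city}</li>')
```

With the listings in one table, each state page becomes a single query instead of a hand-edited file, and a new submission is one inserted row.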
