
Forum Moderators: phranque


Want to "data mine" my own site

Need HTML parsed into database

8:43 pm on May 10, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 4, 2003
votes: 0

I have a directory on my site that is edited by hand on static HTML pages.

It started small about three years ago but now it's too big for one person to keep up with. I get submissions every day now and need to get this converted to something more manageable.

I downloaded one of those "data miner" programs that are supposed to be so obnoxious, but I figure that if I only use it on my own site, that's not so rude.

But, before I install and try to figure out how to use this thing, is this even what I need to do?

3:11 pm on May 11, 2003 (gmt 0)

Administrator from US 

brett_tabke (WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member)

joined:Sept 21, 1999
votes: 12

Sounds like you need a custom app, because I don't think any off-the-shelf program is going to do what you want.
3:40 pm on May 11, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 4, 2003
votes: 0

I don't think my post was very clear.

The directory now has 52 pages. The index page has static links to the 51 "category" pages. The "categories" are the 50 states plus the District of Columbia.

I need to pull the listings into a database or spreadsheet of some type. There are only three parts to each listing. Here is a snippet of the code, with the real URLs removed:

<li><a href="http://www.somesite.org/">Some Site</a> - Some City</li>
<li><a href="http://www.anothersite/">Another Site</a> - Another City</li>
<li><a href="http://www.blahblah/">Blah Blah Site</a> - Blah Town</li>
<li><a href="http://yougettheidea.com">You Get the idea</a> - City</li>

I would also need it to pull the state from the page title or somewhere since it's not a part of each individual listing.
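For listings in exactly the format shown above, a short script may be enough. The following is only a rough sketch using Python's standard library, not a tested tool for this site. It assumes each listing is an `<li>` containing one link followed by " - City", and that each page's `<title>` holds nothing but the state name. Both are guesses about the real pages, and the names `ListingParser`, `parse_page` and `rows_to_csv` are made up for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collects listings shaped like <li><a href="URL">Name</a> - City</li>."""

    def __init__(self):
        super().__init__()
        self.title = ""        # text of <title>, assumed to be the state name
        self.listings = []     # finished listings as dicts
        self._in_title = False
        self._in_link = False
        self._current = None   # listing being built while inside an <li>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "li":
            self._current = {"url": "", "name": "", "city": ""}
        elif tag == "a" and self._current is not None:
            self._current["url"] = dict(attrs).get("href") or ""
            self._in_link = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a":
            self._in_link = False
        elif tag == "li" and self._current is not None:
            item, self._current = self._current, None
            if item["url"]:  # skip list items that contain no link
                item["name"] = item["name"].strip()
                # Text after the link looks like " - Some City"
                item["city"] = item["city"].strip().lstrip("-").strip()
                self.listings.append(item)

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._current is not None:
            key = "name" if self._in_link else "city"
            self._current[key] += data


def parse_page(html_text):
    """Return (state, name, url, city) tuples for one category page."""
    parser = ListingParser()
    parser.feed(html_text)
    parser.close()
    state = parser.title.strip()
    return [(state, i["name"], i["url"], i["city"]) for i in parser.listings]


def rows_to_csv(rows):
    """Render the tuples as CSV text that a spreadsheet can open."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["state", "name", "url", "city"])
    writer.writerows(rows)
    return out.getvalue()
```

Running `parse_page` over each of the 51 category pages and joining the results would give one table for the whole directory. If the titles contain more than the state name, the `state` value would need trimming first.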

Of course, I've no idea what to do with the database once I have it. I figure one thing at a time. Maybe editing static pages is the best way for me to go since I don't know what I'm doing.

4:57 am on May 14, 2003 (gmt 0)

New User

10+ Year Member

joined:Feb 9, 2003
votes: 0

I've not tried this myself, as I don't use the program much, but I noticed that when you go to File -> Open in Excel, the drop-down box for choosing the file type includes "Web Pages and Web Archives". I guess this means it can import HTML files into a spreadsheet of some sort. Might that do what you want?
8:06 am on May 14, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 1, 2002
votes: 0

If you want something to manage all these pages once you get them mined, send me a PM. I know a great little free program that would probably make keeping up with it all really easy! It's just the initial setup that is going to be a pain!


8:59 pm on May 14, 2003 (gmt 0)

Full Member

10+ Year Member

joined:Apr 27, 2003
votes: 0

Believe me, for what you are asking, a custom job using PHP and MySQL would be the best route. Go to the PHP forum; they will really help you out there.
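To give a feel for what a database-backed directory involves, here is a minimal sketch. It swaps in Python and SQLite for the PHP and MySQL suggested above, purely to keep the example self-contained; the idea carries over. The table layout, the function names and the sample state names used with it are all assumptions, not anything taken from the actual site.

```python
import html
import sqlite3


def build_directory(rows, path=":memory:"):
    """Create the listings table (if needed) and load (state, name, url, city) rows."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS listings (
               id    INTEGER PRIMARY KEY,
               state TEXT NOT NULL,
               name  TEXT NOT NULL,
               url   TEXT NOT NULL UNIQUE,
               city  TEXT NOT NULL
           )"""
    )
    # INSERT OR IGNORE skips a submission whose URL is already listed
    conn.executemany(
        "INSERT OR IGNORE INTO listings (state, name, url, city) VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def listings_for_state(conn, state):
    """Return (name, url, city) tuples for one state, sorted by name."""
    cur = conn.execute(
        "SELECT name, url, city FROM listings WHERE state = ? ORDER BY name",
        (state,),
    )
    return cur.fetchall()


def render_state_page(conn, state):
    """Rebuild the <li> lines for a state in the same format as the static pages."""
    lines = []
    for name, url, city in listings_for_state(conn, state):
        lines.append(
            '<li><a href="{}">{}</a> - {}</li>'.format(
                html.escape(url, quote=True), html.escape(name), html.escape(city)
            )
        )
    return "\n".join(lines)
```

With something like this in place, a new submission becomes one inserted row instead of a hand edit, and each state page can be regenerated from the table.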