Want to "data mine" my own site
Need HTML parsed into database
aroach msg:353385 8:43 pm on May 10, 2003 (gmt 0)
I have a directory on my site that is edited by hand on static HTML pages.
It started small about three years ago but now it's too big for one person to keep up with. I get submissions every day now and need to get this converted to something more manageable.
I went and downloaded one of those "data miner" programs that are supposed to be so obnoxious but I figure if I'm using it on my own site only then that's not so rude.
But, before I install and try to figure out how to use this thing, is this even what I need to do?
Brett_Tabke msg:353386 3:11 pm on May 11, 2003 (gmt 0)
Sounds like you need a custom app, because I don't think any off-the-shelf program is going to do what you want.
aroach msg:353387 3:40 pm on May 11, 2003 (gmt 0)
I don't think my post was very clear.
The directory now has 52 pages. The index page has static links to the 51 "category" pages. The "categories" are the 50 states plus the District of Columbia.
I need to pull the listings into a database or spreadsheet of some type. There are only three parts to each listing: the URL, the site name, and the city. Here is a snippet of the code, with the real URLs removed:
<li><a href="http://www.somesite.org/">Some Site</a> - Some City</li>
<li><a href="http://www.anothersite/">Another Site</a> - Another City</li>
<li><a href="http://www.blahblah/">Blah Blah Site</a> - Blah Town</li>
<li><a href="http://yougettheidea.com">You Get the idea</a> - City</li>
I would also need it to pull the state from the page title or somewhere since it's not a part of each individual listing.
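For what it's worth, the extraction described above could be sketched roughly like this in Python, using only the standard library. This is just a sketch under assumptions, not a finished tool: the class name `ListingParser` is made up, the sample page is invented, and it assumes each category page's `<title>` contains nothing but the state name (the real pages may differ, in which case the title handling would need adjusting).

```python
import csv
import io
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collects (state, name, url, city) rows from one category page.

    Assumption: the <title> text is exactly the state name.
    """

    def __init__(self):
        super().__init__()
        self.state = ""
        self.rows = []
        self._in_title = False
        self._in_li = False
        self._in_a = False
        self._url = ""
        self._name = ""
        self._tail = ""  # text after the link, e.g. " - Some City"

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "li":
            self._in_li = True
            self._url, self._name, self._tail = "", "", ""
        elif tag == "a" and self._in_li:
            self._in_a = True
            self._url = dict(attrs).get("href") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a":
            self._in_a = False
        elif tag == "li" and self._in_li:
            self._in_li = False
            if self._url:  # skip list items that contain no link
                # Drop the leading "- " separator in front of the city.
                city = self._tail.strip().lstrip("-").strip()
                self.rows.append(
                    (self.state, self._name.strip(), self._url, city)
                )

    def handle_data(self, data):
        if self._in_title:
            self.state += data.strip()
        elif self._in_a:
            self._name += data
        elif self._in_li:
            self._tail += data


# Invented sample page; "Ohio" is only a placeholder state.
sample_page = """<html><head><title>Ohio</title></head><body><ul>
<li><a href="http://www.somesite.org/">Some Site</a> - Some City</li>
<li><a href="http://yougettheidea.com">You Get the idea</a> - City</li>
</ul></body></html>"""

parser = ListingParser()
parser.feed(sample_page)
parser.close()

# Write the rows as CSV text, which any spreadsheet program can open.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["state", "name", "url", "city"])
writer.writerows(parser.rows)
print(out.getvalue())
```

Running the same parser over each of the 51 saved category pages and appending the rows to one CSV file would give a single spreadsheet with a state column on every listing.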
Of course, I've no idea what to do with the database once I have it. I figure one thing at a time. Maybe editing static pages is the best way for me to go since I don't know what I'm doing.
mischief msg:353388 4:57 am on May 14, 2003 (gmt 0)
I've not tried this myself as I don't use the program much, but I noticed that when you go to File -> Open in Excel, the drop-down box for choosing the file type includes "Web Pages and Web Archives". I guess this means it can import HTML files into a spreadsheet of some sort. Might that do what you want?
carfac msg:353389 8:06 am on May 14, 2003 (gmt 0)
If you want something to manage all these pages once you get them mined, send me a PM. I know a great little free program that would probably make keeping up with it all really easy! It's just the initial setup that is going to be a pain!
bobnew32 msg:353390 8:59 pm on May 14, 2003 (gmt 0)
Believe me, for what you are asking, a custom job using PHP and MySQL would be the best route. Go to the PHP forum; they will really help you out there.
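To give a rough idea of what the database side of such a custom job might look like, here is a small sketch. Note the swaps: it uses Python with the built-in `sqlite3` module as a stand-in for PHP and MySQL, purely so the example runs without a server. The table name `listings`, the column names, the helper `listings_for`, and the sample rows (including the state) are all made up for illustration.

```python
import sqlite3

# Hypothetical rows in the shape (state, name, url, city).
rows = [
    ("Ohio", "Some Site", "http://www.somesite.org/", "Some City"),
    ("Ohio", "You Get the idea", "http://yougettheidea.com", "City"),
]

# In-memory database for the demo; pass a file path to keep the data.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE listings (
           id    INTEGER PRIMARY KEY,
           state TEXT NOT NULL,
           name  TEXT NOT NULL,
           url   TEXT NOT NULL UNIQUE,
           city  TEXT NOT NULL
       )"""
)

# Parameterized inserts avoid quoting problems in site names or URLs.
conn.executemany(
    "INSERT INTO listings (state, name, url, city) VALUES (?, ?, ?, ?)",
    rows,
)
conn.commit()


def listings_for(state):
    """Return (name, url, city) tuples for one state, sorted by name."""
    cur = conn.execute(
        "SELECT name, url, city FROM listings WHERE state = ? ORDER BY name",
        (state,),
    )
    return cur.fetchall()


print(listings_for("Ohio"))
```

With the listings in one table, each state page becomes a single query instead of a hand-edited file, and a new submission is one inserted row.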