Home / Forums Index / WebmasterWorld / Webmaster General

Webmaster General Forum

    
Want to "data mine" my own site
Need HTML parsed into database
aroach




msg:353385
 8:43 pm on May 10, 2003 (gmt 0)

I have a directory on my site that is edited by hand on static HTML pages.

It started small about three years ago but now it's too big for one person to keep up with. I get submissions every day now and need to get this converted to something more manageable.

I went and downloaded one of those "data miner" programs that are supposed to be so obnoxious but I figure if I'm using it on my own site only then that's not so rude.

But, before I install and try to figure out how to use this thing, is this even what I need to do?

 

Brett_Tabke




msg:353386
 3:11 pm on May 11, 2003 (gmt 0)

Sounds like you need a custom app, because I don't think any off-the-shelf program is going to do what you want.

aroach




msg:353387
 3:40 pm on May 11, 2003 (gmt 0)

I don't think my post was very clear.

The directory now has 52 pages. The index page has static links to the 51 "category" pages. The "categories" are the 50 states plus the District of Columbia.

I need to pull the listings into a database or spreadsheet of some type. There are only three parts to each listing: the URL, the site name, and the city. Here is a snippet of the code, with the real URLs removed:

<li><a href="http://www.somesite.org/">Some Site</a> - Some City</li>
<li><a href="http://www.anothersite/">Another Site</a> - Another City</li>
<li><a href="http://www.blahblah/">Blah Blah Site</a> - Blah Town</li>
<li><a href="http://yougettheidea.com">You Get the idea</a> - City</li>

I would also need it to pull the state from the page title or somewhere since it's not a part of each individual listing.
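As a rough illustration of the extraction described above, here is a minimal Python sketch using only the standard library. It assumes each listing has exactly the `<li><a href="...">Name</a> - City</li>` shape shown in the snippet, and that the page `<title>` contains just the state name; neither assumption is confirmed for the real pages. The names `ListingParser`, `extract_listings`, and `listings_to_csv` are made up for this example.

```python
import csv
import io
import re
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collects the <title> text and (url, name, city) from each <li> listing."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.listings = []
        self._in_title = False
        self._in_li = False
        self._in_a = False
        self._href = None
        self._name = []
        self._tail = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "li":
            # Start a fresh listing.
            self._in_li = True
            self._href, self._name, self._tail = None, [], []
        elif tag == "a" and self._in_li and self._href is None:
            # Only the first link inside the <li> is treated as the listing link.
            self._in_a = True
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a":
            self._in_a = False
        elif tag == "li" and self._in_li:
            self._in_li = False
            if self._href:
                name = "".join(self._name).strip()
                # Text after the link looks like " - Some City"; drop the dash.
                city = re.sub(r"^\s*-\s*", "", "".join(self._tail)).strip()
                self.listings.append((self._href, name, city))

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_a:
            self._name.append(data)
        elif self._in_li and self._href is not None:
            self._tail.append(data)


def extract_listings(html, state=None):
    """Return one dict per listing; state defaults to the page title (assumption)."""
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    state = state or parser.title.strip()
    return [
        {"state": state, "name": name, "url": url, "city": city}
        for (url, name, city) in parser.listings
    ]


def listings_to_csv(rows):
    """Serialize the rows as CSV text that a spreadsheet can open."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["state", "name", "url", "city"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Running `extract_listings` over each of the 51 category pages and concatenating the results would give one table with a state column. If the real titles carry extra words besides the state, the `state` argument can be passed explicitly instead.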

Of course, I've no idea what to do with the database once I have it. I figure one thing at a time. Maybe editing static pages is the best way for me to go since I don't know what I'm doing.

mischief




msg:353388
 4:57 am on May 14, 2003 (gmt 0)

I've not tried this myself as I don't use the program much, but I noticed that when you go to File -> Open in Excel, the drop-down box that lets you choose which file types are listed includes "Web Pages and Web Archives". I guess this means it can import HTML files into a spreadsheet of some sort. Might that do what you want?

carfac




msg:353389
 8:06 am on May 14, 2003 (gmt 0)

If you want something to manage all these pages once you get them mined, send me a PM. I know a great little free program that would probably make keeping up with it all really easy! It's just the initial setup that is going to be a pain!

dave

bobnew32




msg:353390
 8:59 pm on May 14, 2003 (gmt 0)

Believe me, for what you are asking, a custom job using PHP and MySQL would be the best route. Go to the PHP forum; they will really help you out there.
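To give a feel for what the database side of such a custom job might look like, here is a small sketch. It uses Python with the standard-library `sqlite3` module as a stand-in for the PHP and MySQL setup suggested above, so treat it as an illustration only. The table name `listings`, its columns, and the function names are hypothetical; each row is assumed to be a dict with `state`, `name`, `url`, and `city` keys.

```python
import sqlite3


def save_listings(conn, rows):
    """Create the table if needed and insert rows, skipping URLs already stored."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS listings (
               id    INTEGER PRIMARY KEY,
               state TEXT NOT NULL,
               name  TEXT NOT NULL,
               url   TEXT NOT NULL UNIQUE,
               city  TEXT NOT NULL
           )"""
    )
    # Named placeholders are filled from each dict; duplicates by URL are ignored.
    conn.executemany(
        "INSERT OR IGNORE INTO listings (state, name, url, city) "
        "VALUES (:state, :name, :url, :city)",
        rows,
    )
    conn.commit()


def listings_for_state(conn, state):
    """Return (name, url, city) tuples for one state, sorted by name."""
    cur = conn.execute(
        "SELECT name, url, city FROM listings WHERE state = ? ORDER BY name",
        (state,),
    )
    return cur.fetchall()
```

With the listings in one table, each state page becomes a single query, and a new submission is one inserted row rather than a hand edit to a static page.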
