Grabbing Data from a Static html page

Is this possible?

David

5:36 pm on Feb 9, 2003 (gmt 0)

I have a potential client who needs to convert a directory to a MySQL database. Currently there are 4,000-plus entries (names, addresses, phone numbers, etc.), all hand-coded on static HTML pages.

Setting up the database and the queries is easy. The question is: how difficult would it be to strip the data out of the pages so it can be inserted?

andreasfriedrich

7:01 pm on Feb 9, 2003 (gmt 0)

That will depend on the structure of your HTML pages. If they were built consistently, then just use an HTML parser such as HTML::Parser [perldoc.com] to extract the information. This is easy as well ;). If each page is somewhat different, that will get a bit harder, although a combination of a parser and regular expressions should still do the trick.
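To give a feel for the parser approach, here is a minimal sketch in Python using the standard library's html.parser module rather than Perl's HTML::Parser. It assumes, purely for illustration, that every directory entry is one table row whose cells hold name, address and phone; the real pages may be laid out quite differently, and the names DirectoryParser and extract_rows are made up for this example.

```python
from html.parser import HTMLParser


class DirectoryParser(HTMLParser):
    """Collects the text of each <td> cell, grouped by <tr> row.

    Assumed layout (hypothetical): one directory entry per table row.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        # The parser hands over tag names already lowercased.
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a cell; nested tags like <b> are ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._row is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html_text):
    """Return a list of rows, each a list of cell strings."""
    parser = DirectoryParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows
```

Calling extract_rows on the contents of each page would give lists such as ["Jane Doe", "12 Main St", "555-0100"], which can then be passed to a parameterised INSERT. If the pages turn out to be inconsistent, this is the place where extra checks or regular expressions on the cell text would go.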

Andreas

jatar_k

7:26 pm on Feb 9, 2003 (gmt 0)

Like Andreas says, if the pages all have the same format it is fairly simple, but if you have to get different info from every page in a different format then it gets pretty heavy.

If it is long lists on a few pages, it may even be easier to use a text editor to strip the HTML.

Try a Google search for an HTML stripper, or maybe for a spider that could help you out.
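The same kind of find-and-replace stripping can also be scripted. The following is only a rough sketch in Python, assuming the markup is simple enough that a regular expression can remove the tags safely; the function name strip_html and the list of tags treated as line breaks are choices made for this example, not anything taken from the client's pages.

```python
import html
import re


def strip_html(markup):
    """Crudely reduce HTML to plain text, one visual line per output line."""
    # Turn tags that usually end a visual line into newlines.
    text = re.sub(r"<br\s*/?>|</(?:p|tr|li|div)\s*>", "\n", markup,
                  flags=re.IGNORECASE)
    # Drop every remaining tag.
    text = re.sub(r"<[^>]+>", "", text)
    # Decode entities such as &amp;
    text = html.unescape(text)
    # Collapse whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)
```

This will trip over comments, scripts, or a ">" inside an attribute value, so for anything messier a real parser is the safer route.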

David

7:49 pm on Feb 9, 2003 (gmt 0)

Thanks,
I'm going to dig into his pages a little deeper. Hopefully they are clean and consistent.

This is easy as well

That's relative to how long it takes to get to easy :)