Converting PDF into HTML

12:13 pm on Oct 25, 2010 (gmt 0)

Full Member

10+ Year Member

joined:July 27, 2005

What is the best way to convert table-based data from a PDF into an HTML table? I used Acrobat's export to HTML, but it did not work well: the output had too many unnecessary SPANs and the data came out jumbled.

I saved the data as .csv instead and was hoping to use that together with a regex to wrap table tags around each field. Maybe I gave up too soon.
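For reference, the regex idea can be sketched in Python roughly like this. It is a minimal sketch, assuming a simple CSV in which no field contains a comma, a double quote, or a line break (exports of real PDF tables often break that assumption). The function names are hypothetical.

```python
import html
import re

# Hypothetical helper: turn one naive CSV line into an HTML table row.
# Assumes no field contains a comma, a double quote, or a line break.
def csv_line_to_row(line: str) -> str:
    escaped = html.escape(line.strip())  # protect &, <, > in the data
    # Each comma, plus any spaces around it, becomes a cell boundary.
    cells = re.sub(r"\s*,\s*", "</td><td>", escaped)
    return "<tr><td>" + cells + "</td></tr>"

# Hypothetical helper: wrap every non-blank line of the CSV text in a row.
def naive_csv_to_table(text: str) -> str:
    rows = [csv_line_to_row(line) for line in text.splitlines() if line.strip()]
    return "<table>\n" + "\n".join(rows) + "\n</table>"

print(naive_csv_to_table("Name, Price\nWidget, 9.99\n"))
# prints:
# <table>
# <tr><td>Name</td><td>Price</td></tr>
# <tr><td>Widget</td><td>9.99</td></tr>
# </table>
```

The pattern `\s*,\s*` also trims spaces around each separator, but a quoted field such as `"small, blue"` would still be split in two, which is the main weakness of the regex route.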

10:30 am on Nov 17, 2010 (gmt 0)

Full Member

5+ Year Member

joined:Dec 30, 2009
posts: 249

Hi SilverLining, I was tackling this same problem yesterday (also with tables) and gave up, though I don't have Acrobat.

If you managed to get the table data into a correctly formatted CSV, it should indeed be possible to process it into an HTML table.
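One way to do that processing, sketched in Python under a few assumptions: the CSV uses commas as separators, its first non-blank row holds the column headings, and `csv_to_html_table` is a hypothetical name. It swaps the regex idea for the standard-library `csv` module, which handles quoted fields containing commas or line breaks.

```python
import csv
import html
import io

def csv_to_html_table(text: str, header: bool = True) -> str:
    """Hypothetical helper: convert CSV text into an HTML table.

    Assumes comma separators; if header is True, the first non-blank
    row is emitted as <th> cells.
    """
    lines = ["<table>"]
    first = True
    # newline="" lets csv.reader see embedded line breaks inside quoted fields.
    for row in csv.reader(io.StringIO(text, newline="")):
        if not row:  # csv.reader yields [] for blank lines; skip them
            continue
        tag = "th" if header and first else "td"
        first = False
        # Escape each cell so &, < and > in the data don't break the markup.
        cells = "".join(f"<{tag}>{html.escape(cell.strip())}</{tag}>" for cell in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)

print(csv_to_html_table('Item,Note\nWidget,"small, blue"\n'))
# prints:
# <table>
#   <tr><th>Item</th><th>Note</th></tr>
#   <tr><td>Widget</td><td>small, blue</td></tr>
# </table>
```

Because the parsing is left to `csv.reader`, the quoted `"small, blue"` stays in one cell, which is exactly the case a plain comma-splitting regex gets wrong.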

The easiest way would be to copy the data from the CSV, paste it into Word, open the .doc in Google Docs, use its "view as HTML" option, and copy the source code.