Editing Word Docs programmatically?

How to extract images?

         

RossWal

7:18 pm on Jan 29, 2003 (gmt 0)

10+ Year Member



I have some 1,400 Word docs, each of which contains an embedded scanned image and no other content. I need to build a web app that will securely display the images to authenticated users. The problem is I need to pull the jpgs out of the Word docs, and I'd rather not do each one manually, especially since the only way I've found to do that involves a cumbersome save-as-html then recover-the-picture process. 1,400 of those? Lord help me :(.

Please someone, tell me there's a better way (I know little about Office and zip/nada/nutting about VBA).

Thanks,
Ross

wardbekker

7:41 pm on Jan 29, 2003 (gmt 0)

10+ Year Member



RossWal,

You might want to look into the macro functionality that Microsoft Word offers. I'm not 100% sure you can strip out the images using the Word macro API, but I think this is the most promising approach I can come up with ;)

Dreamquick

8:31 pm on Jan 29, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If memory serves, the newer versions of Word give you access to the Word DOM (its object model), which is reasonably well documented. I imagine you could access this through any of the MS programming languages and/or VBA.

I know you can find the images via the Shapes and InlineShapes properties of the document object, as I've used this method to force inline images to embed themselves (sorry, the source is at work and I'm not right now).

It's not the world's nicest interface, but at least you can watch what it does in the main Word window to aid debugging, and as a plus it's not as bad as Crystal Reports...

-Tony

tomasz

7:15 pm on Jan 30, 2003 (gmt 0)

10+ Year Member



RossWal

You can write a macro to save each doc as HTML. Word will convert them and write the images out to the related folders.

RossWal

8:21 pm on Jan 31, 2003 (gmt 0)

10+ Year Member



tomasz,
That's the direction I was headed. Write a macro that does a save-as for each doc opened (not sure how to wire that up... as I mentioned, I've avoided all this Office stuff for years). Then have a VB process loop through the source folder, opening each doc in order to get the macro to execute, and also have the VB process copy the macro-extracted jpg to the common target folder.

Am I making this more convoluted than it needs to be?

Thanks all!

aspdaddy

8:40 pm on Jan 31, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You can use the macro recorder to save as HTML, then just hit Alt+F11 and view the code it made.

RossWal

9:03 pm on Jan 31, 2003 (gmt 0)

10+ Year Member




You can use the macro recorder to save as HTML, then just hit Alt+F11 and view the code it made.

Thanks. With 1,400 of these puppies I'd like it to be a completely hands-off exercise. I'm wondering if I can get the macro to run automatically as the docs get opened. That way I can open them programmatically from VB using shell().

andreasfriedrich

9:29 pm on Jan 31, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You could use a Perl or PHP script which would loop over all the documents and extract the images.

#!/usr/bin/perl -w
use strict;
use Win32::OLE;

# Try to attach to a running instance of Word.
# GetActiveObject dies only if Word's OLE server is not registered;
# it returns undef when Word is simply not running.
my $w;
eval { $w = Win32::OLE->GetActiveObject('Word.Application') };
die "Word not installed" if $@;

unless (defined $w) {
    # No running instance: start a new one and quit it on cleanup.
    $w = Win32::OLE->new('Word.Application', sub { $_[0]->Quit; })
        or die "Can't start Word";
}

# Open the document. Open returns undef on failure, so check the result too.
my $file = 'k:\\word_document.doc';
my $d = eval { $w->Documents->Open($file) };
die "Couldn't open $file\n" if $@ or !defined $d;

# Get the content and print it.
my $r = $d->Content;
print $r->Text;
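
To go from one document to the whole folder, something along these lines might work. It is an untested sketch: html_path_for, list_docs and convert_folder are made-up helper names, 8 is the value of Word's wdFormatHTML constant, and it assumes you pass an absolute folder path, since Word may resolve relative paths against its own working directory. It saves each document as HTML, which is what gets the pictures written to disk.

```perl
#!/usr/bin/perl -w
use strict;
use File::Spec;

# Value of wdFormatHTML in Word's WdSaveFormat enumeration.
use constant WD_FORMAT_HTML => 8;

# Hypothetical helper: map "x.doc" to "x.htm" (extension match is case-insensitive).
sub html_path_for {
    my ($doc) = @_;
    (my $html = $doc) =~ s/\.doc$/.htm/i;
    return $html;
}

# Hypothetical helper: full paths of the .doc files directly inside $dir,
# skipping the "~$..." lock files Word leaves next to open documents.
sub list_docs {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Can't read $dir: $!";
    my @names = grep { /\.doc$/i && !/^~\$/ } readdir $dh;
    closedir $dh;
    return map { File::Spec->catfile($dir, $_) } sort @names;
}

# Open each document in Word and save it as HTML next to the original.
sub convert_folder {
    my ($dir) = @_;
    require Win32::OLE;    # loaded here so the helpers above work without Word
    my $w = Win32::OLE->new('Word.Application', sub { $_[0]->Quit; })
        or die "Can't start Word";
    for my $doc (list_docs($dir)) {
        my $d = $w->Documents->Open($doc);
        unless (defined $d) {
            warn "Couldn't open $doc\n";
            next;
        }
        # Named arguments go in a hash reference as the last parameter.
        $d->SaveAs({ FileName   => html_path_for($doc),
                     FileFormat => WD_FORMAT_HTML });
        $d->Close(0);      # 0 = wdDoNotSaveChanges
    }
}
```

Calling convert_folder('k:\\scans') would then convert everything in that (hypothetical) folder.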

Andreas

tomasz

9:38 pm on Jan 31, 2003 (gmt 0)

10+ Year Member



Here it goes.
I hope it helps (I did not test this, but it should work).

Sub Macro1()
    ' Requires a reference to "Microsoft Scripting Runtime" (Tools > References)

    Dim FSO As Scripting.FileSystemObject
    Dim SourceFolder As Scripting.Folder
    Dim FileItem As Scripting.File
    Dim sFileName As String

    Set FSO = New Scripting.FileSystemObject
    Set SourceFolder = FSO.GetFolder("your folder with your word docs")

    For Each FileItem In SourceFolder.Files
        ' Only convert Word documents; this also skips the .htm files created below
        If LCase(FSO.GetExtensionName(FileItem.Name)) = "doc" Then
            ' Use the full path; FileItem.Name alone carries no folder
            sFileName = FileItem.Path
            ' Strip the 4-character ".doc" before appending ".htm"
            SaveAsHTML sFileName, Left(sFileName, Len(sFileName) - 4) & ".htm"
        End If
    Next FileItem
End Sub

'====== your function =========
Function SaveAsHTML(sFile As String, sSaveAs As String)
    Documents.Open FileName:=sFile, ConfirmConversions:=False, _
        ReadOnly:=False, AddToRecentFiles:=False, PasswordDocument:="", _
        PasswordTemplate:="", Revert:=False, WritePasswordDocument:="", _
        WritePasswordTemplate:="", Format:=wdOpenFormatAuto
    ActiveDocument.SaveAs FileName:=sSaveAs, FileFormat:=wdFormatHTML, _
        LockComments:=False, Password:="", AddToRecentFiles:=True, _
        WritePassword:="", ReadOnlyRecommended:=False, EmbedTrueTypeFonts:=False, _
        SaveNativePictureFormat:=False, SaveFormsData:=False, _
        SaveAsAOCELetter:=False
    ' Close the converted document before moving on to the next one
    ActiveDocument.Close SaveChanges:=wdDoNotSaveChanges
End Function
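
Once the macro has run, the pictures still need gathering into one place. Here is a rough sketch of that step, in Perl to match the script earlier in the thread. It assumes Word put each document's picture in a companion folder named after the document with a "_files" suffix (that is what English versions of Word do, as far as I know; localized versions may name it differently), and collect_images is a made-up name. Each copy is prefixed with the document name so that files from different documents do not overwrite each other.

```perl
#!/usr/bin/perl -w
use strict;
use File::Copy qw(copy);
use File::Spec;

# Copy every .jpg/.jpeg found in "<name>_files" subfolders of $src into
# $target as "<name>_<original file name>". Returns the number of files copied.
# The "_files" suffix is an assumption about how Word names the folder.
sub collect_images {
    my ($src, $target) = @_;
    my $count = 0;

    opendir(my $dh, $src) or die "Can't read $src: $!";
    my @folders = grep { /_files$/i && -d File::Spec->catdir($src, $_) }
                  readdir $dh;
    closedir $dh;

    for my $folder (sort @folders) {
        # Recover the document name by dropping the "_files" suffix.
        (my $doc_name = $folder) =~ s/_files$//i;
        my $dir = File::Spec->catdir($src, $folder);

        opendir(my $fh, $dir) or die "Can't read $dir: $!";
        my @images = grep { /\.jpe?g$/i } readdir $fh;
        closedir $fh;

        for my $img (sort @images) {
            my $from = File::Spec->catfile($dir, $img);
            my $to   = File::Spec->catfile($target, "${doc_name}_$img");
            copy($from, $to) or die "Copy of $from failed: $!";
            $count++;
        }
    }
    return $count;
}
```

The target folder has to exist already; the sub only copies and leaves the originals alone.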

RossWal

11:04 pm on Jan 31, 2003 (gmt 0)

10+ Year Member



Thank you both so much for taking the time to set me on track! tomasz, this was the kick start I needed to break into the world of macros.

Ross <==on a roll now ;)