starting Word via Win32::OLE fails, but Excel or IE works?

(5 posts)
  • Started 11 years ago by enchanter
  • Latest reply from enchanter


  1. enchanter
    Member

    Hi All!

    I've been using perl since the 4.018 days, but I have next to no experience programming perl on Windows. I have ActiveState 5.8.8 build 819 installed on XP with SP2 and Office 2003 on a laptop I'm using. Both XP and Office have had all patches available via the automatic update service applied.

    I'm trying to help someone else who needs to extract text from literally thousands of MS Word documents. I've found some examples online that look like they would start me on the right path; they generally begin with either:

    use Win32::OLE;
    use Win32::OLE::Enum;
    $document = Win32::OLE->GetObject("1517am.doc");


    or


    use warnings;
    use Win32::OLE;
    use Win32::OLE::Const;

    my $wd = Win32::OLE::Const->Load("Microsoft Word 11.0 Object Library")
        or die Win32::OLE->LastError();

    # Dump every constant exported by the type library
    foreach my $k (keys %$wd) {
        printf "%s = %s\n", $k, $wd->{$k};
    }

    my $word = Win32::OLE->new('Word.Application', 'Quit')
        or die Win32::OLE->LastError();
    $word->{'Visible'} = 1;


    Unfortunately, the new() or GetObject() methods always fail with:


    Win32::OLE(0.1707) error 0x8007007e: "The specified module could not be found" at ...


    After hours and hours of banging my head and assuming I was doing something wrong, I tried switching Word.Application to Excel.Application or InternetExplorer.Application, and those both work! The problem appears to be specific to Word.
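
    For reference, the comparison described above can be reproduced with a short probe. This is only a sketch: it tries each of the three ProgIDs mentioned in this post and prints the error text for any that cannot be created. Turning warnings off via Option(Warn => 0) is my choice, so that failures are reported only through LastError().

```perl
use strict;
use warnings;
use Win32::OLE;

# Report failures through LastError() instead of emitting warnings
Win32::OLE->Option( Warn => 0 );

for my $progid (qw( Word.Application Excel.Application InternetExplorer.Application )) {
    # 'Quit' is called on the application when the object is destroyed
    my $app = Win32::OLE->new( $progid, 'Quit' );
    if ($app) {
        print "$progid: created OK\n";
        undef $app;    # release the object so the application shuts down
    }
    else {
        # In string context LastError() yields the error message
        my $err = Win32::OLE->LastError();
        print "$progid: FAILED: $err\n";
    }
}
```

    If only Word.Application fails, the problem most likely lies in how that one class is registered or installed rather than in Perl or Win32::OLE.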

    Does anyone have any idea what the problem is?

    Also, assuming I can get past this problem, most of the examples I've seen are pretty basic -- they let you pull text out of Word, but I would actually like to be able to tell if (for example) a paragraph of text that I just pulled out via

    my $paragraphs = $document->Paragraphs();
    my $enumerate  = Win32::OLE::Enum->new($paragraphs);
    while (defined(my $paragraph = $enumerate->Next())) {
        # do something here
    }

    has text within it that's styled (underlined, bold, etc).

    Can anyone suggest a good reference for how to do that?

    Thanks much,

    Tim
    Posted 11 years ago #
  2. Dave
    Perl guy

    It sounds like some registry mangling may be involved. When you create a COM object, the class name (e.g. Word.Application) is looked up in the registry (HKEY_CLASSES_ROOT\Word.Application) to locate the path to the application to run. This may be an actual path (as in "C:\Program Files\Microsoft Office\Office12\winword.exe") or it may be a class ID (such as {000209FF-0000-0000-C000-000000000046}), which in turn is looked up in HKEY_CLASSES_ROOT\CLSID\{000209FF-0000-0000-C000-000000000046} for a path. If something has messed up those registry entries, or the directories or file names have changed (or been deleted), then the COM object cannot be created.

    Check the registry entries and the file paths. You may want to reinstall Word.
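
    The check described above can be sketched in Perl rather than done by hand in regedit. This is a minimal sketch, not a definitive diagnostic: it assumes the Win32::TieRegistry module is installed, that the ProgID key holds a CLSID subkey, and that the server path is the default value of the LocalServer32 subkey (the usual layout for an out-of-process server such as Word). The regular expression that pulls the .exe out of the command line is my own simplification.

```perl
use strict;
use warnings;
use Win32::TieRegistry ( Delimiter => '/' );

my $progid = 'Word.Application';

# Step 1: ProgID -> class ID (default value of the CLSID subkey)
my $clsid = $Registry->{"Classes/$progid/CLSID//"};
defined $clsid
    or die "No CLSID registered under HKEY_CLASSES_ROOT\\$progid\n";
print "$progid => $clsid\n";

# Step 2: class ID -> server command line (default value of LocalServer32)
my $server = $Registry->{"Classes/CLSID/$clsid/LocalServer32//"};
defined $server
    or die "No LocalServer32 entry for $clsid\n";
print "LocalServer32 => $server\n";

# Step 3: strip any quotes and trailing switches (e.g. /Automation),
# then check that the executable is really there
my ($exe) = $server =~ /^"?([^"]+?\.exe)/i;
if ( defined $exe && -e $exe ) {
    print "OK: $exe exists\n";
}
else {
    print "MISSING: the registered server path does not point to a file\n";
}
```

    If all three steps succeed and Word still cannot be created, the registry entries checked here are probably not the culprit, and a repair or reinstall of Word is the next thing to try.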

    Oddly enough, my article in next month's Windows Scripting Solutions is about a script that walks through the registry looking for file extensions that are orphaned (they no longer point to valid programs).

    Regarding the enumeration: The best way to perform enumeration is to use the in() function:

    use Win32::OLE qw( in );
    # First get a Word document object
    my $Doc = Win32::OLE->GetObject( "c:\\temp\\MyWordDocument.doc" )
        || die "Cannot open document: " . Win32::OLE->LastError();

    # Method calls do not interpolate inside a string, so pass the count separately
    print "There are ", $Doc->Paragraphs()->Count(), " paragraphs.\n";
    # Now iterate over each paragraph
    my $ParaCount = 0;
    foreach my $Paragraph ( in( $Doc->Paragraphs() ) )
    {
        $ParaCount++;
        print "Paragraph $ParaCount) " .  $Paragraph->Range()->Characters()->Count() . " characters\n";
    }

    Posted 11 years ago #
  3. enchanter
    Member

    Dave wrote:

    It sounds like some registry mangling may be involved. When you create a COM object, the class name (e.g. Word.Application) is looked up in the registry (HKEY_CLASSES_ROOT\Word.Application) to locate the path to the application to run. This may be an actual path (as in "C:\Program Files\Microsoft Office\Office12\winword.exe") or it may be a class ID (such as {000209FF-0000-0000-C000-000000000046}), which in turn is looked up in HKEY_CLASSES_ROOT\CLSID\{000209FF-0000-0000-C000-000000000046} for a path. If something has messed up those registry entries, or the directories or file names have changed (or been deleted), then the COM object cannot be created.


    Thanks Dave! That's very useful information for someone that's very new to scripting on Windows! Since my post, I've tried a couple of the very basic scripts on a different Windows box, and they both worked fine, so it is definitely something with how Word is installed on the box I started doing development on.




    Dave wrote:

    Regarding the enumeration: The best way to perform enumeration is to use the in() function:

    use Win32::OLE qw( in );
    # First get a Word document object
    my $Doc = Win32::OLE->GetObject( "c:\\temp\\MyWordDocument.doc" )
        || die "Cannot open document: " . Win32::OLE->LastError();

    # Method calls do not interpolate inside a string, so pass the count separately
    print "There are ", $Doc->Paragraphs()->Count(), " paragraphs.\n";
    # Now iterate over each paragraph
    my $ParaCount = 0;
    foreach my $Paragraph ( in( $Doc->Paragraphs() ) )
    {
        $ParaCount++;
        print "Paragraph $ParaCount) " .  $Paragraph->Range()->Characters()->Count() . " characters\n";
    }




    Thanks for the example of in(), and the pointer to Range().

    As I mentioned in my first post, I also need to find ranges of styled (bold, underline, different font, etc.) characters within each paragraph. Is there a good reference (preferably online, but book form is OK too) someone can point me to, so that I can find what methods (like Range()) are available for Paragraphs() and other Win32::OLE objects?

    Thanks,

    Tim
    Posted 11 years ago #
  4. Dave
    Perl guy

    One of the best sources of info on how Word works is the Microsoft Developer Network (MSDN) online: http://msdn.microsoft.com/
    Do a search for "Word Object Model". You can also make use of any Visual Basic docs on Word, since VB uses the same Word object model that Perl does.
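
    As an illustration of what the Word object model exposes for the styled-text question, here is a minimal sketch. It relies on the Font object of a Range: Font->Bold returns 0 when none of the range is bold, -1 when all of it is, and 9999999 (wdUndefined) when the range is mixed; Font->Underline returns 0 (wdUnderlineNone) when nothing is underlined. The file path is the placeholder used earlier in this thread, and checking word by word is just one possible granularity.

```perl
use strict;
use warnings;
use Win32::OLE qw( in );

my $path = 'c:\\temp\\MyWordDocument.doc';    # placeholder path
my $doc  = Win32::OLE->GetObject($path)
    or die "Cannot open $path: " . Win32::OLE->LastError() . "\n";

my $n = 0;
foreach my $para ( in( $doc->Paragraphs ) ) {
    $n++;
    my $font = $para->Range->Font;

    # 0 means "none of this paragraph"; anything else is all or mixed
    next if $font->Bold == 0 && $font->Underline == 0;

    print "Paragraph $n contains bold or underlined text:\n";

    # Drill down: each item in Words is itself a Range with its own Font
    foreach my $word ( in( $para->Range->Words ) ) {
        my $f = $word->Font;
        my @styles;
        push @styles, 'bold'      if $f->Bold != 0;
        push @styles, 'underline' if $f->Underline != 0;
        next unless @styles;
        printf "  [%s] %s\n", join( ',', @styles ), $word->Text;
    }
}

$doc->Close(0);    # 0 is wdDoNotSaveChanges
```

    Walking every word through OLE is slow, so for thousands of documents it may be worth testing the paragraph-level Font first, as above, and only drilling down when it reports something other than 0.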
    Posted 11 years ago #
  5. enchanter
    Member

    Dave wrote:

    One of the best sources of info on how Word works is the Microsoft Developer Network (MSDN) online: http://msdn.microsoft.com/
    Do a search for "Word Object Model". You can also make use of any Visual Basic docs on Word, since VB uses the same Word object model that Perl does.


    Thanks Dave! The search suggestion you gave turned up what looks to be lots of very useful information. This should allow me to make some good progress.

    Thanks for your help!

    Tim
    Posted 11 years ago #
