locked
Need to convert an MS Word table to a .csv file RRS feed

  • Question

  • I have a recurring MS Word document which contains data groupings that represent records; i.e. each record occupies several lines in the document, with a blank line separating each record.  I want to export that to a .csv file which, as I've just found out, is no longer an option in MS Word 2016, but I can be export it as a plain text (.txt) in the Word document format; i.e. just as it looks in the MS Word document.  What I need to do is to somehow convert that .txt file to a .csv, such that each record becomes a series of comma separated fields, with a CR/LF between records; i.e. each time a blank line is encountered.

    The purpose of this is to be able to import the .csv into Excel so that each comma separated value occupies a separate column in the spreadsheet and each record becomes a row. This spreadsheet w/then become an input to other programs; e.g. QuickBooks, Constant Contacts, etc.

    I've tried to just import the .txt into excel, but it simply creates a row for each line in the .txt file.  I thought I'd try to use the Windows Powershell ISE, but not being the least bit familiar with it, it's all Greek to me (this from someone with 58 years of programming, but the last language I used was COBOL before I retired in '93).

    Can someone tell me of a simple/quick-to-learn scripting language, or some other product or better/simpler method by which I can accomplish this task?  At my age trying to learn a complex language is like trying to insert a dollar bill into a coin slot (that's why my name is Capt. Dinosaur), but I need to accomplish this task for my wife's online business (she's 25 years younger than I).


    Capt. Dinosaur

    • Moved by Bill_Stewart Monday, October 2, 2017 7:07 PM This is not "teach me Word object model" forum
    Friday, August 18, 2017 7:51 PM

All replies

  • First off why do you think that Word 2016 doesn't have tables?  it does. 

    I recommend posting in the Word developer forum to learn what kind of table, list or record oriented format is being used in your document.  Tables and lists can be extracted as tables or objects into PowerShell or any Net language and easily exported as a CSV.  If you have an embedded Excel table in Word then that can be easily exported.

    The first and most important step is to find out exactly what is in the Word document.  Try asking the group or person how they are creating these documents.

    If it turns out these documents are created by hand then there is likely no tool that can reliable extract the "table".

    Also note that no one can be of much help with the vague description you have given,  MS Word documents are collections of objects.  If the document is created using a tool or report generator then the data can usually be easily extracted.

    What have you done to try to export the data so far?  Post your code.


    \_(ツ)_/


    • Edited by jrv Friday, August 18, 2017 9:01 PM
    Friday, August 18, 2017 8:58 PM
  • First off why do you think that Word 2016 doesn't have tables?  it does. 

    I didn't say that.  What I said was that Word 2016 no longer has an export option for .CSV files; it only has "Plain Text" (i.e. .txt).

    I recommend posting in the Word developer forum to learn what kind of table, list or record oriented format is being used in your document.

      Tables and lists can be extracted as tables or objects into PowerShell or any Net language and easily exported as a CSV.

    I certainly can, and will do that.  I posted my question in this forum because I'm looking for a way to write a script to create the .csv file to be imported into Excel

      If you have an embedded Excel table in Word then that can be easily exported.

    If I had that, I wouldn't have to do anything.

    The first and most important step is to find out exactly what is in the Word document.  Try asking the group or person how they are creating these documents.

    I know exactly what's in the Word document; It's  just standard word text, but  physically formatted like a table, in that there are a fixed number of lines which constitute a "Conceptual Record" delimited by a blank line.  My problem is that exporting that to Excel - or to a .txt file to be imported into Excel - results in a spreadsheet with ONLY ONE Column, with all the lines of each record imported vertically, just like in the Word document, and that isn't what I need.  I need for the lines (fields) of each conceptual record to be strung out as a ROW in the spreadsheet, so I need some way to actually read the exported .txt file and concatenate the lines of each conceptual record into a string of comma separated values.  Then I can import that into Excel with each record being a row with each comma separated value in its own column (cell).

    If it turns out these documents are created by hand then there is likely no tool that can reliable extract the "table".

    While they are created by hand, they do consistently stick to the above described format.

    Also note that no one can be of much help with the vague description you have given,  MS Word documents are collections of objects.  If the document is created using a tool or report generator then the data can usually be easily extracted.

    OK, here's a more graphic description;  This is what the Word document and .txt file data look like:

    ADEL (3-24-93), (14162) Div. 13
     The Parrish House – 1st and 3rd Wednesday’s 12:00 Noon
     (1401 S Hutchinson Avenue)
    Pres. Maria Hardman
           P: (229) 237-4986
           E: maria_hardman@yahoo.com
    Sec. Liza Hendrix
           P: (229) 237-9225
           E: liza.hendrix@tiftregional.com

    ALBANY (4-22-20), (00235) Div. 5
     Darton College-Student Union Bldg. C, Rm 252 – Wednesday
                  12:00 Noon (2400 Gillionville Road)
    Pres. James Brackin
           P: (229) 888-1136
           E: james.brackin@darton.edu
    Sec. Enoch Smith
           P: (229) 438-0672
           E: ESSmith2121@yahoo.com

    Now, what I want to do is to read that .txt file and write it out to a .csv file as follows:

    ADEL (3-24-93), (14162) Div. 13,The Parrish House – 1st and 3rd Wednesday’s 12:00 Noon,(1401 S Hutchinson Avenue),Pres. Maria Hardman,P: (229) 237-4986,E: maria_hardman@yahoo.com,Sec . Liza Hendrix,P: (229) 237-9225,E: liza.hendrix@tiftregional.com (cr/lf)

    ALBANY (4-22-20), (00235) Div. 5,Darton College-Student Union Bldg. C, Rm 252 – Wednesday,  12:00 Noon (2400 Gillionville Road),Pres. James Brackin,P: (229) 888-1136,E: james.brackin@darton.edu,Sec. Enoch Smith,P: (229) 438-0672,E: ESSmith2121@yahoo.com (cr/lf)

    etc.

    What have you done to try to export the data so far?  Post your code.

    There isn't, as yet, any code, since I haven't learned where/how to create it.  So far I've simply exported the data from Word to a .TXT file in the same format as it is in Word.  That's the file that I want to write a script to change it into a .CSV file, so that it can be properly imported into Excel.

    Also as part of that editing, I'll have to; 1) scan the existing data for commas and change them to semicolons and 2) remove embedded spaces at the beginning of each field.


    \_(ツ)_/




    Capt. Dinosaur






    Saturday, August 19, 2017 12:32 AM
  • Word has only ever had the ability to copy a table to Excel or to convert it to text.  There has never been an Export function in Word.  You are probably thinking of Excel.

    See this for an MS discussion.  https://www.microsofttraining.net/article-1347-how-export-word-table-excel.html


    \_(ツ)_/

    Saturday, August 19, 2017 12:42 AM
  • You need to post your code.  We will not write this for you.  If you do not now how to write a script then consider contacting a consultant to help you.


    \_(ツ)_/

    Saturday, August 19, 2017 12:44 AM
  • You need to post your code.  We will not write this for you.  If you do not now how to write a script then consider contacting a consultant to help you.


    As I wrote above, as yet there isn't any code, since I haven't found a product that I can learn quickly with which to write it.  I've known how to write code and scripts for over 50 years, once I know the particular language; knowing how isn't the problem.  The problem is in finding a product which is quick and easy to learn; the Powershell ISE is far too overwhelming to absorb quickly.  Right now I'm looking into the possibility of using a plain old fashioned Batch File, but I haven't as yet been able to determine if it has the capability to read, edit and write text files (it's been years since I've looked at it.

    I was hoping someone could recommend a quick and easy to learn scripting language that I can use.


    Capt. Dinosaur

    Saturday, August 19, 2017 12:37 PM
  • First you have to find out how this is saved in Word.  Next you can learn PowerShell, C#, VB.Net or any number of languages than can do this.

    Any language that supports COM can automate Word.  All Net languages and scripting systems that support COM can d this.  You can also do this with VBA macros.  Start by posting in the Word VBA developers forum to learn how Word is managed3 in code.  This forum is primarily for administrative scripting and for people who are using scripting languages VBScript and PowerShell.

    You can also look in the Gallery/Repository (link at top of page) for examples of how to automate Word with script.


    \_(ツ)_/

    Saturday, August 19, 2017 4:47 PM
  • Here is a hint.  Learn a scripting language.  PowerShell would be preferred.  Export you Word doc to a text file and then parse the text into the form you need.

    Converting a word document can be done in about 4 lines.  Parsing can be complex depending on how consistent the text is.  There are many third party text parsing programs that can help.  They can use heuristics and hints to convert text into data.


    \_(ツ)_/

    Saturday, August 19, 2017 5:32 PM
  • Here is a list of data extraction programs that can extract fro text files, Word documents and spreadsheets.

    https://en.wikipedia.org/wiki/List_of_text_mining_software


    \_(ツ)_/

    Saturday, August 19, 2017 5:35 PM
  • First you have to find out how this is saved in Word.  Next you can learn PowerShell, C#, VB.Net or any number of languages than can do this.

    I displayed above how the data are saved in both word and exporting to a .txt file yields the same format.  What I need is to be able to read the text file, line by line, and concatenate the fields into comma separated values and write them to a .csv file, starting a new record each time a blank line is encountered.

    I looked at Powershell ISE, and it was too overwhelming - and the Microsoft tutorials were totally confusing - to be able to do what I have to do quickly.  This is for a volunteer function that my wife performs for the Georgia Dept. of Labor, so we have a zero budget and therefore doesn't allow for the purchase of expensive language programs.  I was looking for some kind of free, simple scripting language that makes it easy to read the text file's lines, concatenate them into comma separated values and write the results to a .csv file.

    Any language that supports COM can automate Word.  All Net languages and scripting systems that support COM can d this.  You can also do this with VBA macros.  Start by posting in the Word VBA developers forum to learn how Word is managed3 in code.  This forum is primarily for administrative scripting and for people who are using scripting languages VBScript and PowerShell.

    Thank you for that lead.  I'll open a post in that forum.

    You can also look in the Gallery/Repository (link at top of page) for examples of how to automate Word with script.

    Again, I'm not interested in automating Word.  It only takes a second or two to export the Word document to a text file in the exact same text pattern.  So what I want to do would be done from the text file.


    \_(ツ)_/



    Capt. Dinosaur

    Saturday, August 19, 2017 9:22 PM
  • Here is a hint.  Learn a scripting language.  PowerShell would be preferred.

    That's exactly what I've been trying to do, but Powershell was too overwhelming, and as I said above, the MS Tutorials are too vague to be useful to me.

      Export you Word doc to a text file and then parse the text into the form you need.

    I do believe that that's what I've been trying to explain throughout this thread.  My initial request was to find a simple to use scripting language with which to do that.

    Converting a word document can be done in about 4 lines.  Parsing can be complex depending on how consistent the text is.

    We can depend on the consistency of the word documents.

      There are many third party text parsing programs that can help.

    And THAT is exactly what I've been trying to find.  My initial posting above was a request for information on some specific simple to learn/use scripting languages.  I'll check out that VBA developer's forum that you mentioned above.  Is it in TechNet, or should I search Google for it?

      They can use heuristics and hints to convert text into data.


    \_(ツ)_/



    Capt. Dinosaur

    Saturday, August 19, 2017 9:30 PM
  • Here is a list of data extraction programs that can extract fro text files, Word documents and spreadsheets.

    https://en.wikipedia.org/wiki/List_of_text_mining_software

    Thank you for the link.  I'll check it out.


    \_(ツ)_/



    Capt. Dinosaur

    Saturday, August 19, 2017 9:31 PM