Algorithm for Parsing Text Information

  • September 8, 2010 3:39
     
     

    Hi,

    This could be a Word issue, a .NET issue, or a language issue, so I need some help deciding where to post.

    I am using Word Interop to go through a document and pull information out into an XML document. The document was created by a regular user, so there is no easy way to extract the data (no controls, no XML). I will add some way of detecting the keywords, and if I cannot extract data from the odd document or two, that's okay. I plan to use specific words as the keywords, and they generally follow a particular order in the document. The data of interest is either the word after the keyword or the paragraph after the keyword.

    I need some ideas on how to plough through the document looking for keywords, specifically:

    1. the best way to use the keyword to find the value location;
    2. how to assign the right method of extracting the value based on the keyword (nextWord or nextParagraph);
    3. how to associate the location in the xml document with the keyword.

    So it seems to be about the algorithm, but the solution might benefit from someone who knows the capabilities of .NET better than I do. I use VB.NET, but I don't think this is language specific, is it? I wonder if generics or delegates might help in this situation.
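
    One way to picture the three points above is a lookup table that maps each keyword to an extraction function (the role a delegate would play in .NET) and to the name of the XML element that receives the value. The sketch below is in Python only because the idea is not language specific. It assumes the document has already been read into a list of paragraph strings; the keywords, element names, and function names are all hypothetical, and the simple substring match would also fire on a keyword that happens to appear in body text.

    ```python
    import xml.etree.ElementTree as ET
    from typing import Callable, Dict, List, Optional, Tuple

    # An extractor receives all paragraphs, the index of the paragraph that
    # contains the keyword, and the keyword itself.
    Extractor = Callable[[List[str], int, str], Optional[str]]


    def next_word(paragraphs: List[str], index: int, keyword: str) -> Optional[str]:
        """Return the first word after the keyword within the same paragraph."""
        rest = paragraphs[index].split(keyword, 1)[1]
        words = rest.lstrip(" :\t").split()
        return words[0] if words else None


    def next_paragraph(paragraphs: List[str], index: int, keyword: str) -> Optional[str]:
        """Return the first non-empty paragraph after the keyword's paragraph."""
        for text in paragraphs[index + 1:]:
            if text.strip():
                return text.strip()
        return None


    # Hypothetical rules: keyword -> (how to extract the value, XML element name).
    RULES: Dict[str, Tuple[Extractor, str]] = {
        "Author": (next_word, "author"),
        "Summary": (next_paragraph, "summary"),
    }


    def extract(paragraphs: List[str],
                rules: Dict[str, Tuple[Extractor, str]] = RULES) -> ET.Element:
        """Scan the paragraphs in order and build an XML tree from matched keywords."""
        root = ET.Element("document")
        for i, text in enumerate(paragraphs):
            for keyword, (extractor, tag) in rules.items():
                if keyword in text:
                    value = extractor(paragraphs, i, keyword)
                    if value is not None:
                        ET.SubElement(root, tag).text = value
        return root


    if __name__ == "__main__":
        sample = ["Report", "Author: Smith", "Summary", "", "The pump failed twice."]
        print(ET.tostring(extract(sample), encoding="unicode"))
    ```

    Adding a new keyword then means adding one row to the table rather than writing another branch of code, and a document that lacks a keyword simply produces no element for it.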

    Who's best to ask?

     

    • Edited by em-squared September 8, 2010 6:39 Clarify writing

All replies

  • March 15, 2012 3:54
     
     
    Hello em-squared,
    Thank you for your post!
    Since the MSDN and TechNet forums are for IT professionals to post technical questions about development, testing, deployment, and so on, I would suggest posting your question in one of the forums at Microsoft Answers, which helps people troubleshoot problems with Windows, IE, Office, and other Microsoft products.
    Located here:
    I hope this is helpful.
  • March 16, 2012 0:02
     
     Proposed answer

    Thanks for the reply, SunYuanhong. I've found Microsoft Answers to be a good place to get answers about using the software you mentioned, but my question is technical: I was looking for a programming solution. For example, I recently discovered that when dealing with strings that represent a file path on a drive in the Windows OS, the System.IO.Path class makes it easy to pull out part of the path (e.g. the folder name), rather than writing your own text manipulation.
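
    To illustrate the kind of built-in helper meant here, the short sketch below uses Python's `pathlib.PureWindowsPath`, which plays a similar role to the .NET `System.IO.Path` methods such as `Path.GetDirectoryName`, `Path.GetFileName`, and `Path.GetExtension`. It is an analogue rather than the .NET API itself, and the file path is made up for the example.

    ```python
    from pathlib import PureWindowsPath

    # Hypothetical Windows path; PureWindowsPath parses it on any operating system.
    path = PureWindowsPath(r"C:\Reports\2010\summary.docx")

    folder_name = path.parent.name  # name of the containing folder
    file_name = path.name           # file name with extension
    stem = path.stem                # file name without extension
    extension = path.suffix         # extension, including the leading dot

    if __name__ == "__main__":
        print(folder_name, file_name, stem, extension)
    ```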

    This issue was resolved (I was under a deadline) using brute force: I wrote a sequence of code that dealt with each section in turn, in the standard order the sections should have appeared. It handled about 90% of the files appropriately, but the remainder had to be done manually. It has been a while, so I don't remember much else, as I no longer work there and cannot look at the files.

    Thanks,

  • March 16, 2012 9:07
    Moderator
     
     Answer

    You could ask this question in the "Word for Developers" forum at the following address:

    http://social.msdn.microsoft.com/Forums/en-us/worddev/threads

    Bye.


    Luigi Bruno - Microsoft Community Contributor 2011 Award

  • April 10, 2012 2:13
     
     

    Thanks, Luigi, for your reply.

    To Ed (moderator): I don't appreciate coming to this forum looking for answers and finding posts marked as answers when there is no answer. If there were an answer in the forum Luigi pointed to, and the reply linked straight to that thread, I would understand marking it as the answer, but I don't consider this one an answer. I suppose you are obligated to close old threads, and the only way to do that is to mark something as the answer. Tell you what: I'll post to the forum that Luigi mentioned, and if an answer is proposed there, I'll come back and link to it. But as I said in my recent post, the urgency has passed.

  • April 16, 2012 8:27
    Owner
     
     

    This is a "Where do I find" forum.

    The question is "where do I find x".

    The answer is "go ask here". That appears to be what Luigi did (gave you a link of where to go ask the question). This forum exists only for that purpose. Let us know if the forum Luigi sent you to is not the right forum.

    I always give the Asker/OP a full week to return after I propose an answer. That way the Asker can explain whether the attempt to answer the question (in this case, give a location to ask the question) was successful or not.

    Thanks!


    Ed Price (a.k.a User Ed), SQL Server Experience Program Manager (Blog, Twitter, Wiki)

  • April 17, 2012 11:56
     
     

    Ed: big humble apologies! You are absolutely right. It had been a while since I posted, and I forgot where I was. Thanks for correcting me. Kudos to Luigi. And I am getting some nibbles from the people in that forum.

    Thanks!

  • April 17, 2012 23:56
    Owner
     
     

    Great! And thank you.

    Is it okay to mark Luigi's answer, then, or do you need another forum?

    Thanks!


    Ed Price (a.k.a User Ed), SQL Server Experience Program Manager (Blog, Twitter, Wiki)