Find Table Cell Value in PDF reader with iTextSharp

  • Question

  • In my code I need to read the content of a PDF file and, based on some specific requirements, insert that content into a SQL Server database.
    I am using the PdfReader class of iTextSharp version 5.5.13.1, which I installed through Manage NuGet Packages in Visual Studio.

    It reads well when a line of text spans the whole page. The problem comes when the PDF contains a table.

    It first goes into column 1 and reads one line, then jumps into column 2 and reads one line, and so on.
    The problem is that column 1 and column 2 each contain paragraphs. The reader breaks those paragraphs into separate single lines that have no meaning on their own.

    I want it to go to column 1 and read a whole paragraph, and if it finds a new paragraph after a newline, read that paragraph next.
    Only after processing all of column 1 should it jump to column 2.

    Currently I am using the code below:

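    // Requires: using System.Text; using iTextSharp.text.pdf; using iTextSharp.text.pdf.parser;
    PdfReader reader = new PdfReader(@"D:\pdf1.pdf");
    int PageNum = reader.NumberOfPages;

    StringBuilder text = new StringBuilder(); // accumulates the text of all pages
    string[] sentence;

    for (int i = 1; i <= PageNum; i++)
    {
        // A fresh strategy per page, so text from earlier pages is not carried over.
        ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
        string currentText = PdfTextExtractor.GetTextFromPage(reader, i, strategy);

        // GetTextFromPage already returns a .NET string, so no re-encoding is needed;
        // a round trip through Encoding.Default can lose characters outside that code page.
        text.Append(currentText);
    }
    reader.Close();

    // Split once, after all pages have been read.
    sentence = text.ToString().Split('\n');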

    Please help me get the table column values.


    Wednesday, July 3, 2019 8:49 AM

All replies

  • Try using LocationTextExtractionStrategy instead of SimpleTextExtractionStrategy:

    ITextExtractionStrategy strategy = new iTextSharp.text.pdf.parser.LocationTextExtractionStrategy();
    string currentText = PdfTextExtractor.GetTextFromPage(reader, i, strategy);
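
    If the lines of the two columns still come out interleaved, one option is to extract each column separately by limiting the extraction to a rectangle on the page. The following is only a sketch: it assumes exactly two columns that split at the horizontal midpoint of the page, which may not match the real table, so the rectangles would need adjusting. The class name ColumnTextExample and the helper GetTextInRegion are made up for illustration; RegionTextRenderFilter and FilteredTextRenderListener are iTextSharp 5 classes in iTextSharp.text.pdf.parser.

    ```csharp
    using System;
    using iTextSharp.text;
    using iTextSharp.text.pdf;
    using iTextSharp.text.pdf.parser;

    class ColumnTextExample
    {
        // Hypothetical helper: returns only the text located inside the given region of a page.
        static string GetTextInRegion(PdfReader reader, int pageNumber, Rectangle region)
        {
            RenderFilter filter = new RegionTextRenderFilter(region);
            ITextExtractionStrategy strategy = new FilteredTextRenderListener(
                new LocationTextExtractionStrategy(), filter);
            return PdfTextExtractor.GetTextFromPage(reader, pageNumber, strategy);
        }

        static void Main()
        {
            PdfReader reader = new PdfReader(@"D:\pdf1.pdf");
            try
            {
                for (int i = 1; i <= reader.NumberOfPages; i++)
                {
                    Rectangle page = reader.GetPageSize(i);

                    // ASSUMPTION: two columns, divided at the middle of the page.
                    float middle = (page.Left + page.Right) / 2;
                    Rectangle column1 = new Rectangle(page.Left, page.Bottom, middle, page.Top);
                    Rectangle column2 = new Rectangle(middle, page.Bottom, page.Right, page.Top);

                    // All of column 1 is read before any of column 2.
                    string column1Text = GetTextInRegion(reader, i, column1);
                    string column2Text = GetTextInRegion(reader, i, column2);

                    Console.WriteLine(column1Text);
                    Console.WriteLine(column2Text);
                }
            }
            finally
            {
                reader.Close();
            }
        }
    }
    ```

    Text that crosses the assumed boundary may show up in both results, so it is worth checking the output against the actual column positions in the PDF.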

    Hope this helps!

    Wednesday, July 3, 2019 10:09 AM
  • Hi Udai Mathur, 

    Thank you for posting here.

    Unfortunately, iTextSharp is a third-party library for which we don't provide help, so I suggest you post in their forums.

    The CLR Forum is for discussing and asking questions about the .NET Framework Base Class Library (BCL), such as Collections, I/O, Registry, Globalization, and Reflection. It also covers the other Microsoft libraries that are built on or extend the .NET Framework, including Managed Extensibility Framework (MEF), Charting Controls, CardSpace, Windows Identity Foundation (WIF), Point of Sale (POS), and Transactions.

    Thank you for your understanding.

    Best Regards,

    Xingyu Zhao


    MSDN Community Support
    Please remember to click "Mark as Answer" on the responses that resolved your issue, and to click "Unmark as Answer" if they did not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Thursday, July 4, 2019 9:24 AM