Extracting HTML strings from HTML source

  • Question

  • Please help!

    I've got this HTML source:

                                                            

    <!DOCTYPE html>
    <html itemscope itemtype="http://schema.org/QAPage">
    <head>
    <tr>
                <td class="votecell">
    <div class="vote">
            <input type="hidden" name="_id_" value="42472322">
            <a class="vote-up-off" title="This question shows research effort; it is useful and clear">up vote</a>
            <span itemprop="upvoteCount" class="vote-count-post ">0</span>
            <a class="vote-down-off" title="This question does not show any research effort; it is unclear or not useful">down vote</a>
            <a class="star-off" href="#">favorite</a>
            <div class="favoritecount"><b></b></div>
    </div>
                </td>
    <td class="postcell">
    <div>
        <div class="post-text" itemprop="text">
    <blockquote>
      <p><strong>I want to find out the chatID or username of the the new user who joins my telegram channel where my telegram bot is one of its
      admins.</strong></p>
    </blockquote>
    <p>Also I want to know whether I can get the users list of my channel using my bot admin or not?</p>
    <p>Let say that I'm using NetTelegramBotApi in C#, I have tried the below code but didn't worked:</p> <pre class="lang-cs prettyprint prettyprinted"><code><span class="kwd">if</span><span class="pln"> </span><span class="pun">(</span><span class="pln">update</span><span class="pun">.</span><span class="typ">ChannelPost</span><span class="pun">.</span><span class="typ">NewChatMember</span><span class="pln"> </span><span class="pun">!=</span><span class="pln"> </span><span class="kwd">null</span><span class="pun">)</span><span class="pln">
    </span><span class="pun">{</span><span class="pln">
    </span><span class="typ">Console</span><span class="pun">.</span><span class="typ">WriteLine</span><span class="pun">(</span><span class="pln">update</span><span class="pun">.</span><span class="typ">ChannelPost</span><span class="pun">.</span><span class="typ">NewChatMember</span><span class="pun">.</span><span class="typ">Id</span><span class="pun">.</span><span class="typ">ToString</span><span class="pun">());</span><span class="pln">
    </span><span class="kwd">continue</span><span class="pun">;</span><span class="pln">
    </span><span class="pun">}</span></code></pre>
        </div>
        <div class="post-taglist">
            <a href="/questions/tagged/c%23" class="post-tag js-gps-track" title="show questions tagged 'c#'" rel="tag">c#</a> <a href="/questions/tagged/console-application" class="post-tag js-gps-track" title="show questions tagged 'console-application'" rel="tag">console-application</a> <a href="/questions/tagged/telegram" class="post-tag js-gps-track" title="show questions tagged 'telegram'" rel="tag">telegram</a> <a href="/questions/tagged/telegram-bot" class="post-tag js-gps-track" title="" rel="tag">telegram-bot</a> 
        </div>
        <table class="fw">
        <tbody><tr>
        <td class="vt">
    <div class="post-menu"><a href="/q/42472322" title="short permalink to this question" class="short-link" id="link-post-42472322">share</a><span class="lsep">|</span><a href="/posts/42472322/edit" class="suggest-edit-post" title="">improve this question</a></div>        
        </td>
        <td align="right" class="post-signature">
    <div class="user-info ">
        <div class="user-action-time">
            <a href="/posts/42472322/revisions" title="show all edits to this post">edited <span title="2017-02-26 19:27:54Z" class="relativetime">24 mins ago</span></a>
        </div>
        <div class="user-gravatar32">

        </div>
        <div class="user-details">

            <div class="-flair">

            </div>
        </div>
    </div>    </td>
        <td class="post-signature owner">
            <div class="user-info ">
        <div class="user-action-time">
            asked <span title="2017-02-26 18:17:55Z" class="relativetime">1 hour ago</span>
        </div>
        <div class="user-gravatar32">
            <a href="/users/3929525/naser-sadeghi"><div class="gravatar-wrapper-32"><img src="https://i.stack.imgur.com/4poyn.jpg?s=32&amp;g=1" alt="" width="32" height="32"></div></a>
        </div>
        <div class="user-details">
            <a href="/users/3929525/naser-sadeghi">Naser.Sadeghi</a>
            <div class="-flair">
                <span class="reputation-score" title="reputation score " dir="ltr">8</span><span title="4 bronze badges"><span class="badge3"></span><span class="badgecount">4</span></span>
            </div>
        </div>
    </div>
        </td>
        </tr>
        </tbody></table>
    </div>
    </td>
            </tr>
    </body>
    </html>

    What I need is to extract only the code located between <pre> and </pre> in this source and save it in another HTML page. I want the solution in C#. I've tried HtmlAgilityPack several times but got no useful result. Could anyone please help? I would be very grateful. This is my latest attempt in C#:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using HtmlAgilityPack;
    using System.Web;
    using System.IO;
    using System.Xml;
    using System.Xml.XPath;

    namespace ConsoleApplication41
    {
        class Program
        {
       
            static void Main(string[] args)
            {
                HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
                htmlDoc.OptionFixNestedTags = true;
                //   htmlDoc.Load("file://F:/17.txt.html");

                htmlDoc.LoadHtml("file://F:/17.txt.html");

                if (htmlDoc.DocumentNode != null)
                {
                    HtmlAgilityPack.HtmlNode codeNode =
                        htmlDoc.DocumentNode.SelectSingleNode("//code");
                    htmlDoc.Save("F:/test/file-1.txt");
                }
            }
        }
    }
                    


    Wednesday, March 29, 2017 9:53 AM

Answers

  • I'd try asking over here.

    https://social.msdn.microsoft.com/Forums/vstudio/en-US/home?forum=csharpgeneral

    https://forums.asp.net/37.aspx/1?C+

     

     



    Regards, Dave Patrick ....
    Microsoft Certified Professional
    Microsoft MVP [Windows Server] Datacenter Management

    Disclaimer: This posting is provided "AS IS" with no warranties or guarantees, and confers no rights.

    • Proposed as answer by Just Karl Thursday, March 30, 2017 5:23 PM
    • Marked as answer by Dave Patrick (MVP) Saturday, April 8, 2017 1:15 PM
    Wednesday, March 29, 2017 12:52 PM
  • Good day Wael Alsabbagh,

    * I think the best option for questions like this is to ask in the forum for the relevant language, which is C# in your case; Dave already gave you the link.

    For this specific question, there are two common options for parsing the text:

    1. You can treat the text as XML, but only if the markup is well-formed (XHTML). HTML in general is not valid XML, and the source above has unclosed tags such as <input> and <img>, so an XML parser would reject it as-is. For well-formed input, LINQ to XML is probably the simplest way, and XmlReader is another option. There are other classes you can use for the task, since parsing XML is a very common need.

    2. You can parse the HTML as plain text, ignoring the fact that it is HTML. In this case you can use a regular expression to search for the text between the opening <pre> tag (which carries attributes in your source) and "</pre>". The Regex.Matches(String) method searches the specified input string for all occurrences of the regular expression.
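
The second option above could be sketched roughly as follows. This is a minimal illustration, not a robust HTML parser: the class name PreExtractor and the method name ExtractPreText are made up for this example, and the pattern assumes that <pre> blocks are not nested and that no attribute value contains a ">" character. It uses only System.Text.RegularExpressions and System.Net from the .NET base class library.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; the names are not from the original post.
static class PreExtractor
{
    // Matches <pre ...> ... </pre>. Singleline lets '.' span line breaks,
    // and the lazy quantifier stops at the first closing tag.
    static readonly Regex PreBlock = new Regex(
        @"<pre\b[^>]*>(.*?)</pre>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Matches any remaining tag, such as the <code> and <span> wrappers
    // that syntax highlighting adds inside the <pre> block.
    static readonly Regex AnyTag = new Regex(@"<[^>]+>");

    // Returns the plain text of every <pre> block found in the input.
    public static List<string> ExtractPreText(string html)
    {
        var results = new List<string>();
        foreach (Match m in PreBlock.Matches(html))
        {
            string inner = AnyTag.Replace(m.Groups[1].Value, "");
            // Turn entities such as &lt; and &amp; back into characters.
            results.Add(WebUtility.HtmlDecode(inner));
        }
        return results;
    }

    static void Main()
    {
        string html =
            "<p>x</p><pre class=\"lang-cs\"><code>" +
            "<span class=\"kwd\">if</span> (a &lt; b)</code></pre>";

        foreach (string code in ExtractPreText(html))
            Console.WriteLine(code);
    }
}
```

If the markup of the block should be kept (for saving into another HTML page), m.Value can be used instead of stripping the tags. For anything beyond simple, predictable input, a real HTML parser is usually the safer choice.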

    I hope this solves your question :-)


    Ronen Ariely

    • Proposed as answer by Just Karl Thursday, March 30, 2017 5:23 PM
    • Marked as answer by Dave Patrick (MVP) Saturday, April 8, 2017 1:15 PM
    Thursday, March 30, 2017 4:56 AM
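
For completeness, here is a hedged sketch of how the HtmlAgilityPack attempt from the question might be made to work. Two things in that attempt look like the cause of the "no useful result": LoadHtml expects the markup itself rather than a file path (Load takes a path), and Save writes the whole document rather than the selected node. The class name CodeExtractor, the method name ExtractFirstPre, and the output file name file-1.html are assumptions for this example; the input path is the one from the question. HtmlAgilityPack is a NuGet package, not part of the .NET base class library.

```csharp
using System;
using System.IO;
using HtmlAgilityPack; // NuGet package

// Hypothetical helper for illustration; the names are not from the original post.
static class CodeExtractor
{
    // Returns the markup of the first <pre> element that has a <code> child,
    // or null when the document contains none.
    public static string ExtractFirstPre(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // LoadHtml parses markup; Load(path) reads a file
        HtmlNode pre = doc.DocumentNode.SelectSingleNode("//pre[code]");
        return pre?.OuterHtml;
    }

    static void Main()
    {
        // Input path from the question; the output name is an assumption.
        string html = File.ReadAllText(@"F:\17.txt.html");
        string pre = ExtractFirstPre(html);
        if (pre == null)
        {
            Console.WriteLine("No <pre><code> block found.");
            return;
        }

        // Wrap the extracted block in a minimal page and save it.
        File.WriteAllText(@"F:\test\file-1.html",
            "<!DOCTYPE html>\n<html><body>\n" + pre + "\n</body></html>");
    }
}
```

To get the code as plain text instead of markup, the node's InnerText can be passed through HtmlEntity.DeEntitize, which converts entities such as &lt; back into characters.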
