Web Scraping with POSTs

  • Question

  • What's the best forum in which to post the following ...

    I've done quite a bit of web scraping using VB.Net code like this ...

     wrq = WebRequest.Create(rtbaux.URL)
     wrp = DirectCast(wrq.GetResponse(), HttpWebResponse)

    where:

    Dim wrq As WebRequest
    Dim wrp As HttpWebResponse

    But I have always been able to get what I want by adjusting the URL. Now I have to deal with a web site that seems to be using POST. The site provides several pages of data, and to move to a page you click on a page number. But the URL never changes, which is what leads me to think that the page is using POST. Also, I find this code in the HTML ...

    <td><a href="javascript:__doPostBack('abc','Page$2')" style="color:White;">2</a></td>

    and ...

    <script type="text/javascript">
    //<![CDATA[
    var theForm = document.forms['frmGMShippingInfo'];
    if (!theForm) {
        theForm = document.frmGMShippingInfo;
    }
    function __doPostBack(eventTarget, eventArgument) {
        if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
            theForm.__EVENTTARGET.value = eventTarget;
            theForm.__EVENTARGUMENT.value = eventArgument;
            theForm.submit();
        }
    }
    //]]>
    </script>

    I think that with the available documentation and samples I can get through this mostly on my own. But I am unclear about how to determine what I need to POST (send). Can I figure it all out from the HTML, or will I need a packet sniffer? (I've tried two packet sniffers; one refused to run because it said I was missing a DLL that I already have, and the other crashed IE! Unfortunately I can only afford the free ones.)

    I've also tried MS Network Monitor 3.4, but I don't think I have enough networking/IP knowledge to understand its output. It doesn't seem to report a POST when I click on a page number.

    I don't speak JavaScript, so even if the JavaScript above tells me everything I need to know, I wouldn't know how to translate it into a WebRequest POST.

    Where can I find some help with this?

    Thanks,  Bob
    Thursday, April 21, 2011 3:20 AM

Answers

All replies
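
The __doPostBack script quoted in the question only does three things: it stores its first argument in the hidden field __EVENTTARGET, stores its second argument in __EVENTARGUMENT, and submits the form. So the POST body is normally just the form's own fields, with those two values filled in, and most of it can be read straight out of the HTML. Below is a minimal sketch of that idea in VB.Net, not a tested solution for this particular site. The module name and the three helper functions are hypothetical names made up for illustration. It assumes .NET 4.5 or later (for WebUtility.UrlEncode), that the hidden inputs are rendered with type, name, and value in that order, and that the page needs nothing beyond its hidden fields (visible text boxes or drop-downs would have to be added to the dictionary by hand).

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports System.Net
Imports System.Text
Imports System.Text.RegularExpressions

Module PostBackSketch   ' hypothetical name, for illustration only

    ' Collects the hidden <input> fields (for example __VIEWSTATE) from the page HTML.
    ' Assumption: each tag lists type, then name, then value, in that order.
    Function ReadHiddenFields(html As String) As Dictionary(Of String, String)
        Dim fields As New Dictionary(Of String, String)
        Dim pattern As String =
            "<input[^>]*type=""hidden""[^>]*name=""([^""]*)""[^>]*value=""([^""]*)"""
        For Each m As Match In Regex.Matches(html, pattern, RegexOptions.IgnoreCase)
            fields(m.Groups(1).Value) = WebUtility.HtmlDecode(m.Groups(2).Value)
        Next
        Return fields
    End Function

    ' Does what __doPostBack does: sets the two event fields (modifying the
    ' dictionary passed in), then URL-encodes every field into a form body.
    Function BuildPostBody(fields As Dictionary(Of String, String),
                           eventTarget As String,
                           eventArgument As String) As String
        fields("__EVENTTARGET") = eventTarget
        fields("__EVENTARGUMENT") = eventArgument
        Dim parts As New List(Of String)
        For Each pair As KeyValuePair(Of String, String) In fields
            parts.Add(WebUtility.UrlEncode(pair.Key) & "=" & WebUtility.UrlEncode(pair.Value))
        Next
        Return String.Join("&", parts)
    End Function

    ' Sends the body as a form POST and returns the response HTML.
    ' Pass the same CookieContainer that was used for the initial GET,
    ' in case the site tracks a session.
    Function PostPage(url As String, body As String, cookies As CookieContainer) As String
        Dim wrq As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        wrq.Method = "POST"
        wrq.ContentType = "application/x-www-form-urlencoded"
        wrq.CookieContainer = cookies
        Dim data As Byte() = Encoding.UTF8.GetBytes(body)
        wrq.ContentLength = data.Length
        Using s As Stream = wrq.GetRequestStream()
            s.Write(data, 0, data.Length)
        End Using
        Using wrp As HttpWebResponse = DirectCast(wrq.GetResponse(), HttpWebResponse)
            Using reader As New StreamReader(wrp.GetResponseStream())
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function

End Module
```

The intended flow would be: fetch the page once with an ordinary GET, pass its HTML to ReadHiddenFields, call BuildPostBody with "abc" and "Page$2" (the two arguments from the link quoted in the question), and hand the result to PostPage with the same URL. Because __VIEWSTATE usually changes with every response, the hidden fields should be re-read from each returned page before requesting the next one. If the server rejects the request, comparing against what the browser actually sends is still the most reliable check.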