Loading html into an object

I need a way to convert a full web page I have in a string variable into an object like the IE Dom so I can get values out of cells in the tables on the web page.

I am using VS 2005 for Software Testers (VSST). What I am testing is dynamic, and I can't get away with direct record and playback (as if anyone ever can). What I need to do is post information to the web server and then pull out of the response some key information that is needed to create the next post.

To post to the web server the test software uses the WebTestRequest object. To get data out of the response you use an Extraction Rule. The test developer (me) can create custom extraction rules if needed. One of the attributes of the response object is the BodyString. That BodyString contains, get this, the entire body text of the response HTML page, exactly what you would see if you viewed the source from IE. I would like to take that string and convert it into an object just like the IE DOM. That would allow me to locate the tables collection and a table object in that collection. Then I could get a row object in the rows collection for that table, and from there the values in the cells in each row.

Specifically:
To open the edit page for one of the items listed in a table, I need to click on the edit button for that row. All of the edit buttons have the same visible text. To uniquely identify them I have to search through the Description column in the table, get the name of that control, parse out the unique part, then construct the name of the edit button that goes with that description so it can be sent with the next post.

What I need is to be pointed in the right direction. I'll do the digging and figure out how to get things working but I need help figuring out what I need to be working on.

Thanks,

MArk B.





  • chaillom

    In short, you just want the content of a web page? Here is a little example:


    // Requires: using System.IO; using System.Net;
    string url = "http://www.microsoft.com";

    // Each using block disposes its object, so no explicit Close() is needed.
    using(WebClient client = new WebClient())
    using(Stream responseStream = client.OpenRead( url ))
    using(StreamReader reader = new StreamReader(responseStream))
    {
        string contentAsString = reader.ReadToEnd();

        // TODO: Add logic here.
    }

     



  • Woodyone

    Shakalama, thanks for your reply. I tried it after you suggested the solution, and I didn't encounter any problems.

    I think the image tag will be the hardest. I see a lot of <img src="shakalama.gif"> tags that are never closed, and the XML reader will throw a format exception on them.

    But badly formed HTML like this will also be hard to tokenize with Regex or other string techniques.


  • kiranin

    I finally figured out what I needed to do. Code below.

    Add a reference to SHDocVw (and to mshtml, which the code also uses) and a using for it.

    This is the code I'm using:

    if (e.Response.HtmlDocument != null)
    {
        // Get the browser object going.
        SHDocVw.InternetExplorer internetExplorer = new SHDocVw.InternetExplorerClass();
        SHDocVw.IWebBrowser2 webBrowser = (SHDocVw.IWebBrowser2)internetExplorer;

        // Make the web browser visible.
        // This should probably be false for test execution but is true for test dev.
        webBrowser.Visible = true;

        // Display empty page so we have something to manipulate.
        object noValue = System.Reflection.Missing.Value;

        // Navigate somewhere, anywhere, just get there.
        // Not sure why this is done. Was part of the original example.
        webBrowser.Navigate("about:blank", ref noValue, ref noValue, ref noValue, ref noValue);

        // Get the html document object of the browser.
        mshtml.IHTMLDocument2 htmlDoc = internetExplorer.Document as mshtml.IHTMLDocument2;

        // Inject the HTML into the browser.
        htmlDoc.writeln(e.Response.BodyString.ToString());

        // TODO: This is where the logic goes that pulls out what we need to send back

        // Close the document.
        htmlDoc.close();

        // Shut down the browser instance.
        webBrowser.Quit();

        e.Success = true;
        return;
    }



  • ketparm

    Hello,

    As PJ. van de Sande said, that gets the information from the web site. That way you will receive the content of the page.

    Parsing or extracting information is the second part: how to extract information from the web page's contents.

    All HTML pages are basically XML documents, so you can load the entire page into an XML document and then search it for the required information.

    Another way of extracting information is to use regular expressions. Just have a look at these threads:

    http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=283380&SiteID=1

    http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=284501&SiteID=1
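
    To make the regular expression route concrete for the original question (find a row by its description, then build the name of its edit button), here is a minimal sketch. The naming scheme (`grid$ctlNN$txtDescription` and `grid$ctlNN$btnEdit`), the class name, and the method name are all hypothetical; the real page will use its own names. It also assumes the name attribute appears before the value attribute inside the input tag.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    public static class EditButtonLocator
    {
        // Hypothetical naming scheme: each row has an input named
        // "grid$ctlNN$txtDescription" and a button named "grid$ctlNN$btnEdit".
        private static readonly Regex DescriptionInput = new Regex(
            "<input[^>]*\\bname=\"(?<prefix>[^\"]*)\\$txtDescription\"[^>]*\\bvalue=\"(?<value>[^\"]*)\"",
            RegexOptions.IgnoreCase);

        // Returns the edit button name for the row whose description matches,
        // or null when no row matches.
        public static string FindEditButtonName(string bodyString, string description)
        {
            foreach (Match m in DescriptionInput.Matches(bodyString))
            {
                if (m.Groups["value"].Value == description)
                {
                    // Reuse the row-specific prefix for the button name.
                    return m.Groups["prefix"].Value + "$btnEdit";
                }
            }
            return null;
        }
    }
    ```

    Inside a custom extraction rule this could be called with e.Response.BodyString as the first argument. A regex like this is fragile against changes in attribute order or quoting, so it is only reasonable when you control the page being tested.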



  • tpack

    Thanks PJ but I don't need to do a web request because that has already been done.

    >I need a way to convert a full web page I have in a string variable into an object like the IE Dom.

    MArk B.



  • N3vik

    Akbar, a really good suggestion! HTML can just be read with an XML reader.

    Another easy way to get specific content is to use Regex or XPath (when you are using an XML reader).
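
    As a rough illustration of the XPath route, the sketch below loads markup into an XmlDocument and reads one column of the first table. The class and method names are made up for this example, and it only works when the markup is well-formed XML and the elements are not in a namespace (XHTML with an xmlns declaration would need an XmlNamespaceManager).

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Xml;

    public static class TableCellReader
    {
        // Returns the text of every cell in the given column (1-based) of the
        // first table in the markup. The markup must be well-formed XML.
        public static List<string> ReadColumn(string xhtml, int column)
        {
            XmlDocument doc = new XmlDocument();
            doc.LoadXml(xhtml); // throws XmlException on malformed input

            List<string> values = new List<string>();
            XmlNodeList cells = doc.SelectNodes("(//table)[1]//tr/td[" + column + "]");
            foreach (XmlNode cell in cells)
            {
                values.Add(cell.InnerText.Trim());
            }
            return values;
        }
    }
    ```

    For a table whose rows are `<tr><td>1</td><td>Widget A</td></tr>` and `<tr><td>2</td><td>Widget B</td></tr>`, ReadColumn(markup, 2) returns the two descriptions in document order.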


  • bradbury9

    hi,

    just one warning: 2 days ago I thought about using an HTML page as an XmlDocument, but I ran into some problems.

    HTML doesn't force you to close your tags, so <td> without </td> is accepted. It is not case sensitive like XML, so <TD></td> works in HTML but not in XML. It also doesn't enforce the order of opening and closing tags, so <td><p></td></p> works in HTML but not in XML.

    The wonderful part was that this page full of incomplete tags was produced by IE's export favorites.

    So you can do that only if you are very sure the page you will read is well formed.

    best regards
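
    The three cases above are easy to check before committing to the XML approach. This is a small sketch (the class and method names are invented for illustration) that reports whether XmlDocument will accept a piece of markup:

    ```csharp
    using System;
    using System.Xml;

    public static class WellFormedCheck
    {
        // Returns true when the markup parses as XML, false when the XML
        // parser rejects it.
        public static bool IsWellFormedXml(string markup)
        {
            try
            {
                new XmlDocument().LoadXml(markup);
                return true;
            }
            catch (XmlException)
            {
                return false;
            }
        }
    }
    ```

    It returns true for `<table><tr><td>a</td></tr></table>` and false for `<tr><td>a</tr>`, `<TD>a</td>`, and `<td><p>a</td></p>`, all of which a browser would still render.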



  • KrishnaSrikanth

    HTML is not XML; it is SGML-based (it does not require end tags). At times, you are going to have validation issues using the XML DOM. Take a look at http://www.gotdotnet.com/Community/UserSamples/Details.aspx?SampleGuid=b90fddce-e60d-43f8-a5c4-c3bd760564bc; it is a great library and will convert HTML to valid XML.


  • RyanE - Windows SDK Docs

    I don't understand your code; you don't specify a URL anywhere. If you want to make a request to that URL and read the response, you can use the HttpWebRequest class to create your own detailed request. The WebClient class is just a facade over the HttpWebRequest and WebResponse classes.

    Here is the same example I wrote before, but without the WebClient class, using your own detailed HttpWebRequest instead:


    // Requires: using System.IO; using System.Net;
    string url = "http://www.microsoft.com";

    // WebRequest.Create returns a WebRequest, so cast it to HttpWebRequest.
    // The request is not IDisposable; only the response needs disposing.
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create( url );
    request.Proxy = WebProxy.GetDefaultProxy();

    using( WebResponse response = request.GetResponse() )
    using( Stream responseStream = response.GetResponseStream() )
    using( StreamReader reader = new StreamReader( responseStream ) )
    {
        string contentAsString = reader.ReadToEnd();

        // TODO: Add logic here.
    }




  • ug751

    Puck, thanks for asking that question. PJ, thanks for answering the question. It works!

    But how do I send my request to the web site?
    I use the one below for Web Services, but I always send my message to the same place.

    private void button2_Click(object sender, System.EventArgs e)
    {
        // Submit Message
        WebReference.MessageService sendMessage = new WebReference.MessageService();
        sendMessage.textBoxMsg(textBox1.Text);
    }

    How do I address my call to the URL below?

    http://registry.faa.gov/aircraftinquiry/NNumSQL.asp?NNumbertxt=12345&cmndfind.x=12&cmndfind.y=8

    Lazy J
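
    One way to address a URL like that is to treat everything after NNumSQL.asp as a query string and send a plain GET. The sketch below is only an illustration: the QueryUrl class and its method names are invented here, and it assumes the page accepts a GET with those parameters.

    ```csharp
    using System;
    using System.Net;
    using System.Text;

    public static class QueryUrl
    {
        // Appends name/value pairs to a base URL as an escaped query string.
        // The pairs array alternates name, value, name, value, ...
        public static string Build(string baseUrl, params string[] pairs)
        {
            StringBuilder url = new StringBuilder(baseUrl);
            for (int i = 0; i + 1 < pairs.Length; i += 2)
            {
                url.Append(i == 0 ? '?' : '&');
                url.Append(Uri.EscapeDataString(pairs[i]));
                url.Append('=');
                url.Append(Uri.EscapeDataString(pairs[i + 1]));
            }
            return url.ToString();
        }

        // Performs a GET on the given URL and returns the response body.
        public static string Fetch(string url)
        {
            using (WebClient client = new WebClient())
            {
                return client.DownloadString(url);
            }
        }
    }
    ```

    Calling QueryUrl.Build("http://registry.faa.gov/aircraftinquiry/NNumSQL.asp", "NNumbertxt", textBox1.Text, "cmndfind.x", "12", "cmndfind.y", "8") builds the address from the text box value, and QueryUrl.Fetch on the result returns the page as a string.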


  • Amazon Mother

    PJ. van de Sande wrote:
    Shakalama, thanks for your reply. I tried it after you suggested the solution, and I didn't encounter any problems.

    I think the image tag will be the hardest. I see a lot of <img src="shakalama.gif"> tags that are never closed, and the XML reader will throw a format exception on them.

    But badly formed HTML like this will also be hard to tokenize with Regex or other string techniques.

    PJ

    I say it's a good way, but if I use it I have to be careful, because HTML is not always well formed. So I'll use it only with trusted pages that I know, not as a general way to retrieve the contents of HTML pages. It will need some tests first with the pages that I'm going to load into an XmlDocument. That is just what I meant.

    best regards


