How to examine HTML from a web page?

I'm a bit perplexed by this. The WebBrowser.DocumentText property returns only about 4096 characters; the rest of the page is lopped off when I assign it to a string. Is there some way to get at the full document in VB 2005 Express? (WebBrowser here is a WebBrowser control.)

 

Thanks,

 

Ben

 

Here's the failing code:

Private Sub FGSADiscountAutomator_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
    Dim WebPage As String

    If CBoxAutoUpdate.Text = "On Startup" Then
        WebBrowser.Navigate(New Uri("http://www.gsa.gov/Portal/gsa/ep/home.do tabId=2"))

        Do Until (WebBrowser.DocumentText <> "")
            Application.DoEvents()
        Loop

        WebPage = WebBrowser.DocumentText.ToString
        WebPage = WebPage & WebBrowser.DocumentText

        MessageBox.Show(InStr(WebBrowser.DocumentText.ToString, "firstlevel", CompareMethod.Text) & " " & WebPage.Length)

        FindCatagories(WebPage)
    End If
End Sub




  • David Collie

     

    Actually, this is a very poor technical solution to the problem: although it may work, it loops continuously.

    There is a correct way to do this.

    In the DocumentCompleted event, the document will be complete without the need to append. That's what this event is for.

    Public Class Form1

        ' Member variable: accessible from every routine in this class.
        Protected WebPage As String

        Private Sub Form1_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
        End Sub

        Private Sub cbGo_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles cbGo.Click
            ' Start loading the page; the result is picked up in DocumentCompleted.
            WebBrowser1.Navigate("http://www.gsa.gov/Portal/gsa/ep/home.do tabId=2")
        End Sub

        Private Sub WebBrowser1_DocumentCompleted(ByVal sender As Object, ByVal e As System.Windows.Forms.WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
            ' Raised when the WebBrowser control finishes loading a document.
            WebPage = WebBrowser1.DocumentText.ToString
            Dim a As Integer = WebPage.Length
        End Sub

    End Class

    Your solution is one I've seen in VB6; it does not take advantage of asynchronous processing and events. Looping until the document is read consumes most of the processor in that loop.

    It is a really poor solution even though it works.



  • Moshik

    I'm not too worried about the web browser scope. What I'm confused about is the string scope. I'm parsing HTML, so I need to be able to modify it somewhere and get rid of the parts not used. Passing the string to and from event handlers seems to be a possible solution. Or at least calling functions and subroutines from the event handler might work.
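
    The idea of calling a subroutine from the event handler might look roughly like the sketch below. It is only an illustration under stated assumptions: the class name PageReaderForm, the field Browser, and the body of FindCatagories (trimming everything before the first "firstlevel" marker) are hypothetical and not taken from the original project.

    ```vbnet
    Imports System
    Imports System.Windows.Forms

    Public Class PageReaderForm
        Inherits Form

        ' Hypothetical names: Browser and WebPage are class-level fields,
        ' so every routine in this class can read and write them.
        Private WithEvents Browser As New WebBrowser()
        Private WebPage As String = ""

        Public Sub New()
            Browser.Dock = DockStyle.Fill
            Controls.Add(Browser)
        End Sub

        Private Sub Browser_DocumentCompleted(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs) Handles Browser.DocumentCompleted
            ' Capture the HTML, then hand it to an ordinary subroutine.
            WebPage = Browser.DocumentText
            FindCatagories(WebPage)
        End Sub

        ' Placeholder parser (assumption): keeps only the text from the
        ' first "firstlevel" marker onward and stores it back in WebPage.
        Private Sub FindCatagories(ByVal html As String)
            Dim start As Integer = html.IndexOf("firstlevel", StringComparison.OrdinalIgnoreCase)
            If start >= 0 Then
                WebPage = html.Substring(start)
            End If
        End Sub
    End Class
    ```

    Because WebPage is a field rather than a local variable, nothing has to be passed back out of the handler; the parsing routine can simply overwrite it.
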
  • custom gljaber

    Ok. Thanks.

    It seems like I'm going to need to read up on VB 2005. Any good book suggestions or learning resources?


  • vinod123

    I was playing around with the idea of using event handlers to solve the problem, but I was confused about how to pass data between the handler and my other functions and subroutines.

    This is my first VB 2005 programming attempt, and I am coming from VB6. Thank you for the constructive criticism.


  • gunnarD

    Alright, I found a solution for anyone that might be interested. The web browser wasn't able to pull the document in fast enough, so only the first 4096 characters had come through by the time the Do Until condition became true and the loop exited.

    By checking the WebBrowser's ReadyState property, VB was able to consistently wait until the document had finished loading before moving on to parsing the data.

     

    Do Until (WebBrowser.ReadyState = WebBrowserReadyState.Complete)
        Application.DoEvents()
    Loop


  • Kean

    Ben,

    Where I placed that string makes it accessible, i.e. readable and writable, from every routine in that class.

    Any time that class is reinstantiated, that string will be too.

    If you want a single string that's never reinstantiated, declare it Shared, or put it in a module and declare it Public.

    But as a member variable, that string is usable anywhere in that class without passing it.
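
    The three placements described above could be sketched as follows. The names LastWebPage, PageStore and CachedPage are made up for illustration; only WebPage comes from the earlier example.

    ```vbnet
    Public Class Form1
        ' Instance member: each Form1 instance gets its own copy,
        ' created again whenever a new Form1 is instantiated.
        Protected WebPage As String

        ' Shared member (hypothetical name): one copy for the whole
        ' application, independent of how many Form1 instances exist.
        Public Shared LastWebPage As String
    End Class

    Module PageStore
        ' Public module-level variable (hypothetical name): also a single
        ' copy, reachable from any code in the project.
        Public CachedPage As String
    End Module
    ```

    Under these assumptions, code inside Form1 can use WebPage directly, while other classes would reach the single-copy variants as Form1.LastWebPage or CachedPage.
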



  • Marcus2828

    Ben, you're welcome. Notice how I defined WebPage as a member variable.

    It's accessible to all routines.

    I really didn't mean to be critical; it's just that it's such a bad solution that I don't want to see it catch on, because it can degrade performance severely.

    I'll be glad to help you anytime I can.

    Good luck!!!!



  • MojoRobbins

    Ben,

    I think you're doing the right thing. You're doing small and simple projects, making errors, and receiving feedback, which is how we all grow and learn.

    If you look around in the VS2005 advertising, you'll find that MS has a great free online book on VS2005.

    It'll help a lot.

    Renee


