having regex problems - no more than a few words?

[edit] I will retract my original question until I find further problems; it seems to be OK for now.[/edit]

In the meantime I will ask another question!

I am reading a stream from the net using GetResponse into a string.

Since this stream may contain inner and outer HTML code (inner meaning examples shown as HTML on an HTML page), is there a way of finding the HTML tags, like a normal search pattern?

I ask because I can search for anything else BUT HTML tags.

Do I need to use some other expression for the regex, or is there another way?

Other than this, there may be characters on the page that are part of regex syntax (like the dollar sign). How can I search the string for such a pattern? At the moment the match returns false.

Really having difficulty :(




  • Babu Krisnaswamy-MSFT

    Technically you can't use regular expressions (REs), or regular grammars in general, to match nested or paired constructs like this, because a regular grammar has no way to count nesting depth.  That's why no "real" programming language can be described with a regular grammar alone.  However, one of the extensions Microsoft made to REs in .NET is the ability to do balanced matching.  This permits things like verifying that parentheses appear in pairs.
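To make the counting point concrete, here is a minimal sketch in Python (not .NET's balanced-matching syntax) of the bookkeeping a paired-parentheses check needs. The function name `is_balanced` is made up for this illustration; the depth counter is exactly the state a plain regular expression cannot keep.

```python
def is_balanced(text: str) -> bool:
    """Return True if every '(' in text has a matching ')' after it."""
    depth = 0  # number of parentheses currently open
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # A ')' appeared before its '(' was opened.
                return False
    # Anything still open at the end means an unmatched '('.
    return depth == 0
```

Called as `is_balanced("(a(b)c)")` this should accept the input, while `"(a(b)c"` and `")("` should be rejected.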

    However, depending on what you want to do with the results, an RE may not be the best answer.  An RE can become rather complex, especially if you need to distinguish between inner and outer HTML or want to parse the actual content.  Even with Microsoft's extensions, REs are still context insensitive, so you couldn't, for example, distinguish between an <HTML> tag and the text <HTML> inside a comment or a CDATA block.  If you need to distinguish between elements in the HTML, I would recommend using a scanner and parser instead.

    You don't have to write your own scanner/parser, however, as there should be enough support in .NET to do it.  ASP.NET has to parse HTML pages, so that would be a good place to start.  You could, in theory, load the HTML stream into an XML document and then use XmlDocument or XmlReader to parse it, but only if the page is well-formed XHTML; ordinary HTML with unclosed tags such as <br> or unquoted attribute values will make the XML parser throw.  Alternatively, you could load the stream into a hidden WebBrowser control and then use the HtmlDocument class to read the document.
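As a rough illustration of the regex-versus-parser difference, here is a small sketch using Python's standard `html.parser` module as a stand-in for the .NET options mentioned above. The sample page and the `TagCollector` class are invented for the example, and it assumes the HTML examples displayed on the page are entity-encoded (`&lt;b&gt;`), which may or may not be true of the real stream.

```python
import re
from html.parser import HTMLParser

# Hypothetical page: one tag inside a comment, one example shown as
# entity-encoded text, and one real <b> element.
html_text = """<html><body>
<!-- <b>not a real tag</b> -->
<p>Example: &lt;b&gt;bold&lt;/b&gt; and a <b>real</b> one.</p>
</body></html>"""


class TagCollector(HTMLParser):
    """Records the name of every start tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


collector = TagCollector()
collector.feed(html_text)
collector.close()
parser_tags = collector.tags

# A naive pattern for opening tags; it has no idea what a comment is.
regex_tags = re.findall(r"<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>", html_text)

print(parser_tags)
print(regex_tags)
```

The parser skips the `<b>` inside the comment, while the naive pattern reports it as a tag. Neither approach sees the entity-encoded example as a tag, which may be one reason a literal search for `<b>` finds nothing in displayed examples.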

    Michael Taylor - 10/15/05

  • Kob16200

    I can't seem to find an exact match of a specified string in a text now.
  • cskcsk

    Well, I kind of found another alternative: instead of looking for the exact tags, just look for the text inside the tag. It also lets me make the search a bit more secure when looking for tags across HTML pages, in case of server attacks etc.
  • SaravananK

    OK, I'll look at that - thanks :)

    What about when the user wants to search for a particular string in a text, for example the dollar sign? For regex this is a special character, so giving it this character to search for does not find any matches on the page.
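One common way to handle this is to escape the user's text before handing it to the regex engine, so every special character is treated literally. The sketch below uses Python's `re.escape`; in .NET the corresponding call is `Regex.Escape`. The sample text and variable names are made up for the illustration.

```python
import re

page_text = "Total due: $19.99 (plus tax)"
user_input = "$19.99"

# Unescaped, "$" is an end-of-input anchor, so this pattern asks for
# "19" to appear after the end of the text and can never match.
unescaped_match = re.search("$19", page_text)

# re.escape backslash-escapes the special characters ("$" and "."),
# so the pattern matches the user's text literally.
literal_pattern = re.escape(user_input)
match = re.search(literal_pattern, page_text)

found_text = match.group(0) if match else None
found_at = match.start() if match else -1
print(found_text, found_at)
```

If the search never needs pattern features at all, a plain substring search (`in` or `str.find` in Python, `String.IndexOf` in .NET) avoids the escaping question entirely.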

     

