extract an a tag

Hi,
I am trying to extract an href attribute from an "a" tag, based on the anchor's text. The HTML content is like this:
<a href="http://localhost/url_for_a.html">Link A</a>
<br>
<a href="http://localhost/url_for_a.html">Link B</a>

Which extraction rule can I use? I tried Extract Attribute Value, but I did not know what to enter for MatchAttribute.

Thanks,
Dan



  • Aaron Stebner

    You can use GetAttributeValue, but the problem is that you want the attribute value when the text for the element is set to a certain value and there is no out of the box rule for doing that.  You can access the entire response by accessing the BodyString property from the Response. 

    e.Response.BodyString

    Then you can parse the response with regular expressions if you want.
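
As a rough illustration of the regular-expression route, here is a minimal sketch of a pattern that captures the href of an "a" tag whose text is already known. It is written in Python only to keep the example self-contained; in a web test the input string would be e.Response.BodyString. The function name href_for_link_text and the url_for_b.html sample URL are hypothetical, and the sketch assumes the href value is quoted and the anchor text contains no nested tags.

```python
import re


def href_for_link_text(html, link_text):
    """Return the href of the first <a> whose text equals link_text, else None."""
    pattern = re.compile(
        # Opening tag: a quoted href value that cannot run past its own quotes.
        r'<a\b[^>]*?\bhref\s*=\s*(["\'])(?P<href>[^"\'<>]*)\1[^>]*>'
        # The known anchor text (escaped so it is matched literally), then </a>.
        r'\s*' + re.escape(link_text) + r'\s*</a\s*>',
        re.IGNORECASE,  # note: this also makes the text comparison case-insensitive
    )
    match = pattern.search(html)
    return match.group("href") if match else None


# Sample markup modelled on the question; the second URL is made up.
sample = (
    '<a href="http://localhost/url_for_a.html">Link A</a>\n'
    '<br>\n'
    '<a href="http://localhost/url_for_b.html">Link B</a>'
)
print(href_for_link_text(sample, "Link B"))  # prints http://localhost/url_for_b.html
```

The href group deliberately excludes quotes and angle brackets, so a failed match on one link cannot stretch across into the next tag. The pattern uses only common regex constructs and would probably carry over to the .NET Regex class with little change, but that has not been verified here.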



  • ISIMike

    The Extract Attribute Value rule extracts an attribute value from a tag that has another attribute matching a particular value.  Unfortunately, what you want to do is not covered by one of the out of the box rules.  You will need to create a custom extraction rule to accomplish this.  Go to http://msdn2.microsoft.com/en-us/library/default.aspx


    In the search box, enter "custom extraction rules".  The first hit will be an article about creating and adding a custom rule to your project.

  • JLLemieux

    Dan and Richard,
    We are definitely looking at beefing up the parsing abilities in future releases.  The current parser was initially designed to be able to parse out dependent requests (images, CSS, etc.) from pages very quickly.  We didn't initially plan on making it publicly accessible and, after we did, ended up not having enough time to make it more general-purpose :)  That said, it can still be used to grab links and other useful tags on pages if those tags have a unique ID or name attribute on them.  It's just the text between tags or anything to do with closing tags that is hard to deal with using the current parser.

    Josh


  • nissley

    Thanks Richard,
    I have started looking at the parser. I hope that Microsoft will include such features in future releases.

  • Holyping

    Hi slumley,
    I have been trying for some time, without success, to write a regular expression that extracts the href attribute. Can you help me, please?
    The pattern is a regular "a" tag like

    <a href="www.microsoft.com">The text that I know comes here</a>.

    Can I write an expression that extracts www.microsoft.com, given that I know the "The text that I know comes here" text?

    If I manage to do that, I suppose I can use a regular expression extraction rule, right?

    Thanks,
    Dan


  • bxs122

    Thanks Slumley,
    I created my custom extraction rule. There is no way to use tag.GetAttributeValueAsString, right, since it has the same limitation?
    I think I have to parse the entire response.
    Any hints on how to do that? I have started looking at regular expressions.

    Thanks again,
    Dan

  • pratap Kumar

    I had the same issue trying to check the value of a cell in a table.  I found a useful HTML parser at http://www.codeproject.com/csharp/htmlparser.asp that could help you in doing what you need.

    - Richard.
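
To show what the parser-based route looks like in general, here is a minimal sketch that walks the markup, collects each link's text together with its href, and then looks a link up by its text. It uses Python's standard html.parser module as a stand-in, not the CodeProject parser mentioned above, whose API is not shown in this thread. The names LinkTextParser and href_by_text are hypothetical.

```python
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Collect (text, href) pairs for every <a> element that is fed in."""

    def __init__(self):
        super().__init__()
        self.links = []          # finished (text, href) pairs
        self._in_anchor = False  # True while inside an open <a>
        self._href = None        # href of the currently open <a>
        self._text = []          # text fragments seen inside it

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_anchor:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_anchor:
            self.links.append(("".join(self._text).strip(), self._href))
            self._in_anchor = False


def href_by_text(html, link_text):
    """Return the href of the first link whose text equals link_text, else None."""
    parser = LinkTextParser()
    parser.feed(html)
    parser.close()
    for text, href in parser.links:
        if text == link_text:
            return href
    return None


# Sample markup taken from the example earlier in this thread.
page = '<p><a href="www.microsoft.com">The text that I know comes here</a>.</p>'
print(href_by_text(page, "The text that I know comes here"))  # prints www.microsoft.com
```

Compared with a regular expression, this approach also copes with nested tags inside the link text (for example bold text), because the text is assembled from every fragment seen between the opening and closing "a" tags.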
