public static List<string> GetImagesFromGoogle(string search)
{
    string googleUrl = "http://images.google.com/images?q=" + search;
    List<string> urlList = new List<string>();
    string html = GetHtml(googleUrl);
    string regExPattern = @"< \s* img [^\>]* src \s* = \s* [\""\']? ( [^\""\'\s>]* )";
    Regex r = new Regex(regExPattern, RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
    MatchCollection matches = r.Matches(html);
    foreach (Match m in matches)
    {
        //only return images from the search result (aka not Google's logo or ads)
        if (m.Groups[1].Value.IndexOf("/images?") != -1)
        {
            //Provide full path for image
            string imageUrl = @"http://images.google.com" + m.Groups[1].Value;
            urlList.Add(imageUrl);
        }
    }
    return urlList;
}
Hi, this function is supposed to get all the image tags from an HTML page.
The problem is that I don't understand what this part is doing:
string html = GetHtml(googleUrl);
string regExPattern = @"< \s* img [^\>]* src \s* = \s* [\""\']? ( [^\""\'\s>]* )";
Regex r = new Regex(regExPattern, RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
MatchCollection matches = r.Matches(html);
What does Regex do, and why is this pattern declared?
string regExPattern = @"< \s* img [^\>]* src \s* = \s* [\""\']? ( [^\""\'\s>]* )";
Please help. I have never used the Regex class. Thanks!

Please help me understand this:
fcp
Regular expressions (Regex) are a language for parsing and manipulating text. Regex supports three basic types of operations:
- Splitting strings into substrings using regular expressions to identify separators
- Searching strings for substrings that match patterns in regular expressions
- Performing search-and-replace operations using regular expressions to identify the text you want to replace
Try this link for a tutorial -> http://www.regular-expressions.info/tutorial.html
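To make the three operations concrete, here is a minimal sketch using the static helpers on System.Text.RegularExpressions.Regex. The class name, method names, and sample strings are made up for illustration; only Regex.Split, Regex.Match and Regex.Replace come from the framework.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, only here to illustrate the three operations.
public static class RegexBasics
{
    // Splitting: commas or semicolons act as separators, surrounding blanks are dropped.
    public static string[] SplitList(string input)
    {
        return Regex.Split(input, @"\s*[,;]\s*");
    }

    // Searching: return the first run of digits, or an empty string if there is none.
    public static string FirstNumber(string input)
    {
        Match m = Regex.Match(input, @"\d+");
        return m.Success ? m.Value : string.Empty;
    }

    // Search-and-replace: collapse every run of whitespace into a single space.
    public static string CollapseSpaces(string input)
    {
        return Regex.Replace(input, @"\s+", " ");
    }

    public static void Main()
    {
        Console.WriteLine(string.Join("|", SplitList("a, b;c")));   // a|b|c
        Console.WriteLine(FirstNumber("order 42 shipped"));         // 42
        Console.WriteLine(CollapseSpaces("too   many\t spaces"));   // too many spaces
    }
}
```

The code in the question uses the second kind of operation (searching), just with a longer pattern.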
HTH,
Quirm
Hi,
the string you have sets the criteria for the match. Let's break it in smaller pieces:
\s represents any blank character, like space, tab, newline and a few others.
* means "zero or more of the preceding character"
so, \s* means "zero or more blanks"
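[^\>] means any character except the greater-than sign, so [^\>]* means zero or more characters other than ">". (Inside square brackets, a leading caret ^ means "anything except".)
[\"\'] means either a single (') or a double (") quotation mark. In the C# source the double quote is written twice ([\""\']) because that is how a verbatim @"..." string escapes it.
? means "zero or one of the preceding character"
so, [\"\']? means "match a single or double quotation mark, if present"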
[^\"\'\s>] means any character that is not a single quotation mark, or a double quotation mark, or a blank character, or a greater-than sign.
so [^\"\'\s>]* means zero or more characters that are neither single, nor double quotation marks, nor blanks, nor the greater-than sign.
Putting something in parentheses makes it a capture group: it marks the part you are really interested in, which you can read back from the match afterwards (the code does this with m.Groups[1].Value).
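A small sketch of what the parentheses buy you, using a deliberately simpler pattern than the one in the question. The class and method names and the sample tag are hypothetical; Groups[0] is always the whole match and Groups[1] is the first pair of parentheses.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original code.
public static class GroupDemo
{
    // Returns { whole match, captured group 1 }, or null when nothing matches.
    public static string[] MatchAndGroup(string input)
    {
        // The parentheses capture only the attribute value, not the src=" prefix.
        Match m = Regex.Match(input, @"src=""([^""]*)""");
        if (!m.Success) return null;
        return new[] { m.Groups[0].Value, m.Groups[1].Value };
    }

    public static void Main()
    {
        string[] parts = MatchAndGroup(@"<img src=""cat.png"">");
        Console.WriteLine(parts[0]); // src="cat.png"
        Console.WriteLine(parts[1]); // cat.png
    }
}
```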
Putting it all together, it says, more or less:
Look for a "<img" (possibly with some blanks between < and img), possibly followed by anything that isn't a ">", and then followed by "src=" (possibly with some blanks before and after the =). Skip a single or double quotation mark, if present. If you find all this, then grab what follows up to the first quotation mark, blank character, or the end of the img tag (the ">" sign).
In a nutshell, it describes how to grab the value of the src attribute of an img tag. The spaces inside the pattern are only there for readability (RegexOptions.IgnorePatternWhitespace tells the engine to skip them), and RegexOptions.IgnoreCase lets it match <IMG SRC=...> as well. There are a few extra characters (some of the backslashes are not strictly needed), but it should work.
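If you want to see it in action without calling Google, here is a rough sketch that runs the same pattern (written with the optional quote as [\""\']?) over a hard-coded HTML string. The class name, method name and sample HTML are invented for the example; GetHtml from the question is not needed here.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo class; the pattern is the one from the question.
public static class ImgSrcDemo
{
    private const string Pattern =
        @"< \s* img [^\>]* src \s* = \s* [\""\']? ( [^\""\'\s>]* )";

    // Returns the src value of every img tag found in the given HTML.
    public static List<string> ExtractSrcValues(string html)
    {
        Regex r = new Regex(Pattern,
            RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
        List<string> result = new List<string>();
        foreach (Match m in r.Matches(html))
        {
            // Groups[1] is the part inside the parentheses: the src value itself.
            result.Add(m.Groups[1].Value);
        }
        return result;
    }

    public static void Main()
    {
        // Made-up sample: double-quoted, single-quoted and unquoted src values.
        string html =
            "<p>hello</p>" +
            "<img src=\"/images?q=cats\" alt=\"x\">" +
            "<IMG  SRC = 'logo.gif'>" +
            "<img width=10 src=plain.png>";

        foreach (string src in ExtractSrcValues(html))
        {
            Console.WriteLine(src);
        }
        // Prints:
        // /images?q=cats
        // logo.gif
        // plain.png
    }
}
```

The function in the question then keeps only the values that contain "/images?", which is how it drops the logo and ads.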
Hope that helps...
--mc
spinone_owner
Crirus
string regExPattern = @"< \s* img [^\>]* src \s* = \s* [\""\']? ( [^\""\'\s>]* )";
Thanks!