I need to extract image URLs from an HTML file in C#
Posted in Help the coder! on Apr 26, 2009 at 18:56 IST
Showing comments 1 to 5 of total 5 on page 1 of 1
mojhongg (Rank: 218)
Can anyone help me by explaining how to extract image URLs from an HTML file in C#?
Posted by mojhongg on Sunday, April 26, 2009, 6:56 pm
thilak (Rank: 314)
Try searching for a regular expression that matches "img src= ..>",
maybe something like '#<img[^>]*src\s*=\s*(["\'])(.*?)\1#im'
Check this article
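In C#, the same quote-matching idea (capture the opening quote, then require the same character to close the value) might look roughly like the sketch below. The class and method names are made up for illustration, and the pattern only handles quoted src values; it would also pick up attributes such as data-src, so treat it as a starting point and not a full HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ImgSrcRegex
{
    // Hypothetical helper, not part of any library.
    // Group 1 captures the opening quote, group 2 the URL, and \1 requires
    // the same quote character to close the attribute value.
    private static readonly Regex ImgSrc = new Regex(
        @"<img\b[^>]*?\bsrc\s*=\s*([""'])(.*?)\1",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static List<string> Extract(string html)
    {
        var urls = new List<string>();
        foreach (Match m in ImgSrc.Matches(html))
        {
            urls.Add(m.Groups[2].Value); // URL without the surrounding quotes
        }
        return urls;
    }
}
```

Calling ImgSrcRegex.Extract(File.ReadAllText(path)) on a saved page would return the src values in document order.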
Posted by thilak on Sunday, April 26, 2009, 8:54 pm
coolcode (Rank: 90)
The HTML Agility Pack can do this: run an XPath query such as //img and read each node's src attribute, like so:
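using System;
using System.Net;
using HtmlAgilityPack;

string html;
using (WebClient client = new WebClient()) {
    html = client.DownloadString("http://www.google.com");
}
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
// SelectNodes returns null when nothing matches, so guard before iterating.
HtmlNodeCollection imgs = doc.DocumentNode.SelectNodes("//img");
if (imgs != null) {
    foreach (HtmlNode img in imgs) {
        Console.WriteLine(img.GetAttributeValue("src", null));
    }
}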
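The src values you get back are often relative (for example "images/logo.png" or "/logo.png"). If you need full URLs, one option is to resolve each value against the address of the page with System.Uri. A minimal sketch, assuming you know the page URL; the class and method names here are hypothetical:

```csharp
using System;

public static class ImageUrlResolver
{
    // Hypothetical helper: resolves a src value against the URL of the page
    // it came from. Returns null when src is empty or cannot be parsed.
    public static string ToAbsolute(string pageUrl, string src)
    {
        if (string.IsNullOrWhiteSpace(src)) return null;
        Uri result;
        return Uri.TryCreate(new Uri(pageUrl), src.Trim(), out result)
            ? result.AbsoluteUri
            : null;
    }
}
```

Values that are already absolute pass through unchanged, so this can be applied to every src without checking first.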
Posted by coolcode on Monday, April 27, 2009, 1:55 am
joshi007 (Rank: 269)
You have to parse the HTML and check the img tags. Use the following link; it includes a C# library for parsing HTML tags. I faced the same problem before, and this library worked well for me: Parsing HTML tags
Posted by joshi007 on Tuesday, April 28, 2009, 1:39 am
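manojme (Rank: 67)
// Needs System.Linq and System.Text.RegularExpressions.
// The named group "src" (the name itself is arbitrary) holds the URL without its quotes.
new Regex(@"<img[^>]*src\s*=\s*(?:'(?<src>[^']*)'|""(?<src>[^""]*)"")[^>]*>", RegexOptions.IgnoreCase)
    .Matches(htmlString).OfType<Match>().Select(x => x.Groups["src"].Value).ToList();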
Not tested.
Posted by manojme on Wednesday, April 29, 2009, 1:37 am