Roy Tang

Programmer, engineer, scientist, critic, gamer, dreamer, and kid-at-heart.


I need to parse a large amount of text that uses HTML font tags for formatting.

For example:

<font face="fontname" ...>Some text</font>

Specifically, I need to determine which characters would be rendered in each font used in the text. I also need to handle cases such as font tags nested inside other font tags.

I need to use C# for this. Is there some sort of C# parser class to make this easier? Or would I have to write it myself?

Thanks!

Comments

I have not used it, but I have seen the HTML Agility Pack frequently mentioned for this type of thing.
I'm not sure whether this applies to your situation, since I don’t know the intended use, but what about using XSLT templates?
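
To make the HTML Agility Pack suggestion concrete, here is a minimal sketch of how it might be used. It assumes the HtmlAgilityPack NuGet package is installed. The class name `FontRuns`, the method `Collect`, and the `"(default)"` label for text outside any font tag are hypothetical names chosen for illustration. The idea is to walk the parsed tree recursively and carry the current font name down, so a nested font tag overrides its parent only for its own contents.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: NuGet package "HtmlAgilityPack"

public static class FontRuns
{
    // Returns (font, text) pairs in document order.
    // "defaultFont" is a hypothetical label for text outside any <font> tag.
    public static List<KeyValuePair<string, string>> Collect(
        string html, string defaultFont = "(default)")
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var runs = new List<KeyValuePair<string, string>>();
        Walk(doc.DocumentNode, defaultFont, runs);
        return runs;
    }

    private static void Walk(
        HtmlNode node, string currentFont, List<KeyValuePair<string, string>> runs)
    {
        foreach (HtmlNode child in node.ChildNodes)
        {
            if (child.NodeType == HtmlNodeType.Text)
            {
                // Decode entities such as &amp; so characters are counted once.
                string text = HtmlEntity.DeEntitize(child.InnerText);
                if (text.Length > 0)
                    runs.Add(new KeyValuePair<string, string>(currentFont, text));
            }
            else if (child.NodeType == HtmlNodeType.Element)
            {
                // A <font> tag without a face attribute inherits the outer font.
                string font = currentFont;
                if (child.Name == "font")
                    font = child.GetAttributeValue("face", currentFont);
                Walk(child, font, runs);
            }
        }
    }
}
```

For input such as `<font face="Arial">Hello <font face="Courier">code</font> world</font>`, the intent is that "code" is attributed to Courier and the surrounding text to Arial. This sketch ignores CSS entirely, and text inside elements such as script or style would be reported like any other text unless it is filtered out.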

You could load the HTML into Internet Explorer and then query the DOM for font tags (or CSS styles).

I don’t know whether this is the best option performance-wise.
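
On the "write it myself" side of the question, the following is a rough sketch that uses only the .NET base class library. It scans the tags with a regular expression and keeps a stack of active fonts to handle nesting. The names `FontStackScanner`, `Scan`, and the `"(default)"` label are hypothetical. This is not a real HTML parser: it assumes reasonably tidy markup, and HTML comments or script content would be treated as ordinary text.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class FontStackScanner
{
    // Group 1: "/" for a closing tag, group 2: tag name, group 3: attributes.
    private static readonly Regex Tag =
        new Regex(@"<(/?)([a-zA-Z][a-zA-Z0-9]*)([^>]*)>");

    // face="x", face='x', or face=x (unquoted).
    private static readonly Regex Face =
        new Regex(@"\bface\s*=\s*(?:""([^""]*)""|'([^']*)'|([^\s>]+))",
                  RegexOptions.IgnoreCase);

    public static List<KeyValuePair<string, string>> Scan(
        string html, string defaultFont = "(default)")
    {
        var runs = new List<KeyValuePair<string, string>>();
        var fonts = new Stack<string>();
        fonts.Push(defaultFont);

        int pos = 0;
        foreach (Match m in Tag.Matches(html))
        {
            // Text between the previous tag and this one uses the top font.
            AddRun(runs, fonts.Peek(), html.Substring(pos, m.Index - pos));
            pos = m.Index + m.Length;

            bool isFont = m.Groups[2].Value.Equals(
                "font", StringComparison.OrdinalIgnoreCase);
            if (!isFont) continue; // other tags do not change the font here

            if (m.Groups[1].Value == "/")
            {
                // Never pop the default entry, even on a stray </font>.
                if (fonts.Count > 1) fonts.Pop();
            }
            else
            {
                Match f = Face.Match(m.Groups[3].Value);
                string face = !f.Success ? fonts.Peek()
                    : f.Groups[1].Success ? f.Groups[1].Value
                    : f.Groups[2].Success ? f.Groups[2].Value
                    : f.Groups[3].Value;
                fonts.Push(face);
            }
        }
        AddRun(runs, fonts.Peek(), html.Substring(pos));
        return runs;
    }

    private static void AddRun(
        List<KeyValuePair<string, string>> runs, string font, string raw)
    {
        string text = WebUtility.HtmlDecode(raw); // turn &amp; etc. into characters
        if (text.Length > 0)
            runs.Add(new KeyValuePair<string, string>(font, text));
    }
}
```

Pushing the inherited font for a font tag that has no face attribute keeps each closing tag paired with exactly one stack entry. For messy real-world HTML, a proper parser is likely the safer choice.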