Six months ago I started developing AngleSharp, the ultimate angle bracket parser. Three months ago I made the project public, with a working HTML5 parser and a state-of-the-art CSS parser. In the three months since, I have kept working on the library and added features such as:
- A fully functional XML parser
- A more complete DOM implementation
- Live collections and coupled parsing
- DOM annotations to allow automatic script binding
- A better API to access everything
- Some helpers like ToText()
- An updated parsing model
Since I began working on the parser, the specification (the Editor's Draft, i.e. the nightly version) has changed a bit. What's new? The <template> element! This element appends all of its child nodes to a special property called content, which is a DocumentFragment object, i.e. a lightweight container that holds a piece of a document rather than a complete document.
What makes this element special is that all contained elements are parsed (building a full DOM for them), while resource handling such as image loading or script execution is suppressed.
Right now it is supported only by the latest versions of Chrome (27+), Firefox (22+) and Opera (16+). Until recently it was not even listed on the popular site CanIuse.
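To make the behavior of <template> described above a bit more concrete, here is a deliberately simplified sketch in Python. It is neither how AngleSharp works nor the HTML5 tree-construction algorithm (which uses a dedicated template insertion mode and error recovery that Python's html.parser does not perform); it only models the two properties mentioned: the children of a template end up in a separate content fragment instead of under the element itself, and markup inside a template never schedules resources. The Node class, the TemplateAwareParser name, the VOID_TAGS set and the resources list are all hypothetical, made up for this illustration.

```python
from html.parser import HTMLParser


class Node:
    """A minimal DOM-like node (hypothetical, for illustration only)."""

    def __init__(self, name, attrs=()):
        self.name = name
        self.attrs = dict(attrs)
        self.children = []
        self.content = None  # set only for <template> elements


# Hypothetical, incomplete list of elements that never get an end tag.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class TemplateAwareParser(HTMLParser):
    """Simplified model (not the HTML5 algorithm): children of <template>
    are parsed into a separate fragment (its 'content') and never trigger
    resource handling."""

    def __init__(self):
        super().__init__()
        self.document = Node("#document")
        # Each entry: (tag that closes it, container that receives children)
        self.stack = [(None, self.document)]
        self.template_depth = 0
        self.resources = []  # URLs that would be fetched or executed

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1][1].children.append(node)
        if tag == "template":
            # Children go into the fragment, not into the element itself.
            node.content = Node("#document-fragment")
            self.stack.append(("template", node.content))
            self.template_depth += 1
            return
        # Only markup outside every template schedules resources.
        if self.template_depth == 0 and tag in ("img", "script"):
            src = node.attrs.get("src")
            if src:
                self.resources.append(src)
        if tag not in VOID_TAGS:
            self.stack.append((tag, node))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open container, if any.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                for closing, _ in self.stack[i:]:
                    if closing == "template":
                        self.template_depth -= 1
                del self.stack[i:]
                return


parser = TemplateAwareParser()
parser.feed('<img src="a.png"><template><img src="b.png">'
            '<script src="x.js"></script></template>')
parser.close()

template = parser.document.children[1]
print(parser.resources)                               # ['a.png']
print([c.name for c in template.content.children])    # ['img', 'script']
print(template.children)                              # []
```

Keeping a stack of (closing tag, container) pairs lets the template's end tag close the fragment rather than the element, which is why the template element itself stays childless while its content holds the fully parsed subtree.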
Additionally, I compared AngleSharp with the best-known alternative, the HtmlAgilityPack (HAP) library, in terms of performance and reliability. Needless to say, AngleSharp came out as the clear winner: it was the only one to parse malformed documents correctly (according to the HTML5 standard), and it was usually faster. In the scenarios where HAP was faster, the web pages also contained a lot of inline CSS. HAP does not parse CSS, which might count in its favor (well, it is a little bit faster then), but could also count in mine: parsed CSS content enables more interesting applications, such as transforming documents with external resources into single-file documents.
The road map for the next minor release of AngleSharp (0.4) includes extended MathML, SVG and XML support. The API will certainly be improved as well.