The AngleSharp project is still ongoing. The latest release brings a lot of new features. The API has also been changed completely, to make room for a future-proof, robust and extensible version.
Major bug fixes
Some bugs have been fixed. The number was quite limited, which was expected, as a lot of unit tests ensure conformance to the standard and robustness. Nevertheless, as the code base is huge and the standard is complex, bugs are always possible.
The most annoying bug had to do with overlong character references. A character reference starts with an ampersand. It ends when a non-name character (in particular a semicolon) is found. Then the longest valid character reference contained in the collected string is chosen. Unfortunately, some webpages apparently contain character references that go far beyond the 31 characters of the longest allowed character reference.
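The longest-match rule described above can be sketched as follows. This is only an illustration and not AngleSharp's code: the class CharRefSketch, its tiny name table and the way the 31-character cap is applied are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch only: the table below is a tiny hand-picked subset of the
// HTML named character references, not the table AngleSharp uses.
public static class CharRefSketch
{
    // Assumed cap on how many name characters are collected after the '&'.
    private const int MaxNameLength = 31;

    private static readonly Dictionary<string, string> Known = new Dictionary<string, string>
    {
        { "amp", "&" }, { "amp;", "&" },
        { "not", "\u00AC" }, { "not;", "\u00AC" },
        { "notin;", "\u2209" }
    };

    // Tries to resolve a reference that starts at text[start] == '&'.
    // Returns the replacement and reports how many characters were consumed,
    // or returns null (with consumed == 0) if no known name matches.
    public static string Resolve(string text, int start, out int consumed)
    {
        consumed = 0;
        if (start >= text.Length || text[start] != '&') return null;

        int end = start + 1;
        // Collect name characters, but never more than MaxNameLength of them.
        while (end < text.Length && end - start - 1 < MaxNameLength && char.IsLetterOrDigit(text[end]))
            end++;
        // A terminating semicolon belongs to the candidate.
        if (end < text.Length && text[end] == ';') end++;

        // Longest match first: shrink the candidate until a known name is found.
        for (int length = end - start - 1; length > 0; length--)
        {
            string replacement;
            if (Known.TryGetValue(text.Substring(start + 1, length), out replacement))
            {
                consumed = length + 1; // the name plus the leading '&'
                return replacement;
            }
        }
        return null;
    }
}
```

With this table, "&notit;" resolves through the shorter name "not" and consumes four characters, while an ampersand followed by forty letters is cut off at the cap and yields no match instead of running past the buffer.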
Furthermore, bugs in the existing Location object, the CSSOM and more have been identified and fixed.
The new API of AngleSharp relies completely on interfaces. If an implementation is useful outside the library, it is public. An example is the Configuration class. It can be used directly, or inherited from in order to redefine only a subset of the provided methods. Alternatively, one can start from scratch by implementing IConfiguration. This is now up to the user to decide.
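The two routes can be sketched with stand-in types. Note that ISketchConfiguration, SketchConfiguration and their members are invented for this illustration; they are not the members of AngleSharp's IConfiguration or Configuration.

```csharp
// Illustrative sketch only; these stand-in types and their members are
// hypothetical and do not mirror AngleSharp's IConfiguration or Configuration.
public interface ISketchConfiguration
{
    string Language { get; }
    bool IsScripting { get; }
}

// A public default implementation with overridable members.
public class SketchConfiguration : ISketchConfiguration
{
    public virtual string Language { get { return "en-US"; } }
    public virtual bool IsScripting { get { return false; } }
}

// Route 1: inherit and redefine only what differs from the defaults.
public class GermanConfiguration : SketchConfiguration
{
    public override string Language { get { return "de-DE"; } }
}

// Route 2: implement the interface from scratch.
public sealed class ScriptingConfiguration : ISketchConfiguration
{
    public string Language { get { return "en-US"; } }
    public bool IsScripting { get { return true; } }
}
```

Code that only asks for the interface accepts both variants, which is the point of exposing the interface alongside a public default implementation.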
An example: the innerHTML property of an Element is now the InnerHtml property of the IElement interface. The W3C name is kept, but the casing has been adjusted. In other cases the name itself has changed: the bubbles property of an Event object, for instance, has been renamed to IsBubbling (on the IEvent interface). Most boolean properties start with "Is".
In general, the new API allows (re-)implementing some of the existing interfaces. Plugging such implementations in should generally work as well; however, some parts depend on internals. This may be addressed in future versions, depending on usage, problems and solutions.
AngleSharp v0.6 uses a new class called TextSource for addressing the source code. This partially replaces the SourceManager. The remains of the previous solution are now directly integrated into the BaseTokenizer, which is used by the HTML and CSS tokenizers.
Why this new approach? Well, the TextReader worked reasonably well, but in general it was unreliable and hard to control. There was no going back, and it was impossible to set a buffer limit (or access the buffer). The only way to change, e.g., the encoding was to throw away the old reader and create a new instance. Now everything is under our control, and we can work with the text directly through the new class, which operates on either a (network) stream or a fixed (finished) source.
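To make the idea more concrete, here is a minimal sketch of such a source. SketchTextSource is a hypothetical stand-in and not AngleSharp's TextSource; its member names, the 4096-byte chunk size and the use of char.MaxValue as end marker are assumptions. It keeps everything decoded so far in a buffer, so going back only means moving an index, and text can be inserted at the current position.

```csharp
using System;
using System.IO;
using System.Text;

// Illustrative sketch only; SketchTextSource is a hypothetical stand-in,
// not AngleSharp's TextSource.
public sealed class SketchTextSource
{
    private readonly Stream _stream;       // raw input, e.g. a network stream
    private readonly StringBuilder _text;  // everything decoded so far
    private readonly Decoder _decoder;     // keeps state across chunk borders
    private int _index;                    // current read position in _text

    public SketchTextSource(Stream stream, Encoding encoding)
    {
        _stream = stream;
        _text = new StringBuilder();
        _decoder = encoding.GetDecoder();
    }

    // A fixed (finished) source is a source whose buffer is already complete.
    public SketchTextSource(string content)
    {
        _stream = Stream.Null;
        _text = new StringBuilder(content);
        _decoder = Encoding.UTF8.GetDecoder();
    }

    public int Index { get { return _index; } }

    // The buffer is accessible, unlike with a plain TextReader.
    public string Text { get { return _text.ToString(); } }

    // Returns the next character, or char.MaxValue at the end of the input.
    public char ReadCharacter()
    {
        if (_index >= _text.Length) Fill();
        return _index < _text.Length ? _text[_index++] : char.MaxValue;
    }

    // Going back is only a matter of moving the index.
    public void Back(int count)
    {
        _index = Math.Max(0, _index - count);
    }

    // Inserts text at the current position, so that it is read next.
    public void InsertText(string content)
    {
        _text.Insert(_index, content);
    }

    // Decodes further chunks until a character is available or the stream ends.
    private void Fill()
    {
        var bytes = new byte[4096];
        while (_index >= _text.Length)
        {
            int read = _stream.Read(bytes, 0, bytes.Length);
            if (read <= 0) break;
            var chars = new char[_decoder.GetCharCount(bytes, 0, read)];
            int count = _decoder.GetChars(bytes, 0, read, chars, 0);
            _text.Append(chars, 0, count);
        }
    }
}
```

The sketch leaves out changing the encoding after reading has started; that would additionally require keeping the raw bytes so the buffer can be decoded again.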
This makes it possible, e.g., to write to the source while processing it. A feature that uses this is the Write (and additionally the WriteLine) method of the
With v0.6 nearly the whole CSSOM is included. Yet it is still too rough (and too early) to work with. I will most probably do some major rewriting for v0.7, but we can already see where this is going. The biggest problem at the moment is the distinction between raw values (color, angle, number, percentage, ...) and CSS values (which could be just raw values, or more complicated ones, such as a computed value). Right now the outcome is quite mixed, and I will try to make it simple, clean and easy to work with.
Styling is already completely in there, which means that you can easily integrate other styling engines in addition to the default CSS engine (which is registered by default). One word of warning: currently two options are involved. One is WHICH engine(s) are available, the other is IF an engine may be used. Therefore, just providing, e.g., a scripting engine is not sufficient. We also need to activate scripting.
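The difference between the two options can be sketched like this. All types here (ISketchEngine, SketchEngine, SketchOptions) and their members are hypothetical stand-ins invented for the illustration, not AngleSharp's configuration API.

```csharp
using System.Collections.Generic;
using System.Linq;

// Illustrative sketch only; none of these types are AngleSharp's.
public interface ISketchEngine
{
    string Type { get; }   // e.g. "text/css" or "text/javascript"
}

public sealed class SketchEngine : ISketchEngine
{
    public SketchEngine(string type) { Type = type; }
    public string Type { get; private set; }
}

public sealed class SketchOptions
{
    private readonly List<ISketchEngine> _scriptEngines = new List<ISketchEngine>();

    // Option 1: WHICH engines are available.
    public void Register(ISketchEngine engine) { _scriptEngines.Add(engine); }

    // Option 2: IF scripting may be used at all. Off unless switched on.
    public bool IsScripting { get; set; }

    // An engine is only handed out when both conditions hold.
    public ISketchEngine GetScriptEngine(string type)
    {
        if (!IsScripting) return null;
        return _scriptEngines.FirstOrDefault(engine => engine.Type == type);
    }
}
```

In this sketch a registered engine stays unused until IsScripting is set to true, which mirrors the warning above: registering alone is not enough.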