Arabica’s Taggle is an HTML parser that outputs well-formed HTML. Since I haven’t found Win32 binaries of Taggle, here’s a compiled version, along with Arabica’s other tools for parsing XML. The original TagSoup is written in Java. Arabica is a C++ library that provides XML and HTML processing functions.
Part of Arabica is Taggle, a port of TagSoup to C++. What you can download here are statically compiled Win32 binaries of all the Arabica tools, including Taggle.