Re: HTML/XML parser
> Hmm, another package to install... I keep installing things to be able to
> use mnemonic, and I don't think that is good...
I think it is. It means we're not duplicating work that has already
been done and debugged.
> Another remark: are you going to parse the HTML twice? Once to
> correct it and once with PCCTS? (You can only correct it if you know the
> structure.)
Not quite. Most invalid HTML files are invalid because of missing end
tags or incorrect nesting. Both can be detected and corrected without
knowing anything else about the structure. I want to avoid cluttering
the parser for correct HTML with tricks and guessing algorithms. Think
of it as a two-stage parser: a correction pass, then the real parse
(both stages could be done by PCCTS).
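
To make the first stage concrete, here is a rough sketch of a purely
structural correction pass. It is written in Python for brevity and is
not PCCTS code; the name balance_tags and the tokenizing regex are made
up for this illustration. It assumes every tag is a container (it knows
nothing about empty elements such as <br>) and it does not treat
comments specially.

```python
import re

# One token is a tag, a run of text, or a lone "<" that starts no tag.
TOKEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>|[^<]+|<")

def balance_tags(html):
    """Close missing end tags and repair nesting using only a stack.

    Hypothetical helper: no knowledge of the HTML DTD is used.
    """
    out = []
    stack = []  # names of currently open elements, innermost last
    for m in TOKEN.finditer(html):
        text = m.group(0)
        name = m.group(2)
        if name is None:
            # Plain text (or a stray "<"): copy through unchanged.
            out.append(text)
            continue
        name = name.lower()
        if not m.group(1):
            # Start tag: remember it and copy it through.
            stack.append(name)
            out.append(text)
        elif name in stack:
            # End tag for an open element: first close everything
            # that was opened inside it and never closed.
            while True:
                top = stack.pop()
                out.append("</%s>" % top)
                if top == name:
                    break
        # Otherwise the end tag matches nothing open, so drop it.
    # End of input: close whatever is still open.
    while stack:
        out.append("</%s>" % stack.pop())
    return "".join(out)
```

For example, balance_tags("<b><i>x</b></i>") closes <i> before <b> and
drops the stray </i>. The output is well nested, so the second stage
only ever has to deal with balanced input.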
Kasper
- References:
- Re: HTML/XML parser
- From: "M.Stekelenburg <m.stekelenburg@student.utwente.nl>" <root@cal006063.student.utwente.nl>