1
0
mirror of https://github.com/ezyang/htmlpurifier.git synced 2025-07-31 19:30:21 +02:00

Make the definition format much more logical. Begin migrating specification docs to their respective classes.

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@133 48356398-32a2-884e-a903-53898d9a118a
This commit is contained in:
Edward Z. Yang
2006-07-30 19:11:18 +00:00
parent 70bd80e66a
commit 558c49a92d
6 changed files with 86 additions and 93 deletions

View File

@@ -1,9 +1,7 @@
HTML Purifier Specification
HTML Purifier
by Edward Z. Yang
== Introduction ==
There are a number of ad hoc HTML filtering solutions out there on the web
(some examples including HTML_Safe, kses and SafeHtmlChecker.class.php) that
claim to filter HTML properly, preventing malicious JavaScript and layout
@@ -56,29 +54,6 @@ HTML tags. Things like blog comments are, in all likelihood, most appropriately
written in an extremely restrictive set of markup that doesn't require
all this functionality (or not written in HTML at all).
== STAGE 1 - parsing ==
Status: A (see source, mainly internals and UTF-8)
The Lexer (currently we have three choices) handles parsing into Tokens.
Here are the mappings for Lexer_PEARSax3
* Start(name, attributes) is openHandler
* End(name) is closeHandler
* Empty(name, attributes) is openHandler (is in array of empties)
* Data(parse(text)) is dataHandler
* Comment(text) is escapeHandler (has leading -)
* Data(text) is escapeHandler (has leading [, CDATA)
Ignorable/not being implemented (although we probably want to output them raw):
* ProcessingInstructions(text) is piHandler
* JavaOrASPInstructions(text) is jaspHandler
== STAGE 2 - remove foreign elements ==
Status: A- (transformations need to be implemented)