mirror of
https://github.com/ezyang/htmlpurifier.git
synced 2025-07-31 19:30:21 +02:00
Remove trailing whitespace.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
This commit is contained in:
@@ -2,7 +2,7 @@ Considerations for ErrorCollection
|
||||
|
||||
Presently, HTML Purifier takes a code-execution centric approach to handling
|
||||
errors. Errors are organized and grouped according to which segment of the
|
||||
code triggers them, not necessarily the portion of the input document that
|
||||
code triggers them, not necessarily the portion of the input document that
|
||||
triggered the error. This means that errors are pseudo-sorted by category,
|
||||
rather than location in the document.
|
||||
|
||||
@@ -13,7 +13,7 @@ can report errors still process the document mostly linearly. Furthermore,
|
||||
not only do they process linearly, but the way they pass off operations to
|
||||
sub-systems mirrors that of the document. For example, AttrValidator will
|
||||
linearly proceed through elements, and on each element will use AttrDef to
|
||||
validate those contents. From there, the attribute might have more
|
||||
validate those contents. From there, the attribute might have more
|
||||
sub-components, which have execution passed off accordingly.
|
||||
|
||||
In fact, each strategy handles a very specific class of "error."
|
||||
@@ -56,7 +56,7 @@ set it as a document-wide error. And actually, nothing needs to be done here.
|
||||
Something interesting to consider is whether or not we care about the locations
|
||||
of attributes and CSS properties, i.e. the sub-objects that compose these things.
|
||||
In terms of consistency, at the very least attributes should have column/line
|
||||
numbers attached to them. However, this may be overkill, as attributes are
|
||||
numbers attached to them. However, this may be overkill, as attributes are
|
||||
uniquely identifiable. You could go even further, with CSS, but they are also
|
||||
uniquely identifiable.
|
||||
|
||||
@@ -80,12 +80,12 @@ cases).
|
||||
4. Setup ErrorCollector to use context information to setup hierarchies.
|
||||
This may require a different internal format. Use objects if it gets
|
||||
complex. [DONE]
|
||||
|
||||
|
||||
ASIDE
|
||||
More on this topic: since we are now binding errors to lines
|
||||
and columns, a particular error can have three relationships to that
|
||||
specific location:
|
||||
|
||||
|
||||
1. The token at that location directly
|
||||
RemoveForeignElements
|
||||
AttrValidator (transforms)
|
||||
@@ -95,50 +95,50 @@ cases).
|
||||
3. A modification to that node (i.e. contents from start to end
|
||||
token) as a whole
|
||||
FixNesting
|
||||
|
||||
|
||||
This needs to be marked accordingly. In the presentation, it might
|
||||
make sense keep (3) separate, have (2) a sublist of (1). (1) can
|
||||
be a closing tag, in which case (3) makes no sense at all, OR it
|
||||
should be related with its opening tag (this may not necessarily
|
||||
be possible before MakeWellFormed is run).
|
||||
|
||||
|
||||
So, the line and column counts as our identifier, so:
|
||||
|
||||
|
||||
$errors[$line][$col] = ...
|
||||
|
||||
|
||||
Then, we need to identify case 1, 2 or 3. They are identified as
|
||||
such:
|
||||
|
||||
|
||||
1. Need some sort of semaphore in RemoveForeignElements, etc.
|
||||
2. If CurrentAttr/CurrentCssProperty is non-null
|
||||
3. Default (FixNesting, MakeWellFormed)
|
||||
|
||||
|
||||
One consideration about (1) is that it usually is actually a
|
||||
(3) modification, but we have no way of knowing about that because
|
||||
of various optimizations. However, they can probably be treated
|
||||
the same. The other difficulty is that (3) is never a line and
|
||||
column; rather, it is a range (i.e. a duple) and telling the user
|
||||
the very start of the range may confuse them. For example,
|
||||
|
||||
|
||||
<b>Foo<div>bar</div></b>
|
||||
^ ^
|
||||
|
||||
|
||||
The node being operated on is <b>, so the error would be assigned
|
||||
to the first caret, with a "node reorganized" error. Then, the
|
||||
ChildDef would have submitted its own suggestions and errors with
|
||||
regard to what's going in the internals. So I suppose this is
|
||||
ok. :-)
|
||||
|
||||
|
||||
Now, the structure of the earlier mentioned ... would be something
|
||||
like this:
|
||||
|
||||
|
||||
object {
|
||||
type = (token|attr|property),
|
||||
value, // appropriate for type
|
||||
errors => array(),
|
||||
sub-errors = [recursive],
|
||||
}
|
||||
|
||||
|
||||
This helps us keep things agnostic. It is also sufficiently complex
|
||||
enough to warrant an object.
|
||||
|
||||
@@ -173,7 +173,7 @@ Then we setup suggestions.
|
||||
|
||||
5. Setup a separate error class which tells the user any modifications
|
||||
HTML Purifier made.
|
||||
|
||||
|
||||
Some information about this:
|
||||
|
||||
Our current paradigm is to tell the user what HTML Purifier did to the HTML.
|
||||
@@ -187,7 +187,7 @@ the correct version isn't a bad idea, but problems arise when:
|
||||
- The user has such bad HTML we do something odd, when we should have just
|
||||
flagged the HTML as an error. Such examples are when we do things like
|
||||
remove text from directly inside a <table> tag. It was probably meant to
|
||||
be in a <td> tag or be outside the table, but we're not smart enough to
|
||||
be in a <td> tag or be outside the table, but we're not smart enough to
|
||||
realize this so we just remove it. In such a case, we should tell the user
|
||||
that there was foreign data in the table, but then we shouldn't "demand"
|
||||
the user remove the data; it's more of a "here's a possible way of
|
||||
@@ -204,6 +204,6 @@ Don't forget to spruce up output.
|
||||
|
||||
6. Output needs to automatically give line and column numbers, basically
|
||||
"at line" on steroids. Look at W3C's output; it's ok. [PARTIALLY DONE]
|
||||
|
||||
|
||||
- We need a standard CSS to apply (check demo.css for some starting
|
||||
styling; some buttons would also be hip)
|
||||
|
Reference in New Issue
Block a user