1
0
mirror of https://github.com/ezyang/htmlpurifier.git synced 2025-08-04 13:18:00 +02:00

Merge in r657-674, prompted by near release of 1.4.0.

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/branches/strict@675 48356398-32a2-884e-a903-53898d9a118a
This commit is contained in:
Edward Z. Yang
2007-01-21 16:07:36 +00:00
parent 37ea1673dd
commit 9a84e11f34
56 changed files with 1356 additions and 509 deletions

View File

@@ -60,7 +60,7 @@ thead th {text-align:left;padding:0.1em;background-color:#EEE;}
<tbody>
<tr><th colspan="2">Standard</th></tr>
<tr class="css1 impl-yes"><td>background-color</td><td>COMPOSITE(&lt;color&gt;, transparent)</td></tr>
<tr class="css1 impl-partial"><td>background</td><td>SHORTHAND</td></tr>
<tr class="css1 impl-yes"><td>background</td><td>SHORTHAND, currently alias for background-color</td></tr>
<tr class="css1 impl-yes"><td>border</td><td>SHORTHAND, MULTIPLE</td></tr>
<tr class="css1 impl-yes"><td>border-color</td><td>MULTIPLE</td></tr>
<tr class="css1 impl-yes"><td>border-style</td><td>MULTIPLE</td></tr>
@@ -145,13 +145,13 @@ thead th {text-align:left;padding:0.1em;background-color:#EEE;}
<tr class="danger css1 impl-yes"><td>background-image</td><td>Dangerous, target milestone 1.3</td></tr>
<tr class="css1 impl-yes"><td>background-attachment</td><td>ENUM(scroll, fixed),
Depends on background-image</td></tr>
<tr class="css1"><td>background-position</td><td>Depends on background-image</td></tr>
<tr class="css1 impl-yes"><td>background-position</td><td>Depends on background-image</td></tr>
<tr class="danger impl-no"><td>cursor</td><td>Dangerous but fluffy</td></tr>
<tr class="danger css1"><td>display</td><td>ENUM(...), Dangerous but interesting;
will not implement list-item, run-in (Opera only) or table (no IE);
inline-block has incomplete IE6 support and requires -moz-inline-box
for Mozilla. Unknown target milestone.</td></tr>
<tr><td class="css1">height</td><td>Interesting, why use it? Unknown target milestone.</td></tr>
<tr class="css1"><td>height</td><td>Interesting, why use it? Unknown target milestone.</td></tr>
<tr class="danger css1 impl-yes"><td>list-style-image</td><td>Dangerous?</td></tr>
<tr class="impl-no"><td>max-height</td><td rowspan="4">No IE 5/6</td></tr>
<tr class="impl-no"><td>min-height</td></tr>
@@ -231,7 +231,7 @@ Mozilla on inside and needs -moz-outline, no IE support.</td></tr>
<tbody>
<tr><th colspan="3">CSS</th></tr>
<tr class="impl-yes"><td>style</td><td>All</td><td>Not all properties may be implemented, parser is good though.</td></tr>
<tr class="impl-yes"><td>style</td><td>All</td><td>Parser is reasonably functional. Status here doesn't count individual properties.</td></tr>
</tbody>
<tbody>
@@ -266,13 +266,13 @@ Mozilla on inside and needs -moz-outline, no IE support.</td></tr>
<tr><td rowspan="5">align</td><td>CAPTION</td><td>Near-equiv style 'caption-side', drop left and right</td></tr>
<tr><td>IMG</td><td rowspan="2">Margin-left and margin-right = auto or parent div</td></tr>
<tr><td>TABLE</td></tr>
<tr><td>HR</td><td>Equivalent style 'text-align' (IE tested)</td></tr>
<tr><td>HR</td><td>Near-equivalent style 'text-align' (Works for IE and Opera, but not Firefox). Also try <code>margin-right:auto; margin-left:0;</code> for left or <code>margin-right:0; margin-left:auto;</code> for right (optionally replacing 0 with the original margin for that side)</td></tr>
<tr class="impl-yes"><td>H1, H2, H3, H4, H5, H6, P</td><td>Equivalent style 'text-align'</td></tr>
<tr class="required impl-yes"><td>alt</td><td>IMG</td><td>Required, insert image filename if src is present or default invalid image text</td></tr>
<tr><td rowspan="3">bgcolor</td><td>TABLE</td><td>Equivalent style 'background-color' (IE tested)</td></tr>
<tr><td>TR</td><td>Equivalent style 'background-color' (IE tested)</td></tr>
<tr><td rowspan="3">bgcolor</td><td>TABLE</td><td>Equivalent style 'background-color'</td></tr>
<tr><td>TR</td><td>Equivalent style 'background-color'</td></tr>
<tr><td>TD, TH</td><td>Equivalent style 'background-color'</td></tr>
<tr><td>border</td><td>IMG</td><td>Equivalent style 'border-width', only applies when link present</td></tr>
<tr><td>border</td><td>IMG</td><td>Near equivalent style 'border-width', as it only applies when link present</td></tr>
<tr><td>clear</td><td>BR</td><td>Near-equiv style 'clear', transform 'all' into 'both'</td></tr>
<tr class="impl-no"><td>compact</td><td>DL, OL, UL</td><td>Boolean, needs custom CSS class; rarely used anyway</td></tr>
<tr class="required impl-yes"><td>dir</td><td>BDO</td><td>Required, insert ltr (or configuration value) if none</td></tr>

View File

@@ -7,6 +7,7 @@ and it's up to you to provide it the proper information and proper context
to be effective. Things to remember:
1. Character Encoding: UTF-8.
This segment will soon be obsoleted by enduser-utf8.html
Currently, the parser runs under the assumption that it is dealing
with UTF-8. Not ISO-8859-1 or Windows-1252, UTF-8. And definitely not "no
character encoding explicitly stated" or UTF-7. If you're not using UTF-8 as
@@ -27,6 +28,7 @@ this may be configurable in the future. Do you want standards compliance?
The doctype is a good place to start.
3. IDs
This segment is obsoleted by enduser-id.html
They need to be unique, but without some knowledge of the
rest of the document, it's difficult to know what's unique. %Attr.IDBlacklist
needs to be set: we may want to consider disallowing IDs by default to

View File

@@ -172,9 +172,10 @@ introduced after it has finished.</p>
<h2>Future plans</h2>
<p>It would probably be a good idea if this code was added to the core
library. Look out for the inclusion of this into the core as a decorator
or the like.</p>
<p>This functionality is part of the core library, using the
HTMLPurifier_Filter class to acheive the desired effect. Our implementation
is slightly different, and this page will be updated to reflect that
once 1.4.0 is released.</p>
</body>
</html>

View File

@@ -31,6 +31,9 @@ information for casual developers using HTML Purifier.</p>
<dt><a href="enduser-slow.html">Speeding up HTML Purifier</a></dt>
<dd>Explains how to speed up HTML Purifier through caching or inbound filtering.</dd>
<dt><a href="enduser-utf8.html">UTF-8</a></dt>
<dd>Describes the rationale for using UTF-8, the ramifications otherwise, and how to make the switch.</dd>
</dl>
<h2>Development</h2>

View File

@@ -14,15 +14,15 @@ Since configuration is dependant on context, internal classes require a
configuration object to be passed as a parameter. (They also require a
Context object).
In relation to HTMLDefinition and CSSDefinition, there is a special class
In relation to HTMLDefinition and CSSDefinition, there could be a special class
of directives that influence the *construction* of the Definition object.
A standard call pattern would look like:
A theoretical call pattern would look like:
1. Client calls Config->getHTMLDefinition()
2. Config calls HTMLDefinition->createNew(this)
3. HTMLDefinition constructs itself with base configuration
4. HTMLDefinition calls Config->get('HTMLDefinition')
5. Config returns array of directives that later construction
4. HTMLDefinition calls Config->get('HTML')
5. Config returns array of directives
6. HTMLDefinition performs operations and changes specified by directives
7. HTMLPurifier returns constructed definition
8. Config caches definition so it doesn't have to be generated again
@@ -33,3 +33,7 @@ custom copy, which OVERRIDES all directives. Only the base, vanilla copy
is the Singleton, the object actually interfaced with is a operated-upon
clone of that object. Also, if an update to the directives would update
the definition, you'd have to force reconstruction.
In practice, the pulling directives from the config object are
solely need-based, and the flex points are littered throughout the
setup() function. Some sort of refactoring is likely in order.

View File

@@ -15,7 +15,10 @@ and properties to allow. HTMLDefinition makes a big part of what HTMLPurifier
is.
The idea, then, is to setup fundamentally different set of definitions, which
can further be customized using simpler configuration options.
can further be customized using simpler configuration options. Alternatively,
they could be implemented as configuration profiles, which simply load
a set of recommended directives to acheive a desired affect (no simpler
config options though).
Here are some fuzzy levels you could set:

View File

@@ -4,8 +4,6 @@ Configuration Ideas
Here are some theoretical configuration ideas that we could implement some
time. Note the naming convention: %Namespace.Directive
%Attr.IDPrefix - prefix all ids with this
%Attr.RewriteFragments - if there's %Attr.IDPrefix we may want to transparently
rewrite the URLs we parse too. However, we can only do it when it's a pure
anchor link, so it's not foolproof

View File

@@ -2,8 +2,8 @@
Is HTML Purifier Strict or Transitional?
A little bit of helpful guidance
Despite the fact that HTML Purifier professes only to support transitional
HTML, it rejects a lot of attributes and elements that are actually, indeed,
Despite the fact that HTML Purifier professes to support both transitional and
strict HTML, it rejects a lot of attributes and elements that are actually, indeed,
valid. You can investigate progress.html to find out precisely what we
are doing to these *deprecated* attributes.
@@ -11,8 +11,8 @@ However, users have found that Strict HTML imposes some quite unreasonable
restrictions on certain things. The start and value attributes in ol and
li (respectively) perhaps are the most contested. There's is currently no
widely supported browser method short of JavaScript that can replace these
two deprecated elements. HTML Purifier does not currently support them, but
it might behoove us to do so while our output is still transitional.
two deprecated elements. It behooves us to allow these deprecated
attributes when the output is transitional.
Fortunantely, that's the only real bugger case. The others have near-perfect
CSS equivalents, and were presentational anyway. However, the other question
@@ -32,5 +32,6 @@ these loose-only constructs in loose mode:
The changed child definitions as well as the ul.start li.value are the most
compelling reasons why loose should be used. We may want offer disabling <u>,
<strike> and <s> by themselves.
<strike> and <s> by themselves. We may also want to offer no pre-emptive
deprecated conversions. This all must be unified.