From 58f00105c8b8ed757c261fd88f8110e55536d431 Mon Sep 17 00:00:00 2001
From: "Edward Z. Yang"
Date: Sat, 9 Jun 2007 14:53:21 +0000
Subject: [PATCH] Update txt docs.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1134 48356398-32a2-884e-a903-53898d9a118a
---
docs/enduser-security.txt | 10 +++-------
docs/index.html | 6 ------
docs/proposal-config.txt | 32 ++++++--------------------------
docs/proposal-filter-levels.txt | 30 ++++++++++++++----------------
docs/ref-strictness.txt | 33 ---------------------------------
docs/ref-whatwg.txt | 23 +++++++++++++++++++----
6 files changed, 42 insertions(+), 92 deletions(-)
delete mode 100644 docs/ref-strictness.txt
diff --git a/docs/enduser-security.txt b/docs/enduser-security.txt
index d33f473c..49aff331 100644
--- a/docs/enduser-security.txt
+++ b/docs/enduser-security.txt
@@ -8,15 +8,11 @@ to be effective. Things to remember:
1. Character Encoding: see enduser-utf8.html for more info.
-2. Doctype: document pending feature completion
-Not strictly necessary, actually. More in-depth discussion once we figure
-out how to get strict loose mode working.
+2. IDs: see enduser-id.html for more info
-3. IDs: see enduser-id.html for more info
-
-4. Links: document pending feature completion
+3. Links: document pending feature completion
Rudimentary blacklisting, we should also allow only relative URIs. We
need a doc to explain the stuff.
-5. CSS: document pending
+4. CSS: document pending
Explain which CSS styles we blocked and why.
diff --git a/docs/index.html b/docs/index.html
index 4bae94b5..4a8dabb1 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -141,12 +141,6 @@ the code. They may be upgraded to HTML files or stay as TXT scratchpads.
List of vendor-specific tags we may want to transform to W3C compliant markup. |
-
- Reference |
- Strictness |
- Short essay on how loose definition isn't really loose. |
-
-
Reference |
Modularization of HTMLDefinition |
diff --git a/docs/proposal-config.txt b/docs/proposal-config.txt
index 95b8a6d3..a9ee73a4 100644
--- a/docs/proposal-config.txt
+++ b/docs/proposal-config.txt
@@ -1,6 +1,5 @@
Configuration
- [needs updating]
Configuration is documented on a per-use case: if a class uses a certain
value from the configuration object, it has to define its name and what the
@@ -13,29 +12,10 @@ the documentation in ConfigDef for more information on these namespaces.
Since configuration is dependant on context, internal classes require a
configuration object to be passed as a parameter. (They also require a
-Context object).
+Context object). A majority of classes do not need the config object,
+but for those that do, it is a lifesaver.
-In relation to HTMLDefinition and CSSDefinition, there could be a special class
-of directives that influence the *construction* of the Definition object.
-A theoretical call pattern would look like:
-
-1. Client calls Config->getHTMLDefinition()
-2. Config calls HTMLDefinition->createNew(this)
-3. HTMLDefinition constructs itself with base configuration
-4. HTMLDefinition calls Config->get('HTML')
-5. Config returns array of directives
-6. HTMLDefinition performs operations and changes specified by directives
-7. HTMLPurifier returns constructed definition
-8. Config caches definition so it doesn't have to be generated again
-9. Config returns definition
-
-You could also override Config's copy of the definition with your own
-custom copy, which OVERRIDES all directives. Only the base, vanilla copy
-is the Singleton, the object actually interfaced with is a operated-upon
-clone of that object. Also, if an update to the directives would update
-the definition, you'd have to force reconstruction.
-
-In practice, the pulling directives from the config object are
-solely need-based, and the flex points are littered throughout the
-setup() function. Some sort of refactoring is likely in order. See
-ref-xhtml-1.1.txt for more info.
+Definition objects are complex datatypes influenced by their respective
+directive namespaces (HTMLDefinition with HTML and CSSDefinition with CSS).
+If any of these directives is updated, HTML Purifier forces the definition
+to be regenerated.
diff --git a/docs/proposal-filter-levels.txt b/docs/proposal-filter-levels.txt
index ce6c7853..9e9cfbb0 100644
--- a/docs/proposal-filter-levels.txt
+++ b/docs/proposal-filter-levels.txt
@@ -2,23 +2,16 @@
Filter Levels
When one size *does not* fit all
-The more I think about it, the less sense it makes for maintaining one huge
-monolithic HTMLDefinition class. There's simply so much variation that
-could go into this definition: the set of HTML good for blog entries is
-definitely too large for HTML that would be allowed in blog comments. Going
-from Transitional to Strict requires changes to the definition.
+It makes little sense to constrain users to one set of HTML elements and
+attributes and tell them that they are not allowed to mold this in
+any fashion. Many users demand to be able to custom-select which elements
+and attributes they want. This is fine: because HTML Purifier keeps close
+track of what elements are safe to use, there is no way for them to
+accidentally allow an XSS-able tag.
-Allowing users to specify their own whitelists is one step (implemented, btw),
-but I have doubts on only doing this. Simply put, the typical programmer is too
-lazy to actually go through the trouble of investigating which tags, attributes
-and properties to allow. HTMLDefinition makes a big part of what HTMLPurifier
-is.
-
-The idea, then, is to setup fundamentally different set of definitions, which
-can further be customized using simpler configuration options. Alternatively,
-they could be implemented as configuration profiles, which simply load
-a set of recommended directives to acheive a desired affect (no simpler
-config options though).
+However, combing through the HTML spec to build your own whitelist can
+be a daunting task. HTML Purifier ought to offer pre-canned filter levels
+that amateur users can choose from based on their intended use case.
Here are some fuzzy levels you could set:
@@ -46,6 +39,10 @@ make forbidden element to text transformations desirable (for example, images).
== Element Risk Analysis ==
+Although none of the currently supported elements presents a security
+threat per se, some can cause problems for page layouts or be
+extremely complicated.
+
Legend:
[danger level] - regular tags / uncommon tags ~ deprecated tags
[danger level]* - rare tags
@@ -130,6 +127,7 @@ any CSS properties that are not currently implemented (such as position).
Dangerous, can go outside container - float
Easy to abuse - font-size, font-family (font), width
Colored - background-color (background), border-color (border), color
+ (see proposal-colors.html)
Dramatic - border, list-style-position (list-style), margin, padding,
text-align, text-indent, text-transform, vertical-align, line-height
diff --git a/docs/ref-strictness.txt b/docs/ref-strictness.txt
deleted file mode 100644
index 0167ed86..00000000
--- a/docs/ref-strictness.txt
+++ /dev/null
@@ -1,33 +0,0 @@
-
-Is HTML Purifier Strict or Transitional?
- [rename/deprecation pending]
-
-Despite the fact that HTML Purifier professes to support both transitional and
-strict HTML, it rejects a lot of attributes and elements that are actually, indeed,
-valid. You can investigate progress.html to find out precisely what we
-are doing to these *deprecated* attributes.
-
-However, users have found that Strict HTML imposes some quite unreasonable
-restrictions on certain things. The start and value attributes in ol and
-li (respectively) perhaps are the most contested. There's is currently no
-widely supported browser method short of JavaScript that can replace these
-two deprecated elements. It behooves us to allow these deprecated
-attributes when the output is transitional.
-
-Fortunantely, that's the only real bugger case. The others have near-perfect
-CSS equivalents, and were presentational anyway. However, the other question
-pops up: should we always convert these to the CSS forms when 1. the spec
-allows them anyway and 2. older browsers support them better? After all, the
-whole point about CSS is to seperate styling from content, so inline styling
-doesn't solve that problem.
-
-[new material]
-
-HTML Purifier 1.7 creates a new organizational system for deprecated attribute/
-element transformations. They will be unified under the title of "Tidy", which
-is what they are: cleaning up after deprecated user markup into standards-compliant
-versions. There will also be a change in the default behavior (athough, to the
-end user not inspecting the HTML, there will be no change: in fact, it may
-work even better).
-
-Consult the Advanced API for more details.
\ No newline at end of file
diff --git a/docs/ref-whatwg.txt b/docs/ref-whatwg.txt
index d89344db..070d8e88 100644
--- a/docs/ref-whatwg.txt
+++ b/docs/ref-whatwg.txt
@@ -2,8 +2,23 @@
Web Hypertext Application Technology Working Group
WHATWG
-I don't think we need to worry about them. Untrusted users shouldn't be
-submitting applications, eh? But if some interesting attribute pops up in
-their spec, and might be worth supporting, stick it here.
+== HTML 5 ==
-HTML 5!!!
+URL: http://www.whatwg.org/specs/web-apps/current-work/
+
+HTML 5 defines a host of new elements and attributes, as well as
+well-defined "quirks mode" HTML parsing. Although the WHATWG's spec is
+nominally targeted at web applications, many of its semantic additions
+would be quite useful in regular documents. Eventually, HTML
+Purifier will need to audit its lists and figure out what changes need
+to be made. This process is complicated by the fact that the WHATWG
+doesn't buy into the W3C's modularization of XHTML 1.1: we may need
+to remodularize HTML 5 ourselves (probably by section name). There is no
+sense in committing ourselves until the spec stabilizes, though.
+
+More immediately relevant, however, is the well-defined parsing
+behavior that HTML 5 adds. While I have little interest in writing
+another DirectLex parser, other parsers like ph5p
+ can be adapted to DOMLex to support much more
+flexible HTML parsing (a cool feature I've seen is how they resolve
+<b>bold<i>both</b>italic</i>).