diff --git a/docs/enduser-security.txt b/docs/enduser-security.txt
index d33f473c..49aff331 100644
--- a/docs/enduser-security.txt
+++ b/docs/enduser-security.txt
@@ -8,15 +8,11 @@ to be effective. Things to remember:

 1. Character Encoding: see enduser-utf8.html for more info.

-2. Doctype: document pending feature completion
-Not strictly necessary, actually. More in-depth discussion once we figure
-out how to get strict loose mode working.
+2. IDs: see enduser-id.html for more info

-3. IDs: see enduser-id.html for more info
-
-4. Links: document pending feature completion
+3. Links: document pending feature completion
 Rudimentary blacklisting, we should also allow only relative URIs. We need a
 doc to explain the stuff.

-5. CSS: document pending
+4. CSS: document pending
 Explain which CSS styles we blocked and why.
diff --git a/docs/index.html b/docs/index.html
index 4bae94b5..4a8dabb1 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -141,12 +141,6 @@ the code. They may be upgraded to HTML files or stay as TXT scratchpads.

 List of vendor-specific tags we may want to transform to W3C compliant markup.
-
- Reference
- Strictness
- Short essay on how loose definition isn't really loose.
-
-
 Reference
 Modularization of HTMLDefinition
diff --git a/docs/proposal-config.txt b/docs/proposal-config.txt
index 95b8a6d3..a9ee73a4 100644
--- a/docs/proposal-config.txt
+++ b/docs/proposal-config.txt
@@ -1,6 +1,5 @@

 Configuration
- [needs updating]

 Configuration is documented on a per-use case: if a class uses a certain
 value from the configuration object, it has to define its name and what the
@@ -13,29 +12,10 @@ the documentation in ConfigDef for more information on these namespaces.

 Since configuration is dependant on context, internal classes require a
 configuration object to be passed as a parameter. (They also require a
-Context object).
+Context object). A majority of classes do not need the config object,
+but for those who do, it is a lifesaver.

-In relation to HTMLDefinition and CSSDefinition, there could be a special class
-of directives that influence the *construction* of the Definition object.
-A theoretical call pattern would look like:
-
-1. Client calls Config->getHTMLDefinition()
-2. Config calls HTMLDefinition->createNew(this)
-3. HTMLDefinition constructs itself with base configuration
-4. HTMLDefinition calls Config->get('HTML')
-5. Config returns array of directives
-6. HTMLDefinition performs operations and changes specified by directives
-7. HTMLPurifier returns constructed definition
-8. Config caches definition so it doesn't have to be generated again
-9. Config returns definition
-
-You could also override Config's copy of the definition with your own
-custom copy, which OVERRIDES all directives. Only the base, vanilla copy
-is the Singleton, the object actually interfaced with is a operated-upon
-clone of that object. Also, if an update to the directives would update
-the definition, you'd have to force reconstruction.
-
-In practice, the pulling directives from the config object are
-solely need-based, and the flex points are littered throughout the
-setup() function. Some sort of refactoring is likely in order. See
-ref-xhtml-1.1.txt for more info.
+Definition objects are complex datatypes influenced by their respective
+directive namespaces (HTMLDefinition with HTML and CSSDefinition with CSS).
+If any of these directives is updated, HTML Purifier forces the definition
+to be regenerated.
diff --git a/docs/proposal-filter-levels.txt b/docs/proposal-filter-levels.txt
index ce6c7853..9e9cfbb0 100644
--- a/docs/proposal-filter-levels.txt
+++ b/docs/proposal-filter-levels.txt
@@ -2,23 +2,16 @@
 Filter Levels
 When one size *does not* fit all

-The more I think about it, the less sense it makes for maintaining one huge
-monolithic HTMLDefinition class. There's simply so much variation that
-could go into this definition: the set of HTML good for blog entries is
-definitely too large for HTML that would be allowed in blog comments. Going
-from Transitional to Strict requires changes to the definition.
+It makes little sense to constrain users to one set of HTML elements and
+attributes and tell them that they are not allowed to mold this in
+any fashion. Many users demand to be able to custom-select which elements
+and attributes they want. This is fine: because HTML Purifier keeps close
+track of what elements are safe to use, there is no way for them to
+accidentally allow an XSS-able tag.

-Allowing users to specify their own whitelists is one step (implemented, btw),
-but I have doubts on only doing this. Simply put, the typical programmer is too
-lazy to actually go through the trouble of investigating which tags, attributes
-and properties to allow. HTMLDefinition makes a big part of what HTMLPurifier
-is.
-
-The idea, then, is to setup fundamentally different set of definitions, which
-can further be customized using simpler configuration options. Alternatively,
-they could be implemented as configuration profiles, which simply load
-a set of recommended directives to acheive a desired affect (no simpler
-config options though).
+However, combing through the HTML spec to make your own whitelist can
+be a daunting task. HTML Purifier ought to offer pre-canned filter levels
+that amateur users can select based on what they think is their use-case.

 Here are some fuzzy levels you could set:

@@ -46,6 +39,10 @@ make forbidden element to text transformations desirable (for example, images).

 == Element Risk Analysis ==

+Although none of the currently supported elements presents a security
+threat per se, some can cause problems for page layouts or be
+extremely complicated.
+
 Legend:
 [danger level] - regular tags / uncommon tags ~ deprecated tags
 [danger level]* - rare tags
@@ -130,6 +127,7 @@ any CSS properties that are not currently implemented (such as position).
 Dangerous, can go outside container - float
 Easy to abuse - font-size, font-family (font), width
 Colored - background-color (background), border-color (border), color
+ (see proposal-colors.html)
 Dramatic - border, list-style-position (list-style), margin, padding,
 text-align, text-indent, text-transform, vertical-align, line-height
diff --git a/docs/ref-strictness.txt b/docs/ref-strictness.txt
deleted file mode 100644
index 0167ed86..00000000
--- a/docs/ref-strictness.txt
+++ /dev/null
@@ -1,33 +0,0 @@
-
-Is HTML Purifier Strict or Transitional?
- [rename/deprecation pending]
-
-Despite the fact that HTML Purifier professes to support both transitional and
-strict HTML, it rejects a lot of attributes and elements that are actually, indeed,
-valid. You can investigate progress.html to find out precisely what we
-are doing to these *deprecated* attributes.
-
-However, users have found that Strict HTML imposes some quite unreasonable
-restrictions on certain things. The start and value attributes in ol and
-li (respectively) perhaps are the most contested. There's is currently no
-widely supported browser method short of JavaScript that can replace these
-two deprecated elements. It behooves us to allow these deprecated
-attributes when the output is transitional.
-
-Fortunantely, that's the only real bugger case. The others have near-perfect
-CSS equivalents, and were presentational anyway. However, the other question
-pops up: should we always convert these to the CSS forms when 1. the spec
-allows them anyway and 2. older browsers support them better? After all, the
-whole point about CSS is to seperate styling from content, so inline styling
-doesn't solve that problem.
-
-[new material]
-
-HTML Purifier 1.7 creates a new organizational system for deprecated attribute/
-element transformations. They will be unified under the title of "Tidy", which
-is what they are: cleaning up after deprecated user markup into standards-compliant
-versions. There will also be a change in the default behavior (athough, to the
-end user not inspecting the HTML, there will be no change: in fact, it may
-work even better).
-
-Consult the Advanced API for more details.
\ No newline at end of file
diff --git a/docs/ref-whatwg.txt b/docs/ref-whatwg.txt
index d89344db..070d8e88 100644
--- a/docs/ref-whatwg.txt
+++ b/docs/ref-whatwg.txt
@@ -2,8 +2,23 @@
 Web Hypertext Application Technology Working Group
 WHATWG

-I don't think we need to worry about them. Untrusted users shouldn't be
-submitting applications, eh? But if some interesting attribute pops up in
-their spec, and might be worth supporting, stick it here.
+== HTML 5 ==

-HTML 5!!!
+URL: http://www.whatwg.org/specs/web-apps/current-work/
+
+HTML 5 defines a kaboodle of new elements and attributes, as well as
+some well-defined, "quirks mode" HTML parsing. Although WHATWG professes
+to be targeted towards web applications, many of their semantic additions
+would be quite useful in regular documents. Eventually, HTML
+Purifier will need to audit their lists and figure out what changes need
+to be made. This process is complicated by the fact that the WHATWG
+doesn't buy into W3C's modularization of XHTML 1.1: we may need
+to remodularize HTML 5 (probably done by section name). No sense in
+committing ourselves till the spec stabilizes, though.
+
+Of more immediate interest, however, is the well-defined parsing
+behavior that HTML 5 adds. While I have little interest in writing
+another DirectLex parser, other parsers like ph5p
+can be adapted to DOMLex to support much more
+flexible HTML parsing (a cool feature I've seen is how they resolve
+<b>bold<i>both</b>italic</i>).