diff --git a/docs/ref-content-models.txt b/docs/ref-content-models.txt
new file mode 100644
index 00000000..11d4aca7
--- /dev/null
+++ b/docs/ref-content-models.txt
@@ -0,0 +1,48 @@
+
+Handling Content Model Changes
+
+
+1. Context
+
+The distinction between Transitional and Strict document types is somewhat
+of an anomaly in the lineage of XHTML document types (following 1.0,
+doctypes do not have flavors; instead, modularization is used to let
+document authors vary their elements). This transition is usually quite
+straightforward, as W3C usually deprecates attributes or elements, which
+are quite easily handled using tag and attribute transforms.
+
+However, for three elements, <blockquote>, <body> and <address>, W3C elected
+to also change the content model. <blockquote> and <body> originally
+accepted both inline and block elements, but in the strict doctype they
+only allow block elements. With <address>, the situation is inverted:
+<p> tags are now forbidden from appearing within this tag.
+
+
+2. Current situation
+
+Currently, HTML Purifier treats <blockquote> specially during Tidy mode
+using a custom ChildDef class StrictBlockquote. StrictBlockquote
+operates similarly to Required, except that when it encounters an inline
+element, it will wrap it in a block tag (as specified by
+%HTML.BlockWrapper, the default is <p>). The naming suggests it can
+only be used for <blockquote>s, although it may be possible to
+genericize it to work on other cases of this nature (this would be of
+little practical application, as no other element in XHTML 1.1 or earlier
+has a block-only content model).
+
+Tidy currently contains no custom, lenient implementation for <address>.
+If one were to be written, it would likely operate on the principle that,
+when a <p> tag is encountered, it would be replaced with a
+leading and trailing <br /> tag (the contents of <p>, being inline, are
+not an issue). There is no prior work with this sort of operation.
+
+
+3. Outside applicability
+
+There are a number of other elements that contain restrictive content
+models, such as <ul> or <span> (the latter is restrictive in that it
+does not allow block elements). In the former case, an errant node
+is eliminated completely; in the latter case, the text of the node
+is preserved (as the parent node does allow PCDATA). Custom
+content model implementations probably are not the best way of handling
+these cases; node bubbling should be implemented instead.
diff --git a/docs/ref-loose-vs-strict.txt b/docs/ref-loose-vs-strict.txt
deleted file mode 100644
index 4f47ea7d..00000000
--- a/docs/ref-loose-vs-strict.txt
+++ /dev/null
@@ -1,18 +0,0 @@
-
-Loose versus Strict
-    [rename/deprecation pending]
-
-The most common change between doctypes are between the two flavors of HTML 4.01 and
-XHTML 1.0: Transitional (Loose) and Strict. Besides deprecated attributes and elements
-(which are quite easy to identify), there are two content model changes that were
-made:
-
-BLOCKQUOTE changes from 'flow' to 'block'
-    current behavior: inline inner contents should not be nuked, block-ify as necessary
-ADDRESS from potpourri to Inline (removes p tags)
-    current behavior: block tags silently dropped
-    ideal behavior: replace block elements with something like <br>. (not high priority,
-    somewhat difficult to implement)
-
-We're missing strict support for U, S, STRIKE: this needs to be fixed soon (and
-is quite simple to fix).