diff --git a/docs/enduser-tidy.html b/docs/enduser-tidy.html new file mode 100644 index 00000000..51b52128 --- /dev/null +++ b/docs/enduser-tidy.html @@ -0,0 +1,230 @@ + + +
+ + + + +You've probably heard of HTML Tidy, Dave Raggett's little piece +of software that cleans up poorly written HTML. Let me say it straight +out:
+ +This ain't HTML Tidy!
+ +Rather, Tidy stands for a cool set of Tidy-inspired features in HTML Purifier +that allow users to submit deprecated elements and attributes and get +valid strict markup back. For example:
+ +<center>Centered</center>+ +
...becomes:
+ +<div style="text-align:center;">Centered</div>+ +
...when this particular fix is run on the HTML. This tutorial will give +you the lowdown on exactly what HTML Purifier will do when Tidy +is on, and how to fine-tune this behavior. Once again, you do +not need the Tidy extension installed in PHP to use these features!
+ +Tidy will do several things to your HTML:
+ +Levels describe how aggressive the Tidy module should be when +cleaning up HTML. There are four levels to pick from: none, light, medium +and heavy. Each of these levels has a well-defined set of behaviors +associated with it, although the exact set may change depending on your doctype.
+ +By default, Tidy operates on the medium level. You can +change the level of cleaning by setting the %HTML.TidyLevel configuration +directive:
+ +$config->set('HTML', 'TidyLevel', 'heavy'); // burn baby burn!+ +
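For context, here is a minimal sketch of where that directive sits in a complete purification call. It assumes HTML Purifier is installed and reachable on the include path (the `require_once` line is an assumption; adjust it to however your installation loads the library). It keeps this tutorial's three-argument `set()` form; newer releases also accept the dotted form `'HTML.TidyLevel'`.

```php
<?php
// Assumption: the library's bootstrap file is on the include path.
require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$config->set('HTML', 'TidyLevel', 'heavy'); // most aggressive cleanup

$purifier = new HTMLPurifier($config);

// Deprecated markup goes in; the purified string comes back.
$dirty = '<center>Centered</center>';
$clean = $purifier->purify($dirty);

echo $clean, "\n";
```

Swapping `'heavy'` for `'none'`, `'light'` or `'medium'` and rerunning is a quick way to see what each level does to your own markup.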
How much a given level changes depends on what doctype you're using. If your documents are HTML
+4.01 Transitional, HTML Purifier will be lazy
+and won't clean up your center
+or font
tags. But if you're using HTML 4.01 Strict,
+HTML Purifier has no choice: it has to convert them, or they will
+be nuked out of existence. So while light on Transitional will result
+in little to no changes, light on Strict will still result in quite
+a lot of fixes.
This is different behavior from 1.6 or before, where deprecated +tags in transitional documents would +always be cleaned up regardless. This is also better behavior.
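The doctype dependence can be checked directly. The sketch below runs the same deprecated markup through the light level under a Transitional and a Strict doctype. The helper name `purifyWith` and the sample input are made up for illustration, and the `require_once` path is an assumption about your installation.

```php
<?php
// Assumption: the library's bootstrap file is on the include path.
require_once 'HTMLPurifier.auto.php';

// Hypothetical helper: purify $html at the light level under $doctype.
function purifyWith($doctype, $html)
{
    $config = HTMLPurifier_Config::createDefault();
    $config->set('HTML', 'Doctype', $doctype);
    $config->set('HTML', 'TidyLevel', 'light');
    $purifier = new HTMLPurifier($config);
    return $purifier->purify($html);
}

$html = '<center><font color="red">Hello</font></center>';

echo "Transitional: ", purifyWith('HTML 4.01 Transitional', $html), "\n";
echo "Strict:       ", purifyWith('HTML 4.01 Strict', $html), "\n";
```

Going by the description above, the Transitional run should leave the center and font tags alone, while the Strict run should show them converted.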
+ +HTML Purifier is tasked with converting deprecated tags and +attributes to standards-compliant alternatives, which usually +need copious amounts of CSS. It's also not foolproof: sometimes +things do get lost in the translation. This is why HTML Purifier +skips cleaning whenever it can get away with it, and why +the default level is medium and not heavy.
+ +Fortunately, only a few attributes have problems with the +switchover. They are described below:
+
Element@Attr | Changes |
---|---|
caption@align | Firefox supports stuffing the caption on the left and right side of the table, a feature that Internet Explorer, understandably, does not have. When align equals right or left, the text will simply be aligned on the left or right side. |
img@align | The implementation for align bottom is good, but not perfect. There are a few pixel differences. |
br@clear | Clear both gets a little wonky in Internet Explorer. Haven't really been able to figure out why. |
hr@noshade | All browsers implement this slightly differently: we've chosen to make noshade horizontal rules gray. |
+
There are a few more minor, although irritating, bugs. +Some older browsers support deprecated attributes, +but not CSS. Transformed elements and attributes will look unstyled +to said browsers. Also, CSS precedence is slightly different for +inline styles versus presentational markup. In increasing precedence: +presentational attributes, external style sheets, inline styles.
+ +This means that styling that may have been masked by external CSS +declarations will start showing up (a good thing, perhaps). Finally, +if you've turned off the style attribute, almost all of +these transformations will not work. Sorry mates.
+ +You can review the before-and-after rendering of these transformations +by consulting the attrTransform.php +smoketest.
+ +So you want HTML Purifier to clean up your HTML, but you're not +so happy about the br@clear implementation. That's perfectly fine! +HTML Purifier will make accommodations:
+
+$config->set('HTML', 'Doctype', 'XHTML 1.0 Transitional');
+$config->set('HTML', 'TidyLevel', 'heavy'); // all changes, minus...
+$config->set('HTML', 'TidyRemove', 'br@clear');
+
That third line does the magic, removing the br@clear fix
+from the module, ensuring that <br clear="all" />
+will pass through unharmed. The reverse is possible too:
$config->set('HTML', 'Doctype', 'XHTML 1.0 Transitional');
+$config->set('HTML', 'TidyLevel', 'none'); // no changes, plus...
+$config->set('HTML', 'TidyAdd', 'p@align');
+
In this case, all transformations are shut off, except for the p@align +one, which you found handy.
+ +To find the names of the fixes you want to turn on or off,
+you'll have to consult the source code, specifically the files in
+HTMLPurifier/HTMLModule/Tidy/
. There is, however, a
+general syntax:
Name | Example | Interpretation |
---|---|---|
element | font | Tag transform for element |
element@attr | br@clear | Attribute transform for attr on element |
@attr | @lang | Global attribute transform for attr |
e#content_model_type | blockquote#content_model_type | Change of child processing implementation for e |
+
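To make the naming syntax concrete, the sketch below classifies a fix name according to the four patterns in the table above. `describeFixName` is a hypothetical helper written for this illustration; it is not part of HTML Purifier and only checks the shape of a name, not whether a fix with that name actually exists.

```php
<?php
// Hypothetical helper, not part of HTML Purifier: describe a fix name
// according to the general syntax in the table above.
function describeFixName($name)
{
    // @attr: global attribute transform
    if (preg_match('/^@([A-Za-z][\w:-]*)$/', $name, $m)) {
        return "Global attribute transform for {$m[1]}";
    }
    // element@attr: attribute transform on one element
    if (preg_match('/^(\w+)@([A-Za-z][\w:-]*)$/', $name, $m)) {
        return "Attribute transform for {$m[2]} on {$m[1]}";
    }
    // e#content_model_type: change of child processing
    if (preg_match('/^(\w+)#content_model_type$/', $name, $m)) {
        return "Change of child processing implementation for {$m[1]}";
    }
    // element: tag transform
    if (preg_match('/^\w+$/', $name)) {
        return "Tag transform for {$name}";
    }
    return "Unrecognized fix name";
}

$names = array('font', 'br@clear', '@lang', 'blockquote#content_model_type');
foreach ($names as $name) {
    echo $name, ' => ', describeFixName($name), "\n";
}
```

A check like this can catch a mistyped name before it is handed to %HTML.TidyAdd or %HTML.TidyRemove, but the source files remain the authority on which fixes exist.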
The lowdown is, quite frankly, HTML Purifier's default settings are +probably good enough. The next step is to bump the level up to heavy, +and if that still doesn't satisfy your appetite, do some fine tuning. +Other than that, don't worry about it: this all works silently and +effectively in the background.
+ +