mirror of
				https://github.com/ezyang/htmlpurifier.git
				synced 2025-10-20 16:26:15 +02:00 
			
		
		
		
	git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1651 48356398-32a2-884e-a903-53898d9a118a
		
			
				
	
	
		
			219 lines
		
	
	
		
			8.2 KiB
		
	
	
	
		
			HTML
		
	
	
	
	
	
			
		
		
	
	
			219 lines
		
	
	
		
			8.2 KiB
		
	
	
	
		
			HTML
		
	
	
	
	
	
| <?xml version="1.0" encoding="UTF-8"?>
 | |
| <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 | |
|     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
 | |
| <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
 | |
| <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
 | |
| <meta name="description" content="Specification for HTML Purifier's advanced API for defining custom filtering behavior." />
 | |
| <link rel="stylesheet" type="text/css" href="style.css" />
 | |
| 
 | |
| <title>Advanced API - HTML Purifier</title>
 | |
| 
 | |
| </head><body>
 | |
| 
 | |
| <h1>Advanced API</h1>
 | |
| 
 | |
| <div id="filing">Filed under Development</div>
 | |
| <div id="index">Return to the <a href="index.html">index</a>.</div>
 | |
| <div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
 | |
| 
 | |
| <p>
 | |
|   <strong>Warning:</strong> This document may be out-of-date. When in doubt,
 | |
|   consult the source code documentation.
 | |
| </p>
 | |
| 
 | |
| <p>HTML Purifier currently natively supports only a subset of HTML's
 | |
| allowed elements, attributes, and behavior; specifically, this subset
 | |
| is the set of elements that are safe for untrusted users to use.
 | |
| However, HTML Purifier is often utilized to ensure standards-compliance
 | |
| from input that is trusted (making it a sort of Tidy substitute),
 | |
| and often users need to define new elements or attributes. The
 | |
| advanced API is oriented specifically for these use-cases.</p>
 | |
| 
 | |
| <p>Our goals are to let the user:</p>
 | |
| 
 | |
| <dl>
 | |
|     <dt>Select</dt>
 | |
|     <dd><ul>
 | |
|         <li>Doctype</li>
 | |
|         <!-- <li>Filterset</li> -->
 | |
|         <li>Elements / Attributes / Modules</li>
 | |
|         <li>Tidy</li>
 | |
|     </ul></dd>
 | |
|     <dt>Customize</dt>
 | |
|     <dd><ul>
 | |
|         <li>Attributes</li>
 | |
|         <li>Elements</li>
 | |
|         <!--<li>Doctypes</li>-->
 | |
|     </ul></dd>
 | |
| </dl>
 | |
| 
 | |
| <h2>Select</h2>
 | |
| 
 | |
| <p>For basic use, the user will have to specify some basic parameters. This
 | |
| is not strictly necessary, as HTML Purifier's default setting will always
 | |
| output safe code, but is required for standards-compliant output.</p>
 | |
| 
 | |
| <h3>Selecting a Doctype</h3>
 | |
| 
 | |
| <p>The first thing to select is the <strong>doctype</strong>. This
 | |
| is essential for standards-compliant output.</p>
 | |
| 
 | |
| <p class="technical">This identifier is based
 | |
| on the name the W3C has given to the document type and <em>not</em>
 | |
| the DTD identifier.</p>
 | |
| 
 | |
| <p>This parameter is set via the configuration object:</p>
 | |
| 
 | |
| <pre>$config->set('HTML', 'Doctype', 'XHTML 1.0 Transitional');</pre>
 | |
| 
 | |
| <p>Due to historical reasons, the default doctype is XHTML 1.0
 | |
| Transitional, however, we really shouldn't be guessing what the user's
 | |
| doctype is. Fortunantely, people who can't be bothered to set this won't
 | |
| be bothered when their pages stop validating.</p>
 | |
| 
 | |
| <h3>Selecting Elements / Attributes / Modules</h3>
 | |
| 
 | |
| <p>HTML Purifier will, by default, allow as many elements and attributes
 | |
| as possible. However, a user may decide to roll their own filterset by
 | |
| selecting modules, elements and attributes to allow for their own
 | |
| specific use-case. This can be done using %HTML.Allowed:</p>
 | |
| 
 | |
| <pre>$config->set('HTML', 'Allowed', 'a[href|title],em,p,blockquote');</pre>
 | |
| 
 | |
| <p class="technical">The directive %HTML.Allowed is a convenience feature
 | |
| that may be fully expressed with the legacy interface.</p>
 | |
| 
 | |
| <p>We currently support another interface from older versions:</p>
 | |
| 
 | |
| <pre>$config->set('HTML', 'AllowedElements', 'a,em,p,blockquote');
 | |
| $config->set('HTML', 'AllowedAttributes', 'a.href,a.title');</pre>
 | |
| 
 | |
| <p>A user may also choose to allow modules using a specialized
 | |
| directive:</p>
 | |
| 
 | |
| <pre>$config->set('HTML', 'AllowedModules', 'Hypertext,Text,Lists');</pre>
 | |
| 
 | |
| <p>But it is not expected that this feature will be widely used.</p>
 | |
| 
 | |
| <p class="technical">Module selection will work slightly differently
 | |
| from the other AllowedElements and AllowedAttributes directives by
 | |
| directly modifying the doctype you are operating in, in the spirit of
 | |
| XHTML 1.1's modularization. We stop users from shooting themselves in the
 | |
| foot by mandating the modules in %HTML.CoreModules be used.</p>
 | |
| 
 | |
| <p class="technical">Modules are distinguished from regular elements by the
 | |
| case of their first letter. While XML distinguishes between and allows
 | |
| lower and uppercase letters in element names, XHTML uses only lower-case
 | |
| element names for sake of consistency.</p>
 | |
| 
 | |
| <h3>Selecting Tidy</h3>
 | |
| 
 | |
| <p>The name of this segment of functionality is inspired off of Dave
 | |
| Ragget's program HTML Tidy, which purported to help clean up HTML. In
 | |
| HTML Purifier, Tidy functionality involves turning unsupported and
 | |
| deprecated elements into standards-compliant ones, maintaining
 | |
| backwards compatibility, and enforcing best practices.</p>
 | |
| 
 | |
| <p>This is a complicated feature, and is explained more in depth at
 | |
| <a href="enduser-tidy.html">the Tidy documentation page</a>.</p>
 | |
| 
 | |
| <!--
 | |
| <h3>Unified selector</h3>
 | |
| 
 | |
| <p>Because selecting each and every one of these configuration options
 | |
| is a chore, we may wish to offer a specialized configuration method
 | |
| for selecting a filterset. Possibility:</p>
 | |
| 
 | |
| <pre>function selectFilter($doctype, $filterset, $tidy)</pre>
 | |
| 
 | |
| <p>...which is simply a light wrapper over the individual configuration
 | |
| calls. A custom config file format or text format could also be adopted.</p>
 | |
| -->
 | |
| 
 | |
| <h2>Customize</h2>
 | |
| 
 | |
| <p>By reviewing topic posts in the support forum, we determined that
 | |
| there were two primarily demanded customization features people wanted:
 | |
| to add an attribute to an existing element, and to add an element.
 | |
| Thus, we'll want to create convenience functions for these common
 | |
| use-cases.</p>
 | |
| 
 | |
| <p>Note that the functions described here are only available if
 | |
| a raw copy of <code>HTMLPurifier_HTMLDefinition</code> was retrieved.
 | |
| Furthermore, caching may prevent your changes from immediately
 | |
| being seen: consult <a href="enduser-customize.html">enduser-customize.html</a> on how
 | |
| to work around this.</p>
 | |
| 
 | |
| <h3>Attributes</h3>
 | |
| 
 | |
| <p>An attribute is bound to an element by a name and has a specific
 | |
| <code>AttrDef</code> that validates it. The interface is therefore:</p>
 | |
| 
 | |
| <pre>function addAttribute($element, $attribute, $attribute_def);</pre>
 | |
| 
 | |
| <p>Example of the functionality in action:</p>
 | |
| 
 | |
| <pre>$def->addAttribute('a', 'rel', 'Enum#nofollow');</pre>
 | |
| 
 | |
| <p>The <code>$attribute_def</code> value is flexible,
 | |
| to make things simpler. It can be a literal object or:</p>
 | |
| 
 | |
| <ul>
 | |
|     <!--<li>Class name: We'll instantiate it for you</li>
 | |
|     <li>Function name: We'll create an <code>HTMLPurifier_AttrDef_Anonymous</code>
 | |
|         class with that function registered as a callback.</li>-->
 | |
|     <li>String attribute type: We'll use <code>HTMLPurifier_AttrTypes</code>
 | |
|         to resolve it for you. Any data that follows a hash mark (#) will
 | |
|         be used to customize the attribute type: in the example above, 
 | |
|         we specify which values for Enum to allow.</li>
 | |
| </ul>
 | |
| 
 | |
| <h3>Elements</h3>
 | |
| 
 | |
| <p>An element requires certain information as specified by
 | |
| <code>HTMLPurifier_ElementDef</code>. However, not all of it is necessary,
 | |
| the usual things required are:</p>
 | |
| 
 | |
| <ul>
 | |
|     <li>Attributes</li>
 | |
|     <li>Content model/type</li>
 | |
|     <li>Registration in a content set</li>
 | |
| </ul>
 | |
| 
 | |
| <p>This suggests an API like this:</p>
 | |
| 
 | |
| <pre>function addElement($element, $type, $contents,
 | |
|     $attr_collections = array(); $attributes = array());</pre>
 | |
| 
 | |
| <p>Each parameter explained in depth:</p>
 | |
| 
 | |
| <dl>
 | |
|     <dt><code>$element</code></dt>
 | |
|     <dd>Element name, ex. 'label'</dd>
 | |
|     <dt><code>$type</code></dt>
 | |
|     <dd>Content set to register in, ex. 'Inline' or 'Flow'</dd>
 | |
|     <dt><code>$contents</code></dt>
 | |
|     <dd>Description of allowed children. This is a merged form of
 | |
|         <code>HTMLPurifier_ElementDef</code>'s member variables
 | |
|         <code>$content_model</code> and <code>$content_model_type</code>,
 | |
|         where the form is <q>Type: Model</q>, ex. 'Optional: Inline'.
 | |
|         There are also a number of predefined templates one may use.</dd>
 | |
|     <dt><code>$attr_collections</code></dt>
 | |
|     <dd>Array (or string if only one) of attribute collection(s) to
 | |
|         merge into the attributes array.</dd>
 | |
|     <dt><code>$attributes</code></dt>
 | |
|     <dd>Array of attribute names to attribute definitions, much like
 | |
|         the above-described attribute customization.</dd>
 | |
| </dl>
 | |
| 
 | |
| <p>A possible usage:</p>
 | |
| 
 | |
| <pre>$def->addElement('font', 'Inline', 'Optional: Inline', 'Common',
 | |
|     array('color' => 'Color'));</pre>
 | |
| 
 | |
| <p>See <code>HTMLPurifier/HTMLModule.php</code> for details.</p>
 | |
| 
 | |
| <div id="version">$Id$</div>
 | |
| 
 | |
| </body></html>
 |