Mirror of https://github.com/ezyang/htmlpurifier.git (synced 2025-08-04 21:28:06 +02:00)
Compare commits: v1.6.1-str ... v2.0.0-str (3 commits)
Author | SHA1 | Date
---|---|---
 | 42858ad594 |
 | 5ecb11f19a |
 | 0101311193 |
2
Doxyfile
@@ -4,7 +4,7 @@
 # Project related configuration options
 #---------------------------------------------------------------------------
 PROJECT_NAME = HTML Purifier
-PROJECT_NUMBER = 1.6.1
+PROJECT_NUMBER = 2.0.0
 OUTPUT_DIRECTORY = "C:/Documents and Settings/Edward/My Documents/My Webs/htmlpurifier/docs/doxygen"
 CREATE_SUBDIRS = NO
 OUTPUT_LANGUAGE = English
57
INSTALL
@@ -1,4 +1,3 @@
|
|||||||
|
|
||||||
Install
|
Install
|
||||||
How to install HTML Purifier
|
How to install HTML Purifier
|
||||||
|
|
||||||
@@ -8,13 +7,13 @@ installation GUI, you've come to the wrong place!) The impatient can scroll
|
|||||||
down to the bottom of this INSTALL document to see the code, but you really
|
down to the bottom of this INSTALL document to see the code, but you really
|
||||||
should make sure a few things are properly done.
|
should make sure a few things are properly done.
|
||||||
|
|
||||||
Todo: Convert to using the array syntax for configuration.
|
|
||||||
|
|
||||||
|
|
||||||
1. Compatibility
|
1. Compatibility
|
||||||
|
|
||||||
HTML Purifier works in both PHP 4 and PHP 5, from PHP 4.3.9 and up. It has no
|
HTML Purifier works in both PHP 4 and PHP 5, from PHP 4.3.2 and up. It has no
|
||||||
core dependencies with other libraries. (Whoopee!)
|
core dependencies with other libraries.
|
||||||
|
|
||||||
Optional extensions are iconv (usually installed) and tidy (also common).
|
Optional extensions are iconv (usually installed) and tidy (also common).
|
||||||
If you use UTF-8 and don't plan on pretty-printing HTML, you can get away with
|
If you use UTF-8 and don't plan on pretty-printing HTML, you can get away with
|
||||||
@@ -50,6 +49,7 @@ be standards compliant. HTML Purifier can deal with these doctypes:
|
|||||||
* XHTML 1.0 Strict
|
* XHTML 1.0 Strict
|
||||||
* HTML 4.01 Transitional
|
* HTML 4.01 Transitional
|
||||||
* HTML 4.01 Strict
|
* HTML 4.01 Strict
|
||||||
|
* XHTML 1.1 sans Ruby
|
||||||
|
|
||||||
...and these character encodings:
|
...and these character encodings:
|
||||||
|
|
||||||
@@ -68,11 +68,11 @@ the doctype from this code in your HTML documents:
|
|||||||
<meta http-equiv="Content-type" content="text/html;charset=ENCODING">
|
<meta http-equiv="Content-type" content="text/html;charset=ENCODING">
|
||||||
|
|
||||||
For legacy codebases these declarations may be missing. If that is the case,
|
For legacy codebases these declarations may be missing. If that is the case,
|
||||||
STOP, and read up on character encodings and doctypes (in that order). Here
|
STOP, and read docs/enduser-utf8.html
|
||||||
are some links:
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
* http://www.joelonsoftware.com/articles/Unicode.html
|
|
||||||
* http://alistapart.com/stories/doctype/
|
|
||||||
|
|
||||||
You may currently be vulnerable to XSS and other security threats, and HTML
|
You may currently be vulnerable to XSS and other security threats, and HTML
|
||||||
Purifier won't be able to fix that.
|
Purifier won't be able to fix that.
|
||||||
@@ -116,27 +116,30 @@ websites):
|
|||||||
|
|
||||||
Note that HTML Purifier's support for non-Unicode encodings is crippled by the
|
Note that HTML Purifier's support for non-Unicode encodings is crippled by the
|
||||||
fact that any character not supported by that encoding will be silently
|
fact that any character not supported by that encoding will be silently
|
||||||
dropped, EVEN if it is ampersand escaped. This is a current limitation of
|
dropped, EVEN if it is ampersand escaped. If you want to work around
|
||||||
HTML Purifier that we are NOT actively working to fix. Patches are welcome,
|
this, you are welcome to read docs/enduser-utf8.html for a workaround,
|
||||||
but there are so many other gotchas and problems in I18N for non-Unicode
|
but please be cognizant of the issues the "solution" creates.
|
||||||
encodings that this functionality is low priority. See
|
|
||||||
<http://ppewww.ph.gla.ac.uk/~flavell/charset/form-i18n.html> for a more
|
|
||||||
detailed lowdown on the topic.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
4.2. Setting a different doctype
|
4.2. Setting a different doctype
|
||||||
|
|
||||||
For those of you stuck using HTML 4.01 Transitional, you can disable
|
For those of you using HTML 4.01 Transitional, you can disable
|
||||||
XHTML output like this:
|
XHTML output like this:
|
||||||
|
|
||||||
$config->set('Core', 'XHTML', false);
|
$config->set('HTML', 'Doctype', 'HTML 4.01 Transitional');
|
||||||
|
|
||||||
I recommend that you use XHTML, although not as much as I recommend UTF-8. If
|
Other supported doctypes include:
|
||||||
your HTML 4.01 page validates, good for you!
|
|
||||||
|
|
||||||
Currently, we can only guarantee transitional-complaint output, future
|
|
||||||
versions will also allow strict-compliant output.
|
* HTML 4.01 Strict
|
||||||
|
* HTML 4.01 Transitional
|
||||||
|
* XHTML 1.0 Strict
|
||||||
|
* XHTML 1.0 Transitional
|
||||||
|
* XHTML 1.1
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
@@ -184,9 +187,17 @@ If your website is in a different encoding or doctype, use this code:
|
|||||||
require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';
|
require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';
|
||||||
|
|
||||||
$config = HTMLPurifier_Config::createDefault();
|
$config = HTMLPurifier_Config::createDefault();
|
||||||
$config->set('Core', 'Encoding', 'ISO-8859-1'); //replace with your encoding
|
$config->set('Core', 'Encoding', 'ISO-8859-1'); // replace with your encoding
|
||||||
$config->set('Core', 'XHTML', true); //replace with false if HTML 4.01
|
$config->set('HTML', 'Doctype', 'HTML 4.01 Transitional'); // replace with your doctype
|
||||||
$purifier = new HTMLPurifier($config);
|
$purifier = new HTMLPurifier($config);
|
||||||
|
|
||||||
$clean_html = $purifier->purify($dirty_html);
|
$clean_html = $purifier->purify($dirty_html);
|
||||||
?>
|
?>
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
7. Caching
|
||||||
|
|
||||||
|
HTML Purifier generates some cache files to speed up its execution. For
|
||||||
|
maximum performance, make sure that library/HTMLPurifier/DefinitionCache/Serializer
|
||||||
|
is writeable by the webserver.
|
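The caching section added to INSTALL above asks that the serializer cache directory be writable by the web server. A minimal sketch of how one might check this before deploying follows. The helper name cacheDirIsUsable is an assumption made for illustration and is not part of HTML Purifier; the directory path is the one named in the INSTALL text, taken relative to the checkout.

```php
<?php
// Hypothetical helper, not part of HTML Purifier: reports whether a
// directory exists and the current process is allowed to write to it.
function cacheDirIsUsable($dir)
{
    return is_dir($dir) && is_writable($dir);
}

// Path taken from the INSTALL text; adjust the prefix to your checkout.
$cacheDir = 'library/HTMLPurifier/DefinitionCache/Serializer';

if (!cacheDirIsUsable($cacheDir)) {
    echo "Cache directory is missing or not writable: $cacheDir\n";
}
```

Note that is_writable() answers for the user running the script, so the check is only meaningful when it runs under the web server's account rather than from a developer's shell.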
@@ -17,7 +17,7 @@ ce document pour quelques choses.
|
|||||||
|
|
||||||
1. Compatibilité
|
1. Compatibilité
|
||||||
|
|
||||||
HTML Purifier fonctionne dans PHP 4 et PHP 5. PHP 4.3.9 est le dernier
|
HTML Purifier fonctionne dans PHP 4 et PHP 5. PHP 4.3.2 est le dernier
|
||||||
version que je le testais. Il ne dépend de les autre librairies.
|
version que je le testais. Il ne dépend de les autre librairies.
|
||||||
|
|
||||||
Les extensions optionnel est iconv (en général déjà installer) et
|
Les extensions optionnel est iconv (en général déjà installer) et
|
||||||
|
61
NEWS
@@ -9,7 +9,62 @@ NEWS ( CHANGELOG and HISTORY ) HTMLPurifier
|
|||||||
. Internal change
|
. Internal change
|
||||||
==========================
|
==========================
|
||||||
|
|
||||||
1.7.0, unknown release date
|
2.0.0, released 2007-06-20
|
||||||
|
# Completely refactored HTMLModuleManager, decentralizing safety
|
||||||
|
information
|
||||||
|
# Transform modules changed to Tidy modules, which offer more flexibility
|
||||||
|
and better modularization
|
||||||
|
# Configuration object now finalizes itself when a read operation is
|
||||||
|
performed on it, ensuring that its internal state stays consistent.
|
||||||
|
To revert this behavior, you can set the $autoFinalize member variable
|
||||||
|
off, but it's not recommended.
|
||||||
|
# New compact syntax for AttrDef objects that can be used to instantiate
|
||||||
|
new objects via make()
|
||||||
|
# Definitions (esp. HTMLDefinition) are now cached for a significant
|
||||||
|
performance boost. You can disable caching by setting %Core.DefinitionCache
|
||||||
|
to null. You CANNOT edit raw definitions without setting the corresponding
|
||||||
|
DefinitionID directive (%HTML.DefinitionID for HTMLDefinition).
|
||||||
|
# Contents between <script> tags are now completely removed if <script>
|
||||||
|
is not allowed
|
||||||
|
# Prototype-declarations for Lexer removed in favor of configuration
|
||||||
|
determination of Lexer implementations.
|
||||||
|
! HTML Purifier now works in PHP 4.3.2.
|
||||||
|
! Configuration form-editing API makes tweaking HTMLPurifier_Config a
|
||||||
|
breeze!
|
||||||
|
! Configuration directives that accept hashes now allow new string
|
||||||
|
format: key1:value1,key2:value2
|
||||||
|
! ConfigDoc now factored into OOP design
|
||||||
|
! All deprecated elements now natively supported
|
||||||
|
! Implement TinyMCE styled whitelist specification format in
|
||||||
|
%HTML.Allowed
|
||||||
|
! Config object gives more friendly error messages when things go wrong
|
||||||
|
! Advanced API implemented: easy functions for creating elements (addElement)
|
||||||
|
and attributes (addAttribute) on HTMLDefinition
|
||||||
|
! Add native support for required attributes
|
||||||
|
- Deprecated and removed EnableRedundantUTF8Cleaning. It didn't even work!
|
||||||
|
- DOMLex will not emit errors when a custom error handler that does not
|
||||||
|
honor error_reporting is used
|
||||||
|
- StrictBlockquote child definition refrains from wrapping whitespace
|
||||||
|
in tags now.
|
||||||
|
- Bug resulting from tag transforms to non-allowed elements fixed
|
||||||
|
- ChildDef_Custom's regex generation has been improved, removing several
|
||||||
|
false positives
|
||||||
|
. Unit test for ElementDef created, ElementDef behavior modified to
|
||||||
|
be more flexible
|
||||||
|
. Added convenience functions for HTMLModule constructors
|
||||||
|
. AttrTypes now has accessor functions that should be used instead
|
||||||
|
of directly manipulating info
|
||||||
|
. TagTransform_Center deprecated in favor of generic TagTransform_Simple
|
||||||
|
. Add extra protection in AttrDef_URI against phantom Schemes
|
||||||
|
. Doctype object added to HTMLDefinition which describes certain aspects
|
||||||
|
of the operational document type
|
||||||
|
. Lexer is now pre-emptively included, with a conditional include for the
|
||||||
|
PHP5 only version.
|
||||||
|
. HTMLDefinition and CSSDefinition have a common parent class: Definition.
|
||||||
|
. DirectLex can now track line-numbers
|
||||||
|
. Preliminary error collector is in place, although no code actually reports
|
||||||
|
errors yet
|
||||||
|
. Factor out most of ValidateAttributes to new AttrValidator class
|
||||||
|
|
||||||
1.6.1, released 2007-05-05
|
1.6.1, released 2007-05-05
|
||||||
! Support for more deprecated attributes via transformations:
|
! Support for more deprecated attributes via transformations:
|
||||||
@@ -61,7 +116,7 @@ NEWS ( CHANGELOG and HISTORY ) HTMLPurifier
|
|||||||
- Error messages are emitted when you attempt to "allow" elements or
|
- Error messages are emitted when you attempt to "allow" elements or
|
||||||
attributes that HTML Purifier does not support
|
attributes that HTML Purifier does not support
|
||||||
|
|
||||||
1.5.1, unknown release date
|
|
||||||
- Fix segfault in unit test. The problem is not very reproduceable and
|
- Fix segfault in unit test. The problem is not very reproduceable and
|
||||||
I don't know what causes it, but a six line patch fixed it.
|
I don't know what causes it, but a six line patch fixed it.
|
||||||
|
|
||||||
@@ -260,4 +315,4 @@ NEWS ( CHANGELOG and HISTORY ) HTMLPurifier
|
|||||||
! First public release, most functionality implemented. Notable omissions are:
|
! First public release, most functionality implemented. Notable omissions are:
|
||||||
+ Shorthand CSS properties
|
+ Shorthand CSS properties
|
||||||
+ Table CSS properties
|
+ Table CSS properties
|
||||||
+ Deprecated attribute transformations
|
+ Deprecated attribute transformations
|
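The 2.0.0 entry in NEWS above says that configuration directives accepting hashes now also take a string of the form key1:value1,key2:value2. The sketch below shows one way such a string could be turned into an array. The function name parseHashString is hypothetical, and the handling of whitespace and malformed segments is an assumption; HTML Purifier's own parsing may differ in its details.

```php
<?php
// Hypothetical parser for the "key1:value1,key2:value2" shorthand.
// This is an illustration only, not HTML Purifier's implementation.
function parseHashString($input)
{
    $hash = array();
    foreach (explode(',', $input) as $pair) {
        $pair = trim($pair);
        if ($pair === '') {
            continue; // tolerate empty segments such as a trailing comma
        }
        // Split on the first colon only, so values may contain colons.
        $parts = explode(':', $pair, 2);
        if (count($parts) !== 2) {
            continue; // skip segments that lack a key:value separator
        }
        $hash[trim($parts[0])] = trim($parts[1]);
    }
    return $hash;
}

print_r(parseHashString('key1:value1,key2:value2'));
```

Splitting each pair on the first colon only is a deliberate choice in this sketch: it keeps values such as URLs intact, at the cost of not allowing colons in keys.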
47
TODO
@@ -1,4 +1,3 @@
|
|||||||
|
|
||||||
TODO List
|
TODO List
|
||||||
|
|
||||||
= KEY ====================
|
= KEY ====================
|
||||||
@@ -7,33 +6,34 @@ TODO List
|
|||||||
? Maybe I'll Do It
|
? Maybe I'll Do It
|
||||||
==========================
|
==========================
|
||||||
|
|
||||||
1.7 release [Advanced API]
|
2.1 release [Refactor, refactor!]
|
||||||
# Complete advanced API, and fully document it
|
|
||||||
# Implement all edge-case attribute transforms
|
|
||||||
# Implement all deprecated tags and attributes
|
|
||||||
- Parse TinyMCE-style whitelist into our %HTML.Allow* whitelists (possibly
|
|
||||||
do this earlier)
|
|
||||||
? HTML interface for tweaking configuration to see changes
|
|
||||||
|
|
||||||
|
|
||||||
1.8 release [Refactor, refactor!]
|
|
||||||
# URI validation routines tighter (see docs/dev-code-quality.html) (COMPLEX)
|
# URI validation routines tighter (see docs/dev-code-quality.html) (COMPLEX)
|
||||||
# Advanced URI filtering schemes (see docs/proposal-new-directives.txt)
|
# Advanced URI filtering schemes (see docs/proposal-new-directives.txt)
|
||||||
- Configuration profiles: predefined directives set with one func call
|
- Configuration profiles: predefined directives set with one func call
|
||||||
- Implement IDREF support (harder than it seems, since you cannot have
|
- Implement IDREF support (harder than it seems, since you cannot have
|
||||||
IDREFs to non-existent IDs)
|
IDREFs to non-existent IDs)
|
||||||
- Allow non-ASCII characters in font names
|
- Allow non-ASCII characters in font names
|
||||||
|
- Genericize special cases in RemoveForeignElements
|
||||||
|
|
||||||
1.9 release [Error'ed]
|
2.2 release [Error'ed]
|
||||||
# Error logging for filtering/cleanup procedures
|
# Error logging for filtering/cleanup procedures
|
||||||
- Requires I18N facilities to be created first (COMPLEX)
|
- Requires I18N facilities to be created first (COMPLEX)
|
||||||
- XSS-attempt detection
|
- XSS-attempt detection
|
||||||
- More fine-grained control over escaping behavior
|
- More fine-grained control over escaping behavior
|
||||||
- Silently drop content inbetween SCRIPT tags (can be generalized to allow
|
|
||||||
specification of elements that, when detected as foreign, trigger removal
|
|
||||||
of children, although unbalanced tags could wreck havoc (or at least
|
|
||||||
delete the rest of the document)).
|
|
||||||
|
|
||||||
1.10 release [Do What I Mean, Not What I Say]
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
2.3 release [Do What I Mean, Not What I Say]
|
||||||
# Additional support for poorly written HTML
|
# Additional support for poorly written HTML
|
||||||
- Microsoft Word HTML cleaning (i.e. MsoNormal, but research essential!)
|
- Microsoft Word HTML cleaning (i.e. MsoNormal, but research essential!)
|
||||||
- Friendly strict handling of <address> (block -> <br>)
|
- Friendly strict handling of <address> (block -> <br>)
|
||||||
@@ -48,9 +48,14 @@ TODO List
|
|||||||
- Append something to duplicate IDs so they're still usable (impl. note: the
|
- Append something to duplicate IDs so they're still usable (impl. note: the
|
||||||
dupe detector would also need to detect the suffix as well)
|
dupe detector would also need to detect the suffix as well)
|
||||||
|
|
||||||
2.0 release [Beyond HTML]
|
2.4 release [It's All About Trust] (floating)
|
||||||
|
# Implement untrusted, dangerous elements/attributes
|
||||||
|
|
||||||
|
3.0 release [Beyond HTML]
|
||||||
# Legit token based CSS parsing (will require revamping almost every
|
# Legit token based CSS parsing (will require revamping almost every
|
||||||
AttrDef class)
|
AttrDef class)
|
||||||
|
# More control over allowed CSS properties (maybe modularize it in the
|
||||||
|
same fashion!)
|
||||||
# Formatters for plaintext (COMPLEX)
|
# Formatters for plaintext (COMPLEX)
|
||||||
- Auto-paragraphing (be sure to leverage fact that we know when things
|
- Auto-paragraphing (be sure to leverage fact that we know when things
|
||||||
shouldn't be paragraphed, such as lists and tables).
|
shouldn't be paragraphed, such as lists and tables).
|
||||||
@@ -65,7 +70,7 @@ TODO List
|
|||||||
- Convert RTL/LTR override characters to <bdo> tags, or vice versa on demand.
|
- Convert RTL/LTR override characters to <bdo> tags, or vice versa on demand.
|
||||||
Also, enable disabling of directionality
|
Also, enable disabling of directionality
|
||||||
|
|
||||||
3.0 release [To XML and Beyond]
|
4.0 release [To XML and Beyond]
|
||||||
- Extended HTML capabilities based on namespacing and tag transforms (COMPLEX)
|
- Extended HTML capabilities based on namespacing and tag transforms (COMPLEX)
|
||||||
- Hooks for adding custom processors to custom namespaced tags and
|
- Hooks for adding custom processors to custom namespaced tags and
|
||||||
attributes, offer default implementation
|
attributes, offer default implementation
|
||||||
@@ -78,12 +83,18 @@ Ongoing
|
|||||||
- WordPress (mostly written, needs beta-testing)
|
- WordPress (mostly written, needs beta-testing)
|
||||||
- eFiction
|
- eFiction
|
||||||
- more! (look for ones that use WYSIWYGs)
|
- more! (look for ones that use WYSIWYGs)
|
||||||
|
- Complete basic smoketests
|
||||||
|
|
||||||
Unknown release (on a scratch-an-itch basis)
|
Unknown release (on a scratch-an-itch basis)
|
||||||
? Semi-lossy dumb alternate character encoding transfor
|
? Semi-lossy dumb alternate character encoding transfor
|
||||||
? Have 'lang' attribute be checked against official lists, achieved by
|
? Have 'lang' attribute be checked against official lists, achieved by
|
||||||
encoding all characters that have string entity equivalents
|
encoding all characters that have string entity equivalents
|
||||||
- Explain how to use HTML Purifier in non-PHP languages
|
- Explain how to use HTML Purifier in non-PHP languages
|
||||||
|
- Abstract ChildDef_BlockQuote to work with all elements that only
|
||||||
|
allow blocks in them, required or optional
|
||||||
|
- Reorganize Unit Tests
|
||||||
|
- Refactor loop tests (esp. AttrDef_URI)
|
||||||
|
- Reorganize configuration directives (Create more namespaces! Get messy!)
|
||||||
|
|
||||||
Requested
|
Requested
|
||||||
? Native content compression, whitespace stripping (don't rely on Tidy, make
|
? Native content compression, whitespace stripping (don't rely on Tidy, make
|
||||||
@@ -92,4 +103,4 @@ Requested
|
|||||||
Wontfix
|
Wontfix
|
||||||
- Non-lossy smart alternate character encoding transformations (unless
|
- Non-lossy smart alternate character encoding transformations (unless
|
||||||
patch provided)
|
patch provided)
|
||||||
- Pretty-printing HTML, users can use Tidy on the output on entire page
|
- Pretty-printing HTML, users can use Tidy on the output on entire page
|
14
WHATSNEW
@@ -1,7 +1,7 @@
-The 1.6.1 release, code-named 'Ach! We missed something! Run!', completes
-HTML Purifier's roster of attribute transformations. It also implements
-a number of minor features (such as better font transformations, smarter
-HTML parsing, the CSS property 'white-space' and XHTML 1.1), a few bug
-fixes (most notably fixed __autoload compatibility issues) and a ton
-of refactoring. 1.6 was for things that absolutely could not wait: this
-release, developed in a more leisurely pace, fills in the gaps.
+HTML Purifier 2.0 is the culmination of two major architectural changes.
+The first is Tidy, which enables HTML Purifier to both natively support
+deprecated elements and also convert them to standards-compliant
+alternatives. The second is the Advanced API, which enables users to
+create new elements and attributes with ease. Keeping in line with a
+commitment to high quality, there are also five esoteric bug-fixes and a
+plethora of subtle improvements that enhance the library.
4
WYSIWYG
@@ -16,7 +16,3 @@ trouble. Therein lies the solution:
 HTML Purifier is perfect for filtering pure-HTML input from WYSIWYG editors.
 
 Enough said.
-
-There is a proof-of-concept integration of HTML Purifier with the Mantis
-bugtracker at http://hp.jpsband.org/mantis/ You can see notes on how
-this integration was acheived at http://hp.jpsband.org/mantis_notes.txt
@@ -2,216 +2,37 @@
|
|||||||
|
|
||||||
/**
|
/**
|
||||||
* Generates XML and HTML documents describing configuration.
|
* Generates XML and HTML documents describing configuration.
|
||||||
|
* @note PHP 5 only!
|
||||||
*/
|
*/
|
||||||
|
|
||||||
/*
|
/*
|
||||||
TODO:
|
TODO:
|
||||||
- make XML format richer (see below)
|
- make XML format richer (see XMLSerializer_ConfigSchema)
|
||||||
- extend XSLT transformation (see the corresponding XSLT file)
|
- extend XSLT transformation (see the corresponding XSLT file)
|
||||||
- allow generation of packaged docs that can be easily moved
|
- allow generation of packaged docs that can be easily moved
|
||||||
- multipage documentation
|
- multipage documentation
|
||||||
- determine how to multilingualize
|
- determine how to multilingualize
|
||||||
- factor out code into classes
|
- add blurbs to ToC
|
||||||
*/
|
*/
|
||||||
|
|
||||||
// ---------------------------------------------------------------------------
|
|
||||||
// Check and configure environment
|
|
||||||
|
|
||||||
if (version_compare('5', PHP_VERSION, '>')) exit('Requires PHP 5 or higher.');
|
if (version_compare('5', PHP_VERSION, '>')) exit('Requires PHP 5 or higher.');
|
||||||
error_reporting(E_ALL);
|
error_reporting(E_ALL); // probably not possible to use E_STRICT
|
||||||
|
|
||||||
|
|
||||||
// ---------------------------------------------------------------------------
|
|
||||||
// Include HTML Purifier library
|
|
||||||
|
|
||||||
|
// load dual-libraries
|
||||||
require_once '../library/HTMLPurifier.auto.php';
|
require_once '../library/HTMLPurifier.auto.php';
|
||||||
|
require_once 'library/ConfigDoc.auto.php';
|
||||||
|
|
||||||
// ---------------------------------------------------------------------------
|
|
||||||
// Setup convenience functions
|
|
||||||
|
|
||||||
function appendHTMLDiv($document, $node, $html) {
|
|
||||||
global $purifier;
|
|
||||||
$html = $purifier->purify($html);
|
|
||||||
$dom_html = $document->createDocumentFragment();
|
|
||||||
$dom_html->appendXML($html);
|
|
||||||
|
|
||||||
$dom_div = $document->createElement('div');
|
|
||||||
$dom_div->setAttribute('xmlns', 'http://www.w3.org/1999/xhtml');
|
|
||||||
$dom_div->appendChild($dom_html);
|
|
||||||
|
|
||||||
$node->appendChild($dom_div);
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
// ---------------------------------------------------------------------------
|
|
||||||
// Load copies of HTMLPurifier_ConfigDef and HTMLPurifier
|
|
||||||
|
|
||||||
$schema = HTMLPurifier_ConfigSchema::instance();
|
$schema = HTMLPurifier_ConfigSchema::instance();
|
||||||
$purifier = new HTMLPurifier();
|
$style = 'plain'; // use $_GET in the future
|
||||||
|
$configdoc = new ConfigDoc();
|
||||||
|
$output = $configdoc->generate($schema, $style);
|
||||||
|
|
||||||
|
// write out
|
||||||
// ---------------------------------------------------------------------------
|
file_put_contents("$style.html", $output);
|
||||||
// Generate types.xml, a document describing the constraint "type"
|
|
||||||
|
|
||||||
$types_document = new DOMDocument('1.0', 'UTF-8');
|
|
||||||
$types_root = $types_document->createElement('types');
|
|
||||||
$types_document->appendChild($types_root);
|
|
||||||
$types_document->formatOutput = true;
|
|
||||||
foreach ($schema->types as $name => $expanded_name) {
|
|
||||||
$types_type = $types_document->createElement('type', $expanded_name);
|
|
||||||
$types_type->setAttribute('id', $name);
|
|
||||||
$types_root->appendChild($types_type);
|
|
||||||
}
|
|
||||||
$types_document->save('types.xml');
|
|
||||||
|
|
||||||
|
|
||||||
// ---------------------------------------------------------------------------
|
|
||||||
// Generate configdoc.xml, a document documenting configuration directives
|
|
||||||
|
|
||||||
$dom_document = new DOMDocument('1.0', 'UTF-8');
|
|
||||||
$dom_root = $dom_document->createElement('configdoc');
|
|
||||||
$dom_document->appendChild($dom_root);
|
|
||||||
$dom_document->formatOutput = true;
|
|
||||||
|
|
||||||
// add the name of the application
|
|
||||||
$dom_root->appendChild($dom_document->createElement('title', 'HTML Purifier'));
|
|
||||||
|
|
||||||
/*
|
|
||||||
TODO for XML format:
|
|
||||||
- create a definition (DTD or other) once interface stabilizes
|
|
||||||
*/
|
|
||||||
|
|
||||||
foreach($schema->info as $namespace_name => $namespace_info) {
|
|
||||||
|
|
||||||
$dom_namespace = $dom_document->createElement('namespace');
|
|
||||||
$dom_root->appendChild($dom_namespace);
|
|
||||||
|
|
||||||
$dom_namespace->setAttribute('id', $namespace_name);
|
|
||||||
$dom_namespace->appendChild(
|
|
||||||
$dom_document->createElement('name', $namespace_name)
|
|
||||||
);
|
|
||||||
$dom_namespace_description = $dom_document->createElement('description');
|
|
||||||
$dom_namespace->appendChild($dom_namespace_description);
|
|
||||||
appendHTMLDiv($dom_document, $dom_namespace_description,
|
|
||||||
$schema->info_namespace[$namespace_name]->description);
|
|
||||||
|
|
||||||
foreach ($namespace_info as $name => $info) {
|
|
||||||
|
|
||||||
if ($info->class == 'alias') continue;
|
|
||||||
|
|
||||||
$dom_directive = $dom_document->createElement('directive');
|
|
||||||
$dom_namespace->appendChild($dom_directive);
|
|
||||||
|
|
||||||
$dom_directive->setAttribute('id', $namespace_name . '.' . $name);
|
|
||||||
$dom_directive->appendChild(
|
|
||||||
$dom_document->createElement('name', $name)
|
|
||||||
);
|
|
||||||
|
|
||||||
$dom_constraints = $dom_document->createElement('constraints');
|
|
||||||
$dom_directive->appendChild($dom_constraints);
|
|
||||||
|
|
||||||
$dom_type = $dom_document->createElement('type', $info->type);
|
|
||||||
if ($info->allow_null) {
|
|
||||||
$dom_type->setAttribute('allow-null', 'yes');
|
|
||||||
}
|
|
||||||
$dom_constraints->appendChild($dom_type);
|
|
||||||
|
|
||||||
if ($info->allowed !== true) {
|
|
||||||
$dom_allowed = $dom_document->createElement('allowed');
|
|
||||||
$dom_constraints->appendChild($dom_allowed);
|
|
||||||
foreach ($info->allowed as $allowed => $bool) {
|
|
||||||
$dom_allowed->appendChild(
|
|
||||||
$dom_document->createElement('value', $allowed)
|
|
||||||
);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
$raw_default = $schema->defaults[$namespace_name][$name];
|
|
||||||
if (is_bool($raw_default)) {
|
|
||||||
$default = $raw_default ? 'true' : 'false';
|
|
||||||
} elseif (is_string($raw_default)) {
|
|
||||||
$default = "\"$raw_default\"";
|
|
||||||
} elseif (is_null($raw_default)) {
|
|
||||||
$default = 'null';
|
|
||||||
} else {
|
|
||||||
$default = print_r(
|
|
||||||
$schema->defaults[$namespace_name][$name], true
|
|
||||||
);
|
|
||||||
}
|
|
||||||
|
|
||||||
$dom_default = $dom_document->createElement('default', $default);
|
|
||||||
|
|
||||||
// remove this once we get a DTD
|
|
||||||
$dom_default->setAttribute('xml:space', 'preserve');
|
|
||||||
|
|
||||||
$dom_constraints->appendChild($dom_default);
|
|
||||||
|
|
||||||
$dom_descriptions = $dom_document->createElement('descriptions');
|
|
||||||
$dom_directive->appendChild($dom_descriptions);
|
|
||||||
|
|
||||||
foreach ($info->descriptions as $file => $file_descriptions) {
|
|
||||||
foreach ($file_descriptions as $line => $description) {
|
|
||||||
$dom_description = $dom_document->createElement('description');
|
|
||||||
$dom_description->setAttribute('file', $file);
|
|
||||||
$dom_description->setAttribute('line', $line);
|
|
||||||
appendHTMLDiv($dom_document, $dom_description, $description);
|
|
||||||
$dom_descriptions->appendChild($dom_description);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
}
|
|
||||||
|
|
||||||
}
|
|
||||||
|
|
||||||
// print_r($dom_document->saveXML());
|
|
||||||
|
|
||||||
// save a copy of the raw XML
|
|
||||||
$dom_document->save('configdoc.xml');
|
|
||||||
|
|
||||||
|
|
||||||
// ---------------------------------------------------------------------------
|
|
||||||
// Generate final output using XSLT
|
|
||||||
|
|
||||||
// load the stylesheet
|
|
||||||
$xsl_stylesheet_name = 'plain';
|
|
||||||
$xsl_stylesheet = "styles/$xsl_stylesheet_name.xsl";
|
|
||||||
$xsl_dom_stylesheet = new DOMDocument();
|
|
||||||
$xsl_dom_stylesheet->load($xsl_stylesheet);
|
|
||||||
|
|
||||||
// setup the XSLT processor
|
|
||||||
$xsl_processor = new XSLTProcessor();
|
|
||||||
|
|
||||||
// perform the transformation
|
|
||||||
$xsl_processor->importStylesheet($xsl_dom_stylesheet);
|
|
||||||
$html_output = $xsl_processor->transformToXML($dom_document);
|
|
||||||
|
|
||||||
// some slight fudges to preserve backwards compatibility
|
|
||||||
$html_output = str_replace('/>', ' />', $html_output); // <br /> not <br/>
|
|
||||||
$html_output = str_replace(' xmlns=""', '', $html_output); // rm unnecessary xmlns
|
|
||||||
|
|
||||||
if (class_exists('Tidy')) {
|
|
||||||
// cleanup output
|
|
||||||
$config = array(
|
|
||||||
'indent' => true,
|
|
||||||
'output-xhtml' => true,
|
|
||||||
'wrap' => 80
|
|
||||||
);
|
|
||||||
$tidy = new Tidy;
|
|
||||||
$tidy->parseString($html_output, $config, 'utf8');
|
|
||||||
$tidy->cleanRepair();
|
|
||||||
$html_output = (string) $tidy;
|
|
||||||
}
|
|
||||||
|
|
||||||
// write it to a file (todo: parse into seperate pages)
|
|
||||||
file_put_contents("$xsl_stylesheet_name.html", $html_output);
|
|
||||||
|
|
||||||
|
|
||||||
// ---------------------------------------------------------------------------
|
|
||||||
// Output for instant feedback
|
|
||||||
|
|
||||||
if (php_sapi_name() != 'cli') {
|
if (php_sapi_name() != 'cli') {
|
||||||
echo $html_output;
|
// output = instant feedback
|
||||||
|
echo $output;
|
||||||
} else {
|
} else {
|
||||||
echo 'Files generated successfully.';
|
echo 'Files generated successfully.';
|
||||||
}
|
}
|
||||||
|
10
configdoc/library/ConfigDoc.auto.php
Normal file
@@ -0,0 +1,10 @@
+<?php
+
+/**
+ * This is a stub include that automatically configures the include path.
+ */
+
+set_include_path(dirname(__FILE__) . PATH_SEPARATOR . get_include_path() );
+require_once 'ConfigDoc.php';
+
+?>
|
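The stub above prepends its own directory to the include path so that the following require_once can name ConfigDoc.php without a directory prefix, whatever the current working directory is. The sketch below demonstrates the same technique in isolation. The temporary directory and the file DemoLib.php are invented for the demonstration and have nothing to do with the ConfigDoc sources.

```php
<?php
// Build a throwaway "library" directory holding one file (hypothetical names).
$libDir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'incpath_demo_' . getmypid();
if (!is_dir($libDir)) {
    mkdir($libDir);
}
file_put_contents(
    $libDir . DIRECTORY_SEPARATOR . 'DemoLib.php',
    '<?php function demoLibLoaded() { return true; }'
);

// Same technique as the stub: put the library directory first on the path...
set_include_path($libDir . PATH_SEPARATOR . get_include_path());

// ...so a bare file name now resolves from any working directory.
require_once 'DemoLib.php';

var_dump(demoLibLoaded());
```

Prepending rather than appending means the library's own files win over same-named files found elsewhere on the path, which is presumably why the stub is written that way.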
39
configdoc/library/ConfigDoc.php
Normal file
@@ -0,0 +1,39 @@
+<?php
+
+require_once 'ConfigDoc/HTMLXSLTProcessor.php';
+require_once 'ConfigDoc/XMLSerializer/Types.php';
+require_once 'ConfigDoc/XMLSerializer/ConfigSchema.php';
+
+class ConfigDoc
+{
+    
+    function generate($schema, $xsl_stylesheet_name = 'plain', $parameters = array()) {
+        // generate types document, describing type constraints
+        $types_serializer = new ConfigDoc_XMLSerializer_Types();
+        $types_document = $types_serializer->serialize($schema);
+        $types_document->save(dirname(__FILE__) . '/../types.xml'); // only ONE
+        
+        // generate configdoc.xml, documents configuration directives
+        $schema_serializer = new ConfigDoc_XMLSerializer_ConfigSchema();
+        $schema_document = $schema_serializer->serialize($schema);
+        $schema_document->save('configdoc.xml');
+        
+        // setup transformation
+        $xsl_stylesheet = dirname(__FILE__) . "/../styles/$xsl_stylesheet_name.xsl";
+        $xslt_processor = new ConfigDoc_HTMLXSLTProcessor();
+        $xslt_processor->setParameters($parameters);
+        $xslt_processor->importStylesheet($xsl_stylesheet);
+        
+        return $xslt_processor->transformToHTML($schema_document);
+    }
+    
+    /**
+     * Remove any generated files
+     */
+    function cleanup() {
+        unlink('configdoc.xml');
+    }
+    
+}
+
+?>
|
62
configdoc/library/ConfigDoc/HTMLXSLTProcessor.php
Normal file
@@ -0,0 +1,62 @@
<?php

/**
 * Special XSLTProcessor specifically for HTML documents. Loosely
 * based off of XSLTProcessor, but not really
 */
class ConfigDoc_HTMLXSLTProcessor
{

    protected $xsltProcessor;

    public function __construct() {
        $this->xsltProcessor = new XSLTProcessor();
    }

    /**
     * Imports stylesheet for processor to use
     * @param $xsl XSLT DOM tree, or filename of the XSL transformation
     */
    public function importStylesheet($xsl) {
        if (is_string($xsl)) {
            $xsl_file = $xsl;
            $xsl = new DOMDocument();
            $xsl->load($xsl_file);
        }
        return $this->xsltProcessor->importStylesheet($xsl);
    }

    /**
     * Transforms an XML file into HTML based on the stylesheet
     * @param $xml XML DOM tree
     */
    public function transformToHTML($xml) {
        $out = $this->xsltProcessor->transformToXML($xml);

        // fudges for HTML backwards compatibility
        $out = str_replace('/>', ' />', $out); // <br /> not <br/>
        $out = str_replace(' xmlns=""', '', $out); // rm unnecessary xmlns
        if (class_exists('Tidy')) {
            // cleanup output
            $config = array(
                'indent' => true,
                'output-xhtml' => true,
                'wrap' => 80
            );
            $tidy = new Tidy;
            $tidy->parseString($out, $config, 'utf8');
            $tidy->cleanRepair();
            $out = (string) $tidy;
        }
        return $out;
    }

    public function setParameters($options) {
        foreach ($options as $name => $value) {
            $this->xsltProcessor->setParameter('', $name, $value);
        }
    }

}

?>
|
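The class above is a thin wrapper over PHP's built-in XSLTProcessor. As a rough sketch of what it does under the hood, the following runs the same steps directly, assuming the dom and xsl extensions are loaded. The inline stylesheet and XML are made-up stand-ins for styles/plain.xsl and configdoc.xml, not the project's real files.

```php
<?php
// Illustrative only: inline stand-ins for styles/plain.xsl and configdoc.xml.
$xsl = new DOMDocument();
$xsl->loadXML('<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes" />
  <xsl:param name="title" select="\'Configuration Documentation\'" />
  <xsl:template match="/">
    <h1><xsl:value-of select="$title" /> - <xsl:value-of select="/configdoc/title" /></h1>
  </xsl:template>
</xsl:stylesheet>');

$xml = new DOMDocument();
$xml->loadXML('<configdoc><title>HTML Purifier</title></configdoc>');

$processor = new XSLTProcessor();
$processor->importStylesheet($xsl);
// Same call that setParameters() makes for each name/value pair;
// it overrides the default declared by <xsl:param> in the stylesheet.
$processor->setParameter('', 'title', 'Custom Title');

$out = $processor->transformToXML($xml);
// Same string fudge as transformToHTML(): "<br/>" becomes "<br />".
$out = str_replace('/>', ' />', $out);
echo $out;
```

Passing the title as a parameter is what lets one stylesheet serve several documents, which appears to be the point of the new `css` and `title` parameters in plain.xsl.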
26
configdoc/library/ConfigDoc/XMLSerializer.php
Normal file
@@ -0,0 +1,26 @@
<?php

/**
 * The XMLSerializer hierarchy of classes consist of classes that take
 * objects and serialize them into XML, specifically DOM, form; this
 * super-class contains convenience functions for those classes.
 */
class ConfigDoc_XMLSerializer
{

    protected function appendHTMLDiv($document, $node, $html) {
        $purifier = HTMLPurifier::getInstance();
        $html = $purifier->purify($html);
        $dom_html = $document->createDocumentFragment();
        $dom_html->appendXML($html);

        $dom_div = $document->createElement('div');
        $dom_div->setAttribute('xmlns', 'http://www.w3.org/1999/xhtml');
        $dom_div->appendChild($dom_html);

        $node->appendChild($dom_div);
    }

}

?>
|
118
configdoc/library/ConfigDoc/XMLSerializer/ConfigSchema.php
Normal file
@@ -0,0 +1,118 @@
<?php

require_once 'ConfigDoc/XMLSerializer.php';

class ConfigDoc_XMLSerializer_ConfigSchema extends ConfigDoc_XMLSerializer
{

    /**
     * Serializes a schema into DOM form
     * @todo Split into sub-serializers
     * @param $schema HTMLPurifier_ConfigSchema to serialize
     */
    public function serialize($schema) {
        $dom_document = new DOMDocument('1.0', 'UTF-8');
        $dom_root = $dom_document->createElement('configdoc');
        $dom_document->appendChild($dom_root);
        $dom_document->formatOutput = true;

        // add the name of the application
        $dom_root->appendChild($dom_document->createElement('title', 'HTML Purifier'));

        /*
        TODO for XML format:
        - create a definition (DTD or other) once interface stabilizes
        */

        foreach($schema->info as $namespace_name => $namespace_info) {

            $dom_namespace = $dom_document->createElement('namespace');
            $dom_root->appendChild($dom_namespace);

            $dom_namespace->setAttribute('id', $namespace_name);
            $dom_namespace->appendChild(
                $dom_document->createElement('name', $namespace_name)
            );
            $dom_namespace_description = $dom_document->createElement('description');
            $dom_namespace->appendChild($dom_namespace_description);
            $this->appendHTMLDiv($dom_document, $dom_namespace_description,
                $schema->info_namespace[$namespace_name]->description);

            foreach ($namespace_info as $name => $info) {

                if ($info->class == 'alias') continue;

                $dom_directive = $dom_document->createElement('directive');
                $dom_namespace->appendChild($dom_directive);

                $dom_directive->setAttribute('id', $namespace_name . '.' . $name);
                $dom_directive->appendChild(
                    $dom_document->createElement('name', $name)
                );

                $dom_constraints = $dom_document->createElement('constraints');
                $dom_directive->appendChild($dom_constraints);

                $dom_type = $dom_document->createElement('type', $info->type);
                if ($info->allow_null) {
                    $dom_type->setAttribute('allow-null', 'yes');
                }
                $dom_constraints->appendChild($dom_type);

                if ($info->allowed !== true) {
                    $dom_allowed = $dom_document->createElement('allowed');
                    $dom_constraints->appendChild($dom_allowed);
                    foreach ($info->allowed as $allowed => $bool) {
                        $dom_allowed->appendChild(
                            $dom_document->createElement('value', $allowed)
                        );
                    }
                }

                $raw_default = $schema->defaults[$namespace_name][$name];
                if (is_bool($raw_default)) {
                    $default = $raw_default ? 'true' : 'false';
                } elseif (is_string($raw_default)) {
                    $default = "\"$raw_default\"";
                } elseif (is_null($raw_default)) {
                    $default = 'null';
                } else {
                    $default = print_r(
                        $schema->defaults[$namespace_name][$name], true
                    );
                }

                $dom_default = $dom_document->createElement('default', $default);

                // remove this once we get a DTD
                $dom_default->setAttribute('xml:space', 'preserve');

                $dom_constraints->appendChild($dom_default);

                $dom_descriptions = $dom_document->createElement('descriptions');
                $dom_directive->appendChild($dom_descriptions);

                foreach ($info->descriptions as $file => $file_descriptions) {
                    foreach ($file_descriptions as $line => $description) {
                        $dom_description = $dom_document->createElement('description');
                        // refuse to write $file if it's a full path
                        if (str_replace('\\', '/', realpath($file)) != $file) {
                            $dom_description->setAttribute('file', $file);
                            $dom_description->setAttribute('line', $line);
                        }
                        $this->appendHTMLDiv($dom_document, $dom_description, $description);
                        $dom_descriptions->appendChild($dom_description);
                    }
                }

            }

        }

        return $dom_document;

    }

}

?>
|
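The default-value branch in serialize() above decides how each directive's default is printed in the generated documentation. Pulled out on its own, the logic can be sketched as a small function. The name `configdoc_format_default` is hypothetical and does not exist in the commit; the branch order is copied from the code above.

```php
<?php
// Hypothetical helper: mirrors the branch order used in serialize().
function configdoc_format_default($raw_default) {
    if (is_bool($raw_default)) {
        // booleans are spelled out, since print_r(false) is an empty string
        return $raw_default ? 'true' : 'false';
    } elseif (is_string($raw_default)) {
        // strings are wrapped in double quotes
        return "\"$raw_default\"";
    } elseif (is_null($raw_default)) {
        return 'null';
    }
    // everything else (ints, arrays, ...) falls through to print_r
    return print_r($raw_default, true);
}

echo configdoc_format_default(true), "\n";     // true
echo configdoc_format_default('UTF-8'), "\n";  // "UTF-8"
echo configdoc_format_default(null), "\n";     // null
echo configdoc_format_default(80), "\n";       // 80
```

Booleans and null need their own branches because print_r() renders `false` and `null` as empty strings, which would leave the default cell blank.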
27
configdoc/library/ConfigDoc/XMLSerializer/Types.php
Normal file
@@ -0,0 +1,27 @@
<?php

require_once 'ConfigDoc/XMLSerializer.php';

class ConfigDoc_XMLSerializer_Types extends ConfigDoc_XMLSerializer
{

    /**
     * Serializes the types in a schema into DOM form
     * @param $schema HTMLPurifier_ConfigSchema owner of types to serialize
     */
    public function serialize($schema) {
        $types_document = new DOMDocument('1.0', 'UTF-8');
        $types_root = $types_document->createElement('types');
        $types_document->appendChild($types_root);
        $types_document->formatOutput = true;
        foreach ($schema->types as $name => $expanded_name) {
            $types_type = $types_document->createElement('type', $expanded_name);
            $types_type->setAttribute('id', $name);
            $types_root->appendChild($types_type);
        }
        return $types_document;
    }

}

?>
|
@@ -12,19 +12,21 @@
|
|||||||
indent = "no"
|
indent = "no"
|
||||||
media-type = "text/html"
|
media-type = "text/html"
|
||||||
/>
|
/>
|
||||||
|
<xsl:param name="css" select="'styles/plain.css'"/>
|
||||||
|
<xsl:param name="title" select="'Configuration Documentation'"/>
|
||||||
|
|
||||||
<xsl:variable name="typeLookup" select="document('../types.xml')" />
|
<xsl:variable name="typeLookup" select="document('../types.xml')" />
|
||||||
|
|
||||||
<xsl:template match="/">
|
<xsl:template match="/">
|
||||||
<html lang="en" xml:lang="en">
|
<html lang="en" xml:lang="en">
|
||||||
<head>
|
<head>
|
||||||
<title>Configuration Documentation - <xsl:value-of select="/configdoc/title" /></title>
|
<title><xsl:value-of select="$title" /> - <xsl:value-of select="/configdoc/title" /></title>
|
||||||
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
|
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
|
||||||
<link rel="stylesheet" type="text/css" href="styles/plain.css" />
|
<link rel="stylesheet" type="text/css" href="{$css}" />
|
||||||
</head>
|
</head>
|
||||||
<body>
|
<body>
|
||||||
<div id="library"><xsl:value-of select="/configdoc/title" /></div>
|
<div id="library"><xsl:value-of select="/configdoc/title" /></div>
|
||||||
<h1>Configuration Documentation</h1>
|
<h1><xsl:value-of select="$title" /></h1>
|
||||||
<h2>Table of Contents</h2>
|
<h2>Table of Contents</h2>
|
||||||
<ul id="toc">
|
<ul id="toc">
|
||||||
<xsl:apply-templates mode="toc" />
|
<xsl:apply-templates mode="toc" />
|
||||||
@@ -76,15 +78,17 @@
|
|||||||
<table class="constraints">
|
<table class="constraints">
|
||||||
<xsl:apply-templates />
|
<xsl:apply-templates />
|
||||||
<!-- Calculated other values -->
|
<!-- Calculated other values -->
|
||||||
<tr>
|
<xsl:if test="../descriptions/description[@file]">
|
||||||
<th>Used by:</th>
|
<tr>
|
||||||
<td>
|
<th>Used by:</th>
|
||||||
<xsl:for-each select="../descriptions/description">
|
<td>
|
||||||
<xsl:if test="position()>1">, </xsl:if>
|
<xsl:for-each select="../descriptions/description">
|
||||||
<xsl:value-of select="@file" />
|
<xsl:if test="position()>1">, </xsl:if>
|
||||||
</xsl:for-each>
|
<xsl:value-of select="@file" />
|
||||||
</td>
|
</xsl:for-each>
|
||||||
</tr>
|
</td>
|
||||||
|
</tr>
|
||||||
|
</xsl:if>
|
||||||
</table>
|
</table>
|
||||||
</xsl:template>
|
</xsl:template>
|
||||||
<xsl:template match="directive//description">
|
<xsl:template match="directive//description">
|
||||||
|
@@ -17,9 +17,12 @@
|
|||||||
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
|
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
|
||||||
|
|
||||||
<p>HTML Purifier currently natively supports only a subset of HTML's
|
<p>HTML Purifier currently natively supports only a subset of HTML's
|
||||||
allowed elements, attributes, and behavior. This is by design,
|
allowed elements, attributes, and behavior; specifically, this subset
|
||||||
but as the user is always right, they'll need some method to overload
|
is the set of elements that are safe for untrusted users to use.
|
||||||
these behaviors.</p>
|
However, HTML Purifier is often utilized to ensure standards-compliance
|
||||||
|
from input that is trusted (making it a sort of Tidy substitute),
|
||||||
|
and often users need to define new elements or attributes. The
|
||||||
|
advanced API is oriented specifically for these use-cases.</p>
|
||||||
|
|
||||||
<p>Our goals are to let the user:</p>
|
<p>Our goals are to let the user:</p>
|
||||||
|
|
||||||
@@ -27,20 +30,15 @@ these behaviors.</p>
|
|||||||
<dt>Select</dt>
|
<dt>Select</dt>
|
||||||
<dd><ul>
|
<dd><ul>
|
||||||
<li>Doctype</li>
|
<li>Doctype</li>
|
||||||
<li>Mode: Lenient / Correctional</li>
|
<!-- <li>Filterset</li> -->
|
||||||
<li>Elements / Attributes / Modules</li>
|
<li>Elements / Attributes / Modules</li>
|
||||||
<li>Filterset</li>
|
<li>Tidy</li>
|
||||||
</ul></dd>
|
</ul></dd>
|
||||||
<dt>Customize</dt>
|
<dt>Customize</dt>
|
||||||
<dd><ul>
|
<dd><ul>
|
||||||
<li>Attributes</li>
|
<li>Attributes</li>
|
||||||
<li>Elements</li>
|
<li>Elements</li>
|
||||||
</ul></dd>
|
<!--<li>Doctypes</li>-->
|
||||||
<dt>Internals</dt>
|
|
||||||
<dd><ul>
|
|
||||||
<li>Modules / Elements / Attributes / Attribute Types</li>
|
|
||||||
<li>Filtersets</li>
|
|
||||||
<li>Doctype</li>
|
|
||||||
</ul></dd>
|
</ul></dd>
|
||||||
</dl>
|
</dl>
|
||||||
|
|
||||||
@@ -68,136 +66,64 @@ Transitional, however, we really shouldn't be guessing what the user's
|
|||||||
doctype is. Fortunately, people who can't be bothered to set this won't
|
doctype is. Fortunately, people who can't be bothered to set this won't
|
||||||
be bothered when their pages stop validating.</p>
|
be bothered when their pages stop validating.</p>
|
||||||
|
|
||||||
<h3>Selecting Mode</h3>
|
|
||||||
|
|
||||||
<p>Within doctypes, there are various <strong>modes</strong> of operation.
|
|
||||||
These indicate variant behaviors that, while not strictly changing the
|
|
||||||
allowed set of elements and attributes, definitely affect the output.
|
|
||||||
Currently, we have two modes, which may be used together:</p>
|
|
||||||
|
|
||||||
<dl>
|
|
||||||
<dt>Lenient</dt>
|
|
||||||
<dd>
|
|
||||||
<p>Deprecated elements and attributes will be transformed into
|
|
||||||
standards-compliant alternatives when explicitly disallowed.</p>
|
|
||||||
<p>For example, in the XHTML 1.0 Strict doctype, a <code>center</code>
|
|
||||||
element would be turned into a <code>div</code> with the CSS property
|
|
||||||
<code>text-align:center;</code>, but in XHTML 1.0 Transitional
|
|
||||||
the element would be preserved.</p>
|
|
||||||
<p>This mode is on by default.</p>
|
|
||||||
</dd>
|
|
||||||
<dt>Correctional[items to correct]</dt>
|
|
||||||
<dd>
|
|
||||||
<p>Deprecated elements and attributes will be transformed into
|
|
||||||
standards-compliant alternatives whenever possible.
|
|
||||||
It may have various levels of operation.</p>
|
|
||||||
<p>Referring back to the previous example, the <code>center</code> element would
|
|
||||||
be transformed in both cases. However, elements without a
|
|
||||||
reasonable standards-compliant alternative will be preserved
|
|
||||||
in their form.</p>
|
|
||||||
<p>A user may want to correct certain deprecated attributes, but
|
|
||||||
not others. For example, the <code>bgcolor</code> attribute may be
|
|
||||||
acceptable, but the <code>center</code> element not; also, possibly,
|
|
||||||
an HTML Purifier transformation may be buggy, so the user wants
|
|
||||||
to forgo it. Thus, correctional accepts an array defining which
|
|
||||||
elements and attributes to cleanup, or no parameter at all, which
|
|
||||||
means everything gets corrected. This also means that each
|
|
||||||
correction needs to be given a unique ID that can be referenced
|
|
||||||
in this manner. (We may also allow globbing, like *.name or a.*
|
|
||||||
for mass-enabling correction, and subtractive mode, where things
|
|
||||||
specified stop correction.) This array gets passed into the
|
|
||||||
constructor of the mode's module.</p>
|
|
||||||
<p>This mode is on by default.</p>
|
|
||||||
</dd>
|
|
||||||
</dl>
|
|
||||||
|
|
||||||
<p>A possible call to select modes would be:</p>
|
|
||||||
|
|
||||||
<pre>$config->set('HTML', 'Mode', array('correctional', 'lenient'));</pre>
|
|
||||||
|
|
||||||
<p>If modes have extra parameters, a hash is necessary:</p>
|
|
||||||
|
|
||||||
<pre>$config->set('HTML', 'Mode', array(
|
|
||||||
'correctional' => 'center,a.name',
|
|
||||||
'lenient' => true // this one's just boolean
|
|
||||||
));</pre>
|
|
||||||
|
|
||||||
<p>Modes may be specified along with the doctype declaration (we may want
|
|
||||||
to get a better set of separator characters):</p>
|
|
||||||
|
|
||||||
<pre>$config->setDoctype('XHTML Transitional 1.0', '+correctional[center,a.name] -lenient');</pre>
|
|
||||||
|
|
||||||
<p>
|
|
||||||
With regards to the various levels of operation conjectured in the
|
|
||||||
Correctional mode, this is prompted by the fact that a user may want to
|
|
||||||
correct certain problems but not others, for example, fix the <code>center</code>
|
|
||||||
element but not the <code>u</code> element, both of which are deprecated.
|
|
||||||
Having an integer <q>level</q> will not work very well for such fine
|
|
||||||
grained tweaking, but an array of specific settings might.</p>
|
|
||||||
|
|
||||||
<h3>Selecting Elements / Attributes / Modules</h3>
|
<h3>Selecting Elements / Attributes / Modules</h3>
|
||||||
|
|
||||||
<p></p>
|
<p>HTML Purifier will, by default, allow as many elements and attributes
|
||||||
|
as possible. However, a user may decide to roll their own filterset by
|
||||||
|
selecting modules, elements and attributes to allow for their own
|
||||||
|
specific use-case. This can be done using %HTML.Allowed:</p>
|
||||||
|
|
||||||
<p>If this cookie cutter approach doesn't appeal to a user, they may
|
<pre>$config->set('HTML', 'Allowed', 'a[href|title],em,p,blockquote');</pre>
|
||||||
decide to roll their own filterset by selecting modules, elements and
|
|
||||||
attributes to allow.</p>
|
|
||||||
|
|
||||||
<p class="technical">This would make use of the same facilities
|
<p class="technical">The directive %HTML.Allowed is a convenience feature
|
||||||
as a filterset author would use, except that it would go under an
|
that may be fully expressed with the legacy interface.</p>
|
||||||
<q>anonymous</q> filterset that would be auto-selected if any of the
|
|
||||||
relevant module/elements/attribute selection configuration directives were
|
|
||||||
non-null.</p>
|
|
||||||
|
|
||||||
<p>In practice, this is the most commonly demanded feature. Most users are
|
<p>We currently support another interface from older versions:</p>
|
||||||
perfectly happy defining a filterset that looks like:</p>
|
|
||||||
|
|
||||||
<pre>$config->setAllowedHTML('a[href,title];em;p;blockquote');</pre>
|
|
||||||
|
|
||||||
<p class="technical">The directive %HTML.Allowed is a convenience function
|
|
||||||
that may be fully expressed with the legacy interface, and thus is
|
|
||||||
given its own setter.</p>
|
|
||||||
|
|
||||||
<p>We currently support a separated interface, which also must be preserved:</p>
|
|
||||||
|
|
||||||
<pre>$config->set('HTML', 'AllowedElements', 'a,em,p,blockquote');
|
<pre>$config->set('HTML', 'AllowedElements', 'a,em,p,blockquote');
|
||||||
$config->set('HTML', 'AllowedAttributes', 'a.href,a.title');</pre>
|
$config->set('HTML', 'AllowedAttributes', 'a.href,a.title');</pre>
|
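To make the relationship between %HTML.Allowed and the two older directives concrete, here is a minimal sketch that splits an %HTML.Allowed-style string into the equivalent AllowedElements and AllowedAttributes values. The function `split_allowed` is hypothetical and is not HTML Purifier's own parser; it only handles the simple `element[attr|attr]` form shown in the examples above.

```php
<?php
// Hypothetical illustration only; not HTML Purifier's own parser.
function split_allowed($allowed) {
    $elements = array();
    $attributes = array();
    foreach (explode(',', $allowed) as $chunk) {
        $chunk = trim($chunk);
        if ($chunk === '') continue;
        $bracket = strpos($chunk, '[');
        if ($bracket === false) {
            // bare element with no attribute list, e.g. "em"
            $elements[] = $chunk;
            continue;
        }
        // "a[href|title]" -> element "a", attributes "href" and "title"
        $element = substr($chunk, 0, $bracket);
        $elements[] = $element;
        $attr_list = rtrim(substr($chunk, $bracket + 1), ']');
        foreach (explode('|', $attr_list) as $attr) {
            $attributes[] = "$element.$attr";
        }
    }
    return array(implode(',', $elements), implode(',', $attributes));
}

list($elements, $attributes) = split_allowed('a[href|title],em,p,blockquote');
echo $elements, "\n";   // a,em,p,blockquote
echo $attributes, "\n"; // a.href,a.title
```

Splitting on commas first is safe here only because attributes inside the brackets are separated by `|`, which is the separator the new syntax uses.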
||||||
|
|
||||||
<p>A user may also choose to allow modules:</p>
|
<p>A user may also choose to allow modules using a specialized
|
||||||
|
directive:</p>
|
||||||
|
|
||||||
<pre>$config->set('HTML', 'AllowedModules', 'Hypertext,Text,Lists'); // or
|
<pre>$config->set('HTML', 'AllowedModules', 'Hypertext,Text,Lists');</pre>
|
||||||
$config->setAllowedHTML('Hypertext,Text,Lists');</pre>
|
|
||||||
|
|
||||||
<p>But it is not expected that this feature will be widely used.</p>
|
<p>But it is not expected that this feature will be widely used.</p>
|
||||||
|
|
||||||
<p class="fixme">The granularity of these modules is too coarse for
|
<p class="technical">Module selection will work slightly differently
|
||||||
the average user (for example, the core module loads everything from
|
from the other AllowedElements and AllowedAttributes directives by
|
||||||
the essential <code>p</code> element to the not-so-safe <code>h1</code>
|
directly modifying the doctype you are operating in, in the spirit of
|
||||||
element). How do we make this still a viable solution? Possible answers
|
XHTML 1.1's modularization. We stop users from shooting themselves in the
|
||||||
may be sub-modules or module parameters. This may not even be a problem,
|
foot by mandating the modules in %HTML.CoreModules be used.</p>
|
||||||
considering that most people won't be selecting modules.</p>
|
|
||||||
|
|
||||||
<p class="technical">Modules are distinguished from regular elements by the
|
<p class="technical">Modules are distinguished from regular elements by the
|
||||||
case of their first letter. While XML distinguishes between and allows
|
case of their first letter. While XML distinguishes between and allows
|
||||||
lower and uppercase letters in element names, most well-known XML
|
lower and uppercase letters in element names, XHTML uses only lower-case
|
||||||
languages use only lower-case
|
|
||||||
element names for sake of consistency.</p>
|
element names for sake of consistency.</p>
|
||||||
|
|
||||||
<p class="technical">Considering that, internally speaking, as mandated by
|
<h3>Selecting Tidy</h3>
|
||||||
the XHTML 1.1 Modularization specification, we have organized our
|
|
||||||
elements around modules, considerable gymnastics will be needed to
|
|
||||||
get this sort of functionality working.</p>
|
|
||||||
|
|
||||||
|
<p>The name of this segment of functionality is inspired by Dave
|
||||||
|
Raggett's program HTML Tidy, which purported to help clean up HTML. In
|
||||||
|
HTML Purifier, Tidy functionality involves turning unsupported and
|
||||||
|
deprecated elements into standards-compliant ones, maintaining
|
||||||
|
backwards compatibility, and enforcing best practices.</p>
|
||||||
|
|
||||||
|
<p>This is a complicated feature, and is explained more in depth at
|
||||||
|
<a href="enduser-tidy.html">the Tidy documentation page</a>.</p>
|
||||||
|
|
||||||
|
<!--
|
||||||
<h3>Unified selector</h3>
|
<h3>Unified selector</h3>
|
||||||
|
|
||||||
<p>Because selecting each and every one of these configuration options
|
<p>Because selecting each and every one of these configuration options
|
||||||
is a chore, we may wish to offer a specialized configuration method
|
is a chore, we may wish to offer a specialized configuration method
|
||||||
for selecting a filterset. Possibility:</p>
|
for selecting a filterset. Possibility:</p>
|
||||||
|
|
||||||
<pre>function selectFilter($doctype, $filterset, $mode)</pre>
|
<pre>function selectFilter($doctype, $filterset, $tidy)</pre>
|
||||||
|
|
||||||
<p>...which is simply a light wrapper over the individual configuration
|
<p>...which is simply a light wrapper over the individual configuration
|
||||||
calls. A custom config file format or text format could also be adopted.</p>
|
calls. A custom config file format or text format could also be adopted.</p>
|
||||||
|
-->
|
||||||
|
|
||||||
<h2>Customize</h2>
|
<h2>Customize</h2>
|
||||||
|
|
||||||
@@ -209,38 +135,34 @@ use-cases.</p>
|
|||||||
|
|
||||||
<p>Note that the functions described here are only available if
|
<p>Note that the functions described here are only available if
|
||||||
a raw copy of <code>HTMLPurifier_HTMLDefinition</code> was retrieved.
|
a raw copy of <code>HTMLPurifier_HTMLDefinition</code> was retrieved.
|
||||||
<code>addAttribute</code> may work on a processed copy, but for
|
Furthermore, caching may prevent your changes from immediately
|
||||||
consistency's sake we will mandate this for everything.</p>
|
being seen: consult <a href="enduser-customize.html">enduser-customize.html</a> on how
|
||||||
|
to work around this.</p>
|
||||||
|
|
||||||
<h3>Attributes</h3>
|
<h3>Attributes</h3>
|
||||||
|
|
||||||
<p>An attribute is bound to an element by a name and has a specific
|
<p>An attribute is bound to an element by a name and has a specific
|
||||||
<code>AttrDef</code> that validates it. Thus, the interface should
|
<code>AttrDef</code> that validates it. The interface is therefore:</p>
|
||||||
be:</p>
|
|
||||||
|
|
||||||
<pre>function addAttribute($element, $attribute, $attribute_def);</pre>
|
<pre>function addAttribute($element, $attribute, $attribute_def);</pre>
|
||||||
|
|
||||||
<p>With a use-case that looks like:</p>
|
<p>Example of the functionality in action:</p>
|
||||||
|
|
||||||
<pre>$def->addAttribute('a', 'rel', new HTMLPurifier_AttrDef_Enum(array('nofollow')));</pre>
|
<pre>$def->addAttribute('a', 'rel', 'Enum#nofollow');</pre>
|
||||||
|
|
||||||
<p>The <code>$attribute_def</code> value can be a little flexible,
|
<p>The <code>$attribute_def</code> value is flexible,
|
||||||
to make things simpler. We'll let it also be:</p>
|
to make things simpler. It can be a literal object or:</p>
|
||||||
|
|
||||||
<ul>
|
<ul>
|
||||||
<li>Class name: We'll instantiate it for you</li>
|
<!--<li>Class name: We'll instantiate it for you</li>
|
||||||
<li>Function name: We'll create an <code>HTMLPurifier_AttrDef_Anonymous</code>
|
<li>Function name: We'll create an <code>HTMLPurifier_AttrDef_Anonymous</code>
|
||||||
class with that function registered as a callback.</li>
|
class with that function registered as a callback.</li>-->
|
||||||
<li>String attribute type: We'll use <code>HTMLPurifier_AttrTypes</code>
|
<li>String attribute type: We'll use <code>HTMLPurifier_AttrTypes</code>
|
||||||
</li>
|
to resolve it for you. Any data that follows a hash mark (#) will
|
||||||
<li>String starting with <code>enum(</code>: We'll explode it and stuff it in an
|
be used to customize the attribute type: in the example above,
|
||||||
<code>HTMLPurifier_AttrDef_Enum</code> for you.</li>
|
we specify which values for Enum to allow.</li>
|
||||||
</ul>
|
</ul>
|
||||||
|
|
||||||
<p>Making the previous example written as:</p>
|
|
||||||
|
|
||||||
<pre>$def->addAttribute('a', 'rel', 'enum(nofollow)');</pre>
|
|
||||||
|
|
||||||
<h3>Elements</h3>
|
<h3>Elements</h3>
|
||||||
|
|
||||||
<p>An element requires certain information as specified by
|
<p>An element requires certain information as specified by
|
||||||
@@ -255,7 +177,8 @@ the usual things required are:</p>
|
|||||||
|
|
||||||
<p>This suggests an API like this:</p>
|
<p>This suggests an API like this:</p>
|
||||||
|
|
||||||
<pre>function addElement($element, $type, $content_model, $attributes = array());</pre>
|
<pre>function addElement($element, $type, $contents,
|
||||||
|
    $attr_collections = array(), $attributes = array());</pre>
|
||||||
|
|
||||||
<p>Each parameter explained in depth:</p>
|
<p>Each parameter explained in depth:</p>
|
||||||
|
|
||||||
@@ -264,11 +187,15 @@ the usual things required are:</p>
|
|||||||
<dd>Element name, ex. 'label'</dd>
|
<dd>Element name, ex. 'label'</dd>
|
||||||
<dt><code>$type</code></dt>
|
<dt><code>$type</code></dt>
|
||||||
<dd>Content set to register in, ex. 'Inline' or 'Flow'</dd>
|
<dd>Content set to register in, ex. 'Inline' or 'Flow'</dd>
|
||||||
<dt><code>$content_model</code></dt>
|
<dt><code>$contents</code></dt>
|
||||||
<dd>Description of allowed children. This is a merged form of
|
<dd>Description of allowed children. This is a merged form of
|
||||||
<code>HTMLPurifier_ElementDef</code>'s member variables
|
<code>HTMLPurifier_ElementDef</code>'s member variables
|
||||||
<code>$content_model</code> and <code>$content_model_type</code>,
|
<code>$content_model</code> and <code>$content_model_type</code>,
|
||||||
where the form is <q>Type: Model</q>, ex. 'Optional: Inline'.</dd>
|
where the form is <q>Type: Model</q>, ex. 'Optional: Inline'.
|
||||||
|
There are also a number of predefined templates one may use.</dd>
|
||||||
|
<dt><code>$attr_collections</code></dt>
|
||||||
|
<dd>Array (or string if only one) of attribute collection(s) to
|
||||||
|
merge into the attributes array.</dd>
|
||||||
<dt><code>$attributes</code></dt>
|
<dt><code>$attributes</code></dt>
|
||||||
<dd>Array of attribute names to attribute definitions, much like
|
<dd>Array of attribute names to attribute definitions, much like
|
||||||
the above-described attribute customization.</dd>
|
the above-described attribute customization.</dd>
|
||||||
@@ -276,11 +203,10 @@ the usual things required are:</p>
|
|||||||
|
|
||||||
<p>A possible usage:</p>
|
<p>A possible usage:</p>
|
||||||
|
|
||||||
<pre>$def->addElement('font', 'Inline', 'Optional: Inline',
|
<pre>$def->addElement('font', 'Inline', 'Optional: Inline', 'Common',
|
||||||
array(0 => array('Common'), 'color' => 'Color'));</pre>
|
array('color' => 'Color'));</pre>
|
||||||
|
|
||||||
<p>We may want to Common attribute collection inclusion to be added
|
<p>See <code>HTMLPurifier/HTMLModule.php</code> for details.</p>
|
||||||
by default.</p>
|
|
||||||
|
|
||||||
<div id="version">$Id$</div>
|
<div id="version">$Id$</div>
|
||||||
|
|
||||||
|
791
docs/enduser-customize.html
Normal file
@@ -0,0 +1,791 @@
|
|||||||
|
<?xml version="1.0" encoding="UTF-8"?>
|
||||||
|
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
|
||||||
|
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
|
||||||
|
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
|
||||||
|
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
|
||||||
|
<meta name="description" content="Tutorial for customizing HTML Purifier's tag and attribute sets." />
|
||||||
|
<link rel="stylesheet" type="text/css" href="style.css" />
|
||||||
|
|
||||||
|
<title>Customize - HTML Purifier</title>
|
||||||
|
|
||||||
|
</head><body>
|
||||||
|
|
||||||
|
<h1 class="subtitled">Customize!</h1>
|
||||||
|
<div class="subtitle">HTML Purifier is a Swiss-Army Knife</div>
|
||||||
|
|
||||||
|
<div id="filing">Filed under End-User</div>
|
||||||
|
<div id="index">Return to the <a href="index.html">index</a>.</div>
|
||||||
|
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
|
||||||
|
|
||||||
|
<div id="applicability">
|
||||||
|
This document covers currently unreleased functionality and
|
||||||
|
only applies to recent SVN checkouts.
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
You may have heard of the <a href="dev-advanced-api.html">Advanced API</a>.
|
||||||
|
If you're interested in reading dry prose and boring functional
|
||||||
|
specifications, feel free to click that link to get a no-nonsense overview
|
||||||
|
of the Advanced API. For the rest of us, there's this tutorial. By the time
|
||||||
|
you're finished reading this, you should have a pretty good idea of
|
||||||
|
how to implement custom tags and attributes that HTML Purifier may not have.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<h2>Is it necessary?</h2>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
Before we even write any code, it is paramount to consider whether or
|
||||||
|
not the code we're writing is necessary. HTML Purifier, by default,
|
||||||
|
contains a large set of elements and attributes: large enough so that
|
||||||
|
<em>any</em> element or attribute in XHTML 1.0 (and its HTML variant)
|
||||||
|
that can be safely used by the general public is implemented.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
So what needs to be implemented? (Feel free to skip this section if
|
||||||
|
you know what you want).
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<h3>XHTML 1.0</h3>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
All of the modules listed below are based off of the
|
||||||
|
<a href="http://www.w3.org/TR/2001/REC-xhtml-modularization-20010410/abstract_modules.html#sec_5.2.">modularization of
|
||||||
|
XHTML</a>, which, while technically for XHTML 1.1, is quite a useful
|
||||||
|
resource.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<ul>
|
||||||
|
<li>Structure</li>
|
||||||
|
<li>Frames</li>
|
||||||
|
<li>Applets (deprecated)</li>
|
||||||
|
<li>Forms</li>
|
||||||
|
<li>Image maps</li>
|
||||||
|
<li>Objects</li>
|
||||||
|
<li>Frames</li>
|
||||||
|
<li>Events</li>
|
||||||
|
<li>Meta-information</li>
|
||||||
|
<li>Style sheets</li>
|
||||||
|
<li>Link (not hypertext)</li>
|
||||||
|
<li>Base</li>
|
||||||
|
<li>Name</li>
|
||||||
|
</ul>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
If you don't recognize it, you probably don't need it. But the curious
|
||||||
|
can look all of these modules up in the above-mentioned document. Note
|
||||||
|
that inline scripting comes packaged with HTML Purifier (more on this
|
||||||
|
later).
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<h3>XHTML 1.1</h3>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
We have not implemented the
|
||||||
|
<a href="http://www.w3.org/TR/2001/REC-ruby-20010531/">Ruby module</a>,
|
||||||
|
which defines a set of tags
|
||||||
|
for publishing short annotations for text, used mostly in Japanese
|
||||||
|
and Chinese school texts.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<h3>XHTML 2.0</h3>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
<a href="http://www.w3.org/TR/xhtml2/">XHTML 2.0</a> is still a
|
||||||
|
working draft, so any elements introduced in the
|
||||||
|
specification have not been implemented and will not be implemented
|
||||||
|
until we get a recommendation or proposal. Because XHTML 2.0 is
|
||||||
|
an entirely new markup language, implementing rules for it will be
|
||||||
|
no easy task.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<h3>HTML 5</h3>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
<a href="http://www.whatwg.org/specs/web-apps/current-work/">HTML 5</a>
|
||||||
|
is a fork of HTML 4.01 by WHATWG, who believed that XHTML 2.0 was headed
|
||||||
|
in the wrong direction. It too is a working draft, and may change
|
||||||
|
drastically before publication, but it should be noted that the
|
||||||
|
<code>canvas</code> tag has been implemented by many browser vendors.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<h3>Proprietary</h3>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
There are a number of proprietary tags still in the wild. Many of them
|
||||||
|
have been documented in <a href="ref-proprietary-tags.txt">ref-proprietary-tags.txt</a>,
|
||||||
|
but there is currently no implementation for any of them.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<h3>Extensions</h3>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
There are also a number of other XML languages out there that can
|
||||||
|
be embedded in HTML documents: two of the most popular are MathML and
|
||||||
|
SVG, and I frequently get requests to implement these. But they are
|
||||||
|
expansive, comprehensive specifications, and it would take far too long
|
||||||
|
to implement them <em>correctly</em> (most systems I've seen go as far
|
||||||
|
as whitelisting tags and no further; come on, what about nesting!)
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
Word of warning: HTML Purifier is currently <em>not</em> namespace
|
||||||
|
aware.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<h2>Giving back</h2>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
As you may imagine from the details above (don't be abashed if you didn't
|
||||||
|
read it all: a glance over would have done), there's quite a bit that
|
||||||
|
HTML Purifier doesn't implement. Recent architectural changes have
|
||||||
|
allowed HTML Purifier to implement elements and attributes that are not
|
||||||
|
safe! Don't worry, they won't be activated unless you set %HTML.Trusted
|
||||||
|
to true, but they certainly help out users who need to put, say, forms
|
||||||
|
on their page and don't want to go through the trouble of reading this
|
||||||
|
and implementing it themselves.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
So any of the above that you implement for your own application could
|
||||||
|
help out some other poor sap on the other side of the globe. Help us
|
||||||
|
out, and send back code so that it can be hammered into a module and
|
||||||
|
released with the core. Any code would be greatly appreciated!
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<h2>And now...</h2>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
Enough philosophical talk, time for some code:
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<pre>$config = HTMLPurifier_Config::createDefault();
|
||||||
|
$config->set('HTML', 'DefinitionID', 'enduser-customize.html tutorial');
|
||||||
|
$config->set('HTML', 'DefinitionRev', 1);
|
||||||
|
$def =& $config->getHTMLDefinition(true);</pre>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
Assuming that HTML Purifier has already been properly loaded (hint:
|
||||||
|
include <code>HTMLPurifier.auto.php</code>), this code will set up
|
||||||
|
the environment that you need to start customizing the HTML definition.
|
||||||
|
What's going on?
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<ul>
|
||||||
|
<li>
|
||||||
|
The first three lines are regular configuration code:
|
||||||
|
<ul>
|
||||||
|
<li>
|
||||||
|
%HTML.DefinitionID is set to a unique identifier for your
|
||||||
|
custom HTML definition. This prevents it from clobbering
|
||||||
|
other custom definitions on the same installation.
|
||||||
|
</li>
|
||||||
|
<li>
|
||||||
|
%HTML.DefinitionRev is a revision integer of your HTML
|
||||||
|
definition. Because HTML definitions are cached, you'll need
|
||||||
|
to increment this whenever you make a change in order to flush
|
||||||
|
the cache.
|
||||||
|
</li>
|
||||||
|
</ul>
|
||||||
|
</li>
|
||||||
|
<li>
|
||||||
|
The fourth line retrieves a raw <code>HTMLPurifier_HTMLDefinition</code>
|
||||||
|
object that we will be tweaking. If the parameter were removed, we
|
||||||
|
would be retrieving a fully formed definition object, which is somewhat
|
||||||
|
useless for customization purposes.
|
||||||
|
</li>
|
||||||
|
</ul>
|
||||||
|
|
||||||
|
<h3>Broken backwards-compatibility</h3>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
Those of you who have already been twiddling around with the raw
|
||||||
|
HTML definition object will notice that you're getting an error
|
||||||
|
when you attempt to retrieve the raw definition object without specifying
|
||||||
|
a DefinitionID. It is vital to caching (see below) that you make a unique
|
||||||
|
name for your customized definition, so make up something right now and
|
||||||
|
things will operate again.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<h2>Turn off caching</h2>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
To make development easier, we're going to temporarily turn off
|
||||||
|
definition caching:
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<pre>$config = HTMLPurifier_Config::createDefault();
|
||||||
|
$config->set('HTML', 'DefinitionID', 'enduser-customize.html tutorial');
|
||||||
|
$config->set('HTML', 'DefinitionRev', 1);
|
||||||
|
<strong>$config->set('Core', 'DefinitionCache', null); // remove this later!</strong>
|
||||||
|
$def =& $config->getHTMLDefinition(true);</pre>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
A few things should be mentioned about the caching mechanism before
|
||||||
|
we move on. For performance reasons, HTML Purifier caches generated
|
||||||
|
<code>HTMLPurifier_Definition</code> objects in serialized files
|
||||||
|
stored (by default) in <code>library/HTMLPurifier/DefinitionCache/Serializer</code>.
|
||||||
|
A lot of processing is done in order to create these objects, so it
|
||||||
|
makes little sense to repeat the same processing over and over again
|
||||||
|
whenever HTML Purifier is called.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
In order to identify a cache entry, HTML Purifier uses three variables:
|
||||||
|
the library's version number, the value of %HTML.DefinitionRev and
|
||||||
|
a serial of relevant configuration. Whenever any of these changes,
|
||||||
|
a new HTML definition is generated. Notice that there is no way
|
||||||
|
for the definition object to track changes to customizations: here, it
|
||||||
|
is up to you to supply appropriate information to DefinitionID and
|
||||||
|
DefinitionRev.
|
||||||
|
</p>
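<p>
As a rough sketch of what this means in practice: once caching is switched
back on, a change to your customizations should be accompanied by a bump
of the revision number. The value <code>2</code> below is only an example;
everything else is the setup code shown earlier.
</p>
<pre>$config = HTMLPurifier_Config::createDefault();
$config->set('HTML', 'DefinitionID', 'enduser-customize.html tutorial');
// Was 1: incremented because the customizations below have changed,
// which makes HTML Purifier generate a fresh cached definition.
$config->set('HTML', 'DefinitionRev', 2);
$def =& $config->getHTMLDefinition(true);</pre>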
|
||||||
|
|
||||||
|
<h2 id="addAttribute">Add an attribute</h2>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
For this example, we're going to implement the <code>target</code> attribute found
|
||||||
|
on <code>a</code> elements. To implement an attribute, we have to
|
||||||
|
ask a few questions:
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<ol>
|
||||||
|
<li>What element is it found on?</li>
|
||||||
|
<li>What is its name?</li>
|
||||||
|
<li>Is it required or optional?</li>
|
||||||
|
<li>What are valid values for it?</li>
|
||||||
|
</ol>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
The first three are easy: the element is <code>a</code>, the attribute
|
||||||
|
is <code>target</code>, and it is not a required attribute. (If it
|
||||||
|
were required, we'd need to append an asterisk to the attribute name;
|
||||||
|
you'll see an example of this in the addElement() example).
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
The last question is a little trickier.
|
||||||
|
Let's allow the special values: _blank, _self, _parent and _top.
|
||||||
|
The form of this is called an <strong>enumeration</strong>, a list of
|
||||||
|
valid values, although only one can be used at a time. To translate
|
||||||
|
this into code form, we write:
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<pre>$config = HTMLPurifier_Config::createDefault();
|
||||||
|
$config->set('HTML', 'DefinitionID', 'enduser-customize.html tutorial');
|
||||||
|
$config->set('HTML', 'DefinitionRev', 1);
|
||||||
|
$config->set('Core', 'DefinitionCache', null); // remove this later!
|
||||||
|
$def =& $config->getHTMLDefinition(true);
|
||||||
|
<strong>$def->addAttribute('a', 'target', 'Enum#_blank,_self,_parent,_top');</strong></pre>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
The <code>Enum#_blank,_self,_parent,_top</code> does all the magic.
|
||||||
|
The string is split into two parts, separated by a hash mark (#):
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<ol>
|
||||||
|
<li>The first part is the name of what we call an <code>AttrDef</code></li>
|
||||||
|
<li>The second part is the parameter of the above-mentioned <code>AttrDef</code></li>
|
||||||
|
</ol>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
If that sounds vague and generic, it's because it is! HTML Purifier defines
|
||||||
|
an assortment of different attribute types one can use, and each of these
|
||||||
|
has its own specialized parameter format. Here are some of the more useful
|
||||||
|
ones:
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<table class="table">
|
||||||
|
<thead>
|
||||||
|
<tr>
|
||||||
|
<th>Type</th>
|
||||||
|
<th>Format</th>
|
||||||
|
<th>Description</th>
|
||||||
|
</tr>
|
||||||
|
</thead>
|
||||||
|
<tbody>
|
||||||
|
<tr>
|
||||||
|
<th>Enum</th>
|
||||||
|
<td><em>[s:]</em>value1,value2,...</td>
|
||||||
|
<td>
|
||||||
|
Attribute with a number of valid values, one of which may be used. When
|
||||||
|
s: is present, the enumeration is case sensitive.
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<th>Bool</th>
|
||||||
|
<td>attribute_name</td>
|
||||||
|
<td>
|
||||||
|
Boolean attribute, with only one valid value: the name
|
||||||
|
of the attribute.
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<th>CDATA</th>
|
||||||
|
<td></td>
|
||||||
|
<td>
|
||||||
|
Attribute of arbitrary text. Can also be referred to as <strong>Text</strong>
|
||||||
|
(the specification makes a semantic distinction between the two).
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<th>ID</th>
|
||||||
|
<td></td>
|
||||||
|
<td>
|
||||||
|
Attribute that specifies a unique ID
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<th>Pixels</th>
|
||||||
|
<td></td>
|
||||||
|
<td>
|
||||||
|
Attribute that specifies an integer pixel length
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<th>Length</th>
|
||||||
|
<td></td>
|
||||||
|
<td>
|
||||||
|
Attribute that specifies a pixel or percentage length
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<th>NMTOKENS</th>
|
||||||
|
<td></td>
|
||||||
|
<td>
|
||||||
|
Attribute that specifies a number of name tokens, example: the
|
||||||
|
<code>class</code> attribute
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<th>URI</th>
|
||||||
|
<td></td>
|
||||||
|
<td>
|
||||||
|
Attribute that specifies a URI, example: the <code>href</code>
|
||||||
|
attribute
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<th>Number</th>
|
||||||
|
<td></td>
|
||||||
|
<td>
|
||||||
|
Attribute that specifies a positive integer
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
For a complete list, consult
|
||||||
|
<a href="http://htmlpurifier.org/svnroot/htmlpurifier/trunk/library/HTMLPurifier/AttrTypes.php"><code>library/HTMLPurifier/AttrTypes.php</code></a>;
|
||||||
|
more information on attributes that accept parameters can be found on their
|
||||||
|
respective includes in
|
||||||
|
<a href="http://htmlpurifier.org/svnroot/htmlpurifier/trunk/library/HTMLPurifier/AttrDef/"><code>library/HTMLPurifier/AttrDef</code></a>.
|
||||||
|
</p>
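<p>
To make the table a little more concrete, here is a sketch that uses a few
of the other types through the same <code>addAttribute()</code> call. The
element and attribute pairs are picked purely for illustration (they are
assumptions, not recommendations), and the type strings simply follow the
formats listed in the table; consult <code>AttrTypes.php</code> in your
checkout to confirm which types are available.
</p>
<pre>// Assumes $def is the raw definition object retrieved earlier.
$def->addAttribute('img', 'hspace', 'Pixels');     // integer pixel length
$def->addAttribute('table', 'summary', 'CDATA');   // arbitrary text
$def->addAttribute('td', 'nowrap', 'Bool#nowrap'); // only valid value is the attribute name</pre>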
|
||||||
|
|
||||||
|
<p>
|
||||||
|
Sometimes, the restrictive list in AttrTypes just doesn't cut it. Don't
|
||||||
|
sweat: you can also use a fully instantiated object as the value. The
|
||||||
|
equivalent, verbose form of the above example is:
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<pre>$config = HTMLPurifier_Config::createDefault();
|
||||||
|
$config->set('HTML', 'DefinitionID', 'enduser-customize.html tutorial');
|
||||||
|
$config->set('HTML', 'DefinitionRev', 1);
|
||||||
|
$config->set('Core', 'DefinitionCache', null); // remove this later!
|
||||||
|
$def =& $config->getHTMLDefinition(true);
|
||||||
|
<strong>$def->addAttribute('a', 'target', new HTMLPurifier_AttrDef_Enum(
|
||||||
|
array('_blank','_self','_parent','_top')
|
||||||
|
));</strong></pre>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
Trust me, you'll learn to love the shorthand.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<h2>Add an element</h2>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
Adding attributes is really small-fry stuff, though, and it was possible
|
||||||
|
to add them (albeit a bit more wordy) prior to 2.0. The real gem of
|
||||||
|
the Advanced API is adding elements. There are five questions to
|
||||||
|
ask when adding a new element:
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<ol>
|
||||||
|
<li>What is the element's name?</li>
|
||||||
|
<li>What content set does this element belong to?</li>
|
||||||
|
<li>What are the allowed children of this element?</li>
|
||||||
|
<li>What attributes does the element allow that are general?</li>
|
||||||
|
<li>What attributes does the element allow that are specific to this element?</li>
|
||||||
|
</ol>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
It's a mouthful, and you'll be slightly lost if you're not familiar with
|
||||||
|
the HTML specification, so let's explain them step by step.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<h3>Content set</h3>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
The HTML specification defines two major content sets: Inline
|
||||||
|
and Block. Each of these
|
||||||
|
content sets contains a list of elements: Inline contains things like
|
||||||
|
<code>span</code> and <code>b</code> while Block contains things like
|
||||||
|
<code>div</code> and <code>blockquote</code>.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
These content sets amount to a macro mechanism for HTML definition. Most
|
||||||
|
elements in HTML are organized into one of these two sets, and most
|
||||||
|
elements in HTML allow elements from one of these sets. If we had
|
||||||
|
to write each element verbatim into each other element's allowed
|
||||||
|
children, we would have ridiculously large lists; instead we use
|
||||||
|
content sets to compactify the declaration.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
Practically speaking, there are several useful values you can use here:
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<table class="table">
|
||||||
|
<thead>
|
||||||
|
<tr>
|
||||||
|
<th>Content set</th>
|
||||||
|
<th>Description</th>
|
||||||
|
</tr>
|
||||||
|
</thead>
|
||||||
|
<tbody>
|
||||||
|
<tr>
|
||||||
|
<th>Inline</th>
|
||||||
|
<td>Character level elements, text</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<th>Block</th>
|
||||||
|
<td>Block-like elements, like paragraphs and lists</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<th><em>false</em></th>
|
||||||
|
<td>
|
||||||
|
Any element that doesn't fit into the mold, for example <code>li</code>
|
||||||
|
or <code>tr</code>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
By specifying a valid value here, all other elements that use that
|
||||||
|
content set will also allow your element, without you having to do
|
||||||
|
anything. If you specify <em>false</em>, you'll have to register
|
||||||
|
your element manually.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<h3>Allowed children</h3>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
Allowed children defines the elements that this element can contain.
|
||||||
|
The allowed values may range from none to a complex regexp depending on
|
||||||
|
your element.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
If you've ever taken a look at the HTML DTDs before, you may have
|
||||||
|
noticed declarations like this:
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<pre><!ELEMENT LI - O (%flow;)* -- list item --></pre>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
The <code>(%flow;)*</code> indicates the allowed children of the
|
||||||
|
<code>li</code> tag: <code>li</code> allows any number of flow
|
||||||
|
elements as its children. In HTML Purifier, we'd write it like
|
||||||
|
<code>Flow</code> (here's where the content sets we were
|
||||||
|
discussing earlier come into play). There are three shorthand content models you
|
||||||
|
can specify:
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<table class="table">
|
||||||
|
<thead>
|
||||||
|
<tr>
|
||||||
|
<th>Content model</th>
|
||||||
|
<th>Description</th>
|
||||||
|
</tr>
|
||||||
|
</thead>
|
||||||
|
<tbody>
|
||||||
|
<tr>
|
||||||
|
<th>Empty</th>
|
||||||
|
<td>No children allowed, like <code>br</code> or <code>hr</code></td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<th>Inline</th>
|
||||||
|
<td>Any number of inline elements and text, like <code>span</code></td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<th>Flow</th>
|
||||||
|
<td>Any number of inline elements, block elements and text, like <code>div</code></td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
This covers 90% of all the cases out there, but what about elements that
|
||||||
|
break the mold like <code>ul</code>? This guy requires at least one
|
||||||
|
child, and the only valid children for it are <code>li</code>. The
|
||||||
|
content model is: <code>Required: li</code>. There are two parts: the
|
||||||
|
first type determines what <code>ChildDef</code> will be used to validate
|
||||||
|
content models. The most common values are:
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<table class="table">
|
||||||
|
<thead>
|
||||||
|
<tr>
|
||||||
|
<th>Type</th>
|
||||||
|
<th>Description</th>
|
||||||
|
</tr>
|
||||||
|
</thead>
|
||||||
|
<tbody>
|
||||||
|
<tr>
|
||||||
|
<th>Required</th>
|
||||||
|
<td>Children must be one or more of the valid elements</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<th>Optional</th>
|
||||||
|
<td>Children can be any number of the valid elements</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<th>Custom</th>
|
||||||
|
<td>Children must follow the DTD-style regex</td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
You can also implement your own <code>ChildDef</code>: this was done
|
||||||
|
for a few special cases in HTML Purifier such as <code>Chameleon</code>
|
||||||
|
(for <code>ins</code> and <code>del</code>), <code>StrictBlockquote</code>
|
||||||
|
and <code>Table</code>.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
The second part specifies either valid elements or a regular expression.
|
||||||
|
Valid elements are separated with horizontal bars (|), i.e.
|
||||||
|
"<code>a | b | c</code>". Use #PCDATA to represent plain text.
|
||||||
|
Regular expressions follow the DTD style:
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<ul>
|
||||||
|
<li>Parentheses () are used for grouping</li>
|
||||||
|
<li>Commas (,) separate elements that should come one after another</li>
|
||||||
|
<li>Horizontal bars (|) indicate one or the other elements should be used</li>
|
||||||
|
<li>Plus signs (+) are used for a one or more match</li>
|
||||||
|
<li>Asterisks (*) are used for a zero or more match</li>
|
||||||
|
<li>Question marks (?) are used for a zero or one match</li>
|
||||||
|
</ul>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
For example, "<code>a, b?, (c | d), e+, f*</code>" means "In this order,
|
||||||
|
one <code>a</code> element, at most one <code>b</code> element,
|
||||||
|
one <code>c</code> or <code>d</code> element (but not both), one or more
|
||||||
|
<code>e</code> elements, and any number of <code>f</code> elements."
|
||||||
|
Regex veterans should be able to jump right in, and those not so savvy
|
||||||
|
can always copy-paste W3C's content model definitions into HTML Purifier
|
||||||
|
and hope for the best.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
A word of warning: while the regex format is extremely flexible on
|
||||||
|
the developer's side, it is
|
||||||
|
quite unforgiving on the user's side. If the user input does not <em>exactly</em>
|
||||||
|
match the specification, the entire contents of the element will
|
||||||
|
be nuked. This is why there are specific content model types like
|
||||||
|
Optional and Required: while they could be implemented as <code>Custom:
|
||||||
|
(valid | elements)*</code>, the dedicated classes contain special recovery
|
||||||
|
measures that make sure as much of the user's original content gets
|
||||||
|
through. HTML Purifier's core, as a rule, does not use Custom.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
One final note: you can also use Content Sets inside your valid elements
|
||||||
|
lists or regular expressions. In fact, the three shorthand content models
|
||||||
|
mentioned above are just that: abbreviations:
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<table class="table">
|
||||||
|
<thead>
|
||||||
|
<tr>
|
||||||
|
<th>Content model</th>
|
||||||
|
<th>Implementation</th>
|
||||||
|
</tr>
|
||||||
|
</thead>
|
||||||
|
<tbody>
|
||||||
|
<tr>
|
||||||
|
<th>Inline</th>
|
||||||
|
<td>Optional: Inline | #PCDATA</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<th>Flow</th>
|
||||||
|
<td>Optional: Flow | #PCDATA</td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
When the definition is compiled, Inline will be replaced with a
|
||||||
|
horizontal-bar separated list of inline elements. Also, notice that
|
||||||
|
it does not contain text: you have to specify that yourself.
|
||||||
|
</p>
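<p>
As a rough illustration of how these content model strings are used, here
is a sketch that passes them as the third argument of
<code>addElement()</code>. The element names <code>menu</code> and
<code>figure</code> are stand-ins chosen for this example (assumptions, not
part of the tutorial), and the remaining arguments are explained in the
sections that follow.
</p>
<pre>// Assumes $def is the raw definition object retrieved earlier.
// 'menu' and 'figure' are illustrative names only.

// Required: at least one li child, and nothing else.
$def->addElement('menu', 'Block', 'Required: li', 'Common', array());

// Custom: one img, optionally followed by one caption, in that order.
// Input that does not match exactly loses the element's contents.
$def->addElement('figure', 'Block', 'Custom: (img, caption?)', 'Common', array());</pre>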
|
||||||
|
|
||||||
|
<h3>Common attributes</h3>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
Congratulations: you have just gotten over the proverbial hump (Allowed
|
||||||
|
children). Common attributes is much simpler, and boils down to
|
||||||
|
one question: does your element have the <code>id</code>, <code>style</code>,
|
||||||
|
<code>class</code>, <code>title</code> and <code>lang</code> attributes?
|
||||||
|
If so, you'll want to specify the <code>Common</code> attribute collection,
|
||||||
|
which contains these five attributes that are found on almost every
|
||||||
|
HTML element in the specification.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
There are a few more collections, but they're really edge cases:
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<table class="table">
|
||||||
|
<thead>
|
||||||
|
<tr>
|
||||||
|
<th>Collection</th>
|
||||||
|
<th>Attributes</th>
|
||||||
|
</tr>
|
||||||
|
</thead>
|
||||||
|
<tbody>
|
||||||
|
<tr>
|
||||||
|
<th>I18N</th>
|
||||||
|
<td><code>lang</code>, possibly <code>xml:lang</code></td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<th>Core</th>
|
||||||
|
<td><code>style</code>, <code>class</code>, <code>id</code> and <code>title</code></td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
Common is a combination of the above-mentioned collections.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<h3>Attributes</h3>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
If you didn't read the <a href="#addAttribute">previous section on
|
||||||
|
adding attributes</a>, read it now. The last parameter is simply
|
||||||
|
an array of attribute names to attribute implementations, in the exact
|
||||||
|
same format as <code>addAttribute()</code>.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<h3>Putting it all together</h3>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
We're going to implement <code>form</code>. Before we embark, let's
|
||||||
|
grab a reference implementation from over at the
|
||||||
|
<a href="http://www.w3.org/TR/html4/sgml/loosedtd.html">transitional DTD</a>:
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<pre><!ELEMENT FORM - - (%flow;)* -(FORM) -- interactive form -->
|
||||||
|
<!ATTLIST FORM
|
||||||
|
%attrs; -- %coreattrs, %i18n, %events --
|
||||||
|
action %URI; #REQUIRED -- server-side form handler --
|
||||||
|
method (GET|POST) GET -- HTTP method used to submit the form--
|
||||||
|
enctype %ContentType; "application/x-www-form-urlencoded"
|
||||||
|
accept %ContentTypes; #IMPLIED -- list of MIME types for file upload --
|
||||||
|
name CDATA #IMPLIED -- name of form for scripting --
|
||||||
|
onsubmit %Script; #IMPLIED -- the form was submitted --
|
||||||
|
onreset %Script; #IMPLIED -- the form was reset --
|
||||||
|
target %FrameTarget; #IMPLIED -- render in this frame --
|
||||||
|
accept-charset %Charsets; #IMPLIED -- list of supported charsets --
|
||||||
|
></pre>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
Juicy! With just this, we can answer four of our five questions:
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<ol>
|
||||||
|
<li>What is the element's name? <strong>form</strong></li>
|
||||||
|
<li>What content set does this element belong to? <strong>Block</strong>
|
||||||
|
(this needs a little sleuthing, I find the easiest way is to search
|
||||||
|
the DTD for <code>FORM</code> and determine which set it is in.)</li>
|
||||||
|
<li>What are the allowed children of this element? <strong>Zero
|
||||||
|
or more flow elements, but no nested <code>form</code>s</strong></li>
|
||||||
|
<li>What attributes does the element allow that are general? <strong>Common</strong></li>
|
||||||
|
<li>What attributes does the element allow that are specific to this element? <strong>A whole bunch, see ATTLIST;
|
||||||
|
we're going to do the vital ones: <code>action</code>, <code>method</code> and <code>name</code></strong></li>
|
||||||
|
</ol>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
Time for some code:
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<pre>$config = HTMLPurifier_Config::createDefault();
|
||||||
|
$config->set('HTML', 'DefinitionID', 'enduser-customize.html tutorial');
|
||||||
|
$config->set('HTML', 'DefinitionRev', 1);
|
||||||
|
$config->set('Core', 'DefinitionCache', null); // remove this later!
|
||||||
|
$def =& $config->getHTMLDefinition(true);
|
||||||
|
$def->addAttribute('a', 'target', new HTMLPurifier_AttrDef_Enum(
|
||||||
|
array('_blank','_self','_parent','_top')
|
||||||
|
));
|
||||||
|
<strong>$form =& $def->addElement(
|
||||||
|
'form', // name
|
||||||
|
'Block', // content set
|
||||||
|
'Flow', // allowed children
|
||||||
|
'Common', // attribute collection
|
||||||
|
array( // attributes
|
||||||
|
'action*' => 'URI',
|
||||||
|
'method' => 'Enum#get,post',
|
||||||
|
'name' => 'ID'
|
||||||
|
)
|
||||||
|
);
|
||||||
|
$form->excludes = array('form' => true);</strong></pre>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
Each of the parameters corresponds to one of the questions we asked.
|
||||||
|
Notice that we added an asterisk to the end of the <code>action</code>
|
||||||
|
attribute to indicate that it is required. If someone specifies a
|
||||||
|
<code>form</code> without that attribute, the tag will be axed.
|
||||||
|
Also, the extra line at the end is a special extra declaration that
|
||||||
|
prevents forms from being nested within each other.
|
||||||
|
</p>
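<p>
For completeness, here is a minimal sketch of putting the customized
configuration to work. It assumes the <code>$config</code> object built
above; the variable names and the sample markup are made up for this
example.
</p>
<pre>// Pass the customized configuration to the purifier.
$purifier = new HTMLPurifier($config);

// Sample input, for illustration only.
$dirty_html = '<form action="/search" method="get"><p>Search here</p></form>';

// The form element and the attributes declared above are now candidates
// for surviving purification instead of being stripped outright.
$clean_html = $purifier->purify($dirty_html);</pre>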
|
||||||
|
|
||||||
|
<p>
|
||||||
|
And that's all there is to it! Implementing the rest of the form
|
||||||
|
module is left as an exercise to the user; to see more examples
|
||||||
|
check the <a href="http://htmlpurifier.org/svnroot/htmlpurifier/trunk/library/HTMLPurifier/HTMLModule/"><code>library/HTMLPurifier/HTMLModule/</code></a> directory
|
||||||
|
in your local HTML Purifier installation.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<h2>And beyond...</h2>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
Perceptive users may have realized that, to a certain extent, we
|
||||||
|
have simply re-implemented the facilities of XML Schema or the
|
||||||
|
Document Type Definition. What you are seeing here, however, is
|
||||||
|
not just an XML Schema or Document Type Definition: it is a fully
|
||||||
|
expressive method of specifying the definition of HTML that is
|
||||||
|
a portable superset of the capabilities of the two above-mentioned schema
|
||||||
|
languages. What makes HTMLDefinition so powerful is the fact that
|
||||||
|
if we don't have an implementation for a content model or an attribute
|
||||||
|
definition, you can supply it yourself by writing a PHP class.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
There are many facets of HTMLDefinition beyond the Advanced API I have
|
||||||
|
walked you through today. To find out more about these, you can
|
||||||
|
check out these source files:
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<ul>
|
||||||
|
<li><a href="http://htmlpurifier.org/svnroot/htmlpurifier/trunk/library/HTMLPurifier/HTMLModule.php"><code>library/HTMLPurifier/HTMLModule.php</code></a></li>
|
||||||
|
<li><a href="http://htmlpurifier.org/svnroot/htmlpurifier/trunk/library/HTMLPurifier/ElementDef.php"><code>library/HTMLPurifier/ElementDef.php</code></a></li>
|
||||||
|
</ul>
|
||||||
|
|
||||||
|
<div id="version">$Id: enduser-tidy.html 1158 2007-06-18 19:26:29Z Edward $</div>
|
||||||
|
|
||||||
|
</body></html>
|
@@ -8,15 +8,11 @@ to be effective. Things to remember:
|
|||||||
|
|
||||||
1. Character Encoding: see enduser-utf8.html for more info.
|
1. Character Encoding: see enduser-utf8.html for more info.
|
||||||
|
|
||||||
2. Doctype: document pending feature completion
|
2. IDs: see enduser-id.html for more info
|
||||||
Not strictly necessary, actually. More in-depth discussion once we figure
|
|
||||||
out how to get strict loose mode working.
|
|
||||||
|
|
||||||
3. IDs: see enduser-id.html for more info
|
3. Links: document pending feature completion
|
||||||
|
|
||||||
4. Links: document pending feature completion
|
|
||||||
Rudimentary blacklisting, we should also allow only relative URIs. We
|
Rudimentary blacklisting, we should also allow only relative URIs. We
|
||||||
need a doc to explain the stuff.
|
need a doc to explain the stuff.
|
||||||
|
|
||||||
5. CSS: document pending
|
4. CSS: document pending
|
||||||
Explain which CSS styles we blocked and why.
|
Explain which CSS styles we blocked and why.
|
||||||
|
235
docs/enduser-tidy.html
Normal file
@@ -0,0 +1,235 @@
|
|||||||
|
<?xml version="1.0" encoding="UTF-8"?>
|
||||||
|
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
|
||||||
|
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
|
||||||
|
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
|
||||||
|
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
|
||||||
|
<meta name="description" content="Tutorial for tweaking HTML Purifier's Tidy-like behavior." />
|
||||||
|
<link rel="stylesheet" type="text/css" href="style.css" />
|
||||||
|
|
||||||
|
<title>Tidy - HTML Purifier</title>
|
||||||
|
|
||||||
|
</head><body>
|
||||||
|
|
||||||
|
<h1>Tidy</h1>
|
||||||
|
|
||||||
|
<div id="filing">Filed under Development</div>
|
||||||
|
<div id="index">Return to the <a href="index.html">index</a>.</div>
|
||||||
|
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
|
||||||
|
|
||||||
|
<div id="applicability">
|
||||||
|
This document covers currently unreleased functionality and
|
||||||
|
only applies to recent SVN checkouts.
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<p>You've probably heard of HTML Tidy, Dave Raggett's little piece
|
||||||
|
of software that cleans up poorly written HTML. Let me say it straight
|
||||||
|
out:</p>
|
||||||
|
|
||||||
|
<p class="emphasis">This ain't HTML Tidy!</p>
|
||||||
|
|
||||||
|
<p>Rather, Tidy stands for a cool set of Tidy-inspired features in HTML Purifier
|
||||||
|
that allows users to submit deprecated elements and attributes and get
|
||||||
|
valid strict markup back. For example:</p>
|
||||||
|
|
||||||
|
<pre><center>Centered</center></pre>
|
||||||
|
|
||||||
|
<p>...becomes:</p>
|
||||||
|
|
||||||
|
<pre><div style="text-align:center;">Centered</div></pre>
|
||||||
|
|
||||||
|
<p>...when this particular fix is run on the HTML. This tutorial will give
|
||||||
|
you the lowdown on exactly what HTML Purifier will do when Tidy
|
||||||
|
is on, and how to fine tune this behavior. Once again, <strong>you do
|
||||||
|
not need Tidy installed on your PHP to use these features!</strong></p>
|
||||||
|
|
||||||
|
<h2>What does it do?</h2>
|
||||||
|
|
||||||
|
<p>Tidy will do several things to your HTML:</p>
|
||||||
|
|
||||||
|
<ul>
|
||||||
|
<li>Convert deprecated elements and attributes to standards-compliant
|
||||||
|
alternatives</li>
|
||||||
|
<li>Enforce XHTML compatibility guidelines and other best practices</li>
|
||||||
|
<li>Preserve data that would normally be removed as per W3C</li>
|
||||||
|
</ul>
|
||||||
|
|
||||||
|
<h2>What are levels?</h2>
|
||||||
|
|
||||||
|
<p>Levels describe how aggressive the Tidy module should be when
|
||||||
|
cleaning up HTML. There are four levels to pick from: none, light, medium
|
||||||
|
and heavy. Each of these levels has a well-defined set of behaviors
|
||||||
|
associated with it, although it may change depending on your doctype.</p>
|
||||||
|
|
||||||
|
<dl>
|
||||||
|
<dt>light</dt>
|
||||||
|
<dd>This is the <strong>lenient</strong> level. If a tag or attribute
|
||||||
|
is about to be removed because it isn't supported by the
|
||||||
|
doctype, Tidy will step in and change it into an alternative that
|
||||||
|
is supported.</dd>
|
||||||
|
<dt>medium</dt>
|
||||||
|
<dd>This is the <strong>correctional</strong> level. At this level,
|
||||||
|
all the functions of light are performed, as well as some extra,
|
||||||
|
non-essential best practices enforcement. Changes made on this
|
||||||
|
level are very benign and are unlikely to cause problems.</dd>
|
||||||
|
<dt>heavy</dt>
|
||||||
|
<dd>This is the <strong>aggressive</strong> level. If a tag or
|
||||||
|
attribute is deprecated, it will be converted into a non-deprecated
|
||||||
|
version, no ifs ands or buts.</dd>
|
||||||
|
</dl>
|
||||||
|
|
||||||
|
<p>By default, Tidy operates on the <strong>medium</strong> level. You can
|
||||||
|
change the level of cleaning by setting the %HTML.TidyLevel configuration
|
||||||
|
directive:</p>
|
||||||
|
|
||||||
|
<pre>$config->set('HTML', 'TidyLevel', 'heavy'); // burn baby burn!</pre>
|
||||||
|
|
||||||
|
<h2>Is the light level really light?</h2>
|
||||||
|
|
||||||
|
<p>It depends on what doctype you're using. If your documents are HTML
|
||||||
|
4.01 <em>Transitional</em>, HTML Purifier will be lazy
|
||||||
|
and won't clean up your <code>center</code>
|
||||||
|
or <code>font</code> tags. But if you're using HTML 4.01 <em>Strict</em>,
|
||||||
|
HTML Purifier has no choice: it has to convert them, or they will
|
||||||
|
be nuked out of existence. So while light on Transitional will result
|
||||||
|
in little to no changes, light on Strict will still result in quite
|
||||||
|
a lot of fixes.</p>
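<p>A minimal sketch of that difference follows. It uses only calls that
appear elsewhere in this changeset; the include path, the sample input and
the <code>'XHTML 1.0 Strict'</code> doctype string are assumptions (the
XHTML 1.0 doctypes stand in for their HTML 4.01 counterparts), and no
particular output is guaranteed.</p>

```php
<?php
// Assumed include path; adjust to your installation.
require_once 'library/HTMLPurifier.auto.php';

$html = '<center>Centered</center>'; // sample input, made up for illustration

// Light on Transitional: <center> is still valid in this doctype,
// so the light level should have no reason to touch it.
$loose = HTMLPurifier_Config::createDefault();
$loose->set('HTML', 'Doctype', 'XHTML 1.0 Transitional');
$loose->set('HTML', 'TidyLevel', 'light');
$purifier = new HTMLPurifier($loose);
echo $purifier->purify($html), "\n";

// Light on Strict: <center> is not part of this doctype, so Tidy
// has to convert it or it would be removed outright.
$strict = HTMLPurifier_Config::createDefault();
$strict->set('HTML', 'Doctype', 'XHTML 1.0 Strict'); // assumed doctype name
$strict->set('HTML', 'TidyLevel', 'light');
$purifier = new HTMLPurifier($strict);
echo $purifier->purify($html), "\n";
?>
```

<p>Comparing the two printed results is a quick way to see how much work
the light level does for your own doctype.</p>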
|
||||||
|
|
||||||
|
<p>This is different behavior from 1.6 or before, where deprecated
|
||||||
|
tags in transitional documents would
|
||||||
|
always be cleaned up regardless. This is also better behavior.</p>
|
||||||
|
|
||||||
|
<h2>My pages look different!</h2>
|
||||||
|
|
||||||
|
<p>HTML Purifier is tasked with converting deprecated tags and
|
||||||
|
attributes to standards-compliant alternatives, which usually
|
||||||
|
need copious amounts of CSS. It's also not foolproof: sometimes
|
||||||
|
things do get lost in the translation. This is why when HTML Purifier
|
||||||
|
can get away with not doing cleaning, it won't; this is why
|
||||||
|
the default value is <strong>medium</strong> and not heavy.</p>
|
||||||
|
|
||||||
|
<p>Fortunately, only a few attributes have problems with the switch
|
||||||
|
over. They are described below:</p>
|
||||||
|
|
||||||
|
<table class="table">
|
||||||
|
<thead><tr>
|
||||||
|
<th>Element@Attr</th>
|
||||||
|
<th>Changes</th>
|
||||||
|
</tr></thead>
|
||||||
|
<tbody>
|
||||||
|
<tr>
|
||||||
|
<td>caption@align</td>
|
||||||
|
<td>Firefox supports stuffing the caption on the
|
||||||
|
left and right side of the table, a feature that
|
||||||
|
Internet Explorer, understandably, does not have.
|
||||||
|
When align equals right or left, the text will simply
|
||||||
|
be aligned on the left or right side.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>img@align</td>
|
||||||
|
<td>The implementation for align bottom is good, but not
|
||||||
|
perfect. There are a few pixel differences.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>br@clear</td>
|
||||||
|
<td>Clear both gets a little wonky in Internet Explorer. Haven't
|
||||||
|
really been able to figure out why.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>hr@noshade</td>
|
||||||
|
<td>All browsers implement this slightly differently: we've
|
||||||
|
chosen to make noshade horizontal rules gray.</td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
|
||||||
|
<p>There are a few more minor, although irritating, bugs.
|
||||||
|
Some older browsers support deprecated attributes,
|
||||||
|
but not CSS. Transformed elements and attributes will look unstyled
|
||||||
|
to said browsers. Also, CSS precedence is slightly different for
|
||||||
|
inline styles versus presentational markup. In increasing precedence:</p>
|
||||||
|
|
||||||
|
<ol>
|
||||||
|
<li>Presentational attributes</li>
|
||||||
|
<li>External style sheets</li>
|
||||||
|
<li>Inline styling</li>
|
||||||
|
</ol>
|
||||||
|
|
||||||
|
<p>This means that styling that may have been masked by external CSS
|
||||||
|
declarations will start showing up (a good thing, perhaps). Finally,
|
||||||
|
if you've turned off the style attribute, almost all of
|
||||||
|
these transformations will not work. Sorry mates.</p>
|
||||||
|
|
||||||
|
<p>You can review the rendering before and after of these transformations
|
||||||
|
by consulting the <a
|
||||||
|
href="http://htmlpurifier.org/live/smoketests/attrTransform.php">attrTransform.php
|
||||||
|
smoketest</a>.</p>
|
||||||
|
|
||||||
|
<h2>I like the general idea, but the specifics bug me!</h2>
|
||||||
|
|
||||||
|
<p>So you want HTML Purifier to clean up your HTML, but you're not
|
||||||
|
so happy about the br@clear implementation. That's perfectly fine!
|
||||||
|
HTML Purifier will make accommodations:</p>
|
||||||
|
|
||||||
|
<pre>$config->set('HTML', 'Doctype', 'XHTML 1.0 Transitional');
|
||||||
|
$config->set('HTML', 'TidyLevel', 'heavy'); // all changes, minus...
|
||||||
|
<strong>$config->set('HTML', 'TidyRemove', 'br@clear');</strong></pre>
|
||||||
|
|
||||||
|
<p>That third line does the magic, removing the br@clear fix
|
||||||
|
from the module, ensuring that <code><br clear="both" /></code>
|
||||||
|
will pass through unharmed. The reverse is possible too:</p>
|
||||||
|
|
||||||
|
<pre>$config->set('HTML', 'Doctype', 'XHTML 1.0 Transitional');
|
||||||
|
$config->set('HTML', 'TidyLevel', 'none'); // no changes, plus...
|
||||||
|
<strong>$config->set('HTML', 'TidyAdd', 'p@align');</strong></pre>
|
||||||
|
|
||||||
|
<p>In this case, all transformations are shut off, except for the p@align
|
||||||
|
one, which you found handy.</p>
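<p>For reference, here is a hedged sketch of a complete run with one fix
removed. The configuration calls are the ones shown above; the include path
and the sample input are assumptions, and the behavior noted in the comments
is what this tutorial describes rather than a verified result.</p>

```php
<?php
// Assumed include path; adjust to your installation.
require_once 'library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$config->set('HTML', 'Doctype', 'XHTML 1.0 Transitional');
$config->set('HTML', 'TidyLevel', 'heavy');     // all fixes...
$config->set('HTML', 'TidyRemove', 'br@clear'); // ...minus the br@clear one

$purifier = new HTMLPurifier($config);

// Sample input, made up for illustration. Per the text above, <center>
// should be converted to a styled <div>, while the clear attribute on
// <br> should pass through unharmed.
$dirty = '<center>Centered</center><br clear="both" />';
echo $purifier->purify($dirty);
?>
```

<p>Swapping <code>TidyRemove</code> for <code>TidyAdd</code> and the level
for <code>none</code> gives the opposite arrangement.</p>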
|
||||||
|
|
||||||
|
<p>To find out what the names of fixes you want to turn on or off are,
|
||||||
|
you'll have to consult the source code, specifically the files in
|
||||||
|
<code>HTMLPurifier/HTMLModule/Tidy/</code>. There is, however, a
|
||||||
|
general syntax:</p>
|
||||||
|
|
||||||
|
<table class="table">
|
||||||
|
<thead>
|
||||||
|
<tr>
|
||||||
|
<th>Name</th>
|
||||||
|
<th>Example</th>
|
||||||
|
<th>Interpretation</th>
|
||||||
|
</tr>
|
||||||
|
</thead>
|
||||||
|
<tbody>
|
||||||
|
<tr>
|
||||||
|
<td>element</td>
|
||||||
|
<td>font</td>
|
||||||
|
<td>Tag transform for <em>element</em></td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>element@attr</td>
|
||||||
|
<td>br@clear</td>
|
||||||
|
<td>Attribute transform for <em>attr</em> on <em>element</em></td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>@attr</td>
|
||||||
|
<td>@lang</td>
|
||||||
|
<td>Global attribute transform for <em>attr</em></td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>e#content_model_type</td>
|
||||||
|
<td>blockquote#content_model_type</td>
|
||||||
|
<td>Change of child processing implementation for <em>e</em></td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
|
||||||
|
<h2>So... what's the lowdown?</h2>
|
||||||
|
|
||||||
|
<p>The lowdown is, quite frankly, HTML Purifier's default settings are
|
||||||
|
probably good enough. The next step is to bump the level up to heavy,
|
||||||
|
and if that still doesn't satisfy your appetite, do some fine tuning.
|
||||||
|
Other than that, don't worry about it: this all works silently and
|
||||||
|
effectively in the background.</p>
|
||||||
|
|
||||||
|
<div id="version">$Id$</div>
|
||||||
|
|
||||||
|
</body></html>
|
@@ -8,8 +8,8 @@ require_once '../../library/HTMLPurifier.auto.php';
|
|||||||
$config = HTMLPurifier_Config::createDefault();
|
$config = HTMLPurifier_Config::createDefault();
|
||||||
|
|
||||||
// configuration goes here:
|
// configuration goes here:
|
||||||
$config->set('Core', 'Encoding', 'ISO-8859-1'); //replace with your encoding
|
$config->set('Core', 'Encoding', 'UTF-8'); // replace with your encoding
|
||||||
$config->set('Core', 'XHTML', true); // set to false if HTML 4.01
|
$config->set('HTML', 'Doctype', 'XHTML 1.0 Transitional'); // replace with your doctype
|
||||||
|
|
||||||
$purifier = new HTMLPurifier($config);
|
$purifier = new HTMLPurifier($config);
|
||||||
|
|
||||||
|
@@ -34,6 +34,12 @@ information for casual developers using HTML Purifier.</p>
|
|||||||
<dt><a href="enduser-utf8.html">UTF-8: The Secret of Character Encoding</a></dt>
|
<dt><a href="enduser-utf8.html">UTF-8: The Secret of Character Encoding</a></dt>
|
||||||
<dd>Describes the rationale for using UTF-8, the ramifications otherwise, and how to make the switch.</dd>
|
<dd>Describes the rationale for using UTF-8, the ramifications otherwise, and how to make the switch.</dd>
|
||||||
|
|
||||||
|
<dt><a href="enduser-tidy.html">Tidy</a></dt>
|
||||||
|
<dd>Tutorial for tweaking HTML Purifier's Tidy-like behavior.</dd>
|
||||||
|
|
||||||
|
<dt><a href="enduser-customize.html">Customize</a></dt>
|
||||||
|
<dd>Tutorial for customizing HTML Purifier's tag and attribute sets.</dd>
|
||||||
|
|
||||||
</dl>
|
</dl>
|
||||||
|
|
||||||
<h2>Development</h2>
|
<h2>Development</h2>
|
||||||
@@ -128,8 +134,8 @@ the code. They may be upgraded to HTML files or stay as TXT scratchpads.</p>
|
|||||||
|
|
||||||
<tr>
|
<tr>
|
||||||
<td>Reference</td>
|
<td>Reference</td>
|
||||||
<td><a href="ref-loose-vs-strict.txt">Loose vs.Strict</a></td>
|
<td><a href="ref-content-models.txt">Handling Content Model Changes</a></td>
|
||||||
<td>Differences between HTML Strict and Transitional versions.</td>
|
<td>Discusses how to tidy up content model changes using custom ChildDef classes.</td>
|
||||||
</tr>
|
</tr>
|
||||||
|
|
||||||
<tr>
|
<tr>
|
||||||
@@ -140,14 +146,8 @@ the code. They may be upgraded to HTML files or stay as TXT scratchpads.</p>
|
|||||||
|
|
||||||
<tr>
|
<tr>
|
||||||
<td>Reference</td>
|
<td>Reference</td>
|
||||||
<td><a href="ref-strictness.txt">Strictness</a></td>
|
<td><a href="ref-html-modularization.txt">Modularization of HTMLDefinition</a></td>
|
||||||
<td>Short essay on how loose definition isn't really loose.</td>
|
<td>Provides a high-level overview of the concepts behind HTMLModules.</td>
|
||||||
</tr>
|
|
||||||
|
|
||||||
<tr>
|
|
||||||
<td>Reference</td>
|
|
||||||
<td><a href="ref-xhtml-1.1.txt">XHTML 1.1</a></td>
|
|
||||||
<td>What we'd have to do to support XHTML 1.1.</td>
|
|
||||||
</tr>
|
</tr>
|
||||||
|
|
||||||
<tr>
|
<tr>
|
||||||
|
@@ -12,29 +12,10 @@ the documentation in ConfigDef for more information on these namespaces.
|
|||||||
|
|
||||||
Since configuration is dependent on context, internal classes require a
|
Since configuration is dependent on context, internal classes require a
|
||||||
configuration object to be passed as a parameter. (They also require a
|
configuration object to be passed as a parameter. (They also require a
|
||||||
Context object).
|
Context object). A majority of classes do not need the config object,
|
||||||
|
but for those who do, it is a lifesaver.
|
||||||
|
|
||||||
In relation to HTMLDefinition and CSSDefinition, there could be a special class
|
Definition objects are complex datatypes influenced by their respective
|
||||||
of directives that influence the *construction* of the Definition object.
|
directive namespaces (HTMLDefinition with HTML and CSSDefinition with CSS).
|
||||||
A theoretical call pattern would look like:
|
If any of these directives is updated, HTML Purifier forces the definition
|
||||||
|
to be regenerated.
|
||||||
1. Client calls Config->getHTMLDefinition()
|
|
||||||
2. Config calls HTMLDefinition->createNew(this)
|
|
||||||
3. HTMLDefinition constructs itself with base configuration
|
|
||||||
4. HTMLDefinition calls Config->get('HTML')
|
|
||||||
5. Config returns array of directives
|
|
||||||
6. HTMLDefinition performs operations and changes specified by directives
|
|
||||||
7. HTMLPurifier returns constructed definition
|
|
||||||
8. Config caches definition so it doesn't have to be generated again
|
|
||||||
9. Config returns definition
|
|
||||||
|
|
||||||
You could also override Config's copy of the definition with your own
|
|
||||||
custom copy, which OVERRIDES all directives. Only the base, vanilla copy
|
|
||||||
is the Singleton, the object actually interfaced with is a operated-upon
|
|
||||||
clone of that object. Also, if an update to the directives would update
|
|
||||||
the definition, you'd have to force reconstruction.
|
|
||||||
|
|
||||||
In practice, the pulling directives from the config object are
|
|
||||||
solely need-based, and the flex points are littered throughout the
|
|
||||||
setup() function. Some sort of refactoring is likely in order. See
|
|
||||||
ref-xhtml-1.1.txt for more info.
|
|
||||||
|
@@ -2,23 +2,16 @@
|
|||||||
Filter Levels
|
Filter Levels
|
||||||
When one size *does not* fit all
|
When one size *does not* fit all
|
||||||
|
|
||||||
The more I think about it, the less sense it makes for maintaining one huge
|
It makes little sense to constrain users to one set of HTML elements and
|
||||||
monolithic HTMLDefinition class. There's simply so much variation that
|
attributes and tell them that they are not allowed to mold this in
|
||||||
could go into this definition: the set of HTML good for blog entries is
|
any fashion. Many users demand to be able to custom-select which elements
|
||||||
definitely too large for HTML that would be allowed in blog comments. Going
|
and attributes they want. This is fine: because HTML Purifier keeps close
|
||||||
from Transitional to Strict requires changes to the definition.
|
track of what elements are safe to use, there is no way for them to
|
||||||
|
accidentally allow an XSS-able tag.
|
||||||
|
|
||||||
Allowing users to specify their own whitelists is one step (implemented, btw),
|
However, combing through the HTML spec to make your own whitelist can
|
||||||
but I have doubts on only doing this. Simply put, the typical programmer is too
|
be a daunting task. HTML Purifier ought to offer pre-canned filter levels
|
||||||
lazy to actually go through the trouble of investigating which tags, attributes
|
that amateur users can select based on what they think is their use-case.
|
||||||
and properties to allow. HTMLDefinition makes a big part of what HTMLPurifier
|
|
||||||
is.
|
|
||||||
|
|
||||||
The idea, then, is to setup fundamentally different set of definitions, which
|
|
||||||
can further be customized using simpler configuration options. Alternatively,
|
|
||||||
they could be implemented as configuration profiles, which simply load
|
|
||||||
a set of recommended directives to achieve a desired effect (no simpler
|
|
||||||
config options though).
|
|
||||||
|
|
||||||
Here are some fuzzy levels you could set:
|
Here are some fuzzy levels you could set:
|
||||||
|
|
||||||
@@ -46,6 +39,10 @@ make forbidden element to text transformations desirable (for example, images).
|
|||||||
|
|
||||||
== Element Risk Analysis ==
|
== Element Risk Analysis ==
|
||||||
|
|
||||||
|
Although none of the currently supported elements presents a security
|
||||||
|
threat per se, some can cause problems for page layouts or be
|
||||||
|
extremely complicated.
|
||||||
|
|
||||||
Legend:
|
Legend:
|
||||||
[danger level] - regular tags / uncommon tags ~ deprecated tags
|
[danger level] - regular tags / uncommon tags ~ deprecated tags
|
||||||
[danger level]* - rare tags
|
[danger level]* - rare tags
|
||||||
@@ -114,6 +111,10 @@ Partially presentational - table.cellpadding, table.cellspacing,
|
|||||||
|
|
||||||
== CSS Risk Analysis ==
|
== CSS Risk Analysis ==
|
||||||
|
|
||||||
|
Currently, there is no support for fine-grained "allowed CSS" specification,
|
||||||
|
mainly because I'm lazy, partially because no one has asked for it. However,
|
||||||
|
this will be added eventually.
|
||||||
|
|
||||||
There are certain CSS elements that are extremely useful inline, but then
|
There are certain CSS elements that are extremely useful inline, but then
|
||||||
as you get to more presentation oriented styling it may not always be
|
as you get to more presentation oriented styling it may not always be
|
||||||
appropriate to inline them.
|
appropriate to inline them.
|
||||||
@@ -126,6 +127,7 @@ any CSS properties that are not currently implemented (such as position).
|
|||||||
Dangerous, can go outside container - float
|
Dangerous, can go outside container - float
|
||||||
Easy to abuse - font-size, font-family (font), width
|
Easy to abuse - font-size, font-family (font), width
|
||||||
Colored - background-color (background), border-color (border), color
|
Colored - background-color (background), border-color (border), color
|
||||||
|
(see proposal-colors.html)
|
||||||
Dramatic - border, list-style-position (list-style), margin, padding,
|
Dramatic - border, list-style-position (list-style), margin, padding,
|
||||||
text-align, text-indent, text-transform, vertical-align, line-height
|
text-align, text-indent, text-transform, vertical-align, line-height
|
||||||
|
|
||||||
|
48
docs/ref-content-models.txt
Normal file
@@ -0,0 +1,48 @@
|
|||||||
|
|
||||||
|
Handling Content Model Changes
|
||||||
|
|
||||||
|
|
||||||
|
1. Context
|
||||||
|
|
||||||
|
The distinction between Transitional and Strict document types is somewhat
|
||||||
|
of an anomaly in the lineage of XHTML document types (following 1.0, no
|
||||||
|
doctypes have flavors: instead, modularization is used to let
|
||||||
|
document authors vary their elements). This transition is usually quite
|
||||||
|
straightforward, as W3C usually deprecates attributes or elements, which
|
||||||
|
are quite easily handled using tag and attribute transforms.
|
||||||
|
|
||||||
|
However, for three elements, <blockquote>, <body> and <address>, W3C elected
|
||||||
|
to also change the content model. <blockquote> and <body> originally
|
||||||
|
accepted both inline and block elements, but in the strict doctype they
|
||||||
|
only allow block elements. With <address>, the situation is inverted:
|
||||||
|
<p> tags are now forbidden from appearing within this tag.
|
||||||
|
|
||||||
|
|
||||||
|
2. Current situation
|
||||||
|
|
||||||
|
Currently, HTML Purifier treats <blockquote> specially during Tidy mode
|
||||||
|
using a custom ChildDef class StrictBlockquote. StrictBlockquote
|
||||||
|
operates similarly to Required, except that when it encounters an inline
|
||||||
|
element, it will wrap it in a block tag (as specified by
|
||||||
|
%HTML.BlockWrapper, the default is <p>). The naming suggests it can
|
||||||
|
only be used for <blockquote>s, although it may be possible to
|
||||||
|
genericize it to work on other cases of this nature (this would be of
|
||||||
|
little practical application, as no other element in XHTML 1.1 or earlier
|
||||||
|
has a block-only content model).
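
As a rough illustration of the behavior described above, the following
sketch purifies a <blockquote> containing bare inline text under a strict
doctype. It uses only configuration calls that appear elsewhere in this
changeset; the include path, the 'XHTML 1.0 Strict' doctype string and the
sample input are assumptions, and the result mentioned in the comment is
what the description implies, not a verified output.

```php
<?php
// Assumed include path; adjust to your installation.
require_once 'library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$config->set('HTML', 'Doctype', 'XHTML 1.0 Strict'); // assumed doctype name

$purifier = new HTMLPurifier($config);

// Bare inline content inside <blockquote> is not valid under Strict.
$html = '<blockquote>Quoted text</blockquote>';

// Per the description above, StrictBlockquote should wrap the inline
// content in the %HTML.BlockWrapper element (<p> by default), giving
// something like <blockquote><p>Quoted text</p></blockquote>.
echo $purifier->purify($html);
?>
```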
|
||||||
|
|
||||||
|
Tidy currently contains no custom, lenient implementation for <address>.
|
||||||
|
If one were to be written, it would likely operate on the principle that,
|
||||||
|
when a <p> tag were to be encountered, it would be replaced with a
|
||||||
|
leading and trailing <br /> tag (the contents of <p>, being inline, are
|
||||||
|
not an issue). There is no prior work with this sort of operation.
|
||||||
|
|
||||||
|
|
||||||
|
3. Outside applicability
|
||||||
|
|
||||||
|
There are a number of other elements that contain restrictive content
|
||||||
|
models, such as <ul> or <span> (the latter is restrictive in that it
|
||||||
|
does not allow block elements). In the former case, an errant node
|
||||||
|
is eliminated completely, in the latter case, the text of the node
|
||||||
|
is preserved (as the parent node does allow PCDATA). Custom
|
||||||
|
content model implementations probably are not the best way of handling
|
||||||
|
these cases; node bubbling should be implemented instead.
|
@@ -1,10 +1,8 @@
|
|||||||
|
|
||||||
XHTML 1.1 and HTML Purifier
|
The Modularization of HTMLDefinition in HTML Purifier
|
||||||
|
|
||||||
Todo for XHTML 1.1 support <http://www.w3.org/TR/xhtml11/changes.html>
|
Todo for XHTML 1.1 support <http://www.w3.org/TR/xhtml11/changes.html>
|
||||||
1. Scratch lang entirely in favor of xml:lang
|
1. Support Ruby <http://www.w3.org/TR/2001/REC-ruby-20010531/>
|
||||||
2. Scratch name entirely in favor of id (partially-done)
|
|
||||||
3. Support Ruby <http://www.w3.org/TR/2001/REC-ruby-20010531/>
|
|
||||||
|
|
||||||
HTML Purifier uses the modularization of XHTML
|
HTML Purifier uses the modularization of XHTML
|
||||||
<http://www.w3.org/TR/xhtml-modularization/> to organize the internals
|
<http://www.w3.org/TR/xhtml-modularization/> to organize the internals
|
||||||
@@ -12,25 +10,10 @@ of HTMLDefinition into a more manageable and extensible fashion. Rather
|
|||||||
than have one super-object, HTMLDefinition is split into HTMLModules,
|
than have one super-object, HTMLDefinition is split into HTMLModules,
|
||||||
each of which are responsible for defining elements, their attributes,
|
each of which are responsible for defining elements, their attributes,
|
||||||
and other properties (for a more indepth coverage, see
|
and other properties (for a more indepth coverage, see
|
||||||
/library/HTMLPurifier/HTMLModule.php's docblock comments).
|
/library/HTMLPurifier/HTMLModule.php's docblock comments). These modules
|
||||||
|
are managed by HTMLModuleManager.
|
||||||
|
|
||||||
The modules that W3C defines and we support are:
|
Modules that we don't support but could support are:
|
||||||
|
|
||||||
* 5.1. Attribute Collections (technically not a module
|
|
||||||
* 5.2. Core Modules
|
|
||||||
o 5.2.2. Text Module
|
|
||||||
o 5.2.3. Hypertext Module
|
|
||||||
o 5.2.4. List Module
|
|
||||||
* 5.4. Text Extension Modules
|
|
||||||
o 5.4.1. Presentation Module
|
|
||||||
o 5.4.2. Edit Module
|
|
||||||
o 5.4.3. Bi-directional Text Module
|
|
||||||
* 5.6. Table Modules
|
|
||||||
o 5.6.2. Tables Module
|
|
||||||
* 5.7. Image Module
|
|
||||||
* 5.18. Style Attribute Module
|
|
||||||
|
|
||||||
Modules that we don't support but coul support are:
|
|
||||||
|
|
||||||
* 5.6. Table Modules
|
* 5.6. Table Modules
|
||||||
o 5.6.1. Basic Tables Module [?]
|
o 5.6.1. Basic Tables Module [?]
|
||||||
@@ -38,10 +21,8 @@ Modules that we don't support but coul support are:
|
|||||||
* 5.9. Server-side Image Map Module [?]
|
* 5.9. Server-side Image Map Module [?]
|
||||||
* 5.12. Target Module [?]
|
* 5.12. Target Module [?]
|
||||||
* 5.21. Name Identification Module [deprecated]
|
* 5.21. Name Identification Module [deprecated]
|
||||||
* 5.22. Legacy Module [deprecated]
|
|
||||||
|
|
||||||
These modules will not be implemented due to their dangerousness or
|
These modules would be implemented as "unsafe":
|
||||||
inapplicability as an XHTML fragment:
|
|
||||||
|
|
||||||
* 5.2. Core Modules
|
* 5.2. Core Modules
|
||||||
o 5.2.1. Structure Module
|
o 5.2.1. Structure Module
|
||||||
@@ -64,11 +45,7 @@ of robust tools for handling them (the main problem is that all the
|
|||||||
current parsers are usually PHP 5 only and solely-validating, not
|
current parsers are usually PHP 5 only and solely-validating, not
|
||||||
correcting).
|
correcting).
|
||||||
|
|
||||||
The abstraction of the HTMLDefinition creation process will also
|
This system may be generalized and ported over for CSS.
|
||||||
contribute to a need for a caching system. Cache invalidation would be
|
|
||||||
difficult, but could be done by comparing the HTML and Attr config
|
|
||||||
namespaces with a copy that was packaged along with the serialized
|
|
||||||
HTMLDefinition object.
|
|
||||||
|
|
||||||
== General Use-Case ==
|
== General Use-Case ==
|
||||||
|
|
||||||
@@ -91,7 +68,7 @@ like this:
|
|||||||
<?php
|
<?php
|
||||||
$config = HTMLPurifier_Config::createDefault();
|
$config = HTMLPurifier_Config::createDefault();
|
||||||
$def =& $config->getHTMLDefinition(true); // reference to raw
|
$def =& $config->getHTMLDefinition(true); // reference to raw
|
||||||
unset($def->modules['Hypertext']); // rm ''a'' link
|
$def->addElement('marquee', 'Block', 'Flow', 'Common');
|
||||||
$purifier = new HTMLPurifier($config);
|
$purifier = new HTMLPurifier($config);
|
||||||
$purifier->purify($html); // now the definition is finalized
|
$purifier->purify($html); // now the definition is finalized
|
||||||
?>
|
?>
|
||||||
@@ -184,4 +161,4 @@ Content sets can be altered using HTMLModule->content_sets, an associative
|
|||||||
array of content set names to content set contents. If the content set
|
array of content set names to content set contents. If the content set
|
||||||
already exists, your values are appended on to it (great for, say,
|
already exists, your values are appended on to it (great for, say,
|
||||||
registering the font tag as an inline element), otherwise it is
|
registering the font tag as an inline element), otherwise it is
|
||||||
created. They are substituted into content_model.
|
created. They are substituted into content_model.
|
@@ -1,37 +0,0 @@
|
|||||||
|
|
||||||
Loose versus Strict
|
|
||||||
Changes from one doctype to another
|
|
||||||
|
|
||||||
There are changes. Wow, how insightful. Not everything changed is relevant
|
|
||||||
to HTML Purifier, though, so let's take a look:
|
|
||||||
|
|
||||||
== Major incompatibilities ==
|
|
||||||
|
|
||||||
[done] BLOCKQUOTE changes from 'flow' to 'block'
|
|
||||||
current behavior: inline inner contents should not be nuked, block-ify as necessary
|
|
||||||
[partially-done] U, S, STRIKE cut
|
|
||||||
current behavior: removed completely
|
|
||||||
projected behavior: replace with appropriate inline span + CSS
|
|
||||||
[done] ADDRESS from potpourri to Inline (removes p tags)
|
|
||||||
current behavior: block tags silently dropped
|
|
||||||
ideal behavior: replace tags with something like <br>. (not high priority)
|
|
||||||
|
|
||||||
== Things we can loosen up ==
|
|
||||||
|
|
||||||
Tags DIR, MENU, CENTER, ISINDEX, FONT, BASEFONT? allowed in loose
|
|
||||||
current behavior: transform to strict-valid forms
|
|
||||||
Attributes allowed in loose (see attribute transforms in 'dev-progress.html')
|
|
||||||
current behavior: projected to transform into strict-valid forms
|
|
||||||
|
|
||||||
== Periphery issues ==
|
|
||||||
|
|
||||||
A tag's attribute 'target' (for selecting frames) cut
|
|
||||||
current behavior: not allowed at all
|
|
||||||
projected behavior: use loose doctype if needed, needs valid values
|
|
||||||
[done] OL/LI tag's attribute 'start'/'value' (for renumbering lists) cut
|
|
||||||
current behavior: no substitute, just delete when in strict, allow in loose
|
|
||||||
Attribute 'name' deprecated in favor of 'id'
|
|
||||||
current behavior: dropped silently
|
|
||||||
projected behavior: create proper AttrTransform
|
|
||||||
[done] PRE tag allows SUB/SUP? (strict dtd comment vs syntax, loose disallows)
|
|
||||||
current behavior: disallow as usual
|
|
@@ -18,5 +18,7 @@ HTML Purifier context.
|
|||||||
|
|
||||||
<listing>, monospace pre-variant (extremely rare)
|
<listing>, monospace pre-variant (extremely rare)
|
||||||
<plaintext>, escapes all tags to the end of document
|
<plaintext>, escapes all tags to the end of document
|
||||||
<ruby> and friends, (more research needed, appears to be XHTML 1.1 markup)
|
|
||||||
<xmp>, monospace, replace with pre
|
<xmp>, monospace, replace with pre
|
||||||
|
|
||||||
|
These should be put into their own Tidy module, not loaded by default(?). These
|
||||||
|
all qualify as "lenient" transforms.
|
@@ -1,37 +0,0 @@
|
|||||||
|
|
||||||
Is HTML Purifier Strict or Transitional?
|
|
||||||
A little bit of helpful guidance
|
|
||||||
|
|
||||||
Despite the fact that HTML Purifier professes to support both transitional and
|
|
||||||
strict HTML, it rejects a lot of attributes and elements that are actually, indeed,
|
|
||||||
valid. You can investigate progress.html to find out precisely what we
|
|
||||||
are doing to these *deprecated* attributes.
|
|
||||||
|
|
||||||
However, users have found that Strict HTML imposes some quite unreasonable
|
|
||||||
restrictions on certain things. The start and value attributes in ol and
|
|
||||||
li (respectively) are perhaps the most contested. There is currently no
|
|
||||||
widely supported browser method short of JavaScript that can replace these
|
|
||||||
two deprecated attributes. It behooves us to allow these deprecated
|
|
||||||
attributes when the output is transitional.
|
|
||||||
|
|
||||||
Fortunately, that's the only real bugger case. The others have near-perfect
|
|
||||||
CSS equivalents, and were presentational anyway. However, the other question
|
|
||||||
pops up: should we always convert these to the CSS forms when 1. the spec
|
|
||||||
allows them anyway and 2. older browsers support them better? After all, the
|
|
||||||
whole point about CSS is to separate styling from content, so inline styling
|
|
||||||
doesn't solve that problem.
|
|
||||||
|
|
||||||
It's an icky question, and we'll have to deal with it as more and more
|
|
||||||
transforms get implemented. As of right now, however, we currently support
|
|
||||||
these loose-only constructs in loose mode:
|
|
||||||
|
|
||||||
- <ol start="1">, <li value="1"> attributes
|
|
||||||
- <u>, <strike>, <s> tags
|
|
||||||
- flow children in <blockquote>
|
|
||||||
- mixed children in <address>
|
|
||||||
|
|
||||||
The changed child definitions as well as ol.start and li.value are the most
|
|
||||||
compelling reasons why loose should be used. We may want to offer disabling <u>,
|
|
||||||
<strike> and <s> by themselves. We may also want to offer no pre-emptive
|
|
||||||
deprecated conversions. This all must be unified.
|
|
||||||
|
|
@@ -2,8 +2,23 @@
|
|||||||
Web Hypertext Application Technology Working Group
|
Web Hypertext Application Technology Working Group
|
||||||
WHATWG
|
WHATWG
|
||||||
|
|
||||||
I don't think we need to worry about them. Untrusted users shouldn't be
|
== HTML 5 ==
|
||||||
submitting applications, eh? But if some interesting attribute pops up in
|
|
||||||
their spec, and might be worth supporting, stick it here.
|
|
||||||
|
|
||||||
(none so far, as you can see)
|
URL: http://www.whatwg.org/specs/web-apps/current-work/
|
||||||
|
|
||||||
|
HTML 5 defines a kaboodle of new elements and attributes, as well as
|
||||||
|
some well-defined, "quirks mode" HTML parsing. Although WHATWG professes
|
||||||
|
to be targeted towards web applications, many of their semantic additions
|
||||||
|
would be quite useful in regular documents. Eventually, HTML
|
||||||
|
Purifier will need to audit their lists and figure out what changes need
|
||||||
|
to be made. This process is complicated by the fact that the WHATWG
|
||||||
|
doesn't buy into W3C's modularization of XHTML 1.1: we may need
|
||||||
|
to remodularize HTML 5 (probably done by section name). No sense in
|
||||||
|
committing ourselves till the spec stabilizes, though.
|
||||||
|
|
||||||
|
More immediately relevant, however, is the well-defined parsing
|
||||||
|
behavior that HTML 5 adds. While I have little interest in writing
|
||||||
|
another DirectLex parser, other parsers like ph5p
|
||||||
|
<http://jero.net/lab/ph5p/> can be adapted to DOMLex to support much more
|
||||||
|
flexible HTML parsing (a cool feature I've seen is how they resolve
|
||||||
|
<b>bold<i>both</b>italic</i>).
|
||||||
|
@@ -25,6 +25,7 @@ h4 {font-family:sans-serif; font-size:0.9em; font-weight:bold; }
|
|||||||
.aside {margin-left:2em; font-family:sans-serif; font-size:0.9em; }
|
.aside {margin-left:2em; font-family:sans-serif; font-size:0.9em; }
|
||||||
blockquote .label {font-weight:bold; font-size:1em; margin:0 0 .1em;
|
blockquote .label {font-weight:bold; font-size:1em; margin:0 0 .1em;
|
||||||
border-bottom:1px solid #CCC;}
|
border-bottom:1px solid #CCC;}
|
||||||
|
.emphasis {font-weight:bold; text-align:center; font-size:1.3em;}
|
||||||
|
|
||||||
/* A regular table */
|
/* A regular table */
|
||||||
.table {border-collapse:collapse; border-bottom:2px solid #888; margin-left:2em; }
|
.table {border-collapse:collapse; border-bottom:2px solid #888; margin-left:2em; }
|
||||||
@@ -66,3 +67,5 @@ q:after {
|
|||||||
/* Marks off sections that are lacking. */
|
/* Marks off sections that are lacking. */
|
||||||
.fixme {margin-left:2em; }
|
.fixme {margin-left:2em; }
|
||||||
.fixme:before {content:"Fix me: "; font-weight:bold; color:#C00; }
|
.fixme:before {content:"Fix me: "; font-weight:bold; color:#C00; }
|
||||||
|
|
||||||
|
#applicability {margin: 1em 5%; font-style:italic;}
|
||||||
|
@@ -22,7 +22,7 @@
|
|||||||
*/
|
*/
|
||||||
|
|
||||||
/*
|
/*
|
||||||
HTML Purifier 1.6.1 - Standards Compliant HTML Filtering
|
HTML Purifier 2.0.0 - Standards Compliant HTML Filtering
|
||||||
Copyright (C) 2006 Edward Z. Yang
|
Copyright (C) 2006 Edward Z. Yang
|
||||||
|
|
||||||
This library is free software; you can redistribute it and/or
|
This library is free software; you can redistribute it and/or
|
||||||
@@ -42,7 +42,7 @@
|
|||||||
|
|
||||||
// almost every class has an undocumented dependency to these, so make sure
|
// almost every class has an undocumented dependency to these, so make sure
|
||||||
// they get included
|
// they get included
|
||||||
require_once 'HTMLPurifier/ConfigSchema.php';
|
require_once 'HTMLPurifier/ConfigSchema.php'; // important
|
||||||
require_once 'HTMLPurifier/Config.php';
|
require_once 'HTMLPurifier/Config.php';
|
||||||
require_once 'HTMLPurifier/Context.php';
|
require_once 'HTMLPurifier/Context.php';
|
||||||
|
|
||||||
@@ -51,6 +51,23 @@ require_once 'HTMLPurifier/Generator.php';
|
|||||||
require_once 'HTMLPurifier/Strategy/Core.php';
|
require_once 'HTMLPurifier/Strategy/Core.php';
|
||||||
require_once 'HTMLPurifier/Encoder.php';
|
require_once 'HTMLPurifier/Encoder.php';
|
||||||
|
|
||||||
|
require_once 'HTMLPurifier/LanguageFactory.php';
|
||||||
|
|
||||||
|
HTMLPurifier_ConfigSchema::define(
|
||||||
|
'Core', 'Language', 'en', 'string', '
|
||||||
|
ISO 639 language code for localizable things in HTML Purifier to use,
|
||||||
|
which is mainly error reporting. There is currently only an English (en)
|
||||||
|
translation, so this directive is currently useless.
|
||||||
|
This directive has been available since 2.0.0.
|
||||||
|
');
|
||||||
|
|
||||||
|
HTMLPurifier_ConfigSchema::define(
|
||||||
|
'Core', 'CollectErrors', false, 'bool', '
|
||||||
|
Whether or not to collect errors found while filtering the document. This
|
||||||
|
is a useful way to give feedback to your users. CURRENTLY NOT IMPLEMENTED.
|
||||||
|
This directive has been available since 2.0.0.
|
||||||
|
');
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Main library execution class.
|
* Main library execution class.
|
||||||
*
|
*
|
||||||
@@ -64,12 +81,12 @@ require_once 'HTMLPurifier/Encoder.php';
|
|||||||
class HTMLPurifier
|
class HTMLPurifier
|
||||||
{
|
{
|
||||||
|
|
||||||
var $version = '1.6.1';
|
var $version = '2.0.0';
|
||||||
|
|
||||||
var $config;
|
var $config;
|
||||||
var $filters;
|
var $filters;
|
||||||
|
|
||||||
var $lexer, $strategy, $generator;
|
var $strategy, $generator;
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Final HTMLPurifier_Context of last run purification. Might be an array.
|
* Final HTMLPurifier_Context of last run purification. Might be an array.
|
||||||
@@ -89,7 +106,6 @@ class HTMLPurifier
|
|||||||
|
|
||||||
$this->config = HTMLPurifier_Config::create($config);
|
$this->config = HTMLPurifier_Config::create($config);
|
||||||
|
|
||||||
$this->lexer = HTMLPurifier_Lexer::create();
|
|
||||||
$this->strategy = new HTMLPurifier_Strategy_Core();
|
$this->strategy = new HTMLPurifier_Strategy_Core();
|
||||||
$this->generator = new HTMLPurifier_Generator();
|
$this->generator = new HTMLPurifier_Generator();
|
||||||
|
|
||||||
@@ -117,7 +133,23 @@ class HTMLPurifier
|
|||||||
|
|
||||||
$config = $config ? HTMLPurifier_Config::create($config) : $this->config;
|
$config = $config ? HTMLPurifier_Config::create($config) : $this->config;
|
||||||
|
|
||||||
|
// implementation is partially environment dependent, partially
|
||||||
|
// configuration dependent
|
||||||
|
$lexer = HTMLPurifier_Lexer::create($config);
|
||||||
|
|
||||||
$context = new HTMLPurifier_Context();
|
$context = new HTMLPurifier_Context();
|
||||||
|
|
||||||
|
// set up global context variables
|
||||||
|
if ($config->get('Core', 'CollectErrors')) {
|
||||||
|
// may get moved out if other facilities use it
|
||||||
|
$language_factory = HTMLPurifier_LanguageFactory::instance();
|
||||||
|
$language = $language_factory->create($config->get('Core', 'Language'));
|
||||||
|
$context->register('Locale', $language);
|
||||||
|
|
||||||
|
$error_collector = new HTMLPurifier_ErrorCollector();
|
||||||
|
$context->register('ErrorCollector', $error_collector);
|
||||||
|
}
|
||||||
|
|
||||||
$html = HTMLPurifier_Encoder::convertToUTF8($html, $config, $context);
|
$html = HTMLPurifier_Encoder::convertToUTF8($html, $config, $context);
|
||||||
|
|
||||||
for ($i = 0, $size = count($this->filters); $i < $size; $i++) {
|
for ($i = 0, $size = count($this->filters); $i < $size; $i++) {
|
||||||
@@ -130,7 +162,7 @@ class HTMLPurifier
|
|||||||
// list of tokens
|
// list of tokens
|
||||||
$this->strategy->execute(
|
$this->strategy->execute(
|
||||||
// list of un-purified tokens
|
// list of un-purified tokens
|
||||||
$this->lexer->tokenizeHTML(
|
$lexer->tokenizeHTML(
|
||||||
// un-purified HTML
|
// un-purified HTML
|
||||||
$html, $config, $context
|
$html, $config, $context
|
||||||
),
|
),
|
||||||
@@ -164,6 +196,23 @@ class HTMLPurifier
|
|||||||
return $array_of_html;
|
return $array_of_html;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Singleton for enforcing just one HTML Purifier in your system
|
||||||
|
*/
|
||||||
|
function &getInstance($prototype = null) {
|
||||||
|
static $htmlpurifier;
|
||||||
|
if (!$htmlpurifier || $prototype) {
|
||||||
|
if (is_a($prototype, 'HTMLPurifier')) {
|
||||||
|
$htmlpurifier = $prototype;
|
||||||
|
} elseif ($prototype) {
|
||||||
|
$htmlpurifier = new HTMLPurifier(HTMLPurifier_Config::create($prototype));
|
||||||
|
} else {
|
||||||
|
$htmlpurifier = new HTMLPurifier();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return $htmlpurifier;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
|
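The getInstance() method added in the hunk above keeps one shared purifier in a function-level static variable and replaces it whenever a prototype is passed. A minimal standalone sketch of that pattern follows; DemoPurifier is a hypothetical stand-in class, not part of the library, and it is written as a modern static method rather than the PHP 4 style used in the diff.

```php
<?php
// Hypothetical stand-in for the real class; only the singleton logic is shown.
class DemoPurifier
{
    public $config;

    public function __construct($config = null)
    {
        $this->config = $config;
    }

    public static function getInstance($prototype = null)
    {
        static $instance = null; // survives between calls
        if (!$instance || $prototype) {
            if ($prototype instanceof DemoPurifier) {
                // an existing object becomes the shared instance
                $instance = $prototype;
            } elseif ($prototype) {
                // anything else truthy is treated as configuration
                $instance = new DemoPurifier($prototype);
            } else {
                $instance = new DemoPurifier();
            }
        }
        return $instance;
    }
}

$first  = DemoPurifier::getInstance();
$second = DemoPurifier::getInstance();               // same object as $first
$third  = DemoPurifier::getInstance(array('x' => 1)); // replaces the shared one
```

Note that passing any truthy prototype rebuilds the shared instance, so later callers that pass nothing receive the most recently configured object.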
@@ -1,7 +1,6 @@
|
|||||||
<?php
|
<?php
|
||||||
|
|
||||||
require_once 'HTMLPurifier/AttrTypes.php';
|
require_once 'HTMLPurifier/AttrTypes.php';
|
||||||
require_once 'HTMLPurifier/AttrDef/Lang.php';
|
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Defines common attribute collections that modules reference
|
* Defines common attribute collections that modules reference
|
||||||
@@ -12,8 +11,6 @@ class HTMLPurifier_AttrCollections
|
|||||||
|
|
||||||
/**
|
/**
|
||||||
* Associative array of attribute collections, indexed by name
|
* Associative array of attribute collections, indexed by name
|
||||||
* @note Technically, the composition of these is more complicated,
|
|
||||||
* but we bypass it using our own excludes property
|
|
||||||
*/
|
*/
|
||||||
var $info = array();
|
var $info = array();
|
||||||
|
|
||||||
@@ -25,27 +22,29 @@ class HTMLPurifier_AttrCollections
|
|||||||
* @param $modules Hash array of HTMLPurifier_HTMLModule members
|
* @param $modules Hash array of HTMLPurifier_HTMLModule members
|
||||||
*/
|
*/
|
||||||
function HTMLPurifier_AttrCollections($attr_types, $modules) {
|
function HTMLPurifier_AttrCollections($attr_types, $modules) {
|
||||||
$info =& $this->info;
|
|
||||||
// load extensions from the modules
|
// load extensions from the modules
|
||||||
foreach ($modules as $module) {
|
foreach ($modules as $module) {
|
||||||
foreach ($module->attr_collections as $coll_i => $coll) {
|
foreach ($module->attr_collections as $coll_i => $coll) {
|
||||||
|
if (!isset($this->info[$coll_i])) {
|
||||||
|
$this->info[$coll_i] = array();
|
||||||
|
}
|
||||||
foreach ($coll as $attr_i => $attr) {
|
foreach ($coll as $attr_i => $attr) {
|
||||||
if ($attr_i === 0 && isset($info[$coll_i][$attr_i])) {
|
if ($attr_i === 0 && isset($this->info[$coll_i][$attr_i])) {
|
||||||
// merge in includes
|
// merge in includes
|
||||||
$info[$coll_i][$attr_i] = array_merge(
|
$this->info[$coll_i][$attr_i] = array_merge(
|
||||||
$info[$coll_i][$attr_i], $attr);
|
$this->info[$coll_i][$attr_i], $attr);
|
||||||
continue;
|
continue;
|
||||||
}
|
}
|
||||||
$info[$coll_i][$attr_i] = $attr;
|
$this->info[$coll_i][$attr_i] = $attr;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
// perform internal expansions and inclusions
|
// perform internal expansions and inclusions
|
||||||
foreach ($info as $name => $attr) {
|
foreach ($this->info as $name => $attr) {
|
||||||
// merge attribute collections that include others
|
// merge attribute collections that include others
|
||||||
$this->performInclusions($info[$name]);
|
$this->performInclusions($this->info[$name]);
|
||||||
// replace string identifiers with actual attribute objects
|
// replace string identifiers with actual attribute objects
|
||||||
$this->expandIdentifiers($info[$name], $attr_types);
|
$this->expandIdentifiers($this->info[$name], $attr_types);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -57,16 +56,20 @@ class HTMLPurifier_AttrCollections
|
|||||||
function performInclusions(&$attr) {
|
function performInclusions(&$attr) {
|
||||||
if (!isset($attr[0])) return;
|
if (!isset($attr[0])) return;
|
||||||
$merge = $attr[0];
|
$merge = $attr[0];
|
||||||
|
$seen = array(); // recursion guard
|
||||||
// loop through all the inclusions
|
// loop through all the inclusions
|
||||||
for ($i = 0; isset($merge[$i]); $i++) {
|
for ($i = 0; isset($merge[$i]); $i++) {
|
||||||
|
if (isset($seen[$merge[$i]])) continue;
|
||||||
|
$seen[$merge[$i]] = true;
|
||||||
// foreach attribute of the inclusion, copy it over
|
// foreach attribute of the inclusion, copy it over
|
||||||
|
if (!isset($this->info[$merge[$i]])) continue;
|
||||||
foreach ($this->info[$merge[$i]] as $key => $value) {
|
foreach ($this->info[$merge[$i]] as $key => $value) {
|
||||||
if (isset($attr[$key])) continue; // also catches more inclusions
|
if (isset($attr[$key])) continue; // also catches more inclusions
|
||||||
$attr[$key] = $value;
|
$attr[$key] = $value;
|
||||||
}
|
}
|
||||||
if (isset($info[$merge[$i]][0])) {
|
if (isset($this->info[$merge[$i]][0])) {
|
||||||
// recursion
|
// recursion
|
||||||
$merge = array_merge($merge, isset($info[$merge[$i]][0]));
|
$merge = array_merge($merge, $this->info[$merge[$i]][0]);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
unset($attr[0]);
|
unset($attr[0]);
|
||||||
@@ -79,20 +82,47 @@ class HTMLPurifier_AttrCollections
|
|||||||
* @param $attr_types HTMLPurifier_AttrTypes instance
|
* @param $attr_types HTMLPurifier_AttrTypes instance
|
||||||
*/
|
*/
|
||||||
function expandIdentifiers(&$attr, $attr_types) {
|
function expandIdentifiers(&$attr, $attr_types) {
|
||||||
|
|
||||||
|
// because foreach will process new elements we add, make sure we
|
||||||
|
// skip duplicates
|
||||||
|
$processed = array();
|
||||||
|
|
||||||
foreach ($attr as $def_i => $def) {
|
foreach ($attr as $def_i => $def) {
|
||||||
|
// skip inclusions
|
||||||
if ($def_i === 0) continue;
|
if ($def_i === 0) continue;
|
||||||
if (!is_string($def)) continue;
|
|
||||||
|
if (isset($processed[$def_i])) continue;
|
||||||
|
|
||||||
|
// determine whether or not attribute is required
|
||||||
|
if ($required = (strpos($def_i, '*') !== false)) {
|
||||||
|
// rename the definition
|
||||||
|
unset($attr[$def_i]);
|
||||||
|
$def_i = trim($def_i, '*');
|
||||||
|
$attr[$def_i] = $def;
|
||||||
|
}
|
||||||
|
|
||||||
|
$processed[$def_i] = true;
|
||||||
|
|
||||||
|
// if we've already got a literal object, move on
|
||||||
|
if (is_object($def)) {
|
||||||
|
// preserve previous required
|
||||||
|
$attr[$def_i]->required = ($required || $attr[$def_i]->required);
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
if ($def === false) {
|
if ($def === false) {
|
||||||
unset($attr[$def_i]);
|
unset($attr[$def_i]);
|
||||||
continue;
|
continue;
|
||||||
}
|
}
|
||||||
if (isset($attr_types->info[$def])) {
|
|
||||||
$attr[$def_i] = $attr_types->info[$def];
|
if ($t = $attr_types->get($def)) {
|
||||||
|
$attr[$def_i] = $t;
|
||||||
|
$attr[$def_i]->required = $required;
|
||||||
} else {
|
} else {
|
||||||
trigger_error('Attempted to reference undefined attribute type', E_USER_ERROR);
|
|
||||||
unset($attr[$def_i]);
|
unset($attr[$def_i]);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
}
|
}
|
||||||
|
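The performInclusions() change above adds a $seen guard so that collections which include each other cannot loop forever. The sketch below reproduces that loop as a standalone function; resolve_inclusions() and the sample collection data are hypothetical, with $collections playing the role of $this->info.

```php
<?php
// Hypothetical helper mirroring the guarded inclusion loop from the diff.
function resolve_inclusions(array $collections, $name)
{
    $attr = $collections[$name];
    if (!isset($attr[0])) return $attr; // nothing to include
    $merge = $attr[0];
    $seen = array(); // recursion guard
    for ($i = 0; isset($merge[$i]); $i++) {
        $included = $merge[$i];
        if (isset($seen[$included])) continue;
        $seen[$included] = true;
        if (!isset($collections[$included])) continue; // unknown collection
        foreach ($collections[$included] as $key => $value) {
            // keys already present win; this also skips the include list at key 0
            if (isset($attr[$key])) continue;
            $attr[$key] = $value;
        }
        if (isset($collections[$included][0])) {
            // queue nested inclusions for later iterations
            $merge = array_merge($merge, $collections[$included][0]);
        }
    }
    unset($attr[0]);
    return $attr;
}

// 'Core' and 'Style' include each other, which would loop without the guard.
$collections = array(
    'Core'   => array(0 => array('Style'), 'id' => 'ID', 'class' => 'NMTOKENS'),
    'Style'  => array(0 => array('Core'), 'style' => 'CSS'),
    'Common' => array(0 => array('Core'), 'title' => 'Text'),
);
$common = resolve_inclusions($collections, 'Common');
```

Without the $seen check, the cyclic 'Core' and 'Style' entries would keep appending each other to $merge and the loop would never finish.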
@@ -14,11 +14,17 @@ class HTMLPurifier_AttrDef
|
|||||||
{
|
{
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Tells us whether or not an HTML attribute is minimized. Only the
|
* Tells us whether or not an HTML attribute is minimized. Has no
|
||||||
* boolean attribute vapourware would use this.
|
* meaning in other contexts.
|
||||||
*/
|
*/
|
||||||
var $minimized = false;
|
var $minimized = false;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Tells us whether or not an HTML attribute is required. Has no
|
||||||
|
* meaning in other contexts
|
||||||
|
*/
|
||||||
|
var $required = false;
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Validates and cleans passed string according to a definition.
|
* Validates and cleans passed string according to a definition.
|
||||||
*
|
*
|
||||||
@@ -62,6 +68,20 @@ class HTMLPurifier_AttrDef
|
|||||||
$string = str_replace(array("\r", "\t"), ' ', $string);
|
$string = str_replace(array("\r", "\t"), ' ', $string);
|
||||||
return $string;
|
return $string;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Factory method for creating this class from a string.
|
||||||
|
* @param $string String construction info
|
||||||
|
* @return Created AttrDef object corresponding to $string
|
||||||
|
* @public
|
||||||
|
*/
|
||||||
|
function make($string) {
|
||||||
|
// default implementation, return flyweight of this object
|
||||||
|
// if overloaded, it is *necessary* for you to clone the
|
||||||
|
// object (usually by instantiating a new copy) and return that
|
||||||
|
return $this;
|
||||||
|
}
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
?>
|
?>
|
@@ -2,43 +2,47 @@
|
|||||||
|
|
||||||
require_once 'HTMLPurifier/AttrDef.php';
|
require_once 'HTMLPurifier/AttrDef.php';
|
||||||
|
|
||||||
|
HTMLPurifier_ConfigSchema::define(
|
||||||
|
'Core', 'ColorKeywords', array(
|
||||||
|
'maroon' => '#800000',
|
||||||
|
'red' => '#FF0000',
|
||||||
|
'orange' => '#FFA500',
|
||||||
|
'yellow' => '#FFFF00',
|
||||||
|
'olive' => '#808000',
|
||||||
|
'purple' => '#800080',
|
||||||
|
'fuchsia' => '#FF00FF',
|
||||||
|
'white' => '#FFFFFF',
|
||||||
|
'lime' => '#00FF00',
|
||||||
|
'green' => '#008000',
|
||||||
|
'navy' => '#000080',
|
||||||
|
'blue' => '#0000FF',
|
||||||
|
'aqua' => '#00FFFF',
|
||||||
|
'teal' => '#008080',
|
||||||
|
'black' => '#000000',
|
||||||
|
'silver' => '#C0C0C0',
|
||||||
|
'gray' => '#808080'
|
||||||
|
), 'hash', '
|
||||||
|
Lookup array of color names to six digit hexadecimal number corresponding
|
||||||
|
to color, with preceding hash mark. Used when parsing colors.
|
||||||
|
This directive has been available since 2.0.0.
|
||||||
|
');
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Validates Color as defined by CSS.
|
* Validates Color as defined by CSS.
|
||||||
*/
|
*/
|
||||||
class HTMLPurifier_AttrDef_CSS_Color extends HTMLPurifier_AttrDef
|
class HTMLPurifier_AttrDef_CSS_Color extends HTMLPurifier_AttrDef
|
||||||
{
|
{
|
||||||
|
|
||||||
/**
|
|
||||||
* Color keyword lookup table.
|
|
||||||
* @todo Extend it to include all usually allowed colors.
|
|
||||||
*/
|
|
||||||
var $colors = array(
|
|
||||||
'maroon' => '#800000',
|
|
||||||
'red' => '#F00',
|
|
||||||
'orange' => '#FFA500',
|
|
||||||
'yellow' => '#FF0',
|
|
||||||
'olive' => '#808000',
|
|
||||||
'purple' => '#800080',
|
|
||||||
'fuchsia' => '#F0F',
|
|
||||||
'white' => '#FFF',
|
|
||||||
'lime' => '#0F0',
|
|
||||||
'green' => '#008000',
|
|
||||||
'navy' => '#000080',
|
|
||||||
'blue' => '#00F',
|
|
||||||
'aqua' => '#0FF',
|
|
||||||
'teal' => '#008080',
|
|
||||||
'black' => '#000',
|
|
||||||
'silver' => '#C0C0C0',
|
|
||||||
'gray' => '#808080'
|
|
||||||
);
|
|
||||||
|
|
||||||
function validate($color, $config, &$context) {
|
function validate($color, $config, &$context) {
|
||||||
|
|
||||||
|
static $colors = null;
|
||||||
|
if ($colors === null) $colors = $config->get('Core', 'ColorKeywords');
|
||||||
|
|
||||||
$color = trim($color);
|
$color = trim($color);
|
||||||
if (!$color) return false;
|
if (!$color) return false;
|
||||||
|
|
||||||
$lower = strtolower($color);
|
$lower = strtolower($color);
|
||||||
if (isset($this->colors[$lower])) return $this->colors[$lower];
|
if (isset($colors[$lower])) return $colors[$lower];
|
||||||
|
|
||||||
if ($color[0] === '#') {
|
if ($color[0] === '#') {
|
||||||
// hexadecimal handling
|
// hexadecimal handling
|
||||||
|
@@ -18,18 +18,6 @@ class HTMLPurifier_AttrDef_CSS_Font extends HTMLPurifier_AttrDef
|
|||||||
*/
|
*/
|
||||||
var $info = array();
|
var $info = array();
|
||||||
|
|
||||||
/**
|
|
||||||
* System font keywords.
|
|
||||||
*/
|
|
||||||
var $system_fonts = array(
|
|
||||||
'caption' => true,
|
|
||||||
'icon' => true,
|
|
||||||
'menu' => true,
|
|
||||||
'message-box' => true,
|
|
||||||
'small-caption' => true,
|
|
||||||
'status-bar' => true
|
|
||||||
);
|
|
||||||
|
|
||||||
function HTMLPurifier_AttrDef_CSS_Font($config) {
|
function HTMLPurifier_AttrDef_CSS_Font($config) {
|
||||||
$def = $config->getCSSDefinition();
|
$def = $config->getCSSDefinition();
|
||||||
$this->info['font-style'] = $def->info['font-style'];
|
$this->info['font-style'] = $def->info['font-style'];
|
||||||
@@ -42,13 +30,22 @@ class HTMLPurifier_AttrDef_CSS_Font extends HTMLPurifier_AttrDef
|
|||||||
|
|
||||||
function validate($string, $config, &$context) {
|
function validate($string, $config, &$context) {
|
||||||
|
|
||||||
|
static $system_fonts = array(
|
||||||
|
'caption' => true,
|
||||||
|
'icon' => true,
|
||||||
|
'menu' => true,
|
||||||
|
'message-box' => true,
|
||||||
|
'small-caption' => true,
|
||||||
|
'status-bar' => true
|
||||||
|
);
|
||||||
|
|
||||||
// regular pre-processing
|
// regular pre-processing
|
||||||
$string = $this->parseCDATA($string);
|
$string = $this->parseCDATA($string);
|
||||||
if ($string === '') return false;
|
if ($string === '') return false;
|
||||||
|
|
||||||
// check if it's one of the keywords
|
// check if it's one of the keywords
|
||||||
$lowercase_string = strtolower($string);
|
$lowercase_string = strtolower($string);
|
||||||
if (isset($this->system_fonts[$lowercase_string])) {
|
if (isset($system_fonts[$lowercase_string])) {
|
||||||
return $lowercase_string;
|
return $lowercase_string;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@@ -10,19 +10,15 @@ require_once 'HTMLPurifier/AttrDef.php';
|
|||||||
class HTMLPurifier_AttrDef_CSS_FontFamily extends HTMLPurifier_AttrDef
|
class HTMLPurifier_AttrDef_CSS_FontFamily extends HTMLPurifier_AttrDef
|
||||||
{
|
{
|
||||||
|
|
||||||
/**
|
|
||||||
* Generic font family keywords.
|
|
||||||
* @protected
|
|
||||||
*/
|
|
||||||
var $generic_names = array(
|
|
||||||
'serif' => true,
|
|
||||||
'sans-serif' => true,
|
|
||||||
'monospace' => true,
|
|
||||||
'fantasy' => true,
|
|
||||||
'cursive' => true
|
|
||||||
);
|
|
||||||
|
|
||||||
function validate($string, $config, &$context) {
|
function validate($string, $config, &$context) {
|
||||||
|
static $generic_names = array(
|
||||||
|
'serif' => true,
|
||||||
|
'sans-serif' => true,
|
||||||
|
'monospace' => true,
|
||||||
|
'fantasy' => true,
|
||||||
|
'cursive' => true
|
||||||
|
);
|
||||||
|
|
||||||
$string = $this->parseCDATA($string);
|
$string = $this->parseCDATA($string);
|
||||||
// assume that no font names contain commas in them
|
// assume that no font names contain commas in them
|
||||||
$fonts = explode(',', $string);
|
$fonts = explode(',', $string);
|
||||||
@@ -31,7 +27,7 @@ class HTMLPurifier_AttrDef_CSS_FontFamily extends HTMLPurifier_AttrDef
|
|||||||
$font = trim($font);
|
$font = trim($font);
|
||||||
if ($font === '') continue;
|
if ($font === '') continue;
|
||||||
// match a generic name
|
// match a generic name
|
||||||
if (isset($this->generic_names[$font])) {
|
if (isset($generic_names[$font])) {
|
||||||
$final .= $font . ', ';
|
$final .= $font . ', ';
|
||||||
continue;
|
continue;
|
||||||
}
|
}
|
||||||
|
@@ -10,23 +10,19 @@ require_once 'HTMLPurifier/AttrDef.php';
|
|||||||
class HTMLPurifier_AttrDef_CSS_TextDecoration extends HTMLPurifier_AttrDef
|
class HTMLPurifier_AttrDef_CSS_TextDecoration extends HTMLPurifier_AttrDef
|
||||||
{
|
{
|
||||||
|
|
||||||
/**
|
|
||||||
* Lookup table of allowed values.
|
|
||||||
* @protected
|
|
||||||
*/
|
|
||||||
var $allowed_values = array(
|
|
||||||
'line-through' => true,
|
|
||||||
'overline' => true,
|
|
||||||
'underline' => true
|
|
||||||
);
|
|
||||||
|
|
||||||
function validate($string, $config, &$context) {
|
function validate($string, $config, &$context) {
|
||||||
|
|
||||||
|
static $allowed_values = array(
|
||||||
|
'line-through' => true,
|
||||||
|
'overline' => true,
|
||||||
|
'underline' => true
|
||||||
|
);
|
||||||
|
|
||||||
$string = strtolower($this->parseCDATA($string));
|
$string = strtolower($this->parseCDATA($string));
|
||||||
$parts = explode(' ', $string);
|
$parts = explode(' ', $string);
|
||||||
$final = '';
|
$final = '';
|
||||||
foreach ($parts as $part) {
|
foreach ($parts as $part) {
|
||||||
if (isset($this->allowed_values[$part])) {
|
if (isset($allowed_values[$part])) {
|
||||||
$final .= $part . ' ';
|
$final .= $part . ' ';
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
@@ -29,7 +29,7 @@ class HTMLPurifier_AttrDef_CSS_URI extends HTMLPurifier_AttrDef_URI
|
|||||||
if ($uri_string[$new_length] != ')') return false;
|
if ($uri_string[$new_length] != ')') return false;
|
||||||
$uri = trim(substr($uri_string, 0, $new_length));
|
$uri = trim(substr($uri_string, 0, $new_length));
|
||||||
|
|
||||||
if (isset($uri[0]) && ($uri[0] == "'" || $uri[0] == '"')) {
|
if (!empty($uri) && ($uri[0] == "'" || $uri[0] == '"')) {
|
||||||
$quote = $uri[0];
|
$quote = $uri[0];
|
||||||
$new_length = strlen($uri) - 1;
|
$new_length = strlen($uri) - 1;
|
||||||
if ($uri[$new_length] !== $quote) return false;
|
if ($uri[$new_length] !== $quote) return false;
|
||||||
|
@@ -45,6 +45,22 @@ class HTMLPurifier_AttrDef_Enum extends HTMLPurifier_AttrDef
|
|||||||
return $result ? $string : false;
|
return $result ? $string : false;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* @param $string In form of comma-delimited list of case-insensitive
|
||||||
|
* valid values. Example: "foo,bar,baz". Prepend "s:" to make
|
||||||
|
* case sensitive
|
||||||
|
*/
|
||||||
|
function make($string) {
|
||||||
|
if (strlen($string) > 2 && $string[0] == 's' && $string[1] == ':') {
|
||||||
|
$string = substr($string, 2);
|
||||||
|
$sensitive = true;
|
||||||
|
} else {
|
||||||
|
$sensitive = false;
|
||||||
|
}
|
||||||
|
$values = explode(',', $string);
|
||||||
|
return new HTMLPurifier_AttrDef_Enum($values, $sensitive);
|
||||||
|
}
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
?>
|
?>
|
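The make() factory added to the Enum definition above parses a compact string such as "foo,bar,baz", with an optional "s:" prefix for case sensitivity. A standalone sketch of just the parsing step is shown here; parse_enum_spec() is a hypothetical name, and it returns a plain array instead of constructing an HTMLPurifier_AttrDef_Enum.

```php
<?php
// Hypothetical helper: same string handling as Enum::make() in the diff,
// but returning the parsed pieces rather than an AttrDef object.
function parse_enum_spec($string)
{
    if (strlen($string) > 2 && $string[0] == 's' && $string[1] == ':') {
        $string = substr($string, 2); // drop the "s:" prefix
        $sensitive = true;
    } else {
        $sensitive = false;
    }
    return array(
        'values'    => explode(',', $string),
        'sensitive' => $sensitive,
    );
}

$plain  = parse_enum_spec('foo,bar,baz');
$strict = parse_enum_spec('s:Foo,Bar');
```

Because the length check is strictly greater than two, a bare "s:" is treated as a single case-insensitive value rather than an empty case-sensitive list.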
30
library/HTMLPurifier/AttrDef/HTML/Bool.php
Normal file
@@ -0,0 +1,30 @@
|
|||||||
|
<?php
|
||||||
|
|
||||||
|
require_once 'HTMLPurifier/AttrDef.php';
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Validates a boolean attribute
|
||||||
|
*/
|
||||||
|
class HTMLPurifier_AttrDef_HTML_Bool extends HTMLPurifier_AttrDef
|
||||||
|
{
|
||||||
|
|
||||||
|
var $name;
|
||||||
|
var $minimized = true;
|
||||||
|
|
||||||
|
function HTMLPurifier_AttrDef_HTML_Bool($name = false) {$this->name = $name;}
|
||||||
|
|
||||||
|
function validate($string, $config, &$context) {
|
||||||
|
if (empty($string)) return false;
|
||||||
|
return $this->name;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* @param $string Name of attribute
|
||||||
|
*/
|
||||||
|
function make($string) {
|
||||||
|
return new HTMLPurifier_AttrDef_HTML_Bool($string);
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
?>
|
35
library/HTMLPurifier/AttrDef/HTML/Color.php
Normal file
@@ -0,0 +1,35 @@
|
|||||||
|
<?php
|
||||||
|
|
||||||
|
require_once 'HTMLPurifier/AttrDef.php';
|
||||||
|
require_once 'HTMLPurifier/AttrDef/CSS/Color.php'; // for %Core.ColorKeywords
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Validates a color according to the HTML spec.
|
||||||
|
*/
|
||||||
|
class HTMLPurifier_AttrDef_HTML_Color extends HTMLPurifier_AttrDef
|
||||||
|
{
|
||||||
|
|
||||||
|
function validate($string, $config, &$context) {
|
||||||
|
|
||||||
|
static $colors = null;
|
||||||
|
if ($colors === null) $colors = $config->get('Core', 'ColorKeywords');
|
||||||
|
|
||||||
|
$string = trim($string);
|
||||||
|
|
||||||
|
if (empty($string)) return false;
|
||||||
|
if (isset($colors[$string])) return $colors[$string];
|
||||||
|
if ($string[0] === '#') $hex = substr($string, 1);
|
||||||
|
else $hex = $string;
|
||||||
|
|
||||||
|
$length = strlen($hex);
|
||||||
|
if ($length !== 3 && $length !== 6) return false;
|
||||||
|
if (!ctype_xdigit($hex)) return false;
|
||||||
|
if ($length === 3) $hex = $hex[0].$hex[0].$hex[1].$hex[1].$hex[2].$hex[2];
|
||||||
|
|
||||||
|
return "#$hex";
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
?>
|
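The new HTML color validator above accepts a keyword, or a three- or six-digit hexadecimal value with or without a leading hash mark, and always returns the six-digit form. The sketch below restates that logic as a standalone function; normalize_html_color() is a hypothetical name, and the keyword table is passed in as a parameter here instead of being read from %Core.ColorKeywords.

```php
<?php
// Hypothetical standalone version of the validation steps shown in the diff.
function normalize_html_color($string, array $keywords = array())
{
    $string = trim($string);
    if ($string === '') return false;
    // keyword lookup is case-sensitive, as in the diff
    if (isset($keywords[$string])) return $keywords[$string];
    // the leading hash mark is optional
    $hex = ($string[0] === '#') ? substr($string, 1) : $string;
    $length = strlen($hex);
    if ($length !== 3 && $length !== 6) return false;
    if (!ctype_xdigit($hex)) return false;
    if ($length === 3) {
        // expand shorthand: "F0a" becomes "FF00aa"
        $hex = $hex[0].$hex[0].$hex[1].$hex[1].$hex[2].$hex[2];
    }
    return "#$hex";
}

$keywords = array('red' => '#FF0000');
$short    = normalize_html_color('#F0a');
$bare     = normalize_html_color('abcdef');
$named    = normalize_html_color('red', $keywords);
```

Unlike the CSS color validator in the same commit, this path does not lowercase its input before the keyword lookup, so a value such as "Red" falls through to the hexadecimal checks and is rejected.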
@@ -26,22 +26,20 @@ HTMLPurifier_ConfigSchema::define(
|
|||||||
class HTMLPurifier_AttrDef_HTML_LinkTypes extends HTMLPurifier_AttrDef
|
class HTMLPurifier_AttrDef_HTML_LinkTypes extends HTMLPurifier_AttrDef
|
||||||
{
|
{
|
||||||
|
|
||||||
/** Lookup array of attribute names to configuration name */
|
|
||||||
var $configLookup = array(
|
|
||||||
'rel' => 'AllowedRel',
|
|
||||||
'rev' => 'AllowedRev'
|
|
||||||
);
|
|
||||||
|
|
||||||
/** Name config attribute to pull. */
|
/** Name config attribute to pull. */
|
||||||
var $name;
|
var $name;
|
||||||
|
|
||||||
function HTMLPurifier_AttrDef_HTML_LinkTypes($name) {
|
function HTMLPurifier_AttrDef_HTML_LinkTypes($name) {
|
||||||
if (!isset($this->configLookup[$name])) {
|
$configLookup = array(
|
||||||
|
'rel' => 'AllowedRel',
|
||||||
|
'rev' => 'AllowedRev'
|
||||||
|
);
|
||||||
|
if (!isset($configLookup[$name])) {
|
||||||
trigger_error('Unrecognized attribute name for link '.
|
trigger_error('Unrecognized attribute name for link '.
|
||||||
'relationship.', E_USER_ERROR);
|
'relationship.', E_USER_ERROR);
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
$this->name = $this->configLookup[$name];
|
$this->name = $configLookup[$name];
|
||||||
}
|
}
|
||||||
|
|
||||||
function validate($string, $config, &$context) {
|
function validate($string, $config, &$context) {
|
||||||
|
@@ -93,7 +93,6 @@ class HTMLPurifier_AttrDef_URI extends HTMLPurifier_AttrDef
|
|||||||
{
|
{
|
||||||
|
|
||||||
var $host;
|
var $host;
|
||||||
var $PercentEncoder;
|
|
||||||
var $embeds_resource;
|
var $embeds_resource;
|
||||||
|
|
||||||
/**
|
/**
|
||||||
@@ -101,12 +100,14 @@ class HTMLPurifier_AttrDef_URI extends HTMLPurifier_AttrDef
|
|||||||
*/
|
*/
|
||||||
function HTMLPurifier_AttrDef_URI($embeds_resource = false) {
|
function HTMLPurifier_AttrDef_URI($embeds_resource = false) {
|
||||||
$this->host = new HTMLPurifier_AttrDef_URI_Host();
|
$this->host = new HTMLPurifier_AttrDef_URI_Host();
|
||||||
$this->PercentEncoder = new HTMLPurifier_PercentEncoder();
|
|
||||||
$this->embeds_resource = (bool) $embeds_resource;
|
$this->embeds_resource = (bool) $embeds_resource;
|
||||||
}
|
}
|
||||||
|
|
||||||
function validate($uri, $config, &$context) {
|
function validate($uri, $config, &$context) {
|
||||||
|
|
||||||
|
static $PercentEncoder = null;
|
||||||
|
if ($PercentEncoder === null) $PercentEncoder = new HTMLPurifier_PercentEncoder();
|
||||||
|
|
||||||
// We'll write stack-based parsers later, for now, use regexps to
|
// We'll write stack-based parsers later, for now, use regexps to
|
||||||
// get things working as fast as possible (irony)
|
// get things working as fast as possible (irony)
|
||||||
|
|
||||||
@@ -116,7 +117,7 @@ class HTMLPurifier_AttrDef_URI extends HTMLPurifier_AttrDef
|
|||||||
$uri = $this->parseCDATA($uri);
|
$uri = $this->parseCDATA($uri);
|
||||||
|
|
||||||
// fix up percent-encoding
|
// fix up percent-encoding
|
||||||
$uri = $this->PercentEncoder->normalize($uri);
|
$uri = $PercentEncoder->normalize($uri);
|
||||||
|
|
||||||
// while it would be nice to use parse_url(), that's specifically
|
// while it would be nice to use parse_url(), that's specifically
|
||||||
// for HTTP and thus won't work for our generic URI parsing
|
// for HTTP and thus won't work for our generic URI parsing
|
||||||
@@ -157,6 +158,14 @@ class HTMLPurifier_AttrDef_URI extends HTMLPurifier_AttrDef
|
|||||||
);
|
);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// something funky weird happened in the registry, abort!
|
||||||
|
if (!$scheme_obj) {
|
||||||
|
trigger_error(
|
||||||
|
'Default scheme object "' . $config->get('URI', 'DefaultScheme') . '" was not readable',
|
||||||
|
E_USER_WARNING
|
||||||
|
);
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
// the URI we're processing embeds a resource in the page, but the URI
|
// the URI we're processing embeds a resource in the page, but the URI
|
||||||
// it references cannot be located
|
// it references cannot be located
|
||||||
|
@@ -15,13 +15,10 @@ class HTMLPurifier_AttrDef_URI_IPv4 extends HTMLPurifier_AttrDef
|
|||||||
*/
|
*/
|
||||||
var $ip4;
|
var $ip4;
|
||||||
|
|
||||||
function HTMLPurifier_AttrDef_URI_IPv4() {
|
|
||||||
$oct = '(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])'; // 0-255
|
|
||||||
$this->ip4 = "(?:{$oct}\\.{$oct}\\.{$oct}\\.{$oct})";
|
|
||||||
}
|
|
||||||
|
|
||||||
function validate($aIP, $config, &$context) {
|
function validate($aIP, $config, &$context) {
|
||||||
|
|
||||||
|
if (!$this->ip4) $this->_loadRegex();
|
||||||
|
|
||||||
if (preg_match('#^' . $this->ip4 . '$#s', $aIP))
|
if (preg_match('#^' . $this->ip4 . '$#s', $aIP))
|
||||||
{
|
{
|
||||||
return $aIP;
|
return $aIP;
|
||||||
@@ -31,6 +28,15 @@ class HTMLPurifier_AttrDef_URI_IPv4 extends HTMLPurifier_AttrDef
|
|||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Lazy load function to prevent regex from being stuffed in
|
||||||
|
* cache.
|
||||||
|
*/
|
||||||
|
function _loadRegex() {
|
||||||
|
$oct = '(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])'; // 0-255
|
||||||
|
$this->ip4 = "(?:{$oct}\\.{$oct}\\.{$oct}\\.{$oct})";
|
||||||
|
}
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
?>
|
?>
|
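The IPv4 hunks above move regex construction out of the constructor and into a `_loadRegex()` method that `validate()` calls on first use. A minimal standalone sketch of that lazy-load pattern is given here; the class name `Example_IPv4` is hypothetical, and the class deliberately does not extend any HTML Purifier class or take the `$config` and `$context` arguments.

```php
<?php
// Hypothetical standalone class; it mirrors only the lazy-load idea.
class Example_IPv4
{
    var $ip4; // regex fragment, empty until the first validate() call

    function validate($ip) {
        if (!$this->ip4) $this->_loadRegex(); // build the regex on demand
        if (preg_match('#^' . $this->ip4 . '$#s', $ip)) return $ip;
        return false;
    }

    function _loadRegex() {
        $oct = '(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])'; // 0-255
        $this->ip4 = "(?:{$oct}\\.{$oct}\\.{$oct}\\.{$oct})";
    }
}

$def = new Example_IPv4();
var_dump($def->validate('192.168.0.1')); // string(11) "192.168.0.1"
var_dump($def->validate('256.1.1.1'));   // bool(false)
```

Because the property stays empty until it is needed, a serialized copy of a freshly constructed object does not carry the regex string, which appears to be the motivation stated in the docblock ("prevent regex from being stuffed in cache").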
@@ -13,6 +13,8 @@ class HTMLPurifier_AttrDef_URI_IPv6 extends HTMLPurifier_AttrDef_URI_IPv4
|
|||||||
|
|
||||||
function validate($aIP, $config, &$context) {
|
function validate($aIP, $config, &$context) {
|
||||||
|
|
||||||
|
if (!$this->ip4) $this->_loadRegex();
|
||||||
|
|
||||||
$original = $aIP;
|
$original = $aIP;
|
||||||
|
|
||||||
$hex = '[0-9a-fA-F]';
|
$hex = '[0-9a-fA-F]';
|
||||||
|
@@ -20,7 +20,10 @@ HTMLPurifier_ConfigSchema::define(
|
|||||||
);
|
);
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Post-transform that ensures the required attrs of img (alt and src) are set
|
* Transform that supplies default values for the src and alt attributes
|
||||||
|
* in img tags, as well as prevents the img tag from being removed
|
||||||
|
* because of a missing alt tag. This needs to be registered as both
|
||||||
|
* a pre and post attribute transform.
|
||||||
*/
|
*/
|
||||||
class HTMLPurifier_AttrTransform_ImgRequired extends HTMLPurifier_AttrTransform
|
class HTMLPurifier_AttrTransform_ImgRequired extends HTMLPurifier_AttrTransform
|
||||||
{
|
{
|
||||||
@@ -29,6 +32,7 @@ class HTMLPurifier_AttrTransform_ImgRequired extends HTMLPurifier_AttrTransform
|
|||||||
|
|
||||||
$src = true;
|
$src = true;
|
||||||
if (!isset($attr['src'])) {
|
if (!isset($attr['src'])) {
|
||||||
|
if ($config->get('Core', 'RemoveInvalidImg')) return $attr;
|
||||||
$attr['src'] = $config->get('Attr', 'DefaultInvalidImage');
|
$attr['src'] = $config->get('Attr', 'DefaultInvalidImage');
|
||||||
$src = false;
|
$src = false;
|
||||||
}
|
}
|
||||||
|
@@ -1,10 +1,14 @@
|
|||||||
<?php
|
<?php
|
||||||
|
|
||||||
|
require_once 'HTMLPurifier/AttrDef/Lang.php';
|
||||||
|
require_once 'HTMLPurifier/AttrDef/Enum.php';
|
||||||
|
require_once 'HTMLPurifier/AttrDef/HTML/Bool.php';
|
||||||
require_once 'HTMLPurifier/AttrDef/HTML/ID.php';
|
require_once 'HTMLPurifier/AttrDef/HTML/ID.php';
|
||||||
require_once 'HTMLPurifier/AttrDef/HTML/Length.php';
|
require_once 'HTMLPurifier/AttrDef/HTML/Length.php';
|
||||||
require_once 'HTMLPurifier/AttrDef/HTML/MultiLength.php';
|
require_once 'HTMLPurifier/AttrDef/HTML/MultiLength.php';
|
||||||
require_once 'HTMLPurifier/AttrDef/HTML/Nmtokens.php';
|
require_once 'HTMLPurifier/AttrDef/HTML/Nmtokens.php';
|
||||||
require_once 'HTMLPurifier/AttrDef/HTML/Pixels.php';
|
require_once 'HTMLPurifier/AttrDef/HTML/Pixels.php';
|
||||||
|
require_once 'HTMLPurifier/AttrDef/HTML/Color.php';
|
||||||
require_once 'HTMLPurifier/AttrDef/Integer.php';
|
require_once 'HTMLPurifier/AttrDef/Integer.php';
|
||||||
require_once 'HTMLPurifier/AttrDef/Text.php';
|
require_once 'HTMLPurifier/AttrDef/Text.php';
|
||||||
require_once 'HTMLPurifier/AttrDef/URI.php';
|
require_once 'HTMLPurifier/AttrDef/URI.php';
|
||||||
@@ -16,14 +20,19 @@ class HTMLPurifier_AttrTypes
|
|||||||
{
|
{
|
||||||
/**
|
/**
|
||||||
* Lookup array of attribute string identifiers to concrete implementations
|
* Lookup array of attribute string identifiers to concrete implementations
|
||||||
* @public
|
* @protected
|
||||||
*/
|
*/
|
||||||
var $info = array();
|
var $info = array();
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Constructs the info array
|
* Constructs the info array, supplying default implementations for attribute
|
||||||
|
* types.
|
||||||
*/
|
*/
|
||||||
function HTMLPurifier_AttrTypes() {
|
function HTMLPurifier_AttrTypes() {
|
||||||
|
// pseudo-types, must be instantiated via shorthand
|
||||||
|
$this->info['Enum'] = new HTMLPurifier_AttrDef_Enum();
|
||||||
|
$this->info['Bool'] = new HTMLPurifier_AttrDef_HTML_Bool();
|
||||||
|
|
||||||
$this->info['CDATA'] = new HTMLPurifier_AttrDef_Text();
|
$this->info['CDATA'] = new HTMLPurifier_AttrDef_Text();
|
||||||
$this->info['ID'] = new HTMLPurifier_AttrDef_HTML_ID();
|
$this->info['ID'] = new HTMLPurifier_AttrDef_HTML_ID();
|
||||||
$this->info['Length'] = new HTMLPurifier_AttrDef_HTML_Length();
|
$this->info['Length'] = new HTMLPurifier_AttrDef_HTML_Length();
|
||||||
@@ -32,10 +41,42 @@ class HTMLPurifier_AttrTypes
|
|||||||
$this->info['Pixels'] = new HTMLPurifier_AttrDef_HTML_Pixels();
|
$this->info['Pixels'] = new HTMLPurifier_AttrDef_HTML_Pixels();
|
||||||
$this->info['Text'] = new HTMLPurifier_AttrDef_Text();
|
$this->info['Text'] = new HTMLPurifier_AttrDef_Text();
|
||||||
$this->info['URI'] = new HTMLPurifier_AttrDef_URI();
|
$this->info['URI'] = new HTMLPurifier_AttrDef_URI();
|
||||||
|
$this->info['LanguageCode'] = new HTMLPurifier_AttrDef_Lang();
|
||||||
|
$this->info['Color'] = new HTMLPurifier_AttrDef_HTML_Color();
|
||||||
|
|
||||||
// number is really a positive integer (one or more digits)
|
// number is really a positive integer (one or more digits)
|
||||||
|
// FIXME: ^^ not always, see start and value of list items
|
||||||
$this->info['Number'] = new HTMLPurifier_AttrDef_Integer(false, false, true);
|
$this->info['Number'] = new HTMLPurifier_AttrDef_Integer(false, false, true);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Retrieves a type
|
||||||
|
* @param $type String type name
|
||||||
|
* @return Object AttrDef for type
|
||||||
|
*/
|
||||||
|
function get($type) {
|
||||||
|
|
||||||
|
// determine if there is any extra info tacked on
|
||||||
|
if (strpos($type, '#') !== false) list($type, $string) = explode('#', $type, 2);
|
||||||
|
else $string = '';
|
||||||
|
|
||||||
|
if (!isset($this->info[$type])) {
|
||||||
|
trigger_error('Cannot retrieve undefined attribute type ' . $type, E_USER_ERROR);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
return $this->info[$type]->make($string);
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Sets a new implementation for a type
|
||||||
|
* @param $type String type name
|
||||||
|
* @param $impl Object AttrDef for type
|
||||||
|
*/
|
||||||
|
function set($type, $impl) {
|
||||||
|
$this->info[$type] = $impl;
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
?>
|
?>
|
||||||
|
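The new `get()` method above splits anything after a `#` off the type name and hands it to `make()`. The splitting step on its own can be sketched as follows; the function name `example_split_type` and the input `Enum#left,right` are illustrative assumptions, not taken from the diff.

```php
<?php
// Hypothetical helper: separates a type name from the extra string after '#'.
function example_split_type($type) {
    if (strpos($type, '#') !== false) {
        // limit of 2 keeps any further '#' characters inside the extra string
        list($type, $string) = explode('#', $type, 2);
    } else {
        $string = '';
    }
    return array($type, $string);
}

print_r(example_split_type('Enum#left,right')); // type "Enum", extra "left,right"
print_r(example_split_type('URI'));             // type "URI", extra ""
```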
105
library/HTMLPurifier/AttrValidator.php
Normal file
@@ -0,0 +1,105 @@
|
|||||||
|
<?php
|
||||||
|
|
||||||
|
class HTMLPurifier_AttrValidator
|
||||||
|
{
|
||||||
|
|
||||||
|
|
||||||
|
function validateToken($token, &$config, &$context) {
|
||||||
|
|
||||||
|
$definition = $config->getHTMLDefinition();
|
||||||
|
|
||||||
|
// create alias to global definition array, see also $defs
|
||||||
|
// DEFINITION CALL
|
||||||
|
$d_defs = $definition->info_global_attr;
|
||||||
|
|
||||||
|
// copy out attributes for easy manipulation
|
||||||
|
$attr = $token->attr;
|
||||||
|
|
||||||
|
// do global transformations (pre)
|
||||||
|
// nothing currently utilizes this
|
||||||
|
foreach ($definition->info_attr_transform_pre as $transform) {
|
||||||
|
$attr = $transform->transform($attr, $config, $context);
|
||||||
|
}
|
||||||
|
|
||||||
|
// do local transformations only applicable to this element (pre)
|
||||||
|
// ex. <p align="right"> to <p style="text-align:right;">
|
||||||
|
foreach ($definition->info[$token->name]->attr_transform_pre
|
||||||
|
as $transform
|
||||||
|
) {
|
||||||
|
$attr = $transform->transform($attr, $config, $context);
|
||||||
|
}
|
||||||
|
|
||||||
|
// create alias to this element's attribute definition array, see
|
||||||
|
// also $d_defs (global attribute definition array)
|
||||||
|
// DEFINITION CALL
|
||||||
|
$defs = $definition->info[$token->name]->attr;
|
||||||
|
|
||||||
|
// iterate through all the attribute keypairs
|
||||||
|
// Watch out for name collisions: $key has previously been used
|
||||||
|
foreach ($attr as $attr_key => $value) {
|
||||||
|
|
||||||
|
// call the definition
|
||||||
|
if ( isset($defs[$attr_key]) ) {
|
||||||
|
// there is a local definition defined
|
||||||
|
if ($defs[$attr_key] === false) {
|
||||||
|
// We've explicitly been told not to allow this attribute.
|
||||||
|
// This is usually when there's a global definition
|
||||||
|
// that must be overridden.
|
||||||
|
// Theoretically speaking, we could have a
|
||||||
|
// AttrDef_DenyAll, but this is faster!
|
||||||
|
$result = false;
|
||||||
|
} else {
|
||||||
|
// validate according to the element's definition
|
||||||
|
$result = $defs[$attr_key]->validate(
|
||||||
|
$value, $config, $context
|
||||||
|
);
|
||||||
|
}
|
||||||
|
} elseif ( isset($d_defs[$attr_key]) ) {
|
||||||
|
// there is a global definition defined, validate according
|
||||||
|
// to the global definition
|
||||||
|
$result = $d_defs[$attr_key]->validate(
|
||||||
|
$value, $config, $context
|
||||||
|
);
|
||||||
|
} else {
|
||||||
|
// system never heard of the attribute? DELETE!
|
||||||
|
$result = false;
|
||||||
|
}
|
||||||
|
|
||||||
|
// put the results into effect
|
||||||
|
if ($result === false || $result === null) {
|
||||||
|
// remove the attribute
|
||||||
|
unset($attr[$attr_key]);
|
||||||
|
} elseif (is_string($result)) {
|
||||||
|
// simple substitution
|
||||||
|
$attr[$attr_key] = $result;
|
||||||
|
}
|
||||||
|
|
||||||
|
// we'd also want slightly more complicated substitution
|
||||||
|
// involving an array as the return value,
|
||||||
|
// although we're not sure how colliding attributes would
|
||||||
|
// resolve (certain ones would be completely overridden,
|
||||||
|
// others would prepend themselves).
|
||||||
|
}
|
||||||
|
|
||||||
|
// post transforms
|
||||||
|
|
||||||
|
// ex. <x lang="fr"> to <x lang="fr" xml:lang="fr">
|
||||||
|
foreach ($definition->info_attr_transform_post as $transform) {
|
||||||
|
$attr = $transform->transform($attr, $config, $context);
|
||||||
|
}
|
||||||
|
|
||||||
|
// ex. <bdo> to <bdo dir="ltr">
|
||||||
|
foreach ($definition->info[$token->name]->attr_transform_post as $transform) {
|
||||||
|
$attr = $transform->transform($attr, $config, $context);
|
||||||
|
}
|
||||||
|
|
||||||
|
// commit changes
|
||||||
|
$token->attr = $attr;
|
||||||
|
return $token;
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
?>
|
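The core of `validateToken()` above is the per-attribute loop: look up a definition, validate, then either drop the attribute (`false` or `null`) or substitute the cleaned string. A simplified sketch of that loop follows. It assumes plain callback functions in place of AttrDef objects, omits the global definitions and the pre/post transforms, and every name in it is hypothetical.

```php
<?php
// Hypothetical validators: return a cleaned string, or false to reject.
function example_validate_width($value) {
    return preg_match('/^[0-9]+$/', $value) ? $value : false;
}
function example_validate_align($value) {
    $value = strtolower(trim($value));
    return in_array($value, array('left', 'right', 'center'), true) ? $value : false;
}

// Applies the same "remove or substitute" rule as the loop in the diff.
function example_filter_attr($attr, $defs) {
    foreach ($attr as $key => $value) {
        if (isset($defs[$key])) {
            $result = call_user_func($defs[$key], $value);
        } else {
            $result = false; // unknown attribute: drop it
        }
        if ($result === false || $result === null) {
            unset($attr[$key]);      // remove the attribute
        } elseif (is_string($result)) {
            $attr[$key] = $result;   // simple substitution
        }
    }
    return $attr;
}

$defs = array(
    'width' => 'example_validate_width',
    'align' => 'example_validate_align',
);
$attr = array('width' => '100', 'align' => ' RIGHT ', 'onclick' => 'evil()');
print_r(example_filter_attr($attr, $defs)); // keeps width=100, align=right
```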
@@ -1,5 +1,7 @@
|
|||||||
<?php
|
<?php
|
||||||
|
|
||||||
|
require_once 'HTMLPurifier/Definition.php';
|
||||||
|
|
||||||
require_once 'HTMLPurifier/AttrDef/CSS/Background.php';
|
require_once 'HTMLPurifier/AttrDef/CSS/Background.php';
|
||||||
require_once 'HTMLPurifier/AttrDef/CSS/BackgroundPosition.php';
|
require_once 'HTMLPurifier/AttrDef/CSS/BackgroundPosition.php';
|
||||||
require_once 'HTMLPurifier/AttrDef/CSS/Border.php';
|
require_once 'HTMLPurifier/AttrDef/CSS/Border.php';
|
||||||
@@ -15,13 +17,24 @@ require_once 'HTMLPurifier/AttrDef/CSS/TextDecoration.php';
|
|||||||
require_once 'HTMLPurifier/AttrDef/CSS/URI.php';
|
require_once 'HTMLPurifier/AttrDef/CSS/URI.php';
|
||||||
require_once 'HTMLPurifier/AttrDef/Enum.php';
|
require_once 'HTMLPurifier/AttrDef/Enum.php';
|
||||||
|
|
||||||
|
HTMLPurifier_ConfigSchema::define(
|
||||||
|
'CSS', 'DefinitionRev', 1, 'int', '
|
||||||
|
<p>
|
||||||
|
Revision identifier for your custom definition. See
|
||||||
|
%HTML.DefinitionRev for details. This directive has been available
|
||||||
|
since 2.0.0.
|
||||||
|
</p>
|
||||||
|
');
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Defines allowed CSS attributes and what their values are.
|
* Defines allowed CSS attributes and what their values are.
|
||||||
* @see HTMLPurifier_HTMLDefinition
|
* @see HTMLPurifier_HTMLDefinition
|
||||||
*/
|
*/
|
||||||
class HTMLPurifier_CSSDefinition
|
class HTMLPurifier_CSSDefinition extends HTMLPurifier_Definition
|
||||||
{
|
{
|
||||||
|
|
||||||
|
var $type = 'CSS';
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Assoc array of attribute name to definition object.
|
* Assoc array of attribute name to definition object.
|
||||||
*/
|
*/
|
||||||
@@ -30,7 +43,7 @@ class HTMLPurifier_CSSDefinition
|
|||||||
/**
|
/**
|
||||||
* Constructs the info array. The meat of this class.
|
* Constructs the info array. The meat of this class.
|
||||||
*/
|
*/
|
||||||
function setup($config) {
|
function doSetup($config) {
|
||||||
|
|
||||||
$this->info['text-align'] = new HTMLPurifier_AttrDef_Enum(
|
$this->info['text-align'] = new HTMLPurifier_AttrDef_Enum(
|
||||||
array('left', 'right', 'center', 'justify'), false);
|
array('left', 'right', 'center', 'justify'), false);
|
||||||
|
@@ -38,8 +38,21 @@ class HTMLPurifier_ChildDef_Custom extends HTMLPurifier_ChildDef
|
|||||||
if ($raw{0} != '(') {
|
if ($raw{0} != '(') {
|
||||||
$raw = "($raw)";
|
$raw = "($raw)";
|
||||||
}
|
}
|
||||||
$reg = str_replace(',', ',?', $raw);
|
$el = '[#a-zA-Z0-9_.-]+';
|
||||||
$reg = preg_replace('/([#a-zA-Z0-9_.-]+)/', '(,?\\0)', $reg);
|
$reg = $raw;
|
||||||
|
|
||||||
|
// COMPLICATED! AND MIGHT BE BUGGY! I HAVE NO CLUE WHAT I'M
|
||||||
|
// DOING! Seriously: if there's problems, please report them.
|
||||||
|
|
||||||
|
// setup all elements as parentheticals with leading commas
|
||||||
|
$reg = preg_replace("/$el/", '(,\\0)', $reg);
|
||||||
|
|
||||||
|
// remove commas when they were not solicited
|
||||||
|
$reg = preg_replace("/([^,(|]\(+),/", '\\1', $reg);
|
||||||
|
|
||||||
|
// remove all non-parenthetical commas: they are handled by first regex
|
||||||
|
$reg = preg_replace("/,\(/", '(', $reg);
|
||||||
|
|
||||||
$this->_pcre_regex = $reg;
|
$this->_pcre_regex = $reg;
|
||||||
}
|
}
|
||||||
function validateChildren($tokens_of_children, $config, &$context) {
|
function validateChildren($tokens_of_children, $config, &$context) {
|
||||||
@@ -60,11 +73,11 @@ class HTMLPurifier_ChildDef_Custom extends HTMLPurifier_ChildDef
|
|||||||
$list_of_children .= $token->name . ',';
|
$list_of_children .= $token->name . ',';
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
$list_of_children = rtrim($list_of_children, ',');
|
// add leading comma to deal with stray comma declarations
|
||||||
|
$list_of_children = ',' . rtrim($list_of_children, ',');
|
||||||
$okay =
|
$okay =
|
||||||
preg_match(
|
preg_match(
|
||||||
'/^'.$this->_pcre_regex.'$/',
|
'/^,?'.$this->_pcre_regex.'$/',
|
||||||
$list_of_children
|
$list_of_children
|
||||||
);
|
);
|
||||||
|
|
||||||
|
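The three `preg_replace` passes in the ChildDef_Custom hunk above are easier to follow on a concrete input. The sketch below reproduces them in a hypothetical helper and traces one simple content model, `(a,b)`; it says nothing about how more complex models behave. It uses `$raw[0]` rather than the `$raw{0}` syntax in the diff, since curly-brace string offsets are not accepted by current PHP versions.

```php
<?php
// Hypothetical helper reproducing the three passes from the diff.
function example_compile_children($raw) {
    if ($raw[0] != '(') $raw = "($raw)";
    $el  = '[#a-zA-Z0-9_.-]+';
    // 1. wrap every element name in parentheses with a leading comma
    $reg = preg_replace("/$el/", '(,\\0)', $raw);
    // 2. remove commas when they were not solicited
    $reg = preg_replace('/([^,(|]\(+),/', '\\1', $reg);
    // 3. remove commas that sit directly in front of a group
    $reg = preg_replace('/,\(/', '(', $reg);
    return $reg;
}

// Mirrors the matching step: leading comma on the list, optional comma in the regex.
function example_children_ok($regex, $names) {
    $list = ',' . implode(',', $names);
    return (bool) preg_match('/^,?' . $regex . '$/', $list);
}

$regex = example_compile_children('(a,b)');
echo $regex, "\n";                                       // ((,a)(,b))
var_dump(example_children_ok($regex, array('a', 'b')));  // bool(true)
var_dump(example_children_ok($regex, array('b', 'a')));  // bool(false)
```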
@@ -29,7 +29,6 @@ class HTMLPurifier_ChildDef_Required extends HTMLPurifier_ChildDef
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
$this->elements = $elements;
|
$this->elements = $elements;
|
||||||
$this->gen = new HTMLPurifier_Generator();
|
|
||||||
}
|
}
|
||||||
var $allow_empty = false;
|
var $allow_empty = false;
|
||||||
var $type = 'required';
|
var $type = 'required';
|
||||||
@@ -57,6 +56,12 @@ class HTMLPurifier_ChildDef_Required extends HTMLPurifier_ChildDef
|
|||||||
// some configuration
|
// some configuration
|
||||||
$escape_invalid_children = $config->get('Core', 'EscapeInvalidChildren');
|
$escape_invalid_children = $config->get('Core', 'EscapeInvalidChildren');
|
||||||
|
|
||||||
|
// generator
|
||||||
|
static $gen = null;
|
||||||
|
if ($gen === null) {
|
||||||
|
$gen = new HTMLPurifier_Generator();
|
||||||
|
}
|
||||||
|
|
||||||
foreach ($tokens_of_children as $token) {
|
foreach ($tokens_of_children as $token) {
|
||||||
if (!empty($token->is_whitespace)) {
|
if (!empty($token->is_whitespace)) {
|
||||||
$result[] = $token;
|
$result[] = $token;
|
||||||
@@ -80,7 +85,7 @@ class HTMLPurifier_ChildDef_Required extends HTMLPurifier_ChildDef
|
|||||||
$result[] = $token;
|
$result[] = $token;
|
||||||
} elseif ($pcdata_allowed && $escape_invalid_children) {
|
} elseif ($pcdata_allowed && $escape_invalid_children) {
|
||||||
$result[] = new HTMLPurifier_Token_Text(
|
$result[] = new HTMLPurifier_Token_Text(
|
||||||
$this->gen->generateFromToken($token, $config)
|
$gen->generateFromToken($token, $config)
|
||||||
);
|
);
|
||||||
}
|
}
|
||||||
continue;
|
continue;
|
||||||
@@ -91,7 +96,7 @@ class HTMLPurifier_ChildDef_Required extends HTMLPurifier_ChildDef
|
|||||||
} elseif ($pcdata_allowed && $escape_invalid_children) {
|
} elseif ($pcdata_allowed && $escape_invalid_children) {
|
||||||
$result[] =
|
$result[] =
|
||||||
new HTMLPurifier_Token_Text(
|
new HTMLPurifier_Token_Text(
|
||||||
$this->gen->generateFromToken( $token, $config )
|
$gen->generateFromToken( $token, $config )
|
||||||
);
|
);
|
||||||
} else {
|
} else {
|
||||||
// drop silently
|
// drop silently
|
||||||
|
@@ -45,8 +45,8 @@ extends HTMLPurifier_ChildDef_Required
|
|||||||
if (!$is_inline) {
|
if (!$is_inline) {
|
||||||
if (!$depth) {
|
if (!$depth) {
|
||||||
if (
|
if (
|
||||||
$token->type == 'text' ||
|
($token->type == 'text' && !$token->is_whitespace) ||
|
||||||
!isset($this->elements[$token->name])
|
($token->type != 'text' && !isset($this->elements[$token->name]))
|
||||||
) {
|
) {
|
||||||
$is_inline = true;
|
$is_inline = true;
|
||||||
$ret[] = $block_wrap_start;
|
$ret[] = $block_wrap_start;
|
||||||
|
@@ -1,5 +1,28 @@
|
|||||||
<?php
|
<?php
|
||||||
|
|
||||||
|
require_once 'HTMLPurifier/ConfigSchema.php';
|
||||||
|
|
||||||
|
// member variables
|
||||||
|
require_once 'HTMLPurifier/HTMLDefinition.php';
|
||||||
|
require_once 'HTMLPurifier/CSSDefinition.php';
|
||||||
|
require_once 'HTMLPurifier/Doctype.php';
|
||||||
|
require_once 'HTMLPurifier/DefinitionCacheFactory.php';
|
||||||
|
|
||||||
|
// accommodations for versions earlier than 4.3.10 and 5.0.2
|
||||||
|
// borrowed from PHP_Compat, LGPL licensed, by Aidan Lister <aidan@php.net>
|
||||||
|
if (!defined('PHP_EOL')) {
|
||||||
|
switch (strtoupper(substr(PHP_OS, 0, 3))) {
|
||||||
|
case 'WIN':
|
||||||
|
define('PHP_EOL', "\r\n");
|
||||||
|
break;
|
||||||
|
case 'DAR':
|
||||||
|
define('PHP_EOL', "\r");
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
define('PHP_EOL', "\n");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Configuration object that triggers customizable behavior.
|
* Configuration object that triggers customizable behavior.
|
||||||
*
|
*
|
||||||
@@ -15,6 +38,11 @@
|
|||||||
class HTMLPurifier_Config
|
class HTMLPurifier_Config
|
||||||
{
|
{
|
||||||
|
|
||||||
|
/**
|
||||||
|
* HTML Purifier's version
|
||||||
|
*/
|
||||||
|
var $version = '2.0.0';
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Two-level associative array of configuration directives
|
* Two-level associative array of configuration directives
|
||||||
*/
|
*/
|
||||||
@@ -26,14 +54,26 @@ class HTMLPurifier_Config
|
|||||||
var $def;
|
var $def;
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Cached instance of HTMLPurifier_HTMLDefinition
|
* Indexed array of definitions
|
||||||
*/
|
*/
|
||||||
var $html_definition;
|
var $definitions;
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Cached instance of HTMLPurifier_CSSDefinition
|
* Bool indicator whether or not config is finalized
|
||||||
*/
|
*/
|
||||||
var $css_definition;
|
var $finalized = false;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Bool indicator whether or not to automatically finalize
|
||||||
|
* the object if a read operation is done
|
||||||
|
*/
|
||||||
|
var $autoFinalize = true;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Namespace indexed array of serials for specific namespaces (see
|
||||||
|
* getSerial for more info).
|
||||||
|
*/
|
||||||
|
var $serials = array();
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* @param $definition HTMLPurifier_ConfigSchema that defines what directives
|
* @param $definition HTMLPurifier_ConfigSchema that defines what directives
|
||||||
@@ -58,6 +98,7 @@ class HTMLPurifier_Config
|
|||||||
$ret = HTMLPurifier_Config::createDefault();
|
$ret = HTMLPurifier_Config::createDefault();
|
||||||
if (is_string($config)) $ret->loadIni($config);
|
if (is_string($config)) $ret->loadIni($config);
|
||||||
elseif (is_array($config)) $ret->loadArray($config);
|
elseif (is_array($config)) $ret->loadArray($config);
|
||||||
|
if (isset($revision)) $ret->revision = $revision;
|
||||||
return $ret;
|
return $ret;
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -78,13 +119,16 @@ class HTMLPurifier_Config
|
|||||||
* @param $key String key
|
* @param $key String key
|
||||||
*/
|
*/
|
||||||
function get($namespace, $key, $from_alias = false) {
|
function get($namespace, $key, $from_alias = false) {
|
||||||
|
if (!$this->finalized && $this->autoFinalize) $this->finalize();
|
||||||
if (!isset($this->def->info[$namespace][$key])) {
|
if (!isset($this->def->info[$namespace][$key])) {
|
||||||
trigger_error('Cannot retrieve value of undefined directive',
|
// can't add % due to SimpleTest bug
|
||||||
|
trigger_error('Cannot retrieve value of undefined directive ' . htmlspecialchars("$namespace.$key"),
|
||||||
E_USER_WARNING);
|
E_USER_WARNING);
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
if ($this->def->info[$namespace][$key]->class == 'alias') {
|
if ($this->def->info[$namespace][$key]->class == 'alias') {
|
||||||
trigger_error('Cannot get value from aliased directive, use real name',
|
$d = $this->def->info[$namespace][$key];
|
||||||
|
trigger_error('Cannot get value from aliased directive, use real name ' . $d->namespace . '.' . $d->name,
|
||||||
E_USER_ERROR);
|
E_USER_ERROR);
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
@@ -96,14 +140,35 @@ class HTMLPurifier_Config
|
|||||||
* @param $namespace String namespace
|
* @param $namespace String namespace
|
||||||
*/
|
*/
|
||||||
function getBatch($namespace) {
|
function getBatch($namespace) {
|
||||||
|
if (!$this->finalized && $this->autoFinalize) $this->finalize();
|
||||||
if (!isset($this->def->info[$namespace])) {
|
if (!isset($this->def->info[$namespace])) {
|
||||||
trigger_error('Cannot retrieve undefined namespace',
|
trigger_error('Cannot retrieve undefined namespace ' . htmlspecialchars($namespace),
|
||||||
E_USER_WARNING);
|
E_USER_WARNING);
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
return $this->conf[$namespace];
|
return $this->conf[$namespace];
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Returns a md5 signature of a segment of the configuration object
|
||||||
|
* that uniquely identifies that particular configuration
|
||||||
|
* @param $namespace Namespace to get serial for
|
||||||
|
*/
|
||||||
|
function getBatchSerial($namespace) {
|
||||||
|
if (empty($this->serials[$namespace])) {
|
||||||
|
$this->serials[$namespace] = md5(serialize($this->getBatch($namespace)));
|
||||||
|
}
|
||||||
|
return $this->serials[$namespace];
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Retrieves all directives, organized by namespace
|
||||||
|
*/
|
||||||
|
function getAll() {
|
||||||
|
if (!$this->finalized && $this->autoFinalize) $this->finalize();
|
||||||
|
return $this->conf;
|
||||||
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Sets a value to configuration.
|
* Sets a value to configuration.
|
||||||
* @param $namespace String namespace
|
* @param $namespace String namespace
|
||||||
@@ -111,15 +176,16 @@ class HTMLPurifier_Config
|
|||||||
* @param $value Mixed value
|
* @param $value Mixed value
|
||||||
*/
|
*/
|
||||||
function set($namespace, $key, $value, $from_alias = false) {
|
function set($namespace, $key, $value, $from_alias = false) {
|
||||||
|
if ($this->isFinalized('Cannot set directive after finalization')) return;
|
||||||
if (!isset($this->def->info[$namespace][$key])) {
|
if (!isset($this->def->info[$namespace][$key])) {
|
||||||
trigger_error('Cannot set undefined directive to value',
|
trigger_error('Cannot set undefined directive ' . htmlspecialchars("$namespace.$key") . ' to value',
|
||||||
E_USER_WARNING);
|
E_USER_WARNING);
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
if ($this->def->info[$namespace][$key]->class == 'alias') {
|
if ($this->def->info[$namespace][$key]->class == 'alias') {
|
||||||
if ($from_alias) {
|
if ($from_alias) {
|
||||||
trigger_error('Double-aliases not allowed, please fix '.
|
trigger_error('Double-aliases not allowed, please fix '.
|
||||||
'ConfigSchema bug');
|
'ConfigSchema bug with ' . "$namespace.$key");
|
||||||
}
|
}
|
||||||
$this->set($this->def->info[$namespace][$key]->namespace,
|
$this->set($this->def->info[$namespace][$key]->namespace,
|
||||||
$this->def->info[$namespace][$key]->name,
|
$this->def->info[$namespace][$key]->name,
|
||||||
@@ -128,7 +194,7 @@ class HTMLPurifier_Config
|
|||||||
}
|
}
|
||||||
$value = $this->def->validate(
|
$value = $this->def->validate(
|
||||||
$value,
|
$value,
|
||||||
$this->def->info[$namespace][$key]->type,
|
$type = $this->def->info[$namespace][$key]->type,
|
||||||
$this->def->info[$namespace][$key]->allow_null
|
$this->def->info[$namespace][$key]->allow_null
|
||||||
);
|
);
|
||||||
if (is_string($value)) {
|
if (is_string($value)) {
|
||||||
@@ -139,23 +205,36 @@ class HTMLPurifier_Config
|
|||||||
if ($this->def->info[$namespace][$key]->allowed !== true) {
|
if ($this->def->info[$namespace][$key]->allowed !== true) {
|
||||||
// check to see if the value is allowed
|
// check to see if the value is allowed
|
||||||
if (!isset($this->def->info[$namespace][$key]->allowed[$value])) {
|
if (!isset($this->def->info[$namespace][$key]->allowed[$value])) {
|
||||||
trigger_error('Value not supported', E_USER_WARNING);
|
trigger_error('Value not supported, valid values are: ' .
|
||||||
|
$this->_listify($this->def->info[$namespace][$key]->allowed), E_USER_WARNING);
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
if ($this->def->isError($value)) {
|
if ($this->def->isError($value)) {
|
||||||
trigger_error('Value is of invalid type', E_USER_WARNING);
|
trigger_error('Value for ' . "$namespace.$key" . ' is of invalid type, should be ' . $type, E_USER_WARNING);
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
$this->conf[$namespace][$key] = $value;
|
$this->conf[$namespace][$key] = $value;
|
||||||
if ($namespace == 'HTML' || $namespace == 'Attr') {
|
|
||||||
// reset HTML definition if relevant attributes changed
|
// reset definitions if the directives they depend on changed
|
||||||
$this->html_definition = null;
|
// this is a very costly process, so it's discouraged
|
||||||
}
|
// with finalization
|
||||||
if ($namespace == 'CSS') {
|
if ($namespace == 'HTML' || $namespace == 'CSS') {
|
||||||
$this->css_definition = null;
|
$this->definitions[$namespace] = null;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
$this->serials[$namespace] = false;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Convenience function for error reporting
|
||||||
|
* @private
|
||||||
|
*/
|
||||||
|
function _listify($lookup) {
|
||||||
|
$list = array();
|
||||||
|
foreach ($lookup as $name => $b) $list[] = $name;
|
||||||
|
return implode(', ', $list);
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
@@ -164,26 +243,71 @@ class HTMLPurifier_Config
|
|||||||
* called before it's been setup, otherwise won't work.
|
* called before it's been setup, otherwise won't work.
|
||||||
*/
|
*/
|
||||||
function &getHTMLDefinition($raw = false) {
|
function &getHTMLDefinition($raw = false) {
|
||||||
if (
|
return $this->getDefinition('HTML', $raw);
|
||||||
empty($this->html_definition) || // hasn't ever been setup
|
|
||||||
($raw && $this->html_definition->setup) // requesting new one
|
|
||||||
) {
|
|
||||||
$this->html_definition = new HTMLPurifier_HTMLDefinition($this);
|
|
||||||
if ($raw) return $this->html_definition; // no setup!
|
|
||||||
}
|
|
||||||
if (!$this->html_definition->setup) $this->html_definition->setup();
|
|
||||||
return $this->html_definition;
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Retrieves reference to the CSS definition
|
* Retrieves reference to the CSS definition
|
||||||
*/
|
*/
|
||||||
function &getCSSDefinition() {
|
function &getCSSDefinition($raw = false) {
|
||||||
if ($this->css_definition === null) {
|
return $this->getDefinition('CSS', $raw);
|
||||||
$this->css_definition = new HTMLPurifier_CSSDefinition();
|
}
|
||||||
$this->css_definition->setup($this);
|
|
||||||
|
/**
|
||||||
|
* Retrieves a definition
|
||||||
|
* @param $type Type of definition: HTML, CSS, etc
|
||||||
|
* @param $raw Whether or not definition should be returned raw
|
||||||
|
*/
|
||||||
|
function &getDefinition($type, $raw = false) {
|
||||||
|
if (!$this->finalized && $this->autoFinalize) $this->finalize();
|
||||||
|
$factory = HTMLPurifier_DefinitionCacheFactory::instance();
|
||||||
|
$cache = $factory->create($type, $this);
|
||||||
|
if (!$raw) {
|
||||||
|
// see if we can quickly supply a definition
|
||||||
|
if (!empty($this->definitions[$type])) {
|
||||||
|
if (!$this->definitions[$type]->setup) {
|
||||||
|
$this->definitions[$type]->setup($this);
|
||||||
|
}
|
||||||
|
return $this->definitions[$type];
|
||||||
|
}
|
||||||
|
// memory check missed, try cache
|
||||||
|
$this->definitions[$type] = $cache->get($this);
|
||||||
|
if ($this->definitions[$type]) {
|
||||||
|
// definition in cache, return it
|
||||||
|
return $this->definitions[$type];
|
||||||
|
}
|
||||||
|
} elseif (
|
||||||
|
!empty($this->definitions[$type]) &&
|
||||||
|
!$this->definitions[$type]->setup
|
||||||
|
) {
|
||||||
|
// raw requested, raw in memory, quick return
|
||||||
|
return $this->definitions[$type];
|
||||||
}
|
}
|
||||||
return $this->css_definition;
|
// quick checks failed, let's create the object
|
||||||
|
if ($type == 'HTML') {
|
||||||
|
$this->definitions[$type] = new HTMLPurifier_HTMLDefinition();
|
||||||
|
} elseif ($type == 'CSS') {
|
||||||
|
$this->definitions[$type] = new HTMLPurifier_CSSDefinition();
|
||||||
|
} else {
|
||||||
|
trigger_error("Definition of $type type not supported");
|
||||||
|
$false = false;
|
||||||
|
return $false;
|
||||||
|
}
|
||||||
|
// quick abort if raw
|
||||||
|
if ($raw) {
|
||||||
|
if (is_null($this->get($type, 'DefinitionID'))) {
|
||||||
|
// fatally error out if definition ID not set
|
||||||
|
trigger_error("Cannot retrieve raw version without specifying %$type.DefinitionID", E_USER_ERROR);
|
||||||
|
$false = false;
|
||||||
|
return $false;
|
||||||
|
}
|
||||||
|
return $this->definitions[$type];
|
||||||
|
}
|
||||||
|
// set it up
|
||||||
|
$this->definitions[$type]->setup($this);
|
||||||
|
// save in cache
|
||||||
|
$cache->set($this->definitions[$type], $this);
|
||||||
|
return $this->definitions[$type];
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
@@ -192,6 +316,7 @@ class HTMLPurifier_Config
|
|||||||
* @param $config_array Configuration associative array
|
* @param $config_array Configuration associative array
|
||||||
*/
|
*/
|
||||||
function loadArray($config_array) {
|
function loadArray($config_array) {
|
||||||
|
if ($this->isFinalized('Cannot load directives after finalization')) return;
|
||||||
foreach ($config_array as $key => $value) {
|
foreach ($config_array as $key => $value) {
|
||||||
$key = str_replace('_', '.', $key);
|
$key = str_replace('_', '.', $key);
|
||||||
if (strpos($key, '.') !== false) {
|
if (strpos($key, '.') !== false) {
|
||||||
@@ -208,15 +333,63 @@ class HTMLPurifier_Config
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Loads configuration values from $_GET/$_POST that were posted
|
||||||
|
* via ConfigForm
|
||||||
|
* @param $array $_GET or $_POST array to import
|
||||||
|
* @param $index Index/name that the config variables are in
|
||||||
|
* @param $mq_fix Boolean whether or not to enable magic quotes fix
|
||||||
|
* @static
|
||||||
|
*/
|
||||||
|
function loadArrayFromForm($array, $index, $mq_fix = true) {
|
||||||
|
$array = (isset($array[$index]) && is_array($array[$index])) ? $array[$index] : array();
|
||||||
|
$mq = get_magic_quotes_gpc() && $mq_fix;
|
||||||
|
foreach ($array as $key => $value) {
|
||||||
|
if (!strncmp($key, 'Null_', 5) && !empty($value)) {
|
||||||
|
unset($array[substr($key, 5)]);
|
||||||
|
unset($array[$key]);
|
||||||
|
}
|
||||||
|
if ($mq) $array[$key] = stripslashes($value);
|
||||||
|
}
|
||||||
|
return @HTMLPurifier_Config::create($array);
|
||||||
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Loads configuration values from an ini file
|
* Loads configuration values from an ini file
|
||||||
* @param $filename Name of ini file
|
* @param $filename Name of ini file
|
||||||
*/
|
*/
|
||||||
function loadIni($filename) {
|
function loadIni($filename) {
|
||||||
|
if ($this->isFinalized('Cannot load directives after finalization')) return;
|
||||||
$array = parse_ini_file($filename, true);
|
$array = parse_ini_file($filename, true);
|
||||||
$this->loadArray($array);
|
$this->loadArray($array);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Checks whether or not the configuration object is finalized.
|
||||||
|
* @param $error String error message, or false for no error
|
||||||
|
*/
|
||||||
|
function isFinalized($error = false) {
|
||||||
|
if ($this->finalized && $error) {
|
||||||
|
trigger_error($error, E_USER_ERROR);
|
||||||
|
}
|
||||||
|
return $this->finalized;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Finalizes configuration only if auto finalize is on and not
|
||||||
|
* already finalized
|
||||||
|
*/
|
||||||
|
function autoFinalize() {
|
||||||
|
if (!$this->finalized && $this->autoFinalize) $this->finalize();
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Finalizes a configuration object, prohibiting further change
|
||||||
|
*/
|
||||||
|
function finalize() {
|
||||||
|
$this->finalized = true;
|
||||||
|
}
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
?>
|
?>
|
||||||
|
@@ -8,6 +8,7 @@ require_once 'HTMLPurifier/ConfigDef/DirectiveAlias.php';
|
|||||||
|
|
||||||
/**
|
/**
|
||||||
* Configuration definition, defines directives and their defaults.
|
* Configuration definition, defines directives and their defaults.
|
||||||
|
* @note If you update this, please update Printer_ConfigForm
|
||||||
* @todo The ability to define things multiple times is confusing and should
|
* @todo The ability to define things multiple times is confusing and should
|
||||||
* be factored out to its own function named registerDependency() or
|
* be factored out to its own function named registerDependency() or
|
||||||
* addNote(), where only the namespace.name and an extra descriptions
|
* addNote(), where only the namespace.name and an extra descriptions
|
||||||
@@ -66,6 +67,8 @@ class HTMLPurifier_ConfigSchema {
|
|||||||
$this->defineNamespace('URI', 'Features regarding Uniform Resource Identifiers.');
|
$this->defineNamespace('URI', 'Features regarding Uniform Resource Identifiers.');
|
||||||
$this->defineNamespace('HTML', 'Configuration regarding allowed HTML.');
|
$this->defineNamespace('HTML', 'Configuration regarding allowed HTML.');
|
||||||
$this->defineNamespace('CSS', 'Configuration regarding allowed CSS.');
|
$this->defineNamespace('CSS', 'Configuration regarding allowed CSS.');
|
||||||
|
$this->defineNamespace('Output', 'Configuration relating to the generation of (X)HTML.');
|
||||||
|
$this->defineNamespace('Cache', 'Configuration for DefinitionCache and related subclasses.');
|
||||||
$this->defineNamespace('Test', 'Developer testing configuration for our unit tests.');
|
$this->defineNamespace('Test', 'Developer testing configuration for our unit tests.');
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -303,6 +306,7 @@ class HTMLPurifier_ConfigSchema {
|
|||||||
if ($allow_null && $var === null) return null;
|
if ($allow_null && $var === null) return null;
|
||||||
switch ($type) {
|
switch ($type) {
|
||||||
case 'mixed':
|
case 'mixed':
|
||||||
|
//if (is_string($var)) $var = unserialize($var);
|
||||||
return $var;
|
return $var;
|
||||||
case 'istring':
|
case 'istring':
|
||||||
case 'string':
|
case 'string':
|
||||||
@@ -343,6 +347,16 @@ class HTMLPurifier_ConfigSchema {
|
|||||||
$var = explode(',',$var);
|
$var = explode(',',$var);
|
||||||
// remove spaces
|
// remove spaces
|
||||||
foreach ($var as $i => $j) $var[$i] = trim($j);
|
foreach ($var as $i => $j) $var[$i] = trim($j);
|
||||||
|
if ($type === 'hash') {
|
||||||
|
// key:value,key2:value2
|
||||||
|
$nvar = array();
|
||||||
|
foreach ($var as $keypair) {
|
||||||
|
$c = explode(':', $keypair, 2);
|
||||||
|
if (!isset($c[1])) continue;
|
||||||
|
$nvar[$c[0]] = $c[1];
|
||||||
|
}
|
||||||
|
$var = $nvar;
|
||||||
|
}
|
||||||
}
|
}
|
||||||
if (!is_array($var)) break;
|
if (!is_array($var)) break;
|
||||||
$keys = array_keys($var);
|
$keys = array_keys($var);
|
||||||
|
@@ -6,6 +6,8 @@ require_once 'HTMLPurifier/ChildDef/Empty.php';
|
|||||||
require_once 'HTMLPurifier/ChildDef/Required.php';
|
require_once 'HTMLPurifier/ChildDef/Required.php';
|
||||||
require_once 'HTMLPurifier/ChildDef/Optional.php';
|
require_once 'HTMLPurifier/ChildDef/Optional.php';
|
||||||
|
|
||||||
|
// NOT UNIT TESTED!!!
|
||||||
|
|
||||||
class HTMLPurifier_ContentSets
|
class HTMLPurifier_ContentSets
|
||||||
{
|
{
|
||||||
|
|
||||||
|
41
library/HTMLPurifier/Definition.php
Normal file
@@ -0,0 +1,41 @@
|
|||||||
|
<?php
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Super-class for definition datatype objects, implements serialization
|
||||||
|
* functions for the class.
|
||||||
|
*/
|
||||||
|
class HTMLPurifier_Definition
|
||||||
|
{
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Has setup() been called yet?
|
||||||
|
*/
|
||||||
|
var $setup = false;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* What type of definition is it?
|
||||||
|
*/
|
||||||
|
var $type;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Sets up the definition object into the final form, something
|
||||||
|
* not done by the constructor
|
||||||
|
* @param $config HTMLPurifier_Config instance
|
||||||
|
*/
|
||||||
|
function doSetup($config) {
|
||||||
|
trigger_error('Cannot call abstract method', E_USER_ERROR);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Setup function that aborts if already setup
|
||||||
|
* @param $config HTMLPurifier_Config instance
|
||||||
|
*/
|
||||||
|
function setup($config) {
|
||||||
|
if ($this->setup) return;
|
||||||
|
$this->setup = true;
|
||||||
|
$this->doSetup($config);
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
?>
|
121
library/HTMLPurifier/DefinitionCache.php
Normal file
@@ -0,0 +1,121 @@
|
|||||||
|
<?php
|
||||||
|
|
||||||
|
require_once 'HTMLPurifier/DefinitionCache/Serializer.php';
|
||||||
|
require_once 'HTMLPurifier/DefinitionCache/Null.php';
|
||||||
|
|
||||||
|
require_once 'HTMLPurifier/DefinitionCache/Decorator.php';
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Abstract class representing Definition cache managers that implements
|
||||||
|
* useful common methods and is a factory.
|
||||||
|
* @todo Get some sort of versioning variable so the library can easily
|
||||||
|
* invalidate the cache with a new version
|
||||||
|
* @todo Make the test runner cache aware and allow the user to easily
|
||||||
|
* flush the cache
|
||||||
|
* @todo Create a separate maintenance file advanced users can use to
|
||||||
|
* cache their custom HTMLDefinition, which can be loaded
|
||||||
|
* via a configuration directive
|
||||||
|
* @todo Implement memcached
|
||||||
|
*/
|
||||||
|
class HTMLPurifier_DefinitionCache
|
||||||
|
{
|
||||||
|
|
||||||
|
var $type;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* @param $type Type of definition objects this instance of the
|
||||||
|
* cache will handle.
|
||||||
|
*/
|
||||||
|
function HTMLPurifier_DefinitionCache($type) {
|
||||||
|
$this->type = $type;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Generates a unique identifier for a particular configuration
|
||||||
|
* @param $config Instance of HTMLPurifier_Config
|
||||||
|
*/
|
||||||
|
function generateKey($config) {
|
||||||
|
return $config->version . '-' . // possibly replace with function calls
|
||||||
|
$config->get($this->type, 'DefinitionRev') . '-' .
|
||||||
|
$config->getBatchSerial($this->type);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Tests whether or not a key is old with respect to the configuration's
|
||||||
|
* version and revision number.
|
||||||
|
* @param $key Key to test
|
||||||
|
* @param $config Instance of HTMLPurifier_Config to test against
|
||||||
|
*/
|
||||||
|
function isOld($key, $config) {
|
||||||
|
list($version, $revision, $hash) = explode('-', $key, 3);
|
||||||
|
$compare = version_compare($version, $config->version);
|
||||||
|
if ($compare > 0) return false;
|
||||||
|
if ($compare == 0 && $revision >= $config->get($this->type, 'DefinitionRev')) return false;
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Checks if a definition's type matches the cache's type
|
||||||
|
* @note Throws an error on failure
|
||||||
|
* @param $def Definition object to check
|
||||||
|
* @return Boolean true if good, false if not
|
||||||
|
*/
|
||||||
|
function checkDefType($def) {
|
||||||
|
if ($def->type !== $this->type) {
|
||||||
|
trigger_error("Cannot use definition of type {$def->type} in cache for {$this->type}");
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Adds a definition object to the cache
|
||||||
|
*/
|
||||||
|
function add($def, $config) {
|
||||||
|
trigger_error('Cannot call abstract method', E_USER_ERROR);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Unconditionally saves a definition object to the cache
|
||||||
|
*/
|
||||||
|
function set($def, $config) {
|
||||||
|
trigger_error('Cannot call abstract method', E_USER_ERROR);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Replace an object in the cache
|
||||||
|
*/
|
||||||
|
function replace($def, $config) {
|
||||||
|
trigger_error('Cannot call abstract method', E_USER_ERROR);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Retrieves a definition object from the cache
|
||||||
|
*/
|
||||||
|
function get($config) {
|
||||||
|
trigger_error('Cannot call abstract method', E_USER_ERROR);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Removes a definition object from the cache
|
||||||
|
*/
|
||||||
|
function remove($config) {
|
||||||
|
trigger_error('Cannot call abstract method', E_USER_ERROR);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Clears all objects from cache
|
||||||
|
*/
|
||||||
|
function flush($config) {
|
||||||
|
trigger_error('Cannot call abstract method', E_USER_ERROR);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Clears all expired (older version or revision) objects from cache
|
||||||
|
*/
|
||||||
|
function cleanup($config) {
|
||||||
|
trigger_error('Cannot call abstract method', E_USER_ERROR);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
?>
|
63
library/HTMLPurifier/DefinitionCache/Decorator.php
Normal file
@@ -0,0 +1,63 @@
|
|||||||
|
<?php
|
||||||
|
|
||||||
|
require_once 'HTMLPurifier/DefinitionCache.php';
|
||||||
|
|
||||||
|
require_once 'HTMLPurifier/DefinitionCache/Decorator/Memory.php';
|
||||||
|
require_once 'HTMLPurifier/DefinitionCache/Decorator/Cleanup.php';
|
||||||
|
|
||||||
|
class HTMLPurifier_DefinitionCache_Decorator extends HTMLPurifier_DefinitionCache
|
||||||
|
{
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Cache object we are decorating
|
||||||
|
*/
|
||||||
|
var $cache;
|
||||||
|
|
||||||
|
function HTMLPurifier_DefinitionCache_Decorator() {}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Lazy decorator function
|
||||||
|
* @param $cache Reference to cache object to decorate
|
||||||
|
*/
|
||||||
|
function decorate(&$cache) {
|
||||||
|
$decorator = $this->copy();
|
||||||
|
// reference is necessary for mocks in PHP 4
|
||||||
|
$decorator->cache =& $cache;
|
||||||
|
$decorator->type = $cache->type;
|
||||||
|
return $decorator;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Cross-compatible clone substitute
|
||||||
|
*/
|
||||||
|
function copy() {
|
||||||
|
return new HTMLPurifier_DefinitionCache_Decorator();
|
||||||
|
}
|
||||||
|
|
||||||
|
function add($def, $config) {
|
||||||
|
return $this->cache->add($def, $config);
|
||||||
|
}
|
||||||
|
|
||||||
|
function set($def, $config) {
|
||||||
|
return $this->cache->set($def, $config);
|
||||||
|
}
|
||||||
|
|
||||||
|
function replace($def, $config) {
|
||||||
|
return $this->cache->replace($def, $config);
|
||||||
|
}
|
||||||
|
|
||||||
|
function get($config) {
|
||||||
|
return $this->cache->get($config);
|
||||||
|
}
|
||||||
|
|
||||||
|
function flush($config) {
|
||||||
|
return $this->cache->flush($config);
|
||||||
|
}
|
||||||
|
|
||||||
|
function cleanup($config) {
|
||||||
|
return $this->cache->cleanup($config);
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
?>
|
45
library/HTMLPurifier/DefinitionCache/Decorator/Cleanup.php
Normal file
@@ -0,0 +1,45 @@
|
|||||||
|
<?php
|
||||||
|
|
||||||
|
require_once 'HTMLPurifier/DefinitionCache/Decorator.php';
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Definition cache decorator class that cleans up the cache
|
||||||
|
* whenever there is a cache miss.
|
||||||
|
*/
|
||||||
|
class HTMLPurifier_DefinitionCache_Decorator_Cleanup extends
|
||||||
|
HTMLPurifier_DefinitionCache_Decorator
|
||||||
|
{
|
||||||
|
|
||||||
|
var $name = 'Cleanup';
|
||||||
|
|
||||||
|
function copy() {
|
||||||
|
return new HTMLPurifier_DefinitionCache_Decorator_Cleanup();
|
||||||
|
}
|
||||||
|
|
||||||
|
function add($def, $config) {
|
||||||
|
$status = parent::add($def, $config);
|
||||||
|
if (!$status) parent::cleanup($config);
|
||||||
|
return $status;
|
||||||
|
}
|
||||||
|
|
||||||
|
function set($def, $config) {
|
||||||
|
$status = parent::set($def, $config);
|
||||||
|
if (!$status) parent::cleanup($config);
|
||||||
|
return $status;
|
||||||
|
}
|
||||||
|
|
||||||
|
function replace($def, $config) {
|
||||||
|
$status = parent::replace($def, $config);
|
||||||
|
if (!$status) parent::cleanup($config);
|
||||||
|
return $status;
|
||||||
|
}
|
||||||
|
|
||||||
|
function get($config) {
|
||||||
|
$ret = parent::get($config);
|
||||||
|
if (!$ret) parent::cleanup($config);
|
||||||
|
return $ret;
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
?>
|
48
library/HTMLPurifier/DefinitionCache/Decorator/Memory.php
Normal file
@@ -0,0 +1,48 @@
|
|||||||
|
<?php
|
||||||
|
|
||||||
|
require_once 'HTMLPurifier/DefinitionCache/Decorator.php';
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Definition cache decorator class that saves all cache retrievals
|
||||||
|
* to PHP's memory; good for unit tests or circumstances where
|
||||||
|
* there are lots of configuration objects floating around.
|
||||||
|
*/
|
||||||
|
class HTMLPurifier_DefinitionCache_Decorator_Memory extends
|
||||||
|
HTMLPurifier_DefinitionCache_Decorator
|
||||||
|
{
|
||||||
|
|
||||||
|
var $definitions;
|
||||||
|
var $name = 'Memory';
|
||||||
|
|
||||||
|
function copy() {
|
||||||
|
return new HTMLPurifier_DefinitionCache_Decorator_Memory();
|
||||||
|
}
|
||||||
|
|
||||||
|
function add($def, $config) {
|
||||||
|
$status = parent::add($def, $config);
|
||||||
|
if ($status) $this->definitions[$this->generateKey($config)] = $def;
|
||||||
|
return $status;
|
||||||
|
}
|
||||||
|
|
||||||
|
function set($def, $config) {
|
||||||
|
$status = parent::set($def, $config);
|
||||||
|
if ($status) $this->definitions[$this->generateKey($config)] = $def;
|
||||||
|
return $status;
|
||||||
|
}
|
||||||
|
|
||||||
|
function replace($def, $config) {
|
||||||
|
$status = parent::replace($def, $config);
|
||||||
|
if ($status) $this->definitions[$this->generateKey($config)] = $def;
|
||||||
|
return $status;
|
||||||
|
}
|
||||||
|
|
||||||
|
function get($config) {
|
||||||
|
$key = $this->generateKey($config);
|
||||||
|
if (isset($this->definitions[$key])) return $this->definitions[$key];
|
||||||
|
$this->definitions[$key] = parent::get($config);
|
||||||
|
return $this->definitions[$key];
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
?>
|
@@ -0,0 +1,47 @@
|
|||||||
|
<?php
|
||||||
|
|
||||||
|
require_once 'HTMLPurifier/DefinitionCache/Decorator.php';
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Definition cache decorator template.
|
||||||
|
*/
|
||||||
|
class HTMLPurifier_DefinitionCache_Decorator_Template extends
|
||||||
|
HTMLPurifier_DefinitionCache_Decorator
|
||||||
|
{
|
||||||
|
|
||||||
|
var $name = 'Template'; // replace this
|
||||||
|
|
||||||
|
function copy() {
|
||||||
|
// replace class name with yours
|
||||||
|
return new HTMLPurifier_DefinitionCache_Decorator_Template();
|
||||||
|
}
|
||||||
|
|
||||||
|
// remove methods you don't need
|
||||||
|
|
||||||
|
function add($def, $config) {
|
||||||
|
return parent::add($def, $config);
|
||||||
|
}
|
||||||
|
|
||||||
|
function set($def, $config) {
|
||||||
|
return parent::set($def, $config);
|
||||||
|
}
|
||||||
|
|
||||||
|
function replace($def, $config) {
|
||||||
|
return parent::replace($def, $config);
|
||||||
|
}
|
||||||
|
|
||||||
|
function get($config) {
|
||||||
|
return parent::get($config);
|
||||||
|
}
|
||||||
|
|
||||||
|
function flush($config) {
|
||||||
|
return parent::flush($config);
|
||||||
|
}
|
||||||
|
|
||||||
|
function cleanup($config) {
|
||||||
|
return parent::cleanup($config);
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
?>
|
37
library/HTMLPurifier/DefinitionCache/Null.php
Normal file
@@ -0,0 +1,37 @@
|
|||||||
|
<?php
|
||||||
|
|
||||||
|
require_once 'HTMLPurifier/DefinitionCache.php';
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Null cache object to use when no caching is on.
|
||||||
|
*/
|
||||||
|
class HTMLPurifier_DefinitionCache_Null extends HTMLPurifier_DefinitionCache
|
||||||
|
{
|
||||||
|
|
||||||
|
function add($def, $config) {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
function set($def, $config) {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
function replace($def, $config) {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
function get($config) {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
function flush($config) {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
function cleanup($config) {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
?>
|
129
library/HTMLPurifier/DefinitionCache/Serializer.php
Normal file
@@ -0,0 +1,129 @@
|
|||||||
|
<?php
|
||||||
|
|
||||||
|
require_once 'HTMLPurifier/DefinitionCache.php';
|
||||||
|
|
||||||
|
HTMLPurifier_ConfigSchema::define(
|
||||||
|
'Cache', 'SerializerPath', null, 'string/null', '
|
||||||
|
<p>
|
||||||
|
Absolute path with no trailing slash to store serialized definitions in.
|
||||||
|
Default is within the
|
||||||
|
HTML Purifier library inside DefinitionCache/Serializer. This
|
||||||
|
path must be writable by the webserver. This directive has been
|
||||||
|
available since 2.0.0.
|
||||||
|
</p>
|
||||||
|
');
|
||||||
|
|
||||||
|
class HTMLPurifier_DefinitionCache_Serializer extends
|
||||||
|
HTMLPurifier_DefinitionCache
|
||||||
|
{
|
||||||
|
|
||||||
|
function add($def, $config) {
|
||||||
|
if (!$this->checkDefType($def)) return;
|
||||||
|
$file = $this->generateFilePath($config);
|
||||||
|
if (file_exists($file)) return false;
|
||||||
|
$this->_prepareDir($config);
|
||||||
|
return $this->_write($file, serialize($def));
|
||||||
|
}
|
||||||
|
|
||||||
|
function set($def, $config) {
|
||||||
|
if (!$this->checkDefType($def)) return;
|
||||||
|
$file = $this->generateFilePath($config);
|
||||||
|
$this->_prepareDir($config);
|
||||||
|
return $this->_write($file, serialize($def));
|
||||||
|
}
|
||||||
|
|
||||||
|
function replace($def, $config) {
|
||||||
|
if (!$this->checkDefType($def)) return;
|
||||||
|
$file = $this->generateFilePath($config);
|
||||||
|
if (!file_exists($file)) return false;
|
||||||
|
$this->_prepareDir($config);
|
||||||
|
return $this->_write($file, serialize($def));
|
||||||
|
}
|
||||||
|
|
||||||
|
function get($config) {
|
||||||
|
$file = $this->generateFilePath($config);
|
||||||
|
if (!file_exists($file)) return false;
|
||||||
|
return unserialize(file_get_contents($file));
|
||||||
|
}
|
||||||
|
|
||||||
|
function remove($config) {
|
||||||
|
$file = $this->generateFilePath($config);
|
||||||
|
if (!file_exists($file)) return false;
|
||||||
|
return unlink($file);
|
||||||
|
}
|
||||||
|
|
||||||
|
function flush($config) {
|
||||||
|
$dir = $this->generateDirectoryPath($config);
|
||||||
|
$dh = opendir($dir);
|
||||||
|
while (false !== ($filename = readdir($dh))) {
|
||||||
|
if (empty($filename)) continue;
|
||||||
|
if ($filename[0] === '.') continue;
|
||||||
|
unlink($dir . '/' . $filename);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
function cleanup($config) {
|
||||||
|
$this->_prepareDir($config);
|
||||||
|
$dir = $this->generateDirectoryPath($config);
|
||||||
|
$dh = opendir($dir);
|
||||||
|
while (false !== ($filename = readdir($dh))) {
|
||||||
|
if (empty($filename)) continue;
|
||||||
|
if ($filename[0] === '.') continue;
|
||||||
|
$key = substr($filename, 0, strlen($filename) - 4);
|
||||||
|
if ($this->isOld($key, $config)) unlink($dir . '/' . $filename);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Generates the file path to the serial file corresponding to
|
||||||
|
* the configuration and definition name
|
||||||
|
*/
|
||||||
|
function generateFilePath($config) {
|
||||||
|
$key = $this->generateKey($config);
|
||||||
|
return $this->generateDirectoryPath($config) . '/' . $key . '.ser';
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Generates the path to the directory containing this cache's serial files
|
||||||
|
* @note No trailing slash
|
||||||
|
*/
|
||||||
|
function generateDirectoryPath($config) {
|
||||||
|
$base = $config->get('Cache', 'SerializerPath');
|
||||||
|
$base = is_null($base) ? dirname(__FILE__) . '/Serializer' : $base;
|
||||||
|
return $base . '/' . $this->type;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Convenience wrapper function for file_put_contents
|
||||||
|
* @param $file File name to write to
|
||||||
|
* @param $data Data to write into file
|
||||||
|
* @return Number of bytes written if success, or false if failure.
|
||||||
|
*/
|
||||||
|
function _write($file, $data) {
|
||||||
|
static $file_put_contents;
|
||||||
|
if ($file_put_contents === null) {
|
||||||
|
$file_put_contents = function_exists('file_put_contents');
|
||||||
|
}
|
||||||
|
if ($file_put_contents) {
|
||||||
|
return file_put_contents($file, $data);
|
||||||
|
}
|
||||||
|
$fh = fopen($file, 'w');
|
||||||
|
if (!$fh) return false;
|
||||||
|
$status = fwrite($fh, $data);
|
||||||
|
fclose($fh);
|
||||||
|
return $status;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Prepares the directory that this type stores the serials in
|
||||||
|
*/
|
||||||
|
function _prepareDir($config) {
|
||||||
|
$directory = $this->generateDirectoryPath($config);
|
||||||
|
if (!is_dir($directory)) {
|
||||||
|
mkdir($directory);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
?>
|
90
library/HTMLPurifier/DefinitionCacheFactory.php
Normal file
@@ -0,0 +1,90 @@
|
|||||||
|
<?php
|
||||||
|
|
||||||
|
require_once 'HTMLPurifier/DefinitionCache.php';
|
||||||
|
|
||||||
|
HTMLPurifier_ConfigSchema::define(
|
||||||
|
'Core', 'DefinitionCache', 'Serializer', 'string/null', '
|
||||||
|
This directive defines which method to use when caching definitions,
|
||||||
|
the complex data-type that makes HTML Purifier tick. Set to null
|
||||||
|
to disable caching (not recommended, as you will see a definite
|
||||||
|
performance degradation). This directive has been available since 2.0.0.
|
||||||
|
');
|
||||||
|
|
||||||
|
HTMLPurifier_ConfigSchema::defineAllowedValues(
|
||||||
|
'Core', 'DefinitionCache', array('Serializer')
|
||||||
|
);
|
||||||
|
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Responsible for creating definition caches.
|
||||||
|
*/
|
||||||
|
class HTMLPurifier_DefinitionCacheFactory
|
||||||
|
{
|
||||||
|
|
||||||
|
var $caches = array('Serializer' => array());
|
||||||
|
var $decorators = array();
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Initialize default decorators
|
||||||
|
*/
|
||||||
|
function setup() {
|
||||||
|
$this->addDecorator('Cleanup');
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Retrieves an instance of global definition cache factory.
|
||||||
|
* @static
|
||||||
|
*/
|
||||||
|
static function &instance($prototype = null) {
|
||||||
|
static $instance;
|
||||||
|
if ($prototype !== null) {
|
||||||
|
$instance = $prototype;
|
||||||
|
} elseif ($instance === null || $prototype === true) {
|
||||||
|
$instance = new HTMLPurifier_DefinitionCacheFactory();
|
||||||
|
$instance->setup();
|
||||||
|
}
|
||||||
|
return $instance;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Factory method that creates a cache object based on configuration
|
||||||
|
* @param $type Name of definitions handled by cache
|
||||||
|
* @param $config Instance of HTMLPurifier_Config
|
||||||
|
*/
|
||||||
|
function &create($type, $config) {
|
||||||
|
// only one implementation as of right now, $config will
|
||||||
|
// be used to determine implementation
|
||||||
|
$method = $config->get('Core', 'DefinitionCache');
|
||||||
|
if ($method === null) {
|
||||||
|
$null = new HTMLPurifier_DefinitionCache_Null($type);
|
||||||
|
return $null;
|
||||||
|
}
|
||||||
|
if (!empty($this->caches[$method][$type])) {
|
||||||
|
return $this->caches[$method][$type];
|
||||||
|
}
|
||||||
|
$cache = new HTMLPurifier_DefinitionCache_Serializer($type);
|
||||||
|
foreach ($this->decorators as $decorator) {
|
||||||
|
$new_cache = $decorator->decorate($cache);
|
||||||
|
// prevent infinite recursion in PHP 4
|
||||||
|
unset($cache);
|
||||||
|
$cache = $new_cache;
|
||||||
|
}
|
||||||
|
$this->caches[$method][$type] = $cache;
|
||||||
|
return $this->caches[$method][$type];
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Registers a decorator to add to all new cache objects
|
||||||
|
* @param $decorator An instance or the name of a decorator
|
||||||
|
*/
|
||||||
|
function addDecorator($decorator) {
|
||||||
|
if (is_string($decorator)) {
|
||||||
|
$class = "HTMLPurifier_DefinitionCache_Decorator_$decorator";
|
||||||
|
$decorator = new $class;
|
||||||
|
}
|
||||||
|
$this->decorators[$decorator->name] = $decorator;
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
?>
|
55
library/HTMLPurifier/Doctype.php
Normal file
@@ -0,0 +1,55 @@
|
|||||||
|
<?php
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Represents a document type, contains information on which modules
|
||||||
|
* need to be loaded.
|
||||||
|
*/
|
||||||
|
class HTMLPurifier_Doctype
|
||||||
|
{
|
||||||
|
/**
|
||||||
|
* Full name of doctype
|
||||||
|
*/
|
||||||
|
var $name;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* List of standard modules (string identifiers or literal objects)
|
||||||
|
* that this doctype uses
|
||||||
|
*/
|
||||||
|
var $modules = array();
|
||||||
|
|
||||||
|
/**
|
||||||
|
* List of modules to use for tidying up code
|
||||||
|
*/
|
||||||
|
var $tidyModules = array();
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Is the language derived from XML (i.e. XHTML)?
|
||||||
|
*/
|
||||||
|
var $xml = true;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* List of aliases for this doctype
|
||||||
|
*/
|
||||||
|
var $aliases = array();
|
||||||
|
|
||||||
|
function HTMLPurifier_Doctype($name = null, $xml = true, $modules = array(),
|
||||||
|
$tidyModules = array(), $aliases = array()
|
||||||
|
) {
|
||||||
|
$this->name = $name;
|
||||||
|
$this->xml = $xml;
|
||||||
|
$this->modules = $modules;
|
||||||
|
$this->tidyModules = $tidyModules;
|
||||||
|
$this->aliases = $aliases;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Clones the doctype, use before resolving modes and the like
|
||||||
|
*/
|
||||||
|
function copy() {
|
||||||
|
return new HTMLPurifier_Doctype(
|
||||||
|
$this->name, $this->xml, $this->modules, $this->tidyModules, $this->aliases
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
?>
|
125
library/HTMLPurifier/DoctypeRegistry.php
Normal file
@@ -0,0 +1,125 @@
+<?php
+
+require_once 'HTMLPurifier/Doctype.php';
+
+// Legacy directives for doctype specification
+HTMLPurifier_ConfigSchema::define(
+    'HTML', 'Strict', false, 'bool',
+    'Determines whether or not to use Transitional (loose) or Strict rulesets. '.
+    'This directive is deprecated in favor of %HTML.Doctype. '.
+    'This directive has been available since 1.3.0.'
+);
+
+HTMLPurifier_ConfigSchema::define(
+    'HTML', 'XHTML', true, 'bool',
+    'Determines whether or not output is XHTML 1.0 or HTML 4.01 flavor. '.
+    'This directive is deprecated in favor of %HTML.Doctype. '.
+    'This directive was available since 1.1.'
+);
+HTMLPurifier_ConfigSchema::defineAlias('Core', 'XHTML', 'HTML', 'XHTML');
+
+class HTMLPurifier_DoctypeRegistry
+{
+
+    /**
+     * Hash of doctype names to doctype objects
+     * @protected
+     */
+    var $doctypes;
+
+    /**
+     * Lookup table of aliases to real doctype names
+     * @protected
+     */
+    var $aliases;
+
+    /**
+     * Registers a doctype to the registry
+     * @note Accepts a fully-formed doctype object, or the
+     * parameters for constructing a doctype object
+     * @param $doctype Name of doctype or literal doctype object
+     * @param $modules Modules doctype will load
+     * @param $tidy_modules Modules doctype will use for tidying up code
+     * @param $aliases Alias names for doctype
+     * @return Reference to registered doctype (usable for further editing)
+     */
+    function &register($doctype, $xml = true, $modules = array(),
+        $tidy_modules = array(), $aliases = array()
+    ) {
+        if (!is_array($modules)) $modules = array($modules);
+        if (!is_array($tidy_modules)) $tidy_modules = array($tidy_modules);
+        if (!is_array($aliases)) $aliases = array($aliases);
+        if (!is_object($doctype)) {
+            $doctype = new HTMLPurifier_Doctype(
+                $doctype, $xml, $modules, $tidy_modules, $aliases
+            );
+        }
+        $this->doctypes[$doctype->name] =& $doctype;
+        $name = $doctype->name;
+        // hookup aliases
+        foreach ($doctype->aliases as $alias) {
+            if (isset($this->doctypes[$alias])) continue;
+            $this->aliases[$alias] = $name;
+        }
+        // remove old aliases
+        if (isset($this->aliases[$name])) unset($this->aliases[$name]);
+        return $doctype;
+    }
+
+    /**
+     * Retrieves reference to a doctype of a certain name
+     * @note This function resolves aliases
+     * @note When possible, use the more fully-featured make()
+     * @param $doctype Name of doctype
+     * @return Reference to doctype object
+     */
+    function &get($doctype) {
+        if (isset($this->aliases[$doctype])) $doctype = $this->aliases[$doctype];
+        if (!isset($this->doctypes[$doctype])) {
+            trigger_error('Doctype ' . htmlspecialchars($doctype) . ' does not exist');
+            $anon = new HTMLPurifier_Doctype($doctype);
+            return $anon;
+        }
+        return $this->doctypes[$doctype];
+    }
+
+    /**
+     * Creates a doctype based on a configuration object,
+     * will perform initialization on the doctype
+     * @note Use this function to get a copy of doctype that config
+     * can hold on to (this is necessary in order to tell
+     * Generator whether or not the current document is XML
+     * based or not).
+     */
+    function make($config) {
+        $original_doctype = $this->get($this->getDoctypeFromConfig($config));
+        $doctype = $original_doctype->copy();
+        return $doctype;
+    }
+
+    /**
+     * Retrieves the doctype from the configuration object
+     */
+    function getDoctypeFromConfig($config) {
+        // recommended test
+        $doctype = $config->get('HTML', 'Doctype');
+        if ($doctype !== null) {
+            return $doctype;
+        }
+        // backwards-compatibility
+        if ($config->get('HTML', 'XHTML')) {
+            $doctype = 'XHTML 1.0';
+        } else {
+            $doctype = 'HTML 4.01';
+        }
+        if ($config->get('HTML', 'Strict')) {
+            $doctype .= ' Strict';
+        } else {
+            $doctype .= ' Transitional';
+        }
+        return $doctype;
+    }
+
+}
+
+?>
@@ -51,6 +51,8 @@ class HTMLPurifier_ElementDef
      * Abstract string representation of internal ChildDef rules. See
      * HTMLPurifier_ContentSets for how this is parsed and then transformed
      * into an HTMLPurifier_ChildDef.
+     * @warning This is a temporary variable that is not available after
+     * being processed by HTMLDefinition
      * @public
      */
     var $content_model;
@@ -58,6 +60,9 @@ class HTMLPurifier_ElementDef
     /**
      * Value of $child->type, used to determine which ChildDef to use,
      * used in combination with $content_model.
+     * @warning This must be lowercase
+     * @warning This is a temporary variable that is not available after
+     * being processed by HTMLDefinition
      * @public
      */
     var $content_model_type;
@@ -78,14 +83,47 @@ class HTMLPurifier_ElementDef
      * have to worry about this one.
      * @public
      */
-    var $descendants_are_inline;
+    var $descendants_are_inline = false;
+
+    /**
+     * List of the names of required attributes this element has. Dynamically
+     * populated.
+     * @public
+     */
+    var $required_attr = array();
 
     /**
      * Lookup table of tags excluded from all descendants of this tag.
+     * @note SGML permits exclusions for all descendants, but this is
+     * not possible with DTDs or XML Schemas. W3C has elected to
+     * use complicated compositions of content_models to simulate
+     * exclusion for children, but we go the simpler, SGML-style
+     * route of flat-out exclusions, which correctly apply to
+     * all descendants and not just children. Note that the XHTML
+     * Modularization Abstract Modules are blithely unaware of such
+     * distinctions.
      * @public
      */
     var $excludes = array();
+
+    /**
+     * Is this element safe for untrusted users to use?
+     */
+    var $safe;
+
+    /**
+     * Low-level factory constructor for creating new standalone element defs
+     * @static
+     */
+    function create($safe, $content_model, $content_model_type, $attr) {
+        $def = new HTMLPurifier_ElementDef();
+        $def->safe = (bool) $safe;
+        $def->content_model = $content_model;
+        $def->content_model_type = $content_model_type;
+        $def->attr = $attr;
+        return $def;
+    }
 
     /**
      * Merges the values of another element definition into this one.
      * Values from the new element def take precedence if a value is
@@ -99,24 +137,57 @@ class HTMLPurifier_ElementDef
                 // merge in the includes
                 // sorry, no way to override an include
                 foreach ($v as $v2) {
-                    $def->attr[0][] = $v2;
+                    $this->attr[0][] = $v2;
                 }
                 continue;
             }
+            if ($v === false) {
+                if (isset($this->attr[$k])) unset($this->attr[$k]);
+                continue;
+            }
             $this->attr[$k] = $v;
         }
-        foreach($def->attr_transform_pre as $k => $v) $this->attr_transform_pre[$k] = $v;
-        foreach($def->attr_transform_post as $k => $v) $this->attr_transform_post[$k] = $v;
-        foreach($def->auto_close as $k => $v) $this->auto_close[$k] = $v;
-        foreach($def->excludes as $k => $v) $this->excludes[$k] = $v;
+        $this->_mergeAssocArray($this->attr_transform_pre, $def->attr_transform_pre);
+        $this->_mergeAssocArray($this->attr_transform_post, $def->attr_transform_post);
+        $this->_mergeAssocArray($this->auto_close, $def->auto_close);
+        $this->_mergeAssocArray($this->excludes, $def->excludes);
 
+        if(!empty($def->content_model)) {
+            $this->content_model .= ' | ' . $def->content_model;
+            $this->child = false;
+        }
+        if(!empty($def->content_model_type)) {
+            $this->content_model_type = $def->content_model_type;
+            $this->child = false;
+        }
         if(!is_null($def->child)) $this->child = $def->child;
-        if(!empty($def->content_model)) $this->content_model .= ' | ' . $def->content_model;
-        if(!empty($def->content_model_type)) $this->content_model_type = $def->content_model_type;
-        if(!is_null($def->descendants_are_inline)) $this->descendants_are_inline = $def->descendants_are_inline;
+        if($def->descendants_are_inline) $this->descendants_are_inline = $def->descendants_are_inline;
+        if(!is_null($def->safe)) $this->safe = $def->safe;
 
     }
 
+    /**
+     * Merges one array into another, removes values which equal false
+     * @param $a1 Array by reference that is merged into
+     * @param $a2 Array that merges into $a1
+     */
+    function _mergeAssocArray(&$a1, $a2) {
+        foreach ($a2 as $k => $v) {
+            if ($v === false) {
+                if (isset($a1[$k])) unset($a1[$k]);
+                continue;
+            }
+            $a1[$k] = $v;
+        }
+    }
+
+    /**
+     * Retrieves a copy of the element definition
+     */
+    function copy() {
+        return unserialize(serialize($this));
+    }
+
 }
 
 ?>
@@ -1,7 +1,5 @@
 <?php
 
-require_once 'HTMLPurifier/EntityLookup.php';
-
 HTMLPurifier_ConfigSchema::define(
     'Core', 'Encoding', 'utf-8', 'istring',
     'If for some reason you are unable to convert all webpages to UTF-8, '.
@@ -24,8 +24,8 @@ class HTMLPurifier_EntityParser
      * @protected
      */
     var $_substituteEntitiesRegex =
-'/&(?:[#]x([a-fA-F0-9]+)|[#]0*(\d+)|([A-Za-z]+));?/';
-// 1. hex 2. dec 3. string
+'/&(?:[#]x([a-fA-F0-9]+)|[#]0*(\d+)|([A-Za-z_:][A-Za-z0-9.\-_:]*));?/';
+// 1. hex 2. dec 3. string (XML style)
 
 
     /**
@@ -97,7 +97,6 @@ class HTMLPurifier_EntityParser
         } else {
             if (isset($this->_special_ent2dec[$matches[3]])) return $entity;
             if (!$this->_entity_lookup) {
-                require_once 'HTMLPurifier/EntityLookup.php';
                 $this->_entity_lookup = HTMLPurifier_EntityLookup::instance();
             }
             if (isset($this->_entity_lookup->table[$matches[3]])) {
73
library/HTMLPurifier/ErrorCollector.php
Normal file
@@ -0,0 +1,73 @@
+<?php
+
+require_once 'HTMLPurifier/Generator.php';
+
+/**
+ * Error collection class that enables HTML Purifier to report HTML
+ * problems back to the user
+ */
+class HTMLPurifier_ErrorCollector
+{
+
+    var $errors = array();
+
+    /**
+     * Sends an error message to the collector for later use
+     * @param string Error message text
+     * @param HTMLPurifier_Token Token that caused error
+     * @param array Tokens surrounding the offending token above, use true as placeholder
+     */
+    function send($msg, $token, $context_tokens = array(true)) {
+        $this->errors[] = array($msg, $token, $context_tokens);
+    }
+
+    /**
+     * Retrieves raw error data for custom formatter to use
+     * @return List of arrays in format of array(Error message text,
+     * token that caused error, tokens surrounding token)
+     */
+    function getRaw() {
+        return $this->errors;
+    }
+
+    /**
+     * Default HTML formatting implementation for error messages
+     * @param $config Configuration array, vital for HTML output nature
+     */
+    function getHTMLFormatted($config) {
+        $generator = new HTMLPurifier_Generator();
+        $context = new HTMLPurifier_Context();
+        $generator->generateFromTokens(array(), $config, $context); // initialize
+        $ret = array();
+
+        $errors = $this->errors;
+
+        // sort error array by line
+        if ($config->get('Core', 'MaintainLineNumbers')) {
+            $lines = array();
+            foreach ($errors as $error) $lines[] = $error[1]->line;
+            array_multisort($lines, SORT_ASC, $errors);
+        }
+
+        foreach ($errors as $error) {
+            $string = $generator->escape($error[0]); // message
+            if (!empty($error[1]->line)) {
+                $string .= ' at line ' . $error[1]->line;
+            }
+            $string .= ' (<code>';
+            foreach ($error[2] as $token) {
+                if ($token !== true) {
+                    $string .= $generator->escape($generator->generateFromToken($token));
+                } else {
+                    $string .= '<strong>' . $generator->escape($generator->generateFromToken($error[1])) . '</strong>';
+                }
+            }
+            $string .= '</code>)';
+            $ret[] = $string;
+        }
+        return $ret;
+    }
+
+}
+
+?>
@@ -1,59 +1,65 @@
 <?php
 
-require_once 'HTMLPurifier/Lexer.php';
-
 HTMLPurifier_ConfigSchema::define(
-    'Core', 'CleanUTF8DuringGeneration', false, 'bool',
-    'When true, HTMLPurifier_Generator will also check all strings it '.
-    'escapes for UTF-8 well-formedness as a defense in depth measure. '.
-    'This could cause a considerable performance impact, and is not '.
-    'strictly necessary due to the fact that the Lexers should have '.
-    'ensured that all the UTF-8 strings were well-formed. Note that '.
-    'the configuration value is only read at the beginning of '.
-    'generateFromTokens.'
-);
-
-HTMLPurifier_ConfigSchema::define(
-    'Core', 'XHTML', true, 'bool',
-    'Determines whether or not output is XHTML or not. When disabled, HTML '.
-    'Purifier goes into HTML 4.01 removes XHTML-specific markup constructs, '.
-    'such as boolean attribute expansion and trailing slashes in empty tags. '.
-    'This directive was available since 1.1.'
+    'Output', 'CommentScriptContents', true, 'bool',
+    'Determines whether or not HTML Purifier should attempt to fix up '.
+    'the contents of script tags for legacy browsers with comments. This '.
+    'directive was available since 1.7.'
 );
+HTMLPurifier_ConfigSchema::defineAlias('Core', 'CommentScriptContents', 'Output', 'CommentScriptContents');
 
 // extension constraints could be factored into ConfigSchema
 HTMLPurifier_ConfigSchema::define(
-    'Core', 'TidyFormat', false, 'bool',
-    '<p>Determines whether or not to run Tidy on the final output for pretty '.
-    'formatting reasons, such as indentation and wrap.</p><p>This can greatly '.
-    'improve readability for editors who are hand-editing the HTML, but is '.
-    'by no means necessary as HTML Purifier has already fixed all major '.
-    'errors the HTML may have had. Tidy is a non-default extension, and this directive '.
-    'will silently fail if Tidy is not available.</p><p>If you are looking to make '.
-    'the overall look of your page\'s source better, I recommend running Tidy '.
-    'on the entire page rather than just user-content (after all, the '.
-    'indentation relative to the containing blocks will be incorrect).</p><p>This '.
-    'directive was available since 1.1.1.</p>'
+    'Output', 'TidyFormat', false, 'bool', <<<HTML
+<p>
+    Determines whether or not to run Tidy on the final output for pretty
+    formatting reasons, such as indentation and wrap.
+</p>
+<p>
+    This can greatly improve readability for editors who are hand-editing
+    the HTML, but is by no means necessary as HTML Purifier has already
+    fixed all major errors the HTML may have had. Tidy is a non-default
+    extension, and this directive will silently fail if Tidy is not
+    available.
+</p>
+<p>
+    If you are looking to make the overall look of your page's source
+    better, I recommend running Tidy on the entire page rather than just
+    user-content (after all, the indentation relative to the containing
+    blocks will be incorrect).
+</p>
+<p>
+    This directive was available since 1.1.1.
+</p>
+HTML
 );
+HTMLPurifier_ConfigSchema::defineAlias('Core', 'TidyFormat', 'Output', 'TidyFormat');
 
 /**
  * Generates HTML from tokens.
+ * @todo Create a configuration-wide instance that all objects retrieve
  */
 class HTMLPurifier_Generator
 {
 
     /**
-     * Bool cache of %Core.CleanUTF8DuringGeneration
-     * @private
-     */
-    var $_clean_utf8 = false;
-
-    /**
-     * Bool cache of %Core.XHTML
+     * Bool cache of %HTML.XHTML
      * @private
      */
     var $_xhtml = true;
 
+    /**
+     * Bool cache of %Output.CommentScriptContents
+     * @private
+     */
+    var $_scriptFix = false;
+
+    /**
+     * Cache of HTMLDefinition
+     * @private
+     */
+    var $_def;
+
     /**
      * Generates HTML from an array of tokens.
      * @param $tokens Array of HTMLPurifier_Token
@@ -63,13 +69,24 @@ class HTMLPurifier_Generator
     function generateFromTokens($tokens, $config, &$context) {
         $html = '';
         if (!$config) $config = HTMLPurifier_Config::createDefault();
-        $this->_clean_utf8 = $config->get('Core', 'CleanUTF8DuringGeneration');
-        $this->_xhtml = $config->get('Core', 'XHTML');
+        $this->_scriptFix = $config->get('Output', 'CommentScriptContents');
+
+        $this->_def = $config->getHTMLDefinition();
+        $this->_xhtml = $this->_def->doctype->xml;
+
         if (!$tokens) return '';
-        foreach ($tokens as $token) {
-            $html .= $this->generateFromToken($token);
+        for ($i = 0, $size = count($tokens); $i < $size; $i++) {
+            if ($this->_scriptFix && $tokens[$i]->name === 'script') {
+                // script special case
+                $html .= $this->generateFromToken($tokens[$i++]);
+                $html .= $this->generateScriptFromToken($tokens[$i++]);
+                while ($tokens[$i]->name != 'script') {
+                    $html .= $this->generateScriptFromToken($tokens[$i++]);
+                }
+            }
+            $html .= $this->generateFromToken($tokens[$i]);
         }
-        if ($config->get('Core', 'TidyFormat') && extension_loaded('tidy')) {
+        if ($config->get('Output', 'TidyFormat') && extension_loaded('tidy')) {
 
             $tidy_options = array(
                 'indent'=> true,
@@ -104,14 +121,14 @@ class HTMLPurifier_Generator
     function generateFromToken($token) {
         if (!isset($token->type)) return '';
         if ($token->type == 'start') {
-            $attr = $this->generateAttributes($token->attr);
+            $attr = $this->generateAttributes($token->attr, $token->name);
             return '<' . $token->name . ($attr ? ' ' : '') . $attr . '>';
 
         } elseif ($token->type == 'end') {
             return '</' . $token->name . '>';
 
         } elseif ($token->type == 'empty') {
-            $attr = $this->generateAttributes($token->attr);
+            $attr = $this->generateAttributes($token->attr, $token->name);
             return '<' . $token->name . ($attr ? ' ' : '') . $attr .
                 ( $this->_xhtml ? ' /': '' )
                 . '>';
@@ -125,18 +142,33 @@ class HTMLPurifier_Generator
         }
     }
 
+    /**
+     * Special case processor for the contents of script tags
+     * @warning This runs into problems if there's already a literal
+     * --> somewhere inside the script contents.
+     */
+    function generateScriptFromToken($token) {
+        if ($token->type != 'text') return $this->generateFromToken($token);
+        return '<!--' . PHP_EOL . $token->data . PHP_EOL . '// -->';
+        // more advanced version:
+        // return '<!--//--><![CDATA[//><!--' . PHP_EOL . $token->data . PHP_EOL . '//--><!]]>';
+    }
+
     /**
      * Generates attribute declarations from attribute array.
      * @param $assoc_array_of_attributes Attribute array
      * @return Generate HTML fragment for insertion.
      */
-    function generateAttributes($assoc_array_of_attributes) {
+    function generateAttributes($assoc_array_of_attributes, $element) {
         $html = '';
         foreach ($assoc_array_of_attributes as $key => $value) {
             if (!$this->_xhtml) {
                 // remove namespaced attributes
                 if (strpos($key, ':') !== false) continue;
-                // also needed: check for attribute minimization
+                if (!empty($this->_def->info[$element]->attr[$key]->minimized)) {
+                    $html .= $key . ' ';
+                    continue;
+                }
             }
             $html .= $key.'="'.$this->escape($value).'" ';
         }
@@ -149,7 +181,6 @@ class HTMLPurifier_Generator
      * @return String escaped data.
      */
     function escape($string) {
-        if ($this->_clean_utf8) $string = HTMLPurifier_Lexer::cleanUTF8($string);
         return htmlspecialchars($string, ENT_COMPAT, 'UTF-8');
     }
 
@@ -1,61 +1,132 @@
 <?php
 
-// components
+require_once 'HTMLPurifier/Definition.php';
 require_once 'HTMLPurifier/HTMLModuleManager.php';
 
 // this definition and its modules MUST NOT define configuration directives
 // outside of the HTML or Attr namespaces
 
-// will be superceded by more accurate doctype declaration schemes
 HTMLPurifier_ConfigSchema::define(
-    'HTML', 'Strict', false, 'bool',
-    'Determines whether or not to use Transitional (loose) or Strict rulesets. '.
-    'This directive has been available since 1.3.0.'
-);
+    'HTML', 'DefinitionID', null, 'string/null', '
+<p>
+    Unique identifier for a custom-built HTML definition. If you edit
+    the raw version of the HTMLDefinition, introducing changes that the
+    configuration object does not reflect, you must specify this variable.
+    If you change your custom edits, you should change this directive, or
+    clear your cache. Example:
+</p>
+<pre>
+$config = HTMLPurifier_Config::createDefault();
+$config->set(\'HTML\', \'DefinitionID\', \'1\');
+$def = $config->getHTMLDefinition();
+$def->addAttribute(\'a\', \'tabindex\', \'Number\');
+</pre>
+<p>
+    In the above example, the configuration is still at the defaults, but
+    using the advanced API, an extra attribute has been added. The
+    configuration object normally has no way of knowing that this change
+    has taken place, so it needs an extra directive: %HTML.DefinitionID.
+    If someone else attempts to use the default configuration, these two
+    pieces of code will not clobber each other in the cache, since one has
+    an extra directive attached to it.
+</p>
+<p>
+    This directive has been available since 2.0.0, and in that version or
+    later you <em>must</em> specify a value to this directive to use the
+    advanced API features.
+</p>
+');
 
 HTMLPurifier_ConfigSchema::define(
-    'HTML', 'BlockWrapper', 'p', 'string',
-    'String name of element to wrap inline elements that are inside a block '.
-    'context. This only occurs in the children of blockquote in strict mode. '.
-    'Example: by default value, <code>&lt;blockquote&gt;Foo&lt;/blockquote&gt;</code> '.
-    'would become <code>&lt;blockquote&gt;&lt;p&gt;Foo&lt;/p&gt;&lt;/blockquote&gt;</code>. The '.
-    '<code>&lt;p&gt;</code> tags can be replaced '.
-    'with whatever you desire, as long as it is a block level element. '.
-    'This directive has been available since 1.3.0.'
-);
+    'HTML', 'DefinitionRev', 1, 'int', '
+<p>
+    Revision identifier for your custom definition specified in
+    %HTML.DefinitionID. This serves the same purpose: uniquely identifying
+    your custom definition, but this one does so in a chronological
+    context: revision 3 is more up-to-date than revision 2. Thus, when
+    this gets incremented, the cache handling is smart enough to clean
+    up any older revisions of your definition as well as flush the
+    cache. This directive has been available since 2.0.0.
+</p>
+');
 
 HTMLPurifier_ConfigSchema::define(
-    'HTML', 'Parent', 'div', 'string',
-    'String name of element that HTML fragment passed to library will be '.
-    'inserted in. An interesting variation would be using span as the '.
-    'parent element, meaning that only inline tags would be allowed. '.
-    'This directive has been available since 1.3.0.'
-);
+    'HTML', 'BlockWrapper', 'p', 'string', '
+<p>
+    String name of element to wrap inline elements that are inside a block
+    context. This only occurs in the children of blockquote in strict mode.
+</p>
+<p>
+    Example: by default value,
+    <code>&lt;blockquote&gt;Foo&lt;/blockquote&gt;</code> would become
+    <code>&lt;blockquote&gt;&lt;p&gt;Foo&lt;/p&gt;&lt;/blockquote&gt;</code>.
+    The <code>&lt;p&gt;</code> tags can be replaced with whatever you desire,
+    as long as it is a block level element. This directive has been available
+    since 1.3.0.
+</p>
+');
 
 HTMLPurifier_ConfigSchema::define(
-    'HTML', 'AllowedElements', null, 'lookup/null',
-    'If HTML Purifier\'s tag set is unsatisfactory for your needs, you '.
-    'can overload it with your own list of tags to allow. Note that this '.
-    'method is subtractive: it does its job by taking away from HTML Purifier '.
-    'usual feature set, so you cannot add a tag that HTML Purifier never '.
-    'supported in the first place (like embed, form or head). If you change this, you '.
-    'probably also want to change %HTML.AllowedAttributes. '.
-    '<strong>Warning:</strong> If another directive conflicts with the '.
-    'elements here, <em>that</em> directive will win and override. '.
-    'This directive has been available since 1.3.0.'
-);
+    'HTML', 'Parent', 'div', 'string', '
+<p>
+    String name of element that HTML fragment passed to library will be
+    inserted in. An interesting variation would be using span as the
+    parent element, meaning that only inline tags would be allowed.
+    This directive has been available since 1.3.0.
+</p>
+');
 
 HTMLPurifier_ConfigSchema::define(
-    'HTML', 'AllowedAttributes', null, 'lookup/null',
-    'IF HTML Purifier\'s attribute set is unsatisfactory, overload it! '.
-    'The syntax is \'tag.attr\' or \'*.attr\' for the global attributes '.
-    '(style, id, class, dir, lang, xml:lang).'.
-    '<strong>Warning:</strong> If another directive conflicts with the '.
-    'elements here, <em>that</em> directive will win and override. For '.
-    'example, %HTML.EnableAttrID will take precedence over *.id in this '.
-    'directive. You must set that directive to true before you can use '.
-    'IDs at all. This directive has been available since 1.3.0.'
-);
+    'HTML', 'AllowedElements', null, 'lookup/null', '
+<p>
+    If HTML Purifier\'s tag set is unsatisfactory for your needs, you
+    can overload it with your own list of tags to allow. Note that this
+    method is subtractive: it does its job by taking away from HTML Purifier\'s
+    usual feature set, so you cannot add a tag that HTML Purifier never
+    supported in the first place (like embed, form or head). If you
+    change this, you probably also want to change %HTML.AllowedAttributes.
+</p>
+<p>
+    <strong>Warning:</strong> If another directive conflicts with the
+    elements here, <em>that</em> directive will win and override.
+    This directive has been available since 1.3.0.
+</p>
+');
+
HTMLPurifier_ConfigSchema::define(
|
||||||
|
'HTML', 'AllowedAttributes', null, 'lookup/null', '
|
||||||
|
<p>
|
||||||
|
If HTML Purifier\'s attribute set is unsatisfactory, overload it!
|
||||||
|
The syntax is "tag.attr" or "*.attr" for the global attributes
|
||||||
|
(style, id, class, dir, lang, xml:lang).
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
<strong>Warning:</strong> If another directive conflicts with the
|
||||||
|
elements here, <em>that</em> directive will win and override. For
|
||||||
|
example, %HTML.EnableAttrID will take precedence over *.id in this
|
||||||
|
directive. You must set that directive to true before you can use
|
||||||
|
IDs at all. This directive has been available since 1.3.0.
|
||||||
|
</p>
|
||||||
|
');
|
||||||
|
|
||||||
|
HTMLPurifier_ConfigSchema::define(
|
||||||
|
'HTML', 'Allowed', null, 'string/null', '
|
||||||
|
<p>
|
||||||
|
This is a convenience directive that rolls the functionality of
|
||||||
|
%HTML.AllowedElements and %HTML.AllowedAttributes into one directive.
|
||||||
|
Specify elements and attributes that are allowed using:
|
||||||
|
<code>element1[attr1|attr2],element2...</code>.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
<strong>Warning</strong>:
|
||||||
|
All of the constraints on the component directives are still enforced.
|
||||||
|
The syntax is a <em>subset</em> of TinyMCE\'s <code>valid_elements</code>
|
||||||
|
whitelist: directly copy-pasting it here will probably result in
|
||||||
|
broken whitelists. If %HTML.AllowedElements or %HTML.AllowedAttributes
|
||||||
|
are set, this directive has no effect.
|
||||||
|
This directive has been available since 2.0.0.
|
||||||
|
</p>
|
||||||
|
');
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Definition of the purified HTML that describes allowed children,
|
* Definition of the purified HTML that describes allowed children,
|
||||||
@@ -77,10 +148,10 @@ HTMLPurifier_ConfigSchema::define(
|
|||||||
* HTMLPurifier_Printer_HTMLDefinition is a notable exception to this
|
* HTMLPurifier_Printer_HTMLDefinition is a notable exception to this
|
||||||
* rule: in the interest of comprehensiveness, it will sniff everything.
|
* rule: in the interest of comprehensiveness, it will sniff everything.
|
||||||
*/
|
*/
|
||||||
class HTMLPurifier_HTMLDefinition
|
class HTMLPurifier_HTMLDefinition extends HTMLPurifier_Definition
|
||||||
{
|
{
|
||||||
|
|
||||||
/** FULLY-PUBLIC VARIABLES */
|
// FULLY-PUBLIC VARIABLES ---------------------------------------------
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Associative array of element names to HTMLPurifier_ElementDef
|
* Associative array of element names to HTMLPurifier_ElementDef
|
||||||
@@ -139,50 +210,97 @@ class HTMLPurifier_HTMLDefinition
|
|||||||
*/
|
*/
|
||||||
var $info_content_sets = array();
|
var $info_content_sets = array();
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Doctype object
|
||||||
|
*/
|
||||||
|
var $doctype;
|
||||||
|
|
||||||
|
|
||||||
/** PUBLIC BUT INTERNAL VARIABLES */
|
|
||||||
|
|
||||||
var $setup = false; /**< Has setup() been called yet? */
|
// RAW CUSTOMIZATION STUFF --------------------------------------------
|
||||||
var $config; /**< Temporary instance of HTMLPurifier_Config */
|
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Adds a custom attribute to a pre-existing element
|
||||||
|
* @param $element_name String element name to add attribute to
|
||||||
|
* @param $attr_name String name of attribute
|
||||||
|
* @param $def Attribute definition, can be string or object, see
|
||||||
|
* HTMLPurifier_AttrTypes for details
|
||||||
|
*/
|
||||||
|
function addAttribute($element_name, $attr_name, $def) {
|
||||||
|
$module =& $this->getAnonymousModule();
|
||||||
|
$element =& $module->addBlankElement($element_name);
|
||||||
|
$element->attr[$attr_name] = $def;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Adds a custom element to your HTML definition
|
||||||
|
* @note See HTMLPurifier_HTMLModule::addElement for detailed
|
||||||
|
* parameter descriptions.
|
||||||
|
*/
|
||||||
|
function addElement($element_name, $type, $contents, $attr_collections, $attributes) {
|
||||||
|
$module =& $this->getAnonymousModule();
|
||||||
|
// assume that if the user is calling this, the element
|
||||||
|
// is safe. This may not be a good idea
|
||||||
|
$module->addElement($element_name, true, $type, $contents, $attr_collections, $attributes);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Retrieves a reference to the anonymous module, so you can
|
||||||
|
* bust out advanced features without having to make your own
|
||||||
|
* module.
|
||||||
|
*/
|
||||||
|
function &getAnonymousModule() {
|
||||||
|
if (!$this->_anonModule) {
|
||||||
|
$this->_anonModule = new HTMLPurifier_HTMLModule();
|
||||||
|
$this->_anonModule->name = 'Anonymous';
|
||||||
|
}
|
||||||
|
return $this->_anonModule;
|
||||||
|
}
|
||||||
|
|
||||||
|
var $_anonModule;
|
||||||
|
|
||||||
|
|
||||||
|
// PUBLIC BUT INTERNAL VARIABLES --------------------------------------
|
||||||
|
|
||||||
|
var $type = 'HTML';
|
||||||
var $manager; /**< Instance of HTMLPurifier_HTMLModuleManager */
|
var $manager; /**< Instance of HTMLPurifier_HTMLModuleManager */
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Performs low-cost, preliminary initialization.
|
* Performs low-cost, preliminary initialization.
|
||||||
* @param $config Instance of HTMLPurifier_Config
|
|
||||||
*/
|
*/
|
||||||
function HTMLPurifier_HTMLDefinition(&$config) {
|
function HTMLPurifier_HTMLDefinition() {
|
||||||
$this->config =& $config;
|
|
||||||
$this->manager = new HTMLPurifier_HTMLModuleManager();
|
$this->manager = new HTMLPurifier_HTMLModuleManager();
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
function doSetup($config) {
|
||||||
* Processes internals into form usable by HTMLPurifier internals.
|
$this->processModules($config);
|
||||||
* Modifying the definition after calling this function should not
|
$this->setupConfigStuff($config);
|
||||||
* be done.
|
|
||||||
*/
|
|
||||||
function setup() {
|
|
||||||
|
|
||||||
// multiple call guard
|
|
||||||
if ($this->setup) {return;} else {$this->setup = true;}
|
|
||||||
|
|
||||||
$this->processModules();
|
|
||||||
$this->setupConfigStuff();
|
|
||||||
|
|
||||||
unset($this->config);
|
|
||||||
unset($this->manager);
|
unset($this->manager);
|
||||||
|
|
||||||
|
// cleanup some of the element definitions
|
||||||
|
foreach ($this->info as $k => $v) {
|
||||||
|
unset($this->info[$k]->content_model);
|
||||||
|
unset($this->info[$k]->content_model_type);
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Extract out the information from the manager
|
* Extract out the information from the manager
|
||||||
*/
|
*/
|
||||||
function processModules() {
|
function processModules($config) {
|
||||||
|
|
||||||
$this->manager->setup($this->config);
|
if ($this->_anonModule) {
|
||||||
|
// for user specific changes
|
||||||
|
// this is late-loaded so we don't have to deal with PHP4
|
||||||
|
// reference wonky-ness
|
||||||
|
$this->manager->addModule($this->_anonModule);
|
||||||
|
unset($this->_anonModule);
|
||||||
|
}
|
||||||
|
|
||||||
foreach ($this->manager->activeModules as $module) {
|
$this->manager->setup($config);
|
||||||
|
$this->doctype = $this->manager->doctype;
|
||||||
|
|
||||||
|
foreach ($this->manager->modules as $module) {
|
||||||
foreach($module->info_tag_transform as $k => $v) {
|
foreach($module->info_tag_transform as $k => $v) {
|
||||||
if ($v === false) unset($this->info_tag_transform[$k]);
|
if ($v === false) unset($this->info_tag_transform[$k]);
|
||||||
else $this->info_tag_transform[$k] = $v;
|
else $this->info_tag_transform[$k] = $v;
|
||||||
@@ -197,7 +315,7 @@ class HTMLPurifier_HTMLDefinition
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
$this->info = $this->manager->getElements($this->config);
|
$this->info = $this->manager->getElements();
|
||||||
$this->info_content_sets = $this->manager->contentSets->lookup;
|
$this->info_content_sets = $this->manager->contentSets->lookup;
|
||||||
|
|
||||||
}
|
}
|
||||||
@@ -205,9 +323,9 @@ class HTMLPurifier_HTMLDefinition
|
|||||||
/**
|
/**
|
||||||
* Sets up stuff based on config. We need a better way of doing this.
|
* Sets up stuff based on config. We need a better way of doing this.
|
||||||
*/
|
*/
|
||||||
function setupConfigStuff() {
|
function setupConfigStuff($config) {
|
||||||
|
|
||||||
$block_wrapper = $this->config->get('HTML', 'BlockWrapper');
|
$block_wrapper = $config->get('HTML', 'BlockWrapper');
|
||||||
if (isset($this->info_content_sets['Block'][$block_wrapper])) {
|
if (isset($this->info_content_sets['Block'][$block_wrapper])) {
|
||||||
$this->info_block_wrapper = $block_wrapper;
|
$this->info_block_wrapper = $block_wrapper;
|
||||||
} else {
|
} else {
|
||||||
@@ -215,24 +333,33 @@ class HTMLPurifier_HTMLDefinition
|
|||||||
E_USER_ERROR);
|
E_USER_ERROR);
|
||||||
}
|
}
|
||||||
|
|
||||||
$parent = $this->config->get('HTML', 'Parent');
|
$parent = $config->get('HTML', 'Parent');
|
||||||
$def = $this->manager->getElement($parent, $this->config);
|
$def = $this->manager->getElement($parent, true);
|
||||||
if ($def) {
|
if ($def) {
|
||||||
$this->info_parent = $parent;
|
$this->info_parent = $parent;
|
||||||
$this->info_parent_def = $def;
|
$this->info_parent_def = $def;
|
||||||
} else {
|
} else {
|
||||||
trigger_error('Cannot use unrecognized element as parent.',
|
trigger_error('Cannot use unrecognized element as parent.',
|
||||||
E_USER_ERROR);
|
E_USER_ERROR);
|
||||||
$this->info_parent_def = $this->manager->getElement(
|
$this->info_parent_def = $this->manager->getElement($this->info_parent, true);
|
||||||
$this->info_parent, $this->config);
|
|
||||||
}
|
}
|
||||||
|
|
||||||
// support template text
|
// support template text
|
||||||
$support = "(for information on implementing this, see the ".
|
$support = "(for information on implementing this, see the ".
|
||||||
"support forums) ";
|
"support forums) ";
|
||||||
|
|
||||||
// setup allowed elements, SubtractiveWhitelist module
|
// setup allowed elements
|
||||||
$allowed_elements = $this->config->get('HTML', 'AllowedElements');
|
|
||||||
|
$allowed_elements = $config->get('HTML', 'AllowedElements');
|
||||||
|
$allowed_attributes = $config->get('HTML', 'AllowedAttributes');
|
||||||
|
|
||||||
|
if (!is_array($allowed_elements) && !is_array($allowed_attributes)) {
|
||||||
|
$allowed = $config->get('HTML', 'Allowed');
|
||||||
|
if (is_string($allowed)) {
|
||||||
|
list($allowed_elements, $allowed_attributes) = $this->parseTinyMCEAllowedList($allowed);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
if (is_array($allowed_elements)) {
|
if (is_array($allowed_elements)) {
|
||||||
foreach ($this->info as $name => $d) {
|
foreach ($this->info as $name => $d) {
|
||||||
if(!isset($allowed_elements[$name])) unset($this->info[$name]);
|
if(!isset($allowed_elements[$name])) unset($this->info[$name]);
|
||||||
@@ -240,11 +367,11 @@ class HTMLPurifier_HTMLDefinition
|
|||||||
}
|
}
|
||||||
// emit errors
|
// emit errors
|
||||||
foreach ($allowed_elements as $element => $d) {
|
foreach ($allowed_elements as $element => $d) {
|
||||||
|
$element = htmlspecialchars($element);
|
||||||
trigger_error("Element '$element' is not supported $support", E_USER_WARNING);
|
trigger_error("Element '$element' is not supported $support", E_USER_WARNING);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
$allowed_attributes = $this->config->get('HTML', 'AllowedAttributes');
|
|
||||||
$allowed_attributes_mutable = $allowed_attributes; // by copy!
|
$allowed_attributes_mutable = $allowed_attributes; // by copy!
|
||||||
if (is_array($allowed_attributes)) {
|
if (is_array($allowed_attributes)) {
|
||||||
foreach ($this->info_global_attr as $attr_key => $info) {
|
foreach ($this->info_global_attr as $attr_key => $info) {
|
||||||
@@ -271,6 +398,8 @@ class HTMLPurifier_HTMLDefinition
|
|||||||
// emit errors
|
// emit errors
|
||||||
foreach ($allowed_attributes_mutable as $elattr => $d) {
|
foreach ($allowed_attributes_mutable as $elattr => $d) {
|
||||||
list($element, $attribute) = explode('.', $elattr);
|
list($element, $attribute) = explode('.', $elattr);
|
||||||
|
$element = htmlspecialchars($element);
|
||||||
|
$attribute = htmlspecialchars($attribute);
|
||||||
if ($element == '*') {
|
if ($element == '*') {
|
||||||
trigger_error("Global attribute '$attribute' is not ".
|
trigger_error("Global attribute '$attribute' is not ".
|
||||||
"supported in any elements $support",
|
"supported in any elements $support",
|
||||||
@@ -284,6 +413,41 @@ class HTMLPurifier_HTMLDefinition
|
|||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Parses a TinyMCE-flavored Allowed Elements and Attributes list into
|
||||||
|
* separate lists for processing. Format is element[attr1|attr2],element2...
|
||||||
|
* @warning Although it's largely drawn from TinyMCE's implementation,
|
||||||
|
* it is different, and you'll probably have to modify your lists
|
||||||
|
* @param $list String list to parse
|
||||||
|
* @return array($allowed_elements, $allowed_attributes)
|
||||||
|
*/
|
||||||
|
function parseTinyMCEAllowedList($list) {
|
||||||
|
|
||||||
|
$elements = array();
|
||||||
|
$attributes = array();
|
||||||
|
|
||||||
|
$chunks = explode(',', $list);
|
||||||
|
foreach ($chunks as $chunk) {
|
||||||
|
// remove TinyMCE element control characters
|
||||||
|
if (!strpos($chunk, '[')) {
|
||||||
|
$element = $chunk;
|
||||||
|
$attr = false;
|
||||||
|
} else {
|
||||||
|
list($element, $attr) = explode('[', $chunk);
|
||||||
|
}
|
||||||
|
if ($element !== '*') $elements[$element] = true;
|
||||||
|
if (!$attr) continue;
|
||||||
|
$attr = substr($attr, 0, strlen($attr) - 1); // remove trailing ]
|
||||||
|
$attr = explode('|', $attr);
|
||||||
|
foreach ($attr as $key) {
|
||||||
|
$attributes["$element.$key"] = true;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return array($elements, $attributes);
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
}
|
}
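The list format accepted by %HTML.Allowed and parsed by parseTinyMCEAllowedList() above (element[attr1|attr2],element2...) can be illustrated with a small standalone sketch. The function name parse_allowed_list is hypothetical and not part of HTML Purifier; the sketch assumes PHP 7.1 or later, and unlike the method in the diff it trims whitespace, skips empty chunks, and tests strpos() against false explicitly.

```php
<?php
// Hypothetical standalone helper, NOT part of HTML Purifier. It follows the
// same idea as parseTinyMCEAllowedList(): split on ",", then split each
// chunk into an element name and an optional "[attr1|attr2]" suffix.
function parse_allowed_list(string $list): array
{
    $elements = [];
    $attributes = [];
    foreach (explode(',', $list) as $chunk) {
        $chunk = trim($chunk);          // assumption: tolerate stray spaces
        if ($chunk === '') {
            continue;                   // assumption: ignore empty chunks
        }
        $pos = strpos($chunk, '[');
        if ($pos === false) {
            // bare element, no attribute list
            $element = $chunk;
            $attrs = [];
        } else {
            $element = substr($chunk, 0, $pos);
            // drop the trailing "]" and split the attribute names on "|"
            $attrs = explode('|', rtrim(substr($chunk, $pos + 1), ']'));
        }
        // "*" stands for global attributes, so it is not an element itself
        if ($element !== '*') {
            $elements[$element] = true;
        }
        foreach ($attrs as $attr) {
            if ($attr !== '') {
                $attributes["$element.$attr"] = true;
            }
        }
    }
    return [$elements, $attributes];
}

// Usage: both results are lookup arrays keyed by name.
[$elements, $attributes] = parse_allowed_list('a[href|title],p,*[class]');
var_export($elements);
var_export($attributes);
```

The two returned lookups have the same shape as the values of %HTML.AllowedElements and %HTML.AllowedAttributes (keys such as "a" and "a.href"), which is why setupConfigStuff() can feed them straight into the existing whitelist checks.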
|
||||||
|
|
||||||
|
@@ -16,16 +16,14 @@
|
|||||||
|
|
||||||
class HTMLPurifier_HTMLModule
|
class HTMLPurifier_HTMLModule
|
||||||
{
|
{
|
||||||
|
|
||||||
|
// -- Overloadable ----------------------------------------------------
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Short unique string identifier of the module
|
* Short unique string identifier of the module
|
||||||
*/
|
*/
|
||||||
var $name;
|
var $name;
|
||||||
|
|
||||||
/**
|
|
||||||
* Dynamically set integer that specifies when the module was loaded in.
|
|
||||||
*/
|
|
||||||
var $order;
|
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Informally, a list of elements this module changes. Not used in
|
* Informally, a list of elements this module changes. Not used in
|
||||||
* any significant way.
|
* any significant way.
|
||||||
@@ -99,27 +97,128 @@ class HTMLPurifier_HTMLModule
|
|||||||
*/
|
*/
|
||||||
function getChildDef($def) {return false;}
|
function getChildDef($def) {return false;}
|
||||||
|
|
||||||
/**
|
// -- Convenience -----------------------------------------------------
|
||||||
* Hook method that lets module perform arbitrary operations on
|
|
||||||
* HTMLPurifier_HTMLDefinition before the module gets processed.
|
|
||||||
* @param $definition Reference to HTMLDefinition being setup
|
|
||||||
*/
|
|
||||||
function preProcess(&$definition) {}
|
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Hook method that lets module perform arbitrary operations
|
* Convenience function that sets up a new element
|
||||||
* on HTMLPurifier_HTMLDefinition after the module gets processed.
|
* @param $element Name of element to add
|
||||||
* @param $definition Reference to HTMLDefinition being setup
|
* @param $safe Is element safe for untrusted users to use?
|
||||||
|
* @param $type What content set should element be registered to?
|
||||||
|
* Set as false to skip this step.
|
||||||
|
* @param $contents Allowed children in form of:
|
||||||
|
* "$content_model_type: $content_model"
|
||||||
|
* @param $attr_includes What attribute collections to register to
|
||||||
|
* element?
|
||||||
|
* @param $attr What unique attributes does the element define?
|
||||||
|
* @note See ElementDef for in-depth descriptions of these parameters.
|
||||||
|
* @return Reference to created element definition object, so you
|
||||||
|
* can set advanced parameters
|
||||||
|
* @protected
|
||||||
*/
|
*/
|
||||||
function postProcess(&$definition) {}
|
function &addElement($element, $safe, $type, $contents, $attr_includes = array(), $attr = array()) {
|
||||||
|
$this->elements[] = $element;
|
||||||
|
// parse content_model
|
||||||
|
list($content_model_type, $content_model) = $this->parseContents($contents);
|
||||||
|
// merge in attribute inclusions
|
||||||
|
$this->mergeInAttrIncludes($attr, $attr_includes);
|
||||||
|
// add element to content sets
|
||||||
|
if ($type) $this->addElementToContentSet($element, $type);
|
||||||
|
// create element
|
||||||
|
$this->info[$element] = HTMLPurifier_ElementDef::create(
|
||||||
|
$safe, $content_model, $content_model_type, $attr
|
||||||
|
);
|
||||||
|
// literal object $contents means direct child manipulation
|
||||||
|
if (!is_string($contents)) $this->info[$element]->child = $contents;
|
||||||
|
return $this->info[$element];
|
||||||
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Hook method that is called when a module gets registered to
|
* Convenience function that creates a totally blank, non-standalone
|
||||||
* the definition.
|
* element.
|
||||||
* @param $definition Reference to HTMLDefinition being setup
|
* @param $element Name of element to create
|
||||||
|
* @return Reference to created element
|
||||||
*/
|
*/
|
||||||
function setup(&$definition) {}
|
function &addBlankElement($element) {
|
||||||
|
if (!isset($this->info[$element])) {
|
||||||
|
$this->elements[] = $element;
|
||||||
|
$this->info[$element] = new HTMLPurifier_ElementDef();
|
||||||
|
$this->info[$element]->standalone = false;
|
||||||
|
} else {
|
||||||
|
trigger_error("Definition for $element already exists in module, cannot redefine");
|
||||||
|
}
|
||||||
|
return $this->info[$element];
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Convenience function that registers an element to a content set
|
||||||
|
* @param Element to register
|
||||||
|
* @param Name of content set (warning: case sensitive, usually upper-case
|
||||||
|
* first letter)
|
||||||
|
* @protected
|
||||||
|
*/
|
||||||
|
function addElementToContentSet($element, $type) {
|
||||||
|
if (!isset($this->content_sets[$type])) $this->content_sets[$type] = '';
|
||||||
|
else $this->content_sets[$type] .= ' | ';
|
||||||
|
$this->content_sets[$type] .= $element;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Convenience function that transforms single-string contents
|
||||||
|
* into separate content model and content model type
|
||||||
|
* @param $contents Allowed children in form of:
|
||||||
|
* "$content_model_type: $content_model"
|
||||||
|
* @note If contents is an object, an array of two nulls will be
|
||||||
|
* returned, and the caller needs to take the original $contents
|
||||||
|
* and use it directly.
|
||||||
|
*/
|
||||||
|
function parseContents($contents) {
|
||||||
|
if (!is_string($contents)) return array(null, null); // defer
|
||||||
|
switch ($contents) {
|
||||||
|
// check for shorthand content model forms
|
||||||
|
case 'Empty':
|
||||||
|
return array('empty', '');
|
||||||
|
case 'Inline':
|
||||||
|
return array('optional', 'Inline | #PCDATA');
|
||||||
|
case 'Flow':
|
||||||
|
return array('optional', 'Flow | #PCDATA');
|
||||||
|
}
|
||||||
|
list($content_model_type, $content_model) = explode(':', $contents);
|
||||||
|
$content_model_type = strtolower(trim($content_model_type));
|
||||||
|
$content_model = trim($content_model);
|
||||||
|
return array($content_model_type, $content_model);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Convenience function that merges a list of attribute includes into
|
||||||
|
* an attribute array.
|
||||||
|
* @param $attr Reference to attr array to modify
|
||||||
|
* @param $attr_includes Array of includes / string include to merge in
|
||||||
|
*/
|
||||||
|
function mergeInAttrIncludes(&$attr, $attr_includes) {
|
||||||
|
if (!is_array($attr_includes)) {
|
||||||
|
if (empty($attr_includes)) $attr_includes = array();
|
||||||
|
else $attr_includes = array($attr_includes);
|
||||||
|
}
|
||||||
|
$attr[0] = $attr_includes;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Convenience function that generates a lookup table with boolean
|
||||||
|
* true as value.
|
||||||
|
* @param $list List of values to turn into a lookup
|
||||||
|
* @note You can also pass an arbitrary number of arguments in
|
||||||
|
* place of the regular argument
|
||||||
|
* @return Lookup array equivalent of list
|
||||||
|
*/
|
||||||
|
function makeLookup($list) {
|
||||||
|
if (is_string($list)) $list = func_get_args();
|
||||||
|
$ret = array();
|
||||||
|
foreach ($list as $value) {
|
||||||
|
if (is_null($value)) continue;
|
||||||
|
$ret[$value] = true;
|
||||||
|
}
|
||||||
|
return $ret;
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
?>
|
?>
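The "$content_model_type: $content_model" strings passed to addElement() in this file (for example 'Required: li' or the shorthand 'Inline') are split by parseContents(). A minimal standalone sketch of that splitting follows. The function name parse_contents is hypothetical, PHP 7.1 or later is assumed, and two details are additions that the diff does not contain: the explode() limit of 2 and the array_pad() fallback for strings without a colon.

```php
<?php
// Hypothetical standalone helper, NOT part of HTML Purifier. It mirrors the
// string branch of HTMLPurifier_HTMLModule::parseContents() shown above.
function parse_contents(string $contents): array
{
    // shorthand forms, taken from the switch statement in the diff
    $shorthand = [
        'Empty'  => ['empty', ''],
        'Inline' => ['optional', 'Inline | #PCDATA'],
        'Flow'   => ['optional', 'Flow | #PCDATA'],
    ];
    if (isset($shorthand[$contents])) {
        return $shorthand[$contents];
    }
    // assumption: limit of 2 keeps any later ":" inside the content model,
    // and array_pad() avoids a warning when no ":" is present at all
    [$type, $model] = array_pad(explode(':', $contents, 2), 2, '');
    return [strtolower(trim($type)), trim($model)];
}

// Usage: returns array(content_model_type, content_model).
var_export(parse_contents('Required: li'));
var_export(parse_contents('Inline'));
```

Lower-casing the type is what lets a module write 'Chameleon: ...' while getChildDef() compares against 'chameleon'.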
|
@@ -11,30 +11,22 @@ class HTMLPurifier_HTMLModule_Bdo extends HTMLPurifier_HTMLModule
|
|||||||
{
|
{
|
||||||
|
|
||||||
var $name = 'Bdo';
|
var $name = 'Bdo';
|
||||||
var $elements = array('bdo');
|
|
||||||
var $content_sets = array('Inline' => 'bdo');
|
|
||||||
var $attr_collections = array(
|
var $attr_collections = array(
|
||||||
'I18N' => array('dir' => false)
|
'I18N' => array('dir' => false)
|
||||||
);
|
);
|
||||||
|
|
||||||
function HTMLPurifier_HTMLModule_Bdo() {
|
function HTMLPurifier_HTMLModule_Bdo() {
|
||||||
$dir = new HTMLPurifier_AttrDef_Enum(array('ltr','rtl'), false);
|
$bdo =& $this->addElement(
|
||||||
$this->attr_collections['I18N']['dir'] = $dir;
|
'bdo', true, 'Inline', 'Inline', array('Core', 'Lang'),
|
||||||
$this->info['bdo'] = new HTMLPurifier_ElementDef();
|
array(
|
||||||
$this->info['bdo']->attr = array(
|
'dir' => 'Enum#ltr,rtl', // required
|
||||||
0 => array('Core', 'Lang'),
|
// The Abstract Module specification has the attribute
|
||||||
'dir' => $dir, // required
|
// inclusions wrong for bdo: bdo allows Lang
|
||||||
// The Abstract Module specification has the attribute
|
)
|
||||||
// inclusions wrong for bdo: bdo allows
|
|
||||||
// xml:lang too (and we'll toss in lang for good measure,
|
|
||||||
// though it is not allowed for XHTML 1.1, this will
|
|
||||||
// be managed with a global attribute transform)
|
|
||||||
);
|
);
|
||||||
$this->info['bdo']->content_model = '#PCDATA | Inline';
|
$bdo->attr_transform_post['required-dir'] = new HTMLPurifier_AttrTransform_BdoDir();
|
||||||
$this->info['bdo']->content_model_type = 'optional';
|
|
||||||
// provides fallback behavior if dir's missing (dir is required)
|
$this->attr_collections['I18N']['dir'] = 'Enum#ltr,rtl';
|
||||||
$this->info['bdo']->attr_transform_post['required-dir'] =
|
|
||||||
new HTMLPurifier_AttrTransform_BdoDir();
|
|
||||||
}
|
}
|
||||||
|
|
||||||
}
|
}
|
||||||
|
@@ -1,5 +1,7 @@
|
|||||||
<?php
|
<?php
|
||||||
|
|
||||||
|
require_once 'HTMLPurifier/HTMLModule.php';
|
||||||
|
|
||||||
class HTMLPurifier_HTMLModule_CommonAttributes extends HTMLPurifier_HTMLModule
|
class HTMLPurifier_HTMLModule_CommonAttributes extends HTMLPurifier_HTMLModule
|
||||||
{
|
{
|
||||||
var $name = 'CommonAttributes';
|
var $name = 'CommonAttributes';
|
||||||
@@ -12,9 +14,7 @@ class HTMLPurifier_HTMLModule_CommonAttributes extends HTMLPurifier_HTMLModule
|
|||||||
'id' => 'ID',
|
'id' => 'ID',
|
||||||
'title' => 'CDATA',
|
'title' => 'CDATA',
|
||||||
),
|
),
|
||||||
'Lang' => array(
|
'Lang' => array(),
|
||||||
'xml:lang' => false, // see constructor
|
|
||||||
),
|
|
||||||
'I18N' => array(
|
'I18N' => array(
|
||||||
0 => array('Lang'), // proprietary, for xml:lang/lang
|
0 => array('Lang'), // proprietary, for xml:lang/lang
|
||||||
),
|
),
|
||||||
@@ -22,10 +22,6 @@ class HTMLPurifier_HTMLModule_CommonAttributes extends HTMLPurifier_HTMLModule
|
|||||||
0 => array('Core', 'I18N')
|
0 => array('Core', 'I18N')
|
||||||
)
|
)
|
||||||
);
|
);
|
||||||
|
|
||||||
function HTMLPurifier_HTMLModule_CommonAttributes() {
|
|
||||||
$this->attr_collections['Lang']['xml:lang'] = new HTMLPurifier_AttrDef_Lang();
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
|
||||||
?>
|
?>
|
@@ -11,28 +11,24 @@ class HTMLPurifier_HTMLModule_Edit extends HTMLPurifier_HTMLModule
|
|||||||
{
|
{
|
||||||
|
|
||||||
var $name = 'Edit';
|
var $name = 'Edit';
|
||||||
var $elements = array('del', 'ins');
|
|
||||||
var $content_sets = array('Inline' => 'del | ins');
|
|
||||||
|
|
||||||
function HTMLPurifier_HTMLModule_Edit() {
|
function HTMLPurifier_HTMLModule_Edit() {
|
||||||
foreach ($this->elements as $element) {
|
$contents = 'Chameleon: #PCDATA | Inline ! #PCDATA | Flow';
|
||||||
$this->info[$element] = new HTMLPurifier_ElementDef();
|
$attr = array(
|
||||||
$this->info[$element]->attr = array(
|
'cite' => 'URI',
|
||||||
0 => array('Common'),
|
// 'datetime' => 'Datetime', // not implemented
|
||||||
'cite' => 'URI',
|
);
|
||||||
// 'datetime' => 'Datetime' // Datetime not implemented
|
$this->addElement('del', true, 'Inline', $contents, 'Common', $attr);
|
||||||
);
|
$this->addElement('ins', true, 'Inline', $contents, 'Common', $attr);
|
||||||
// Inline context ! Block context (exclamation mark is
|
|
||||||
// separator, see getChildDef for parsing)
|
|
||||||
$this->info[$element]->content_model =
|
|
||||||
'#PCDATA | Inline ! #PCDATA | Flow';
|
|
||||||
// HTML 4.01 specifies that ins/del must not contain block
|
|
||||||
// elements when used in an inline context, chameleon is
|
|
||||||
// a complicated workaround to achieve this effect
|
|
||||||
$this->info[$element]->content_model_type = 'chameleon';
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// HTML 4.01 specifies that ins/del must not contain block
|
||||||
|
// elements when used in an inline context, chameleon is
|
||||||
|
// a complicated workaround to achieve this effect
|
||||||
|
|
||||||
|
// Inline context ! Block context (exclamation mark is
|
||||||
|
// separator, see getChildDef for parsing)
|
||||||
|
|
||||||
var $defines_child_def = true;
|
var $defines_child_def = true;
|
||||||
function getChildDef($def) {
|
function getChildDef($def) {
|
||||||
if ($def->content_model_type != 'chameleon') return false;
|
if ($def->content_model_type != 'chameleon') return false;
|
||||||
|
@@ -10,25 +10,22 @@ class HTMLPurifier_HTMLModule_Hypertext extends HTMLPurifier_HTMLModule
|
|||||||
{
|
{
|
||||||
|
|
||||||
var $name = 'Hypertext';
|
var $name = 'Hypertext';
|
||||||
var $elements = array('a');
|
|
||||||
var $content_sets = array('Inline' => 'a');
|
|
||||||
|
|
||||||
function HTMLPurifier_HTMLModule_Hypertext() {
|
function HTMLPurifier_HTMLModule_Hypertext() {
|
||||||
$this->info['a'] = new HTMLPurifier_ElementDef();
|
$a =& $this->addElement(
|
||||||
$this->info['a']->attr = array(
|
'a', true, 'Inline', 'Inline', 'Common',
|
||||||
0 => array('Common'),
|
array(
|
||||||
// 'accesskey' => 'Character',
|
// 'accesskey' => 'Character',
|
||||||
// 'charset' => 'Charset',
|
// 'charset' => 'Charset',
|
||||||
'href' => 'URI',
|
'href' => 'URI',
|
||||||
//'hreflang' => 'LanguageCode',
|
// 'hreflang' => 'LanguageCode',
|
||||||
'rel' => new HTMLPurifier_AttrDef_HTML_LinkTypes('rel'),
|
'rel' => new HTMLPurifier_AttrDef_HTML_LinkTypes('rel'),
|
||||||
'rev' => new HTMLPurifier_AttrDef_HTML_LinkTypes('rev'),
|
'rev' => new HTMLPurifier_AttrDef_HTML_LinkTypes('rev'),
|
||||||
//'tabindex' => 'Number',
|
// 'tabindex' => 'Number',
|
||||||
//'type' => 'ContentType',
|
// 'type' => 'ContentType',
|
||||||
|
)
|
||||||
);
|
);
|
||||||
$this->info['a']->content_model = '#PCDATA | Inline';
|
$a->excludes = array('a' => true);
|
||||||
$this->info['a']->content_model_type = 'optional';
|
|
||||||
$this->info['a']->excludes = array('a' => true);
|
|
||||||
}
|
}
|
||||||
|
|
||||||
}
|
}
|
||||||
|
@@ -14,21 +14,21 @@ class HTMLPurifier_HTMLModule_Image extends HTMLPurifier_HTMLModule
|
|||||||
{
|
{
|
||||||
|
|
||||||
var $name = 'Image';
|
var $name = 'Image';
|
||||||
var $elements = array('img');
|
|
||||||
var $content_sets = array('Inline' => 'img');
|
|
||||||
|
|
||||||
function HTMLPurifier_HTMLModule_Image() {
|
function HTMLPurifier_HTMLModule_Image() {
|
||||||
$this->info['img'] = new HTMLPurifier_ElementDef();
|
$img =& $this->addElement(
|
||||||
$this->info['img']->attr = array(
|
'img', true, 'Inline', 'Empty', 'Common',
|
||||||
0 => array('Common'),
|
array(
|
||||||
'alt' => 'Text',
|
'alt*' => 'Text',
|
||||||
'height' => 'Length',
|
'height' => 'Length',
|
||||||
'longdesc' => 'URI',
|
'longdesc' => 'URI',
|
||||||
'src' => new HTMLPurifier_AttrDef_URI(true), // embedded
|
'src*' => new HTMLPurifier_AttrDef_URI(true), // embedded
|
||||||
'width' => 'Length'
|
'width' => 'Length'
|
||||||
|
)
|
||||||
);
|
);
|
||||||
$this->info['img']->content_model_type = 'empty';
|
// kind of strange, but splitting things up would be inefficient
|
||||||
$this->info['img']->attr_transform_post[] =
|
$img->attr_transform_pre[] =
|
||||||
|
$img->attr_transform_post[] =
|
||||||
new HTMLPurifier_AttrTransform_ImgRequired();
|
new HTMLPurifier_AttrTransform_ImgRequired();
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@@ -1,5 +1,7 @@
|
|||||||
<?php
|
<?php
|
||||||
|
|
||||||
|
require_once 'HTMLPurifier/AttrDef/HTML/Bool.php';
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* XHTML 1.1 Legacy module defines elements that were previously
|
* XHTML 1.1 Legacy module defines elements that were previously
|
||||||
* deprecated.
|
* deprecated.
|
||||||
@@ -22,36 +24,115 @@ class HTMLPurifier_HTMLModule_Legacy extends HTMLPurifier_HTMLModule
|
|||||||
// incomplete
|
// incomplete
|
||||||
|
|
||||||
var $name = 'Legacy';
|
var $name = 'Legacy';
|
||||||
var $elements = array('u', 's', 'strike');
|
|
||||||
var $non_standalone_elements = array('li', 'ol', 'address', 'blockquote');
|
|
||||||
|
|
||||||
function HTMLPurifier_HTMLModule_Legacy() {
|
function HTMLPurifier_HTMLModule_Legacy() {
|
||||||
// setup new elements
|
|
||||||
foreach ($this->elements as $name) {
|
$this->addElement('basefont', true, 'Inline', 'Empty', false, array(
|
||||||
$this->info[$name] = new HTMLPurifier_ElementDef();
|
'color' => 'Color',
|
||||||
// for u, s, strike, as more elements get added, add
|
'face' => 'Text', // extremely broad, we should
|
||||||
// conditionals as necessary
|
'size' => 'Text', // tighten it
|
||||||
$this->info[$name]->content_model = 'Inline | #PCDATA';
|
'id' => 'ID'
|
||||||
$this->info[$name]->content_model_type = 'optional';
|
));
|
||||||
$this->info[$name]->attr[0] = array('Common');
|
$this->addElement('center', true, 'Block', 'Flow', 'Common');
|
||||||
}
|
$this->addElement('dir', true, 'Block', 'Required: li', 'Common', array(
|
||||||
|
'compact' => 'Bool#compact'
|
||||||
|
));
|
||||||
|
$this->addElement('font', true, 'Inline', 'Inline', array('Core', 'I18N'), array(
|
||||||
|
'color' => 'Color',
|
||||||
|
'face' => 'Text', // extremely broad, we should
|
||||||
|
'size' => 'Text', // tighten it
|
||||||
|
));
|
||||||
|
$this->addElement('menu', true, 'Block', 'Required: li', 'Common', array(
|
||||||
|
'compact' => 'Bool#compact'
|
||||||
|
));
|
||||||
|
$this->addElement('s', true, 'Inline', 'Inline', 'Common');
|
||||||
|
$this->addElement('strike', true, 'Inline', 'Inline', 'Common');
|
||||||
|
$this->addElement('u', true, 'Inline', 'Inline', 'Common');
|
||||||
|
|
||||||
// setup modifications to old elements
|
// setup modifications to old elements
|
||||||
foreach ($this->non_standalone_elements as $name) {
|
|
||||||
$this->info[$name] = new HTMLPurifier_ElementDef();
|
$align = 'Enum#left,right,center,justify';
|
||||||
$this->info[$name]->standalone = false;
|
|
||||||
|
$address =& $this->addBlankElement('address');
|
||||||
|
$address->content_model = 'Inline | #PCDATA | p';
|
||||||
|
$address->content_model_type = 'optional';
|
||||||
|
$address->child = false;
|
||||||
|
|
||||||
|
$blockquote =& $this->addBlankElement('blockquote');
|
||||||
|
$blockquote->content_model = 'Flow | #PCDATA';
|
||||||
|
$blockquote->content_model_type = 'optional';
|
||||||
|
$blockquote->child = false;
|
||||||
|
|
||||||
|
$br =& $this->addBlankElement('br');
|
||||||
|
$br->attr['clear'] = 'Enum#left,all,right,none';
|
||||||
|
|
||||||
|
$caption =& $this->addBlankElement('caption');
|
||||||
|
$caption->attr['align'] = 'Enum#top,bottom,left,right';
|
||||||
|
|
||||||
|
$div =& $this->addBlankElement('div');
|
||||||
|
$div->attr['align'] = $align;
|
||||||
|
|
||||||
|
$dl =& $this->addBlankElement('dl');
|
||||||
|
$dl->attr['compact'] = 'Bool#compact';
|
||||||
|
|
||||||
|
for ($i = 1; $i <= 6; $i++) {
|
||||||
|
$h =& $this->addBlankElement("h$i");
|
||||||
|
$h->attr['align'] = $align;
|
||||||
}
|
}
|
||||||
|
|
||||||
$this->info['li']->attr['value'] = new HTMLPurifier_AttrDef_Integer();
|
$hr =& $this->addBlankElement('hr');
|
||||||
$this->info['ol']->attr['start'] = new HTMLPurifier_AttrDef_Integer();
|
$hr->attr['align'] = $align;
|
||||||
|
$hr->attr['noshade'] = 'Bool#noshade';
|
||||||
|
$hr->attr['size'] = 'Pixels';
|
||||||
|
$hr->attr['width'] = 'Length';
|
||||||
|
|
||||||
$this->info['address']->content_model = 'Inline | #PCDATA | p';
|
$img =& $this->addBlankElement('img');
|
||||||
$this->info['address']->content_model_type = 'optional';
|
$img->attr['align'] = 'Enum#top,middle,bottom,left,right';
|
||||||
$this->info['address']->child = false;
|
$img->attr['border'] = 'Pixels';
|
||||||
|
$img->attr['hspace'] = 'Pixels';
|
||||||
|
$img->attr['vspace'] = 'Pixels';
|
||||||
|
|
||||||
$this->info['blockquote']->content_model = 'Flow | #PCDATA';
|
// figure out this integer business
|
||||||
$this->info['blockquote']->content_model_type = 'optional';
|
|
||||||
$this->info['blockquote']->child = false;
|
$li =& $this->addBlankElement('li');
|
||||||
|
$li->attr['value'] = new HTMLPurifier_AttrDef_Integer();
|
||||||
|
$li->attr['type'] = 'Enum#s:1,i,I,a,A,disc,square,circle';
|
||||||
|
|
||||||
|
$ol =& $this->addBlankElement('ol');
|
||||||
|
$ol->attr['compact'] = 'Bool#compact';
|
||||||
|
$ol->attr['start'] = new HTMLPurifier_AttrDef_Integer();
|
||||||
|
$ol->attr['type'] = 'Enum#s:1,i,I,a,A';
|
||||||
|
|
||||||
|
$p =& $this->addBlankElement('p');
|
||||||
|
$p->attr['align'] = $align;
|
||||||
|
|
||||||
|
$pre =& $this->addBlankElement('pre');
|
||||||
|
$pre->attr['width'] = 'Number';
|
||||||
|
|
||||||
|
// script omitted
|
||||||
|
|
||||||
|
$table =& $this->addBlankElement('table');
|
||||||
|
$table->attr['align'] = 'Enum#left,center,right';
|
||||||
|
$table->attr['bgcolor'] = 'Color';
|
||||||
|
|
||||||
|
$tr =& $this->addBlankElement('tr');
|
||||||
|
$tr->attr['bgcolor'] = 'Color';
|
||||||
|
|
||||||
|
$th =& $this->addBlankElement('th');
|
||||||
|
$th->attr['bgcolor'] = 'Color';
|
||||||
|
$th->attr['height'] = 'Length';
|
||||||
|
$th->attr['nowrap'] = 'Bool#nowrap';
|
||||||
|
$th->attr['width'] = 'Length';
|
||||||
|
|
||||||
|
$td =& $this->addBlankElement('td');
|
||||||
|
$td->attr['bgcolor'] = 'Color';
|
||||||
|
$td->attr['height'] = 'Length';
|
||||||
|
$td->attr['nowrap'] = 'Bool#nowrap';
|
||||||
|
$td->attr['width'] = 'Length';
|
||||||
|
|
||||||
|
$ul =& $this->addBlankElement('ul');
|
||||||
|
$ul->attr['compact'] = 'Bool#compact';
|
||||||
|
$ul->attr['type'] = 'Enum#square,disc,circle';
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@@ -9,7 +9,6 @@ class HTMLPurifier_HTMLModule_List extends HTMLPurifier_HTMLModule
|
|||||||
{
|
{
|
||||||
|
|
||||||
var $name = 'List';
|
var $name = 'List';
|
||||||
var $elements = array('dl', 'dt', 'dd', 'ol', 'ul', 'li');
|
|
||||||
|
|
||||||
// According to the abstract schema, the List content set is a fully formed
|
// According to the abstract schema, the List content set is a fully formed
|
||||||
// one or more expr, but it invariably occurs in an optional declaration
|
// one or more expr, but it invariably occurs in an optional declaration
|
||||||
@@ -19,26 +18,19 @@ class HTMLPurifier_HTMLModule_List extends HTMLPurifier_HTMLModule
|
|||||||
// Furthermore, the actual XML Schema may disagree. Regardless,
|
// Furthermore, the actual XML Schema may disagree. Regardless,
|
||||||
// we don't have support for such nested expressions without using
|
// we don't have support for such nested expressions without using
|
||||||
// the incredibly inefficient and draconic Custom ChildDef.
|
// the incredibly inefficient and draconic Custom ChildDef.
|
||||||
var $content_sets = array('List' => 'dl | ol | ul', 'Flow' => 'List');
|
|
||||||
|
var $content_sets = array('Flow' => 'List');
|
||||||
|
|
||||||
function HTMLPurifier_HTMLModule_List() {
|
function HTMLPurifier_HTMLModule_List() {
|
||||||
foreach ($this->elements as $element) {
|
$this->addElement('ol', true, 'List', 'Required: li', 'Common');
|
||||||
$this->info[$element] = new HTMLPurifier_ElementDef();
|
$this->addElement('ul', true, 'List', 'Required: li', 'Common');
|
||||||
$this->info[$element]->attr = array(0 => array('Common'));
|
$this->addElement('dl', true, 'List', 'Required: dt | dd', 'Common');
|
||||||
if ($element == 'li' || $element == 'dd') {
|
|
||||||
$this->info[$element]->content_model = '#PCDATA | Flow';
|
$li =& $this->addElement('li', true, false, 'Flow', 'Common');
|
||||||
$this->info[$element]->content_model_type = 'optional';
|
$li->auto_close = array('li' => true);
|
||||||
} elseif ($element == 'ol' || $element == 'ul') {
|
|
||||||
$this->info[$element]->content_model = 'li';
|
$this->addElement('dd', true, false, 'Flow', 'Common');
|
||||||
$this->info[$element]->content_model_type = 'required';
|
$this->addElement('dt', true, false, 'Inline', 'Common');
|
||||||
}
|
|
||||||
}
|
|
||||||
$this->info['dt']->content_model = '#PCDATA | Inline';
|
|
||||||
$this->info['dt']->content_model_type = 'optional';
|
|
||||||
$this->info['dl']->content_model = 'dt | dd';
|
|
||||||
$this->info['dl']->content_model_type = 'required';
|
|
||||||
// this could be a LOT more robust
|
|
||||||
$this->info['li']->auto_close = array('li' => true);
|
|
||||||
}
|
}
|
||||||
|
|
||||||
}
|
}
|
||||||
|
16
library/HTMLPurifier/HTMLModule/NonXMLCommonAttributes.php
Normal file
@@ -0,0 +1,16 @@
<?php

require_once 'HTMLPurifier/HTMLModule.php';

class HTMLPurifier_HTMLModule_NonXMLCommonAttributes extends HTMLPurifier_HTMLModule
{
    var $name = 'NonXMLCommonAttributes';

    var $attr_collections = array(
        'Lang' => array(
            'lang' => 'LanguageCode',
        )
    );
}

?>
@@ -16,23 +16,16 @@ class HTMLPurifier_HTMLModule_Presentation extends HTMLPurifier_HTMLModule
|
|||||||
{
|
{
|
||||||
|
|
||||||
var $name = 'Presentation';
|
var $name = 'Presentation';
|
||||||
var $elements = array('b', 'big', 'hr', 'i', 'small', 'sub', 'sup', 'tt');
|
|
||||||
var $content_sets = array(
|
|
||||||
'Block' => 'hr',
|
|
||||||
'Inline' => 'b | big | i | small | sub | sup | tt'
|
|
||||||
);
|
|
||||||
|
|
||||||
function HTMLPurifier_HTMLModule_Presentation() {
|
function HTMLPurifier_HTMLModule_Presentation() {
|
||||||
foreach ($this->elements as $element) {
|
$this->addElement('b', true, 'Inline', 'Inline', 'Common');
|
||||||
$this->info[$element] = new HTMLPurifier_ElementDef();
|
$this->addElement('big', true, 'Inline', 'Inline', 'Common');
|
||||||
$this->info[$element]->attr = array(0 => array('Common'));
|
$this->addElement('hr', true, 'Block', 'Empty', 'Common');
|
||||||
if ($element == 'hr') {
|
$this->addElement('i', true, 'Inline', 'Inline', 'Common');
|
||||||
$this->info[$element]->content_model_type = 'empty';
|
$this->addElement('small', true, 'Inline', 'Inline', 'Common');
|
||||||
} else {
|
$this->addElement('sub', true, 'Inline', 'Inline', 'Common');
|
||||||
$this->info[$element]->content_model = '#PCDATA | Inline';
|
$this->addElement('sup', true, 'Inline', 'Inline', 'Common');
|
||||||
$this->info[$element]->content_model_type = 'optional';
|
$this->addElement('tt', true, 'Inline', 'Inline', 'Common');
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
|
||||||
}
|
}
|
||||||
|
@@ -46,8 +46,12 @@ class HTMLPurifier_HTMLModule_Scripting extends HTMLPurifier_HTMLModule
|
|||||||
// blockquote's custom definition (we would use it but
|
// blockquote's custom definition (we would use it but
|
||||||
// blockquote's contents are optional while noscript's contents
|
// blockquote's contents are optional while noscript's contents
|
||||||
// are required)
|
// are required)
|
||||||
|
|
||||||
|
// TODO: convert this to new syntax, main problem is getting
|
||||||
|
// both content sets working
|
||||||
foreach ($this->elements as $element) {
|
foreach ($this->elements as $element) {
|
||||||
$this->info[$element] = new HTMLPurifier_ElementDef();
|
$this->info[$element] = new HTMLPurifier_ElementDef();
|
||||||
|
$this->info[$element]->safe = false;
|
||||||
}
|
}
|
||||||
$this->info['noscript']->attr = array( 0 => array('Common') );
|
$this->info['noscript']->attr = array( 0 => array('Common') );
|
||||||
$this->info['noscript']->content_model = 'Heading | List | Block';
|
$this->info['noscript']->content_model = 'Heading | List | Block';
|
||||||
|
@@ -10,75 +10,60 @@ class HTMLPurifier_HTMLModule_Tables extends HTMLPurifier_HTMLModule
|
|||||||
{
|
{
|
||||||
|
|
||||||
var $name = 'Tables';
|
var $name = 'Tables';
|
||||||
var $elements = array('caption', 'table', 'td', 'th', 'tr', 'col',
|
|
||||||
'colgroup', 'tbody', 'thead', 'tfoot');
|
|
||||||
var $content_sets = array('Block' => 'table');
|
|
||||||
|
|
||||||
function HTMLPurifier_HTMLModule_Tables() {
|
function HTMLPurifier_HTMLModule_Tables() {
|
||||||
foreach ($this->elements as $e) {
|
|
||||||
$this->info[$e] = new HTMLPurifier_ElementDef();
|
|
||||||
$this->info[$e]->attr = array(0 => array('Common'));
|
|
||||||
$attr =& $this->info[$e]->attr;
|
|
||||||
if ($e == 'caption') continue;
|
|
||||||
if ($e == 'table'){
|
|
||||||
$attr['border'] = 'Pixels';
|
|
||||||
$attr['cellpadding'] = 'Length';
|
|
||||||
$attr['cellspacing'] = 'Length';
|
|
||||||
$attr['frame'] = new HTMLPurifier_AttrDef_Enum(array(
|
|
||||||
'void', 'above', 'below', 'hsides', 'lhs', 'rhs',
|
|
||||||
'vsides', 'box', 'border'
|
|
||||||
), false);
|
|
||||||
$attr['rules'] = new HTMLPurifier_AttrDef_Enum(array(
|
|
||||||
'none', 'groups', 'rows', 'cols', 'all'
|
|
||||||
), false);
|
|
||||||
$attr['summary'] = 'Text';
|
|
||||||
$attr['width'] = 'Length';
|
|
||||||
continue;
|
|
||||||
}
|
|
||||||
if ($e == 'col' || $e == 'colgroup') {
|
|
||||||
$attr['span'] = 'Number';
|
|
||||||
$attr['width'] = 'MultiLength';
|
|
||||||
}
|
|
||||||
if ($e == 'td' || $e == 'th') {
|
|
||||||
$attr['abbr'] = 'Text';
|
|
||||||
$attr['colspan'] = 'Number';
|
|
||||||
$attr['rowspan'] = 'Number';
|
|
||||||
}
|
|
||||||
$attr['align'] = new HTMLPurifier_AttrDef_Enum(array(
|
|
||||||
'left', 'center', 'right', 'justify', 'char'
|
|
||||||
), false);
|
|
||||||
$attr['valign'] = new HTMLPurifier_AttrDef_Enum(array(
|
|
||||||
'top', 'middle', 'bottom', 'baseline'
|
|
||||||
), false);
|
|
||||||
$attr['charoff'] = 'Length';
|
|
||||||
}
|
|
||||||
$this->info['caption']->content_model = '#PCDATA | Inline';
|
|
||||||
$this->info['caption']->content_model_type = 'optional';
|
|
||||||
|
|
||||||
// Is done directly because it doesn't leverage substitution
|
$this->addElement('caption', true, false, 'Inline', 'Common');
|
||||||
// mechanisms. True model is:
|
|
||||||
// 'caption?, ( col* | colgroup* ), (( thead?, tfoot?, tbody+ ) | ( tr+ ))'
|
|
||||||
$this->info['table']->child = new HTMLPurifier_ChildDef_Table();
|
|
||||||
|
|
||||||
$this->info['td']->content_model =
|
$this->addElement('table', true, 'Block',
|
||||||
$this->info['th']->content_model = '#PCDATA | Flow';
|
new HTMLPurifier_ChildDef_Table(), 'Common',
|
||||||
$this->info['td']->content_model_type =
|
array(
|
||||||
$this->info['th']->content_model_type = 'optional';
|
'border' => 'Pixels',
|
||||||
|
'cellpadding' => 'Length',
|
||||||
|
'cellspacing' => 'Length',
|
||||||
|
'frame' => 'Enum#void,above,below,hsides,lhs,rhs,vsides,box,border',
|
||||||
|
'rules' => 'Enum#none,groups,rows,cols,all',
|
||||||
|
'summary' => 'Text',
|
||||||
|
'width' => 'Length'
|
||||||
|
)
|
||||||
|
);
|
||||||
|
|
||||||
$this->info['tr']->content_model = 'td | th';
|
// common attributes
|
||||||
$this->info['tr']->content_model_type = 'required';
|
$cell_align = array(
|
||||||
|
'align' => 'Enum#left,center,right,justify,char',
|
||||||
|
'charoff' => 'Length',
|
||||||
|
'valign' => 'Enum#top,middle,bottom,baseline',
|
||||||
|
);
|
||||||
|
|
||||||
$this->info['col']->content_model_type = 'empty';
|
$cell_t = array_merge(
|
||||||
|
array(
|
||||||
|
'abbr' => 'Text',
|
||||||
|
'colspan' => 'Number',
|
||||||
|
'rowspan' => 'Number',
|
||||||
|
),
|
||||||
|
$cell_align
|
||||||
|
);
|
||||||
|
$this->addElement('td', true, false, 'Flow', 'Common', $cell_t);
|
||||||
|
$this->addElement('th', true, false, 'Flow', 'Common', $cell_t);
|
||||||
|
|
||||||
$this->info['colgroup']->content_model = 'col';
|
$this->addElement('tr', true, false, 'Required: td | th', 'Common', $cell_align);
|
||||||
$this->info['colgroup']->content_model_type = 'optional';
|
|
||||||
|
|
||||||
$this->info['tbody']->content_model =
|
$cell_col = array_merge(
|
||||||
$this->info['thead']->content_model =
|
array(
|
||||||
$this->info['tfoot']->content_model = 'tr';
|
'span' => 'Number',
|
||||||
$this->info['tbody']->content_model_type =
|
'width' => 'MultiLength',
|
||||||
$this->info['thead']->content_model_type =
|
),
|
||||||
$this->info['tfoot']->content_model_type = 'required';
|
$cell_align
|
||||||
|
);
|
||||||
|
$this->addElement('col', true, false, 'Empty', 'Common', $cell_col);
|
||||||
|
$colgroup =& $this->addElement('colgroup', true, false, 'Optional: col', 'Common', $cell_col);
|
||||||
|
$colgroup->auto_close = $this->makeLookup(
|
||||||
|
'thead', 'tbody', 'tfoot', 'tr'
|
||||||
|
);
|
||||||
|
|
||||||
|
$this->addElement('tbody', true, false, 'Required: tr', 'Common', $cell_align);
|
||||||
|
$this->addElement('thead', true, false, 'Required: tr', 'Common', $cell_align);
|
||||||
|
$this->addElement('tfoot', true, false, 'Required: tr', 'Common', $cell_align);
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@@ -9,13 +9,12 @@ class HTMLPurifier_HTMLModule_Target extends HTMLPurifier_HTMLModule
|
|||||||
{
|
{
|
||||||
|
|
||||||
var $name = 'Target';
|
var $name = 'Target';
|
||||||
var $elements = array('a');
|
|
||||||
|
|
||||||
function HTMLPurifier_HTMLModule_Target() {
|
function HTMLPurifier_HTMLModule_Target() {
|
||||||
foreach ($this->elements as $e) {
|
$elements = array('a');
|
||||||
$this->info[$e] = new HTMLPurifier_ElementDef();
|
foreach ($elements as $name) {
|
||||||
$this->info[$e]->standalone = false;
|
$e =& $this->addBlankElement($name);
|
||||||
$this->info[$e]->attr = array(
|
$e->attr = array(
|
||||||
'target' => new HTMLPurifier_AttrDef_HTML_FrameTarget()
|
'target' => new HTMLPurifier_AttrDef_HTML_FrameTarget()
|
||||||
);
|
);
|
||||||
}
|
}
|
||||||
|
@@ -10,65 +10,60 @@ require_once 'HTMLPurifier/HTMLModule.php';
|
|||||||
* - Block Structural (div, p)
|
* - Block Structural (div, p)
|
||||||
* - Inline Phrasal (abbr, acronym, cite, code, dfn, em, kbd, q, samp, strong, var)
|
* - Inline Phrasal (abbr, acronym, cite, code, dfn, em, kbd, q, samp, strong, var)
|
||||||
* - Inline Structural (br, span)
|
* - Inline Structural (br, span)
|
||||||
* We have elected not to follow suite, but this may change.
|
* This module, functionally, does not distinguish between these
|
||||||
|
* sub-modules, but the code is internally structured to reflect
|
||||||
|
* these distinctions.
|
||||||
*/
|
*/
|
||||||
class HTMLPurifier_HTMLModule_Text extends HTMLPurifier_HTMLModule
|
class HTMLPurifier_HTMLModule_Text extends HTMLPurifier_HTMLModule
|
||||||
{
|
{
|
||||||
|
|
||||||
var $name = 'Text';
|
var $name = 'Text';
|
||||||
|
|
||||||
var $elements = array('abbr', 'acronym', 'address', 'blockquote',
|
|
||||||
'br', 'cite', 'code', 'dfn', 'div', 'em', 'h1', 'h2', 'h3',
|
|
||||||
'h4', 'h5', 'h6', 'kbd', 'p', 'pre', 'q', 'samp', 'span', 'strong',
|
|
||||||
'var');
|
|
||||||
|
|
||||||
var $content_sets = array(
|
var $content_sets = array(
|
||||||
'Heading' => 'h1 | h2 | h3 | h4 | h5 | h6',
|
|
||||||
'Block' => 'address | blockquote | div | p | pre',
|
|
||||||
'Inline' => 'abbr | acronym | br | cite | code | dfn | em | kbd | q | samp | span | strong | var',
|
|
||||||
'Flow' => 'Heading | Block | Inline'
|
'Flow' => 'Heading | Block | Inline'
|
||||||
);
|
);
|
||||||
|
|
||||||
function HTMLPurifier_HTMLModule_Text() {
|
function HTMLPurifier_HTMLModule_Text() {
|
||||||
foreach ($this->elements as $element) {
|
|
||||||
$this->info[$element] = new HTMLPurifier_ElementDef();
|
// Inline Phrasal -------------------------------------------------
|
||||||
// attributes
|
$this->addElement('abbr', true, 'Inline', 'Inline', 'Common');
|
||||||
if ($element == 'br') {
|
$this->addElement('acronym', true, 'Inline', 'Inline', 'Common');
|
||||||
$this->info[$element]->attr = array(0 => array('Core'));
|
$this->addElement('cite', true, 'Inline', 'Inline', 'Common');
|
||||||
} elseif ($element == 'blockquote' || $element == 'q') {
|
$this->addElement('code', true, 'Inline', 'Inline', 'Common');
|
||||||
$this->info[$element]->attr = array(0 => array('Common'), 'cite' => 'URI');
|
$this->addElement('dfn', true, 'Inline', 'Inline', 'Common');
|
||||||
} else {
|
$this->addElement('em', true, 'Inline', 'Inline', 'Common');
|
||||||
$this->info[$element]->attr = array(0 => array('Common'));
|
$this->addElement('kbd', true, 'Inline', 'Inline', 'Common');
|
||||||
}
|
$this->addElement('q', true, 'Inline', 'Inline', 'Common', array('cite' => 'URI'));
|
||||||
// content models
|
$this->addElement('samp', true, 'Inline', 'Inline', 'Common');
|
||||||
if ($element == 'br') {
|
$this->addElement('strong', true, 'Inline', 'Inline', 'Common');
|
||||||
$this->info[$element]->content_model_type = 'empty';
|
$this->addElement('var', true, 'Inline', 'Inline', 'Common');
|
||||||
} elseif ($element == 'blockquote') {
|
|
||||||
$this->info[$element]->content_model = 'Heading | Block | List';
|
// Inline Structural ----------------------------------------------
|
||||||
$this->info[$element]->content_model_type = 'optional';
|
$this->addElement('span', true, 'Inline', 'Inline', 'Common');
|
||||||
} elseif ($element == 'div') {
|
$this->addElement('br', true, 'Inline', 'Empty', 'Core');
|
||||||
$this->info[$element]->content_model = '#PCDATA | Flow';
|
|
||||||
$this->info[$element]->content_model_type = 'optional';
|
// Block Phrasal --------------------------------------------------
|
||||||
} else {
|
$this->addElement('address', true, 'Block', 'Inline', 'Common');
|
||||||
$this->info[$element]->content_model = '#PCDATA | Inline';
|
$this->addElement('blockquote', true, 'Block', 'Optional: Heading | Block | List', 'Common', array('cite' => 'URI') );
|
||||||
$this->info[$element]->content_model_type = 'optional';
|
$pre =& $this->addElement('pre', true, 'Block', 'Inline', 'Common');
|
||||||
}
|
$pre->excludes = $this->makeLookup(
|
||||||
}
|
'img', 'big', 'small', 'object', 'applet', 'font', 'basefont' );
|
||||||
// SGML permits exclusions for all descendants, but this is
|
$this->addElement('h1', true, 'Heading', 'Inline', 'Common');
|
||||||
// not possible with DTDs or XML Schemas. W3C has elected to
|
$this->addElement('h2', true, 'Heading', 'Inline', 'Common');
|
||||||
// use complicated compositions of content_models to simulate
|
$this->addElement('h3', true, 'Heading', 'Inline', 'Common');
|
||||||
// exclusion for children, but we go the simpler, SGML-style
|
$this->addElement('h4', true, 'Heading', 'Inline', 'Common');
|
||||||
// route of flat-out exclusions. Note that the Abstract Module
|
$this->addElement('h5', true, 'Heading', 'Inline', 'Common');
|
||||||
// is blithely unaware of such distinctions.
|
$this->addElement('h6', true, 'Heading', 'Inline', 'Common');
|
||||||
$this->info['pre']->excludes = array_flip(array(
|
|
||||||
'img', 'big', 'small',
|
// Block Structural -----------------------------------------------
|
||||||
'object', 'applet', 'font', 'basefont' // generally not allowed
|
$p =& $this->addElement('p', true, 'Block', 'Inline', 'Common');
|
||||||
));
|
// this seems really ad hoc: implementing some general
|
||||||
$this->info['p']->auto_close = array_flip(array(
|
// heuristics would probably be better
|
||||||
|
$p->auto_close = $this->makeLookup(
|
||||||
'address', 'blockquote', 'dd', 'dir', 'div', 'dl', 'dt',
|
'address', 'blockquote', 'dd', 'dir', 'div', 'dl', 'dt',
|
||||||
'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'hr', 'ol', 'p', 'pre',
|
'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'hr', 'ol', 'p', 'pre',
|
||||||
'table', 'ul'
|
'table', 'ul' );
|
||||||
));
|
$this->addElement('div', true, 'Block', 'Flow', 'Common');
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
}
|
}
|
||||||
|
241
library/HTMLPurifier/HTMLModule/Tidy.php
Normal file
@@ -0,0 +1,241 @@
<?php

require_once 'HTMLPurifier/HTMLModule.php';

HTMLPurifier_ConfigSchema::define(
    'HTML', 'TidyLevel', 'medium', 'string', '
<p>General level of cleanliness the Tidy module should enforce.
There are four allowed values:</p>
<dl>
    <dt>none</dt>
    <dd>No extra tidying should be done</dd>
    <dt>light</dt>
    <dd>Only fix elements that would be discarded otherwise due to
    lack of support in doctype</dd>
    <dt>medium</dt>
    <dd>Enforce best practices</dd>
    <dt>heavy</dt>
    <dd>Transform all deprecated elements and attributes to standards
    compliant equivalents</dd>
</dl>
<p>This directive has been available since 2.0.0</p>
' );
HTMLPurifier_ConfigSchema::defineAllowedValues(
    'HTML', 'TidyLevel', array('none', 'light', 'medium', 'heavy')
);

HTMLPurifier_ConfigSchema::define(
    'HTML', 'TidyAdd', array(), 'lookup', '
Fixes to add to the default set of Tidy fixes as per your level. This
directive has been available since 2.0.0.
' );

HTMLPurifier_ConfigSchema::define(
    'HTML', 'TidyRemove', array(), 'lookup', '
Fixes to remove from the default set of Tidy fixes as per your level. This
directive has been available since 2.0.0.
' );

/**
 * Abstract class for a set of proprietary modules that clean up (tidy)
 * poorly written HTML.
 */
class HTMLPurifier_HTMLModule_Tidy extends HTMLPurifier_HTMLModule
{

    /**
     * List of supported levels. Index zero is a special case "no fixes"
     * level.
     */
    var $levels = array(0 => 'none', 'light', 'medium', 'heavy');

    /**
     * Default level to place all fixes in. Disabled by default.
     */
    var $defaultLevel = null;

    /**
     * Lists of fixes used by getFixesForLevel(). Format is:
     *      HTMLModule_Tidy->fixesForLevel[$level] = array('fix-1', 'fix-2');
     */
    var $fixesForLevel = array(
        'light'  => array(),
        'medium' => array(),
        'heavy'  => array()
    );

    /**
     * Lazy load constructs the module by determining the necessary
     * fixes to create and then delegating to the populate() function.
     * @todo Wildcard matching and error reporting when an added or
     *       subtracted fix has no effect.
     */
    function construct($config) {

        // create fixes, initialize fixesForLevel
        $fixes = $this->makeFixes();
        $this->makeFixesForLevel($fixes);

        // figure out which fixes to use
        $level = $config->get('HTML', 'TidyLevel');
        $fixes_lookup = $this->getFixesForLevel($level);

        // get custom fix declarations: these need namespace processing
        $add_fixes    = $config->get('HTML', 'TidyAdd');
        $remove_fixes = $config->get('HTML', 'TidyRemove');

        foreach ($fixes as $name => $fix) {
            // needs to be refactored a little to implement globbing
            if (
                isset($remove_fixes[$name]) ||
                (!isset($add_fixes[$name]) && !isset($fixes_lookup[$name]))
            ) {
                unset($fixes[$name]);
            }
        }

        // populate this module with necessary fixes
        $this->populate($fixes);

    }

    /**
     * Retrieves all fixes per a level, returning fixes for that specific
     * level as well as all levels below it.
     * @param $level String level identifier, see $levels for valid values
     * @return Lookup table of fixes
     */
    function getFixesForLevel($level) {
        if ($level == $this->levels[0]) {
            return array();
        }
        $activated_levels = array();
        for ($i = 1, $c = count($this->levels); $i < $c; $i++) {
            $activated_levels[] = $this->levels[$i];
            if ($this->levels[$i] == $level) break;
        }
        if ($i == $c) {
            trigger_error(
                'Tidy level ' . htmlspecialchars($level) . ' not recognized',
                E_USER_WARNING
            );
            return array();
        }
        $ret = array();
        foreach ($activated_levels as $level) {
            foreach ($this->fixesForLevel[$level] as $fix) {
                $ret[$fix] = true;
            }
        }
        return $ret;
    }

    /**
     * Dynamically populates the $fixesForLevel member variable using
     * the fixes array. It may be custom overloaded, used in conjunction
     * with $defaultLevel, or not used at all.
     */
    function makeFixesForLevel($fixes) {
        if (!isset($this->defaultLevel)) return;
        if (!isset($this->fixesForLevel[$this->defaultLevel])) {
            trigger_error(
                'Default level ' . $this->defaultLevel . ' does not exist',
                E_USER_ERROR
            );
            return;
        }
        $this->fixesForLevel[$this->defaultLevel] = array_keys($fixes);
    }

    /**
     * Populates the module with transforms and other special-case code
     * based on a list of fixes passed to it
     * @param $fixes Lookup table of fixes to activate
     */
    function populate($fixes) {
        foreach ($fixes as $name => $fix) {
            // determine what the fix is for
            list($type, $params) = $this->getFixType($name);
            switch ($type) {
                case 'attr_transform_pre':
                case 'attr_transform_post':
                    $attr = $params['attr'];
                    if (isset($params['element'])) {
                        $element = $params['element'];
                        if (empty($this->info[$element])) {
                            $e =& $this->addBlankElement($element);
                        } else {
                            $e =& $this->info[$element];
                        }
                    } else {
                        $type = "info_$type";
                        $e =& $this;
                    }
                    $f =& $e->$type;
                    $f[$attr] = $fix;
                    break;
                case 'tag_transform':
                    $this->info_tag_transform[$params['element']] = $fix;
                    break;
                case 'child':
                case 'content_model_type':
                    $element = $params['element'];
                    if (empty($this->info[$element])) {
                        $e =& $this->addBlankElement($element);
                    } else {
                        $e =& $this->info[$element];
                    }
                    $e->$type = $fix;
                    break;
                default:
                    trigger_error("Fix type $type not supported", E_USER_ERROR);
                    break;
            }
        }
    }

    /**
     * Parses a fix name and determines what kind of fix it is, as well
     * as other information defined by the fix
     * @param $name String name of fix
     * @return array(string $fix_type, array $fix_parameters)
     * @note $fix_parameters is type dependent, see populate() for usage
     *       of these parameters
     */
    function getFixType($name) {
        // parse it
        $property = $attr = null;
        if (strpos($name, '#') !== false) list($name, $property) = explode('#', $name);
        if (strpos($name, '@') !== false) list($name, $attr)     = explode('@', $name);

        // figure out the parameters
        $params = array();
        if ($name !== '')    $params['element'] = $name;
        if (!is_null($attr)) $params['attr'] = $attr;

        // special case: attribute transform
        if (!is_null($attr)) {
            if (is_null($property)) $property = 'pre';
            $type = 'attr_transform_' . $property;
            return array($type, $params);
        }

        // special case: tag transform
        if (is_null($property)) {
            return array('tag_transform', $params);
        }

        return array($property, $params);

    }

    /**
     * Defines all fixes the module will perform in a compact
     * associative array of fix name to fix implementation.
     * @abstract
     */
    function makeFixes() {}

}

?>
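The fix names consumed by getFixType() follow a small grammar of the form `element@attr#property`, where each part is optional. The sketch below restates that parsing as a standalone function so the grammar can be tried in isolation. `parse_fix_name()` is a hypothetical helper written for this note, not part of HTML Purifier, and the sample names `blockquote#content_model_type` and `img@border#post` are illustrative inputs rather than fixes confirmed to exist in the library.

```php
<?php
// Hypothetical standalone restatement of the parsing done by getFixType().
// Returns array($fix_type, $params), the same shape the method returns.
function parse_fix_name($name) {
    $property = $attr = null;
    // A "#property" suffix names the fix type ("pre", "post", "child", ...).
    if (strpos($name, '#') !== false) list($name, $property) = explode('#', $name);
    // An "@attr" part marks an attribute transform; the element may be empty.
    if (strpos($name, '@') !== false) list($name, $attr) = explode('@', $name);

    $params = array();
    if ($name !== '')    $params['element'] = $name;
    if (!is_null($attr)) $params['attr'] = $attr;

    // Attribute transforms default to the "pre" phase.
    if (!is_null($attr)) {
        if (is_null($property)) $property = 'pre';
        return array('attr_transform_' . $property, $params);
    }
    // A bare element name is a tag transform.
    if (is_null($property)) {
        return array('tag_transform', $params);
    }
    return array($property, $params);
}

// Names taken from the modules in this changeset:
$tag    = parse_fix_name('font');          // tag transform on <font>
$global = parse_fix_name('@lang');         // attribute transform with no element
$scoped = parse_fix_name('caption@align'); // attribute transform on <caption>

// Illustrative names (assumptions, see the note above):
$model  = parse_fix_name('blockquote#content_model_type');
$post   = parse_fix_name('img@border#post');

echo $tag[0], ' ', $global[0], ' ', $scoped[0], ' ', $model[0], ' ', $post[0], "\n";
```

A name with no element, such as `@lang`, yields parameters without an `element` key, which is what makes populate() register the transform on the module itself rather than on a single element.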
18
library/HTMLPurifier/HTMLModule/Tidy/Proprietary.php
Normal file
@@ -0,0 +1,18 @@
<?php

require_once 'HTMLPurifier/HTMLModule/Tidy.php';

class HTMLPurifier_HTMLModule_Tidy_Proprietary extends
    HTMLPurifier_HTMLModule_Tidy
{

    var $name = 'Tidy_Proprietary';
    var $defaultLevel = 'light';

    function makeFixes() {
        return array();
    }

}

?>
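Setting `$defaultLevel` is what lets a subclass skip filling in `$fixesForLevel` by hand: makeFixesForLevel() places every fix name returned by makeFixes() at that one level. The following is a minimal sketch of that behavior, assuming a function-style stand-in named `assign_default_level()` (hypothetical, it returns the table instead of mutating an object, and it silently returns where the original raises E_USER_ERROR).

```php
<?php
// Hypothetical stand-in for makeFixesForLevel(), written as a pure function.
function assign_default_level(array $fixesForLevel, $defaultLevel, array $fixes) {
    // No default level configured: leave the table untouched.
    if ($defaultLevel === null) return $fixesForLevel;
    // Unknown level: the original triggers E_USER_ERROR; this sketch just returns.
    if (!isset($fixesForLevel[$defaultLevel])) return $fixesForLevel;
    // Every fix name is assigned to the single default level.
    $fixesForLevel[$defaultLevel] = array_keys($fixes);
    return $fixesForLevel;
}

$empty = array('light' => array(), 'medium' => array(), 'heavy' => array());

// Tidy_Proprietary: default level 'light', and makeFixes() returns no fixes yet.
$proprietary = assign_default_level($empty, 'light', array());

// Tidy_XHTML: default level 'medium' and one fix named '@lang'.
// The string value is a placeholder for the transform object.
$xhtml = assign_default_level($empty, 'medium', array('@lang' => 'placeholder'));

echo count($proprietary['light']), ' ', implode(',', $xhtml['medium']), "\n";
```

With an empty makeFixes(), as in Tidy_Proprietary, the table is unchanged, so the module contributes nothing at any level until fixes are added.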
21
library/HTMLPurifier/HTMLModule/Tidy/XHTML.php
Normal file
@@ -0,0 +1,21 @@
<?php

require_once 'HTMLPurifier/HTMLModule/Tidy.php';
require_once 'HTMLPurifier/AttrTransform/Lang.php';

class HTMLPurifier_HTMLModule_Tidy_XHTML extends
    HTMLPurifier_HTMLModule_Tidy
{

    var $name = 'Tidy_XHTML';
    var $defaultLevel = 'medium';

    function makeFixes() {
        $r = array();
        $r['@lang'] = new HTMLPurifier_AttrTransform_Lang();
        return $r;
    }

}

?>
193
library/HTMLPurifier/HTMLModule/Tidy/XHTMLAndHTML4.php
Normal file
@@ -0,0 +1,193 @@
<?php

require_once 'HTMLPurifier/HTMLModule/Tidy.php';

require_once 'HTMLPurifier/TagTransform/Simple.php';
require_once 'HTMLPurifier/TagTransform/Font.php';

require_once 'HTMLPurifier/AttrTransform/BgColor.php';
require_once 'HTMLPurifier/AttrTransform/BoolToCSS.php';
require_once 'HTMLPurifier/AttrTransform/Border.php';
require_once 'HTMLPurifier/AttrTransform/Name.php';
require_once 'HTMLPurifier/AttrTransform/Length.php';
require_once 'HTMLPurifier/AttrTransform/ImgSpace.php';
require_once 'HTMLPurifier/AttrTransform/EnumToCSS.php';

class HTMLPurifier_HTMLModule_Tidy_XHTMLAndHTML4 extends
    HTMLPurifier_HTMLModule_Tidy
{

    function makeFixes() {

        $r = array();

        // == deprecated tag transforms ===================================

        $r['font'] = new HTMLPurifier_TagTransform_Font();
        $r['menu'] = new HTMLPurifier_TagTransform_Simple('ul');
        $r['dir'] = new HTMLPurifier_TagTransform_Simple('ul');
        $r['center'] = new HTMLPurifier_TagTransform_Simple('div', 'text-align:center;');
        $r['u'] = new HTMLPurifier_TagTransform_Simple('span', 'text-decoration:underline;');
        $r['s'] = new HTMLPurifier_TagTransform_Simple('span', 'text-decoration:line-through;');
        $r['strike'] = new HTMLPurifier_TagTransform_Simple('span', 'text-decoration:line-through;');

        // == deprecated attribute transforms =============================

        $r['caption@align'] =
            new HTMLPurifier_AttrTransform_EnumToCSS('align', array(
                // we're following IE's behavior, not Firefox's, due
                // to the fact that no one supports caption-side:right,
                // W3C included (with CSS 2.1). This is a slightly
                // unreasonable attribute!
                'left' => 'text-align:left;',
                'right' => 'text-align:right;',
                'top' => 'caption-side:top;',
                'bottom' => 'caption-side:bottom;' // not supported by IE
            ));

        // @align for img -------------------------------------------------
        $r['img@align'] =
            new HTMLPurifier_AttrTransform_EnumToCSS('align', array(
                'left' => 'float:left;',
                'right' => 'float:right;',
                'top' => 'vertical-align:top;',
                'middle' => 'vertical-align:middle;',
                'bottom' => 'vertical-align:baseline;',
            ));

        // @align for table -----------------------------------------------
        $r['table@align'] =
            new HTMLPurifier_AttrTransform_EnumToCSS('align', array(
                'left' => 'float:left;',
                'center' => 'margin-left:auto;margin-right:auto;',
                'right' => 'float:right;'
            ));

        // @align for hr -----------------------------------------------
        $r['hr@align'] =
            new HTMLPurifier_AttrTransform_EnumToCSS('align', array(
                // we use both text-align and margin because these work
                // for different browsers (IE and Firefox, respectively)
                // and the melange makes for a pretty cross-compatible
                // solution
                'left' => 'margin-left:0;margin-right:auto;text-align:left;',
                'center' => 'margin-left:auto;margin-right:auto;text-align:center;',
                'right' => 'margin-left:auto;margin-right:0;text-align:right;'
            ));

        // @align for h1, h2, h3, h4, h5, h6, p, div ----------------------
        // {{{
            $align_lookup = array();
            $align_values = array('left', 'right', 'center', 'justify');
            foreach ($align_values as $v) $align_lookup[$v] = "text-align:$v;";
        // }}}
        $r['h1@align'] =
        $r['h2@align'] =
        $r['h3@align'] =
        $r['h4@align'] =
        $r['h5@align'] =
        $r['h6@align'] =
        $r['p@align'] =
        $r['div@align'] =
            new HTMLPurifier_AttrTransform_EnumToCSS('align', $align_lookup);

        // @bgcolor for table, tr, td, th ---------------------------------
        $r['table@bgcolor'] =
        $r['td@bgcolor'] =
        $r['th@bgcolor'] =
            new HTMLPurifier_AttrTransform_BgColor();

        // @border for img ------------------------------------------------
        $r['img@border'] = new HTMLPurifier_AttrTransform_Border();

        // @clear for br --------------------------------------------------
        $r['br@clear'] =
            new HTMLPurifier_AttrTransform_EnumToCSS('clear', array(
                'left' => 'clear:left;',
                'right' => 'clear:right;',
                'all' => 'clear:both;',
                'none' => 'clear:none;',
            ));

        // @height for td, th ---------------------------------------------
        $r['td@height'] =
        $r['th@height'] =
            new HTMLPurifier_AttrTransform_Length('height');

        // @hspace for img ------------------------------------------------
        $r['img@hspace'] = new HTMLPurifier_AttrTransform_ImgSpace('hspace');

        // @name for img, a -----------------------------------------------
        $r['img@name'] =
        $r['a@name'] = new HTMLPurifier_AttrTransform_Name();

        // @noshade for hr ------------------------------------------------
        // this transformation is not precise but often good enough.
        // different browsers use different styles to designate noshade
        $r['hr@noshade'] =
            new HTMLPurifier_AttrTransform_BoolToCSS(
                'noshade',
                'color:#808080;background-color:#808080;border:0;'
            );

        // @nowrap for td, th ---------------------------------------------
        $r['td@nowrap'] =
        $r['th@nowrap'] =
            new HTMLPurifier_AttrTransform_BoolToCSS(
                'nowrap',
                'white-space:nowrap;'
            );

        // @size for hr --------------------------------------------------
        $r['hr@size'] = new HTMLPurifier_AttrTransform_Length('size', 'height');

        // @type for li, ol, ul -------------------------------------------
        // {{{
            $ul_types = array(
                'disc' => 'list-style-type:disc;',
                'square' => 'list-style-type:square;',
                'circle' => 'list-style-type:circle;'
            );
            $ol_types = array(
                '1' => 'list-style-type:decimal;',
                'i' => 'list-style-type:lower-roman;',
                'I' => 'list-style-type:upper-roman;',
                'a' => 'list-style-type:lower-alpha;',
                'A' => 'list-style-type:upper-alpha;'
            );
            $li_types = $ul_types + $ol_types;
        // }}}

        $r['ul@type'] = new HTMLPurifier_AttrTransform_EnumToCSS('type', $ul_types);
        $r['ol@type'] = new HTMLPurifier_AttrTransform_EnumToCSS('type', $ol_types, true);
        $r['li@type'] = new HTMLPurifier_AttrTransform_EnumToCSS('type', $li_types, true);

        // @vspace for img ------------------------------------------------
        $r['img@vspace'] = new HTMLPurifier_AttrTransform_ImgSpace('vspace');

        // @width for hr, td, th ------------------------------------------
        $r['td@width'] =
        $r['th@width'] =
        $r['hr@width'] = new HTMLPurifier_AttrTransform_Length('width');

        return $r;

    }

}

class HTMLPurifier_HTMLModule_Tidy_Transitional extends
    HTMLPurifier_HTMLModule_Tidy_XHTMLAndHTML4
{
    var $name = 'Tidy_Transitional';
    var $defaultLevel = 'heavy';
}

class HTMLPurifier_HTMLModule_Tidy_Strict extends
    HTMLPurifier_HTMLModule_Tidy_XHTMLAndHTML4
{
    var $name = 'Tidy_Strict';
    var $defaultLevel = 'light';
}

?>
27
library/HTMLPurifier/HTMLModule/Tidy/XHTMLStrict.php
Normal file
@@ -0,0 +1,27 @@
<?php

require_once 'HTMLPurifier/HTMLModule/Tidy.php';
require_once 'HTMLPurifier/ChildDef/StrictBlockquote.php';

class HTMLPurifier_HTMLModule_Tidy_XHTMLStrict extends
    HTMLPurifier_HTMLModule_Tidy
{

    var $name = 'Tidy_XHTMLStrict';
    var $defaultLevel = 'light';

    function makeFixes() {
        $r = array();
        $r['blockquote#content_model_type'] = 'strictblockquote';
        return $r;
    }

    var $defines_child_def = true;
    function getChildDef($def) {
        if ($def->content_model_type != 'strictblockquote') return false;
        return new HTMLPurifier_ChildDef_StrictBlockquote($def->content_model);
    }

}

?>
@@ -1,200 +0,0 @@
|
|||||||
<?php
|
|
||||||
|
|
||||||
require_once 'HTMLPurifier/ChildDef/StrictBlockquote.php';
|
|
||||||
|
|
||||||
require_once 'HTMLPurifier/TagTransform/Simple.php';
|
|
||||||
require_once 'HTMLPurifier/TagTransform/Center.php';
|
|
||||||
require_once 'HTMLPurifier/TagTransform/Font.php';
|
|
||||||
|
|
||||||
require_once 'HTMLPurifier/AttrTransform/Lang.php';
|
|
||||||
require_once 'HTMLPurifier/AttrTransform/BgColor.php';
|
|
||||||
require_once 'HTMLPurifier/AttrTransform/BoolToCSS.php';
|
|
||||||
require_once 'HTMLPurifier/AttrTransform/Border.php';
|
|
||||||
require_once 'HTMLPurifier/AttrTransform/Name.php';
|
|
||||||
require_once 'HTMLPurifier/AttrTransform/Length.php';
|
|
||||||
require_once 'HTMLPurifier/AttrTransform/ImgSpace.php';
|
|
||||||
require_once 'HTMLPurifier/AttrTransform/EnumToCSS.php';
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Proprietary module that transforms deprecated elements into Strict
|
|
||||||
* HTML (see HTML 4.01 and XHTML 1.0) when possible.
|
|
||||||
*/
|
|
||||||
|
|
||||||
class HTMLPurifier_HTMLModule_TransformToStrict extends HTMLPurifier_HTMLModule
|
|
||||||
{
|
|
||||||
|
|
||||||
var $name = 'TransformToStrict';
|
|
||||||
|
|
||||||
// we're actually modifying these elements, not defining them
|
|
||||||
var $elements = array('h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'p',
|
|
||||||
'blockquote', 'table', 'td', 'th', 'tr', 'img', 'a', 'hr', 'br',
|
|
||||||
'caption', 'ul', 'ol', 'li');
|
|
||||||
|
|
||||||
var $info_tag_transform = array(
|
|
||||||
// placeholders, see constructor for definitions
|
|
||||||
'font' => false,
|
|
||||||
'menu' => false,
|
|
||||||
'dir' => false,
|
|
||||||
'center'=> false
|
|
||||||
);
|
|
||||||
|
|
||||||
var $attr_collections = array(
|
|
||||||
'Lang' => array(
|
|
||||||
'lang' => false // placeholder
|
|
||||||
)
|
|
||||||
);
|
|
||||||
|
|
||||||
var $info_attr_transform_post = array(
|
|
||||||
'lang' => false // placeholder
|
|
||||||
);
|
|
||||||
|
|
||||||
function HTMLPurifier_HTMLModule_TransformToStrict() {
|
|
||||||
|
|
||||||
// behavior with transformations when there's another CSS property
|
|
||||||
// working on it is interesting: the CSS will *always* override
|
|
||||||
// the deprecated attribute, whereas an inline CSS declaration will
|
|
||||||
// override the corresponding declaration in, say, an external
|
|
||||||
// stylesheet. This behavior won't affect most people, but it
|
|
||||||
// does represent an operational difference we CANNOT fix.
|
|
||||||
|
|
||||||
// deprecated tag transforms
|
|
||||||
$this->info_tag_transform['font'] = new HTMLPurifier_TagTransform_Font();
|
|
||||||
$this->info_tag_transform['menu'] = new HTMLPurifier_TagTransform_Simple('ul');
|
|
||||||
$this->info_tag_transform['dir'] = new HTMLPurifier_TagTransform_Simple('ul');
|
|
||||||
$this->info_tag_transform['center'] = new HTMLPurifier_TagTransform_Center();
|
|
||||||
|
|
||||||
foreach ($this->elements as $name) {
|
|
||||||
$this->info[$name] = new HTMLPurifier_ElementDef();
|
|
||||||
$this->info[$name]->standalone = false;
|
|
||||||
}
|
|
||||||
|
|
||||||
// deprecated attribute transforms
|
|
||||||
|
|
||||||
// align battery
|
|
||||||
$align_lookup = array();
|
|
||||||
$align_values = array('left', 'right', 'center', 'justify');
|
|
||||||
foreach ($align_values as $v) $align_lookup[$v] = "text-align:$v;";
|
|
||||||
$this->info['h1']->attr_transform_pre['align'] =
|
|
||||||
$this->info['h2']->attr_transform_pre['align'] =
|
|
||||||
$this->info['h3']->attr_transform_pre['align'] =
|
|
||||||
$this->info['h4']->attr_transform_pre['align'] =
|
|
||||||
$this->info['h5']->attr_transform_pre['align'] =
|
|
||||||
$this->info['h6']->attr_transform_pre['align'] =
|
|
||||||
$this->info['p'] ->attr_transform_pre['align'] =
|
|
||||||
new HTMLPurifier_AttrTransform_EnumToCSS('align', $align_lookup);
|
|
||||||
|
|
||||||
// xml:lang <=> lang mirroring, implement in TransformToStrict,
|
|
||||||
// this is overridden in TransformToXHTML11
|
|
||||||
$this->info_attr_transform_post['lang'] = new HTMLPurifier_AttrTransform_Lang();
|
|
||||||
$this->attr_collections['Lang']['lang'] = new HTMLPurifier_AttrDef_Lang();
|
|
||||||
|
|
||||||
// this should not be applied to XHTML 1.0 Transitional, ONLY
|
|
||||||
// XHTML 1.0 Strict. We may need three classes
|
|
||||||
$this->info['blockquote']->content_model_type = 'strictblockquote';
|
|
||||||
$this->info['blockquote']->child = false; // recalculate please!
|
|
||||||
|
|
||||||
$this->info['table']->attr_transform_pre['bgcolor'] =
|
|
||||||
$this->info['tr']->attr_transform_pre['bgcolor'] =
|
|
||||||
$this->info['td']->attr_transform_pre['bgcolor'] =
|
|
||||||
$this->info['th']->attr_transform_pre['bgcolor'] = new HTMLPurifier_AttrTransform_BgColor();
|
|
||||||
|
|
||||||
$this->info['img']->attr_transform_pre['border'] = new HTMLPurifier_AttrTransform_Border();
|
|
||||||
|
|
||||||
$this->info['img']->attr_transform_pre['name'] =
|
|
||||||
$this->info['a']->attr_transform_pre['name'] = new HTMLPurifier_AttrTransform_Name();
|
|
||||||
|
|
||||||
$this->info['td']->attr_transform_pre['width'] =
|
|
||||||
$this->info['th']->attr_transform_pre['width'] =
|
|
||||||
$this->info['hr']->attr_transform_pre['width'] = new HTMLPurifier_AttrTransform_Length('width');
|
|
||||||
|
|
||||||
$this->info['td']->attr_transform_pre['nowrap'] =
|
|
||||||
$this->info['th']->attr_transform_pre['nowrap'] = new HTMLPurifier_AttrTransform_BoolToCSS('nowrap', 'white-space:nowrap;');
|
|
||||||
|
|
||||||
$this->info['td']->attr_transform_pre['height'] =
|
|
||||||
$this->info['th']->attr_transform_pre['height'] = new HTMLPurifier_AttrTransform_Length('height');
|
|
||||||
|
|
||||||
$this->info['img']->attr_transform_pre['hspace'] = new HTMLPurifier_AttrTransform_ImgSpace('hspace');
|
|
||||||
$this->info['img']->attr_transform_pre['vspace'] = new HTMLPurifier_AttrTransform_ImgSpace('vspace');
|
|
||||||
|
|
||||||
$this->info['hr']->attr_transform_pre['size'] = new HTMLPurifier_AttrTransform_Length('size', 'height');
|
|
||||||
|
|
||||||
// this transformation is not precise but often good enough.
|
|
||||||
// different browsers use different styles to designate noshade
|
|
||||||
$this->info['hr']->attr_transform_pre['noshade'] = new HTMLPurifier_AttrTransform_BoolToCSS('noshade', 'color:#808080;background-color:#808080;border: 0;');
|
|
||||||
|
|
||||||
$this->info['br']->attr_transform_pre['clear'] =
|
|
||||||
new HTMLPurifier_AttrTransform_EnumToCSS('clear', array(
|
|
||||||
'left' => 'clear:left;',
|
|
||||||
'right' => 'clear:right;',
|
|
||||||
'all' => 'clear:both;',
|
|
||||||
'none' => 'clear:none;',
|
|
||||||
));
|
|
||||||
|
|
||||||
// this is a slightly unreasonable attribute
|
|
||||||
$this->info['caption']->attr_transform_pre['align'] =
|
|
||||||
new HTMLPurifier_AttrTransform_EnumToCSS('align', array(
|
|
||||||
// we're following IE's behavior, not Firefox's, due
|
|
||||||
// to the fact that no one supports caption-side:right,
|
|
||||||
// W3C included (with CSS 2.1)
|
|
||||||
'left' => 'text-align:left;',
|
|
||||||
'right' => 'text-align:right;',
|
|
||||||
'top' => 'caption-side:top;',
|
|
||||||
'bottom' => 'caption-side:bottom;' // not supported by IE
|
|
||||||
));
|
|
||||||
|
|
||||||
$this->info['table']->attr_transform_pre['align'] =
|
|
||||||
new HTMLPurifier_AttrTransform_EnumToCSS('align', array(
|
|
||||||
'left' => 'float:left;',
|
|
||||||
'center' => 'margin-left:auto;margin-right:auto;',
|
|
||||||
'right' => 'float:right;'
|
|
||||||
));
|
|
||||||
|
|
||||||
$this->info['img']->attr_transform_pre['align'] =
|
|
||||||
new HTMLPurifier_AttrTransform_EnumToCSS('align', array(
|
|
||||||
'left' => 'float:left;',
|
|
||||||
'right' => 'float:right;',
|
|
||||||
'top' => 'vertical-align:top;',
|
|
||||||
'middle' => 'vertical-align:middle;',
|
|
||||||
'bottom' => 'vertical-align:baseline;',
|
|
||||||
));
|
|
||||||
|
|
||||||
$this->info['hr']->attr_transform_pre['align'] =
|
|
||||||
new HTMLPurifier_AttrTransform_EnumToCSS('align', array(
|
|
||||||
'left' => 'margin-left:0;margin-right:auto;text-align:left;',
|
|
||||||
'center' => 'margin-left:auto;margin-right:auto;text-align:center;',
|
|
||||||
'right' => 'margin-left:auto;margin-right:0;text-align:right;'
|
|
||||||
));
|
|
||||||
|
|
||||||
$ul_types = array(
|
|
||||||
'disc' => 'list-style-type:disc;',
|
|
||||||
'square' => 'list-style-type:square;',
|
|
||||||
'circle' => 'list-style-type:circle;'
|
|
||||||
);
|
|
||||||
$ol_types = array(
|
|
||||||
'1' => 'list-style-type:decimal;',
|
|
||||||
'i' => 'list-style-type:lower-roman;',
|
|
||||||
'I' => 'list-style-type:upper-roman;',
|
|
||||||
'a' => 'list-style-type:lower-alpha;',
|
|
||||||
'A' => 'list-style-type:upper-alpha;'
|
|
||||||
);
|
|
||||||
$li_types = $ul_types + $ol_types;
|
|
||||||
|
|
||||||
$this->info['ul']->attr_transform_pre['type'] =
|
|
||||||
new HTMLPurifier_AttrTransform_EnumToCSS('type', $ul_types);
|
|
||||||
$this->info['ol']->attr_transform_pre['type'] =
|
|
||||||
new HTMLPurifier_AttrTransform_EnumToCSS('type', $ol_types, true);
|
|
||||||
$this->info['li']->attr_transform_pre['type'] =
|
|
||||||
new HTMLPurifier_AttrTransform_EnumToCSS('type', $li_types, true);
|
|
||||||
|
|
||||||
|
|
||||||
}
|
|
||||||
|
|
||||||
var $defines_child_def = true;
|
|
||||||
function getChildDef($def) {
|
|
||||||
if ($def->content_model_type != 'strictblockquote') return false;
|
|
||||||
return new HTMLPurifier_ChildDef_StrictBlockquote($def->content_model);
|
|
||||||
}
|
|
||||||
|
|
||||||
}
|
|
||||||
|
|
||||||
?>
|
|
@@ -1,36 +0,0 @@
|
|||||||
<?php
|
|
||||||
|
|
||||||
require_once 'HTMLPurifier/AttrTransform/Lang.php';
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Proprietary module that transforms XHTML 1.0 deprecated aspects into
|
|
||||||
* XHTML 1.1 compliant ones, when possible. For maximum effectiveness,
|
|
||||||
* HTMLPurifier_HTMLModule_TransformToStrict must also be loaded
|
|
||||||
* (otherwise, elements that were deprecated from Transitional to Strict
|
|
||||||
* will not be transformed).
|
|
||||||
*
|
|
||||||
* XHTML 1.1 compliant document are automatically XHTML 1.0 compliant too,
|
|
||||||
* although they may not be as friendly to legacy browsers.
|
|
||||||
*/
|
|
||||||
|
|
||||||
class HTMLPurifier_HTMLModule_TransformToXHTML11 extends HTMLPurifier_HTMLModule
|
|
||||||
{
|
|
||||||
|
|
||||||
var $name = 'TransformToXHTML11';
|
|
||||||
var $attr_collections = array(
|
|
||||||
'Lang' => array(
|
|
||||||
'lang' => false // remove it
|
|
||||||
)
|
|
||||||
);
|
|
||||||
|
|
||||||
var $info_attr_transform_post = array(
|
|
||||||
'lang' => false // remove it
|
|
||||||
);
|
|
||||||
|
|
||||||
function HTMLPurifier_HTMLModule_TransformToXHTML11() {
|
|
||||||
$this->info_attr_transform_pre['lang'] = new HTMLPurifier_AttrTransform_Lang();
|
|
||||||
}
|
|
||||||
|
|
||||||
}
|
|
||||||
|
|
||||||
?>
|
|
16
library/HTMLPurifier/HTMLModule/XMLCommonAttributes.php
Normal file
@@ -0,0 +1,16 @@
<?php

require_once 'HTMLPurifier/HTMLModule.php';

class HTMLPurifier_HTMLModule_XMLCommonAttributes extends HTMLPurifier_HTMLModule
{
    var $name = 'XMLCommonAttributes';

    var $attr_collections = array(
        'Lang' => array(
            'xml:lang' => 'LanguageCode',
        )
    );
}

?>
@@ -2,6 +2,8 @@
|
|||||||
|
|
||||||
require_once 'HTMLPurifier/HTMLModule.php';
|
require_once 'HTMLPurifier/HTMLModule.php';
|
||||||
require_once 'HTMLPurifier/ElementDef.php';
|
require_once 'HTMLPurifier/ElementDef.php';
|
||||||
|
require_once 'HTMLPurifier/Doctype.php';
|
||||||
|
require_once 'HTMLPurifier/DoctypeRegistry.php';
|
||||||
|
|
||||||
require_once 'HTMLPurifier/ContentSets.php';
|
require_once 'HTMLPurifier/ContentSets.php';
|
||||||
require_once 'HTMLPurifier/AttrTypes.php';
|
require_once 'HTMLPurifier/AttrTypes.php';
|
||||||
@@ -23,14 +25,20 @@ require_once 'HTMLPurifier/HTMLModule/Image.php';
|
|||||||
require_once 'HTMLPurifier/HTMLModule/StyleAttribute.php';
|
require_once 'HTMLPurifier/HTMLModule/StyleAttribute.php';
|
||||||
require_once 'HTMLPurifier/HTMLModule/Legacy.php';
|
require_once 'HTMLPurifier/HTMLModule/Legacy.php';
|
||||||
require_once 'HTMLPurifier/HTMLModule/Target.php';
|
require_once 'HTMLPurifier/HTMLModule/Target.php';
|
||||||
|
require_once 'HTMLPurifier/HTMLModule/Scripting.php';
|
||||||
|
require_once 'HTMLPurifier/HTMLModule/XMLCommonAttributes.php';
|
||||||
|
require_once 'HTMLPurifier/HTMLModule/NonXMLCommonAttributes.php';
|
||||||
|
|
||||||
// proprietary modules
|
// tidy modules
|
||||||
require_once 'HTMLPurifier/HTMLModule/TransformToStrict.php';
|
require_once 'HTMLPurifier/HTMLModule/Tidy.php';
|
||||||
require_once 'HTMLPurifier/HTMLModule/TransformToXHTML11.php';
|
require_once 'HTMLPurifier/HTMLModule/Tidy/XHTMLAndHTML4.php';
|
||||||
|
require_once 'HTMLPurifier/HTMLModule/Tidy/XHTML.php';
|
||||||
|
require_once 'HTMLPurifier/HTMLModule/Tidy/XHTMLStrict.php';
|
||||||
|
require_once 'HTMLPurifier/HTMLModule/Tidy/Proprietary.php';
|
||||||
|
|
||||||
HTMLPurifier_ConfigSchema::define(
|
HTMLPurifier_ConfigSchema::define(
|
||||||
'HTML', 'Doctype', null, 'string/null',
|
'HTML', 'Doctype', null, 'string/null',
|
||||||
'Doctype to use, valid values are HTML 4.01 Transitional, HTML 4.01 '.
|
'Doctype to use, pre-defined values are HTML 4.01 Transitional, HTML 4.01 '.
|
||||||
'Strict, XHTML 1.0 Transitional, XHTML 1.0 Strict, XHTML 1.1. '.
|
'Strict, XHTML 1.0 Transitional, XHTML 1.0 Strict, XHTML 1.1. '.
|
||||||
'Technically speaking this is not actually a doctype (as it does '.
|
'Technically speaking this is not actually a doctype (as it does '.
|
||||||
'not identify a corresponding DTD), but we are using this name '.
|
'not identify a corresponding DTD), but we are using this name '.
|
||||||
@@ -38,173 +46,159 @@ HTMLPurifier_ConfigSchema::define(
|
|||||||
'like %Core.XHTML or %HTML.Strict.'
|
'like %Core.XHTML or %HTML.Strict.'
|
||||||
);
|
);
|
||||||
|
|
||||||
|
HTMLPurifier_ConfigSchema::define(
|
||||||
|
'HTML', 'Trusted', false, 'bool',
|
||||||
|
'Indicates whether or not the user input is trusted. If the '.
|
||||||
|
'input is trusted, a more expansive set of allowed tags and attributes '.
|
||||||
|
'will be used. This directive has been available since 2.0.0.'
|
||||||
|
);
|
||||||
|
|
||||||
|
HTMLPurifier_ConfigSchema::define(
|
||||||
|
'HTML', 'AllowedModules', null, 'lookup/null', '
|
||||||
|
<p>
|
||||||
|
A doctype comes with a set of usual modules to use. Without having
|
||||||
|
to muck about with the doctypes, you can quickly activate or
|
||||||
|
disable these modules by specifying which modules you wish to allow
|
||||||
|
with this directive. This is most useful for unit testing specific
|
||||||
|
modules, although end users may find it useful for their own ends.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
If you specify a module that does not exist, the manager will silently
|
||||||
|
fail to use it, so be careful! User-defined modules are not affected
|
||||||
|
by this directive. Modules defined in %HTML.CoreModules are not
|
||||||
|
affected by this directive. This directive has been available since 2.0.0.
|
||||||
|
</p>
|
||||||
|
');
|
||||||
|
|
||||||
|
HTMLPurifier_ConfigSchema::define(
|
||||||
|
'HTML', 'CoreModules', array(
|
||||||
|
'Structure' => true,
|
||||||
|
'Text' => true,
|
||||||
|
'Hypertext' => true,
|
||||||
|
'List' => true,
|
||||||
|
'NonXMLCommonAttributes' => true,
|
||||||
|
'XMLCommonAttributes' => true,
|
||||||
|
'CommonAttributes' => true
|
||||||
|
), 'lookup', '
|
||||||
|
<p>
|
||||||
|
Certain modularized doctypes (XHTML, namely) have certain modules
|
||||||
|
that must be included for the doctype to be a conforming document
|
||||||
|
type: put those modules here. By default, XHTML\'s core modules
|
||||||
|
are used. You can set this to a blank array to disable core module
|
||||||
|
protection, but this is not recommended. This directive has been
|
||||||
|
available since 2.0.0.
|
||||||
|
</p>
|
||||||
|
');
|
||||||
|
|
||||||
class HTMLPurifier_HTMLModuleManager
|
class HTMLPurifier_HTMLModuleManager
|
||||||
{
|
{
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Array of HTMLPurifier_Module instances, indexed by module's class name.
|
* Instance of HTMLPurifier_DoctypeRegistry
|
||||||
* All known modules, regardless of use, are in this array.
|
* @public
|
||||||
|
*/
|
||||||
|
var $doctypes;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Instance of current doctype
|
||||||
|
* @public
|
||||||
|
*/
|
||||||
|
var $doctype;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Instance of HTMLPurifier_AttrTypes
|
||||||
|
* @public
|
||||||
|
*/
|
||||||
|
var $attrTypes;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Active instances of modules for the specified doctype are
|
||||||
|
* indexed, by name, in this array.
|
||||||
*/
|
*/
|
||||||
var $modules = array();
|
var $modules = array();
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* String doctype we will validate against. See $validModules for use.
|
* Array of recognized HTMLPurifier_Module instances, indexed by
|
||||||
*
|
* module's class name. This array is usually lazy loaded, but a
|
||||||
* @note
|
* user can overload a module by pre-emptively registering it.
|
||||||
* There is a special doctype '*' that acts both as the "default"
|
|
||||||
* doctype if a customized system only defines one doctype and
|
|
||||||
* also a catch-all doctype that gets merged into all the other
|
|
||||||
* module collections. When possible, use a private collection to
|
|
||||||
* share modules between doctypes: this special doctype is to
|
|
||||||
* make life more convenient for users.
|
|
||||||
*/
|
*/
|
||||||
var $doctype;
|
var $registeredModules = array();
|
||||||
var $doctypeAliases = array(); /**< Lookup array of strings to real doctypes */
|
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Associative array: $collections[$type][$doctype] = list of modules.
|
* List of extra modules that were added by the user using addModule().
|
||||||
* This is used to logically separate types of functionality so that
|
* These get unconditionally merged into the current doctype, whatever
|
||||||
* based on the doctype and other configuration settings they may
|
* it may be.
|
||||||
* be easily switched and on and off. Custom setups may not need
|
|
||||||
* to use this abstraction, opting to have only one big collection
|
|
||||||
* with one valid doctype.
|
|
||||||
*/
|
*/
|
||||||
var $collections = array();
|
var $userModules = array();
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Modules that may be used in a valid doctype of this kind.
|
* Associative array of element name to list of modules that have
|
||||||
* Correctional and leniency modules should not be placed in this
|
* definitions for the element; this array is dynamically filled.
|
||||||
* array unless the user said so: don't stuff every possible lenient
|
|
||||||
* module for this doctype in here.
|
|
||||||
*/
|
*/
|
||||||
var $validModules = array();
|
|
||||||
var $validCollections = array(); /**< Collections to merge into $validModules */
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Modules that we will allow in input, subset of $validModules. Single
|
|
||||||
* element definitions may result in us consulting validModules.
|
|
||||||
*/
|
|
||||||
var $activeModules = array();
|
|
||||||
var $activeCollections = array(); /**< Collections to merge into $activeModules */
|
|
||||||
|
|
||||||
var $counter = 0; /**< Designates next available integer order for modules. */
|
|
||||||
var $initialized = false; /**< Says whether initialize() was called */
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Specifies what doctype to siphon new modules from addModule() to,
|
|
||||||
* or false to disable the functionality. Must be used in conjunction
|
|
||||||
* with $autoCollection.
|
|
||||||
*/
|
|
||||||
var $autoDoctype = false;
|
|
||||||
/**
|
|
||||||
* Specifies what collection to siphon new modules from addModule() to,
|
|
||||||
* or false to disable the functionality. Must be used in conjunction
|
|
||||||
* with $autoCollection.
|
|
||||||
*/
|
|
||||||
var $autoCollection = false;
|
|
||||||
|
|
||||||
/** Associative array of element name to defining modules (always array) */
|
|
||||||
var $elementLookup = array();
|
var $elementLookup = array();
|
||||||
|
|
||||||
/** List of prefixes we should use for resolving small names */
|
/** List of prefixes we should use for registering small names */
|
||||||
var $prefixes = array('HTMLPurifier_HTMLModule_');
|
var $prefixes = array('HTMLPurifier_HTMLModule_');
|
||||||
|
|
||||||
var $contentSets; /**< Instance of HTMLPurifier_ContentSets */
|
var $contentSets; /**< Instance of HTMLPurifier_ContentSets */
|
||||||
var $attrTypes; /**< Instance of HTMLPurifier_AttrTypes */
|
|
||||||
var $attrCollections; /**< Instance of HTMLPurifier_AttrCollections */
|
var $attrCollections; /**< Instance of HTMLPurifier_AttrCollections */
|
||||||
|
|
||||||
/**
|
/** If set to true, unsafe elements and attributes will be allowed */
|
||||||
* @param $blank If true, don't do any initializing
|
var $trusted = false;
|
||||||
*/
|
|
||||||
function HTMLPurifier_HTMLModuleManager($blank = false) {
|
function HTMLPurifier_HTMLModuleManager() {
|
||||||
|
|
||||||
// the only editable internal object. The rest need to
|
// editable internal objects
|
||||||
// be manipulated through modules
|
|
||||||
$this->attrTypes = new HTMLPurifier_AttrTypes();
|
$this->attrTypes = new HTMLPurifier_AttrTypes();
|
||||||
|
$this->doctypes = new HTMLPurifier_DoctypeRegistry();
|
||||||
|
|
||||||
if (!$blank) $this->initialize();
|
// setup default HTML doctypes
|
||||||
|
|
||||||
}
|
// module reuse
|
||||||
|
$common = array(
|
||||||
function initialize() {
|
'CommonAttributes', 'Text', 'Hypertext', 'List',
|
||||||
$this->initialized = true;
|
'Presentation', 'Edit', 'Bdo', 'Tables', 'Image',
|
||||||
|
'StyleAttribute', 'Scripting'
|
||||||
// load default modules to the recognized modules list (not active)
|
|
||||||
$modules = array(
|
|
||||||
// define
|
|
||||||
'CommonAttributes',
|
|
||||||
'Text', 'Hypertext', 'List', 'Presentation',
|
|
||||||
'Edit', 'Bdo', 'Tables', 'Image', 'StyleAttribute',
|
|
||||||
'Target',
|
|
||||||
// define-redefine
|
|
||||||
'Legacy',
|
|
||||||
// redefine
|
|
||||||
'TransformToStrict', 'TransformToXHTML11'
|
|
||||||
);
|
);
|
||||||
foreach ($modules as $module) {
|
$transitional = array('Legacy', 'Target');
|
||||||
$this->addModule($module);
|
$xml = array('XMLCommonAttributes');
|
||||||
}
|
$non_xml = array('NonXMLCommonAttributes');
|
||||||
|
|
||||||
// Safe modules for supported doctypes. These are included
|
$this->doctypes->register(
|
||||||
// in the valid and active module lists by default
|
'HTML 4.01 Transitional', false,
|
||||||
$this->collections['Safe'] = array(
|
array_merge($common, $transitional, $non_xml),
|
||||||
'_Common' => array( // leading _ indicates private
|
array('Tidy_Transitional', 'Tidy_Proprietary')
|
||||||
'CommonAttributes', 'Text', 'Hypertext', 'List',
|
|
||||||
'Presentation', 'Edit', 'Bdo', 'Tables', 'Image',
|
|
||||||
'StyleAttribute'
|
|
||||||
),
|
|
||||||
// HTML definitions, defer to XHTML definitions
|
|
||||||
'HTML 4.01 Transitional' => array(array('XHTML 1.0 Transitional')),
|
|
||||||
'HTML 4.01 Strict' => array(array('XHTML 1.0 Strict')),
|
|
||||||
// XHTML definitions
|
|
||||||
'XHTML 1.0 Transitional' => array( array('XHTML 1.0 Strict'), 'Legacy', 'Target' ),
|
|
||||||
'XHTML 1.0 Strict' => array(array('_Common')),
|
|
||||||
'XHTML 1.1' => array(array('_Common')),
|
|
||||||
);
|
);
|
||||||
|
|
||||||
// Modules that specify elements that are unsafe from untrusted
|
$this->doctypes->register(
|
||||||
// third-parties. These should be registered in $validModules but
|
'HTML 4.01 Strict', false,
|
||||||
// almost never $activeModules unless you really know what you're
|
array_merge($common, $non_xml),
|
||||||
// doing.
|
array('Tidy_Strict', 'Tidy_Proprietary')
|
||||||
$this->collections['Unsafe'] = array();
|
|
||||||
|
|
||||||
// Modules to import if lenient mode (attempt to convert everything
|
|
||||||
// to a valid representation) is on. These must not be in $validModules
|
|
||||||
// unless specified so.
|
|
||||||
$this->collections['Lenient'] = array(
|
|
||||||
'HTML 4.01 Strict' => array(array('XHTML 1.0 Strict')),
|
|
||||||
'XHTML 1.0 Strict' => array('TransformToStrict'),
|
|
||||||
'XHTML 1.1' => array(array('XHTML 1.0 Strict'), 'TransformToXHTML11')
|
|
||||||
);
|
);
|
||||||
|
|
||||||
// Modules to import if correctional mode (correct everything that
|
$this->doctypes->register(
|
||||||
// is feasible to strict mode) is on. These must not be in $validModules
|
'XHTML 1.0 Transitional', true,
|
||||||
// unless specified so.
|
array_merge($common, $transitional, $xml, $non_xml),
|
||||||
$this->collections['Correctional'] = array(
|
array('Tidy_Transitional', 'Tidy_XHTML', 'Tidy_Proprietary')
|
||||||
'HTML 4.01 Transitional' => array(array('XHTML 1.0 Transitional')),
|
|
||||||
'XHTML 1.0 Transitional' => array('TransformToStrict'), // probably want a different one
|
|
||||||
);
|
);
|
||||||
|
|
||||||
// User-space modules, custom code or whatever
|
$this->doctypes->register(
|
||||||
$this->collections['Extension'] = array();
|
'XHTML 1.0 Strict', true,
|
||||||
|
array_merge($common, $xml, $non_xml),
|
||||||
|
array('Tidy_Strict', 'Tidy_XHTML', 'Tidy_XHTMLStrict', 'Tidy_Proprietary')
|
||||||
|
);
|
||||||
|
|
||||||
// setup active versus valid modules. ORDER IS IMPORTANT!
|
$this->doctypes->register(
|
||||||
// definition modules
|
'XHTML 1.1', true,
|
||||||
$this->makeCollectionActive('Safe');
|
array_merge($common, $xml),
|
||||||
$this->makeCollectionValid('Unsafe');
|
array('Tidy_Strict', 'Tidy_XHTML', 'Tidy_Proprietary') // Tidy_XHTML1_1
|
||||||
// redefinition modules
|
);
|
||||||
$this->makeCollectionActive('Lenient');
|
|
||||||
$this->makeCollectionActive('Correctional');
|
|
||||||
|
|
||||||
$this->autoDoctype = '*';
|
|
||||||
$this->autoCollection = 'Extension';
|
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Adds a module to the recognized module list. This does not
|
* Registers a module to the recognized module list, useful for
|
||||||
* do anything else: the module must be added to a corresponding
|
* overloading pre-existing modules.
|
||||||
* collection to be "activated".
|
|
||||||
* @param $module Mixed: string module name, with or without
|
* @param $module Mixed: string module name, with or without
|
||||||
* HTMLPurifier_HTMLModule prefix, or instance of
|
* HTMLPurifier_HTMLModule prefix, or instance of
|
||||||
* subclass of HTMLPurifier_HTMLModule.
|
* subclass of HTMLPurifier_HTMLModule.
|
||||||
@@ -217,10 +211,15 @@ class HTMLPurifier_HTMLModuleManager
|
|||||||
* - Check for literal object name
|
* - Check for literal object name
|
||||||
* - Throw fatal error
|
* - Throw fatal error
|
||||||
* If your object name collides with an internal class, specify
|
* If your object name collides with an internal class, specify
|
||||||
* your module manually.
|
* your module manually. All modules must have been included
|
||||||
|
* externally: registerModule will not perform inclusions for you!
|
||||||
|
* @warning If your module has the same name as an already loaded
|
||||||
|
* module, your module will overload the old one WITHOUT
|
||||||
|
* warning.
|
||||||
*/
|
*/
|
||||||
function addModule($module) {
|
function registerModule($module) {
|
||||||
if (is_string($module)) {
|
if (is_string($module)) {
|
||||||
|
// attempt to load the module
|
||||||
$original_module = $module;
|
$original_module = $module;
|
||||||
$ok = false;
|
$ok = false;
|
||||||
foreach ($this->prefixes as $prefix) {
|
foreach ($this->prefixes as $prefix) {
|
||||||
@@ -240,16 +239,19 @@ class HTMLPurifier_HTMLModuleManager
|
|||||||
}
|
}
|
||||||
$module = new $module();
|
$module = new $module();
|
||||||
}
|
}
|
||||||
$module->order = $this->counter++; // assign then increment
|
if (empty($module->name)) {
|
||||||
$this->modules[$module->name] = $module;
|
trigger_error('Module instance of ' . get_class($module) . ' must have name');
|
||||||
if ($this->autoDoctype !== false && $this->autoCollection !== false) {
|
return;
|
||||||
$this->collections[$this->autoCollection][$this->autoDoctype][] = $module->name;
|
|
||||||
}
|
}
|
||||||
|
$this->registeredModules[$module->name] = $module;
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Safely tests for class existence without invoking __autoload in PHP5
|
* Safely tests for class existence without invoking __autoload in PHP5
|
||||||
|
* or greater.
|
||||||
* @param $name String class name to test
|
* @param $name String class name to test
|
||||||
|
* @note If any other class needs it, we'll need to stash in a
|
||||||
|
* conjectured "compatibility" class
|
||||||
* @private
|
* @private
|
||||||
*/
|
*/
|
||||||
function _classExists($name) {
|
function _classExists($name) {
|
||||||
@@ -265,55 +267,63 @@ class HTMLPurifier_HTMLModuleManager
|
|||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Makes a collection active, while also making it valid if not
|
* Adds a module to the current doctype by first registering it,
|
||||||
* already done so. See $activeModules for the semantics of "active".
|
* and then tacking it on to the active doctype
|
||||||
* @param $collection_name Name of collection to activate
|
|
||||||
*/
|
*/
|
||||||
function makeCollectionActive($collection_name) {
|
function addModule($module) {
|
||||||
if (!in_array($collection_name, $this->validCollections)) {
|
$this->registerModule($module);
|
||||||
$this->makeCollectionValid($collection_name);
|
if (is_object($module)) $module = $module->name;
|
||||||
}
|
$this->userModules[] = $module;
|
||||||
$this->activeCollections[] = $collection_name;
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Makes a collection valid. See $validModules for the semantics of "valid"
|
* Adds a class prefix that registerModule() will use to resolve a
|
||||||
*/
|
|
||||||
function makeCollectionValid($collection_name) {
|
|
||||||
$this->validCollections[] = $collection_name;
|
|
||||||
}
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Adds a class prefix that addModule() will use to resolve a
|
|
||||||
* string name to a concrete class
|
* string name to a concrete class
|
||||||
*/
|
*/
|
||||||
function addPrefix($prefix) {
|
function addPrefix($prefix) {
|
||||||
$this->prefixes[] = (string) $prefix;
|
$this->prefixes[] = $prefix;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Performs processing on modules; after it has been called, you may
|
||||||
|
* use getElement() and getElements()
|
||||||
|
* @param $config Instance of HTMLPurifier_Config
|
||||||
|
*/
|
||||||
function setup($config) {
|
function setup($config) {
|
||||||
|
|
||||||
// load up the autocollection
|
$this->trusted = $config->get('HTML', 'Trusted');
|
||||||
if ($this->autoCollection !== false) {
|
|
||||||
$this->makeCollectionActive($this->autoCollection);
|
// generate
|
||||||
|
$this->doctype = $this->doctypes->make($config);
|
||||||
|
$modules = $this->doctype->modules;
|
||||||
|
|
||||||
|
// take out the default modules that aren't allowed
|
||||||
|
$lookup = $config->get('HTML', 'AllowedModules');
|
||||||
|
$special_cases = $config->get('HTML', 'CoreModules');
|
||||||
|
|
||||||
|
if (is_array($lookup)) {
|
||||||
|
foreach ($modules as $k => $m) {
|
||||||
|
if (isset($special_cases[$m])) continue;
|
||||||
|
if (!isset($lookup[$m])) unset($modules[$k]);
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// retrieve the doctype
|
// merge in custom modules
|
||||||
$this->doctype = $this->getDoctype($config);
|
$modules = array_merge($modules, $this->userModules);
|
||||||
if (isset($this->doctypeAliases[$this->doctype])) {
|
|
||||||
$this->doctype = $this->doctypeAliases[$this->doctype];
|
foreach ($modules as $module) {
|
||||||
|
$this->processModule($module);
|
||||||
}
|
}
|
||||||
|
|
||||||
// process module collections to module name => module instance form
|
foreach ($this->doctype->tidyModules as $module) {
|
||||||
foreach ($this->collections as $col_i => $x) {
|
$this->processModule($module);
|
||||||
$this->processCollections($this->collections[$col_i]);
|
if (method_exists($this->modules[$module], 'construct')) {
|
||||||
|
$this->modules[$module]->construct($config);
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
$this->validModules = $this->assembleModules($this->validCollections);
|
|
||||||
$this->activeModules = $this->assembleModules($this->activeCollections);
|
|
||||||
|
|
||||||
// setup lookup table based on all valid modules
|
// setup lookup table based on all valid modules
|
||||||
foreach ($this->validModules as $module) {
|
foreach ($this->modules as $module) {
|
||||||
foreach ($module->info as $name => $def) {
|
foreach ($module->info as $name => $def) {
|
||||||
if (!isset($this->elementLookup[$name])) {
|
if (!isset($this->elementLookup[$name])) {
|
||||||
$this->elementLookup[$name] = array();
|
$this->elementLookup[$name] = array();
|
||||||
@@ -324,214 +334,51 @@ class HTMLPurifier_HTMLModuleManager
|
|||||||
|
|
||||||
// note the different choice
|
// note the different choice
|
||||||
$this->contentSets = new HTMLPurifier_ContentSets(
|
$this->contentSets = new HTMLPurifier_ContentSets(
|
||||||
// content models that contain non-allowed elements are
|
// content set assembly deals with all possible modules,
|
||||||
// harmless because RemoveForeignElements will ensure
|
// not just ones deemed to be "safe"
|
||||||
// they never get in anyway, and there is usually no
|
$this->modules
|
||||||
// reason why you should want to restrict a content
|
|
||||||
// model beyond what is mandated by the doctype.
|
|
||||||
// Note, however, that this means redefinitions of
|
|
||||||
// content models can't be tossed in validModules willy-nilly:
|
|
||||||
// that stuff still is regulated by configuration.
|
|
||||||
$this->validModules
|
|
||||||
);
|
);
|
||||||
$this->attrCollections = new HTMLPurifier_AttrCollections(
|
$this->attrCollections = new HTMLPurifier_AttrCollections(
|
||||||
$this->attrTypes,
|
$this->attrTypes,
|
||||||
// only explicitly allowed modules are allowed to affect
|
// there is no way to directly disable a global attribute,
|
||||||
// the global attribute collections. This means there's
|
// but using AllowedAttributes or simply not including
|
||||||
// a distinction between loading the Bdo module, and the
|
// the module in your custom doctype should be sufficient
|
||||||
// bdo element: Bdo will enable the dir attribute on all
|
$this->modules
|
||||||
// elements, while bdo will only define the bdo element,
|
|
||||||
// which will not have an editable directionality. This might
|
|
||||||
// catch people who are loading only elements by surprise, so
|
|
||||||
// we should consider loading an entire module if all the
|
|
||||||
// elements it defines are requested by the user, especially
|
|
||||||
// if it affects the global attribute collections.
|
|
||||||
$this->activeModules
|
|
||||||
);
|
);
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Takes a list of collections and merges together all the defined
|
* Takes a module and adds it to the active module collection,
|
||||||
* modules for the current doctype from those collections.
|
* registering it if necessary.
|
||||||
* @param $collections List of collection suffixes we should grab
|
|
||||||
* modules from (like 'Safe' or 'Lenient')
|
|
||||||
*/
|
*/
|
||||||
function assembleModules($collections) {
|
function processModule($module) {
|
||||||
$modules = array();
|
if (!isset($this->registeredModules[$module]) || is_object($module)) {
|
||||||
$numOfCollectionsUsed = 0;
|
$this->registerModule($module);
|
||||||
foreach ($collections as $name) {
|
|
||||||
$disable_global = false;
|
|
||||||
if (!isset($this->collections[$name])) {
|
|
||||||
trigger_error("$name collection is undefined", E_USER_ERROR);
|
|
||||||
continue;
|
|
||||||
}
|
|
||||||
$cols = $this->collections[$name];
|
|
||||||
if (isset($cols[$this->doctype])) {
|
|
||||||
if (isset($cols[$this->doctype]['*'])) {
|
|
||||||
unset($cols[$this->doctype]['*']);
|
|
||||||
$disable_global = true;
|
|
||||||
}
|
|
||||||
$modules += $cols[$this->doctype];
|
|
||||||
$numOfCollectionsUsed++;
|
|
||||||
}
|
|
||||||
// accept catch-all doctype
|
|
||||||
if (
|
|
||||||
$this->doctype !== '*' &&
|
|
||||||
isset($cols['*']) &&
|
|
||||||
!$disable_global
|
|
||||||
) {
|
|
||||||
$modules += $cols['*'];
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
$this->modules[$module] = $this->registeredModules[$module];
|
||||||
if ($numOfCollectionsUsed < 1) {
|
|
||||||
// possible XSS injection if user-specified doctypes
|
|
||||||
// are allowed
|
|
||||||
trigger_error("Doctype {$this->doctype} does not exist, ".
|
|
||||||
"check for typos (if you desire a doctype that allows ".
|
|
||||||
"no elements, use an empty array collection)", E_USER_ERROR);
|
|
||||||
}
|
|
||||||
return $modules;
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Takes a collection and performs inclusions and substitutions for it.
|
* Retrieves merged element definitions.
|
||||||
* @param $cols Reference to collections class member variable
|
* @return Array of HTMLPurifier_ElementDef
|
||||||
*/
|
*/
|
||||||
function processCollections(&$cols) {
|
function getElements() {
|
||||||
|
|
||||||
// $cols is the set of collections
|
|
||||||
// $col_i is the name (index) of a collection
|
|
||||||
// $col is a collection/list of modules
|
|
||||||
|
|
||||||
// perform inclusions
|
|
||||||
foreach ($cols as $col_i => $col) {
|
|
||||||
$seen = array();
|
|
||||||
if (!empty($col[0]) && is_array($col[0])) {
|
|
||||||
$seen[$col_i] = true; // recursion reporting
|
|
||||||
$includes = $col[0];
|
|
||||||
unset($cols[$col_i][0]); // remove inclusions value, recursion guard
|
|
||||||
} else {
|
|
||||||
$includes = array();
|
|
||||||
}
|
|
||||||
if (empty($includes)) continue;
|
|
||||||
for ($i = 0; isset($includes[$i]); $i++) {
|
|
||||||
$inc = $includes[$i];
|
|
||||||
if (isset($seen[$inc])) {
|
|
||||||
trigger_error(
|
|
||||||
"Circular inclusion detected in $col_i collection",
|
|
||||||
E_USER_ERROR
|
|
||||||
);
|
|
||||||
continue;
|
|
||||||
} else {
|
|
||||||
$seen[$inc] = true;
|
|
||||||
}
|
|
||||||
if (!isset($cols[$inc])) {
|
|
||||||
trigger_error(
|
|
||||||
"Collection $col_i tried to include undefined ".
|
|
||||||
"collection $inc", E_USER_ERROR);
|
|
||||||
continue;
|
|
||||||
}
|
|
||||||
foreach ($cols[$inc] as $module) {
|
|
||||||
if (is_array($module)) { // another inclusion!
|
|
||||||
foreach ($module as $inc2) $includes[] = $inc2;
|
|
||||||
continue;
|
|
||||||
}
|
|
||||||
$cols[$col_i][] = $module; // merge in the other modules
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
// replace with real modules, invert module from list to
|
|
||||||
// assoc array of module name to module instance
|
|
||||||
foreach ($cols as $col_i => $col) {
|
|
||||||
$ignore_global = false;
|
|
||||||
$order = array();
|
|
||||||
foreach ($col as $module_i => $module) {
|
|
||||||
unset($cols[$col_i][$module_i]);
|
|
||||||
if (is_array($module)) {
|
|
||||||
trigger_error("Illegal inclusion array at index".
|
|
||||||
" $module_i found collection $col_i, inclusion".
|
|
||||||
" arrays must be at start of collection (index 0)",
|
|
||||||
E_USER_ERROR);
|
|
||||||
continue;
|
|
||||||
}
|
|
||||||
if ($module_i === '*' && $module === false) {
|
|
||||||
$ignore_global = true;
|
|
||||||
continue;
|
|
||||||
}
|
|
||||||
if (!isset($this->modules[$module])) {
|
|
||||||
trigger_error(
|
|
||||||
"Collection $col_i references undefined ".
|
|
||||||
"module $module",
|
|
||||||
E_USER_ERROR
|
|
||||||
);
|
|
||||||
continue;
|
|
||||||
}
|
|
||||||
$module = $this->modules[$module];
|
|
||||||
$cols[$col_i][$module->name] = $module;
|
|
||||||
$order[$module->name] = $module->order;
|
|
||||||
}
|
|
||||||
array_multisort(
|
|
||||||
$order, SORT_ASC, SORT_NUMERIC, $cols[$col_i]
|
|
||||||
);
|
|
||||||
if ($ignore_global) $cols[$col_i]['*'] = false;
|
|
||||||
}
|
|
||||||
|
|
||||||
// delete pseudo-collections
|
|
||||||
foreach ($cols as $col_i => $col) {
|
|
||||||
if ($col_i[0] == '_') unset($cols[$col_i]);
|
|
||||||
}
|
|
||||||
|
|
||||||
}
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Retrieves the doctype from the configuration object
|
|
||||||
*/
|
|
||||||
function getDoctype($config) {
|
|
||||||
$doctype = $config->get('HTML', 'Doctype');
|
|
||||||
if ($doctype !== null) {
|
|
||||||
return $doctype;
|
|
||||||
}
|
|
||||||
if (!$this->initialized) {
|
|
||||||
// don't do HTML-oriented backwards compatibility stuff
|
|
||||||
// use either the auto-doctype, or the catch-all doctype
|
|
||||||
return $this->autoDoctype ? $this->autoDoctype : '*';
|
|
||||||
}
|
|
||||||
// this is backwards-compatibility stuff
|
|
||||||
if ($config->get('Core', 'XHTML')) {
|
|
||||||
$doctype = 'XHTML 1.0';
|
|
||||||
} else {
|
|
||||||
$doctype = 'HTML 4.01';
|
|
||||||
}
|
|
||||||
if ($config->get('HTML', 'Strict')) {
|
|
||||||
$doctype .= ' Strict';
|
|
||||||
} else {
|
|
||||||
$doctype .= ' Transitional';
|
|
||||||
}
|
|
||||||
return $doctype;
|
|
||||||
}
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Retrieves merged element definitions for all active elements.
|
|
||||||
* @note We may want to generate an elements array during setup
|
|
||||||
* and pass that on, because a specific combination of
|
|
||||||
* elements may trigger the loading of a module.
|
|
||||||
* @param $config Instance of HTMLPurifier_Config, for determining
|
|
||||||
* stray elements.
|
|
||||||
*/
|
|
||||||
function getElements($config) {
|
|
||||||
|
|
||||||
$elements = array();
|
$elements = array();
|
||||||
foreach ($this->activeModules as $module) {
|
foreach ($this->modules as $module) {
|
||||||
foreach ($module->info as $name => $v) {
|
foreach ($module->info as $name => $v) {
|
||||||
if (isset($elements[$name])) continue;
|
if (isset($elements[$name])) continue;
|
||||||
$elements[$name] = $this->getElement($name, $config);
|
// if element is not safe, don't use it
|
||||||
|
if (!$this->trusted && ($v->safe === false)) continue;
|
||||||
|
$elements[$name] = $this->getElement($name);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// standalone elements now loaded
|
// remove dud elements, this happens when an element that
|
||||||
|
// appeared to be safe actually wasn't
|
||||||
|
foreach ($elements as $n => $v) {
|
||||||
|
if ($v === false) unset($elements[$n]);
|
||||||
|
}
|
||||||
|
|
||||||
return $elements;
|
return $elements;
|
||||||
|
|
||||||
@@ -540,13 +387,16 @@ class HTMLPurifier_HTMLModuleManager
|
|||||||
/**
|
/**
|
||||||
* Retrieves a single merged element definition
|
* Retrieves a single merged element definition
|
||||||
* @param $name Name of element
|
* @param $name Name of element
|
||||||
* @param $config Instance of HTMLPurifier_Config, may not be necessary.
|
* @param $trusted Boolean trusted overriding parameter: set to true
|
||||||
|
* if you want the full version of an element
|
||||||
|
* @return Merged HTMLPurifier_ElementDef
|
||||||
*/
|
*/
|
||||||
function getElement($name, $config) {
|
function getElement($name, $trusted = null) {
|
||||||
|
|
||||||
$def = false;
|
$def = false;
|
||||||
|
if ($trusted === null) $trusted = $this->trusted;
|
||||||
|
|
||||||
$modules = $this->validModules;
|
$modules = $this->modules;
|
||||||
|
|
||||||
if (!isset($this->elementLookup[$name])) {
|
if (!isset($this->elementLookup[$name])) {
|
||||||
return false;
|
return false;
|
||||||
@@ -555,9 +405,23 @@ class HTMLPurifier_HTMLModuleManager
|
|||||||
foreach($this->elementLookup[$name] as $module_name) {
|
foreach($this->elementLookup[$name] as $module_name) {
|
||||||
|
|
||||||
$module = $modules[$module_name];
|
$module = $modules[$module_name];
|
||||||
$new_def = $module->info[$name];
|
|
||||||
|
// copy is used because, ideally speaking, the original
|
||||||
|
// definition should not be modified. Usually, this will
|
||||||
|
// make no difference, but for consistency's sake
|
||||||
|
$new_def = $module->info[$name]->copy();
|
||||||
|
|
||||||
|
// refuse to create/merge in a definition that is deemed unsafe
|
||||||
|
if (!$trusted && ($new_def->safe === false)) {
|
||||||
|
$def = false;
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
if (!$def && $new_def->standalone) {
|
if (!$def && $new_def->standalone) {
|
||||||
|
// element with unknown safety is not to be trusted.
|
||||||
|
// however, a merge-in definition with undefined safety
|
||||||
|
// is fine
|
||||||
|
if (!$trusted && !$new_def->safe) continue;
|
||||||
$def = $new_def;
|
$def = $new_def;
|
||||||
} elseif ($def) {
|
} elseif ($def) {
|
||||||
$def->mergeIn($new_def);
|
$def->mergeIn($new_def);
|
||||||
@@ -583,6 +447,13 @@ class HTMLPurifier_HTMLModuleManager
|
|||||||
|
|
||||||
$this->contentSets->generateChildDef($def, $module);
|
$this->contentSets->generateChildDef($def, $module);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// add information on required attributes
|
||||||
|
foreach ($def->attr as $attr_name => $attr_def) {
|
||||||
|
if ($attr_def->required) {
|
||||||
|
$def->required_attr[] = $attr_name;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
return $def;
|
return $def;
|
||||||
|
|
||||||
|
@@ -41,16 +41,34 @@ class HTMLPurifier_Language
|
|||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Retrieves a localised message. Does not perform any operations.
|
* Retrieves a localised message.
|
||||||
* @param $key string identifier of message
|
* @param $key string identifier of message
|
||||||
* @return string localised message
|
* @return string localised message
|
||||||
*/
|
*/
|
||||||
function getMessage($key) {
|
function getMessage($key) {
|
||||||
if (!$this->_loaded) $this->load();
|
if (!$this->_loaded) $this->load();
|
||||||
if (!isset($this->messages[$key])) return '';
|
if (!isset($this->messages[$key])) return "[$key]";
|
||||||
return $this->messages[$key];
|
return $this->messages[$key];
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Formats a localised message with passed parameters
|
||||||
|
* @param $key string identifier of message
|
||||||
|
* @param $param Parameter to substitute in (arbitrary number)
|
||||||
|
* @return string localised message
|
||||||
|
*/
|
||||||
|
function formatMessage($key) {
|
||||||
|
if (!$this->_loaded) $this->load();
|
||||||
|
if (!isset($this->messages[$key])) return "[$key]";
|
||||||
|
$raw = $this->messages[$key];
|
||||||
|
$args = func_get_args();
|
||||||
|
$substitutions = array();
|
||||||
|
for ($i = 1; $i < count($args); $i++) {
|
||||||
|
$substitutions['$' . $i] = $args[$i];
|
||||||
|
}
|
||||||
|
return strtr($raw, $substitutions);
|
||||||
|
}
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
?>
|
?>
|
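The positional substitution that formatMessage() performs in the hunk above can be tried in isolation. The following is a minimal sketch, not part of the commit: the function name format_message_sketch and the sample message are hypothetical, and the raw message is passed in directly rather than looked up in $this->messages. It reuses the same func_get_args() and strtr() approach shown in the diff.

```php
<?php
// Hypothetical stand-alone helper (not part of HTML Purifier): mirrors the
// substitution loop of formatMessage(), but takes the raw message directly.
function format_message_sketch($raw) {
    // Arguments after the first are the substitution parameters.
    $args = func_get_args();
    $substitutions = array();
    for ($i = 1; $i < count($args); $i++) {
        // Parameter N replaces the literal placeholder "$N".
        $substitutions['$' . $i] = $args[$i];
    }
    // strtr() with an array matches the longest keys first and does not
    // re-scan text it has already replaced.
    return strtr($raw, $substitutions);
}

// Single quotes keep "$1" and "$2" as literal placeholders.
echo format_message_sketch('Removed $1 from $2', 'script', 'body'), "\n";
```

Because strtr() never re-scans replaced text, a parameter value that itself contains something like "$2" is left as-is rather than being expanded a second time.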
@@ -7,6 +7,8 @@ $messages = array(
|
|||||||
'htmlpurifier' => 'HTML Purifier',
|
'htmlpurifier' => 'HTML Purifier',
|
||||||
'pizza' => 'Pizza', // for unit testing purposes
|
'pizza' => 'Pizza', // for unit testing purposes
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
);
|
);
|
||||||
|
|
||||||
?>
|
?>
|
@@ -100,15 +100,15 @@ class HTMLPurifier_LanguageFactory
|
|||||||
// you can bypass the conditional include by loading the
|
// you can bypass the conditional include by loading the
|
||||||
// file yourself
|
// file yourself
|
||||||
if (file_exists($file) && !class_exists($class)) {
|
if (file_exists($file) && !class_exists($class)) {
|
||||||
include_once $file;
|
include_once $file;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
if (!class_exists($class)) {
|
if (!class_exists($class)) {
|
||||||
// go fallback
|
// go fallback
|
||||||
$fallback = HTMLPurifier_Language::getFallbackFor($code);
|
$fallback = HTMLPurifier_LanguageFactory::getFallbackFor($code);
|
||||||
$depth++;
|
$depth++;
|
||||||
$lang = Language::factory( $fallback );
|
$lang = HTMLPurifier_LanguageFactory::factory( $fallback );
|
||||||
$depth--;
|
$depth--;
|
||||||
} else {
|
} else {
|
||||||
$lang = new $class;
|
$lang = new $class;
|
||||||
@@ -172,15 +172,15 @@ class HTMLPurifier_LanguageFactory
|
|||||||
|
|
||||||
// merge fallback with current language
|
// merge fallback with current language
|
||||||
foreach ( $this->keys as $key ) {
|
foreach ( $this->keys as $key ) {
|
||||||
if (isset($cache[$key]) && isset($fallback_cache[$key])) {
|
if (isset($cache[$key]) && isset($fallback_cache[$key])) {
|
||||||
if (isset($this->mergeable_keys_map[$key])) {
|
if (isset($this->mergeable_keys_map[$key])) {
|
||||||
$cache[$key] = $cache[$key] + $fallback_cache[$key];
|
$cache[$key] = $cache[$key] + $fallback_cache[$key];
|
||||||
} elseif (isset($this->mergeable_keys_list[$key])) {
|
} elseif (isset($this->mergeable_keys_list[$key])) {
|
||||||
$cache[$key] = array_merge( $fallback_cache[$key], $cache[$key] );
|
$cache[$key] = array_merge( $fallback_cache[$key], $cache[$key] );
|
||||||
}
|
}
|
||||||
} else {
|
} else {
|
||||||
$cache[$key] = $fallback_cache[$key];
|
$cache[$key] = $fallback_cache[$key];
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
}
|
}
|
||||||
|
@@ -4,6 +4,14 @@ require_once 'HTMLPurifier/Token.php';
|
|||||||
require_once 'HTMLPurifier/Encoder.php';
|
require_once 'HTMLPurifier/Encoder.php';
|
||||||
require_once 'HTMLPurifier/EntityParser.php';
|
require_once 'HTMLPurifier/EntityParser.php';
|
||||||
|
|
||||||
|
// implementations
|
||||||
|
require_once 'HTMLPurifier/Lexer/DirectLex.php';
|
||||||
|
if (version_compare(PHP_VERSION, "5", ">=")) {
|
||||||
|
// You can remove the if statement if you are running PHP 5 only.
|
||||||
|
// We ought to get the strict version to follow those rules.
|
||||||
|
require_once 'HTMLPurifier/Lexer/DOMLex.php';
|
||||||
|
}
|
||||||
|
|
||||||
HTMLPurifier_ConfigSchema::define(
|
HTMLPurifier_ConfigSchema::define(
|
||||||
'Core', 'AcceptFullDocuments', true, 'bool',
|
'Core', 'AcceptFullDocuments', true, 'bool',
|
||||||
'This parameter determines whether or not the filter should accept full '.
|
'This parameter determines whether or not the filter should accept full '.
|
||||||
@@ -11,6 +19,52 @@ HTMLPurifier_ConfigSchema::define(
|
|||||||
'drop all sections except the content between body.'
|
'drop all sections except the content between body.'
|
||||||
);
|
);
|
||||||
|
|
||||||
|
HTMLPurifier_ConfigSchema::define(
|
||||||
|
'Core', 'LexerImpl', null, 'mixed/null', '
|
||||||
|
<p>
|
||||||
|
This parameter determines what lexer implementation can be used. The
|
||||||
|
valid values are:
|
||||||
|
</p>
|
||||||
|
<dl>
|
||||||
|
<dt><em>null</em></dt>
|
||||||
|
<dd>
|
||||||
|
Recommended, the lexer implementation will be auto-detected based on
|
||||||
|
your PHP-version and configuration.
|
||||||
|
</dd>
|
||||||
|
<dt><em>string</em> lexer identifier</dt>
|
||||||
|
<dd>
|
||||||
|
This is a slim way of manually overriding the implementation.
|
||||||
|
Currently recognized values are: DOMLex (the default PHP5 implementation)
|
||||||
|
and DirectLex (the default PHP4 implementation). Only use this if
|
||||||
|
you know what you are doing: usually, the auto-detection will
|
||||||
|
manage things for cases you aren\'t even aware of.
|
||||||
|
</dd>
|
||||||
|
<dt><em>object</em> lexer instance</dt>
|
||||||
|
<dd>
|
||||||
|
Super-advanced: you can specify your own, custom, implementation that
|
||||||
|
implements the interface defined by <code>HTMLPurifier_Lexer</code>.
|
||||||
|
I may remove this option simply because I don\'t expect anyone
|
||||||
|
to use it.
|
||||||
|
</dd>
|
||||||
|
</dl>
|
||||||
|
<p>
|
||||||
|
This directive has been available since 2.0.0.
|
||||||
|
</p>
|
||||||
|
'
|
||||||
|
);
|
||||||
|
|
||||||
|
HTMLPurifier_ConfigSchema::define(
|
||||||
|
'Core', 'MaintainLineNumbers', false, 'bool', '
|
||||||
|
<p>
|
||||||
|
If true, HTML Purifier will add line number information to all tokens.
|
||||||
|
This is useful when error reporting is turned on, but can result in
|
||||||
|
significant performance degradation and should not be used when
|
||||||
|
unnecessary. This directive must be used with the DirectLex lexer,
|
||||||
|
as the DOMLex lexer does not (yet) support this functionality. This directive
|
||||||
|
has been available since 2.0.0.
|
||||||
|
</p>
|
||||||
|
');
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Forgivingly lexes HTML (SGML-style) markup into tokens.
|
* Forgivingly lexes HTML (SGML-style) markup into tokens.
|
||||||
*
|
*
|
||||||
@@ -55,11 +109,83 @@ HTMLPurifier_ConfigSchema::define(
|
|||||||
class HTMLPurifier_Lexer
|
class HTMLPurifier_Lexer
|
||||||
{
|
{
|
||||||
|
|
||||||
|
// -- STATIC ----------------------------------------------------------
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Retrieves or sets the default Lexer as a Prototype Factory.
|
||||||
|
*
|
||||||
|
* Depending on what PHP version you are running, the abstract base
|
||||||
|
* Lexer class will determine which concrete Lexer is best for you:
|
||||||
|
* HTMLPurifier_Lexer_DirectLex for PHP 4, and HTMLPurifier_Lexer_DOMLex
|
||||||
|
* for PHP 5 and beyond. This general rule has a few exceptions to it
|
||||||
|
* involving special features that only DirectLex implements.
|
||||||
|
*
|
||||||
|
* @static
|
||||||
|
*
|
||||||
|
* @note The behavior of this class has changed, rather than accepting
|
||||||
|
* a prototype object, it now accepts a configuration object.
|
||||||
|
* To specify your own prototype, set %Core.LexerImpl to it.
|
||||||
|
* This change in behavior de-singletonizes the lexer object.
|
||||||
|
*
|
||||||
|
* @note In PHP4, it is possible to call this factory method from
|
||||||
|
* subclasses, such usage is not recommended and not
|
||||||
|
* forwards-compatible.
|
||||||
|
*
|
||||||
|
* @param $config Configuration object, or (deprecated) a prototype lexer
|
||||||
|
* @return Concrete lexer.
|
||||||
|
*/
|
||||||
|
static function create($config) {
|
||||||
|
|
||||||
|
if (!($config instanceof HTMLPurifier_Config)) {
|
||||||
|
$lexer = $config;
|
||||||
|
trigger_error("Passing a prototype to
|
||||||
|
HTMLPurifier_Lexer::create() is deprecated, please instead
|
||||||
|
use %Core.LexerImpl", E_USER_WARNING);
|
||||||
|
} else {
|
||||||
|
$lexer = $config->get('Core', 'LexerImpl');
|
||||||
|
}
|
||||||
|
|
||||||
|
if (is_object($lexer)) {
|
||||||
|
return $lexer;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (is_null($lexer)) { do {
|
||||||
|
// auto-detection algorithm
|
||||||
|
|
||||||
|
// once PHP DOM implements native line numbers, or we
|
||||||
|
// hack out something using XSLT, remove this stipulation
|
||||||
|
if ($config->get('Core', 'MaintainLineNumbers')) {
|
||||||
|
$lexer = 'DirectLex';
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (version_compare(PHP_VERSION, "5", ">=") && // check for PHP5
|
||||||
|
class_exists('DOMDocument')) { // check for DOM support
|
||||||
|
$lexer = 'DOMLex';
|
||||||
|
} else {
|
||||||
|
$lexer = 'DirectLex';
|
||||||
|
}
|
||||||
|
|
||||||
|
} while(0); } // do..while so we can break
|
||||||
|
|
||||||
|
// instantiate recognized string names
|
||||||
|
switch ($lexer) {
|
||||||
|
case 'DOMLex':
|
||||||
|
return new HTMLPurifier_Lexer_DOMLex();
|
||||||
|
case 'DirectLex':
|
||||||
|
return new HTMLPurifier_Lexer_DirectLex();
|
||||||
|
default:
|
||||||
|
trigger_error("Cannot instantiate unrecognized Lexer type " . htmlspecialchars($lexer), E_USER_ERROR);
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
// -- CONVENIENCE MEMBERS ---------------------------------------------
|
||||||
|
|
||||||
function HTMLPurifier_Lexer() {
|
function HTMLPurifier_Lexer() {
|
||||||
$this->_entity_parser = new HTMLPurifier_EntityParser();
|
$this->_entity_parser = new HTMLPurifier_EntityParser();
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Most common entity to raw value conversion table for special entities.
|
* Most common entity to raw value conversion table for special entities.
|
||||||
* @protected
|
* @protected
|
@@ -123,46 +249,6 @@ class HTMLPurifier_Lexer
         trigger_error('Call to abstract class', E_USER_ERROR);
     }
 
-    /**
-     * Retrieves or sets the default Lexer as a Prototype Factory.
-     *
-     * Depending on what PHP version you are running, the abstract base
-     * Lexer class will determine which concrete Lexer is best for you:
-     * HTMLPurifier_Lexer_DirectLex for PHP 4, and HTMLPurifier_Lexer_DOMLex
-     * for PHP 5 and beyond.
-     *
-     * Passing the optional prototype lexer parameter will override the
-     * default with your own implementation. A copy/reference of the prototype
-     * lexer will now be returned when you request a new lexer.
-     *
-     * @static
-     *
-     * @note
-     * Though it is possible to call this factory method from subclasses,
-     * such usage is not recommended.
-     *
-     * @param $prototype Optional prototype lexer.
-     * @return Concrete lexer.
-     */
-    static function create($prototype = null) {
-        // we don't really care if it's a reference or a copy
-        static $lexer = null;
-        if ($prototype) {
-            $lexer = $prototype;
-        }
-        if (empty($lexer)) {
-            if (version_compare(PHP_VERSION, "5", ">=") && // check for PHP5
-                class_exists('DOMDocument')) { // check for DOM support
-                require_once 'HTMLPurifier/Lexer/DOMLex.php';
-                $lexer = new HTMLPurifier_Lexer_DOMLex();
-            } else {
-                require_once 'HTMLPurifier/Lexer/DirectLex.php';
-                $lexer = new HTMLPurifier_Lexer_DirectLex();
-            }
-        }
-        return $lexer;
-    }
-
     /**
      * Translates CDATA sections into regular sections (through escaping).
      *
Some files were not shown because too many files have changed in this diff.