mirror of
https://github.com/ezyang/htmlpurifier.git
synced 2025-08-02 12:21:09 +02:00
Compare commits
7 Commits
v4.11.0
...
v1.4.1-str
Author | SHA1 | Date
---|---|---
| cec7a1c087 |
| c2d3d5b859 |
| 9a84e11f34 |
| 37ea1673dd |
| 5395d8b4bd |
| c980e76197 |
| 2bf912d528 |
2
Doxyfile
@@ -4,7 +4,7 @@
|
||||
# Project related configuration options
|
||||
#---------------------------------------------------------------------------
|
||||
PROJECT_NAME = HTML Purifier
|
||||
PROJECT_NUMBER = 1.3.2
|
||||
PROJECT_NUMBER = 1.4.1
|
||||
OUTPUT_DIRECTORY = "C:/Documents and Settings/Edward/My Documents/My Webs/htmlpurifier/docs/doxygen"
|
||||
CREATE_SUBDIRS = NO
|
||||
OUTPUT_LANGUAGE = English
|
||||
|
1
INSTALL
@@ -8,6 +8,7 @@ installation GUI, you've come to the wrong place!) The impatient can scroll
|
||||
down to the bottom of this INSTALL document to see the code, but you really
|
||||
should make sure a few things are properly done.
|
||||
|
||||
Todo: Convert to using the array syntax for configuration.
|
||||
|
||||
|
||||
1. Compatibility
|
||||
|
26
NEWS
@@ -9,11 +9,29 @@ NEWS ( CHANGELOG and HISTORY ) HTMLPurifier
|
||||
. Internal change
|
||||
==========================
|
||||
|
||||
1.4.0, unknown release date
|
||||
(major feature release)
|
||||
1.4.1, released 2007-01-21
|
||||
! docs/enduser-youtube.html updated according to new functionality
|
||||
- YouTube IDs can have underscores and dashes
|
||||
|
||||
1.3.3, unknown release date, may be dropped
|
||||
(security/bugfix/minor feature release)
|
||||
1.4.0, released 2007-01-21
|
||||
! Implemented list-style-image, URIs now allowed in list-style
|
||||
! Implemented background-image, background-repeat, background-attachment
|
||||
and background-position CSS properties. Shorthand property background
|
||||
supports all of these properties.
|
||||
! Configuration documentation looks nicer
|
||||
! Added %Core.EscapeNonASCIICharacters to workaround loss of Unicode
|
||||
characters while %Core.Encoding is set to a non-UTF-8 encoding.
|
||||
! Support for configuration directive aliases added
|
||||
! Config object can now be instantiated from ini files
|
||||
! YouTube preservation code added to the core, with two lines of code
|
||||
you can add it as a filter to your code. See smoketests/preserveYouTube.php
|
||||
for sample code.
|
||||
! Moved SLOW to docs/enduser-slow.html and added code examples
|
||||
- Replaced version check with functionality check for DOM (thanks Stephen
|
||||
Khoo)
|
||||
. Added smoketest 'all.php', which loads all other smoketests via frames
|
||||
. Implemented AttrDef_CSSURI for url(http://google.com) style declarations
|
||||
. Added convenient single test selector form on test runner
|
||||
|
||||
1.3.2, released 2006-12-25
|
||||
! HTMLPurifier object now accepts configuration arrays, no need to manually
|
||||
|
25
README
@@ -1,13 +1,22 @@
|
||||
|
||||
README
|
||||
All about HTMLPurifier
|
||||
All about HTML Purifier
|
||||
|
||||
HTMLPurifier is an HTML filtering solution. It uses a unique combination of
|
||||
robust whitelists and aggressive parsing to ensure that not only are XSS
|
||||
attacks thwarted, but the resulting HTML is standards compliant.
|
||||
HTML Purifier is an HTML filtering solution that uses a unique combination
|
||||
of robust whitelists and aggressive parsing to ensure that not only are
|
||||
XSS attacks thwarted, but the resulting HTML is standards compliant.
|
||||
|
||||
See INSTALL on how to use the library. See docs/ for more developer-oriented
|
||||
documentation as well as some code examples. Users of TinyMCE or FCKeditor
|
||||
may be especially interested in WYSIWYG.
|
||||
HTML Purifier is oriented towards richly formatted documents from
|
||||
untrusted sources that require CSS and a full tag-set. This library can
|
||||
be configured to accept a more restrictive set of tags, but it won't be
|
||||
as efficient as more bare-bones parsers. It will, however, do the job
|
||||
right, which may be more important.
|
||||
|
||||
HTMLPurifier can be found on the web at: http://hp.jpsband.org/
|
||||
Places to go:
|
||||
|
||||
* See INSTALL for a quick installation guide
|
||||
* See docs/ for developer-oriented documentation, code examples and
|
||||
an in-depth installation guide.
|
||||
* See WYSIWYG for information on editors like TinyMCE and FCKeditor
|
||||
|
||||
HTML Purifier can be found on the web at: http://hp.jpsband.org/
|
||||
|
40
SLOW
@@ -1,40 +0,0 @@
|
||||
|
||||
SLOW
|
||||
also known as the HELP ME LIBRARY IS TOO SLOW MY PAGE TAKE TOO LONG LOAD page
|
||||
|
||||
HTML Purifier is a very powerful library. But with power comes great
|
||||
responsibility, or, at least, longer execution times. Remember, this
|
||||
library isn't lightly grazing over submitted HTML: it's deconstructing
|
||||
the whole thing, rigorously checking the parts, and then putting it
|
||||
back together.
|
||||
|
||||
So, if it so turns out that HTML Purifier is kinda too slow for outbound
|
||||
filtering, you've got a few options:
|
||||
|
||||
1. Inbound filtering - perform filtering of HTML when it's submitted by the
|
||||
user. Since the user is already submitting something, an extra half a
|
||||
second tacked on to the load time probably isn't going to be that huge of
|
||||
a problem. Then, displaying the content is simply a matter of outputting
|
||||
it directly from your database/filesystem. The trouble with this method is
|
||||
that your user loses the original text, and when doing edits, will be
|
||||
handling the filtered text. While this may be a good thing, especially if
|
||||
you're using a WYSIWYG editor, it can also result in data-loss if a user
|
||||
makes a typo.
|
||||
|
||||
2. Caching the filtered output - accept the submitted text and put it
|
||||
unaltered into the database, but then also generate a filtered version and
|
||||
stash that in the database. Serve the filtered version to readers, and the
|
||||
unaltered version to editors. If need be, you can invalidate the cache and
|
||||
have the cached filtered version be regenerated on the first page view. Pros?
|
||||
Full data retention. Cons? It's more complicated, and opens other editors
|
||||
up to XSS if they are using a WYSIWYG editor (to fix that, they'd have to
|
||||
be able to get their hands on the *really* original text served in plaintext
|
||||
mode).
|
||||
|
||||
In short, inbound filtering is almost as simple as outbound filtering, but
|
||||
it has some drawbacks which cannot be fixed unless you save both the original
|
||||
and the filtered versions.
|
||||
|
||||
There is a third option: profile and optimize HTMLPurifier yourself. Be sure
|
||||
to report back your results if you decide to do that! Especially if you
|
||||
port HTML Purifier to C++. ;-)
|
78
TODO
@@ -7,19 +7,14 @@ TODO List
|
||||
? At-risk
|
||||
==========================
|
||||
|
||||
1.4 release
|
||||
# More extensive URI filtering schemes (see docs/proposal-new-directives.txt)
|
||||
# Allow for background-image and list-style-image (intrinsically tied to above)
|
||||
# Add hooks for custom behavior (for instance, YouTube preservation)
|
||||
- Aggressive caching
|
||||
? Rich set* methods and config file loaders for HTMLPurifier_Config
|
||||
? Configuration profiles: sets of directives that get set with one func call
|
||||
? ConfigSchema directive aliases (so we can rename some of them)
|
||||
? URI validation routines tighter (see docs/dev-code-quality.html) (COMPLEX)
|
||||
|
||||
1.5 release
|
||||
# Implement all non-essential attribute transforms, configurable
|
||||
# URI validation routines tighter (see docs/dev-code-quality.html) (COMPLEX)
|
||||
# Advanced URI filtering schemes (see docs/proposal-new-directives.txt)
|
||||
# Error logging for filtering/cleanup procedures
|
||||
- Requires I18N facilities to be created first (COMPLEX)
|
||||
? Configuration profiles: sets of directives that get set with one func call
|
||||
- XSS-attempt detection
|
||||
|
||||
1.6 release
|
||||
# Add pre-packaged "levels" of cleaning (custom behavior already done)
|
||||
@@ -28,14 +23,30 @@ TODO List
|
||||
specification of elements that, when detected as foreign, trigger removal
|
||||
of children, although unbalanced tags could wreak havoc (or at least
|
||||
delete the rest of the document)).
|
||||
- Allow specifying global attributes on a tag-by-tag basis in
|
||||
%HTML.AllowAttributes
|
||||
? More user-friendly warnings when %HTML.Allow* attempts to specify a
|
||||
tag or attribute that is not supported
|
||||
- Parse TinyMCE whitelist into our %HTML.Allow* whitelists
|
||||
|
||||
1.7 release
|
||||
# Additional support for poorly written HTML
|
||||
- Implement all non-essential attribute transforms (BIG!)
|
||||
- Microsoft Word HTML cleaning (i.e. MsoNormal, but research essential!)
|
||||
- Friendly strict handling of <address> (block -> <br>)
|
||||
- Remove redundant tags, ex. <u><u>Underlined</u></u>. Implementation notes:
|
||||
1. Analyzing which tags to remove duplicants
|
||||
2. Ensure attributes are merged into the parent tag
|
||||
3. Extend the tag exclusion system to specify whether or not the
|
||||
contents should be dropped or not (currently, there's code that could do
|
||||
something like this if it didn't drop the inner text too.)
|
||||
- Remove <span> tags that don't do anything (no attributes)
|
||||
- Remove empty inline tags<i></i>
|
||||
- Append something to duplicate IDs so they're still usable (impl. note: the
|
||||
dupe detector would also need to detect the suffix as well)
|
||||
|
||||
2.0 release
|
||||
# Legit token based CSS parsing (will require revamping almost every
|
||||
AttrDef class)
|
||||
# Formatters for plaintext (COMPLEX)
|
||||
- Auto-paragraphing (be sure to leverage fact that we know when things
|
||||
shouldn't be paragraphed, such as lists and tables).
|
||||
@@ -48,48 +59,31 @@ TODO List
|
||||
- Hooks for adding custom processors to custom namespaced tags and
|
||||
attributes, offer default implementation
|
||||
- Lots of documentation and samples
|
||||
- Allow tags to be "armored", an internal flag that protects them
|
||||
from validation and passes them out unharmed
|
||||
- XHTML 1.1 support
|
||||
|
||||
Ongoing
|
||||
- Lots of profiling, make it faster!
|
||||
- Plugins for major CMSes (COMPLEX)
|
||||
- Drupal
|
||||
- WordPress
|
||||
- eFiction
|
||||
- more! (look for ones that use WYSIWYGs)
|
||||
|
||||
Unknown release (on a scratch-an-itch basis)
|
||||
- Fixes for Firefox's inability to handle COL alignment props (Bug 915)
|
||||
- Automatically add non-breaking spaces to empty table cells when
|
||||
empty-cells:show is applied to have compatibility with Internet Explorer
|
||||
- Convert RTL/LTR override characters to <bdo> tags, or vice versa on demand.
|
||||
Also, enable disabling of directionality
|
||||
- Append something to duplicate IDs so they're still usable (impl. note: the
|
||||
dupe detector would also need to detect the suffix as well)
|
||||
- Have 'lang' attribute be checked against official lists
|
||||
|
||||
Encoding workarounds
|
||||
- Non-lossy dumb alternate character encoding transformations, achieved by
|
||||
numerically encoding all non-ASCII characters
|
||||
- Semi-lossy dumb alternate character encoding transformations, achieved by
|
||||
Ongoing
|
||||
- Lots of profiling, make it faster!
|
||||
- Plugins for major CMSes (COMPLEX)
|
||||
- WordPress
|
||||
- eFiction
|
||||
- more! (look for ones that use WYSIWYGs)
|
||||
|
||||
Unknown release (on a scratch-an-itch basis)
|
||||
- Upgrade SimpleTest testing code to newest versions
|
||||
- Have 'lang' attribute be checked against official lists
|
||||
? Semi-lossy dumb alternate character encoding transformations, achieved by
|
||||
encoding all characters that have string entity equivalents
|
||||
|
||||
Requested
|
||||
- Native content compression, whitespace stripping (don't rely on Tidy, make
|
||||
? Native content compression, whitespace stripping (don't rely on Tidy, make
|
||||
sure we don't remove from <pre> or related tags)
|
||||
- Win32 Phalanger C# binaries (?)
|
||||
- Remove redundant tags, ex. <u><u>Underlined</u></u>. Implementation notes:
|
||||
1. Analyzing which tags to remove duplicants
|
||||
2. Ensure attributes are merged into the parent tag
|
||||
3. Extend the tag exclusion system to specify whether or not the
|
||||
contents should be dropped or not (currently, there's code that could do
|
||||
something like this if it didn't drop the inner text too.)
|
||||
- More user-friendly warnings when %HTML.Allow* attempts to specify a
|
||||
tag or attribute that is not supported
|
||||
- Allow specifying global attributes on a tag-by-tag basis in
|
||||
%HTML.AllowAttributes
|
||||
- Parse TinyMCE whitelist into our %HTML.Allow* whitelists
|
||||
- XSS-attempt detection
|
||||
|
||||
Wontfix
|
||||
- Non-lossy smart alternate character encoding transformations (unless
|
||||
|
3
WYSIWYG
@@ -18,4 +18,5 @@ HTML Purifier is perfect for filtering pure-HTML input from WYSIWYG editors.
|
||||
Enough said.
|
||||
|
||||
There is a proof-of-concept integration of HTML Purifier with the Mantis
|
||||
bugtracker at http://hp.jpsband.org/mantis/
|
||||
bugtracker at http://hp.jpsband.org/mantis/ You can see notes on how
|
||||
this integration was achieved at http://hp.jpsband.org/mantis_notes.txt
|
||||
|
BIN
art/1000passes.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 3.4 KiB |
@@ -99,6 +99,8 @@ foreach($schema->info as $namespace_name => $namespace_info) {
|
||||
|
||||
foreach ($namespace_info as $name => $info) {
|
||||
|
||||
if ($info->class == 'alias') continue;
|
||||
|
||||
$dom_directive = $dom_document->createElement('directive');
|
||||
$dom_namespace->appendChild($dom_directive);
|
||||
|
||||
|
@@ -1,3 +1,6 @@
|
||||
|
||||
body {margin:1em 4em;}
|
||||
|
||||
table {border-collapse:collapse;}
|
||||
table td, table th {padding:0.2em;}
|
||||
|
||||
@@ -8,3 +11,14 @@ table.constraints td pre {margin:0;}
|
||||
|
||||
#toc {list-style-type:none; font-weight:bold;}
|
||||
#toc ul {list-style-type:disc; font-weight:normal;}
|
||||
|
||||
.description p {margin-top:0;margin-bottom:1em;}
|
||||
|
||||
#library, h1 {text-align:center; font-family:Garamond, serif;
|
||||
font-variant:small-caps;}
|
||||
#library {font-size:1em;}
|
||||
h1 {margin-top:0;}
|
||||
h2 {border-bottom:1px solid #CCC; font-family:sans-serif; font-weight:normal;
|
||||
font-size:1.3em;}
|
||||
h3 {font-family:sans-serif; font-size:1.1em; font-weight:bold; }
|
||||
h4 {font-family:sans-serif; font-size:0.9em; font-weight:bold; }
|
||||
|
@@ -18,12 +18,13 @@
|
||||
<xsl:template match="/">
|
||||
<html lang="en" xml:lang="en">
|
||||
<head>
|
||||
<title><xsl:value-of select="/configdoc/title" /> Configuration Documentation</title>
|
||||
<title>Configuration Documentation - <xsl:value-of select="/configdoc/title" /></title>
|
||||
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
|
||||
<link rel="stylesheet" type="text/css" href="styles/plain.css" />
|
||||
</head>
|
||||
<body>
|
||||
<h1><xsl:value-of select="/configdoc/title" /> Configuration Documentation</h1>
|
||||
<div id="library"><xsl:value-of select="/configdoc/title" /></div>
|
||||
<h1>Configuration Documentation</h1>
|
||||
<h2>Table of Contents</h2>
|
||||
<ul id="toc">
|
||||
<xsl:apply-templates mode="toc" />
|
||||
|
@@ -14,6 +14,7 @@
|
||||
|
||||
<div id="filing">Filed under Development</div>
|
||||
<div id="index">Return to the <a href="index.html">index</a>.</div>
|
||||
<div id="home"><a href="http://hp.jpsband.org/">HTML Purifier</a> End-User Documentation</div>
|
||||
|
||||
<p>Okay, face it. Programmers can get lazy, cut corners, or make mistakes. They
|
||||
also can do quick prototypes, and then forget to rewrite them later. Well,
|
||||
|
@@ -14,6 +14,7 @@
|
||||
|
||||
<div id="filing">Filed under Development</div>
|
||||
<div id="index">Return to the <a href="index.html">index</a>.</div>
|
||||
<div id="home"><a href="http://hp.jpsband.org/">HTML Purifier</a> End-User Documentation</div>
|
||||
|
||||
<p>The classes in this library follow a few naming conventions, which may
|
||||
help you find the correct functionality more quickly. Here they are:</p>
|
||||
|
@@ -14,6 +14,7 @@
|
||||
|
||||
<div id="filing">Filed under Development</div>
|
||||
<div id="index">Return to the <a href="index.html">index</a>.</div>
|
||||
<div id="home"><a href="http://hp.jpsband.org/">HTML Purifier</a> End-User Documentation</div>
|
||||
|
||||
<p>Here are some possible optimization techniques we can apply to code sections if
|
||||
they turn out to be slow. Be sure not to prematurely optimize: if you get
|
||||
|
@@ -32,6 +32,7 @@ thead th {text-align:left;padding:0.1em;background-color:#EEE;}
|
||||
|
||||
<div id="filing">Filed under Development</div>
|
||||
<div id="index">Return to the <a href="index.html">index</a>.</div>
|
||||
<div id="home"><a href="http://hp.jpsband.org/">HTML Purifier</a> End-User Documentation</div>
|
||||
|
||||
<h2>Key</h2>
|
||||
|
||||
@@ -59,7 +60,7 @@ thead th {text-align:left;padding:0.1em;background-color:#EEE;}
|
||||
<tbody>
|
||||
<tr><th colspan="2">Standard</th></tr>
|
||||
<tr class="css1 impl-yes"><td>background-color</td><td>COMPOSITE(<color>, transparent)</td></tr>
|
||||
<tr class="css1 impl-yes"><td>background</td><td>SHORTHAND, only for color, see below for info on background-image and friends</td></tr>
|
||||
<tr class="css1 impl-yes"><td>background</td><td>SHORTHAND, currently alias for background-color</td></tr>
|
||||
<tr class="css1 impl-yes"><td>border</td><td>SHORTHAND, MULTIPLE</td></tr>
|
||||
<tr class="css1 impl-yes"><td>border-color</td><td>MULTIPLE</td></tr>
|
||||
<tr class="css1 impl-yes"><td>border-style</td><td>MULTIPLE</td></tr>
|
||||
@@ -141,17 +142,17 @@ thead th {text-align:left;padding:0.1em;background-color:#EEE;}
|
||||
|
||||
<tbody>
|
||||
<tr><th colspan="2">Unknown</th></tr>
|
||||
<tr class="danger css1"><td>background-image</td><td>Dangerous, target milestone 1.3</td></tr>
|
||||
<tr class="css1"><td>background-attachment</td><td>ENUM(scroll, fixed),
|
||||
<tr class="danger css1 impl-yes"><td>background-image</td><td>Dangerous, target milestone 1.3</td></tr>
|
||||
<tr class="css1 impl-yes"><td>background-attachment</td><td>ENUM(scroll, fixed),
|
||||
Depends on background-image</td></tr>
|
||||
<tr class="css1"><td>background-position</td><td>Depends on background-image</td></tr>
|
||||
<tr class="css1 impl-yes"><td>background-position</td><td>Depends on background-image</td></tr>
|
||||
<tr class="danger impl-no"><td>cursor</td><td>Dangerous but fluffy</td></tr>
|
||||
<tr class="danger css1"><td>display</td><td>ENUM(...), Dangerous but interesting;
|
||||
will not implement list-item, run-in (Opera only) or table (no IE);
|
||||
inline-block has incomplete IE6 support and requires -moz-inline-box
|
||||
for Mozilla. Unknown target milestone.</td></tr>
|
||||
<tr><td class="css1">height</td><td>Interesting, why use it? Unknown target milestone.</td></tr>
|
||||
<tr class="danger css1"><td>list-style-image</td><td>Dangerous? Target milestone 1.3</td></tr>
|
||||
<tr class="css1"><td>height</td><td>Interesting, why use it? Unknown target milestone.</td></tr>
|
||||
<tr class="danger css1 impl-yes"><td>list-style-image</td><td>Dangerous?</td></tr>
|
||||
<tr class="impl-no"><td>max-height</td><td rowspan="4">No IE 5/6</td></tr>
|
||||
<tr class="impl-no"><td>min-height</td></tr>
|
||||
<tr class="impl-no"><td>max-width</td></tr>
|
||||
@@ -230,7 +231,7 @@ Mozilla on inside and needs -moz-outline, no IE support.</td></tr>
|
||||
|
||||
<tbody>
|
||||
<tr><th colspan="3">CSS</th></tr>
|
||||
<tr class="impl-yes"><td>style</td><td>All</td><td>Not all properties may be implemented, parser is good though.</td></tr>
|
||||
<tr class="impl-yes"><td>style</td><td>All</td><td>Parser is reasonably functional. Status here doesn't count individual properties.</td></tr>
|
||||
</tbody>
|
||||
|
||||
<tbody>
|
||||
@@ -265,13 +266,13 @@ Mozilla on inside and needs -moz-outline, no IE support.</td></tr>
|
||||
<tr><td rowspan="5">align</td><td>CAPTION</td><td>Near-equiv style 'caption-side', drop left and right</td></tr>
|
||||
<tr><td>IMG</td><td rowspan="2">Margin-left and margin-right = auto or parent div</td></tr>
|
||||
<tr><td>TABLE</td></tr>
|
||||
<tr><td>HR</td><td>Equivalent style 'text-align' (IE tested)</td></tr>
|
||||
<tr><td>HR</td><td>Near-equivalent style 'text-align' (Works for IE and Opera, but not Firefox). Also try <code>margin-right:auto; margin-left:0;</code> for left or <code>margin-right:0; margin-left:auto;</code> for right (optionally replacing 0 with the original margin for that side)</td></tr>
|
||||
<tr class="impl-yes"><td>H1, H2, H3, H4, H5, H6, P</td><td>Equivalent style 'text-align'</td></tr>
|
||||
<tr class="required impl-yes"><td>alt</td><td>IMG</td><td>Required, insert image filename if src is present or default invalid image text</td></tr>
|
||||
<tr><td rowspan="3">bgcolor</td><td>TABLE</td><td>Equivalent style 'background-color' (IE tested)</td></tr>
|
||||
<tr><td>TR</td><td>Equivalent style 'background-color' (IE tested)</td></tr>
|
||||
<tr><td rowspan="3">bgcolor</td><td>TABLE</td><td>Equivalent style 'background-color'</td></tr>
|
||||
<tr><td>TR</td><td>Equivalent style 'background-color'</td></tr>
|
||||
<tr><td>TD, TH</td><td>Equivalent style 'background-color'</td></tr>
|
||||
<tr><td>border</td><td>IMG</td><td>Equivalent style 'border-width', only applies when link present</td></tr>
|
||||
<tr><td>border</td><td>IMG</td><td>Near equivalent style 'border-width', as it only applies when link present</td></tr>
|
||||
<tr><td>clear</td><td>BR</td><td>Near-equiv style 'clear', transform 'all' into 'both'</td></tr>
|
||||
<tr class="impl-no"><td>compact</td><td>DL, OL, UL</td><td>Boolean, needs custom CSS class; rarely used anyway</td></tr>
|
||||
<tr class="required impl-yes"><td>dir</td><td>BDO</td><td>Required, insert ltr (or configuration value) if none</td></tr>
|
||||
|
@@ -15,6 +15,7 @@
|
||||
|
||||
<div id="filing">Filed under End-User</div>
|
||||
<div id="index">Return to the <a href="index.html">index</a>.</div>
|
||||
<div id="home"><a href="http://hp.jpsband.org/">HTML Purifier</a> End-User Documentation</div>
|
||||
|
||||
<p>Prior to HTML Purifier 1.2.0, this library blithely accepted user input that
|
||||
looked like this:</p>
|
||||
|
@@ -7,6 +7,7 @@ and it's up to you to provide it the proper information and proper context
|
||||
to be effective. Things to remember:
|
||||
|
||||
1. Character Encoding: UTF-8.
|
||||
This segment will soon be obsoleted by enduser-utf8.html
|
||||
Currently, the parser runs under the assumption that it is dealing
|
||||
with UTF-8. Not ISO-8859-1 or Windows-1252, UTF-8. And definitely not "no
|
||||
character encoding explicitly stated" or UTF-7. If you're not using UTF-8 as
|
||||
@@ -27,6 +28,7 @@ this may be configurable in the future. Do you want standards compliance?
|
||||
The doctype is a good place to start.
|
||||
|
||||
3. IDs
|
||||
This segment is obsoleted by enduser-id.html
|
||||
They need to be unique, but without some knowledge of the
|
||||
rest of the document, it's difficult to know what's unique. %Attr.IDBlacklist
|
||||
needs to be set: we may want to consider disallowing IDs by default to
|
||||
|
117
docs/enduser-slow.html
Normal file
@@ -0,0 +1,117 @@
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
|
||||
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
|
||||
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
|
||||
<meta name="description" content="Explains how to speed up HTML Purifier through caching or inbound filtering." />
|
||||
<link rel="stylesheet" type="text/css" href="./style.css" />
|
||||
|
||||
<title>Speeding up HTML Purifier - HTML Purifier</title>
|
||||
|
||||
</head><body>
|
||||
|
||||
<h1 class="subtitled">Speeding up HTML Purifier</h1>
|
||||
<div class="subtitle">...also known as the HELP ME LIBRARY IS TOO SLOW MY PAGE TAKE TOO LONG page</div>
|
||||
|
||||
<div id="filing">Filed under End-User</div>
|
||||
<div id="index">Return to the <a href="index.html">index</a>.</div>
|
||||
<div id="home"><a href="http://hp.jpsband.org/">HTML Purifier</a> End-User Documentation</div>
|
||||
|
||||
<p>HTML Purifier is a very powerful library. But with power comes great
|
||||
responsibility, in the form of longer execution times. Remember, this
|
||||
library isn't lightly grazing over submitted HTML: it's deconstructing
|
||||
the whole thing, rigorously checking the parts, and then putting it back
|
||||
together. </p>
|
||||
|
||||
<p>So, if it so turns out that HTML Purifier is kinda too slow for outbound
|
||||
filtering, you've got a few options: </p>
|
||||
|
||||
<h2>Inbound filtering</h2>
|
||||
|
||||
<p>Perform filtering of HTML when it's submitted by the user. Since the
|
||||
user is already submitting something, an extra half a second tacked on
|
||||
to the load time probably isn't going to be that huge of a problem.
|
||||
Then, displaying the content is simply a matter of outputting it
|
||||
directly from your database/filesystem. The trouble with this method is
|
||||
that your user loses the original text, and when doing edits, will be
|
||||
handling the filtered text. While this may be a good thing, especially
|
||||
if you're using a WYSIWYG editor, it can also result in data-loss if a
|
||||
user makes a typo. </p>
|
||||
|
||||
<p>Example (non-functional):</p>
|
||||
|
||||
<pre><?php
/**
 * FORM SUBMISSION PAGE
 * display_error($message) : displays nice error page with message
 * display_success() : displays a nice success page
 * display_form() : displays the HTML submission form
 * database_insert($html) : inserts data into database as new row
 */
if (!empty($_POST)) {
    require_once '/path/to/library/HTMLPurifier.auto.php';
    require_once 'HTMLPurifier.func.php';
    $dirty_html = isset($_POST['html']) ? $_POST['html'] : false;
    if (!$dirty_html) {
        display_error('You must write some HTML!');
        exit; // stop here so an empty submission is never purified or stored
    }
    $html = HTMLPurifier($dirty_html);
    database_insert($html);
    display_success();
    // notice that $dirty_html is *not* saved
} else {
    display_form();
}
?></pre>
|
||||
|
||||
<h2>Caching the filtered output</h2>
|
||||
|
||||
<p>Accept the submitted text and put it unaltered into the database, but
|
||||
then also generate a filtered version and stash that in the database.
|
||||
Serve the filtered version to readers, and the unaltered version to
|
||||
editors. If need be, you can invalidate the cache and have the cached
|
||||
filtered version be regenerated on the first page view. Pros? Full data
|
||||
retention. Cons? It's more complicated, and opens other editors up to
|
||||
XSS if they are using a WYSIWYG editor (to fix that, they'd have to be
|
||||
able to get their hands on the *really* original text served in
|
||||
plaintext mode). </p>
|
||||
|
||||
<p>Example (non-functional):</p>
|
||||
|
||||
<pre><?php
/**
 * VIEW PAGE
 * display_error($message) : displays nice error page with message
 * cache_get($id) : retrieves HTML from fast cache (db or file)
 * cache_insert($id, $html) : inserts good HTML into cache system
 * database_get($id) : retrieves raw HTML from database
 */
$id = isset($_GET['id']) ? (int) $_GET['id'] : false;
if (!$id) {
    display_error('Must specify ID.');
    exit;
}
$html = cache_get($id); // filesystem or database
if ($html === false) {
    // cache didn't have the HTML, generate it
    $raw_html = database_get($id);
    require_once '/path/to/library/HTMLPurifier.auto.php';
    require_once 'HTMLPurifier.func.php';
    $html = HTMLPurifier($raw_html);
    cache_insert($id, $html);
}
echo $html;
?></pre>
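The view-page example leaves cache_get() and cache_insert() undefined. One way they might look is sketched below, assuming a file-based cache with one file per ID in the system temp directory. The storage layout and the cache_path() and cache_invalidate() helpers are hypothetical and are not part of HTML Purifier.

```php
<?php
// Hypothetical helpers: the names cache_get() and cache_insert() match the
// placeholders in the example, but the storage layout (one file per ID in
// the system temp directory) is an assumption made for illustration.
function cache_path($id) {
    return sys_get_temp_dir() . '/purified-' . (int) $id . '.html';
}

// Returns the cached HTML, or false on a cache miss.
function cache_get($id) {
    $path = cache_path($id);
    return is_file($path) ? file_get_contents($path) : false;
}

// Stores already-purified HTML for later page views.
function cache_insert($id, $html) {
    return file_put_contents(cache_path($id), $html, LOCK_EX) !== false;
}

// Call this whenever the raw HTML for $id is edited, so that the next
// page view regenerates the filtered version.
function cache_invalidate($id) {
    $path = cache_path($id);
    return is_file($path) ? unlink($path) : true;
}
```

Because the unaltered text stays in the database, invalidating an entry never loses data; the filtered copy is simply rebuilt on the next request.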
|
||||
|
||||
<h2>Summary</h2>
|
||||
|
||||
<p>In short, inbound filtering is the simple option and caching is the
|
||||
robust option (albeit with bigger storage requirements). </p>
|
||||
|
||||
<p>There is a third option, independent of the two we've discussed: profile
|
||||
and optimize HTMLPurifier yourself. Be sure to report back your results
|
||||
if you decide to do that! Especially if you port HTML Purifier to C++.
|
||||
<tt>;-)</tt></p>
|
||||
|
||||
</body>
|
||||
</html>
|
640
docs/enduser-utf8.html
Normal file
@@ -0,0 +1,640 @@
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
|
||||
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
|
||||
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"><head>
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
|
||||
<meta name="description" content="Describes the rationale for using UTF-8, the ramifications otherwise, and how to make the switch." />
|
||||
<link rel="stylesheet" type="text/css" href="./style.css" />
|
||||
<script defer="defer" type="text/javascript" src="./toc-gen.js"></script>
|
||||
<style type="text/css">
|
||||
.minor td {font-style:italic;}
|
||||
</style>
|
||||
|
||||
<title>UTF-8 - HTML Purifier</title>
|
||||
|
||||
<!-- Note to users: this document, though professing to be UTF-8, attempts
|
||||
to use only ASCII characters, because most webservers are configured
|
||||
to send HTML as ISO-8859-1. So I will, many times, go against my
|
||||
own advice for sake of portability. -->
|
||||
|
||||
</head><body>
|
||||
|
||||
<h1>UTF-8</h1>
|
||||
|
||||
<div id="filing">Filed under End-User</div>
|
||||
<div id="index">Return to the <a href="index.html">index</a>.</div>
|
||||
<div id="home"><a href="http://hp.jpsband.org/">HTML Purifier</a> End-User Documentation</div>
|
||||
|
||||
<p>Character encoding and character sets, in truth, are not that
|
||||
difficult to understand. But if you don't understand them, you are going
|
||||
to be caught by surprise by some of HTML Purifier's behavior, namely
|
||||
the fact that it operates in UTF-8 or the limitations of the character
|
||||
encoding transformations it does. This document will walk you through
|
||||
determining the encoding of your system and how you should handle
|
||||
this information. It will stay away from excessive discussion on
|
||||
the internals of character encoding, but offer the information in
|
||||
asides that can easily be skipped.</p>
|
||||
|
||||
<blockquote class="aside">
|
||||
<div class="label">Asides</div>
|
||||
<p>Text in this formatting is an <strong>aside</strong>,
|
||||
interesting tidbits for the curious but not strictly necessary material to
|
||||
do the tutorial. If you read this text, you'll come out
|
||||
with a greater understanding of the underlying issues.</p>
|
||||
</blockquote>
|
||||
|
||||
<h2 id="findcharset">Finding the real encoding</h2>
|
||||
|
||||
<p>In the beginning, there was ASCII, and things were simple. But they
|
||||
weren't good, for no one could write in Cyrillic or Thai. So there
|
||||
exploded a proliferation of character encodings to remedy the problem
|
||||
by extending the characters ASCII could express. This ridiculously
|
||||
simplified version of the history of character encodings shows us that
|
||||
there are now many character encodings floating around.</p>
|
||||
|
||||
<blockquote class="aside">
|
||||
<p>A <strong>character encoding</strong> tells the computer how to
|
||||
interpret raw zeroes and ones into real characters. It
|
||||
usually does this by pairing numbers with characters.</p>
|
||||
<p>There are many different types of character encodings floating
|
||||
around, but the ones we deal most frequently with are ASCII,
|
||||
8-bit encodings, and Unicode-based encodings.</p>
|
||||
<ul>
|
||||
<li><strong>ASCII</strong> is a 7-bit encoding based on the
|
||||
English alphabet.</li>
|
||||
<li><strong>8-bit encodings</strong> are extensions to ASCII
|
||||
that add a potpourri of useful, non-standard characters
|
||||
like é and æ. They can only add 128 characters,
|
||||
so usually only support one script at a time. When you
|
||||
see a page on the web, chances are it's encoded in one
|
||||
of these encodings.</li>
|
||||
<li><strong>Unicode-based encodings</strong> implement the
|
||||
Unicode standard and include UTF-8, UCS-2 and UTF-16.
|
||||
They go beyond 8 bits (the first and last are variable length,
|
||||
while the second one uses 16-bits), and support almost
|
||||
every language in the world. UTF-8 is gaining traction
|
||||
as the dominant international encoding of the web.</li>
|
||||
</ul>
|
||||
</blockquote>
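<p>To make the aside above concrete, here is a small PHP sketch that prints
the raw bytes a single character takes in three encodings. It assumes the
mbstring extension is available (it is not enabled everywhere), and the
variable names are only for illustration.</p>

<pre><?php
// "é" (U+00E9), written as explicit UTF-8 bytes so the result does not
// depend on the encoding this source file happens to be saved in.
$utf8   = "\xC3\xA9";
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
$utf16  = mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8');

echo bin2hex($utf8), "\n";   // c3a9 (two bytes)
echo bin2hex($latin1), "\n"; // e9   (one byte)
echo bin2hex($utf16), "\n";  // 00e9 (two bytes)
?></pre>

<p>Same character, three different byte sequences: a program that assumes
the wrong encoding will read the wrong character.</p>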
|
||||
|
||||
<p>The first step of our journey is to find out what the encoding of
|
||||
your website is. The most reliable way is to ask your
|
||||
browser:</p>
|
||||
|
||||
<dl>
|
||||
<dt>Mozilla Firefox</dt>
|
||||
<dd>Tools > Page Info: Encoding</dd>
|
||||
<dt>Internet Explorer</dt>
|
||||
<dd>View > Encoding: bulleted item is unofficial name</dd>
|
||||
</dl>
|
||||
|
||||
<p>Internet Explorer won't give you the mime (i.e. useful/real) name of the
|
||||
character encoding, so you'll have to look it up using their description.
|
||||
Some common ones:</p>
|
||||
|
||||
<table class="table">
|
||||
<thead><tr>
|
||||
<th>IE's Description</th>
|
||||
<th>Mime Name</th>
|
||||
</tr></thead>
|
||||
<tbody>
|
||||
<tr><th colspan="2">Windows</th></tr>
|
||||
<tr><td>Arabic (Windows)</td><td>Windows-1256</td></tr>
|
||||
<tr><td>Baltic (Windows)</td><td>Windows-1257</td></tr>
|
||||
<tr><td>Central European (Windows)</td><td>Windows-1250</td></tr>
|
||||
<tr><td>Cyrillic (Windows)</td><td>Windows-1251</td></tr>
|
||||
<tr><td>Greek (Windows)</td><td>Windows-1253</td></tr>
|
||||
<tr><td>Hebrew (Windows)</td><td>Windows-1255</td></tr>
|
||||
<tr><td>Thai (Windows)</td><td>TIS-620</td></tr>
|
||||
<tr><td>Turkish (Windows)</td><td>Windows-1254</td></tr>
|
||||
<tr><td>Vietnamese (Windows)</td><td>Windows-1258</td></tr>
|
||||
<tr><td>Western European (Windows)</td><td>Windows-1252</td></tr>
|
||||
</tbody>
|
||||
<tbody>
|
||||
<tr><th colspan="2">ISO</th></tr>
|
||||
<tr><td>Arabic (ISO)</td><td>ISO-8859-6</td></tr>
|
||||
<tr><td>Baltic (ISO)</td><td>ISO-8859-4</td></tr>
|
||||
<tr><td>Central European (ISO)</td><td>ISO-8859-2</td></tr>
|
||||
<tr><td>Cyrillic (ISO)</td><td>ISO-8859-5</td></tr>
|
||||
<tr class="minor"><td>Estonian (ISO)</td><td>ISO-8859-13</td></tr>
|
||||
<tr class="minor"><td>Greek (ISO)</td><td>ISO-8859-7</td></tr>
|
||||
<tr><td>Hebrew (ISO-Logical)</td><td>ISO-8859-8-I</td></tr>
|
||||
<tr><td>Hebrew (ISO-Visual)</td><td>ISO-8859-8</td></tr>
|
||||
<tr class="minor"><td>Latin 9 (ISO)</td><td>ISO-8859-15</td></tr>
|
||||
<tr class="minor"><td>Turkish (ISO)</td><td>ISO-8859-9</td></tr>
|
||||
<tr><td>Western European (ISO)</td><td>ISO-8859-1</td></tr>
|
||||
</tbody>
|
||||
<tbody>
|
||||
<tr><th colspan="2">Other</th></tr>
|
||||
<tr><td>Chinese Simplified (GB18030)</td><td>GB18030</td></tr>
|
||||
<tr><td>Chinese Simplified (GB2312)</td><td>GB2312</td></tr>
|
||||
<tr><td>Chinese Simplified (HZ)</td><td>HZ</td></tr>
|
||||
<tr><td>Chinese Traditional (Big5)</td><td>Big5</td></tr>
|
||||
<tr><td>Japanese (Shift-JIS)</td><td>Shift_JIS</td></tr>
|
||||
<tr><td>Japanese (EUC)</td><td>EUC-JP</td></tr>
|
||||
<tr><td>Korean</td><td>EUC-KR</td></tr>
|
||||
<tr><td>Unicode (UTF-8)</td><td>UTF-8</td></tr>
|
||||
</tbody>
|
||||
</table>
|
||||
|
||||
<p>Internet Explorer does not recognize some of the more obscure
|
||||
character encodings, and having to look up the real names with a table
|
||||
is a pain, so I recommend using Mozilla Firefox to find out your
|
||||
character encoding.</p>
|
||||
|
||||
<h2 id="findmetacharset">Finding the embedded encoding</h2>
|
||||
|
||||
<p>At this point, you may be asking, "Didn't we already find out our
|
||||
encoding?" Well, as it turns out, there are multiple places where
|
||||
a web developer can specify a character encoding, and one such place
|
||||
is in a <code>META</code> tag:</p>
|
||||
|
||||
<pre><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /></pre>
|
||||
|
||||
<p>You'll find this in the <code>HEAD</code> section of an HTML document.
|
||||
The text to the right of <code>charset=</code> is the "claimed"
|
||||
encoding: the HTML claims to be this encoding, but whether or not this
|
||||
is actually the case depends on other factors. For now, take note
|
||||
if your <code>META</code> tag claims that either:</p>
|
||||
|
||||
<ol>
|
||||
<li>The character encoding is the same as the one reported by the
|
||||
browser,</li>
|
||||
<li>The character encoding is different from the browser's, or</li>
|
||||
<li>There is no <code>META</code> tag at all! (horror, horror!)</li>
|
||||
</ol>
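<p>If you would rather not read the page source by eye, the comparison can
be sketched in PHP. Note that <code>find_meta_charset()</code> and
<code>compare_charsets()</code> are hypothetical helper names, not part of
HTML Purifier, and the regular expression is a rough approximation, not a
real HTML parser.</p>

<pre><?php
// Return the charset claimed by the first META tag, or null if none.
function find_meta_charset($html) {
    $pattern = '/<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_\-]+)/i';
    if (preg_match($pattern, $html, $matches)) {
        return $matches[1];
    }
    return null;
}

// Classify a page into one of the three cases listed above.
// $real is the encoding your browser reported.
function compare_charsets($html, $real) {
    $claimed = find_meta_charset($html);
    if ($claimed === null) {
        return 'missing';
    }
    // Encoding names are case-insensitive.
    return strcasecmp($claimed, $real) === 0 ? 'same' : 'different';
}

$page = '<html><head><meta http-equiv="Content-Type" '
      . 'content="text/html; charset=UTF-8" /></head></html>';

echo compare_charsets($page, 'utf-8'), "\n";           // same
echo compare_charsets($page, 'ISO-8859-1'), "\n";      // different
echo compare_charsets('<html></html>', 'UTF-8'), "\n"; // missing
?></pre>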
|
||||
|
||||
<h2 id="fixcharset">Fixing the encoding</h2>
|
||||
|
||||
<p>If your <code>META</code> encoding and your real encoding match,
|
||||
savvy! You can skip this section. If they don't...</p>
|
||||
|
||||
<h3 id="fixcharset-none">No embedded encoding</h3>
|
||||
|
||||
<p>If this is the case, you'll want to add in the appropriate
|
||||
<code>META</code> tag to your website. It's as simple as copy-pasting
|
||||
the code snippet above and replacing UTF-8 with whatever is the mime name
|
||||
of your real encoding.</p>
|
||||
|
||||
<blockquote class="aside">
|
||||
<p>For all those skeptics out there, there is a very good reason
|
||||
why the character encoding should be explicitly stated. When the
|
||||
browser isn't told what the character encoding of a text is, it
|
||||
has to guess: and sometimes the guess is wrong. Hackers can manipulate
|
||||
this guess in order to slip XSS past filters and then fool the
|
||||
browser into executing it as active code. A great example of this
|
||||
is the <a href="http://shiflett.org/archive/177">Google UTF-7
|
||||
exploit</a>.</p>
|
||||
<p>You might be able to get away with not specifying a character
|
||||
encoding with the <code>META</code> tag as long as your webserver
|
||||
sends the right Content-Type header, but why risk it? Besides, if
|
||||
the user downloads the HTML file, there is no longer any webserver
|
||||
to define the character encoding.</p>
|
||||
</blockquote>
|
||||
|
||||
<h3 id="fixcharset-diff">Embedded encoding disagrees</h3>
|
||||
|
||||
<p>This is an extremely common mistake: another source is telling
|
||||
the browser what the
|
||||
character encoding is and is overriding the embedded encoding. This
|
||||
source usually is the Content-Type HTTP header that the webserver (e.g.
|
||||
Apache) sends. A usual Content-Type header sent with a page might
|
||||
look like this:</p>
|
||||
|
||||
<pre>Content-Type: text/html; charset=ISO-8859-1</pre>
|
||||
|
||||
<p>Notice how there is a charset parameter: this is the webserver's
|
||||
way of telling a browser what the character encoding is, much like
|
||||
the <code>META</code> tags we touched upon previously.</p>
|
||||
|
||||
<blockquote class="aside"><p>In fact, the <code>META</code> tag is
|
||||
designed as a substitute for the HTTP header for contexts where
|
||||
sending headers is impossible (such as locally stored files without
|
||||
a webserver). Thus the name <code>http-equiv</code> (HTTP equivalent).
|
||||
</p></blockquote>
|
||||
|
||||
<p>There are two ways to go about fixing this: changing the <code>META</code>
|
||||
tag to match the HTTP header, or changing the HTTP header to match
|
||||
the <code>META</code> tag. How do we know which to do? It depends
|
||||
on the website's content: after all, headers and tags are only ways of
|
||||
describing the actual characters on the web page.</p>
|
||||
|
||||
<p>If your website:</p>
|
||||
|
||||
<dl>
|
||||
<dt>...only uses ASCII characters,</dt>
|
||||
<dd>Either way is fine, but I recommend switching both to
|
||||
UTF-8 (more on this later).</dd>
|
||||
<dt>...uses special characters, and they display
|
||||
properly,</dt>
|
||||
<dd>Change the embedded encoding to the server encoding.</dd>
|
||||
<dt>...uses special characters, but users often complain that
|
||||
they come out garbled,</dt>
|
||||
<dd>Change the server encoding to the embedded encoding.</dd>
|
||||
</dl>
|
||||
|
||||
<p>Changing a META tag is easy: just swap out the old encoding
|
||||
for the new. Changing the server (HTTP header) encoding, however,
|
||||
is slightly more difficult.</p>
|
||||
|
||||
<h3 id="fixcharset-server">Changing the server encoding</h3>
|
||||
|
||||
<h4 id="fixcharset-server-php">PHP header() function</h4>
|
||||
|
||||
<p>The simplest way to handle this problem is to send the encoding
|
||||
yourself, via your programming language. Since you're using HTML
|
||||
Purifier, I'll assume PHP, although it's not too difficult to do
|
||||
similar things in
|
||||
<a href="http://www.w3.org/International/O-HTTP-charset#scripting">other
|
||||
languages</a>. The appropriate code is:</p>
|
||||
|
||||
<pre><a href="http://php.net/function.header">header</a>('Content-Type:text/html; charset=UTF-8');</pre>
|
||||
|
||||
<p>...replacing UTF-8 with whatever your embedded encoding is.
|
||||
This code must come before any output, so be careful about
|
||||
stray whitespace in your application.</p>
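<p>A late <code>header()</code> call fails with only a warning, so it can
help to check first. The following is a minimal sketch using PHP's
<code>headers_sent()</code>; the wrapper name
<code>send_charset_header()</code> is made up for this example.</p>

<pre><?php
// Hypothetical helper: send the Content-Type header, or report where
// output started if it is already too late to send headers.
function send_charset_header($charset = 'UTF-8') {
    if (headers_sent($file, $line)) {
        trigger_error("Output already started at $file:$line", E_USER_WARNING);
        return false;
    }
    header('Content-Type: text/html; charset=' . $charset);
    return true;
}

send_charset_header('UTF-8');
?></pre>

<p>The file and line reported in the warning usually point straight at the
stray whitespace.</p>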
|
||||
|
||||
<h4 id="fixcharset-server-phpini">PHP ini directive</h4>
|
||||
|
||||
<p>PHP also has a neat little ini directive that can save you a
|
||||
header call: <code><a href="http://php.net/ini.core#ini.default-charset">default_charset</a></code>. Using this code:</p>
|
||||
|
||||
<pre><a href="http://php.net/function.ini_set">ini_set</a>('default_charset', 'UTF-8');</pre>
|
||||
|
||||
<p>...will also do the trick. If PHP is running as an Apache module (and
|
||||
not as FastCGI, consult
|
||||
<a href="http://php.net/phpinfo">phpinfo</a>() for details), you can even use .htaccess to apply this property
|
||||
globally:</p>
|
||||
|
||||
<pre><a href="http://php.net/configuration.changes#configuration.changes.apache">php_value</a> default_charset "UTF-8"</pre>
|
||||
|
||||
<blockquote class="aside"><p>As with all INI directives, this can
|
||||
also go in your php.ini file. Some hosting providers allow you to customize
|
||||
your own php.ini file; ask your support for details. Use:</p>
|
||||
<pre>default_charset = "utf-8"</pre></blockquote>
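<p>Whichever place you set the directive, it is worth confirming that the
setting actually took effect, since some hosts restrict what scripts may
change. A small sketch, relying on the fact that <code>ini_set()</code>
returns <code>false</code> on failure:</p>

<pre><?php
// ini_set() returns the previous value on success and false on failure
// (for example, when the host has disabled changing this setting).
$previous = ini_set('default_charset', 'UTF-8');

if ($previous === false) {
    echo "Could not change default_charset\n";
} else {
    echo 'default_charset is now ', ini_get('default_charset'), "\n";
}
?></pre>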
|
||||
|
||||
<h4 id="fixcharset-server-nophp">Non-PHP</h4>
|
||||
|
||||
<p>You may, for whatever reason, may need to set the character encoding
|
||||
on non-PHP files, usually plain ol' HTML files. Doing this
|
||||
is more of a hit-or-miss process: depending on the software being
|
||||
used as a webserver and the configuration of that software, certain
|
||||
techniques may work, or may not work.</p>
|
||||
|
||||
<h4 id="fixcharset-server-htaccess">.htaccess</h4>
|
||||
|
||||
<p>On Apache, you can use an .htaccess file to change the character
|
||||
encoding. I'll defer to
|
||||
<a href="http://www.w3.org/International/questions/qa-htaccess-charset">W3C</a>
|
||||
for the in-depth explanation, but it boils down to creating a file
|
||||
named .htaccess with the contents:</p>
|
||||
|
||||
<pre><a href="http://httpd.apache.org/docs/1.3/mod/mod_mime.html#addcharset">AddCharset</a> UTF-8 .html</pre>
|
||||
|
||||
<p>Where UTF-8 is replaced with the character encoding you want to
|
||||
use and .html is a file extension that this will be applied to. This
|
||||
character encoding will then be set for any file directly in
|
||||
or in the subdirectories of the directory you place this file in.</p>
|
||||
|
||||
<p>If you're feeling particularly courageous, you can use:</p>
|
||||
|
||||
<pre><a href="http://httpd.apache.org/docs/1.3/mod/core.html#adddefaultcharset">AddDefaultCharset</a> UTF-8</pre>
|
||||
|
||||
<p>...which changes the character set Apache adds to any document that
|
||||
doesn't have any Content-Type parameters. This directive, which the
|
||||
default configuration file sets to iso-8859-1 for security
|
||||
reasons, is probably why your headers mismatch
|
||||
with the <code>META</code> tag. If you would prefer Apache not to be
|
||||
butting in on your character encodings, you can tell it not
|
||||
to send anything at all:</p>
|
||||
|
||||
<pre><a href="http://httpd.apache.org/docs/1.3/mod/core.html#adddefaultcharset">AddDefaultCharset</a> Off</pre>
|
||||
|
||||
<p>...making your <code>META</code> tags the sole source of
|
||||
character encoding information. In these cases, it is
|
||||
<em>especially</em> important to make sure you have valid <code>META</code>
|
||||
tags on your pages and all the text before them is ASCII.</p>
|
||||
|
||||
<blockquote class="aside"><p>These directives can also be
|
||||
placed in the httpd.conf file for Apache, but
|
||||
in most shared hosting situations you won't be able to edit this file.
|
||||
</p></blockquote>
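<p>After changing the server configuration, you can confirm what the
server really sends by inspecting the response headers. The sketch below
parses a list of raw header lines; <code>charset_from_headers()</code> is a
hypothetical helper, and the sample headers are invented for the example.</p>

<pre><?php
// Hypothetical helper: pull the charset out of a list of raw response
// header lines, such as the array returned by get_headers().
function charset_from_headers(array $headers) {
    $pattern = '/^Content-Type:.*charset\s*=\s*"?([A-Za-z0-9_\-]+)/i';
    foreach ($headers as $line) {
        if (preg_match($pattern, $line, $matches)) {
            return $matches[1];
        }
    }
    return null; // the server sent no charset at all
}

// Sample headers. On a live site you could instead pass the result of
// get_headers('http://www.example.com/page.html') (a placeholder URL),
// after checking that the call did not return false.
$sample = array(
    'HTTP/1.1 200 OK',
    'Content-Type: text/html; charset=ISO-8859-1',
);

echo charset_from_headers($sample), "\n"; // ISO-8859-1
?></pre>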
|
||||
|
||||
<h4 id="fixcharset-server-ext">File extensions</h4>
|
||||
|
||||
<p>If you're not allowed to use .htaccess files, you can often
|
||||
piggy-back off of Apache's default AddCharset declarations to get
|
||||
your files served with the proper charset. Here are Apache's default
|
||||
character set declarations:</p>
|
||||
|
||||
<table class="table">
|
||||
<thead><tr>
|
||||
<th>Charset</th>
|
||||
<th>File extension(s)</th>
|
||||
</tr></thead>
|
||||
<tbody>
|
||||
<tr><td>ISO-8859-1</td><td>.iso8859-1 .latin1</td></tr>
|
||||
<tr><td>ISO-8859-2</td><td>.iso8859-2 .latin2 .cen</td></tr>
|
||||
<tr><td>ISO-8859-3</td><td>.iso8859-3 .latin3</td></tr>
|
||||
<tr><td>ISO-8859-4</td><td>.iso8859-4 .latin4</td></tr>
|
||||
<tr><td>ISO-8859-5</td><td>.iso8859-5 .latin5 .cyr .iso-ru</td></tr>
|
||||
<tr><td>ISO-8859-6</td><td>.iso8859-6 .latin6 .arb</td></tr>
|
||||
<tr><td>ISO-8859-7</td><td>.iso8859-7 .latin7 .grk</td></tr>
|
||||
<tr><td>ISO-8859-8</td><td>.iso8859-8 .latin8 .heb</td></tr>
|
||||
<tr><td>ISO-8859-9</td><td>.iso8859-9 .latin9 .trk</td></tr>
|
||||
<tr><td>ISO-2022-JP</td><td>.iso2022-jp .jis</td></tr>
|
||||
<tr><td>ISO-2022-KR</td><td>.iso2022-kr .kis</td></tr>
|
||||
<tr><td>ISO-2022-CN</td><td>.iso2022-cn .cis</td></tr>
|
||||
<tr><td>Big5</td><td>.Big5 .big5 .b5</td></tr>
|
||||
<tr><td>WINDOWS-1251</td><td>.cp-1251 .win-1251</td></tr>
|
||||
<tr><td>CP866</td><td>.cp866</td></tr>
|
||||
<tr><td>KOI8-r</td><td>.koi8-r .koi8-ru</td></tr>
|
||||
<tr><td>KOI8-ru</td><td>.koi8-uk .ua</td></tr>
|
||||
<tr><td>ISO-10646-UCS-2</td><td>.ucs2</td></tr>
|
||||
<tr><td>ISO-10646-UCS-4</td><td>.ucs4</td></tr>
|
||||
<tr><td>UTF-8</td><td>.utf8</td></tr>
|
||||
<tr><td>GB2312</td><td>.gb2312 .gb </td></tr>
|
||||
<tr><td>utf-7</td><td>.utf7</td></tr>
|
||||
<tr><td>EUC-TW</td><td>.euc-tw</td></tr>
|
||||
<tr><td>EUC-JP</td><td>.euc-jp</td></tr>
|
||||
<tr><td>EUC-KR</td><td>.euc-kr</td></tr>
|
||||
<tr><td>shift_jis</td><td>.sjis</td></tr>
|
||||
</tbody>
|
||||
</table>
|
||||
|
||||
<p>So, for example, a file named <code>page.utf8.html</code> or
|
||||
<code>page.html.utf8</code> will probably be sent with the UTF-8 charset
|
||||
attached, the difference being that if there is an
|
||||
<code>AddCharset charset .html</code> declaration, it will override
|
||||
the .utf8 extension in <code>page.utf8.html</code> (precedence moves
|
||||
from right to left). By default, Apache has no such declaration.</p>
|
||||
|
||||
<h4 id="fixcharset-server-iis">Microsoft IIS</h4>
|
||||
|
||||
<p>If anyone can contribute information on how to configure Microsoft
|
||||
IIS to change character encodings, I'd be grateful.</p>
|
||||
|
||||
<h3 id="fixcharset-xml">XML</h3>
|
||||
|
||||
<p><code>META</code> tags are the most common source of embedded
|
||||
encodings, but they can also come from somewhere else: XML
|
||||
processing instructions. They look like:</p>
|
||||
|
||||
<pre><?xml version="1.0" encoding="UTF-8"?></pre>
|
||||
|
||||
<p>...and are most often found in XML documents (including XHTML).</p>
|
||||
|
||||
<p>For XHTML, this processing instruction theoretically
|
||||
overrides the <code>META</code> tag. In reality, this happens only when the
|
||||
XHTML is actually served as legit XML and not HTML, which almost
|
||||
never happens due to Internet Explorer's lack of support for
|
||||
<code>application/xhtml+xml</code> (even though doing so is often
|
||||
argued to be <a href="http://www.hixie.ch/advocacy/xhtml">good practice</a>).</p>
|
||||
|
||||
<p>For XML, however, this processing instruction is extremely important.
|
||||
Since most webservers are not configured to send charsets for .xml files,
|
||||
this is the only thing a parser has to go on. Furthermore, the default
|
||||
for XML files is UTF-8, which often butts heads with the more common
|
||||
ISO-8859-1 encoding (you see this in garbled RSS feeds).</p>
|
||||
|
||||
<p>In short, if you use XHTML and have gone through the
|
||||
trouble of adding the XML header, make sure it jibes
|
||||
with your <code>META</code> tags and HTTP headers.</p>
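<p>A quick way to spot a disagreement is to extract the encoding from the
XML header and compare it with your other sources. This is only a sketch:
<code>xml_declared_encoding()</code> is a hypothetical helper, and the
pattern assumes the header sits at the very start of the document.</p>

<pre><?php
// Hypothetical helper: return the encoding named in the XML header at
// the very start of a document, or null if there is none.
function xml_declared_encoding($doc) {
    $pattern = '/^<\?xml[^>]*\bencoding\s*=\s*(["\'])([A-Za-z0-9._\-]+)\1/i';
    if (preg_match($pattern, $doc, $matches)) {
        return $matches[2];
    }
    return null;
}

// A document whose XML header and META tag disagree.
$xhtml = '<?xml version="1.0" encoding="ISO-8859-1"?>'
       . '<html><head><meta http-equiv="Content-Type" '
       . 'content="text/html; charset=UTF-8" /></head></html>';

echo xml_declared_encoding($xhtml), "\n"; // ISO-8859-1
?></pre>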
|
||||
|
||||
<h3>Inside the process</h3>
|
||||
|
||||
<p>This section is not required reading,
|
||||
but may answer some of your questions on what's going on in all
|
||||
this character encoding hocus pocus. If you're interested in
|
||||
moving on to the next phase, skip this section.</p>
|
||||
|
||||
<p>A logical question that follows all of our wheeling and dealing
|
||||
with multiple sources of character encodings is "Why are there
|
||||
so many options?" To answer this question, we have to turn
|
||||
back to our definition of character encodings: they allow a program
|
||||
to interpret bytes into human-readable characters.</p>
|
||||
|
||||
<p>Thus, a chicken-egg problem: a character encoding
|
||||
is necessary to interpret the
|
||||
text of a document. A <code>META</code> tag is in the text of a document.
|
||||
The <code>META</code> tag gives the character encoding. How can we
|
||||
determine the contents of a <code>META</code> tag, inside the text,
|
||||
if we don't know its character encoding? And how do we figure out
|
||||
the character encoding, if we don't know the contents of the
|
||||
<code>META</code> tag?</p>
|
||||
|
||||
<p>Fortunately for us, the characters we need to write the
|
||||
<code>META</code> tag are in ASCII, which is pretty much universal
|
||||
over every character encoding that is in common use today. So,
|
||||
all the web-browser has to do is parse all the way down until
|
||||
it gets to the Content-Type tag, extract the character encoding
|
||||
tag, then re-parse the document according to this new information.</p>
|
||||
|
||||
<p>Obviously this is complicated, so browsers prefer the simpler
|
||||
and more efficient solution: get the character encoding from
|
||||
somewhere other than the document itself, i.e. the HTTP headers,
|
||||
much to the chagrin of HTML authors who can't set these headers.</p>
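<p>The re-parse trick described above only works if everything before the
<code>META</code> tag is plain ASCII. Here is a rough check for that,
assuming the page is available as a string of raw bytes;
<code>meta_prefix_is_ascii()</code> is a name invented for this sketch.</p>

<pre><?php
// Hypothetical helper: true if every byte before the META charset
// declaration is ASCII, false if not, null if there is no declaration.
function meta_prefix_is_ascii($html) {
    $pattern = '/<meta[^>]+charset\s*=/i';
    if (!preg_match($pattern, $html, $matches, PREG_OFFSET_CAPTURE)) {
        return null;
    }
    // $matches[0][1] is the byte offset where the META tag starts.
    $prefix = substr($html, 0, $matches[0][1]);
    // Any byte from 0x80 to 0xFF is outside ASCII.
    return !preg_match('/[\x80-\xFF]/', $prefix);
}

$meta = '<meta http-equiv="Content-Type" '
      . 'content="text/html; charset=UTF-8" />';

var_dump(meta_prefix_is_ascii('<html><head>' . $meta));     // bool(true)
var_dump(meta_prefix_is_ascii("<title>caf\xC3\xA9</title>" . $meta)); // bool(false)
?></pre>

<p>In the second call a non-ASCII title comes before the declaration, so a
browser would have to guess at those bytes before it knows the encoding.</p>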
|
||||
|
||||
<h2 id="whyutf8">Why UTF-8?</h2>
|
||||
|
||||
<p>So, you've gone through all the trouble of ensuring that your
|
||||
server and embedded encodings all line up properly and are
|
||||
present. Good job: at
|
||||
this point, you could quit and rest easy knowing that your pages
|
||||
are not vulnerable to character encoding style XSS attacks.
|
||||
However, just as having a character encoding is better than
|
||||
having no character encoding at all, having UTF-8 as your
|
||||
character encoding is better than having some other random
|
||||
character encoding, and the next step is to convert to UTF-8.
|
||||
But why?</p>
|
||||
|
||||
<h3 id="whyutf8-i18n">Internationalization</h3>
|
||||
|
||||
<p>Many software projects, at one point or another, suddenly realize
|
||||
that they should be supporting more than one language. Even regular
|
||||
usage in one language sometimes requires the occasional special character
|
||||
that, without surprise, is not available in your character set. Sometimes
|
||||
developers get around this by adding support for multiple encodings: when
|
||||
using Chinese, use Big5, when using Japanese, use Shift-JIS, when
|
||||
using Greek, etc. Other times, they use character entities with great
|
||||
zeal.</p>
|
||||
|
||||
<p>UTF-8, however, obviates the need for any of these complicated
|
||||
measures. After getting the system to use UTF-8 and adjusting for
|
||||
sources that are outside the hand of the browser (more on this later),
|
||||
UTF-8 just works. You can use it for any language, even many languages
|
||||
at once, you don't have to worry about managing multiple encodings,
|
||||
you don't have to use those user-unfriendly entities.</p>
|
||||
|
||||
<h3 id="whyutf8-user">User-friendly</h3>
|
||||
|
||||
<p>Websites encoded in Latin-1 (ISO-8859-1) which occasionally need
|
||||
a special character outside of their scope often will use a character
|
||||
entity to achieve the desired effect. For instance, θ can be
|
||||
written <code>&theta;</code>, regardless of the character encoding's
|
||||
support of Greek letters.</p>
|
||||
|
||||
<p>This works nicely for limited use of special characters, but
|
||||
say you wanted this sentence of Chinese text: 激光,
|
||||
這兩個字是甚麼意思.
|
||||
The entity-ized version would look like this:</p>
|
||||
|
||||
<pre>&#28608;&#20809;, &#36889;&#20841;&#20491;&#23383;&#26159;&#29978;&#40636;&#24847;&#24605;</pre>
|
||||
|
||||
<p>Extremely inconvenient for those of us who actually know what
|
||||
character entities are, totally unintelligible to poor users who don't!
|
||||
Even the slightly more user-friendly, "intelligible" character
|
||||
entities like <code>&theta;</code> will leave users who are
|
||||
uninterested in learning HTML scratching their heads. On the other
|
||||
hand, if they see θ in an edit box, they'll know that it's a
|
||||
special character, and treat it accordingly, even if they don't know
|
||||
how to write that character themselves.</p>
|
||||
|
||||
<blockquote class="aside"><p>Wikipedia is a great case study for
|
||||
an application that originally used ISO-8859-1 but switched to UTF-8
|
||||
when it became far too cumbersome to support foreign languages. Bots
|
||||
will now actually go through articles and convert character entities
|
||||
to their corresponding real characters for the sake of user-friendliness
|
||||
and searchability. See
|
||||
<a href="http://meta.wikimedia.org/wiki/Help:Special_characters">Meta's
|
||||
page on special characters</a> for more details.
|
||||
</p></blockquote>
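<p>In PHP, the same entity-to-character conversion can be sketched with
<code>html_entity_decode()</code>, as long as you tell it the target
encoding is UTF-8. The sample string is just the entities used earlier in
this section.</p>

<pre><?php
// One named entity and two numeric ones.
$entities = '&theta; &#28608;&#20809;';

// The third argument selects the encoding of the decoded characters.
$text = html_entity_decode($entities, ENT_QUOTES, 'UTF-8');

// Show the resulting UTF-8 bytes: theta, a space, then the two
// Chinese characters at three bytes each.
echo bin2hex($text), "\n"; // ceb820e6bf80e58589
?></pre>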
|
||||
|
||||
<h3 id="whyutf8-forms">Forms</h3>
|
||||
|
||||
<p>While we're on the tack of users, how do non-UTF-8 web forms deal
|
||||
with characters that are outside of their character set? Rather than
|
||||
discuss what UTF-8 does right, we're going to show what could go wrong
|
||||
if you didn't use UTF-8 and people tried to use characters outside
|
||||
of your character encoding.</p>
|
||||
|
||||
<p>The troubles are large, extensive, and extremely difficult to fix (or,
|
||||
at least, difficult enough that if you had the time and resources to invest
|
||||
in doing the fix, you would be probably better off migrating to UTF-8).
|
||||
There are two types of form submission: <code>application/x-www-form-urlencoded</code>
|
||||
which is used for GET and by default for POST, and <code>multipart/form-data</code>
|
||||
which may be used by POST, and is required when you want to upload
|
||||
files.</p>
|
||||
|
||||
<p>The following is a summarization of notes from
|
||||
<a href="http://ppewww.physics.gla.ac.uk/~flavell/charset/form-i18n.html">
|
||||
<code>FORM</code> submission and i18n</a>. That document contains lots
|
||||
of useful information, but is written in a rambly manner, so
|
||||
here I try to get right to the point.</p>
|
||||
|
||||
<h4 id="whyutf8-forms-urlencoded"><code>application/x-www-form-urlencoded</code></h4>
|
||||
|
||||
<p>This is the Content-Type that GET requests must use, and POST requests
|
||||
use by default. It involves the ubiquitous percent encoding format that
|
||||
looks something like: <code>%C3%86</code>. There is no official way of
|
||||
determining the character encoding of such a request, since the percent
|
||||
encoding operates on a byte level, so it is usually assumed that it
|
||||
is the same as the encoding the page containing the form was served
|
||||
in. You'll run into very few problems if you only use characters in
|
||||
the character encoding you chose.</p>
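<p>To see why the encoding cannot be recovered from the request itself,
here is a short sketch that percent-encodes the same character from two
different encodings. The variable names are only for illustration.</p>

<pre><?php
// "Æ" (U+00C6) as raw bytes in two encodings.
$utf8   = "\xC3\x86";
$latin1 = "\xC6";

echo rawurlencode($utf8), "\n";   // %C3%86
echo rawurlencode($latin1), "\n"; // %C6

// Decoding only gives the bytes back; nothing in the request says
// which encoding they were meant to be read in.
$bytes = rawurldecode('%C3%86');
var_dump($bytes === $utf8); // bool(true)
?></pre>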
|
||||
|
||||
<p>However, once you start adding characters outside of your encoding
|
||||
(and this is a lot more common than you may think: take curly
|
||||
"smart" quotes from Microsoft as an example),
|
||||
all manner of strange things start to happen. Depending on the
|
||||
browser you're using, they might:</p>
|
||||
|
||||
<ul>
|
||||
<li>Replace the unsupported characters with useless question marks,</li>
|
||||
<li>Attempt to fix the characters (example: smart quotes to regular quotes),</li>
|
||||
<li>Replace the character with a character entity, or</li>
|
||||
<li>Send it anyway as a different character encoding mixed in
|
||||
with the original encoding (usually Windows-1252 rather than
|
||||
iso-8859-1 or UTF-8 interspersed in 8-bit)</li>
|
||||
</ul>
|
||||
|
||||
<p>To properly guard against these behaviors, you'd have to sniff out
|
||||
the browser agent, compile a database of different behaviors, and
|
||||
take appropriate conversion action against the string (disregarding
|
||||
a spate of extremely mysterious, random and devastating bugs Internet
|
||||
Explorer manifests every once in a while). Or you could
|
||||
use UTF-8 and rest easy knowing that none of this could possibly happen
|
||||
since UTF-8 supports every character.</p>
|
||||
|
||||
<h4 id="whyutf8-forms-multipart"><code>multipart/form-data</code></h4>
|
||||
|
||||
<p>Multipart form submission takes away a lot of the ambiguity
|
||||
that percent-encoding had: the server now can explicitly ask for
|
||||
certain encodings, and the client can explicitly tell the server
|
||||
during the form submission what encoding the fields are in.</p>
|
||||
|
||||
<p>There are two ways you can go with this functionality: leave it
|
||||
unset and have the browser send in the same encoding as the page,
|
||||
or set it to UTF-8 and then do another conversion server-side.
|
||||
Each method has deficiencies, especially the former.</p>
|
||||
|
||||
<p>If you tell the browser to send the form in the same encoding as
|
||||
the page, you still have the trouble of what to do with characters
|
||||
that are outside of the character encoding's range. The behavior, once
|
||||
again, varies: Firefox 2.0 entity-izes them while Internet Explorer
|
||||
7.0 mangles them beyond intelligibility. For serious I18N purposes,
|
||||
this is not an option.</p>
|
||||
|
||||
<p>The other possibility is to set <code>accept-charset</code> to UTF-8, which
|
||||
begs the question: Why aren't you using UTF-8 for everything then?
|
||||
This route is more palatable, but there's a notable caveat: your data
|
||||
will come in as UTF-8, so you will have to explicitly convert it into
|
||||
your favored local character encoding.</p>
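<p>That conversion step might look like the sketch below. It assumes the
mbstring extension is available; <code>to_local_encoding()</code> is a
hypothetical helper, and the choice of "?" as the replacement for
unrepresentable characters is made explicit here so you can see the
loss.</p>

<pre><?php
// Hypothetical helper: convert UTF-8 form input to a local 8-bit
// encoding, replacing anything that does not fit with "?".
function to_local_encoding($utf8, $local = 'ISO-8859-1') {
    mb_substitute_character(0x3F); // 0x3F is "?"
    return mb_convert_encoding($utf8, $local, 'UTF-8');
}

// "café θ" in UTF-8: é exists in ISO-8859-1, θ does not.
$input = "caf\xC3\xA9 \xCE\xB8";

echo bin2hex(to_local_encoding($input)), "\n"; // 636166e9203f
?></pre>

<p>The é survives as the single byte <code>e9</code>, while the θ is
flattened to a question mark (<code>3f</code>): exactly the data loss that
staying in UTF-8 avoids.</p>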
|
||||
|
||||
<p>I object to this approach on ideological grounds: you're
|
||||
digging yourself deeper into
|
||||
the hole when you could have been converting to UTF-8
|
||||
instead. And, of course, you can't use this method for GET requests.</p>
|
||||
|
||||
<h3 id="whyutf8-support">Well supported</h3>
|
||||
|
||||
<p>Almost every modern browser in the wild today has full UTF-8 and Unicode
|
||||
support: the number of troublesome cases can be counted on the
|
||||
fingers of one hand, and these browsers usually have trouble with
|
||||
other character encodings too. Problems users usually encounter stem
|
||||
from the lack of appropriate fonts to display the characters (once
|
||||
again, this applies to all character encodings and HTML entities) or
|
||||
Internet Explorer's lack of intelligent font picking (which can be
|
||||
worked around).</p>
|
||||
|
||||
<p>We will go into more detail about how to deal with edge cases in
|
||||
the browser world in the Migration section, but rest assured that
|
||||
converting to UTF-8, if done correctly, will not result in users
|
||||
hounding you about broken pages.</p>
|
||||
|
||||
<h3 id="whyutf8-htmlpurifier">HTML Purifier</h3>
|
||||
|
||||
<p>And finally, we get to HTML Purifier.</p>
|
||||
|
||||
<h2 id="migrate">Migrate to UTF-8</h2>
|
||||
|
||||
<h3 id="migrate-editor">Text editor</h3>
|
||||
|
||||
<h3 id="migrate-db">Configuring your database</h3>
|
||||
|
||||
<h3 id="migrate-convert">Convert old text</h3>
|
||||
|
||||
<h3 id="migrate-bom">Byte Order Mark (headers already sent!)</h3>
|
||||
|
||||
<h3 id="migrate-variablewidth">Dealing with variable width in functions</h3>
|
||||
|
||||
<h2 id="externallinks">Further Reading</h2>
|
||||
|
||||
<p>Many other developers have already discussed the subject of Unicode,
|
||||
UTF-8 and internationalization, and I would like to defer to them for
|
||||
a more in-depth look into character sets and encodings.</p>
|
||||
|
||||
<ul>
|
||||
<li><a href="http://www.joelonsoftware.com/articles/Unicode.html">
|
||||
The Absolute Minimum Every Software Developer Absolutely,
|
||||
Positively Must Know About Unicode and Character Sets
|
||||
(No Excuses!)</a> by Joel Spolsky, provides a <em>very</em>
|
||||
good high-level look at Unicode and character sets in general.</li>
|
||||
<li><a href="http://en.wikipedia.org/wiki/UTF-8">UTF-8 on Wikipedia</a>,
|
||||
provides a lot of useful details into the innards of UTF-8, although
|
||||
it may be a little off-putting to people who don't know much
|
||||
about Unicode to begin with.</li>
|
||||
</ul>
|
||||
|
||||
</body>
|
||||
</html>
|
@@ -15,6 +15,7 @@
|
||||
|
||||
<div id="filing">Filed under End-User</div>
|
||||
<div id="index">Return to the <a href="index.html">index</a>.</div>
|
||||
<div id="home"><a href="http://hp.jpsband.org/">HTML Purifier</a> End-User Documentation</div>
|
||||
|
||||
<p>Clients like their YouTube videos. It gives them a warm fuzzy feeling when
|
||||
they see a neat little embedded video player on their websites that can play
|
||||
@@ -36,7 +37,7 @@ from a specific website, it probably is okay. If no amount of pleading will
|
||||
convince the people upstairs that they should just settle with just linking
|
||||
to their movies, you may find this technique very useful.</p>
|
||||
|
||||
<h2>Sample</h2>
|
||||
<h2>Looking in</h2>
|
||||
|
||||
<p>Below is custom code that allows users to embed
|
||||
YouTube videos. This is not favoritism: this trick can easily be adapted for
|
||||
@@ -68,55 +69,27 @@ into your documents. YouTube's code goes like this:</p>
|
||||
<p>What point 2 means is that if we have code like <code><span
|
||||
class="embed-youtube">AyPzM5WK8ys</span></code> your
|
||||
application can reconstruct the full object from this small snippet that
|
||||
passes through HTML Purifier <em>unharmed</em>.</p>
|
||||
passes through HTML Purifier <em>unharmed</em>.
|
||||
<a href="http://hp.jpsband.org/svnroot/htmlpurifier/trunk/library/HTMLPurifier/Filter/YouTube.php">Show me the code!</a></p>
|
||||
|
||||
<pre>
|
||||
<?php
|
||||
<p>And the corresponding usage:</p>
|
||||
|
||||
class HTMLPurifierX_PreserveYouTube extends HTMLPurifier
|
||||
{
|
||||
function purify($html, $config = null) {
|
||||
$pre_regex = '#<object[^>]+>.+?'.
|
||||
'http://www.youtube.com/v/([A-Za-z0-9]+).+?</object>#';
|
||||
$pre_replace = '<span class="youtube-embed">\1</span>';
|
||||
$html = preg_replace($pre_regex, $pre_replace, $html);
|
||||
$html = parent::purify($html, $config);
|
||||
$post_regex = '#<span class="youtube-embed">([A-Za-z0-9]+)</span>#';
|
||||
$post_replace = '<object width="425" height="350" '.
|
||||
'data="http://www.youtube.com/v/\1">'.
|
||||
'<param name="movie" value="http://www.youtube.com/v/\1"></param>'.
|
||||
'<param name="wmode" value="transparent"></param>'.
|
||||
'<!--[if IE]>'.
|
||||
'<embed src="http://www.youtube.com/v/\1"'.
|
||||
'type="application/x-shockwave-flash"'.
|
||||
'wmode="transparent" width="425" height="350" />'.
|
||||
'<![endif]-->'.
|
||||
'</object>';
|
||||
$html = preg_replace($post_regex, $post_replace, $html);
|
||||
return $html;
|
||||
}
|
||||
}
|
||||
<pre><?php
|
||||
// assuming $purifier is an instance of HTMLPurifier
|
||||
require_once 'HTMLPurifier/Filter/YouTube.php';
|
||||
$purifier->addFilter(new HTMLPurifier_Filter_YouTube());
|
||||
?></pre>
|
||||
|
||||
$purifier = new HTMLPurifierX_PreserveYouTube();
|
||||
$html_still_with_youtube = $purifier->purify($html_with_youtube);
|
||||
|
||||
?>
|
||||
</pre>
|
||||
|
||||
<p>There is a bit going on here, so let's explain.</p>
|
||||
<p>There is a bit going on in the two code snippets, so let's explain.</p>
|
||||
|
||||
<ol>
|
||||
<li>The class uses the prefix <code>HTMLPurifierX</code> because it's
|
||||
userspace code. Don't use <code>HTMLPurifier</code> in front of your
|
||||
class, since it might clobber another class in the library.</li>
|
||||
<li>In order to keep the interface compatible, we've extended HTMLPurifier
|
||||
into a new class that preserves the YouTube videos. This means that
|
||||
all you have to do is replace all instances of
|
||||
<code>new HTMLPurifier</code> to <code>new
|
||||
HTMLPurifierX_PreserveYouTube</code>. There's other ways to go about
|
||||
doing this: if you were calling a function that wrapped HTML Purifier,
|
||||
you could paste the PHP right there. If you wanted to be really
|
||||
fancy, you could make a decorator for HTMLPurifier.</li>
|
||||
<li>This is a Filter object, which intercepts the HTML that is
|
||||
coming into and out of the purifier. You can add as many
|
||||
filter objects as you like. <code>preFilter()</code>
|
||||
processes the code before it gets purified, and <code>postFilter()</code>
|
||||
processes the code afterwards. So, we'll use <code>preFilter()</code> to
|
||||
replace the object tag with a <code>span</code>, and <code>postFilter()</code>
|
||||
to restore it.</li>
|
||||
<li>The first preg_replace call replaces any YouTube code users may have
|
||||
embedded into the benign span tag. Span is used because it is inline,
|
||||
and objects are inline too. We are very careful to be extremely
|
||||
@@ -164,16 +137,16 @@ it is important that you are cognizant of the risk.</p>
|
||||
|
||||
<p>This should go without saying, but if you're going to adapt this code
|
||||
for Google Video or the like, make sure you do it <em>right</em>. It's
|
||||
extremely easy to allow a character too many in the final section and
|
||||
extremely easy to allow a character too many in <code>postFilter()</code> and
|
||||
suddenly you're introducing XSS into HTML Purifier's XSS-free output. HTML
|
||||
Purifier may be well written, but it cannot guard against vulnerabilities
|
||||
introduced after it has finished.</p>
|
||||
|
||||
<h2>Future plans</h2>
|
||||
<h2>Help out!</h2>
|
||||
|
||||
<p>It would probably be a good idea if this code was added to the core
|
||||
library. Look out for the inclusion of this into the core as a decorator
|
||||
or the like.</p>
|
||||
<p>If you write a filter for your favorite video destination (or anything
|
||||
like that, for that matter), send it over and it might get included
|
||||
with the core!</p>
|
||||
|
||||
</body>
|
||||
</html>
|
@@ -1,15 +1,14 @@
|
||||
<?php
|
||||
<?php exit;
|
||||
|
||||
// This file demonstrates basic usage of HTMLPurifier.
|
||||
|
||||
exit; // not to be called directly, it will fail fantastically!
|
||||
|
||||
set_include_path('/path/to/htmlpurifier/library' . PATH_SEPARATOR . get_include_path());
|
||||
require_once 'HTMLPurifier.php';
|
||||
require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';
|
||||
|
||||
$purifier = new HTMLPurifier();
|
||||
$html = '<b>Simple and short';
|
||||
|
||||
$pure_html = $purifier->purify($html);
|
||||
|
||||
echo $pure_html;
|
||||
|
||||
?>
|
@@ -13,7 +13,7 @@
|
||||
|
||||
<h1>Documentation</h1>
|
||||
|
||||
<p><strong>HTML Purifier</strong> has documentation for all types of people.
|
||||
<p><strong><a href="http://hp.jpsband.org/">HTML Purifier</a></strong> has documentation for all types of people.
|
||||
Here is an index of all of them.</p>
|
||||
|
||||
<h2>End-user</h2>
|
||||
@@ -28,6 +28,12 @@ information for casual developers using HTML Purifier.</p>
|
||||
<dt><a href="enduser-youtube.html">Embedding YouTube videos</a></dt>
|
||||
<dd>Explains how to safely allow the embedding of Flash from trusted sites.</dd>
|
||||
|
||||
<dt><a href="enduser-slow.html">Speeding up HTML Purifier</a></dt>
|
||||
<dd>Explains how to speed up HTML Purifier through caching or inbound filtering.</dd>
|
||||
|
||||
<dt><a href="enduser-utf8.html">UTF-8</a></dt>
|
||||
<dd>Describes the rationale for using UTF-8, the ramifications otherwise, and how to make the switch.</dd>
|
||||
|
||||
</dl>
|
||||
|
||||
<h2>Development</h2>
|
||||
|
@@ -15,6 +15,7 @@
|
||||
|
||||
<div id="filing">Filed under Proposals</div>
|
||||
<div id="index">Return to the <a href="index.html">index</a>.</div>
|
||||
<div id="home"><a href="http://hp.jpsband.org/">HTML Purifier</a> End-User Documentation</div>
|
||||
|
||||
<p>Your website probably has a color-scheme.
|
||||
<span style="color:#090; background:#FFF;">Green on white</span>,
|
||||
|
@@ -14,15 +14,15 @@ Since configuration is dependant on context, internal classes require a
|
||||
configuration object to be passed as a parameter. (They also require a
|
||||
Context object).
|
||||
|
||||
In relation to HTMLDefinition and CSSDefinition, there is a special class
|
||||
In relation to HTMLDefinition and CSSDefinition, there could be a special class
|
||||
of directives that influence the *construction* of the Definition object.
|
||||
A standard call pattern would look like:
|
||||
A theoretical call pattern would look like:
|
||||
|
||||
1. Client calls Config->getHTMLDefinition()
|
||||
2. Config calls HTMLDefinition->createNew(this)
|
||||
3. HTMLDefinition constructs itself with base configuration
|
||||
4. HTMLDefinition calls Config->get('HTMLDefinition')
|
||||
5. Config returns array of directives that later construction
|
||||
4. HTMLDefinition calls Config->get('HTML')
|
||||
5. Config returns array of directives
|
||||
6. HTMLDefinition performs operations and changes specified by directives
|
||||
7. HTMLPurifier returns constructed definition
|
||||
8. Config caches definition so it doesn't have to be generated again
|
||||
@@ -33,3 +33,7 @@ custom copy, which OVERRIDES all directives. Only the base, vanilla copy
|
||||
is the Singleton; the object actually interfaced with is an operated-upon
|
||||
clone of that object. Also, if an update to the directives would update
|
||||
the definition, you'd have to force reconstruction.
|
||||
|
||||
In practice, pulling directives from the config object is
|
||||
solely need-based, and the flex points are littered throughout the
|
||||
setup() function. Some sort of refactoring is likely in order.
|
||||
|
@@ -15,7 +15,10 @@ and properties to allow. HTMLDefinition makes a big part of what HTMLPurifier
|
||||
is.
|
||||
|
||||
The idea, then, is to set up fundamentally different sets of definitions, which
|
||||
can further be customized using simpler configuration options.
|
||||
can further be customized using simpler configuration options. Alternatively,
|
||||
they could be implemented as configuration profiles, which simply load
|
||||
a set of recommended directives to achieve a desired effect (no simpler
|
||||
config options though).
|
||||
|
||||
Here are some fuzzy levels you could set:
|
||||
|
||||
|
@@ -4,8 +4,6 @@ Configuration Ideas
|
||||
Here are some theoretical configuration ideas that we could implement some
|
||||
time. Note the naming convention: %Namespace.Directive
|
||||
|
||||
%Attr.IDPrefix - prefix all ids with this
|
||||
|
||||
%Attr.RewriteFragments - if there's %Attr.IDPrefix we may want to transparently
|
||||
rewrite the URLs we parse too. However, we can only do it when it's a pure
|
||||
anchor link, so it's not foolproof
|
||||
|
@@ -15,6 +15,7 @@
|
||||
|
||||
<div id="filing">Filed under Reference</div>
|
||||
<div id="index">Return to the <a href="index.html">index</a>.</div>
|
||||
<div id="home"><a href="http://hp.jpsband.org/">HTML Purifier</a> End-User Documentation</div>
|
||||
|
||||
<p>Many thanks to the DevNetwork community for answering questions,
|
||||
theorizing about design, and offering encouragement during
|
||||
|
@@ -2,8 +2,8 @@
|
||||
Is HTML Purifier Strict or Transitional?
|
||||
A little bit of helpful guidance
|
||||
|
||||
Despite the fact that HTML Purifier professes only to support transitional
|
||||
HTML, it rejects a lot of attributes and elements that are actually, indeed,
|
||||
Despite the fact that HTML Purifier professes to support both transitional and
|
||||
strict HTML, it rejects a lot of attributes and elements that are actually, indeed,
|
||||
valid. You can investigate progress.html to find out precisely what we
|
||||
are doing to these *deprecated* attributes.
|
||||
|
||||
@@ -11,8 +11,8 @@ However, users have found that Strict HTML imposes some quite unreasonable
|
||||
restrictions on certain things. The start and value attributes in ol and
|
||||
li (respectively) perhaps are the most contested. There is currently no
|
||||
widely supported browser method short of JavaScript that can replace these
|
||||
two deprecated elements. HTML Purifier does not currently support them, but
|
||||
it might behoove us to do so while our output is still transitional.
|
||||
two deprecated attributes. It behooves us to allow these deprecated
|
||||
attributes when the output is transitional.
|
||||
|
||||
Fortunately, that's the only real bugger case. The others have near-perfect
|
||||
CSS equivalents, and were presentational anyway. However, the other question
|
||||
@@ -32,5 +32,6 @@ these loose-only constructs in loose mode:
|
||||
|
||||
The changed child definitions as well as ol.start and li.value are the most
|
||||
compelling reasons why loose should be used. We may want to offer disabling <u>,
|
||||
<strike> and <s> by themselves.
|
||||
<strike> and <s> by themselves. We may also want to offer no pre-emptive
|
||||
deprecated conversions. This all must be unified.
|
||||
|
||||
|
@@ -23,6 +23,8 @@ h4 {font-family:sans-serif; font-size:0.9em; font-weight:bold; }
|
||||
|
||||
/* Marks off asides, discussions on why something is the way it is */
|
||||
.aside {margin-left:2em; font-family:sans-serif; font-size:0.9em; }
|
||||
blockquote .label {font-weight:bold; font-size:1em; margin:0 0 .1em;
|
||||
border-bottom:1px solid #CCC;}
|
||||
|
||||
/* A regular table */
|
||||
.table {border-collapse:collapse; border-bottom:2px solid #888; margin-left:2em; }
|
||||
@@ -36,5 +38,7 @@ h4 {font-family:sans-serif; font-size:0.9em; font-weight:bold; }
|
||||
/* Contains, without exception, Return to index. */
|
||||
#index {font-size:smaller; }
|
||||
|
||||
#home {font-size:smaller;}
|
||||
|
||||
/* Contains, without exception, $Id$, for SVN version info. */
|
||||
#version {text-align:right; font-style:italic; margin:2em 0;}
|
||||
#version {text-align:right; font-style:italic; margin:2em 0;}
|
||||
|
@@ -6,12 +6,12 @@
|
||||
* this is efficient for instances when you only use HTML Purifier
|
||||
* on a few of your pages, it murders bytecode caching. You still
|
||||
* need to add HTML Purifier to your path.
|
||||
* @note ''HTMLPurifier()'' is NOT the same as ''new HTMLPurifier()''
|
||||
*/
|
||||
|
||||
function HTMLPurifier($html, $config = null) {
|
||||
static $purifier = false;
|
||||
if (!$purifier) {
|
||||
$init = true;
|
||||
require_once 'HTMLPurifier.php';
|
||||
$purifier = new HTMLPurifier();
|
||||
}
|
||||
|
@@ -22,7 +22,7 @@
|
||||
*/
|
||||
|
||||
/*
|
||||
HTML Purifier 1.3.2 - Standards Compliant HTML Filtering
|
||||
HTML Purifier 1.4.1 - Standards Compliant HTML Filtering
|
||||
Copyright (C) 2006 Edward Z. Yang
|
||||
|
||||
This library is free software; you can redistribute it and/or
|
||||
@@ -64,9 +64,10 @@ require_once 'HTMLPurifier/Encoder.php';
|
||||
class HTMLPurifier
|
||||
{
|
||||
|
||||
var $version = '1.3.2';
|
||||
var $version = '1.4.1';
|
||||
|
||||
var $config;
|
||||
var $filters;
|
||||
|
||||
var $lexer, $strategy, $generator;
|
||||
|
||||
@@ -91,10 +92,17 @@ class HTMLPurifier
|
||||
$this->lexer = HTMLPurifier_Lexer::create();
|
||||
$this->strategy = new HTMLPurifier_Strategy_Core();
|
||||
$this->generator = new HTMLPurifier_Generator();
|
||||
$this->encoder = new HTMLPurifier_Encoder();
|
||||
|
||||
}
|
||||
|
||||
/**
|
||||
* Adds a filter to process the output. First come, first served.
|
||||
* @param $filter HTMLPurifier_Filter object
|
||||
*/
|
||||
function addFilter($filter) {
|
||||
$this->filters[] = $filter;
|
||||
}
|
||||
|
||||
/**
|
||||
* Filters an HTML snippet/document to be XSS-free and standards-compliant.
|
||||
*
|
||||
@@ -109,8 +117,12 @@ class HTMLPurifier
|
||||
|
||||
$config = $config ? HTMLPurifier_Config::create($config) : $this->config;
|
||||
|
||||
$context =& new HTMLPurifier_Context();
|
||||
$html = $this->encoder->convertToUTF8($html, $config, $context);
|
||||
$context = new HTMLPurifier_Context();
|
||||
$html = HTMLPurifier_Encoder::convertToUTF8($html, $config, $context);
|
||||
|
||||
for ($i = 0, $size = count($this->filters); $i < $size; $i++) {
|
||||
$html = $this->filters[$i]->preFilter($html, $config, $context);
|
||||
}
|
||||
|
||||
// purified HTML
|
||||
$html =
|
||||
@@ -127,7 +139,11 @@ class HTMLPurifier
|
||||
$config, $context
|
||||
);
|
||||
|
||||
$html = $this->encoder->convertFromUTF8($html, $config, $context);
|
||||
for ($i = $size - 1; $i >= 0; $i--) {
|
||||
$html = $this->filters[$i]->postFilter($html, $config, $context);
|
||||
}
|
||||
|
||||
$html = HTMLPurifier_Encoder::convertFromUTF8($html, $config, $context);
|
||||
$this->context =& $context;
|
||||
return $html;
|
||||
}
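The two loops added in this hunk call `preFilter()` in registration order and `postFilter()` in reverse order, so filters nest around the purification step like parentheses. The following is a minimal standalone sketch of that ordering only. `SketchTagFilter` and `sketch_run_filters()` are hypothetical names invented for illustration, not part of HTML Purifier, and the purification step itself is left out.

```php
<?php
// Hypothetical stand-in for a filter object: it only records the order of calls.
class SketchTagFilter
{
    private $tag;

    public function __construct($tag)
    {
        $this->tag = $tag;
    }

    public function preFilter($html)
    {
        return $html . '[pre:' . $this->tag . ']';
    }

    public function postFilter($html)
    {
        return $html . '[post:' . $this->tag . ']';
    }
}

// Mirrors the loop structure of the hunk above; purification is omitted.
function sketch_run_filters(array $filters, $html)
{
    $size = count($filters);
    for ($i = 0; $i < $size; $i++) {
        $html = $filters[$i]->preFilter($html);   // first added runs first
    }
    // ... the real library would purify $html at this point ...
    for ($i = $size - 1; $i >= 0; $i--) {
        $html = $filters[$i]->postFilter($html);  // last added runs first
    }
    return $html;
}

echo sketch_run_filters(
    array(new SketchTagFilter('a'), new SketchTagFilter('b')),
    'x'
), "\n";
// prints: x[pre:a][pre:b][post:b][post:a]
```

The sketch takes an explicit array on purpose: on newer PHP versions `count()` rejects a null argument, so a filter list is probably best initialised to an empty array before the loops run.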
|
||||
|
87
library/HTMLPurifier/AttrDef/Background.php
Normal file
@@ -0,0 +1,87 @@
|
||||
<?php
|
||||
|
||||
require_once 'HTMLPurifier/AttrDef.php';
|
||||
require_once 'HTMLPurifier/CSSDefinition.php';
|
||||
|
||||
/**
|
||||
* Validates shorthand CSS property background.
|
||||
* @warning Does not support url tokens that have internal spaces.
|
||||
*/
|
||||
class HTMLPurifier_AttrDef_Background extends HTMLPurifier_AttrDef
|
||||
{
|
||||
|
||||
/**
|
||||
* Local copy of component validators.
|
||||
* @note See HTMLPurifier_AttrDef_Font::$info for a similar impl.
|
||||
*/
|
||||
var $info;
|
||||
|
||||
function HTMLPurifier_AttrDef_Background($config) {
|
||||
$def = $config->getCSSDefinition();
|
||||
$this->info['background-color'] = $def->info['background-color'];
|
||||
$this->info['background-image'] = $def->info['background-image'];
|
||||
$this->info['background-repeat'] = $def->info['background-repeat'];
|
||||
$this->info['background-attachment'] = $def->info['background-attachment'];
|
||||
$this->info['background-position'] = $def->info['background-position'];
|
||||
}
|
||||
|
||||
function validate($string, $config, &$context) {
|
||||
|
||||
// regular pre-processing
|
||||
$string = $this->parseCDATA($string);
|
||||
if ($string === '') return false;
|
||||
|
||||
// assumes URI doesn't have spaces in it
|
||||
$bits = explode(' ', strtolower($string)); // bits to process
|
||||
|
||||
$caught = array();
|
||||
$caught['color'] = false;
|
||||
$caught['image'] = false;
|
||||
$caught['repeat'] = false;
|
||||
$caught['attachment'] = false;
|
||||
$caught['position'] = false;
|
||||
|
||||
$i = 0; // number of catches
|
||||
$none = false;
|
||||
|
||||
foreach ($bits as $bit) {
|
||||
if ($bit === '') continue;
|
||||
foreach ($caught as $key => $status) {
|
||||
if ($key != 'position') {
|
||||
if ($status !== false) continue;
|
||||
$r = $this->info['background-' . $key]->validate($bit, $config, $context);
|
||||
} else {
|
||||
$r = $bit;
|
||||
}
|
||||
if ($r === false) continue;
|
||||
if ($key == 'position') {
|
||||
if ($caught[$key] === false) $caught[$key] = '';
|
||||
$caught[$key] .= $r . ' ';
|
||||
} else {
|
||||
$caught[$key] = $r;
|
||||
}
|
||||
$i++;
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
if (!$i) return false;
|
||||
if ($caught['position'] !== false) {
|
||||
$caught['position'] = $this->info['background-position']->
|
||||
validate($caught['position'], $config, $context);
|
||||
}
|
||||
|
||||
$ret = array();
|
||||
foreach ($caught as $value) {
|
||||
if ($value === false) continue;
|
||||
$ret[] = $value;
|
||||
}
|
||||
|
||||
if (empty($ret)) return false;
|
||||
return implode(' ', $ret);
|
||||
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
?>
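The shorthand validator above hands each space-separated token to the first component slot that is still empty and accepts it, then emits the caught values in a fixed order. Here is a rough sketch of that slot-filling idea, assuming a PHP version with closures. The three validators are deliberately simplified stand-ins, and `sketch_background_validators()` and `sketch_validate_background()` are hypothetical names. Image and position handling are omitted; in the class above, position collects any token that no other component claims and validates the collected string afterwards.

```php
<?php
// Hypothetical, simplified component validators: each returns the token or false.
function sketch_background_validators()
{
    return array(
        'color' => function ($bit) {
            return preg_match('/^#(?:[0-9a-f]{3}|[0-9a-f]{6})$/', $bit) ? $bit : false;
        },
        'repeat' => function ($bit) {
            $ok = array('repeat', 'repeat-x', 'repeat-y', 'no-repeat');
            return in_array($bit, $ok, true) ? $bit : false;
        },
        'attachment' => function ($bit) {
            return in_array($bit, array('scroll', 'fixed'), true) ? $bit : false;
        },
    );
}

// Assigns each token to the first still-empty slot whose validator accepts it.
function sketch_validate_background($string)
{
    $validators = sketch_background_validators();
    $caught = array('color' => false, 'repeat' => false, 'attachment' => false);

    foreach (explode(' ', strtolower(trim($string))) as $bit) {
        if ($bit === '') continue;
        foreach ($caught as $key => $status) {
            if ($status !== false) continue;   // slot already filled
            $r = $validators[$key]($bit);
            if ($r === false) continue;        // this slot rejects the token
            $caught[$key] = $r;
            break;                             // token consumed, move to the next one
        }
    }

    // Keep only the filled slots, in the fixed slot order.
    $ret = array();
    foreach ($caught as $value) {
        if ($value !== false) $ret[] = $value;
    }
    return $ret ? implode(' ', $ret) : false;
}

echo sketch_validate_background('fixed #FFF junk no-repeat'), "\n";
// prints: #fff no-repeat fixed
```

Unrecognised tokens are silently dropped, and a second value for an already-filled slot is ignored rather than treated as an error.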
|
130
library/HTMLPurifier/AttrDef/BackgroundPosition.php
Normal file
@@ -0,0 +1,130 @@
|
||||
<?php
|
||||
|
||||
require_once 'HTMLPurifier/AttrDef.php';
|
||||
require_once 'HTMLPurifier/AttrDef/CSSLength.php';
|
||||
require_once 'HTMLPurifier/AttrDef/Percentage.php';
|
||||
|
||||
/* W3C says:
|
||||
[ // adjective and number must be in correct order, even if
|
||||
// you could switch them without introducing ambiguity.
|
||||
// some browsers support that syntax
|
||||
[
|
||||
<percentage> | <length> | left | center | right
|
||||
]
|
||||
[
|
||||
<percentage> | <length> | top | center | bottom
|
||||
]?
|
||||
] |
|
||||
[ // this signifies that the vertical and horizontal adjectives
|
||||
// can be arbitrarily ordered, however, there can only be two,
|
||||
// one of each, or none at all
|
||||
[
|
||||
left | center | right
|
||||
] ||
|
||||
[
|
||||
top | center | bottom
|
||||
]
|
||||
]
|
||||
top, left = 0%
|
||||
center, (none) = 50%
|
||||
bottom, right = 100%
|
||||
*/
|
||||
|
||||
/* QuirksMode says:
|
||||
keyword + length/percentage must be ordered correctly, as per W3C
|
||||
|
||||
Internet Explorer and Opera, however, support arbitrary ordering. We
|
||||
should fix it up.
|
||||
|
||||
Minor issue though, not strictly necessary.
|
||||
*/
|
||||
|
||||
// control freaks may appreciate the ability to convert these to
|
||||
// percentages or something, but it's not necessary
|
||||
|
||||
/**
|
||||
* Validates the value of background-position.
|
||||
*/
|
||||
class HTMLPurifier_AttrDef_BackgroundPosition extends HTMLPurifier_AttrDef
|
||||
{
|
||||
|
||||
var $length;
|
||||
var $percentage;
|
||||
|
||||
function HTMLPurifier_AttrDef_BackgroundPosition() {
|
||||
$this->length = new HTMLPurifier_AttrDef_CSSLength();
|
||||
$this->percentage = new HTMLPurifier_AttrDef_Percentage();
|
||||
}
|
||||
|
||||
function validate($string, $config, &$context) {
|
||||
$string = $this->parseCDATA($string);
|
||||
$bits = explode(' ', $string);
|
||||
|
||||
$keywords = array();
|
||||
$keywords['h'] = false; // left, right
|
||||
$keywords['v'] = false; // top, bottom
|
||||
$keywords['c'] = false; // center
|
||||
$measures = array();
|
||||
|
||||
$i = 0;
|
||||
|
||||
$lookup = array(
|
||||
'top' => 'v',
|
||||
'bottom' => 'v',
|
||||
'left' => 'h',
|
||||
'right' => 'h',
|
||||
'center' => 'c'
|
||||
);
|
||||
|
||||
foreach ($bits as $bit) {
|
||||
if ($bit === '') continue;
|
||||
|
||||
// test for keyword
|
||||
$lbit = ctype_lower($bit) ? $bit : strtolower($bit);
|
||||
if (isset($lookup[$lbit])) {
|
||||
$status = $lookup[$lbit];
|
||||
$keywords[$status] = $lbit;
|
||||
$i++;
|
||||
}
|
||||
|
||||
// test for length
|
||||
$r = $this->length->validate($bit, $config, $context);
|
||||
if ($r !== false) {
|
||||
$measures[] = $r;
|
||||
$i++;
|
||||
}
|
||||
|
||||
// test for percentage
|
||||
$r = $this->percentage->validate($bit, $config, $context);
|
||||
if ($r !== false) {
|
||||
$measures[] = $r;
|
||||
$i++;
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
if (!$i) return false; // no valid values were caught
|
||||
|
||||
|
||||
$ret = array();
|
||||
|
||||
// first keyword
|
||||
if ($keywords['h']) $ret[] = $keywords['h'];
|
||||
elseif (count($measures)) $ret[] = array_shift($measures);
|
||||
elseif ($keywords['c']) {
|
||||
$ret[] = $keywords['c'];
|
||||
$keywords['c'] = false; // prevent re-use: center = center center
|
||||
}
|
||||
|
||||
if ($keywords['v']) $ret[] = $keywords['v'];
|
||||
elseif (count($measures)) $ret[] = array_shift($measures);
|
||||
elseif ($keywords['c']) $ret[] = $keywords['c'];
|
||||
|
||||
if (empty($ret)) return false;
|
||||
return implode(' ', $ret);
|
||||
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
?>
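One effect of the validator above is that keyword pairs come out horizontal-first, and a lone `center` fills a single slot. The sketch below illustrates only that keyword-ordering step, under the assumption that lengths and percentages can be ignored for the purpose of the example. `sketch_order_position_keywords()` is a hypothetical helper, not a library function.

```php
<?php
// Hypothetical helper: reorders background-position keywords so that the
// horizontal keyword comes first. Lengths and percentages are not handled.
function sketch_order_position_keywords($value)
{
    $lookup = array(
        'left' => 'h', 'right' => 'h',
        'top' => 'v', 'bottom' => 'v',
        'center' => 'c',
    );
    $found = array('h' => false, 'v' => false, 'c' => false);

    foreach (explode(' ', strtolower(trim($value))) as $bit) {
        if ($bit === '' || !isset($lookup[$bit])) continue; // ignore unknown tokens
        $found[$lookup[$bit]] = $bit;
    }

    $ret = array();
    // Horizontal slot: explicit keyword, otherwise center.
    if ($found['h']) {
        $ret[] = $found['h'];
    } elseif ($found['c']) {
        $ret[] = $found['c'];
        $found['c'] = false; // a lone "center" fills only one slot
    }
    // Vertical slot: explicit keyword, otherwise any center still unused.
    if ($found['v']) {
        $ret[] = $found['v'];
    } elseif ($found['c']) {
        $ret[] = $found['c'];
    }

    return $ret ? implode(' ', $ret) : false;
}

echo sketch_order_position_keywords('top left'), "\n";      // prints: left top
echo sketch_order_position_keywords('bottom center'), "\n"; // prints: center bottom
```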
|
@@ -8,6 +8,11 @@ require_once 'HTMLPurifier/CSSDefinition.php';
|
||||
* @note We don't implement the whole CSS specification, so it might be
|
||||
* difficult to reuse this component in the context of validating
|
||||
* actual stylesheet declarations.
|
||||
* @note If we were really serious about validating the CSS, we would
|
||||
* tokenize the styles and then parse the tokens. Obviously, we
|
||||
* are not doing that. Doing that could seriously harm performance,
|
||||
* but would make these components a lot more viable for a CSS
|
||||
* filtering solution.
|
||||
*/
|
||||
class HTMLPurifier_AttrDef_CSS extends HTMLPurifier_AttrDef
|
||||
{
|
||||
@@ -20,6 +25,9 @@ class HTMLPurifier_AttrDef_CSS extends HTMLPurifier_AttrDef
|
||||
|
||||
// we're going to break the spec and explode by semicolons.
|
||||
// This is because semicolon rarely appears in escaped form
|
||||
// Doing this is generally flaky but fast
|
||||
// IT MIGHT APPEAR IN URIs, see HTMLPurifier_AttrDef_CSSURI
|
||||
// for details
|
||||
|
||||
$declarations = explode(';', $css);
|
||||
$propvalues = array();
|
||||
|
@@ -40,6 +40,7 @@ class HTMLPurifier_AttrDef_CSSLength extends HTMLPurifier_AttrDef
|
||||
|
||||
// we assume all units are two characters
|
||||
$unit = substr($length, $strlen - 2);
|
||||
if (!ctype_lower($unit)) $unit = strtolower($unit);
|
||||
$number = substr($length, 0, $strlen - 2);
|
||||
|
||||
if (!isset($this->units[$unit])) return false;
|
||||
|
58
library/HTMLPurifier/AttrDef/CSSURI.php
Normal file
@@ -0,0 +1,58 @@
|
||||
<?php
|
||||
|
||||
require_once 'HTMLPurifier/AttrDef/URI.php';
|
||||
|
||||
/**
|
||||
* Validates a URI in CSS syntax, which uses url('http://example.com')
|
||||
* @note While theoretically speaking a URI in a CSS document could
|
||||
* be non-embedded, as of CSS2 there is no such usage so we're
|
||||
* generalizing it. This may need to be changed in the future.
|
||||
* @warning Since HTMLPurifier_AttrDef_CSS blindly uses semicolons as
|
||||
* the separator, you cannot put a literal semicolon in
|
||||
* the URI. Try percent encoding it, in that case.
|
||||
*/
|
||||
class HTMLPurifier_AttrDef_CSSURI extends HTMLPurifier_AttrDef_URI
|
||||
{
|
||||
|
||||
function HTMLPurifier_AttrDef_CSSURI() {
|
||||
$this->HTMLPurifier_AttrDef_URI(true); // always embedded
|
||||
}
|
||||
|
||||
function validate($uri_string, $config, &$context) {
|
||||
// parse the URI out of the string and then pass it onto
|
||||
// the parent object
|
||||
|
||||
$uri_string = $this->parseCDATA($uri_string);
|
||||
if (strpos($uri_string, 'url(') !== 0) return false;
|
||||
$uri_string = substr($uri_string, 4);
|
||||
$new_length = strlen($uri_string) - 1;
|
||||
if ($uri_string[$new_length] != ')') return false;
|
||||
$uri = trim(substr($uri_string, 0, $new_length));
|
||||
|
||||
if (isset($uri[0]) && ($uri[0] == "'" || $uri[0] == '"')) {
|
||||
$quote = $uri[0];
|
||||
$new_length = strlen($uri) - 1;
|
||||
if ($uri[$new_length] !== $quote) return false;
|
||||
$uri = substr($uri, 1, $new_length - 1);
|
||||
}
|
||||
|
||||
$keys = array( '(', ')', ',', ' ', '"', "'");
|
||||
$values = array('\\(', '\\)', '\\,', '\\ ', '\\"', "\\'");
|
||||
$uri = str_replace($values, $keys, $uri);
|
||||
|
||||
$result = parent::validate($uri, $config, $context);
|
||||
|
||||
if ($result === false) return false;
|
||||
|
||||
// escape necessary characters according to CSS spec
|
||||
// except for the comma, none of these should appear in the
|
||||
// URI at all
|
||||
$result = str_replace($keys, $values, $result);
|
||||
|
||||
return "url($result)";
|
||||
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
?>
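The parsing half of the class above peels off the `url(` wrapper, strips an optional pair of matching quotes, and undoes CSS backslash escapes before the URI is validated. A small standalone sketch of just those steps follows. `sketch_extract_css_uri()` is a hypothetical function written for illustration; it does no URI validation and does not re-escape its result.

```php
<?php
// Hypothetical helper: returns the bare URI inside a CSS url(...) token,
// or false when the token is not of that form.
function sketch_extract_css_uri($token)
{
    $token = trim($token);
    if (strpos($token, 'url(') !== 0) return false;  // must start with url(
    if (substr($token, -1) !== ')') return false;    // must end with )

    $uri = trim(substr($token, 4, -1));
    if ($uri === '') return false;

    // Strip one pair of matching quotes, if present.
    $first = $uri[0];
    if ($first === '"' || $first === "'") {
        if (strlen($uri) < 2 || substr($uri, -1) !== $first) return false;
        $uri = substr($uri, 1, -1);
    }

    // Undo backslash escapes for the same characters the class above handles.
    $escaped   = array('\\(', '\\)', '\\,', '\\ ', '\\"', "\\'");
    $unescaped = array('(',   ')',   ',',   ' ',   '"',   "'");
    return str_replace($escaped, $unescaped, $uri);
}

echo sketch_extract_css_uri("url('http://example.com/a.png')"), "\n";
// prints: http://example.com/a.png
```

Because the surrounding style validator splits declarations on semicolons before any of this runs, a literal semicolon inside the URI never reaches a function like this intact, which is why the docblock recommends percent-encoding it.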
|
@@ -4,8 +4,7 @@ require_once 'HTMLPurifier/AttrDef.php';
|
||||
|
||||
/**
|
||||
* Validates shorthand CSS property list-style.
|
||||
* @note This currently does not support list-style-image, as that functionality
|
||||
* is not implemented yet elsewhere.
|
||||
* @warning Does not support url tokens that have internal spaces.
|
||||
*/
|
||||
class HTMLPurifier_AttrDef_ListStyle extends HTMLPurifier_AttrDef
|
||||
{
|
||||
@@ -20,6 +19,7 @@ class HTMLPurifier_AttrDef_ListStyle extends HTMLPurifier_AttrDef
|
||||
$def = $config->getCSSDefinition();
|
||||
$this->info['list-style-type'] = $def->info['list-style-type'];
|
||||
$this->info['list-style-position'] = $def->info['list-style-position'];
|
||||
$this->info['list-style-image'] = $def->info['list-style-image'];
|
||||
}
|
||||
|
||||
function validate($string, $config, &$context) {
|
||||
@@ -28,48 +28,50 @@ class HTMLPurifier_AttrDef_ListStyle extends HTMLPurifier_AttrDef
|
||||
$string = $this->parseCDATA($string);
|
||||
if ($string === '') return false;
|
||||
|
||||
// assumes URI doesn't have spaces in it
|
||||
$bits = explode(' ', strtolower($string)); // bits to process
|
||||
|
||||
$caught_type = false;
|
||||
$caught_position = false;
|
||||
$caught_none = false; // as in keyword none, which is in all of them
|
||||
$caught = array();
|
||||
$caught['type'] = false;
|
||||
$caught['position'] = false;
|
||||
$caught['image'] = false;
|
||||
|
||||
$ret = '';
|
||||
$i = 0; // number of catches
|
||||
$none = false;
|
||||
|
||||
foreach ($bits as $bit) {
|
||||
if ($caught_none && ($caught_type || $caught_position)) break;
|
||||
if ($caught_type && $caught_position) break;
|
||||
|
||||
if ($i >= 3) return; // optimization bit
|
||||
if ($bit === '') continue;
|
||||
|
||||
if ($bit === 'none') {
|
||||
if ($caught_none) continue;
|
||||
$caught_none = true;
|
||||
$ret .= 'none ';
|
||||
continue;
|
||||
}
|
||||
|
||||
// if we add anymore, roll it into a loop
|
||||
|
||||
$r = $this->info['list-style-type']->validate($bit, $config, $context);
|
||||
if ($r !== false) {
|
||||
if ($caught_type) continue;
|
||||
$caught_type = true;
|
||||
$ret .= $r . ' ';
|
||||
continue;
|
||||
}
|
||||
|
||||
$r = $this->info['list-style-position']->validate($bit, $config, $context);
|
||||
if ($r !== false) {
|
||||
if ($caught_position) continue;
|
||||
$caught_position = true;
|
||||
$ret .= $r . ' ';
|
||||
continue;
|
||||
foreach ($caught as $key => $status) {
|
||||
if ($status !== false) continue;
|
||||
$r = $this->info['list-style-' . $key]->validate($bit, $config, $context);
|
||||
if ($r === false) continue;
|
||||
if ($r === 'none') {
|
||||
if ($none) continue;
|
||||
else $none = true;
|
||||
if ($key == 'image') continue;
|
||||
}
|
||||
$caught[$key] = $r;
|
||||
$i++;
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
$ret = rtrim($ret);
|
||||
return $ret ? $ret : false;
|
||||
if (!$i) return false;
|
||||
|
||||
$ret = array();
|
||||
|
||||
// construct type
|
||||
if ($caught['type']) $ret[] = $caught['type'];
|
||||
|
||||
// construct image
|
||||
if ($caught['image']) $ret[] = $caught['image'];
|
||||
|
||||
// construct position
|
||||
if ($caught['position']) $ret[] = $caught['position'];
|
||||
|
||||
if (empty($ret)) return false;
|
||||
return implode(' ', $ret);
|
||||
|
||||
}
|
||||
|
||||
|
@@ -4,14 +4,13 @@ require_once 'HTMLPurifier/AttrDef.php';
|
||||
require_once 'HTMLPurifier/AttrDef/Number.php';
|
||||
|
||||
/**
|
||||
* Validates a Percentage as defined by the HTML spec.
|
||||
* @note This also allows integer pixel values.
|
||||
* Validates a Percentage as defined by the CSS spec.
|
||||
*/
|
||||
class HTMLPurifier_AttrDef_Percentage extends HTMLPurifier_AttrDef
|
||||
{
|
||||
|
||||
/**
|
||||
* Instance of HTMLPurifier_AttrDef_Number to defer pixel validation
|
||||
* Instance of HTMLPurifier_AttrDef_Number to defer number validation
|
||||
*/
|
||||
var $number_def;
|
||||
|
||||
|
@@ -139,10 +139,10 @@ class HTMLPurifier_AttrDef_URI extends HTMLPurifier_AttrDef
|
||||
// no need to validate the scheme's fmt since we do that when we
|
||||
// retrieve the specific scheme object from the registry
|
||||
$scheme = ctype_lower($scheme) ? $scheme : strtolower($scheme);
|
||||
$scheme_obj =& $registry->getScheme($scheme, $config, $context);
|
||||
$scheme_obj = $registry->getScheme($scheme, $config, $context);
|
||||
if (!$scheme_obj) return false; // invalid scheme, clean it out
|
||||
} else {
|
||||
$scheme_obj =& $registry->getScheme(
|
||||
$scheme_obj = $registry->getScheme(
|
||||
$config->get('URI', 'DefaultScheme'), $config, $context
|
||||
);
|
||||
}
|
||||
|
@@ -20,7 +20,7 @@ HTMLPurifier_ConfigSchema::defineAllowedValues(
|
||||
class HTMLPurifier_AttrTransform_BdoDir extends HTMLPurifier_AttrTransform
|
||||
{
|
||||
|
||||
function transform($attr, $config, $context) {
|
||||
function transform($attr, $config, &$context) {
|
||||
if (isset($attr['dir'])) return $attr;
|
||||
$attr['dir'] = $config->get('Attr', 'DefaultTextDir');
|
||||
return $attr;
|
||||
|
@@ -25,7 +25,7 @@ HTMLPurifier_ConfigSchema::define(
|
||||
class HTMLPurifier_AttrTransform_ImgRequired extends HTMLPurifier_AttrTransform
|
||||
{
|
||||
|
||||
function transform($attr, $config, $context) {
|
||||
function transform($attr, $config, &$context) {
|
||||
|
||||
$src = true;
|
||||
if (!isset($attr['src'])) {
|
||||
|
@@ -10,7 +10,7 @@ require_once 'HTMLPurifier/AttrTransform.php';
|
||||
class HTMLPurifier_AttrTransform_Lang extends HTMLPurifier_AttrTransform
|
||||
{
|
||||
|
||||
function transform($attr, $config, $context) {
|
||||
function transform($attr, $config, &$context) {
|
||||
|
||||
$lang = isset($attr['lang']) ? $attr['lang'] : false;
|
||||
$xml_lang = isset($attr['xml:lang']) ? $attr['xml:lang'] : false;
|
||||
|
@@ -8,7 +8,7 @@ require_once 'HTMLPurifier/AttrTransform.php';
|
||||
class HTMLPurifier_AttrTransform_TextAlign
|
||||
extends HTMLPurifier_AttrTransform {
|
||||
|
||||
function transform($attr, $config, $context) {
|
||||
function transform($attr, $config, &$context) {
|
||||
|
||||
if (!isset($attr['align'])) return $attr;
|
||||
|
||||
|
@@ -11,6 +11,9 @@ require_once 'HTMLPurifier/AttrDef/FontFamily.php';
|
||||
require_once 'HTMLPurifier/AttrDef/Font.php';
|
||||
require_once 'HTMLPurifier/AttrDef/Border.php';
|
||||
require_once 'HTMLPurifier/AttrDef/ListStyle.php';
|
||||
require_once 'HTMLPurifier/AttrDef/CSSURI.php';
|
||||
require_once 'HTMLPurifier/AttrDef/BackgroundPosition.php';
|
||||
require_once 'HTMLPurifier/AttrDef/Background.php';
|
||||
|
||||
/**
|
||||
* Defines allowed CSS attributes and what their values are.
|
||||
@@ -51,11 +54,19 @@ class HTMLPurifier_CSSDefinition
|
||||
$this->info['font-variant'] = new HTMLPurifier_AttrDef_Enum(
|
||||
array('normal', 'small-caps'), false);
|
||||
|
||||
$uri_or_none = new HTMLPurifier_AttrDef_Composite(
|
||||
array(
|
||||
new HTMLPurifier_AttrDef_Enum(array('none')),
|
||||
new HTMLPurifier_AttrDef_CSSURI()
|
||||
)
|
||||
);
|
||||
|
||||
$this->info['list-style-position'] = new HTMLPurifier_AttrDef_Enum(
|
||||
array('inside', 'outside'), false);
|
||||
$this->info['list-style-type'] = new HTMLPurifier_AttrDef_Enum(
|
||||
array('disc', 'circle', 'square', 'decimal', 'lower-roman',
|
||||
'upper-roman', 'lower-alpha', 'upper-alpha'), false);
|
||||
'upper-roman', 'lower-alpha', 'upper-alpha', 'none'), false);
|
||||
$this->info['list-style-image'] = $uri_or_none;
|
||||
|
||||
$this->info['list-style'] = new HTMLPurifier_AttrDef_ListStyle($config);
|
||||
|
||||
@@ -63,14 +74,14 @@ class HTMLPurifier_CSSDefinition
|
||||
array('capitalize', 'uppercase', 'lowercase', 'none'), false);
|
||||
$this->info['color'] = new HTMLPurifier_AttrDef_Color();
|
||||
|
||||
// technically speaking, this one should get its own validator, but
|
||||
// since we don't support background images, it effectively is
|
||||
// equivalent to color. The only trouble is that if the author
|
||||
// specifies an image and a color, they'll both end up getting dropped,
|
||||
// even though we ought to implement it and just discard the image
|
||||
// info. This will be fixed in a later version (see TODO) when
|
||||
// better URI filtering is implemented.
|
||||
$this->info['background'] =
|
||||
$this->info['background-image'] = $uri_or_none;
|
||||
$this->info['background-repeat'] = new HTMLPurifier_AttrDef_Enum(
|
||||
array('repeat', 'repeat-x', 'repeat-y', 'no-repeat')
|
||||
);
|
||||
$this->info['background-attachment'] = new HTMLPurifier_AttrDef_Enum(
|
||||
array('scroll', 'fixed')
|
||||
);
|
||||
$this->info['background-position'] = new HTMLPurifier_AttrDef_BackgroundPosition();
|
||||
|
||||
$border_color =
|
||||
$this->info['border-top-color'] =
|
||||
@@ -82,6 +93,8 @@ class HTMLPurifier_CSSDefinition
|
||||
new HTMLPurifier_AttrDef_Color()
|
||||
));
|
||||
|
||||
$this->info['background'] = new HTMLPurifier_AttrDef_Background($config);
|
||||
|
||||
$this->info['border-color'] = new HTMLPurifier_AttrDef_Multiple($border_color);
|
||||
|
||||
$border_width =
|
||||
|
@@ -46,23 +46,27 @@ class HTMLPurifier_Config
|
||||
|
||||
/**
|
||||
* Convenience constructor that creates a config object based on a mixed var
|
||||
* @static
|
||||
* @param mixed $config Variable that defines the state of the config
|
||||
* object. Can be: a HTMLPurifier_Config() object or
|
||||
* an array of directives based on loadArray().
|
||||
* object. Can be: a HTMLPurifier_Config() object,
|
||||
* an array of directives based on loadArray(),
|
||||
* or a string filename of an ini file.
|
||||
* @return Configured HTMLPurifier_Config object
|
||||
*/
|
||||
function create($config) {
|
||||
if (is_a($config, 'HTMLPurifier_Config')) return $config;
|
||||
static function create($config) {
|
||||
if ($config instanceof HTMLPurifier_Config) return $config;
|
||||
$ret = HTMLPurifier_Config::createDefault();
|
||||
if (is_array($config)) $ret->loadArray($config);
|
||||
if (is_string($config)) $ret->loadIni($config);
|
||||
elseif (is_array($config)) $ret->loadArray($config);
|
||||
return $ret;
|
||||
}
|
||||
|
||||
/**
|
||||
* Convenience constructor that creates a default configuration object.
|
||||
* @static
|
||||
* @return Default HTMLPurifier_Config object.
|
||||
*/
|
||||
function createDefault() {
|
||||
static function createDefault() {
|
||||
$definition =& HTMLPurifier_ConfigSchema::instance();
|
||||
$config = new HTMLPurifier_Config($definition);
|
||||
return $config;
|
||||
@@ -73,12 +77,17 @@ class HTMLPurifier_Config
|
||||
* @param $namespace String namespace
|
||||
* @param $key String key
|
||||
*/
|
||||
function get($namespace, $key) {
|
||||
function get($namespace, $key, $from_alias = false) {
|
||||
if (!isset($this->def->info[$namespace][$key])) {
|
||||
trigger_error('Cannot retrieve value of undefined directive',
|
||||
E_USER_WARNING);
|
||||
return;
|
||||
}
|
||||
if ($this->def->info[$namespace][$key]->class == 'alias') {
|
||||
trigger_error('Cannot get value from aliased directive, use real name',
|
||||
E_USER_ERROR);
|
||||
return;
|
||||
}
|
||||
return $this->conf[$namespace][$key];
|
||||
}
|
||||
|
||||
@@ -101,12 +110,22 @@ class HTMLPurifier_Config
|
||||
* @param $key String key
|
||||
* @param $value Mixed value
|
||||
*/
|
||||
function set($namespace, $key, $value) {
|
||||
function set($namespace, $key, $value, $from_alias = false) {
|
||||
if (!isset($this->def->info[$namespace][$key])) {
|
||||
trigger_error('Cannot set undefined directive to value',
|
||||
E_USER_WARNING);
|
||||
return;
|
||||
}
|
||||
if ($this->def->info[$namespace][$key]->class == 'alias') {
|
||||
if ($from_alias) {
|
||||
trigger_error('Double-aliases not allowed, please fix '.
|
||||
'ConfigSchema bug');
|
||||
}
|
||||
$this->set($this->def->info[$namespace][$key]->namespace,
|
||||
$this->def->info[$namespace][$key]->name,
|
||||
$value, true);
|
||||
return;
|
||||
}
|
||||
$value = $this->def->validate(
|
||||
$value,
|
||||
$this->def->info[$namespace][$key]->type,
|
||||
@@ -176,6 +195,15 @@ class HTMLPurifier_Config
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Loads configuration values from an ini file
|
||||
* @param $filename Name of ini file
|
||||
*/
|
||||
function loadIni($filename) {
|
||||
$array = parse_ini_file($filename, true);
|
||||
$this->loadArray($array);
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
?>
|
||||
?>
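Note on the new `loadIni()` path: `parse_ini_file()` with its second argument set to `true` returns one sub-array per `[section]`, which presumably lines up with the namespace-to-directive shape that `loadArray()` consumes. The sketch below is self-contained and does not load HTML Purifier; the ini contents and the temporary file are assumptions made for the demo.

```php
<?php
// Hypothetical ini contents; the section and key names mirror directives
// mentioned in this diff, but the file itself is made up for the demo.
$ini = "[Core]\nEncoding = \"iso-8859-1\"\nEscapeNonASCIICharacters = true\n";

$path = tempnam(sys_get_temp_dir(), 'hp_ini_');
file_put_contents($path, $ini);

// Second argument true => one sub-array per [section].
$array = parse_ini_file($path, true);
unlink($path);

print_r($array);
```

In the default scanner mode the boolean keyword `true` comes back as the string `"1"`, so any type validation has to happen afterwards, which is what `set()` appears to do through `$this->def->validate()`.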
|
||||
|
@@ -67,8 +67,9 @@ class HTMLPurifier_ConfigSchema {
|
||||
|
||||
/**
|
||||
* Retrieves an instance of the application-wide configuration definition.
|
||||
* @static
|
||||
*/
|
||||
function &instance($prototype = null) {
|
||||
static function &instance($prototype = null) {
|
||||
static $instance;
|
||||
if ($prototype !== null) {
|
||||
$instance = $prototype;
|
||||
@@ -81,6 +82,7 @@ class HTMLPurifier_ConfigSchema {
|
||||
|
||||
/**
|
||||
* Defines a directive for configuration
|
||||
* @static
|
||||
* @warning Will fail if directive's namespace is not defined
|
||||
* @param $namespace Namespace the directive is in
|
||||
* @param $name Key of directive
|
||||
@@ -89,7 +91,7 @@ class HTMLPurifier_ConfigSchema {
|
||||
* HTMLPurifier_DirectiveDef::$type for allowed values
|
||||
* @param $description Description of directive for documentation
|
||||
*/
|
||||
function define(
|
||||
static function define(
|
||||
$namespace, $name, $default, $type,
|
||||
$description
|
||||
) {
|
||||
@@ -104,6 +106,11 @@ class HTMLPurifier_ConfigSchema {
|
||||
E_USER_ERROR);
|
||||
return;
|
||||
}
|
||||
if (empty($description)) {
|
||||
trigger_error('Description must be non-empty',
|
||||
E_USER_ERROR);
|
||||
return;
|
||||
}
|
||||
if (isset($def->info[$namespace][$name])) {
|
||||
if (
|
||||
$def->info[$namespace][$name]->type !== $type ||
|
||||
@@ -144,10 +151,11 @@ class HTMLPurifier_ConfigSchema {
|
||||
|
||||
/**
|
||||
* Defines a namespace for directives to be put into.
|
||||
* @static
|
||||
* @param $namespace Namespace's name
|
||||
* @param $description Description of the namespace
|
||||
*/
|
||||
function defineNamespace($namespace, $description) {
|
||||
static function defineNamespace($namespace, $description) {
|
||||
$def =& HTMLPurifier_ConfigSchema::instance();
|
||||
if (isset($def->info[$namespace])) {
|
||||
trigger_error('Cannot redefine namespace', E_USER_ERROR);
|
||||
@@ -158,6 +166,11 @@ class HTMLPurifier_ConfigSchema {
|
||||
E_USER_ERROR);
|
||||
return;
|
||||
}
|
||||
if (empty($description)) {
|
||||
trigger_error('Description must be non-empty',
|
||||
E_USER_ERROR);
|
||||
return;
|
||||
}
|
||||
$def->info[$namespace] = array();
|
||||
$def->info_namespace[$namespace] = new HTMLPurifier_ConfigEntity_Namespace();
|
||||
$def->info_namespace[$namespace]->description = $description;
|
||||
@@ -169,12 +182,13 @@ class HTMLPurifier_ConfigSchema {
|
||||
*
|
||||
* Directive value aliases are convenient for developers because it lets
|
||||
* them set a directive to several values and get the same result.
|
||||
* @static
|
||||
* @param $namespace Directive's namespace
|
||||
* @param $name Name of Directive
|
||||
* @param $alias Name of aliased value
|
||||
* @param $real Value aliased value will be converted into
|
||||
*/
|
||||
function defineValueAliases($namespace, $name, $aliases) {
|
||||
static function defineValueAliases($namespace, $name, $aliases) {
|
||||
$def =& HTMLPurifier_ConfigSchema::instance();
|
||||
if (!isset($def->info[$namespace][$name])) {
|
||||
trigger_error('Cannot set value alias for non-existent directive',
|
||||
@@ -200,23 +214,78 @@ class HTMLPurifier_ConfigSchema {
|
||||
|
||||
/**
|
||||
* Defines a set of allowed values for a directive.
|
||||
* @static
|
||||
* @param $namespace Namespace of directive
|
||||
* @param $name Name of directive
|
||||
* @param $allowed_values Arraylist of allowed values
|
||||
*/
|
||||
function defineAllowedValues($namespace, $name, $allowed_values) {
|
||||
static function defineAllowedValues($namespace, $name, $allowed_values) {
|
||||
$def =& HTMLPurifier_ConfigSchema::instance();
|
||||
if (!isset($def->info[$namespace][$name])) {
|
||||
trigger_error('Cannot define allowed values for undefined directive',
|
||||
E_USER_ERROR);
|
||||
return;
|
||||
}
|
||||
if ($def->info[$namespace][$name]->allowed === true) {
|
||||
$def->info[$namespace][$name]->allowed = array();
|
||||
$directive =& $def->info[$namespace][$name];
|
||||
$type = $directive->type;
|
||||
if ($type != 'string' && $type != 'istring') {
|
||||
trigger_error('Cannot define allowed values for directive whose type is not string',
|
||||
E_USER_ERROR);
|
||||
return;
|
||||
}
|
||||
if ($directive->allowed === true) {
|
||||
$directive->allowed = array();
|
||||
}
|
||||
foreach ($allowed_values as $value) {
|
||||
$def->info[$namespace][$name]->allowed[$value] = true;
|
||||
$directive->allowed[$value] = true;
|
||||
}
|
||||
if ($def->defaults[$namespace][$name] !== null &&
|
||||
!isset($directive->allowed[$def->defaults[$namespace][$name]])) {
|
||||
trigger_error('Default value must be in allowed range of variables',
|
||||
E_USER_ERROR);
|
||||
$directive->allowed = true; // undo undo!
|
||||
return;
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Defines a directive alias for backwards compatibility
|
||||
* @static
|
||||
* @param $namespace
|
||||
* @param $name Directive that will be aliased
|
||||
* @param $new_namespace
|
||||
* @param $new_name Directive that the alias will be to
|
||||
*/
|
||||
static function defineAlias($namespace, $name, $new_namespace, $new_name) {
|
||||
$def =& HTMLPurifier_ConfigSchema::instance();
|
||||
if (!isset($def->info[$namespace])) {
|
||||
trigger_error('Cannot define directive alias in undefined namespace',
|
||||
E_USER_ERROR);
|
||||
return;
|
||||
}
|
||||
if (!ctype_alnum($name)) {
|
||||
trigger_error('Directive name must be alphanumeric',
|
||||
E_USER_ERROR);
|
||||
return;
|
||||
}
|
||||
if (isset($def->info[$namespace][$name])) {
|
||||
trigger_error('Cannot define alias over directive',
|
||||
E_USER_ERROR);
|
||||
return;
|
||||
}
|
||||
if (!isset($def->info[$new_namespace][$new_name])) {
|
||||
trigger_error('Cannot define alias to undefined directive',
|
||||
E_USER_ERROR);
|
||||
return;
|
||||
}
|
||||
if ($def->info[$new_namespace][$new_name]->class == 'alias') {
|
||||
trigger_error('Cannot define alias to alias',
|
||||
E_USER_ERROR);
|
||||
return;
|
||||
}
|
||||
$def->info[$namespace][$name] =
|
||||
new HTMLPurifier_ConfigEntity_DirectiveAlias(
|
||||
$new_namespace, $new_name);
|
||||
}
|
||||
|
||||
/**
|
||||
@@ -305,7 +374,7 @@ class HTMLPurifier_ConfigSchema {
|
||||
*/
|
||||
function isError($var) {
|
||||
if (!is_object($var)) return false;
|
||||
if (!is_a($var, 'HTMLPurifier_Error')) return false;
|
||||
if (!($var instanceof HTMLPurifier_Error)) return false;
|
||||
return true;
|
||||
}
|
||||
}
|
||||
@@ -313,13 +382,21 @@ class HTMLPurifier_ConfigSchema {
|
||||
/**
|
||||
* Base class for configuration entity
|
||||
*/
|
||||
class HTMLPurifier_ConfigEntity {}
|
||||
class HTMLPurifier_ConfigEntity {
|
||||
var $class = false;
|
||||
}
|
||||
|
||||
/**
|
||||
* Structure object describing a namespace
|
||||
*/
|
||||
class HTMLPurifier_ConfigEntity_Namespace extends HTMLPurifier_ConfigEntity {
|
||||
|
||||
function HTMLPurifier_ConfigEntity_Namespace($description = null) {
|
||||
$this->description = $description;
|
||||
}
|
||||
|
||||
var $class = 'namespace';
|
||||
|
||||
/**
|
||||
* String description of what kinds of directives go in this namespace.
|
||||
*/
|
||||
@@ -334,15 +411,21 @@ class HTMLPurifier_ConfigEntity_Namespace extends HTMLPurifier_ConfigEntity {
|
||||
class HTMLPurifier_ConfigEntity_Directive extends HTMLPurifier_ConfigEntity
|
||||
{
|
||||
|
||||
/**
|
||||
* Hash of value aliases, i.e. values that are equivalent.
|
||||
*/
|
||||
var $aliases = array();
|
||||
var $class = 'directive';
|
||||
|
||||
/**
|
||||
* Lookup table of allowed values of the element, bool true if all allowed.
|
||||
*/
|
||||
var $allowed = true;
|
||||
function HTMLPurifier_ConfigEntity_Directive(
|
||||
$type = null,
|
||||
$descriptions = null,
|
||||
$allow_null = null,
|
||||
$allowed = null,
|
||||
$aliases = null
|
||||
) {
|
||||
if ( $type !== null) $this->type = $type;
|
||||
if ($descriptions !== null) $this->descriptions = $descriptions;
|
||||
if ( $allow_null !== null) $this->allow_null = $allow_null;
|
||||
if ( $allowed !== null) $this->allowed = $allowed;
|
||||
if ( $aliases !== null) $this->aliases = $aliases;
|
||||
}
|
||||
|
||||
/**
|
||||
* Allowed type of the directive. Values are:
|
||||
@@ -359,16 +442,26 @@ class HTMLPurifier_ConfigEntity_Directive extends HTMLPurifier_ConfigEntity
|
||||
var $type = 'mixed';
|
||||
|
||||
/**
|
||||
* Is null allowed? Has no affect for mixed type.
|
||||
* Plaintext descriptions of the configuration entity. Organized by
|
||||
* file and line number, so multiple descriptions are allowed.
|
||||
*/
|
||||
var $descriptions = array();
|
||||
|
||||
/**
|
||||
* Is null allowed? Has no effect for mixed type.
|
||||
* @bool
|
||||
*/
|
||||
var $allow_null = false;
|
||||
|
||||
/**
|
||||
* Plaintext descriptions of the configuration entity. Organized by
|
||||
* file and line number, so multiple descriptions are allowed.
|
||||
* Lookup table of allowed values of the element, bool true if all allowed.
|
||||
*/
|
||||
var $descriptions = array();
|
||||
var $allowed = true;
|
||||
|
||||
/**
|
||||
* Hash of value aliases, i.e. values that are equivalent.
|
||||
*/
|
||||
var $aliases = array();
|
||||
|
||||
/**
|
||||
* Adds a description to the array
|
||||
@@ -380,4 +473,26 @@ class HTMLPurifier_ConfigEntity_Directive extends HTMLPurifier_ConfigEntity
|
||||
|
||||
}
|
||||
|
||||
?>
|
||||
/**
|
||||
* Structure object describing a directive alias
|
||||
*/
|
||||
class HTMLPurifier_ConfigEntity_DirectiveAlias extends HTMLPurifier_ConfigEntity
|
||||
{
|
||||
var $class = 'alias';
|
||||
|
||||
/**
|
||||
* Namespace being aliased to
|
||||
*/
|
||||
var $namespace;
|
||||
/**
|
||||
* Directive being aliased to
|
||||
*/
|
||||
var $name;
|
||||
|
||||
function HTMLPurifier_ConfigEntity_DirectiveAlias($namespace, $name) {
|
||||
$this->namespace = $namespace;
|
||||
$this->name = $name;
|
||||
}
|
||||
}
|
||||
|
||||
?>
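Note on directive aliases: the interplay between `defineAlias()` and the `$from_alias` flag in `set()` can be sketched in isolation. Everything below is a hypothetical stand-in (the `TinyConfig` class, the array-based schema, the `CharEncoding` alias name); it only mirrors the control flow visible in this diff. One deliberate difference: the sketch returns after reporting a double alias instead of recursing again.

```php
<?php
// Minimal stand-in for the schema: 'alias' entries point at a real directive.
// All names here (TinyConfig, Core.CharEncoding) are hypothetical.
class TinyConfig
{
    private $info = array(
        'Core' => array(
            'Encoding'     => array('class' => 'directive'),
            'CharEncoding' => array('class' => 'alias',
                                    'namespace' => 'Core', 'name' => 'Encoding'),
        ),
    );
    private $conf = array();

    public function set($namespace, $key, $value, $fromAlias = false)
    {
        if (!isset($this->info[$namespace][$key])) {
            trigger_error('Cannot set undefined directive to value', E_USER_WARNING);
            return;
        }
        $entry = $this->info[$namespace][$key];
        if ($entry['class'] === 'alias') {
            if ($fromAlias) {
                // An alias resolved to another alias: refuse to recurse further.
                trigger_error('Double-aliases not allowed', E_USER_WARNING);
                return;
            }
            // Forward the value to the real directive, flagging the hop.
            $this->set($entry['namespace'], $entry['name'], $value, true);
            return;
        }
        $this->conf[$namespace][$key] = $value;
    }

    public function get($namespace, $key)
    {
        return isset($this->conf[$namespace][$key]) ? $this->conf[$namespace][$key] : null;
    }
}

$config = new TinyConfig();
$config->set('Core', 'CharEncoding', 'utf-8'); // written through the alias
echo $config->get('Core', 'Encoding'), "\n";   // stored under the real name
```

Values are only ever stored under the real directive name, which is consistent with `get()` in the diff refusing to read through an alias.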
|
||||
|
@@ -6,15 +6,29 @@ HTMLPurifier_ConfigSchema::define(
|
||||
'Core', 'Encoding', 'utf-8', 'istring',
|
||||
'If for some reason you are unable to convert all webpages to UTF-8, '.
|
||||
'you can use this directive as a stop-gap compatibility change to '.
|
||||
'let HTMLPurifier deal with non UTF-8 input. This technique has '.
|
||||
'let HTML Purifier deal with non UTF-8 input. This technique has '.
|
||||
'notable deficiencies: absolutely no characters outside of the selected '.
|
||||
'character encoding will be preserved, not even the ones that have '.
|
||||
'been ampersand escaped (this is due to a UTF-8 specific <em>feature</em> '.
|
||||
'that automatically resolves all entities), making it pretty useless '.
|
||||
'for anything except the most I18N-blind applications. This directive '.
|
||||
'for anything except the most I18N-blind applications, although '.
|
||||
'%Core.EscapeNonASCIICharacters offers a fix for this trouble with '.
|
||||
'another tradeoff. This directive '.
|
||||
'only accepts ISO-8859-1 if iconv is not enabled.'
|
||||
);
|
||||
|
||||
HTMLPurifier_ConfigSchema::define(
|
||||
'Core', 'EscapeNonASCIICharacters', false, 'bool',
|
||||
'This directive overcomes a deficiency in %Core.Encoding by blindly '.
|
||||
'converting all non-ASCII characters into decimal numeric entities before '.
|
||||
'converting it to its native encoding. This means that even '.
|
||||
'characters that can be expressed in the non-UTF-8 encoding will '.
|
||||
'be entity-ized, which can be a real downer for encodings like Big5. '.
|
||||
'It also assumes that the ASCII repertoire is available, although '.
|
||||
'this is the case for almost all encodings. Anyway, use UTF-8! This '.
|
||||
'directive has been available since 1.4.0.'
|
||||
);
|
||||
|
||||
if ( !function_exists('iconv') ) {
|
||||
// only encodings with native PHP support
|
||||
HTMLPurifier_ConfigSchema::defineAllowedValues(
|
||||
@@ -38,16 +52,25 @@ HTMLPurifier_ConfigSchema::define(
|
||||
|
||||
/**
|
||||
* A UTF-8 specific character encoder that handles cleaning and transforming.
|
||||
* @note All functions in this class should be static.
|
||||
*/
|
||||
class HTMLPurifier_Encoder
|
||||
{
|
||||
|
||||
/**
|
||||
* Constructor throws fatal error if you attempt to instantiate class
|
||||
*/
|
||||
function HTMLPurifier_Encoder() {
|
||||
trigger_error('Cannot instantiate encoder, call methods statically', E_USER_ERROR);
|
||||
}
|
||||
|
||||
/**
|
||||
* Cleans a UTF-8 string for well-formedness and SGML validity
|
||||
*
|
||||
* It will parse according to UTF-8 and return a valid UTF8 string, with
|
||||
* non-SGML codepoints excluded.
|
||||
*
|
||||
* @static
|
||||
* @note Just for reference, the non-SGML code points are 0 to 31 and
|
||||
* 127 to 159, inclusive. However, we allow code points 9, 10
|
||||
* and 13, which are the tab, line feed and carriage return
|
||||
@@ -67,7 +90,7 @@ class HTMLPurifier_Encoder
|
||||
* would need that, and I'm probably not going to implement them.
|
||||
* Once again, PHP 6 should solve all our problems.
|
||||
*/
|
||||
function cleanUTF8($str, $force_php = false) {
|
||||
static function cleanUTF8($str, $force_php = false) {
|
||||
|
||||
static $non_sgml_chars = array();
|
||||
if (empty($non_sgml_chars)) {
|
||||
@@ -225,6 +248,7 @@ class HTMLPurifier_Encoder
|
||||
|
||||
/**
|
||||
* Translates a Unicode codepoint into its corresponding UTF-8 character.
|
||||
* @static
|
||||
* @note Based on Feyd's function at
|
||||
* <http://forums.devnetwork.net/viewtopic.php?p=191404#191404>,
|
||||
* which is in public domain.
|
||||
@@ -249,7 +273,7 @@ class HTMLPurifier_Encoder
|
||||
// | 00000000 | 00010000 | 11111111 | 11111111 | Defined upper limit of legal scalar codes
|
||||
// +----------+----------+----------+----------+
|
||||
|
||||
function unichr($code) {
|
||||
static function unichr($code) {
|
||||
if($code > 1114111 or $code < 0 or
|
||||
($code >= 55296 and $code <= 57343) ) {
|
||||
// bits are set outside the "valid" range as defined
|
||||
@@ -288,8 +312,9 @@ class HTMLPurifier_Encoder
|
||||
|
||||
/**
|
||||
* Converts a string to UTF-8 based on configuration.
|
||||
* @static
|
||||
*/
|
||||
function convertToUTF8($str, $config, &$context) {
|
||||
static function convertToUTF8($str, $config, &$context) {
|
||||
static $iconv = null;
|
||||
if ($iconv === null) $iconv = function_exists('iconv');
|
||||
$encoding = $config->get('Core', 'Encoding');
|
||||
@@ -299,23 +324,77 @@ class HTMLPurifier_Encoder
|
||||
} elseif ($encoding === 'iso-8859-1') {
|
||||
return @utf8_encode($str);
|
||||
}
|
||||
trigger_error('Encoding not supported', E_USER_ERROR);
|
||||
}
|
||||
|
||||
/**
|
||||
* Converts a string from UTF-8 based on configuration.
|
||||
* @static
|
||||
* @note Currently, this is a lossy conversion, with unexpressable
|
||||
* characters being omitted.
|
||||
*/
|
||||
function convertFromUTF8($str, $config, &$context) {
|
||||
static function convertFromUTF8($str, $config, &$context) {
|
||||
static $iconv = null;
|
||||
if ($iconv === null) $iconv = function_exists('iconv');
|
||||
$encoding = $config->get('Core', 'Encoding');
|
||||
if ($encoding === 'utf-8') return $str;
|
||||
if ($config->get('Core', 'EscapeNonASCIICharacters')) {
|
||||
$str = HTMLPurifier_Encoder::convertToASCIIDumbLossless($str);
|
||||
}
|
||||
if ($iconv && !$config->get('Test', 'ForceNoIconv')) {
|
||||
return @iconv('utf-8', $encoding . '//IGNORE', $str);
|
||||
} elseif ($encoding === 'iso-8859-1') {
|
||||
return @utf8_decode($str);
|
||||
}
|
||||
trigger_error('Encoding not supported', E_USER_ERROR);
|
||||
}
|
||||
|
||||
/**
|
||||
* Lossless (character-wise) conversion of HTML to ASCII
|
||||
* @static
|
||||
* @param $str UTF-8 string to be converted to ASCII
|
||||
* @returns ASCII encoded string with non-ASCII character entity-ized
|
||||
* @warning Adapted from MediaWiki, claiming fair use: this is a common
|
||||
* algorithm. If you disagree with this license fudgery,
|
||||
* implement it yourself.
|
||||
* @note Uses decimal numeric entities since they are best supported.
|
||||
* @note This is a DUMB function: it has no concept of keeping
|
||||
* character entities that the projected character encoding
|
||||
* can allow. We could possibly implement a smart version
|
||||
* but that would require it to also know which Unicode
|
||||
* codepoints the charset supported (not an easy task).
|
||||
* @note Sort of like cleanUTF8(), but it assumes that $str is
|
||||
* well-formed UTF-8
|
||||
*/
|
||||
static function convertToASCIIDumbLossless($str) {
|
||||
$bytesleft = 0;
|
||||
$result = '';
|
||||
$working = 0;
|
||||
$len = strlen($str);
|
||||
for( $i = 0; $i < $len; $i++ ) {
|
||||
$bytevalue = ord( $str[$i] );
|
||||
if( $bytevalue <= 0x7F ) { //0xxx xxxx
|
||||
$result .= chr( $bytevalue );
|
||||
$bytesleft = 0;
|
||||
} elseif( $bytevalue <= 0xBF ) { //10xx xxxx
|
||||
$working = $working << 6;
|
||||
$working += ($bytevalue & 0x3F);
|
||||
$bytesleft--;
|
||||
if( $bytesleft <= 0 ) {
|
||||
$result .= "&#" . $working . ";";
|
||||
}
|
||||
} elseif( $bytevalue <= 0xDF ) { //110x xxxx
|
||||
$working = $bytevalue & 0x1F;
|
||||
$bytesleft = 1;
|
||||
} elseif( $bytevalue <= 0xEF ) { //1110 xxxx
|
||||
$working = $bytevalue & 0x0F;
|
||||
$bytesleft = 2;
|
||||
} else { //1111 0xxx
|
||||
$working = $bytevalue & 0x07;
|
||||
$bytesleft = 3;
|
||||
}
|
||||
}
|
||||
return $result;
|
||||
}
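Note on `convertToASCIIDumbLossless()`: the byte-walking logic can be tried outside the class. The sketch below restates it as a plain function; the name `ascii_entities` is made up for the demo, and, as in the docblock above, the input is assumed to be well-formed UTF-8.

```php
<?php
// Standalone sketch of the byte-walking approach; assumes $str is
// well-formed UTF-8, as the docblock in the diff also requires.
function ascii_entities($str)
{
    $result = '';
    $working = 0;   // code point being assembled
    $bytesLeft = 0; // continuation bytes still expected
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($str[$i]);
        if ($byte <= 0x7F) {            // 0xxxxxxx: plain ASCII, copy through
            $result .= chr($byte);
            $bytesLeft = 0;
        } elseif ($byte <= 0xBF) {      // 10xxxxxx: continuation byte
            $working = ($working << 6) + ($byte & 0x3F);
            if (--$bytesLeft <= 0) {
                $result .= '&#' . $working . ';';
            }
        } elseif ($byte <= 0xDF) {      // 110xxxxx: lead byte of a 2-byte sequence
            $working = $byte & 0x1F;
            $bytesLeft = 1;
        } elseif ($byte <= 0xEF) {      // 1110xxxx: lead byte of a 3-byte sequence
            $working = $byte & 0x0F;
            $bytesLeft = 2;
        } else {                        // 11110xxx: lead byte of a 4-byte sequence
            $working = $byte & 0x07;
            $bytesLeft = 3;
        }
    }
    return $result;
}

echo ascii_entities("caf\xC3\xA9 \xE2\x82\xAC5"), "\n"; // caf&#233; &#8364;5
```

Because every non-ASCII character becomes a decimal entity, the result survives a later conversion to any ASCII-compatible encoding, at the cost of entity-izing characters the target encoding could have represented directly.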
|
||||
|
||||
|
||||
|
@@ -26,9 +26,10 @@ class HTMLPurifier_EntityLookup {
|
||||
|
||||
/**
|
||||
* Retrieves sole instance of the object.
|
||||
* @static
|
||||
* @param Optional prototype of custom lookup table to overload with.
|
||||
*/
|
||||
function instance($prototype = false) {
|
||||
static function instance($prototype = false) {
|
||||
// no references, since PHP doesn't copy unless modified
|
||||
static $instance = null;
|
||||
if ($prototype) {
|
||||
|
39
library/HTMLPurifier/Filter.php
Normal file
@@ -0,0 +1,39 @@
|
||||
<?php
|
||||
|
||||
/**
|
||||
* Represents a pre or post processing filter on HTML Purifier's output
|
||||
*
|
||||
* Sometimes, a little ad-hoc fixing of HTML has to be done before
|
||||
* it gets sent through HTML Purifier: you can use filters to achieve
|
||||
* this effect. For instance, YouTube videos can be preserved in
|
||||
* this manner. You could have used a decorator for this task, but
|
||||
* PHP's support for them is not terribly robust, so we're going
|
||||
* to just loop through the filters.
|
||||
*
|
||||
* Filters should be exited first in, last out. If there are three filters,
|
||||
* named 1, 2 and 3, the order of execution should go 1->preFilter,
|
||||
* 2->preFilter, 3->preFilter, purify, 3->postFilter, 2->postFilter,
|
||||
* 1->postFilter.
|
||||
*/
|
||||
|
||||
class HTMLPurifier_Filter
|
||||
{
|
||||
|
||||
/**
|
||||
* Name of the filter for identification purposes
|
||||
*/
|
||||
var $name;
|
||||
|
||||
/**
|
||||
* Pre-processor function, handles HTML before HTML Purifier
|
||||
*/
|
||||
function preFilter($html, $config, &$context) {}
|
||||
|
||||
/**
|
||||
* Post-processor function, handles HTML after HTML Purifier
|
||||
*/
|
||||
function postFilter($html, $config, &$context) {}
|
||||
|
||||
}
|
||||
|
||||
?>
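Note on filter ordering: the first-in, last-out order described in the docblock can be sketched as two loops around the purification step. The `TraceFilter` class, the `run_with_filters()` helper and the `$log` parameter are hypothetical and exist only to record the call order; this is not the loop HTML Purifier itself uses.

```php
<?php
// Hypothetical filter that records when its hooks fire.
class TraceFilter
{
    public $name;

    public function __construct($name)
    {
        $this->name = $name;
    }

    public function preFilter($html, array &$log)
    {
        $log[] = $this->name . '->preFilter';
        return $html;
    }

    public function postFilter($html, array &$log)
    {
        $log[] = $this->name . '->postFilter';
        return $html;
    }
}

function run_with_filters($html, array $filters, array &$log)
{
    // Pre-filters run in registration order...
    foreach ($filters as $filter) {
        $html = $filter->preFilter($html, $log);
    }
    $log[] = 'purify'; // stand-in for the real purification step
    // ...and post-filters unwind in reverse order.
    foreach (array_reverse($filters) as $filter) {
        $html = $filter->postFilter($html, $log);
    }
    return $html;
}

$log = array();
$filters = array(new TraceFilter('1'), new TraceFilter('2'), new TraceFilter('3'));
run_with_filters('<b>x</b>', $filters, $log);
echo implode(', ', $log), "\n";
```

Unwinding in reverse means each filter's `postFilter` sees the document in the state its own `preFilter` left it, with the placeholders from later filters already restored.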
|
34
library/HTMLPurifier/Filter/YouTube.php
Normal file
@@ -0,0 +1,34 @@
|
||||
<?php
|
||||
|
||||
require_once 'HTMLPurifier/Filter.php';
|
||||
|
||||
class HTMLPurifier_Filter_YouTube extends HTMLPurifier_Filter
|
||||
{
|
||||
|
||||
var $name = 'YouTube preservation';
|
||||
|
||||
function preFilter($html, $config, &$context) {
|
||||
$pre_regex = '#<object[^>]+>.+?'.
|
||||
'http://www.youtube.com/v/([A-Za-z0-9\-_]+).+?</object>#';
|
||||
$pre_replace = '<span class="youtube-embed">\1</span>';
|
||||
return preg_replace($pre_regex, $pre_replace, $html);
|
||||
}
|
||||
|
||||
function postFilter($html, $config, &$context) {
|
||||
$post_regex = '#<span class="youtube-embed">([A-Za-z0-9\-_]+)</span>#';
|
||||
$post_replace = '<object width="425" height="350" '.
|
||||
'data="http://www.youtube.com/v/\1">'.
|
||||
'<param name="movie" value="http://www.youtube.com/v/\1"></param>'.
|
||||
'<param name="wmode" value="transparent"></param>'.
|
||||
'<!--[if IE]>'.
|
||||
'<embed src="http://www.youtube.com/v/\1" '.
|
||||
'type="application/x-shockwave-flash" '.
|
||||
'wmode="transparent" width="425" height="350" />'.
|
||||
'<![endif]-->'.
|
||||
'</object>';
|
||||
return preg_replace($post_regex, $post_replace, $html);
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
?>
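Note on the YouTube filter: the `preFilter` pattern can be exercised on its own. The regex and replacement below are copied from the diff; the sample embed markup and the video ID `AbC-_123` are made up for the demo.

```php
<?php
// Regex and replacement copied from the preFilter in the diff above.
$pre_regex = '#<object[^>]+>.+?'.
    'http://www.youtube.com/v/([A-Za-z0-9\-_]+).+?</object>#';
$pre_replace = '<span class="youtube-embed">\1</span>';

// Made-up embed markup on a single line ("." does not match newlines
// unless the s modifier is added to the pattern).
$html = '<object width="425" height="350">'.
    '<param name="movie" value="http://www.youtube.com/v/AbC-_123"></param>'.
    '</object>';

$placeholder = preg_replace($pre_regex, $pre_replace, $html);
echo $placeholder, "\n"; // <span class="youtube-embed">AbC-_123</span>
```

The placeholder span contains only characters from the ID whitelist, so it passes through purification untouched and `postFilter` can expand it back into a known-safe embed.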
|
@@ -300,9 +300,6 @@ class HTMLPurifier_HTMLDefinition
|
||||
$this->info['b']->child =
|
||||
$this->info['big']->child =
|
||||
$this->info['small']->child=
|
||||
$this->info['u']->child =
|
||||
$this->info['s']->child =
|
||||
$this->info['strike']->child =
|
||||
$this->info['bdo']->child =
|
||||
$this->info['span']->child =
|
||||
$this->info['dt']->child =
|
||||
@@ -314,6 +311,12 @@ class HTMLPurifier_HTMLDefinition
|
||||
$this->info['h5']->child =
|
||||
$this->info['h6']->child = $e_Inline;
|
||||
|
||||
if (!$this->strict) {
|
||||
$this->info['u']->child =
|
||||
$this->info['s']->child =
|
||||
$this->info['strike']->child = $e_Inline;
|
||||
}
|
||||
|
||||
// the only three required definitions, besides custom table code
|
||||
$this->info['ol']->child =
|
||||
$this->info['ul']->child = new HTMLPurifier_ChildDef_Required('li');
|
||||
@@ -355,10 +358,12 @@ class HTMLPurifier_HTMLDefinition
|
||||
// reuses $e_Inline and $e_Block
|
||||
foreach ($e_Inline->elements as $name => $bool) {
|
||||
if ($name == '#PCDATA') continue;
|
||||
if (!isset($this->info[$name])) continue;
|
||||
$this->info[$name]->type = 'inline';
|
||||
}
|
||||
|
||||
foreach ($e_Block->elements as $name => $bool) {
|
||||
if (!isset($this->info[$name])) continue;
|
||||
$this->info[$name]->type = 'block';
|
||||
}
|
||||
|
||||
@@ -531,7 +536,7 @@ class HTMLPurifier_HTMLDefinition
|
||||
|
||||
// protect against stdclasses floating around
|
||||
foreach ($this->info as $key => $obj) {
|
||||
if (is_a($obj, 'stdclass')) {
|
||||
if ($obj instanceof stdClass) {
|
||||
unset($this->info[$key]);
|
||||
}
|
||||
}
|
||||
@@ -648,4 +653,4 @@ class HTMLPurifier_ElementDef
|
||||
|
||||
}
|
||||
|
||||
?>
|
||||
?>
|
||||
|
@@ -56,7 +56,6 @@ class HTMLPurifier_Lexer
|
||||
{
|
||||
|
||||
function HTMLPurifier_Lexer() {
|
||||
$this->_encoder = new HTMLPurifier_Encoder();
|
||||
$this->_entity_parser = new HTMLPurifier_EntityParser();
|
||||
}
|
||||
|
||||
@@ -114,8 +113,6 @@ class HTMLPurifier_Lexer
|
||||
return $string;
|
||||
}
|
||||
|
||||
var $_encoder;
|
||||
|
||||
/**
|
||||
* Lexes an HTML string into tokens.
|
||||
*
|
||||
@@ -138,6 +135,8 @@ class HTMLPurifier_Lexer
|
||||
* default with your own implementation. A copy/reference of the prototype
|
||||
* lexer will now be returned when you request a new lexer.
|
||||
*
|
||||
* @static
|
||||
*
|
||||
* @note
|
||||
* Though it is possible to call this factory method from subclasses,
|
||||
* such usage is not recommended.
|
||||
@@ -145,14 +144,14 @@ class HTMLPurifier_Lexer
|
||||
* @param $prototype Optional prototype lexer.
|
||||
* @return Concrete lexer.
|
||||
*/
|
||||
function create($prototype = null) {
|
||||
static function create($prototype = null) {
|
||||
// we don't really care if it's a reference or a copy
|
||||
static $lexer = null;
|
||||
if ($prototype) {
|
||||
$lexer = $prototype;
|
||||
}
|
||||
if (empty($lexer)) {
|
||||
if (version_compare(PHP_VERSION, '5', '>=')) {
|
||||
if (class_exists('DOMDocument')) { // check for DOM support
|
||||
require_once 'HTMLPurifier/Lexer/DOMLex.php';
|
||||
$lexer = new HTMLPurifier_Lexer_DOMLex();
|
||||
} else {
|
||||
@@ -166,11 +165,12 @@ class HTMLPurifier_Lexer
|
||||
/**
|
||||
* Translates CDATA sections into regular sections (through escaping).
|
||||
*
|
||||
* @static
|
||||
* @protected
|
||||
* @param $string HTML string to process.
|
||||
* @returns HTML with CDATA sections escaped.
|
||||
*/
|
||||
function escapeCDATA($string) {
|
||||
static function escapeCDATA($string) {
|
||||
return preg_replace_callback(
|
||||
'/<!\[CDATA\[(.+?)\]\]>/',
|
||||
array('HTMLPurifier_Lexer', 'CDATACallback'),
|
||||
@@ -181,13 +181,14 @@ class HTMLPurifier_Lexer
|
||||
/**
|
||||
* Callback function for escapeCDATA() that does the work.
|
||||
*
|
||||
* @static
|
||||
* @warning Though this is public in order to let the callback happen,
|
||||
* calling it directly is not recommended.
|
||||
* @params $matches PCRE matches array, with index 0 the entire match
|
||||
* and 1 the inside of the CDATA section.
|
||||
* @returns Escaped internals of the CDATA section.
|
||||
*/
|
||||
function CDATACallback($matches) {
|
||||
static function CDATACallback($matches) {
|
||||
// not exactly sure why the character set is needed, but whatever
|
||||
return htmlspecialchars($matches[1], ENT_COMPAT, 'UTF-8');
|
||||
}
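Note on `escapeCDATA()` and `CDATACallback()`: the pair can be tried as plain functions. The function names below are made up for the demo; the pattern and the `htmlspecialchars()` arguments are the ones shown in this hunk.

```php
<?php
// Standalone version of the two static helpers, written as plain functions.
function cdata_callback($matches)
{
    // $matches[1] is the text between <![CDATA[ and ]]>
    return htmlspecialchars($matches[1], ENT_COMPAT, 'UTF-8');
}

function escape_cdata($string)
{
    return preg_replace_callback(
        '/<!\[CDATA\[(.+?)\]\]>/',
        'cdata_callback',
        $string
    );
}

echo escape_cdata('a<![CDATA[<b> & ]]>z'), "\n"; // a&lt;b&gt; &amp; z
```

Without the `s` modifier the `.+?` in this pattern does not cross newlines, so a CDATA section spanning several lines would be left as-is by this sketch.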
|
||||
@@ -212,7 +213,7 @@ class HTMLPurifier_Lexer
|
||||
// clean into wellformed UTF-8 string for an SGML context: this has
|
||||
// to be done after entity expansion because the entities sometimes
|
||||
// represent non-SGML characters (horror, horror!)
|
||||
$html = $this->_encoder->cleanUTF8($html);
|
||||
$html = HTMLPurifier_Encoder::cleanUTF8($html);
|
||||
|
||||
return $html;
|
||||
}
|
||||
|
@@ -88,6 +88,11 @@ class HTMLPurifier_Lexer_DOMLex extends HTMLPurifier_Lexer
|
||||
} elseif ($node->nodeType === XML_COMMENT_NODE) {
|
||||
$tokens[] = $this->factory->createComment($node->data);
|
||||
return;
|
||||
} elseif (
|
||||
// not-well tested: there may be other nodes we have to grab
|
||||
$node->nodeType !== XML_ELEMENT_NODE
|
||||
) {
|
||||
return;
|
||||
}
|
||||
|
||||
$attr = $node->hasAttributes() ?
|
||||
|
@@ -37,7 +37,7 @@ class HTMLPurifier_Lexer_PEARSax3 extends HTMLPurifier_Lexer
|
||||
|
||||
$string = $this->normalize($string, $config, $context);
|
||||
|
||||
$parser=& new XML_HTMLSax3();
|
||||
$parser = new XML_HTMLSax3();
|
||||
$parser->set_object($this);
|
||||
$parser->set_element_handler('openHandler','closeHandler');
|
||||
$parser->set_data_handler('dataHandler');
|
||||
|
@@ -10,10 +10,10 @@ class HTMLPurifier_Printer_HTMLDefinition extends HTMLPurifier_Printer
|
||||
*/
|
||||
var $def;
|
||||
|
||||
function render(&$config) {
|
||||
function render($config) {
|
||||
$ret = '';
|
||||
$this->config =& $config;
|
||||
$this->def =& $config->getHTMLDefinition();
|
||||
$this->def = $config->getHTMLDefinition();
|
||||
$def =& $this->def;
|
||||
|
||||
$ret .= $this->start('div', array('class' => 'HTMLPurifier_Printer'));
|
||||
|
@@ -32,12 +32,13 @@ class HTMLPurifier_URISchemeRegistry
|
||||
|
||||
/**
|
||||
* Retrieve sole instance of the registry.
|
||||
* @static
|
||||
* @param $prototype Optional prototype to overload sole instance with,
|
||||
* or bool true to reset to default registry.
|
||||
* @note Pass a registry object $prototype with a compatible interface and
|
||||
* the function will copy it and return it all further times.
|
||||
*/
|
||||
function &instance($prototype = null) {
|
||||
static function &instance($prototype = null) {
|
||||
static $instance = null;
|
||||
if ($prototype !== null) {
|
||||
$instance = $prototype;
|
||||
|
40
smoketests/all.php
Normal file
@@ -0,0 +1,40 @@
<?php

require_once 'common.php';

header('Content-type: text/html; charset=UTF-8');
echo '<?xml version="1.0" encoding="UTF-8" ?>';

?><!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-loose.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>HTML Purifier: All Smoketests</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<style type="text/css">
#content {margin:5em;}
iframe {width:100%;height:30em;}
</style>
</head>
<body>
<h1>HTML Purifier: All Smoketests</h1>
<div id="content">
<?php

$dir = './';
$dh = opendir($dir);
while (false !== ($filename = readdir($dh))) {
    if ($filename[0] == '.') continue;
    if (strpos($filename, '.php') === false) continue;
    if ($filename == 'common.php') continue;
    if ($filename == 'all.php') continue;
    ?>
<iframe src="<?php echo escapeHTML($filename); ?>"></iframe>
    <?php
}

?>
</div>
</body>
</html>
@@ -3,6 +3,7 @@
|
||||
header('Content-type: text/html; charset=UTF-8');
|
||||
|
||||
require_once '../library/HTMLPurifier.auto.php';
|
||||
error_reporting(E_ALL | E_STRICT);
|
||||
|
||||
function escapeHTML($string) {
|
||||
$string = HTMLPurifier_Encoder::cleanUTF8($string);
|
||||
@@ -10,4 +11,4 @@ function escapeHTML($string) {
|
||||
return $string;
|
||||
}
|
||||
|
||||
?>
|
||||
?>
|
||||
|
@@ -15,34 +15,13 @@ echo '<?xml version="1.0" encoding="UTF-8" ?>';
|
||||
<h1>HTML Purifier Preserve YouTube Smoketest</h1>
|
||||
<?php
|
||||
|
||||
class HTMLPurifierX_PreserveYouTube extends HTMLPurifier
|
||||
{
|
||||
function purify($html, $config = null) {
|
||||
$pre_regex = '#<object[^>]+>.+?'.
|
||||
'http://www.youtube.com/v/([A-Za-z0-9]+).+?</object>#';
|
||||
$pre_replace = '<span class="youtube-embed">\1</span>';
|
||||
$html = preg_replace($pre_regex, $pre_replace, $html);
|
||||
$html = parent::purify($html, $config);
|
||||
$post_regex = '#<span class="youtube-embed">([A-Za-z0-9]+)</span>#';
|
||||
$post_replace = '<object width="425" height="350" '.
|
||||
'data="http://www.youtube.com/v/\1">'.
|
||||
'<param name="movie" value="http://www.youtube.com/v/\1"></param>'.
|
||||
'<param name="wmode" value="transparent"></param>'.
|
||||
'<!--[if IE]>'.
|
||||
'<embed src="http://www.youtube.com/v/\1"'.
|
||||
'type="application/x-shockwave-flash"'.
|
||||
'wmode="transparent" width="425" height="350" />'.
|
||||
'<![endif]-->'.
|
||||
'</object>';
|
||||
$html = preg_replace($post_regex, $post_replace, $html);
|
||||
return $html;
|
||||
}
|
||||
}
|
||||
|
||||
$string = '<object width="425" height="350"><param name="movie" value="http://www.youtube.com/v/JzqumbhfxRo"></param><param name="wmode" value="transparent"></param><embed src="http://www.youtube.com/v/JzqumbhfxRo" type="application/x-shockwave-flash" wmode="transparent" width="425" height="350"></embed></object>';
|
||||
|
||||
$regular_purifier = new HTMLPurifier();
|
||||
$youtube_purifier = new HTMLPurifierX_PreserveYouTube();
|
||||
|
||||
$youtube_purifier = new HTMLPurifier();
|
||||
require_once 'HTMLPurifier/Filter/YouTube.php';
|
||||
$youtube_purifier->addFilter(new HTMLPurifier_Filter_YouTube());
|
||||
|
||||
?>
|
||||
<h2>Unpurified</h2>
|
||||
|
@@ -46,6 +46,7 @@ echo '<?xml version="1.0" encoding="UTF-8" ?>';
|
||||
.HTMLPurifier_Printer caption {font-size:1.5em; font-weight:bold;
|
||||
width:100%;}
|
||||
.HTMLPurifier_Printer .heavy {background:#99C;text-align:center;}
|
||||
dt {font-weight:bold;}
|
||||
</style>
|
||||
<script type="text/javascript">
|
||||
function toggleWriteability(id_of_patient, checked) {
|
||||
@@ -54,11 +55,15 @@ echo '<?xml version="1.0" encoding="UTF-8" ?>';
|
||||
</script>
|
||||
</head>
|
||||
<body>
|
||||
|
||||
<h1>HTML Purifier Printer Smoketest</h1>
|
||||
<p>This page will allow you to see precisely what HTML Purifier's internal
|
||||
|
||||
<p>HTML Purifier claims to have a robust yet permissive whitelist: this
|
||||
page will allow you to see precisely what HTML Purifier's internal
|
||||
whitelist is. You can
|
||||
also twiddle with the configuration settings to see how a directive
|
||||
influences the internal workings of the definition objects.</p>
|
||||
|
||||
<h2>Modify configuration</h2>
|
||||
|
||||
<p>You can specify an array by typing in a comma-separated
|
||||
@@ -93,13 +98,14 @@ transformation into a real array list or a lookup table).</p>
|
||||
<label for="<?php echo $directive; ?>">%<?php echo $directive; ?></label>
|
||||
</a>
|
||||
</th>
|
||||
<td>
|
||||
<?php if (is_bool($value)) { ?>
|
||||
<td id="<?php echo $directive; ?>">
|
||||
<label for="Yes_<?php echo $directive; ?>"><span class="c">%<?php echo $directive; ?>:</span> Yes</label>
|
||||
<input type="radio" name="<?php echo $directive; ?>" id="Yes_<?php echo $directive; ?>" value="1"<?php if ($value) { ?> checked="checked"<?php } ?> />
|
||||
<label for="No_<?php echo $directive; ?>"><span class="c">%<?php echo $directive; ?>:</span> No</label>
|
||||
<input type="radio" name="<?php echo $directive; ?>" id="No_<?php echo $directive; ?>" value="0"<?php if (!$value) { ?> checked="checked"<?php } ?> />
|
||||
<?php } else { ?>
|
||||
<td>
|
||||
<?php if($allow_null) { ?>
|
||||
<label for="Null_<?php echo $directive; ?>"><span class="c">%<?php echo $directive; ?>:</span> Null/Disabled*</label>
|
||||
<input
|
||||
@@ -136,6 +142,40 @@ variable and a null variable. A whitelist, for example, will take an
|
||||
empty array as meaning <em>no</em> allowed elements, while checking
|
||||
Null/Disabled will mean that user whitelisting functionality is disabled.</p>
|
||||
</form>
|
||||
|
||||
<h2>Definitions</h2>
|
||||
|
||||
<dl>
<dt>Parent of Fragment</dt>
<dd>The HTML that HTML Purifier outputs does not live in a void: it
has to be placed in another element by means of
something like <code>&lt;element&gt; &lt;?php echo $html
?&gt; &lt;/element&gt;</code>. The parent in this example
is <code>element</code>.</dd>
<dt>Strict mode</dt>
<dd>Whether HTML Purifier's output is Transitional or
Strict compliant. Non-strict mode is still a little strict
and converts many deprecated elements.</dd>
<dt>#PCDATA</dt>
<dd>Literally <strong>Parsed Character Data</strong>: regular
text. Tags like <code>ul</code> don't allow text directly inside them, so
#PCDATA is missing for them.</dd>
<dt>Tag transform</dt>
<dd>A tag transform changes one tag into another. Example: <code>font</code>
turns into a <code>span</code> tag with appropriate CSS.</dd>
<dt>Attr Transform</dt>
<dd>An attribute transform changes a group of attributes based on one
another. Currently, only <code>lang</code> and <code>xml:lang</code>
use this hook, to keep their values synchronized. Pre/Post indicates
whether the transform is done before or after validation.</dd>
<dt>Excludes</dt>
<dd>Tags that an element excludes are excluded for all descendants of
that element, not just its children.</dd>
<dt>Name(Param1, Param2)</dt>
<dd>Represents an internal data structure. You'll have to check out
the corresponding classes in HTML Purifier to find out more.</dd>
</dl>
|
||||
|
||||
<h2>HTMLDefinition</h2>
|
||||
<?php echo $printer_html_definition->render($config) ?>
|
||||
<h2>CSSDefinition</h2>
|
||||
|
@@ -1,5 +1,7 @@
|
||||
<?php
|
||||
|
||||
// this file is encoded in UTF-8, please don't let your editor mangle it
|
||||
|
||||
require_once 'common.php';
|
||||
|
||||
echo '<?xml version="1.0" encoding="UTF-8" ?>';
|
||||
|
@@ -978,8 +978,6 @@ alert(a.source)</SCRIPT></code>
|
||||
|
||||
-onErrorUpdate() (fires on a databound object when an error occurs while updating the associated data in the data source object)
|
||||
|
||||
-onExit() (fires when someone clicks on a link or presses the back button)
|
||||
|
||||
-onFilterChange() (fires when a visual filter completes state change)
|
||||
|
||||
-onFinish() (attacker could create the exploit when marquee is finished looping)
|
||||
|
@@ -70,7 +70,10 @@ class Debugger
|
||||
$this->add_pre = !extension_loaded('xdebug');
|
||||
}
|
||||
|
||||
function &instance() {
|
||||
/**
|
||||
* @static
|
||||
*/
|
||||
static function &instance() {
|
||||
static $soleInstance = false;
|
||||
if (!$soleInstance) $soleInstance = new Debugger();
|
||||
return $soleInstance;
|
||||
@@ -142,4 +145,4 @@ class Debugger
|
||||
|
||||
}
|
||||
|
||||
?>
|
||||
?>
|
||||
|
71
tests/HTMLPurifier/AttrDef/BackgroundPositionTest.php
Normal file
@@ -0,0 +1,71 @@
<?php

require_once 'HTMLPurifier/AttrDefHarness.php';
require_once 'HTMLPurifier/AttrDef/BackgroundPosition.php';

class HTMLPurifier_AttrDef_BackgroundPositionTest extends HTMLPurifier_AttrDefHarness
{

    function test() {

        $this->def = new HTMLPurifier_AttrDef_BackgroundPosition();

        // explicitly cited in spec
        $this->assertDef('0% 0%');
        $this->assertDef('100% 100%');
        $this->assertDef('14% 84%');
        $this->assertDef('2cm 1cm');
        $this->assertDef('top');
        $this->assertDef('left');
        $this->assertDef('center');
        $this->assertDef('right');
        $this->assertDef('bottom');
        $this->assertDef('left top');
        $this->assertDef('center top');
        $this->assertDef('right top');
        $this->assertDef('left center');
        $this->assertDef('right center');
        $this->assertDef('left bottom');
        $this->assertDef('center bottom');
        $this->assertDef('right bottom');

        // reordered due to internal impl details
        $this->assertDef('top left', 'left top');
        $this->assertDef('top center', 'center top');
        $this->assertDef('top right', 'right top');
        $this->assertDef('center left', 'left center');
        $this->assertDef('center center', 'center'); // two centers collide
        $this->assertDef('center right', 'right center');
        $this->assertDef('bottom left', 'left bottom');
        $this->assertDef('bottom center', 'center bottom');
        $this->assertDef('bottom right', 'right bottom');

        // more cases from the defined syntax
        $this->assertDef('1.32in 4ex');
        $this->assertDef('-14% -84.65%');
        $this->assertDef('-1in -4ex');
        $this->assertDef('-1pc 2.3%');

        // keyword mixing
        $this->assertDef('3em top');
        $this->assertDef('left 50%');

        // fixable keyword mixing
        $this->assertDef('top 3em', '3em top');
        $this->assertDef('50% left', 'left 50%');

        // whitespace collapsing
        $this->assertDef('3em  top', '3em top');
        $this->assertDef("left\n \t foo ", 'left');

        // invalid uses (we're going to be strict on these)
        $this->assertDef('foo bar', false);
        $this->assertDef('left left', 'left');
        $this->assertDef('left right top bottom center left', 'left bottom');
        $this->assertDef('0fr 9%', '9%');

    }

}

?>
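
The assertions in BackgroundPositionTest.php imply an ordering rule: a horizontal keyword (or the first measure) fills the first slot, a vertical keyword (or the next measure) fills the second, and center takes whichever slot is left over. The following is a minimal sketch of that rule only. The helper name sketch_normalize_background_position() is hypothetical, and the regular expression for lengths and percentages is a simplifying assumption; this is not HTML Purifier's own AttrDef implementation.

```php
<?php

// Hypothetical helper illustrating the ordering the tests expect.
// Not part of HTML Purifier.
function sketch_normalize_background_position($value) {
    $horizontal = false;   // last 'left' or 'right' seen
    $vertical   = false;   // last 'top' or 'bottom' seen
    $center     = false;   // whether 'center' appeared at all
    $measures   = array(); // lengths and percentages, in source order

    foreach (preg_split('/\s+/', trim($value)) as $bit) {
        $bit = strtolower($bit);
        if ($bit === 'left' || $bit === 'right') {
            $horizontal = $bit;
        } elseif ($bit === 'top' || $bit === 'bottom') {
            $vertical = $bit;
        } elseif ($bit === 'center') {
            $center = true;
        } elseif (preg_match('/^-?\d+(\.\d+)?(%|em|ex|px|in|cm|mm|pt|pc)$/', $bit)) {
            // simplified measure check (assumption, see lead-in)
            $measures[] = $bit;
        }
        // anything unrecognized is dropped
    }

    $result = array();

    // first slot: horizontal keyword, else a measure, else center
    if ($horizontal) {
        $result[] = $horizontal;
    } elseif ($measures) {
        $result[] = array_shift($measures);
    } elseif ($center) {
        $result[] = 'center';
        $center = false; // 'center center' collapses to 'center'
    }

    // second slot: vertical keyword, else a measure, else center
    if ($vertical) {
        $result[] = $vertical;
    } elseif ($measures) {
        $result[] = array_shift($measures);
    } elseif ($center) {
        $result[] = 'center';
    }

    return $result ? implode(' ', $result) : false;
}
```

Under these assumptions, sketch_normalize_background_position('top left') yields 'left top', 'center center' yields 'center', and 'foo bar' yields false, matching the corresponding assertions in the test file.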
21
tests/HTMLPurifier/AttrDef/BackgroundTest.php
Normal file
@@ -0,0 +1,21 @@
<?php

require_once 'HTMLPurifier/AttrDefHarness.php';
require_once 'HTMLPurifier/AttrDef/Background.php';

class HTMLPurifier_AttrDef_BackgroundTest extends HTMLPurifier_AttrDefHarness
{

    function test() {

        $this->def = new HTMLPurifier_AttrDef_Background(HTMLPurifier_Config::createDefault());

        $valid = '#333 url(chess.png) repeat fixed 50% top';
        $this->assertDef($valid);
        $this->assertDef('url("chess.png") #333 50% top repeat fixed', $valid);

    }

}

?>
@@ -1,6 +1,7 @@
|
||||
<?php
|
||||
|
||||
require_once 'HTMLPurifier/AttrDef/Border.php';
|
||||
require_once 'HTMLPurifier/AttrDef/PixelsTest.php';
|
||||
|
||||
class HTMLPurifier_AttrDef_BorderTest extends HTMLPurifier_AttrDef_PixelsTest
|
||||
{
|
||||
|
@@ -1,6 +1,7 @@
|
||||
<?php
|
||||
|
||||
require_once 'HTMLPurifier/AttrDef/CSSLength.php';
|
||||
require_once 'HTMLPurifier/AttrDefHarness.php';
|
||||
|
||||
class HTMLPurifier_AttrDef_CSSLengthTest extends HTMLPurifier_AttrDefHarness
|
||||
{
|
||||
@@ -21,6 +22,8 @@ class HTMLPurifier_AttrDef_CSSLengthTest extends HTMLPurifier_AttrDefHarness
|
||||
$this->assertDef('3pt');
|
||||
$this->assertDef('3pc');
|
||||
|
||||
$this->assertDef('3PX', '3px');
|
||||
|
||||
$this->assertDef('3', false);
|
||||
$this->assertDef('3miles', false);
|
||||
|
||||
|
@@ -1,6 +1,7 @@
|
||||
<?php
|
||||
|
||||
require_once 'HTMLPurifier/AttrDef/CSS.php';
|
||||
require_once 'HTMLPurifier/AttrDefHarness.php';
|
||||
|
||||
class HTMLPurifier_AttrDef_CSSTest extends HTMLPurifier_AttrDefHarness
|
||||
{
|
||||
@@ -24,7 +25,7 @@ class HTMLPurifier_AttrDef_CSSTest extends HTMLPurifier_AttrDefHarness
|
||||
$this->assertDef('text-transform:capitalize;');
|
||||
$this->assertDef('background-color:rgb(0,0,255);');
|
||||
$this->assertDef('background-color:transparent;');
|
||||
$this->assertDef('background:#FF9;');
|
||||
$this->assertDef('background:#333 url(chess.png) repeat fixed 50% top;');
|
||||
$this->assertDef('color:#F00;');
|
||||
$this->assertDef('border-top-color:#F00;');
|
||||
$this->assertDef('border-color:#F00 #FF0;');
|
||||
@@ -71,6 +72,13 @@ class HTMLPurifier_AttrDef_CSSTest extends HTMLPurifier_AttrDefHarness
|
||||
$this->assertDef('vertical-align:12px;');
|
||||
$this->assertDef('vertical-align:50%;');
|
||||
$this->assertDef('table-layout:fixed;');
|
||||
$this->assertDef('list-style-image:url(nice.jpg);');
|
||||
$this->assertDef('list-style:disc url(nice.jpg) inside;');
|
||||
$this->assertDef('background-image:url(foo.jpg);');
|
||||
$this->assertDef('background-image:none;');
|
||||
$this->assertDef('background-repeat:repeat-y;');
|
||||
$this->assertDef('background-attachment:fixed;');
|
||||
$this->assertDef('background-position:left 90%;');
|
||||
|
||||
// duplicates
|
||||
$this->assertDef('text-align:right;text-align:left;',
|
||||
|
37
tests/HTMLPurifier/AttrDef/CSSURITest.php
Normal file
@@ -0,0 +1,37 @@
<?php

require_once 'HTMLPurifier/AttrDef/CSSURI.php';
require_once 'HTMLPurifier/AttrDefHarness.php';

class HTMLPurifier_AttrDef_CSSURITest extends HTMLPurifier_AttrDefHarness
{

    function test() {

        $this->def = new HTMLPurifier_AttrDef_CSSURI();

        $this->assertDef('', false);

        // we could be nice but we won't be
        $this->assertDef('http://www.example.com/', false);

        // no quotes are used, since that's the most widely supported
        // syntax
        $this->assertDef('url(', false);
        $this->assertDef('url()', true);
        $result = "url(http://www.example.com/)";
        $this->assertDef('url(http://www.example.com/)', $result);
        $this->assertDef('url("http://www.example.com/")', $result);
        $this->assertDef("url('http://www.example.com/')", $result);
        $this->assertDef(
            ' url( "http://www.example.com/" ) ', $result);

        // escaping
        $this->assertDef("url(http://www.example.com/foo,bar\))",
            "url(http://www.example.com/foo\,bar\))");

    }

}

?>
@@ -1,6 +1,7 @@
|
||||
<?php
|
||||
|
||||
require_once 'HTMLPurifier/AttrDef/Color.php';
|
||||
require_once 'HTMLPurifier/AttrDefHarness.php';
|
||||
|
||||
class HTMLPurifier_AttrDef_ColorTest extends HTMLPurifier_AttrDefHarness
|
||||
{
|
||||
|
@@ -1,6 +1,7 @@
|
||||
<?php
|
||||
|
||||
require_once 'HTMLPurifier/AttrDef/Composite.php';
|
||||
require_once 'HTMLPurifier/AttrDefHarness.php';
|
||||
|
||||
class HTMLPurifier_AttrDef_Composite_Testable extends
|
||||
HTMLPurifier_AttrDef_Composite
|
||||
@@ -28,10 +29,10 @@ class HTMLPurifier_AttrDef_CompositeTest extends HTMLPurifier_AttrDefHarness
|
||||
// first test: value properly validates on first definition
|
||||
// so second def is never called
|
||||
|
||||
$def1 =& new HTMLPurifier_AttrDefMock($this);
|
||||
$def2 =& new HTMLPurifier_AttrDefMock($this);
|
||||
$def1 = new HTMLPurifier_AttrDefMock($this);
|
||||
$def2 = new HTMLPurifier_AttrDefMock($this);
|
||||
$defs = array(&$def1, &$def2);
|
||||
$def =& new HTMLPurifier_AttrDef_Composite_Testable($defs);
|
||||
$def = new HTMLPurifier_AttrDef_Composite_Testable($defs);
|
||||
$input = 'FOOBAR';
|
||||
$output = 'foobar';
|
||||
$def1_params = array($input, $config, $context);
|
||||
@@ -47,10 +48,10 @@ class HTMLPurifier_AttrDef_CompositeTest extends HTMLPurifier_AttrDefHarness
|
||||
|
||||
// second test, first def fails, second def works
|
||||
|
||||
$def1 =& new HTMLPurifier_AttrDefMock($this);
|
||||
$def2 =& new HTMLPurifier_AttrDefMock($this);
|
||||
$def1 = new HTMLPurifier_AttrDefMock($this);
|
||||
$def2 = new HTMLPurifier_AttrDefMock($this);
|
||||
$defs = array(&$def1, &$def2);
|
||||
$def =& new HTMLPurifier_AttrDef_Composite_Testable($defs);
|
||||
$def = new HTMLPurifier_AttrDef_Composite_Testable($defs);
|
||||
$input = 'BOOMA';
|
||||
$output = 'booma';
|
||||
$def_params = array($input, $config, $context);
|
||||
@@ -67,10 +68,10 @@ class HTMLPurifier_AttrDef_CompositeTest extends HTMLPurifier_AttrDefHarness
|
||||
|
||||
// third test, all fail, so composite fails
|
||||
|
||||
$def1 =& new HTMLPurifier_AttrDefMock($this);
|
||||
$def2 =& new HTMLPurifier_AttrDefMock($this);
|
||||
$def1 = new HTMLPurifier_AttrDefMock($this);
|
||||
$def2 = new HTMLPurifier_AttrDefMock($this);
|
||||
$defs = array(&$def1, &$def2);
|
||||
$def =& new HTMLPurifier_AttrDef_Composite_Testable($defs);
|
||||
$def = new HTMLPurifier_AttrDef_Composite_Testable($defs);
|
||||
$input = 'BOOMA';
|
||||
$output = false;
|
||||
$def_params = array($input, $config, $context);
|
||||
|
@@ -1,6 +1,7 @@
|
||||
<?php
|
||||
|
||||
require_once 'HTMLPurifier/AttrDef/Email.php';
|
||||
require_once 'HTMLPurifier/AttrDefHarness.php';
|
||||
|
||||
class HTMLPurifier_AttrDef_EmailHarness extends HTMLPurifier_AttrDefHarness
|
||||
{
|
||||
|
@@ -66,13 +66,11 @@ class HTMLPurifier_AttrDef_IDTest extends HTMLPurifier_AttrDefHarness
|
||||
$this->assertDef('user_story95_alas');
|
||||
$this->assertDef('user_alas', 'user_story95_user_alas'); // !
|
||||
|
||||
// no effect when IDPrefix isn't set
|
||||
$this->config->set('Attr', 'IDPrefix', '');
|
||||
$this->assertDef('amherst'); // no affect when IDPrefix isn't set
|
||||
$this->assertError('%Attr.IDPrefixLocal cannot be used unless '.
|
||||
$this->expectError('%Attr.IDPrefixLocal cannot be used unless '.
|
||||
'%Attr.IDPrefix is set');
|
||||
// SimpleTest has a bug and throws a sprintf error
|
||||
// $this->assertNoErrors();
|
||||
$this->swallowErrors();
|
||||
$this->assertDef('amherst');
|
||||
|
||||
}
|
||||
|
||||
|
@@ -1,6 +1,7 @@
|
||||
<?php
|
||||
|
||||
require_once 'HTMLPurifier/AttrDef/Integer.php';
|
||||
require_once 'HTMLPurifier/AttrDefHarness.php';
|
||||
|
||||
class HTMLPurifier_AttrDef_IntegerTest extends HTMLPurifier_AttrDefHarness
|
||||
{
|
||||
|
@@ -15,9 +15,20 @@ class HTMLPurifier_AttrDef_ListStyleTest extends HTMLPurifier_AttrDefHarness
|
||||
$this->assertDef('circle outside');
|
||||
$this->assertDef('inside');
|
||||
$this->assertDef('none');
|
||||
$this->assertDef('url(foo.gif)');
|
||||
$this->assertDef('circle url(foo.gif) inside');
|
||||
|
||||
// invalid values
|
||||
$this->assertDef('outside inside', 'outside');
|
||||
|
||||
// ordering
|
||||
$this->assertDef('url(foo.gif) none', 'none url(foo.gif)');
|
||||
$this->assertDef('circle lower-alpha', 'circle');
|
||||
// the spec is ambiguous about what happens in these
|
||||
// cases, so we're going off the W3C CSS validator
|
||||
$this->assertDef('disc none', 'disc');
|
||||
$this->assertDef('none disc', 'none');
|
||||
|
||||
|
||||
}
|
||||
|
||||
|
@@ -1,7 +1,10 @@
|
||||
<?php
|
||||
|
||||
require_once 'HTMLPurifier/AttrDef/Integer.php';
|
||||
require_once 'HTMLPurifier/AttrDef/Multiple.php';
|
||||
require_once 'HTMLPurifier/AttrDefHarness.php';
|
||||
|
||||
// borrowed for the sakes of this test
|
||||
require_once 'HTMLPurifier/AttrDef/Integer.php';
|
||||
|
||||
class HTMLPurifier_AttrDef_MultipleTest extends HTMLPurifier_AttrDefHarness
|
||||
{
|
||||
|
@@ -1,6 +1,7 @@
|
||||
<?php
|
||||
|
||||
require_once 'HTMLPurifier/AttrDef/Number.php';
|
||||
require_once 'HTMLPurifier/AttrDefHarness.php';
|
||||
|
||||
class HTMLPurifier_AttrDef_NumberTest extends HTMLPurifier_AttrDefHarness
|
||||
{
|
||||
|
@@ -1,6 +1,7 @@
|
||||
<?php
|
||||
|
||||
require_once 'HTMLPurifier/AttrDef/Percentage.php';
|
||||
require_once 'HTMLPurifier/AttrDefHarness.php';
|
||||
|
||||
class HTMLPurifier_AttrDef_PercentageTest extends HTMLPurifier_AttrDefHarness
|
||||
{
|
||||
|
@@ -206,7 +206,7 @@ class HTMLPurifier_AttrDef_URITest extends HTMLPurifier_AttrDefHarness
|
||||
$registry =& HTMLPurifier_URISchemeRegistry::instance($fake_registry);
|
||||
|
||||
// now, let's add a pseudo-scheme to the registry
|
||||
$this->scheme =& new HTMLPurifier_URISchemeMock($this);
|
||||
$this->scheme = new HTMLPurifier_URISchemeMock($this);
|
||||
|
||||
// here are the schemes we will support with overloaded mocks
|
||||
$registry->setReturnReference('getScheme', $this->scheme, array('http', $this->config, $this->context));
|
||||
|
@@ -1,6 +1,7 @@
|
||||
<?php
|
||||
|
||||
require_once 'HTMLPurifier/AttrTransform/BdoDir.php';
|
||||
require_once 'HTMLPurifier/AttrTransformHarness.php';
|
||||
|
||||
class HTMLPurifier_AttrTransform_BdoDirTest extends HTMLPurifier_AttrTransformHarness
|
||||
{
|
||||
|
@@ -1,6 +1,7 @@
|
||||
<?php
|
||||
|
||||
require_once 'HTMLPurifier/AttrTransform/ImgRequired.php';
|
||||
require_once 'HTMLPurifier/AttrTransformHarness.php';
|
||||
|
||||
class HTMLPurifier_AttrTransform_ImgRequiredTest extends HTMLPurifier_AttrTransformHarness
|
||||
{
|
||||
|
@@ -1,6 +1,7 @@
|
||||
<?php
|
||||
|
||||
require_once 'HTMLPurifier/AttrTransform/TextAlign.php';
|
||||
require_once 'HTMLPurifier/AttrTransformHarness.php';
|
||||
|
||||
class HTMLPurifier_AttrTransform_TextAlignTest extends HTMLPurifier_AttrTransformHarness
|
||||
{
|
||||
|
@@ -38,10 +38,9 @@ extends HTMLPurifier_ChildDefHarness
|
||||
$this->assertResult('Needs wrap', '<div>Needs wrap</div>',
|
||||
array('HTML.BlockWrapper' => 'div'));
|
||||
|
||||
$this->expectError('Cannot use non-block element as block wrapper.');
|
||||
$this->assertResult('Needs wrap', '<p>Needs wrap</p>',
|
||||
array('HTML.BlockWrapper' => 'dav'));
|
||||
$this->assertError('Cannot use non-block element as block wrapper.');
|
||||
$this->assertNoErrors();
|
||||
|
||||
}
|
||||
|
||||
|
@@ -2,10 +2,26 @@
|
||||
|
||||
require_once 'HTMLPurifier/ConfigSchema.php';
|
||||
|
||||
if (!class_exists('CS')) {
|
||||
class CS extends HTMLPurifier_ConfigSchema {}
|
||||
}
|
||||
|
||||
class HTMLPurifier_ConfigSchemaTest extends UnitTestCase
|
||||
{
|
||||
|
||||
/**
|
||||
* Munged name of current file.
|
||||
*/
|
||||
var $file;
|
||||
|
||||
/**
|
||||
* Copy of the real ConfigSchema to revert to.
|
||||
*/
|
||||
var $old_copy;
|
||||
|
||||
/**
|
||||
* Copy of dummy ConfigSchema for testing purposes.
|
||||
*/
|
||||
var $our_copy;
|
||||
|
||||
function setUp() {
|
||||
@@ -18,239 +34,214 @@ class HTMLPurifier_ConfigSchemaTest extends UnitTestCase
|
||||
$this->old_copy = HTMLPurifier_ConfigSchema::instance();
|
||||
// put in our copy, and reassign to the REAL reference
|
||||
$this->our_copy =& HTMLPurifier_ConfigSchema::instance($our_copy);
|
||||
|
||||
$this->file = $this->our_copy->mungeFilename(__FILE__);
|
||||
}
|
||||
|
||||
function tearDown() {
|
||||
// testing is done, restore the old copy
|
||||
HTMLPurifier_ConfigSchema::instance($this->old_copy);
|
||||
tally_errors();
|
||||
}
|
||||
|
||||
function testNormal() {
|
||||
function test_defineNamespace() {
|
||||
CS::defineNamespace('http', $d = 'This is an internet protocol.');
|
||||
|
||||
$file = $this->our_copy->mungeFilename(__FILE__);
|
||||
|
||||
// define a namespace
|
||||
$description = 'Configuration that is always available.';
|
||||
HTMLPurifier_ConfigSchema::defineNamespace(
|
||||
'Core', $description
|
||||
);
|
||||
$this->assertIdentical($this->our_copy->defaults, array(
|
||||
'Core' => array()
|
||||
));
|
||||
$this->assertIdentical($this->our_copy->info, array(
|
||||
'Core' => array()
|
||||
));
|
||||
$namespace = new HTMLPurifier_ConfigEntity_Namespace();
|
||||
$namespace->description = $description;
|
||||
$this->assertIdentical($this->our_copy->info_namespace, array(
|
||||
'Core' => $namespace
|
||||
'http' => new HTMLPurifier_ConfigEntity_Namespace($d)
|
||||
));
|
||||
|
||||
$this->expectError('Cannot redefine namespace');
|
||||
CS::defineNamespace('http', 'It is used to serve webpages.');
|
||||
|
||||
$this->expectError('Namespace name must be alphanumeric');
|
||||
CS::defineNamespace('ssh+http', 'This http is tunneled through SSH.');
|
||||
|
||||
// define a directive
|
||||
$description = 'This is a description of the directive.';
|
||||
HTMLPurifier_ConfigSchema::define(
|
||||
'Core', 'Name', 'default value', 'string',
|
||||
$description
|
||||
); $line = __LINE__;
|
||||
$this->assertIdentical($this->our_copy->defaults, array(
|
||||
'Core' => array(
|
||||
'Name' => 'default value'
|
||||
)
|
||||
));
|
||||
$directive = new HTMLPurifier_ConfigEntity_Directive();
|
||||
$directive->type = 'string';
|
||||
$directive->addDescription($file, $line, $description);
|
||||
$this->assertIdentical($this->our_copy->info, array(
|
||||
'Core' => array(
|
||||
'Name' => $directive
|
||||
)
|
||||
));
|
||||
$this->expectError('Description must be non-empty');
|
||||
CS::defineNamespace('ftp', null);
|
||||
}
|
||||
|
||||
function test_define() {
|
||||
CS::defineNamespace('Car', 'Automobiles, those gas-guzzlers!');
|
||||
|
||||
CS::define('Car', 'Seats', 5, 'int', $d = 'Standard issue.'); $l = __LINE__;
|
||||
|
||||
|
||||
// define a directive in an undefined namespace
|
||||
HTMLPurifier_ConfigSchema::define(
|
||||
'Extension', 'Name', false, 'bool',
|
||||
'This is for an extension, but we have not defined its namespace!'
|
||||
);
|
||||
$this->assertError('Cannot define directive for undefined namespace');
|
||||
$this->assertNoErrors();
|
||||
|
||||
|
||||
|
||||
// redefine a value in a valid manner
|
||||
$description = 'Alternative configuration definition';
|
||||
HTMLPurifier_ConfigSchema::define(
|
||||
'Core', 'Name', 'default value', 'string',
|
||||
$description
|
||||
); $line = __LINE__;
|
||||
$this->assertNoErrors();
|
||||
$directive->addDescription($file, $line, $description);
|
||||
$this->assertIdentical($this->our_copy->info, array(
|
||||
'Core' => array(
|
||||
'Name' => $directive
|
||||
)
|
||||
));
|
||||
|
||||
|
||||
|
||||
// redefine a directive in an invalid manner
|
||||
HTMLPurifier_ConfigSchema::define(
|
||||
'Core', 'Name', 'different default', 'string',
|
||||
'Inconsistent default or type, cannot redefine'
|
||||
);
|
||||
$this->assertError('Inconsistent default or type, cannot redefine');
|
||||
$this->assertNoErrors();
|
||||
|
||||
|
||||
|
||||
// make an enumeration
|
||||
HTMLPurifier_ConfigSchema::defineAllowedValues(
|
||||
'Core', 'Name', array(
|
||||
'Real Value',
|
||||
'Real Value 2'
|
||||
$this->assertIdentical($this->our_copy->defaults['Car']['Seats'], 5);
|
||||
$this->assertIdentical($this->our_copy->info['Car']['Seats'],
|
||||
new HTMLPurifier_ConfigEntity_Directive('int',
|
||||
array($this->file => array($l => $d))
|
||||
)
|
||||
);
|
||||
$directive->allowed = array(
|
||||
'Real Value' => true,
|
||||
'Real Value 2' => true
|
||||
);
|
||||
$this->assertIdentical($this->our_copy->info, array(
|
||||
'Core' => array(
|
||||
'Name' => $directive
|
||||
)
|
||||
));
|
||||
|
||||
CS::define('Car', 'Age', null, 'int/null', $d = 'Not always known.'); $l = __LINE__;
|
||||
|
||||
|
||||
// redefinition of enumeration is cumulative
|
||||
HTMLPurifier_ConfigSchema::defineAllowedValues(
|
||||
'Core', 'Name', array(
|
||||
'Real Value 3',
|
||||
$this->assertIdentical($this->our_copy->defaults['Car']['Age'], null);
|
||||
$this->assertIdentical($this->our_copy->info['Car']['Age'],
|
||||
new HTMLPurifier_ConfigEntity_Directive('int',
|
||||
array($this->file => array($l => $d)), true
|
||||
)
|
||||
);
|
||||
$directive->allowed['Real Value 3'] = true;
|
||||
$this->assertIdentical($this->our_copy->info, array(
|
||||
'Core' => array(
|
||||
'Name' => $directive
|
||||
)
|
||||
));
|
||||
|
||||
$this->expectError('Cannot define directive for undefined namespace');
|
||||
CS::define('Train', 'Cars', 10, 'int', 'Including the caboose.');
|
||||
|
||||
$this->expectError('Directive name must be alphanumeric');
|
||||
CS::define('Car', 'Is it shiny?', true, 'bool', 'Indicates regular waxing.');
|
||||
|
||||
// cannot define enumeration for undefined directive
|
||||
HTMLPurifier_ConfigSchema::defineAllowedValues(
|
||||
'Core', 'Foobar', array(
|
||||
'Real Value 9',
|
||||
$this->expectError('Invalid type for configuration directive');
|
||||
CS::define('Car', 'Efficiency', 50, 'mpg', 'The higher the better.');
|
||||
|
||||
$this->expectError('Default value does not match directive type');
|
||||
CS::define('Car', 'Producer', 'Ford', 'int', 'ID of the company that made the car.');
|
||||
|
||||
$this->expectError('Description must be non-empty');
|
||||
CS::define('Car', 'ComplexAttribute', 'lawyers', 'istring', null);
|
||||
}
|
||||
|
||||
function testRedefinition_define() {
|
||||
CS::defineNamespace('Cat', 'Belongs to Schrodinger.');
|
||||
|
||||
CS::define('Cat', 'Dead', false, 'bool', $d1 = 'Well, is it?'); $l1 = __LINE__;
|
||||
CS::define('Cat', 'Dead', false, 'bool', $d2 = 'It is difficult to say.'); $l2 = __LINE__;
|
||||
|
||||
$this->assertIdentical($this->our_copy->defaults['Cat']['Dead'], false);
|
||||
$this->assertIdentical($this->our_copy->info['Cat']['Dead'],
|
||||
new HTMLPurifier_ConfigEntity_Directive('bool',
|
||||
array($this->file => array($l1 => $d1, $l2 => $d2))
|
||||
)
|
||||
);
|
||||
$this->assertError('Cannot define allowed values for undefined directive');
|
||||
$this->assertNoErrors();
|
||||
|
||||
$this->expectError('Inconsistent default or type, cannot redefine');
|
||||
CS::define('Cat', 'Dead', true, 'bool', 'Quantum mechanics does not know.');
|
||||
|
||||
$this->expectError('Inconsistent default or type, cannot redefine');
|
||||
CS::define('Cat', 'Dead', 'maybe', 'string', 'Perhaps if we look we will know.');
|
||||
}
|
||||
|
||||
function test_defineAllowedValues() {
|
||||
CS::defineNamespace('QuantumNumber', 'D');
|
||||
CS::define('QuantumNumber', 'Spin', 0.5, 'float',
|
||||
'Spin of particle. Fourth quantum number, represented by s.');
|
||||
CS::define('QuantumNumber', 'Current', 's', 'string',
|
||||
'Currently selected quantum number.');
|
||||
CS::define('QuantumNumber', 'Difficulty', null, 'string/null', $d = 'How hard are the problems?'); $l = __LINE__;
|
||||
|
||||
// test defining value aliases for an enumerated value
|
||||
HTMLPurifier_ConfigSchema::defineValueAliases(
|
||||
'Core', 'Name', array(
|
||||
'Aliased Value' => 'Real Value'
|
||||
CS::defineAllowedValues( // okay, since default is null
|
||||
'QuantumNumber', 'Difficulty', array('easy', 'medium', 'hard')
|
||||
);
|
||||
|
||||
$this->assertIdentical($this->our_copy->defaults['QuantumNumber']['Difficulty'], null);
|
||||
$this->assertIdentical($this->our_copy->info['QuantumNumber']['Difficulty'],
|
||||
new HTMLPurifier_ConfigEntity_Directive(
|
||||
'string',
|
||||
array($this->file => array($l => $d)),
|
||||
true,
|
||||
array(
|
||||
'easy' => true,
|
||||
'medium' => true,
|
||||
'hard' => true
|
||||
)
|
||||
)
|
||||
);
|
||||
$directive->aliases['Aliased Value'] = 'Real Value';
|
||||
$this->assertIdentical($this->our_copy->info, array(
|
||||
'Core' => array(
|
||||
'Name' => $directive
|
||||
)
|
||||
));
|
||||
|
||||
$this->expectError('Cannot define allowed values for undefined directive');
|
||||
CS::defineAllowedValues(
|
||||
'SpaceTime', 'Symmetry', array('time', 'spatial', 'projective')
|
||||
);
|
||||
|
||||
$this->expectError('Cannot define allowed values for directive whose type is not string');
|
||||
CS::defineAllowedValues(
|
||||
'QuantumNumber', 'Spin', array(0.5, -0.5)
|
||||
);
|
||||
|
||||
// redefine should be cumulative
|
||||
HTMLPurifier_ConfigSchema::defineValueAliases(
|
||||
'Core', 'Name', array(
|
||||
'Aliased Value 2' => 'Real Value 2'
|
||||
$this->expectError('Default value must be in allowed range of variables');
|
||||
CS::defineAllowedValues(
|
||||
'QuantumNumber', 'Current', array('n', 'l', 'm') // forgot s!
|
||||
);
|
||||
}
|
||||
|
||||
function test_defineValueAliases() {
|
||||
CS::defineNamespace('Abbrev', 'Stuff on abbreviations.');
|
||||
CS::define('Abbrev', 'HTH', 'Happy to Help', 'string', $d = 'Three-letters'); $l = __LINE__;
|
||||
CS::defineAllowedValues(
|
||||
'Abbrev', 'HTH', array(
|
||||
'Happy to Help',
|
||||
'Hope that Helps',
|
||||
'HAIL THE HAND!'
|
||||
)
|
||||
);
|
||||
$directive->aliases['Aliased Value 2'] = 'Real Value 2';
|
||||
$this->assertIdentical($this->our_copy->info, array(
|
||||
'Core' => array(
|
||||
'Name' => $directive
|
||||
)
|
||||
));
|
||||
|
||||
|
||||
|
||||
// cannot create alias to not-allowed value
|
||||
HTMLPurifier_ConfigSchema::defineValueAliases(
|
||||
'Core', 'Name', array(
|
||||
'Aliased Value 3' => 'Invalid Value'
|
||||
CS::defineValueAliases(
|
||||
'Abbrev', 'HTH', array(
|
||||
'happy' => 'Happy to Help',
|
||||
'hope' => 'Hope that Helps'
|
||||
)
|
||||
);
|
||||
$this->assertError('Cannot define alias to value that is not allowed');
|
||||
$this->assertNoErrors();
|
||||
|
||||
|
||||
|
||||
// cannot create alias for already allowed value
|
||||
HTMLPurifier_ConfigSchema::defineValueAliases(
|
||||
'Core', 'Name', array(
|
||||
'Real Value' => 'Real Value 2'
|
||||
CS::defineValueAliases( // delayed addition
|
||||
'Abbrev', 'HTH', array(
|
||||
'hail' => 'HAIL THE HAND!'
|
||||
)
|
||||
);
|
||||
$this->assertError('Cannot define alias over allowed value');
|
||||
$this->assertNoErrors();
|
||||
|
||||
|
||||
|
||||
// define a directive with an invalid type
|
||||
HTMLPurifier_ConfigSchema::define(
|
||||
'Core', 'Foobar', false, 'omen',
|
||||
'Omen is not a valid type, so we reject this.'
|
||||
$this->assertIdentical($this->our_copy->defaults['Abbrev']['HTH'], 'Happy to Help');
|
||||
$this->assertIdentical($this->our_copy->info['Abbrev']['HTH'],
|
||||
new HTMLPurifier_ConfigEntity_Directive(
|
||||
'string',
|
||||
array($this->file => array($l => $d)),
|
||||
false,
|
||||
array(
|
||||
'Happy to Help' => true,
|
||||
'Hope that Helps' => true,
|
||||
'HAIL THE HAND!' => true
|
||||
),
|
||||
array(
|
||||
'happy' => 'Happy to Help',
|
||||
'hope' => 'Hope that Helps',
|
||||
'hail' => 'HAIL THE HAND!'
|
||||
)
|
||||
)
|
||||
);
|
||||
|
||||
$this->assertError('Invalid type for configuration directive');
|
||||
$this->assertNoErrors();
|
||||
|
||||
|
||||
|
||||
// define a directive with inconsistent type
|
||||
HTMLPurifier_ConfigSchema::define(
|
||||
'Core', 'Foobaz', 10, 'string',
|
||||
'If we say string, we should mean it, not integer 10.'
|
||||
$this->expectError('Cannot define alias to value that is not allowed');
|
||||
CS::defineValueAliases(
|
||||
'Abbrev', 'HTH', array(
|
||||
'head' => 'Head to Head'
|
||||
)
|
||||
);
|
||||
|
||||
$this->assertError('Default value does not match directive type');
|
||||
$this->assertNoErrors();
|
||||
|
||||
|
||||
|
||||
// define a directive that allows null
|
||||
HTMLPurifier_ConfigSchema::define(
|
||||
'Core', 'Foobaz', null, 'string/null',
|
||||
'Nulls are allowed if you add on /null, cool huh?'
|
||||
$this->expectError('Cannot define alias over allowed value');
|
||||
CS::defineValueAliases(
|
||||
'Abbrev', 'HTH', array(
|
||||
'Hope that Helps' => 'Happy to Help'
|
||||
)
|
||||
);
|
||||
|
||||
$this->assertNoErrors();
|
||||
}
|
||||
|
||||
function testAlias() {
|
||||
CS::defineNamespace('Home', 'Sweet home.');
|
||||
CS::define('Home', 'Rug', 3, 'int', 'ID.');
|
||||
CS::defineAlias('Home', 'Carpet', 'Home', 'Rug');
|
||||
|
||||
|
||||
// define a directive with bad characters
|
||||
HTMLPurifier_ConfigSchema::define(
|
||||
'Core', 'Core.Attr', 10, 'int',
|
||||
'No periods! >:-('
|
||||
$this->assertTrue(!isset($this->our_copy->defaults['Home']['Carpet']));
|
||||
$this->assertIdentical($this->our_copy->info['Home']['Carpet'],
|
||||
new HTMLPurifier_ConfigEntity_DirectiveAlias('Home', 'Rug')
|
||||
);
|
||||
|
||||
$this->assertError('Directive name must be alphanumeric');
|
||||
$this->assertNoErrors();
|
||||
$this->expectError('Cannot define directive alias in undefined namespace');
|
||||
CS::defineAlias('Store', 'Rug', 'Home', 'Rug');
|
||||
|
||||
// define a namespace with bad characters
|
||||
HTMLPurifier_ConfigSchema::defineNamespace(
|
||||
'Foobar&Gromit', $description
|
||||
);
|
||||
$this->expectError('Directive name must be alphanumeric');
|
||||
CS::defineAlias('Home', 'R.g', 'Home', 'Rug');
|
||||
|
||||
$this->assertError('Namespace name must be alphanumeric');
|
||||
$this->assertNoErrors();
|
||||
CS::define('Home', 'Rugger', 'Bob Max', 'string', 'Name of.');
|
||||
$this->expectError('Cannot define alias over directive');
|
||||
CS::defineAlias('Home', 'Rugger', 'Home', 'Rug');
|
||||
|
||||
$this->expectError('Cannot define alias to undefined directive');
|
||||
CS::defineAlias('Home', 'Rug2', 'Home', 'Rugavan');
|
||||
|
||||
$this->expectError('Cannot define alias to alias');
|
||||
CS::defineAlias('Home', 'Rug2', 'Home', 'Carpet');
|
||||
}
|
||||
|
||||
function assertValid($var, $type, $ret = null) {
|
||||
@@ -270,25 +261,32 @@ class HTMLPurifier_ConfigSchemaTest extends UnitTestCase
|
||||
|
||||
$this->assertValid('foobar', 'string');
|
||||
$this->assertValid('FOOBAR', 'istring', 'foobar');
|
||||
|
||||
$this->assertValid(34, 'int');
|
||||
|
||||
$this->assertValid(3.34, 'float');
|
||||
|
||||
$this->assertValid(false, 'bool');
|
||||
$this->assertValid(0, 'bool', false);
|
||||
$this->assertValid(1, 'bool', true);
|
||||
$this->assertInvalid(34, 'bool');
|
||||
$this->assertInvalid(null, 'bool');
|
||||
$this->assertValid(array('1', '2', '3'), 'list');
|
||||
$this->assertValid(array('1' => true, '2' => true), 'lookup');
|
||||
$this->assertValid(array('1', '2'), 'lookup', array('1' => true, '2' => true));
|
||||
$this->assertValid(array('foo' => 'bar'), 'hash');
|
||||
$this->assertInvalid(array(0 => 'moo'), 'hash');
|
||||
$this->assertValid(array(1 => 'moo'), 'hash');
|
||||
$this->assertValid(23, 'mixed');
|
||||
$this->assertValid('foo,bar, cow', 'list', array('foo', 'bar', 'cow'));
|
||||
$this->assertValid('foo,bar', 'lookup', array('foo' => true, 'bar' => true));
|
||||
$this->assertValid('true', 'bool', true);
|
||||
$this->assertValid('false', 'bool', false);
|
||||
$this->assertValid('1', 'bool', true);
|
||||
$this->assertInvalid(34, 'bool');
|
||||
$this->assertInvalid(null, 'bool');
|
||||
|
||||
$this->assertValid(array('1', '2', '3'), 'list');
|
||||
$this->assertValid('foo,bar, cow', 'list', array('foo', 'bar', 'cow'));
|
||||
|
||||
$this->assertValid(array('1' => true, '2' => true), 'lookup');
|
||||
$this->assertValid(array('1', '2'), 'lookup', array('1' => true, '2' => true));
|
||||
$this->assertValid('foo,bar', 'lookup', array('foo' => true, 'bar' => true));
|
||||
|
||||
$this->assertValid(array('foo' => 'bar'), 'hash');
|
||||
$this->assertValid(array(1 => 'moo'), 'hash');
|
||||
$this->assertInvalid(array(0 => 'moo'), 'hash');
|
||||
|
||||
$this->assertValid(23, 'mixed');
|
||||
|
||||
}
|
||||
|
||||
@@ -318,12 +316,12 @@ class HTMLPurifier_ConfigSchemaTest extends UnitTestCase
|
||||
function testMungeFilename() {
|
||||
|
||||
$this->assertMungeFilename(
|
||||
'C:\\php\\libs\\htmlpurifier\\library\\HTMLPurifier\\AttrDef.php',
|
||||
'C:\\php\\My Libraries\\htmlpurifier\\library\\HTMLPurifier\\AttrDef.php',
|
||||
'HTMLPurifier/AttrDef.php'
|
||||
);
|
||||
|
||||
$this->assertMungeFilename(
|
||||
'C:\\php\\libs\\htmlpurifier\\library\\HTMLPurifier.php',
|
||||
'C:\\php\\My Libraries\\htmlpurifier\\library\\HTMLPurifier.php',
|
||||
'HTMLPurifier.php'
|
||||
);
|
||||
|
||||
|
2
tests/HTMLPurifier/ConfigTest-create.ini
Normal file
@@ -0,0 +1,2 @@
[Cake]
Sprinkles = 42
4
tests/HTMLPurifier/ConfigTest-loadIni.ini
Normal file
@@ -0,0 +1,4 @@
[Shortcut]
Copy = q
Cut = t
Paste = p
@@ -2,6 +2,10 @@
|
||||
|
||||
require_once 'HTMLPurifier/Config.php';
|
||||
|
||||
if (!class_exists('CS')) {
|
||||
class CS extends HTMLPurifier_ConfigSchema {}
|
||||
}
|
||||
|
||||
class HTMLPurifier_ConfigTest extends UnitTestCase
|
||||
{
|
||||
|
||||
@@ -16,109 +20,199 @@ class HTMLPurifier_ConfigTest extends UnitTestCase
|
||||
|
||||
function tearDown() {
|
||||
HTMLPurifier_ConfigSchema::instance($this->old_copy);
|
||||
tally_errors();
|
||||
}
|
||||
|
||||
function test() {
|
||||
// test functionality based on ConfigSchema
|
||||
|
||||
function testNormal() {
|
||||
CS::defineNamespace('Element', 'Chemical substances that cannot be further decomposed');
|
||||
|
||||
HTMLPurifier_ConfigSchema::defineNamespace('Core', 'Corestuff');
|
||||
HTMLPurifier_ConfigSchema::defineNamespace('Attr', 'Attributes');
|
||||
HTMLPurifier_ConfigSchema::defineNamespace('Extension', 'Extensible');
|
||||
|
||||
HTMLPurifier_ConfigSchema::define(
|
||||
'Core', 'Key', false, 'bool', 'A boolean directive.'
|
||||
);
|
||||
HTMLPurifier_ConfigSchema::define(
|
||||
'Attr', 'Key', 42, 'int', 'An integer directive.'
|
||||
);
|
||||
HTMLPurifier_ConfigSchema::define(
|
||||
'Extension', 'Pert', 'foo', 'string', 'A string directive.'
|
||||
);
|
||||
HTMLPurifier_ConfigSchema::define(
|
||||
'Core', 'Encoding', 'utf-8', 'istring', 'Case insensitivity!'
|
||||
);
|
||||
|
||||
HTMLPurifier_ConfigSchema::define(
|
||||
'Extension', 'CanBeNull', null, 'string/null', 'Null or string!'
|
||||
);
|
||||
|
||||
HTMLPurifier_ConfigSchema::defineAllowedValues(
|
||||
'Extension', 'Pert', array('foo', 'moo')
|
||||
);
|
||||
HTMLPurifier_ConfigSchema::defineValueAliases(
|
||||
'Extension', 'Pert', array('cow' => 'moo')
|
||||
);
|
||||
HTMLPurifier_ConfigSchema::defineAllowedValues(
|
||||
'Core', 'Encoding', array('utf-8', 'iso-8859-1')
|
||||
);
|
||||
CS::define('Element', 'Abbr', 'H', 'string', 'Abbreviation of element name.');
|
||||
CS::define('Element', 'Name', 'hydrogen', 'istring', 'Full name of atoms.');
|
||||
CS::define('Element', 'Number', 1, 'int', 'Atomic number, is identity.');
|
||||
CS::define('Element', 'Mass', 1.00794, 'float', 'Atomic mass.');
|
||||
CS::define('Element', 'Radioactive', false, 'bool', 'Does it have rapid decay?');
|
||||
CS::define('Element', 'Isotopes', array(1 => true, 2 => true, 3 => true), 'lookup',
|
||||
'What numbers of neutrons for this element have been observed?');
|
||||
CS::define('Element', 'Traits', array('nonmetallic', 'odorless', 'flammable'), 'list',
|
||||
'What are general properties of the element?');
|
||||
CS::define('Element', 'IsotopeNames', array(1 => 'protium', 2 => 'deuterium', 3 => 'tritium'), 'hash',
|
||||
'Lookup hash of neutron counts to formal names.');
|
||||
CS::define('Element', 'Object', new stdClass(), 'mixed', 'Model representation.');
|
||||
|
||||
$config = HTMLPurifier_Config::createDefault();
|
||||
|
||||
// test default value retrieval
|
||||
$this->assertIdentical($config->get('Core', 'Key'), false);
|
||||
$this->assertIdentical($config->get('Attr', 'Key'), 42);
|
||||
$this->assertIdentical($config->get('Extension', 'Pert'), 'foo');
|
||||
$this->assertIdentical($config->get('Element', 'Abbr'), 'H');
|
||||
$this->assertIdentical($config->get('Element', 'Name'), 'hydrogen');
|
||||
$this->assertIdentical($config->get('Element', 'Number'), 1);
|
||||
$this->assertIdentical($config->get('Element', 'Mass'), 1.00794);
|
||||
$this->assertIdentical($config->get('Element', 'Radioactive'), false);
|
||||
$this->assertIdentical($config->get('Element', 'Isotopes'), array(1 => true, 2 => true, 3 => true));
|
||||
$this->assertIdentical($config->get('Element', 'Traits'), array('nonmetallic', 'odorless', 'flammable'));
|
||||
$this->assertIdentical($config->get('Element', 'IsotopeNames'), array(1 => 'protium', 2 => 'deuterium', 3 => 'tritium'));
|
||||
$this->assertIdentical($config->get('Element', 'Object'), new stdClass());
|
||||
|
||||
// set some values
|
||||
$config->set('Core', 'Key', true);
|
||||
$this->assertIdentical($config->get('Core', 'Key'), true);
|
||||
// test setting values
|
||||
$config->set('Element', 'Abbr', 'Pu');
|
||||
$config->set('Element', 'Name', 'PLUTONIUM'); // test decaps
|
||||
$config->set('Element', 'Number', '94'); // test parsing
|
||||
$config->set('Element', 'Mass', '244.'); // test parsing
|
||||
$config->set('Element', 'Radioactive', true);
|
||||
$config->set('Element', 'Isotopes', array(238, 239)); // test inversion
|
||||
$config->set('Element', 'Traits', 'nuclear, heavy, actinide'); // test parsing
|
||||
$config->set('Element', 'IsotopeNames', array(238 => 'Plutonium-238', 239 => 'Plutonium-239'));
|
||||
$config->set('Element', 'Object', false); // unmodeled
|
||||
|
||||
// try to retrieve undefined value
|
||||
$config->get('Core', 'NotDefined');
|
||||
$this->assertError('Cannot retrieve value of undefined directive');
|
||||
$this->assertNoErrors();
|
||||
// test value retrieval
|
||||
$this->assertIdentical($config->get('Element', 'Abbr'), 'Pu');
|
||||
$this->assertIdentical($config->get('Element', 'Name'), 'plutonium');
|
||||
$this->assertIdentical($config->get('Element', 'Number'), 94);
|
||||
$this->assertIdentical($config->get('Element', 'Mass'), 244.);
|
||||
$this->assertIdentical($config->get('Element', 'Radioactive'), true);
|
||||
$this->assertIdentical($config->get('Element', 'Isotopes'), array(238 => true, 239 => true));
|
||||
$this->assertIdentical($config->get('Element', 'Traits'), array('nuclear', 'heavy', 'actinide'));
|
||||
$this->assertIdentical($config->get('Element', 'IsotopeNames'), array(238 => 'Plutonium-238', 239 => 'Plutonium-239'));
|
||||
$this->assertIdentical($config->get('Element', 'Object'), false);
|
||||
|
||||
// try to set undefined value
|
||||
$config->set('Foobar', 'Key', 'foobar');
|
||||
$this->assertError('Cannot set undefined directive to value');
|
||||
$this->assertNoErrors();
|
||||
// errors
|
||||
|
||||
// try to set not allowed value
|
||||
$config->set('Extension', 'Pert', 'wizard');
|
||||
$this->assertError('Value not supported');
|
||||
$this->assertNoErrors();
|
||||
$this->expectError('Cannot retrieve value of undefined directive');
|
||||
$config->get('Element', 'Metal');
|
||||
|
||||
// try to set not allowed value
|
||||
$config->set('Extension', 'Pert', 34);
|
||||
$this->assertError('Value is of invalid type');
|
||||
$this->assertNoErrors();
|
||||
$this->expectError('Cannot set undefined directive to value');
|
||||
$config->set('Element', 'Metal', true);
|
||||
|
||||
// set aliased value
|
||||
$config->set('Extension', 'Pert', 'cow');
|
||||
$this->assertNoErrors();
|
||||
$this->assertIdentical($config->get('Extension', 'Pert'), 'moo');
|
||||
$this->expectError('Value is of invalid type');
|
||||
$config->set('Element', 'Radioactive', 'very');
|
||||
|
||||
// case-insensitive attempt to set value that is allowed
|
||||
$config->set('Core', 'Encoding', 'ISO-8859-1');
|
||||
$this->assertNoErrors();
|
||||
$this->assertIdentical($config->get('Core', 'Encoding'), 'iso-8859-1');
|
||||
}
|
||||
|
||||
function testEnumerated() {
|
||||
|
||||
// set null to directive that allows null
|
||||
$config->set('Extension', 'CanBeNull', null);
|
||||
$this->assertNoErrors();
|
||||
$this->assertIdentical($config->get('Extension', 'CanBeNull'), null);
|
||||
CS::defineNamespace('Instrument', 'Of the musical type.');
|
||||
|
||||
$config->set('Extension', 'CanBeNull', 'foobar');
|
||||
$this->assertNoErrors();
|
||||
$this->assertIdentical($config->get('Extension', 'CanBeNull'), 'foobar');
|
||||
// case sensitive
|
||||
CS::define('Instrument', 'Manufacturer', 'Yamaha', 'string', 'Who made it?');
|
||||
CS::defineAllowedValues('Instrument', 'Manufacturer', array(
|
||||
'Yamaha', 'Conn-Selmer', 'Vandoren', 'Laubin', 'Buffet', 'other'));
|
||||
CS::defineValueAliases('Instrument', 'Manufacturer', array(
|
||||
'Selmer' => 'Conn-Selmer'));
|
||||
|
||||
// set null to directive that doesn't allow null
|
||||
$config->set('Extension', 'Pert', null);
|
||||
$this->assertError('Value is of invalid type');
|
||||
$this->assertNoErrors();
|
||||
// case insensitive
|
||||
CS::define('Instrument', 'Family', 'woodwind', 'istring', 'What family is it?');
|
||||
CS::defineAllowedValues('Instrument', 'Family', array(
|
||||
'brass', 'woodwind', 'percussion', 'string', 'keyboard', 'electronic'));
|
||||
CS::defineValueAliases('Instrument', 'Family', array(
|
||||
'synth' => 'electronic'));
|
||||
|
||||
$config = HTMLPurifier_Config::createDefault();
|
||||
|
||||
// case sensitive
|
||||
|
||||
$config->set('Instrument', 'Manufacturer', 'Vandoren');
|
||||
$this->assertIdentical($config->get('Instrument', 'Manufacturer'), 'Vandoren');
|
||||
|
||||
$config->set('Instrument', 'Manufacturer', 'Selmer');
|
||||
$this->assertIdentical($config->get('Instrument', 'Manufacturer'), 'Conn-Selmer');
|
||||
|
||||
$this->expectError('Value not supported');
|
||||
$config->set('Instrument', 'Manufacturer', 'buffet');
|
||||
|
||||
// case insensitive
|
||||
|
||||
$config->set('Instrument', 'Family', 'brass');
|
||||
$this->assertIdentical($config->get('Instrument', 'Family'), 'brass');
|
||||
|
||||
$config->set('Instrument', 'Family', 'PERCUSSION');
|
||||
$this->assertIdentical($config->get('Instrument', 'Family'), 'percussion');
|
||||
|
||||
$config->set('Instrument', 'Family', 'synth');
|
||||
$this->assertIdentical($config->get('Instrument', 'Family'), 'electronic');
|
||||
|
||||
$config->set('Instrument', 'Family', 'Synth');
|
||||
$this->assertIdentical($config->get('Instrument', 'Family'), 'electronic');
|
||||
|
||||
}
|
||||
|
||||
function testNull() {
|
||||
|
||||
CS::defineNamespace('ReportCard', 'It is for grades.');
|
||||
CS::define('ReportCard', 'English', null, 'string/null', 'Grade from English class.');
|
||||
CS::define('ReportCard', 'Absences', 0, 'int', 'How many times missing from school?');
|
||||
|
||||
$config = HTMLPurifier_Config::createDefault();
|
||||
|
||||
$config->set('ReportCard', 'English', 'B-');
|
||||
$this->assertIdentical($config->get('ReportCard', 'English'), 'B-');
|
||||
|
||||
$config->set('ReportCard', 'English', null); // not yet graded
|
||||
$this->assertIdentical($config->get('ReportCard', 'English'), null);
|
||||
|
||||
// error
|
||||
$this->expectError('Value is of invalid type');
|
||||
$config->set('ReportCard', 'Absences', null);
|
||||
|
||||
}
|
||||
|
||||
function testAliases() {
|
||||
|
||||
HTMLPurifier_ConfigSchema::defineNamespace('Home', 'Sweet home.');
|
||||
HTMLPurifier_ConfigSchema::define('Home', 'Rug', 3, 'int', 'ID.');
|
||||
HTMLPurifier_ConfigSchema::defineAlias('Home', 'Carpet', 'Home', 'Rug');
|
||||
|
||||
$config = HTMLPurifier_Config::createDefault();
|
||||
|
||||
$this->assertEqual($config->get('Home', 'Rug'), 3);
|
||||
|
||||
$this->expectError('Cannot get value from aliased directive, use real name');
|
||||
$config->get('Home', 'Carpet');
|
||||
|
||||
$config->set('Home', 'Carpet', 999);
|
||||
$this->assertEqual($config->get('Home', 'Rug'), 999);
|
||||
|
||||
}
|
||||
|
||||
// test functionality based on method
|
||||
|
||||
function test_getBatch() {
|
||||
|
||||
CS::defineNamespace('Variables', 'Changing quantities in equation.');
|
||||
CS::define('Variables', 'TangentialAcceleration', 'a_tan', 'string', 'In m/s^2');
|
||||
CS::define('Variables', 'AngularAcceleration', 'alpha', 'string', 'In rad/s^2');
|
||||
|
||||
$config = HTMLPurifier_Config::createDefault();
|
||||
|
||||
// grab a namespace
|
||||
$config->set('Attr', 'Key', 0xBEEF);
|
||||
$this->assertIdentical(
|
||||
$config->getBatch('Attr'),
|
||||
$config->getBatch('Variables'),
|
||||
array(
|
||||
'Key' => 0xBEEF
|
||||
'TangentialAcceleration' => 'a_tan',
|
||||
'AngularAcceleration' => 'alpha'
|
||||
)
|
||||
);
|
||||
|
||||
// grab a non-existent namespace
|
||||
$config->getBatch('FurnishedGoods');
|
||||
$this->assertError('Cannot retrieve undefined namespace');
|
||||
$this->assertNoErrors();
|
||||
$this->expectError('Cannot retrieve undefined namespace');
|
||||
$config->getBatch('Constants');
|
||||
|
||||
}
|
||||
|
||||
function test_loadIni() {
|
||||
|
||||
CS::defineNamespace('Shortcut', 'Keyboard shortcuts for commands');
|
||||
CS::define('Shortcut', 'Copy', 'c', 'istring', 'Copy text');
|
||||
CS::define('Shortcut', 'Paste', 'v', 'istring', 'Paste clipboard');
|
||||
CS::define('Shortcut', 'Cut', 'x', 'istring', 'Cut text');
|
||||
|
||||
$config = HTMLPurifier_Config::createDefault();
|
||||
|
||||
$config->loadIni(dirname(__FILE__) . '/ConfigTest-loadIni.ini');
|
||||
|
||||
$this->assertIdentical($config->get('Shortcut', 'Copy'), 'q');
|
||||
$this->assertIdentical($config->get('Shortcut', 'Paste'), 'p');
|
||||
$this->assertIdentical($config->get('Shortcut', 'Cut'), 't');
|
||||
|
||||
}
|
||||
|
||||
@@ -148,7 +242,7 @@ class HTMLPurifier_ConfigTest extends UnitTestCase
|
||||
'Zoo', 'Others', array(), 'list', 'Other animals we have one of.'
|
||||
);
|
||||
|
||||
$config_manual = HTMLPurifier_Config::createDefault();
|
||||
$config_manual = HTMLPurifier_Config::createDefault();
|
||||
$config_loadabbr = HTMLPurifier_Config::createDefault();
|
||||
$config_loadfull = HTMLPurifier_Config::createDefault();
|
||||
|
||||
@@ -197,6 +291,10 @@ class HTMLPurifier_ConfigTest extends UnitTestCase
|
||||
$created_config = HTMLPurifier_Config::create(array('Cake.Sprinkles' => 42));
|
||||
$this->assertEqual($config, $created_config);
|
||||
|
||||
// test loadIni
|
||||
$created_config = HTMLPurifier_Config::create(dirname(__FILE__) . '/ConfigTest-create.ini');
|
||||
$this->assertEqual($config, $created_config);
|
||||
|
||||
}
|
||||
|
||||
}
|
||||
|
@@ -20,7 +20,7 @@ class HTMLPurifier_ContextTest extends UnitTestCase
|
||||
|
||||
$this->assertFalse($this->context->exists('IDAccumulator'));
|
||||
|
||||
$accumulator =& new HTMLPurifier_IDAccumulatorMock($this);
|
||||
$accumulator = new HTMLPurifier_IDAccumulatorMock($this);
|
||||
$this->context->register('IDAccumulator', $accumulator);
|
||||
$this->assertTrue($this->context->exists('IDAccumulator'));
|
||||
|
||||
@@ -29,12 +29,13 @@ class HTMLPurifier_ContextTest extends UnitTestCase
|
||||
|
||||
$this->context->destroy('IDAccumulator');
|
||||
$this->assertFalse($this->context->exists('IDAccumulator'));
|
||||
|
||||
$this->expectError('Attempted to retrieve non-existent variable');
|
||||
$accumulator_3 =& $this->context->get('IDAccumulator');
|
||||
$this->assertError('Attempted to retrieve non-existent variable');
|
||||
$this->assertNull($accumulator_3);
|
||||
|
||||
$this->expectError('Attempted to destroy non-existent variable');
|
||||
$this->context->destroy('IDAccumulator');
|
||||
$this->assertError('Attempted to destroy non-existent variable');
|
||||
|
||||
}
|
||||
|
||||
@@ -42,15 +43,13 @@ class HTMLPurifier_ContextTest extends UnitTestCase
|
||||
|
||||
$var = true;
|
||||
$this->context->register('OnceOnly', $var);
|
||||
$this->assertNoErrors();
|
||||
|
||||
$this->expectError('Name collision, cannot re-register');
|
||||
$this->context->register('OnceOnly', $var);
|
||||
$this->assertError('Name collision, cannot re-register');
|
||||
|
||||
// destroy it, now registration is okay
|
||||
$this->context->destroy('OnceOnly');
|
||||
$this->context->register('OnceOnly', $var);
|
||||
$this->assertNoErrors();
|
||||
|
||||
}
|
||||
|
||||
|
@@ -5,17 +5,16 @@ require_once 'HTMLPurifier/Encoder.php';
|
||||
class HTMLPurifier_EncoderTest extends UnitTestCase
|
||||
{
|
||||
|
||||
var $Encoder;
|
||||
var $_entity_lookup;
|
||||
|
||||
function setUp() {
|
||||
$this->Encoder = new HTMLPurifier_Encoder();
|
||||
$this->_entity_lookup = HTMLPurifier_EntityLookup::instance();
|
||||
}
|
||||
|
||||
function assertCleanUTF8($string, $expect = null) {
|
||||
if ($expect === null) $expect = $string;
|
||||
$this->assertIdentical($this->Encoder->cleanUTF8($string), $expect, 'iconv: %s');
|
||||
$this->assertIdentical($this->Encoder->cleanUTF8($string, true), $expect, 'PHP: %s');
|
||||
$this->assertIdentical(HTMLPurifier_Encoder::cleanUTF8($string), $expect, 'iconv: %s');
|
||||
$this->assertIdentical(HTMLPurifier_Encoder::cleanUTF8($string, true), $expect, 'PHP: %s');
|
||||
}
|
||||
|
||||
function test_cleanUTF8() {
|
||||
@@ -35,7 +34,7 @@ class HTMLPurifier_EncoderTest extends UnitTestCase
|
||||
|
||||
// UTF-8 means that we don't touch it
|
||||
$this->assertIdentical(
|
||||
$this->Encoder->convertToUTF8("\xF6", $config, $context),
|
||||
HTMLPurifier_Encoder::convertToUTF8("\xF6", $config, $context),
|
||||
"\xF6" // this is invalid
|
||||
);
|
||||
$this->assertNoErrors();
|
||||
@@ -44,14 +43,14 @@ class HTMLPurifier_EncoderTest extends UnitTestCase
|
||||
|
||||
// Now it gets converted
|
||||
$this->assertIdentical(
|
||||
$this->Encoder->convertToUTF8("\xF6", $config, $context),
|
||||
HTMLPurifier_Encoder::convertToUTF8("\xF6", $config, $context),
|
||||
"\xC3\xB6"
|
||||
);
|
||||
|
||||
$config->set('Test', 'ForceNoIconv', true);
|
||||
|
||||
$this->assertIdentical(
|
||||
$this->Encoder->convertToUTF8("\xF6", $config, $context),
|
||||
HTMLPurifier_Encoder::convertToUTF8("\xF6", $config, $context),
|
||||
"\xC3\xB6"
|
||||
);
|
||||
|
||||
@@ -61,9 +60,12 @@ class HTMLPurifier_EncoderTest extends UnitTestCase
|
||||
$config = HTMLPurifier_Config::createDefault();
|
||||
$context = new HTMLPurifier_Context();
|
||||
|
||||
// zhong-wen
|
||||
$chinese = "\xE4\xB8\xAD\xE6\x96\x87 (Chinese)";
|
||||
|
||||
// UTF-8 means that we don't touch it
|
||||
$this->assertIdentical(
|
||||
$this->Encoder->convertFromUTF8("\xC3\xB6", $config, $context),
|
||||
HTMLPurifier_Encoder::convertFromUTF8("\xC3\xB6", $config, $context),
|
||||
"\xC3\xB6"
|
||||
);
|
||||
|
||||
@@ -71,15 +73,57 @@ class HTMLPurifier_EncoderTest extends UnitTestCase
|
||||
|
||||
// Now it gets converted
|
||||
$this->assertIdentical(
|
||||
$this->Encoder->convertFromUTF8("\xC3\xB6", $config, $context),
|
||||
HTMLPurifier_Encoder::convertFromUTF8("\xC3\xB6", $config, $context),
|
||||
"\xF6"
|
||||
);
|
||||
|
||||
if (function_exists('iconv')) {
|
||||
// iconv has its own way
|
||||
$this->assertIdentical(
|
||||
HTMLPurifier_Encoder::convertFromUTF8($chinese, $config, $context),
|
||||
" (Chinese)"
|
||||
);
|
||||
}
|
||||
|
||||
// Plain PHP implementation has slightly different behavior
|
||||
$config->set('Test', 'ForceNoIconv', true);
|
||||
$this->assertIdentical(
|
||||
HTMLPurifier_Encoder::convertFromUTF8("\xC3\xB6", $config, $context),
|
||||
"\xF6"
|
||||
);
|
||||
|
||||
$this->assertIdentical(
|
||||
$this->Encoder->convertFromUTF8("\xC3\xB6", $config, $context),
|
||||
"\xF6"
|
||||
HTMLPurifier_Encoder::convertFromUTF8($chinese, $config, $context),
|
||||
"?? (Chinese)"
|
||||
);
|
||||
|
||||
// Preserve the characters!
|
||||
|
||||
$config->set('Core', 'EscapeNonASCIICharacters', true);
|
||||
$this->assertIdentical(
|
||||
HTMLPurifier_Encoder::convertFromUTF8($chinese, $config, $context),
|
||||
"&#20013;&#25991; (Chinese)"
|
||||
);
|
||||
|
||||
}
|
||||
|
||||
function test_convertToASCIIDumbLossless() {
|
||||
|
||||
// Uppercase thorn letter
|
||||
$this->assertIdentical(
|
||||
HTMLPurifier_Encoder::convertToASCIIDumbLossless("\xC3\x9Eorn"),
|
||||
"&#222;orn"
|
||||
);
|
||||
|
||||
$this->assertIdentical(
|
||||
HTMLPurifier_Encoder::convertToASCIIDumbLossless("an"),
|
||||
"an"
|
||||
);
|
||||
|
||||
// test up to four bytes
|
||||
$this->assertIdentical(
|
||||
HTMLPurifier_Encoder::convertToASCIIDumbLossless("\xF3\xA0\x80\xA0"),
|
||||
"&#917536;"
|
||||
);
|
||||
|
||||
}
|
||||
|
@@ -16,7 +16,9 @@ class HTMLPurifier_LexerTest extends UnitTestCase
|
||||
|
||||
$this->DirectLex = new HTMLPurifier_Lexer_DirectLex();
|
||||
|
||||
if ( $GLOBALS['HTMLPurifierTest']['PEAR'] ) {
|
||||
if ( $GLOBALS['HTMLPurifierTest']['PEAR'] &&
|
||||
((error_reporting() & E_STRICT) != E_STRICT)
|
||||
) {
|
||||
$this->_has_pear = true;
|
||||
require_once 'HTMLPurifier/Lexer/PEARSax3.php';
|
||||
$this->PEARSax3 = new HTMLPurifier_Lexer_PEARSax3();
|
||||
@@ -324,4 +326,4 @@ class HTMLPurifier_LexerTest extends UnitTestCase
|
||||
|
||||
}
|
||||
|
||||
?>
|
||||
?>
|
||||
|
35
tests/HTMLPurifier/SimpleTest/Reporter.php
Normal file
@@ -0,0 +1,35 @@
<?php

class HTMLPurifier_SimpleTest_Reporter extends HTMLReporter
{

    function paintHeader($test_name) {
        parent::paintHeader($test_name);
        $test_file = $GLOBALS['HTMLPurifierTest']['File'];
?>
<form action="" method="get" id="select">
    <select name="f">
        <option value="" style="font-weight:bold;"<?php if(!$test_file) {echo ' selected';} ?>>All Tests</option>
        <?php foreach($GLOBALS['HTMLPurifierTest']['Files'] as $file) { ?>
            <option value="<?php echo $file ?>"<?php
                if ($test_file == $file) echo ' selected';
            ?>><?php echo $file ?></option>
        <?php } ?>
    </select>
    <input type="submit" value="Go">
</form>
<?php
        flush();
    }

    function _getCss() {
        $css = parent::_getCss();
        $css .= '
#select {position:absolute;top:0.2em;right:0.2em;}
';
        return $css;
    }

}

?>
@@ -91,11 +91,10 @@ class HTMLPurifier_Strategy_FixNestingTest extends HTMLPurifier_StrategyHarness
|
||||
'<div>Reject</div>', 'Reject', array('HTML.Parent' => 'span')
|
||||
);
|
||||
|
||||
$this->expectError('Cannot use unrecognized element as parent.');
|
||||
$this->assertResult(
|
||||
'<div>Accept</div>', true, array('HTML.Parent' => 'script')
|
||||
);
|
||||
$this->assertError('Cannot use unrecognized element as parent.');
|
||||
$this->assertNoErrors();
|
||||
|
||||
}
|
||||
|
||||
|
@@ -154,7 +154,7 @@ class HTMLPurifier_Strategy_ValidateAttributesTest extends
|
||||
'<bdo dir="ltr">Invalid value!</bdo>'
|
||||
);
|
||||
|
||||
// comparison check for test 20
|
||||
// see above, behavior is subtly different
|
||||
$this->assertResult(
|
||||
'<span dir="blahblah">Invalid value!</span>',
|
||||
'<span>Invalid value!</span>'
|
||||
@@ -176,4 +176,4 @@ class HTMLPurifier_Strategy_ValidateAttributesTest extends
|
||||
|
||||
}
|
||||
|
||||
?>
|
||||
?>
|
||||
|
11
tests/generate_mock_once.func.php
Normal file
@@ -0,0 +1,11 @@
<?php

// since Mocks can't be called from within test files, we need to do
// a little jumping through hoops to generate them
function generate_mock_once($name) {
    $mock_name = $name . 'Mock';
    if (class_exists($mock_name)) return false;
    Mock::generate($name, $mock_name);
}

?>
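The guard in generate_mock_once() makes repeated calls safe: the second call for the same class returns false instead of generating the mock again. The sketch below shows that behaviour in isolation. The Mock class here is a stand-in written for this example (it only aliases the class), not SimpleTest's real implementation, and Example is a hypothetical class.

```php
<?php
// Stand-in for SimpleTest's Mock class (assumption: present only so the
// sketch runs without SimpleTest). It fakes generation by aliasing the class.
class Mock {
    public static function generate($name, $mock_name) {
        class_alias($name, $mock_name);
    }
}

// Hypothetical class to be mocked.
class Example {}

function generate_mock_once($name) {
    $mock_name = $name . 'Mock';
    if (class_exists($mock_name)) return false; // already generated
    Mock::generate($name, $mock_name);
}

$first  = generate_mock_once('Example'); // null: mock class was created
$second = generate_mock_once('Example'); // false: the guard short-circuits
```

Note that the function returns nothing on success, so callers can only distinguish the two cases with a strict comparison against false.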
140
tests/index.php
@@ -1,146 +1,82 @@
|
||||
<?php
|
||||
|
||||
error_reporting(E_ALL);
|
||||
// call one file using /?f=FileTest.php , see $test_files array for
|
||||
// valid values
|
||||
|
||||
error_reporting(E_ALL | E_STRICT);
|
||||
define('HTMLPurifierTest', 1);
|
||||
|
||||
// wishlist: automated calling of this file from multiple PHP versions so we
|
||||
// don't have to constantly switch around
|
||||
|
||||
// configuration
|
||||
// default settings (protect against register_globals)
|
||||
$GLOBALS['HTMLPurifierTest'] = array();
|
||||
$GLOBALS['HTMLPurifierTest']['PEAR'] = false; // do PEAR tests
|
||||
$simpletest_location = 'simpletest/'; // reasonable guess
|
||||
|
||||
$simpletest_location = 'simpletest/';
|
||||
if (file_exists('../test-settings.php')) include_once '../test-settings.php';
|
||||
// load SimpleTest
|
||||
@include '../test-settings.php'; // don't mind if it isn't there
|
||||
require_once $simpletest_location . 'unit_tester.php';
|
||||
require_once $simpletest_location . 'reporter.php';
|
||||
require_once $simpletest_location . 'mock_objects.php';
|
||||
require_once 'HTMLPurifier/SimpleTest/Reporter.php';
|
||||
|
||||
// configure PEAR if necessary
|
||||
// load Debugger
|
||||
require_once 'Debugger.php';
|
||||
|
||||
// load convenience functions
|
||||
require_once 'generate_mock_once.func.php';
|
||||
require_once 'path2class.func.php';
|
||||
require_once 'tally_errors.func.php'; // compat
|
||||
|
||||
// initialize PEAR (optional)
|
||||
if ( is_string($GLOBALS['HTMLPurifierTest']['PEAR']) ) {
|
||||
// if PEAR is true, we assume that there's no need to
|
||||
// add it to the path
|
||||
set_include_path($GLOBALS['HTMLPurifierTest']['PEAR'] . PATH_SEPARATOR .
|
||||
get_include_path());
|
||||
}
|
||||
|
||||
// debugger
|
||||
require_once 'Debugger.php';
|
||||
|
||||
// emulates inserting a dir called HTMLPurifier into your class dir
|
||||
// initialize and load HTML Purifier
|
||||
set_include_path('../library' . PATH_SEPARATOR . get_include_path());
|
||||
|
||||
// since Mocks can't be called from within test files, we need to do
|
||||
// a little jumping through hoops to generate them
|
||||
function generate_mock_once($name) {
|
||||
$mock_name = $name . 'Mock';
|
||||
if (class_exists($mock_name)) return false;
|
||||
Mock::generate($name, $mock_name);
|
||||
}
|
||||
|
||||
// this has to be defined before we do any includes of library files
|
||||
require_once 'HTMLPurifier.php';
|
||||
|
||||
// define callable test files
|
||||
// load tests
|
||||
$test_files = array();
|
||||
$test_files[] = 'ConfigTest.php';
|
||||
$test_files[] = 'ConfigSchemaTest.php';
|
||||
$test_files[] = 'LexerTest.php';
|
||||
$test_files[] = 'Lexer/DirectLexTest.php';
|
||||
$test_files[] = 'TokenTest.php';
|
||||
$test_files[] = 'ChildDef/RequiredTest.php';
|
||||
$test_files[] = 'ChildDef/OptionalTest.php';
|
||||
$test_files[] = 'ChildDef/ChameleonTest.php';
|
||||
$test_files[] = 'ChildDef/CustomTest.php';
|
||||
$test_files[] = 'ChildDef/TableTest.php';
|
||||
$test_files[] = 'ChildDef/StrictBlockquoteTest.php';
|
||||
$test_files[] = 'GeneratorTest.php';
|
||||
$test_files[] = 'EntityLookupTest.php';
|
||||
$test_files[] = 'Strategy/RemoveForeignElementsTest.php';
|
||||
$test_files[] = 'Strategy/MakeWellFormedTest.php';
|
||||
$test_files[] = 'Strategy/FixNestingTest.php';
|
||||
$test_files[] = 'Strategy/CompositeTest.php';
|
||||
$test_files[] = 'Strategy/CoreTest.php';
|
||||
$test_files[] = 'Strategy/ValidateAttributesTest.php';
|
||||
$test_files[] = 'AttrDefTest.php';
|
||||
$test_files[] = 'AttrDef/EnumTest.php';
|
||||
$test_files[] = 'AttrDef/IDTest.php';
|
||||
$test_files[] = 'AttrDef/ClassTest.php';
|
||||
$test_files[] = 'AttrDef/TextTest.php';
|
||||
$test_files[] = 'AttrDef/LangTest.php';
|
||||
$test_files[] = 'AttrDef/PixelsTest.php';
|
||||
$test_files[] = 'AttrDef/LengthTest.php';
|
||||
$test_files[] = 'AttrDef/URITest.php';
|
||||
$test_files[] = 'AttrDef/CSSTest.php';
|
||||
$test_files[] = 'AttrDef/CompositeTest.php';
|
||||
$test_files[] = 'AttrDef/ColorTest.php';
|
||||
$test_files[] = 'AttrDef/IntegerTest.php';
|
||||
$test_files[] = 'AttrDef/NumberTest.php';
|
||||
$test_files[] = 'AttrDef/CSSLengthTest.php';
|
||||
$test_files[] = 'AttrDef/PercentageTest.php';
|
||||
$test_files[] = 'AttrDef/MultipleTest.php';
|
||||
$test_files[] = 'AttrDef/TextDecorationTest.php';
|
||||
$test_files[] = 'AttrDef/FontFamilyTest.php';
|
||||
$test_files[] = 'AttrDef/HostTest.php';
|
||||
$test_files[] = 'AttrDef/IPv4Test.php';
|
||||
$test_files[] = 'AttrDef/IPv6Test.php';
|
||||
$test_files[] = 'AttrDef/FontTest.php';
|
||||
$test_files[] = 'AttrDef/BorderTest.php';
|
||||
$test_files[] = 'AttrDef/ListStyleTest.php';
|
||||
$test_files[] = 'AttrDef/Email/SimpleCheckTest.php';
|
||||
$test_files[] = 'IDAccumulatorTest.php';
|
||||
$test_files[] = 'TagTransformTest.php';
|
||||
$test_files[] = 'AttrTransform/LangTest.php';
|
||||
$test_files[] = 'AttrTransform/TextAlignTest.php';
|
||||
$test_files[] = 'AttrTransform/BdoDirTest.php';
|
||||
$test_files[] = 'AttrTransform/ImgRequiredTest.php';
|
||||
$test_files[] = 'URISchemeRegistryTest.php';
|
||||
$test_files[] = 'URISchemeTest.php';
|
||||
$test_files[] = 'EncoderTest.php';
|
||||
$test_files[] = 'EntityParserTest.php';
|
||||
$test_files[] = 'Test.php';
|
||||
$test_files[] = 'ContextTest.php';
|
||||
$test_files[] = 'PercentEncoderTest.php';
|
||||
|
||||
if (version_compare(PHP_VERSION, '5', '>=')) {
|
||||
$test_files[] = 'TokenFactoryTest.php';
|
||||
}
|
||||
|
||||
require 'test_files.php'; // populates $test_files array
|
||||
sort($test_files); // for the SELECT
|
||||
$GLOBALS['HTMLPurifierTest']['Files'] = $test_files; // for the reporter
|
||||
$test_file_lookup = array_flip($test_files);
|
||||
|
||||
function htmlpurifier_path2class($path) {
|
||||
$temp = $path;
|
||||
$temp = str_replace('./', '', $temp); // remove leading './'
|
||||
$temp = str_replace('.\\', '', $temp); // remove leading '.\'
|
||||
$temp = str_replace('\\', '_', $temp); // normalize \ to _
|
||||
$temp = str_replace('/', '_', $temp); // normalize / to _
|
||||
while(strpos($temp, '__') !== false) $temp = str_replace('__', '_', $temp);
|
||||
$temp = str_replace('.php', '', $temp);
|
||||
return $temp;
|
||||
// determine test file
|
||||
if (isset($_GET['f']) && isset($test_file_lookup[$_GET['f']])) {
|
||||
$GLOBALS['HTMLPurifierTest']['File'] = $_GET['f'];
|
||||
} else {
|
||||
$GLOBALS['HTMLPurifierTest']['File'] = false;
|
||||
}
|
||||
|
||||
// we can't use addTestFile because SimpleTest chokes on E_STRICT warnings
|
||||
|
||||
if (isset($_GET['file']) && isset($test_file_lookup[$_GET['file']])) {
|
||||
if ($test_file = $GLOBALS['HTMLPurifierTest']['File']) {
|
||||
|
||||
// execute only one test
|
||||
$test_file = $_GET['file'];
|
||||
|
||||
$test = new GroupTest('HTML Purifier - ' . $test_file);
|
||||
$test = new GroupTest($test_file . ' - HTML Purifier');
|
||||
$path = 'HTMLPurifier/' . $test_file;
|
||||
require_once $path;
|
||||
$test->addTestClass(htmlpurifier_path2class($path));
|
||||
$test->addTestClass(path2class($path));
|
||||
|
||||
} else {
|
||||
|
||||
$test = new GroupTest('HTML Purifier');
|
||||
$test = new GroupTest('All Tests - HTML Purifier');
|
||||
|
||||
foreach ($test_files as $test_file) {
|
||||
$path = 'HTMLPurifier/' . $test_file;
|
||||
require_once $path;
|
||||
$test->addTestClass(htmlpurifier_path2class($path));
|
||||
$test->addTestClass(path2class($path));
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
if (SimpleReporter::inCli()) $reporter = new TextReporter();
|
||||
else $reporter = new HTMLReporter('UTF-8');
|
||||
else $reporter = new HTMLPurifier_SimpleTest_Reporter('UTF-8');
|
||||
|
||||
$test->run($reporter);
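The file selection above only accepts a value of f that appears in the list of known test files, using array_flip() so the check is a single isset() lookup. A minimal sketch of that whitelist pattern follows. The helper name select_test_file is hypothetical, and the is_string() check is an extra guard added here, not something the original does.

```php
<?php
// Hypothetical helper: returns the requested file if whitelisted, else false.
function select_test_file(array $test_files, array $query) {
    $lookup = array_flip($test_files); // file name => index, for isset()
    if (isset($query['f']) && is_string($query['f']) && isset($lookup[$query['f']])) {
        return $query['f'];
    }
    return false;
}

$files = array('ConfigTest.php', 'LexerTest.php');
$ok  = select_test_file($files, array('f' => 'LexerTest.php'));    // 'LexerTest.php'
$bad = select_test_file($files, array('f' => '../../etc/passwd')); // false
```

Because the value is only ever used after it matches an entry in the list, the later require_once of 'HTMLPurifier/' . $test_file cannot be steered to an arbitrary path.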
|
||||
|
||||
|
14
tests/path2class.func.php
Normal file
@@ -0,0 +1,14 @@
<?php

function path2class($path) {
    $temp = $path;
    $temp = str_replace('./', '', $temp); // remove leading './'
    $temp = str_replace('.\\', '', $temp); // remove leading '.\'
    $temp = str_replace('\\', '_', $temp); // normalize \ to _
    $temp = str_replace('/', '_', $temp); // normalize / to _
    while(strpos($temp, '__') !== false) $temp = str_replace('__', '_', $temp);
    $temp = str_replace('.php', '', $temp);
    return $temp;
}

?>
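To make the mapping concrete, the sketch below repeats path2class() as listed and runs it on two paths. The second input is a made-up example with mixed separators.

```php
<?php
// Copied from tests/path2class.func.php so the sketch is self-contained.
function path2class($path) {
    $temp = $path;
    $temp = str_replace('./', '', $temp);   // remove './'
    $temp = str_replace('.\\', '', $temp);  // remove '.\'
    $temp = str_replace('\\', '_', $temp);  // normalize \ to _
    $temp = str_replace('/', '_', $temp);   // normalize / to _
    while (strpos($temp, '__') !== false) $temp = str_replace('__', '_', $temp);
    $temp = str_replace('.php', '', $temp);
    return $temp;
}

$a = path2class('HTMLPurifier/AttrDef/EnumTest.php');
// 'HTMLPurifier_AttrDef_EnumTest'
$b = path2class('./HTMLPurifier\\Lexer//DirectLexTest.php');
// 'HTMLPurifier_Lexer_DirectLexTest'
```

str_replace() removes every occurrence, not only a leading './' or a trailing '.php', which is fine for the paths in the test list but worth keeping in mind for other inputs.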
18
tests/tally_errors.func.php
Normal file
@@ -0,0 +1,18 @@
<?php

function tally_errors() {
    // BRITTLE: relies on private code to work
    $context = &SimpleTest::getContext();
    $queue = &$context->get('SimpleErrorQueue');
    if (!isset($queue->_expectation_queue)) return; // fut-compat
    foreach ($queue->_expectation_queue as $e) {
        if (count($e) != 2) return; // fut-compat
        if (!isset($e[0])) return; // fut-compat
        $e[0]->_dumper = new SimpleDumper();
        $this->fail('Error expectation not fulfilled: ' .
            $e[0]->testMessage(null));
    }
    $queue->_expectation_queue = array();
}

?>
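As listed, tally_errors() is a plain function yet calls $this->fail(), and $this is not available inside a plain function. One way around that, sketched here under assumptions, is to pass the test case in explicitly. The class and function names below are hypothetical stand-ins, and the sketch deliberately does not touch SimpleTest's queue internals.

```php
<?php
// Hypothetical stand-in for a SimpleTest test case; it only records failures.
class RecordingTestCase {
    public $failures = array();
    function fail($message) {
        $this->failures[] = $message;
    }
}

// Hypothetical variant: the test case is passed in instead of relying on $this.
function report_unmet_expectations($test, array $pending) {
    foreach ($pending as $message) {
        $test->fail('Error expectation not fulfilled: ' . $message);
    }
    return count($pending); // number of failures reported
}

$test  = new RecordingTestCase();
$count = report_unmet_expectations(
    $test,
    array('Cannot use unrecognized element as parent.')
);
```

With a real test case, the call site inside a test method would pass $this as the first argument.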
72
tests/test_files.php
Normal file
@@ -0,0 +1,72 @@
<?php

if (!defined('HTMLPurifierTest')) exit;

// define callable test files
$test_files[] = 'ConfigTest.php';
$test_files[] = 'ConfigSchemaTest.php';
$test_files[] = 'LexerTest.php';
$test_files[] = 'Lexer/DirectLexTest.php';
$test_files[] = 'TokenTest.php';
$test_files[] = 'ChildDef/RequiredTest.php';
$test_files[] = 'ChildDef/OptionalTest.php';
$test_files[] = 'ChildDef/ChameleonTest.php';
$test_files[] = 'ChildDef/CustomTest.php';
$test_files[] = 'ChildDef/TableTest.php';
$test_files[] = 'ChildDef/StrictBlockquoteTest.php';
$test_files[] = 'GeneratorTest.php';
$test_files[] = 'EntityLookupTest.php';
$test_files[] = 'Strategy/RemoveForeignElementsTest.php';
$test_files[] = 'Strategy/MakeWellFormedTest.php';
$test_files[] = 'Strategy/FixNestingTest.php';
$test_files[] = 'Strategy/CompositeTest.php';
$test_files[] = 'Strategy/CoreTest.php';
$test_files[] = 'Strategy/ValidateAttributesTest.php';
$test_files[] = 'AttrDefTest.php';
$test_files[] = 'AttrDef/EnumTest.php';
$test_files[] = 'AttrDef/IDTest.php';
$test_files[] = 'AttrDef/ClassTest.php';
$test_files[] = 'AttrDef/TextTest.php';
$test_files[] = 'AttrDef/LangTest.php';
$test_files[] = 'AttrDef/PixelsTest.php';
$test_files[] = 'AttrDef/LengthTest.php';
$test_files[] = 'AttrDef/URITest.php';
$test_files[] = 'AttrDef/CSSTest.php';
$test_files[] = 'AttrDef/CompositeTest.php';
$test_files[] = 'AttrDef/ColorTest.php';
$test_files[] = 'AttrDef/IntegerTest.php';
$test_files[] = 'AttrDef/NumberTest.php';
$test_files[] = 'AttrDef/CSSLengthTest.php';
$test_files[] = 'AttrDef/PercentageTest.php';
$test_files[] = 'AttrDef/MultipleTest.php';
$test_files[] = 'AttrDef/TextDecorationTest.php';
$test_files[] = 'AttrDef/FontFamilyTest.php';
$test_files[] = 'AttrDef/HostTest.php';
$test_files[] = 'AttrDef/IPv4Test.php';
$test_files[] = 'AttrDef/IPv6Test.php';
$test_files[] = 'AttrDef/FontTest.php';
$test_files[] = 'AttrDef/BorderTest.php';
$test_files[] = 'AttrDef/ListStyleTest.php';
$test_files[] = 'AttrDef/Email/SimpleCheckTest.php';
$test_files[] = 'AttrDef/CSSURITest.php';
$test_files[] = 'AttrDef/BackgroundPositionTest.php';
$test_files[] = 'AttrDef/BackgroundTest.php';
$test_files[] = 'IDAccumulatorTest.php';
$test_files[] = 'TagTransformTest.php';
$test_files[] = 'AttrTransform/LangTest.php';
$test_files[] = 'AttrTransform/TextAlignTest.php';
$test_files[] = 'AttrTransform/BdoDirTest.php';
$test_files[] = 'AttrTransform/ImgRequiredTest.php';
$test_files[] = 'URISchemeRegistryTest.php';
$test_files[] = 'URISchemeTest.php';
$test_files[] = 'EncoderTest.php';
$test_files[] = 'EntityParserTest.php';
$test_files[] = 'Test.php';
$test_files[] = 'ContextTest.php';
$test_files[] = 'PercentEncoderTest.php';

if (version_compare(PHP_VERSION, '5', '>=')) {
    $test_files[] = 'TokenFactoryTest.php';
}

?>