mirror of https://github.com/ezyang/htmlpurifier.git synced 2025-08-03 12:47:56 +02:00

Compare commits

54 Commits

Author SHA1 Message Date
Edward Z. Yang
280211f70b Release 3.2.0.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-31 16:30:54 -04:00
Edward Z. Yang
3fd51d527c Add a nod to the RFC's recommendation that UTF-8 be used in URIs.
Mentioned in http://unspecified.wordpress.com/2008/06/29/do-browsers-encode-urls-correctly/

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-31 12:55:07 -04:00
Edward Z. Yang
0e6e2c4edf Bump descriptions to 3.2.0.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-31 12:25:43 -04:00
Edward Z. Yang
22d24e6b04 Add error planning documentation.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-25 02:11:58 -04:00
Edward Z. Yang
3a2fd0b5db Improve floating point scaling in UnitConverter.
When precision dictates that a number be zero padded, we cannot give sprintf()
a negative precision specifier.  This commit implements manual negative precision
printing of floats, taking into account common rounding errors with floating
point numbers.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-24 12:50:59 -04:00
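Illustration of the idea in the commit above: a printf-style formatter cannot take a negative precision, so rounding to tens or hundreds has to be done by hand. This is a minimal Python sketch, not the UnitConverter code; the function name `format_scaled` and the round-half-up rule are assumptions made for the example.

```python
import math


def format_scaled(value: float, scale: int) -> str:
    """Format `value` with `scale` decimal places.

    A negative scale rounds to tens, hundreds, and so on, and pads with zeros.
    """
    if scale >= 0:
        # Ordinary case: the format specifier handles non-negative precision.
        return f"{value:.{scale}f}"
    # Negative precision: divide down, round half up, then multiply back.
    factor = 10 ** (-scale)
    rounded = math.floor(value / factor + 0.5)
    return str(rounded * factor)


print(format_scaled(1234.5, -2))
print(format_scaled(3.14159, 2))
```

The sketch ignores the floating point rounding errors that the commit message says the real implementation accounts for.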
Edward Z. Yang
25fa53c15b Fix misleading entry in NEWS.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-24 12:28:19 -04:00
David Morton
0b6ae1c3c1 Custom Injector to display URL address along with link text.
When viewing potentially hostile html, it may be helpful to see what
a given link was pointing to.  This new injector takes the href
attribute and adds the text after the link, and deletes the href
attribute.

Other forms of display could easily be contrived, but this seems to be
a good basic way to present the information.

Signed-off-by: David Morton <mortonda@dgrmm.net>
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-23 17:11:29 -04:00
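Illustration of the behavior described in the commit above, as a self-contained Python sketch over a hypothetical token list. The tuple-based token shape and the ` (url)` output format are assumptions for the example, not the injector's actual API.

```python
def display_link_uri(tokens):
    """Drop href from <a> tags and append the URL as text after the link."""
    out = []
    pending = []  # href values of the <a> elements that are currently open
    for tok in tokens:
        if tok[0] == "start" and tok[1] == "a":
            attrs = dict(tok[2])
            pending.append(attrs.pop("href", None))  # delete the href attribute
            out.append(("start", "a", attrs))
        elif tok[0] == "end" and tok[1] == "a":
            out.append(tok)
            href = pending.pop() if pending else None
            if href is not None:
                # Show the link target as plain text after the closing tag.
                out.append(("text", f" ({href})"))
        else:
            out.append(tok)
    return out


tokens = [
    ("start", "a", {"href": "http://example.com/"}),
    ("text", "click"),
    ("end", "a"),
]
print(display_link_uri(tokens))
```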
Edward Z. Yang
ab263a0bf1 Rewrite spurious encoding test, as utf8 is sometimes useful.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-23 15:22:31 -04:00
Edward Z. Yang
c5b18d345c Fix validation error in documentation index.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-16 20:08:01 -04:00
Edward Z. Yang
d26418ca3a Ignore runtime files in Phorum plugin directory.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-11 23:20:12 -04:00
Edward Z. Yang
d304c5c976 Detect if domxml extension is loaded, and use DirectLex accordingly.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-08 17:06:10 -04:00
Edward Z. Yang
f7bc0b0875 Implement %Attr.DefaultImageAlt, allowing overriding default behavior for alt attributes.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-06 14:51:03 -04:00
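Illustration: the NEWS entry for this directive says the default behavior sets alt to the basename of the image when it is not present. A minimal Python sketch of that fallback order follows; `image_alt` and its `default_alt` parameter are hypothetical names standing in for the directive.

```python
import posixpath
from urllib.parse import urlsplit


def image_alt(attrs, default_alt=None):
    """Pick an alt text: existing alt, then the configured default, then the basename of src."""
    if "alt" in attrs:
        return attrs["alt"]
    if default_alt is not None:
        # Stand-in for a configured default; an empty string is a valid choice.
        return default_alt
    path = urlsplit(attrs.get("src", "")).path
    return posixpath.basename(path)


print(image_alt({"src": "http://example.com/img/cat.png"}))
print(repr(image_alt({"src": "x/y.gif"}, default_alt="")))
```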
Edward Z. Yang
70515dd48f Increase test coverage, and modify handleEnd behavior to only see correct tokens.
Previously, handleEnd was called for any end tag, except ones that were obviously
spurious because there were no parent tags. Now, it is only called for end tags
that are "approved." If an injector operates on the end tag, we automatically
punt. There may be some optimizations that could be made to this procedure,
but for now it's much more consistent.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-01 15:40:31 -04:00
Edward Z. Yang
1555cb617f Minor refactoring and bugfixing of Injector and MakeWellFormed.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-01 04:10:41 -04:00
Edward Z. Yang
cd4500457e More refactoring to MakeWellFormed and Injectors; they work better than ever now!
Major paradigm shift in this commit is bailing ship on the "skip" integers, which
were extremely buggy and error prone, and simply mark tokens as processed or
not processed by injectors. Other notable changes:

- Removed ad hoc decrements to inputIndex in favor of $reprocess flag variable
- Moved rewind outside of processToken()
- Make rewind properly ignore all other injectors
- Cleanup end of document code
- Reconfigure injector loops to account for skips and rewinds
- Punt the empty to start/end transformation
- Completely rewrite processToken to be array based
- Added skip and rewind member variables to tokens
- Fixed a longstanding bug with remove empty!

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-01 03:14:28 -04:00
Edward Z. Yang
fa413e96ac Implement Injector->handleEnd, with lots of refactoring for injector.
Previous design of injector streaming involved editability only to start, empty
and text tokens, because they could be safely modified without causing formedness
errors.  By modifying notifyEnd to operate before MakeWellFormed's safeguards
kick into effect, it can be converted into a handle function, allowing for
arbitrary modification of end tags.

This change involved quite a bit of restructuring of the MakeWellFormed code,
including the moving of end of document tags to inside the loop, so rewinding
on those tags would be functional, increased reuse of the end tag codepath by
code that inserts end tags (as they could be changed out from under you), and
processToken modified to have an extra parameter to force re-processing of
a token if the original token was an end token.

We're not exactly sure if handleEnd works at this point, but the important
talking point about this refactoring is that nothing else broke. Also, a number
of convenience functions were moved from AutoParagraph to the Injector
supertype (specifically: forward, forwardUntilEndToken, backward, and current).

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-01 00:54:51 -04:00
Edward Z. Yang
d0fdcc103e Add support for proprietary "background" attribute in table elements.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-09-27 21:19:35 -04:00
Edward Z. Yang
6a06b92f0c Setup ErrorCollector to maintain new error format, and output that HTML.
Also changed:
    - DirectLex keeps track of column numbers in context
    - New class HTMLPurifier_ErrorStruct

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-09-15 19:08:58 -04:00
Edward Z. Yang
3184fee468 Undo start()/end() error collector changes in AttrValidator.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-09-05 17:25:35 -04:00
Edward Z. Yang
ed7983b559 Refactor lexer instantiation logic with exceptions and forced line tracking.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-09-05 14:04:23 -04:00
Edward Z. Yang
92df9e5b28 Update customize docs to use new directive name for definition caching.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-09-02 16:38:40 -04:00
Edward Z. Yang
2f41bd07fa Update docs, removing $Id$ and linking to repo.or.cz.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-09-02 15:01:25 -04:00
Edward Z. Yang
c6914dce51 Track column numbers in addition to line numbers.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-09-01 14:10:10 -04:00
Edward Z. Yang
9977350143 Fix bug with anonymous module and the SafeObject/SafeEmbed modules.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-08-31 19:06:25 -04:00
Edward Z. Yang
d9e60350d3 Migrate AttrValidator to nested error format; modify generator logic in ErrorCollector.
AttrValidator's changes are fairly self-explanatory, but ErrorCollector's
changes are worth a little discussion.  ErrorCollector can use generators at
various points during its flow control; there are two distinct generators that
it should use: 1. The one used for the output, and 2. The one used for the
error output.  These will usually be the same, but in the odd case where they
need to be different, getHTMLFormatted() will accept an alternate configuration
object with an appropriate doctype.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-08-18 22:13:58 -04:00
Edward Z. Yang
c807ed5fe2 Implement nested error collection with start() and end() in ErrorCollector.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-08-16 00:41:34 -04:00
Edward Z. Yang
c9b6f125aa Forms implementation for %HTML.Trusted. Some backend changes:
* Added Charsets and Character attribute types
* Fix a heavily recursive form of ContentSets; this allows a content-set
  to include another content-set which includes another content-set, and
  so forth.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-08-15 18:57:44 -04:00
Edward Z. Yang
dc28346677 Fix bug where absolute paths with dots/double-dots were not collapsed.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-08-15 13:12:54 -04:00
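Illustration of what collapsing dots and double-dots in an absolute path means, in the spirit of the remove_dot_segments step of RFC 3986. This is a Python sketch under that assumption, not the MakeAbsolute filter's code; `collapse_dot_segments` is a hypothetical name.

```python
def collapse_dot_segments(path: str) -> str:
    """Collapse "." and ".." segments in an absolute path."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path")
    stack = []
    segments = path.split("/")[1:]
    for index, segment in enumerate(segments):
        is_last = index == len(segments) - 1
        if segment in (".", ".."):
            if segment == ".." and stack:
                stack.pop()  # step up one level; never above the root
            if is_last:
                stack.append("")  # keep the trailing slash: /a/b/.. gives /a/
        else:
            stack.append(segment)
    return "/" + "/".join(stack)


print(collapse_dot_segments("/a/b/../c"))
print(collapse_dot_segments("/a/./b/.."))
```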
Edward Z. Yang
8423daef05 Increase test coverage for MakeAbsolute.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-08-13 23:19:38 -04:00
Edward Z. Yang
617f70a8ac Improve auto-paragraph to preserve newlines and handle edge-cases better.
This is a very large commit that includes numerous improvements to the
AutoParagraph injector.  These are:

* Rewritten flow control of the injector to use almost exclusively
  binary conditionals.
* Improved inline documentation with "State" comments, which give concise
  examples of what the token stack looks like at flow points.
* Documentation for all flow branches, even those with no actions.
* Factoring out of common operations to improve readability, especially the
  new iterator private methods.
* Expanded test-suite which covers new flow points, and corrects some errors
  in previous cases.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-08-10 00:32:29 -04:00
Edward Z. Yang
0423985b45 Detect if HTML support in DOM is disabled by checking loadHTML().
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-08-07 18:44:21 -04:00
Edward Z. Yang
e013bc9126 Fix bug involving autoclose and inline elements in strict <blockquote>.
The newest autoclose code uses the elements property in whether or not an
element should be closed by a particular tag.  The heuristic is simple; if
the element doesn't allow that tag as a child, it closes the parent
container.  This doesn't work, however, with <blockquote>, which while not
allowing inline styles under Strict doctypes, requires them to be passed
through MakeWellFormed.

The fix was to transition MakeWellFormed to call a method to retrieve the
elements, and then have StrictBlockquote implement a special version of
this method.  Future versions of HTML Purifier may be more flexible in this
regard--further study of the HTML5 specification is required.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-08-01 20:52:06 -04:00
Edward Z. Yang
1d90bb2397 Allow <![CDATA[<body>...</body>]]> not to trigger Core.ConvertDocumentToFragment
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-08-01 19:06:28 -04:00
Edward Z. Yang
03dabec2c0 Fix documentation error in Filter.ExtractStyleBlocks and give better example.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-08-01 18:58:47 -04:00
Edward Z. Yang
85090520f1 Add double-munging protection by checking if the domains are the same.
Previously, if an absolute munge URL location was used, HTML passed through
HTML Purifier multiple times would be munged multiple times. This patch
checks if the output URI has the same host as the input URI; if they do,
the munge is considered unnecessary and discarded.

Requested-by: Chris <justbittin@gmail.com>
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-07-26 22:45:19 -06:00
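Illustration of the double-munging check described above, as a Python sketch. The `%s` placeholder, the redirect URL, and the function name `munge` are assumptions for the example; only the host comparison mirrors the commit message.

```python
from urllib.parse import quote, urlsplit


def munge(uri: str, template: str) -> str:
    """Wrap `uri` in a redirect URL unless it already points at the munge host."""
    munged = template.replace("%s", quote(uri, safe=""))
    if urlsplit(munged).hostname == urlsplit(uri).hostname:
        # Same host in and out: the URI was already munged, so leave it alone.
        return uri
    return munged


template = "http://example.com/redirect?u=%s"  # hypothetical munge target
once = munge("http://evil.test/x", template)
twice = munge(once, template)
print(once)
print(once == twice)
```

Passing the output through a second time returns it unchanged, which is the property the commit is after.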
Edward Z. Yang
3b6aa10592 %URI.DisableExternal(Resources) uses %URI.Base if %URI.Host is not available.
As part of its duties, URIDefinition determines the base URL and the host URL
of the page based on the two corresponding configuration directives. The
DisableExternal URIFilter, however, bypassed this check by directly checking
%URI.Host. This fix forwards the call through URIDefinition.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-07-10 18:46:46 -04:00
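Illustration of the fallback described above: use the explicit host if configured, otherwise derive it from the base URI. This Python sketch models the configuration as a plain dict and uses an exact host comparison; both are simplifying assumptions, and `resolve_host` and `is_external` are hypothetical names.

```python
from urllib.parse import urlsplit


def resolve_host(config):
    """Return the configured host, falling back to the host of the base URI."""
    host = config.get("URI.Host")
    if host:
        return host
    base = config.get("URI.Base")
    return urlsplit(base).hostname if base else None


def is_external(uri, config):
    """Treat a URI as external when it names a host other than ours."""
    target = urlsplit(uri).hostname
    if target is None:
        return False  # relative URIs stay on the current host
    return target != resolve_host(config)


config = {"URI.Base": "http://example.com/dir/page.html"}
print(is_external("http://example.com/x", config))
print(is_external("http://other.test/x", config))
```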
Edward Z. Yang
3a4b92da81 Slight optimization in LinkTypes using array_keys().
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-07-08 21:47:52 -04:00
Edward Z. Yang
0ec9731184 Update TODO to add IDNA support along with IRI support.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-07-08 20:47:44 -04:00
Edward Z. Yang
e05bd77344 Implement HTMLT tests, and migrate HTMLPurifierTest to this format.
HTMLT tests are a compact and easy-to-use way of making assertPurification
type tests. They take the format of:

--INI--
Ns.Directive = "directive value"
--HTML--
Input HTML
--EXPECT--
Expected HTML

Expect more features and migration to be coming soon.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-07-07 08:59:33 -04:00
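Illustration of how a file in the section format shown above could be split into its parts. This is a hypothetical Python parser written for the example, not the project's StringHashParser; it also keeps empty sections defined as empty strings.

```python
import re

SECTION = re.compile(r"--([A-Z]+)--")


def parse_sections(text):
    """Split text into a dict keyed by section name (INI, HTML, EXPECT, ...)."""
    sections = {}
    current = None
    for line in text.splitlines():
        match = SECTION.fullmatch(line.strip())
        if match:
            current = match.group(1)
            sections[current] = []  # an empty section still gets an entry
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(lines) for name, lines in sections.items()}


sample = '--INI--\nNs.Directive = "directive value"\n--HTML--\nInput HTML\n--EXPECT--\nExpected HTML\n'
print(parse_sections(sample))
```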
Edward Z. Yang
334ffac5b4 Various improvements to test script command line options, i.e. --type
The following changes were made:
* Create --type parameter which accepts 'htmlpurifier', 'phpt', 'vtest', etc.
  in order to execute only that class of tests. This supersedes --only-phpt.
* Create --quick parameter for multitest.php, run only the tips of each
  release series.
* Create --distro parameter for multitest.php, supersedes --exclude-normal
  and --exclude-standalone.

Also, a grep for htmlt tests was added, although add_tests() doesn't do
anything with it yet.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-07-07 08:59:29 -04:00
Edward Z. Yang
a227cb483a Allow empty sections in string hashes; previously they were left undefined.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-07-07 08:57:16 -04:00
Edward Z. Yang
aa0fdeee30 Refine Lexers for parsing stray angled brackets; %Core.AggressivelyFixLt = true
By default, the DirectLex and DOMLex behavior with stray angled brackets
varied a great deal due to their implementations. A little known directive
%Core.AggressivelyFixLt attempted to match DOMLex's behavior with DirectLex's,
but it was off by default. By turning it on by default, users now enjoy these
benefits, and performance-minded users can turn it back off.

Also, several refinements to stray angled bracket parsing were made. Specifically:

* DirectLex: Handle each left angled bracket individually, which prevents
  strange behavior as reported by eon.
* DOMLex: Iterate aggressive lt fix, so that stacked brackets like << are
  handled.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-07-07 08:52:29 -04:00
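Illustration of the general idea of handling each stray left angled bracket individually: escape any `<` that does not look like the start of a tag. This Python sketch is a simplification built on that assumption and is not the algorithm used by DirectLex or DOMLex; `fix_stray_lt` is a hypothetical name.

```python
import re

# A "<" followed by a letter, "/", "!" or "?" is assumed to open a tag,
# a closing tag, a comment/doctype, or a processing instruction.
STRAY_LT = re.compile(r"<(?![A-Za-z/!?])")


def fix_stray_lt(html: str) -> str:
    """Escape every left angled bracket that cannot start a tag."""
    return STRAY_LT.sub("&lt;", html)


print(fix_stray_lt("a << b <b>x</b> i <3 you"))
```

Because each bracket is matched on its own, stacked brackets like `<<` are both escaped.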
Edward Z. Yang
ba418a1f19 Redirect stderr to stdout when calling flush.php
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-07-05 03:15:36 -04:00
Edward Z. Yang
c845f0bb78 Give warnings when attempting to use encoding iconv doesn't support.
Previously, attempting to set %Core.Encoding to an encoding iconv didn't
know about would result in a silent failure, with the return of the
boolean false. Now it will fatally error out.

Reported-by: mcgrailm <mgm19@psu.edu>
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-07-05 03:14:32 -04:00
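Illustration of the change in behavior described above: validate the encoding name up front and fail loudly, rather than letting a later conversion quietly return a failure value. This Python sketch uses the standard `codecs` registry as a stand-in for iconv; `require_encoding` is a hypothetical name.

```python
import codecs


def require_encoding(name: str) -> str:
    """Return the canonical codec name, or raise if the encoding is unknown."""
    try:
        return codecs.lookup(name).name
    except LookupError as exc:
        # Fail at configuration time instead of failing silently later.
        raise ValueError(f"Unsupported encoding: {name!r}") from exc


print(require_encoding("UTF-8"))
```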
Edward Z. Yang
594268ca3b Fix two bugs in MakeAbsolute filter involving base URIs that have empty path.
The bugs are:
* Undefined $is_folder variable when path is empty, and
* Improper concatenation of host and path together.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-07-05 03:12:44 -04:00
Edward Z. Yang
965be3bd73 Add support for unrecognized elements in MakeWellFormed.
The MakeWellFormed strategy uses metadata from HTMLDefinition in order to
determine whether or not tokens need to be converted or tags need to be
auto-closed. While this functionality is good to have, it is by no means
essential, and MakeWellFormed should not error when this information is not
available.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-07-05 03:11:29 -04:00
Edward Z. Yang
700d5bcbfc Implement %AutoFormat.RemoveEmpty, end to start ref, and injector rewind.
Injector rewind: Injectors can now use the method rewind() in order to move
the input index backwards, so that they can reprocess tokens (other injectors
are not affected by a rewind). This functionality was necessary to implement
nested node removals in %AutoFormat.RemoveEmpty.

End to start ref: To facilitate rewinding, HTMLPurifier_Token_End now
maintains a reference called $start to the starting token for their node.

%AutoFormat.RemoveEmpty removes empty nodes. Lots of people have requested
it, so here is a partially effective implementation. Because it is implemented
as an Injector, it's not possible for it to handle newly introduced empty
nodes by later validators, specifically auto-closing and child validation.
The Injector is only meant to be used on HTML-ish languages.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-06-27 16:09:14 -04:00
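Illustration of why rewinding is needed for nested removals: once an empty inner node is deleted, the enclosing node may itself have become empty, so the index steps back to recheck it. This is a Python sketch over a hypothetical tuple-based token list, not the Injector API, and it ignores attributes and whitespace.

```python
def remove_empty(tokens):
    """Remove start/end pairs with nothing between them, including nested ones."""
    tokens = list(tokens)
    i = 0
    while i < len(tokens) - 1:
        cur, nxt = tokens[i], tokens[i + 1]
        if cur[0] == "start" and nxt[0] == "end" and cur[1] == nxt[1]:
            del tokens[i:i + 2]
            i = max(i - 1, 0)  # rewind so the enclosing start tag is rechecked
        else:
            i += 1
    return tokens


stream = [("start", "b"), ("start", "i"), ("end", "i"), ("end", "b"), ("text", "x")]
print(remove_empty(stream))
```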
Edward Z. Yang
fd384129bf Proper support for name attribute in <a> and <img>
Prior to this commit, the name attribute was unilaterally removed, except
for Strict doctypes or a heavy TidyLevel, when it was converted to an id
attribute. As name is actually permitted in both HTML 4.01 Strict and
XHTML 1.0 Strict, although deprecated, the more sensible default behavior
is to allow it unless TidyLevel is heavy.

Our implementation is slightly stricter than the specs, as name attributes are
treated as first class IDs, disallowing <a name="foo" id="foo"> or duplicate
names. The former should be treated as a special case, but that will be
a separate commit.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-06-27 15:44:27 -04:00
Edward Z. Yang
f8b47c64dd Make Strategy_MakeWellFormed operate in place.
Previously, MakeWellFormed processed tokens and appended them onto an output
array, which was presumably immutable and inaccessible to Injectors. By
having MakeWellFormed operate directly on the input array, the strategy
saves memory and will also allow for a rewind implementation, as unifying
the two arrays allows Injectors to easily determine an index behind them they'd
like to reset state to.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-06-27 01:33:48 -04:00
Edward Z. Yang
a5ceb1e22a Update printTokens() debug function to work with new Generator API.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-06-27 01:33:20 -04:00
Edward Z. Yang
636e2883df Add ignore rules for configdoc generated files.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-06-27 00:14:39 -04:00
Edward Z. Yang
dba3ed7770 [3.1.2] Implement comments when %HTML.Trusted is on.
Some implementation notes: not all comments are valid; HTML makes sure
double-hyphens and trailing hyphens are not found in comments. In addition,
two new localizable messages were added.

Requested-by: Waldo Jaquith <waldo@vqronline.org>
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-06-25 23:12:19 -04:00
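Illustration of the two comment rules mentioned above (no double hyphens, no trailing hyphens), as a Python sketch. Collapsing hyphen runs to a single hyphen is an assumption about how a violation could be repaired; `sanitize_comment` is a hypothetical name.

```python
import re


def sanitize_comment(data: str) -> str:
    """Make comment text safe to place between <!-- and -->."""
    data = data.rstrip("-")  # a trailing hyphen would run into the closing -->
    return re.sub(r"-{2,}", "-", data)  # "--" may not appear inside a comment


print(repr(sanitize_comment(" a -- b ---")))
```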
Edward Z. Yang
de9869d942 Ignore .phpt.skip.php files.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-06-25 23:10:03 -04:00
Edward Z. Yang
cfcdce0db8 Ignore test-settings.php
2008-06-25 22:47:12 -04:00
145 changed files with 3421 additions and 1026 deletions

4
.gitignore vendored

@@ -1,9 +1,13 @@
 conf/
+test-settings.php
 library/HTMLPurifier/DefinitionCache/Serializer/*/
 library/standalone/
 library/HTMLPurifier.standalone.php
+configdoc/*.html
+configdoc/configdoc.xml
 *.phpt.diff
 *.phpt.exp
 *.phpt.log
 *.phpt.out
 *.phpt.php
+*.phpt.skip.php


@@ -31,7 +31,7 @@ PROJECT_NAME = HTMLPurifier
 # This could be handy for archiving the generated documentation or
 # if some version control system is used.
-PROJECT_NUMBER = 3.1.1
+PROJECT_NUMBER = 3.2.0
 # The OUTPUT_DIRECTORY tag is used to specify the (relative or absolute)
 # base path where the generated documentation will be put.

2
FOCUS

@@ -1,4 +1,4 @@
-9 - Major security fixes
+5 - Major feature enhancements
 [ Appendix A: Release focus IDs ]
 0 - N/A

71
NEWS

@@ -9,11 +9,76 @@ NEWS ( CHANGELOG and HISTORY ) HTMLPurifier
 . Internal change
 ==========================
-3.2.0, unknown release date
-3.1.2, unknown release date
+3.2.0, released 2008-10-31
+# Using %Core.CollectErrors forces line number/column tracking on, whereas
+previously you could theoretically turn it off.
+# HTMLPurifier_Injector->notifyEnd() is formally deprecated. Please
+use handleEnd() instead.
 ! %Output.AttrSort for when you need your attributes in alphabetical order to
 deal with a bug in FCKEditor. Requested by frank farmer.
+! Enable HTML comments when %HTML.Trusted is on. Requested by Waldo Jaquith.
+! Proper support for name attribute. It is now allowed and equivalent to the id
+attribute in a and img tags, and is only converted to id when %HTML.TidyLevel
+is heavy (for all doctypes).
+! %AutoFormat.RemoveEmpty to remove some empty tags from documents. Please don't
+use on hand-written HTML.
+! Add error-cases for unsupported elements in MakeWellFormed. This enables
+the strategy to be used, standalone, on untrusted input.
+! %Core.AggressivelyFixLt is on by default. This causes more sensible
+processing of left angled brackets in smileys and other whatnot.
+! Test scripts now have a 'type' parameter, which lets you say 'htmlpurifier',
+'phpt', 'vtest', etc. in order to only execute those tests. This supercedes
+the --only-phpt parameter, although for backwards-compatibility the flag
+will still work.
+! AutoParagraph auto-formatter will now preserve double-newlines upon output.
+Users who are not performing inbound filtering, this may seem a little
+useless, but as a bonus, the test suite and handling of edge cases is also
+improved.
+! Experimental implementation of forms for %HTML.Trusted
+! Track column numbers when maintain line numbers is on
+! Proprietary 'background' attribute on table-related elements converted into
+corresponding CSS. Thanks Fusemail for sponsoring this feature!
+! Add forward(), forwardUntilEndToken(), backward() and current() to Injector
+supertype.
+! HTMLPurifier_Injector->handleEnd() permits modification to end tokens. The
+time of operation varies slightly from notifyEnd() as *all* end tokens are
+processed by the injector before they are subject to the well-formedness rules.
+! %Attr.DefaultImageAlt allows overriding default behavior of setting alt to
+basename of image when not present.
+! %AutoFormat.DisplayLinkURI neuters <a> tags into plain text URLs.
+- Fix two bugs in %URI.MakeAbsolute; one involving empty paths in base URLs,
+the other involving an undefined $is_folder error.
+- Throw error when %Core.Encoding is set to a spurious value. Previously,
+this errored silently and returned false.
+- Redirected stderr to stdout for flush error output.
+- %URI.DisableExternal will now use the host in %URI.Base if %URI.Host is not
+available.
+- Do not re-munge URL if the output URL has the same host as the input URL.
+Requested by Chris.
+- Fix error in documentation regarding %Filter.ExtractStyleBlocks
+- Prevent <![CDATA[<body></body>]]> from triggering %Core.ConvertDocumentToFragment
+- Fix bug with inline elements in blockquotes conflicting with strict doctype
+- Detect if HTML support is disabled for DOM by checking for loadHTML() method.
+- Fix bug where dots and double-dots in absolute URLs without hostname were
+not collapsed by URIFilter_MakeAbsolute.
+- Fix bug with anonymous modules operating on SafeEmbed or SafeObject elements
+by reordering their addition.
+- Will now throw exception on many error conditions during lexer creation; also
+throw an exception when MaintainLineNumbers is true, but a non-tracksLineNumbers
+is being used.
+- Detect if domxml extension is loaded, and use DirectLEx accordingly.
+- Improve handling of big numbers with floating point arithmetic in UnitConverter.
+Reported by David Morton.
+. Strategy_MakeWellFormed now operates in-place, saving memory and allowing
+for more interesting filter-backtracking
+. New HTMLPurifier_Injector->rewind() functionality, allows injectors to rewind
+index to reprocess tokens.
+. StringHashParser now allows for multiline sections with "empty" content;
+previously the section would remain undefined.
+. Added --quick option to multitest.php, which tests only the most recent
+release for each series.
+. Added --distro option to multitest.php, which accepts either 'normal' or
+'standalone'. This supercedes --exclude-normal and --exclude-standalone
 3.1.1, released 2008-06-19
 # %URI.Munge now, by default, does not munge resources (for example, <img src="">)

15
TODO

@@ -14,25 +14,25 @@ afraid to cast your vote for the next feature to be implemented!
- Investigate how early internal structures can be accessed; this would - Investigate how early internal structures can be accessed; this would
prevent structures from being parsed and serialized multiple times. prevent structures from being parsed and serialized multiple times.
- Built-in support for target="_blank" on all external links - Built-in support for target="_blank" on all external links
- Gitify the repository - Allow <a id="asdf" name="asdf">
- Implement overflow CSS property (as per jlp09550)
FUTURE VERSIONS FUTURE VERSIONS
--------------- ---------------
3.2 release [It's All About Trust] (floating) 3.3 release [It's All About Trust] (floating)
# Implement untrusted, dangerous elements/attributes # Implement untrusted, dangerous elements/attributes
- Forms are especially wanted
# Implement IDREF support (harder than it seems, since you cannot have # Implement IDREF support (harder than it seems, since you cannot have
IDREFs to non-existent IDs) IDREFs to non-existent IDs)
# Frameset XHTML 1.0 and HTML 4.01 doctypes # Frameset XHTML 1.0 and HTML 4.01 doctypes
- Implement <area> - Implement <area>
- Figure out how to simultaneously set %CSS.Trusted and %HTML.Trusted (?) - Figure out how to simultaneously set %CSS.Trusted and %HTML.Trusted (?)
3.3 release [Error'ed] 3.4 release [Error'ed]
# Error logging for filtering/cleanup procedures # Error logging for filtering/cleanup procedures
- XSS-attempt detection--certain errors are flagged XSS-like - XSS-attempt detection--certain errors are flagged XSS-like
3.4 release [Do What I Mean, Not What I Say] 3.5 release [Do What I Mean, Not What I Say]
# Additional support for poorly written HTML # Additional support for poorly written HTML
- Microsoft Word HTML cleaning (i.e. MsoNormal, but research essential!) - Microsoft Word HTML cleaning (i.e. MsoNormal, but research essential!)
- Friendly strict handling of <address> (block -> <br>) - Friendly strict handling of <address> (block -> <br>)
@@ -43,7 +43,6 @@ FUTURE VERSIONS
contents should be dropped or not (currently, there's code that could do contents should be dropped or not (currently, there's code that could do
something like this if it didn't drop the inner text too.) something like this if it didn't drop the inner text too.)
- Remove <span> tags that don't do anything (no attributes) - Remove <span> tags that don't do anything (no attributes)
- Remove empty inline tags<i></i>
- Append something to duplicate IDs so they're still usable (impl. note: the - Append something to duplicate IDs so they're still usable (impl. note: the
dupe detector would also need to detect the suffix as well) dupe detector would also need to detect the suffix as well)
- Externalize inline CSS to promote clean HTML, proposed by Sander Tekelenburg - Externalize inline CSS to promote clean HTML, proposed by Sander Tekelenburg
@@ -53,14 +52,12 @@ FUTURE VERSIONS
AttrDef class). Probably will use CSSTidy class? AttrDef class). Probably will use CSSTidy class?
# More control over allowed CSS properties using a modularization # More control over allowed CSS properties using a modularization
# HTML 5 support # HTML 5 support
# IRI support # IRI support (this includes IDN)
- Standardize token armor for all areas of processing - Standardize token armor for all areas of processing
- Convert RTL/LTR override characters to <bdo> tags, or vice versa on demand. - Convert RTL/LTR override characters to <bdo> tags, or vice versa on demand.
Also, enable disabling of directionality Also, enable disabling of directionality
5.0 release [To XML and Beyond] 5.0 release [To XML and Beyond]
- AllowedAttributes and ForbiddenAttributes step on the toes of XML by
using periods; this needs to be changed.
- Extended HTML capabilities based on namespacing and tag transforms (COMPLEX) - Extended HTML capabilities based on namespacing and tag transforms (COMPLEX)
- Hooks for adding custom processors to custom namespaced tags and - Hooks for adding custom processors to custom namespaced tags and
attributes, offer default implementation attributes, offer default implementation


@@ -1 +1 @@
-3.1.1
+3.2.0


@@ -1,8 +1,6 @@
-HTML Purifier 3.1.1 is a security and bugfix release. This release addresses
-two security vulnerabilities, both related to CSS, and one of which only
-applies to users using Shift_JIS as their output encoding. There is also
-a security improvement regarding the imagecrash attack. There is a backwards
-incompatible change in which resources are no longer munged
-by default; please enable using %URI.MungeResources. Besides this, there
-are numerous improvements to URI munging, esp. with the addition of
-%URI.MungeSecretKey, as well as an experimental %HTML.SafeObject and %HTML.SafeEmbed.
+HTML Purifier 3.2.0 is an amalgamation of new features and fixes that
+have accumulated over a four month period. Some notable features
+include %AutoFormat.RemoveEmpty, column tracking for tokens,
+%AutoFormat.DisplayLinkURI and %Attr.DefaultImageAlt. There were also
+major improvements to the test suite interface, error collection output
+and the auto-formatter framework.


@@ -5,15 +5,15 @@
<line>131</line> <line>131</line>
</file> </file>
<file name="HTMLPurifier/Lexer.php"> <file name="HTMLPurifier/Lexer.php">
<line>85</line> <line>81</line>
</file> </file>
<file name="HTMLPurifier/Lexer/DirectLex.php"> <file name="HTMLPurifier/Lexer/DirectLex.php">
<line>50</line> <line>53</line>
<line>62</line> <line>73</line>
<line>327</line> <line>348</line>
</file> </file>
<file name="HTMLPurifier/Strategy/RemoveForeignElements.php"> <file name="HTMLPurifier/Strategy/RemoveForeignElements.php">
<line>44</line> <line>47</line>
</file> </file>
</directive> </directive>
<directive id="CSS.MaxImgLength"> <directive id="CSS.MaxImgLength">
@@ -69,29 +69,18 @@
<directive id="Core.Encoding"> <directive id="Core.Encoding">
<file name="HTMLPurifier/Encoder.php"> <file name="HTMLPurifier/Encoder.php">
<line>267</line> <line>267</line>
<line>294</line> <line>300</line>
</file> </file>
</directive> </directive>
<directive id="Test.ForceNoIconv"> <directive id="Test.ForceNoIconv">
<file name="HTMLPurifier/Encoder.php"> <file name="HTMLPurifier/Encoder.php">
<line>272</line> <line>272</line>
<line>302</line> <line>308</line>
</file> </file>
</directive> </directive>
<directive id="Core.EscapeNonASCIICharacters"> <directive id="Core.EscapeNonASCIICharacters">
<file name="HTMLPurifier/Encoder.php"> <file name="HTMLPurifier/Encoder.php">
<line>298</line> <line>304</line>
</file>
</directive>
<directive id="Core.MaintainLineNumbers">
<file name="HTMLPurifier/ErrorCollector.php">
<line>81</line>
</file>
<file name="HTMLPurifier/Lexer.php">
<line>82</line>
</file>
<file name="HTMLPurifier/Lexer/DirectLex.php">
<line>45</line>
</file> </file>
</directive> </directive>
<directive id="Output.CommentScriptContents"> <directive id="Output.CommentScriptContents">
@@ -151,41 +140,44 @@
</directive> </directive>
<directive id="HTML.Trusted"> <directive id="HTML.Trusted">
<file name="HTMLPurifier/HTMLModuleManager.php"> <file name="HTMLPurifier/HTMLModuleManager.php">
<line>198</line> <line>202</line>
</file> </file>
<file name="HTMLPurifier/Lexer.php"> <file name="HTMLPurifier/Lexer.php">
<line>238</line> <line>258</line>
</file> </file>
<file name="HTMLPurifier/HTMLModule/Image.php"> <file name="HTMLPurifier/HTMLModule/Image.php">
<line>27</line> <line>27</line>
</file> </file>
<file name="HTMLPurifier/Lexer/DirectLex.php"> <file name="HTMLPurifier/Lexer/DirectLex.php">
<line>34</line> <line>36</line>
</file>
<file name="HTMLPurifier/Strategy/RemoveForeignElements.php">
<line>23</line>
</file> </file>
</directive> </directive>
<directive id="HTML.AllowedModules"> <directive id="HTML.AllowedModules">
<file name="HTMLPurifier/HTMLModuleManager.php"> <file name="HTMLPurifier/HTMLModuleManager.php">
<line>205</line> <line>209</line>
</file> </file>
</directive> </directive>
<directive id="HTML.CoreModules"> <directive id="HTML.CoreModules">
<file name="HTMLPurifier/HTMLModuleManager.php"> <file name="HTMLPurifier/HTMLModuleManager.php">
<line>206</line> <line>210</line>
</file> </file>
</directive> </directive>
<directive id="HTML.Proprietary"> <directive id="HTML.Proprietary">
<file name="HTMLPurifier/HTMLModuleManager.php"> <file name="HTMLPurifier/HTMLModuleManager.php">
<line>220</line> <line>221</line>
</file> </file>
</directive> </directive>
<directive id="HTML.SafeObject"> <directive id="HTML.SafeObject">
<file name="HTMLPurifier/HTMLModuleManager.php"> <file name="HTMLPurifier/HTMLModuleManager.php">
<line>225</line> <line>226</line>
</file> </file>
</directive> </directive>
<directive id="HTML.SafeEmbed"> <directive id="HTML.SafeEmbed">
<file name="HTMLPurifier/HTMLModuleManager.php"> <file name="HTMLPurifier/HTMLModuleManager.php">
<line>228</line> <line>229</line>
</file> </file>
</directive> </directive>
<directive id="Attr.IDBlacklist"> <directive id="Attr.IDBlacklist">
@@ -200,21 +192,26 @@
</directive> </directive>
<directive id="Core.LexerImpl"> <directive id="Core.LexerImpl">
<file name="HTMLPurifier/Lexer.php"> <file name="HTMLPurifier/Lexer.php">
<line>70</line> <line>76</line>
</file>
</directive>
<directive id="Core.MaintainLineNumbers">
<file name="HTMLPurifier/Lexer.php">
<line>80</line>
</file>
<file name="HTMLPurifier/Lexer/DirectLex.php">
<line>48</line>
</file> </file>
</directive> </directive>
<directive id="Core.ConvertDocumentToFragment"> <directive id="Core.ConvertDocumentToFragment">
<file name="HTMLPurifier/Lexer.php"> <file name="HTMLPurifier/Lexer.php">
<line>230</line> <line>267</line>
</file> </file>
</directive> </directive>
<directive id="URI.Host"> <directive id="URI.Host">
<file name="HTMLPurifier/URIDefinition.php"> <file name="HTMLPurifier/URIDefinition.php">
<line>64</line> <line>64</line>
</file> </file>
<file name="HTMLPurifier/URIFilter/DisableExternal.php">
<line>8</line>
</file>
</directive> </directive>
<directive id="URI.Base"> <directive id="URI.Base">
<file name="HTMLPurifier/URIDefinition.php"> <file name="HTMLPurifier/URIDefinition.php">
@@ -293,9 +290,14 @@
<line>19</line> <line>19</line>
</file> </file>
</directive> </directive>
<directive id="Attr.DefaultImageAlt">
<file name="HTMLPurifier/AttrTransform/ImgRequired.php">
<line>25</line>
</file>
</directive>
<directive id="Attr.DefaultInvalidImageAlt"> <directive id="Attr.DefaultInvalidImageAlt">
<file name="HTMLPurifier/AttrTransform/ImgRequired.php"> <file name="HTMLPurifier/AttrTransform/ImgRequired.php">
<line>27</line> <line>32</line>
</file> </file>
</directive> </directive>
<directive id="Core.EscapeInvalidChildren"> <directive id="Core.EscapeInvalidChildren">
@@ -361,12 +363,12 @@
</directive> </directive>
<directive id="Core.DirectLexLineNumberSyncInterval"> <directive id="Core.DirectLexLineNumberSyncInterval">
<file name="HTMLPurifier/Lexer/DirectLex.php"> <file name="HTMLPurifier/Lexer/DirectLex.php">
<line>59</line> <line>70</line>
</file> </file>
</directive> </directive>
<directive id="Core.EscapeInvalidTags"> <directive id="Core.EscapeInvalidTags">
<file name="HTMLPurifier/Strategy/MakeWellFormed.php"> <file name="HTMLPurifier/Strategy/MakeWellFormed.php">
<line>22</line> <line>45</line>
</file> </file>
<file name="HTMLPurifier/Strategy/RemoveForeignElements.php"> <file name="HTMLPurifier/Strategy/RemoveForeignElements.php">
<line>19</line> <line>19</line>
@@ -374,12 +376,12 @@
</directive> </directive>
<directive id="Core.RemoveScriptContents"> <directive id="Core.RemoveScriptContents">
<file name="HTMLPurifier/Strategy/RemoveForeignElements.php"> <file name="HTMLPurifier/Strategy/RemoveForeignElements.php">
<line>22</line> <line>25</line>
</file> </file>
</directive> </directive>
<directive id="Core.HiddenElements"> <directive id="Core.HiddenElements">
<file name="HTMLPurifier/Strategy/RemoveForeignElements.php"> <file name="HTMLPurifier/Strategy/RemoveForeignElements.php">
<line>23</line> <line>26</line>
</file> </file>
</directive> </directive>
<directive id="URI.HostBlacklist"> <directive id="URI.HostBlacklist">

@@ -213,6 +213,4 @@ the usual things required are:</p>
<p>See <code>HTMLPurifier/HTMLModule.php</code> for details.</p> <p>See <code>HTMLPurifier/HTMLModule.php</code> for details.</p>
<div id="version">$Id$</div>
</body></html> </body></html>

@@ -239,15 +239,15 @@ Test.Example</pre>
object; users have a little bit of leeway when setting configuration object; users have a little bit of leeway when setting configuration
values (for example, a lookup value can be specified as a list; values (for example, a lookup value can be specified as a list;
HTML Purifier will flip it as necessary.) These types are defined HTML Purifier will flip it as necessary.) These types are defined
in <a href="http://htmlpurifier.org/svnroot/htmlpurifier/trunk/library/HTMLPurifier/VarParser.php"> in <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/VarParser.php">
library/HTMLPurifier/VarParser.php</a>. library/HTMLPurifier/VarParser.php</a>.
</p> </p>
<p> <p>
For more information on what values are allowed, and how they are parsed, For more information on what values are allowed, and how they are parsed,
consult <a href="http://htmlpurifier.org/svnroot/htmlpurifier/trunk/library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php"> consult <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php">
library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php</a>, as well library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php</a>, as well
as <a href="http://htmlpurifier.org/svnroot/htmlpurifier/trunk/library/HTMLPurifier/ConfigSchema/Interchange/Directive.php"> as <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/Interchange/Directive.php">
library/HTMLPurifier/ConfigSchema/Interchange/Directive.php</a> for library/HTMLPurifier/ConfigSchema/Interchange/Directive.php</a> for
the semantics of the parsed values. the semantics of the parsed values.
</p> </p>
@@ -272,7 +272,7 @@ Test.Example</pre>
<p> <p>
All directive files go through a rigorous validation process All directive files go through a rigorous validation process
through <a href="http://htmlpurifier.org/svnroot/htmlpurifier/trunk/library/HTMLPurifier/ConfigSchema/"> through <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/Validator.php">
library/HTMLPurifier/ConfigSchema/Validator.php</a>, as well library/HTMLPurifier/ConfigSchema/Validator.php</a>, as well
as some basic checks during building. While as some basic checks during building. While
listing every error out here is out-of-scope for this document, we listing every error out here is out-of-scope for this document, we
@@ -339,7 +339,7 @@ Test.Example</pre>
The most difficult part is translating the Interchange member variable (valueAliases) The most difficult part is translating the Interchange member variable (valueAliases)
into a directive file key (VALUE-ALIASES), but there's a one-to-one into a directive file key (VALUE-ALIASES), but there's a one-to-one
correspondence currently. If the two formats diverge, any discrepancies correspondence currently. If the two formats diverge, any discrepancies
will be described in <a href="http://htmlpurifier.org/svnroot/htmlpurifier/trunk/library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php"> will be described in <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php">
library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php</a>. library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php</a>.
</p> </p>
@@ -369,8 +369,6 @@ Test.Example</pre>
which <code>HTMLPurifier_Config</code> uses to validate its incoming which <code>HTMLPurifier_Config</code> uses to validate its incoming
data. There is also an XML serializer, which is used to build documentation. data. There is also an XML serializer, which is used to build documentation.
</p> </p>
<div id="version">$Id$</div>
</body> </body>
</html> </html>

@@ -62,6 +62,4 @@
do. do.
</p> </p>
<div id="version">$Id$</div>
</body></html> </body></html>

@@ -77,6 +77,4 @@ help you find the correct functionality more quickly. Here they are:</p>
</dl> </dl>
<div id="version">$Id$</div>
</body></html> </body></html>

@@ -27,6 +27,4 @@ that itch, put it here!</p>
<li>Parallelize strategies</li> <li>Parallelize strategies</li>
</ul> </ul>
<div id="version">$Id$</div>
</body></html> </body></html>

@@ -303,6 +303,4 @@ Mozilla on inside and needs -moz-outline, no IE support.</td></tr>
</table> </table>
<div id="version">$Id$</div>
</body></html> </body></html>

@@ -213,7 +213,7 @@ $def = $config->getHTMLDefinition(true);</pre>
<pre>$config = HTMLPurifier_Config::createDefault(); <pre>$config = HTMLPurifier_Config::createDefault();
$config->set('HTML', 'DefinitionID', 'enduser-customize.html tutorial'); $config->set('HTML', 'DefinitionID', 'enduser-customize.html tutorial');
$config->set('HTML', 'DefinitionRev', 1); $config->set('HTML', 'DefinitionRev', 1);
<strong>$config->set('Core', 'DefinitionCache', null); // remove this later!</strong> <strong>$config->set('Cache', 'DefinitionImpl', null); // remove this later!</strong>
$def = $config->getHTMLDefinition(true);</pre> $def = $config->getHTMLDefinition(true);</pre>
<p> <p>
@@ -269,7 +269,7 @@ $def = $config->getHTMLDefinition(true);</pre>
<pre>$config = HTMLPurifier_Config::createDefault(); <pre>$config = HTMLPurifier_Config::createDefault();
$config->set('HTML', 'DefinitionID', 'enduser-customize.html tutorial'); $config->set('HTML', 'DefinitionID', 'enduser-customize.html tutorial');
$config->set('HTML', 'DefinitionRev', 1); $config->set('HTML', 'DefinitionRev', 1);
$config->set('Core', 'DefinitionCache', null); // remove this later! $config->set('Cache', 'DefinitionImpl', null); // remove this later!
$def = $config->getHTMLDefinition(true); $def = $config->getHTMLDefinition(true);
<strong>$def->addAttribute('a', 'target', 'Enum#_blank,_self,_parent,_top');</strong></pre> <strong>$def->addAttribute('a', 'target', 'Enum#_blank,_self,_parent,_top');</strong></pre>
@@ -372,10 +372,10 @@ $def = $config->getHTMLDefinition(true);
<p> <p>
For a complete list, consult For a complete list, consult
<a href="http://htmlpurifier.org/svnroot/htmlpurifier/trunk/library/HTMLPurifier/AttrTypes.php"><code>library/HTMLPurifier/AttrTypes.php</code></a>; <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/AttrTypes.php"><code>library/HTMLPurifier/AttrTypes.php</code></a>;
more information on attributes that accept parameters can be found on their more information on attributes that accept parameters can be found on their
respective includes in respective includes in
<a href="http://htmlpurifier.org/svnroot/htmlpurifier/trunk/library/HTMLPurifier/AttrDef/"><code>library/HTMLPurifier/AttrDef</code></a>. <a href="http://repo.or.cz/w/htmlpurifier.git?a=tree;hb=HEAD;f=library/HTMLPurifier/AttrDef"><code>library/HTMLPurifier/AttrDef</code></a>.
</p> </p>
<p> <p>
@@ -387,7 +387,7 @@ $def = $config->getHTMLDefinition(true);
<pre>$config = HTMLPurifier_Config::createDefault(); <pre>$config = HTMLPurifier_Config::createDefault();
$config->set('HTML', 'DefinitionID', 'enduser-customize.html tutorial'); $config->set('HTML', 'DefinitionID', 'enduser-customize.html tutorial');
$config->set('HTML', 'DefinitionRev', 1); $config->set('HTML', 'DefinitionRev', 1);
$config->set('Core', 'DefinitionCache', null); // remove this later! $config->set('Cache', 'DefinitionImpl', null); // remove this later!
$def = $config->getHTMLDefinition(true); $def = $config->getHTMLDefinition(true);
<strong>$def->addAttribute('a', 'target', new HTMLPurifier_AttrDef_Enum( <strong>$def->addAttribute('a', 'target', new HTMLPurifier_AttrDef_Enum(
array('_blank','_self','_parent','_top') array('_blank','_self','_parent','_top')
@@ -734,7 +734,7 @@ $def = $config->getHTMLDefinition(true);
<pre>$config = HTMLPurifier_Config::createDefault(); <pre>$config = HTMLPurifier_Config::createDefault();
$config->set('HTML', 'DefinitionID', 'enduser-customize.html tutorial'); $config->set('HTML', 'DefinitionID', 'enduser-customize.html tutorial');
$config->set('HTML', 'DefinitionRev', 1); $config->set('HTML', 'DefinitionRev', 1);
$config->set('Core', 'DefinitionCache', null); // remove this later! $config->set('Cache', 'DefinitionImpl', null); // remove this later!
$def = $config->getHTMLDefinition(true); $def = $config->getHTMLDefinition(true);
$def->addAttribute('a', 'target', new HTMLPurifier_AttrDef_Enum( $def->addAttribute('a', 'target', new HTMLPurifier_AttrDef_Enum(
array('_blank','_self','_parent','_top') array('_blank','_self','_parent','_top')
@@ -764,7 +764,7 @@ $form->excludes = array('form' => true);</strong></pre>
<p> <p>
And that's all there is to it! Implementing the rest of the form And that's all there is to it! Implementing the rest of the form
module is left as an exercise to the user; to see more examples module is left as an exercise to the user; to see more examples
check the <a href="http://htmlpurifier.org/svnroot/htmlpurifier/trunk/library/HTMLPurifier/HTMLModule/"><code>library/HTMLPurifier/HTMLModule/</code></a> directory check the <a href="http://repo.or.cz/w/htmlpurifier.git?a=tree;hb=HEAD;f=library/HTMLPurifier/HTMLModule"><code>library/HTMLPurifier/HTMLModule/</code></a> directory
in your local HTML Purifier installation. in your local HTML Purifier installation.
</p> </p>
@@ -789,10 +789,8 @@ $form->excludes = array('form' => true);</strong></pre>
</p> </p>
<ul> <ul>
<li><a href="http://htmlpurifier.org/svnroot/htmlpurifier/trunk/library/HTMLPurifier/HTMLModule.php"><code>library/HTMLPurifier/HTMLModule.php</code></a></li> <li><a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/HTMLModule.php"><code>library/HTMLPurifier/HTMLModule.php</code></a></li>
<li><a href="http://htmlpurifier.org/svnroot/htmlpurifier/trunk/library/HTMLPurifier/ElementDef.php"><code>library/HTMLPurifier/ElementDef.php</code></a></li> <li><a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ElementDef.php"><code>library/HTMLPurifier/ElementDef.php</code></a></li>
</ul> </ul>
<div id="version">$Id: enduser-tidy.html 1158 2007-06-18 19:26:29Z Edward $</div>
</body></html> </body></html>

@@ -141,7 +141,5 @@ anchors is beyond me.</p>
<p>Don't come crying to me when your page mysteriously stops validating, though.</p> <p>Don't come crying to me when your page mysteriously stops validating, though.</p>
<div id="version">$Id$</div>
</body> </body>
</html> </html>

@@ -225,6 +225,4 @@ and if that still doesn't satisfy your appetite, do some fine-tuning.
Other than that, don't worry about it: this all works silently and Other than that, don't worry about it: this all works silently and
effectively in the background.</p> effectively in the background.</p>
<div id="version">$Id$</div>
</body></html> </body></html>

@@ -205,12 +205,10 @@ $uri->registerFilter(new HTMLPurifier_URIFilter_<strong>NameOfFilter</strong>())
<p> <p>
Check the Check the
<a href="http://htmlpurifier.org/svnroot/htmlpurifier/trunk/library/HTMLPurifier/URIFilter/">URIFilter</a> <a href="http://repo.or.cz/w/htmlpurifier.git?a=tree;hb=HEAD;f=library/HTMLPurifier/URIFilter">URIFilter</a>
directory for more implementation examples, and see <a href="http://htmlpurifier.org/svnroot/htmlpurifier/trunk/docs/proposal-new-directives.txt">the directory for more implementation examples, and see <a href="proposal-new-directives.txt">the
new directives proposal document</a> for ideas on what could be implemented new directives proposal document</a> for ideas on what could be implemented
as a filter. as a filter.
</p> </p>
<div id="version">$Id$</div>
</body></html> </body></html>

@@ -589,8 +589,10 @@ looks something like: <code>%C3%86</code>. There is no official way of
determining the character encoding of such a request, since the percent determining the character encoding of such a request, since the percent
encoding operates on a byte level, so it is usually assumed that it encoding operates on a byte level, so it is usually assumed that it
is the same as the encoding the page containing the form was submitted is the same as the encoding the page containing the form was submitted
in. You'll run into very few problems if you only use characters in in. (<a href="http://tools.ietf.org/html/rfc3986#section-2.5">RFC 3986</a>
the character encoding you chose.</p> recommends that textual identifiers be translated to UTF-8; however, browser
compliance is spotty.) You'll run into very few problems
if you only use characters in the character encoding you chose.</p>
<p>However, once you start adding characters outside of your encoding <p>However, once you start adding characters outside of your encoding
(and this is a lot more common than you may think: take curly (and this is a lot more common than you may think: take curly

@@ -70,7 +70,7 @@ into your documents. YouTube's code goes like this:</p>
class=&quot;embed-youtube&quot;&gt;AyPzM5WK8ys&lt;/span&gt;</code> your class=&quot;embed-youtube&quot;&gt;AyPzM5WK8ys&lt;/span&gt;</code> your
application can reconstruct the full object from this small snippet that application can reconstruct the full object from this small snippet that
passes through HTML Purifier <em>unharmed</em>. passes through HTML Purifier <em>unharmed</em>.
<a href="http://htmlpurifier.org/svnroot/htmlpurifier/trunk/library/HTMLPurifier/Filter/YouTube.php">Show me the code!</a></p> <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/Filter/YouTube.php">Show me the code!</a></p>
<p>And the corresponding usage:</p> <p>And the corresponding usage:</p>

@@ -98,8 +98,8 @@ the code. They may be upgraded to HTML files or stay as TXT scratchpads.</p>
<table class="table"> <table class="table">
<thead><tr> <thead><tr>
<th width="10%">Type</th> <th style="width:10%">Type</th>
<th width="20%">Name</th> <th style="width:20%">Name</th>
<th>Description</th> <th>Description</th>
</tr></thead> </tr></thead>
@@ -175,6 +175,5 @@ the code. They may be upgraded to HTML files or stay as TXT scratchpads.</p>
</table> </table>
<div id="version">$Id$</div>
</body> </body>
</html> </html>

@@ -42,7 +42,5 @@ into the mix.</li>
something like that?</li> something like that?</li>
</ol> </ol>
<div id="version">$Id$</div>
</body> </body>
</html> </html>

209
docs/proposal-errors.txt Normal file
@@ -0,0 +1,209 @@
Considerations for ErrorCollection
Presently, HTML Purifier takes a code-execution centric approach to handling
errors. Errors are organized and grouped according to which segment of the
code triggers them, not necessarily the portion of the input document that
triggered the error. This means that errors are pseudo-sorted by category,
rather than location in the document.
One easy way to "fix" this problem would be to re-sort according to line number.
However, the "category" style information we derive from naively following
program execution is still useful. After all, each of the strategies which
can report errors still process the document mostly linearly. Furthermore,
not only do they process linearly, but the way they pass off operations to
sub-systems mirrors that of the document. For example, AttrValidator will
linearly proceed through elements, and on each element will use AttrDef to
validate those contents. From there, the attribute might have more
sub-components, which have execution passed off accordingly.
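The re-sorting idea above can be sketched roughly as follows. This is only an
illustration: the flat record layout (line, category, message) and the function
name sortErrorsByLine() are assumptions made up for this example, not the
format ErrorCollector actually uses. Errors are ordered by input line, and
errors on the same line keep the order in which the strategies reported them,
so the "category" information is not lost.

```php
<?php
// Hypothetical flat error records; field names are assumptions for this sketch.
$errors = array(
    array('line' => 12, 'category' => 'ValidateAttributes',    'message' => 'Attribute removed'),
    array('line' => 3,  'category' => 'RemoveForeignElements', 'message' => 'Foreign element removed'),
    array('line' => 3,  'category' => 'MakeWellFormed',        'message' => 'Unnecessary end tag removed'),
);

// Re-sort by line number; ties fall back to the original reporting order.
function sortErrorsByLine(array $errors) {
    $indexed = array();
    foreach (array_values($errors) as $i => $error) {
        $error['_order'] = $i; // remember the reporting order
        $indexed[] = $error;
    }
    usort($indexed, function ($a, $b) {
        if ($a['line'] !== $b['line']) {
            return $a['line'] < $b['line'] ? -1 : 1;
        }
        return $a['_order'] - $b['_order'];
    });
    foreach ($indexed as $i => $error) {
        unset($indexed[$i]['_order']); // drop the helper key again
    }
    return $indexed;
}

$sorted = sortErrorsByLine($errors);
```

The explicit tie-break is there so the result does not depend on whether the
PHP version in use has a stable sort.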
In fact, each strategy handles a very specific class of "error."
RemoveForeignElements - element tokens
MakeWellFormed - element token ordering
FixNesting - element token ordering
ValidateAttributes - attributes of elements
The crucial point is that while we care about the hierarchy governing these
different errors, we *don't* care about any other information about what actually
happens to the elements. This brings up another point: if HTML Purifier fixes
something, this is not really a notice/warning/error; it's really a suggestion
of a way to fix the aforementioned defects.
In short, the refactoring to take this into account kinda sucks.
Errors should not be recorded in the order that they are reported. Instead, they
should be bound to the line (and preferably element) in which they were found.
This means we need some way to uniquely identify every element in the document,
which doesn't presently exist. An easy way of adding this would be to track
line columns. An important ramification of this is that we *must* use the
DirectLex implementation.
1. Implement column numbers for DirectLex [DONE!]
2. Disable error collection when not using DirectLex [DONE!]
Next, we need to re-orient all of the error declarations to give CurrentToken
utmost importance. Since this is passed via Context, it's not always clear
if that's available. ErrorCollector should complain HARD if it isn't available.
There are some locations where we don't have a token available. These include:
* Lexing - this can actually have a row and column, but NOT correspond to
a token
* End of document errors - bump this to the end
Actually, we *don't* have to complain if CurrentToken isn't available; we just
set it as a document-wide error. And actually, nothing needs to be done here.
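A rough sketch of that fallback, under stated assumptions: $context is a plain
array standing in for the Context object, the keys CurrentLine and CurrentCol
and the function resolveErrorLocation() are invented for this example, and
tokens are modelled as arrays with 'line' and 'col' entries.

```php
<?php
// Decide where an error should be attached, given whatever the context holds.
// Everything here (keys, return shape) is hypothetical and only illustrates
// the three situations described above.
function resolveErrorLocation(array $context) {
    if (isset($context['CurrentToken'])) {
        // Normal case: bind the error to the token's position.
        $token = $context['CurrentToken'];
        return array('scope' => 'token', 'line' => $token['line'], 'col' => $token['col']);
    }
    if (isset($context['CurrentLine'])) {
        // Lexing: a row (and maybe column) exists, but no token corresponds to it.
        $col = isset($context['CurrentCol']) ? $context['CurrentCol'] : null;
        return array('scope' => 'position', 'line' => $context['CurrentLine'], 'col' => $col);
    }
    // Nothing available: record it as a document-wide error instead of complaining.
    return array('scope' => 'document', 'line' => null, 'col' => null);
}

$atToken    = resolveErrorLocation(array('CurrentToken' => array('line' => 4, 'col' => 7)));
$whileLexing = resolveErrorLocation(array('CurrentLine' => 9));
$documentWide = resolveErrorLocation(array());
```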
Something interesting to consider is whether or not we care about the locations
of attributes and CSS properties, i.e. the sub-objects that compose these things.
In terms of consistency, at the very least attributes should have column/line
numbers attached to them. However, this may be overkill, as attributes are
uniquely identifiable. You could go even further, with CSS, but they are also
uniquely identifiable.
The bottom line, however, is that this information must be available, in the form of the
CurrentAttribute and CurrentCssProperty (theoretical) context variables, and
it must be used to organize the errors that the sub-processes may throw.
There is also a hierarchy of sorts that might make merging these into one context
variable more sensible, were it not for HTML's reasonably rigid structure.
A CSS property will never contain an HTML attribute. So we won't ever get
recursive relations, and having multiple depths won't ever make sense. Leave
this be.
We already have this information, and consequently, using start and end is
*unnecessary*, so long as the context variables are set appropriately. We don't
care if an error was thrown by an attribute transform or an attribute definition;
to the end user these are the same (for a developer, they are different, but
they're better off with a stack trace (which we should add support for) in such
cases).
3. Remove start()/end() code. Don't get rid of recursion, though [DONE]
4. Set up ErrorCollector to use context information to set up hierarchies.
This may require a different internal format. Use objects if it gets
complex. [DONE]
ASIDE
More on this topic: since we are now binding errors to lines
and columns, a particular error can have three relationships to that
specific location:
1. The token at that location directly
RemoveForeignElements
AttrValidator (transforms)
MakeWellFormed
2. A "component" of that token (i.e. attribute)
AttrValidator (removals)
3. A modification to that node (i.e. contents from start to end
token) as a whole
FixNesting
This needs to be marked accordingly. In the presentation, it might
make sense to keep (3) separate and have (2) as a sublist of (1). (1) can
be a closing tag, in which case (3) makes no sense at all, OR it
should be related with its opening tag (this may not necessarily
be possible before MakeWellFormed is run).
So, the line and column count as our identifier:
$errors[$line][$col] = ...
Then, we need to identify case 1, 2 or 3. They are identified as
such:
1. Need some sort of semaphore in RemoveForeignElements, etc.
2. If CurrentAttr/CurrentCssProperty is non-null
3. Default (FixNesting, MakeWellFormed)
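The identification rules above might look something like this in code. It is a
sketch only: the $tokenLevel flag stands in for the "semaphore" (whose real
form is undecided here), $context is a plain array rather than the Context
object, and both function names are made up. Checking the flag before the
attribute variables is also an assumption.

```php
<?php
// Returns 1, 2 or 3, matching the three relationships listed above.
function classifyErrorRelation(array $context, $tokenLevel = false) {
    if ($tokenLevel) {
        return 1; // the token at that location directly
    }
    if (isset($context['CurrentAttr']) || isset($context['CurrentCssProperty'])) {
        return 2; // a component of that token
    }
    return 3;     // default: the node as a whole
}

// Store an error under $errors[$line][$col]; several errors may share a position.
function recordError(array &$errors, $line, $col, $relation, $message) {
    $errors[$line][$col][] = array('relation' => $relation, 'message' => $message);
}

$errors = array();
recordError($errors, 1, 0, classifyErrorRelation(array(), true), 'Foreign element removed');
recordError($errors, 1, 0, classifyErrorRelation(array('CurrentAttr' => 'style')), 'Attribute removed');
recordError($errors, 2, 5, classifyErrorRelation(array()), 'Node reorganized');
```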
One consideration about (1) is that it usually is actually a
(3) modification, but we have no way of knowing about that because
of various optimizations. However, they can probably be treated
the same. The other difficulty is that (3) is never a line and
column; rather, it is a range (i.e. a pair of positions) and telling the user
the very start of the range may confuse them. For example,
<b>Foo<div>bar</div></b>
^ ^
The node being operated on is <b>, so the error would be assigned
to the first caret, with a "node reorganized" error. Then, the
ChildDef would have submitted its own suggestions and errors with
regard to what's going on in the internals. So I suppose this is
ok. :-)
Now, the structure of the earlier mentioned ... would be something
like this:
object {
    type = (token|attr|property),
    value, // appropriate for type
    errors = array(),
    sub-errors = [recursive],
}
This helps us keep things agnostic. It is also complex
enough to warrant an object.
So, more wanking about the object format is in order. The way HTML Purifier is
currently set up, the only possible hierarchy is:
token -> attr -> css property
These relations do not exist all of the time; a comment or end token would not
ever have any attributes, and non-style attributes would never have CSS properties
associated with them.
I believe that it is worth supporting multiple paths. At some point, we might
have a hierarchy like:
* -> syntax
-> token -> attr -> css property
-> url
-> css stylesheet <style>
et cetera. Now, one of the practical implications of this is that every "node"
on our tree is well-defined, so in theory it should be possible to either 1.
create a separate class for each error struct, or 2. embed this information
directly into HTML Purifier's token stream. Embedding the information in the
token stream is not a terribly good idea, since tokens can be removed, etc.
So that leaves us with 1... and if we use a generic interface we can cut down
on a lot of code we might need. So let's leave it like this.
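To make the "generic interface" option a little more concrete, here is one
possible shape for such a struct. The class name ErrorNodeSketch, its constants
and its methods are all invented for this sketch and should not be read as the
final design; the error messages are placeholders too. One class serves every
level of the token -> attr -> css property path, and children are created on
demand.

```php
<?php
// Hypothetical generic error node; one class for every level of the hierarchy.
class ErrorNodeSketch {
    const TOKEN    = 'token';
    const ATTR     = 'attr';
    const PROPERTY = 'property';

    public $type;                // one of the constants above
    public $value;               // appropriate for type (token name, attribute name, ...)
    public $errors = array();    // list of array(severity, message)
    public $children = array();  // sub-errors, indexed as [type][id]

    public function __construct($type = null, $value = null) {
        $this->type  = $type;
        $this->value = $value;
    }

    // Fetch the child node for a component, creating it the first time.
    public function getChild($type, $id) {
        if (!isset($this->children[$type][$id])) {
            $this->children[$type][$id] = new ErrorNodeSketch($type, $id);
        }
        return $this->children[$type][$id];
    }

    public function addError($severity, $message) {
        $this->errors[] = array($severity, $message);
    }

    // Count errors on this node and, recursively, on everything below it.
    public function countErrors() {
        $total = count($this->errors);
        foreach ($this->children as $byId) {
            foreach ($byId as $child) {
                $total += $child->countErrors();
            }
        }
        return $total;
    }
}

// token -> attr -> css property, as in the hierarchy above
$token = new ErrorNodeSketch(ErrorNodeSketch::TOKEN, 'img');
$token->addError(E_ERROR, 'placeholder: problem with the token itself');
$style = $token->getChild(ErrorNodeSketch::ATTR, 'style');
$style->addError(E_WARNING, 'placeholder: problem with the attribute');
$style->getChild(ErrorNodeSketch::PROPERTY, 'width')->addError(E_NOTICE, 'placeholder: problem with the property');
```

Because the nodes live outside the token stream, removing a token during
purification would not lose the errors already attached to its position.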
~~~~
Then we set up suggestions.
5. Set up a separate error class which tells the user about any modifications
HTML Purifier made.
Some information about this:
Our current paradigm is to tell the user what HTML Purifier did to the HTML.
This is the most natural mode of operation, since that's what HTML Purifier
is all about; it was not meant to be a validator.
However, most other people have experience dealing with a validator. In cases
where HTML Purifier unambiguously does the right thing, simply giving the user
the correct version isn't a bad idea, but problems arise when:
- The user has such bad HTML we do something odd, when we should have just
flagged the HTML as an error. Such examples are when we do things like
remove text from directly inside a <table> tag. It was probably meant to
be in a <td> tag or be outside the table, but we're not smart enough to
realize this so we just remove it. In such a case, we should tell the user
that there was foreign data in the table, but then we shouldn't "demand"
the user remove the data; it's more of a "here's a possible way of
rectifying the problem"
- Giving line context for input is hard enough, but feasible; giving output
line context will be extremely difficult due to shifting lines; we'd probably
have to track what the tokens are and then find the appropriate out context
and it's not guaranteed to work etc etc etc.
````````````
Don't forget to spruce up output.
6. Output needs to automatically give line and column numbers, basically
"at line" on steroids. Look at W3C's output; it's ok. [PARTIALLY DONE]
- We need a standard CSS to apply (check demo.css for some starting
styling; some buttons would also be hip)

@@ -40,6 +40,5 @@ the development of this library in these forum threads:</p>
<p>...as well as any I may have forgotten.</p> <p>...as well as any I may have forgotten.</p>
<div id="version">$Id$</div>
</body> </body>
</html> </html>

@@ -7,7 +7,7 @@
* primary concern and you are using an opcode cache. PLEASE DO NOT EDIT THIS * primary concern and you are using an opcode cache. PLEASE DO NOT EDIT THIS
* FILE, changes will be overwritten the next time the script is run. * FILE, changes will be overwritten the next time the script is run.
* *
* @version 3.1.1 * @version 3.2.0
* *
* @warning * @warning
* You must *not* include any other HTML Purifier files before this file, * You must *not* include any other HTML Purifier files before this file,
@@ -41,6 +41,7 @@ require 'HTMLPurifier/Encoder.php';
require 'HTMLPurifier/EntityLookup.php'; require 'HTMLPurifier/EntityLookup.php';
require 'HTMLPurifier/EntityParser.php'; require 'HTMLPurifier/EntityParser.php';
require 'HTMLPurifier/ErrorCollector.php'; require 'HTMLPurifier/ErrorCollector.php';
require 'HTMLPurifier/ErrorStruct.php';
require 'HTMLPurifier/Exception.php'; require 'HTMLPurifier/Exception.php';
require 'HTMLPurifier/Filter.php'; require 'HTMLPurifier/Filter.php';
require 'HTMLPurifier/Generator.php'; require 'HTMLPurifier/Generator.php';
@@ -108,6 +109,7 @@ require 'HTMLPurifier/AttrDef/URI/Host.php';
require 'HTMLPurifier/AttrDef/URI/IPv4.php'; require 'HTMLPurifier/AttrDef/URI/IPv4.php';
require 'HTMLPurifier/AttrDef/URI/IPv6.php'; require 'HTMLPurifier/AttrDef/URI/IPv6.php';
require 'HTMLPurifier/AttrDef/URI/Email/SimpleCheck.php'; require 'HTMLPurifier/AttrDef/URI/Email/SimpleCheck.php';
require 'HTMLPurifier/AttrTransform/Background.php';
require 'HTMLPurifier/AttrTransform/BdoDir.php'; require 'HTMLPurifier/AttrTransform/BdoDir.php';
require 'HTMLPurifier/AttrTransform/BgColor.php'; require 'HTMLPurifier/AttrTransform/BgColor.php';
require 'HTMLPurifier/AttrTransform/BoolToCSS.php'; require 'HTMLPurifier/AttrTransform/BoolToCSS.php';
@@ -115,6 +117,7 @@ require 'HTMLPurifier/AttrTransform/Border.php';
require 'HTMLPurifier/AttrTransform/EnumToCSS.php'; require 'HTMLPurifier/AttrTransform/EnumToCSS.php';
require 'HTMLPurifier/AttrTransform/ImgRequired.php'; require 'HTMLPurifier/AttrTransform/ImgRequired.php';
require 'HTMLPurifier/AttrTransform/ImgSpace.php'; require 'HTMLPurifier/AttrTransform/ImgSpace.php';
require 'HTMLPurifier/AttrTransform/Input.php';
require 'HTMLPurifier/AttrTransform/Lang.php'; require 'HTMLPurifier/AttrTransform/Lang.php';
require 'HTMLPurifier/AttrTransform/Length.php'; require 'HTMLPurifier/AttrTransform/Length.php';
require 'HTMLPurifier/AttrTransform/Name.php'; require 'HTMLPurifier/AttrTransform/Name.php';
@@ -122,6 +125,7 @@ require 'HTMLPurifier/AttrTransform/SafeEmbed.php';
require 'HTMLPurifier/AttrTransform/SafeObject.php'; require 'HTMLPurifier/AttrTransform/SafeObject.php';
require 'HTMLPurifier/AttrTransform/SafeParam.php'; require 'HTMLPurifier/AttrTransform/SafeParam.php';
require 'HTMLPurifier/AttrTransform/ScriptRequired.php'; require 'HTMLPurifier/AttrTransform/ScriptRequired.php';
require 'HTMLPurifier/AttrTransform/Textarea.php';
require 'HTMLPurifier/ChildDef/Chameleon.php'; require 'HTMLPurifier/ChildDef/Chameleon.php';
require 'HTMLPurifier/ChildDef/Custom.php'; require 'HTMLPurifier/ChildDef/Custom.php';
require 'HTMLPurifier/ChildDef/Empty.php'; require 'HTMLPurifier/ChildDef/Empty.php';
@@ -137,10 +141,12 @@ require 'HTMLPurifier/DefinitionCache/Decorator/Memory.php';
require 'HTMLPurifier/HTMLModule/Bdo.php'; require 'HTMLPurifier/HTMLModule/Bdo.php';
require 'HTMLPurifier/HTMLModule/CommonAttributes.php'; require 'HTMLPurifier/HTMLModule/CommonAttributes.php';
require 'HTMLPurifier/HTMLModule/Edit.php'; require 'HTMLPurifier/HTMLModule/Edit.php';
require 'HTMLPurifier/HTMLModule/Forms.php';
require 'HTMLPurifier/HTMLModule/Hypertext.php'; require 'HTMLPurifier/HTMLModule/Hypertext.php';
require 'HTMLPurifier/HTMLModule/Image.php'; require 'HTMLPurifier/HTMLModule/Image.php';
require 'HTMLPurifier/HTMLModule/Legacy.php'; require 'HTMLPurifier/HTMLModule/Legacy.php';
require 'HTMLPurifier/HTMLModule/List.php'; require 'HTMLPurifier/HTMLModule/List.php';
require 'HTMLPurifier/HTMLModule/Name.php';
require 'HTMLPurifier/HTMLModule/NonXMLCommonAttributes.php'; require 'HTMLPurifier/HTMLModule/NonXMLCommonAttributes.php';
require 'HTMLPurifier/HTMLModule/Object.php'; require 'HTMLPurifier/HTMLModule/Object.php';
require 'HTMLPurifier/HTMLModule/Presentation.php'; require 'HTMLPurifier/HTMLModule/Presentation.php';
@@ -155,14 +161,17 @@ require 'HTMLPurifier/HTMLModule/Target.php';
require 'HTMLPurifier/HTMLModule/Text.php'; require 'HTMLPurifier/HTMLModule/Text.php';
require 'HTMLPurifier/HTMLModule/Tidy.php'; require 'HTMLPurifier/HTMLModule/Tidy.php';
require 'HTMLPurifier/HTMLModule/XMLCommonAttributes.php'; require 'HTMLPurifier/HTMLModule/XMLCommonAttributes.php';
require 'HTMLPurifier/HTMLModule/Tidy/Name.php';
require 'HTMLPurifier/HTMLModule/Tidy/Proprietary.php'; require 'HTMLPurifier/HTMLModule/Tidy/Proprietary.php';
require 'HTMLPurifier/HTMLModule/Tidy/XHTMLAndHTML4.php'; require 'HTMLPurifier/HTMLModule/Tidy/XHTMLAndHTML4.php';
require 'HTMLPurifier/HTMLModule/Tidy/Strict.php'; require 'HTMLPurifier/HTMLModule/Tidy/Strict.php';
require 'HTMLPurifier/HTMLModule/Tidy/Transitional.php'; require 'HTMLPurifier/HTMLModule/Tidy/Transitional.php';
require 'HTMLPurifier/HTMLModule/Tidy/XHTML.php'; require 'HTMLPurifier/HTMLModule/Tidy/XHTML.php';
require 'HTMLPurifier/Injector/AutoParagraph.php'; require 'HTMLPurifier/Injector/AutoParagraph.php';
require 'HTMLPurifier/Injector/DisplayLinkURI.php';
require 'HTMLPurifier/Injector/Linkify.php'; require 'HTMLPurifier/Injector/Linkify.php';
require 'HTMLPurifier/Injector/PurifierLinkify.php'; require 'HTMLPurifier/Injector/PurifierLinkify.php';
require 'HTMLPurifier/Injector/RemoveEmpty.php';
require 'HTMLPurifier/Injector/SafeObject.php'; require 'HTMLPurifier/Injector/SafeObject.php';
require 'HTMLPurifier/Lexer/DOMLex.php'; require 'HTMLPurifier/Lexer/DOMLex.php';
require 'HTMLPurifier/Lexer/DirectLex.php'; require 'HTMLPurifier/Lexer/DirectLex.php';


@@ -19,7 +19,7 @@
*/ */
/* /*
HTML Purifier 3.1.1 - Standards Compliant HTML Filtering HTML Purifier 3.2.0 - Standards Compliant HTML Filtering
Copyright (C) 2006-2008 Edward Z. Yang Copyright (C) 2006-2008 Edward Z. Yang
This library is free software; you can redistribute it and/or This library is free software; you can redistribute it and/or
@@ -55,10 +55,10 @@ class HTMLPurifier
{ {
/** Version of HTML Purifier */ /** Version of HTML Purifier */
public $version = '3.1.1'; public $version = '3.2.0';
/** Constant with version of HTML Purifier */ /** Constant with version of HTML Purifier */
const VERSION = '3.1.1'; const VERSION = '3.2.0';
/** Global configuration object */ /** Global configuration object */
public $config; public $config;


@@ -35,6 +35,7 @@ require_once $__dir . '/HTMLPurifier/Encoder.php';
require_once $__dir . '/HTMLPurifier/EntityLookup.php'; require_once $__dir . '/HTMLPurifier/EntityLookup.php';
require_once $__dir . '/HTMLPurifier/EntityParser.php'; require_once $__dir . '/HTMLPurifier/EntityParser.php';
require_once $__dir . '/HTMLPurifier/ErrorCollector.php'; require_once $__dir . '/HTMLPurifier/ErrorCollector.php';
require_once $__dir . '/HTMLPurifier/ErrorStruct.php';
require_once $__dir . '/HTMLPurifier/Exception.php'; require_once $__dir . '/HTMLPurifier/Exception.php';
require_once $__dir . '/HTMLPurifier/Filter.php'; require_once $__dir . '/HTMLPurifier/Filter.php';
require_once $__dir . '/HTMLPurifier/Generator.php'; require_once $__dir . '/HTMLPurifier/Generator.php';
@@ -102,6 +103,7 @@ require_once $__dir . '/HTMLPurifier/AttrDef/URI/Host.php';
require_once $__dir . '/HTMLPurifier/AttrDef/URI/IPv4.php'; require_once $__dir . '/HTMLPurifier/AttrDef/URI/IPv4.php';
require_once $__dir . '/HTMLPurifier/AttrDef/URI/IPv6.php'; require_once $__dir . '/HTMLPurifier/AttrDef/URI/IPv6.php';
require_once $__dir . '/HTMLPurifier/AttrDef/URI/Email/SimpleCheck.php'; require_once $__dir . '/HTMLPurifier/AttrDef/URI/Email/SimpleCheck.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/Background.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/BdoDir.php'; require_once $__dir . '/HTMLPurifier/AttrTransform/BdoDir.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/BgColor.php'; require_once $__dir . '/HTMLPurifier/AttrTransform/BgColor.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/BoolToCSS.php'; require_once $__dir . '/HTMLPurifier/AttrTransform/BoolToCSS.php';
@@ -109,6 +111,7 @@ require_once $__dir . '/HTMLPurifier/AttrTransform/Border.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/EnumToCSS.php'; require_once $__dir . '/HTMLPurifier/AttrTransform/EnumToCSS.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/ImgRequired.php'; require_once $__dir . '/HTMLPurifier/AttrTransform/ImgRequired.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/ImgSpace.php'; require_once $__dir . '/HTMLPurifier/AttrTransform/ImgSpace.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/Input.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/Lang.php'; require_once $__dir . '/HTMLPurifier/AttrTransform/Lang.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/Length.php'; require_once $__dir . '/HTMLPurifier/AttrTransform/Length.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/Name.php'; require_once $__dir . '/HTMLPurifier/AttrTransform/Name.php';
@@ -116,6 +119,7 @@ require_once $__dir . '/HTMLPurifier/AttrTransform/SafeEmbed.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/SafeObject.php'; require_once $__dir . '/HTMLPurifier/AttrTransform/SafeObject.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/SafeParam.php'; require_once $__dir . '/HTMLPurifier/AttrTransform/SafeParam.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/ScriptRequired.php'; require_once $__dir . '/HTMLPurifier/AttrTransform/ScriptRequired.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/Textarea.php';
require_once $__dir . '/HTMLPurifier/ChildDef/Chameleon.php'; require_once $__dir . '/HTMLPurifier/ChildDef/Chameleon.php';
require_once $__dir . '/HTMLPurifier/ChildDef/Custom.php'; require_once $__dir . '/HTMLPurifier/ChildDef/Custom.php';
require_once $__dir . '/HTMLPurifier/ChildDef/Empty.php'; require_once $__dir . '/HTMLPurifier/ChildDef/Empty.php';
@@ -131,10 +135,12 @@ require_once $__dir . '/HTMLPurifier/DefinitionCache/Decorator/Memory.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Bdo.php'; require_once $__dir . '/HTMLPurifier/HTMLModule/Bdo.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/CommonAttributes.php'; require_once $__dir . '/HTMLPurifier/HTMLModule/CommonAttributes.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Edit.php'; require_once $__dir . '/HTMLPurifier/HTMLModule/Edit.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Forms.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Hypertext.php'; require_once $__dir . '/HTMLPurifier/HTMLModule/Hypertext.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Image.php'; require_once $__dir . '/HTMLPurifier/HTMLModule/Image.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Legacy.php'; require_once $__dir . '/HTMLPurifier/HTMLModule/Legacy.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/List.php'; require_once $__dir . '/HTMLPurifier/HTMLModule/List.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Name.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/NonXMLCommonAttributes.php'; require_once $__dir . '/HTMLPurifier/HTMLModule/NonXMLCommonAttributes.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Object.php'; require_once $__dir . '/HTMLPurifier/HTMLModule/Object.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Presentation.php'; require_once $__dir . '/HTMLPurifier/HTMLModule/Presentation.php';
@@ -149,14 +155,17 @@ require_once $__dir . '/HTMLPurifier/HTMLModule/Target.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Text.php'; require_once $__dir . '/HTMLPurifier/HTMLModule/Text.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Tidy.php'; require_once $__dir . '/HTMLPurifier/HTMLModule/Tidy.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/XMLCommonAttributes.php'; require_once $__dir . '/HTMLPurifier/HTMLModule/XMLCommonAttributes.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Tidy/Name.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Tidy/Proprietary.php'; require_once $__dir . '/HTMLPurifier/HTMLModule/Tidy/Proprietary.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Tidy/XHTMLAndHTML4.php'; require_once $__dir . '/HTMLPurifier/HTMLModule/Tidy/XHTMLAndHTML4.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Tidy/Strict.php'; require_once $__dir . '/HTMLPurifier/HTMLModule/Tidy/Strict.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Tidy/Transitional.php'; require_once $__dir . '/HTMLPurifier/HTMLModule/Tidy/Transitional.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Tidy/XHTML.php'; require_once $__dir . '/HTMLPurifier/HTMLModule/Tidy/XHTML.php';
require_once $__dir . '/HTMLPurifier/Injector/AutoParagraph.php'; require_once $__dir . '/HTMLPurifier/Injector/AutoParagraph.php';
require_once $__dir . '/HTMLPurifier/Injector/DisplayLinkURI.php';
require_once $__dir . '/HTMLPurifier/Injector/Linkify.php'; require_once $__dir . '/HTMLPurifier/Injector/Linkify.php';
require_once $__dir . '/HTMLPurifier/Injector/PurifierLinkify.php'; require_once $__dir . '/HTMLPurifier/Injector/PurifierLinkify.php';
require_once $__dir . '/HTMLPurifier/Injector/RemoveEmpty.php';
require_once $__dir . '/HTMLPurifier/Injector/SafeObject.php'; require_once $__dir . '/HTMLPurifier/Injector/SafeObject.php';
require_once $__dir . '/HTMLPurifier/Lexer/DOMLex.php'; require_once $__dir . '/HTMLPurifier/Lexer/DOMLex.php';
require_once $__dir . '/HTMLPurifier/Lexer/DirectLex.php'; require_once $__dir . '/HTMLPurifier/Lexer/DirectLex.php';


@@ -42,10 +42,7 @@ class HTMLPurifier_AttrDef_HTML_LinkTypes extends HTMLPurifier_AttrDef
} }
if (empty($ret_lookup)) return false; if (empty($ret_lookup)) return false;
$string = implode(' ', array_keys($ret_lookup));
$ret_array = array();
foreach ($ret_lookup as $part => $bool) $ret_array[] = $part;
$string = implode(' ', $ret_array);
return $string; return $string;
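The swap from the manual loop to array_keys() can be checked in isolation. This is a minimal sketch, assuming a lookup array shaped like $ret_lookup (link type => true); the sample keys are illustrative only and do not come from the library.

```php
<?php
// Illustrative lookup; the keys stand in for validated link types.
$ret_lookup = array('nofollow' => true, 'print' => true);

// Old form from the diff: copy each key into a list, then join.
$ret_array = array();
foreach ($ret_lookup as $part => $bool) $ret_array[] = $part;
$old = implode(' ', $ret_array);

// New form from the diff: array_keys() yields the same list directly.
$new = implode(' ', array_keys($ret_lookup));
```

Both forms preserve the insertion order of the keys, so the joined string is unchanged by the refactoring.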


@@ -0,0 +1,22 @@
<?php
/**
* Pre-transform that changes proprietary background attribute to CSS.
*/
class HTMLPurifier_AttrTransform_Background extends HTMLPurifier_AttrTransform {
public function transform($attr, $config, $context) {
if (!isset($attr['background'])) return $attr;
$background = $this->confiscateAttr($attr, 'background');
// some validation should happen here
$this->prependCSS($attr, "background-image:url($background);");
return $attr;
}
}


@@ -22,7 +22,12 @@ class HTMLPurifier_AttrTransform_ImgRequired extends HTMLPurifier_AttrTransform
if (!isset($attr['alt'])) { if (!isset($attr['alt'])) {
if ($src) { if ($src) {
$attr['alt'] = basename($attr['src']); $alt = $config->get('Attr', 'DefaultImageAlt');
if ($alt === null) {
$attr['alt'] = basename($attr['src']);
} else {
$attr['alt'] = $alt;
}
} else { } else {
$attr['alt'] = $config->get('Attr', 'DefaultInvalidImageAlt'); $attr['alt'] = $config->get('Attr', 'DefaultInvalidImageAlt');
} }


@@ -0,0 +1,39 @@
<?php
/**
* Performs miscellaneous cross attribute validation and filtering for
* input elements. This is meant to be a post-transform.
*/
class HTMLPurifier_AttrTransform_Input extends HTMLPurifier_AttrTransform {
protected $pixels;
public function __construct() {
$this->pixels = new HTMLPurifier_AttrDef_HTML_Pixels();
}
public function transform($attr, $config, $context) {
if (!isset($attr['type'])) $t = 'text';
else $t = strtolower($attr['type']);
if (isset($attr['checked']) && $t !== 'radio' && $t !== 'checkbox') {
unset($attr['checked']);
}
if (isset($attr['maxlength']) && $t !== 'text' && $t !== 'password') {
unset($attr['maxlength']);
}
if (isset($attr['size']) && $t !== 'text' && $t !== 'password') {
$result = $this->pixels->validate($attr['size'], $config, $context);
if ($result === false) unset($attr['size']);
else $attr['size'] = $result;
}
if (isset($attr['src']) && $t !== 'image') {
unset($attr['src']);
}
if (!isset($attr['value']) && ($t === 'radio' || $t === 'checkbox')) {
$attr['value'] = '';
}
return $attr;
}
}
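To see what this post-transform does to a concrete attribute array, the rules can be replayed outside the class. This is a standalone sketch, not the library's code: filter_input_attr() is a hypothetical name, and the pixel validation that the real class applies to `size` is deliberately left out.

```php
<?php
// Hypothetical standalone helper mirroring the checks in the diff above.
// The `size` handling (pixel validation for non-text types) is omitted.
function filter_input_attr(array $attr) {
    // A missing type is treated as "text"; comparison is case-insensitive.
    $t = isset($attr['type']) ? strtolower($attr['type']) : 'text';
    $isToggle  = ($t === 'radio' || $t === 'checkbox');
    $isTextual = ($t === 'text' || $t === 'password');

    // Drop attributes that make no sense for this input type.
    if (isset($attr['checked']) && !$isToggle) unset($attr['checked']);
    if (isset($attr['maxlength']) && !$isTextual) unset($attr['maxlength']);
    if (isset($attr['src']) && $t !== 'image') unset($attr['src']);

    // Radio buttons and checkboxes always get a value attribute.
    if (!isset($attr['value']) && $isToggle) $attr['value'] = '';
    return $attr;
}

$text = filter_input_attr(array(
    'type' => 'TEXT', 'checked' => 'checked', 'src' => 'x.png', 'maxlength' => '10',
));
$box = filter_input_attr(array('type' => 'checkbox'));
```

With these inputs, the text field loses `checked` and `src` but keeps `maxlength`, and the checkbox gains an empty `value`.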


@@ -0,0 +1,16 @@
<?php
/**
* Sets height/width defaults for <textarea>
*/
class HTMLPurifier_AttrTransform_Textarea extends HTMLPurifier_AttrTransform
{
public function transform($attr, $config, $context) {
// Calculated from Firefox
if (!isset($attr['cols'])) $attr['cols'] = '22';
if (!isset($attr['rows'])) $attr['rows'] = '3';
return $attr;
}
}


@@ -32,6 +32,9 @@ class HTMLPurifier_AttrTypes
// unimplemented aliases // unimplemented aliases
$this->info['ContentType'] = new HTMLPurifier_AttrDef_Text(); $this->info['ContentType'] = new HTMLPurifier_AttrDef_Text();
$this->info['ContentTypes'] = new HTMLPurifier_AttrDef_Text();
$this->info['Charsets'] = new HTMLPurifier_AttrDef_Text();
$this->info['Character'] = new HTMLPurifier_AttrDef_Text();
// number is really a positive integer (one or more digits) // number is really a positive integer (one or more digits)
// FIXME: ^^ not always, see start and value of list items // FIXME: ^^ not always, see start and value of list items


@@ -35,8 +35,8 @@ class HTMLPurifier_AttrValidator
if (!$current_token) $context->register('CurrentToken', $token); if (!$current_token) $context->register('CurrentToken', $token);
if ( if (
!$token instanceof HTMLPurifier_Token_Start && !$token instanceof HTMLPurifier_Token_Start &&
!$token instanceof HTMLPurifier_Token_Empty !$token instanceof HTMLPurifier_Token_Empty
) return $token; ) return $token;
// create alias to global definition array, see also $defs // create alias to global definition array, see also $defs
@@ -50,14 +50,18 @@ class HTMLPurifier_AttrValidator
// nothing currently utilizes this // nothing currently utilizes this
foreach ($definition->info_attr_transform_pre as $transform) { foreach ($definition->info_attr_transform_pre as $transform) {
$attr = $transform->transform($o = $attr, $config, $context); $attr = $transform->transform($o = $attr, $config, $context);
if ($e && ($attr != $o)) $e->send(E_NOTICE, 'AttrValidator: Attributes transformed', $o, $attr); if ($e) {
if ($attr != $o) $e->send(E_NOTICE, 'AttrValidator: Attributes transformed', $o, $attr);
}
} }
// do local transformations only applicable to this element (pre) // do local transformations only applicable to this element (pre)
// ex. <p align="right"> to <p style="text-align:right;"> // ex. <p align="right"> to <p style="text-align:right;">
foreach ($definition->info[$token->name]->attr_transform_pre as $transform) { foreach ($definition->info[$token->name]->attr_transform_pre as $transform) {
$attr = $transform->transform($o = $attr, $config, $context); $attr = $transform->transform($o = $attr, $config, $context);
if ($e && ($attr != $o)) $e->send(E_NOTICE, 'AttrValidator: Attributes transformed', $o, $attr); if ($e) {
if ($attr != $o) $e->send(E_NOTICE, 'AttrValidator: Attributes transformed', $o, $attr);
}
} }
// create alias to this element's attribute definition array, see // create alias to this element's attribute definition array, see
@@ -114,6 +118,8 @@ class HTMLPurifier_AttrValidator
// simple substitution // simple substitution
$attr[$attr_key] = $result; $attr[$attr_key] = $result;
} else {
// nothing happens
} }
// we'd also want slightly more complicated substitution // we'd also want slightly more complicated substitution
@@ -130,13 +136,17 @@ class HTMLPurifier_AttrValidator
// global (error reporting untested) // global (error reporting untested)
foreach ($definition->info_attr_transform_post as $transform) { foreach ($definition->info_attr_transform_post as $transform) {
$attr = $transform->transform($o = $attr, $config, $context); $attr = $transform->transform($o = $attr, $config, $context);
if ($e && ($attr != $o)) $e->send(E_NOTICE, 'AttrValidator: Attributes transformed', $o, $attr); if ($e) {
if ($attr != $o) $e->send(E_NOTICE, 'AttrValidator: Attributes transformed', $o, $attr);
}
} }
// local (error reporting untested) // local (error reporting untested)
foreach ($definition->info[$token->name]->attr_transform_post as $transform) { foreach ($definition->info[$token->name]->attr_transform_post as $transform) {
$attr = $transform->transform($o = $attr, $config, $context); $attr = $transform->transform($o = $attr, $config, $context);
if ($e && ($attr != $o)) $e->send(E_NOTICE, 'AttrValidator: Attributes transformed', $o, $attr); if ($e) {
if ($attr != $o) $e->send(E_NOTICE, 'AttrValidator: Attributes transformed', $o, $attr);
}
} }
$token->attr = $attr; $token->attr = $attr;


@@ -24,6 +24,14 @@ abstract class HTMLPurifier_ChildDef
*/ */
public $elements = array(); public $elements = array();
/**
* Get lookup of tag names that should not close this element automatically.
* All other elements will do so.
*/
public function getNonAutoCloseElements($config) {
return $this->elements;
}
/** /**
* Validates nodes according to definition and returns modification. * Validates nodes according to definition and returns modification.
* *


@@ -5,8 +5,6 @@
* *
* @warning Currently this class is an all or nothing proposition, that is, * @warning Currently this class is an all or nothing proposition, that is,
* it will only give a bool return value. * it will only give a bool return value.
* @note This class is currently not used by any code, although it is unit
* tested.
*/ */
class HTMLPurifier_ChildDef_Custom extends HTMLPurifier_ChildDef class HTMLPurifier_ChildDef_Custom extends HTMLPurifier_ChildDef
{ {


@@ -10,16 +10,19 @@ class HTMLPurifier_ChildDef_StrictBlockquote extends HTMLPurifier_ChildDef_Requi
public $allow_empty = true; public $allow_empty = true;
public $type = 'strictblockquote'; public $type = 'strictblockquote';
protected $init = false; protected $init = false;
/**
* @note We don't want MakeWellFormed to auto-close inline elements since
* they might be allowed.
*/
public function getNonAutoCloseElements($config) {
$this->init($config);
return $this->fake_elements;
}
public function validateChildren($tokens_of_children, $config, $context) { public function validateChildren($tokens_of_children, $config, $context) {
$def = $config->getHTMLDefinition(); $this->init($config);
if (!$this->init) {
// allow all inline elements
$this->real_elements = $this->elements;
$this->fake_elements = $def->info_content_sets['Flow'];
$this->fake_elements['#PCDATA'] = true;
$this->init = true;
}
// trick the parent class into thinking it allows more // trick the parent class into thinking it allows more
$this->elements = $this->fake_elements; $this->elements = $this->fake_elements;
@@ -29,6 +32,7 @@ class HTMLPurifier_ChildDef_StrictBlockquote extends HTMLPurifier_ChildDef_Requi
if ($result === false) return array(); if ($result === false) return array();
if ($result === true) $result = $tokens_of_children; if ($result === true) $result = $tokens_of_children;
$def = $config->getHTMLDefinition();
$block_wrap_start = new HTMLPurifier_Token_Start($def->info_block_wrapper); $block_wrap_start = new HTMLPurifier_Token_Start($def->info_block_wrapper);
$block_wrap_end = new HTMLPurifier_Token_End( $def->info_block_wrapper); $block_wrap_end = new HTMLPurifier_Token_End( $def->info_block_wrapper);
$is_inline = false; $is_inline = false;
@@ -68,5 +72,16 @@ class HTMLPurifier_ChildDef_StrictBlockquote extends HTMLPurifier_ChildDef_Requi
if ($is_inline) $ret[] = $block_wrap_end; if ($is_inline) $ret[] = $block_wrap_end;
return $ret; return $ret;
} }
private function init($config) {
if (!$this->init) {
$def = $config->getHTMLDefinition();
// allow all inline elements
$this->real_elements = $this->elements;
$this->fake_elements = $def->info_content_sets['Flow'];
$this->fake_elements['#PCDATA'] = true;
$this->init = true;
}
}
} }


@@ -20,7 +20,7 @@ class HTMLPurifier_Config
/** /**
* HTML Purifier's version * HTML Purifier's version
*/ */
public $version = '3.1.1'; public $version = '3.2.0';
/** /**
* Bool indicator whether or not to automatically finalize * Bool indicator whether or not to automatically finalize

File diff suppressed because one or more lines are too long


@@ -0,0 +1,9 @@
Attr.DefaultImageAlt
TYPE: string/null
DEFAULT: null
--DESCRIPTION--
This is the content of the alt attribute of an image if the user had not
previously specified one. This applies to all images without
a valid alt attribute, as opposed to %Attr.DefaultInvalidImageAlt, which
only applies to invalid images and takes precedence for them.
Default behavior with null is to use the basename of the src attribute for the alt.
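The precedence between the two directives can be sketched as a small standalone function. This is an illustration under assumptions, not the transform itself: choose_alt() and its parameters are hypothetical names, with $defaultAlt standing in for %Attr.DefaultImageAlt and $invalidAlt for %Attr.DefaultInvalidImageAlt.

```php
<?php
// Hypothetical helper; mirrors the branch order of the ImgRequired change.
function choose_alt($src, $defaultAlt, $invalidAlt) {
    // No usable src: the image is invalid, so the invalid-image alt wins.
    if ($src === null || $src === '') {
        return $invalidAlt;
    }
    // null keeps the old behavior: derive the alt text from the file name.
    if ($defaultAlt === null) {
        return basename($src);
    }
    // Any other value, including the empty string, is used as given.
    return $defaultAlt;
}

$fromName   = choose_alt('images/cat.png', null, 'Invalid image');
$fromConfig = choose_alt('images/cat.png', '', 'Invalid image');
$invalid    = choose_alt(null, 'Photo', 'Invalid image');
```

Note that an empty string is a legitimate setting here, distinct from null, which is why the check uses a strict comparison.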


@@ -0,0 +1,10 @@
AutoFormat.DisplayLinkURI
TYPE: bool
VERSION: 3.2.0
DEFAULT: false
--DESCRIPTION--
<p>
This directive turns on the in-text display of URIs in &lt;a&gt; tags, and disables
those links. For example, <a href="http://example.com">example</a> becomes
example (<a>http://example.com</a>).
</p>


@@ -0,0 +1,44 @@
AutoFormat.RemoveEmpty
TYPE: bool
VERSION: 3.2.0
DEFAULT: false
--DESCRIPTION--
<p>
When enabled, HTML Purifier will attempt to remove empty elements that
contribute no semantic information to the document. The following types
of nodes will be removed:
</p>
<ul><li>
Tags with no attributes and no content, and that are not empty
elements (remove <code>&lt;a&gt;&lt;/a&gt;</code> but not
<code>&lt;br /&gt;</code>), and
</li>
<li>
Tags with no content, except for:<ul>
<li>The <code>colgroup</code> element, or</li>
<li>
Elements with the <code>id</code> or <code>name</code> attribute,
when those attributes are permitted on those elements.
</li>
</ul></li>
</ul>
<p>
Please be very careful when using this functionality; while it may not
seem that empty elements contain useful information, they can alter the
layout of a document given appropriate styling. This directive is most
useful when you are processing machine-generated HTML; please avoid using
it on regular user HTML.
</p>
<p>
Elements that contain only whitespace will be treated as empty. Non-breaking
spaces, however, do not count as whitespace.
</p>
<p>
This algorithm is not perfect; you may still notice some empty tags,
particularly if a node had elements, but those elements were later removed
because they were not permitted in that context, or tags that, after
being auto-closed by another tag, were empty. This is for safety reasons
to prevent clever code from breaking validation. The general rule of thumb:
if a tag looked empty on the way in, it will get removed; if HTML Purifier
made it empty, it will stay.
</p>
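The whitespace rule above can be illustrated with PHP's trim(), which by default strips ASCII whitespace but not the UTF-8 non-breaking space. This is a standalone sketch under that assumption; looks_empty() is a hypothetical helper and is not claimed to be how the injector itself tests for emptiness.

```php
<?php
// Hypothetical helper: text counts as empty if only ASCII whitespace remains.
function looks_empty($text) {
    return trim($text) === '';
}

// U+00A0 (non-breaking space) encoded as UTF-8; trim() leaves it alone.
$nbsp = "\xC2\xA0";

$onlySpaces = looks_empty(" \n\t");
$onlyNbsp   = looks_empty($nbsp);
```

Under this rule an element containing only spaces or newlines is a removal candidate, while one containing a non-breaking space is kept.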


@@ -1,13 +1,17 @@
Core.AggressivelyFixLt Core.AggressivelyFixLt
TYPE: bool TYPE: bool
VERSION: 2.1.0 VERSION: 2.1.0
DEFAULT: false DEFAULT: true
--DESCRIPTION-- --DESCRIPTION--
<p>
This directive enables aggressive pre-filter fixes HTML Purifier can This directive enables aggressive pre-filter fixes HTML Purifier can
perform in order to ensure that open angled-brackets do not get killed perform in order to ensure that open angled-brackets do not get killed
during parsing stage. Enabling this will result in two preg_replace_callback during parsing stage. Enabling this will result in two preg_replace_callback
calls and one preg_replace call for every bit of HTML passed through here. calls and at least two preg_replace calls for every HTML document parsed;
It is not necessary and will have no effect for PHP 4. if your users make very well-formed HTML, you can set this directive false.
This has no effect when DirectLex is used.
</p>
<p>
<strong>Notice:</strong> This directive's default turned from false to true
in HTML Purifier 3.2.0.
</p>


@@ -14,13 +14,49 @@ EXTERNAL: CSSTidy
<p> <p>
Sample usage: Sample usage:
</p> </p>
<pre><![CDATA[$config = HTMLPurifier_Config::createDefault(); <pre><![CDATA[
$config->set('Filter', 'ExtractStyleBlocks', true); <?php
$purifier = new HTMLPurifier($config); header('Content-type: text/html; charset=utf-8');
$styles = $purifier->context->get('StyleBlocks'); echo '<?xml version="1.0" encoding="UTF-8"?>';
foreach ($styles as $style) { ?>
echo '<style type="text/css">' . $style . "</style>\n"; <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
}]]></pre> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>Filter.ExtractStyleBlocks</title>
<?php
require_once '/path/to/library/HTMLPurifier.auto.php';
require_once '/path/to/csstidy.class.php';
$dirty = '<style>body {color:#F00;}</style> Some text';
$config = HTMLPurifier_Config::createDefault();
$config->set('Filter', 'ExtractStyleBlocks', true);
$purifier = new HTMLPurifier($config);
$html = $purifier->purify($dirty);
// This implementation writes the stylesheets to the styles/ directory.
// You can also echo the styles inside the document, but it's a bit
// more difficult to make sure they get interpreted properly by
// browsers; try the usual CSS armoring techniques.
$styles = $purifier->context->get('StyleBlocks');
$dir = 'styles/';
if (!is_dir($dir)) mkdir($dir);
$hash = sha1($dirty);
foreach ($styles as $i => $style) {
file_put_contents($name = $dir . $hash . "_$i", $style);
echo '<link rel="stylesheet" type="text/css" href="'.$name.'" />';
}
?>
</head>
<body>
<div>
<?php echo $html; ?>
</div>
</b]]><![CDATA[ody>
</html>
]]></pre>
<p> <p>
<strong>Warning:</strong> It is possible for a user to mount an <strong>Warning:</strong> It is possible for a user to mount an
imagecrash attack using this CSS. Counter-measures are difficult; imagecrash attack using this CSS. Counter-measures are difficult;


@@ -1,6 +1,6 @@
Output.SortAttr Output.SortAttr
TYPE: bool TYPE: bool
VERSION: 3.1.2 VERSION: 3.2.0
DEFAULT: false DEFAULT: false
--DESCRIPTION-- --DESCRIPTION--
<p> <p>


@@ -37,33 +37,35 @@ class HTMLPurifier_ContentSets
// sorry, no way of overloading // sorry, no way of overloading
foreach ($modules as $module_i => $module) { foreach ($modules as $module_i => $module) {
foreach ($module->content_sets as $key => $value) { foreach ($module->content_sets as $key => $value) {
if (isset($this->info[$key])) { $temp = $this->convertToLookup($value);
if (isset($this->lookup[$key])) {
// add it into the existing content set // add it into the existing content set
$this->info[$key] = $this->info[$key] . ' | ' . $value; $this->lookup[$key] = array_merge($this->lookup[$key], $temp);
} else { } else {
$this->info[$key] = $value; $this->lookup[$key] = $temp;
} }
} }
} }
// perform content_set expansions $old_lookup = false;
$this->keys = array_keys($this->info); while ($old_lookup !== $this->lookup) {
foreach ($this->info as $i => $set) { $old_lookup = $this->lookup;
// only performed once, so infinite recursion is not foreach ($this->lookup as $i => $set) {
// a problem $add = array();
$this->info[$i] = foreach ($set as $element => $x) {
str_replace( if (isset($this->lookup[$element])) {
$this->keys, $add += $this->lookup[$element];
// must be recalculated each time due to unset($this->lookup[$i][$element]);
// changing substitutions }
array_values($this->info), }
$set); $this->lookup[$i] += $add;
}
} }
$this->values = array_values($this->info);
// generate lookup tables foreach ($this->lookup as $key => $lookup) {
foreach ($this->info as $name => $set) { $this->info[$key] = implode(' | ', array_keys($lookup));
$this->lookup[$name] = $this->convertToLookup($set);
} }
$this->keys = array_keys($this->info);
$this->values = array_values($this->info);
} }
/** /**
@@ -75,12 +77,22 @@ class HTMLPurifier_ContentSets
if (!empty($def->child)) return; // already done! if (!empty($def->child)) return; // already done!
$content_model = $def->content_model; $content_model = $def->content_model;
if (is_string($content_model)) { if (is_string($content_model)) {
$def->content_model = str_replace( // Assume that $this->keys is alphanumeric
$this->keys, $this->values, $content_model); $def->content_model = preg_replace_callback(
'/\b(' . implode('|', $this->keys) . ')\b/',
array($this, 'generateChildDefCallback'),
$content_model
);
//$def->content_model = str_replace(
// $this->keys, $this->values, $content_model);
} }
$def->child = $this->getChildDef($def, $module); $def->child = $this->getChildDef($def, $module);
} }
public function generateChildDefCallback($matches) {
return $this->info[$matches[0]];
}
/** /**
* Instantiates a ChildDef based on content_model and content_model_type * Instantiates a ChildDef based on content_model and content_model_type
* member variables in HTMLPurifier_ElementDef * member variables in HTMLPurifier_ElementDef
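The new expansion loop in this file repeats until the lookup stops changing, so a set that refers to another set which itself refers to further sets is fully flattened. A minimal standalone sketch of that fixed-point idea follows; expand_sets() and the sample set names are hypothetical, and the loop body is adapted from the diff rather than copied from the class.

```php
<?php
// Hypothetical standalone version of the fixed-point expansion.
// $lookup maps a set name to a lookup array (member => true); a member
// that is itself a set name is replaced by that set's members.
function expand_sets(array $lookup) {
    $old = false;
    while ($old !== $lookup) {        // stop once a full pass changes nothing
        $old = $lookup;
        foreach ($lookup as $name => $set) {
            $add = array();
            foreach ($set as $element => $x) {
                if (isset($lookup[$element])) {
                    $add += $lookup[$element];       // pull in referenced set
                    unset($lookup[$name][$element]); // drop the reference
                }
            }
            $lookup[$name] += $add;
        }
    }
    return $lookup;
}

// "Outer" refers to "Flow", which refers to "Inline" and "Block".
$expanded = expand_sets(array(
    'Outer'  => array('Flow' => true),
    'Flow'   => array('Inline' => true, 'Block' => true),
    'Inline' => array('a' => true, 'em' => true),
    'Block'  => array('p' => true, 'div' => true),
));
```

Because "Outer" is visited before "Flow" has been expanded, a single pass would leave it holding "Inline" and "Block"; the second pass resolves those, which is the case the single-pass str_replace approach on the old side of the diff could not handle.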


@@ -271,6 +271,12 @@ class HTMLPurifier_Encoder
set_error_handler(array('HTMLPurifier_Encoder', 'muteErrorHandler')); set_error_handler(array('HTMLPurifier_Encoder', 'muteErrorHandler'));
if ($iconv && !$config->get('Test', 'ForceNoIconv')) { if ($iconv && !$config->get('Test', 'ForceNoIconv')) {
$str = iconv($encoding, 'utf-8//IGNORE', $str); $str = iconv($encoding, 'utf-8//IGNORE', $str);
if ($str === false) {
// $encoding is not a valid encoding
restore_error_handler();
trigger_error('Invalid encoding ' . $encoding, E_USER_ERROR);
return '';
}
// If the string is bjorked by Shift_JIS or a similar encoding // If the string is bjorked by Shift_JIS or a similar encoding
// that doesn't support all of ASCII, convert the naughty // that doesn't support all of ASCII, convert the naughty
// characters to their true byte-wise ASCII/UTF-8 equivalents. // characters to their true byte-wise ASCII/UTF-8 equivalents.
@@ -282,7 +288,7 @@ class HTMLPurifier_Encoder
restore_error_handler(); restore_error_handler();
return $str; return $str;
} }
trigger_error('Encoding not supported', E_USER_ERROR); trigger_error('Encoding not supported, please install iconv', E_USER_ERROR);
} }
/** /**


@@ -7,22 +7,37 @@
class HTMLPurifier_ErrorCollector class HTMLPurifier_ErrorCollector
{ {
protected $errors = array(); /**
* Identifiers for the returned error array. These are purposely numeric
* so list() can be used.
*/
const LINENO = 0;
const SEVERITY = 1;
const MESSAGE = 2;
const CHILDREN = 3;
protected $errors;
protected $_current;
protected $_stacks = array(array());
protected $locale; protected $locale;
protected $generator; protected $generator;
protected $context; protected $context;
protected $lines = array();
public function __construct($context) { public function __construct($context) {
$this->locale =& $context->get('Locale'); $this->locale =& $context->get('Locale');
$this->generator =& $context->get('Generator');
$this->context = $context; $this->context = $context;
$this->_current =& $this->_stacks[0];
$this->errors =& $this->_stacks[0];
} }
/** /**
* Sends an error message to the collector for later use * Sends an error message to the collector for later use
* @param $line Integer line number, or HTMLPurifier_Token that caused error
* @param $severity int Error severity, PHP error style (don't use E_USER_) * @param $severity int Error severity, PHP error style (don't use E_USER_)
* @param $msg string Error message text * @param $msg string Error message text
* @param $subst1 string First substitution for $msg
* @param $subst2 string ...
*/ */
public function send($severity, $msg) { public function send($severity, $msg) {
@@ -35,6 +50,7 @@ class HTMLPurifier_ErrorCollector
$token = $this->context->get('CurrentToken', true); $token = $this->context->get('CurrentToken', true);
$line = $token ? $token->line : $this->context->get('CurrentLine', true); $line = $token ? $token->line : $this->context->get('CurrentLine', true);
$col = $token ? $token->col : $this->context->get('CurrentCol', true);
$attr = $this->context->get('CurrentAttr', true); $attr = $this->context->get('CurrentAttr', true);
// perform special substitutions, also add custom parameters // perform special substitutions, also add custom parameters
@@ -55,13 +71,66 @@ class HTMLPurifier_ErrorCollector
if (!empty($subst)) $msg = strtr($msg, $subst); if (!empty($subst)) $msg = strtr($msg, $subst);
$this->errors[] = array($line, $severity, $msg); // (numerically indexed)
$error = array(
self::LINENO => $line,
self::SEVERITY => $severity,
self::MESSAGE => $msg,
self::CHILDREN => array()
);
$this->_current[] = $error;
// NEW CODE BELOW ...
$struct = null;
// Top-level errors are either:
// TOKEN type, if $value is set appropriately, or
// "syntax" type, if $value is null
$new_struct = new HTMLPurifier_ErrorStruct();
$new_struct->type = HTMLPurifier_ErrorStruct::TOKEN;
if ($token) $new_struct->value = clone $token;
if (is_int($line) && is_int($col)) {
if (isset($this->lines[$line][$col])) {
$struct = $this->lines[$line][$col];
} else {
$struct = $this->lines[$line][$col] = $new_struct;
}
// These ksorts may present a performance problem
ksort($this->lines[$line], SORT_NUMERIC);
} else {
if (isset($this->lines[-1])) {
$struct = $this->lines[-1];
} else {
$struct = $this->lines[-1] = $new_struct;
}
}
ksort($this->lines, SORT_NUMERIC);
// Now, check if we need to operate on a lower structure
if (!empty($attr)) {
$struct = $struct->getChild(HTMLPurifier_ErrorStruct::ATTR, $attr);
if (!$struct->value) {
$struct->value = array($attr, 'PUT VALUE HERE');
}
}
if (!empty($cssprop)) {
$struct = $struct->getChild(HTMLPurifier_ErrorStruct::CSSPROP, $cssprop);
if (!$struct->value) {
// if we tokenize CSS this might be a little more difficult to do
$struct->value = array($cssprop, 'PUT VALUE HERE');
}
}
// Ok, structs are all set up, now time to register the error
$struct->addError($severity, $msg);
} }
/** /**
* Retrieves raw error data for custom formatter to use * Retrieves raw error data for custom formatter to use
* @param List of arrays in format of array(Error message text, * @param List of arrays in format of array(line of error,
* token that caused error, tokens surrounding token) * error severity, error message,
* recursive sub-errors array)
*/ */
public function getRaw() { public function getRaw() {
return $this->errors; return $this->errors;
@@ -70,38 +139,25 @@ class HTMLPurifier_ErrorCollector
/** /**
* Default HTML formatting implementation for error messages * Default HTML formatting implementation for error messages
* @param $config Configuration array, vital for HTML output nature * @param $config Configuration array, vital for HTML output nature
* @param $errors Errors array to display; used for recursion.
*/ */
public function getHTMLFormatted($config) { public function getHTMLFormatted($config, $errors = null) {
$ret = array(); $ret = array();
$errors = $this->errors; $this->generator = new HTMLPurifier_Generator($config, $this->context);
if ($errors === null) $errors = $this->errors;
// sort error array by line // 'At line' message needs to be removed
// line numbers are enabled if they aren't explicitly disabled
if ($config->get('Core', 'MaintainLineNumbers') !== false) { // generation code for new structure goes here. It needs to be recursive.
$has_line = array(); foreach ($this->lines as $line => $col_array) {
$lines = array(); if ($line == -1) continue;
$original_order = array(); foreach ($col_array as $col => $struct) {
foreach ($errors as $i => $error) { $this->_renderStruct($ret, $struct, $line, $col);
$has_line[] = (int) (bool) $error[0];
$lines[] = $error[0];
$original_order[] = $i;
} }
array_multisort($has_line, SORT_DESC, $lines, SORT_ASC, $original_order, SORT_ASC, $errors);
} }
if (isset($this->lines[-1])) {
foreach ($errors as $error) { $this->_renderStruct($ret, $this->lines[-1]);
list($line, $severity, $msg) = $error;
$string = '';
$string .= '<strong>' . $this->locale->getErrorName($severity) . '</strong>: ';
$string .= $this->generator->escape($msg);
if ($line) {
// have javascript link generation that causes
// textarea to skip to the specified line
$string .= $this->locale->formatMessage(
'ErrorCollector: At line', array('line' => $line));
}
$ret[] = $string;
} }
if (empty($errors)) { if (empty($errors)) {
@@ -112,5 +168,41 @@ class HTMLPurifier_ErrorCollector
} }
private function _renderStruct(&$ret, $struct, $line = null, $col = null) {
$stack = array($struct);
$context_stack = array(array());
while ($current = array_pop($stack)) {
$context = array_pop($context_stack);
foreach ($current->errors as $error) {
list($severity, $msg) = $error;
$string = '';
$string .= '<div>';
// W3C uses an icon to indicate the severity of the error.
$error = $this->locale->getErrorName($severity);
$string .= "<span class=\"error e$severity\"><strong>$error</strong></span> ";
if (!is_null($line) && !is_null($col)) {
$string .= "<em class=\"location\">Line $line, Column $col: </em> ";
} else {
$string .= '<em class="location">End of Document: </em> ';
}
$string .= '<strong class="description">' . $this->generator->escape($msg) . '</strong> ';
$string .= '</div>';
// Here, have a marker for the character on the column appropriate.
// Be sure to clip extremely long lines.
//$string .= '<pre>';
//$string .= '';
//$string .= '</pre>';
$ret[] = $string;
}
foreach ($current->children as $type => $array) {
$context[] = $current;
$stack = array_merge($stack, array_reverse($array, true));
for ($i = count($array); $i > 0; $i--) {
$context_stack[] = $context;
}
}
}
}
} }
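The docblock on the new constants says they are "purposely numeric so list() can be used". A small sketch of what that buys a caller is shown below. The class name `ErrorRecordSketch` and the sample line number and message are made up for illustration; only the four constant names and values come from the diff.

```php
<?php
// Stand-in holding the same constants as HTMLPurifier_ErrorCollector,
// so the snippet runs without the library loaded (hypothetical class name).
class ErrorRecordSketch {
    const LINENO   = 0;
    const SEVERITY = 1;
    const MESSAGE  = 2;
    const CHILDREN = 3;
}

// An error record built the way send() builds it; the values are sample data.
$error = array(
    ErrorRecordSketch::LINENO   => 12,
    ErrorRecordSketch::SEVERITY => E_WARNING,
    ErrorRecordSketch::MESSAGE  => 'sample message',
    ErrorRecordSketch::CHILDREN => array(),
);

// Because the keys are 0..3, positional destructuring works directly.
list($line, $severity, $msg, $children) = $error;
```

Named string keys would read better in isolation, but they would not work with list() on the PHP versions this code targets, which is presumably why numeric constants were chosen.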

View File

@@ -0,0 +1,58 @@
<?php
/**
* Records errors for particular segments of an HTML document such as tokens,
* attributes or CSS properties. They can contain error structs (which apply
* to components of what they represent), but their main purpose is to hold
* errors applying to whatever struct is being used.
*/
class HTMLPurifier_ErrorStruct
{
/**
* Possible values for $children first-key. Note that top-level structures
* are automatically token-level.
*/
const TOKEN = 0;
const ATTR = 1;
const CSSPROP = 2;
/**
* Type of this struct.
*/
public $type;
/**
* Value of the struct we are recording errors for. There are various
* values for this:
* - TOKEN: Instance of HTMLPurifier_Token
* - ATTR: array('attr-name', 'value')
* - CSSPROP: array('prop-name', 'value')
*/
public $value;
/**
* Errors registered for this structure.
*/
public $errors = array();
/**
* Child ErrorStructs that are from this structure. For example, a TOKEN
* ErrorStruct would contain ATTR ErrorStructs. This is a multi-dimensional
* array in structure: [TYPE]['identifier']
*/
public $children = array();
public function getChild($type, $id) {
if (!isset($this->children[$type][$id])) {
$this->children[$type][$id] = new HTMLPurifier_ErrorStruct();
$this->children[$type][$id]->type = $type;
}
return $this->children[$type][$id];
}
public function addError($severity, $message) {
$this->errors[] = array($severity, $message);
}
}
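The `[TYPE]['identifier']` layout of `$children` means that getChild() creates a child on first request and returns the same object afterwards. The following is a minimal sketch of that behavior. `ErrorStructSketch` is a hypothetical stand-in that copies the two methods from the file above so the snippet runs without HTML Purifier loaded; the attribute name and message are sample data.

```php
<?php
// Hypothetical stand-in mirroring HTMLPurifier_ErrorStruct from the diff.
class ErrorStructSketch {
    const TOKEN   = 0;
    const ATTR    = 1;
    const CSSPROP = 2;

    public $type;
    public $value;
    public $errors   = array();
    public $children = array();

    // Lazily creates the child struct for ($type, $id), then returns it.
    public function getChild($type, $id) {
        if (!isset($this->children[$type][$id])) {
            $this->children[$type][$id] = new ErrorStructSketch();
            $this->children[$type][$id]->type = $type;
        }
        return $this->children[$type][$id];
    }

    public function addError($severity, $message) {
        $this->errors[] = array($severity, $message);
    }
}

$token = new ErrorStructSketch();
$token->type = ErrorStructSketch::TOKEN;

// First call creates the ATTR child keyed by 'href' and records an error on it.
$attr = $token->getChild(ErrorStructSketch::ATTR, 'href');
$attr->addError(E_WARNING, 'sample message');

// Second call returns the same object, so the error is visible through it.
$same = $token->getChild(ErrorStructSketch::ATTR, 'href');
```

This is the property the collector's send() relies on when it descends from a token struct to an attribute or CSS property struct before calling addError().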

View File

@@ -0,0 +1,117 @@
<?php
/**
* XHTML 1.1 Forms module, defines all form-related elements found in HTML 4.
*/
class HTMLPurifier_HTMLModule_Forms extends HTMLPurifier_HTMLModule
{
public $name = 'Forms';
public $safe = false;
public $content_sets = array(
'Block' => 'Form',
'Inline' => 'Formctrl',
);
public function setup($config) {
$form = $this->addElement('form', 'Form',
'Required: Heading | List | Block | fieldset', 'Common', array(
'accept' => 'ContentTypes',
'accept-charset' => 'Charsets',
'action*' => 'URI',
'method' => 'Enum#get,post',
// really ContentType, but these two are the only ones used today
'enctype' => 'Enum#application/x-www-form-urlencoded,multipart/form-data',
));
$form->excludes = array('form' => true);
$input = $this->addElement('input', 'Formctrl', 'Empty', 'Common', array(
'accept' => 'ContentTypes',
'accesskey' => 'Character',
'alt' => 'Text',
'checked' => 'Bool#checked',
'disabled' => 'Bool#disabled',
'maxlength' => 'Number',
'name' => 'CDATA',
'readonly' => 'Bool#readonly',
'size' => 'Number',
'src' => 'URI#embeds',
'tabindex' => 'Number',
'type' => 'Enum#text,password,checkbox,button,radio,submit,reset,file,hidden,image',
'value' => 'CDATA',
));
$input->attr_transform_post[] = new HTMLPurifier_AttrTransform_Input();
$this->addElement('select', 'Formctrl', 'Required: optgroup | option', 'Common', array(
'disabled' => 'Bool#disabled',
'multiple' => 'Bool#multiple',
'name' => 'CDATA',
'size' => 'Number',
'tabindex' => 'Number',
));
$this->addElement('option', false, 'Optional: #PCDATA', 'Common', array(
'disabled' => 'Bool#disabled',
'label' => 'Text',
'selected' => 'Bool#selected',
'value' => 'CDATA',
));
// It's illegal for there to be more than one selected, but not
// be multiple. Also, no selected means undefined behavior. This might
// be difficult to implement; perhaps an injector, or a context variable.
$textarea = $this->addElement('textarea', 'Formctrl', 'Optional: #PCDATA', 'Common', array(
'accesskey' => 'Character',
'cols*' => 'Number',
'disabled' => 'Bool#disabled',
'name' => 'CDATA',
'readonly' => 'Bool#readonly',
'rows*' => 'Number',
'tabindex' => 'Number',
));
$textarea->attr_transform_pre[] = new HTMLPurifier_AttrTransform_Textarea();
$button = $this->addElement('button', 'Formctrl', 'Optional: #PCDATA | Heading | List | Block | Inline', 'Common', array(
'accesskey' => 'Character',
'disabled' => 'Bool#disabled',
'name' => 'CDATA',
'tabindex' => 'Number',
'type' => 'Enum#button,submit,reset',
'value' => 'CDATA',
));
// For exclusions, ideally we'd specify content sets, not literal elements
$button->excludes = $this->makeLookup(
'form', 'fieldset', // Form
'input', 'select', 'textarea', 'label', 'button', // Formctrl
'a' // as per HTML 4.01 spec, this is omitted by modularization
);
// Extra exclusion: img usemap="" is not permitted within this element.
// We'll omit this for now, since we don't have any good way of
// indicating it yet.
// This is HIGHLY user-unfriendly; we need a custom child-def for this
$this->addElement('fieldset', 'Form', 'Custom: (#WS?,legend,(Flow|#PCDATA)*)', 'Common');
$label = $this->addElement('label', 'Formctrl', 'Optional: #PCDATA | Inline', 'Common', array(
'accesskey' => 'Character',
// 'for' => 'IDREF', // IDREF not implemented, cannot allow
));
$label->excludes = array('label' => true);
$this->addElement('legend', false, 'Optional: #PCDATA | Inline', 'Common', array(
'accesskey' => 'Character',
));
$this->addElement('optgroup', false, 'Required: option', 'Common', array(
'disabled' => 'Bool#disabled',
'label*' => 'Text',
));
// Don't forget an injector for <isindex>. This one's a little complex
// because it maps to multiple elements.
}
}

View File

@@ -0,0 +1,16 @@
<?php
class HTMLPurifier_HTMLModule_Name extends HTMLPurifier_HTMLModule
{
public $name = 'Name';
public function setup($config) {
$elements = array('a', 'applet', 'form', 'frame', 'iframe', 'img', 'map');
foreach ($elements as $name) {
$element = $this->addBlankElement($name);
$element->attr['name'] = 'ID';
}
}
}

View File

@@ -0,0 +1,23 @@
<?php
/**
* Name is deprecated, but allowed in strict doctypes, so onl
*/
class HTMLPurifier_HTMLModule_Tidy_Name extends HTMLPurifier_HTMLModule_Tidy
{
public $name = 'Tidy_Name';
public $defaultLevel = 'heavy';
public function makeFixes() {
$r = array();
// @name for img, a -----------------------------------------------
// Technically, it's allowed even on strict, so we allow authors to use
// it. However, it's deprecated in future versions of XHTML.
$r['img@name'] =
$r['a@name'] = new HTMLPurifier_AttrTransform_Name();
return $r;
}
}

View File

@@ -7,7 +7,15 @@ class HTMLPurifier_HTMLModule_Tidy_Proprietary extends HTMLPurifier_HTMLModule_T
public $defaultLevel = 'light'; public $defaultLevel = 'light';
public function makeFixes() { public function makeFixes() {
return array(); $r = array();
$r['table@background'] = new HTMLPurifier_AttrTransform_Background();
$r['td@background'] = new HTMLPurifier_AttrTransform_Background();
$r['th@background'] = new HTMLPurifier_AttrTransform_Background();
$r['tr@background'] = new HTMLPurifier_AttrTransform_Background();
$r['thead@background'] = new HTMLPurifier_AttrTransform_Background();
$r['tfoot@background'] = new HTMLPurifier_AttrTransform_Background();
$r['tbody@background'] = new HTMLPurifier_AttrTransform_Background();
return $r;
} }
} }

View File

@@ -103,10 +103,6 @@ class HTMLPurifier_HTMLModule_Tidy_XHTMLAndHTML4 extends HTMLPurifier_HTMLModule
// @hspace for img ------------------------------------------------ // @hspace for img ------------------------------------------------
$r['img@hspace'] = new HTMLPurifier_AttrTransform_ImgSpace('hspace'); $r['img@hspace'] = new HTMLPurifier_AttrTransform_ImgSpace('hspace');
// @name for img, a -----------------------------------------------
$r['img@name'] =
$r['a@name'] = new HTMLPurifier_AttrTransform_Name();
// @noshade for hr ------------------------------------------------ // @noshade for hr ------------------------------------------------
// this transformation is not precise but often good enough. // this transformation is not precise but often good enough.
// different browsers use different styles to designate noshade // different browsers use different styles to designate noshade

View File

@@ -63,7 +63,11 @@ class HTMLPurifier_HTMLModuleManager
$common = array( $common = array(
'CommonAttributes', 'Text', 'Hypertext', 'List', 'CommonAttributes', 'Text', 'Hypertext', 'List',
'Presentation', 'Edit', 'Bdo', 'Tables', 'Image', 'Presentation', 'Edit', 'Bdo', 'Tables', 'Image',
'StyleAttribute', 'Scripting', 'Object' 'StyleAttribute',
// Unsafe:
'Scripting', 'Object', 'Forms',
// Sorta legacy, but present in strict:
'Name',
); );
$transitional = array('Legacy', 'Target'); $transitional = array('Legacy', 'Target');
$xml = array('XMLCommonAttributes'); $xml = array('XMLCommonAttributes');
@@ -82,7 +86,7 @@ class HTMLPurifier_HTMLModuleManager
$this->doctypes->register( $this->doctypes->register(
'HTML 4.01 Strict', false, 'HTML 4.01 Strict', false,
array_merge($common, $non_xml), array_merge($common, $non_xml),
array('Tidy_Strict', 'Tidy_Proprietary'), array('Tidy_Strict', 'Tidy_Proprietary', 'Tidy_Name'),
array(), array(),
'-//W3C//DTD HTML 4.01//EN', '-//W3C//DTD HTML 4.01//EN',
'http://www.w3.org/TR/html4/strict.dtd' 'http://www.w3.org/TR/html4/strict.dtd'
@@ -91,7 +95,7 @@ class HTMLPurifier_HTMLModuleManager
$this->doctypes->register( $this->doctypes->register(
'XHTML 1.0 Transitional', true, 'XHTML 1.0 Transitional', true,
array_merge($common, $transitional, $xml, $non_xml), array_merge($common, $transitional, $xml, $non_xml),
array('Tidy_Transitional', 'Tidy_XHTML', 'Tidy_Proprietary'), array('Tidy_Transitional', 'Tidy_XHTML', 'Tidy_Proprietary', 'Tidy_Name'),
array(), array(),
'-//W3C//DTD XHTML 1.0 Transitional//EN', '-//W3C//DTD XHTML 1.0 Transitional//EN',
'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd' 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'
@@ -100,7 +104,7 @@ class HTMLPurifier_HTMLModuleManager
$this->doctypes->register( $this->doctypes->register(
'XHTML 1.0 Strict', true, 'XHTML 1.0 Strict', true,
array_merge($common, $xml, $non_xml), array_merge($common, $xml, $non_xml),
array('Tidy_Strict', 'Tidy_XHTML', 'Tidy_Strict', 'Tidy_Proprietary'), array('Tidy_Strict', 'Tidy_XHTML', 'Tidy_Strict', 'Tidy_Proprietary', 'Tidy_Name'),
array(), array(),
'-//W3C//DTD XHTML 1.0 Strict//EN', '-//W3C//DTD XHTML 1.0 Strict//EN',
'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd' 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'
@@ -109,7 +113,7 @@ class HTMLPurifier_HTMLModuleManager
$this->doctypes->register( $this->doctypes->register(
'XHTML 1.1', true, 'XHTML 1.1', true,
array_merge($common, $xml, array('Ruby')), array_merge($common, $xml, array('Ruby')),
array('Tidy_Strict', 'Tidy_XHTML', 'Tidy_Proprietary', 'Tidy_Strict'), // Tidy_XHTML1_1 array('Tidy_Strict', 'Tidy_XHTML', 'Tidy_Proprietary', 'Tidy_Strict', 'Tidy_Name'), // Tidy_XHTML1_1
array(), array(),
'-//W3C//DTD XHTML 1.1//EN', '-//W3C//DTD XHTML 1.1//EN',
'http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd' 'http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd'
@@ -212,9 +216,6 @@ class HTMLPurifier_HTMLModuleManager
} }
} }
// merge in custom modules
$modules = array_merge($modules, $this->userModules);
// add proprietary module (this gets special treatment because // add proprietary module (this gets special treatment because
// it is completely removed from doctypes, etc.) // it is completely removed from doctypes, etc.)
if ($config->get('HTML', 'Proprietary')) { if ($config->get('HTML', 'Proprietary')) {
@@ -229,6 +230,9 @@ class HTMLPurifier_HTMLModuleManager
$modules[] = 'SafeEmbed'; $modules[] = 'SafeEmbed';
} }
// merge in custom modules
$modules = array_merge($modules, $this->userModules);
foreach ($modules as $module) { foreach ($modules as $module) {
$this->processModule($module); $this->processModule($module);
$this->modules[$module]->setup($config); $this->modules[$module]->setup($config);
@@ -378,7 +382,11 @@ class HTMLPurifier_HTMLModuleManager
$this->contentSets->generateChildDef($def, $module); $this->contentSets->generateChildDef($def, $module);
} }
// This can occur if there is a blank definition, but no base to
// mix it in with
if (!$def) return false;
// add information on required attributes // add information on required attributes
foreach ($def->attr as $attr_name => $attr_def) { foreach ($def->attr as $attr_name => $attr_def) {
if ($attr_def->required) { if ($attr_def->required) {

View File

@@ -5,6 +5,11 @@
* This enables "formatter-like" functionality such as auto-paragraphing, * This enables "formatter-like" functionality such as auto-paragraphing,
* smiley-ification and linkification to take place. * smiley-ification and linkification to take place.
* *
* A note on how handlers create changes; this is done by assigning a new
* value to the $token reference. These values can take a variety of forms and
* are best described in the HTMLPurifier_Strategy_MakeWellFormed->processToken()
* documentation.
*
* @todo Allow injectors to request a re-run on their output. This * @todo Allow injectors to request a re-run on their output. This
* would help if an operation is recursive. * would help if an operation is recursive.
*/ */
@@ -16,13 +21,6 @@ abstract class HTMLPurifier_Injector
*/ */
public $name; public $name;
/**
* Amount of tokens the injector needs to skip + 1. Because
* the decrement is the first thing that happens, this needs to
* be one greater than the "real" skip count.
*/
public $skip = 1;
/** /**
* Instance of HTMLPurifier_HTMLDefinition * Instance of HTMLPurifier_HTMLDefinition
*/ */
@@ -54,6 +52,32 @@ abstract class HTMLPurifier_Injector
*/ */
public $needed = array(); public $needed = array();
/**
* Index of inputTokens to rewind to.
*/
protected $rewind = false;
/**
* Rewind to a spot to re-perform processing. This is useful if you
* deleted a node, and now need to see if this change affected any
* earlier nodes. Rewinding does not affect other injectors, and can
* result in infinite loops if not used carefully.
* @warning HTML Purifier will prevent you from fast-forwarding with this
* function.
*/
public function rewind($index) {
$this->rewind = $index;
}
/**
* Retrieves rewind, and then unsets it.
*/
public function getRewind() {
$r = $this->rewind;
$this->rewind = false;
return $r;
}
/** /**
* Prepares the injector by giving it the config and context objects: * Prepares the injector by giving it the config and context objects:
* this allows references to important variables to be made within * this allows references to important variables to be made within
@@ -116,6 +140,69 @@ abstract class HTMLPurifier_Injector
return true; return true;
} }
/**
* Iterator function, which starts with the next token and continues until
* you reach the end of the input tokens.
* @warning Please prevent previous references from interfering with this
* function by setting $i = null beforehand!
* @param &$i Current integer index variable for inputTokens
* @param &$current Current token variable. Do NOT use $token, as that variable is also a reference
*/
protected function forward(&$i, &$current) {
if ($i === null) $i = $this->inputIndex + 1;
else $i++;
if (!isset($this->inputTokens[$i])) return false;
$current = $this->inputTokens[$i];
return true;
}
/**
* Similar to forward(), but accepts a third parameter $nesting (which
* should be initialized at 0) and stops when we hit the end tag
* for the node $this->inputIndex starts in.
*/
protected function forwardUntilEndToken(&$i, &$current, &$nesting) {
$result = $this->forward($i, $current);
if (!$result) return false;
if ($nesting === null) $nesting = 0;
if ($current instanceof HTMLPurifier_Token_Start) $nesting++;
elseif ($current instanceof HTMLPurifier_Token_End) {
if ($nesting <= 0) return false;
$nesting--;
}
return true;
}
/**
* Iterator function, starts with the previous token and continues until
* you reach the beginning of input tokens.
* @warning Please prevent previous references from interfering with this
* function by setting $i = null beforehand!
* @param &$i Current integer index variable for inputTokens
* @param &$current Current token variable. Do NOT use $token, as that variable is also a reference
*/
protected function backward(&$i, &$current) {
if ($i === null) $i = $this->inputIndex - 1;
else $i--;
if ($i < 0) return false;
$current = $this->inputTokens[$i];
return true;
}
/**
* Initializes the iterator at the current position. Use in a do {} while;
* loop to force the forward() and backward() functions to start at the
* current location.
* @warning Please prevent previous references from interfering with this
* function by setting $i = null beforehand!
* @param &$i Current integer index variable for inputTokens
* @param &$current Current token variable. Do NOT use $token, as that variable is also a reference
*/
protected function current(&$i, &$current) {
if ($i === null) $i = $this->inputIndex;
$current = $this->inputTokens[$i];
}
/** /**
* Handler that is called when a text token is processed * Handler that is called when a text token is processed
*/ */
@@ -126,9 +213,17 @@ abstract class HTMLPurifier_Injector
*/ */
public function handleElement(&$token) {} public function handleElement(&$token) {}
/**
* Handler that is called when an end token is processed
*/
public function handleEnd(&$token) {
$this->notifyEnd($token);
}
/** /**
* Notifier that is called when an end token is processed * Notifier that is called when an end token is processed
* @note This differs from handlers in that the token is read-only * @note This differs from handlers in that the token is read-only
* @deprecated
*/ */
public function notifyEnd($token) {} public function notifyEnd($token) {}
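The new forward() helper is driven by a caller-owned index that must start as null, which is why the docblocks warn about stale references. A minimal sketch of the intended loop shape follows. `TokenWalkerSketch` is a hypothetical stand-in that copies forward() from the diff and uses plain strings in place of HTMLPurifier_Token objects; the remaining() method is added only to drive the example.

```php
<?php
// Hypothetical stand-in for the iterator helper added to HTMLPurifier_Injector.
class TokenWalkerSketch {
    protected $inputTokens;
    protected $inputIndex;

    public function __construct(array $tokens, $index) {
        $this->inputTokens = $tokens;
        $this->inputIndex  = $index;
    }

    // Same logic as forward() in the diff: start one past the current token
    // when $i is null, otherwise advance by one.
    protected function forward(&$i, &$current) {
        if ($i === null) $i = $this->inputIndex + 1;
        else $i++;
        if (!isset($this->inputTokens[$i])) return false;
        $current = $this->inputTokens[$i];
        return true;
    }

    // Collects every token after the current position (illustration only).
    public function remaining() {
        $seen = array();
        $i = null; // reset first, as the @warning in the docblock asks
        while ($this->forward($i, $current)) {
            $seen[] = $current;
        }
        return $seen;
    }
}

// The current token is index 1 ('PAR1'); iteration starts at index 2.
$walker = new TokenWalkerSketch(
    array('<div>', 'PAR1', '<b>', 'PAR2', '</b>', '</div>'),
    1
);
$after = $walker->remaining();
```

Keeping the index in the caller rather than in the object lets an injector run several independent scans from the same position, at the cost of having to reset `$i` by hand each time.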

View File

@@ -3,6 +3,8 @@
/** /**
* Injector that auto paragraphs text in the root node based on * Injector that auto paragraphs text in the root node based on
* double-spacing. * double-spacing.
* @todo Ensure all states are unit tested, including variations as well.
* @todo Make a graph of the flow control for this Injector.
*/ */
class HTMLPurifier_Injector_AutoParagraph extends HTMLPurifier_Injector class HTMLPurifier_Injector_AutoParagraph extends HTMLPurifier_Injector
{ {
@@ -18,116 +20,177 @@ class HTMLPurifier_Injector_AutoParagraph extends HTMLPurifier_Injector
public function handleText(&$token) { public function handleText(&$token) {
$text = $token->data; $text = $token->data;
if (empty($this->currentNesting)) { // Does the current parent allow <p> tags?
if (!$this->allowsElement('p')) return; if ($this->allowsElement('p')) {
// case 1: we're in root node (and it allows paragraphs) if (empty($this->currentNesting) || strpos($text, "\n\n") !== false) {
$token = array($this->_pStart()); // Note that we have differing behavior when dealing with text
$this->_splitText($text, $token); // in the anonymous root node, or a node inside the document.
} elseif ($this->currentNesting[count($this->currentNesting)-1]->name == 'p') { // If the text has a double-newline, the treatment is the same;
// case 2: we're in a paragraph // if it doesn't, see the next if-block if you're in the document.
$token = array();
$this->_splitText($text, $token); $i = $nesting = null;
} elseif ($this->allowsElement('p')) { if (!$this->forwardUntilEndToken($i, $current, $nesting) && $token->is_whitespace) {
// case 3: we're in an element that allows paragraphs // State 1.1: ... ^ (whitespace, then document end)
if (strpos($text, "\n\n") !== false) { // ----
// case 3.1: this text node has a double-newline // This is a degenerate case
$token = array($this->_pStart()); } else {
$this->_splitText($text, $token); // State 1.2: PAR1
} else { // ----
$ok = false;
// test if up-coming tokens are either block or have // State 1.3: PAR1\n\nPAR2
// a double newline in them // ------------
$nesting = 0;
for ($i = $this->inputIndex + 1; isset($this->inputTokens[$i]); $i++) { // State 1.4: <div>PAR1\n\nPAR2 (see State 2)
if ($this->inputTokens[$i] instanceof HTMLPurifier_Token_Start){ // ------------
if (!$this->_isInline($this->inputTokens[$i])) { $token = array($this->_pStart());
// we haven't found a double-newline, and $this->_splitText($text, $token);
// we've hit a block element, so don't paragraph
$ok = false;
break;
}
$nesting++;
}
if ($this->inputTokens[$i] instanceof HTMLPurifier_Token_End) {
if ($nesting <= 0) break;
$nesting--;
}
if ($this->inputTokens[$i] instanceof HTMLPurifier_Token_Text) {
// found it!
if (strpos($this->inputTokens[$i]->data, "\n\n") !== false) {
$ok = true;
break;
}
}
} }
if ($ok) { } else {
// case 3.2: this text node is next to another node // State 2: <div>PAR1... (similar to 1.4)
// that will start a paragraph // ----
// We're in an element that allows paragraph tags, but we're not
// sure if we're going to need them.
if ($this->_pLookAhead()) {
// State 2.1: <div>PAR1<b>PAR1\n\nPAR2
// ----
// Note: This will always be the first child, since any
// previous inline element would have triggered this very
// same routine, and found the double newline. One possible
// exception would be a comment.
$token = array($this->_pStart(), $token); $token = array($this->_pStart(), $token);
} else {
// State 2.2.1: <div>PAR1<div>
// ----
// State 2.2.2: <div>PAR1<b>PAR1</b></div>
// ----
} }
} }
// Is the current parent a <p> tag?
} elseif (
!empty($this->currentNesting) &&
$this->currentNesting[count($this->currentNesting)-1]->name == 'p'
) {
// State 3.1: ...<p>PAR1
// ----
// State 3.2: ...<p>PAR1\n\nPAR2
// ------------
$token = array();
$this->_splitText($text, $token);
// Abort!
} else {
// State 4.1: ...<b>PAR1
// ----
// State 4.2: ...<b>PAR1\n\nPAR2
// ------------
} }
} }
public function handleElement(&$token) { public function handleElement(&$token) {
// check if we're inside a tag already // We don't have to check if we're already in a <p> tag for block
if (!empty($this->currentNesting)) { // tokens, because the tag would have been autoclosed by MakeWellFormed.
if ($this->allowsElement('p')) { if ($this->allowsElement('p')) {
// special case: we're in an element that allows paragraphs if (!empty($this->currentNesting)) {
if ($this->_isInline($token)) {
// this token is already paragraph, abort // State 1: <div>...<b>
if ($token->name == 'p') return; // ---
// this token is a block level, abort // Check if this token is adjacent to the parent token
if (!$this->_isInline($token)) return; // (seek backwards until token isn't whitespace)
$i = null;
// check if this token is adjacent to the parent token $this->backward($i, $prev);
$prev = $this->inputTokens[$this->inputIndex - 1];
if (!$prev instanceof HTMLPurifier_Token_Start) { if (!$prev instanceof HTMLPurifier_Token_Start) {
// not adjacent, we can abort early // Token wasn't adjacent
// add lead paragraph tag if our token is inline
// and the previous tag was an end paragraph if (
if ( $prev instanceof HTMLPurifier_Token_Text &&
$prev->name == 'p' && $prev instanceof HTMLPurifier_Token_End && substr($prev->data, -2) === "\n\n"
$this->_isInline($token) ) {
) { // State 1.1.4: <div><p>PAR1</p>\n\n<b>
$token = array($this->_pStart(), $token); // ---
}
return; // Quite frankly, this should be handled by splitText
} $token = array($this->_pStart(), $token);
} else {
// this token is the first child of the element that allows // State 1.1.1: <div><p>PAR1</p><b>
// paragraph. We have to peek ahead and see whether or not // ---
// there is anything inside that suggests that a paragraph
// will be needed // State 1.1.2: <div><br /><b>
$ok = false; // ---
// maintain a mini-nesting counter, this lets us bail out
// early if possible // State 1.1.3: <div>PAR<b>
$j = 1; // current nesting, one is due to parent (we recalculate current token) // ---
for ($i = $this->inputIndex; isset($this->inputTokens[$i]); $i++) { }
if ($this->inputTokens[$i] instanceof HTMLPurifier_Token_Start) $j++;
if ($this->inputTokens[$i] instanceof HTMLPurifier_Token_End) $j--; } else {
if ($this->inputTokens[$i] instanceof HTMLPurifier_Token_Text) { // State 1.2.1: <div><b>
if (strpos($this->inputTokens[$i]->data, "\n\n") !== false) { // ---
$ok = true;
break; // Lookahead to see if <p> is needed.
if ($this->_pLookAhead()) {
// State 1.3.1: <div><b>PAR1\n\nPAR2
// ---
$token = array($this->_pStart(), $token);
} else {
// State 1.3.2: <div><b>PAR1</b></div>
// ---
// State 1.3.3: <div><b>PAR1</b><div></div>\n\n</div>
// ---
} }
} }
if ($j <= 0) break; } else {
// State 2.3: ...<div>
// -----
} }
if ($ok) { } else {
if ($this->_isInline($token)) {
// State 3.1: <b>
// ---
// This is where the {p} tag is inserted, not reflected in
// inputTokens yet, however.
$token = array($this->_pStart(), $token); $token = array($this->_pStart(), $token);
} else {
// State 3.2: <div>
// -----
}
$i = null;
if ($this->backward($i, $prev)) {
if (
!$prev instanceof HTMLPurifier_Token_Text
) {
// State 3.1.1: ...</p>{p}<b>
// ---
// State 3.2.1: ...</p><div>
// -----
if (!is_array($token)) $token = array($token);
array_unshift($token, new HTMLPurifier_Token_Text("\n\n"));
} else {
// State 3.1.2: ...</p>\n\n{p}<b>
// ---
// State 3.2.2: ...</p>\n\n<div>
// -----
// Note: PAR<ELEM> cannot occur because PAR would have been
// wrapped in <p> tags.
}
} }
} }
return; } else {
// State 2.2: <ul><li>
// ----
// State 2.4: <p><b>
// ---
} }
// check if the start tag counts as a "block" element
if (!$this->_isInline($token)) return;
// append a paragraph tag before the token
$token = array($this->_pStart(), $token);
} }
/** /**
@@ -142,96 +205,80 @@ class HTMLPurifier_Injector_AutoParagraph extends HTMLPurifier_Injector
*/ */
private function _splitText($data, &$result) { private function _splitText($data, &$result) {
$raw_paragraphs = explode("\n\n", $data); $raw_paragraphs = explode("\n\n", $data);
$paragraphs = array(); // without empty paragraphs
// remove empty paragraphs
$paragraphs = array();
$needs_start = false; $needs_start = false;
$needs_end = false; $needs_end = false;
$c = count($raw_paragraphs); $c = count($raw_paragraphs);
if ($c == 1) { if ($c == 1) {
// there were no double-newlines, abort quickly // There were no double-newlines, abort quickly. In theory this
// should never happen.
$result[] = new HTMLPurifier_Token_Text($data); $result[] = new HTMLPurifier_Token_Text($data);
return; return;
} }
for ($i = 0; $i < $c; $i++) { for ($i = 0; $i < $c; $i++) {
$par = $raw_paragraphs[$i]; $par = $raw_paragraphs[$i];
if (trim($par) !== '') { if (trim($par) !== '') {
$paragraphs[] = $par; $paragraphs[] = $par;
continue; } else {
} if ($i == 0) {
if ($i == 0 && empty($result)) { // Double newline at the front
// The empty result indicates that the AutoParagraph if (empty($result)) {
// injector did not add any start paragraph tokens. // The empty result indicates that the AutoParagraph
// The fact that the first paragraph is empty indicates // injector did not add any start paragraph tokens.
// that there was a double-newline at the start of the // This means that we have been in a paragraph for
// data. // a while, and the newline means we should start a new one.
// Combined together, this means that we are in a paragraph, $result[] = new HTMLPurifier_Token_End('p');
// and the newline means we should start a new one. $result[] = new HTMLPurifier_Token_Text("\n\n");
$result[] = new HTMLPurifier_Token_End('p'); // However, the start token should only be added if
// However, the start token should only be added if // there is more processing to be done (i.e. there are
// there is more processing to be done (i.e. there are // real paragraphs in here). If there are none, the
// real paragraphs in here). If there are none, the // next start paragraph tag will be handled by the
// next start paragraph tag will be handled by the // next call to the injector
// next run-around the injector $needs_start = true;
$needs_start = true; } else {
} elseif ($i + 1 == $c) { // We just started a new paragraph!
// a double-paragraph at the end indicates that // Reinstate a double-newline for presentation's sake, since
// there is an overriding need to start a new paragraph // it was in the source code.
// for the next section. This has no effect until array_unshift($result, new HTMLPurifier_Token_Text("\n\n"));
// we've processed all of the other paragraphs though }
$needs_end = true; } elseif ($i + 1 == $c) {
// Double newline at the end
// There should be a trailing </p> when we're finally done.
$needs_end = true;
}
} }
} }
// check if there are no "real" paragraphs to be processed // Check if this was just a giant blob of whitespace. Move this earlier,
// perhaps?
if (empty($paragraphs)) { if (empty($paragraphs)) {
return; return;
} }
// add a start tag if an end tag was added while processing // Add the start tag indicated by \n\n at the beginning of $data
// the raw paragraphs (that happens if there's a leading double if ($needs_start) {
// newline)
if ($needs_start) $result[] = $this->_pStart();
// append the paragraphs onto the result
foreach ($paragraphs as $par) {
$result[] = new HTMLPurifier_Token_Text($par);
$result[] = new HTMLPurifier_Token_End('p');
$result[] = $this->_pStart(); $result[] = $this->_pStart();
} }
// remove trailing start token, if one is needed, it will // Append the paragraphs onto the result
// be handled the next time this injector is called foreach ($paragraphs as $par) {
array_pop($result); $result[] = new HTMLPurifier_Token_Text($par);
$result[] = new HTMLPurifier_Token_End('p');
// check the outside to determine whether or not the $result[] = new HTMLPurifier_Token_Text("\n\n");
// end paragraph tag should be removed. It should be removed $result[] = $this->_pStart();
// unless the next non-whitespace token is a paragraph
// or a block element.
$remove_paragraph_end = true;
if (!$needs_end) {
// Start of the checks one after the current token's index
for ($i = $this->inputIndex + 1; isset($this->inputTokens[$i]); $i++) {
if ($this->inputTokens[$i] instanceof HTMLPurifier_Token_Start || $this->inputTokens[$i] instanceof HTMLPurifier_Token_Empty) {
$remove_paragraph_end = $this->_isInline($this->inputTokens[$i]);
}
// check if we can abort early (whitespace means we carry-on!)
if ($this->inputTokens[$i] instanceof HTMLPurifier_Token_Text && !$this->inputTokens[$i]->is_whitespace) break;
// end tags will automatically be handled by MakeWellFormed,
// so we don't have to worry about them
if ($this->inputTokens[$i] instanceof HTMLPurifier_Token_End) break;
}
} else {
$remove_paragraph_end = false;
} }
// check the outside to determine whether or not the // Remove trailing start token; Injector will handle this later if
// end paragraph tag should be removed // it was indeed needed. This avoids the need for a lookahead,
if ($remove_paragraph_end) { // at the cost of a lookbehind later.
array_pop($result); array_pop($result);
// If there is no need for an end tag, remove all of it and let
// MakeWellFormed close it later.
if (!$needs_end) {
array_pop($result); // removes \n\n
array_pop($result); // removes </p>
} }
} }
@@ -244,5 +291,49 @@ class HTMLPurifier_Injector_AutoParagraph extends HTMLPurifier_Injector
return isset($this->htmlDefinition->info['p']->child->elements[$token->name]); return isset($this->htmlDefinition->info['p']->child->elements[$token->name]);
} }
/**
* Looks ahead in the token list and determines whether or not we need
* to insert a <p> tag.
*/
private function _pLookAhead() {
$this->current($i, $current);
if ($current instanceof HTMLPurifier_Token_Start) $nesting = 1;
else $nesting = 0;
$ok = false;
while ($this->forwardUntilEndToken($i, $current, $nesting)) {
$result = $this->_checkNeedsP($current);
if ($result !== null) {
$ok = $result;
break;
}
}
return $ok;
}
/**
* Determines if a particular token requires an earlier inline token
* to get a paragraph. This should be used with forwardUntilEndToken().
*/
private function _checkNeedsP($current) {
if ($current instanceof HTMLPurifier_Token_Start){
if (!$this->_isInline($current)) {
// <div>PAR1<div>
// ----
// Terminate early, since we hit a block element
return false;
}
} elseif ($current instanceof HTMLPurifier_Token_Text) {
if (strpos($current->data, "\n\n") !== false) {
// <div>PAR1<b>PAR1\n\nPAR2
// ----
return true;
} else {
// <div>PAR1<b>PAR1...
// ----
}
}
return null;
}
} }
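The splitting step in `_splitText()` above can be illustrated in isolation. This is a minimal sketch, assuming plain strings instead of token objects; `split_paragraphs()` and its return shape are hypothetical names for illustration and are not part of HTML Purifier.

```php
<?php
// Hypothetical helper (not in HTML Purifier): mirrors only the explode/filter
// step of _splitText(), reporting whether the data began or ended with a
// double newline instead of emitting <p> tokens.
function split_paragraphs($data) {
    $raw = explode("\n\n", $data);
    $paragraphs = array(); // non-empty paragraphs only
    $leading = false;      // double newline at the front
    $trailing = false;     // double newline at the end
    $c = count($raw);
    foreach ($raw as $i => $par) {
        if (trim($par) !== '') {
            $paragraphs[] = $par;
        } elseif ($i == 0) {
            $leading = true;
        } elseif ($i + 1 == $c) {
            $trailing = true;
        }
    }
    return array($paragraphs, $leading, $trailing);
}

list($pars, $leading, $trailing) = split_paragraphs("\n\nfoo\n\nbar\n\n");
// $pars holds 'foo' and 'bar'; both flags are true for this input.
```

In the real injector the two flags correspond roughly to `$needs_start` and `$needs_end`, which decide whether a `<p>` start token is emitted up front and whether the final `</p>` is kept.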


@@ -0,0 +1,24 @@
<?php
/**
* Injector that removes the href of an anchor and displays the URL as text after the link, so the link text stays visible but is no longer clickable.
*/
class HTMLPurifier_Injector_DisplayLinkURI extends HTMLPurifier_Injector
{
public $name = 'DisplayLinkURI';
public $needed = array('a');
public function handleElement(&$token) {
}
public function handleEnd(&$token) {
if (isset($token->start->attr['href'])){
$url = $token->start->attr['href'];
unset($token->start->attr['href']);
$token = array($token, new HTMLPurifier_Token_Text(" ($url)"));
} else {
// nothing to display
}
}
}
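The effect of this injector can be sketched on plain strings. This is a simplified illustration, assuming a single anchor with only its text and attribute array available; `display_link_uri()` is a hypothetical function, and the real injector works on token objects inside MakeWellFormed rather than on strings.

```php
<?php
// Hypothetical string-level illustration of DisplayLinkURI: the href
// attribute is dropped and its value is appended as text after the link.
// Other attributes are ignored here for brevity.
function display_link_uri($text, array $attr) {
    $link = '<a>' . htmlspecialchars($text) . '</a>';
    if (!isset($attr['href'])) {
        return $link; // nothing to display
    }
    return $link . ' (' . htmlspecialchars($attr['href']) . ')';
}

echo display_link_uri('Example', array('href' => 'http://example.com'));
```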


@@ -0,0 +1,40 @@
<?php
class HTMLPurifier_Injector_RemoveEmpty extends HTMLPurifier_Injector
{
private $context, $config;
public function prepare($config, $context) {
parent::prepare($config, $context);
$this->config = $config;
$this->context = $context;
$this->attrValidator = new HTMLPurifier_AttrValidator();
}
public function handleElement(&$token) {
if (!$token instanceof HTMLPurifier_Token_Start) return;
$next = false;
for ($i = $this->inputIndex + 1, $c = count($this->inputTokens); $i < $c; $i++) {
$next = $this->inputTokens[$i];
if ($next instanceof HTMLPurifier_Token_Text && $next->is_whitespace) continue;
break;
}
if (!$next || ($next instanceof HTMLPurifier_Token_End && $next->name == $token->name)) {
if ($token->name == 'colgroup') return;
$this->attrValidator->validateToken($token, $this->config, $this->context);
$token->armor['ValidateAttributes'] = true;
if (isset($token->attr['id']) || isset($token->attr['name'])) return;
$token = $i - $this->inputIndex + 1;
for ($b = $this->inputIndex - 1; $b > 0; $b--) {
$prev = $this->inputTokens[$b];
if ($prev instanceof HTMLPurifier_Token_Text && $prev->is_whitespace) continue;
break;
}
// This is safe because we removed the token that triggered this.
$this->rewind($b - 1);
return;
}
}
}


@@ -72,7 +72,10 @@ class HTMLPurifier_Injector_SafeObject extends HTMLPurifier_Injector
} }
} }
public function notifyEnd($token) { public function handleEnd(&$token) {
// This is the WRONG way of handling the object and param stacks;
// we should be inserting them directly on the relevant object tokens
// so that the global stack handling handles it.
if ($token->name == 'object') { if ($token->name == 'object') {
array_pop($this->objectStack); array_pop($this->objectStack);
array_pop($this->paramStack); array_pop($this->paramStack);


@@ -15,7 +15,8 @@ $messages = array(
'Item separator last' => ' and ', // non-Harvard style 'Item separator last' => ' and ', // non-Harvard style
'ErrorCollector: No errors' => 'No errors detected. However, because error reporting is still incomplete, there may have been errors that the error collector was not notified of; please inspect the output HTML carefully.', 'ErrorCollector: No errors' => 'No errors detected. However, because error reporting is still incomplete, there may have been errors that the error collector was not notified of; please inspect the output HTML carefully.',
'ErrorCollector: At line' => ' at line $line', 'ErrorCollector: At line' => ' at line $line',
'ErrorCollector: Incidental errors' => 'Incidental errors',
'Lexer: Unclosed comment' => 'Unclosed comment', 'Lexer: Unclosed comment' => 'Unclosed comment',
'Lexer: Unescaped lt' => 'Unescaped less-than sign (<) should be &lt;', 'Lexer: Unescaped lt' => 'Unescaped less-than sign (<) should be &lt;',
@@ -30,6 +31,8 @@ $messages = array(
'Strategy_RemoveForeignElements: Comment removed' => 'Comment containing "$CurrentToken.Data" removed', 'Strategy_RemoveForeignElements: Comment removed' => 'Comment containing "$CurrentToken.Data" removed',
'Strategy_RemoveForeignElements: Foreign meta element removed' => 'Unrecognized $CurrentToken.Serialized meta tag and all descendants removed', 'Strategy_RemoveForeignElements: Foreign meta element removed' => 'Unrecognized $CurrentToken.Serialized meta tag and all descendants removed',
'Strategy_RemoveForeignElements: Token removed to end' => 'Tags and text starting from $1 element were removed to end', 'Strategy_RemoveForeignElements: Token removed to end' => 'Tags and text starting from $1 element were removed to end',
'Strategy_RemoveForeignElements: Trailing hyphen in comment removed' => 'Trailing hyphen(s) in comment removed',
'Strategy_RemoveForeignElements: Hyphens in comment collapsed' => 'Double hyphens in comments are not allowed, and were collapsed into single hyphens',
'Strategy_MakeWellFormed: Unnecessary end tag removed' => 'Unnecessary $CurrentToken.Serialized tag removed', 'Strategy_MakeWellFormed: Unnecessary end tag removed' => 'Unnecessary $CurrentToken.Serialized tag removed',
'Strategy_MakeWellFormed: Unnecessary end tag to text' => 'Unnecessary $CurrentToken.Serialized tag converted to text', 'Strategy_MakeWellFormed: Unnecessary end tag to text' => 'Unnecessary $CurrentToken.Serialized tag converted to text',
@@ -50,8 +53,8 @@ $messages = array(
); );
$errorNames = array( $errorNames = array(
E_ERROR => 'Error', E_ERROR => 'Error',
E_WARNING => 'Warning', E_WARNING => 'Warning',
E_NOTICE => 'Notice' E_NOTICE => 'Notice'
); );


@@ -42,6 +42,12 @@
class HTMLPurifier_Lexer class HTMLPurifier_Lexer
{ {
/**
* Whether or not this lexer implements line-number/column-number tracking.
* If it does, set to true.
*/
public $tracksLineNumbers = false;
// -- STATIC ---------------------------------------------------------- // -- STATIC ----------------------------------------------------------
/** /**
@@ -70,46 +76,65 @@ class HTMLPurifier_Lexer
$lexer = $config->get('Core', 'LexerImpl'); $lexer = $config->get('Core', 'LexerImpl');
} }
$needs_tracking =
$config->get('Core', 'MaintainLineNumbers') ||
$config->get('Core', 'CollectErrors');
$inst = null;
if (is_object($lexer)) { if (is_object($lexer)) {
return $lexer; $inst = $lexer;
} else {
if (is_null($lexer)) { do {
// auto-detection algorithm
if ($needs_tracking) {
$lexer = 'DirectLex';
break;
}
if (
class_exists('DOMDocument') &&
method_exists('DOMDocument', 'loadHTML') &&
!extension_loaded('domxml')
) {
// check for DOM support, because while it's part of the
// core, it can be disabled compile time. Also, the PECL
// domxml extension overrides the default DOM, and is evil
// and nasty and we shan't bother to support it
$lexer = 'DOMLex';
} else {
$lexer = 'DirectLex';
}
} while(0); } // do..while so we can break
// instantiate recognized string names
switch ($lexer) {
case 'DOMLex':
$inst = new HTMLPurifier_Lexer_DOMLex();
break;
case 'DirectLex':
$inst = new HTMLPurifier_Lexer_DirectLex();
break;
case 'PH5P':
$inst = new HTMLPurifier_Lexer_PH5P();
break;
default:
throw new HTMLPurifier_Exception("Cannot instantiate unrecognized Lexer type " . htmlspecialchars($lexer));
}
} }
if (is_null($lexer)) { do { if (!$inst) throw new HTMLPurifier_Exception('No lexer was instantiated');
// auto-detection algorithm
// once PHP DOM implements native line numbers, or we
// hack out something using XSLT, remove this stipulation
$line_numbers = $config->get('Core', 'MaintainLineNumbers');
if (
$line_numbers === true ||
($line_numbers === null && $config->get('Core', 'CollectErrors'))
) {
$lexer = 'DirectLex';
break;
}
if (class_exists('DOMDocument')) {
// check for DOM support, because, surprisingly enough,
// it's *not* part of the core!
$lexer = 'DOMLex';
} else {
$lexer = 'DirectLex';
}
} while(0); } // do..while so we can break
// instantiate recognized string names // once PHP DOM implements native line numbers, or we
switch ($lexer) { // hack out something using XSLT, remove this stipulation
case 'DOMLex': if ($needs_tracking && !$inst->tracksLineNumbers) {
return new HTMLPurifier_Lexer_DOMLex(); throw new HTMLPurifier_Exception('Cannot use lexer that does not support line numbers with Core.MaintainLineNumbers or Core.CollectErrors (use DirectLex instead)');
case 'DirectLex':
return new HTMLPurifier_Lexer_DirectLex();
case 'PH5P':
return new HTMLPurifier_Lexer_PH5P();
default:
trigger_error("Cannot instantiate unrecognized Lexer type " . htmlspecialchars($lexer), E_USER_ERROR);
} }
return $inst;
} }
// -- CONVENIENCE MEMBERS --------------------------------------------- // -- CONVENIENCE MEMBERS ---------------------------------------------
@@ -226,11 +251,6 @@ class HTMLPurifier_Lexer
*/ */
public function normalize($html, $config, $context) { public function normalize($html, $config, $context) {
// extract body from document if applicable
if ($config->get('Core', 'ConvertDocumentToFragment')) {
$html = $this->extractBody($html);
}
// normalize newlines to \n // normalize newlines to \n
$html = str_replace("\r\n", "\n", $html); $html = str_replace("\r\n", "\n", $html);
$html = str_replace("\r", "\n", $html); $html = str_replace("\r", "\n", $html);
@@ -243,6 +263,11 @@ class HTMLPurifier_Lexer
// escape CDATA // escape CDATA
$html = $this->escapeCDATA($html); $html = $this->escapeCDATA($html);
// extract body from document if applicable
if ($config->get('Core', 'ConvertDocumentToFragment')) {
$html = $this->extractBody($html);
}
// expand entities that aren't the big five // expand entities that aren't the big five
$html = $this->_entity_parser->substituteNonSpecialEntities($html); $html = $this->_entity_parser->substituteNonSpecialEntities($html);
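The lexer auto-detection and the line-number check added to the factory method in this file can be summarised as a small decision function. This is a sketch under stated assumptions: `choose_lexer_name()` and `check_tracking()` are hypothetical names, and the DOM availability test is passed in as a boolean rather than probed with `class_exists()`.

```php
<?php
// Hypothetical condensation of the auto-detection branch: line/column
// tracking forces DirectLex, otherwise DOMLex is preferred when available.
function choose_lexer_name($needs_tracking, $dom_available) {
    if ($needs_tracking) {
        return 'DirectLex';
    }
    return $dom_available ? 'DOMLex' : 'DirectLex';
}

// Hypothetical version of the final guard: an explicitly configured lexer
// that cannot track line numbers is rejected when tracking is required.
function check_tracking($needs_tracking, $tracks_line_numbers) {
    if ($needs_tracking && !$tracks_line_numbers) {
        throw new RuntimeException('Lexer does not support line numbers');
    }
    return true;
}

echo choose_lexer_name(true, true), "\n";  // tracking wins over DOM
echo choose_lexer_name(false, true), "\n"; // DOM preferred otherwise
```

The guard matters only when a lexer is named or passed explicitly; the auto-detected choice already satisfies it.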


@@ -45,7 +45,10 @@ class HTMLPurifier_Lexer_DOMLex extends HTMLPurifier_Lexer
$char = '[^a-z!\/]'; $char = '[^a-z!\/]';
$comment = "/<!--(.*?)(-->|\z)/is"; $comment = "/<!--(.*?)(-->|\z)/is";
$html = preg_replace_callback($comment, array($this, 'callbackArmorCommentEntities'), $html); $html = preg_replace_callback($comment, array($this, 'callbackArmorCommentEntities'), $html);
$html = preg_replace("/<($char)/i", '&lt;\\1', $html); do {
$old = $html;
$html = preg_replace("/<($char)/i", '&lt;\\1', $html);
} while ($html !== $old);
$html = preg_replace_callback($comment, array($this, 'callbackUndoCommentSubst'), $html); // fix comments $html = preg_replace_callback($comment, array($this, 'callbackUndoCommentSubst'), $html); // fix comments
} }
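The reason for wrapping the replacement in a do..while loop can be seen with consecutive less-than signs: a single pass consumes the character after each `<`, so an adjacent `<` can be skipped. The sketch below reuses the pattern from the hunk above; `escape_stray_lt()` is a hypothetical wrapper, and the comment-armoring steps of the original are omitted.

```php
<?php
// Hypothetical wrapper around the loop shown in the diff: repeat the
// replacement until the string stops changing (a fixed point).
function escape_stray_lt($html) {
    $char = '[^a-z!\/]';
    do {
        $old = $html;
        $html = preg_replace("/<($char)/i", '&lt;\\1', $html);
    } while ($html !== $old);
    return $html;
}

$input = 'a <<< b';
// One pass leaves the middle "<" untouched, because it was consumed as the
// captured character of the first match.
$single_pass = preg_replace('/<([^a-z!\/])/i', '&lt;\\1', $input);
$fixed_point = escape_stray_lt($input);
```

For this input the single pass still contains a raw `<`, while the looped version escapes all three.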


@@ -13,6 +13,8 @@
class HTMLPurifier_Lexer_DirectLex extends HTMLPurifier_Lexer class HTMLPurifier_Lexer_DirectLex extends HTMLPurifier_Lexer
{ {
public $tracksLineNumbers = true;
/** /**
* Whitespace characters for str(c)spn. * Whitespace characters for str(c)spn.
*/ */
@@ -42,6 +44,7 @@ class HTMLPurifier_Lexer_DirectLex extends HTMLPurifier_Lexer
$inside_tag = false; // whether or not we're parsing the inside of a tag $inside_tag = false; // whether or not we're parsing the inside of a tag
$array = array(); // result array $array = array(); // result array
// This setting is also taken to mean that *column* numbers are maintained
$maintain_line_numbers = $config->get('Core', 'MaintainLineNumbers'); $maintain_line_numbers = $config->get('Core', 'MaintainLineNumbers');
if ($maintain_line_numbers === null) { if ($maintain_line_numbers === null) {
@@ -50,9 +53,17 @@ class HTMLPurifier_Lexer_DirectLex extends HTMLPurifier_Lexer
$maintain_line_numbers = $config->get('Core', 'CollectErrors'); $maintain_line_numbers = $config->get('Core', 'CollectErrors');
} }
if ($maintain_line_numbers) $current_line = 1; if ($maintain_line_numbers) {
else $current_line = false; $current_line = 1;
$current_col = 0;
$length = strlen($html);
} else {
$current_line = false;
$current_col = false;
$length = false;
}
$context->register('CurrentLine', $current_line); $context->register('CurrentLine', $current_line);
$context->register('CurrentCol', $current_col);
$nl = "\n"; $nl = "\n";
// how often to manually recalculate. This will ALWAYS be right, // how often to manually recalculate. This will ALWAYS be right,
// but it's pretty wasteful. Set to 0 to turn off // but it's pretty wasteful. Set to 0 to turn off
@@ -68,14 +79,31 @@ class HTMLPurifier_Lexer_DirectLex extends HTMLPurifier_Lexer
while(++$loops) { while(++$loops) {
// recalculate lines // $cursor is either at the start of a token, or inside of
if ( // a tag (i.e. there was a < immediately before it), as indicated
$maintain_line_numbers && // line number tracking is on // by $inside_tag
$synchronize_interval && // synchronization is on
$cursor > 0 && // cursor is further than zero if ($maintain_line_numbers) {
$loops % $synchronize_interval === 0 // time to synchronize!
) { // $rcursor, however, is always at the start of a token.
$current_line = 1 + $this->substrCount($html, $nl, 0, $cursor); $rcursor = $cursor - (int) $inside_tag;
// Column number is cheap, so we calculate it every round.
// We're interested in the *end* of the newline string, so
// we need to add strlen($nl) == 1 to $nl_pos before subtracting it
// from our "rcursor" position.
$nl_pos = strrpos($html, $nl, $rcursor - $length);
$current_col = $rcursor - (is_bool($nl_pos) ? 0 : $nl_pos + 1);
// recalculate lines
if (
$synchronize_interval && // synchronization is on
$cursor > 0 && // cursor is further than zero
$loops % $synchronize_interval === 0 // time to synchronize!
) {
$current_line = 1 + $this->substrCount($html, $nl, 0, $cursor);
}
} }
$position_next_lt = strpos($html, '<', $cursor); $position_next_lt = strpos($html, '<', $cursor);
@@ -99,7 +127,7 @@ class HTMLPurifier_Lexer_DirectLex extends HTMLPurifier_Lexer
) )
); );
if ($maintain_line_numbers) { if ($maintain_line_numbers) {
$token->line = $current_line; $token->rawPosition($current_line, $current_col);
$current_line += $this->substrCount($html, $nl, $cursor, $position_next_lt - $cursor); $current_line += $this->substrCount($html, $nl, $cursor, $position_next_lt - $cursor);
} }
$array[] = $token; $array[] = $token;
@@ -119,7 +147,7 @@ class HTMLPurifier_Lexer_DirectLex extends HTMLPurifier_Lexer
) )
) )
); );
if ($maintain_line_numbers) $token->line = $current_line; if ($maintain_line_numbers) $token->rawPosition($current_line, $current_col);
$array[] = $token; $array[] = $token;
break; break;
} elseif ($inside_tag && $position_next_gt !== false) { } elseif ($inside_tag && $position_next_gt !== false) {
@@ -167,7 +195,7 @@ class HTMLPurifier_Lexer_DirectLex extends HTMLPurifier_Lexer
) )
); );
if ($maintain_line_numbers) { if ($maintain_line_numbers) {
$token->line = $current_line; $token->rawPosition($current_line, $current_col);
$current_line += $this->substrCount($html, $nl, $cursor, $strlen_segment); $current_line += $this->substrCount($html, $nl, $cursor, $strlen_segment);
} }
$array[] = $token; $array[] = $token;
@@ -182,7 +210,7 @@ class HTMLPurifier_Lexer_DirectLex extends HTMLPurifier_Lexer
$type = substr($segment, 1); $type = substr($segment, 1);
$token = new HTMLPurifier_Token_End($type); $token = new HTMLPurifier_Token_End($type);
if ($maintain_line_numbers) { if ($maintain_line_numbers) {
$token->line = $current_line; $token->rawPosition($current_line, $current_col);
$current_line += $this->substrCount($html, $nl, $cursor, $position_next_gt - $cursor); $current_line += $this->substrCount($html, $nl, $cursor, $position_next_gt - $cursor);
} }
$array[] = $token; $array[] = $token;
@@ -197,20 +225,12 @@ class HTMLPurifier_Lexer_DirectLex extends HTMLPurifier_Lexer
if (!ctype_alpha($segment[0])) { if (!ctype_alpha($segment[0])) {
// XML: $segment[0] !== '_' && $segment[0] !== ':' // XML: $segment[0] !== '_' && $segment[0] !== ':'
if ($e) $e->send(E_NOTICE, 'Lexer: Unescaped lt'); if ($e) $e->send(E_NOTICE, 'Lexer: Unescaped lt');
$token = new $token = new HTMLPurifier_Token_Text('<');
HTMLPurifier_Token_Text(
'<' .
$this->parseData(
$segment
) .
'>'
);
if ($maintain_line_numbers) { if ($maintain_line_numbers) {
$token->line = $current_line; $token->rawPosition($current_line, $current_col);
$current_line += $this->substrCount($html, $nl, $cursor, $position_next_gt - $cursor); $current_line += $this->substrCount($html, $nl, $cursor, $position_next_gt - $cursor);
} }
$array[] = $token; $array[] = $token;
$cursor = $position_next_gt + 1;
$inside_tag = false; $inside_tag = false;
continue; continue;
} }
@@ -235,7 +255,7 @@ class HTMLPurifier_Lexer_DirectLex extends HTMLPurifier_Lexer
$token = new HTMLPurifier_Token_Start($segment); $token = new HTMLPurifier_Token_Start($segment);
} }
if ($maintain_line_numbers) { if ($maintain_line_numbers) {
$token->line = $current_line; $token->rawPosition($current_line, $current_col);
$current_line += $this->substrCount($html, $nl, $cursor, $position_next_gt - $cursor); $current_line += $this->substrCount($html, $nl, $cursor, $position_next_gt - $cursor);
} }
$array[] = $token; $array[] = $token;
@@ -267,7 +287,7 @@ class HTMLPurifier_Lexer_DirectLex extends HTMLPurifier_Lexer
$token = new HTMLPurifier_Token_Start($type, $attr); $token = new HTMLPurifier_Token_Start($type, $attr);
} }
if ($maintain_line_numbers) { if ($maintain_line_numbers) {
$token->line = $current_line; $token->rawPosition($current_line, $current_col);
$current_line += $this->substrCount($html, $nl, $cursor, $position_next_gt - $cursor); $current_line += $this->substrCount($html, $nl, $cursor, $position_next_gt - $cursor);
} }
$array[] = $token; $array[] = $token;
@@ -284,7 +304,7 @@ class HTMLPurifier_Lexer_DirectLex extends HTMLPurifier_Lexer
substr($html, $cursor) substr($html, $cursor)
) )
); );
if ($maintain_line_numbers) $token->line = $current_line; if ($maintain_line_numbers) $token->rawPosition($current_line, $current_col);
// no cursor scroll? Hmm... // no cursor scroll? Hmm...
$array[] = $token; $array[] = $token;
break; break;
@@ -293,6 +313,7 @@ class HTMLPurifier_Lexer_DirectLex extends HTMLPurifier_Lexer
} }
$context->destroy('CurrentLine'); $context->destroy('CurrentLine');
$context->destroy('CurrentCol');
return $array; return $array;
} }
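The column calculation introduced in this file amounts to measuring the distance from the last newline before the token start. The following is a simplified sketch: `column_at()` is a hypothetical function, and it uses `substr()` plus `strrpos()` instead of the negative-offset `strrpos()` call in the diff, which is an assumption made for readability rather than the author's exact approach.

```php
<?php
// Hypothetical helper: zero-based column of $cursor within its line,
// i.e. the number of bytes between the previous newline and $cursor.
function column_at($html, $cursor) {
    $nl_pos = strrpos(substr($html, 0, $cursor), "\n");
    // No newline before the cursor: the column is the cursor itself.
    return $cursor - ($nl_pos === false ? 0 : $nl_pos + 1);
}

$html = "ab\ncd<e>";
$col = column_at($html, 5); // cursor on the "<" of the second line
```

Because only the most recent newline has to be found, this is cheap enough to run on every loop iteration, unlike the full line recount that is performed only at the synchronization interval.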


@@ -7,31 +7,61 @@ class HTMLPurifier_Strategy_MakeWellFormed extends HTMLPurifier_Strategy
{ {
/** /**
* Locally shared variable references * Array stream of tokens being processed.
*/ */
protected $inputTokens, $inputIndex, $outputTokens, $currentNesting, protected $tokens;
$currentInjector, $injectors;
/**
* Current index in $tokens.
*/
protected $t;
/**
* Current nesting of elements.
*/
protected $stack;
/**
* Injectors active in this stream processing.
*/
protected $injectors;
/**
* Current instance of HTMLPurifier_Config.
*/
protected $config;
/**
* Current instance of HTMLPurifier_Context.
*/
protected $context;
public function execute($tokens, $config, $context) { public function execute($tokens, $config, $context) {
$definition = $config->getHTMLDefinition(); $definition = $config->getHTMLDefinition();
// local variables // local variables
$result = array();
$generator = new HTMLPurifier_Generator($config, $context); $generator = new HTMLPurifier_Generator($config, $context);
$escape_invalid_tags = $config->get('Core', 'EscapeInvalidTags'); $escape_invalid_tags = $config->get('Core', 'EscapeInvalidTags');
$e = $context->get('ErrorCollector', true); $e = $context->get('ErrorCollector', true);
$t = false; // token index
$i = false; // injector index
$token = false; // the current token
$reprocess = false; // whether or not to reprocess the same token
$stack = array();
// member variables // member variables
$this->currentNesting = array(); $this->stack =& $stack;
$this->inputIndex = false; $this->t =& $t;
$this->inputTokens =& $tokens; $this->tokens =& $tokens;
$this->outputTokens =& $result; $this->config = $config;
$this->context = $context;
// context variables // context variables
$context->register('CurrentNesting', $this->currentNesting); $context->register('CurrentNesting', $stack);
$context->register('InputIndex', $this->inputIndex); $context->register('InputIndex', $t);
$context->register('InputTokens', $tokens); $context->register('InputTokens', $tokens);
$context->register('CurrentToken', $token);
// -- begin INJECTOR -- // -- begin INJECTOR --
@@ -58,73 +88,119 @@ class HTMLPurifier_Strategy_MakeWellFormed extends HTMLPurifier_Strategy
$this->injectors[] = $injector; $this->injectors[] = $injector;
} }
// array index of the injector that resulted in an array
// substitution. This enables processTokens() to know which
// injectors are affected by the added tokens and which are
// not (namely, the ones after the current injector are not
// affected)
$this->currentInjector = false;
// give the injectors references to the definition and context // give the injectors references to the definition and context
// variables for performance reasons // variables for performance reasons
foreach ($this->injectors as $i => $injector) { foreach ($this->injectors as $ix => $injector) {
$error = $injector->prepare($config, $context); $error = $injector->prepare($config, $context);
if (!$error) continue; if (!$error) continue;
array_splice($this->injectors, $i, 1); // rm the injector array_splice($this->injectors, $ix, 1); // rm the injector
trigger_error("Cannot enable {$injector->name} injector because $error is not allowed", E_USER_WARNING); trigger_error("Cannot enable {$injector->name} injector because $error is not allowed", E_USER_WARNING);
} }
// warning: most foreach loops follow the convention $i => $injector.
// Don't define these as loop-wide variables, please!
// -- end INJECTOR -- // -- end INJECTOR --
$token = false; // a note on punting:
$context->register('CurrentToken', $token); // In order to reduce code duplication, whenever some code needs
// to make HTML changes in order to make things "correct", the
// new HTML gets sent through the purifier, regardless of its
// status. This means that if we add a start token, because it
// was totally necessary, we don't have to update nesting; we just
// punt ($reprocess = true; continue;) and it does that for us.
// isset is in loop because $tokens size changes during loop exec // isset is in loop because $tokens size changes during loop exec
for ($this->inputIndex = 0; isset($tokens[$this->inputIndex]); $this->inputIndex++) { for (
$t = 0;
$t == 0 || isset($tokens[$t - 1]);
// only increment if we don't need to reprocess
$reprocess ? $reprocess = false : $t++
) {
// if all goes well, this token will be passed through unharmed // check for a rewind
$token = $tokens[$this->inputIndex]; if (is_int($i) && $i >= 0) {
// possibility: disable rewinding if the current token has a
//printTokens($tokens, $this->inputIndex); // rewind set on it already. This would offer protection from
// infinite loop, but might hinder some advanced rewinding.
foreach ($this->injectors as $injector) { $rewind_to = $this->injectors[$i]->getRewind();
if ($injector->skip > 0) $injector->skip--; if (is_int($rewind_to) && $rewind_to < $t) {
if ($rewind_to < 0) $rewind_to = 0;
while ($t > $rewind_to) {
$t--;
$prev = $tokens[$t];
// indicate that other injectors should not process this token,
// but we need to reprocess it
unset($prev->skip[$i]);
$prev->rewind = $i;
if ($prev instanceof HTMLPurifier_Token_Start) array_pop($this->stack);
elseif ($prev instanceof HTMLPurifier_Token_End) $this->stack[] = $prev->start;
}
}
$i = false;
} }
// quick-check: if it's not a tag, no need to process // handle case of document end
if (empty( $token->is_tag )) { if (!isset($tokens[$t])) {
if ($token instanceof HTMLPurifier_Token_Text) { // kill processing if stack is empty
// injector handler code; duplicated for performance reasons if (empty($this->stack)) break;
foreach ($this->injectors as $i => $injector) {
if (!$injector->skip) $injector->handleText($token); // peek
if (is_array($token)) { $top_nesting = array_pop($this->stack);
$this->currentInjector = $i; $this->stack[] = $top_nesting;
break;
} // send error
} if ($e && !isset($top_nesting->armor['MakeWellFormed_TagClosedError'])) {
$e->send(E_NOTICE, 'Strategy_MakeWellFormed: Tag closed by document end', $top_nesting);
} }
$this->processToken($token, $config, $context);
// append, don't splice, since this is the end
$tokens[] = new HTMLPurifier_Token_End($top_nesting->name);
// punt!
$reprocess = true;
continue; continue;
} }
$info = $definition->info[$token->name]->child; // if all goes well, this token will be passed through unharmed
$token = $tokens[$t];
//echo '<hr>';
//printTokens($tokens, $t);
//var_dump($this->stack);
// quick-check: if it's not a tag, no need to process
if (empty($token->is_tag)) {
if ($token instanceof HTMLPurifier_Token_Text) {
foreach ($this->injectors as $i => $injector) {
if (isset($token->skip[$i])) continue;
if ($token->rewind !== null && $token->rewind !== $i) continue;
$injector->handleText($token);
$this->processToken($token, $i);
$reprocess = true;
break;
}
}
// another possibility is a comment
continue;
}
if (isset($definition->info[$token->name])) {
$type = $definition->info[$token->name]->child->type;
} else {
$type = false; // Type is unknown, treat accordingly
}
// quick tag checks: anything that's *not* an end tag // quick tag checks: anything that's *not* an end tag
$ok = false; $ok = false;
if ($info->type === 'empty' && $token instanceof HTMLPurifier_Token_Start) { if ($type === 'empty' && $token instanceof HTMLPurifier_Token_Start) {
// test if it claims to be a start tag but is empty // claims to be a start tag but is empty
$token = new HTMLPurifier_Token_Empty($token->name, $token->attr); $token = new HTMLPurifier_Token_Empty($token->name, $token->attr);
$ok = true; $ok = true;
} elseif ($info->type !== 'empty' && $token instanceof HTMLPurifier_Token_Empty) { } elseif ($type && $type !== 'empty' && $token instanceof HTMLPurifier_Token_Empty) {
// claims to be empty but really is a start tag // claims to be empty but really is a start tag
$token = array( $this->swap(new HTMLPurifier_Token_End($token->name));
new HTMLPurifier_Token_Start($token->name, $token->attr), $this->insertBefore(new HTMLPurifier_Token_Start($token->name, $token->attr));
new HTMLPurifier_Token_End($token->name) // punt (since we had to modify the input stream in a non-trivial way)
); $reprocess = true;
$ok = true; continue;
} elseif ($token instanceof HTMLPurifier_Token_Empty) { } elseif ($token instanceof HTMLPurifier_Token_Empty) {
// real empty token // real empty token
$ok = true; $ok = true;
@@ -132,62 +208,88 @@ class HTMLPurifier_Strategy_MakeWellFormed extends HTMLPurifier_Strategy
// start tag // start tag
// ...unless they also have to close their parent // ...unless they also have to close their parent
if (!empty($this->currentNesting)) { if (!empty($this->stack)) {
$parent = array_pop($this->currentNesting); $parent = array_pop($this->stack);
$parent_info = $definition->info[$parent->name]; $this->stack[] = $parent;
// this can be replaced with a more general algorithm: if (isset($definition->info[$parent->name])) {
// if the token is not allowed by the parent, auto-close $elements = $definition->info[$parent->name]->child->getNonAutoCloseElements($config);
// the parent $autoclose = !isset($elements[$token->name]);
if (!isset($parent_info->child->elements[$token->name])) { } else {
$autoclose = false;
}
if ($autoclose) {
if ($e) $e->send(E_NOTICE, 'Strategy_MakeWellFormed: Tag auto closed', $parent); if ($e) $e->send(E_NOTICE, 'Strategy_MakeWellFormed: Tag auto closed', $parent);
// close the parent, then re-loop to reprocess token // insert parent end tag before this tag
$result[] = new HTMLPurifier_Token_End($parent->name); $new_token = new HTMLPurifier_Token_End($parent->name);
$this->inputIndex--; $new_token->start = $parent;
$this->insertBefore($new_token);
$reprocess = true;
continue; continue;
} }
$this->currentNesting[] = $parent; // undo the pop
} }
$ok = true; $ok = true;
} }
// injector handler code; duplicated for performance reasons
if ($ok) { if ($ok) {
foreach ($this->injectors as $i => $injector) { foreach ($this->injectors as $i => $injector) {
if (!$injector->skip) $injector->handleElement($token); if (isset($token->skip[$i])) continue;
if (is_array($token)) { if ($token->rewind !== null && $token->rewind !== $i) continue;
$this->currentInjector = $i; $injector->handleElement($token);
break; $this->processToken($token, $i);
$reprocess = true;
break;
}
if (!$reprocess) {
// ah, nothing interesting happened; do normal processing
$this->swap($token);
if ($token instanceof HTMLPurifier_Token_Start) {
$this->stack[] = $token;
} elseif ($token instanceof HTMLPurifier_Token_End) {
throw new HTMLPurifier_Exception('Improper handling of end tag in start code; possible error in MakeWellFormed');
} }
} }
$this->processToken($token, $config, $context);
continue; continue;
} }
// sanity check: we should be dealing with a closing tag // sanity check: we should be dealing with a closing tag
if (!$token instanceof HTMLPurifier_Token_End) continue; if (!$token instanceof HTMLPurifier_Token_End) {
throw new HTMLPurifier_Exception('Unaccounted for tag token in input stream, bug in HTML Purifier');
}
// make sure that we have something open // make sure that we have something open
if (empty($this->currentNesting)) { if (empty($this->stack)) {
if ($escape_invalid_tags) { if ($escape_invalid_tags) {
if ($e) $e->send(E_WARNING, 'Strategy_MakeWellFormed: Unnecessary end tag to text'); if ($e) $e->send(E_WARNING, 'Strategy_MakeWellFormed: Unnecessary end tag to text');
$result[] = new HTMLPurifier_Token_Text( $this->swap(new HTMLPurifier_Token_Text(
$generator->generateFromToken($token) $generator->generateFromToken($token)
); ));
} elseif ($e) { } else {
$e->send(E_WARNING, 'Strategy_MakeWellFormed: Unnecessary end tag removed'); $this->remove();
if ($e) $e->send(E_WARNING, 'Strategy_MakeWellFormed: Unnecessary end tag removed');
} }
$reprocess = true;
continue; continue;
} }
// first, check for the simplest case: everything closes neatly // first, check for the simplest case: everything closes neatly.
$current_parent = array_pop($this->currentNesting); // Eventually, everything passes through here; if there are problems
// we modify the input stream accordingly and then punt, so that
// the tokens get processed again.
$current_parent = array_pop($this->stack);
if ($current_parent->name == $token->name) { if ($current_parent->name == $token->name) {
$result[] = $token; $token->start = $current_parent;
foreach ($this->injectors as $i => $injector) { foreach ($this->injectors as $i => $injector) {
$injector->notifyEnd($token); if (isset($token->skip[$i])) continue;
if ($token->rewind !== null && $token->rewind !== $i) continue;
$injector->handleEnd($token);
$this->processToken($token, $i);
$this->stack[] = $current_parent;
$reprocess = true;
break;
} }
continue; continue;
} }
@@ -195,47 +297,56 @@ class HTMLPurifier_Strategy_MakeWellFormed extends HTMLPurifier_Strategy
// okay, so we're trying to close the wrong tag // okay, so we're trying to close the wrong tag
// undo the previous pop // undo the previous pop
$this->currentNesting[] = $current_parent; $this->stack[] = $current_parent;
// scroll back the entire nest, trying to find our tag. // scroll back the entire nest, trying to find our tag.
// (feature could be to specify how far you'd like to go) // (feature could be to specify how far you'd like to go)
$size = count($this->currentNesting); $size = count($this->stack);
// -2 because -1 is the last element, but we already checked that // -2 because -1 is the last element, but we already checked that
$skipped_tags = false; $skipped_tags = false;
for ($i = $size - 2; $i >= 0; $i--) { for ($j = $size - 2; $j >= 0; $j--) {
if ($this->currentNesting[$i]->name == $token->name) { if ($this->stack[$j]->name == $token->name) {
// current nesting is modified $skipped_tags = array_slice($this->stack, $j);
$skipped_tags = array_splice($this->currentNesting, $i);
break; break;
} }
} }
// we still didn't find the tag, so remove // we didn't find the tag, so remove
if ($skipped_tags === false) { if ($skipped_tags === false) {
if ($escape_invalid_tags) { if ($escape_invalid_tags) {
$result[] = new HTMLPurifier_Token_Text( $this->swap(new HTMLPurifier_Token_Text(
$generator->generateFromToken($token) $generator->generateFromToken($token)
); ));
if ($e) $e->send(E_WARNING, 'Strategy_MakeWellFormed: Stray end tag to text'); if ($e) $e->send(E_WARNING, 'Strategy_MakeWellFormed: Stray end tag to text');
} elseif ($e) { } else {
$e->send(E_WARNING, 'Strategy_MakeWellFormed: Stray end tag removed'); $this->remove();
if ($e) $e->send(E_WARNING, 'Strategy_MakeWellFormed: Stray end tag removed');
} }
$reprocess = true;
continue; continue;
} }
// okay, we found it, close all the skipped tags // do errors, in REVERSE $j order: a,b,c with </a></b></c>
// note that skipped tags contains the element we need closed $c = count($skipped_tags);
for ($i = count($skipped_tags) - 1; $i >= 0; $i--) { if ($e) {
// please don't redefine $i! for ($j = $c - 1; $j > 0; $j--) {
if ($i && $e && !isset($skipped_tags[$i]->armor['MakeWellFormed_TagClosedError'])) { // notice we exclude $j == 0, i.e. the current ending tag, from
$e->send(E_NOTICE, 'Strategy_MakeWellFormed: Tag closed by element end', $skipped_tags[$i]); // the errors...
} if (!isset($skipped_tags[$j]->armor['MakeWellFormed_TagClosedError'])) {
$result[] = $new_token = new HTMLPurifier_Token_End($skipped_tags[$i]->name); $e->send(E_NOTICE, 'Strategy_MakeWellFormed: Tag closed by element end', $skipped_tags[$j]);
foreach ($this->injectors as $injector) { }
$injector->notifyEnd($new_token);
} }
} }
// insert tags, in FORWARD $j order: c,b,a with </a></b></c>
for ($j = 1; $j < $c; $j++) {
// ...as well as from the insertions
$new_token = new HTMLPurifier_Token_End($skipped_tags[$j]->name);
$new_token->start = $skipped_tags[$j];
$this->insertBefore($new_token);
}
$reprocess = true;
continue;
} }
$context->destroy('CurrentNesting'); $context->destroy('CurrentNesting');
@@ -243,59 +354,77 @@ class HTMLPurifier_Strategy_MakeWellFormed extends HTMLPurifier_Strategy
$context->destroy('InputIndex'); $context->destroy('InputIndex');
$context->destroy('CurrentToken'); $context->destroy('CurrentToken');
// we're at the end now, fix all still unclosed tags (this is unset($this->injectors, $this->stack, $this->tokens, $this->t);
// duplicated from the end of the loop with some slight modifications) return $tokens;
// not using $skipped_tags since it would invariably be all of them
if (!empty($this->currentNesting)) {
for ($i = count($this->currentNesting) - 1; $i >= 0; $i--) {
// please don't redefine $i!
if ($e && !isset($this->currentNesting[$i]->armor['MakeWellFormed_TagClosedError'])) {
$e->send(E_NOTICE, 'Strategy_MakeWellFormed: Tag closed by document end', $this->currentNesting[$i]);
}
$result[] = $new_token = new HTMLPurifier_Token_End($this->currentNesting[$i]->name);
foreach ($this->injectors as $injector) {
$injector->notifyEnd($new_token);
}
}
}
unset($this->outputTokens, $this->injectors, $this->currentInjector,
$this->currentNesting, $this->inputTokens, $this->inputIndex);
return $result;
} }
function processToken($token, $config, $context) { /**
if (is_array($token)) { * Processes arbitrary token values for complicated substitution patterns.
// the original token was overloaded by an injector, time * In general:
// to some fancy acrobatics *
* If $token is an array, it is a list of tokens to substitute for the
// $this->inputIndex is decremented so that the entire set gets * current token. These tokens then get individually processed. If there
// re-processed * is a leading integer in the list, that integer determines how many
array_splice($this->inputTokens, $this->inputIndex--, 1, $token); * tokens from the stream should be removed.
*
// adjust the injector skips based on the array substitution * If $token is a regular token, it is swapped with the current token.
if ($this->injectors) { *
$offset = count($token); * If $token is false, the current token is deleted.
for ($i = 0; $i <= $this->currentInjector; $i++) { *
// because of the skip back, we need to add one more * If $token is an integer, that number of tokens (with the first token
// for uninitialized injectors. I'm not exactly * being the current one) will be deleted.
// sure why this is the case, but I think it has to *
// do with the fact that we're decrementing skips * @param $token Token substitution value
// before re-checking text * @param $injector Injector that performed the substitution; default is -1 if
if (!$this->injectors[$i]->skip) $this->injectors[$i]->skip++; * this is not an injector-related operation.
$this->injectors[$i]->skip += $offset; */
} protected function processToken($token, $injector = -1) {
}
} elseif ($token) { // normalize forms of token
// regular case if (is_object($token)) $token = array(1, $token);
$this->outputTokens[] = $token; if (is_int($token)) $token = array($token);
if ($token instanceof HTMLPurifier_Token_Start) { if ($token === false) $token = array(1);
$this->currentNesting[] = $token; if (!is_array($token)) throw new HTMLPurifier_Exception('Invalid token type from injector');
} elseif ($token instanceof HTMLPurifier_Token_End) { if (!is_int($token[0])) array_unshift($token, 1);
array_pop($this->currentNesting); // not actually used if ($token[0] === 0) throw new HTMLPurifier_Exception('Deleting zero tokens is not valid');
// $token is now an array with the following form:
// array(number nodes to delete, new node 1, new node 2, ...)
$delete = array_shift($token);
$old = array_splice($this->tokens, $this->t, $delete, $token);
if ($injector > -1) {
// determine appropriate skips
$oldskip = isset($old[0]) ? $old[0]->skip : array();
foreach ($token as $object) {
$object->skip = $oldskip;
$object->skip[$injector] = true;
} }
} }
}
/**
* Inserts a token before the current token. Cursor now points to this token.
*/
private function insertBefore($token) {
array_splice($this->tokens, $this->t, 0, array($token));
}
/**
* Removes current token. Cursor now points to new token occupying previously
* occupied space.
*/
private function remove() {
array_splice($this->tokens, $this->t, 1);
}
/**
* Swap current token with new token. Cursor points to new token (no change).
*/
private function swap($token) {
$this->tokens[$this->t] = $token;
} }
} }
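
Review note: the normalization at the top of the new processToken() can be tried in isolation. The sketch below is a minimal stand-in, assuming plain strings in place of token objects; normalizeSubstitution() and applySubstitution() are hypothetical names that do not exist in HTML Purifier, and the injector skip bookkeeping is left out.

```php
<?php
// Hypothetical helper, not part of HTML Purifier. Turns any accepted
// substitution value into array(number of tokens to delete, replacement tokens).
function normalizeSubstitution($token) {
    if (is_object($token)) $token = array(1, $token); // single token: swap with current
    if (is_int($token))    $token = array($token);    // integer: delete that many tokens
    if ($token === false)  $token = array(1);         // false: delete the current token
    if (!is_array($token)) {
        throw new InvalidArgumentException('Invalid token type from injector');
    }
    // No leading integer means "replace exactly one token".
    if (!isset($token[0]) || !is_int($token[0])) array_unshift($token, 1);
    if ($token[0] === 0) {
        throw new InvalidArgumentException('Deleting zero tokens is not valid');
    }
    $delete = array_shift($token);
    return array($delete, $token);
}

// Hypothetical helper: applies the substitution at $cursor and returns the
// removed tokens, the way processToken() splices $this->tokens at $this->t.
function applySubstitution(array &$stream, $cursor, $value) {
    list($delete, $insert) = normalizeSubstitution($value);
    return array_splice($stream, $cursor, $delete, $insert);
}
```

With a stream of array('a', 'b', 'c') and the cursor at index 1, passing false removes 'b', passing 2 removes 'b' and 'c', and passing array('x', 'y') replaces 'b' with 'x' and 'y'.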


@@ -19,6 +19,9 @@ class HTMLPurifier_Strategy_RemoveForeignElements extends HTMLPurifier_Strategy
$escape_invalid_tags = $config->get('Core', 'EscapeInvalidTags'); $escape_invalid_tags = $config->get('Core', 'EscapeInvalidTags');
$remove_invalid_img = $config->get('Core', 'RemoveInvalidImg'); $remove_invalid_img = $config->get('Core', 'RemoveInvalidImg');
// currently only used to determine if comments should be kept
$trusted = $config->get('HTML', 'Trusted');
$remove_script_contents = $config->get('Core', 'RemoveScriptContents'); $remove_script_contents = $config->get('Core', 'RemoveScriptContents');
$hidden_elements = $config->get('Core', 'HiddenElements'); $hidden_elements = $config->get('Core', 'HiddenElements');
@@ -125,6 +128,23 @@ class HTMLPurifier_Strategy_RemoveForeignElements extends HTMLPurifier_Strategy
if ($textify_comments !== false) { if ($textify_comments !== false) {
$data = $token->data; $data = $token->data;
$token = new HTMLPurifier_Token_Text($data); $token = new HTMLPurifier_Token_Text($data);
} elseif ($trusted) {
// keep, but perform comment cleaning
if ($e) {
// perform check whether or not there's a trailing hyphen
if (substr($token->data, -1) == '-') {
$e->send(E_NOTICE, 'Strategy_RemoveForeignElements: Trailing hyphen in comment removed');
}
}
$token->data = rtrim($token->data, '-');
$found_double_hyphen = false;
while (strpos($token->data, '--') !== false) {
if ($e && !$found_double_hyphen) {
$e->send(E_NOTICE, 'Strategy_RemoveForeignElements: Hyphens in comment collapsed');
}
$found_double_hyphen = true; // prevent double-erroring
$token->data = str_replace('--', '-', $token->data);
}
} else { } else {
// strip comments // strip comments
if ($e) $e->send(E_NOTICE, 'Strategy_RemoveForeignElements: Comment removed'); if ($e) $e->send(E_NOTICE, 'Strategy_RemoveForeignElements: Comment removed');
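
Review note: the comment cleaning added for trusted mode comes down to two string operations. The sketch below is a rough standalone version, assuming the comment data is a plain string; cleanCommentData() is a hypothetical name and the error reporting is omitted.

```php
<?php
// Hypothetical helper, not part of HTML Purifier: mirrors the two cleaning
// steps applied to $token->data when %HTML.Trusted is on.
function cleanCommentData($data) {
    // Strip trailing hyphens so the data cannot run into the closing "-->".
    $data = rtrim($data, '-');
    // Collapse runs of hyphens; repeat until no "--" is left, because a single
    // str_replace() pass turns "----" into "--".
    while (strpos($data, '--') !== false) {
        $data = str_replace('--', '-', $data);
    }
    return $data;
}
```

The loop is what makes longer runs safe: four hyphens become two after the first pass and one after the second.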


@@ -78,6 +78,7 @@ class HTMLPurifier_StringHashParser
if (strncmp('--', $line, 2) === 0) { if (strncmp('--', $line, 2) === 0) {
// Multiline declaration // Multiline declaration
$state = trim($line, '- '); $state = trim($line, '- ');
if (!isset($ret[$state])) $ret[$state] = '';
continue; continue;
} elseif (!$state) { } elseif (!$state) {
$single = true; $single = true;
@@ -94,7 +95,6 @@ class HTMLPurifier_StringHashParser
$single = false; $single = false;
$state = false; $state = false;
} else { } else {
if (!isset($ret[$state])) $ret[$state] = '';
$ret[$state] .= "$line\n"; $ret[$state] .= "$line\n";
} }
} while (!feof($fh)); } while (!feof($fh));


@@ -5,6 +5,7 @@
*/ */
class HTMLPurifier_Token { class HTMLPurifier_Token {
public $line; /**< Line number node was on in source document. Null if unknown. */ public $line; /**< Line number node was on in source document. Null if unknown. */
public $col; /**< Column of line node was on in source document. Null if unknown. */
/** /**
* Lookup array of processing that this token is exempt from. * Lookup array of processing that this token is exempt from.
@@ -13,17 +14,41 @@ class HTMLPurifier_Token {
*/ */
public $armor = array(); public $armor = array();
/**
* Used during MakeWellFormed.
*/
public $skip;
public $rewind;
public function __get($n) { public function __get($n) {
if ($n === 'type') { if ($n === 'type') {
trigger_error('Deprecated type property called; use instanceof', E_USER_NOTICE); trigger_error('Deprecated type property called; use instanceof', E_USER_NOTICE);
switch (get_class($this)) { switch (get_class($this)) {
case 'HTMLPurifier_Token_Start': return 'start'; case 'HTMLPurifier_Token_Start': return 'start';
case 'HTMLPurifier_Token_Empty': return 'empty'; case 'HTMLPurifier_Token_Empty': return 'empty';
case 'HTMLPurifier_Token_End': return 'end'; case 'HTMLPurifier_Token_End': return 'end';
case 'HTMLPurifier_Token_Text': return 'text'; case 'HTMLPurifier_Token_Text': return 'text';
case 'HTMLPurifier_Token_Comment': return 'comment'; case 'HTMLPurifier_Token_Comment': return 'comment';
default: return null; default: return null;
} }
} }
} }
/**
* Sets the position of the token in the source document.
*/
public function position($l = null, $c = null) {
$this->line = $l;
$this->col = $c;
}
/**
* Convenience function for DirectLex to set the line/col position.
*/
public function rawPosition($l, $c) {
if ($c === -1) $l++;
$this->line = $l;
$this->col = $c;
}
} }


@@ -11,9 +11,10 @@ class HTMLPurifier_Token_Comment extends HTMLPurifier_Token
* *
* @param $data String comment data. * @param $data String comment data.
*/ */
public function __construct($data, $line = null) { public function __construct($data, $line = null, $col = null) {
$this->data = $data; $this->data = $data;
$this->line = $line; $this->line = $line;
$this->col = $col;
} }
} }


@@ -9,5 +9,9 @@
*/ */
class HTMLPurifier_Token_End extends HTMLPurifier_Token_Tag class HTMLPurifier_Token_End extends HTMLPurifier_Token_Tag
{ {
/**
* Token that started this node. Added by MakeWellFormed. Please
* do not edit this!
*/
public $start;
} }


@@ -33,7 +33,7 @@ class HTMLPurifier_Token_Tag extends HTMLPurifier_Token
* @param $name String name. * @param $name String name.
* @param $attr Associative array of attributes. * @param $attr Associative array of attributes.
*/ */
public function __construct($name, $attr = array(), $line = null) { public function __construct($name, $attr = array(), $line = null, $col = null) {
$this->name = ctype_lower($name) ? $name : strtolower($name); $this->name = ctype_lower($name) ? $name : strtolower($name);
foreach ($attr as $key => $value) { foreach ($attr as $key => $value) {
// normalization only necessary when key is not lowercase // normalization only necessary when key is not lowercase
@@ -49,5 +49,6 @@ class HTMLPurifier_Token_Tag extends HTMLPurifier_Token
} }
$this->attr = $attr; $this->attr = $attr;
$this->line = $line; $this->line = $line;
$this->col = $col;
} }
} }


@@ -21,10 +21,11 @@ class HTMLPurifier_Token_Text extends HTMLPurifier_Token
* *
* @param $data String parsed character data. * @param $data String parsed character data.
*/ */
public function __construct($data, $line = null) { public function __construct($data, $line = null, $col = null) {
$this->data = $data; $this->data = $data;
$this->is_whitespace = ctype_space($data); $this->is_whitespace = ctype_space($data);
$this->line = $line; $this->line = $line;
$this->col = $col;
} }
} }


@@ -5,7 +5,7 @@ class HTMLPurifier_URIFilter_DisableExternal extends HTMLPurifier_URIFilter
public $name = 'DisableExternal'; public $name = 'DisableExternal';
protected $ourHostParts = false; protected $ourHostParts = false;
public function prepare($config) { public function prepare($config) {
$our_host = $config->get('URI', 'Host'); $our_host = $config->getDefinition('URI')->host;
if ($our_host !== null) $this->ourHostParts = array_reverse(explode('.', $our_host)); if ($our_host !== null) $this->ourHostParts = array_reverse(explode('.', $our_host));
} }
public function filter(&$uri, $config, $context) { public function filter(&$uri, $config, $context) {


@@ -51,12 +51,18 @@ class HTMLPurifier_URIFilter_MakeAbsolute extends HTMLPurifier_URIFilter
} }
if ($uri->path === '') { if ($uri->path === '') {
$uri->path = $this->base->path; $uri->path = $this->base->path;
}elseif ($uri->path[0] !== '/') { } elseif ($uri->path[0] !== '/') {
// relative path, needs more complicated processing // relative path, needs more complicated processing
$stack = explode('/', $uri->path); $stack = explode('/', $uri->path);
$new_stack = array_merge($this->basePathStack, $stack); $new_stack = array_merge($this->basePathStack, $stack);
if ($new_stack[0] !== '' && !is_null($this->base->host)) {
array_unshift($new_stack, '');
}
$new_stack = $this->_collapseStack($new_stack); $new_stack = $this->_collapseStack($new_stack);
$uri->path = implode('/', $new_stack); $uri->path = implode('/', $new_stack);
} else {
// absolute path, but still we should collapse
$uri->path = implode('/', $this->_collapseStack(explode('/', $uri->path)));
} }
// re-combine // re-combine
$uri->scheme = $this->base->scheme; $uri->scheme = $this->base->scheme;
@@ -71,6 +77,7 @@ class HTMLPurifier_URIFilter_MakeAbsolute extends HTMLPurifier_URIFilter
*/ */
private function _collapseStack($stack) { private function _collapseStack($stack) {
$result = array(); $result = array();
$is_folder = false;
for ($i = 0; isset($stack[$i]); $i++) { for ($i = 0; isset($stack[$i]); $i++) {
$is_folder = false; $is_folder = false;
// absorb an internally duplicated slash // absorb an internally duplicated slash


@@ -28,7 +28,11 @@ class HTMLPurifier_URIFilter_Munge extends HTMLPurifier_URIFilter
$this->replace = array_map('rawurlencode', $this->replace); $this->replace = array_map('rawurlencode', $this->replace);
$new_uri = strtr($this->target, $this->replace); $new_uri = strtr($this->target, $this->replace);
$uri = $this->parser->parse($new_uri); // overwrite $new_uri = $this->parser->parse($new_uri);
// don't redirect if the target host is the same as the
// starting host
if ($uri->host === $new_uri->host) return true;
$uri = $new_uri; // overwrite
return true; return true;
} }


@@ -234,6 +234,18 @@ class HTMLPurifier_UnitConverter
* Scales a float to $scale digits right of decimal point, like BCMath. * Scales a float to $scale digits right of decimal point, like BCMath.
*/ */
private function scale($r, $scale) { private function scale($r, $scale) {
if ($scale < 0) {
// The f sprintf type doesn't support a negative precision, so we
// need to kludge things manually. First get the string.
$r = sprintf('%.0f', (float) $r);
// Due to floating point precision loss, $r will more than likely
// look something like 4652999999999.9234. We grab one more digit
// than we need from $r into $precise and then use that to round
// appropriately.
$precise = (string) round(substr($r, 0, strlen($r) + $scale), -1);
// Now we return it, truncating the zero that was rounded off.
return substr($precise, 0, -1) . str_repeat('0', -$scale + 1);
}
return sprintf('%.' . $scale . 'f', (float) $r); return sprintf('%.' . $scale . 'f', (float) $r);
} }
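
Review note: the negative-scale branch is easier to follow with a concrete number. The sketch below is a standalone copy for experimentation, with an explicit float cast added before round(); scaleSketch() is a hypothetical name, since the real method is private. The diff does not show the caller, so the worked value assumes the input is already rounded to the wanted precision and that $scale sits one digit to the right of it.

```php
<?php
// Standalone copy of HTMLPurifier_UnitConverter::scale() as it appears in the
// diff, for experimentation only. scaleSketch() is a hypothetical name.
function scaleSketch($r, $scale) {
    if ($scale < 0) {
        // Whole-number string, for example "1235000".
        $r = sprintf('%.0f', (float) $r);
        // Keep the leading digits plus one extra digit...
        $kept = substr($r, 0, strlen($r) + $scale);
        // ...and round that extra digit away to absorb floating point noise.
        $precise = (string) round((float) $kept, -1);
        // Drop the rounded-off zero, then pad back to the original magnitude.
        return substr($precise, 0, -1) . str_repeat('0', -$scale + 1);
    }
    return sprintf('%.' . $scale . 'f', (float) $r);
}
```

For 1235000 with a scale of -2, the kept prefix is "12350", rounding leaves it at 12350, and the result is "1235" followed by three zeros. A non-negative scale still goes straight through sprintf(), so scaleSketch(1.5, 2) yields "1.50".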

2
plugins/phorum/.gitignore vendored Normal file

@@ -0,0 +1,2 @@
migrate.php
htmlpurifier/*


@@ -104,5 +104,5 @@ file_put_contents('library/HTMLPurifier/Config.php', $config_c);
passthru('php maintenance/flush.php'); passthru('php maintenance/flush.php');
if ($is_dev) echo "Review changes, write something in WHATSNEW and FOCUS, and then SVN commit with log 'Release $version.'" . PHP_EOL; if ($is_dev) echo "Review changes, write something in WHATSNEW and FOCUS, and then commit with log 'Release $version.'" . PHP_EOL;
else echo "Numbers updated to dev, no other modifications necessary!"; else echo "Numbers updated to dev, no other modifications necessary!";


@@ -0,0 +1,39 @@
<?php
class HTMLPurifier_AttrTransform_BackgroundTest extends HTMLPurifier_AttrTransformHarness
{
function setUp() {
parent::setUp();
$this->obj = new HTMLPurifier_AttrTransform_Background();
}
function testEmptyInput() {
$this->assertResult( array() );
}
function testBasicTransform() {
$this->assertResult(
array('background' => 'logo.png'),
array('style' => 'background-image:url(logo.png);')
);
}
function testPrependNewCSS() {
$this->assertResult(
array('background' => 'logo.png', 'style' => 'font-weight:bold'),
array('style' => 'background-image:url(logo.png);font-weight:bold')
);
}
function testLenientTreatmentOfInvalidInput() {
// notice that we rely on the CSS validator later to fix this invalid
// stuff
$this->assertResult(
array('background' => 'logo.png);foo:('),
array('style' => 'background-image:url(logo.png);foo:();')
);
}
}


@@ -19,6 +19,7 @@ class HTMLPurifier_AttrTransform_ImgRequiredTest extends HTMLPurifier_AttrTransf
function testAlternateDefaults() { function testAlternateDefaults() {
$this->config->set('Attr', 'DefaultInvalidImage', 'blank.png'); $this->config->set('Attr', 'DefaultInvalidImage', 'blank.png');
$this->config->set('Attr', 'DefaultInvalidImageAlt', 'Pawned!'); $this->config->set('Attr', 'DefaultInvalidImageAlt', 'Pawned!');
$this->config->set('Attr', 'DefaultImageAlt', 'not pawned');
$this->config->set('Core', 'RemoveInvalidImg', false); $this->config->set('Core', 'RemoveInvalidImg', false);
$this->assertResult( $this->assertResult(
array(), array(),
@@ -41,5 +42,13 @@ class HTMLPurifier_AttrTransform_ImgRequiredTest extends HTMLPurifier_AttrTransf
); );
} }
function testAddDefaultAlt() {
$this->config->set('Attr', 'DefaultImageAlt', 'default');
$this->assertResult(
array('src' => ''),
array('src' => '', 'alt' => 'default')
);
}
} }


@@ -0,0 +1,93 @@
<?php
class HTMLPurifier_AttrTransform_InputTest extends HTMLPurifier_AttrTransformHarness
{
function setUp() {
parent::setUp();
$this->obj = new HTMLPurifier_AttrTransform_Input();
}
function testEmptyInput() {
$this->assertResult(array());
}
function testInvalidCheckedWithEmpty() {
$this->assertResult(array('checked' => 'checked'), array());
}
function testInvalidCheckedWithPassword() {
$this->assertResult(array(
'checked' => 'checked',
'type' => 'password'
), array(
'type' => 'password'
));
}
function testValidCheckedWithUcCheckbox() {
$this->assertResult(array(
'checked' => 'checked',
'type' => 'CHECKBOX',
'value' => 'bar',
));
}
function testInvalidMaxlength() {
$this->assertResult(array(
'maxlength' => '10',
'type' => 'checkbox',
'value' => 'foo',
), array(
'type' => 'checkbox',
'value' => 'foo',
));
}
function testValidMaxLength() {
$this->assertResult(array(
'maxlength' => '10',
));
}
// these two are really bad test-cases
function testSizeWithCheckbox() {
$this->assertResult(array(
'type' => 'checkbox',
'value' => 'foo',
'size' => '100px',
), array(
'type' => 'checkbox',
'value' => 'foo',
'size' => '100',
));
}
function testSizeWithText() {
$this->assertResult(array(
'type' => 'password',
'size' => '100px', // spurious value, to indicate no validation takes place
), array(
'type' => 'password',
'size' => '100px',
));
}
function testInvalidSrc() {
$this->assertResult(array(
'src' => 'img.png',
), array());
}
function testMissingValue() {
$this->assertResult(array(
'type' => 'checkbox',
), array(
'type' => 'checkbox',
'value' => '',
));
}
}


@@ -3,6 +3,15 @@
class HTMLPurifier_AttrValidator_ErrorsTest extends HTMLPurifier_ErrorsHarness class HTMLPurifier_AttrValidator_ErrorsTest extends HTMLPurifier_ErrorsHarness
{ {
public function setup() {
parent::setup();
$config = HTMLPurifier_Config::createDefault();
$this->language = HTMLPurifier_LanguageFactory::instance()->create($config, $this->context);
$this->context->register('Locale', $this->language);
$this->collector = new HTMLPurifier_ErrorCollector($this->context);
$this->context->register('Generator', new HTMLPurifier_Generator($config, $this->context));
}
protected function invoke($input) { protected function invoke($input) {
$validator = new HTMLPurifier_AttrValidator(); $validator = new HTMLPurifier_AttrValidator();
$validator->validateToken($input, $this->config, $this->context); $validator->validateToken($input, $this->config, $this->context);
@@ -18,28 +27,40 @@ class HTMLPurifier_AttrValidator_ErrorsTest extends HTMLPurifier_ErrorsHarness
$output = array('class' => 'value'); // must be valid $output = array('class' => 'value'); // must be valid
$transform->setReturnValue('transform', $output, array($input, new AnythingExpectation(), new AnythingExpectation())); $transform->setReturnValue('transform', $output, array($input, new AnythingExpectation(), new AnythingExpectation()));
$def->info_attr_transform_pre[] = $transform; $def->info_attr_transform_pre[] = $transform;
$this->expectErrorCollection(E_NOTICE, 'AttrValidator: Attributes transformed', $input, $output);
$token = new HTMLPurifier_Token_Start('span', $input, 1); $token = new HTMLPurifier_Token_Start('span', $input, 1);
$this->invoke($token); $this->invoke($token);
$result = $this->collector->getRaw();
$expect = array(
array(1, E_NOTICE, 'Attributes on <span> transformed from original to class', array()),
);
$this->assertIdentical($result, $expect);
} }
function testAttributesTransformedLocalPre() { function testAttributesTransformedLocalPre() {
$this->config->set('HTML', 'TidyLevel', 'heavy'); $this->config->set('HTML', 'TidyLevel', 'heavy');
$input = array('align' => 'right'); $input = array('align' => 'right');
$output = array('style' => 'text-align:right;'); $output = array('style' => 'text-align:right;');
$this->expectErrorCollection(E_NOTICE, 'AttrValidator: Attributes transformed', $input, $output);
$token = new HTMLPurifier_Token_Start('p', $input, 1); $token = new HTMLPurifier_Token_Start('p', $input, 1);
$this->invoke($token); $this->invoke($token);
$result = $this->collector->getRaw();
$expect = array(
array(1, E_NOTICE, 'Attributes on <p> transformed from align to style', array()),
);
$this->assertIdentical($result, $expect);
} }
// too lazy to check for global post and global pre // too lazy to check for global post and global pre
function testAttributeRemoved() { function testAttributeRemoved() {
$this->expectErrorCollection(E_ERROR, 'AttrValidator: Attribute removed');
$this->expectContext('CurrentAttr', 'foobar');
$token = new HTMLPurifier_Token_Start('p', array('foobar' => 'right'), 1); $token = new HTMLPurifier_Token_Start('p', array('foobar' => 'right'), 1);
$this->expectContext('CurrentToken', $token);
$this->invoke($token); $this->invoke($token);
$result = $this->collector->getRaw();
$expect = array(
array(1, E_ERROR, 'foobar attribute on <p> removed', array()),
);
$this->assertIdentical($result, $expect);
} }
} }


@@ -69,5 +69,20 @@ class HTMLPurifier_ChildDef_CustomTest extends HTMLPurifier_ChildDefHarness
} }
function testPcdata() {
$this->obj = new HTMLPurifier_ChildDef_Custom('#PCDATA,a');
$this->assertEqual($this->obj->elements, array('#PCDATA' => true, 'a' => true));
$this->assertResult('foo<a />');
$this->assertResult('<a />', false);
}
function testWhitespace() {
$this->obj = new HTMLPurifier_ChildDef_Custom('a');
$this->assertEqual($this->obj->elements, array('a' => true));
$this->assertResult('foo<a />', false);
$this->assertResult('<a />');
$this->assertResult(' <a />');
}
} }

View File

@@ -67,8 +67,6 @@ class HTMLPurifier_ChildDef_RequiredTest extends HTMLPurifier_ChildDefHarness
'Out <b>Bold text</b><img />', 'Out <b>Bold text</b><img />',
'Out <b>Bold text</b>&lt;img /&gt;' 'Out <b>Bold text</b>&lt;img /&gt;'
); );
} }
} }


@@ -35,16 +35,6 @@ class HTMLPurifier_ComplexHarness extends HTMLPurifier_Harness
*/ */
protected $lexer; protected $lexer;
/**
* Default config to fall back on if no config is available
*/
protected $config;
/**
* Default context to fall back on if no context is available
*/
protected $context;
public function __construct() { public function __construct() {
$this->lexer = new HTMLPurifier_Lexer_DirectLex(); $this->lexer = new HTMLPurifier_Lexer_DirectLex();
parent::__construct(); parent::__construct();
@@ -88,9 +78,12 @@ class HTMLPurifier_ComplexHarness extends HTMLPurifier_Harness
$expect = $this->generate($expect); $expect = $this->generate($expect);
} }
} }
$this->assertIdentical($expect, $result); $this->assertIdentical($expect, $result);
if ($expect !== $result) {
echo '<pre>' . htmlspecialchars($result) . '</pre>';
}
} }
/** /**


@@ -38,6 +38,15 @@ class HTMLPurifier_EncoderTest extends HTMLPurifier_Harness
);
}
function test_convertToUTF8_spuriousEncoding() {
$this->config->set('Core', 'Encoding', 'utf99');
$this->expectError('Invalid encoding utf99');
$this->assertIdentical(
HTMLPurifier_Encoder::convertToUTF8("\xF6", $this->config, $this->context),
''
);
}
function test_convertToUTF8_iso8859_1() {
$this->config->set('Core', 'Encoding', 'ISO-8859-1');
$this->assertIdentical(

@@ -1,134 +1,156 @@
<?php
/**
* @warning HTML output is in flux, but eventually needs to be stabilized.
*/
class HTMLPurifier_ErrorCollectorTest extends HTMLPurifier_Harness
{
protected $language, $generator, $line;
protected $collector;
public function setup() {
generate_mock_once('HTMLPurifier_Language');
generate_mock_once('HTMLPurifier_Generator');
parent::setup();
$this->language = new HTMLPurifier_LanguageMock();
$this->language->setReturnValue('getErrorName', 'Error', array(E_ERROR));
$this->language->setReturnValue('getErrorName', 'Warning', array(E_WARNING));
$this->language->setReturnValue('getErrorName', 'Notice', array(E_NOTICE));
// this might prove to be troublesome if we need to set config
$this->generator = new HTMLPurifier_Generator($this->config, $this->context);
$this->line = false;
$this->context->register('Locale', $this->language);
$this->context->register('CurrentLine', $this->line);
$this->context->register('Generator', $this->generator);
$this->collector = new HTMLPurifier_ErrorCollector($this->context);
}
function test() {
$language = new HTMLPurifier_LanguageMock(); $language = $this->language;
$language->setReturnValue('getErrorName', 'Error', array(E_ERROR)); $language->setReturnValue('getMessage', 'Message 1', array('message-1'));
$language->setReturnValue('getErrorName', 'Warning', array(E_WARNING)); $language->setReturnValue('formatMessage', 'Message 2', array('message-2', array(1 => 'param')));
$language->setReturnValue('getMessage', 'Message 1', array('message-1'));
$language->setReturnValue('formatMessage', 'Message 2', array('message-2', array(1 => 'param')));
$language->setReturnValue('formatMessage', ' at line 23', array('ErrorCollector: At line', array('line' => 23)));
$language->setReturnValue('formatMessage', ' at line 3', array('ErrorCollector: At line', array('line' => 3)));
$line = false; $this->line = 23;
$this->collector->send(E_ERROR, 'message-1');
$this->context->register('Locale', $language); $this->line = 3;
$this->context->register('CurrentLine', $line); $this->collector->send(E_WARNING, 'message-2', 'param');
$generator = new HTMLPurifier_Generator($this->config, $this->context);
$this->context->register('Generator', $generator);
$collector = new HTMLPurifier_ErrorCollector($this->context);
$line = 23;
$collector->send(E_ERROR, 'message-1');
$line = 3;
$collector->send(E_WARNING, 'message-2', 'param');
$result = array(
0 => array(23, E_ERROR, 'Message 1'), 0 => array(23, E_ERROR, 'Message 1', array()),
1 => array(3, E_WARNING, 'Message 2') 1 => array(3, E_WARNING, 'Message 2', array())
);
$this->assertIdentical($collector->getRaw(), $result); $this->assertIdentical($this->collector->getRaw(), $result);
/*
$formatted_result =
'<ul><li><strong>Warning</strong>: Message 2 at line 3</li>'.
'<li><strong>Error</strong>: Message 1 at line 23</li></ul>';
$config = HTMLPurifier_Config::create(array('Core.MaintainLineNumbers' => true)); $this->assertIdentical($this->collector->getHTMLFormatted($this->config), $formatted_result);
*/
$this->assertIdentical($collector->getHTMLFormatted($this->config), $formatted_result);
}
function testNoErrors() {
$language = new HTMLPurifier_LanguageMock(); $this->language->setReturnValue('getMessage', 'No errors', array('ErrorCollector: No errors'));
$language->setReturnValue('getMessage', 'No errors', array('ErrorCollector: No errors'));
$this->context->register('Locale', $language);
$generator = new HTMLPurifier_Generator($this->config, $this->context);
$this->context->register('Generator', $generator);
$collector = new HTMLPurifier_ErrorCollector($this->context);
$formatted_result = '<p>No errors</p>';
$this->assertIdentical($collector->getHTMLFormatted($this->config), $formatted_result); $this->assertIdentical(
$this->collector->getHTMLFormatted($this->config),
$formatted_result
);
}
function testNoLineNumbers() {
$language = new HTMLPurifier_LanguageMock(); $this->language->setReturnValue('getMessage', 'Message 1', array('message-1'));
$language->setReturnValue('getMessage', 'Message 1', array('message-1')); $this->language->setReturnValue('getMessage', 'Message 2', array('message-2'));
$language->setReturnValue('getMessage', 'Message 2', array('message-2'));
$language->setReturnValue('getErrorName', 'Error', array(E_ERROR));
$this->context->register('Locale', $language);
$generator = new HTMLPurifier_Generator($this->config, $this->context); $this->collector->send(E_ERROR, 'message-1');
$this->context->register('Generator', $generator); $this->collector->send(E_ERROR, 'message-2');
$collector = new HTMLPurifier_ErrorCollector($this->context);
$collector->send(E_ERROR, 'message-1');
$collector->send(E_ERROR, 'message-2');
$result = array(
0 => array(null, E_ERROR, 'Message 1'), 0 => array(false, E_ERROR, 'Message 1', array()),
1 => array(null, E_ERROR, 'Message 2') 1 => array(false, E_ERROR, 'Message 2', array())
);
$this->assertIdentical($collector->getRaw(), $result); $this->assertIdentical($this->collector->getRaw(), $result);
/*
$formatted_result =
'<ul><li><strong>Error</strong>: Message 1</li>'.
'<li><strong>Error</strong>: Message 2</li></ul>';
$this->assertIdentical($collector->getHTMLFormatted($this->config), $formatted_result); $this->assertIdentical($this->collector->getHTMLFormatted($this->config), $formatted_result);
*/
}
function testContextSubstitutions() {
$language = new HTMLPurifier_LanguageMock();
$this->context->register('Locale', $language);
$generator = new HTMLPurifier_Generator($this->config, $this->context);
$this->context->register('Generator', $generator);
$current_token = false;
$this->context->register('CurrentToken', $current_token);
$collector = new HTMLPurifier_ErrorCollector($this->context);
// 0
$current_token = new HTMLPurifier_Token_Start('a', array('href' => 'http://example.com'), 32);
$language->setReturnValue('formatMessage', 'Token message', $this->language->setReturnValue('formatMessage', 'Token message',
array('message-data-token', array('CurrentToken' => $current_token)));
$collector->send(E_NOTICE, 'message-data-token'); $this->collector->send(E_NOTICE, 'message-data-token');
$current_attr = 'href';
$language->setReturnValue('formatMessage', '$CurrentAttr.Name => $CurrentAttr.Value', $this->language->setReturnValue('formatMessage', '$CurrentAttr.Name => $CurrentAttr.Value',
array('message-attr', array('CurrentToken' => $current_token)));
// 1
$collector->send(E_NOTICE, 'message-attr'); // test when context isn't available $this->collector->send(E_NOTICE, 'message-attr'); // test when context isn't available
// 2
$this->context->register('CurrentAttr', $current_attr);
$collector->send(E_NOTICE, 'message-attr'); $this->collector->send(E_NOTICE, 'message-attr');
$result = array(
0 => array(32, E_NOTICE, 'Token message'), 0 => array(32, E_NOTICE, 'Token message', array()),
1 => array(32, E_NOTICE, '$CurrentAttr.Name => $CurrentAttr.Value'), 1 => array(32, E_NOTICE, '$CurrentAttr.Name => $CurrentAttr.Value', array()),
2 => array(32, E_NOTICE, 'href => http://example.com') 2 => array(32, E_NOTICE, 'href => http://example.com', array())
);
$this->assertIdentical($collector->getRaw(), $result); $this->assertIdentical($this->collector->getRaw(), $result);
}
/*
function testNestedErrors() {
$this->language->setReturnValue('getMessage', 'Message 1', array('message-1'));
$this->language->setReturnValue('getMessage', 'Message 2', array('message-2'));
$this->language->setReturnValue('formatMessage', 'End Message', array('end-message', array(1 => 'param')));
$this->language->setReturnValue('formatMessage', ' at line 4', array('ErrorCollector: At line', array('line' => 4)));
$this->line = 4;
$this->collector->start();
$this->collector->send(E_WARNING, 'message-1');
$this->collector->send(E_NOTICE, 'message-2');
$this->collector->end(E_NOTICE, 'end-message', 'param');
$expect = array(
0 => array(4, E_NOTICE, 'End Message', array(
0 => array(4, E_WARNING, 'Message 1', array()),
1 => array(4, E_NOTICE, 'Message 2', array()),
)),
);
$result = $this->collector->getRaw();
$this->assertIdentical($result, $expect);
$formatted_expect =
'<ul><li><strong>Notice</strong>: End Message at line 4<ul>'.
'<li><strong>Warning</strong>: Message 1 at line 4</li>'.
'<li><strong>Notice</strong>: Message 2 at line 4</li></ul>'.
'</li></ul>';
$formatted_result = $this->collector->getHTMLFormatted($this->config);
$this->assertIdentical($formatted_result, $formatted_expect);
}
*/
}

@@ -0,0 +1,137 @@
<?php
class HTMLPurifier_HTMLModule_FormsTest extends HTMLPurifier_HTMLModuleHarness
{
function setUp() {
parent::setUp();
$this->config->set('HTML', 'Trusted', true);
$this->config->set('Attr', 'EnableID', true);
$this->config->set('Cache', 'DefinitionImpl', null);
}
function testBasicUse() {
$this->assertResult( // need support for label for later
'
<form action="http://somesite.com/prog/adduser" method="post">
<p>
<label>First name: </label>
<input type="text" id="firstname" /><br />
<label>Last name: </label>
<input type="text" id="lastname" /><br />
<label>email: </label>
<input type="text" id="email" /><br />
<input type="radio" name="sex" value="Male" /> Male<br />
<input type="radio" name="sex" value="Female" /> Female<br />
<input type="submit" value="Send" /> <input type="reset" />
</p>
</form>'
);
}
function testSelectOption() {
$this->assertResult('
<form action="http://somesite.com/prog/component-select" method="post">
<p>
<select multiple="multiple" size="4" name="component-select">
<option selected="selected" value="Component_1_a">Component_1</option>
<option selected="selected" value="Component_1_b">Component_2</option>
<option>Component_3</option>
<option>Component_4</option>
<option>Component_5</option>
<option>Component_6</option>
<option>Component_7</option>
</select>
<input type="submit" value="Send" /><input type="reset" />
</p>
</form>
');
}
function testSelectOptgroup() {
$this->assertResult('
<form action="http://somesite.com/prog/someprog" method="post">
<p>
<select name="ComOS">
<option selected="selected" label="none" value="none">None</option>
<optgroup label="PortMaster 3">
<option label="3.7.1" value="pm3_3.7.1">PortMaster 3 with ComOS 3.7.1</option>
<option label="3.7" value="pm3_3.7">PortMaster 3 with ComOS 3.7</option>
<option label="3.5" value="pm3_3.5">PortMaster 3 with ComOS 3.5</option>
</optgroup>
<optgroup label="PortMaster 2">
<option label="3.7" value="pm2_3.7">PortMaster 2 with ComOS 3.7</option>
<option label="3.5" value="pm2_3.5">PortMaster 2 with ComOS 3.5</option>
</optgroup>
<optgroup label="IRX">
<option label="3.7R" value="IRX_3.7R">IRX with ComOS 3.7R</option>
<option label="3.5R" value="IRX_3.5R">IRX with ComOS 3.5R</option>
</optgroup>
</select>
</p>
</form>
');
}
function testTextarea() {
$this->assertResult('
<form action="http://somesite.com/prog/text-read" method="post">
<p>
<textarea name="thetext" rows="20" cols="80">
First line of initial text.
Second line of initial text.
</textarea>
<input type="submit" value="Send" /><input type="reset" />
</p>
</form>
');
}
// label tests omitted
function testFieldset() {
$this->assertResult('
<form action="..." method="post">
<fieldset>
<legend>Personal Information</legend>
Last Name: <input name="personal_lastname" type="text" tabindex="1" />
First Name: <input name="personal_firstname" type="text" tabindex="2" />
Address: <input name="personal_address" type="text" tabindex="3" />
...more personal information...
</fieldset>
<fieldset>
<legend>Medical History</legend>
<input name="history_illness" type="checkbox" value="Smallpox" tabindex="20" />Smallpox
<input name="history_illness" type="checkbox" value="Mumps" tabindex="21" /> Mumps
<input name="history_illness" type="checkbox" value="Dizziness" tabindex="22" /> Dizziness
<input name="history_illness" type="checkbox" value="Sneezing" tabindex="23" /> Sneezing
...more medical history...
</fieldset>
<fieldset>
<legend>Current Medication</legend>
Are you currently taking any medication?
<input name="medication_now" type="radio" value="Yes" tabindex="35" />Yes
<input name="medication_now" type="radio" value="No" tabindex="35" />No
If you are currently taking medication, please indicate
it in the space below:
<textarea name="current_medication" rows="20" cols="50" tabindex="40"></textarea>
</fieldset>
</form>
');
}
function testInputTransform() {
$this->assertResult('<input type="checkbox" />', '<input type="checkbox" value="" />');
}
function testTextareaTransform() {
$this->assertResult('<textarea></textarea>', '<textarea cols="22" rows="3"></textarea>');
}
function testTextInFieldset() {
$this->assertResult('<fieldset> <legend></legend>foo</fieldset>');
}
}

@@ -0,0 +1,31 @@
<?php
class HTMLPurifier_HTMLT extends HTMLPurifier_Harness
{
protected $path;
public function __construct($path) {
$this->path = $path;
parent::__construct($path);
}
public function testHtmlt() {
$parser = new HTMLPurifier_StringHashParser();
$hash = $parser->parseFile($this->path); // assume parser normalizes to "\n"
if (isset($hash['SKIPIF'])) {
if (eval($hash['SKIPIF'])) return;
}
$this->config->set('Output', 'Newline', "\n");
if (isset($hash['INI'])) {
// there should be a more efficient way than writing another
// ini file every time... probably means building a parser for
// ini (check out the yaml implementation we saw somewhere else)
$ini_file = $this->path . '.ini';
file_put_contents($ini_file, $hash['INI']);
$this->config->loadIni($ini_file);
}
$expect = isset($hash['EXPECT']) ? $hash['EXPECT'] : $hash['HTML'];
$this->assertPurification(rtrim($hash['HTML']), rtrim($expect));
if (isset($hash['INI'])) unlink($ini_file);
}
}

@@ -0,0 +1,7 @@
--INI--
HTML.AllowedElements = b,i,p,a
HTML.AllowedAttributes = a.href,*.id
--HTML--
<p>Par.</p>
<p>Para<a href="http://google.com/">gr</a>aph</p>
Text<b>Bol<i>d</i></b>

@@ -0,0 +1,7 @@
--INI--
HTML.AllowedElements = b,i,p,a
HTML.AllowedAttributes = a.href,*.id
--HTML--
<span>Not allowed</span><a class="mef" id="foobar">Remove id too!</a>
--EXPECT--
Not allowed<a>Remove id too!</a>

@@ -0,0 +1,4 @@
--HTML--
<b>basic</b>
--EXPECT--
<b>basic</b>

@@ -0,0 +1,5 @@
--INI--
HTML.ForbiddenElements = b
HTML.ForbiddenAttributes = a@href
--HTML--
<p>foo</p>

@@ -0,0 +1,7 @@
--INI--
HTML.ForbiddenElements = b
HTML.ForbiddenAttributes = a@href
--HTML--
<b>Foo<a href="bar">bar</a></b>
--EXPECT--
Foo<a>bar</a>

@@ -0,0 +1,4 @@
--INI--
CSS.AllowedProperties = color,background-color
--HTML--
<div style="color:#f00;background-color:#ded;">red</div>

@@ -0,0 +1,6 @@
--INI--
CSS.AllowedProperties = color,background-color
--HTML--
<div style="color:#f00;border:1px solid #000">red</div>
--EXPECT--
<div style="color:#f00;">red</div>

@@ -0,0 +1,5 @@
--INI--
URI.Disable = true
--HTML--
<img src="foobar" />
--EXPECT--

@@ -0,0 +1,6 @@
--INI--
--HTML--
--EXPECT--

@@ -0,0 +1,4 @@
--HTML--
<span id="moon">foobar</span>
--EXPECT--
<span>foobar</span>

@@ -0,0 +1,5 @@
--INI--
Attr.EnableID = true
--HTML--
<span id="moon">foobar</span>
<img id="folly" src="folly.png" alt="Omigosh!" />

Some files were not shown because too many files have changed in this diff.