1
0
mirror of https://github.com/ezyang/htmlpurifier.git synced 2025-08-02 20:27:40 +02:00

Compare commits

...

1156 Commits

Author SHA1 Message Date
semantic-release-bot
523407fb06 chore(release): 4.16.0 [skip ci]
# [4.16.0](https://github.com/ezyang/htmlpurifier/compare/v4.15.0...v4.16.0) (2022-09-18)

### Features

* add semantic release ([#307](https://github.com/ezyang/htmlpurifier/issues/307)) ([db31243](db312435cb)), closes [#322](https://github.com/ezyang/htmlpurifier/issues/322) [#323](https://github.com/ezyang/htmlpurifier/issues/323) [#326](https://github.com/ezyang/htmlpurifier/issues/326) [#327](https://github.com/ezyang/htmlpurifier/issues/327) [#328](https://github.com/ezyang/htmlpurifier/issues/328) [#329](https://github.com/ezyang/htmlpurifier/issues/329) [#330](https://github.com/ezyang/htmlpurifier/issues/330) [#331](https://github.com/ezyang/htmlpurifier/issues/331) [#332](https://github.com/ezyang/htmlpurifier/issues/332) [#333](https://github.com/ezyang/htmlpurifier/issues/333) [#337](https://github.com/ezyang/htmlpurifier/issues/337) [#335](https://github.com/ezyang/htmlpurifier/issues/335) [ezyang/htmlpurifier#334](https://github.com/ezyang/htmlpurifier/issues/334) [#336](https://github.com/ezyang/htmlpurifier/issues/336) [#338](https://github.com/ezyang/htmlpurifier/issues/338)
2022-09-18 07:06:19 +00:00
Kieran
db312435cb feat: add semantic release (#307)
* Add semantic release

* fix typo

* split from matrix

* remove only on push

* remove npm plugin

* write changelog to NEWS

* list assets to include in git commit

* fix update-for-release

* lint pr title

* split release into separate workflow that runs manually

* revert ci.yml changes

* remove references to WHATSNEW

* Fix #322 - PHP 8.1 deprecation notice in HostBlacklist URIFilter (#323)

* Replace 8.1-deprecated utf8_ funcs with mbstring (#326)

* Treat PHP version numbers as strings in GitHub Actions (#327)

YAML will try to interpret numeric values as numbers, leading to `8.0` being
interpreted as `8` instead of `'8.0'`.

This doesn't result in a functional change, but cleans up the output of the
jobs a little (e.g. in the title line).

* Update to `actions/checkout@v3` (#328)

This does not introduce any functional difference and is intended as a
future-proofing change.

see https://github.com/actions/checkout/releases/tag/v3.0.0

* Fix test selection logic in tests/test_files.php (#329)

Selecting the `fstools` tests also executed the `htmlt` tests.

* Fix some more PHP 8.2 deprecations (#330)

* Define HTMLPurifier_AttrTransform_SafeParam::$wmode

This fixes a PHP 8.2 deprecation.

* Define HTMLPurifier_DefinitionCache_DecoratorHarness::$cache

This fixes a PHP 8.2 deprecation.

* Define HTMLPurifier_DefinitionCache_DecoratorHarness::$mock

This fixes a PHP 8.2 deprecation.

* Define HTMLPurifier_DefinitionCache_DecoratorHarness::$def

This fixes a PHP 8.2 deprecation.

* Define HTMLPurifier_EntityParserTest::$_entity_lookup

This fixes a PHP 8.2 deprecation.

* Increase minimum requirement to PHP 5.6 (#331)

* Add contenteditable attribute definition (#332)

* Add contenteditable attribute definition

* gate behind html.trusted

* use enum

* Fix creation of dynamic property (#333)

* Fix creation of dynamic property (#337)

* Add PHP 8.2 to CI (#335)

* Add PHP 8.2 to CI

see ezyang/htmlpurifier#334

* Add PHP 8.2 to composer.json

* Fix contenteditable attribute definition (#336)

* Run CSSTidy tests on CI (#338)

* Run CSSTidy tests on CI

* update dirname

* use compopser instead of git clone

* use composer

* use test-settings.sample.php

* enable ext-intl

* disable Net_IDNA2

* Release 4.15.0

Signed-off-by: Edward Z. Yang <ezyang@mit.edu>

Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
Co-authored-by: John Flatness <john@zerocrates.org>
Co-authored-by: Tim Düsterhus <duesterhus@woltlab.com>
Co-authored-by: Tim Düsterhus <timwolla@googlemail.com>
Co-authored-by: Edward Z. Yang <ezyang@mit.edu>
2022-09-18 02:44:00 -04:00
Edward Z. Yang
8d9f4c9ec1 Release 4.15.0
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2022-09-18 02:23:57 -04:00
Kieran
25824056ee Run CSSTidy tests on CI (#338)
* Run CSSTidy tests on CI

* update dirname

* use compopser instead of git clone

* use composer

* use test-settings.sample.php

* enable ext-intl

* disable Net_IDNA2
2022-09-14 20:55:41 -07:00
Kieran
f1d6da13bc Fix contenteditable attribute definition (#336) 2022-09-12 07:53:24 -07:00
Tim Düsterhus
dc27c78871 Add PHP 8.2 to CI (#335)
* Add PHP 8.2 to CI

see ezyang/htmlpurifier#334

* Add PHP 8.2 to composer.json
2022-09-11 19:51:02 -04:00
Kieran
ce9cf2ec99 Fix creation of dynamic property (#337) 2022-09-10 14:03:42 -04:00
Kieran
36e06603a8 Fix creation of dynamic property (#333) 2022-09-06 13:05:15 -04:00
Kieran
dbbd3e59f9 Add contenteditable attribute definition (#332)
* Add contenteditable attribute definition

* gate behind html.trusted

* use enum
2022-09-06 13:04:45 -04:00
Tim Düsterhus
1c2bae18e3 Increase minimum requirement to PHP 5.6 (#331) 2022-09-02 21:43:29 -04:00
Tim Düsterhus
1b80051115 Fix some more PHP 8.2 deprecations (#330)
* Define HTMLPurifier_AttrTransform_SafeParam::$wmode

This fixes a PHP 8.2 deprecation.

* Define HTMLPurifier_DefinitionCache_DecoratorHarness::$cache

This fixes a PHP 8.2 deprecation.

* Define HTMLPurifier_DefinitionCache_DecoratorHarness::$mock

This fixes a PHP 8.2 deprecation.

* Define HTMLPurifier_DefinitionCache_DecoratorHarness::$def

This fixes a PHP 8.2 deprecation.

* Define HTMLPurifier_EntityParserTest::$_entity_lookup

This fixes a PHP 8.2 deprecation.
2022-09-02 21:38:58 -04:00
Tim Düsterhus
c60bba1fe4 Fix test selection logic in tests/test_files.php (#329)
Selecting the `fstools` tests also executed the `htmlt` tests.
2022-09-02 21:35:32 -04:00
Tim Düsterhus
6ec13635ce Update to actions/checkout@v3 (#328)
This does not introduce any functional difference and is intended as a
future-proofing change.

see https://github.com/actions/checkout/releases/tag/v3.0.0
2022-08-30 09:50:18 -04:00
Tim Düsterhus
be2a668e81 Treat PHP version numbers as strings in GitHub Actions (#327)
YAML will try to interpret numeric values as numbers, leading to `8.0` being
interpreted as `8` instead of `'8.0'`.

This doesn't result in a functional change, but cleans up the output of the
jobs a little (e.g. in the title line).
2022-08-30 09:46:59 -04:00
John Flatness
dff4746e13 Replace 8.1-deprecated utf8_ funcs with mbstring (#326) 2022-08-15 22:59:31 -04:00
Kieran
3fc193c755 Fix #322 - PHP 8.1 deprecation notice in HostBlacklist URIFilter (#323) 2022-06-27 17:20:36 -04:00
Tim Düsterhus
1db36fb09d Fix some PHP 8.2 deprecations (#319)
* Define HTMLPurifier_Lexer::$_entity_parser property

This fixes a PHP 8.2 deprecation.

* Define HTMLPurifier_URIFilterHarness::$filter property

This fixes a PHP 8.2 deprecation.

* Define HTMLPurifier_AttrTransform_NameSync::$idDef property

This fixes a PHP 8.2 deprecation.

* Define HTMLPurifier_AttrTransform_NameSyncTest::$accumulator property

This fixes a PHP 8.2 deprecation.

* Define HTMLPurifier_AttrValidator_ErrorsTest::$language property

This fixes a PHP 8.2 deprecation.

* Define HTMLPurifier_ChildDef_List::$whitespace property

This fixes a PHP 8.2 deprecation.

* Do not modify incoming tokens in RemoveSpansWithoutAttributes

Previously the undefined property `->markForDeletion` was added to the incoming
tokens. This causes a deprecation in PHP 8.2. Fix this by storing to-be-deleted
tokens inside SplObjectStorage. In PHP 8 a WeakMap would be preferable, as that
prevents leaks if `handleEnd` is never called for the token.
2022-06-10 16:30:01 -04:00
func0der
38296c603b Composer suggestions with extensions (#317)
* Add suggestion for usage of Filter.ExtractStyleBlocks

Resolves #316

* Add php extensions as suggestions

Resolves #316

* Correct typo in composer property
2022-06-02 23:03:44 -04:00
David Rans
1dd3e52365 PHP 8.1: fix various deprecations/errors in newest version of PHP (#310)
* Test on PHP 8.1

* PHP 8.1: fix deprecated NULL param to glob()

* PHP 8.1: fix PHP error when passing NULL to rawurlencode()

* PHP 8.1: calling ctype_lower with FALSE is deprecated

* PHP 8.1: passing NULL to setAttribute() is deprecated

* PHP 8.1: passing NULL to str_replace() is an error

* PHP 8.1: fix error passing NULL to str_replace()

* PHP 8.1: fix return type deprecation with backwards compatible attribute

* Revert typo
2022-04-08 13:48:12 -04:00
Edward Z. Yang
12ab42bd6e Release 4.14.0
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2021-12-24 20:21:49 -05:00
Kieran
1c784a5c3d ci: test on php 8.0 (#308)
* PHP 8 support

* only php 8.0

* Merged to master

Co-authored-by: Edward Z. Yang <ezyang@mit.edu>
2021-12-23 21:26:25 -05:00
Kieran
41fc223f96 feat: transform deprecated width attribute (#306)
* Transform deprecated col@width attribute

* Transform deprecated table@width attribute

* reformat
2021-12-23 21:26:14 -05:00
Arkadiusz Biczewski
996eaf4331 Remove unnecessary reference assigment (#301)
* Remove unnecessary reference assigment

Proposed code is PHP5 and PHP7 compatible. PHP5 interpreted `$e->$type[$attr]` as `$e->{$type[$attr]}`, but the expected behavior based on workaround is consistent with PHP7 interpretation: `($e->$type)[$attr]`. By using curly braces `{$e->$type}[$attr]` there is a forced interpretation order working for both versions.
Details can be found on https://www.php.net/manual/en/migration70.incompatible.php (section "Changes to the handling of indirect variables, properties, and methods")

* Fix syntax

Use correct syntax for indirect variable evaluation order change.
2021-09-07 14:16:55 -04:00
Kieran
c97bb93223 Fix GH workflow conditions (#298)
* Fix GH workflow conditions

* Remove PHP 8
2021-07-26 10:16:49 -04:00
Max
288bf75acc PHP 8 Support (#297)
Co-authored-by: Maksims Sļotovs <maksims.slotovs@printful.com>
2021-07-20 09:40:50 -04:00
Kieran
3a368d7668 Switch to GitHub Actions (#293) 2021-05-21 20:46:13 -04:00
Václav Smítal
6f9aac9325 CSS: Add "background-size" tag support (#289) 2021-04-22 10:01:00 -04:00
Kieran
1354e7e8c5 Fix "Parameter must be an array or an object that implements Countable" (#285) 2021-02-27 20:42:20 -05:00
Marcus Artner
214cb8a693 Fixed Issue #264: <thead> element removed from <table> if there are no <tbody> or <tr> elements (#283) 2021-01-26 11:11:50 -05:00
Jasper Zonneveld
2512f595e0 Check PHP version before checking magic quotes (#273)
This function has been DEPRECATED as of PHP 7.4.0. Relying on this function is highly discouraged. It is basically useless as of PHP 5.4 because it will always return false, so for modern applications it can be safely removed. But as this library still supports PHP 5.2 — according to the constraints in composer.json — I added a version check to prevent this method from being called (and trigger a notice) on PHP >=7.4.

See: https://www.php.net/manual/en/function.get-magic-quotes-gpc.php
2020-09-30 20:19:10 -04:00
kishor
6aa4166b7e Issue-256: Fix PHP 7.3 compatibility issues update zend.ze1_compatibility_mode mode (#267) 2020-09-15 20:12:43 -04:00
kishor
4285590c90 issue-256: Fix PHP 7.3 compatibility issues (#266) 2020-09-15 12:38:39 -04:00
LeSuisse
15258fd24e Fix typo in the 4.13.0 NEWS: PHP 6.4 never existed (#262)
Corresponding PRs (#230, #242) are about PHP 7.4 and PHP 6.4 has never
existed 🙂.
2020-07-06 14:36:33 -04:00
Edward Z. Yang
08e27c97e4 Release 4.13.0
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2020-06-28 20:56:53 -04:00
Edward Z. Yang
d7be9d2a8c Update changelog
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2020-06-28 20:55:45 -04:00
Edward Z. Yang
ce7efc11b2 Delete language tests that are interfering with PSR-0 compatibility
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2020-06-28 20:38:16 -04:00
Mateusz Turcza
3bdc031224 Add %HTML.Forms config directive (#260)
The %HTML.Forms directive enables Forms module regardless of the %HTML.Trusted
value. This adds support for form elements without enabling other unsafe
modules, such as Scripts, Iframe or Object.

To achieve the same effect without this directive one has to explicitly list
all enabled modules in %HTML.AllowedModules, and any not listed will be
removed. This however is not very convenient, as the allowed modules may vary
between doctypes.

Resolves #213.
2020-06-28 20:26:33 -04:00
Sergei Morozov
d148edbcf1 Exclude more resources from the distribution package (#257) 2020-06-06 10:29:01 -04:00
Fräntz Miccoli
ced089434d Make purifyArray work with empty array (#245) 2020-02-22 12:12:02 -05:00
Kieran
c2c91f52d0 Added tr@bgcolor to tidy (#244) 2020-02-22 12:10:30 -05:00
Eloy Lafuente
37dd61c45f Correct implode() params for php74 compliance (#243)
Passing parameters to implode() in reverse order is deprecated, use
implode($glue, $parts) instead of implode($parts, $glue).

Part of https://tracker.moodle.org/browse/MDL-67115
2020-01-21 11:17:18 -05:00
Witold Wasiczko
d15890222b Add support for stable php 7.4 (#242) 2020-01-02 06:58:15 -05:00
Anders Jenbo
fe0452d688 Correct typehinting of maybeGet* (#240)
getDefinition can return null, this wasn't properly hinted leaning to false error detections with static analyzers
2019-12-04 10:29:08 -05:00
lubomirbartos
df923d1f15 Issue 238 remove leading zeroes except if there is only zero (#239)
* Issue 238 remove leading zeroes except if there is only zero

* Issue-238 unit test fixes
2019-11-21 10:05:07 -05:00
Jordi Boggiano
4faca32a4d Exclude language classes from autoloader optimization (#236)
These classes are autoloaded by a custom autoloader
2019-10-31 13:42:00 -04:00
Edward Z. Yang
a617e55bc6 Release 4.12.0
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2019-10-27 23:44:26 -04:00
Edward Z. Yang
3060a5606c Update changelog
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2019-10-27 23:42:45 -04:00
Edward Z. Yang
b4ec8c8036 Merge remote-tracking branch 'ezyang/master' 2019-10-27 23:40:25 -04:00
Mateusz Turcza
06b3fc4cf4 Fix phpdoc params in HTMLModule::addElement() and Bool attr (#233) 2019-10-25 10:07:38 -04:00
Witold Wasiczko
c6ca293eab Add support for PHP 7.4 (#230)
* Add php7.4

* 7.4 cannot fail

* Disallow failures
2019-09-11 20:25:44 -04:00
Mateusz Turcza
ab2887e423 Fix DOM Lexer for PHP versions older than 5.4 (#225) 2019-08-09 17:01:13 -04:00
Mateusz Turcza
029d1df5e3 Fix PHP 5.4 and 5.5 builds on Travis CI (#227) 2019-08-09 09:45:41 -04:00
Edi Modrić
b88fcd180c Replace curly braces with square brackets in string offsets (#224) 2019-07-30 22:50:43 -04:00
Edward Z. Yang
83ab08bc1a Release 4.11.0
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2019-07-14 14:58:38 -04:00
Edward Z. Yang
2739fa5462 Update changelog.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2019-07-14 14:22:14 -04:00
Sandro Miguel Marques
b91833877a Method purifyArray() updated (#143)
* Methof purifyArray() updated

Now it works with multidimensional arrays

* Add test case.

Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2019-07-14 14:10:33 -04:00
Edward Z. Yang
abba77a80b Recent PHPs default to display_error=0, override this in index.php
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2019-07-14 14:04:12 -04:00
Michael Kliewe
7cfc44654a CSS: added "initial" and "inherit" to width + height (#144)
* CSS: added "initial" and "inherit" to width + height
CSS: added "initial" and "inherit" to min-width + min-height, removed "auto"
CSS: added "initial" and "inherit" and "none" to max-width + max-height, removed "auto"

* Fixed test: min-width:auto; should be false
2019-07-14 13:20:58 -04:00
msuzuki
8c153eef3a Supported hundreds of nested HTML (#202)
* Supported hundreds of nested HTML (#201)

* Add Core.AllowParseManyTags
2019-07-14 13:15:31 -04:00
DiLong Fa
524cd08a59 Update Config.php (#211)
Fixed Undefined index: class
2019-07-14 13:11:34 -04:00
Lukas Neumann
5a90c92d83 Adds PHP 7.3 to Travis (#214)
* Adds PHP 7.3 to Travis

* Fix tests for PHP 7.3
2019-07-14 13:10:24 -04:00
Darko Hrgovic
f03e1a2c48 Fixed reserved words in constants for PHP 7 as per https://www.php.net/manual/en/reserved.other-reserved-words.php (#222) 2019-07-10 22:24:27 -04:00
Edward Z. Yang
a93250f251 Don't use @ warning suppression.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2018-11-11 18:20:33 -05:00
Edward Z. Yang
5a8e48d672 Remove php extension from release1-update script, to appease #192
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2018-11-11 17:05:51 -05:00
Edward Z. Yang
cb5a742574 Replace flush.php with a shell script, to appease #192
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2018-11-11 17:04:11 -05:00
Edward Z. Yang
ff41146439 Delete defunct release2-tag.php script.
Thanks Adham Saad <asaad@edrnet.com> for reporting.

Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2018-11-11 16:59:24 -05:00
Edward Z. Yang
aa83689188 Delete references to PHP 5.1 in INSTALL.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2018-11-11 16:56:06 -05:00
Edward Z. Yang
3d15f5253b Don't define __autoload; rely on spl_autoload_register
Fixes #196

Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2018-11-11 16:55:01 -05:00
Edward Z. Yang
21e32042e9 Update schema for case-sensitive safe scripting
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2018-11-11 16:54:33 -05:00
Edward Z. Yang
ce0ccc4bff Delete unneeded update-config.php script
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2018-11-11 16:54:33 -05:00
Chris Pelzer
ab7bbefe8a Update reference to the valid types to refer to HTMLPurifier_VarParser::types (#189) 2018-11-11 16:23:01 -05:00
Edward Z. Yang
0f7b138aaf Make SafeScripting case-sensitive.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2018-11-11 16:21:58 -05:00
Edward Z. Yang
4b6b3b31e8 Typofix: AutoForamt -> AutoFormat
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2018-11-11 16:21:58 -05:00
Dimitri Gritsajuk
5a01e6535d [SafeScripting] disable autoclosing of <script /> tag (#198) 2018-11-11 15:04:11 -05:00
Benjamin Brahmer
b74425bee5 .htaccess support apache 2.4+ (#190) 2018-11-11 14:55:13 -05:00
Oleg Kainov
39068e6d08 Update PHP version in INSTALL (#195)
* update PHP version in INSTALL

Fix #194

* update PHP version in INSTALL

Fix #194
2018-10-23 20:03:41 -04:00
Daijobou
b81690c17e More colors names (#176)
Added more colors names https://www.w3schools.com/colors/colors_names.asp

remove old unorded colors names
2018-06-09 22:48:13 -04:00
Mathias Brodala
4005ffd563 Suggest stable Composer installation (#179)
Normally people should not use the latest master but the latest stable release instead.
2018-06-09 22:44:20 -04:00
Mateusz Turcza
89b3fe431e Use IDNA constants only if defined (#171)
Fixes #168.

Solution based on https://git.ispconfig.org/ispconfig/ispconfig3/commit/0e3cf6f51b4fd.
2018-03-04 19:16:11 -05:00
Mateusz Turcza
3cb77da11d Make tagName and node data detection hhvm compatible (#170) 2018-03-04 13:22:03 -05:00
Edward Z. Yang
c1167edbf1 dummy commit
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2018-02-22 21:36:54 -05:00
Edward Z. Yang
c7b5148c4f New changelog entry.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2018-02-22 21:34:16 -05:00
Edward Z. Yang
f8c830de12 Fix SPDX identifier
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2018-02-22 21:26:02 -05:00
Edward Z. Yang
0737a6e916 Whoops, forgot to edit WHATSNEW
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2018-02-22 21:07:33 -05:00
Edward Z. Yang
d85d39da45 Release 4.10.0
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2018-02-22 20:58:20 -05:00
Edward Z. Yang
f33d1f8e99 Changelog prep for release. (#167)
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2018-02-22 20:55:01 -05:00
John Flatness
6d6d88512a Skip counting currentNesting if null
This is an error starting in PHP 7.2
2017-12-30 00:23:44 -05:00
John Flatness
bb7ad66526 Quarantine __autoload defs for PHP 7.2 compat 2017-12-30 00:23:05 -05:00
Edward Z. Yang
64baeda65c Deal with old libxml incompatibilities.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2017-12-22 22:03:02 -05:00
Jan Dageförde
67c3798922 Add relative length units from CSS 3
cf. https://www.w3schools.com/cssref/css_units.asp
2017-12-22 21:59:47 -05:00
Brad Mostert
df64746caa Fix spelling 2017-12-22 21:59:19 -05:00
Roberto
ab9c9f30fd Small typos in comments 2017-12-13 11:16:39 -05:00
Edward Z. Yang
5988f29583 Remove PHP 5.3 support.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2017-10-08 19:52:05 -04:00
Marina Glancy
ce0ede24de Use IDNA2008 for converting domains to ASCII 2017-10-03 11:19:50 -04:00
Edward Z. Yang
17f80cd74b Merge pull request #141 from pawelkania/master
Fix E_WARNING when cache directory exists
2017-06-23 22:50:48 -04:00
pawelkania
e11f7c9802 Fix E_WARNING when cache directory exists
Sometimes Serializer from another thread already creaded dir - this commit resolves this issue.
2017-06-20 09:53:14 +02:00
Edward Z. Yang
d21213e0d3 Merge pull request #139 from Edgars-Burtnieks/patch-1
Unnecessary space which gives error removed
2017-06-10 15:57:51 -04:00
Edgars-Burtnieks
9b3f856fb9 Update README.md 2017-06-10 22:36:19 +03:00
Edward Z. Yang
95e1bae318 Release 4.9.3
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2017-06-02 22:28:16 -04:00
Edward Z. Yang
ff16ed3de4 Merge pull request #137 from Xiphin/master
Fix: using null instead of false. Fixed CPU is 100% on PHP 7.1.*
2017-06-02 21:07:56 -04:00
Xiphin
1df505296f Mod: using stdClass instead of stdclass 2017-06-02 09:55:46 +08:00
Xiphin
b9bc1039da Mod: using null instead of false 2017-06-02 08:50:38 +08:00
Xiphin
cb4871f446 Fix: It runs on PHP 7.1.* CPU process is 100% 2017-06-01 21:32:25 +08:00
Edward Z. Yang
65d5cdee50 Merge pull request #130 from Izumi-kun/lexer-create-fix
Autoloading must be skipped while checking for php builtin class.
2017-03-21 17:50:26 -07:00
Viktor Khokhryakov
b45c6f5363 Autoloading must be skipped while checking for php builtin class. 2017-03-20 10:42:28 +04:00
Edward Z. Yang
6d50e5282a Release 4.9.2
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2017-03-12 23:30:53 -07:00
Edward Z. Yang
5bc7c72608 Add tests for new entity decoding codepath.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2017-03-12 20:05:09 -07:00
Edward Z. Yang
98984546d4 NEWS for 4.9.2
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2017-03-12 20:03:47 -07:00
Edward Z. Yang
c7a2f6f0df Merge pull request #129 from rybakit/patch-1
Fix a call to undefined function HTMLPurifier_Encoder()
2017-03-12 16:25:58 -07:00
Eugene Leonovich
fd24de69a3 Fix a call to undefined function HTMLPurifier_Encoder() 2017-03-12 22:44:03 +01:00
Edward Z. Yang
5688656174 Fix more PHP 5.3 problems.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2017-03-08 18:01:58 -08:00
Edward Z. Yang
d728205767 Turn on 5.3 Travis testing.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2017-03-08 17:47:14 -08:00
Edward Z. Yang
8836ae05aa Fix PHP 5.3 compatibility, fixes #125.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2017-03-08 17:46:29 -08:00
Edward Z. Yang
b90295deda Enable PHP 7.1 testing.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2017-03-08 13:38:20 -08:00
Edward Z. Yang
de82f9845f Release 4.9.1 (sic)
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2017-03-08 00:22:36 -08:00
Edward Z. Yang
9d2d75d8bc Add test case for removing empty list items.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2017-03-08 00:11:32 -08:00
Edward Z. Yang
74f123a84c Fix #83.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2017-03-07 17:52:41 -08:00
Edward Z. Yang
7e11c271b9 Revamp entity decoding to be more like HTML5.
See %Core.LegacyEntityDecoder for more details.

Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2017-03-07 17:34:59 -08:00
Edward Z. Yang
66bbae73a9 Comment on why it's a non-greedy match.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2017-03-06 23:27:30 -08:00
Edward Z. Yang
5886326cd0 Test for catastrophic backtracking.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2017-03-06 23:26:55 -08:00
Edward Z. Yang
564af61809 Usage/includes update.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2017-03-06 23:06:56 -08:00
Edward Z. Yang
b19dcb0ba5 CHANGELOG for #120 fix, and remove the array_filter.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2017-03-06 23:06:24 -08:00
Edward Z. Yang
586abc63e4 CHANGELOG for rgba/hsl/hsla patch.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2017-03-06 23:03:33 -08:00
Edward Z. Yang
5b6a3f55bf Merge pull request #121 from breathbath/master
Fixing PREG_BACKTRACK_LIMIT_ERROR in HTMLPurifier_Filter_ExtractStyle…
2017-03-06 23:01:34 -08:00
Edward Z. Yang
0c31b22240 Merge pull request #118 from fxbt/master
Add hsl, hsla and rgba support for css color attribute definition
2017-03-06 23:01:06 -08:00
Edward Z. Yang
5662efc936 Fix #78.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2017-03-06 22:54:54 -08:00
Edward Z. Yang
353c96f156 Document skips in more detail, #116.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2017-03-06 20:31:28 -08:00
Edward Z. Yang
4047a6230b Extra cleanup on cleanUTF8.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2017-03-06 16:31:02 -08:00
Andrey Pozolotin
9195cb7a2e Added escape sequense 2017-03-06 16:28:53 -08:00
Andrey Pozolotin
39c4c359ad Fixing PREG_BACKTRACK_LIMIT_ERROR in HTMLPurifier_Filter_ExtractStyleBlocks 2017-03-06 16:28:53 -08:00
Edward Z. Yang
bb3f86e80a Merge pull request #123 from mpyw-forks/fix/#122/surrogate-pair-range
Fix surrogate pair range
2017-03-03 23:13:30 -08:00
mpyw
d16e73e63e Add test for #122 2017-03-04 15:40:44 +09:00
mpyw
f145f64bf4 Fix #122: correct surrogate pair range 2017-03-04 15:38:01 +09:00
Andrey Pozolotin
5fdec87fe9 Added escape sequense 2017-03-01 17:52:00 +01:00
Andrey Pozolotin
4462559459 Fixing PREG_BACKTRACK_LIMIT_ERROR in HTMLPurifier_Filter_ExtractStyleBlocks 2017-03-01 17:46:03 +01:00
f.godfrin
12185143ef Use a constructor and a property for the alpha check 2017-02-10 21:03:11 +01:00
f.godfrin
17a90a951a Better regex for mungeRgb 2017-02-10 00:40:56 +01:00
f.godfrin
0bab4b9fd0 Fix mungeRgb to handle percent, float and hsl values 2017-02-10 00:38:05 +01:00
f.godfrin
bd92f3531b Remove double % 2017-02-09 23:37:36 +01:00
f.godfrin
0d5ab2fe13 Include hsl and hsla support 2017-02-09 23:34:19 +01:00
f.godfrin
d41a59e422 Add rgba support for css color attribute definition 2017-02-09 22:18:15 +01:00
Bastian Hofmann
8e4cacf0a7 Refactor HTML.Noopener to HTML.TargetNoopener so that it behaves like HTML.TargetNoreferrer and is active by default if a target is set 2017-02-03 16:54:51 -08:00
Bastian Hofmann
c82051c3e1 Add HTML.Noopener to add a noopener rel to every external link
This has performance benefits https://jakearchibald.com/2016/performance-benefits-of-rel-noopener/ but most importantly also security benefits https://mathiasbynens.github.io/rel-noopener/

Adresses https://github.com/ezyang/htmlpurifier/issues/96
2017-02-03 16:54:51 -08:00
Edward Z. Yang
d4a96463ef export-ignore .travis.yml
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2017-01-19 09:28:40 -08:00
Edward Z. Yang
1b7d684d07 Remove $a = array($a) which is miscompiled by Zend OpCache.
Fixes #108.

Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2017-01-04 14:35:52 -05:00
Edward Z. Yang
5070404376 Handle semicolons in strings in CSS correctly.
Fixes http://htmlpurifier.org/phorum/read.php?3,7522,8096

Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2016-10-29 00:01:19 -07:00
Edward Z. Yang
cef27f750d Add missing changelog entries.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2016-10-27 17:31:10 -07:00
Edward Z. Yang
59463c5c39 Allow %URI.DefaultScheme to be null.
Fixes #103.

Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2016-10-27 17:30:44 -07:00
Edward Z. Yang
d19d648a26 [ci skip] Add a Travis build badge.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2016-10-27 02:02:29 -07:00
Edward Z. Yang
20b40a5441 Travis support.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2016-10-27 02:00:47 -07:00
Edward Z. Yang
34d252cbbc Update usage.xml.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2016-10-27 02:00:47 -07:00
Edward Z. Yang
8b28e571fe Handle case when IDNAs are supported.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2016-10-27 02:00:46 -07:00
Edward Z. Yang
3ae21ce511 PHP 7.0 warnings fix: don't pass rvalue by reference.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2016-10-27 02:00:46 -07:00
Edward Z. Yang
3ba9133b21 Don't assume that idn_to_ascii does validation.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2016-10-27 02:00:46 -07:00
Edward Z. Yang
dc8702160c Merge pull request #101 from yankos/hotfix/directory_not_close
FIX directory not closing
2016-10-15 23:14:10 -07:00
yan_kos
4dc68aa920 FIX directory not closing
#100
2016-10-15 16:20:47 +03:00
Edward Z. Yang
08eee90e15 Delete asserts, fixes #97.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2016-10-02 00:14:41 -07:00
Edward Z. Yang
1ef4375dbb Proposed fix to Serializer code.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2016-09-05 15:24:08 -07:00
Edward Z. Yang
6a221a3045 Merge pull request #94 from zobzn/css-min-max-width
css definition (min-width, max-width, min-height, max-height)
2016-09-05 14:57:44 -07:00
zema
246fc8946a css properties: min-width, max-width, min-height, max-height 2016-09-05 10:45:58 +03:00
Edward Z. Yang
1ce2fde400 Merge pull request #91 from apsdsm/fix-permissions-bug
changed chmod behaviour in Serializer
2016-07-29 03:25:41 -07:00
Nick del Pozo
1f982d279f rollback change to permissions 2016-07-29 08:56:36 +09:00
Nick del Pozo
8be8cee9b3 changed chmod behaviour in Serializer 2016-07-27 12:56:03 +09:00
Edward Z. Yang
d0c392f77d Release 4.8.0
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2016-07-16 05:58:58 -07:00
Edward Z. Yang
d1c5d75027 Fix #73 with Attr.ID.HTML5
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2016-07-16 05:52:45 -07:00
Bart Butler
3747cb7efb avoid exif_imagetype exception with small files/corrupt data URI 2016-07-16 05:23:17 -07:00
Edward Z. Yang
0166c3728b Stop trying to chmod if SerializerPermissions is null, fixes #71
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2016-07-01 16:04:11 -04:00
Edward Z. Yang
ed180f595d Hack to fix #85
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2016-07-01 15:52:09 -04:00
Edward Z. Yang
3e4deabbb3 New smoketest for testing configuration HTML form.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2016-07-01 15:50:51 -04:00
Edward Z. Yang
44baee6a82 Partial border-radius support.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2016-06-30 22:22:13 -04:00
Cameron Ball
1675fc7caf Add %HTML.TargetNoreferrer, which adds rel="noreferrer" when target attribute is set
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2016-06-30 21:53:43 -04:00
Wes Cossick
cc35c8eb8c tel protocol support. 2016-06-30 21:19:49 -04:00
Edward Z. Yang
a11aeab4a6 Don't suggest 777, only 775.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2016-03-27 15:59:10 -07:00
Edward Z. Yang
43a9f052fd Fix #57, make flashvars check (and others) case-insensitive.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2016-03-27 15:56:30 -07:00
Edward Z. Yang
b4981c3395 Fix #67, don't use <body> tags in comments for %Core.ConvertDocumentToFragment
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2016-03-27 15:19:32 -07:00
Edward Z. Yang
f14076dc3e Fix #49; prevent readdir infinite loop when cache directory not listable.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2016-03-27 14:53:31 -07:00
Edward Z. Yang
91fd55c857 Fix #45, errors when ul/ol allowed without li.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2016-03-26 22:41:54 -07:00
Rodrigo Prado
096064dd0a Added more info in README 2016-03-24 20:32:54 -07:00
Mike Zukowsky
845edf16e2 Docblock update 2016-03-24 20:26:41 -07:00
Roman Kovalenko
2c4f889ca4 Remove BOM from file INSTALL.fr.utf8 It's only one file with BOM among project 2016-03-24 20:25:58 -07:00
Stefano Torresi
b3856d2040 Export maintenance and path2class scripts in composer.
These scripts could be used in continuously integrated environments
(e.g. `generate-standalone.php`).
2016-03-24 20:24:18 -07:00
Chimpzee
6e00b443cd Bug with tempnam("/tmp", "");
Some hostings have a different temporary path than "/tmp".
2016-03-24 20:19:57 -07:00
Edward Z. Yang
7e49ff3dcd Announce PHP 7 support.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2016-03-24 00:14:05 -07:00
Edward Z. Yang
1f3e282fde Fix a bounds error which now errors in PHP 7.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2016-03-24 00:13:08 -07:00
Edward Z. Yang
753c830239 Update to work with Git version of SimpleTest.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2016-03-24 00:08:03 -07:00
Edward Z. Yang
72123e23c9 Update ExtractStyleBlocks tests for modern CSSTidy at https://github.com/Cerdic/CSSTidy
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2016-03-23 23:39:38 -07:00
Edward Z. Yang
45161b4fb1 Accept leading digits in hostnames as per RFC 1123.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2016-03-23 22:42:21 -07:00
Synchro
25db9e1dd0 Don't use PHP4-style constructors 2016-03-16 17:09:41 -07:00
Edward Z. Yang
92aabf2b23 Fix #76, linkify includes dots at end of URL.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2016-03-02 02:05:54 -08:00
Edward Z. Yang
aebe1c02a2 Use idn_to_ascii when available.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2016-03-02 01:35:07 -08:00
Edward Z. Yang
913ac6955b CSS.AllowDuplicates for duplicate properties.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2015-12-20 11:53:54 -08:00
Edward Z. Yang
958ba65595 Don't truncate alts.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2015-09-29 15:36:53 -07:00
Edward Z. Yang
ae1828d955 Release 4.7.0.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2015-08-04 18:03:42 -07:00
Edward Z. Yang
e34a858ca9 Merge pull request #60 from sylfabre/patch-1
Missing @return
2015-08-03 10:36:45 -07:00
Sylvain
2c963dcc7f Missing @return
Adding PHPDoc @return statement for code completion in IDE
2015-08-03 10:21:47 +02:00
Edward Z. Yang
bfbf8a9da1 Revert "Fix autoloading in Composer."
This reverts commit 04cf6c8739.
2015-06-14 10:57:52 -07:00
Timothée Barray
04cf6c8739 Fix autoloading in Composer.
Per https://getcomposer.org/doc/04-schema.md#psr-0
2015-06-06 20:04:21 -07:00
Edward Z. Yang
0d7328dbb2 s/Include/Inclure/
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2015-05-05 13:43:49 -07:00
anthonybocci
7aeedd9071 Updated translation of installing in french 2015-05-05 10:50:42 -07:00
Edward Z. Yang
c67e4c2f7e All values, including empty, are valid HTML bools.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2015-02-11 16:36:44 -08:00
Edward Z. Yang
0c3e68dd03 Stop using umask to make definition cache. Fixes #32
This is not really the right way to solve the ACL problem,
but there isn't really any reason we should be mucking about
with the umask.

Mucked around with the test case to make it pass, but I think
it's probably a bit delicate now.

Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2014-12-08 18:30:54 -08:00
Jon Dufresne
b307f3d9ef Update gitattributes to produce slimmer packages for composer 2014-10-23 15:36:02 -07:00
Edward Z. Yang
cd60294ada Fix rgb in border attribute with spaces, fixes #30.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2014-08-31 12:12:38 +01:00
Edward Z. Yang
39d3df1fd7 Add AutoFormat.RemoveEmpty.Predicate, fixes #35.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2014-08-31 12:12:17 +01:00
Edward Z. Yang
b8704535a3 Update test.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2014-08-31 11:10:11 +01:00
Edward Z. Yang
4da38aca80 Update YouTube embed code to new style, fixes #28
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2014-08-31 09:30:16 +01:00
Edward Z. Yang
bf84df4f7d Move opacity to tricky. Fixes #16.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2014-08-31 09:24:11 +01:00
Edward Z. Yang
15d1a3003a Don't truncate in DOMLex when seeing closing div
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2014-08-31 08:50:33 +01:00
Edward Z. Yang
80ebd4322e Typo in docs, thanks Soleil Golden for reporting.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2014-02-04 12:17:24 -08:00
Edward Z. Yang
18b8a0e44a Make Composer work with PHP 5.2 and earlier. Reported by @voku
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2013-12-08 15:51:56 -08:00
Edward Z. Yang
6f389f0f25 Release 4.6.0.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2013-11-30 00:25:19 -08:00
Edward Z. Yang
8cd08620dc Conditionalize hash_hmac tests for 5.0
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2013-11-29 22:27:01 -08:00
Edward Z. Yang
0beecad78a Add Twitter handle to release notes.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2013-11-29 22:26:57 -08:00
Edward Z. Yang
54477c172b Fix infinite loop in Lexer.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2013-10-27 21:41:08 -07:00
Edward Z. Yang
e52d1fe310 Fix < PHP 5.4 compatibility break. Thanks GromNaN for submitting the patch.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2013-10-22 14:17:41 -07:00
Edward Z. Yang
0767bbc12d Rewrite FixNesting implementation to be tree-based.
This mega-patch rips out the FixNesting implementation and the related
ChildDef components.  The primary algorithmic change is to convert from
use of tokens to tree nodes, which are far more amenable to the style
of processing that FixNesting uses.  Additionally, FixNesting has been
changed to go bottom-up rather than top-down, in order to avoid needing
to implement backtracking.

This patch simplifies a good deal of the relevant logic, since we no
longer need to continually recalculate the nesting structure when
processing things.  However, the conversion to the alternate format
incurs some overhead, so for small inputs these changes are not a win.
One possibility to greatly reduce the constant factors here is to switch
to entirely using libxml's representation, and never serializing tokens;
this would require one to rewrite injectors, however.

The iterative post-order traversal in FixNesting is a bit subtle, but
we have essentially reified the stack and continuations.

We've removed support for %Core.EscapeInvalidChildren.

Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2013-10-20 22:37:01 -07:00
Edward Z. Yang
b3640e1af6 Add conversion functions for our own tree format.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2013-10-20 15:05:11 -07:00
Edward Z. Yang
be5769804a Make the Token class abstract.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2013-10-17 16:13:04 -07:00
Edward Z. Yang
d6fbd7df22 Remove some unnecessary pass-by-reference.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2013-10-16 18:55:23 -07:00
Edward Z. Yang
804a06f01e Remove PHP 4 compatibility hack.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2013-10-16 18:36:44 -07:00
Edward Z. Yang
8f401f769e Use a Zipper to process MakeWellFormed, removing quadratic behavior.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2013-10-13 13:21:02 -07:00
Edward Z. Yang
82bcc62058 Properly handle context variables that are NULL.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2013-10-13 13:21:02 -07:00
Edward Z. Yang
f17490f009 Implementation of a Zipper, for efficient splice.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2013-10-13 01:16:32 -07:00
Edward Z. Yang
a5fc37d8c3 Improve gitignore.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2013-10-13 00:18:11 -07:00
Edward Z. Yang
412bae13b5 Fix quadratic behavior in DOMLex due to array_shift.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2013-09-17 00:48:42 -07:00
Edward Z. Yang
cf44f399f8 Properly use HMAC for secure munging.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2013-09-13 21:16:50 -07:00
Marcus Bointon
fac747bdbd PSR-2 reformatting PHPDoc corrections
With minor corrections.

Signed-off-by: Marcus Bointon <marcus@synchromedia.co.uk>
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2013-08-17 22:27:26 -04:00
Edward Z. Yang
19eee14899 Tighten up invariants.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2013-07-26 21:54:53 -07:00
Edward Z. Yang
25d49f4ec0 Explicitly specify decorator name.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2013-07-26 21:37:33 -07:00
Edward Z. Yang
53c2907706 New directive %Core.AllowHostnameUnderscore
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2013-07-26 21:33:39 -07:00
Edward Z. Yang
af7107e830 Add note fall through is intentional.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2013-07-18 10:22:45 -07:00
Marcus Bointon
107b3055a1 Fix var name conflict in loadArray 2013-07-16 21:56:29 -07:00
Synchro
29a3c70370 A bunch of PHPdoc and php codesniffer corrections - no functional code changes 2013-07-16 21:53:17 -07:00
Edward Z. Yang
75bd7abcc7 Make list nesting test more sensitive.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2013-06-06 13:08:13 -07:00
Edward Z. Yang
0680832d41 Use info_parent_def to get parent information, since it may not be present in info array.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2013-05-21 17:19:59 -07:00
Edward Z. Yang
19360ddb36 Ignore commas and nbsps for linkification. Thanks nAS for contributing.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2013-05-21 16:43:59 -07:00
Edward Z. Yang
3c903b7463 Doc fix.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2013-05-18 08:48:47 -07:00
Edward Z. Yang
6e37ecd1c8 Make URI parsing algorithm more strict.
Thanks Michael Gusev <mgusev@sugarcrm.com> for contributing this patch.

Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2013-04-16 13:56:43 -07:00
Edward Z. Yang
20eff0a3a0 Fix NEWS entry.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2013-02-21 14:08:36 -08:00
Edward Z. Yang
d516e2f8de Release 4.5.0
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2013-02-17 16:04:08 -08:00
Edward Z. Yang
631021733b Add %Core.DisableExcludes directive
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2013-02-17 15:47:38 -08:00
Michael Tibben
344e0640b6 Add required constant for composer autoloading
Signed-off-by: Michael Tibben <michael.tibben@99designs.com>
2012-12-21 16:16:16 +08:00
Edward Z. Yang
62d2550e16 Use SHA-1 instead of MD5.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2012-10-27 02:33:22 -07:00
Edward Z. Yang
087145a71b Blacklist more tags from RemoveEmpty.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2012-10-27 02:32:48 -07:00
Edward Z. Yang
a44187a5c1 Cleanup after data validation.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2012-10-27 02:30:58 -07:00
Edward Z. Yang
c0ad68108a Do checks against iconvAvailable because PHP 5.4 has botched iconv support.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2012-10-27 02:27:57 -07:00
Edward Z. Yang
83a574491e Comment for bug that needs to get fixed.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2012-10-11 11:40:02 -07:00
Edward Z. Yang
3b537365a4 CSS properties page-break-*
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2012-10-11 11:39:52 -07:00
Rob Loach
8a8b123d33 Autoloading support for Composer 2012-09-16 18:11:46 +02:00
Edward Z. Yang
72db575446 Fix bug with non-lower case color names in HTML.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2012-07-30 10:54:32 -04:00
Edward Z. Yang
d8bb73ce46 Permit underscores in font-families.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2012-07-27 18:28:29 -04:00
Edward Z. Yang
f90372f8ab More support for white-space.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2012-06-16 17:10:36 -04:00
Edward Z. Yang
f38fca32a9 Don't lower-case components of background.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2012-06-02 11:22:58 -04:00
Edward Z. Yang
5a23004652 Support for inline-block.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2012-05-25 23:55:48 -04:00
Edward Z. Yang
6705140082 Fix in AttrTransform_Nofollow
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2012-05-14 23:07:27 -04:00
Edward Z. Yang
cb7162a995 Use prepend for autoloading on PHP 5.3+
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2012-05-02 11:07:24 -04:00
Edward Z. Yang
2189a9430f Support for safe external scripts via explicit whitelist.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2012-04-27 17:44:49 -04:00
Edward Z. Yang
7291f19347 Fix problem where stacked AttrTransforms clobber each other.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2012-03-16 23:12:16 -04:00
Benjamin Steininger
9fcffd6533 Add composer.json file for easy install via composer.
Composer: http://getcomposer.org/

Since HTML Purifier is not completely psr-0 compatible (a classmap is
not enough for autoloading), the package-description does not contain
anything autoload-related. The user has to include the autoloader
himself.

This lets us create an entry on packagist which allows installing HTML
Purifier without the need to declare a repository in projects; it also
makes it easy to create libraries which want to use HTML Purifier using
composer.

Signed-off-by: Benjamin Steininger <robo47@robo47.net>
2012-03-16 01:05:02 -04:00
Edward Z. Yang
31dce298ea Actually make URI.DisableResources do something.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2012-03-02 13:25:00 -05:00
Edward Z. Yang
8c9d461a62 Bugfix: _blank not blank.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2012-02-18 11:28:01 -05:00
Edward Z. Yang
7291a9647e Update NEWS.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2012-01-25 07:06:30 -05:00
Edward Z. Yang
17af0e4fc1 Release 4.4.0
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2012-01-18 19:22:31 -05:00
Edward Z. Yang
70028f83d6 Make all of the tests work on all PHP versions.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2012-01-18 18:57:13 -05:00
Edward Z. Yang
5c5e3fe79f Avoid doing stupidly clever reflection tricks that make old PHP versions sad.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2012-01-18 18:21:36 -05:00
Edward Z. Yang
56a26cab14 Modernize some of the testing facilities.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2012-01-18 18:10:16 -05:00
Edward Z. Yang
1c7fedff5a Tighter CSS selector validation.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2012-01-17 15:36:26 -05:00
Edward Z. Yang
9de0785448 Remark about bypassing host list with punycode.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2012-01-06 05:32:53 -08:00
Edward Z. Yang
974fe3f25e Optional support for IDNAs with PEAR Net_IDNA2
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2012-01-06 05:28:00 -08:00
Edward Z. Yang
94468f3c24 Remove PEARSax3 lexer.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2012-01-03 20:40:17 +08:00
Edward Z. Yang
e0354fecd9 Make forms work for transitional doctypes.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-12-30 22:56:44 +08:00
Edward Z. Yang
1bbbc624dd Remove inscrutable TODO, optionalize another.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-12-27 23:50:02 +08:00
Edward Z. Yang
49879d2cc6 Add note about superseding modules in TODO.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-12-27 23:21:32 +08:00
Edward Z. Yang
5c9b5130c8 Bump minor version number to 4.4.0.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-12-26 21:55:14 +08:00
Edward Z. Yang
d2de8d976a Add test for invalid SafeIframe usage.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-12-26 21:52:55 +08:00
Bradley M. Froehle
4164b2eb2b Implement Iframe module, and provide %HTML.SafeIframe and %URI.SafeIframeRegexp for untrusted usage.
The purpose of this addition is twofold. In trusted mode, iframes are
now unconditionally allowed.

However, many online video providers (YouTube, Vimeo) and other web
applications (Google Maps, Google Calendar, etc) provide embed code in
iframe format, which is useful functionality in untrusted mode.
You can specify iframes as trusted elements with %HTML.SafeIframe;
however, you need to additionally specify a whitelist mechanism such as
%URI.SafeIframeRegexp to say what iframe embeds are OK (by default
everything is rejected).

Note: As iframes are invalid in strict doctypes, you will not be able to
use them there.

We also added an always_load parameter to URIFilters in order to support
the strange nature of the SafeIframe URIFilter (it always needs to be
loaded, due to the inability of accessing the %HTML.SafeIframe directive
to see if it's needed!)  We expect this URIFilter can expand in the future
to offer more complex validation mechanisms.

Signed-off-by: Bradley M. Froehle <brad.froehle@gmail.com>
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-12-26 21:50:53 +08:00
Edward Z. Yang
1e5293d9fe Add more attributions.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-12-26 15:45:41 +08:00
Edward Z. Yang
6b643ede02 Implement %HTML.AllowedComments and %HTML.AllowedCommentsRegexp
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-12-26 15:34:42 +08:00
Edward Z. Yang
e41af46a8b Fix broken table content model, easily seen in XHTML1.1
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-12-26 14:49:26 +08:00
Edward Z. Yang
3570c9985a Properly handle nested sublists by folding into previous list item.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-12-26 14:00:34 +08:00
Edward Z. Yang
8d572993b4 Implement %HTML.TargetBlank
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-12-26 08:36:00 +08:00
Edward Z. Yang
1bacbc0563 Add isBenign and getDefaultScheme methods.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-12-25 23:31:15 +08:00
Edward Z. Yang
bfe2c10d07 Add a little bit of documentation about contexts for URIFilters.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-12-25 23:31:15 +08:00
Edward Z. Yang
9b10515fa4 Core.EscapeNonASCIICharacters now always works, even if target is UTF-8.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-12-25 23:31:15 +08:00
Edward Z. Yang
1255d0f15d Add support for scope attribute on td and th.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-12-25 23:31:13 +08:00
Edward Z. Yang
d45e11cc6b Add one more test for SPL autoload defaults.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-12-25 02:58:51 -05:00
Edward Z. Yang
94c15d1f56 Fix iconv truncation bug.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-12-25 02:31:06 -05:00
Edward Z. Yang
ce68cfe484 Remove spurious abstract definition; PHP 5.4 doesn't like that.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-12-18 13:28:07 -05:00
Edward Z. Yang
9f5f85952b Don't unset parser variable; plays poorly with serialize.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-12-18 13:27:51 -05:00
Edward Z. Yang
dbb365155b Typofix.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-08-24 09:56:51 -04:00
Edward Z. Yang
32c0ffde0c Don't add nofollow for matching hosts, generalize this code.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-08-24 09:56:49 -04:00
Edward Z. Yang
856a5e5b89 Update INSTALL to avoid missing config snafu, update usage.xml.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-08-24 09:56:21 -04:00
Edward Z. Yang
820d6e9097 Do not duplicate nofollow attribute in transform.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-08-24 09:56:13 -04:00
Edward Z. Yang
35b1fbce01 Explicitly initialize anonModule to null.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-04-19 22:46:17 +01:00
Edward Z. Yang
bcfbb8338c URI.Munge munges https to http URIs.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-04-10 13:09:24 +01:00
Edward Z. Yang
f51a6f7de9 Color keywords now case-insensitive.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-04-10 12:45:02 +01:00
Edward Z. Yang
f1439f0af5 Release 4.3.0
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-03-27 23:02:49 +01:00
Edward Z. Yang
0124605918 Fix CSS URL innerHTML/cssText escaping bug.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-03-27 21:24:32 +01:00
Edward Z. Yang
afb007d22f Protect against font family innerHTML/cssText attacks.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-03-27 20:35:43 +01:00
Edward Z. Yang
0dd9e4faf4 Fix Internet Explorer innerHTML bug.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-03-27 11:50:52 +01:00
Edward Z. Yang
94ed3b1231 Implement CSS.AllowedFonts.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-03-24 22:54:39 +00:00
Edward Z. Yang
6a6c0ed5d7 Don't autoclose if no parents support the tag.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-03-22 00:26:41 +00:00
Edward Z. Yang
e05b555448 Safety update for nested ul test.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-03-21 21:05:23 +00:00
Edward Z. Yang
ee9c70ab7f Fix E_NOTICE from indexing into empty string.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-03-17 17:33:11 +00:00
Edward Z. Yang
b4469f17aa Fix missing numeric entities (shows up when DirectLexing).
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-02-27 11:58:37 +00:00
Edward Z. Yang
e76f4b45d0 Dramatically rewrite null host URI handling.
Basically, browsers don't parse what should be valid URIs correctly, so
we have to go through some backbends to accomodate them.  Specifically,
for browseable URIs, the following URIs have unintended behavior:

    - ///example.com
    - http:/example.com
    - http:///example.com

Furthermore, if the path begins with //, modifying these URLs must
be done with care, as if you remove the host-name component, the
parse tree changes.

I've modified the engine to follow correct URI semantics as much
as possible while outputting browser compatible code, and invalidate
the URI in cases where we can't deal.  There has been a refactoring
of URIScheme so that this important check is always performed,
introducing a new member variable allow_empty_host which is true
on data, file, mailto and news schemes.

This also fixes bypass bugs on URI.Munge.

Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-01-25 18:56:46 +00:00
Edward Z. Yang
a32d5b52e1 Fix embedding flash on non-IE browsers and allow more wmode.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-01-22 12:28:57 +00:00
Maxim Krizhanovsky
a3d71fe606 Iterative traversal of DOM.
There are some deep DOMs you can hit the maximum nesting level
limit in tokenizeDOM (we've experienced this even with maximum nesting
level of 300). Here is an iterative version of the same function with
simple queue/dequeue approach.

Signed-off-by: Maxim Krizhanovsky <darhazer@gmail.com>
2011-01-19 22:06:40 +00:00
Edward Z. Yang
77982bd61d Bump version number for Cache.SerializerPermissions.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-01-14 00:40:39 +00:00
Petr Skoda
78c4e62245 Add new Cache.SerializerPermissions option. 2011-01-13 22:57:40 +00:00
Edward Z. Yang
5803c06765 Check that argv is set before operating on it.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-01-13 22:42:47 +00:00
Edward Z. Yang
b63569ac22 Fix bad interaction between bootstrap autoloader and Zend Debugger/APC.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-12-31 09:48:28 +00:00
Edward Z. Yang
f3d050c517 Fix two bugs with caching of customized raw definitions.
The first bug is that we will repeatedly write out the result
of a customized raw definition to the filesystem, even when a cache
entry already exists.

The second bug is that caching these definitions doesn't actually
work (the cache entry is written but never used.)  A new API
for retrieving raw definitions permits the user to take advantage
of caching.

Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-12-30 23:51:53 +00:00
Edward Z. Yang
6dcc37cb55 Update PHPT instructions.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-11-21 14:00:20 +00:00
Edward Z. Yang
cfc4ee1faf Add initial implementation of CSS.Trusted.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-11-12 18:45:03 +00:00
Edward Z. Yang
598c5b60c9 Add sanity check against ze1_compatibility_mode.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-11-12 16:15:03 +00:00
Edward Z. Yang
c9e7ffc172 Fix incorrect PEARSax3 test assertion.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-11-12 16:06:34 +00:00
Edward Z. Yang
feeffe6ed2 Check if schema.ser was corrupted.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-10-29 14:47:40 +01:00
Edward Z. Yang
4754d407aa Fix removal of id with DirectLex by preserving armor.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-10-28 17:25:31 +01:00
Nick Pope
0b9db1f54b Allow non-static autoload methods w/ PHP >= 5.2.11
HTML Purifier loads itself as the first autoload function by
unregistering all existing functions and re-registering them after
registering itself.

Originally an exception was thrown when a non-static object method was
encountered as the behaviour of spl_autoload_functions() did not return
the object instance, but only the class name.  This was filed on PHP
bugs (#44144).

The bug was fixed for PHP >= 5.2.11 and >= 5.3

Signed-off-by: Nick Pope <nick@nickpope.me.uk>
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-10-28 17:25:17 +01:00
Edward Z. Yang
1d4a38d055 Escape CDATA before handling conditional comments.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-09-28 12:11:26 -04:00
Edward Z. Yang
8c80349f9d Implement HTML.Nofollow for external links.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-09-28 12:01:57 -04:00
Edward Z. Yang
d848c99b74 Make IE conditional comment matching ungreedy.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-09-28 10:22:38 -04:00
Edward Z. Yang
882ffed9ba Release 4.2.0.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-09-15 02:52:57 -04:00
Edward Z. Yang
86990a21f1 Rename newline normalization directive to something better.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-09-15 02:50:39 -04:00
Tomasz Muras
9573f0933d Make newline normalization optional. 2010-09-14 23:49:28 -04:00
Edward Z. Yang
632bf2bbd4 Shift to 4.2.0 release cycle.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-09-14 23:38:51 -04:00
Edward Z. Yang
ec86598446 Add support for file:// URI scheme.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-09-09 00:01:26 -04:00
Edward Z. Yang
b6c3f5e89b Update TODO.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-09-08 23:42:05 -04:00
Edward Z. Yang
7c91104532 Implement HTML.FlashAllowFullScreen.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-09-08 23:39:20 -04:00
Edward Z. Yang
eac628f490 Add %CSS.ForbiddenProperties directive.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-09-04 02:59:03 -04:00
Edward Z. Yang
92913bc816 Add documentation about configuration directive types.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-09-04 02:28:53 -04:00
Edward Z. Yang
479d793562 Reword documentation to be clearer, and give warning on common user error.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-09-04 01:31:20 -04:00
Edward Z. Yang
e2c15f1c98 Fix Mac Snow Leopard APC bug.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-08-26 21:40:58 -07:00
Edward Z. Yang
57ced3f361 Tighten up ignore spec.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-06-30 06:00:45 -07:00
Edward Z. Yang
c04a441b3e Actually make URI.DisableResources do something.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-06-30 05:59:17 -07:00
Edward Z. Yang
1bed8b6d5f Added %Core.RemoveProcessingInstructions.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-06-20 18:26:44 -07:00
Edward Z. Yang
33afd7d9e0 Fix improper handling of IE conditional comments.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-06-18 06:08:54 -07:00
Edward Z. Yang
18e538317a Release 4.1.1.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-05-31 20:17:31 -07:00
Edward Z. Yang
96a4193fc9 Fix undefined index warnings in maintenance scripts.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-05-31 20:07:27 -07:00
Edward Z. Yang
00c66fa9cb Fix bug in parsing single attribute with entities.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-05-31 19:44:18 -07:00
Edward Z. Yang
d3abcb90e3 Rewrite CSS url() and font-family output logic.
The new logic is as follows:

* Given a URL to insert into url(), check that it is properly URL
  encoded (in particular, a doublequote and backslash never occurs
  within it) and then place it as url("http://example.com").

* Given a font name, if it is strictly alphanumeric, it is safe to omit
  quotes. Otherwise, wrap in double quotes and replace '"' with '\22 '
  (note trailing space) and '\' with '\5C ' (ditto).

We introduce expandCSSEscape() which is a hack for common parsing
idioms in CSS; this means that CSS escapes are now recognized inside
URLs as well as unquoted font names.

Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-05-31 18:45:21 -07:00
Edward Z. Yang
df3100b1b3 Make test script less chatty when log_errors is on.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-05-20 21:50:44 -04:00
Edward Z. Yang
143e1ad718 Remove shebang and +x from test script.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-05-20 21:21:26 -04:00
Edward Z. Yang
875b0febde Fix infinite loop involving wrapping formedness.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-05-17 23:22:51 -04:00
Edward Z. Yang
3166b8a10f Fix bug in background-position with center keyword.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-05-05 15:08:57 -04:00
Edward Z. Yang
1a70bffd5a Emit errors when body is extracted.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-05-04 13:41:09 -04:00
Edward Z. Yang
f4c6e10ff7 Release 4.1.0.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-04-26 18:31:40 -04:00
Edward Z. Yang
c1cbd9e565 Mute STRICT errors from CSSTidy and don't run PEARSax3 on PHP 5.3.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-04-26 18:27:32 -04:00
Edward Z. Yang
da94d3d6ac Always quote the contents of url() in CSS.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-04-26 12:10:15 -04:00
Edward Z. Yang
80793e925e Remove +x bit from RemoveSpansWithoutAttributes.php
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-04-17 00:23:09 -04:00
Edward Z. Yang
8ef4fb22db Support for flashvars in HTML.SafeEmbed.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-03-30 13:33:13 -04:00
Edward Z. Yang
70a7a3f5dd Handle <ol><ol> properly by adding missing <li> tag.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-03-10 00:58:37 -05:00
Edward Z. Yang
4d612d5a77 Improve handling of malformed object parameters.
When specifying source material for <object> tags, you must use
data inside the object tag as well as specify movie in a param.
If you specify a src (which is the appropriate markup for <embed>)
we now convert and fill in the other attributes appropriately.

Also, fix a PHP warning in Generator code.

Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-03-09 17:29:38 -05:00
Edward Z. Yang
63a854ee5d Remove call-time pass-by-reference.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-03-08 03:45:11 -05:00
Edward Z. Yang
0229458f8f Implement Internet Explorer compatibility code for embedded content.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-03-08 01:56:40 -05:00
Edward Z. Yang
baa477ac08 Truncate alt text from src if it's too long.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-03-08 01:22:21 -05:00
Edward Z. Yang
dc90e8e85b Support flashvars.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-03-08 01:16:57 -05:00
Edward Z. Yang
97125ed18b Implement data URI scheme.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-03-07 21:45:39 -05:00
Paul Stone
9a9036c689 Implement auto-formatter that removes empty span tags.
Signed-off-by: Paul Stone <patches@pdjs.co.uk>
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-03-07 18:59:33 -05:00
Edward Z. Yang
aea7d02dfe Support YouTube slideshow embedding.
YouTube slideshows contain a /cp/, not a /v/, in their URL;
relax the YouTube filter to allow them.

Signed-off-by: Nigel McNie <nigel@catalyst.net.nz>
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-03-07 18:57:22 -05:00
Brian DeRocher
b3ca1498c2 Add boolean value flag for PEARSax3 for testing if a token is empty.
Signed-off-by: Brian DeRocher <brian@derocher.org>
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-02-26 21:36:51 -05:00
Edward Z. Yang
ac18672aba Fix extant broken PEARSax3 parsing patterns.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-02-26 21:14:52 -05:00
Edward Z. Yang
faf28682ad Manually work around PEARSax3 E_STRICT errors.
Previously, my development environment was not running the PEARSax3
tests because my environment was set to E_STRICT error handling, and
thus the tests were skipped.  Relax this requirement by making the
wrapper class E_STRICT safe.  This introduces a few failing tests.

Also update TODO and add another fresh test.

Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-02-26 20:42:42 -05:00
Edward Z. Yang
e2cd852bcf Add shebang line to tests index script.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-02-15 02:55:43 -05:00
Edward Z. Yang
694583259c Fix autoparagraph bug with non-inline elements.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-02-15 02:55:33 -05:00
Edward Z. Yang
bde4de3c78 Update TODO.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2009-08-27 20:17:41 -04:00
Edward Z. Yang
5b4e5c983e Support proprietary height attribute on table.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2009-08-27 20:17:24 -04:00
Edward Z. Yang
1ad8fd5ce9 Gracefully deal with null injectors.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2009-08-27 20:03:31 -04:00
Edward Z. Yang
6bdf161afd Update TODO.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2009-07-15 14:50:52 -04:00
Edward Z. Yang
af45a6c191 Release Phorum module 4.0.0.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2009-07-09 21:12:35 -04:00
Edward Z. Yang
2b72d0445f Add 4.1.0 release NEWS entry.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2009-07-09 21:03:46 -04:00
Edward Z. Yang
d7b3117678 Add doxygen doc scripts, and fix package.php
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2009-07-08 22:11:15 -04:00
Edward Z. Yang
53ff3e2744 Release 4.0.0.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2009-07-07 22:41:01 -04:00
Edward Z. Yang
6776efccdd Update configuration scanner to parse new format.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2009-07-07 22:32:44 -04:00
Edward Z. Yang
ba9fd175d7 Make extractBody not terminate prematurely on first </body>.
Previously, if two </body> tags were present, HTML Purifier
would truncate everything after the first </body>.  This is
not ideal behavior; so HTML Purifier has been changed to
match up to the last </body>.

Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2009-07-07 22:19:04 -04:00
Edward Z. Yang
4d27906b02 Make %URI.Munge respect %URI.Host (don't munge).
%URI.Munge incorrectly munged URIs that pointed to the
same host as the current website (it did, however, have
the correct behavior for when the munge URL was on the
same server).

Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2009-07-06 22:04:51 -04:00
Edward Z. Yang
8f573df3dc XHTML 2 is dead. Long live XHTML 2.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2009-07-02 15:43:42 -04:00
Edward Z. Yang
c7594487a2 Fix inability to totally override content model.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-06-10 18:24:52 -04:00
Edward Z. Yang
733a5ce5c3 Fix allowsElement() bug manifesting in LinkifyTest.
Thanks frank farmer for reporting.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-06-10 18:11:34 -04:00
Edward Z. Yang
e8abd5953c Fix prototype impedance in HTMLDefinition and typo in
docs/enduser-customize.html
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-06-07 16:05:46 -04:00
Edward Z. Yang
1b8c8865b2 Fix PHP 5.3.0 problem with numeric indices causing -0 problem.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-06-07 16:04:07 -04:00
Edward Z. Yang
6e66dc9cad Add HTMLPurifier_config->serialize()
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-05-30 00:25:14 -04:00
Edward Z. Yang
77b60a4206 Update documentation to new configuration format.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-05-29 23:46:40 -04:00
Edward Z. Yang
5bf7ac4e9f Add docs and facilities for having separate directories of schemas.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-05-29 22:16:35 -04:00
Edward Z. Yang
a025203b18 Minor updates to Config and TODO items thereof.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-05-29 18:03:57 -04:00
Edward Z. Yang
809da84ae1 Ignore tags files (from exuberant ctags)
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-05-29 18:03:44 -04:00
Edward Z. Yang
777781a95c Don't have mute error handler be private.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-05-29 17:59:30 -04:00
Edward Z. Yang
4a87f732ca Fix two minor bugs, updating Phorum and removing unused $dir variable.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-05-27 01:17:23 -04:00
Edward Z. Yang
a2885181df Update TODO file.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-05-26 12:55:09 -04:00
Edward Z. Yang
84abae08f5 Relax allowed values of class for certain doctypes, see %Attr.ClassUseCDATA
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-05-26 01:07:40 -04:00
Edward Z. Yang
10e2d32a79 Lock configuration objects to a single namespace, to help prevent bugs.
* Also, fix a slight bug with URI definition clearing.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-05-25 23:38:49 -04:00
Edward Z. Yang
baf053b016 Implement %Attr.AllowedClasses and %Attr.ForbiddenClasses.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-05-25 22:08:45 -04:00
Edward Z. Yang
bf71c3f392 Add documents on how to restructure configuration directives.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-05-25 21:54:43 -04:00
Edward Z. Yang
bfbe29d5a1 Rename ExtractStyleBlocks configuration parameters.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-05-25 21:54:39 -04:00
Edward Z. Yang
e194b8efc6 Rename AutoFormatParam.PurifierLinkifyDocURL.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-05-25 21:51:08 -04:00
Edward Z. Yang
4214ac9d67 Update TODO list.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-05-22 14:52:43 -04:00
Edward Z. Yang
24f761d84a Remove PHP4 cruft from URISchemeRegistry.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-05-13 16:14:57 -04:00
Edward Z. Yang
41c9226f3d Style refresh: add/remove vimlines, fix minor factual errors.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-04-09 12:47:10 -04:00
Edward Z. Yang
e3c2063f69 Implement %AutoFormat.RemoveEmpty.RemoveNbsp, by popular demand.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-04-09 00:53:19 -04:00
Edward Z. Yang
398a02039e Implement %HTML.Attr.Name.UseCDATA which relaxes name validation rules.
Sponsored-by: Ian Cook <thinkspill@gmail.com>
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-03-20 19:34:38 -04:00
Edward Z. Yang
84e2e141fc Fix bad configuration call in NameSyncTest.php.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-03-14 19:18:02 -04:00
Edward Z. Yang
47bbbad000 Fix typo in YouTube docs. Thanks vbMark for reporting.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-03-13 13:33:51 -04:00
Edward Z. Yang
eaa906f8fc Implement configuration inheritance.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-02-21 03:01:02 -05:00
Edward Z. Yang
86ca784da3 Convert all to new configuration get/set format.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-02-21 03:00:34 -05:00
Edward Z. Yang
b107eec452 Revamp configuration backend.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-02-21 03:00:33 -05:00
Edward Z. Yang
fcbf724e6e Make name="" and id="" play nicely together.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-02-21 02:58:30 -05:00
Edward Z. Yang
92344cc83a Add 4.0.0 release information.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-02-16 22:00:22 -05:00
Edward Z. Yang
e9f529e78f Release 3.3.0.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-02-16 17:18:31 -05:00
Edward Z. Yang
e802065b65 Punt Lexer test entirely for 5.0.5.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-02-16 17:18:30 -05:00
Edward Z. Yang
1d70929eba Add text parameter to unit tests, forces text output.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-02-16 17:18:30 -05:00
Edward Z. Yang
77f57aa264 Fix CSSDefinition Printer problems with important decorator.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-02-15 14:11:22 -05:00
Edward Z. Yang
db218c7b2b Fix YouTube rendering problem on versions of Firefox.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-02-15 14:11:21 -05:00
Edward Z. Yang
762c089431 Ignore generated test-schema.html file in smoketests.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-02-15 14:11:20 -05:00
Edward Z. Yang
07ed1bbf8c Fix broken trusted comments functionality.
This fix is slightly hackish, as we simply treat comments as whitespace.
This should largely be correct, and breaks no current test cases,
although it could result in noncompliant behavior.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-02-05 18:04:10 -05:00
Edward Z. Yang
b9094d5ec8 Convert HTMLPurifier_Config to use property list backend.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-02-02 18:42:23 -05:00
Edward Z. Yang
b31f280d41 Ignore htmlt.ini files.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-02-02 18:18:10 -05:00
Edward Z. Yang
1b962e68f0 Downgrade directory not found and permissions errors to warnings.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-02-02 10:08:37 -05:00
Edward Z. Yang
0c9dc02d4a Use default configuration when resetting; prevents zombie defaults for encodings from carrying over.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-01-30 17:42:41 -05:00
Edward Z. Yang
bfe474042f Implement "carryover" functionality, requested by Kinderlehrer <bitweaver@7doves.com>
This commit is a limited implementation of the "active formatting
elements" algorithm implemented in HTML5, which preserves certain
formatting elements such as <a> and <b> when exiting or entering nodes.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-12-20 13:06:00 -05:00
Edward Z. Yang
119ebcda71 Implement user-friendly links to test-cases on web tester.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-12-20 13:01:20 -05:00
Edward Z. Yang
3dfcd016d3 Fix standards-compliance issue with YouTube filter with double hyphens.
Thanks Pierre Attar for reporting.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-12-12 16:27:23 -05:00
Edward Z. Yang
0c9dfc6c3d Don't add vimline to auto-generated files.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-12-12 15:44:13 -05:00
Edward Z. Yang
33a873f5cb Fix missing numbers when pass/fail count is zero.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-12-06 16:08:09 -05:00
Edward Z. Yang
12b811d749 Add vim modelines to all files.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-12-06 04:24:59 -05:00
Edward Z. Yang
781f9a4084 Update PH5P.patch, and add NEWS entry for trailing whitespace purge.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-12-06 02:30:52 -05:00
Edward Z. Yang
2c955af135 Remove trailing whitespace.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-12-06 02:28:20 -05:00
Edward Z. Yang
3a6b63dff1 Generic implementation of property-lists.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-12-06 00:43:42 -05:00
Edward Z. Yang
90110a4e3a Fix broken test-suite in early versions of PHP.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-12-05 15:50:59 -05:00
Edward Z. Yang
d67e17a69c Remove unnecessary svn wrapper include.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-12-03 12:45:31 -05:00
Edward Z. Yang
5cfecebb33 Fix bug involving whitespace-only nodes. Thanks Eric Wald for reporting.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-12-02 20:13:47 -05:00
Edward Z. Yang
f5cd2c07ea Implement 'overflow' CSS property.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-11-27 16:14:50 -05:00
Edward Z. Yang
6691676666 Fix newline issues in tests.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-11-26 15:30:59 -05:00
Edward Z. Yang
e128c09132 Fix bug with testEncodingSupportsASCII() with strange iconv
implementations.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-11-26 15:17:09 -05:00
Edward Z. Yang
527f154d3d Add verbose mode to command line test runner.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-11-23 20:45:21 -05:00
Edward Z. Yang
778ddf7c96 Turn on unit tests for UnitConverter.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-11-23 20:43:58 -05:00
Edward Z. Yang
c5d4b1ec93 Fix missing version number in config directive, and add TODO item.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-11-05 02:56:13 -05:00
Edward Z. Yang
6fe6cc8901 Update gitignore with post-release files, new NEWS entry and spellcheck UTF-8.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-11-01 01:51:51 -04:00
Edward Z. Yang
280211f70b Release 3.2.0.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-31 16:30:54 -04:00
Edward Z. Yang
3fd51d527c Add a nod to the RFC's recommendation that UTF-8 be used in URIs.
Mentioned in http://unspecified.wordpress.com/2008/06/29/do-browsers-encode-urls-correctly/

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-31 12:55:07 -04:00
Edward Z. Yang
0e6e2c4edf Bump descriptions to 3.2.0.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-31 12:25:43 -04:00
Edward Z. Yang
22d24e6b04 Add error planning documentation.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-25 02:11:58 -04:00
Edward Z. Yang
3a2fd0b5db Improve floating point scaling in UnitConverter.
When precision dictates that a number be zero padded, we cannot give sprintf()
a negative precision specifier.  This commit implements manual negative precision
printing of floats, taking into account common rounding errors with floating
point numbers.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-24 12:50:59 -04:00
Edward Z. Yang
25fa53c15b Fix misleading entry in NEWS.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-24 12:28:19 -04:00
David Morton
0b6ae1c3c1 Custom Injector to display URL address along with link text.
When viewing potentially hostile html, it may be helpful to see what
a given link was pointing to.  This new injector takes the href
attribute and adds the text after the link, and deletes the href
attribute.

Other forms of display could easily be contrived, but this seems to be
a good basic way to present the information.

Signed-off-by: David Morton <mortonda@dgrmm.net>
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-23 17:11:29 -04:00
Edward Z. Yang
ab263a0bf1 Rewrite spurious encoding test, as utf8 is sometimes useful.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-23 15:22:31 -04:00
Edward Z. Yang
c5b18d345c Fix validation error in documentation index.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-16 20:08:01 -04:00
Edward Z. Yang
d26418ca3a Ignore runtime files in Phorum plugin directory.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-11 23:20:12 -04:00
Edward Z. Yang
d304c5c976 Detect if domxml extension is loaded, and use DirectLex accordingly.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-08 17:06:10 -04:00
Edward Z. Yang
f7bc0b0875 Implement %Attr.DefaultImageAlt, allowing overriding default behavior for alt attributes.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-06 14:51:03 -04:00
Edward Z. Yang
70515dd48f Increase test coverage, and modify handleEnd behavior to only see correct tokens.
Previously, handleEnd was called for any end tag, except ones that were obviously
spurious because there were no parent tags. Now, it is only called for end tags
that are "approved." If an injector operates on the end tag, we automatically
punt. There may be some optimizations that could be made to this procedure,
but for now it's much more consistent.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-01 15:40:31 -04:00
Edward Z. Yang
1555cb617f Minor refactoring and bugfixing of Injector and MakeWellFormed.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-01 04:10:41 -04:00
Edward Z. Yang
cd4500457e More refactoring to MakeWellFormed and Injectors; they work better than ever now!
Major paradigm shift in this commit is bailing ship on the "skip" integers, which
were extremely buggy and error prone, and simply mark tokens as processed or
not processed by injectors. Other notable changes:

- Removed ad hoc decrements to inputIndex in favor of $reprocess flag variable
- Moved rewind outside of processToken()
- Make rewind properly ignore all other injectors
- Cleanup end of document code
- Reconfigure injector loops to account for skips and rewinds
- Punt the empty to start/end transformation
- Completely rewrite processToken to be array based
- Added skip and rewind member variables to tokens
- Fixed a longstanding bug with remove empty!

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-01 03:14:28 -04:00
Edward Z. Yang
fa413e96ac Implement Injector->handleEnd, with lots of refactoring for injector.
Previous design of injector streaming involved editability only to start, empty
and text tokens, because they could be safely modified without causing formedness
errors.  By modifying notifyEnd to operate before MakeWellFormed's safeguards
kick into effect, it can be converted into a handle function, allowing for
arbitrary modification of end tags.

This change involved quite a bit of restructuring of the MakeWellFormed code,
including the moving of end of document tags to inside the loop, so rewinding
on those tags would be functional, increased reuse of the end tag codepath by
code that inserts end tags (as they could be changed out from under you), and
processToken modified to have an extra parameter to force re-processing of
a token if the original token was an end token.

We're not exactly sure if handleEnd works at this point, but the important
talking point about this refactoring is that nothing else broke. Also, a number
of convenience functions were moved from AutoParagraph to the Injector
supertype (specifically: forward, forwardToEndToken, backward, and current).

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-01 00:54:51 -04:00
Edward Z. Yang
d0fdcc103e Add support for proprietary "background" attribute in table elements.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-09-27 21:19:35 -04:00
Edward Z. Yang
6a06b92f0c Setup ErrorCollector to maintain new error format, and output that HTML.
Also changed:
    - DirectLex keeps track of column numbers in context
    - New class HTMLPurifier_ErrorStruct

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-09-15 19:08:58 -04:00
Edward Z. Yang
3184fee468 Undo start()/end() error collector changes in AttrValidator.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-09-05 17:25:35 -04:00
Edward Z. Yang
ed7983b559 Refactor lexer instantiation logic with exceptions and forced line tracking.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-09-05 14:04:23 -04:00
Edward Z. Yang
92df9e5b28 Update customize docs to use new directive name for definition caching.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-09-02 16:38:40 -04:00
Edward Z. Yang
2f41bd07fa Update docs, removing $Id$ and linking to repo.or.cz.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-09-02 15:01:25 -04:00
Edward Z. Yang
c6914dce51 Track column numbers in addition to line numbers.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-09-01 14:10:10 -04:00
Edward Z. Yang
9977350143 Fix bug with anonymous module and the SafeObject/SafeEmbed modules.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-08-31 19:06:25 -04:00
Edward Z. Yang
d9e60350d3 Migrate AttrValidator to nested error format; modify generator logic in ErrorCollector.
AttrValidator's changes are fairly self-explanatory, but ErrorCollector's
changes are worth a little discussion.  ErrorCollector can use generators at
various points during its flow control; there are two distinct generators that
it should use: 1. The one used for the output, and 2. The one used for the
error output.  These will usually be the same, but in the odd case where they
need to be different, getHTMLFormatted() will accept an alterate configuration
object with an appropriate doctype.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-08-18 22:13:58 -04:00
Edward Z. Yang
c807ed5fe2 Implement nested error collection with start() and end() in ErrorCollector.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-08-16 00:41:34 -04:00
Edward Z. Yang
c9b6f125aa Forms implementation for %HTML.Trusted. Some backend changes:
* Added Charsets and Character attribute types
* Fix a heavily recursive form of ContentSets, this allows a content-set
  to include another content-set which includes another content-set, and
  so forth.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-08-15 18:57:44 -04:00
Edward Z. Yang
dc28346677 Fix bug where absolute paths with dots/double-dots were not collapsed.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-08-15 13:12:54 -04:00
Edward Z. Yang
8423daef05 Increase test coverage for MakeAbsolute.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-08-13 23:19:38 -04:00
Edward Z. Yang
617f70a8ac Improve auto-paragraph to preserve newlines and handle edge-cases better.
This is a very large commit that includes numerous improvements to the
AutoParagraph injector.  These are:

* Rewritten flow control of the injector to use almost exclusively
  binary conditionals.
* Improved inline documentation with "State" comments, which give concise
  examples of what the token stack looks like at flow points.
* Documentation for all flow branches, even those with no actions.
* Factoring out of common operations to improve readability, especially the
  new iterator private methods.
* Expanded test-suite which covers new flow points, and corrects some errors
  in previous cases.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-08-10 00:32:29 -04:00
Edward Z. Yang
0423985b45 Detect if HTML support in DOM is disabled by checking loadHTML().
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-08-07 18:44:21 -04:00
Edward Z. Yang
e013bc9126 Fix bug involving autoclose and inline elements in strict <blockquote>.
The newest autoclose code uses the elements property in whether or not an
element should be closed by a particular tag.  The heuristic is simple; if
the element doesn't allow that tag as a child, it closes the parent
container.  This doesn't work, however, with <blockquote>, which while not
allowing inline styles under Strict doctypes, requires them to be passed
through MakeWellFormed.

The fix was to transition MakeWellFormed to call a method to retrieve the
elements, and then have StrictBlockquote implement a special version of
this method.  Future versions of HTML Purifier may be more flexible in this
regard--further study of the HTML5 specification is required.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-08-01 20:52:06 -04:00
Edward Z. Yang
1d90bb2397 Allow <![CDATA[<body>...</body>]]> not to trigger Core.ConvertDocumentToFragment
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-08-01 19:06:28 -04:00
Edward Z. Yang
03dabec2c0 Fix documentation error in Filter.ExtractStyleBlocks and give better example.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-08-01 18:58:47 -04:00
Edward Z. Yang
85090520f1 Add double-munging protection by checking if the domains are the same.
Previously, if an absolute munge URL location was used, HTML passed through
HTML Purifier multiple times would be munged multiple times. This patch
checks if the output URI has the same URI as the input URI; if they do,
the munge is considered unnecessary and discarded.

Requested-by: Chris <justbittin@gmail.com>
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-07-26 22:45:19 -06:00
Edward Z. Yang
3b6aa10592 %URI.DisableExternal(Resources) uses %URI.Base if %URI.Host is not available.
As part of its duties, URIDefinition determine the base URL and the host URL
of the page based on the two corresponding configuration directives. The
DisableExternal URIFilter, however, bypassed this check by directly checking
%URI.Host. This fix forwards the call through URIDefinition.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-07-10 18:46:46 -04:00
Edward Z. Yang
3a4b92da81 Slight optimization in LinkTypes using array_keys().
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-07-08 21:47:52 -04:00
Edward Z. Yang
0ec9731184 Update TODO to add IDNA support along with IRI support.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-07-08 20:47:44 -04:00
Edward Z. Yang
e05bd77344 Implement HTMLT tests, and migrate HTMLPurifierTest to this format.
HTMLT tests are a compact and easy-to-use way of making assertPurification
type tests. They take the format of:

--INI--
Ns.Directive = "directive value"
--HTML--
Input HTML
--EXPECT--
Expected HTML

Expect more features and migration to be coming soon.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-07-07 08:59:33 -04:00
Edward Z. Yang
334ffac5b4 Various improvements to test script command line options, i.e. --type
The following changes were made:
* Create --type parameter which accepts 'htmlpurifier', 'phpt', 'vtest', etc.
  in order to execute only that class of tests. This supercedes --only-phpt.
* Create --quick parameter for multitest.php, run only the tips of each
  release series.
* Create --distro parameter for multitest.php, supercedes --exclude-normal
  and --exclude-standalone.

Also, a grep for htmlt tests was added, although add_tests() doesn't do
anything with it yet.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-07-07 08:59:29 -04:00
Edward Z. Yang
a227cb483a Allow empty sections in string hashes; previously they were left undefined.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-07-07 08:57:16 -04:00
Edward Z. Yang
aa0fdeee30 Refine Lexers for parsing stray angled brackets; %Core.AggressivelyFixLt = true
By default, the DirectLex and DOMLex behavior with stray angled brackets
varied a great deal due to their implementations. A little known directive
%Core.AggressivelyFixLt attempted to match DOMLex's behavior with DirectLex's,
but it was off by default. By turning it on by default, users now enjoy these
benefits, and performance-minded users can turn it back off.

Also, several refinements to stray angled bracket parsing was made. Specifically:

* DirectLex: Handle each left angled bracket individually, which prevents
  strange behavior as reported by eon.
* DOMLex: Iterate aggressive lt fix, so that stacked brackets like << are
  handled.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-07-07 08:52:29 -04:00
Edward Z. Yang
ba418a1f19 Redirect stderr to stdout when calling flush.php
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-07-05 03:15:36 -04:00
Edward Z. Yang
c845f0bb78 Give warnings when attempting to use encoding iconv doesn't support.
Previously, attempting to set %Core.Encoding to an encoding iconv didn't
know about would result in a silent failure, with the return of the
boolean false. Now it will fatally error out.

Reported-by: mcgrailm <mgm19@psu.edu>
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-07-05 03:14:32 -04:00
Edward Z. Yang
594268ca3b Fix two bugs in MakeAbsolute filter involving base URIs that have empty path.
The bugs are:
* Undefined $is_folder variable when path is empty, and
* Improper concatenation of host and path together.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-07-05 03:12:44 -04:00
Edward Z. Yang
965be3bd73 Add support for unrecognized elements in MakeWellFormed.
The MakeWellFormed strategy uses metadata from HTMLDefinition in order to
determine whether or not tokens need to be converted or tags need to be
auto-closed. While this functionality is good to have, it is by no means
essential, and MakeWellFormed should not error when this information is not
available.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-07-05 03:11:29 -04:00
Edward Z. Yang
700d5bcbfc Implement %AutoFormat.RemoveEmpty, end to start ref, and injector rewind.
Injector rewind: Injectors can now use the method rewind() in order to move
the input index backwards, so that they can reprocess tokens (other injectors
are not affected by a rewind). This functionality was necessary to implement
nested node removals in %AutoFormat.RemoveEmpty.

End to start ref: To facilitate rewinding, HTMLPurifier_Token_End now
maintains a reference called $start to the starting token for their node.

%AutoFormat.RemoveEmpty removes empty nodes. Lots of people have requested
it, so here is a partially effective implementation. Because it is implemented
as an Injector, it's not possible for it to handle newly introduced empty
nodes by later validators, specifically auto-closing and child validation.
The Injector is only meant to be used on HTML-ish languages.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-06-27 16:09:14 -04:00
Edward Z. Yang
fd384129bf Proper support for name attribute in <a> and <img>
Prior to this commit, the name attribute was unilaterally removed, except
for Strict doctypes or a heavy TidyLevel, when it was converted to an id
attribute. As name is actually permitted in both HTML 4.01 Strict and
XHTML 1.0 Strict, although deprecated, the more sensible default behavior
is to allow it unless TidyLevel is heavy.

Our implementation is slightly stricter than the specs, as name attributes are
treated as first class IDs, disallowing <a name="foo" id="foo"> or duplicate
names. The former should be treated as a special case, but that will be
a separate commit.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-06-27 15:44:27 -04:00
Edward Z. Yang
f8b47c64dd Make Strategy_MakeWellFormed operate in place.
Previously, MakeWellFormed processed tokens and appended them onto an output
array, which was presumably immutable and inaccessible to Injectors. By
having MakeWellFormed operate directly on the input array, the strategy
saves memory and will also allow for a rewind implementation, as a unifying
the two arrays allows Injectors to easily determine an index behind them they'd
like to reset state to.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-06-27 01:33:48 -04:00
Edward Z. Yang
a5ceb1e22a Update printTokens() debug function to work with new Generator API.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-06-27 01:33:20 -04:00
Edward Z. Yang
636e2883df Add ignore rules for configdoc generated files.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-06-27 00:14:39 -04:00
Edward Z. Yang
dba3ed7770 [3.1.2] Implement comments when %HTML.Trusted is on.
Some implementation notes: not all comments are valid; HTML makes sure
double-hyphens and trailing hyphens are not found in comments. In addition,
two new localizable messages were added.

Requested-by: Waldo Jaquith <waldo@vqronline.org>
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-06-25 23:12:19 -04:00
Edward Z. Yang
de9869d942 Ignore .phpt.skip.php files.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-06-25 23:10:03 -04:00
Edward Z. Yang
cfcdce0db8 Ignore test-settings.php 2008-06-25 22:47:12 -04:00
Edward Z. Yang
6bc04e0e10 Rename dummy file to proper location.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-06-25 22:43:55 -04:00
Edward Z. Yang
24f6db6fb2 [3.1.2] Add %Output.SortAttr to deal with FCKeditor bug
If %Output.SortAttr is true, attributes are sorted to be
in alphabetical order. This was requested by frank farmer.

See also: http://htmlpurifier.org/phorum/read.php?2,1576

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-06-24 22:36:27 -04:00
Edward Z. Yang
85fb192d93 Remove incorrect information about bit-size
UTF-8 is a variable-width encoding that uses octets, UTF-16
is a variable-width encoding that uses 16-bit words, and
UCS-2 is an obsolete fixed-width encoding that doesn't not
support characters beyond the BMP. Explaining this would be
unwieldly, so we just removed the information.

See also: http://www.reddit.com/info/6mlqc/comments/c04aold

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-06-24 22:12:56 -04:00
Edward Z. Yang
7727cea112 Add Git specific files and configuration
* Setup usage.xml to be binary, as XMLWriter does not honor operating
  system's newline format.
* Setup various files to ignore (svn:ignore was not carried over)
* Add dummy files to prevent git from ignoring empty directories

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-06-24 22:02:16 -04:00
Edward Z. Yang
6bb8c1fcac Handle CRLF discrepancies
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-06-24 21:10:51 -04:00
Edward Z. Yang
a84b6d5be0 Add new NEWS entries
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1824 48356398-32a2-884e-a903-53898d9a118a
2008-06-21 05:00:17 +00:00
Edward Z. Yang
6e43cac9c9 Add some extra helpful data for FOCUS
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1821 48356398-32a2-884e-a903-53898d9a118a
2008-06-20 02:59:01 +00:00
Edward Z. Yang
656a0c95bf Add update Freshmeat script.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1818 48356398-32a2-884e-a903-53898d9a118a
2008-06-20 01:48:46 +00:00
Edward Z. Yang
7015aaff46 Release 3.1.1
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1808 48356398-32a2-884e-a903-53898d9a118a
2008-06-19 21:43:57 +00:00
Edward Z. Yang
1009bd41a6 var -> public
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1807 48356398-32a2-884e-a903-53898d9a118a
2008-06-19 21:24:50 +00:00
Edward Z. Yang
511dfe2d4a [3.1.1] Update Munge docs.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1804 48356398-32a2-884e-a903-53898d9a118a
2008-06-19 19:06:55 +00:00
Edward Z. Yang
463aa3a0fa [3.1.1] General munge improvements
- Add CurrentCSSProperty context variable
- Move Munge to its own class, derived off of SecureMunge.
- Rename %URI.SecureMunge to %URI.Munge
- Rename %URI.SecureMungeSecretKey to %URI.MungeSecretKey
- Add extra substitutions for munge

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1803 48356398-32a2-884e-a903-53898d9a118a
2008-06-18 03:29:27 +00:00
Edward Z. Yang
7189ec2790 Update TODO
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1802 48356398-32a2-884e-a903-53898d9a118a
2008-06-17 04:00:03 +00:00
Edward Z. Yang
e901d832ab Update Modx plugin to work with HTML Purifier 3.1.0.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1801 48356398-32a2-884e-a903-53898d9a118a
2008-06-17 03:41:40 +00:00
Edward Z. Yang
643ed1bddc [3.1.1] Fix text-decoration: none bug
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1799 48356398-32a2-884e-a903-53898d9a118a
2008-06-17 03:12:50 +00:00
Edward Z. Yang
41830cd902 Update TODO
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1798 48356398-32a2-884e-a903-53898d9a118a
2008-06-17 02:40:38 +00:00
Edward Z. Yang
261aa1aeaa Update news, installer, and add an extra specimen.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1796 48356398-32a2-884e-a903-53898d9a118a
2008-06-15 22:13:16 +00:00
Edward Z. Yang
486b401cf7 Fix broken tests.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1795 48356398-32a2-884e-a903-53898d9a118a
2008-06-12 03:12:39 +00:00
Edward Z. Yang
f2794e59c5 [3.1.1] Mimick movie value in data if not set in safe object.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1794 48356398-32a2-884e-a903-53898d9a118a
2008-06-11 23:12:38 +00:00
Edward Z. Yang
d702077d2e Undo redundant embedded URI check.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1782 48356398-32a2-884e-a903-53898d9a118a
2008-06-10 01:23:11 +00:00
Edward Z. Yang
36bd06d53e [3.1.1] Implement SafeEmbed. Also, miscellaneous bugfixes.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1781 48356398-32a2-884e-a903-53898d9a118a
2008-06-10 01:18:03 +00:00
Edward Z. Yang
13eb016e06 [3.1.1] Implement SafeObject.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1780 48356398-32a2-884e-a903-53898d9a118a
2008-06-10 00:13:44 +00:00
Edward Z. Yang
32025a12e1 [3.1.1] Allow injectors to be specified by modules.
- Make method for URI implemented
- Split out checkNeeded in Injector from prepare

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1779 48356398-32a2-884e-a903-53898d9a118a
2008-06-09 01:23:05 +00:00
Edward Z. Yang
7dae94c44b Update TODO.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1778 48356398-32a2-884e-a903-53898d9a118a
2008-06-08 16:57:48 +00:00
Edward Z. Yang
54cc691ba7 Update TODO.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1777 48356398-32a2-884e-a903-53898d9a118a
2008-06-04 22:52:48 +00:00
Edward Z. Yang
3af2ff8f98 Fix bug with SecureMunge regarding embedded URIs.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1775 48356398-32a2-884e-a903-53898d9a118a
2008-06-02 17:39:29 +00:00
Edward Z. Yang
36fb284d2f Add integration test, and fix broken SecureMunge
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1774 48356398-32a2-884e-a903-53898d9a118a
2008-05-27 17:47:25 +00:00
Edward Z. Yang
8d1f1e8e73 [3.1.1] Improved adherence to Unicode by checking for non-character codepoints. Thanks Geoffrey Sneddon for reporting.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1773 48356398-32a2-884e-a903-53898d9a118a
2008-05-26 21:27:52 +00:00
Edward Z. Yang
322288e6c0 [3.1.1] Implement %URI.SecureMunge and %URI.SecureMungeSecretKey, thanks Chris!
- URIFilter->prepare can return false in order to abort loading of the filter
- Implemented post URI filtering. Set member variable $post to true to set a URIFilter as such.

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1772 48356398-32a2-884e-a903-53898d9a118a
2008-05-26 16:26:47 +00:00
Edward Z. Yang
3c4346cb1e Fix back-compat regressions. Also, compactify configuration code.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1771 48356398-32a2-884e-a903-53898d9a118a
2008-05-26 04:35:12 +00:00
Edward Z. Yang
14d934c7ca [3.1.1] Land vs's HTMLPurifier_Generator patch, and a number of other bugfixes for that change
- Convert a number of calls to use new constructor signature for Generator
- Make generator require configuration; this exposes a number of latent bugs
- Removed generator hack
- Convert Printers to use new optimized ConfigSchema format
- Hack with Printer configuration; pass an array(generator config, render config) to distinguish between output and target.
- HTML/CSS Printers need to be primed, otherwise fatal errors
- Convert a few test-cases to use member properties

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1770 48356398-32a2-884e-a903-53898d9a118a
2008-05-26 04:05:48 +00:00
Edward Z. Yang
bb16d8eae5 [3.1.1] Fix Shift_JIS encoding wonkiness with yen symbols and whatnot
- Improve parseCDATA algorithm to take into account newline normalization
- Fix regression in FontFamily validator. We now have a legit parser in place, albeit somewhat limited in use. Will be superseded by parser for entire grammar
- Convert EncoderTest to new format

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1769 48356398-32a2-884e-a903-53898d9a118a
2008-05-25 05:40:20 +00:00
Edward Z. Yang
10530d7f81 [3.1.1] Fix stray backslashes in font-family.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1768 48356398-32a2-884e-a903-53898d9a118a
2008-05-24 18:19:36 +00:00
Edward Z. Yang
c7e172f660 Update news: not 1/2, more like 1/3
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1767 48356398-32a2-884e-a903-53898d9a118a
2008-05-23 17:15:13 +00:00
Edward Z. Yang
917d2ea5ef [3.1.1] More ConfigSchema optimizations: degenerate form can accommodate type and allow_null
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1766 48356398-32a2-884e-a903-53898d9a118a
2008-05-23 17:10:26 +00:00
Edward Z. Yang
895141e0b5 [3.1.1] Further optimize ConfigSchema by eliminating stdclass when only type is set.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1765 48356398-32a2-884e-a903-53898d9a118a
2008-05-23 17:00:58 +00:00
Edward Z. Yang
8ab30e24b7 [3.1.1] Memory optimizations for ConfigSchema. Changes include:
- Elimination of ConfigDef and subclasses in favor of stdclass. Most property names stay the same
- Added benchmark script for ConfigSchema
- Types are internally handled as magic integers. Use HTMLPurifier_VarParser->getTypeName to convert to human readable form. HTMLPurifier_VarParser still accepts strings.
- Parser in config schema only used for legacy interface


git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1764 48356398-32a2-884e-a903-53898d9a118a
2008-05-23 16:43:24 +00:00
Edward Z. Yang
9db891c3aa Add benchmark script for ConfigSchema.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1763 48356398-32a2-884e-a903-53898d9a118a
2008-05-23 15:24:15 +00:00
Edward Z. Yang
eb9f9bc7f6 [3.1.1] Round up imagecrash support with HTML.MaxImgLength
- Add $max to AttrDef/HTML/Pixels.php
- Add %HTML.MaxImgLength
- CSS width/height allows percents when MaxImgLength is disabled


git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1762 48356398-32a2-884e-a903-53898d9a118a
2008-05-23 02:09:43 +00:00
Edward Z. Yang
fcebb7731d [3.1.1] Migrate all HTMLModules to use setup($config) rather than __construct
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1761 48356398-32a2-884e-a903-53898d9a118a
2008-05-22 19:36:59 +00:00
Edward Z. Yang
8d0d0d1a03 [3.1.1] construct() to setup() in HTMLModules
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1760 48356398-32a2-884e-a903-53898d9a118a
2008-05-22 04:34:19 +00:00
Edward Z. Yang
80f59206d7 [3.1.1] Implement percent encoding for URI query and fragment
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1758 48356398-32a2-884e-a903-53898d9a118a
2008-05-21 02:58:41 +00:00
Edward Z. Yang
af3f5190dc [3.1.1] Lazy token updating for HTMLPurifier/AttrValidator.php
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1757 48356398-32a2-884e-a903-53898d9a118a
2008-05-21 02:30:27 +00:00
Edward Z. Yang
5620241165 [3.1.1] Disable percent height/width attributes for img
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1756 48356398-32a2-884e-a903-53898d9a118a
2008-05-21 02:01:25 +00:00
Edward Z. Yang
c06727190e Update NEWS accordingly.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1755 48356398-32a2-884e-a903-53898d9a118a
2008-05-21 01:58:48 +00:00
Edward Z. Yang
1a95852007 [3.1.1] Implement more robust imagecrash protection for CSS width/height.
- Change API for HTMLPurifier_AttrDef_CSS_Length
- Implement HTMLPurifier_AttrDef_Switch class
- Implement HTMLPurifier_Length->compareTo, and make make() accept object instances

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1754 48356398-32a2-884e-a903-53898d9a118a
2008-05-21 01:56:48 +00:00
Edward Z. Yang
c3fab7200e Add support for pixel as a pseudo-English unit.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1753 48356398-32a2-884e-a903-53898d9a118a
2008-05-21 00:42:55 +00:00
Edward Z. Yang
6d7a17e9b6 Implement without-bcmath compatible UnitConverter. We might want to factor our floating point fudges. These calculations are only accurate for small precisions, and are architecture-dependent. (Unit tests seem to work on 32bit, though).
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1752 48356398-32a2-884e-a903-53898d9a118a
2008-05-21 00:29:31 +00:00
Edward Z. Yang
64b5581bf2 [3.1.1] Have CSS/Length.php use the new Length class. Also, put onus of non-negative to callee, which would compare $n.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1751 48356398-32a2-884e-a903-53898d9a118a
2008-05-20 23:15:20 +00:00
Edward Z. Yang
d8da5ff406 Finally stabilize the unit converter.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1750 48356398-32a2-884e-a903-53898d9a118a
2008-05-20 21:23:38 +00:00
Edward Z. Yang
fda310f1e7 Update UnitConverter to deal more correctly with X.XX... decimals. Not complete.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1749 48356398-32a2-884e-a903-53898d9a118a
2008-05-20 17:48:15 +00:00
Edward Z. Yang
fc7dbdbd33 Disable Tidy test completely.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1748 48356398-32a2-884e-a903-53898d9a118a
2008-05-20 17:14:08 +00:00
Edward Z. Yang
02ac821503 Update TODO and run flush.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1747 48356398-32a2-884e-a903-53898d9a118a
2008-05-20 01:31:51 +00:00
Edward Z. Yang
16fa73afa0 [3.1.1] Added HTMLPurifier_UnitConverter and HTMLPurifier_Length for convenient handling of CSS-style lengths.
- Fixed another de-underscoring in the SimpleTest library

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1746 48356398-32a2-884e-a903-53898d9a118a
2008-05-20 01:19:00 +00:00
Edward Z. Yang
32a6afa27c Update news.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1744 48356398-32a2-884e-a903-53898d9a118a
2008-05-18 20:12:25 +00:00
Edward Z. Yang
587d642826 Release 3.1.0.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1728 48356398-32a2-884e-a903-53898d9a118a
2008-05-18 05:46:06 +00:00
Edward Z. Yang
0bef016271 [3.1.0] Get testing working again for all versions
- Standalone testing setup properly with autoload
- Bootstrap autoloader deals more robustly with classes that don't exist, preventing class_exists($class, true) from barfing.
- Cleanup $_reporter to $reporter

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1727 48356398-32a2-884e-a903-53898d9a118a
2008-05-16 01:49:33 +00:00
Edward Z. Yang
ef6a1c9274 Allow for users to load Language class files themselves. Messages are still HTML Purifier dependent; we need to figure out a way around that.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1725 48356398-32a2-884e-a903-53898d9a118a
2008-05-15 23:22:34 +00:00
Edward Z. Yang
86b1da9b6f [3.1.0] Fixed bug with fallback languages in LanguageFactory
- Also, reverted bogus Generator changes

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1723 48356398-32a2-884e-a903-53898d9a118a
2008-05-15 23:04:46 +00:00
Edward Z. Yang
00ea2062d4 [3.1.0] Fix buggy LanguageFactory. This revision is incomplete.
- Some bogus commits to Generator were made, and will be reverted next revision.

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1722 48356398-32a2-884e-a903-53898d9a118a
2008-05-15 17:47:47 +00:00
Edward Z. Yang
cb5d5d0648 [3.1.0] Revamp URI handling of percent encoding and validation.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1709 48356398-32a2-884e-a903-53898d9a118a
2008-05-14 02:19:00 +00:00
Edward Z. Yang
77ce3e8b4a [3.1.0] Extend scanner to catch $this->config; chmod new directories from Serializer. I'm not exactly sure what the implications of the bugfix are, but hopefully it won't blow up.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1708 48356398-32a2-884e-a903-53898d9a118a
2008-05-13 03:17:38 +00:00
Edward Z. Yang
e0c0d8eab6 [3.1.0] Allow arbitrary whitespace in %HTML.Allowed
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1707 48356398-32a2-884e-a903-53898d9a118a
2008-05-13 02:02:27 +00:00
Edward Z. Yang
ce46fb618c [3.1.0] Add missing tests and errors for forbidden attributes
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1706 48356398-32a2-884e-a903-53898d9a118a
2008-05-13 01:41:25 +00:00
Edward Z. Yang
9f37764614 Update TODO with items from Denis.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1702 48356398-32a2-884e-a903-53898d9a118a
2008-05-06 03:08:09 +00:00
Edward Z. Yang
aaf6ba421c Sync with SimpleTest codebase
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1701 48356398-32a2-884e-a903-53898d9a118a
2008-04-28 19:52:13 +00:00
Edward Z. Yang
4b862f64e6 [3.1.0] Fix ScriptRequired bug with trusted installs
- Generator now takes $config and $context during instantiation
- Double quotes outside of attributes are not escaped


git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1700 48356398-32a2-884e-a903-53898d9a118a
2008-04-28 01:35:07 +00:00
Edward Z. Yang
be2cfb7918 Fix latest batch of SimpleTest changes. Also, commit forgotten NEWS.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1699 48356398-32a2-884e-a903-53898d9a118a
2008-04-26 19:50:27 +00:00
Edward Z. Yang
2f29c27a59 [3.1.0] Fix broken PH5P in latest versions of DOM with bandaid; punt to DirectLex.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1698 48356398-32a2-884e-a903-53898d9a118a
2008-04-26 19:47:22 +00:00
Edward Z. Yang
144bd6f07a [3.1.0] Fix bug with 3.1.0-dev version number (the dash caused problems, so we switched to commas)
- Refactored out null definition cache during HTMLDefinition tests


git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1697 48356398-32a2-884e-a903-53898d9a118a
2008-04-26 19:28:14 +00:00
Edward Z. Yang
a95f600e76 Fix another fatal error from SimpleTest refactoring.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1696 48356398-32a2-884e-a903-53898d9a118a
2008-04-26 03:18:07 +00:00
Edward Z. Yang
84aa2ca390 [3.1.0] Implement tag@attr for Allowed and Forbidden
- Fix (or null) bug in configdoc

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1695 48356398-32a2-884e-a903-53898d9a118a
2008-04-26 03:14:01 +00:00
Edward Z. Yang
1f8619cda5 [3.1.0] Fix and revamp configForm.php smoketest
- Fix bool/null ConfigForm field

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1694 48356398-32a2-884e-a903-53898d9a118a
2008-04-26 01:13:58 +00:00
Edward Z. Yang
04b1ec33cb [3.1.0] version -> VERSION
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1693 48356398-32a2-884e-a903-53898d9a118a
2008-04-25 05:47:36 +00:00
Edward Z. Yang
6d9643a92e [3.1.0] Add const version to HTMLPurifier, also bump version to 3.1.0-dev; this apparently is a good idea!
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1692 48356398-32a2-884e-a903-53898d9a118a
2008-04-25 05:26:10 +00:00
Edward Z. Yang
438d973073 Renumber as 3.1.0, however, NOT releasing (WHATSNEW isn't updated)
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1691 48356398-32a2-884e-a903-53898d9a118a
2008-04-25 03:54:38 +00:00
Edward Z. Yang
f295465ad4 [3.1.0] Allow index to be false for config from for creation
- Static-ify Printer_ConfigForm's get* functions

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1690 48356398-32a2-884e-a903-53898d9a118a
2008-04-25 02:43:31 +00:00
Edward Z. Yang
eaabccdd9b [3.1.0] More PHP4->PHP5 conversions, notably reference removal of most methods that return objects
- Removed HTMLPurifier_Error
- Documentation updates
- Removed more copy() methods in favor of clone
- HTMLPurifier::getInstance() to HTMLPurifier::instance()
- Fix InterchangeBuilder to use HTMLPURIFIER_PREFIX

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1689 48356398-32a2-884e-a903-53898d9a118a
2008-04-23 02:40:17 +00:00
Edward Z. Yang
893cdd0301 All 3.1.0 TODOs are done!
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1687 48356398-32a2-884e-a903-53898d9a118a
2008-04-23 00:17:52 +00:00
Edward Z. Yang
1ba77fedd4 [3.1.0] Implement DenyElementDecorator for imagecrash-protection against CSS width/height
- Misc doc changes
- Add missing inheritance for AttrDef_CSS decorators


git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1684 48356398-32a2-884e-a903-53898d9a118a
2008-04-22 22:28:54 +00:00
Edward Z. Yang
fae720115a Update TODO
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1683 48356398-32a2-884e-a903-53898d9a118a
2008-04-22 20:57:11 +00:00
Edward Z. Yang
c0f2e69c9f [3.1.0] Update French documentation.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1682 48356398-32a2-884e-a903-53898d9a118a
2008-04-22 20:43:47 +00:00
Edward Z. Yang
ca6b20ff2b [phorum-3.0.0.1] Improve installation documentation.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1681 48356398-32a2-884e-a903-53898d9a118a
2008-04-22 18:30:01 +00:00
Edward Z. Yang
c4aa3ee40c [3.1.0] Encoder optimization, as suggested by Diego
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1680 48356398-32a2-884e-a903-53898d9a118a
2008-04-22 18:14:40 +00:00
Edward Z. Yang
e9c7873057 [3.1.0] Fix validation error with missing li.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1679 48356398-32a2-884e-a903-53898d9a118a
2008-04-22 17:35:39 +00:00
Edward Z. Yang
d45f42e6a8 Update docs.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1678 48356398-32a2-884e-a903-53898d9a118a
2008-04-22 17:32:42 +00:00
Edward Z. Yang
f46aef698e Post rc skirmishes.
- Update docs
- Update source code comments in generated files
- release1-update.php now flushes after it finishes
- Make InterchangeBuilder alphabetize

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1676 48356398-32a2-884e-a903-53898d9a118a
2008-04-22 16:20:45 +00:00
Edward Z. Yang
8fdf8c2e44 Release 3.1.0rc1
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1672 48356398-32a2-884e-a903-53898d9a118a
2008-04-22 07:24:47 +00:00
Edward Z. Yang
4fe475c57f [3.1.0] Implement %HTML.Forbidden*
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1671 48356398-32a2-884e-a903-53898d9a118a
2008-04-22 07:16:49 +00:00
Edward Z. Yang
d3710518ce Minor documentation updates; we're going to bite the bullet and tell PEAR users to change their installs.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1670 48356398-32a2-884e-a903-53898d9a118a
2008-04-22 06:47:45 +00:00
Edward Z. Yang
e1876c18ad [3.1.0] Deprecate addFilter; set up Filter namespace
- Added EXTERNAL dependency config-schema value
- Fix safe bug in Printer_HTMLDefinition
- Fixed broken smoketests

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1669 48356398-32a2-884e-a903-53898d9a118a
2008-04-22 06:40:04 +00:00
Edward Z. Yang
e616f07739 [3.1.0] Implement file sniffing of $config, for TRUE feature parity! Also add some really silly multi-column code in the XSLT.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1668 48356398-32a2-884e-a903-53898d9a118a
2008-04-22 05:38:03 +00:00
Edward Z. Yang
8708c0617a Minor whitespace and div fixes for XSLT
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1667 48356398-32a2-884e-a903-53898d9a118a
2008-04-22 02:24:11 +00:00
Edward Z. Yang
39be09ee14 [3.1.0] Add support for deprecated and version in configdoc
- Hide deprecated elements from ToC
- %HTML.Doctype takes null instead of empty string; this shouldn't affect anyone

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1666 48356398-32a2-884e-a903-53898d9a118a
2008-04-22 02:19:40 +00:00
Edward Z. Yang
949f605857 [3.1.0] Feature parity with configdoc rewrite
- Abolish most classes in ConfigDoc except for HTMLXSLTProcessor
- Implement Builder_Xml using XmlWriter
- Add some convenience functions

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1665 48356398-32a2-884e-a903-53898d9a118a
2008-04-22 01:58:06 +00:00
Edward Z. Yang
50aa0ea714 [3.1.0] Move $safe from ElementDef to HTMLModule
- Make $info in AttrTypes protected, to force cloning
- Remove copy() functions in favor of clone

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1664 48356398-32a2-884e-a903-53898d9a118a
2008-04-21 23:28:52 +00:00
Edward Z. Yang
59605d592b Classname() constructors to __construct() constructors, as per SimpleTest. Also normalized ppp declarations; no public declaration for test methods, public/protected for the rest
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1663 48356398-32a2-884e-a903-53898d9a118a
2008-04-21 15:24:18 +00:00
Edward Z. Yang
5dbd455afb Add nice error message
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1662 48356398-32a2-884e-a903-53898d9a118a
2008-04-15 03:59:36 +00:00
Edward Z. Yang
64d2da72f8 Make flush work with non-standard PHP binaries
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1661 48356398-32a2-884e-a903-53898d9a118a
2008-04-15 03:51:21 +00:00
Edward Z. Yang
5489537b4b Make flush use its input executable
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1660 48356398-32a2-884e-a903-53898d9a118a
2008-04-15 03:35:59 +00:00
Edward Z. Yang
a4181a80d7 Various fixes:
- Resync standalone with includes
- Fix error queue code
- Fix heisenbug with flush and certain definition error messages
- Fix premature $end in ini files
- Make $php config variable actually do something

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1659 48356398-32a2-884e-a903-53898d9a118a
2008-04-15 03:33:09 +00:00
Edward Z. Yang
a1ea149105 Fix reporter's custom CSS
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1658 48356398-32a2-884e-a903-53898d9a118a
2008-04-14 19:19:48 +00:00
Edward Z. Yang
3b5effcb72 - Removed tally(), swallowErrors(), assertNoErrors()
- Removed unnecessary ampersands

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1657 48356398-32a2-884e-a903-53898d9a118a
2008-04-14 03:06:36 +00:00
Edward Z. Yang
d1ace6af17 More documentation updates.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1656 48356398-32a2-884e-a903-53898d9a118a
2008-04-10 02:56:46 +00:00
Edward Z. Yang
a391dfe1de Update docs for SimpleTest and PHPT, also update TODO.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1655 48356398-32a2-884e-a903-53898d9a118a
2008-04-09 02:00:42 +00:00
Edward Z. Yang
119c70fc05 Remove some vestigial SimpleTest code, fix some tests, also reload the includes.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1654 48356398-32a2-884e-a903-53898d9a118a
2008-04-09 01:56:19 +00:00
Edward Z. Yang
b997076dfa Automatically disable PHPT if version of PHP isn't supported.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1653 48356398-32a2-884e-a903-53898d9a118a
2008-04-08 19:25:56 +00:00
Edward Z. Yang
04a6b1d3f2 Update broken test, also update SimpleTest loading code to trunk.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1652 48356398-32a2-884e-a903-53898d9a118a
2008-04-08 04:52:22 +00:00
Edward Z. Yang
27ba8f2192 [3.1.0] Document Config Schema, also, fix bug with null defaults
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1651 48356398-32a2-884e-a903-53898d9a118a
2008-04-05 18:37:08 +00:00
Edward Z. Yang
9f1e678b48 [3.1.0] Fixed fatal error in PH5P lexer with invalid tag names
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1650 48356398-32a2-884e-a903-53898d9a118a
2008-04-05 04:28:37 +00:00
Edward Z. Yang
c216968087 Miscellaneous documentation updates.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1649 48356398-32a2-884e-a903-53898d9a118a
2008-04-04 22:15:14 +00:00
Edward Z. Yang
d467af6c4b [3.1.0] Feature-parity achieved for validator! Implement alias checking.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1648 48356398-32a2-884e-a903-53898d9a118a
2008-04-04 22:04:10 +00:00
Edward Z. Yang
0ee090bc7b [3.1.0] Continue building up validation functions
- Remove incorrect parsing of value aliases
- Implement most allowed and value alias checks
- Add assertIsBool, assertIsArray and assertIsLookup to ValidatorAtom
- Publish string types in VarParser

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1647 48356398-32a2-884e-a903-53898d9a118a
2008-04-04 21:33:37 +00:00
Edward Z. Yang
6b21a841c4 Fix bug with safe-includes generation
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1646 48356398-32a2-884e-a903-53898d9a118a
2008-04-04 17:47:52 +00:00
Edward Z. Yang
08bdeb2ac2 [3.1.0] Add HTMLPurifier.safe-includes.php loader stub.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1645 48356398-32a2-884e-a903-53898d9a118a
2008-04-04 17:44:42 +00:00
Edward Z. Yang
9676e8580e [3.1.0] Improve maintenance script documentation
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1644 48356398-32a2-884e-a903-53898d9a118a
2008-04-03 22:39:50 +00:00
Edward Z. Yang
dac98cdb06 [3.1.0] Emit notice if setting configuration alias, and fix up our test code not to use aliases
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1643 48356398-32a2-884e-a903-53898d9a118a
2008-04-03 21:53:06 +00:00
Edward Z. Yang
870a4029ec Minor doc updates
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1642 48356398-32a2-884e-a903-53898d9a118a
2008-04-03 21:40:37 +00:00
Edward Z. Yang
e78df4dc9f [3.1.0] When flush fails, fail SimpleTest
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1641 48356398-32a2-884e-a903-53898d9a118a
2008-04-03 21:24:16 +00:00
Edward Z. Yang
1d25be875d [3.1.0] Maintenance scripts emit and honor proper exit codes
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1640 48356398-32a2-884e-a903-53898d9a118a
2008-04-03 20:52:08 +00:00
Edward Z. Yang
0e51e42197 Update sample test-settings file to be more user-friendly.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1638 48356398-32a2-884e-a903-53898d9a118a
2008-04-01 17:35:01 +00:00
Edward Z. Yang
51cbb72649 [3.1.0] Landed modified patch by Braden Anderson for %CSS.AllowedProperties
- Fix broken ConfigSchema build, as well as broken aliases
- Remove another advisory property from runtime ConfigSchema classes
- Reorder flush script to more accurately reflect dependencies
- Remove some aliases from unit tests

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1635 48356398-32a2-884e-a903-53898d9a118a
2008-03-30 21:44:16 +00:00
Edward Z. Yang
9f2f6c3166 [3.1.0] Fix bug with addAttribute when called multiple times on the same element
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1634 48356398-32a2-884e-a903-53898d9a118a
2008-03-26 04:31:04 +00:00
Edward Z. Yang
48311b3b02 Update TODO with a few bugs that need to be squished.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1633 48356398-32a2-884e-a903-53898d9a118a
2008-03-26 04:11:38 +00:00
Edward Z. Yang
880a5b9dae Remove incorrect script path.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1632 48356398-32a2-884e-a903-53898d9a118a
2008-03-24 19:38:56 +00:00
Edward Z. Yang
043099549c Tick off a few TODO items.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1631 48356398-32a2-884e-a903-53898d9a118a
2008-03-23 02:55:00 +00:00
Edward Z. Yang
0c051df108 Fix broken --only-phpt tests
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1630 48356398-32a2-884e-a903-53898d9a118a
2008-03-23 02:51:50 +00:00
Edward Z. Yang
7e59923029 Fix PHP 5.0 and other early version compatibility by removing use of __toString
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1629 48356398-32a2-884e-a903-53898d9a118a
2008-03-23 02:50:42 +00:00
Edward Z. Yang
6d517fab09 svn:eol-style=native for *.vtest
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1628 48356398-32a2-884e-a903-53898d9a118a
2008-03-23 02:40:27 +00:00
Edward Z. Yang
77302f845f [3.1.0] Implemented redundant validators and tests
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1627 48356398-32a2-884e-a903-53898d9a118a
2008-03-23 02:35:47 +00:00
Edward Z. Yang
82c9a737f4 [3.1.0] Implement more validators, add in missing DEFAULTs for many tests.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1626 48356398-32a2-884e-a903-53898d9a118a
2008-03-23 01:29:57 +00:00
Edward Z. Yang
aedfbd1e93 [3.1.0] Define *.vtest test hierarchy, and continue work on validator.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1625 48356398-32a2-884e-a903-53898d9a118a
2008-03-23 01:06:35 +00:00
Edward Z. Yang
848795d4a0 [3.1.0] Add multi-parse capability for StringHash
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1624 48356398-32a2-884e-a903-53898d9a118a
2008-03-23 00:02:37 +00:00
Edward Z. Yang
b8f00ace1a [3.1.0]
- Add tests for the atoms.
- Add Id validation for Directives

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1623 48356398-32a2-884e-a903-53898d9a118a
2008-03-22 21:06:55 +00:00
Edward Z. Yang
34ba0e408f [3.1.0] Initial validator implementation for namespaces.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1622 48356398-32a2-884e-a903-53898d9a118a
2008-03-22 20:26:04 +00:00
Edward Z. Yang
56cfcba5d1 [3.1.0] Make StringHash system-agnostic.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1621 48356398-32a2-884e-a903-53898d9a118a
2008-03-22 19:30:37 +00:00
Edward Z. Yang
ec59062a9d [3.1.0] De-crudify the ConfigSchema space; we're starting over again
- Optimize ConfigSchema by removing non-essential runtime data. We can probably optimize even more by collapsing object structures to arrays.
- Removed validation data from ConfigSchema; this will be reimplemented on Interchange
- Implement a sane Interchange composite hierarchy that doesn't use arrays
- Implement StringHash -> Interchange -> ConfigSchema, and rewrite maintenance file to account for this

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1615 48356398-32a2-884e-a903-53898d9a118a
2008-03-22 03:55:59 +00:00
Edward Z. Yang
93babf0a88 [3.1.0] Fix bug with rgb() w/ spaces inside shorthand CSS properties
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1612 48356398-32a2-884e-a903-53898d9a118a
2008-03-16 19:14:39 +00:00
Edward Z. Yang
42d2858c9d [3.1.0] Experimental kses support.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1610 48356398-32a2-884e-a903-53898d9a118a
2008-03-13 05:35:57 +00:00
Edward Z. Yang
c0dd6944a3 Implement If validator.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1609 48356398-32a2-884e-a903-53898d9a118a
2008-03-05 06:04:08 +00:00
Edward Z. Yang
e83573a3ad Implement ParseDefault.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1608 48356398-32a2-884e-a903-53898d9a118a
2008-03-05 05:49:18 +00:00
Edward Z. Yang
b65942a2c5 Implement "Or" composite validator.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1607 48356398-32a2-884e-a903-53898d9a118a
2008-03-05 05:38:28 +00:00
Edward Z. Yang
e4ab6d584e Implement composite validator, and make Interchange use that.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1606 48356398-32a2-884e-a903-53898d9a118a
2008-03-05 05:20:28 +00:00
Edward Z. Yang
e21e9b23ad [3.1.0] float -> int
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1605 48356398-32a2-884e-a903-53898d9a118a
2008-03-05 05:07:12 +00:00
Edward Z. Yang
6cdcc8b8e1 Implement native VarParser.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1604 48356398-32a2-884e-a903-53898d9a118a
2008-03-05 05:03:01 +00:00
Edward Z. Yang
ff60e09780 Refactor VarParser and VarParser_Flexible to use template method, factoring out common functionality.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1603 48356398-32a2-884e-a903-53898d9a118a
2008-03-05 04:41:45 +00:00
Edward Z. Yang
bd64a8346d Reorganize VarParser; there may be multiple implementations.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1602 48356398-32a2-884e-a903-53898d9a118a
2008-03-05 03:51:09 +00:00
Edward Z. Yang
7480e7b956 [3.1.0] Split out VarParser from ConfigSchema
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1601 48356398-32a2-884e-a903-53898d9a118a
2008-03-04 15:06:00 +00:00
Edward Z. Yang
b9eb44bf03 Add ParseType validator.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1600 48356398-32a2-884e-a903-53898d9a118a
2008-03-04 14:33:38 +00:00
Edward Z. Yang
c0b5bc3eea [3.1.0] Implement NamespaceExists and ParseId
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1599 48356398-32a2-884e-a903-53898d9a118a
2008-03-04 05:21:04 +00:00
Edward Z. Yang
14437cbf47 - Rename Duplicate to Unique, as the name of validator indicates what we want the input to be
- Enable flush to work when includes are renamed

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1597 48356398-32a2-884e-a903-53898d9a118a
2008-03-04 04:20:55 +00:00
Edward Z. Yang
4c798bd17e [3.1.0] Implement Duplicate validator, also modify some design things
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1596 48356398-32a2-884e-a903-53898d9a118a
2008-03-04 04:13:07 +00:00
Edward Z. Yang
1b434f0ecc Update index.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1595 48356398-32a2-884e-a903-53898d9a118a
2008-03-04 02:02:02 +00:00
Edward Z. Yang
fab5a9b3e1 Add Sander Tekelenburg's CSS extraction proposal.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1594 48356398-32a2-884e-a903-53898d9a118a
2008-03-04 01:58:33 +00:00
Edward Z. Yang
d8cb360f3b Refactor validators so that they can be reused between directives and namespaces.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1589 48356398-32a2-884e-a903-53898d9a118a
2008-03-02 04:39:14 +00:00
Edward Z. Yang
18320d59a4 Fix some backwards compatibility issues in FSTools::globr.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1588 48356398-32a2-884e-a903-53898d9a118a
2008-03-02 04:07:17 +00:00
Edward Z. Yang
0d9c05d13c [3.1.0] Create decorator validator/adapter for Interchange.
- Output flush output

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1587 48356398-32a2-884e-a903-53898d9a118a
2008-03-02 04:00:43 +00:00
Edward Z. Yang
d81bcbd208 Remove decorator pattern from validator; we'll only have one decorator which invokes the subsystem.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1586 48356398-32a2-884e-a903-53898d9a118a
2008-03-02 02:57:31 +00:00
Edward Z. Yang
fa6b6fe85f [3.1.0] Reconfigure tester to glob for test files using *Test.php pattern.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1585 48356398-32a2-884e-a903-53898d9a118a
2008-03-02 02:05:47 +00:00
Edward Z. Yang
8bda0c4dfb [3.1.0] Refactor out validation framework for Interchange
- Implement IdExists validator

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1584 48356398-32a2-884e-a903-53898d9a118a
2008-03-02 01:55:14 +00:00
Edward Z. Yang
d682a59a68 Make magic quotes fix play more nicely with PHP 5.3, but it's now untested.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1583 48356398-32a2-884e-a903-53898d9a118a
2008-03-01 18:09:52 +00:00
Edward Z. Yang
240b565513 [3.1.0] Implement ConfigSchema interchange
- Implement exception hierarchy

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1582 48356398-32a2-884e-a903-53898d9a118a
2008-03-01 17:06:23 +00:00
Edward Z. Yang
c521e3a534 Fix fatal error with empty migrate.php file
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1581 48356398-32a2-884e-a903-53898d9a118a
2008-02-28 05:35:45 +00:00
Edward Z. Yang
d765628d24 Minor cosmetic changes.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1580 48356398-32a2-884e-a903-53898d9a118a
2008-02-26 01:45:03 +00:00
Edward Z. Yang
2cc535ad84 [3.1.0] Support for display/visibility CSS with %CSS.AllowTricky
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1579 48356398-32a2-884e-a903-53898d9a118a
2008-02-25 22:05:49 +00:00
Edward Z. Yang
30eb982961 [3.1.0] Add support for !important, with %CSS.AllowImportant
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1578 48356398-32a2-884e-a903-53898d9a118a
2008-02-25 21:58:17 +00:00
Edward Z. Yang
a2d044f58d Reorganize configdoc, but it's still broken.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1577 48356398-32a2-884e-a903-53898d9a118a
2008-02-25 21:21:12 +00:00
Edward Z. Yang
002fe649f7 [3.1.0] Move ConfigSchema to HTMLPurifier core
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1576 48356398-32a2-884e-a903-53898d9a118a
2008-02-24 06:19:28 +00:00
Edward Z. Yang
d3c04de9dc [3.1.0] Move schema files to schema/ directory, this is in preparation for mass-class rename
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1575 48356398-32a2-884e-a903-53898d9a118a
2008-02-24 05:59:57 +00:00
Edward Z. Yang
e4ce3362a5 Some maintenance script cleanup
- Create super-script flush.php
- Rename old scripts with old- prefix.

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1574 48356398-32a2-884e-a903-53898d9a118a
2008-02-24 05:06:39 +00:00
Edward Z. Yang
cb793cd9b9 - Restore substr_count compatibility method; it's not just PHP 4
- Update missing includes
- Fix generate-standalone.php fatal error
- Make LexerTest resilient against variant versions of libxml

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1573 48356398-32a2-884e-a903-53898d9a118a
2008-02-20 01:28:19 +00:00
Edward Z. Yang
b5f1c76ee8 [3.1.0] Implement Proprietary HTML module with <marquee>
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1572 48356398-32a2-884e-a903-53898d9a118a
2008-02-20 00:53:09 +00:00
Edward Z. Yang
f2863557f5 [3.1.0] Make static ConfigSchema methods throw errors.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1571 48356398-32a2-884e-a903-53898d9a118a
2008-02-20 00:37:08 +00:00
Edward Z. Yang
6c9c8f2380 [3.1.0] [BACKPORT] Fix bug with comments in styles, and some associated issues
- Restore printTokens()

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1570 48356398-32a2-884e-a903-53898d9a118a
2008-02-20 00:15:44 +00:00
Edward Z. Yang
fbc595ebed Remove includes from unit tests.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1569 48356398-32a2-884e-a903-53898d9a118a
2008-02-18 04:41:42 +00:00
Edward Z. Yang
6651941e3f Rename merge-library to generate-standalone.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1568 48356398-32a2-884e-a903-53898d9a118a
2008-02-18 04:02:12 +00:00
Edward Z. Yang
f5a77c523b Make PHPT globally configurable via test-settings.php
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1567 48356398-32a2-884e-a903-53898d9a118a
2008-02-18 03:48:07 +00:00
Edward Z. Yang
ed2aa44bd2 - Make suite flush remote tests
- Make CliTestCase report fails when XML cannot be parsed

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1566 48356398-32a2-884e-a903-53898d9a118a
2008-02-18 03:35:27 +00:00
Edward Z. Yang
21cf2c94d4 - Gracefully error out of obscure bug involving non-static autoloaders
- Test multiple autoloaders
- Remove local URL test; doesn't work for other people

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1565 48356398-32a2-884e-a903-53898d9a118a
2008-02-18 02:33:50 +00:00
Edward Z. Yang
a4abc45505 All PHPT tests for now complete! Fix an SPL autoload bug.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1564 48356398-32a2-884e-a903-53898d9a118a
2008-02-18 01:11:17 +00:00
Edward Z. Yang
969a027a5b Rename to spl-autoload to more descriptive spl-autoload-and-user-autoload
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1563 48356398-32a2-884e-a903-53898d9a118a
2008-02-18 00:55:10 +00:00
Edward Z. Yang
e3fdda1f3c - Bulk up loading PHPT tests.
- Fix documentation error with regards to standalone path behavior

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1562 48356398-32a2-884e-a903-53898d9a118a
2008-02-18 00:49:45 +00:00
Edward Z. Yang
40d3d5b961 [3.1.0] Fix broken autoloader, resulting in duplicate classes
- Factor out phpt directory parsing
- More phpt tests

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1561 48356398-32a2-884e-a903-53898d9a118a
2008-02-17 23:59:23 +00:00
Edward Z. Yang
a3b6a15595 Expand flush capabilities of unit tester
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1560 48356398-32a2-884e-a903-53898d9a118a
2008-02-17 18:28:35 +00:00
Edward Z. Yang
4c24a51054 Numerous documentation and test code fixes for HTML Purifier loading
- Improve documentation for stub files
- Synchronize stub files between extras/ and library/
- Remove unnecessary include in function file
- Remove special treatment of Bootstrap
- Improve docs for HTMLPurifier, converted singleton to use static member variables and removed reference
- Add HTMLPurifier.path.php stub file
- Update sample test settings
- Reorganize includes in test files

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1559 48356398-32a2-884e-a903-53898d9a118a
2008-02-17 18:21:45 +00:00
Edward Z. Yang
f4c4354ae4 Make --disable-phpt and --only-phpt flags work with multitest.php
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1558 48356398-32a2-884e-a903-53898d9a118a
2008-02-17 01:12:30 +00:00
Edward Z. Yang
51da183f80 - Enable blank params to mean true for booleans again
- Make reporter send val=1 anyway
- Add htmlpurifier.org remote test

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1557 48356398-32a2-884e-a903-53898d9a118a
2008-02-17 01:07:17 +00:00
Edward Z. Yang
212958a9c7 - Make phpt files svn:eol-style=native
- Add missing NEWS entry

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1556 48356398-32a2-884e-a903-53898d9a118a
2008-02-17 00:19:27 +00:00
Edward Z. Yang
9f851b2139 - Rewrite command line parsing code to support --opt val
- Add $php settings variable

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1555 48356398-32a2-884e-a903-53898d9a118a
2008-02-16 23:23:45 +00:00
Edward Z. Yang
8c439aa62c [3.1.0] multitest.php now works again for all version, however, some hacks to PHPT are required
- Bootstrap hotfix to prevent multiple loading in standalone. We need a better way of doing this!
- Make extras/ autoloader polite too
- Initialize autoloaders in common.php
- Add trailing dash to "skip" in phpt
- Upgrade SimpleTest.php not to allow directories

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1554 48356398-32a2-884e-a903-53898d9a118a
2008-02-16 18:03:51 +00:00
Edward Z. Yang
5c0a1d467a [3.1.0] [BACKPORT] Fix bug with trusted script handling for versions of libxml 2.6.28 or later
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1553 48356398-32a2-884e-a903-53898d9a118a
2008-02-16 05:44:14 +00:00
Edward Z. Yang
929d932234 [3.1.0] Implement a few phpt, fix some autoload bugs
- Make our autoload handler polite, ensuring that any __autoload() 
functions get added
- Modify phpt calling code so that each phpt files gets its own 
test-case (this lets us run one phpt file at a time)
- Implement phpt for loading, which test varying loading methods of HTML 
Purifier
- Add --disable-phpt and --only-phpt flags
- More descriptive veto messages, also fix test count



git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1552 48356398-32a2-884e-a903-53898d9a118a
2008-02-16 05:40:59 +00:00
Edward Z. Yang
3441421e8b [3.1.0] Land PHPT integration (currently required; we'll relax this later)
- Port ConfigSchema and Config tests to new syntax
- Deprecate Debugger
- Add $schema params to most Config convenience functions
- Add --php flag to testing scripts for command line
- NOT TESTED WITH MULTITEST.PHP!


git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1551 48356398-32a2-884e-a903-53898d9a118a
2008-02-16 00:40:30 +00:00
Edward Z. Yang
e28d39e46b Organize TODO into sectors.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1550 48356398-32a2-884e-a903-53898d9a118a
2008-02-11 02:21:35 +00:00
Edward Z. Yang
bf6de96bd0 - More TODO items
- Update comments

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1549 48356398-32a2-884e-a903-53898d9a118a
2008-02-11 00:27:35 +00:00
Edward Z. Yang
65d0e1fdfe [3.1.0] HTMLPURIFIER_PREFIX can be defined outside of HTML Purifier
- Update TODO

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1548 48356398-32a2-884e-a903-53898d9a118a
2008-02-11 00:15:04 +00:00
Edward Z. Yang
d228d66785 - Update TODO
- Fix bug in merge-library.php

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1547 48356398-32a2-884e-a903-53898d9a118a
2008-02-10 23:05:28 +00:00
Edward Z. Yang
e9c22df148 Remove unnecessary includes: includes now works.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1546 48356398-32a2-884e-a903-53898d9a118a
2008-02-10 22:51:33 +00:00
Edward Z. Yang
de6e024464 Regenerate HTMLPurifier.includes.php, and fix parse errors.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1545 48356398-32a2-884e-a903-53898d9a118a
2008-02-10 22:47:08 +00:00
Edward Z. Yang
66229121bf Post-update changelog for hotfixes r1542 and r1543.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1544 48356398-32a2-884e-a903-53898d9a118a
2008-02-10 22:41:49 +00:00
Edward Z. Yang
0eadf98ee2 Split out tokens to prevent autoload barfing.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1543 48356398-32a2-884e-a903-53898d9a118a
2008-02-10 22:38:53 +00:00
Edward Z. Yang
6eb193a316 Prefix needs to be one-up.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1542 48356398-32a2-884e-a903-53898d9a118a
2008-02-10 22:30:32 +00:00
Edward Z. Yang
8165adb6c8 Move constant to Bootstrap to prevent autoload problems.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1541 48356398-32a2-884e-a903-53898d9a118a
2008-02-10 22:29:01 +00:00
Edward Z. Yang
37b24b6732 [3.1.0] Further cleanup, making standalone work again
- Remove includes call in HTMLPurifier.auto.php
- Relax ConfigSchema treatment in generate-includes.php
- Clean up some empty comments (there are probably more)
- De-indent some extends
- class_exists() should now attempt to use autoload
- schema.ser is now a standalone file
- tests/index.php can be run from any directory

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1540 48356398-32a2-884e-a903-53898d9a118a
2008-02-10 21:34:52 +00:00
Edward Z. Yang
35f8b3c801 Transition is complete! Cleanup and class rearrangement now necessary.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1539 48356398-32a2-884e-a903-53898d9a118a
2008-02-10 20:34:39 +00:00
Edward Z. Yang
c7e115c81c Add /null identifiers to schema definitions
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1538 48356398-32a2-884e-a903-53898d9a118a
2008-02-10 19:47:09 +00:00
Edward Z. Yang
4c502b25f2 Exported configuration values!
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1537 48356398-32a2-884e-a903-53898d9a118a
2008-02-10 02:44:44 +00:00
Edward Z. Yang
3d56c1253b Add version extraction functionality
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1536 48356398-32a2-884e-a903-53898d9a118a
2008-02-10 01:33:40 +00:00
Edward Z. Yang
d00cb1e64d Add API for version extraction.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1535 48356398-32a2-884e-a903-53898d9a118a
2008-02-10 00:49:31 +00:00
Edward Z. Yang
b6c9dcefd7 Implement schema extraction script; almost done except for version extraction. Also, some minor refinements.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1534 48356398-32a2-884e-a903-53898d9a118a
2008-02-07 19:29:08 +00:00
Edward Z. Yang
14ef0b75e5 Make StringHash compatible for all versions of PHP 5.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1533 48356398-32a2-884e-a903-53898d9a118a
2008-02-07 18:02:18 +00:00
Edward Z. Yang
3ba42106ba Implement ReverseAdapter. Also, debug some order of execution things in the Adapter.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1532 48356398-32a2-884e-a903-53898d9a118a
2008-02-07 17:41:40 +00:00
Edward Z. Yang
dea117032f Complete Adapter implementation, also unit-test keys with dashes in them in parser.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1531 48356398-32a2-884e-a903-53898d9a118a
2008-02-05 21:04:00 +00:00
Edward Z. Yang
b8a46821f3 Add README for the extras/ folder.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1530 48356398-32a2-884e-a903-53898d9a118a
2008-02-05 02:26:46 +00:00
Edward Z. Yang
2135597553 - Implement StringHash wrapper ArrayObject
- Implement namespace parsing for StringHashAdapter, as well as rudimentary error-checking

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1529 48356398-32a2-884e-a903-53898d9a118a
2008-02-05 01:54:20 +00:00
Edward Z. Yang
d238b1a9ed Temporary modification to auto until we get pure autoload working.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1528 48356398-32a2-884e-a903-53898d9a118a
2008-02-04 21:57:55 +00:00
Edward Z. Yang
c5e1e1711d Update HTMLPurifier.includes.php as per r1526.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1527 48356398-32a2-884e-a903-53898d9a118a
2008-01-27 21:25:01 +00:00
Edward Z. Yang
f2e42d1d3e Make bootstrap a require_once... for now.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1526 48356398-32a2-884e-a903-53898d9a118a
2008-01-27 21:21:18 +00:00
Edward Z. Yang
a74a590f1c [3.1.0] Remove multi-description functionality as well as file detection.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1525 48356398-32a2-884e-a903-53898d9a118a
2008-01-27 20:21:31 +00:00
Edward Z. Yang
3baf1774b2 [3.1.0] Initial commit of adapter functionality; not complete.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1524 48356398-32a2-884e-a903-53898d9a118a
2008-01-27 19:59:12 +00:00
Edward Z. Yang
41e3c091e1 [3.1.0] Give ConfigSchema non-static function equivalents
- Add todo to StringHashParser

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1523 48356398-32a2-884e-a903-53898d9a118a
2008-01-27 19:41:40 +00:00
Edward Z. Yang
5a6021599a [3.1.0] Add StringHashParser.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1522 48356398-32a2-884e-a903-53898d9a118a
2008-01-27 18:50:36 +00:00
Edward Z. Yang
40d7a296b5 Make directory recursive.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1521 48356398-32a2-884e-a903-53898d9a118a
2008-01-27 17:59:21 +00:00
Edward Z. Yang
fd81fbac82 Add complete handle support.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1519 48356398-32a2-884e-a903-53898d9a118a
2008-01-27 05:53:13 +00:00
Edward Z. Yang
2598b26778 Initial commit for extra class hierarchies FSTools and ConfigSchema.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1518 48356398-32a2-884e-a903-53898d9a118a
2008-01-27 05:31:06 +00:00
Edward Z. Yang
4fe6661c64 Update INSTALL file with new setup.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1517 48356398-32a2-884e-a903-53898d9a118a
2008-01-27 02:08:42 +00:00
Edward Z. Yang
522c8ed7c2 [3.1.0] The bulk of autoload support added
- Add FSTools:globr()
- require_once removed from all files
- HTMLPurifier.autoload.php added to register autoload handler
- Removed redundant chdir in maintenance script
- Modified standalone to use HTMLPurifier.includes.php for including stuff
- Added maintenance script remove-require-once.php which we used once and should never use again

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1516 48356398-32a2-884e-a903-53898d9a118a
2008-01-27 01:54:41 +00:00
Edward Z. Yang
81a4cf6a14 [3.1.0] Preparation for autoload
- Factor out getPath from autoload implementation to its own function
- Ensure extends is on the same line as class for all files

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1515 48356398-32a2-884e-a903-53898d9a118a
2008-01-27 00:06:10 +00:00
Edward Z. Yang
25551c4b78 Support dry runs in SimpleTest, as well as misc other improvements.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1514 48356398-32a2-884e-a903-53898d9a118a
2008-01-21 20:27:26 +00:00
Edward Z. Yang
c9bf2e8489 Add support for dry runs.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1513 48356398-32a2-884e-a903-53898d9a118a
2008-01-21 19:27:55 +00:00
Edward Z. Yang
07ca96a132 Make test_files not fatally error if CSSTidy is not available.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1512 48356398-32a2-884e-a903-53898d9a118a
2008-01-21 19:02:40 +00:00
Edward Z. Yang
43a5ef3cc6 Add support for autoload. We're not, however, using it by default.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1511 48356398-32a2-884e-a903-53898d9a118a
2008-01-21 18:43:59 +00:00
Edward Z. Yang
ff72b2d012 Minor typo/grammar fixes.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1510 48356398-32a2-884e-a903-53898d9a118a
2008-01-20 04:17:32 +00:00
Edward Z. Yang
5eee08c548 [3.1.0] Convert tokens to use instanceof, reducing memory footprint and improving comparison speed.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1509 48356398-32a2-884e-a903-53898d9a118a
2008-01-19 20:23:01 +00:00
Edward Z. Yang
dd8ef4d3f5 Fix double-encoded quotes.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1508 48356398-32a2-884e-a903-53898d9a118a
2008-01-18 07:04:30 +00:00
Edward Z. Yang
c43c0660f5 Update dev-includes.txt with our evil master plan.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1506 48356398-32a2-884e-a903-53898d9a118a
2008-01-15 03:05:43 +00:00
Edward Z. Yang
c85fd83d2b Add development documentation for handling the include problem.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1505 48356398-32a2-884e-a903-53898d9a118a
2008-01-14 22:19:44 +00:00
Edward Z. Yang
c23b9da2cd Update NEWS/TODO with slight entries
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1503 48356398-32a2-884e-a903-53898d9a118a
2008-01-13 05:40:53 +00:00
Edward Z. Yang
aca282104f More updates for ver 3.0.0
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1502 48356398-32a2-884e-a903-53898d9a118a
2008-01-13 05:28:39 +00:00
Edward Z. Yang
a8f7cddd49 Change help message to div for new theme.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1501 48356398-32a2-884e-a903-53898d9a118a
2008-01-13 04:20:54 +00:00
Edward Z. Yang
8a17b1fbc3 Make module compatible with Phorum 5.2.6. Changes:
- Modify signature/edit message handling to account for <phorum break>
- Update line numbers
- Update edit message fragile code
- Prevent message blanking when signature or edit message is empty
- Armor source for quote
- Update bbcode function call
- Use phorum_db_interact for our custom call

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1499 48356398-32a2-884e-a903-53898d9a118a
2008-01-13 01:58:34 +00:00
Edward Z. Yang
57f897661e [3.1.0] [BACKPORT] Fix <span><span><div> by sending back to the front of the loop for reprocessing.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1492 48356398-32a2-884e-a903-53898d9a118a
2008-01-10 21:40:41 +00:00
Edward Z. Yang
4e32902c63 Update release scripts, also, remove errant space from VERSION.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1491 48356398-32a2-884e-a903-53898d9a118a
2008-01-08 01:20:12 +00:00
Edward Z. Yang
02658df8b2 Release 3.0.0.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1483 48356398-32a2-884e-a903-53898d9a118a
2008-01-07 02:54:16 +00:00
Edward Z. Yang
be7c1e7a8f [3.0.0] Upgraded test scripts and other goodies. Also removed some PHP4 cruft.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1482 48356398-32a2-884e-a903-53898d9a118a
2008-01-07 00:17:49 +00:00
Edward Z. Yang
562f53b54c Factor out includes to a "common.php" file, just to tidy up index.php
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1481 48356398-32a2-884e-a903-53898d9a118a
2008-01-06 19:31:21 +00:00
Edward Z. Yang
38a59ef5b8 Completely remove style if naughty selector is found. This is for compatibility reasons until Tidy 1.4 is released.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1480 48356398-32a2-884e-a903-53898d9a118a
2008-01-06 05:36:48 +00:00
Edward Z. Yang
8779b46fc4 [3.0.0] Add global scoping support for ExtractStyleBlocks; scoped="" attribute bumped off for some 'other' time.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1478 48356398-32a2-884e-a903-53898d9a118a
2008-01-05 19:19:55 +00:00
Edward Z. Yang
a7fab00cdd [3.0.0] Convert all $context calls away from references
- Update TODO list
- URISchemeRegistry doesn't return a reference for instance anymore, should do the same for other singletons

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1477 48356398-32a2-884e-a903-53898d9a118a
2008-01-05 00:10:43 +00:00
Edward Z. Yang
beefb11879 Update release script to compensate for new variable format
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1476 48356398-32a2-884e-a903-53898d9a118a
2008-01-04 23:54:12 +00:00
Edward Z. Yang
ae1c8f47cc [3.0.0] Allow filter:none for proprietary filter property
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1474 48356398-32a2-884e-a903-53898d9a118a
2007-12-17 22:12:11 +00:00
Edward Z. Yang
0f961c6af4 [3.0.0] [BACKPORT] More work for hire from Chris
! Experimental support for some proprietary CSS attributes allowed: opacity (and all of the browser-specific equivalents) and scrollbar colors. Enable by setting %CSS.Proprietary to true.
- Colors missing # but in hex form will be corrected
- CSS Number algorithm improved
. New classes:
  + HTMLPurifier_AttrDef_CSS_AlphaValue
  + HTMLPurifier_AttrDef_CSS_Filter

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1473 48356398-32a2-884e-a903-53898d9a118a
2007-12-16 23:16:45 +00:00
Edward Z. Yang
a840c24796 Update packaging script to indicate PHP 5 change and HTMLPurifier3
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1472 48356398-32a2-884e-a903-53898d9a118a
2007-12-14 04:12:06 +00:00
Edward Z. Yang
66c0407bef Add extractStyleBlocks.php smoketest, also add some neat new functionality to TODO.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1471 48356398-32a2-884e-a903-53898d9a118a
2007-12-14 01:12:40 +00:00
Edward Z. Yang
5b3431d889 [3.0.0] Fully implement CSS extraction and cleaning. See NEWS for more information, it is now a Filter.
- Some Lexer things were moved around

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1469 48356398-32a2-884e-a903-53898d9a118a
2007-12-12 21:46:30 +00:00
Edward Z. Yang
831f552ec5 [3.0.0] <style> tags can now be extracted from input HTML using %HTML.ExtractStyleBlocks. These contents can be retrieved from $context->get('StyleBlocks');
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1468 48356398-32a2-884e-a903-53898d9a118a
2007-12-12 03:29:12 +00:00
Edward Z. Yang
54b37674f1 Add some more neat features I'd like to see sometime.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1466 48356398-32a2-884e-a903-53898d9a118a
2007-12-09 22:14:15 +00:00
Edward Z. Yang
62f3fd894d [3.0.0] [BACKPORT]
- Add register() for DefinitionCacheFactory so custom implementations can be specified
- Add docs for flush() regarding when this functionality is impossible to do
- Refactor unit tests in DefinitionCacheFactoryTest.php

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1464 48356398-32a2-884e-a903-53898d9a118a
2007-12-09 03:14:34 +00:00
Edward Z. Yang
b5546ff6f0 [3.0.0]
+ PHP4 reference/foreach cruft in Injector removed
. Unit tests for Injector improved
. Some todo stuff updated

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1462 48356398-32a2-884e-a903-53898d9a118a
2007-12-05 01:26:28 +00:00
Edward Z. Yang
7ddd9d0afe [3.0.0] [BACKPORT] Make CSS properties case-insensitive
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1461 48356398-32a2-884e-a903-53898d9a118a
2007-12-01 17:00:55 +00:00
Edward Z. Yang
620ab75906 Fix constructor I missed in ConfigForm.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1460 48356398-32a2-884e-a903-53898d9a118a
2007-11-29 22:06:21 +00:00
Edward Z. Yang
3ef9bdf8a2 __construct'ify all main library classes.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1459 48356398-32a2-884e-a903-53898d9a118a
2007-11-29 04:29:51 +00:00
Edward Z. Yang
43f01925cd Convert to PHP 5 only codebase, adding visibility modifiers to all members and methods in the main library area (function only for test methods)
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1458 48356398-32a2-884e-a903-53898d9a118a
2007-11-25 02:24:39 +00:00
Edward Z. Yang
85a23bacb6 Remove old profiling script, improve original two so they work, and are more efficient
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1456 48356398-32a2-884e-a903-53898d9a118a
2007-11-23 21:59:04 +00:00
Edward Z. Yang
4066416160 Slight clarification of where ElementDef's required_attr property gets populated
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1454 48356398-32a2-884e-a903-53898d9a118a
2007-11-13 02:49:47 +00:00
Edward Z. Yang
fad6aa45fa Make phpdoc more efficient, ignore the conf directory
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1450 48356398-32a2-884e-a903-53898d9a118a
2007-11-06 17:50:30 +00:00
Edward Z. Yang
a7e6d85f6d Update PEAR packager
- Ignore standalone directories
- Normalize base directory with realpath

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1447 48356398-32a2-884e-a903-53898d9a118a
2007-11-06 16:37:25 +00:00
Edward Z. Yang
c330860606 Release 2.1.3.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1443 48356398-32a2-884e-a903-53898d9a118a
2007-11-06 03:39:59 +00:00
Edward Z. Yang
0ea53e5a3d Make multitest.php also manage standalone version testing.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1442 48356398-32a2-884e-a903-53898d9a118a
2007-11-06 03:34:45 +00:00
Edward Z. Yang
68167176dc [2.1.3]
- Officially support 4.3.7 and up
- Modify PH5P to remove incompatible parameter type def
- Add more versions to multitest

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1441 48356398-32a2-884e-a903-53898d9a118a
2007-11-05 05:25:59 +00:00
Edward Z. Yang
bb08f679f0 [2.1.3]
- Work around unnecessary DOMElement type-cast in PH5P that caused errors in PHP 5.1
- Work around PHP 4 SimpleTest lack-of-error complaining for one-time-only HTMLDefinition errors, this may indicate problems with error-collecting facilities in PHP 5
- Make ErrorCollectorEMock work in both PHP 4 and PHP 5
. tests/multitest.php allows you to test multiple versions by running tests/index.php through multiple interpreters using `phpv` shell script (you must provide this script!)
. Minor cosmetic change to flush-definition-cache.php: trailing newline is outputted
. Maintenance script for generating PH5P patch added, original PH5P source file also added under version control
. Full unit test runner script title made more descriptive with PHP version

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1440 48356398-32a2-884e-a903-53898d9a118a
2007-11-05 05:01:51 +00:00
Edward Z. Yang
8cd1806ec8 Update INSTALL file with better instructions. Translation needs updating.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1439 48356398-32a2-884e-a903-53898d9a118a
2007-11-05 03:40:32 +00:00
Edward Z. Yang
1274cfed49 [2.1.3] Fix possible error in DirectLex reported by Nate Abele
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1438 48356398-32a2-884e-a903-53898d9a118a
2007-11-05 03:22:22 +00:00
Edward Z. Yang
1ab47ba949 Update NEWS.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1436 48356398-32a2-884e-a903-53898d9a118a
2007-11-02 03:20:55 +00:00
Edward Z. Yang
da95ee096a Beef up HTML Purifier help message. Todo: make it hideable.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1435 48356398-32a2-884e-a903-53898d9a118a
2007-11-02 01:55:45 +00:00
Edward Z. Yang
6d7250c309 Update Doxygen file after doxygen -u command
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1429 48356398-32a2-884e-a903-53898d9a118a
2007-10-30 03:08:06 +00:00
Edward Z. Yang
df55df1083 Update Doxyfile with new paths, also exclude standalone directory
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1428 48356398-32a2-884e-a903-53898d9a118a
2007-10-30 02:46:26 +00:00
Edward Z. Yang
1a8d864a42 Have tests also check for test-settings in conf file, this allows for configuration files to be separately versioned
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1427 48356398-32a2-884e-a903-53898d9a118a
2007-10-30 02:26:11 +00:00
Edward Z. Yang
552102f7f2 [2.1.3]
- HTMLDefinition->addElement now returns a reference to the created element object, as implied by the documentation
. Extend Injector hooks to allow for more powerful injector routines
. HTMLDefinition->addBlankElement created, as according to the HTMLModule method

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1425 48356398-32a2-884e-a903-53898d9a118a
2007-10-02 22:50:59 +00:00
Edward Z. Yang
f5371bbad4 [2.1.3]
- Buggy treatment of end tags of elements that have required attributes fixed (does not manifest on default tag-set)
- Spurious internal content reorganization error suppressed
. Error unit tests can now specify the expectation of no errors. Future iterations of the harness will be extremely strict about what errors are allowed

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1424 48356398-32a2-884e-a903-53898d9a118a
2007-10-02 01:19:46 +00:00
Edward Z. Yang
c8b020879d [2.1.3] Refine injector algorithm regarding behavior inside nodes that allow paragraphs inside them
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1423 48356398-32a2-884e-a903-53898d9a118a
2007-09-27 00:39:05 +00:00
Edward Z. Yang
094b20f58f [2.1.3] Fix PHP warning from MakeAbsolute, also improve URIFilter documentation
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1422 48356398-32a2-884e-a903-53898d9a118a
2007-09-27 00:07:27 +00:00
Edward Z. Yang
f2df669eec Refactor IDAccumulator so that unit tests now work, and initialization is inside the class.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1421 48356398-32a2-884e-a903-53898d9a118a
2007-09-26 23:36:37 +00:00
Edward Z. Yang
ca43df9fdd [2.1.3] Fatal error when <img> tag (or any other element with required attributes) has 'id' attribute fixed, thanks NykO18 for reporting
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1420 48356398-32a2-884e-a903-53898d9a118a
2007-09-26 23:18:24 +00:00
Edward Z. Yang
5f76796e14 Some small doc updates
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1419 48356398-32a2-884e-a903-53898d9a118a
2007-09-25 02:42:35 +00:00
Edward Z. Yang
1f9a6ba30e [2.1.3] Activate strict blockquote functionality for HTML 4.01 Strict.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1417 48356398-32a2-884e-a903-53898d9a118a
2007-09-09 01:46:59 +00:00
Edward Z. Yang
ccca8cc34f [2.1.3] Rename configuration directive
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1416 48356398-32a2-884e-a903-53898d9a118a
2007-09-09 01:35:50 +00:00
Edward Z. Yang
28c29656af [2.1.3] Fix off-by-one bug in injector functionality for dormant injectors
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1415 48356398-32a2-884e-a903-53898d9a118a
2007-09-09 01:27:09 +00:00
Edward Z. Yang
88f4f57a47 [2.1.3] Fix poor include ordering.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1414 48356398-32a2-884e-a903-53898d9a118a
2007-09-06 19:38:12 +00:00
Edward Z. Yang
43a98de909 Fix up some comments, reduce code duplication.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1409 48356398-32a2-884e-a903-53898d9a118a
2007-09-04 00:15:07 +00:00
Edward Z. Yang
b9d886d53b Release 2.1.2.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1402 48356398-32a2-884e-a903-53898d9a118a
2007-09-03 15:30:12 +00:00
Edward Z. Yang
5b3c8c5534 [2.1.2] Implement border-spacing
- Fix PH5P testing in PHP4

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1401 48356398-32a2-884e-a903-53898d9a118a
2007-09-03 15:16:33 +00:00
Edward Z. Yang
dd40d41bc3 Refactor TODO list.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1400 48356398-32a2-884e-a903-53898d9a118a
2007-09-02 17:22:31 +00:00
Edward Z. Yang
37a80f1295 Fix typo in sample PHP code.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1399 48356398-32a2-884e-a903-53898d9a118a
2007-08-26 18:42:55 +00:00
Edward Z. Yang
fb367dc871 [2.1.2] Correct usage of entity -> character entity reference.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1398 48356398-32a2-884e-a903-53898d9a118a
2007-08-26 18:29:37 +00:00
Edward Z. Yang
29c3c21b34 [2.1.2] Merge in Brett Zamir's patches.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1397 48356398-32a2-884e-a903-53898d9a118a
2007-08-26 18:20:46 +00:00
Edward Z. Yang
e45cc503a2 [2.1.2] Refactory merge-library.php script
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1396 48356398-32a2-884e-a903-53898d9a118a
2007-08-26 17:04:31 +00:00
Edward Z. Yang
85cdea0120 [2.1.2] Remove inclusion reflection from URISchemeRegistry
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1395 48356398-32a2-884e-a903-53898d9a118a
2007-08-26 15:43:17 +00:00
Edward Z. Yang
c7676afb0d Ignore out/ directory.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1391 48356398-32a2-884e-a903-53898d9a118a
2007-08-26 03:49:11 +00:00
Edward Z. Yang
d75c695994 [2.1.2] Fix problems with standalone distribution, change smoketests so that it's easier to test the standalone.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1388 48356398-32a2-884e-a903-53898d9a118a
2007-08-19 21:38:19 +00:00
Edward Z. Yang
6f6fcbc354 Add install script for PEAR installs.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1385 48356398-32a2-884e-a903-53898d9a118a
2007-08-19 19:52:45 +00:00
Edward Z. Yang
c31d6ec80e Add that PH5P is PHP5 only.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1384 48356398-32a2-884e-a903-53898d9a118a
2007-08-19 19:37:34 +00:00
Edward Z. Yang
cb92a57e4e [2.1.2] Implement experimental HTML5 parsing using PH5P
- Fix debugger so that tokens can be printed without an index
- Fix some broken PEAR unit tests

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1383 48356398-32a2-884e-a903-53898d9a118a
2007-08-19 18:49:35 +00:00
Edward Z. Yang
423afedbf4 [2.1.2] Fix validation errors in configuration form
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1382 48356398-32a2-884e-a903-53898d9a118a
2007-08-19 16:24:55 +00:00
Edward Z. Yang
7827a95273 [2.1.2] Fix some validation problems in printDefinition.php's output
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1381 48356398-32a2-884e-a903-53898d9a118a
2007-08-19 15:38:37 +00:00
Edward Z. Yang
9881a34712 More unit test refactoring into seperate methods.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1380 48356398-32a2-884e-a903-53898d9a118a
2007-08-16 06:48:24 +00:00
Edward Z. Yang
a19f30fdcf [2.1.2] Fix silly little typo with border-collapse:separate
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1379 48356398-32a2-884e-a903-53898d9a118a
2007-08-11 06:52:26 +00:00
Edward Z. Yang
8f58c7f49e [2.1.2?] Final migration for Injectors, deprecate the config and context parameters in assertResult
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1378 48356398-32a2-884e-a903-53898d9a118a
2007-08-08 05:38:52 +00:00
Edward Z. Yang
71301b36eb [2.1.2?] Implemented Object module for trusted users.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1377 48356398-32a2-884e-a903-53898d9a118a
2007-08-08 05:16:15 +00:00
Edward Z. Yang
4f0d012dfa [2.1.2?] Add missing sub-tests for strategy, fix error message typo.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1376 48356398-32a2-884e-a903-53898d9a118a
2007-08-08 05:08:59 +00:00
Edward Z. Yang
24a4dfdf83 [2.1.2?] Fix invisible DirectLex parsing error with empty elements that have attributes containing slashes
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1375 48356398-32a2-884e-a903-53898d9a118a
2007-08-08 05:05:30 +00:00
Edward Z. Yang
f922285383 More unit test refactoring; remove unnecessary periods from HTMLDefinition error messages
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1374 48356398-32a2-884e-a903-53898d9a118a
2007-08-07 05:38:22 +00:00
Edward Z. Yang
3af6457801 Refactor unit tests to have one logical assertion per method.
- Support executing a single unit tests using __only prefix
- Hook in Email classes to main code, even if they're unused


git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1373 48356398-32a2-884e-a903-53898d9a118a
2007-08-06 06:22:23 +00:00
Edward Z. Yang
d51d3c127b Release 2.1.1.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1366 48356398-32a2-884e-a903-53898d9a118a
2007-08-05 01:20:55 +00:00
Edward Z. Yang
4f92c0377f [2.1.1] Fix syntax error in standalone library
- fix faulty PHP4 test
- remove unnecessary HTMLPurifier_Config::create() call

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1365 48356398-32a2-884e-a903-53898d9a118a
2007-08-05 01:15:23 +00:00
Edward Z. Yang
c3efafb07d [2.1.1] Fix *another* but with addFilter() Jeez!
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1364 48356398-32a2-884e-a903-53898d9a118a
2007-08-04 22:46:17 +00:00
Edward Z. Yang
79c18eb781 [2.1.1] Single test methods can be invoked by prefixing them with __only
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1363 48356398-32a2-884e-a903-53898d9a118a
2007-08-04 14:51:06 +00:00
Edward Z. Yang
7b64bc37e2 [2.1.1] Fix show-stopping bug in URIDefinition.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1361 48356398-32a2-884e-a903-53898d9a118a
2007-08-03 21:17:15 +00:00
Edward Z. Yang
b3aa5fa0dc [2.1.1] Add prefix directory to include path in standalone: this prevents PEAR from clobbering our unit tests
- Add missing include to unit test harness
- Add missing unit test for HTMLPurifier::getInstance

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1358 48356398-32a2-884e-a903-53898d9a118a
2007-08-03 15:11:08 +00:00
Edward Z. Yang
350d8301dd Release 2.1.0.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1351 48356398-32a2-884e-a903-53898d9a118a
2007-08-03 03:04:40 +00:00
Edward Z. Yang
a40e16dd2e [2.1.0] Allow i18n font names
- Minor typos fixed; we're release ready!

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1350 48356398-32a2-884e-a903-53898d9a118a
2007-08-03 02:48:52 +00:00
Edward Z. Yang
ee388e86c0 Fix code typo in URI Filter documentation.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1349 48356398-32a2-884e-a903-53898d9a118a
2007-08-03 00:08:45 +00:00
Edward Z. Yang
79df79b2fd [2.1.0] Add tutorial for creating URI Filters
- Update NEWS

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1348 48356398-32a2-884e-a903-53898d9a118a
2007-08-02 23:34:30 +00:00
Edward Z. Yang
f5b72c623c [2.1.0] Implement Ruby.
- Destroy some zombie context variables
- Reorganize some TODO items

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1347 48356398-32a2-884e-a903-53898d9a118a
2007-08-02 22:44:42 +00:00
Edward Z. Yang
7bccc24977 [2.1.0] Implement MakeAbsolute URI filter
- Move some directives with complex dependencies to URIDefinition
- Fix a missing extends
- Add hierarchical information to URI schemes
- Fix bug in URIHarness.

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1346 48356398-32a2-884e-a903-53898d9a118a
2007-08-02 21:47:24 +00:00
Edward Z. Yang
25fe416ab2 Add test-case for blank TinyMCE allowed list.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1345 48356398-32a2-884e-a903-53898d9a118a
2007-08-02 15:13:12 +00:00
Edward Z. Yang
a9012f4387 Guard merge-library against non-cli execution.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1344 48356398-32a2-884e-a903-53898d9a118a
2007-08-02 12:51:52 +00:00
Edward Z. Yang
82f8561123 Factor out cli execution guard to common.php
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1343 48356398-32a2-884e-a903-53898d9a118a
2007-08-02 12:45:15 +00:00
Edward Z. Yang
0b743fb2db Update maintenance files with cgi-fcgi compiled PHP executable workaround.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1342 48356398-32a2-884e-a903-53898d9a118a
2007-08-02 12:40:54 +00:00
Edward Z. Yang
08e32597df Fix flush-definition-cache to clear everything, and make it accept a parameter specifying which cache to flush. Also, set svn:executable to CLI scripts.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1340 48356398-32a2-884e-a903-53898d9a118a
2007-08-02 12:24:50 +00:00
Edward Z. Yang
2b82fbacad Minor re-prioritization of TODO.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1339 48356398-32a2-884e-a903-53898d9a118a
2007-08-02 01:53:46 +00:00
Edward Z. Yang
710820cbe9 [2.1.0] Repair minor PHP4 regression due to undefined configuration directive
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1338 48356398-32a2-884e-a903-53898d9a118a
2007-08-02 01:48:43 +00:00
Edward Z. Yang
22ef52a7f6 [2.1.0] Migrate host blacklist functionality to URIFilter.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1336 48356398-32a2-884e-a903-53898d9a118a
2007-08-02 01:41:37 +00:00
Edward Z. Yang
4919187fc6 [2.1.0] Further refactoring of AttrDef_URI, creation of new URIFilter and URIDefinition subsystems.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1335 48356398-32a2-884e-a903-53898d9a118a
2007-08-02 01:12:27 +00:00
Edward Z. Yang
797b899305 [2.1.0] Create new URI object and migrate URI validation systems to use it. URIScheme interface changed.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1334 48356398-32a2-884e-a903-53898d9a118a
2007-08-01 18:34:46 +00:00
Edward Z. Yang
8c9dbe142d [2.1.0] Refactor AttrDef_URI: removed URIParser functionality
- Genericized flush-definition-cache script

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1333 48356398-32a2-884e-a903-53898d9a118a
2007-08-01 14:55:09 +00:00
Edward Z. Yang
2a002857ce [2.1.0] All unit tests inherit from HTMLPurifier_Harness, not UnitTestCase. prepareCommon() refactored to global test-case.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1332 48356398-32a2-884e-a903-53898d9a118a
2007-08-01 14:06:59 +00:00
Edward Z. Yang
9d98b45dea Fix typo in news file.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1331 48356398-32a2-884e-a903-53898d9a118a
2007-08-01 13:16:49 +00:00
Edward Z. Yang
b0f3116b9e [2.1.0] URI scheme is munged off if there is no authority and the scheme is the default one
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1330 48356398-32a2-884e-a903-53898d9a118a
2007-08-01 13:15:33 +00:00
Edward Z. Yang
b03a44abff Remove expectations from assertOutput in URITest.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1329 48356398-32a2-884e-a903-53898d9a118a
2007-08-01 02:19:43 +00:00
Edward Z. Yang
cf257cabde [2.1.0]
- AttrDef_URI unit tests refactored
- Block access to benchmarks: they should be called via command line

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1328 48356398-32a2-884e-a903-53898d9a118a
2007-08-01 01:48:51 +00:00
Edward Z. Yang
ab950a1909 [2.1.0] Fix fairly major bug introduced when logic was reorganized.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1324 48356398-32a2-884e-a903-53898d9a118a
2007-07-31 02:39:49 +00:00
Edward Z. Yang
a12ea4bb3b [2.1.0] Fix bug in mkdir_deep that would prevent absolute paths in Unix systems from being created properly
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1321 48356398-32a2-884e-a903-53898d9a118a
2007-07-31 02:04:32 +00:00
Edward Z. Yang
f80de908bd [2.1.0] Optimize ConfigSchema to only perform safety checks when HTMLPURIFIER_SCHEMA_STRICT is true
- Remove useless ->revision check in Config.php
- Add simple trace file to benchmarks folder

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1319 48356398-32a2-884e-a903-53898d9a118a
2007-07-31 01:04:38 +00:00
Edward Z. Yang
349c4de75b [2.1.0] Standalone file now can be generated using maintenance/merge-library.php. Also:
- HTMLPURIFIER_PREFIX constant added, and relevant files transitioned over
- Custom ChildDef added to default include list
- Tester accepts ?standalone parameter

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1316 48356398-32a2-884e-a903-53898d9a118a
2007-07-30 16:56:50 +00:00
Edward Z. Yang
89622c964e [2.1.0] Genericize element contents removal. This is done in a slightly hacky way since ElementDef is not available, but should be sufficient.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1313 48356398-32a2-884e-a903-53898d9a118a
2007-07-11 20:42:58 +00:00
Edward Z. Yang
732fe5cad7 [2.1.0] Two tiny bugfixes:
- Remove contents of <style> tags
- Use XHTMLStrict Tidy routines for XHTML 1.1

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1311 48356398-32a2-884e-a903-53898d9a118a
2007-07-11 20:06:15 +00:00
Edward Z. Yang
e7e81c0a5b [2.1.0] Fix some minor DirectLex bugs that may lead to PHP errors
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1310 48356398-32a2-884e-a903-53898d9a118a
2007-07-05 21:29:07 +00:00
Edward Z. Yang
626b2a13c8 Typographical and linkrot fixes for UTF-8 doc.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1308 48356398-32a2-884e-a903-53898d9a118a
2007-07-05 16:50:48 +00:00
Edward Z. Yang
35487c02ae Update test settings template.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1303 48356398-32a2-884e-a903-53898d9a118a
2007-06-30 16:13:10 +00:00
Edward Z. Yang
4bc1761b12 Update test settings file with more options.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1302 48356398-32a2-884e-a903-53898d9a118a
2007-06-30 05:02:27 +00:00
Edward Z. Yang
63f5414f2e [Phorum] Refactor settings.php into different files.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1298 48356398-32a2-884e-a903-53898d9a118a
2007-06-29 20:34:19 +00:00
Edward Z. Yang
88d014706b [Phorum] Double-reverse control.php's double-escaping
- Implement signature migration

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1297 48356398-32a2-884e-a903-53898d9a118a
2007-06-29 20:00:38 +00:00
Edward Z. Yang
f6de73d7e7 [Phorum] Deal more gracefully with signatures and edit messages. More improvements.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1296 48356398-32a2-884e-a903-53898d9a118a
2007-06-29 18:25:13 +00:00
Edward Z. Yang
733868a76d [2.1.0] Fix another AutoParagraph edge-case.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1295 48356398-32a2-884e-a903-53898d9a118a
2007-06-29 17:48:56 +00:00
Edward Z. Yang
fab6a212c8 Turn off WYSIWYG.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1286 48356398-32a2-884e-a903-53898d9a118a
2007-06-29 17:03:55 +00:00
Edward Z. Yang
ea1362ce5c [Phorum] Minor enhancements: add cache purge support and give a friendly HTML is on message above editor.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1281 48356398-32a2-884e-a903-53898d9a118a
2007-06-29 15:43:23 +00:00
Edward Z. Yang
cff498ef67 [2.1.0] Refine autoparagraphing algorithm.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1278 48356398-32a2-884e-a903-53898d9a118a
2007-06-29 03:57:14 +00:00
Edward Z. Yang
1765a7537a [Phorum] Fix cross-platform mutilation of cache data, remove excess newlines.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1277 48356398-32a2-884e-a903-53898d9a118a
2007-06-29 03:41:21 +00:00
Edward Z. Yang
d7157d0ccd Tweak to make more conducive to our new textareas.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1274 48356398-32a2-884e-a903-53898d9a118a
2007-06-29 02:17:17 +00:00
Edward Z. Yang
ed44b5c5ba [2.1.0] ConfigForm generates textareas instead of text inputs for lists, hashes, lookups, text and itext fields
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1273 48356398-32a2-884e-a903-53898d9a118a
2007-06-29 02:16:47 +00:00
Edward Z. Yang
5e5c0f3aa4 [2.1.0]
. Introduce new text/itext configuration directive values: these represent longer strings that would be more appropriately edited with a textarea
. Allow newlines to act as separators for lists, hashes, lookups and %HTML.Allowed

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1272 48356398-32a2-884e-a903-53898d9a118a
2007-06-29 01:54:48 +00:00
Edward Z. Yang
b2ed0aff01 [Phorum] Final polishing: Have default config auto-detect character encoding; add WYSIWYG hook; update error message to be more friendly
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1271 48356398-32a2-884e-a903-53898d9a118a
2007-06-29 00:48:55 +00:00
Edward Z. Yang
148681d1b0 Tidy up Phorum extension. Add svn:ignore, add img to the default list of allowed tags (for smileys), improve naming convention, move loading code out of main namespace, and add reset. Probably the last thing to do before this is feature complete is to have a WYSIWYG flag that turns on escaping for edit posts.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1270 48356398-32a2-884e-a903-53898d9a118a
2007-06-29 00:28:07 +00:00
Edward Z. Yang
2e7e411491 [2.1.0] Fix bug in auto-paragraphing: empty tags should be treated like start tags too.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1269 48356398-32a2-884e-a903-53898d9a118a
2007-06-29 00:24:59 +00:00
Edward Z. Yang
02051e465c [2.1.0] Phorum mod implemented for HTML Purifier. Some other code adjustments were made, they need to be cleaned up.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1267 48356398-32a2-884e-a903-53898d9a118a
2007-06-28 23:01:27 +00:00
Edward Z. Yang
a96b5bf612 [2.1.0] Friendly error messages for when injector needs a tag that's not allowed added
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1265 48356398-32a2-884e-a903-53898d9a118a
2007-06-28 13:06:15 +00:00
Edward Z. Yang
9dd7c8c7dd Add reference document on CSS lengths.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1264 48356398-32a2-884e-a903-53898d9a118a
2007-06-27 23:49:50 +00:00
Edward Z. Yang
0c59db1da3 Bring Null's flush() interface inline with parent.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1263 48356398-32a2-884e-a903-53898d9a118a
2007-06-27 21:03:07 +00:00
Edward Z. Yang
584a1abd15 [2.1.0] Standardize interface for Injector (we actually got it wrong in the 2.0.1-strict version, but this'll fix it)
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1262 48356398-32a2-884e-a903-53898d9a118a
2007-06-27 19:01:09 +00:00
Edward Z. Yang
a6ede3804e [2.1.0] True emoticon < fix.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1260 48356398-32a2-884e-a903-53898d9a118a
2007-06-27 16:40:18 +00:00
Edward Z. Yang
4476745003 Add new entries for 2.1.0 and 2.0.2
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1258 48356398-32a2-884e-a903-53898d9a118a
2007-06-27 15:16:27 +00:00
Edward Z. Yang
45748500ec Release 2.0.1.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1254 48356398-32a2-884e-a903-53898d9a118a
2007-06-27 14:10:43 +00:00
Edward Z. Yang
e99520ab96 Remove trailing ?> in PHP library files, add trailing newlines to all other files.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1253 48356398-32a2-884e-a903-53898d9a118a
2007-06-27 13:58:32 +00:00
Edward Z. Yang
1e2abb7f8f Fix little PHP 4.4.0 bug involving return by reference.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1252 48356398-32a2-884e-a903-53898d9a118a
2007-06-27 13:29:01 +00:00
Edward Z. Yang
362c802191 Since we're passing a temporary variable by reference, it needs to be committed back onto to the main array. To be honest, I'm not terribly happy with this behavior, but it doesn't seem to break anything.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1251 48356398-32a2-884e-a903-53898d9a118a
2007-06-27 13:20:02 +00:00
Edward Z. Yang
3a1d505b3d [2.0.1] Implement haphazard error collection for AttrValidator.
- Error collector / Language can take arrays and listify them
- AttrValidator takes token by reference
- Formatted errors now have their severity <strong>
- 100 test-cases! W00t!

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1250 48356398-32a2-884e-a903-53898d9a118a
2007-06-27 02:03:15 +00:00
Edward Z. Yang
a005da8a4c [2.0.1] Add error messages for FixNesting
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1249 48356398-32a2-884e-a903-53898d9a118a
2007-06-26 23:43:28 +00:00
Edward Z. Yang
9a66394abb Add warning on how error reporting is incomplete to "No Errors" message.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1247 48356398-32a2-884e-a903-53898d9a118a
2007-06-26 19:54:33 +00:00
Edward Z. Yang
62c0575468 [2.0.1] Fix minor regression where ins/del broadcasted most restrictive set when they should have been more lenient.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1246 48356398-32a2-884e-a903-53898d9a118a
2007-06-26 19:38:26 +00:00
Edward Z. Yang
6a95d91a1a [2.0.1] Revamp error collector scheme: we now have custom mocks and an exchange of responsibilities
- Fix oversight in AutoParagraph dealing with armor.
- Order errors with no line number last
- Language object now needs $config and $context objects to do parameterized objects
- Auto-close notice added
- Token constructors accept line numbers

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1245 48356398-32a2-884e-a903-53898d9a118a
2007-06-26 19:33:37 +00:00
Edward Z. Yang
275932ec05 [2.0.1] Fix DirectLex's incomprehension of un-armored script contents as CDATA using custom preg_replace_callback
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1244 48356398-32a2-884e-a903-53898d9a118a
2007-06-26 16:08:42 +00:00
Edward Z. Yang
ae90bb919d Remove unnecessary $this parameters from mock instantiation; SimpleTest doesn't use it!
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1243 48356398-32a2-884e-a903-53898d9a118a
2007-06-26 15:40:04 +00:00
Edward Z. Yang
3c734b4c72 [2.0.1] Implement error messages for MakeWellFormed. Armor AutoParagraph generated p start tags from these tag closing errors. Fix another auto-paragraphing edge-case. Create common Strategy error harness.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1242 48356398-32a2-884e-a903-53898d9a118a
2007-06-26 15:07:07 +00:00
Edward Z. Yang
3d02a2a7d4 Remove magic quotes test: it fails in systems that have magic quotes disabled.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1241 48356398-32a2-884e-a903-53898d9a118a
2007-06-26 03:34:29 +00:00
Edward Z. Yang
0bfa42f9b7 Downgrade comment removal error to E_NOTICE.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1240 48356398-32a2-884e-a903-53898d9a118a
2007-06-26 03:22:07 +00:00
Edward Z. Yang
7a8edc88f9 [2.0.1] Implement error collection for RemoveForeignElements.
- Register Generator context variable.
- Implement special substitutions for error collector.
- Also sort by order the errors came in.
- Fix line number determination bug in Lexer::create().
- Remove vestigial variables.
- Force all tag transforms to use copy(), implement serialize, unserialize algorithm for copy() in tokens.

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1238 48356398-32a2-884e-a903-53898d9a118a
2007-06-26 02:49:21 +00:00
Edward Z. Yang
98b4e70a93 [2.0.1] Rewire line numbering so that if it's null it's autodetected based on error collection. also, update TODO.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1237 48356398-32a2-884e-a903-53898d9a118a
2007-06-25 23:22:35 +00:00
Edward Z. Yang
6f5592ae60 [2.0.1] Normalize newlines to \n for internal processing.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1235 48356398-32a2-884e-a903-53898d9a118a
2007-06-25 19:18:55 +00:00
Edward Z. Yang
9f996b125a [2.0.1]
- Printer adheres to configuration's directives on output format
- Fix improperly named form field in ConfigForm printer
. HTMLPurifier_Config::getAllowedDirectivesForForm implemented, allows much easier selective embedding of configuration values
. Doctype objects now accept public and system DTD identifiers
. %HTML.Doctype is now constrained by specific values, to specify a custom doctype use new %HTML.CustomDoctype
. ConfigForm truncates long directives to keep the form small, and does not re-output namespaces

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1232 48356398-32a2-884e-a903-53898d9a118a
2007-06-25 18:38:39 +00:00
Edward Z. Yang
96b571d236 [2.0.1] Fix unescaped print_r that handles user input
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1231 48356398-32a2-884e-a903-53898d9a118a
2007-06-25 15:20:45 +00:00
Edward Z. Yang
0e9904a9ba Factor out DirectLex error testing to its own class.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1230 48356398-32a2-884e-a903-53898d9a118a
2007-06-25 01:56:00 +00:00
Edward Z. Yang
e66a98c396 [2.0.1] Convert test language messages to use new format.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1229 48356398-32a2-884e-a903-53898d9a118a
2007-06-25 01:11:56 +00:00
Edward Z. Yang
728088f2ba [2.0.1] Rather than pass line number by parameter, have it be retrieved via Context. Add $ignore_error boolean to get().
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1228 48356398-32a2-884e-a903-53898d9a118a
2007-06-25 01:08:57 +00:00
Edward Z. Yang
8ae2604440 [2.0.1] Start making more moves towards full error reporting. Revise message naming conventions. Fix variable assignment for error collecting. Revise Language interface to be as readable as possible (NOT compact). Add error reporting to DirectLex. Rewrite ErrorCollector.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1227 48356398-32a2-884e-a903-53898d9a118a
2007-06-25 00:48:26 +00:00
Edward Z. Yang
7b087c7bbe [2.0.1] Add severity to error collector
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1226 48356398-32a2-884e-a903-53898d9a118a
2007-06-24 23:20:35 +00:00
Edward Z. Yang
58064592ff [2.0.1]
- Stray xmlns attributes removed from configuration documentation
. Interlinking in configuration documentation added using Injector_PurifierLinkify
. Directives now keep track of aliases to themselves

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1225 48356398-32a2-884e-a903-53898d9a118a
2007-06-24 22:22:00 +00:00
Edward Z. Yang
b19fc32a5a Genericize Injector loading code, create new AutoFormatParam namespace, move out unit tests.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1224 48356398-32a2-884e-a903-53898d9a118a
2007-06-24 21:35:34 +00:00
Edward Z. Yang
b15cbbb42a [2.0.1] Officially add experimental auto-paragraphing and linkification functionality. Rename %Core.DefinitionCache to %Cache.DefinitionImpl. Have AutoParagraph handle even more edge cases. Fix MakeWellFormed bug.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1223 48356398-32a2-884e-a903-53898d9a118a
2007-06-24 20:29:50 +00:00
Edward Z. Yang
5f0663cad7 Refactor MakeWellFormed/Injector for performance and as little code duplication as possible. Also, make AutoParagraph smarter about root nodes that don't like p tags.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1221 48356398-32a2-884e-a903-53898d9a118a
2007-06-24 17:44:27 +00:00
Edward Z. Yang
75e52a12a6 Make context errors more friendly; factor out disabled; fix broken test cases; update TODO.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1220 48356398-32a2-884e-a903-53898d9a118a
2007-06-24 04:22:28 +00:00
Edward Z. Yang
269268b843 Fix possible infinite loop by incrementing everybody's offsets. Add printTokens debugger function. Refine Linkify parent node checks (also check excludes, although technically later steps will catch it!)
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1218 48356398-32a2-884e-a903-53898d9a118a
2007-06-24 03:34:33 +00:00
Edward Z. Yang
62c6d93b6d Add more unit tests; everything seems to be good, but I'm suspicious.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1217 48356398-32a2-884e-a903-53898d9a118a
2007-06-24 02:59:06 +00:00
Edward Z. Yang
31704c92f6 Implement working linkification, now, the real challenge is to get it to play nice with auto-paragraphing.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1216 48356398-32a2-884e-a903-53898d9a118a
2007-06-24 02:45:38 +00:00
Edward Z. Yang
291fa4cb29 Convert to numerically indexed array.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1215 48356398-32a2-884e-a903-53898d9a118a
2007-06-24 02:27:57 +00:00
Edward Z. Yang
389fcc9a5d Convert injector to use arrays.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1214 48356398-32a2-884e-a903-53898d9a118a
2007-06-24 02:17:34 +00:00
Edward Z. Yang
e5191b3ada [2.0.1] Scrap auto_close in favor of ChildDef->elements heuristic.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1213 48356398-32a2-884e-a903-53898d9a118a
2007-06-23 20:52:57 +00:00
Edward Z. Yang
5d0a992579 Refactor Injector not to edit $result directly.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1212 48356398-32a2-884e-a903-53898d9a118a
2007-06-23 19:39:03 +00:00
Edward Z. Yang
ae83bebc98 Convert handleStart to the new format.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1211 48356398-32a2-884e-a903-53898d9a118a
2007-06-23 18:58:51 +00:00
Edward Z. Yang
9191877740 Factor out auto-paragraph to injector class.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1210 48356398-32a2-884e-a903-53898d9a118a
2007-06-23 18:50:41 +00:00
Edward Z. Yang
3066ca357a Further refactoring in preparation for logic change.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1209 48356398-32a2-884e-a903-53898d9a118a
2007-06-23 17:44:28 +00:00
Edward Z. Yang
53fd096641 Refactor auto-paragraph code in preparation for fundamental logic change.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1208 48356398-32a2-884e-a903-53898d9a118a
2007-06-23 17:11:05 +00:00
Edward Z. Yang
2166246b7e Fix quick bug: it's 2 dashes, not 3.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1207 48356398-32a2-884e-a903-53898d9a118a
2007-06-23 14:48:08 +00:00
Edward Z. Yang
49bb6ec35d [2.0.1] DefinitionCache no longer throws errors when it encounters old serial files that do not conform to the current style
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1206 48356398-32a2-884e-a903-53898d9a118a
2007-06-23 14:45:13 +00:00
Edward Z. Yang
401612dc3a [2.0.1] Improve directory permissions checks. UNTESTED!!!
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1205 48356398-32a2-884e-a903-53898d9a118a
2007-06-23 14:38:16 +00:00
Edward Z. Yang
dc0fb7d2b4 [2.0.1] DefinitionCache related bug-fixes
- Fixed bug where manually modified definitions were not saved via cache (mostly harmless, except for the fact that it would be a little slower)
- Configuration objects with different serials do not clobber each others when revision numbers are unequal
. DefinitionCache keys reordered to reflect precedence: version number, hash, then revision number

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1204 48356398-32a2-884e-a903-53898d9a118a
2007-06-23 14:05:09 +00:00
Edward Z. Yang
eee45fed37 [2.0.1] Add preliminary auto-paragraph implementation. It needs to be aggressively refactored and generalized.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1202 48356398-32a2-884e-a903-53898d9a118a
2007-06-22 21:32:56 +00:00
Edward Z. Yang
03657ad51a Update NEWS.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1197 48356398-32a2-884e-a903-53898d9a118a
2007-06-22 00:09:20 +00:00
Edward Z. Yang
dda4038446 [2.0.1] Reorder definition cache includes
- Update some comments, rename some variables

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1196 48356398-32a2-884e-a903-53898d9a118a
2007-06-21 23:56:19 +00:00
Edward Z. Yang
996ccdbdda [1.7.0] Update HTMLDefinition printer with some of the new attributes.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1192 48356398-32a2-884e-a903-53898d9a118a
2007-06-21 16:02:06 +00:00
Edward Z. Yang
008348db21 Update TODO.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1191 48356398-32a2-884e-a903-53898d9a118a
2007-06-21 15:28:50 +00:00
Edward Z. Yang
b10a380ff4 [2.0.1] Rewire test-cases to swallow errors, not expect them
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1190 48356398-32a2-884e-a903-53898d9a118a
2007-06-21 15:15:02 +00:00
Edward Z. Yang
bf0d659c47 [2.0.1] Improve special case handling for <script>
- DirectLex now honors comments with greater than or less than signs in them
- Comments are transformed into script elements, ending comments are scrapped
- Buggy generator code rewritten to be more error-proof
- AttrValidator checks if token has attributes before processing
- Remove invalid documentation from Scripting
- "Commenting" of script elements switched to the more advanced version

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1189 48356398-32a2-884e-a903-53898d9a118a
2007-06-21 14:44:26 +00:00
Edward Z. Yang
e55551ecdd Remove SVN checkout warnings from these two docs: they are no longer applicable.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1185 48356398-32a2-884e-a903-53898d9a118a
2007-06-21 02:14:47 +00:00
Edward Z. Yang
e9f3fef47b Release 2.0.0.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1178 48356398-32a2-884e-a903-53898d9a118a
2007-06-20 23:40:10 +00:00
Edward Z. Yang
840f9f7434 Update INSTALL document.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1176 48356398-32a2-884e-a903-53898d9a118a
2007-06-20 22:36:10 +00:00
Edward Z. Yang
10c970760d [1.7.0] Complete Customization end user tutorial.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1175 48356398-32a2-884e-a903-53898d9a118a
2007-06-20 22:08:45 +00:00
Edward Z. Yang
69996acc9e [1.7.0] Add native support for required elements
- Factored out large portion of ValidateAttributes to AttrValidator
- Implemented ValidateAttributes armor
- Fix clear cache bug
- Implement armoring for ValidateAttributes

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1174 48356398-32a2-884e-a903-53898d9a118a
2007-06-20 21:39:28 +00:00
Edward Z. Yang
8bbb73e47d [1.7.0] ChildDef_Custom's regex generation has been improved, removing several false positives
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1173 48356398-32a2-884e-a903-53898d9a118a
2007-06-20 15:54:50 +00:00
Edward Z. Yang
cf7a50163c Officially transition from 1.7 -> 2.0, mass substitution. Also, wrote WHATSNEW. We are in feature-freeze!
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1172 48356398-32a2-884e-a903-53898d9a118a
2007-06-20 03:00:36 +00:00
Edward Z. Yang
da2ea348fd [1.7.0] Change ->Revision member variable to a legit configuration directive. Start writing tutorial for customization.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1171 48356398-32a2-884e-a903-53898d9a118a
2007-06-20 02:43:43 +00:00
Edward Z. Yang
ab3ebcba6d Update TODO.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1170 48356398-32a2-884e-a903-53898d9a118a
2007-06-19 22:26:57 +00:00
Edward Z. Yang
d399abba50 [1.7.0] Bug resulting from tag transforms to non-allowed elements fixed
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1169 48356398-32a2-884e-a903-53898d9a118a
2007-06-19 22:10:39 +00:00
Edward Z. Yang
0b0a505c30 [1.7.0] Implement addElement: the advanced API is complete!
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1165 48356398-32a2-884e-a903-53898d9a118a
2007-06-19 01:55:31 +00:00
Edward Z. Yang
6aa3dfc116 [1.7.0] Implement addAttribute() of advanced API.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1164 48356398-32a2-884e-a903-53898d9a118a
2007-06-19 01:29:50 +00:00
Edward Z. Yang
c3094275ef Fix PHP4 compatibility problems with substr_count
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1163 48356398-32a2-884e-a903-53898d9a118a
2007-06-19 01:20:00 +00:00
Edward Z. Yang
220c150e0a [1.7.0] StrictBlockquote child definition refrains from wrapping whitespace in tags now.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1159 48356398-32a2-884e-a903-53898d9a118a
2007-06-18 19:53:46 +00:00
Edward Z. Yang
32d30a9181 Add note that functionality IS NOT released yet. This needs to be removed once 1.7/2.0 comes out.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1158 48356398-32a2-884e-a903-53898d9a118a
2007-06-18 19:26:29 +00:00
Edward Z. Yang
0e5491b20c [1.7.0] Wire in Language and ErrorCollector to main class, now, the only thing to do is actually implement the stuff
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1157 48356398-32a2-884e-a903-53898d9a118a
2007-06-18 03:05:18 +00:00
Edward Z. Yang
7699efd593 Implement bare minimum extra functions for language implementation.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1156 48356398-32a2-884e-a903-53898d9a118a
2007-06-18 02:25:27 +00:00
Edward Z. Yang
4bf15de536 [1.7.0] Implement line number counting in DirectLex, in preparation for error reporting
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1155 48356398-32a2-884e-a903-53898d9a118a
2007-06-18 02:01:01 +00:00
Edward Z. Yang
70bcccf54c Update docs for config.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1154 48356398-32a2-884e-a903-53898d9a118a
2007-06-18 00:40:15 +00:00
Edward Z. Yang
bf6ce67fc1 [1.7.0] Prototype-declarations for Lexer removed in favor of configuration determination of Lexer implementations.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1153 48356398-32a2-884e-a903-53898d9a118a
2007-06-17 21:27:39 +00:00
Edward Z. Yang
bd44105ca9 [1.7.0] DOMLex will not emit errors when a custom error handler that does not honor error_reporting is used
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1152 48356398-32a2-884e-a903-53898d9a118a
2007-06-17 20:36:29 +00:00
Edward Z. Yang
d1f43636e5 [1.7.0] DefinitionCache->flush() now requires configuration object. DefinitionCache_Serializer now will create directories for new types on the fly, and can accept custom directories to save serials into.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1147 48356398-32a2-884e-a903-53898d9a118a
2007-06-16 20:46:44 +00:00
Edward Z. Yang
9c7483166c [1.7.0] Add DefinitionID for HTML, to prevent caching conflicts with custom-edited definition objects. Also, more user friendly error messages from Config.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1146 48356398-32a2-884e-a903-53898d9a118a
2007-06-16 20:21:00 +00:00
Edward Z. Yang
e840564228 [1.7.0] Contents between <script> tags are now completely removed if <script> is not allowed
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1145 48356398-32a2-884e-a903-53898d9a118a
2007-06-16 19:31:45 +00:00
Edward Z. Yang
7d4b532d6b Update API.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1136 48356398-32a2-884e-a903-53898d9a118a
2007-06-12 03:03:28 +00:00
Edward Z. Yang
58f00105c8 Update txt docs.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1134 48356398-32a2-884e-a903-53898d9a118a
2007-06-09 14:53:21 +00:00
Edward Z. Yang
8d15d1ce13 Repair links to renamed documentation; fix typo in ref-html-modularization.txt
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1127 48356398-32a2-884e-a903-53898d9a118a
2007-06-08 01:52:42 +00:00
Edward Z. Yang
9c60eeed04 Rename xhtml-1.1 to html-modularization and remove outdated segments.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1124 48356398-32a2-884e-a903-53898d9a118a
2007-06-02 18:59:58 +00:00
Edward Z. Yang
2e089477a5 Rename and rewrite content models docs.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1123 48356398-32a2-884e-a903-53898d9a118a
2007-06-02 18:51:50 +00:00
Edward Z. Yang
b442d09ea6 [1.7.0] Update INSTALL and basic example to use the new APIs.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1120 48356398-32a2-884e-a903-53898d9a118a
2007-05-29 21:31:24 +00:00
Edward Z. Yang
12f73605a3 [1.7.0] Implement HTML.Allowed, a TinyMCE style whitelist format.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1119 48356398-32a2-884e-a903-53898d9a118a
2007-05-29 21:26:43 +00:00
Edward Z. Yang
e2a951420f [1.7.0] Implement Cleanup decorator
- Create generic DecoratorHarness
- Name decorators, so that they can be overridden or removed
- Add setup function to definition cache factory

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1118 48356398-32a2-884e-a903-53898d9a118a
2007-05-29 20:49:33 +00:00
Edward Z. Yang
002395de09 [1.7.0] Add DefinitionCache decorators, implement Memory decorator
- Move serialization responsibility to Config
- Create DefinitionCacheFactory
- Implement Null definition cache

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1117 48356398-32a2-884e-a903-53898d9a118a
2007-05-29 20:21:33 +00:00
Edward Z. Yang
d1187ed331 [1.7.0] Add versioning to serializer cache
- Make some AttrDef member-variables lazy-loading to save serialization space, clean up others
- Refactor get*Definition() methods

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1116 48356398-32a2-884e-a903-53898d9a118a
2007-05-29 18:19:42 +00:00
Edward Z. Yang
426fbd1f97 [1.7.0] Complete Legacy element and attribute native support.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1115 48356398-32a2-884e-a903-53898d9a118a
2007-05-29 16:51:32 +00:00
Edward Z. Yang
9c5f01a0cf [1.7.0] Fix bug in Bool class
- Genericize allElements into basic smoketest, add beginnings of legacy smoketest too.

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1113 48356398-32a2-884e-a903-53898d9a118a
2007-05-29 02:12:08 +00:00
Edward Z. Yang
f985d3cd96 Add initial allElements smoketest. Incomplete.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1112 48356398-32a2-884e-a903-53898d9a118a
2007-05-29 00:39:00 +00:00
Edward Z. Yang
0cb1d85822 Cordon off configuration form values into one form element name.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1111 48356398-32a2-884e-a903-53898d9a118a
2007-05-28 13:15:06 +00:00
Edward Z. Yang
073ddb0cb2 Remove unlink(types.xml) from cleanup
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1110 48356398-32a2-884e-a903-53898d9a118a
2007-05-28 04:27:51 +00:00
Edward Z. Yang
889ccb1a92 Centralize types.xml writing.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1109 48356398-32a2-884e-a903-53898d9a118a
2007-05-28 04:26:25 +00:00
Edward Z. Yang
aec84dc3f6 Simplify generate.php variable naming and comments.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1108 48356398-32a2-884e-a903-53898d9a118a
2007-05-28 03:58:02 +00:00
Edward Z. Yang
dea62ffdab - Modify hash format to be more intuitive
- Add parameter that controls magic quotes processing in loadArrayFromForm

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1107 48356398-32a2-884e-a903-53898d9a118a
2007-05-28 03:55:36 +00:00
Edward Z. Yang
8913239b7f Document Printer_ConfigForm. Factor out form controls to printer.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1106 48356398-32a2-884e-a903-53898d9a118a
2007-05-28 03:49:06 +00:00
Edward Z. Yang
e06929c218 Further refactoring to remove hacks. Move everything into the ConfigDoc facade object. Add parameters to plain.xsl. Optionally singleton-ize HTML Purifier. Add loadArrayFromForm to Config object.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1105 48356398-32a2-884e-a903-53898d9a118a
2007-05-28 03:33:12 +00:00
Edward Z. Yang
aaf4839c34 Further refactor ConfigDoc, creating HTMLXSLTProcessor. Update NEWS.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1104 48356398-32a2-884e-a903-53898d9a118a
2007-05-28 02:55:50 +00:00
Edward Z. Yang
c113f43440 Add basic structure for ConfigDoc namespace, begin moving things over.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1103 48356398-32a2-884e-a903-53898d9a118a
2007-05-28 02:41:01 +00:00
Edward Z. Yang
bd8ecdd268 Rewire test runner to use full path to test file, this means we can introduce new namespaces.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1102 48356398-32a2-884e-a903-53898d9a118a
2007-05-28 02:29:48 +00:00
Edward Z. Yang
ef51f8681a [1.7.0] Create ConfigForm printer classes
- Extend hash to convert strings from form key,value,key,value
- Hack up configdoc to accommodate configForm.php smoketest

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1101 48356398-32a2-884e-a903-53898d9a118a
2007-05-28 02:20:55 +00:00
Edward Z. Yang
ee61ffc0d9 Minor test-case refactoring.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1100 48356398-32a2-884e-a903-53898d9a118a
2007-05-27 23:12:17 +00:00
Edward Z. Yang
f758f7c534 Oh whitespace how I despise you! Fix whitespace discrepancies between DOM and DirectLex.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1099 48356398-32a2-884e-a903-53898d9a118a
2007-05-27 16:17:14 +00:00
Edward Z. Yang
95499e34da Factor out common DefinitionCache test code to a harness.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1098 48356398-32a2-884e-a903-53898d9a118a
2007-05-27 15:52:45 +00:00
Edward Z. Yang
de23201cbb [1.7.0] HTML Purifier now works with PHP 4.3.2. Yay!
- Armor some character index checking
- Add compatibility stuff for PHP_EOL
- Add autoclose for colgroup
- Compensate for realpath() quirkiness in old versions
- Add flush maintenance script

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1096 48356398-32a2-884e-a903-53898d9a118a
2007-05-27 14:27:54 +00:00
Edward Z. Yang
21ab12a6a8 [1.7.0] Add missing functions for DefinitionCache: replace, flush and type-checking
- Add version to configuration object, and have update script change it accordingly

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1095 48356398-32a2-884e-a903-53898d9a118a
2007-05-27 13:25:54 +00:00
Edward Z. Yang
69666e977f Fixed typo that caused problems with native PHP 4 fwrite Serializer code.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1094 48356398-32a2-884e-a903-53898d9a118a
2007-05-25 01:44:01 +00:00
Edward Z. Yang
fa05319e30 [1.7.0] Factor out caching of definitions to DefinitionCache, hook in CSS, add a bunch of todos for this functionality. Attr namespace no longer affects HTMLDefinition.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1093 48356398-32a2-884e-a903-53898d9a118a
2007-05-25 01:32:29 +00:00
Edward Z. Yang
ea46d79b0a Add missing parent class Definition.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1092 48356398-32a2-884e-a903-53898d9a118a
2007-05-24 22:08:29 +00:00
Edward Z. Yang
a62f8971e4 [1.7.0] Refactor HTMLDefinition and CSSDefinition to have a common Definition parent, rename setup() to doSetup() and make setup() call the template method after setting the setup variable. Test for references in ConfigTest.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1091 48356398-32a2-884e-a903-53898d9a118a
2007-05-24 21:50:43 +00:00
Edward Z. Yang
7a3e06d4d0 [1.7.0] Lexer is now pre-emptively included, with a conditional include for the PHP5 only version.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1090 48356398-32a2-884e-a903-53898d9a118a
2007-05-24 20:36:50 +00:00
Edward Z. Yang
e180b7689e [1.7.0] Implement HTMLDefinition cache (very hacked together, but long unit test times were driving me crazy!)
- Add extra protection in AttrDef_URI against phantom Schemes
- Doctype moved from config to HTMLDefinition
- AttrDef_URITest mocks have more generic object parameters to deal with PHP4's copy-happy behavior

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1089 48356398-32a2-884e-a903-53898d9a118a
2007-05-23 03:27:36 +00:00
Edward Z. Yang
7579932948 [1.7.0] New compact syntax for AttrDef objects that can be used to instantiate new objects via make()
- Implemented make() for Enum and Bool
- Migrate classes over to this new syntax
- Add AttrDef_HTML_Bool unit test

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1088 48356398-32a2-884e-a903-53898d9a118a
2007-05-23 00:39:07 +00:00
Edward Z. Yang
818d0d7a23 [1.7.0] Add missing includes for AttrTypes, add phantom unit test for future things to come
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1087 48356398-32a2-884e-a903-53898d9a118a
2007-05-22 23:48:38 +00:00
Edward Z. Yang
797d3e0393 [1.7.0] Rewire dependencies, removing redundant includes and adding necessary ones
- Rework descendants_are_inline to have default value as false, ins/del handling now works top-level when parent element is not block
- Remove CleanUTF8OnGeneration, feature didn't even work

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1086 48356398-32a2-884e-a903-53898d9a118a
2007-05-22 00:47:03 +00:00
Edward Z. Yang
ff7eec7424 Properly tag Tidy with keyword prop.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1085 48356398-32a2-884e-a903-53898d9a118a
2007-05-21 03:03:25 +00:00
Edward Z. Yang
0ea04db559 [1.7.0] Finish implementing legacy elements, begin implementing legacy attributes
- Migrated most unit tests over to XHTML 1.0 Strict to preserve transformation behavior
- Created %Core.ColorKeywords to be shared between CSS_Color and HTML_Color
- Added AttrDef_HTML_Color as AttrType Color
- HTMLPurifier_Config::create(HTMLPurifier_Config $config) now clones the object
- Attribute minimization for HTML implemented in Generator
- Move div@align fix from proprietary to regular set
- Color keywords now map to full six digit hexadecimal codes
- Harness will now tack on per-use-case configuration

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1084 48356398-32a2-884e-a903-53898d9a118a
2007-05-21 01:36:15 +00:00
Edward Z. Yang
831db14c79 [1.7.0] Remove HTMLModule tests. They were a bad idea.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1083 48356398-32a2-884e-a903-53898d9a118a
2007-05-21 00:24:32 +00:00
Edward Z. Yang
a470fc5621 [1.7.0] Refactor HTMLModule unit tests
- AttrCollections does not barf when an inclusion is not present
- HTMLDefinition configuration directives now use new syntax
- Added %HTML.AllowedModules and %HTML.CoreModules for testing
- Extend Harness so that it can accept a default configuration object member variable
- Refactor modules to use Scaffolding, which defines some custom attributes that allows for the easy testing of attribute collections

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1082 48356398-32a2-884e-a903-53898d9a118a
2007-05-20 22:29:31 +00:00
Edward Z. Yang
2945f6a930 [1.7.0] Implement u, s, and strike tag transforms
- Extend Simple so that it can accept some light CSS
- Remove Center transform in favor of Simple

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1081 48356398-32a2-884e-a903-53898d9a118a
2007-05-20 21:22:54 +00:00
Edward Z. Yang
71326abec1 Armor maintenance script by testing for CLI.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1080 48356398-32a2-884e-a903-53898d9a118a
2007-05-20 20:05:52 +00:00
Edward Z. Yang
23ef535043 Update WYSIWYG by removing Mantis link: bugtracker is no longer active.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1079 48356398-32a2-884e-a903-53898d9a118a
2007-05-20 19:56:16 +00:00
Edward Z. Yang
fda2043ace [1.7.0] Code audit
- Add set accessor, update access control on variables in AttrTypes
- Add warning notes to non-unit tested, out of date or unused code files
- Remove redundant include in EntityParser, expand string regexp to match all ASCII XML-style entities
- Remove obsolete hooks in HTMLModule

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1078 48356398-32a2-884e-a903-53898d9a118a
2007-05-20 19:45:49 +00:00
Edward Z. Yang
3f06d8316c [1.7.0] Add unit test for AttrCollections
- Fixed bug where recursive attribute collections would result in infinite loop
- Fixed bug with deep inclusions in attribute collections
- Reset doctype object if HTML or Attr is changed
- Add accessor functions to AttrTypes, unit tested class

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1077 48356398-32a2-884e-a903-53898d9a118a
2007-05-20 19:29:05 +00:00
Edward Z. Yang
e4b621eec2 [1.7.0] Make doctype object available from config, switch generator over to it.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1076 48356398-32a2-884e-a903-53898d9a118a
2007-05-20 18:14:52 +00:00
Edward Z. Yang
9728be4a52 [1.7.0] Configuration object now finalizes itself after first read operation
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1075 48356398-32a2-884e-a903-53898d9a118a
2007-05-20 18:06:51 +00:00
Edward Z. Yang
f1ec05afd0 [1.7.0] Make AttrDef classes more friendly to serialization by not storing final static data in member variables
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1074 48356398-32a2-884e-a903-53898d9a118a
2007-05-20 17:23:09 +00:00
Edward Z. Yang
7481d349d3 Update TODO.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1072 48356398-32a2-884e-a903-53898d9a118a
2007-05-20 02:28:09 +00:00
Edward Z. Yang
086dc9177b [1.7.0] Add documentation for the Tidy functionality
- Make specifying the child property for ElementDef unnecessary when overloading content_model or content_model_type
- Add necessary includes to Tidy module files
- Move div@align fix to Tidy_Proprietary
- Future proof attrTransform.php by setting doctype to strict

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1071 48356398-32a2-884e-a903-53898d9a118a
2007-05-20 02:12:01 +00:00
Edward Z. Yang
4d38c02932 [1.7.0] Implement and hook-in Tidy module setup.
- CommonAttributes factored into XMLCommonAttributes and NonXMLCommonAttributes
- Tidy abstract module was completely refactored in interest of usability
- Add friendly error message if module does not have name

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1070 48356398-32a2-884e-a903-53898d9a118a
2007-05-19 21:00:12 +00:00
Edward Z. Yang
83a50465dc [1.7.0] Commit abstract implementation of Tidy module: migration to follow.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1069 48356398-32a2-884e-a903-53898d9a118a
2007-05-19 01:42:17 +00:00
Edward Z. Yang
dd62a303eb [1.7.0] Create new Output configuration namespace and migrate directives that directly impact Generator to it. Rename %Core.Strict to %HTML.Strict. Pilot heredoc syntax.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1068 48356398-32a2-884e-a903-53898d9a118a
2007-05-19 00:24:23 +00:00
Edward Z. Yang
e4e981b6f1 Update documentation.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1067 48356398-32a2-884e-a903-53898d9a118a
2007-05-17 18:36:39 +00:00
Edward Z. Yang
a846f4e70b [1.7.0] Update Advanced API documentation to reflect new changes.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1066 48356398-32a2-884e-a903-53898d9a118a
2007-05-16 03:35:57 +00:00
Edward Z. Yang
a5136b65e4 [1.7.0] Eliminated modes in favor for special-case "Tidy" modules
- Add $xml property to Doctype, make more serialize friendly in preparation for stuffing into Config object
- Add FIXME markers for areas of further development, code is hooked so this is easy
- Document what the new Tidy classes will be

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1065 48356398-32a2-884e-a903-53898d9a118a
2007-05-16 03:00:18 +00:00
Edward Z. Yang
2d035483dd Update TODO with specific tasks for 1.7.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1064 48356398-32a2-884e-a903-53898d9a118a
2007-05-15 03:01:57 +00:00
Edward Z. Yang
831a09d455 [1.7.0] Various updates
- Implement addModule(), requires new userModules property
- Remove unnecessary $config passing for getElement(s)
- Revamp HTMLModuleManagerTest
- Fix buggy unit test for unrecognized parent
- Remove anonymous generator member variable from ChildDef_Required

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1063 48356398-32a2-884e-a903-53898d9a118a
2007-05-15 02:33:19 +00:00
Edward Z. Yang
2cbb3be602 [1.7.0] Armor error messages against XSS injection.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1062 48356398-32a2-884e-a903-53898d9a118a
2007-05-15 01:24:20 +00:00
Edward Z. Yang
f7eccc0038 [1.7.0] Add %HTML.Trusted directive to allow untrusted elements in. Add special-case code for <script> into Generator.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1061 48356398-32a2-884e-a903-53898d9a118a
2007-05-15 01:17:10 +00:00
Edward Z. Yang
65252d6fbd [1.7.0] Wire in DoctypeRegistry to HTMLModuleManager, convert doctype declarations, migrate some related functionality to proper class
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1060 48356398-32a2-884e-a903-53898d9a118a
2007-05-15 00:31:53 +00:00
Edward Z. Yang
6b9c5ec603 [1.7.0] Implement DoctypeRegistry. Add transparent constructor to Doctype.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1059 48356398-32a2-884e-a903-53898d9a118a
2007-05-14 22:36:35 +00:00
Edward Z. Yang
e7b15068c2 [1.7.0] More refactoring
- Remove vestigial initialize code
- Update documentation
- Rename member variable: modules -> registeredModules and validModules -> modules

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1057 48356398-32a2-884e-a903-53898d9a118a
2007-05-14 02:24:21 +00:00
Edward Z. Yang
53c19552d2 [1.7.0] More HTMLModuleManager work:
- Move Doctype to its own file
- Remove vestigial autoDoctype and order
- Setup will automatically load modules for you
- Allow overriding trust level for parent element
- Random documentation update

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1056 48356398-32a2-884e-a903-53898d9a118a
2007-05-14 01:58:05 +00:00
Edward Z. Yang
048242004e [1.7.0] Remove vestigal chunks of code from HTMLModuleManager, switch HTMLDefinition to use validModules, and update some inline docs.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1055 48356398-32a2-884e-a903-53898d9a118a
2007-05-14 01:03:21 +00:00
Edward Z. Yang
05e1aca2fa [1.7.0] Begin refactoring of HTMLModuleManager, a lot of vestigal code remaining, but basic transferral to decentralized safety design finished. Enable scripting module.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1053 48356398-32a2-884e-a903-53898d9a118a
2007-05-14 00:14:21 +00:00
Edward Z. Yang
23feb457f2 [1.7.0] Drastically reorganize TransformToStrict, attributes now ordered alphabetically and are commented
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1052 48356398-32a2-884e-a903-53898d9a118a
2007-05-13 21:46:10 +00:00
Edward Z. Yang
8f6380d63a [1.7.0] Minor reformatting of some modules to make them more like the XHTML abstract definitions
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1051 48356398-32a2-884e-a903-53898d9a118a
2007-05-13 20:50:53 +00:00
Edward Z. Yang
3b1c40b2fc [1.7.0] Add some module unit tests for Edit, Hypertext, Image and Legacy (incomplete). Remove redundant img scaffolding.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1050 48356398-32a2-884e-a903-53898d9a118a
2007-05-13 20:43:38 +00:00
Edward Z. Yang
da92cb9ff4 [1.7.0] Fix bug in HTMLPurifier_Harness that causes certain aspects of $input to change after parsing
- Add makeLookup() convenience function to HTMLModule
- Relocate SGML exclusion comment
- Add preliminary Bdo module test

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1049 48356398-32a2-884e-a903-53898d9a118a
2007-05-13 03:42:09 +00:00
Edward Z. Yang
bda9167423 [1.7.0] Modify behavior of ElementDef->mergeIn to also merge safe property, this means default is now null.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1048 48356398-32a2-884e-a903-53898d9a118a
2007-05-12 21:47:03 +00:00
Edward Z. Yang
cb9c96a2b0 [1.7.0] Implement addBlankElement for non-standalone elements.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1047 48356398-32a2-884e-a903-53898d9a118a
2007-05-12 20:54:55 +00:00
Edward Z. Yang
e0cf214c44 [1.7.0] Modify addElement to return a reference to the created definition, shorten other HTMLModules accordingly.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1046 48356398-32a2-884e-a903-53898d9a118a
2007-05-12 20:44:47 +00:00
Edward Z. Yang
ed73fdd5b8 [1.7.0] Convert table module to new format. Add support for literal object $contents variable.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1045 48356398-32a2-884e-a903-53898d9a118a
2007-05-12 20:26:26 +00:00
Edward Z. Yang
eaea42f827 [1.7.0] Migrate Presentation module to new syntax, compactify Edit, Legacy and List declarations.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1044 48356398-32a2-884e-a903-53898d9a118a
2007-05-11 00:54:04 +00:00
Edward Z. Yang
7f39e1e2c3 [1.7.0] Convert Image, Legacy and List to use new format.
- Make attribute array parameter optional
- Optimize contents parsing for keywords

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1041 48356398-32a2-884e-a903-53898d9a118a
2007-05-09 22:01:07 +00:00
Edward Z. Yang
b81fb0af90 [1.7.0] Add more convenience functions to HTMLModule, wire Edit and Hypertext to use new functionality
- Added LanguageCode to AttrTypes. We should prefer string representations of attribute definitions.

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1040 48356398-32a2-884e-a903-53898d9a118a
2007-05-08 03:28:58 +00:00
Edward Z. Yang
47fe34ad81 [1.7.0] Create convenience functions for HTMLModule constructors, HTMLModule_Bdo was hooked up
- Add initial "safe" property for elements, is not set for most though

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1039 48356398-32a2-884e-a903-53898d9a118a
2007-05-07 01:51:26 +00:00
Edward Z. Yang
ac50d333a5 [1.7.0] Unit test for ElementDef created, ElementDef behavior modified to be more flexible
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1038 48356398-32a2-884e-a903-53898d9a118a
2007-05-07 00:38:23 +00:00
Edward Z. Yang
ce013e2962 Remove orphaned release (1.5.1)
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1037 48356398-32a2-884e-a903-53898d9a118a
2007-05-07 00:04:39 +00:00
Edward Z. Yang
67fab710bf Standardize release script names, add cli execution guards.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1036 48356398-32a2-884e-a903-53898d9a118a
2007-05-06 21:49:32 +00:00
Edward Z. Yang
b3a599e8c2 Add some more release scripts.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1033 48356398-32a2-884e-a903-53898d9a118a
2007-05-05 22:40:51 +00:00
Edward Z. Yang
f4e4c1556d Release 1.6.1.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1025 48356398-32a2-884e-a903-53898d9a118a
2007-05-05 20:26:42 +00:00
Edward Z. Yang
c5e33416d3 [1.6.1] Unit tests now use exclusively assertIdentical
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1024 48356398-32a2-884e-a903-53898d9a118a
2007-05-05 20:17:04 +00:00
Edward Z. Yang
6c08ca4c16 [1.6.1] Fix bug (== v. ===) that caused merged in attribute definitions to be messed up
- Make our modified class_exists() check to work in both PHP 4 and 5
(todo: we need some unit tests for ElementDef)

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1023 48356398-32a2-884e-a903-53898d9a118a
2007-05-05 20:04:34 +00:00
Edward Z. Yang
b1822bb04f [1.6.1] Implement AttrTransform for type in ul, ol and li
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1022 48356398-32a2-884e-a903-53898d9a118a
2007-05-05 19:13:52 +00:00
Edward Z. Yang
893e962890 [1.6.1] Update unit tests for font transformation
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1021 48356398-32a2-884e-a903-53898d9a118a
2007-05-05 18:59:24 +00:00
Edward Z. Yang
bd6071cb3b [1.6.1] Transformation of font's size attribute now handles super-large numbers
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1020 48356398-32a2-884e-a903-53898d9a118a
2007-05-05 18:56:45 +00:00
Edward Z. Yang
92ea74cba2 [1.6.1] Add attribute transformation smoketests
- Repair broken noshade implementation
- Add lots of advisory comments to TransformToStrict.php

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1019 48356398-32a2-884e-a903-53898d9a118a
2007-05-05 18:41:53 +00:00
Edward Z. Yang
a01459c87a [1.6.1] Implement clear in br and align in caption, table, img and hr
- Refactored ValidateAttributesTest.php

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1018 48356398-32a2-884e-a903-53898d9a118a
2007-05-05 16:18:04 +00:00
Edward Z. Yang
fd35c43643 [1.6.1] Implement generic EnumToCSS attribute transformation, migrate text alignment to it
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1017 48356398-32a2-884e-a903-53898d9a118a
2007-05-05 15:48:41 +00:00
Edward Z. Yang
0426985c81 [1.6.1] Refactor AttrTransform to reduce duplication.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1016 48356398-32a2-884e-a903-53898d9a118a
2007-05-05 02:25:55 +00:00
Edward Z. Yang
bbea02f55c Rewrite docs on align attribute, complete with smoketest-case and licensing info.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1014 48356398-32a2-884e-a903-53898d9a118a
2007-05-04 01:29:06 +00:00
Edward Z. Yang
4e77a1adbd [1.6.1] Fix fatal error with XHTML 1.1 validation.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1013 48356398-32a2-884e-a903-53898d9a118a
2007-05-04 01:17:00 +00:00
Edward Z. Yang
bd58a7ba77 [1.6.1] Implement BoolToCSS attribute transformations for td,th.nowrap and hr.noshade
- Implement CSS property white-space:nowrap;
- Update TODO with more ambitious goal: all transforms by 1.6.1

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1012 48356398-32a2-884e-a903-53898d9a118a
2007-05-03 04:07:47 +00:00
Edward Z. Yang
a3ed9196b9 Downgrade code-quality back to a txt scratchpad, add more items for AttrDef
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1010 48356398-32a2-884e-a903-53898d9a118a
2007-05-03 03:15:29 +00:00
Edward Z. Yang
2646f5ea57 Add experimental and dangerous Scripting module. This is NOT mentioned in the NEWS items, and will be officially released with 1.7.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1008 48356398-32a2-884e-a903-53898d9a118a
2007-05-01 21:43:24 +00:00
Edward Z. Yang
424c7ad2e3 Update target milestones, add Windows live mail specimen.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1007 48356398-32a2-884e-a903-53898d9a118a
2007-05-01 21:37:35 +00:00
Edward Z. Yang
234b3085d7 [1.6.1] Activate transform for hr.size
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1006 48356398-32a2-884e-a903-53898d9a118a
2007-05-01 21:36:19 +00:00
Edward Z. Yang
3d978c961d [1.6.1] Implement target module/attribute.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1002 48356398-32a2-884e-a903-53898d9a118a
2007-04-30 21:19:15 +00:00
Edward Z. Yang
72254cd77a [1.6.1] Implement vspace and hspace transformations in img.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1001 48356398-32a2-884e-a903-53898d9a118a
2007-04-30 19:39:42 +00:00
Edward Z. Yang
d8a6361244 [1.6.1] Empty strings get converted to empty arrays instead of arrays with an empty string in them.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1000 48356398-32a2-884e-a903-53898d9a118a
2007-04-30 01:14:21 +00:00
Edward Z. Yang
968dfa2feb [1.6.1] Fix broken configuration directive %Core.RemoveInvalidImg, also make basic demo operational out-of-the-box
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@999 48356398-32a2-884e-a903-53898d9a118a
2007-04-30 00:53:13 +00:00
Edward Z. Yang
114d6841ab Update TODO: rename release and add HTML configuration interface
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@998 48356398-32a2-884e-a903-53898d9a118a
2007-04-30 00:48:22 +00:00
Edward Z. Yang
1c68d769b5 Fix bug in packager: force all files to be "php"
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@996 48356398-32a2-884e-a903-53898d9a118a
2007-04-29 04:06:40 +00:00
Edward Z. Yang
ac0ca3f15c Miscellaneous URL updates.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@993 48356398-32a2-884e-a903-53898d9a118a
2007-04-22 22:26:20 +00:00
Edward Z. Yang
2d5498b8aa Update URLs from hp.jpsband.org to htmlpurifier.org
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@992 48356398-32a2-884e-a903-53898d9a118a
2007-04-22 22:22:48 +00:00
Edward Z. Yang
71ccae1a3a [1.6.0] Add news item on how demo script was removed
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@991 48356398-32a2-884e-a903-53898d9a118a
2007-04-22 22:11:35 +00:00
Edward Z. Yang
cb186dddc4 Compactify HTML Purifier library inclusion
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@987 48356398-32a2-884e-a903-53898d9a118a
2007-04-22 21:01:48 +00:00
Edward Z. Yang
2ceccc0969 Moved remotely to website.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@984 48356398-32a2-884e-a903-53898d9a118a
2007-04-22 20:55:52 +00:00
Edward Z. Yang
93aa98ad01 Update package.php with new URLs from migration.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@979 48356398-32a2-884e-a903-53898d9a118a
2007-04-22 02:56:05 +00:00
Edward Z. Yang
c0b38bab85 [1.6.1] Invert HTMLModuleManager->addModule() processing order to check prefixes first and then the literal module
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@971 48356398-32a2-884e-a903-53898d9a118a
2007-04-21 02:31:38 +00:00
Edward Z. Yang
d6c4473a12 [1.6.1] Possibly fatal bug with __autoload() fixed in module manager
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@969 48356398-32a2-884e-a903-53898d9a118a
2007-04-21 02:19:18 +00:00
Edward Z. Yang
fc06f221d5 Remove redundant $info member variable.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@960 48356398-32a2-884e-a903-53898d9a118a
2007-04-11 21:44:26 +00:00
Edward Z. Yang
ac3ab2a556 [1.6.1] DirectLex now preserves text in which a < bracket is followed by a non-alphanumeric character. This means that certain emoticons are now preserved.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@939 48356398-32a2-884e-a903-53898d9a118a
2007-04-04 02:22:27 +00:00
Edward Z. Yang
2c330cac73 Add 1.6.1 TODO stuff.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@937 48356398-32a2-884e-a903-53898d9a118a
2007-04-02 13:28:49 +00:00
Edward Z. Yang
a0d6543b84 Some packaging fixes:
- Add VERSION file, which contains just the version number of the release
- Add WHATSNEW, which is a short summary of the new release
- Add release.php which bumps all the necessary version numbers in files
- Update package.php so that the version numbers aren't hardcoded
- Add news entry for 1.7.0

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@935 48356398-32a2-884e-a903-53898d9a118a
2007-04-02 03:58:59 +00:00
Edward Z. Yang
e223490a78 Release 1.6.0.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@930 48356398-32a2-884e-a903-53898d9a118a
2007-04-01 22:31:16 +00:00
Edward Z. Yang
2666f067cc Add partial French install file.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@929 48356398-32a2-884e-a903-53898d9a118a
2007-04-01 21:38:10 +00:00
Edward Z. Yang
826a57a04a Update Advanced API with various edits and Customization section.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@928 48356398-32a2-884e-a903-53898d9a118a
2007-04-01 18:21:43 +00:00
Edward Z. Yang
e08b5aaa70 [1.6.0] Add error messages for when user attempts to "allow" elements or attributes HTML Purifier does not support.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@927 48356398-32a2-884e-a903-53898d9a118a
2007-03-31 03:41:22 +00:00
Edward Z. Yang
b15e8c344e [1.6.0] Implement ID regexp matching.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@926 48356398-32a2-884e-a903-53898d9a118a
2007-03-31 03:25:10 +00:00
Edward Z. Yang
2c9e041b4c Update TODO and progress document.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@925 48356398-32a2-884e-a903-53898d9a118a
2007-03-31 03:09:46 +00:00
Edward Z. Yang
e2c3394d70 [1.6.0] Add support for LinkTypes, used for rel and rev attributes.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@924 48356398-32a2-884e-a903-53898d9a118a
2007-03-31 02:58:16 +00:00
Edward Z. Yang
1532fe703a Update docs:
- Progress hr.size was changed from width to height
- UTF-8 rules of thumb were clarified to make clear this is only necessary for UTF-8 text.

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@923 48356398-32a2-884e-a903-53898d9a118a
2007-03-30 00:01:35 +00:00
Edward Z. Yang
058f1eba7d [1.6.0] Implement width/height attribute transforms with Length.php
- Also, enabled 'height' CSS attribute

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@922 48356398-32a2-884e-a903-53898d9a118a
2007-03-29 23:48:54 +00:00
Edward Z. Yang
1102dc6e27 [1.6.0] Add support for name transformation to id
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@921 48356398-32a2-884e-a903-53898d9a118a
2007-03-29 23:19:53 +00:00
Edward Z. Yang
85374d330f [1.6.0] Add support for border attribute transform
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@920 48356398-32a2-884e-a903-53898d9a118a
2007-03-29 21:41:17 +00:00
Edward Z. Yang
a16d6c4342 [1.6.0] Add support for bgcolor attribute transform.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@919 48356398-32a2-884e-a903-53898d9a118a
2007-03-29 21:20:44 +00:00
Edward Z. Yang
9b5e2978ad Add ID regexps to the TODO list.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@913 48356398-32a2-884e-a903-53898d9a118a
2007-03-29 00:13:12 +00:00
Edward Z. Yang
06468a4157 [1.5.1] Add segfault fix to news log.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@912 48356398-32a2-884e-a903-53898d9a118a
2007-03-27 23:29:10 +00:00
Edward Z. Yang
0167f8aa84 [1.5.1] Try separating out declarations, might stop segfaulting.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@911 48356398-32a2-884e-a903-53898d9a118a
2007-03-27 23:15:01 +00:00
Edward Z. Yang
f1a90e684b [1.5.1] Separate out trouble area that's having segfaults. (note: this commit actually inadvertently let us discover a fix for the segfault, applied in the next revision).
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@910 48356398-32a2-884e-a903-53898d9a118a
2007-03-27 23:07:21 +00:00
Edward Z. Yang
14d98413fd Update advanced API with more details on selection interface.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@908 48356398-32a2-884e-a903-53898d9a118a
2007-03-27 01:26:26 +00:00
Edward Z. Yang
97a4ec7598 Add in terracc's suggestions to TODO file.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@886 48356398-32a2-884e-a903-53898d9a118a
2007-03-25 00:40:13 +00:00
Edward Z. Yang
71ed725c5c Complete PEAR packager that actually works!
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@885 48356398-32a2-884e-a903-53898d9a118a
2007-03-25 00:23:35 +00:00
Edward Z. Yang
d4bf41288a Add package2.xml
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@884 48356398-32a2-884e-a903-53898d9a118a
2007-03-24 20:43:16 +00:00
Edward Z. Yang
365bd78c20 Commit PEAR package stuffs.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@883 48356398-32a2-884e-a903-53898d9a118a
2007-03-24 20:39:00 +00:00
Edward Z. Yang
52fa958fb2 Release 1.5.0 (bumped HTMLPurifier.php version number).
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@874 48356398-32a2-884e-a903-53898d9a118a
2007-03-24 02:10:33 +00:00
Edward Z. Yang
17d32bac7f Almost release 1.5.0. Merged in a few strict changes.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@870 48356398-32a2-884e-a903-53898d9a118a
2007-03-24 01:24:38 +00:00
Edward Z. Yang
e2babe5308 Almost release 1.5.0.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@867 48356398-32a2-884e-a903-53898d9a118a
2007-03-24 00:35:53 +00:00
Edward Z. Yang
5f1a6b883f Update NEWS with a few old items I missed. We may yet have a 1.4.2 interim release.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@811 48356398-32a2-884e-a903-53898d9a118a
2007-03-14 21:34:37 +00:00
Edward Z. Yang
c5e3796202 Update advanced API docs, link to it from index.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@808 48356398-32a2-884e-a903-53898d9a118a
2007-03-14 04:56:44 +00:00
Edward Z. Yang
72f1984229 Add notes on "mode" to advanced API.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@801 48356398-32a2-884e-a903-53898d9a118a
2007-03-12 03:53:09 +00:00
Edward Z. Yang
918081b372 [1.4.x?] Make regex multiline.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@794 48356398-32a2-884e-a903-53898d9a118a
2007-03-04 02:55:44 +00:00
Edward Z. Yang
6c56dd070f Updated Advanced API docs.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@769 48356398-32a2-884e-a903-53898d9a118a
2007-03-01 03:56:08 +00:00
Edward Z. Yang
299f93f8f0 Add initial version of advanced API specification, also add <q> tag fix.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@768 48356398-32a2-884e-a903-53898d9a118a
2007-02-28 04:42:08 +00:00
Edward Z. Yang
4169846c57 Modules are not passed by reference, so in PHP 4 we cannot guarantee same module that went in will be used.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@767 48356398-32a2-884e-a903-53898d9a118a
2007-02-27 23:57:54 +00:00
Edward Z. Yang
aff4957531 [1.4.x?] Alright, have both PHP5 and DOMDocument requirements for DOMLex checked.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@766 48356398-32a2-884e-a903-53898d9a118a
2007-02-27 23:54:29 +00:00
Edward Z. Yang
e4bdf472a6 Fix typo.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@764 48356398-32a2-884e-a903-53898d9a118a
2007-02-20 03:05:03 +00:00
Edward Z. Yang
9a99750474 - Setup doctypes, auto properties, and work on making the interface more user-friendly
- Yet even more unit test for HTMLModuleManager
- Sample code in printDefinition for defining a new element
- Downgraded importances of HTMLModule->elements

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@762 48356398-32a2-884e-a903-53898d9a118a
2007-02-18 05:29:19 +00:00
Edward Z. Yang
7eb751b5f5 More refactoring: for interest of unit testing, default doctypes were moved to an initialize() method which could optionally be omitted. Disable collection aliases in favor of doctype aliases.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@761 48356398-32a2-884e-a903-53898d9a118a
2007-02-17 22:17:14 +00:00
Edward Z. Yang
0d0173eb6e Implement unit tests for very public interfaces of HTMLModuleManager, also added lots of error checking. tally_errors now requires unit test to be passed in as parameter.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@760 48356398-32a2-884e-a903-53898d9a118a
2007-02-17 19:37:48 +00:00
Edward Z. Yang
556ed4ea90 - Shuffle around includes to the right places
- Fix error in unit test

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@759 48356398-32a2-884e-a903-53898d9a118a
2007-02-17 17:43:44 +00:00
Edward Z. Yang
cf445a6107 - Revamp ordering scheme: onus in on collections, conflict resolution based on module load order.
- Miscellaneous refactoring and documentation

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@758 48356398-32a2-884e-a903-53898d9a118a
2007-02-17 17:10:28 +00:00
Edward Z. Yang
243ad45e59 Add some clarifying comments on what belongs in activeModules and validModules.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@757 48356398-32a2-884e-a903-53898d9a118a
2007-02-16 03:48:25 +00:00
Edward Z. Yang
31d0c621f5 Create two more module sets: activeModules and validModules to supplant the getModules() method.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@756 48356398-32a2-884e-a903-53898d9a118a
2007-02-16 03:33:29 +00:00
Edward Z. Yang
0870974a25 Have processCollections() perform name to instance indexing at the get-go.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@755 48356398-32a2-884e-a903-53898d9a118a
2007-02-16 03:16:17 +00:00
Edward Z. Yang
5c4a0a6785 Migrate default attribute collections to their own module, do late-loading of the attribute collection.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@754 48356398-32a2-884e-a903-53898d9a118a
2007-02-16 03:07:47 +00:00
Edward Z. Yang
e55babdc53 Move order to module itself, as member variable type.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@753 48356398-32a2-884e-a903-53898d9a118a
2007-02-16 03:01:23 +00:00
Edward Z. Yang
6e1b540d99 Remove missing include.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@752 48356398-32a2-884e-a903-53898d9a118a
2007-02-15 14:02:01 +00:00
Edward Z. Yang
edf20018f0 Add an HTMLModuleManager.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@751 48356398-32a2-884e-a903-53898d9a118a
2007-02-15 14:00:18 +00:00
Edward Z. Yang
c09432e171 Add command line support for loading a single test file.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@750 48356398-32a2-884e-a903-53898d9a118a
2007-02-15 00:17:23 +00:00
Edward Z. Yang
9c031b5c1e Add name class member variable to modules.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@749 48356398-32a2-884e-a903-53898d9a118a
2007-02-14 22:30:17 +00:00
Edward Z. Yang
a827cbc3ba Slight formatting change.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@748 48356398-32a2-884e-a903-53898d9a118a
2007-02-14 22:21:07 +00:00
Edward Z. Yang
c05eebee15 [1.5.0] AttrDef partitioned into HTML, CSS and URI segments. Also, some minor bugs with MultiLength fixed.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@747 48356398-32a2-884e-a903-53898d9a118a
2007-02-14 20:38:51 +00:00
Edward Z. Yang
93a69d020a Fix typo.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@746 48356398-32a2-884e-a903-53898d9a118a
2007-02-14 16:22:28 +00:00
Edward Z. Yang
f3fa9c01ba Add IDREF support to TODO list.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@744 48356398-32a2-884e-a903-53898d9a118a
2007-02-14 03:59:25 +00:00
Edward Z. Yang
bae5b0c022 Move out SetParent and TweakSubtractiveWhitelist. Move out some other configurations, disable ID references.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@743 48356398-32a2-884e-a903-53898d9a118a
2007-02-14 02:54:41 +00:00
Edward Z. Yang
67befbc8a8 [1.5.0] Rename %Attr.DisableURI to %URI.Disable and move it over to the AttrDef.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@742 48356398-32a2-884e-a903-53898d9a118a
2007-02-14 01:57:06 +00:00
Edward Z. Yang
cac22f01cf [1.5.0]
- More framework work (modules now are treated first class)
- Config will regenerate definitions when appropriate entries are set
- Add HTMLModule->setup for pre-processing stuff
- Constructor receives $definition not $config
- Config rolled inside $definition->config until end of setup()

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@741 48356398-32a2-884e-a903-53898d9a118a
2007-02-14 01:44:06 +00:00
Edward Z. Yang
94d2dbaa74 Fix broken benchmark code.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@739 48356398-32a2-884e-a903-53898d9a118a
2007-02-13 20:51:47 +00:00
Edward Z. Yang
6add828bc8 Update UTF-8 title.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@735 48356398-32a2-884e-a903-53898d9a118a
2007-02-13 03:09:34 +00:00
Edward Z. Yang
800b67ed65 Add preProcess and postProcess infrastructure to HTMLModule and HTMLDefinition so that almost all functionality that does not involve merging the modules together can be factored into modules.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@733 48356398-32a2-884e-a903-53898d9a118a
2007-02-12 03:02:26 +00:00
Edward Z. Yang
71e4ddd222 [1.5.0] Implement Legacy module.
- Yet another test EnableAttrID
- ElementDef now is mindful of attr inclusion merges

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@732 48356398-32a2-884e-a903-53898d9a118a
2007-02-11 01:52:56 +00:00
Edward Z. Yang
54a68a1713 [1.5.0] Implement TransformToStrict proprietary module
- Factored out strictblockquote from the common definition
- Text module now follows "strict" rules by default
- attr_transform_* now are indexed with string keys, to allow overloading
- Implement ElementDef mergin, and add standalone class variable to ElementDef to prevent half-baked element definitions from masquerading as full ones
- Implement merging global attributes from modules, namely info_attr_transform_post, info_attr_transform_pre and info_tag_transform
- Rename setupInfo() to processModules()
- Fix typo in HTMLModule/Bdo.php

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@731 48356398-32a2-884e-a903-53898d9a118a
2007-02-10 23:35:21 +00:00
Edward Z. Yang
bd544ad038 Formatting and documentation fixes.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@730 48356398-32a2-884e-a903-53898d9a118a
2007-02-09 03:19:43 +00:00
Edward Z. Yang
d5491da77f [1.5.0] Rewrite XHTML 1.1 document to describe HTMLDefinition's modularization
- Use ElementDef->child to define a literal ChildDef object, rather than ElementDef->content_model.
- Add notes on transforms, HTMLModule will be able to write those too
- Fix some misc typos.

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@729 48356398-32a2-884e-a903-53898d9a118a
2007-02-08 23:10:49 +00:00
Edward Z. Yang
591fc0ae28 Divvy up TagTransform library files into their own separate files. Similar action needs to be taken for the tests.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@728 48356398-32a2-884e-a903-53898d9a118a
2007-02-06 01:33:28 +00:00
Edward Z. Yang
dac7ac1eae Add documentation.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@727 48356398-32a2-884e-a903-53898d9a118a
2007-02-05 05:23:20 +00:00
Edward Z. Yang
64ee756b7a Rename ConfigEntity to ConfigDef and factor into its own classes.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@726 48356398-32a2-884e-a903-53898d9a118a
2007-02-05 03:22:32 +00:00
Edward Z. Yang
e2103ce0f2 Factor out content set and childdef functionality to ContentSets. Remove redundant info suffix from attr_collections.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@725 48356398-32a2-884e-a903-53898d9a118a
2007-02-05 03:05:46 +00:00
Edward Z. Yang
219902ebff Revert back to pre XHTMLDefinition testing state.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@724 48356398-32a2-884e-a903-53898d9a118a
2007-02-04 23:18:53 +00:00
Edward Z. Yang
21116373a7 [1.5.0] Implemented new HTMLDefinition based on XHTML 1.1 Modularization
- Well, not really, but it's now official. Some gunky prototype code left, but it's pretty much all done.

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@723 48356398-32a2-884e-a903-53898d9a118a
2007-02-04 23:17:13 +00:00
Edward Z. Yang
5ed88809f3 Add a bunch of compatibility gunk to XHTMLDefinition for modules we've not implemented yet and replace HTMLDefinition with it.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@722 48356398-32a2-884e-a903-53898d9a118a
2007-02-04 23:10:10 +00:00
Edward Z. Yang
bb8b38b1e0 Rename attr_collection to attr_collections, which is more accurate. HTMLModule now has attr_collections_info rather than attr_collections which implied an object. Further clarified naming conventions.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@721 48356398-32a2-884e-a903-53898d9a118a
2007-02-04 22:26:56 +00:00
Edward Z. Yang
236159242f Enforce info_ prefix convention for data that is accessed by HTML Purifier internals.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@720 48356398-32a2-884e-a903-53898d9a118a
2007-02-04 22:08:51 +00:00
Edward Z. Yang
9d8f839bf2 Add empty template HTMLModule for legacy-related processing.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@719 48356398-32a2-884e-a903-53898d9a118a
2007-02-04 21:58:38 +00:00
Edward Z. Yang
882148f9ad Add nested test for del/ins inline support.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@718 48356398-32a2-884e-a903-53898d9a118a
2007-02-04 21:02:35 +00:00
Edward Z. Yang
a863f62489 Add full documentation. Implement deferred ChildDef to HTMLModule. Add missing attributes for table, switched some to Number. Add necessary includes to module files. Add pre exclusions. Printer now ksorts arrays before output. Exclude ins/del from descendants_are_inline flagging.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@717 48356398-32a2-884e-a903-53898d9a118a
2007-02-04 20:09:35 +00:00
Edward Z. Yang
6478c7c2df Implement Style Attribute Module, cleanup some attribute collections and add some documentation.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@716 48356398-32a2-884e-a903-53898d9a118a
2007-02-04 18:27:59 +00:00
Edward Z. Yang
129a4ea506 Implement Image Module.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@715 48356398-32a2-884e-a903-53898d9a118a
2007-02-04 16:35:40 +00:00
Edward Z. Yang
a122243a89 Implement Tables Module.
- Fix HTMLDefinition rendering of table children

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@714 48356398-32a2-884e-a903-53898d9a118a
2007-02-04 16:23:26 +00:00
Edward Z. Yang
315c55eeb1 Implement Bdo module. Also added some documentation and missing values, as well as support for attr_collection additions.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@713 48356398-32a2-884e-a903-53898d9a118a
2007-02-04 15:28:47 +00:00
Edward Z. Yang
cfe50ff8ae Implement Edit module.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@712 48356398-32a2-884e-a903-53898d9a118a
2007-02-04 14:56:55 +00:00
Edward Z. Yang
d0018a2696 Implement Presentation module.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@711 48356398-32a2-884e-a903-53898d9a118a
2007-02-04 04:41:34 +00:00
Edward Z. Yang
77d9e05a07 [1.5.0] Massive refactoring for Blockquote and Chameleon to be more extensible and accommodating of XHTMLDefinition.
- Fixed buggy chameleon-support for ins and del
. Removed context variable ParentType, replaced with IsInline, which
  is false when you're not inline and an integer of the parent that
  caused you to become inline when you are (so possibly zero)
. Removed ElementDef->type in favor of ElementDef->descendants_are_inline
  and HTMLDefinition->content_sets
. StrictBlockquote now reports what elements its supposed to allow,
  rather than what it does allow
. Removed HTMLDefinition->info_flow_elements in favor of
  HTMLDefinition->content_sets['Flow']
. Removed redundant "exclusionary" definitions from DTD roster
. StrictBlockquote now requires a construction parameter as if it
  were an Required ChildDef, this is the "real" set of allowed elements

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@710 48356398-32a2-884e-a903-53898d9a118a
2007-02-04 03:53:57 +00:00
Edward Z. Yang
80243f377c Implement List module.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@709 48356398-32a2-884e-a903-53898d9a118a
2007-02-04 01:52:13 +00:00
Edward Z. Yang
43b157cf4d Add Hypertext module.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@708 48356398-32a2-884e-a903-53898d9a118a
2007-02-04 01:01:27 +00:00
Edward Z. Yang
f6b50d4bfd Initial implementation of XHTMLDefinition, you can see it in action at the smoketest printDefinition.php?x (add the x at the end).
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@707 48356398-32a2-884e-a903-53898d9a118a
2007-02-04 00:07:52 +00:00
Edward Z. Yang
806901cfd2 [1.5.0] Rename Class to Nmtokens (more accurate)
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@706 48356398-32a2-884e-a903-53898d9a118a
2007-02-03 20:15:33 +00:00
Edward Z. Yang
f90eef7f1f Update docs. Delineate XHTML 1.1 revamping of HTMLDefinition.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@705 48356398-32a2-884e-a903-53898d9a118a
2007-02-03 17:03:04 +00:00
Edward Z. Yang
06867e14b6 Increase child definition sets to all elements to facilitate later expansion. Currently has no perceptible effect.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@704 48356398-32a2-884e-a903-53898d9a118a
2007-02-03 03:45:13 +00:00
Edward Z. Yang
bda2615b30 [1.5.0] Add support for IDREF
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@703 48356398-32a2-884e-a903-53898d9a118a
2007-02-02 22:03:09 +00:00
Edward Z. Yang
e1a5d10e75 Fix typo in comment.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@702 48356398-32a2-884e-a903-53898d9a118a
2007-01-30 00:34:23 +00:00
Edward Z. Yang
98fd6b7d82 [1.5.0] Add rudimentary I18N and L10N support based off MediaWiki
- Also: allow 'x' subtag in language codes

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@701 48356398-32a2-884e-a903-53898d9a118a
2007-01-29 20:11:00 +00:00
Edward Z. Yang
be264a4b20 Update docs.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@700 48356398-32a2-884e-a903-53898d9a118a
2007-01-29 17:53:54 +00:00
Edward Z. Yang
01c85b71d2 Fix minor typo.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@699 48356398-32a2-884e-a903-53898d9a118a
2007-01-28 22:19:05 +00:00
Edward Z. Yang
2d22c0aa55 [1.4.x?] Completed enduser-utf8.html
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@697 48356398-32a2-884e-a903-53898d9a118a
2007-01-24 23:48:35 +00:00
Edward Z. Yang
6e061f5184 I18N -> International/internationalization
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@696 48356398-32a2-884e-a903-53898d9a118a
2007-01-24 21:24:54 +00:00
Edward Z. Yang
44b988f1f6 Fix some editing mistakes.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@695 48356398-32a2-884e-a903-53898d9a118a
2007-01-24 03:00:48 +00:00
Edward Z. Yang
0ead9558b4 Finish up to BOM.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@694 48356398-32a2-884e-a903-53898d9a118a
2007-01-24 01:29:25 +00:00
Edward Z. Yang
159a1cced1 Complete HTML Purifier segment.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@693 48356398-32a2-884e-a903-53898d9a118a
2007-01-23 03:27:10 +00:00
Edward Z. Yang
6871a54d64 Release 1.4.1.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@687 48356398-32a2-884e-a903-53898d9a118a
2007-01-21 21:47:18 +00:00
Edward Z. Yang
96ac7e8797 [1.4.1] docs/enduser-youtube.html updated according to new functionality and YouTube IDs can have underscores and dashes
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@686 48356398-32a2-884e-a903-53898d9a118a
2007-01-21 21:45:14 +00:00
877 changed files with 62115 additions and 10547 deletions

23
.gitattributes vendored Normal file
View File

@@ -0,0 +1,23 @@
/.gitattributes export-ignore
/.github export-ignore
/.gitignore export-ignore
/art export-ignore
/benchmarks export-ignore
/configdoc export-ignore
/configdoc/usage.xml -crlf
/docs export-ignore
/Doxyfile export-ignore
/extras export-ignore
/INSTALL* export-ignore
/maintenance export-ignore
/NEWS export-ignore
/package.php export-ignore
/plugins export-ignore
/phpdoc.ini export-ignore
/smoketests export-ignore
/test-* export-ignore
/tests export-ignore
/TODO export-ignore
/update-for-release export-ignore
/WYSIWYG export-ignore
/release.config.js export-ignore

36
.github/workflows/ci.yml vendored Normal file
View File

@@ -0,0 +1,36 @@
name: ci
on:
push:
pull_request:
jobs:
linux_tests:
runs-on: ubuntu-latest
strategy:
fail-fast: true
matrix:
php: ['5.6', '7.0', '7.1', '7.2', '7.3', '7.4', '8.0', '8.1', '8.2']
name: PHP ${{ matrix.php }}
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Setup PHP
uses: shivammathur/setup-php@v2
with:
php-version: ${{ matrix.php }}
tools: composer:v2
ini-values: error_reporting=E_ALL
extensions: iconv, bcmath, tidy, mbstring, intl
- name: Install dependencies
run: composer install
- name: Configure simpletest
run: cp test-settings.sample.php test-settings.php
- name: Execute Unit tests
run: php tests/index.php

19
.github/workflows/lint-pr.yml vendored Normal file
View File

@@ -0,0 +1,19 @@
name: "Lint PR"
on:
pull_request_target:
types:
- opened
- edited
- synchronize
jobs:
main:
name: Validate PR title
runs-on: ubuntu-latest
steps:
- uses: amannn/action-semantic-pull-request@v4
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

29
.github/workflows/release.yml vendored Normal file
View File

@@ -0,0 +1,29 @@
name: release
on:
workflow_dispatch:
jobs:
release:
runs-on: ubuntu-latest
name: Release
steps:
- name: Checkout code
uses: actions/checkout@v2
- name: Setup PHP
uses: shivammathur/setup-php@v2
with:
php-version: 5.5
- name: Run automated release process with semantic-release
uses: cycjimmy/semantic-release-action@v2
with:
extra_plugins: |
@semantic-release/changelog
@semantic-release/git
@semantic-release/exec
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

28
.gitignore vendored Normal file
View File

@@ -0,0 +1,28 @@
tags
conf/
test-settings.php
config-schema.php
library/HTMLPurifier/DefinitionCache/Serializer/*/
library/standalone/
library/HTMLPurifier.standalone.php
library/HTMLPurifier*.tgz
library/package*.xml
smoketests/test-schema.html
configdoc/*.html
configdoc/configdoc.xml
docs/doxygen*
*.phpt.diff
*.phpt.exp
*.phpt.log
*.phpt.out
*.phpt.php
*.phpt.skip.php
*.htmlt.ini
*.patch
/*.php
vendor
composer.lock
*.rej
*.orig
*.bak
core

6
CHANGELOG.md Normal file
View File

@@ -0,0 +1,6 @@
# [4.16.0](https://github.com/ezyang/htmlpurifier/compare/v4.15.0...v4.16.0) (2022-09-18)
### Features
* add semantic release ([#307](https://github.com/ezyang/htmlpurifier/issues/307)) ([db31243](https://github.com/ezyang/htmlpurifier/commit/db312435cb9d8d73395f75f9642a43ba6de5e903)), closes [#322](https://github.com/ezyang/htmlpurifier/issues/322) [#323](https://github.com/ezyang/htmlpurifier/issues/323) [#326](https://github.com/ezyang/htmlpurifier/issues/326) [#327](https://github.com/ezyang/htmlpurifier/issues/327) [#328](https://github.com/ezyang/htmlpurifier/issues/328) [#329](https://github.com/ezyang/htmlpurifier/issues/329) [#330](https://github.com/ezyang/htmlpurifier/issues/330) [#331](https://github.com/ezyang/htmlpurifier/issues/331) [#332](https://github.com/ezyang/htmlpurifier/issues/332) [#333](https://github.com/ezyang/htmlpurifier/issues/333) [#337](https://github.com/ezyang/htmlpurifier/issues/337) [#335](https://github.com/ezyang/htmlpurifier/issues/335) [ezyang/htmlpurifier#334](https://github.com/ezyang/htmlpurifier/issues/334) [#336](https://github.com/ezyang/htmlpurifier/issues/336) [#338](https://github.com/ezyang/htmlpurifier/issues/338)

View File

@@ -5,3 +5,5 @@ Almost everything written by Edward Z. Yang (Ambush Commander). Lots of thanks
to the DevNetwork Community for their help (see docs/ref-devnetwork.html for
more details), Feyd especially (namely IPv6 and optimization). Thanks to RSnake
for letting me package his fantastic XSS cheatsheet for a smoketest.
vim: et sw=4 sts=4

1173
Doxyfile

File diff suppressed because it is too large Load Diff

317
INSTALL
View File

@@ -2,61 +2,61 @@
Install
How to install HTML Purifier
HTML Purifier is designed to run out of the box, so actually using the library
is extremely easy. (Although, if you were looking for a step-by-step
installation GUI, you've come to the wrong place!) The impatient can scroll
down to the bottom of this INSTALL document to see the code, but you really
should make sure a few things are properly done.
HTML Purifier is designed to run out of the box, so actually using the
library is extremely easy. (Although... if you were looking for a
step-by-step installation GUI, you've downloaded the wrong software!)
Todo: Convert to using the array syntax for configuration.
While the impatient can get going immediately with some of the sample
code at the bottom of this library, it's well worth reading this entire
document--most of the other documentation assumes that you are familiar
with these contents.
---------------------------------------------------------------------------
1. Compatibility
HTML Purifier works in both PHP 4 and PHP 5, from PHP 4.3.9 and up. It has no
core dependencies with other libraries. (Whoopee!)
HTML Purifier is PHP 5 and PHP 7, and is actively tested from PHP 5.3
and up. It has no core dependencies with other libraries.
Optional extensions are iconv (usually installed) and tidy (also common).
If you use UTF-8 and don't plan on pretty-printing HTML, you can get away with
not having either of these extensions.
These optional extensions can enhance the capabilities of HTML Purifier:
* iconv : Converts text to and from non-UTF-8 encodings
* bcmath : Used for unit conversion and imagecrash protection
* tidy : Used for pretty-printing HTML
These optional libraries can enhance the capabilities of HTML Purifier:
2. Including the library
* CSSTidy : Clean CSS stylesheets using %Core.ExtractStyleBlocks
Note: You should use the modernized fork of CSSTidy available
at https://github.com/Cerdic/CSSTidy
* Net_IDNA2 (PEAR) : IRI support using %Core.EnableIDNA
Note: This is not necessary for PHP 5.3 or later
Simply use:
---------------------------------------------------------------------------
2. Reconnaissance
require_once '/path/to/library/HTMLPurifier.auto.php';
A big plus of HTML Purifier is its inerrant support of standards, so
your web-pages should be standards-compliant. (They should also use
semantic markup, but that's another issue altogether, one HTML Purifier
cannot fix without reading your mind.)
...and you're good to go. Since HTML Purifier's codebase is fairly
large, I recommend only including HTML Purifier when you need it.
If you don't like your include_path to be fiddled around with, simply set
HTML Purifier's library/ directory to the include path yourself and then:
require_once 'HTMLPurifier.php';
Only the contents in the library/ folder are necessary, so you can remove
everything else when using HTML Purifier in a production environment.
3. Preparing the proper output environment
HTML Purifier is all about web-standards, so accordingly your webpages should
be standards compliant. HTML Purifier can deal with these doctypes:
HTML Purifier can process these doctypes:
* XHTML 1.0 Transitional (default)
* XHTML 1.0 Strict
* HTML 4.01 Transitional
* HTML 4.01 Strict
* XHTML 1.1
...and these character encodings:
* UTF-8 (default)
* Any encoding iconv supports (support is crippled for i18n though)
* Any encoding iconv supports (with crippled internationalization support)
The defaults are there for a reason: they are best-practice choices that
should not be changed lightly. For those of you in the dark, you can determine
the doctype from this code in your HTML documents:
These defaults reflect what my choices would be if I were authoring an
HTML document, however, what you choose depends on the nature of your
codebase. If you don't know what doctype you are using, you can determine
the doctype from this identifier at the top of your source code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
@@ -65,35 +65,135 @@ the doctype from this code in your HTML documents:
<meta http-equiv="Content-type" content="text/html;charset=ENCODING">
For legacy codebases these declarations may be missing. If that is the case,
STOP, and read up on character encodings and doctypes (in that order). Here
are some links:
* http://www.joelonsoftware.com/articles/Unicode.html
* http://alistapart.com/stories/doctype/
You may currently be vulnerable to XSS and other security threats, and HTML
Purifier won't be able to fix that.
If the character encoding declaration is missing, STOP NOW, and
read 'docs/enduser-utf8.html' (web accessible at
http://htmlpurifier.org/docs/enduser-utf8.html). In fact, even if it is
present, read this document anyway, as many websites specify their
document's character encoding incorrectly.
---------------------------------------------------------------------------
3. Including the library
The procedure is quite simple:
require_once '/path/to/library/HTMLPurifier.auto.php';
This will setup an autoloader, so the library's files are only included
when you use them.
Only the contents in the library/ folder are necessary, so you can remove
everything else when using HTML Purifier in a production environment.
If you installed HTML Purifier via PEAR, all you need to do is:
require_once 'HTMLPurifier.auto.php';
Please note that the usual PEAR practice of including just the classes you
want will not work with HTML Purifier's autoloading scheme.
Advanced users, read on; other users can skip to section 4.
Autoload compatibility
----------------------
HTML Purifier attempts to be as smart as possible when registering an
autoloader, but there are some cases where you will need to change
your own code to accomodate HTML Purifier. These are those cases:
AN __autoload FUNCTION IS DECLARED AFTER OUR AUTOLOADER IS REGISTERED
spl_autoload_register() has the curious behavior of disabling
the existing __autoload() handler. Users need to explicitly
spl_autoload_register('__autoload'). Because we use SPL when it
is available, __autoload() will ALWAYS be disabled. If __autoload()
is declared before HTML Purifier is loaded, this is not a problem:
HTML Purifier will register the function for you. But if it is
declared afterwards, it will mysteriously not work. This
snippet of code (after your autoloader is defined) will fix it:
spl_autoload_register('__autoload')
For better performance
----------------------
Opcode caches, which greatly speed up PHP initialization for scripts
with large amounts of code (HTML Purifier included), don't like
autoloaders. We offer an include file that includes all of HTML Purifier's
files in one go in an opcode cache friendly manner:
// If /path/to/library isn't already in your include path, uncomment
// the below line:
// require '/path/to/library/HTMLPurifier.path.php';
require 'HTMLPurifier.includes.php';
Optional components still need to be included--you'll know if you try to
use a feature and you get a class doesn't exists error! The autoloader
can be used in conjunction with this approach to catch classes that are
missing. Simply add this afterwards:
require 'HTMLPurifier.autoload.php';
Standalone version
------------------
HTML Purifier has a standalone distribution; you can also generate
a standalone file from the full version by running the script
maintenance/generate-standalone.php . The standalone version has the
benefit of having most of its code in one file, so parsing is much
faster and the library is easier to manage.
If HTMLPurifier.standalone.php exists in the library directory, you
can use it like this:
require '/path/to/HTMLPurifier.standalone.php';
This is equivalent to including HTMLPurifier.includes.php, except that
the contents of standalone/ will be added to your path. To override this
behavior, specify a new HTMLPURIFIER_PREFIX where standalone files can
be found (usually, this will be one directory up, the "true" library
directory in full distributions). Don't forget to set your path too!
The autoloader can be added to the end to ensure the classes are
loaded when necessary; otherwise you can manually include them.
To use the autoloader, use this:
require 'HTMLPurifier.autoload.php';
For advanced users
------------------
HTMLPurifier.auto.php performs a number of operations that can be done
individually. These are:
HTMLPurifier.path.php
Puts /path/to/library in the include path. For high performance,
this should be done in php.ini.
HTMLPurifier.autoload.php
Registers our autoload handler HTMLPurifier_Bootstrap::autoload($class).
You can do these operations by yourself, if you like.
---------------------------------------------------------------------------
4. Configuration
HTML Purifier is designed to run out-of-the-box, but occasionally HTML
Purifier needs to be told what to do. If you answered no to any of these
questions, read on, otherwise, you can skip to the next section (or, if you're
Purifier needs to be told what to do. If you answer no to any of these
questions, read on; otherwise, you can skip to the next section (or, if you're
into configuring things just for the heck of it, skip to 4.3).
* Am I using UTF-8?
* Am I using XHTML 1.0 Transitional?
If you answered yes to any of these questions, instantiate a configuration
If you answered no to any of these questions, instantiate a configuration
object and read on:
$config = HTMLPurifier_Config::createDefault();
4.1. Setting a different character encoding
You really shouldn't use any other encoding except UTF-8, especially if you
@@ -105,74 +205,123 @@ HTML Purifier uses iconv to support other character encodings, as such,
any encoding that iconv supports <http://www.gnu.org/software/libiconv/>
HTML Purifier supports with this code:
$config->set('Core', 'Encoding', /* put your encoding here */);
$config->set('Core.Encoding', /* put your encoding here */);
An example usage for Latin-1 websites (the most common encoding for English
websites):
$config->set('Core', 'Encoding', 'ISO-8859-1');
$config->set('Core.Encoding', 'ISO-8859-1');
Note that HTML Purifier's support for non-Unicode encodings is crippled by the
fact that any character not supported by that encoding will be silently
dropped, EVEN if it is ampersand escaped. This is a current limitation of
HTML Purifier that we are NOT actively working to fix. Patches are welcome,
but there are so many other gotchas and problems in I18N for non-Unicode
encodings that this functionality is low priority. See
<http://ppewww.ph.gla.ac.uk/~flavell/charset/form-i18n.html> for a more
detailed lowdown on the topic.
dropped, EVEN if it is ampersand escaped. If you want to work around
this, you are welcome to read docs/enduser-utf8.html for a fix,
but please be cognizant of the issues the "solution" creates (for this
reason, I do not include the solution in this document).
4.2. Setting a different doctype
For those of you stuck using HTML 4.01 Transitional, you can disable
For those of you using HTML 4.01 Transitional, you can disable
XHTML output like this:
$config->set('Core', 'XHTML', false);
$config->set('HTML.Doctype', 'HTML 4.01 Transitional');
I recommend that you use XHTML, although not as much as I recommend UTF-8. If
your HTML 4.01 page validates, good for you!
Currently, we can only guarantee transitional-complaint output, future
versions will also allow strict-compliant output.
Other supported doctypes include:
* HTML 4.01 Strict
* HTML 4.01 Transitional
* XHTML 1.0 Strict
* XHTML 1.0 Transitional
* XHTML 1.1
4.3. Other settings
There are more configuration directives which can be read about
here: <http://hp.jpsband.org/live/configdoc/plain.html> They're a bit boring,
here: <http://htmlpurifier.org/live/configdoc/plain.html> They're a bit boring,
but they can help out for those of you who like to exert maximum control over
your code.
your code. Some of the more interesting ones are configurable at the
demo <http://htmlpurifier.org/demo.php> and are well worth looking into
for your own system.
For example, you can fine tune allowed elements and attributes, convert
relative URLs to absolute ones, and even autoparagraph input text! These
are, respectively, %HTML.Allowed, %URI.MakeAbsolute and %URI.Base, and
%AutoFormat.AutoParagraph. The %Namespace.Directive naming convention
translates to:
$config->set('Namespace.Directive', $value);
E.g.
$config->set('HTML.Allowed', 'p,b,a[href],i');
$config->set('URI.Base', 'http://www.example.com');
$config->set('URI.MakeAbsolute', true);
$config->set('AutoFormat.AutoParagraph', true);
---------------------------------------------------------------------------
5. Caching
5. Using the code
HTML Purifier generates some cache files (generally one or two) to speed up
its execution. For maximum performance, make sure that
library/HTMLPurifier/DefinitionCache/Serializer is writeable by the webserver.
If you are in the library/ folder of HTML Purifier, you can set the
appropriate permissions using:
chmod -R 0755 HTMLPurifier/DefinitionCache/Serializer
If the above command doesn't work, you may need to assign write permissions
to group:
chmod -R 0775 HTMLPurifier/DefinitionCache/Serializer
You can also chmod files via your FTP client; this option
is usually accessible by right clicking the corresponding directory and
then selecting "chmod" or "file permissions".
Starting with 2.0.1, HTML Purifier will generate friendly error messages
that will tell you exactly what you have to chmod the directory to, if in doubt,
follow its advice.
If you are unable or unwilling to give write permissions to the cache
directory, you can either disable the cache (and suffer a performance
hit):
$config->set('Core.DefinitionCache', null);
Or move the cache directory somewhere else (no trailing slash):
$config->set('Cache.SerializerPath', '/home/user/absolute/path');
---------------------------------------------------------------------------
6. Using the code
The interface is mind-numbingly simple:
$purifier = new HTMLPurifier();
$clean_html = $purifier->purify( $dirty_html );
...or, if you're using the configuration object:
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify( $dirty_html );
That's it! For more examples, check out docs/examples/ (they aren't very
different though). Also, SLOW gives advice on what to do if HTML Purifier
is slowing down your application.
different though). Also, docs/enduser-slow.html gives advice on what to
do if HTML Purifier is slowing down your application.
---------------------------------------------------------------------------
7. Quick install
6. Quick install
First, make sure library/HTMLPurifier/DefinitionCache/Serializer is
writable by the webserver (see Section 5: Caching above for details).
If your website is in UTF-8 and XHTML Transitional, use this code:
<?php
require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';
$purifier = new HTMLPurifier();
$config = HTMLPurifier_Config::createDefault();
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);
?>
@@ -180,11 +329,13 @@ If your website is in a different encoding or doctype, use this code:
<?php
require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$config->set('Core', 'Encoding', 'ISO-8859-1'); //replace with your encoding
$config->set('Core', 'XHTML', true); //replace with false if HTML 4.01
$config->set('Core.Encoding', 'ISO-8859-1'); // replace with your encoding
$config->set('HTML.Doctype', 'HTML 4.01 Transitional'); // replace with your doctype
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);
?>
?>
vim: et sw=4 sts=4

60
INSTALL.fr.utf8 Normal file
View File

@@ -0,0 +1,60 @@
Installation
Comment installer HTML Purifier
Attention : Ce document est encodé en UTF-8, si les lettres avec des accents
ne s'affichent pas, prenez un meilleur éditeur de texte.
L'installation de HTML Purifier est très simple, parce qu'il n'a pas besoin
de configuration. Pour les utilisateurs impatients, le code se trouve dans le
pied de page, mais je recommande de lire le document.
1. Compatibilité
HTML Purifier fonctionne avec PHP 5. PHP 5.3 est la dernière version testée.
Il ne dépend pas d'autres librairies.
Les extensions optionnelles sont iconv (généralement déjà installée) et tidy
(répendue aussi). Si vous utilisez UTF-8 et que vous ne voulez pas l'indentation,
vous pouvez utiliser HTML Purifier sans ces extensions.
2. Inclure la librairie
Quand vous devez l'utilisez, incluez le :
require_once('/path/to/library/HTMLPurifier.auto.php');
Ne pas l'inclure si ce n'est pas nécessaire, car HTML Purifier est lourd.
HTML Purifier utilise "autoload". Si vous avez défini la fonction __autoload,
vous devez ajouter cette fonction :
spl_autoload_register('__autoload')
Plus d'informations dans le document "INSTALL".
3. Installation rapide
Si votre site Web est en UTF-8 et XHTML Transitional, utilisez :
<?php
require_once('/path/to/htmlpurifier/library/HTMLPurifier.auto.php');
$purificateur = new HTMLPurifier();
$html_propre = $purificateur->purify($html_a_purifier);
?>
Sinon, utilisez :
<?php
require_once('/path/to/html/purifier/library/HTMLPurifier.auto.load');
$config = $HTMLPurifier_Config::createDefault();
$config->set('Core', 'Encoding', 'ISO-8859-1'); //Remplacez par votre
encodage
$config->set('Core', 'XHTML', true); //Remplacer par false si HTML 4.01
$purificateur = new HTMLPurifier($config);
$html_propre = $purificateur->purify($html_a_purifier);
?>
vim: et sw=4 sts=4

View File

@@ -146,7 +146,7 @@ such a program is covered only if its contents constitute a work based
on the Library (independent of the use of the Library in a tool for
writing it). Whether that is true depends on what the Library does
and what the program that uses the Library does.
1. You may copy and distribute verbatim copies of the Library's
complete source code as you receive it, in any medium, provided that
you conspicuously and appropriately publish on each copy an
@@ -501,4 +501,4 @@ necessary. Here is a sample; alter the names:
That's all there is to it!
vim: et sw=4 sts=4

1097
NEWS

File diff suppressed because it is too large Load Diff

22
README
View File

@@ -1,22 +0,0 @@
README
All about HTML Purifier
HTML Purifier is an HTML filtering solution that uses a unique combination
of robust whitelists and agressive parsing to ensure that not only are
XSS attacks thwarted, but the resulting HTML is standards compliant.
HTML Purifier is oriented towards richly formatted documents from
untrusted sources that require CSS and a full tag-set. This library can
be configured to accept a more restrictive set of tags, but it won't be
as efficient as more bare-bones parsers. It will, however, do the job
right, which may be more important.
Places to go:
* See INSTALL for a quick installation guide
* See docs/ for developer-oriented documentation, code examples and
an in-depth installation guide.
* See WYSIWYG for information on editors like TinyMCE and FCKeditor
HTML Purifier can be found on the web at: http://hp.jpsband.org/

29
README.md Normal file
View File

@@ -0,0 +1,29 @@
HTML Purifier [![Build Status](https://github.com/ezyang/htmlpurifier/actions/workflows/ci.yml/badge.svg?branch=master)](https://github.com/ezyang/htmlpurifier/actions/workflows/ci.yml)
=============
HTML Purifier is an HTML filtering solution that uses a unique combination
of robust whitelists and aggressive parsing to ensure that not only are
XSS attacks thwarted, but the resulting HTML is standards compliant.
HTML Purifier is oriented towards richly formatted documents from
untrusted sources that require CSS and a full tag-set. This library can
be configured to accept a more restrictive set of tags, but it won't be
as efficient as more bare-bones parsers. It will, however, do the job
right, which may be more important.
Places to go:
* See INSTALL for a quick installation guide
* See docs/ for developer-oriented documentation, code examples and
an in-depth installation guide.
* See WYSIWYG for information on editors like TinyMCE and FCKeditor
HTML Purifier can be found on the web at: [http://htmlpurifier.org/](http://htmlpurifier.org/)
## Installation
Package available on [Composer](https://packagist.org/packages/ezyang/htmlpurifier).
If you're using Composer to manage dependencies, you can use
$ composer require ezyang/htmlpurifier

195
TODO
View File

@@ -4,88 +4,147 @@ TODO List
= KEY ====================
# Flagship
- Regular
? At-risk
? Maybe I'll Do It
==========================
1.5 release
# Implement all non-essential attribute transforms, configurable
# URI validation routines tighter (see docs/dev-code-quality.html) (COMPLEX)
# Advanced URI filtering schemes (see docs/proposal-new-directives.txt)
If no interest is expressed for a feature that may require a considerable
amount of effort to implement, it may get endlessly delayed. Do not be
afraid to cast your vote for the next feature to be implemented!
Things to do as soon as possible:
- http://htmlpurifier.org/phorum/read.php?3,5560,6307#msg-6307
- Think about allowing explicit order of operations hooks for transforms
- Fix "<.<" bug (trailing < is removed if not EOD)
- Build in better internal state dumps and debugging tools for remote
debugging
- Allowed/Allowed* have strange interactions when both set
? Transform lone embeds into object tags
- Deprecated config options that emit warnings when you set them (with'
a way of muting the warning if you really want to)
- Make HTML.Trusted work with Output.FlashCompat
- HTML.Trusted and HTML.SafeObject have funny interaction; general
problem is what to do when a module "supersedes" another
(see also tables and basic tables.) This is a little dicier
because HTML.SafeObject has some extra functionality that
trusted might find useful. See http://htmlpurifier.org/phorum/read.php?3,5762,6100
FUTURE VERSIONS
---------------
4.9 release [OMG CONFIG PONIES]
! Fix Printer. It's from the old days when we didn't have decent XML classes
! Factor demo.php into a set of Printer classes, and then create a stub
file for users here (inside the actual HTML Purifier library)
- Fix error handling with form construction
- Do encoding validation in Printers, or at least, where user data comes in
- Config: Add examples to everything (make built-in which also automatically
gives output)
- Add "register" field to config schemas to eliminate dependence on
naming conventions (try to remember why we ultimately decided on tihs)
5.0 release [HTML 5]
# Swap out code to use html5lib tokenizer and tree-builder
! Allow turning off of FixNesting and required attribute insertion
5.1 release [It's All About Trust] (floating)
# Implement untrusted, dangerous elements/attributes
# Implement IDREF support (harder than it seems, since you cannot have
IDREFs to non-existent IDs)
- Implement <area> (client and server side image maps are blocking
on IDREF support)
# Frameset XHTML 1.0 and HTML 4.01 doctypes
- Figure out how to simultaneously set %CSS.Trusted and %HTML.Trusted (?)
5.2 release [Error'ed]
# Error logging for filtering/cleanup procedures
- Requires I18N facilities to be created first (COMPLEX)
? Configuration profiles: sets of directives that get set with one func call
- XSS-attempt detection
1.6 release
# Add pre-packaged "levels" of cleaning (custom behavior already done)
- More fine-grained control over escaping behavior
- Silently drop content inbetween SCRIPT tags (can be generalized to allow
specification of elements that, when detected as foreign, trigger removal
of children, although unbalanced tags could wreck havoc (or at least
delete the rest of the document)).
- Allow specifying global attributes on a tag-by-tag basis in
%HTML.AllowAttributes
? More user-friendly warnings when %HTML.Allow* attempts to specify a
tag or attribute that is not supported
- Parse TinyMCE whitelist into our %HTML.Allow* whitelists
1.7 release
# Additional support for poorly written HTML
- Microsoft Word HTML cleaning (i.e. MsoNormal, but research essential!)
- Friendly strict handling of <address> (block -> <br>)
- Remove redundant tags, ex. <u><u>Underlined</u></u>. Implementation notes:
- XSS-attempt detection--certain errors are flagged XSS-like
- Append something to duplicate IDs so they're still usable (impl. note: the
dupe detector would also need to detect the suffix as well)
6.0 release [Beyond HTML]
# Legit token based CSS parsing (will require revamping almost every
AttrDef class). Probably will use CSSTidy
# More control over allowed CSS properties using a modularization
# IRI support (this includes IDN)
- Standardize token armor for all areas of processing
7.0 release [To XML and Beyond]
- Extended HTML capabilities based on namespacing and tag transforms (COMPLEX)
- Hooks for adding custom processors to custom namespaced tags and
attributes, offer default implementation
- Lots of documentation and samples
Ongoing
- More refactoring to take advantage of PHP5's facilities
- Refactor unit tests into lots of test methods
- Plugins for major CMSes (COMPLEX)
- phpBB
- Also, a FAQ for extension writers with HTML Purifier
AutoFormat
- Smileys
- Syntax highlighting (with GeSHi) with <pre> and possibly <?php
- Look at http://drupal.org/project/Modules/category/63 for ideas
Neat feature related
! Support exporting configuration, so users can easily tweak settings
in the demo, and then copy-paste into their own setup
- Advanced URI filtering schemes (see docs/proposal-new-directives.txt)
- Allow scoped="scoped" attribute in <style> tags; may be troublesome
because regular CSS has no way of uniquely identifying nodes, so we'd
have to generate IDs
- Explain how to use HTML Purifier in non-PHP languages / create
a simple command line stub (or complicated?)
- Fixes for Firefox's inability to handle COL alignment props (Bug 915)
- Automatically add non-breaking spaces to empty table cells when
empty-cells:show is applied to have compatibility with Internet Explorer
- Table of Contents generation (XHTML Compiler might be reusable). May also
be out-of-band information.
- Full set of color keywords. Also, a way to add onto them without
finalizing the configuration object.
- Write a var_export and memcached DefinitionCache - Denis
- Built-in support for target="_blank" on all external links
- Convert RTL/LTR override characters to <bdo> tags, or vice versa on demand.
Also, enable disabling of directionality
? Externalize inline CSS to promote clean HTML, proposed by Sander Tekelenburg
? Remove redundant tags, ex. <u><u>Underlined</u></u>. Implementation notes:
1. Analyzing which tags to remove duplicants
2. Ensure attributes are merged into the parent tag
3. Extend the tag exclusion system to specify whether or not the
contents should be dropped or not (currently, there's code that could do
something like this if it didn't drop the inner text too.)
- Remove <span> tags that don't do anything (no attributes)
- Remove empty inline tags<i></i>
- Append something to duplicate IDs so they're still usable (impl. note: the
dupe detector would also need to detect the suffix as well)
? Make AutoParagraph also support paragraph-izing double <br> tags, and not
just double newlines. This is kind of tough to do in the current framework,
though, and might be reasonably approximated by search replacing double <br>s
with newlines before running it through HTML Purifier.
2.0 release
# Legit token based CSS parsing (will require revamping almost every
AttrDef class)
# Formatters for plaintext (COMPLEX)
- Auto-paragraphing (be sure to leverage fact that we know when things
shouldn't be paragraphed, such as lists and tables).
- Linkify URLs
- Smileys
- Linkification for HTML Purifier docs: notably configuration and classes
Maintenance related (slightly boring)
# CHMOD install script for PEAR installs
! Factor out command line parser into its own class, and unit test it
- Reduce size of internal data-structures (esp. HTMLDefinition)
- Allow merging configurations. Thus,
a -> b -> default
c -> d -> default
becomes
a -> b -> c -> d -> default
Maybe allow more fine-grained tuning of this behavior. Alternatively,
encourage people to use short plist depths before building them up.
- Time PHPT tests
3.0 release
- Extended HTML capabilities based on namespacing and tag transforms (COMPLEX)
- Hooks for adding custom processors to custom namespaced tags and
attributes, offer default implementation
- Lots of documentation and samples
- Allow tags to be "armored", an internal flag that protects them
from validation and passes them out unharmed
- XHTML 1.1 support
- Fixes for Firefox's inability to handle COL alignment props (Bug 915)
- Automatically add non-breaking spaces to empty table cells when
empty-cells:show is applied to have compatibility with Internet Explorer
- Convert RTL/LTR override characters to <bdo> tags, or vice versa on demand.
Also, enable disabling of directionality
Ongoing
- Lots of profiling, make it faster!
- Plugins for major CMSes (COMPLEX)
- WordPress
- eFiction
- more! (look for ones that use WYSIWYGs)
Unknown release (on a scratch-an-itch basis)
- Upgrade SimpleTest testing code to newest versions
- Have 'lang' attribute be checked against official lists
? Semi-lossy dumb alternate character encoding transformations, achieved by
encoding all characters that have string entity equivalents
Requested
? Native content compression, whitespace stripping (don't rely on Tidy, make
sure we don't remove from <pre> or related tags)
ChildDef related (very boring)
- Abstract ChildDef_BlockQuote to work with all elements that only
allow blocks in them, required or optional
- Implement lenient <ruby> child validation
Wontfix
- Non-lossy smart alternate character encoding transformations (unless
patch provided)
- Pretty-printing HTML, users can use Tidy on the output on entire page
- Pretty-printing HTML: users can use Tidy on the output on entire page
- Native content compression, whitespace stripping: use gzip if this is
really important
vim: et sw=4 sts=4

1
VERSION Normal file
View File

@@ -0,0 +1 @@
4.15.0

View File

@@ -17,6 +17,4 @@ HTML Purifier is perfect for filtering pure-HTML input from WYSIWYG editors.
Enough said.
There is a proof-of-concept integration of HTML Purifier with the Mantis
bugtracker at http://hp.jpsband.org/mantis/ You can see notes on how
this integration was acheived at http://hp.jpsband.org/mantis_notes.txt
vim: et sw=4 sts=4

BIN
art/100cases.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 2.7 KiB

7
benchmarks/.htaccess Normal file
View File

@@ -0,0 +1,7 @@
<IfModule mod_authz_core.c>
Require all denied
</IfModule>
<IfModule !mod_authz_core.c>
Deny from all
</ifModule>

View File

@@ -0,0 +1,16 @@
<?php
chdir(dirname(__FILE__));
//require_once '../library/HTMLPurifier.path.php';
shell_exec('php ../maintenance/generate-schema-cache.php');
require_once '../library/HTMLPurifier.path.php';
require_once 'HTMLPurifier.includes.php';
$begin = xdebug_memory_usage();
$schema = HTMLPurifier_ConfigSchema::makeFromSerial();
echo xdebug_memory_usage() - $begin;
// vim: et sw=4 sts=4

View File

@@ -1,86 +1,76 @@
<?php
// emulates inserting a dir called HTMLPurifier into your class dir
set_include_path(get_include_path() . PATH_SEPARATOR . '../library/');
require_once '../library/HTMLPurifier.auto.php';
@include_once '../test-settings.php';
require_once 'HTMLPurifier/ConfigSchema.php';
require_once 'HTMLPurifier/Config.php';
// PEAR
require_once 'Benchmark/Timer.php'; // to do the timing
require_once 'Text/Password.php'; // for generating random input
$LEXERS = array();
$RUNS = isset($GLOBALS['HTMLPurifierTest']['Runs'])
? $GLOBALS['HTMLPurifierTest']['Runs'] : 2;
? $GLOBALS['HTMLPurifierTest']['Runs'] : 2;
require_once 'HTMLPurifier/Lexer/DirectLex.php';
$LEXERS['DirectLex'] = new HTMLPurifier_Lexer_DirectLex();
if (!empty($GLOBALS['HTMLPurifierTest']['PEAR'])) {
require_once 'HTMLPurifier/Lexer/PEARSax3.php';
$LEXERS['PEARSax3'] = new HTMLPurifier_Lexer_PEARSax3();
} else {
exit('PEAR required to perform benchmark.');
}
if (version_compare(PHP_VERSION, '5', '>=')) {
require_once 'HTMLPurifier/Lexer/DOMLex.php';
$LEXERS['DOMLex'] = new HTMLPurifier_Lexer_DOMLex();
}
// PEAR
require_once 'Benchmark/Timer.php'; // to do the timing
require_once 'Text/Password.php'; // for generating random input
// custom class to aid unit testing
class RowTimer extends Benchmark_Timer
{
var $name;
function RowTimer($name, $auto = false) {
public $name;
public function __construct($name, $auto = false)
{
$this->name = htmlentities($name);
$this->Benchmark_Timer($auto);
}
function getOutput() {
public function getOutput()
{
$total = $this->TimeElapsed();
$result = $this->getProfiling();
$dashes = '';
$out = '<tr>';
$out .= "<td>{$this->name}</td>";
$standard = false;
foreach ($result as $k => $v) {
if ($v['name'] == 'Start' || $v['name'] == 'Stop') continue;
//$perc = (($v['diff'] * 100) / $total);
//$tperc = (($v['total'] * 100) / $total);
//$out .= '<td align="right">' . $v['diff'] . '</td>';
if ($standard == false) $standard = $v['diff'];
$perc = $v['diff'] * 100 / $standard;
$bad_run = ($v['diff'] < 0);
$out .= '<td align="right"'.
($bad_run ? ' style="color:#AAA;"' : '').
'>' . number_format($perc, 2, '.', '') .
'%</td><td>'.number_format($v['diff'],4,'.','').'</td>';
}
$out .= '</tr>';
return $out;
}
}
function print_lexers() {
function print_lexers()
{
global $LEXERS;
$first = true;
foreach ($LEXERS as $key => $value) {
@@ -90,17 +80,21 @@ function print_lexers() {
}
}
function do_benchmark($name, $document) {
function do_benchmark($name, $document)
{
global $LEXERS, $RUNS;
$config = HTMLPurifier_Config::createDefault();
$context = new HTMLPurifier_Context();
$timer = new RowTimer($name);
$timer->start();
foreach($LEXERS as $key => $lexer) {
for ($i=0; $i<$RUNS; $i++) $tokens = $lexer->tokenizeHTML($document);
for ($i=0; $i<$RUNS; $i++) $tokens = $lexer->tokenizeHTML($document, $config, $context);
$timer->setMarker($key);
}
$timer->stop();
$timer->display();
}
@@ -127,11 +121,11 @@ foreach ($LEXERS as $key => $value) {
$dir = 'samples/Lexer';
$dh = opendir($dir);
while (false !== ($filename = readdir($dh))) {
if (strpos($filename, '.html') !== strlen($filename) - 5) continue;
$document = file_get_contents($dir . '/' . $filename);
do_benchmark("File: $filename", $document);
}
// crashers, caused infinite loops before
@@ -161,4 +155,7 @@ echo '<div>Random input was: ' .
?>
</body></html>
</body></html>
<?php
// vim: et sw=4 sts=4

View File

@@ -1,16 +0,0 @@
<?php
set_include_path(get_include_path() . PATH_SEPARATOR . '../library/');
require_once 'HTMLPurifier/ConfigSchema.php';
require_once 'HTMLPurifier/Config.php';
require_once 'HTMLPurifier/Lexer/DirectLex.php';
$input = file_get_contents('samples/Lexer/4.html');
$lexer = new HTMLPurifier_Lexer_DirectLex();
for ($i = 0; $i < 10; $i++) {
$tokens = $lexer->tokenizeHTML($input);
}
?>

21
benchmarks/Trace.php Normal file
View File

@@ -0,0 +1,21 @@
<?php
ini_set('xdebug.trace_format', 1);
ini_set('xdebug.show_mem_delta', true);
if (file_exists('Trace.xt')) {
echo "Previous trace Trace.xt must be removed before this script can be run.";
exit;
}
xdebug_start_trace(dirname(__FILE__) . '/Trace');
require_once '../library/HTMLPurifier.auto.php';
$purifier = new HTMLPurifier();
$data = $purifier->purify(file_get_contents('samples/Lexer/4.html'));
xdebug_stop_trace();
echo "Trace finished.";
// vim: et sw=4 sts=4

View File

@@ -28,7 +28,7 @@
<li>HX Edison Taiji Club <a href="http://www.taijiclub.org/downloads/Taiji_club_regulation_.pdf">by-law</a> effective 3/28/2006</li>
<li>A new email account for our club: HXEdisontaijiclub@yahoo.com</li>
<li>Workshop conducted by <a href="http://www.taijiclub.org/ch/Digest/LiDeyin">?????</a> Li Deyin is set on June 4, 2006 at Clarion Hotel in Edison from 9:30am-12pm; <a href="http://www.taijiclub.org/en/Registration">Registration</a></li>
</ul>
@@ -36,7 +36,7 @@
<p><i>Taiji</i> is an ancient Chinese tradition of movement systems that is associated with philosophy, physiology, psychology, geometry and dynamics. It is the slowest form of martial arts and is meant to improve the internal spirit. It is soothing to the soul and extremely invigorating. </p>
<p><i>Taiji</i> is an ancient Chinese tradition of movement systems that is associated with philosophy, physiology, psychology, geometry and dynamics. It is the slowest form of martial arts and is meant to improve the internal spirit. It is soothing to the soul and extremely invigorating. </p>
<p>The founder of Taiji was Zhang Sanfeng (Chang San-feng), who was a monk of the Wu Dang (Wu Tang) Monastery and lived in the period from 1391 to 1459. His exercises stressed suppleness and elasticity as opposed to the hardness and force of other martial art styles. Several centuries old, Taiji was originally developed as a form of self-defense, emphasizing strength, balance, flexibility and speed. Tai Chi also differs from other martial arts in that it is based on the Taoist religion and aims to avoid aggressive forces. </p>
@@ -50,4 +50,7 @@
<div style="text-align:center;">Click on photo to see HR version</div></div>
</body>
</html>
</html>
<!-- vim: et sw=4 sts=4
-->

View File

@@ -14,4 +14,7 @@ function rwt(el,ct,cd,sg){var e = window.encodeURIComponent ? encodeURIComponent
<form action=/search name=f><script><!--
function qs(el) {if (window.RegExp && window.encodeURIComponent) {var ue=el.href;var qe=encodeURIComponent(document.f.q.value);if(ue.indexOf("q=")!=-1){el.href=ue.replace(new RegExp("q=[^&$]*"),"q="+qe);}else{el.href=ue+"&q="+qe;}}return 1;}
// -->
</script><table border=0 cellspacing=0 cellpadding=4><tr><td nowrap><font size=-1><b>Web</b>&nbsp;&nbsp;&nbsp;&nbsp;<a id=1a class=q href="/imghp?hl=en&tab=wi" onClick="return qs(this);">Images</a>&nbsp;&nbsp;&nbsp;&nbsp;<a id=2a class=q href="http://groups.google.com/grphp?hl=en&tab=wg" onClick="return qs(this);">Groups</a>&nbsp;&nbsp;&nbsp;&nbsp;<a id=4a class=q href="http://news.google.com/nwshp?hl=en&tab=wn" onClick="return qs(this);">News</a>&nbsp;&nbsp;&nbsp;&nbsp;<a id=5a class=q href="http://froogle.google.com/frghp?hl=en&tab=wf" onClick="return qs(this);">Froogle</a>&nbsp;&nbsp;&nbsp;&nbsp;<a id=8a class=q href="/lochp?hl=en&tab=wl" onClick="return qs(this);">Local</a>&nbsp;&nbsp;&nbsp;&nbsp;<b><a href="/intl/en/options/" class=q>more&nbsp;&raquo;</a></b></font></td></tr></table><table cellspacing=0 cellpadding=0><tr><td width=25%>&nbsp;</td><td align=center><input type=hidden name=hl value=en><input maxlength=2048 size=55 name=q value="" title="Google Search"><br><input type=submit value="Google Search" name=btnG><input type=submit value="I'm Feeling Lucky" name=btnI></td><td valign=top nowrap width=25%><font size=-2>&nbsp;&nbsp;<a href=/advanced_search?hl=en>Advanced Search</a><br>&nbsp;&nbsp;<a href=/preferences?hl=en>Preferences</a><br>&nbsp;&nbsp;<a href=/language_tools?hl=en>Language Tools</a></font></td></tr></table></form><br><br><font size=-1><a href="/ads/">Advertising&nbsp;Programs</a> - <a href=/services/>Business Solutions</a> - <a href=/about.html>About Google</a></font><p><font size=-2>&copy;2006 Google</font></p></center></body></html>
</script><table border=0 cellspacing=0 cellpadding=4><tr><td nowrap><font size=-1><b>Web</b>&nbsp;&nbsp;&nbsp;&nbsp;<a id=1a class=q href="/imghp?hl=en&tab=wi" onClick="return qs(this);">Images</a>&nbsp;&nbsp;&nbsp;&nbsp;<a id=2a class=q href="http://groups.google.com/grphp?hl=en&tab=wg" onClick="return qs(this);">Groups</a>&nbsp;&nbsp;&nbsp;&nbsp;<a id=4a class=q href="http://news.google.com/nwshp?hl=en&tab=wn" onClick="return qs(this);">News</a>&nbsp;&nbsp;&nbsp;&nbsp;<a id=5a class=q href="http://froogle.google.com/frghp?hl=en&tab=wf" onClick="return qs(this);">Froogle</a>&nbsp;&nbsp;&nbsp;&nbsp;<a id=8a class=q href="/lochp?hl=en&tab=wl" onClick="return qs(this);">Local</a>&nbsp;&nbsp;&nbsp;&nbsp;<b><a href="/intl/en/options/" class=q>more&nbsp;&raquo;</a></b></font></td></tr></table><table cellspacing=0 cellpadding=0><tr><td width=25%>&nbsp;</td><td align=center><input type=hidden name=hl value=en><input maxlength=2048 size=55 name=q value="" title="Google Search"><br><input type=submit value="Google Search" name=btnG><input type=submit value="I'm Feeling Lucky" name=btnI></td><td valign=top nowrap width=25%><font size=-2>&nbsp;&nbsp;<a href=/advanced_search?hl=en>Advanced Search</a><br>&nbsp;&nbsp;<a href=/preferences?hl=en>Preferences</a><br>&nbsp;&nbsp;<a href=/language_tools?hl=en>Language Tools</a></font></td></tr></table></form><br><br><font size=-1><a href="/ads/">Advertising&nbsp;Programs</a> - <a href=/services/>Business Solutions</a> - <a href=/about.html>About Google</a></font><p><font size=-2>&copy;2006 Google</font></p></center></body></html>
<!-- vim: et sw=4 sts=4
-->

View File

@@ -67,21 +67,21 @@ if (objAdMgr.isSlotAvailable("leaderboard")) {
</table>
<table width="86%" border="0" cellspacing="0" cellpadding="2">
<tr>
<td height="388" width="19%" bgcolor="#FFCCFF" valign="top">
<td height="388" width="19%" bgcolor="#FFCCFF" valign="top">
<p>May 1, 2000</p>
<p><b>Pop Culture</b> </p>
<p>by. H. Finkelstein</p>
</td>
<td height="388" width="52%" valign="top">
<p>Welcome to the <b>Anime Digi-Lib</b>, a virtual index to anime on the
internet. This site strives to house a comprehensive index to both personal
and commercial websites and provides reviews to these sites. We hope to
be a gateway for people who've never imagined they'd ever be interested
<td height="388" width="52%" valign="top">
<p>Welcome to the <b>Anime Digi-Lib</b>, a virtual index to anime on the
internet. This site strives to house a comprehensive index to both personal
and commercial websites and provides reviews to these sites. We hope to
be a gateway for people who've never imagined they'd ever be interested
in Japanese Animation. </p>
<table width="99%" border="1" cellspacing="0" cellpadding="2" height="320" name="Searchnservices">
<tr>
<td height="263" valign="top" width="58%">
<td height="263" valign="top" width="58%">
<p>&nbsp; </p>
<p>&nbsp;</p>
@@ -107,15 +107,15 @@ if (objAdMgr.isSlotAvailable("leaderboard")) {
</form>
<td>
<td>
<table border="0" cellpadding="0" cellspacing="0" width="100%">
<tr><td><font face="verdana,geneva" color="#000011" size="1">What is better, subtitled or dubbed anime?</font></td></tr>
<tr><td><input type="radio" name="rd" value="1"><font face="verdana" size="2" color="#000011">Subtitled</font></td></tr>
<tr><td align="middle"><font face="verdana" size="1"><a href="http://pub.alxnet.com/poll?id=2079873&q=view">Current results</a></font></td></tr>
</table></td></tr>
<tr>
<td><font face="verdana" size="1"><a href="http://www.alxnet.com/services/poll/">Free
<tr>
<td><font face="verdana" size="1"><a href="http://www.alxnet.com/services/poll/">Free
Web Polls</a></font></td>
</tr>
</table></form>
@@ -126,3 +126,6 @@ if (objAdMgr.isSlotAvailable("leaderboard")) {
</body>
</html>
<!-- vim: et sw=4 sts=4
-->

View File

@@ -138,7 +138,7 @@
</table>
<p><script type="text/javascript">
//<![CDATA[
if (window.showTocToggle) { var tocShowText = "show"; var tocHideText = "hide"; showTocToggle(); }
if (window.showTocToggle) { var tocShowText = "show"; var tocHideText = "hide"; showTocToggle(); }
//]]>
</script></p>
<div class="editsection" style="float:right;margin-left:5px;">[<a href="/w/index.php?title=Tai_Chi_Chuan&amp;action=edit&amp;section=1" title="Edit section: Overview">edit</a>]</div>
@@ -279,19 +279,19 @@ Yang Small Frame | <a href="/wik
| |
<a href="/w/index.php?title=Wu_Ta-kuei&amp;action=edit" class="new" title="Wu Ta-kuei">Wu Ta-kuei</a> <a href="/w/index.php?title=Sun_Hsing-i&amp;action=edit" class="new" title="Sun Hsing-i">Sun Hsing-i</a>
1923-1970 1891-1929
<b>MODERN FORMS</b>
from Yang Ch`eng-fu
|
|
|
+--------------+
| |
<a href="/wiki/Cheng_Man-ch%27ing" title="Cheng Man-ch'ing">Cheng Man-ch'ing</a> |
1901-1975 |
Short (37) Form |
1923-1970 1891-1929
<b>MODERN FORMS</b>
from Yang Ch`eng-fu
|
|
|
+--------------+
| |
<a href="/wiki/Cheng_Man-ch%27ing" title="Cheng Man-ch'ing">Cheng Man-ch'ing</a> |
1901-1975 |
Short (37) Form |
|
Chinese Sports Commission
1956
@@ -538,3 +538,6 @@ Retrieved from "<a href="http://en.wikipedia.org/wiki/Tai_Chi_Chuan">http://en.w
<!-- Served by srv25 in 0.089 secs. -->
</body></html>
<!-- vim: et sw=4 sts=4
-->

View File

@@ -2,4 +2,6 @@ Disclaimer:
The HTML used in these samples are taken from random websites. I claim
no copyright over these and assert that I may use them like this under
fair use.
fair use.
vim: et sw=4 sts=4

44
composer.json Normal file
View File

@@ -0,0 +1,44 @@
{
"name": "ezyang/htmlpurifier",
"description": "Standards compliant HTML filter written in PHP",
"type": "library",
"keywords": ["html"],
"homepage": "http://htmlpurifier.org/",
"license": "LGPL-2.1-or-later",
"authors": [
{
"name": "Edward Z. Yang",
"email": "admin@htmlpurifier.org",
"homepage": "http://ezyang.com"
}
],
"require": {
"php": "~5.6.0 || ~7.0.0 || ~7.1.0 || ~7.2.0 || ~7.3.0 || ~7.4.0 || ~8.0.0 || ~8.1.0 || ~8.2.0"
},
"require-dev": {
"cerdic/css-tidy": "^1.7 || ^2.0",
"simpletest/simpletest": "dev-master"
},
"autoload": {
"psr-0": { "HTMLPurifier": "library/" },
"files": ["library/HTMLPurifier.composer.php"],
"exclude-from-classmap": [
"/library/HTMLPurifier/Language/"
]
},
"suggest": {
"cerdic/css-tidy": "If you want to use the filter 'Filter.ExtractStyleBlocks'.",
"ext-iconv": "Converts text to and from non-UTF-8 encodings",
"ext-bcmath": "Used for unit conversion and imagecrash protection",
"ext-tidy": "Used for pretty-printing HTML"
},
"config": {
"sort-packages": true
},
"repositories": [
{
"type": "vcs",
"url": "https://github.com/ezyang/simpletest.git"
}
]
}

View File

@@ -2,219 +2,63 @@
/**
* Generates XML and HTML documents describing configuration.
* @note PHP 5.2+ only!
*/
/*
TODO:
- make XML format richer (see below)
- make XML format richer
- extend XSLT transformation (see the corresponding XSLT file)
- allow generation of packaged docs that can be easily moved
- multipage documentation
- determine how to multilingualize
- factor out code into classes
- add blurbs to ToC
*/
// ---------------------------------------------------------------------------
// Check and configure environment
if (version_compare(PHP_VERSION, '5.2', '<')) exit('PHP 5.2+ required.');
error_reporting(E_ALL | E_STRICT);
if (version_compare('5', PHP_VERSION, '>')) exit('Requires PHP 5 or higher.');
error_reporting(E_ALL);
// load dual-libraries
require_once dirname(__FILE__) . '/../extras/HTMLPurifierExtras.auto.php';
require_once dirname(__FILE__) . '/../library/HTMLPurifier.auto.php';
// setup HTML Purifier singleton
HTMLPurifier::getInstance(array(
'AutoFormat.PurifierLinkify' => true
));
// ---------------------------------------------------------------------------
// Include HTML Purifier library
$builder = new HTMLPurifier_ConfigSchema_InterchangeBuilder();
$interchange = new HTMLPurifier_ConfigSchema_Interchange();
$builder->buildDir($interchange);
$loader = dirname(__FILE__) . '/../config-schema.php';
if (file_exists($loader)) include $loader;
$interchange->validate();
set_include_path('../library' . PATH_SEPARATOR . get_include_path());
require_once 'HTMLPurifier.php';
$style = 'plain'; // use $_GET in the future, careful to validate!
$configdoc_xml = dirname(__FILE__) . '/configdoc.xml';
$xml_builder = new HTMLPurifier_ConfigSchema_Builder_Xml();
$xml_builder->openURI($configdoc_xml);
$xml_builder->build($interchange);
unset($xml_builder); // free handle
// ---------------------------------------------------------------------------
// Setup convenience functions
$xslt = new ConfigDoc_HTMLXSLTProcessor();
$xslt->importStylesheet(dirname(__FILE__) . "/styles/$style.xsl");
$output = $xslt->transformToHTML($configdoc_xml);
function appendHTMLDiv($document, $node, $html) {
global $purifier;
$html = $purifier->purify($html);
$dom_html = $document->createDocumentFragment();
$dom_html->appendXML($html);
$dom_div = $document->createElement('div');
$dom_div->setAttribute('xmlns', 'http://www.w3.org/1999/xhtml');
$dom_div->appendChild($dom_html);
$node->appendChild($dom_div);
if (!$output) {
echo "Error in generating files\n";
exit(1);
}
// ---------------------------------------------------------------------------
// Load copies of HTMLPurifier_ConfigDef and HTMLPurifier
$schema = HTMLPurifier_ConfigSchema::instance();
$purifier = new HTMLPurifier();
// ---------------------------------------------------------------------------
// Generate types.xml, a document describing the constraint "type"
$types_document = new DOMDocument('1.0', 'UTF-8');
$types_root = $types_document->createElement('types');
$types_document->appendChild($types_root);
$types_document->formatOutput = true;
foreach ($schema->types as $name => $expanded_name) {
$types_type = $types_document->createElement('type', $expanded_name);
$types_type->setAttribute('id', $name);
$types_root->appendChild($types_type);
}
$types_document->save('types.xml');
// ---------------------------------------------------------------------------
// Generate configdoc.xml, a document documenting configuration directives
$dom_document = new DOMDocument('1.0', 'UTF-8');
$dom_root = $dom_document->createElement('configdoc');
$dom_document->appendChild($dom_root);
$dom_document->formatOutput = true;
// add the name of the application
$dom_root->appendChild($dom_document->createElement('title', 'HTML Purifier'));
/*
TODO for XML format:
- create a definition (DTD or other) once interface stabilizes
*/
foreach($schema->info as $namespace_name => $namespace_info) {
$dom_namespace = $dom_document->createElement('namespace');
$dom_root->appendChild($dom_namespace);
$dom_namespace->setAttribute('id', $namespace_name);
$dom_namespace->appendChild(
$dom_document->createElement('name', $namespace_name)
);
$dom_namespace_description = $dom_document->createElement('description');
$dom_namespace->appendChild($dom_namespace_description);
appendHTMLDiv($dom_document, $dom_namespace_description,
$schema->info_namespace[$namespace_name]->description);
foreach ($namespace_info as $name => $info) {
if ($info->class == 'alias') continue;
$dom_directive = $dom_document->createElement('directive');
$dom_namespace->appendChild($dom_directive);
$dom_directive->setAttribute('id', $namespace_name . '.' . $name);
$dom_directive->appendChild(
$dom_document->createElement('name', $name)
);
$dom_constraints = $dom_document->createElement('constraints');
$dom_directive->appendChild($dom_constraints);
$dom_type = $dom_document->createElement('type', $info->type);
if ($info->allow_null) {
$dom_type->setAttribute('allow-null', 'yes');
}
$dom_constraints->appendChild($dom_type);
if ($info->allowed !== true) {
$dom_allowed = $dom_document->createElement('allowed');
$dom_constraints->appendChild($dom_allowed);
foreach ($info->allowed as $allowed => $bool) {
$dom_allowed->appendChild(
$dom_document->createElement('value', $allowed)
);
}
}
$raw_default = $schema->defaults[$namespace_name][$name];
if (is_bool($raw_default)) {
$default = $raw_default ? 'true' : 'false';
} elseif (is_string($raw_default)) {
$default = "\"$raw_default\"";
} elseif (is_null($raw_default)) {
$default = 'null';
} else {
$default = print_r(
$schema->defaults[$namespace_name][$name], true
);
}
$dom_default = $dom_document->createElement('default', $default);
// remove this once we get a DTD
$dom_default->setAttribute('xml:space', 'preserve');
$dom_constraints->appendChild($dom_default);
$dom_descriptions = $dom_document->createElement('descriptions');
$dom_directive->appendChild($dom_descriptions);
foreach ($info->descriptions as $file => $file_descriptions) {
foreach ($file_descriptions as $line => $description) {
$dom_description = $dom_document->createElement('description');
$dom_description->setAttribute('file', $file);
$dom_description->setAttribute('line', $line);
appendHTMLDiv($dom_document, $dom_description, $description);
$dom_descriptions->appendChild($dom_description);
}
}
}
}
// print_r($dom_document->saveXML());
// save a copy of the raw XML
$dom_document->save('configdoc.xml');
// ---------------------------------------------------------------------------
// Generate final output using XSLT
// load the stylesheet
$xsl_stylesheet_name = 'plain';
$xsl_stylesheet = "styles/$xsl_stylesheet_name.xsl";
$xsl_dom_stylesheet = new DOMDocument();
$xsl_dom_stylesheet->load($xsl_stylesheet);
// setup the XSLT processor
$xsl_processor = new XSLTProcessor();
// perform the transformation
$xsl_processor->importStylesheet($xsl_dom_stylesheet);
$html_output = $xsl_processor->transformToXML($dom_document);
// some slight fudges to preserve backwards compatibility
$html_output = str_replace('/>', ' />', $html_output); // <br /> not <br>
$html_output = str_replace(' xmlns=""', '', $html_output); // rm unnecessary xmlns
if (class_exists('Tidy')) {
// cleanup output
$config = array(
'indent' => true,
'output-xhtml' => true,
'wrap' => 80
);
$tidy = new Tidy;
$tidy->parseString($html_output, $config, 'utf8');
$tidy->cleanRepair();
$html_output = (string) $tidy;
}
// write it to a file (todo: parse into seperate pages)
file_put_contents("$xsl_stylesheet_name.html", $html_output);
// ---------------------------------------------------------------------------
// Output for instant feedback
// write out
file_put_contents(dirname(__FILE__) . "/$style.html", $output);
if (php_sapi_name() != 'cli') {
echo $html_output;
// output (instant feedback if it's a browser)
echo $output;
} else {
echo 'Files generated successfully.';
echo "Files generated successfully.\n";
}
?>
// vim: et sw=4 sts=4

View File

@@ -1,16 +1,30 @@
body {margin:1em 4em;}
body {margin:0;padding:0;}
#content {
margin:1em auto;
max-width: 47em;
width: expression(document.body.clientWidth >
85 * parseInt(document.body.currentStyle.fontSize) ?
"54em": "auto");
}
table {border-collapse:collapse;}
table td, table th {padding:0.2em;}
table.constraints {margin:0 0 1em;}
table.constraints th {text-align:left;padding-left:0.4em;}
table.constraints td {padding-right:0.4em;}
table.constraints th {
text-align:right;padding-left:0.4em;padding-right:0.4em;background:#EEE;
width:8em;vertical-align:top;}
table.constraints td {padding-right:0.4em; padding-left: 1em;}
table.constraints td ul {padding:0; margin:0; list-style:none;}
table.constraints td pre {margin:0;}
#toc {list-style-type:none; font-weight:bold;}
#toc ul {list-style-type:disc; font-weight:normal;}
#tocContainer {position:relative;}
#toc {list-style-type:none; font-weight:bold; font-size:1em; margin-bottom:1em;}
#toc li {position:relative; line-height: 1.2em;}
#toc .col-2 {margin-left:50%;}
#toc .col-l {float:left;}
#toc ul {list-style-type:disc; font-weight:normal; padding-bottom:1.2em;}
.description p {margin-top:0;margin-bottom:1em;}
@@ -19,6 +33,12 @@ table.constraints td pre {margin:0;}
#library {font-size:1em;}
h1 {margin-top:0;}
h2 {border-bottom:1px solid #CCC; font-family:sans-serif; font-weight:normal;
font-size:1.3em;}
font-size:1.3em; clear:both;}
h3 {font-family:sans-serif; font-size:1.1em; font-weight:bold; }
h4 {font-family:sans-serif; font-size:0.9em; font-weight:bold; }
.deprecated {color: #CCC;}
.deprecated table.constraints th {background:#FFF;}
.deprecated-notice {color: #000; text-align:center; margin-bottom: 1em;}
/* vim: et sw=4 sts=4 */

View File

@@ -12,103 +12,213 @@
indent = "no"
media-type = "text/html"
/>
<xsl:variable name="typeLookup" select="document('../types.xml')" />
<xsl:param name="css" select="'styles/plain.css'"/>
<xsl:param name="title" select="'Configuration Documentation'"/>
<xsl:variable name="typeLookup" select="document('../types.xml')/types" />
<xsl:variable name="usageLookup" select="document('../usage.xml')/usage" />
<!-- Twiddle this variable to get the columns as even as possible -->
<xsl:variable name="maxNumberAdjust" select="2" />
<xsl:template match="/">
<html lang="en" xml:lang="en">
<head>
<title>Configuration Documentation - <xsl:value-of select="/configdoc/title" /></title>
<title><xsl:value-of select="$title" /> - <xsl:value-of select="/configdoc/title" /></title>
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
<link rel="stylesheet" type="text/css" href="styles/plain.css" />
<link rel="stylesheet" type="text/css" href="{$css}" />
</head>
<body>
<div id="library"><xsl:value-of select="/configdoc/title" /></div>
<h1>Configuration Documentation</h1>
<h2>Table of Contents</h2>
<ul id="toc">
<xsl:apply-templates mode="toc" />
</ul>
<xsl:apply-templates />
<div id="content">
<div id="library"><xsl:value-of select="/configdoc/title" /></div>
<h1><xsl:value-of select="$title" /></h1>
<div id="tocContainer">
<h2>Table of Contents</h2>
<ul id="toc">
<xsl:apply-templates mode="toc">
<xsl:with-param name="overflowNumber" select="round(count(/configdoc/namespace) div 2) + $maxNumberAdjust" />
</xsl:apply-templates>
</ul>
</div>
<div id="typesContainer">
<h2>Types</h2>
<xsl:apply-templates select="$typeLookup" mode="types" />
</div>
<xsl:apply-templates />
</div>
</body>
</html>
</xsl:template>
<xsl:template match="type" mode="types">
<div class="type-block">
<xsl:attribute name="id">type-<xsl:value-of select="@id" /></xsl:attribute>
<h3><code><xsl:value-of select="@id" /></code>: <xsl:value-of select="@name" /></h3>
<div class="type-description">
<xsl:copy-of xmlns:xhtml="http://www.w3.org/1999/xhtml" select="xhtml:div/node()" />
</div>
</div>
</xsl:template>
<xsl:template match="title" mode="toc" />
<xsl:template match="namespace" mode="toc">
<xsl:param name="overflowNumber" />
<xsl:variable name="number"><xsl:number level="single" /></xsl:variable>
<xsl:variable name="directiveNumber"><xsl:number level="any" count="directive" /></xsl:variable>
<xsl:if test="count(directive)&gt;0">
<li>
<!-- BEGIN multicolumn code -->
<xsl:if test="$number &gt;= $overflowNumber">
<xsl:attribute name="class">col-2</xsl:attribute>
</xsl:if>
<xsl:if test="$number = $overflowNumber">
<xsl:attribute name="style">margin-top:-<xsl:value-of select="($number * 2 + $directiveNumber - 3) * 1.2" />em</xsl:attribute>
</xsl:if>
<!-- END multicolumn code -->
<a href="#{@id}"><xsl:value-of select="name" /></a>
<ul>
<xsl:apply-templates select="directive" mode="toc" />
<xsl:apply-templates select="directive" mode="toc">
<xsl:with-param name="overflowNumber" select="$overflowNumber" />
</xsl:apply-templates>
</ul>
<xsl:if test="$number + 1 = $overflowNumber">
<div class="col-l" />
</xsl:if>
</li>
</xsl:if>
</xsl:template>
<xsl:template match="directive" mode="toc">
<li><a href="#{@id}"><xsl:value-of select="name" /></a></li>
</xsl:template>
<xsl:template match="title" />
<xsl:template match="namespace">
<xsl:apply-templates />
<xsl:if test="count(directive)=0">
<p>No configuration directives defined for this namespace.</p>
<xsl:variable name="number">
<xsl:number level="any" count="directive|namespace" />
</xsl:variable>
<xsl:if test="not(deprecated)">
<li>
<a href="#{@id}"><xsl:value-of select="name" /></a>
</li>
</xsl:if>
</xsl:template>
<xsl:template match="title" />
<xsl:template match="namespace">
<div class="namespace">
<xsl:apply-templates />
<xsl:if test="count(directive)=0">
<p>No configuration directives defined for this namespace.</p>
</xsl:if>
</div>
</xsl:template>
<xsl:template match="namespace/name">
<h2 id="{../@id}"><xsl:value-of select="." /></h2>
</xsl:template>
<xsl:template match="namespace/description">
<div class="description">
<xsl:copy-of select="div/node()" />
<xsl:copy-of xmlns:xhtml="http://www.w3.org/1999/xhtml" select="xhtml:div/node()" />
</div>
</xsl:template>
<xsl:template match="directive">
<xsl:apply-templates />
<div>
<xsl:attribute name="class"><!--
-->directive<!--
--><xsl:if test="deprecated"> deprecated</xsl:if><!--
--></xsl:attribute>
<xsl:apply-templates>
<xsl:with-param name="id" select="@id" />
</xsl:apply-templates>
</div>
</xsl:template>
<xsl:template match="directive/name">
<h3 id="{../@id}"><xsl:value-of select="../@id" /></h3>
<xsl:param name="id" />
<xsl:apply-templates select="../aliases/alias" mode="anchor" />
<h3 id="{$id}"><xsl:value-of select="$id" /></h3>
</xsl:template>
<xsl:template match="alias" mode="anchor">
<a id="{.}"></a>
</xsl:template>
<!-- Do not pass through -->
<xsl:template match="alias"></xsl:template>
<xsl:template match="directive/constraints">
<xsl:param name="id" />
<table class="constraints">
<xsl:apply-templates />
<!-- Calculated other values -->
<tr>
<th>Used by:</th>
<td>
<xsl:for-each select="../descriptions/description">
<xsl:if test="position()&gt;1">, </xsl:if>
<xsl:value-of select="@file" />
</xsl:for-each>
</td>
</tr>
<xsl:if test="../aliases/alias">
<xsl:apply-templates select="../aliases" mode="constraints" />
</xsl:if>
<xsl:apply-templates select="$usageLookup/directive[@id=$id]" />
</table>
</xsl:template>
<xsl:template match="directive//description">
<xsl:template match="directive/aliases" mode="constraints">
<tr>
<th>Aliases</th>
<td>
<xsl:for-each select="alias">
<xsl:if test="position()&gt;1">, </xsl:if>
<xsl:value-of select="." />
</xsl:for-each>
</td>
</tr>
</xsl:template>
<xsl:template match="directive/description">
<div class="description">
<xsl:copy-of select="div/node()" />
<xsl:copy-of xmlns:xhtml="http://www.w3.org/1999/xhtml" select="xhtml:div/node()" />
</div>
</xsl:template>
<xsl:template match="directive/deprecated">
<div class="deprecated-notice">
<strong>Warning:</strong>
This directive was deprecated in version <xsl:value-of select="version" />.
<a href="#{use}">%<xsl:value-of select="use" /></a> should be used instead.
</div>
</xsl:template>
<xsl:template match="usage/directive">
<tr>
<th>Used in</th>
<td>
<ul>
<xsl:apply-templates />
</ul>
</td>
</tr>
</xsl:template>
<xsl:template match="usage/directive/file">
<li>
<em><xsl:value-of select="@name" /></em> on line<xsl:if test="count(line)&gt;1">s</xsl:if>
<xsl:text> </xsl:text>
<xsl:for-each select="line">
<xsl:if test="position()&gt;1">, </xsl:if>
<xsl:value-of select="." />
</xsl:for-each>
</li>
</xsl:template>
<xsl:template match="constraints/version">
<tr>
<th>Version added</th>
<td><xsl:value-of select="." /></td>
</tr>
</xsl:template>
<xsl:template match="constraints/type">
<tr>
<th>Type:</th>
<th>Type</th>
<td>
<xsl:variable name="type" select="text()" />
<xsl:attribute name="class">type type-<xsl:value-of select="$type" /></xsl:attribute>
<xsl:value-of select="$typeLookup/types/type[@id=$type]/text()" />
<xsl:if test="@allow-null='yes'">
(or null)
</xsl:if>
<a>
<xsl:attribute name="href">#type-<xsl:value-of select="$type" /></xsl:attribute>
<xsl:value-of select="$typeLookup/type[@id=$type]/@name" />
<xsl:if test="@allow-null='yes'">
(or null)
</xsl:if>
</a>
</td>
</tr>
</xsl:template>
<xsl:template match="constraints/allowed">
<tr>
<th>Allowed values:</th>
<th>Allowed values</th>
<td>
<xsl:for-each select="value"><!--
--><xsl:if test="position()&gt;1">, </xsl:if>
@@ -119,9 +229,25 @@
</xsl:template>
<xsl:template match="constraints/default">
<tr>
<th>Default:</th>
<th>Default</th>
<td><pre><xsl:value-of select="." xml:space="preserve" /></pre></td>
</tr>
</xsl:template>
</xsl:stylesheet>
<xsl:template match="constraints/external">
<tr>
<th>External deps</th>
<td>
<ul>
<xsl:apply-templates />
</ul>
</td>
</tr>
</xsl:template>
<xsl:template match="constraints/external/project">
<li><xsl:value-of select="." /></li>
</xsl:template>
</xsl:stylesheet>
<!-- vim: et sw=4 sts=4
-->

69
configdoc/types.xml Normal file
View File

@@ -0,0 +1,69 @@
<?xml version="1.0" encoding="UTF-8"?>
<types>
<type id="string" name="String"><div xmlns="http://www.w3.org/1999/xhtml">
A <a
href="http://docs.php.net/manual/en/language.types.string.php">sequence
of characters</a>.
</div></type>
<type id="istring" name="Case-insensitive string"><div xmlns="http://www.w3.org/1999/xhtml">
A series of case-insensitive characters. Internally, upper-case
ASCII characters will be converted to lower-case.
</div></type>
<type id="text" name="Text"><div xmlns="http://www.w3.org/1999/xhtml">
A series of characters that may contain newlines. Text tends to
indicate human-oriented text, as opposed to a machine format.
</div></type>
<type id="itext" name="Case-insensitive text"><div xmlns="http://www.w3.org/1999/xhtml">
A series of case-insensitive characters that may contain newlines.
</div></type>
<type id="int" name="Integer"><div xmlns="http://www.w3.org/1999/xhtml">
An <a
href="http://docs.php.net/manual/en/language.types.integer.php">
integer</a>. You are alternatively permitted to pass a string of
digits instead, which will be cast to an integer using
<code>(int)</code>.
</div></type>
<type id="float" name="Float"><div xmlns="http://www.w3.org/1999/xhtml">
A <a href="http://docs.php.net/manual/en/language.types.float.php">
floating point number</a>. You are alternatively permitted to
pass a numeric string (as defined by <code>is_numeric()</code>),
which will be cast to a float using <code>(float)</code>.
</div></type>
<type id="bool" name="Boolean"><div xmlns="http://www.w3.org/1999/xhtml">
A <a
href="http://docs.php.net/manual/en/language.types.boolean.php">boolean</a>.
You are alternatively permitted to pass an integer <code>0</code> or
<code>1</code> (other integers are not permitted) or a string
<code>"on"</code>, <code>"true"</code> or <code>"1"</code> for
<code>true</code>, and <code>"off"</code>, <code>"false"</code> or
<code>"0"</code> for <code>false</code>.
</div></type>
<type id="lookup" name="Lookup array"><div xmlns="http://www.w3.org/1999/xhtml">
An array whose values are <code>true</code>, e.g. <code>array('key'
=> true, 'key2' => true)</code>. You are alternatively permitted
to pass an array list of the keys <code>array('key', 'key2')</code>
or a comma-separated string of keys <code>"key, key2"</code>. If
you pass an array list of values, ensure that your values are
strictly numerically indexed: <code>array('key1', 2 =>
'key2')</code> will not do what you expect and emits a warning.
</div></type>
<type id="list" name="Array list"><div xmlns="http://www.w3.org/1999/xhtml">
An array which has consecutive integer indexes, e.g.
<code>array('val1', 'val2')</code>. You are alternatively permitted
to pass a comma-separated string of keys <code>"val1, val2"</code>.
If your array is not in this form, <code>array_values</code> is run
on the array and a warning is emitted.
</div></type>
<type id="hash" name="Associative array"><div xmlns="http://www.w3.org/1999/xhtml">
An array which is a mapping of keys to values, e.g.
<code>array('key1' => 'val1', 'key2' => 'val2')</code>. You are
alternatively permitted to pass a comma-separated string of
key-colon-value strings, e.g. <code>"key1: val1, key2: val2"</code>.
</div></type>
<type id="mixed" name="Mixed"><div xmlns="http://www.w3.org/1999/xhtml">
An arbitrary PHP value of any type.
</div></type>
</types>
<!-- vim: et sw=4 sts=4
-->

603
configdoc/usage.xml Normal file
View File

@@ -0,0 +1,603 @@
<?xml version="1.0" encoding="UTF-8"?>
<usage>
<directive id="Core.CollectErrors">
<file name="HTMLPurifier.php">
<line>162</line>
</file>
<file name="HTMLPurifier/Lexer.php">
<line>90</line>
<line>331</line>
</file>
<file name="HTMLPurifier/Lexer/DirectLex.php">
<line>67</line>
<line>87</line>
<line>385</line>
</file>
<file name="HTMLPurifier/Strategy/RemoveForeignElements.php">
<line>57</line>
</file>
</directive>
<directive id="CSS.MaxImgLength">
<file name="HTMLPurifier/CSSDefinition.php">
<line>256</line>
</file>
</directive>
<directive id="CSS.Proprietary">
<file name="HTMLPurifier/CSSDefinition.php">
<line>381</line>
</file>
</directive>
<directive id="CSS.AllowTricky">
<file name="HTMLPurifier/CSSDefinition.php">
<line>385</line>
</file>
</directive>
<directive id="CSS.Trusted">
<file name="HTMLPurifier/CSSDefinition.php">
<line>389</line>
</file>
</directive>
<directive id="CSS.AllowImportant">
<file name="HTMLPurifier/CSSDefinition.php">
<line>393</line>
</file>
</directive>
<directive id="CSS.AllowedProperties">
<file name="HTMLPurifier/CSSDefinition.php">
<line>522</line>
</file>
</directive>
<directive id="CSS.ForbiddenProperties">
<file name="HTMLPurifier/CSSDefinition.php">
<line>538</line>
</file>
</directive>
<directive id="Cache.DefinitionImpl">
<file name="HTMLPurifier/DefinitionCacheFactory.php">
<line>66</line>
</file>
</directive>
<directive id="HTML.Doctype">
<file name="HTMLPurifier/DoctypeRegistry.php">
<line>119</line>
</file>
</directive>
<directive id="HTML.CustomDoctype">
<file name="HTMLPurifier/DoctypeRegistry.php">
<line>123</line>
</file>
</directive>
<directive id="HTML.XHTML">
<file name="HTMLPurifier/DoctypeRegistry.php">
<line>128</line>
</file>
</directive>
<directive id="HTML.Strict">
<file name="HTMLPurifier/DoctypeRegistry.php">
<line>133</line>
</file>
</directive>
<directive id="Core.Encoding">
<file name="HTMLPurifier/Encoder.php">
<line>380</line>
<line>428</line>
</file>
</directive>
<directive id="Test.ForceNoIconv">
<file name="HTMLPurifier/Encoder.php">
<line>388</line>
<line>439</line>
</file>
</directive>
<directive id="Core.EscapeNonASCIICharacters">
<file name="HTMLPurifier/Encoder.php">
<line>429</line>
</file>
</directive>
<directive id="Output.CommentScriptContents">
<file name="HTMLPurifier/Generator.php">
<line>70</line>
</file>
</directive>
<directive id="Output.FixInnerHTML">
<file name="HTMLPurifier/Generator.php">
<line>71</line>
</file>
</directive>
<directive id="Output.SortAttr">
<file name="HTMLPurifier/Generator.php">
<line>72</line>
</file>
</directive>
<directive id="Output.FlashCompat">
<file name="HTMLPurifier/Generator.php">
<line>73</line>
</file>
</directive>
<directive id="Output.TidyFormat">
<file name="HTMLPurifier/Generator.php">
<line>104</line>
</file>
</directive>
<directive id="Core.NormalizeNewlines">
<file name="HTMLPurifier/Generator.php">
<line>122</line>
</file>
<file name="HTMLPurifier/Lexer.php">
<line>313</line>
</file>
</directive>
<directive id="Output.Newline">
<file name="HTMLPurifier/Generator.php">
<line>123</line>
</file>
</directive>
<directive id="HTML.BlockWrapper">
<file name="HTMLPurifier/HTMLDefinition.php">
<line>263</line>
</file>
</directive>
<directive id="HTML.Parent">
<file name="HTMLPurifier/HTMLDefinition.php">
<line>273</line>
</file>
</directive>
<directive id="HTML.AllowedElements">
<file name="HTMLPurifier/HTMLDefinition.php">
<line>291</line>
</file>
</directive>
<directive id="HTML.AllowedAttributes">
<file name="HTMLPurifier/HTMLDefinition.php">
<line>292</line>
</file>
</directive>
<directive id="HTML.Allowed">
<file name="HTMLPurifier/HTMLDefinition.php">
<line>295</line>
</file>
</directive>
<directive id="HTML.ForbiddenElements">
<file name="HTMLPurifier/HTMLDefinition.php">
<line>399</line>
</file>
</directive>
<directive id="HTML.ForbiddenAttributes">
<file name="HTMLPurifier/HTMLDefinition.php">
<line>400</line>
</file>
</directive>
<directive id="HTML.Trusted">
<file name="HTMLPurifier/HTMLModuleManager.php">
<line>234</line>
</file>
<file name="HTMLPurifier/Lexer.php">
<line>318</line>
<line>358</line>
</file>
<file name="HTMLPurifier/AttrDef/HTML/ContentEditable.php">
<line>8</line>
</file>
<file name="HTMLPurifier/HTMLModule/Image.php">
<line>37</line>
</file>
<file name="HTMLPurifier/Lexer/DirectLex.php">
<line>47</line>
</file>
<file name="HTMLPurifier/Strategy/RemoveForeignElements.php">
<line>30</line>
</file>
</directive>
<directive id="HTML.AllowedModules">
<file name="HTMLPurifier/HTMLModuleManager.php">
<line>241</line>
</file>
</directive>
<directive id="HTML.CoreModules">
<file name="HTMLPurifier/HTMLModuleManager.php">
<line>242</line>
</file>
</directive>
<directive id="HTML.Proprietary">
<file name="HTMLPurifier/HTMLModuleManager.php">
<line>256</line>
</file>
</directive>
<directive id="HTML.SafeObject">
<file name="HTMLPurifier/HTMLModuleManager.php">
<line>259</line>
</file>
</directive>
<directive id="HTML.SafeEmbed">
<file name="HTMLPurifier/HTMLModuleManager.php">
<line>262</line>
</file>
</directive>
<directive id="HTML.SafeScripting">
<file name="HTMLPurifier/HTMLModuleManager.php">
<line>265</line>
</file>
<file name="HTMLPurifier/HTMLModule/SafeScripting.php">
<line>22</line>
</file>
</directive>
<directive id="HTML.Nofollow">
<file name="HTMLPurifier/HTMLModuleManager.php">
<line>268</line>
</file>
</directive>
<directive id="HTML.TargetBlank">
<file name="HTMLPurifier/HTMLModuleManager.php">
<line>271</line>
</file>
</directive>
<directive id="HTML.TargetNoreferrer">
<file name="HTMLPurifier/HTMLModuleManager.php">
<line>276</line>
</file>
</directive>
<directive id="HTML.TargetNoopener">
<file name="HTMLPurifier/HTMLModuleManager.php">
<line>279</line>
</file>
</directive>
<directive id="Attr.IDBlacklist">
<file name="HTMLPurifier/IDAccumulator.php">
<line>27</line>
</file>
</directive>
<directive id="Core.Language">
<file name="HTMLPurifier/LanguageFactory.php">
<line>93</line>
</file>
</directive>
<directive id="Core.LexerImpl">
<file name="HTMLPurifier/Lexer.php">
<line>85</line>
</file>
</directive>
<directive id="Core.MaintainLineNumbers">
<file name="HTMLPurifier/Lexer.php">
<line>89</line>
</file>
<file name="HTMLPurifier/Lexer/DirectLex.php">
<line>62</line>
</file>
</directive>
<directive id="Core.LegacyEntityDecoder">
<file name="HTMLPurifier/Lexer.php">
<line>220</line>
<line>342</line>
</file>
</directive>
<directive id="Core.ConvertDocumentToFragment">
<file name="HTMLPurifier/Lexer.php">
<line>329</line>
</file>
</directive>
<directive id="Core.RemoveProcessingInstructions">
<file name="HTMLPurifier/Lexer.php">
<line>352</line>
</file>
</directive>
<directive id="Core.HiddenElements">
<file name="HTMLPurifier/Lexer.php">
<line>356</line>
</file>
<file name="HTMLPurifier/Strategy/RemoveForeignElements.php">
<line>36</line>
</file>
</directive>
<directive id="Core.AggressivelyRemoveScript">
<file name="HTMLPurifier/Lexer.php">
<line>357</line>
</file>
</directive>
<directive id="Core.RemoveScriptContents">
<file name="HTMLPurifier/Lexer.php">
<line>358</line>
</file>
<file name="HTMLPurifier/Strategy/RemoveForeignElements.php">
<line>35</line>
</file>
</directive>
<directive id="URI.">
<file name="HTMLPurifier/URIDefinition.php">
<line>65</line>
</file>
<file name="HTMLPurifier/URIFilter/Munge.php">
<line>46</line>
</file>
</directive>
<directive id="URI.Host">
<file name="HTMLPurifier/URIDefinition.php">
<line>76</line>
</file>
<file name="HTMLPurifier/URIScheme.php">
<line>89</line>
</file>
</directive>
<directive id="URI.Base">
<file name="HTMLPurifier/URIDefinition.php">
<line>77</line>
</file>
</directive>
<directive id="URI.DefaultScheme">
<file name="HTMLPurifier/URIDefinition.php">
<line>84</line>
</file>
</directive>
<directive id="URI.AllowedSchemes">
<file name="HTMLPurifier/URISchemeRegistry.php">
<line>48</line>
</file>
</directive>
<directive id="URI.OverrideAllowedSchemes">
<file name="HTMLPurifier/URISchemeRegistry.php">
<line>49</line>
</file>
</directive>
<directive id="CSS.AllowDuplicates">
<file name="HTMLPurifier/AttrDef/CSS.php">
<line>28</line>
</file>
</directive>
<directive id="URI.Disable">
<file name="HTMLPurifier/AttrDef/URI.php">
<line>47</line>
</file>
</directive>
<directive id="Core.ColorKeywords">
<file name="HTMLPurifier/AttrDef/CSS/Color.php">
<line>29</line>
</file>
<file name="HTMLPurifier/AttrDef/HTML/Color.php">
<line>19</line>
</file>
</directive>
<directive id="CSS.AllowedFonts">
<file name="HTMLPurifier/AttrDef/CSS/FontFamily.php">
<line>64</line>
</file>
</directive>
<directive id="Attr.AllowedClasses">
<file name="HTMLPurifier/AttrDef/HTML/Class.php">
<line>33</line>
</file>
</directive>
<directive id="Attr.ForbiddenClasses">
<file name="HTMLPurifier/AttrDef/HTML/Class.php">
<line>34</line>
</file>
</directive>
<directive id="Attr.AllowedFrameTargets">
<file name="HTMLPurifier/AttrDef/HTML/FrameTarget.php">
<line>32</line>
</file>
</directive>
<directive id="Attr.EnableID">
<file name="HTMLPurifier/AttrDef/HTML/ID.php">
<line>41</line>
</file>
</directive>
<directive id="Attr.IDPrefix">
<file name="HTMLPurifier/AttrDef/HTML/ID.php">
<line>51</line>
</file>
</directive>
<directive id="Attr.IDPrefixLocal">
<file name="HTMLPurifier/AttrDef/HTML/ID.php">
<line>53</line>
<line>58</line>
</file>
</directive>
<directive id="Attr.ID.HTML5">
<file name="HTMLPurifier/AttrDef/HTML/ID.php">
<line>75</line>
</file>
</directive>
<directive id="Attr.IDBlacklistRegexp">
<file name="HTMLPurifier/AttrDef/HTML/ID.php">
<line>97</line>
</file>
</directive>
<directive id="Attr.">
<file name="HTMLPurifier/AttrDef/HTML/LinkTypes.php">
<line>46</line>
</file>
</directive>
<directive id="Core.AllowHostnameUnderscore">
<file name="HTMLPurifier/AttrDef/URI/Host.php">
<line>77</line>
</file>
</directive>
<directive id="Core.EnableIDNA">
<file name="HTMLPurifier/AttrDef/URI/Host.php">
<line>109</line>
</file>
</directive>
<directive id="Attr.DefaultTextDir">
<file name="HTMLPurifier/AttrTransform/BdoDir.php">
<line>22</line>
</file>
</directive>
<directive id="Core.RemoveInvalidImg">
<file name="HTMLPurifier/AttrTransform/ImgRequired.php">
<line>24</line>
</file>
<file name="HTMLPurifier/Strategy/RemoveForeignElements.php">
<line>27</line>
</file>
</directive>
<directive id="Attr.DefaultInvalidImage">
<file name="HTMLPurifier/AttrTransform/ImgRequired.php">
<line>27</line>
</file>
</directive>
<directive id="Attr.DefaultImageAlt">
<file name="HTMLPurifier/AttrTransform/ImgRequired.php">
<line>33</line>
</file>
</directive>
<directive id="Attr.DefaultInvalidImageAlt">
<file name="HTMLPurifier/AttrTransform/ImgRequired.php">
<line>40</line>
</file>
</directive>
<directive id="HTML.Attr.Name.UseCDATA">
<file name="HTMLPurifier/AttrTransform/Name.php">
<line>18</line>
</file>
<file name="HTMLPurifier/HTMLModule/Name.php">
<line>19</line>
</file>
</directive>
<directive id="HTML.FlashAllowFullScreen">
<file name="HTMLPurifier/AttrTransform/SafeParam.php">
<line>58</line>
</file>
</directive>
<directive id="Cache.SerializerPath">
<file name="HTMLPurifier/DefinitionCache/Serializer.php">
<line>185</line>
</file>
</directive>
<directive id="Cache.SerializerPermissions">
<file name="HTMLPurifier/DefinitionCache/Serializer.php">
<line>202</line>
<line>218</line>
</file>
</directive>
<directive id="Filter.ExtractStyleBlocks.TidyImpl">
<file name="HTMLPurifier/Filter/ExtractStyleBlocks.php">
<line>94</line>
</file>
</directive>
<directive id="Filter.ExtractStyleBlocks.Scope">
<file name="HTMLPurifier/Filter/ExtractStyleBlocks.php">
<line>125</line>
</file>
</directive>
<directive id="Filter.ExtractStyleBlocks.Escaping">
<file name="HTMLPurifier/Filter/ExtractStyleBlocks.php">
<line>330</line>
</file>
</directive>
<directive id="HTML.Forms">
<file name="HTMLPurifier/HTMLModule/Forms.php">
<line>31</line>
</file>
</directive>
<directive id="HTML.SafeIframe">
<file name="HTMLPurifier/HTMLModule/Iframe.php">
<line>28</line>
</file>
<file name="HTMLPurifier/URIFilter/SafeIframe.php">
<line>48</line>
</file>
</directive>
<directive id="HTML.MaxImgLength">
<file name="HTMLPurifier/HTMLModule/Image.php">
<line>21</line>
</file>
<file name="HTMLPurifier/HTMLModule/SafeEmbed.php">
<line>18</line>
</file>
<file name="HTMLPurifier/HTMLModule/SafeObject.php">
<line>24</line>
</file>
</directive>
<directive id="HTML.TidyLevel">
<file name="HTMLPurifier/HTMLModule/Tidy.php">
<line>50</line>
</file>
</directive>
<directive id="HTML.TidyAdd">
<file name="HTMLPurifier/HTMLModule/Tidy.php">
<line>54</line>
</file>
</directive>
<directive id="HTML.TidyRemove">
<file name="HTMLPurifier/HTMLModule/Tidy.php">
<line>55</line>
</file>
</directive>
<directive id="AutoFormat.PurifierLinkify.DocURL">
<file name="HTMLPurifier/Injector/PurifierLinkify.php">
<line>31</line>
</file>
</directive>
<directive id="AutoFormat.RemoveEmpty.RemoveNbsp">
<file name="HTMLPurifier/Injector/RemoveEmpty.php">
<line>46</line>
</file>
</directive>
<directive id="AutoFormat.RemoveEmpty.RemoveNbsp.Exceptions">
<file name="HTMLPurifier/Injector/RemoveEmpty.php">
<line>47</line>
</file>
</directive>
<directive id="AutoFormat.RemoveEmpty.Predicate">
<file name="HTMLPurifier/Injector/RemoveEmpty.php">
<line>48</line>
</file>
</directive>
<directive id="Core.AggressivelyFixLt">
<file name="HTMLPurifier/Lexer/DOMLex.php">
<line>54</line>
</file>
</directive>
<directive id="Core.AllowParseManyTags">
<file name="HTMLPurifier/Lexer/DOMLex.php">
<line>72</line>
</file>
</directive>
<directive id="Core.DirectLexLineNumberSyncInterval">
<file name="HTMLPurifier/Lexer/DirectLex.php">
<line>84</line>
</file>
</directive>
<directive id="Core.DisableExcludes">
<file name="HTMLPurifier/Strategy/FixNesting.php">
<line>54</line>
</file>
</directive>
<directive id="Core.EscapeInvalidTags">
<file name="HTMLPurifier/Strategy/MakeWellFormed.php">
<line>72</line>
</file>
<file name="HTMLPurifier/Strategy/RemoveForeignElements.php">
<line>26</line>
</file>
</directive>
<directive id="HTML.AllowedComments">
<file name="HTMLPurifier/Strategy/RemoveForeignElements.php">
<line>31</line>
</file>
</directive>
<directive id="HTML.AllowedCommentsRegexp">
<file name="HTMLPurifier/Strategy/RemoveForeignElements.php">
<line>32</line>
</file>
</directive>
<directive id="URI.HostBlacklist">
<file name="HTMLPurifier/URIFilter/HostBlacklist.php">
<line>25</line>
</file>
</directive>
<directive id="URI.MungeResources">
<file name="HTMLPurifier/URIFilter/Munge.php">
<line>48</line>
</file>
</directive>
<directive id="URI.MungeSecretKey">
<file name="HTMLPurifier/URIFilter/Munge.php">
<line>49</line>
</file>
</directive>
<directive id="URI.SafeIframeRegexp">
<file name="HTMLPurifier/URIFilter/SafeIframe.php">
<line>35</line>
</file>
</directive>
</usage>

View File

@@ -0,0 +1,26 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="description" content="Specification for HTML Purifier's advanced API for defining custom filtering behavior." />
<link rel="stylesheet" type="text/css" href="style.css" />
<title>Advanced API - HTML Purifier</title>
</head><body>
<h1>Advanced API</h1>
<div id="filing">Filed under Development</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
<p>
Please see <a href="enduser-customize.html">Customize!</a>
</p>
</body></html>
<!-- vim: et sw=4 sts=4
-->

View File

@@ -1,52 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="description" content="Discusses code quality issues and places that need to be refactored in HTML Purifier." />
<link rel="stylesheet" type="text/css" href="./style.css" />
<title>Code Quality Issues - HTML Purifier</title>
</head><body>
<h1>Code Quality Issues</h1>
<div id="filing">Filed under Development</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://hp.jpsband.org/">HTML Purifier</a> End-User Documentation</div>
<p>Okay, face it. Programmers can get lazy, cut corners, or make mistakes. They
also can do quick prototypes, and then forget to rewrite them later. Well,
while I can't list mistakes in here, I can list prototype-like segments
of code that should be aggressively refactored. This does not list
optimization issues, that needs to be done after intense profiling.</p>
<pre>
docs/examples/demo.php - ad hoc HTML/PHP soup to the extreme
AttrDef
Class - doesn't support Unicode characters (fringe); uses regular
expressions
Lang - code duplication; premature optimization
Length - easily mistaken for CSSLength
URI - multiple regular expressions; missing validation for parts (?)
CSS - parser doesn't accept advanced CSS (fringe)
Number - constructor interface inconsistent with Integer
ConfigSchema - redefinition is a mess
Strategy
FixNesting - cannot bubble nodes out of structures, duplicated checks
for special-case parent node
MakeWellFormed - insufficient automatic closing definitions (check HTML
spec for optional end tags, also, closing based on type (block/inline)
might be efficient).
RemoveForeignElements - should be run in parallel with MakeWellFormed
URIScheme - needs to have callable generic checks
mailto - doesn't validate emails, doesn't validate querystring
news - doesn't validate opaque path
nntp - doesn't constrain path
</pre>
<div id="version">$Id$</div>
</body></html>

30
docs/dev-code-quality.txt Normal file
View File

@@ -0,0 +1,30 @@
Code Quality Issues
Okay, face it. Programmers can get lazy, cut corners, or make mistakes. They
also can do quick prototypes, and then forget to rewrite them later. Well,
while I can't list mistakes in here, I can list prototype-like segments
of code that should be aggressively refactored. This does not list
optimization issues, that needs to be done after intense profiling.
docs/examples/demo.php - ad hoc HTML/PHP soup to the extreme
AttrDef - a lot of duplication, more generic classes need to be created;
a lot of strtolower() calls, no legit casing
Class - doesn't support Unicode characters (fringe); uses regular expressions
Lang - code duplication; premature optimization
Length - easily mistaken for CSSLength
URI - multiple regular expressions; missing validation for parts (?)
CSS - parser doesn't accept advanced CSS (fringe)
Number - constructor interface inconsistent with Integer
Strategy
FixNesting - cannot bubble nodes out of structures, duplicated checks
for special-case parent node
RemoveForeignElements - should be run in parallel with MakeWellFormed
URIScheme - needs to have callable generic checks
mailto - doesn't validate emails, doesn't validate querystring
news - doesn't validate opaque path
nntp - doesn't constrain path
tel - doesn't validate phone numbers, only allows characters '+', '1-9', and 'x'
vim: et sw=4 sts=4

View File

@@ -0,0 +1,79 @@
Configuration Backwards-Compatibility Breaks
In version 4.0.0, the configuration subsystem (composed of the outwards
facing Config class, as well as the ConfigSchema and ConfigSchema_Interchange
subsystems), was significantly revamped to make use of property lists.
While most of the changes are internal, some internal APIs were changed for the
sake of clarity. HTMLPurifier_Config was kept completely backwards compatible,
although some of the functions were retrofitted with an unambiguous alternate
syntax. Both of these changes are discussed in this document.
1. Outwards Facing Changes
--------------------------------------------------------------------------------
The HTMLPurifier_Config class now takes an alternate syntax. The general rule
is:
If you passed $namespace, $directive, pass "$namespace.$directive"
instead.
An example:
$config->set('HTML', 'Allowed', 'p');
becomes:
$config->set('HTML.Allowed', 'p');
New configuration options may have more than one namespace, they might
look something like %Filter.YouTube.Blacklist. While you could technically
set it with ('HTML', 'YouTube.Blacklist'), the logical extension
('HTML', 'YouTube', 'Blacklist') does not work.
The old API will still work, but will emit E_USER_NOTICEs.
2. Internal API Changes
--------------------------------------------------------------------------------
Some overarching notes: we've completely eliminated the notion of namespace;
it's now an informal construct for organizing related configuration directives.
Also, the validation routines for keys (formerly "$namespace.$directive")
have been completely relaxed. I don't think it really should be necessary.
2.1 HTMLPurifier_ConfigSchema
First off, if you're interfacing with this class, you really shouldn't.
HTMLPurifier_ConfigSchema_Builder_ConfigSchema is really the only class that
should ever be creating HTMLPurifier_ConfigSchema, and HTMLPurifier_Config the
only class that should be reading it.
All namespace related methods were removed; they are completely unnecessary
now. Any $namespace, $name arguments must be replaced with $key (where
$key == "$namespace.$name"), including for addAlias().
The $info and $defaults member variables are no longer indexed as
[$namespace][$name]; they are now indexed as ["$namespace.$name"].
All deprecated methods were finally removed, after having yelled at you as
an E_USER_NOTICE for a while now.
2.2 HTMLPurifier_ConfigSchema_Interchange
Member variable $namespaces was removed.
2.3 HTMLPurifier_ConfigSchema_Interchange_Id
Member variable $namespace and $directive removed; member variable $key added.
Any method that took $namespace, $directive now takes $key.
2.4 HTMLPurifier_ConfigSchema_Interchange_Namespace
Removed.
vim: et sw=4 sts=4

165
docs/dev-config-naming.txt Normal file
View File

@@ -0,0 +1,165 @@
Configuration naming
HTML Purifier 4.0.0 features a new configuration naming system that
allows arbitrary nesting of namespaces. While there are certain cases
in which using two namespaces is obviously better (the canonical example
is where we were using AutoFormatParam to contain directives for AutoFormat
parameters), it is unclear whether or not a general migration to highly
namespaced directives is a good idea or not.
== Case studies ==
=== Attr.* ===
We have a dead duck HTML.Attr.Name.UseCDATA which migrated before we decided
to think this out thoroughly.
We currently have a large number of directives in the Attr.* namespace.
These directives tweak the behavior of some HTML attributes. They have
the properties:
* While they apply to only one attribute at a time, the attribute can
span over multiple elements (not necessarily all attributes, either).
The information of which elements it impacts is either omitted or
informally stated (EnableID applies to all elements, DefaultImageAlt
applies to <img> tags, AllowedRev doesn't say but only applies to a tags).
* There is a certain degree of clustering that could be applied, especially
to the ID directives. The clustering could be done with respect to
what element/attribute was used, i.e.
*.id -> EnableID, IDBlacklistRegexp, IDBlacklist, IDPrefixLocal, IDPrefix
img.src -> DefaultInvalidImage
img.alt -> DefaultImageAlt, DefaultInvalidImageAlt
bdo.dir -> DefaultTextDir
a.rel -> AllowedRel
a.rev -> AllowedRev
a.target -> AllowedFrameTargets
a.name -> Name.UseCDATA
* The directives often reference generic attribute types that were specified
in the DTD/specification. However, some of the behavior specifically relies
on the fact that other use cases of the attribute are not, at current,
supported by HTML Purifier.
AllowedRel, AllowedRev -> heavily <a> specific; if <link> ends up being
allowed, we will also have to give users specificity there (we also
want to preserve generality) DTD %Linktypes, HTML5 distinguishes
between <link> and <a>/<area>
AllowedFrameTargets -> heavily <a> specific, but also used by <area>
and <form>. Transitional DTD %FrameTarget, not present in strict,
HTML5 calls them "browsing contexts"
Default*Image* -> as a default parameter, is almost entirely exlcusive
to <img>
EnableID -> global attribute
Name.UseCDATA -> heavily <a> specific, but has heavy other usage by
many things
== AutoFormat.* ==
These have the fairly normal pluggable architecture that lends itself to
large amounts of namespaces (pluggability may be the key to figuring
out when gratuitous namespacing is good.) Properties:
* Boolean directives are fair game for being namespaced: for example,
RemoveEmpty.RemoveNbsp triggers RemoveEmpty.RemoveNbsp.Exceptions,
the latter of which only makes sense when RemoveEmpty.RemoveNbsp
is set to true. (The same applies to RemoveNbsp too)
The AutoFormat string is a bit long, but is the only bit of repeated
context.
== Core.* ==
Core is the potpourri of directives, mostly regarding some minor behavioral
tweaks for HTML handling abilities.
AggressivelyFixLt
AllowParseManyTags
ConvertDocumentToFragment
DirectLexLineNumberSyncInterval
LexerImpl
MaintainLineNumbers
Lexer
CollectErrors
Language
Error handling (Language is ostensibly a little more general, but
it's only used for error handling right now)
ColorKeywords
CSS and HTML
Encoding
EscapeNonASCIICharacters
Character encoding
EscapeInvalidChildren
EscapeInvalidTags
HiddenElements
RemoveInvalidImg
Lexing/Output
RemoveScriptContents
Deprecated
== HTML.* ==
AllowedAttributes
AllowedElements
AllowedModules
Allowed
ForbiddenAttributes
ForbiddenElements
Element set tuning
BlockWrapper
Child def advanced twiddle
CoreModules
CustomDoctype
Advanced HTMLModuleManager twiddles
DefinitionID
DefinitionRev
Caching
Doctype
Parent
Strict
XHTML
Global environment
MaxImgLength
Attribute twiddle? (applies to two attributes)
Proprietary
SafeEmbed
SafeObject
Trusted
Extra functionality/tagsets
TidyAdd
TidyLevel
TidyRemove
Tidy
== Output.* ==
These directly affect the output of Generator. These are all advanced
twiddles.
== URI.* ==
AllowedSchemes
OverrideAllowedSchemes
Scheme tuning
Base
DefaultScheme
Host
Global environment
DefinitionID
DefinitionRev
Caching
DisableExternalResources
DisableExternal
DisableResources
Disable
Contextual/authority tuning
HostBlacklist
Authority tuning
MakeAbsolute
MungeResources
MungeSecretKey
Munge
Transformation behavior (munge can be grouped)

412
docs/dev-config-schema.html Normal file
View File

@@ -0,0 +1,412 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="description" content="Describes config schema framework in HTML Purifier." />
<link rel="stylesheet" type="text/css" href="./style.css" />
<title>Config Schema - HTML Purifier</title>
</head>
<body>
<h1>Config Schema</h1>
<div id="filing">Filed under Development</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
<p>
HTML Purifier has a fairly complex system for configuration. Users
interact with a <code>HTMLPurifier_Config</code> object to
set configuration directives. The values they set are validated according
to a configuration schema, <code>HTMLPurifier_ConfigSchema</code>.
</p>
<p>
The schema is mostly transparent to end-users, but if you're doing development
work for HTML Purifier and need to define a new configuration directive,
you'll need to interact with it. We'll also talk about how to define
userspace configuration directives at the very end.
</p>
<h2>Write a directive file</h2>
<p>
Directive files define configuration directives to be used by
HTML Purifier. They are placed in <code>library/HTMLPurifier/ConfigSchema/schema/</code>
in the form <code><em>Namespace</em>.<em>Directive</em>.txt</code> (I
couldn't think of a more descriptive file extension.)
Directive files are actually what we call <code>StringHash</code>es,
i.e. associative arrays represented in a string form reminiscent of
<a href="http://qa.php.net/write-test.php">PHPT</a> tests. Here's a
sample directive file, <code>Test.Sample.txt</code>:
</p>
<pre>Test.Sample
TYPE: string/null
DEFAULT: NULL
ALLOWED: 'foo', 'bar'
VALUE-ALIASES: 'baz' => 'bar'
VERSION: 3.1.0
--DESCRIPTION--
This is a sample configuration directive for the purposes of the
&lt;code&gt;dev-config-schema.html&lt;code&gt; documentation.
--ALIASES--
Test.Example</pre>
<p>
Each of these segments has a specific meaning:
</p>
<table class="table">
<thead>
<tr>
<th>Key</th>
<th>Example</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>ID</td>
<td>Test.Sample</td>
<td>The name of the directive, in the form Namespace.Directive
(implicitly the first line)</td>
</tr>
<tr>
<td>TYPE</td>
<td>string/null</td>
<td>The type of variable this directive accepts. See below for
details. You can also add <code>/null</code> to the end of
any basic type to allow null values too.</td>
</tr>
<tr>
<td>DEFAULT</td>
<td>NULL</td>
<td>A parseable PHP expression of the default value.</td>
</tr>
<tr>
<td>DESCRIPTION</td>
<td>This is a...</td>
<td>An HTML description of what this directive does.</td>
</tr>
<tr>
<td>VERSION</td>
<td>3.1.0</td>
<td><em>Recommended</em>. The version of HTML Purifier this directive was added.
Directives that have been around since 1.0.0 don't have this,
but any new ones should.</td>
</tr>
<tr>
<td>ALIASES</td>
<td>Test.Example</td>
<td><em>Optional</em>. A comma separated list of aliases for this directive.
This is most useful for backwards compatibility and should
not be used otherwise.</td>
</tr>
<tr>
<td>ALLOWED</td>
<td>'foo', 'bar'</td>
<td><em>Optional</em>. Set of allowed value for a directive,
a comma separated list of parseable PHP expressions. This
is only allowed string, istring, text and itext TYPEs.</td>
</tr>
<tr>
<td>VALUE-ALIASES</td>
<td>'baz' =&gt; 'bar'</td>
<td><em>Optional</em>. Mapping of one value to another, and
should be a comma separated list of keypair duples. This
is only allowed string, istring, text and itext TYPEs.</td>
</tr>
<tr>
<td>DEPRECATED-VERSION</td>
<td>3.1.0</td>
<td><em>Not shown</em>. Indicates that the directive was
deprecated this version.</td>
</tr>
<tr>
<td>DEPRECATED-USE</td>
<td>Test.NewDirective</td>
<td><em>Not shown</em>. Indicates what new directive should be
used instead. Note that the directives will functionally be
different, although they should offer the same functionality.
If they are identical, use an alias instead.</td>
</tr>
<tr>
<td>EXTERNAL</td>
<td>CSSTidy</td>
<td><em>Not shown</em>. Indicates if there is an external library
the user will need to download and install to use this configuration
directive. As of right now, this is merely a Google-able name; future
versions may also provide links and instructions.</td>
</tr>
</tbody>
</table>
<p>
Some notes on format and style:
</p>
<ul>
<li>
Each of these keys can be expressed in the short format
(<code>KEY: Value</code>) or the long format
(<code>--KEY--</code> with value beneath). You must use the
long format if multiple lines are needed, or if a long format
has been used already (that's why <code>ALIASES</code> in our
example is in the long format); otherwise, it's user preference.
</li>
<li>
The HTML descriptions should be wrapped at about 80 columns; do
not rely on editor word-wrapping.
</li>
</ul>
<p>
Also, as promised, here is the set of possible types:
</p>
<table class="table">
<thead>
<tr>
<th>Type</th>
<th>Example</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>string</td>
<td>'Foo'</td>
<td><a href="http://docs.php.net/manual/en/language.types.string.php">String</a> without newlines</td>
</tr>
<tr>
<td>istring</td>
<td>'foo'</td>
<td>Case insensitive ASCII string without newlines</td>
</tr>
<tr>
<td>text</td>
<td>"A<em>\n</em>b"</td>
<td>String with newlines</td>
</tr>
<tr>
<td>itext</td>
<td>"a<em>\n</em>b"</td>
<td>Case insensitive ASCII string without newlines</td>
</tr>
<tr>
<td>int</td>
<td>23</td>
<td>Integer</td>
</tr>
<tr>
<td>float</td>
<td>3.0</td>
<td>Floating point number</td>
</tr>
<tr>
<td>bool</td>
<td>true</td>
<td>Boolean</td>
</tr>
<tr>
<td>lookup</td>
<td>array('key' =&gt; true)</td>
<td>Lookup array, used with <code>isset($var[$key])</code></td>
</tr>
<tr>
<td>list</td>
<td>array('f', 'b')</td>
<td>List array, with ordered numerical indexes</td>
</tr>
<tr>
<td>hash</td>
<td>array('key' =&gt; 'val')</td>
<td>Associative array of keys to values</td>
</tr>
<tr>
<td>mixed</td>
<td>new stdClass</td>
<td>Any PHP variable is fine</td>
</tr>
</tbody>
</table>
<p>
The examples represent what will be returned out of the configuration
object; users have a little bit of leeway when setting configuration
values (for example, a lookup value can be specified as a list;
HTML Purifier will flip it as necessary.) These types are defined
in <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/VarParser.php">
library/HTMLPurifier/VarParser.php</a>.
</p>
<p>
For more information on what values are allowed, and how they are parsed,
consult <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php">
library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php</a>, as well
as <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/Interchange/Directive.php">
library/HTMLPurifier/ConfigSchema/Interchange/Directive.php</a> for
the semantics of the parsed values.
</p>
<h2>Refreshing the cache</h2>
<p>
You may have noticed that your directive file isn't doing anything
yet. That's because it hasn't been added to the runtime
<code>HTMLPurifier_ConfigSchema</code> instance. Run
<code>maintenance/generate-schema-cache.php</code> to fix this.
If there were no errors, you're good to go! Don't forget to add
some unit tests for your functionality!
</p>
<p>
If you ever make changes to your configuration directives, you
will need to run this script again.
</p>
<h2>Adding in-house schema definitions</h2>
<p>
Placing stuff directly in HTML Purifier's source tree is generally not a
good idea, so HTML Purifier 4.0.0+ has some facilities in place to make your
life easier.
</p>
<p>
The first is to pass an extra parameter to <code>maintenance/generate-schema-cache.php</code>
with the location of your directory (relative or absolute path will do). For example,
if I'm storing my custom definitions in <em>/var/htmlpurifier/myschema</em>, run:
<code>php maintenance/generate-schema-cache.php /var/htmlpurifier/myschema</code>.
</p>
<p>
Alternatively, you can create a small loader PHP file in the HTML Purifier base
directory named <code>config-schema.php</code> (this is the same directory
you would place a <code>test-settings.php</code> file). In this file, add
the following line for each directory you want to load:
</p>
<pre>$builder-&gt;buildDir($interchange, '/var/htmlpurifier/myschema');</pre>
<p>You can even load a single file using:</p>
<pre>$builder-&gt;buildFile($interchange, '/var/htmlpurifier/myschema/MyApp.Directive.txt');</pre>
<p>Storing custom definitions that you don't plan on sending back upstream in
a separate directory is <em>definitely</em> a good idea! Additionally, picking
a good namespace can go a long way to saving you grief if you want to use
someone else's change, but they picked the same name, or if HTML Purifier
decides to add support for a configuration directive that has the same name.</p>
<!-- TODO: how to name directives that rely on naming conventions -->
<h2>Errors</h2>
<p>
All directive files go through a rigorous validation process
through <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/Validator.php">
library/HTMLPurifier/ConfigSchema/Validator.php</a>, as well
as some basic checks during building. While
listing every error out here is out-of-scope for this document, we
can give some general tips for interpreting error messages.
There are two types of errors: builder errors and validation errors.
</p>
<h3>Builder errors</h3>
<blockquote>
<p>
<strong>Exception:</strong> Expected type string, got
integer in DEFAULT in directive hash 'Ns.Dir'
</p>
</blockquote>
<p>
You can identify a builder error by the keyword "directive hash."
These are the easiest to deal with, because they directly correspond
with your directive file. Find the offending directive file (which
is the directive hash plus the .txt extension), find the
offending index ("in DEFAULT" means the DEFAULT key) and fix the error.
This particular error would occur if your default value is not the same
type as TYPE.
</p>
<h3>Validation errors</h3>
<blockquote>
<p>
<strong>Exception:</strong> Alias 3 in valueAliases in directive
'Ns.Dir' must be a string
</p>
</blockquote>
<p>
These are a little trickier, because we're not actually validating
your directive file, or even the direct string hash representation.
We're validating an Interchange object, and the error messages do
not mention any string hash keys.
</p>
<p>
Nevertheless, it's not difficult to figure out what went wrong.
Read the "context" statements in reverse:
</p>
<dl>
<dt>in directive 'Ns.Dir'</dt>
<dd>This means we need to look at the directive file <code>Ns.Dir.txt</code></dd>
<dt>in valueAliases</dt>
<dd>There's no key actually called this, but there's one that's close:
VALUE-ALIASES. Indeed, that's where to look.</dd>
<dt>Alias 3</dt>
<dd>The value alias that is equal to 3 is the culprit.</dd>
</dl>
<p>
In this particular case, you're not allowed to alias integers values to
strings values.
</p>
<p>
The most difficult part is translating the Interchange member variable (valueAliases)
into a directive file key (VALUE-ALIASES), but there's a one-to-one
correspondence currently. If the two formats diverge, any discrepancies
will be described in <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php">
library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php</a>.
</p>
<h2>Internals</h2>
<p>
Much of the configuration schema framework's codebase deals with
shuffling data from one format to another, and doing validation on this
data.
The keystone of all of this is the <code>HTMLPurifier_ConfigSchema_Interchange</code>
class, which represents the purest, parsed representation of the schema.
</p>
<p>
Hand-writing this data is unwieldy, however, so we write directive files.
These directive files are parsed by <code>HTMLPurifier_StringHashParser</code>
into <code>HTMLPurifier_StringHash</code>es, which then
are run through <code>HTMLPurifier_ConfigSchema_InterchangeBuilder</code>
to construct the interchange object.
</p>
<p>
From the interchange object, the data can be siphoned into other forms
using <code>HTMLPurifier_ConfigSchema_Builder</code> subclasses.
For example, <code>HTMLPurifier_ConfigSchema_Builder_ConfigSchema</code>
generates a runtime <code>HTMLPurifier_ConfigSchema</code> object,
which <code>HTMLPurifier_Config</code> uses to validate its incoming
data. There is also an XML serializer, which is used to build documentation.
</p>
</body>
</html>
<!-- vim: et sw=4 sts=4
-->

68
docs/dev-flush.html Normal file
View File

@@ -0,0 +1,68 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="description" content="Discusses when to flush HTML Purifier's various caches." />
<link rel="stylesheet" type="text/css" href="./style.css" />
<title>Flushing the Purifier - HTML Purifier</title>
</head>
<body>
<h1>Flushing the Purifier</h1>
<div id="filing">Filed under Development</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
<p>
If you've been poking around the various folders in HTML Purifier,
you may have noticed the <code>maintenance</code> directory. Almost
all of these scripts are devoted to flushing out the various caches
HTML Purifier uses. Normal users don't have to worry about this:
regular library usage is transparent. However, when doing development
work on HTML Purifier, you may find you have to flush one of the
caches.
</p>
<p>
As a general rule of thumb, run <code>flush.php</code> whenever you make
any <em>major</em> changes, or when tests start mysteriously failing.
In more detail, run this script if:
</p>
<ul>
<li>
You added new source files to HTML Purifier's main library.
(see <code>generate-includes.php</code>)
</li>
<li>
You modified the configuration schema (see
<code>generate-schema-cache.php</code>). This usually means
adding or modifying files in <code>HTMLPurifier/ConfigSchema/schema/</code>,
although in rare cases modifying <code>HTMLPurifier/ConfigSchema.php</code>
will also require this.
</li>
<li>
You modified a Definition, or its subsystems. The most usual candidate
is <code>HTMLPurifier/HTMLDefinition.php</code>, which also encompasses
the files in <code>HTMLPurifier/HTMLModule/</code> as well as if you've
<a href="enduser-customize.html">customizing definitions</a> without
the cache disabled. (see <code>flush-generation-cache.php</code>)
</li>
<li>
You modified source files, and have been using the standalone
version from the full installation. (see <code>generate-standalone.php</code>)
</li>
</ul>
<p>
You can check out the corresponding scripts for more information on what they
do.
</p>
</body></html>
<!-- vim: et sw=4 sts=4
-->

281
docs/dev-includes.txt Normal file
View File

@@ -0,0 +1,281 @@
INCLUDES, AUTOLOAD, BYTECODE CACHES and OPTIMIZATION
The Problem
-----------
HTML Purifier contains a number of extra components that are not used all
of the time, only if the user explicitly specifies that we should use
them.
Some of these optional components are optionally included (Filter,
Language, Lexer, Printer), while others are included all the time
(Injector, URIFilter, HTMLModule, URIScheme). We will stipulate that these
are all developer specified: it is conceivable that certain Tokens are not
used, but this is user-dependent and should not be trusted.
We should come up with a consistent way to handle these things and ensure
that we get the maximum performance when there is bytecode caches and
when there are not. Unfortunately, these two goals seem contrary to each
other.
A peripheral issue is the performance of ConfigSchema, which has been
shown take a large, constant amount of initialization time, and is
intricately linked to the issue of includes due to its pervasive use
in our plugin architecture.
Pros and Cons
-------------
We will assume that user-based extensions will be included by them.
Conditional includes:
Pros:
- User management is simplified; only a single directive needs to be set
- Only necessary code is included
Cons:
- Doesn't play nicely with opcode caches
- Adds complexity to standalone version
- Optional configuration directives are not exposed without a little
extra coaxing (not implemented yet)
Include it all:
Pros:
- User management is still simple
- Plays nicely with opcode caches and standalone version
- All configuration directives are present
Cons:
- Lots of (how much?) extra code is included
- Classes that inherit from external libraries will cause compile
errors
Build an include stub (Let's do this!):
Pros:
- Only necessary code is included
- Plays nicely with opcode caches and standalone version
- require (without once) can be used, see above
- Could further extend as a compilation to one file
Cons:
- Not implemented yet
- Requires user intervention and use of a command line script
- Standalone script must be chained to this
- More complex and compiled-language-like
- Requires a whole new class of system-wide configuration directives,
as configuration objects can be reused
- Determining what needs to be included can be complex (see above)
- No way of autodetecting dynamically instantiated classes
- Might be slow
Include stubs
-------------
This solution may be "just right" for users who are heavily oriented
towards performance. However, there are a number of picky implementation
details to work out beforehand.
The number one concern is how to make the HTML Purifier files "work
out of the box", while still being able to easily get them into a form
that works with this setup. As the codebase stands right now, it would
be necessary to strip out all of the require_once calls. The only way
we could get rid of the require_once calls is to use __autoload or
use the stub for all cases (which might not be a bad idea).
Aside
-----
An important thing to remember, however, is that these require_once's
are valuable data about what classes a file needs. Unfortunately, there's
no distinction between whether or not the file is needed all the time,
or whether or not it is one of our "optional" files. Thus, it is
effectively useless.
Deprecated
----------
One of the things I'd like to do is have the code search for any classes
that are explicitly mentioned in the code. If a class isn't mentioned, I
get to assume that it is "optional," i.e. included via introspection.
The choice is either to use PHP's tokenizer or use regexps; regexps would
be faster but a tokenizer would be more correct. If this ends up being
unfeasible, adding dependency comments isn't a bad idea. (This could
even be done automatically by search/replacing require_once, although
we'd have to manually inspect the results for the optional requires.)
NOTE: This ends up not being necessary, as we're going to make the user
figure out all the extra classes they need, and only include the core
which is predetermined.
Using the autoload framework with include stubs works nicely with
introspective classes: instead of having to have require_once inside
the function, we can let autoload do the work; we simply need to
new $class or accept the object straight from the caller. Handling filters
becomes a simple matter of ticking off configuration directives, and
if ConfigSchema spits out errors, adding the necessary includes. We could
also use the autoload framework as a fallback, in case the user forgets
to make the include, but doesn't really care about performance.
Insight
-------
All of this talk is merely a natural extension of what our current
standalone functionality does. However, instead of having our code
perform the includes, or attempting to inline everything that possibly
could be used, we boot the issue to the user, making them include
everything or setup the fallback autoload handler.
Configuration Schema
--------------------
A common deficiency for all of the conditional include setups (including
the dynamically built include PHP stub) is that if one of this
conditionally included files includes a configuration directive, it
is not accessible to configdoc. A stopgap solution for this problem is
to have it piggy-back off of the data in the merge-library.php script
to figure out what extra files it needs to include, but if the file also
inherits classes that don't exist, we're in big trouble.
I think it's high time we centralized the configuration documentation.
However, the type checking has been a great boon for the library, and
I'd like to keep that. The compromise is to use some other source, and
then parse it into the ConfigSchema internal format (sans all of those
nasty documentation strings which we really don't need at runtime) and
serialize that for future use.
The next question is that of format. XML is very verbose, and the prospect
of setting defaults in it gives me willies. However, this may be necessary.
Splitting up the file into manageable chunks may alleviate this trouble,
and we may be even want to create our own format optimized for specifying
configuration. It might look like (based off the PHPT format, which is
nicely compact yet unambiguous and human-readable):
Core.HiddenElements
TYPE: lookup
DEFAULT: array('script', 'style') // auto-converted during processing
--ALIASES--
Core.InvisibleElements, Core.StupidElements
--DESCRIPTION--
<p>
Blah blah
</p>
The first line is the directive name, the lines after that prior to the
first --HEADER-- block are single-line values, and then after that
the multiline values are there. No value is restricted to a particular
format: DEFAULT could very well be multiline if that would be easier.
This would make it insanely easy, also, to add arbitrary extra parameters,
like:
VERSION: 3.0.0
ALLOWED: 'none', 'light', 'medium', 'heavy' // this is wrapped in array()
EXTERNAL: CSSTidy // this would be documented somewhere else with a URL
The final loss would be that you wouldn't know what file the directive
was used in; with some clever regexps it should be possible to
figure out where $config->get($ns, $d); occurs. Reflective calls to
the configuration object is mitigated by the fact that getBatch is
used, so we can simply talk about that in the namespace definition page.
This might be slow, but it would only happen when we are creating
the documentation for consumption, and is sugar.
We can put this in a schema/ directory, outside of HTML Purifier. The serialized
data gets treated like entities.ser.
The final thing that needs to be handled is user defined configurations.
They can be added at runtime using ConfigSchema::registerDirectory()
which globs the directory and grabs all of the directives to be incorporated
in. Then, the result is saved. We may want to take advantage of the
DefinitionCache framework, although it is not altogether certain what
configuration directives would be used to generate our key (meta-directives!)
Further thoughts
----------------
Our master configuration schema will only need to be updated once
every new version, so it's easily versionable. User specified
schema files are far more volatile, but it's far too expensive
to check the filemtimes of all the files, so a DefinitionRev style
mechanism works better. However, we can uniquely identify the
schema based on the directories they loaded, so there's no need
for a DefinitionId until we give them full programmatic control.
These variables should be directly incorporated into ConfigSchema,
and ConfigSchema should handle serialization. Some refactoring will be
necessary for the DefinitionCache classes, as they are built with
Config in mind. If the user changes something, the cache file gets
rebuilt. If the version changes, the cache file gets rebuilt. Since
our unit tests flush the caches before we start, and the operation is
pretty fast, this will not negatively impact unit testing.
One last thing: certain configuration directives require that files
get added. They may even be specified dynamically. It is not a good idea
for the HTMLPurifier_Config object to be used directly for such matters.
Instead, the userland code should explicitly perform the includes. We may
put in something like:
REQUIRES: HTMLPurifier_Filter_ExtractStyleBlocks
To indicate that if that class doesn't exist, and the user is attempting
to use the directive, we should fatally error out. The stub includes the core files,
and the user includes everything else. Any reflective things like new
$class would be required to tie in with the configuration.
It would work very well with rarely used configuration options, but it
wouldn't be so good for "core" parts that can be disabled. In such cases
the core include file would need to be modified, and the only way
to properly do this is use the configuration object. Once again, our
ability to create cache keys saves the day again: we can create arbitrary
stub files for arbitrary configurations and include those. They could
even be the single file affairs. The only thing we'd need to include,
then, would be HTMLPurifier_Config! Then, the configuration object would
load the library.
An aside...
-----------
One questions, however, the wisdom of letting PHP files write other PHP
files. It seems like a recipe for disaster, or at least lots of headaches
in highly secured setups, where PHP does not have the ability to write
to its root. In such cases, we could use sticky bits or tell the user
to manually generate the file.
The other troublesome bit is actually doing the calculations necessary.
For certain cases, it's simple (such as URIScheme), but for AttrDef
and HTMLModule the dependency trees are very complex in relation to
%HTML.Allowed and friends. I think that this idea should be shelved
and looked at a later, less insane date.
An interesting dilemma presents itself when a configuration form is offered
to the user. Normally, the configuration object is not accessible without
editing PHP code; this facility changes thing. The sensible thing to do
is stipulate that all classes required by the directives you allow must
be included.
Unit testing
------------
Setting up the parsing and translation into our existing format would not
be difficult to do. It might represent a good time for us to rethink our
tests for these facilities; as creative as they are, they are often hacky
and require public visibility for things that ought to be protected.
This is especially applicable for our DefinitionCache tests.
Migration
---------
Because we are not *adding* anything essentially new, it should be trivial
to write a script to take our existing data and dump it into the new format.
Well, not trivial, but fairly easy to accomplish. Primary implementation
difficulties would probably involve formatting the file nicely.
Backwards-compatibility
-----------------------
I expect that the ConfigSchema methods should stick around for a little bit,
but display E_USER_NOTICE warnings that they are deprecated. This will
require documentation!
New stuff
---------
VERSION: Version number directive was introduced
DEPRECATED-VERSION: If the directive was deprecated, when was it deprecated?
DEPRECATED-USE: If the directive was deprecated, what should the user use now?
REQUIRES: What classes does this configuration directive require, but are
not part of the HTML Purifier core?
vim: et sw=4 sts=4

View File

@@ -14,7 +14,7 @@
<div id="filing">Filed under Development</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://hp.jpsband.org/">HTML Purifier</a> End-User Documentation</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
<p>The classes in this library follow a few naming conventions, which may
help you find the correct functionality more quickly. Here they are:</p>
@@ -35,7 +35,7 @@ help you find the correct functionality more quickly. Here they are:</p>
<dt>Harness and Test are reserved class names for unit tests</dt>
<dd>The suffix <code>Test</code> indicates that the class is a subclass of UnitTestCase
(of the Simpletest library) and is testable. "Harness" indicates a subclass
of UnitTestCase that is not meant to be run but to be extended into
of UnitTestCase that is not meant to be run but to be extended into
concrete test cases and contains custom test methods (i.e. assert*())</dd>
<dt>Class names do not necessarily represent inheritance hierarchies</dt>
@@ -51,7 +51,7 @@ help you find the correct functionality more quickly. Here they are:</p>
all must be present in order for proper functioning.</dd>
<dt>Abbreviations are avoided</dt>
<dd>We try to avoid abbreviations as much as possible, but in some cases,
<dd>We try to avoid abbreviations as much as possible, but in some cases,
abbreviated version is more readable than the full version. Here, we
list common abbreviations:
<ul>
@@ -77,6 +77,7 @@ help you find the correct functionality more quickly. Here they are:</p>
</dl>
<div id="version">$Id$</div>
</body></html>
</body></html>
<!-- vim: et sw=4 sts=4
-->

View File

@@ -14,7 +14,7 @@
<div id="filing">Filed under Development</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://hp.jpsband.org/">HTML Purifier</a> End-User Documentation</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
<p>Here are some possible optimization techniques we can apply to code sections if
they turn out to be slow. Be sure not to prematurely optimize: if you get
@@ -23,11 +23,11 @@ that itch, put it here!</p>
<ul>
<li>Make Tokens Flyweights (may prove problematic, probably not worth it)</li>
<li>Rewrite regexps into PHP code</li>
<li>Serialize the Definition object</li>
<li>Batch regexp validation (do as many per function call as possible)</li>
<li>Parallelize strategies</li>
</ul>
<div id="version">$Id$</div>
</body></html>
</body></html>
<!-- vim: et sw=4 sts=4
-->

View File

@@ -32,14 +32,19 @@ thead th {text-align:left;padding:0.1em;background-color:#EEE;}
<div id="filing">Filed under Development</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://hp.jpsband.org/">HTML Purifier</a> End-User Documentation</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
<p>
<strong>Warning:</strong> This table is kept for historical purposes and
is not being actively updated.
</p>
<h2>Key</h2>
<table cellspacing="0"><tbody>
<tr><td class="impl-yes">Implemented</td></tr>
<tr><td class="impl-partial">Partially implemented</td></tr>
<tr><td class="impl-no">Will not implement</td></tr>
<tr><td class="impl-no">Not priority to implement</td></tr>
<tr><td class="danger">Dangerous attribute/property</td></tr>
<tr><td class="css1">Present in CSS1</td></tr>
<tr><td class="feature">Feature, requires extra work</td></tr>
@@ -118,6 +123,7 @@ thead th {text-align:left;padding:0.1em;background-color:#EEE;}
<tbody>
<tr><th colspan="2">Table</th></tr>
<tr class="impl-yes"><td>border-collapse</td><td>ENUM(collapse, seperate)</td></tr>
<tr class="impl-yes"><td>border-space</td><td>MULTIPLE</td></tr>
<tr class="impl-yes"><td>caption-side</td><td>ENUM(top, bottom)</td></tr>
<tr class="feature"><td>empty-cells</td><td>ENUM(show, hide), No IE support makes this useless,
possible fix with &amp;nbsp;? Unknown release milestone.</td></tr>
@@ -142,16 +148,16 @@ thead th {text-align:left;padding:0.1em;background-color:#EEE;}
<tbody>
<tr><th colspan="2">Unknown</th></tr>
<tr class="danger css1 impl-yes"><td>background-image</td><td>Dangerous, target milestone 1.3</td></tr>
<tr class="danger css1 impl-yes"><td>background-image</td><td>Dangerous</td></tr>
<tr class="css1 impl-yes"><td>background-attachment</td><td>ENUM(scroll, fixed),
Depends on background-image</td></tr>
<tr class="css1 impl-yes"><td>background-position</td><td>Depends on background-image</td></tr>
<tr class="danger impl-no"><td>cursor</td><td>Dangerous but fluffy</td></tr>
<tr class="danger css1"><td>display</td><td>ENUM(...), Dangerous but interesting;
<tr class="danger impl-yes"><td>display</td><td>ENUM(...), Dangerous but interesting;
will not implement list-item, run-in (Opera only) or table (no IE);
inline-block has incomplete IE6 support and requires -moz-inline-box
for Mozilla. Unknown target milestone.</td></tr>
<tr class="css1"><td>height</td><td>Interesting, why use it? Unknown target milestone.</td></tr>
<tr class="css1 impl-yes"><td>height</td><td>Interesting, why use it? Unknown target milestone.</td></tr>
<tr class="danger css1 impl-yes"><td>list-style-image</td><td>Dangerous?</td></tr>
<tr class="impl-no"><td>max-height</td><td rowspan="4">No IE 5/6</td></tr>
<tr class="impl-no"><td>min-height</td></tr>
@@ -166,11 +172,11 @@ thead th {text-align:left;padding:0.1em;background-color:#EEE;}
Mostly supported. Unknown target milestone.</td></tr>
<tr><td>page-break-inside</td><td>ENUM(avoid, auto), Opera only. Unknown target milestone.</td></tr>
<tr class="impl-no"><td>quotes</td><td>May be dropped from CSS2, fairly useless for inline context</td></tr>
<tr class="impl-no"><td>visibility</td><td>ENUM(visible, hidden, collapse),
<tr class="danger impl-yes"><td>visibility</td><td>ENUM(visible, hidden, collapse),
Dangerous</td></tr>
<tr class="css1 feature"><td>white-space</td><td>ENUM(normal, pre, nowrap, pre-wrap,
<tr class="css1 feature impl-partial"><td>white-space</td><td>ENUM(normal, pre, nowrap, pre-wrap,
pre-line), Spotty implementation:
pre (no IE 5/6), nowrap (no IE 5),
pre (no IE 5/6), <em>nowrap</em> (no IE 5, supported),
pre-wrap (only Opera), pre-line (no support). Fixable? Unknown target milestone.</td></tr>
</tbody>
@@ -238,18 +244,18 @@ Mozilla on inside and needs -moz-outline, no IE support.</td></tr>
<tr><th colspan="3">Questionable</th></tr>
<tr class="impl-no"><td>accesskey</td><td>A</td><td>May interfere with main interface</td></tr>
<tr class="impl-no"><td>tabindex</td><td>A</td><td>May interfere with main interface</td></tr>
<tr><td>target</td><td>A</td><td>Config enabled, only useful for frame layouts, disallowed in strict</td></tr>
<tr class="impl-yes"><td>target</td><td>A</td><td>Config enabled, only useful for frame layouts, disallowed in strict</td></tr>
</tbody>
<tbody>
<tr><th colspan="3">Miscellaneous</th></tr>
<tr><td>datetime</td><td>DEL, INS</td><td>No visible effect, ISO format</td></tr>
<tr><td>rel</td><td>A</td><td>Largely user-defined: nofollow, tag (see microformats)</td></tr>
<tr><td>rev</td><td>A</td><td>Largely user-defined: vote-*</td></tr>
<tr class="impl-yes"><td>rel</td><td>A</td><td>Largely user-defined: nofollow, tag (see microformats)</td></tr>
<tr class="impl-yes"><td>rev</td><td>A</td><td>Largely user-defined: vote-*</td></tr>
<tr class="feature"><td>axis</td><td>TD, TH</td><td>W3C only: No browser implementation</td></tr>
<tr class="feature"><td>char</td><td>COL, COLGROUP, TBODY, TD, TFOOT, TH, THEAD, TR</td><td>W3C only: No browser implementation</td></tr>
<tr class="feature"><td>headers</td><td>TD, TH</td><td>W3C only: No browser implementation</td></tr>
<tr class="feature"><td>scope</td><td>TD, TH</td><td>W3C only: No browser implementation</td></tr>
<tr class="impl-yes"><td>scope</td><td>TD, TH</td><td>W3C only: No browser implementation</td></tr>
</tbody>
<tbody class="impl-yes">
@@ -262,41 +268,42 @@ Mozilla on inside and needs -moz-outline, no IE support.</td></tr>
</tbody>
<tbody>
<tr><th colspan="3">Transform, target milestone 1.4</th></tr>
<tr><td rowspan="5">align</td><td>CAPTION</td><td>Near-equiv style 'caption-side', drop left and right</td></tr>
<tr><td>IMG</td><td rowspan="2">Margin-left and margin-right = auto or parent div</td></tr>
<tr><td>TABLE</td></tr>
<tr><td>HR</td><td>Near-equivalent style 'text-align' (Works for IE and Opera, but not Firefox). Also try <code>margin-right:auto; margin-left:0;</code> for left or <code>margin-right:0; margin-left:auto;</code> for right (optionally replacing 0 with the original margin for that side)</td></tr>
<tr><th colspan="3">Transform</th></tr>
<tr class="impl-yes"><td rowspan="5">align</td><td>CAPTION</td><td>'caption-side' for top/bottom, 'text-align' for left/right</td></tr>
<tr class="impl-yes"><td>IMG</td><td rowspan="3">See specimens/html-align-to-css.html</td></tr>
<tr class="impl-yes"><td>TABLE</td></tr>
<tr class="impl-yes"><td>HR</td></tr>
<tr class="impl-yes"><td>H1, H2, H3, H4, H5, H6, P</td><td>Equivalent style 'text-align'</td></tr>
<tr class="required impl-yes"><td>alt</td><td>IMG</td><td>Required, insert image filename if src is present or default invalid image text</td></tr>
<tr><td rowspan="3">bgcolor</td><td>TABLE</td><td>Equivalent style 'background-color'</td></tr>
<tr><td>TR</td><td>Equivalent style 'background-color'</td></tr>
<tr><td>TD, TH</td><td>Equivalent style 'background-color'</td></tr>
<tr><td>border</td><td>IMG</td><td>Near equivalent style 'border-width', as it only applies when link present</td></tr>
<tr><td>clear</td><td>BR</td><td>Near-equiv style 'clear', transform 'all' into 'both'</td></tr>
<tr class="impl-yes"><td rowspan="3">bgcolor</td><td>TABLE</td><td>Superset style 'background-color'</td></tr>
<tr class="impl-yes"><td>TR</td><td>Superset style 'background-color'</td></tr>
<tr class="impl-yes"><td>TD, TH</td><td>Superset style 'background-color'</td></tr>
<tr class="impl-yes"><td>border</td><td>IMG</td><td>Equivalent style <code>border:[number]px solid</code></td></tr>
<tr class="impl-yes"><td>clear</td><td>BR</td><td>Near-equiv style 'clear', transform 'all' into 'both'</td></tr>
<tr class="impl-no"><td>compact</td><td>DL, OL, UL</td><td>Boolean, needs custom CSS class; rarely used anyway</td></tr>
<tr class="required impl-yes"><td>dir</td><td>BDO</td><td>Required, insert ltr (or configuration value) if none</td></tr>
<tr><td>height</td><td>TD, TH</td><td>Near-equiv style 'height', needs px suffix if original was in pixels</td></tr>
<tr><td>hspace</td><td>IMG</td><td>Near-equiv styles 'margin-top' and 'margin-bottom', needs px suffix</td></tr>
<tr class="impl-yes"><td>height</td><td>TD, TH</td><td>Near-equiv style 'height', needs px suffix if original was in pixels</td></tr>
<tr class="impl-yes"><td>hspace</td><td>IMG</td><td>Near-equiv styles 'margin-top' and 'margin-bottom', needs px suffix</td></tr>
<tr class="impl-yes"><td>lang</td><td>*</td><td>Copy value to xml:lang</td></tr>
<tr><td rowspan="2">name</td><td>IMG</td><td>Turn into ID</td></tr>
<tr><td>A</td><td>Turn into ID? (not deprecated, though in which specs?)</td></tr>
<tr><td>noshade</td><td>HR</td><td>Boolean, style 'border-style:solid;'</td></tr>
<tr><td>nowrap</td><td>TD, TH</td><td>Boolean, style 'white-space:nowrap;' (not compat with IE5)</td></tr>
<tr><td>size</td><td>HR</td><td>Near-equiv 'width', needs px suffix if original was pixels</td></tr>
<tr class="impl-yes"><td rowspan="2">name</td><td>IMG</td><td>Turn into ID</td></tr>
<tr class="impl-yes"><td>A</td><td>Turn into ID</td></tr>
<tr class="impl-yes"><td>noshade</td><td>HR</td><td>Boolean, style 'border-style:solid;'</td></tr>
<tr class="impl-yes"><td>nowrap</td><td>TD, TH</td><td>Boolean, style 'white-space:nowrap;' (not compat with IE5)</td></tr>
<tr class="impl-yes"><td>size</td><td>HR</td><td>Near-equiv 'height', needs px suffix if original was pixels</td></tr>
<tr class="required impl-yes"><td>src</td><td>IMG</td><td>Required, insert blank or default img if not set</td></tr>
<tr class="impl-yes"><td>start</td><td>OL</td><td>Poorly supported 'counter-reset', allowed in loose, dropped in strict</td></tr>
<tr><td rowspan="3">type</td><td>LI</td><td rowspan="3">Equivalent style 'list-style-type', different allowed values though. (needs testing)</td></tr>
<tr><td>OL</td></tr>
<tr><td>UL</td></tr>
<tr class="impl-yes"><td rowspan="3">type</td><td>LI</td><td rowspan="3">Equivalent style 'list-style-type', different allowed values though. (needs testing)</td></tr>
<tr class="impl-yes"><td>OL</td></tr>
<tr class="impl-yes"><td>UL</td></tr>
<tr class="impl-yes"><td>value</td><td>LI</td><td>Poorly supported 'counter-reset', allowed in loose, dropped in strict</td></tr>
<tr><td>vspace</td><td>IMG</td><td>Near-equiv styles 'margin-left' and 'margin-right', needs px suffix, see hspace</td></tr>
<tr><td rowspan="2">width</td><td>HR</td><td rowspan="2">Near-equiv style 'width', needs px suffix if original was pixels</td></tr>
<tr><td>TD, TH</td></tr>
<tr class="impl-yes"><td>vspace</td><td>IMG</td><td>Near-equiv styles 'margin-left' and 'margin-right', needs px suffix, see hspace</td></tr>
<tr class="impl-yes"><td rowspan="2">width</td><td>HR</td><td rowspan="2">Near-equiv style 'width', needs px suffix if original was pixels</td></tr>
<tr class="impl-yes"><td>TD, TH</td></tr>
</tbody>
</table>
<div id="version">$Id$</div>
</body></html>
</body></html>
<!-- vim: et sw=4 sts=4
-->

850
docs/enduser-customize.html Normal file
View File

@@ -0,0 +1,850 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="description" content="Tutorial for customizing HTML Purifier's tag and attribute sets." />
<link rel="stylesheet" type="text/css" href="style.css" />
<title>Customize - HTML Purifier</title>
</head><body>
<h1 class="subtitled">Customize!</h1>
<div class="subtitle">HTML Purifier is a Swiss-Army Knife</div>
<div id="filing">Filed under End-User</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
<p>
HTML Purifier has this quirk where if you try to allow certain elements or
attributes, HTML Purifier will tell you that it's not supported, and that
you should go to the forums to find out how to implement it. Well, this
document is how to implement elements and attributes which HTML Purifier
doesn't support out of the box.
</p>
<h2>Is it necessary?</h2>
<p>
Before we even write any code, it is paramount to consider whether or
not the code we're writing is necessary or not. HTML Purifier, by default,
contains a large set of elements and attributes: large enough so that
<em>any</em> element or attribute in XHTML 1.0 or 1.1 (and its HTML variants)
that can be safely used by the general public is implemented.
</p>
<p>
So what needs to be implemented? (Feel free to skip this section if
you know what you want).
</p>
<h3>XHTML 1.0</h3>
<p>
All of the modules listed below are based off of the
<a href="http://www.w3.org/TR/2001/REC-xhtml-modularization-20010410/abstract_modules.html#sec_5.2.">modularization of
XHTML</a>, which, while technically for XHTML 1.1, is quite a useful
resource.
</p>
<ul>
<li>Structure</li>
<li>Frames</li>
<li>Applets (deprecated)</li>
<li>Forms</li>
<li>Image maps</li>
<li>Objects</li>
<li>Frames</li>
<li>Events</li>
<li>Meta-information</li>
<li>Style sheets</li>
<li>Link (not hypertext)</li>
<li>Base</li>
<li>Name</li>
</ul>
<p>
If you don't recognize it, you probably don't need it. But the curious
can look all of these modules up in the above-mentioned document. Note
that inline scripting comes packaged with HTML Purifier (more on this
later).
</p>
<h3>XHTML 1.1</h3>
<p>
As of HTMLPurifier 2.1.0, we have implemented the
<a href="http://www.w3.org/TR/2001/REC-ruby-20010531/">Ruby module</a>,
which defines a set of tags
for publishing short annotations for text, used mostly in Japanese
and Chinese school texts, but applicable for positioning any text (not
limited to translations) above or below other corresponding text.
</p>
<h3>HTML 5</h3>
<p>
<a href="http://www.whatwg.org/specs/web-apps/current-work/">HTML 5</a>
is a fork of HTML 4.01 by WHATWG, who believed that XHTML 2.0 was headed
in the wrong direction. It too is a working draft, and may change
drastically before publication, but it should be noted that the
<code>canvas</code> tag has been implemented by many browser vendors.
</p>
<h3>Proprietary</h3>
<p>
There are a number of proprietary tags still in the wild. Many of them
have been documented in <a href="ref-proprietary-tags.txt">ref-proprietary-tags.txt</a>,
but there is currently no implementation for any of them.
</p>
<h3>Extensions</h3>
<p>
There are also a number of other XML languages out there that can
be embedded in HTML documents: two of the most popular are MathML and
SVG, and I frequently get requests to implement these. But they are
expansive, comprehensive specifications, and it would take far too long
to implement them <em>correctly</em> (most systems I've seen go as far
as whitelisting tags and no further; come on, what about nesting!)
</p>
<p>
Word of warning: HTML Purifier is currently <em>not</em> namespace
aware.
</p>
<h2>Giving back</h2>
<p>
As you may imagine from the details above (don't be abashed if you didn't
read it all: a glance over would have done), there's quite a bit that
HTML Purifier doesn't implement. Recent architectural changes have
allowed HTML Purifier to implement elements and attributes that are not
safe! Don't worry, they won't be activated unless you set %HTML.Trusted
to true, but they certainly help out users who need to put, say, forms
on their page and don't want to go through the trouble of reading this
and implementing it themself.
</p>
<p>
So any of the above that you implement for your own application could
help out some other poor sap on the other side of the globe. Help us
out, and send back code so that it can be hammered into a module and
released with the core. Any code would be greatly appreciated!
</p>
<h2>And now...</h2>
<p>
Enough philosophical talk, time for some code:
</p>
<pre>$config = HTMLPurifier_Config::createDefault();
$config-&gt;set('HTML.DefinitionID', 'enduser-customize.html tutorial');
$config-&gt;set('HTML.DefinitionRev', 1);
if ($def = $config-&gt;maybeGetRawHTMLDefinition()) {
// our code will go here
}</pre>
<p>
Assuming that HTML Purifier has already been properly loaded (hint:
include <code>HTMLPurifier.auto.php</code>), this code will set up
the environment that you need to start customizing the HTML definition.
What's going on?
</p>
<ul>
<li>
The first three lines are regular configuration code:
<ul>
<li>
%HTML.DefinitionID is set to a unique identifier for your
custom HTML definition. This prevents it from clobbering
other custom definitions on the same installation.
</li>
<li>
%HTML.DefinitionRev is a revision integer of your HTML
definition. Because HTML definitions are cached, you'll need
to increment this whenever you make a change in order to flush
the cache.
</li>
</ul>
</li>
<li>
The fourth line retrieves a raw <code>HTMLPurifier_HTMLDefinition</code>
object that we will be tweaking. Interestingly enough, we have
placed it in an if block: this is because
<code>maybeGetRawHTMLDefinition</code>, as its name suggests, may
return a NULL, in which case we should skip doing any
initialization. This, in fact, will correspond to when our fully
customized object is already in the cache.
</li>
</ul>
<h2>Turn off caching</h2>
<p>
To make development easier, we're going to temporarily turn off
definition caching:
</p>
<pre>$config = HTMLPurifier_Config::createDefault();
$config-&gt;set('HTML.DefinitionID', 'enduser-customize.html tutorial');
$config-&gt;set('HTML.DefinitionRev', 1);
<strong>$config-&gt;set('Cache.DefinitionImpl', null); // TODO: remove this later!</strong>
$def = $config-&gt;getHTMLDefinition(true);</pre>
<p>
A few things should be mentioned about the caching mechanism before
we move on. For performance reasons, HTML Purifier caches generated
<code>HTMLPurifier_Definition</code> objects in serialized files
stored (by default) in <code>library/HTMLPurifier/DefinitionCache/Serializer</code>.
A lot of processing is done in order to create these objects, so it
makes little sense to repeat the same processing over and over again
whenever HTML Purifier is called.
</p>
<p>
In order to identify a cache entry, HTML Purifier uses three variables:
the library's version number, the value of %HTML.DefinitionRev and
a serial of relevant configuration. Whenever any of these changes,
a new HTML definition is generated. Notice that there is no way
for the definition object to track changes to customizations: here, it
is up to you to supply appropriate information to DefinitionID and
DefinitionRev.
</p>
<h2 id="addAttribute">Add an attribute</h2>
<p>
For this example, we're going to implement the <code>target</code> attribute found
on <code>a</code> elements. To implement an attribute, we have to
ask a few questions:
</p>
<ol>
<li>What element is it found on?</li>
<li>What is its name?</li>
<li>Is it required or optional?</li>
<li>What are valid values for it?</li>
</ol>
<p>
The first three are easy: the element is <code>a</code>, the attribute
is <code>target</code>, and it is not a required attribute. (If it
was required, we'd need to append an asterisk to the attribute name,
you'll see an example of this in the addElement() example).
</p>
<p>
The last question is a little trickier.
Lets allow the special values: _blank, _self, _target and _top.
The form of this is called an <strong>enumeration</strong>, a list of
valid values, although only one can be used at a time. To translate
this into code form, we write:
</p>
<pre>$config = HTMLPurifier_Config::createDefault();
$config-&gt;set('HTML.DefinitionID', 'enduser-customize.html tutorial');
$config-&gt;set('HTML.DefinitionRev', 1);
$config-&gt;set('Cache.DefinitionImpl', null); // remove this later!
$def = $config-&gt;getHTMLDefinition(true);
<strong>$def->addAttribute('a', 'target', 'Enum#_blank,_self,_target,_top');</strong></pre>
<p>
The <code>Enum#_blank,_self,_target,_top</code> does all the magic.
The string is split into two parts, separated by a hash mark (#):
</p>
<ol>
<li>The first part is the name of what we call an <code>AttrDef</code></li>
<li>The second part is the parameter of the above-mentioned <code>AttrDef</code></li>
</ol>
<p>
If that sounds vague and generic, it's because it is! HTML Purifier defines
an assortment of different attribute types one can use, and each of these
has their own specialized parameter format. Here are some of the more useful
ones:
</p>
<table class="table">
<thead>
<tr>
<th>Type</th>
<th>Format</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<th>Enum</th>
<td><em>[s:]</em>value1,value2,...</td>
<td>
Attribute with a number of valid values, one of which may be used. When
s: is present, the enumeration is case sensitive.
</td>
</tr>
<tr>
<th>Bool</th>
<td>attribute_name</td>
<td>
Boolean attribute, with only one valid value: the name
of the attribute.
</td>
</tr>
<tr>
<th>CDATA</th>
<td></td>
<td>
Attribute of arbitrary text. Can also be referred to as <strong>Text</strong>
(the specification makes a semantic distinction between the two).
</td>
</tr>
<tr>
<th>ID</th>
<td></td>
<td>
Attribute that specifies a unique ID
</td>
</tr>
<tr>
<th>Pixels</th>
<td></td>
<td>
Attribute that specifies an integer pixel length
</td>
</tr>
<tr>
<th>Length</th>
<td></td>
<td>
Attribute that specifies a pixel or percentage length
</td>
</tr>
<tr>
<th>NMTOKENS</th>
<td></td>
<td>
Attribute that specifies a number of name tokens, example: the
<code>class</code> attribute
</td>
</tr>
<tr>
<th>URI</th>
<td></td>
<td>
Attribute that specifies a URI, example: the <code>href</code>
attribute
</td>
</tr>
<tr>
<th>Number</th>
<td></td>
<td>
Attribute that specifies an positive integer number
</td>
</tr>
</tbody>
</table>
<p>
For a complete list, consult
<a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/AttrTypes.php"><code>library/HTMLPurifier/AttrTypes.php</code></a>;
more information on attributes that accept parameters can be found on their
respective includes in
<a href="http://repo.or.cz/w/htmlpurifier.git?a=tree;hb=HEAD;f=library/HTMLPurifier/AttrDef"><code>library/HTMLPurifier/AttrDef</code></a>.
</p>
<p>
Sometimes, the restrictive list in AttrTypes just doesn't cut it. Don't
sweat: you can also use a fully instantiated object as the value. The
equivalent, verbose form of the above example is:
</p>
<pre>$config = HTMLPurifier_Config::createDefault();
$config-&gt;set('HTML.DefinitionID', 'enduser-customize.html tutorial');
$config-&gt;set('HTML.DefinitionRev', 1);
$config-&gt;set('Cache.DefinitionImpl', null); // remove this later!
$def = $config-&gt;getHTMLDefinition(true);
<strong>$def-&gt;addAttribute('a', 'target', new HTMLPurifier_AttrDef_Enum(
array('_blank','_self','_target','_top')
));</strong></pre>
<p>
Trust me, you'll learn to love the shorthand.
</p>
<h2>Add an element</h2>
<p>
Adding attributes is really small-fry stuff, though, and it was possible
to add them (albeit a bit more wordy) prior to 2.0. The real gem of
the Advanced API is adding elements. There are five questions to
ask when adding a new element:
</p>
<ol>
<li>What is the element's name?</li>
<li>What content set does this element belong to?</li>
<li>What are the allowed children of this element?</li>
<li>What attributes does the element allow that are general?</li>
<li>What attributes does the element allow that are specific to this element?</li>
</ol>
<p>
It's a mouthful, and you'll be slightly lost if your not familiar with
the HTML specification, so let's explain them step by step.
</p>
<h3>Content set</h3>
<p>
The HTML specification defines two major content sets: Inline
and Block. Each of these
content sets contain a list of elements: Inline contains things like
<code>span</code> and <code>b</code> while Block contains things like
<code>div</code> and <code>blockquote</code>.
</p>
<p>
These content sets amount to a macro mechanism for HTML definition. Most
elements in HTML are organized into one of these two sets, and most
elements in HTML allow elements from one of these sets. If we had
to write each element verbatim into each other element's allowed
children, we would have ridiculously large lists; instead we use
content sets to compactify the declaration.
</p>
<p>
Practically speaking, there are several useful values you can use here:
</p>
<table class="table">
<thead>
<tr>
<th>Content set</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<th>Inline</th>
<td>Character level elements, text</td>
</tr>
<tr>
<th>Block</th>
<td>Block-like elements, like paragraphs and lists</td>
</tr>
<tr>
<th><em>false</em></th>
<td>
Any element that doesn't fit into the mold, for example <code>li</code>
or <code>tr</code>
</td>
</tr>
</tbody>
</table>
<p>
By specifying a valid value here, all other elements that use that
content set will also allow your element, without you having to do
anything. If you specify <em>false</em>, you'll have to register
your element manually.
</p>
<h3>Allowed children</h3>
<p>
Allowed children defines the elements that this element can contain.
The allowed values may range from none to a complex regexp depending on
your element.
</p>
<p>
If you've ever taken a look at the HTML DTD's before, you may have
noticed declarations like this:
</p>
<pre>&lt;!ELEMENT LI - O (%flow;)* -- list item --&gt;</pre>
<p>
The <code>(%flow;)*</code> indicates the allowed children of the
<code>li</code> tag: <code>li</code> allows any number of flow
elements as its children. (The <code>- O</code> allows the closing tag to be
omitted, though in XML this is not allowed.) In HTML Purifier,
we'd write it like <code>Flow</code> (here's where the content sets
we were discussing earlier come into play). There are three shorthand
content models you can specify:
</p>
<table class="table">
<thead>
<tr>
<th>Content model</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<th>Empty</th>
<td>No children allowed, like <code>br</code> or <code>hr</code></td>
</tr>
<tr>
<th>Inline</th>
<td>Any number of inline elements and text, like <code>span</code></td>
</tr>
<tr>
<th>Flow</th>
<td>Any number of inline elements, block elements and text, like <code>div</code></td>
</tr>
</tbody>
</table>
<p>
This covers 90% of all the cases out there, but what about elements that
break the mold like <code>ul</code>? This guy requires at least one
child, and the only valid children for it are <code>li</code>. The
content model is: <code>Required: li</code>. There are two parts: the
first type determines what <code>ChildDef</code> will be used to validate
content models. The most common values are:
</p>
<table class="table">
<thead>
<tr>
<th>Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<th>Required</th>
<td>Children must be one or more of the valid elements</td>
</tr>
<tr>
<th>Optional</th>
<td>Children can be any number of the valid elements</td>
</tr>
<tr>
<th>Custom</th>
<td>Children must follow the DTD-style regex</td>
</tr>
</tbody>
</table>
<p>
You can also implement your own <code>ChildDef</code>: this was done
for a few special cases in HTML Purifier such as <code>Chameleon</code>
(for <code>ins</code> and <code>del</code>), <code>StrictBlockquote</code>
and <code>Table</code>.
</p>
<p>
The second part specifies either valid elements or a regular expression.
Valid elements are separated with horizontal bars (|), i.e.
"<code>a | b | c</code>". Use #PCDATA to represent plain text.
Regular expressions are based off of DTD's style:
</p>
<ul>
<li>Parentheses () are used for grouping</li>
<li>Commas (,) separate elements that should come one after another</li>
<li>Horizontal bars (|) indicate one or the other elements should be used</li>
<li>Plus signs (+) are used for a one or more match</li>
<li>Asterisks (*) are used for a zero or more match</li>
<li>Question marks (?) are used for a zero or one match</li>
</ul>
<p>
For example, "<code>a, b?, (c | d), e+, f*</code>" means "In this order,
one <code>a</code> element, at most one <code>b</code> element,
one <code>c</code> or <code>d</code> element (but not both), one or more
<code>e</code> elements, and any number of <code>f</code> elements."
Regex veterans should be able to jump right in, and those not so savvy
can always copy-paste W3C's content model definitions into HTML Purifier
and hope for the best.
</p>
<p>
A word of warning: while the regex format is extremely flexible on
the developer's side, it is
quite unforgiving on the user's side. If the user input does not <em>exactly</em>
match the specification, the entire contents of the element will
be nuked. This is why there is are specific content model types like
Optional and Required: while they could be implemented as <code>Custom:
(valid | elements)*</code>, the custom classes contain special recovery
measures that make sure as much of the user's original content gets
through. HTML Purifier's core, as a rule, does not use Custom.
</p>
<p>
One final note: you can also use Content Sets inside your valid elements
lists or regular expressions. In fact, the three shorthand content models
mentioned above are just that: abbreviations:
</p>
<table class="table">
<thead>
<tr>
<th>Content model</th>
<th>Implementation</th>
</tr>
</thead>
<tbody>
<tr>
<th>Inline</th>
<td>Optional: Inline | #PCDATA</td>
</tr>
<tr>
<th>Flow</th>
<td>Optional: Flow | #PCDATA</td>
</tr>
</tbody>
</table>
<p>
When the definition is compiled, Inline will be replaced with a
horizontal-bar separated list of inline elements. Also, notice that
it does not contain text: you have to specify that yourself.
</p>
<h3>Common attributes</h3>
<p>
Congratulations: you have just gotten over the proverbial hump (Allowed
children). Common attributes is much simpler, and boils down to
one question: does your element have the <code>id</code>, <code>style</code>,
<code>class</code>, <code>title</code> and <code>lang</code> attributes?
If so, you'll want to specify the <code>Common</code> attribute collection,
which contains these five attributes that are found on almost every
HTML element in the specification.
</p>
<p>
There are a few more collections, but they're really edge cases:
</p>
<table class="table">
<thead>
<tr>
<th>Collection</th>
<th>Attributes</th>
</tr>
</thead>
<tbody>
<tr>
<th>I18N</th>
<td><code>lang</code>, possibly <code>xml:lang</code></td>
</tr>
<tr>
<th>Core</th>
<td><code>style</code>, <code>class</code>, <code>id</code> and <code>title</code></td>
</tr>
</tbody>
</table>
<p>
Common is a combination of the above-mentioned collections.
</p>
<p class="aside">
Readers familiar with the modularization may have noticed that the Core
attribute collection differs from that specified by the <a
href="http://www.w3.org/TR/xhtml-modularization/abstract_modules.html#s_commonatts">abstract
modules of the XHTML Modularization 1.1</a>. We believe this section
to be in error, as <code>br</code> permits the use of the <code>style</code>
attribute even though it uses the <code>Core</code> collection, and
the DTD and XML Schemas supplied by W3C support our interpretation.
</p>
<h3>Attributes</h3>
<p>
If you didn't read the <a href="#addAttribute">earlier section on
adding attributes</a>, read it now. The last parameter is simply
an array of attribute names to attribute implementations, in the exact
same format as <code>addAttribute()</code>.
</p>
<h3>Putting it all together</h3>
<p>
We're going to implement <code>form</code>. Before we embark, lets
grab a reference implementation from over at the
<a href="http://www.w3.org/TR/html4/sgml/loosedtd.html">transitional DTD</a>:
</p>
<pre>&lt;!ELEMENT FORM - - (%flow;)* -(FORM) -- interactive form --&gt;
&lt;!ATTLIST FORM
%attrs; -- %coreattrs, %i18n, %events --
action %URI; #REQUIRED -- server-side form handler --
method (GET|POST) GET -- HTTP method used to submit the form--
enctype %ContentType; &quot;application/x-www-form-urlencoded&quot;
accept %ContentTypes; #IMPLIED -- list of MIME types for file upload --
name CDATA #IMPLIED -- name of form for scripting --
onsubmit %Script; #IMPLIED -- the form was submitted --
onreset %Script; #IMPLIED -- the form was reset --
target %FrameTarget; #IMPLIED -- render in this frame --
accept-charset %Charsets; #IMPLIED -- list of supported charsets --
&gt;</pre>
<p>
Juicy! With just this, we can answer four of our five questions:
</p>
<ol>
<li>What is the element's name? <strong>form</strong></li>
<li>What content set does this element belong to? <strong>Block</strong>
(this needs a little sleuthing, I find the easiest way is to search
the DTD for <code>FORM</code> and determine which set it is in.)</li>
<li>What are the allowed children of this element? <strong>One
or more flow elements, but no nested <code>form</code>s</strong></li>
<li>What attributes does the element allow that are general? <strong>Common</strong></li>
<li>What attributes does the element allow that are specific to this element? <strong>A whole bunch, see ATTLIST;
we're going to do the vital ones: <code>action</code>, <code>method</code> and <code>name</code></strong></li>
</ol>
<p>
Time for some code:
</p>
<pre>$config = HTMLPurifier_Config::createDefault();
$config-&gt;set('HTML.DefinitionID', 'enduser-customize.html tutorial');
$config-&gt;set('HTML.DefinitionRev', 1);
$config-&gt;set('Cache.DefinitionImpl', null); // remove this later!
$def = $config-&gt;getHTMLDefinition(true);
$def-&gt;addAttribute('a', 'target', new HTMLPurifier_AttrDef_Enum(
array('_blank','_self','_target','_top')
));
<strong>$form = $def-&gt;addElement(
'form', // name
'Block', // content set
'Flow', // allowed children
'Common', // attribute collection
array( // attributes
'action*' => 'URI',
'method' => 'Enum#get|post',
'name' => 'ID'
)
);
$form-&gt;excludes = array('form' => true);</strong></pre>
<p>
Each of the parameters corresponds to one of the questions we asked.
Notice that we added an asterisk to the end of the <code>action</code>
attribute to indicate that it is required. If someone specifies a
<code>form</code> without that attribute, the tag will be axed.
Also, the extra line at the end is a special extra declaration that
prevents forms from being nested within each other.
</p>
<p>
And that's all there is to it! Implementing the rest of the form
module is left as an exercise to the user; to see more examples
check the <a href="http://repo.or.cz/w/htmlpurifier.git?a=tree;hb=HEAD;f=library/HTMLPurifier/HTMLModule"><code>library/HTMLPurifier/HTMLModule/</code></a> directory
in your local HTML Purifier installation.
</p>
<h2>And beyond...</h2>
<p>
Perceptive users may have realized that, to a certain extent, we
have simply re-implemented the facilities of XML Schema or the
Document Type Definition. What you are seeing here, however, is
not just an XML Schema or Document Type Definition: it is a fully
expressive method of specifying the definition of HTML that is
a portable superset of the capabilities of the two above-mentioned schema
languages. What makes HTMLDefinition so powerful is the fact that
if we don't have an implementation for a content model or an attribute
definition, you can supply it yourself by writing a PHP class.
</p>
<p>
There are many facets of HTMLDefinition beyond the Advanced API I have
walked you through today. To find out more about these, you can
check out these source files:
</p>
<ul>
<li><a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/HTMLModule.php"><code>library/HTMLPurifier/HTMLModule.php</code></a></li>
<li><a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ElementDef.php"><code>library/HTMLPurifier/ElementDef.php</code></a></li>
</ul>
<h2 id="optimized">Notes for HTML Purifier 4.2.0 and earlier</h3>
<p>
Previously, this tutorial gave some incorrect template code for
editing raw definitions, and that template code will now produce the
error <q>Due to a documentation error in previous version of HTML
Purifier...</q> Here is how to mechanically transform old-style
code into new-style code.
</p>
<p>
First, identify all code that edits the raw definition object, and
put it together. Ensure none of this code must be run on every
request; if some sub-part needs to always be run, move it outside
this block. Here is an example below, with the raw definition
object code bolded.
</p>
<pre>$config = HTMLPurifier_Config::createDefault();
$config-&gt;set('HTML.DefinitionID', 'enduser-customize.html tutorial');
$config-&gt;set('HTML.DefinitionRev', 1);
$def = $config-&gt;getHTMLDefinition(true);
<strong>$def->addAttribute('a', 'target', 'Enum#_blank,_self,_target,_top');</strong>
$purifier = new HTMLPurifier($config);</pre>
<p>
Next, replace the raw definition retrieval with a
maybeGetRawHTMLDefinition method call inside an if conditional, and
place the editing code inside that if block.
</p>
<pre>$config = HTMLPurifier_Config::createDefault();
$config-&gt;set('HTML.DefinitionID', 'enduser-customize.html tutorial');
$config-&gt;set('HTML.DefinitionRev', 1);
<strong>if ($def = $config-&gt;maybeGetRawHTMLDefinition()) {
$def->addAttribute('a', 'target', 'Enum#_blank,_self,_target,_top');
}</strong>
$purifier = new HTMLPurifier($config);</pre>
<p>
And you're done! Alternatively, if you're OK with not ever caching
your code, the following will still work and not emit warnings.
</p>
<pre>$config = HTMLPurifier_Config::createDefault();
$def = $config-&gt;getHTMLDefinition(true);
$def->addAttribute('a', 'target', 'Enum#_blank,_self,_target,_top');
$purifier = new HTMLPurifier($config);</pre>
<p>
A slightly less efficient version of this was what was going on with
old versions of HTML Purifier.
</p>
<p>
<em>Technical notes:</em> ajh pointed out on <a
href="http://htmlpurifier.org/phorum/read.php?5,5164,5169#msg-5169">in a forum topic</a> that
HTML Purifier appeared to be repeatedly writing to the cache even
when a cache entry already existed. Investigation lead to the
discovery of the following infelicity: caching of customized
definitions didn't actually work! The problem was that even though
a cache file would be written out at the end of the process, there
was no way for HTML Purifier to say, <q>Actually, I've already got a
copy of your work, no need to reconfigure your
customizations</q>. This required the API to change: placing
all of the customizations to the raw definition object in a
conditional which could be skipped.
</p>
</body></html>
<!-- vim: et sw=4 sts=4
-->

View File

@@ -15,7 +15,7 @@
<div id="filing">Filed under End-User</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://hp.jpsband.org/">HTML Purifier</a> End-User Documentation</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
<p>Prior to HTML Purifier 1.2.0, this library blithely accepted user input that
looked like this:</p>
@@ -31,7 +31,7 @@ by default.</p>
<p>IDs, however, are quite useful functionality to have, so if users start
complaining about broken anchors you'll probably want to turn them back on
with %HTML.EnableAttrID. But before you go mucking around with the config
with %Attr.EnableID. But before you go mucking around with the config
object, it's probably worth to take some precautions to keep your page
validating. Why?</p>
@@ -56,9 +56,9 @@ validating. Why?</p>
deal with the most obvious solution: preventing users from using any IDs that
appear elsewhere on the document. The method is simple:</p>
<pre>$config->set('HTML', 'EnableAttrID', true);
$config->set('Attr', 'IDBlacklist' array(
'list', 'of', 'attributes', 'that', 'are', 'forbidden'
<pre>$config-&gt;set('Attr.EnableID', true);
$config-&gt;set('Attr.IDBlacklist' array(
'list', 'of', 'attribute', 'values', 'that', 'are', 'forbidden'
));</pre>
<p>That being said, there are some notable drawbacks. First of all, you have to
@@ -71,9 +71,9 @@ to possible standards-compliance issues.</p>
<p>Furthermore, this position becomes untenable when a single web page must hold
multiple portions of user-submitted content. Since there's obviously no way
to find out before-hand what IDs users will use, the blacklist is helpless.
And even since HTML Purifier validates each segment seperately, perhaps doing
And since HTML Purifier validates each segment separately, perhaps doing
so at different times, it would be extremely difficult to dynamically update
the blacklist inbetween runs.</p>
the blacklist in between runs.</p>
<p>Finally, simply destroying the ID is extremely un-userfriendly behavior: after
all, they might have simply specified a duplicate ID by accident.</p>
@@ -88,8 +88,8 @@ all, they might have simply specified a duplicate ID by accident.</p>
<p>This method, too, is quite simple: add a prefix to all user IDs. With this
code:</p>
<pre>$config->set('HTML', 'EnableAttrID', true);
$config->set('Attr', 'IDPrefix', 'user_');</pre>
<pre>$config-&gt;set('Attr.EnableID', true);
$config-&gt;set('Attr.IDPrefix', 'user_');</pre>
<p>...this:</p>
@@ -109,7 +109,7 @@ user_ to the beginning.&quot;</p>
nothing about multiple HTML Purifier outputs on one page. Thus, we have
a second configuration value to piggy-back off of: %Attr.IDPrefixLocal:</p>
<pre>$config->set('Attr', 'IDPrefixLocal', 'comment' . $id . '_');</pre>
<pre>$config-&gt;set('Attr.IDPrefixLocal', 'comment' . $id . '_');</pre>
<p>This new attributes does nothing but append on to regular IDPrefix, but is
special in that it is volatile: it's value is determined at run-time and
@@ -137,11 +137,12 @@ anchors is beyond me.</p>
<p>To revert back to pre-1.2.0 behavior, simply:</p>
<pre>$config->set('HTML', 'EnableAttrID', true);</pre>
<pre>$config-&gt;set('Attr.EnableID', true);</pre>
<p>Don't come crying to me when your page mysteriously stops validating, though.</p>
<div id="version">$Id$</div>
</body>
</html>
</html>
<!-- vim: et sw=4 sts=4
-->

View File

@@ -36,7 +36,7 @@ forgiving lexer. You may also be interested in the unit tests located in the
tests/ folder, which provide a living document on how exactly the filter deals
with malformed input.
In summary:
In summary (see corresponding classes for more details):
1. Parse document into an array of tag and text tokens (Lexer)
2. Remove all elements not on whitelist and transform certain other elements
@@ -55,3 +55,5 @@ HTML tags. Things like blog comments are, in all likelihood, most appropriately
written in an extremely restrictive set of markup that doesn't require
all this functionality (or not written in HTML at all), although this may
be changing in the future with the addition of levels of filtering.
vim: et sw=4 sts=4

View File

@@ -6,45 +6,13 @@ through negligence of people. This class will do its job: no more, no less,
and it's up to you to provide it the proper information and proper context
to be effective. Things to remember:
1. Character Encoding: UTF-8.
This segment will soon be obsoleted by enduser-utf8.html
Currently, the parser runs under the assumption that it is dealing
with UTF-8. Not ISO-8859-1 or Windows-1252, UTF-8. And definitely not "no
character encoding explicitly stated" or UTF-7. If you're not using UTF-8 as
your character encoding, make sure you configure HTML Purifier or switch
to UTF-8. Now. Also, make sure any input is properly converted to UTF-8, or
the parser will mangle it badly (though it won't be a security risk if you're
outputting it as UTF-8 though). Character encoding is, in general, a knotty
issue, but do yourself a favor and learn about it:
<http://www.joelonsoftware.com/articles/Unicode.html>
1. Character Encoding: see enduser-utf8.html for more info.
2. Doctype: XHTML 1.0 Transitional
This is what the parser is outputting. For the most
part, it's compatible with HTML 4.01, but XHTML enforces some very nice things
that all web developers should use. Regardless, NO DOCTYPE is a NO. Quirks mode
has waaaay too many quirks for a little parser to handle. We did not select
strict in order to prevent ourselves from being too draconic on users, but
this may be configurable in the future. Do you want standards compliance?
The doctype is a good place to start.
2. IDs: see enduser-id.html for more info
3. IDs
This segment is obsoleted by enduser-id.html
They need to be unique, but without some knowledge of the
rest of the document, it's difficult to know what's unique. %Attr.IDBlacklist
needs to be set: we may want to consider disallowing IDs by default to
save lazy programmers.
3. URIs: see enduser-uri-filter.html
4. [PROJECTED] Links
We're not going to try for spam protection (although
some hooks for such a module might be nice) but we may offer the ability to
only accept relative URLs. Pick the one that's right for you.
4. CSS: document pending
Explain which CSS styles we blocked and why.
5. CSS
While we can prevent the most flagrant cases from affecting your
layout (such as absolutely positioned elements), no amount of code is going
to protect your pages from being attacked by garish colors and plain old
bad taste. A neat feature would be the ability to define acceptable colors
in a document, but that's not likely to be implemented for a while. In the
meantime, be sure to make sure that floated elements (permitted, since they
can be quite useful) can't mess up your layout. Once again, we may want to
disable this by default to protect lazy developers.
vim: et sw=4 sts=4

View File

@@ -15,27 +15,27 @@
<div id="filing">Filed under End-User</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://hp.jpsband.org/">HTML Purifier</a> End-User Documentation</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
<p>HTML Purifier is a very powerful library. But with power comes great
responsibility, in the form of longer execution times. Remember, this
library isn't lightly grazing over submitted HTML: it's deconstructing
the whole thing, rigorously checking the parts, and then putting it back
<p>HTML Purifier is a very powerful library. But with power comes great
responsibility, in the form of longer execution times. Remember, this
library isn't lightly grazing over submitted HTML: it's deconstructing
the whole thing, rigorously checking the parts, and then putting it back
together. </p>
<p>So, if it so turns out that HTML Purifier is kinda too slow for outbound
<p>So, if it so turns out that HTML Purifier is kinda too slow for outbound
filtering, you've got a few options: </p>
<h2>Inbound filtering</h2>
<p>Perform filtering of HTML when it's submitted by the user. Since the
user is already submitting something, an extra half a second tacked on
to the load time probably isn't going to be that huge of a problem.
Then, displaying the content is a simple a manner of outputting it
directly from your database/filesystem. The trouble with this method is
that your user loses the original text, and when doing edits, will be
handling the filtered text. While this may be a good thing, especially
if you're using a WYSIWYG editor, it can also result in data-loss if a
<p>Perform filtering of HTML when it's submitted by the user. Since the
user is already submitting something, an extra half a second tacked on
to the load time probably isn't going to be that huge of a problem.
Then, displaying the content is a simple a manner of outputting it
directly from your database/filesystem. The trouble with this method is
that your user loses the original text, and when doing edits, will be
handling the filtered text. While this may be a good thing, especially
if you're using a WYSIWYG editor, it can also result in data-loss if a
user makes a typo. </p>
<p>Example (non-functional):</p>
@@ -66,14 +66,14 @@ user makes a typo. </p>
<h2>Caching the filtered output</h2>
<p>Accept the submitted text and put it unaltered into the database, but
then also generate a filtered version and stash that in the database.
Serve the filtered version to readers, and the unaltered version to
editors. If need be, you can invalidate the cache and have the cached
filtered version be regenerated on the first page view. Pros? Full data
retention. Cons? It's more complicated, and opens other editors up to
XSS if they are using a WYSIWYG editor (to fix that, they'd have to be
able to get their hands on the *really* original text served in
<p>Accept the submitted text and put it unaltered into the database, but
then also generate a filtered version and stash that in the database.
Serve the filtered version to readers, and the unaltered version to
editors. If need be, you can invalidate the cache and have the cached
filtered version be regenerated on the first page view. Pros? Full data
retention. Cons? It's more complicated, and opens other editors up to
XSS if they are using a WYSIWYG editor (to fix that, they'd have to be
able to get their hands on the *really* original text served in
plaintext mode). </p>
<p>Example (non-functional):</p>
@@ -108,10 +108,13 @@ plaintext mode). </p>
<p>In short, inbound filtering is the simple option and caching is the
robust option (albeit with bigger storage requirements). </p>
<p>There is a third option, independent of the two we've discussed: profile
and optimize HTMLPurifier yourself. Be sure to report back your results
if you decide to do that! Especially if you port HTML Purifier to C++.
<p>There is a third option, independent of the two we've discussed: profile
and optimize HTMLPurifier yourself. Be sure to report back your results
if you decide to do that! Especially if you port HTML Purifier to C++.
<tt>;-)</tt></p>
</body>
</html>
</html>
<!-- vim: et sw=4 sts=4
-->

231
docs/enduser-tidy.html Normal file
View File

@@ -0,0 +1,231 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="description" content="Tutorial for tweaking HTML Purifier's Tidy-like behavior." />
<link rel="stylesheet" type="text/css" href="style.css" />
<title>Tidy - HTML Purifier</title>
</head><body>
<h1>Tidy</h1>
<div id="filing">Filed under Development</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
<p>You've probably heard of HTML Tidy, Dave Raggett's little piece
of software that cleans up poorly written HTML. Let me say it straight
out:</p>
<p class="emphasis">This ain't HTML Tidy!</p>
<p>Rather, Tidy stands for a cool set of Tidy-inspired features in HTML Purifier
that allows users to submit deprecated elements and attributes and get
valid strict markup back. For example:</p>
<pre>&lt;center&gt;Centered&lt;/center&gt;</pre>
<p>...becomes:</p>
<pre>&lt;div style=&quot;text-align:center;&quot;&gt;Centered&lt;/div&gt;</pre>
<p>...when this particular fix is run on the HTML. This tutorial will give
you the lowdown of what exactly HTML Purifier will do when Tidy
is on, and how to fine-tune this behavior. Once again, <strong>you do
not need Tidy installed on your PHP to use these features!</strong></p>
<h2>What does it do?</h2>
<p>Tidy will do several things to your HTML:</p>
<ul>
<li>Convert deprecated elements and attributes to standards-compliant
alternatives</li>
<li>Enforce XHTML compatibility guidelines and other best practices</li>
<li>Preserve data that would normally be removed as per W3C</li>
</ul>
<h2>What are levels?</h2>
<p>Levels describe how aggressive the Tidy module should be when
cleaning up HTML. There are four levels to pick: none, light, medium
and heavy. Each of these levels has a well-defined set of behavior
associated with it, although it may change depending on your doctype.</p>
<dl>
<dt>light</dt>
<dd>This is the <strong>lenient</strong> level. If a tag or attribute
is about to be removed because it isn't supported by the
doctype, Tidy will step in and change into an alternative that
is supported.</dd>
<dt>medium</dt>
<dd>This is the <strong>correctional</strong> level. At this level,
all the functions of light are performed, as well as some extra,
non-essential best practices enforcement. Changes made on this
level are very benign and are unlikely to cause problems.</dd>
<dt>heavy</dt>
<dd>This is the <strong>aggressive</strong> level. If a tag or
attribute is deprecated, it will be converted into a non-deprecated
version, no ifs ands or buts.</dd>
</dl>
<p>By default, Tidy operates on the <strong>medium</strong> level. You can
change the level of cleaning by setting the %HTML.TidyLevel configuration
directive:</p>
<pre>$config-&gt;set('HTML.TidyLevel', 'heavy'); // burn baby burn!</pre>
<h2>Is the light level really light?</h2>
<p>It depends on what doctype you're using. If your documents are HTML
4.01 <em>Transitional</em>, HTML Purifier will be lazy
and won't clean up your <code>center</code>
or <code>font</code> tags. But if you're using HTML 4.01 <em>Strict</em>,
HTML Purifier has no choice: it has to convert them, or they will
be nuked out of existence. So while light on Transitional will result
in little to no changes, light on Strict will still result in quite
a lot of fixes.</p>
<p>This is different behavior from 1.6 or before, where deprecated
tags in transitional documents would
always be cleaned up regardless. This is also better behavior.</p>
<h2>My pages look different!</h2>
<p>HTML Purifier is tasked with converting deprecated tags and
attributes to standards-compliant alternatives, which usually
need copious amounts of CSS. It's also not foolproof: sometimes
things do get lost in the translation. This is why when HTML Purifier
can get away with not doing cleaning, it won't; this is why
the default value is <strong>medium</strong> and not heavy.</p>
<p>Fortunately, only a few attributes have problems with the switch
over. They are described below:</p>
<table class="table">
<thead><tr>
<th>Element@Attr</th>
<th>Changes</th>
</tr></thead>
<tbody>
<tr>
<td>caption@align</td>
<td>Firefox supports stuffing the caption on the
left and right side of the table, a feature that
Internet Explorer, understandably, does not have.
When align equals right or left, the text will simply
be aligned on the left or right side.</td>
</tr>
<tr>
<td>img@align</td>
<td>The implementation for align bottom is good, but not
perfect. There are a few pixel differences.</td>
</tr>
<tr>
<td>br@clear</td>
<td>Clear both gets a little wonky in Internet Explorer. Haven't
really been able to figure out why.</td>
</tr>
<tr>
<td>hr@noshade</td>
<td>All browsers implement this slightly differently: we've
chosen to make noshade horizontal rules gray.</td>
</tr>
</tbody>
</table>
<p>There are a few more minor, although irritating, bugs.
Some older browsers support deprecated attributes,
but not CSS. Transformed elements and attributes will look unstyled
to said browsers. Also, CSS precedence is slightly different for
inline styles versus presentational markup. In increasing precedence:</p>
<ol>
<li>Presentational attributes</li>
<li>External style sheets</li>
<li>Inline styling</li>
</ol>
<p>This means that styling that may have been masked by external CSS
declarations will start showing up (a good thing, perhaps). Finally,
if you've turned off the style attribute, almost all of
these transformations will not work. Sorry mates.</p>
<p>You can review the rendering before and after of these transformations
by consulting the <a
href="http://htmlpurifier.org/live/smoketests/attrTransform.php">attrTransform.php
smoketest</a>.</p>
<h2>I like the general idea, but the specifics bug me!</h2>
<p>So you want HTML Purifier to clean up your HTML, but you're not
so happy about the br@clear implementation. That's perfectly fine!
HTML Purifier will make accomodations:</p>
<pre>$config-&gt;set('HTML.Doctype', 'XHTML 1.0 Transitional');
$config-&gt;set('HTML.TidyLevel', 'heavy'); // all changes, minus...
<strong>$config-&gt;set('HTML.TidyRemove', 'br@clear');</strong></pre>
<p>That third line does the magic, removing the br@clear fix
from the module, ensuring that <code>&lt;br clear="both" /&gt;</code>
will pass through unharmed. The reverse is possible too:</p>
<pre>$config-&gt;set('HTML.Doctype', 'XHTML 1.0 Transitional');
$config-&gt;set('HTML.TidyLevel', 'none'); // no changes, plus...
<strong>$config-&gt;set('HTML.TidyAdd', 'p@align');</strong></pre>
<p>In this case, all transformations are shut off, except for the p@align
one, which you found handy.</p>
<p>To find out what the names of fixes you want to turn on or off are,
you'll have to consult the source code, specifically the files in
<code>HTMLPurifier/HTMLModule/Tidy/</code>. There is, however, a
general syntax:</p>
<table class="table">
<thead>
<tr>
<th>Name</th>
<th>Example</th>
<th>Interpretation</th>
</tr>
</thead>
<tbody>
<tr>
<td>element</td>
<td>font</td>
<td>Tag transform for <em>element</em></td>
</tr>
<tr>
<td>element@attr</td>
<td>br@clear</td>
<td>Attribute transform for <em>attr</em> on <em>element</em></td>
</tr>
<tr>
<td>@attr</td>
<td>@lang</td>
<td>Global attribute transform for <em>attr</em></td>
</tr>
<tr>
<td>e#content_model_type</td>
<td>blockquote#content_model_type</td>
<td>Change of child processing implementation for <em>e</em></td>
</tr>
</tbody>
</table>
<h2>So... what's the lowdown?</h2>
<p>The lowdown is, quite frankly, HTML Purifier's default settings are
probably good enough. The next step is to bump the level up to heavy,
and if that still doesn't satisfy your appetite, do some fine-tuning.
Other than that, don't worry about it: this all works silently and
effectively in the background.</p>
</body></html>
<!-- vim: et sw=4 sts=4
-->

View File

@@ -0,0 +1,204 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="description" content="Tutorial for creating custom URI filters." />
<link rel="stylesheet" type="text/css" href="style.css" />
<title>URI Filters - HTML Purifier</title>
</head><body>
<h1>URI Filters</h1>
<div id="filing">Filed under End-User</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
<p>
This is a quick and dirty document to get you on your way to writing
custom URI filters for your own URL filtering needs. Why would you
want to write a URI filter? If you need URIs your users put into
HTML to magically change into a different URI, this is
exactly what you need!
</p>
<h2>Creating the class</h2>
<p>
Any URI filter you make will be a subclass of <code>HTMLPurifier_URIFilter</code>.
The scaffolding is thus:
</p>
<pre>class HTMLPurifier_URIFilter_<strong>NameOfFilter</strong> extends HTMLPurifier_URIFilter
{
public $name = '<strong>NameOfFilter</strong>';
public function prepare($config) {}
public function filter(&$uri, $config, $context) {}
}</pre>
<p>
Fill in the variable <code>$name</code> with the name of your filter, and
take a look at the two methods. <code>prepare()</code> is an initialization
method that is called only once, before any filtering has been done of the
HTML. Use it to perform any costly setup work that only needs to be done
once. <code>filter()</code> is the guts and innards of our filter:
it takes the URI and does whatever needs to be done to it.
</p>
<p>
If you've worked with HTML Purifier, you'll recognize the <code>$config</code>
and <code>$context</code> parameters. On the other hand, <code>$uri</code>
is something unique to this section of the application: it's a
<code>HTMLPurifier_URI</code> object. The interface is thus:
</p>
<pre>class HTMLPurifier_URI
{
public $scheme, $userinfo, $host, $port, $path, $query, $fragment;
public function HTMLPurifier_URI($scheme, $userinfo, $host, $port, $path, $query, $fragment);
public function toString();
public function copy();
public function getSchemeObj($config, $context);
public function validate($config, $context);
}</pre>
<p>
The first three methods are fairly self-explanatory: you have a constructor,
a serializer, and a cloner. Generally, you won't be using them when
you are manipulating the URI objects themselves.
<code>getSchemeObj()</code> is a special purpose method that returns
a <code>HTMLPurifier_URIScheme</code> object corresponding to the specific
URI at hand. <code>validate()</code> performs general-purpose validation
on the internal components of a URI. Once again, you don't need to
worry about these: they've already been handled for you.
</p>
<h2>URI format</h2>
<p>
As a URIFilter, we're interested in the member variables of the URI object.
</p>
<table class="quick"><tbody>
<tr><th>Scheme</th> <td>The protocol for identifying (and possibly locating) a resource (http, ftp, https)</td></tr>
<tr><th>Userinfo</th> <td>User information such as a username (bob)</td></tr>
<tr><th>Host</th> <td>Domain name or IP address of the server (example.com, 127.0.0.1)</td></tr>
<tr><th>Port</th> <td>Network port number for the server (80, 12345)</td></tr>
<tr><th>Path</th> <td>Data that identifies the resource, possibly hierarchical (/path/to, ed@example.com)</td></tr>
<tr><th>Query</th> <td>String of information to be interpreted by the resource (?q=search-term)</td></tr>
<tr><th>Fragment</th> <td>Additional information for the resource after retrieval (#bookmark)</td></tr>
</tbody></table>
<p>
Because the URI is presented to us in this form, and not
<code>http://bob@example.com:8080/foo.php?q=string#hash</code>, it saves us
a lot of trouble in having to parse the URI every time we want to filter
it. For the record, the above URI has the following components:
</p>
<table class="quick"><tbody>
<tr><th>Scheme</th> <td>http</td></tr>
<tr><th>Userinfo</th> <td>bob</td></tr>
<tr><th>Host</th> <td>example.com</td></tr>
<tr><th>Port</th> <td>8080</td></tr>
<tr><th>Path</th> <td>/foo.php</td></tr>
<tr><th>Query</th> <td>q=string</td></tr>
<tr><th>Fragment</th> <td>hash</td></tr>
</tbody></table>
<p>
Note that there is no question mark or octothorpe in the query or
fragment: these get removed during parsing.
</p>
<p>
With this information, you can get straight to implementing your
<code>filter()</code> method. But one more thing...
</p>
<h2>Return value: Boolean, not URI</h2>
<p>
You may have noticed that the URI is being passed in by reference.
This means that whatever changes you make to it, those changes will
be reflected in the URI object the callee had. <strong>Do not
return the URI object: it is unnecessary and will cause bugs.</strong>
Instead, return a boolean value, true if the filtering was successful,
or false if the URI is beyond repair and needs to be axed.
</p>
<p>
Let's suppose I wanted to write a filter that converted links with a
custom <code>image</code> scheme to its corresponding real path on
our website:
</p>
<pre>class HTMLPurifier_URIFilter_TransformImageScheme extends HTMLPurifier_URIFilter
{
public $name = 'TransformImageScheme';
public function filter(&$uri, $config, $context) {
if ($uri->scheme !== 'image') return true;
$img_name = $uri->path;
// Overwrite the previous URI object
$uri = new HTMLPurifier_URI('http', null, null, null, '/img/' . $img_name . '.png', null, null);
return true;
}
}</pre>
<p>
Notice I did not <code>return $uri;</code>. This filter would turn
<code>image:Foo</code> into <code>/img/Foo.png</code>.
</p>
<h2>Activating your filter</h2>
<p>
Having a filter is all well and good, but you need to tell HTML Purifier
to use it. Fortunately, this part's simple:
</p>
<pre>$uri = $config->getDefinition('URI');
$uri->addFilter(new HTMLPurifier_URIFilter_<strong>NameOfFilter</strong>(), $config);</pre>
<p>
After adding a filter, you won't be able to set configuration directives.
Structure your code accordingly.
</p>
<!-- XXX: link to new documentation system -->
<h2>Post-filter</h2>
<p>
Remember our TransformImageScheme filter? That filter acted before we had
performed scheme validation; otherwise, the URI would have been filtered
out when it was discovered that there was no image scheme. Well, a post-filter
is run after scheme specific validation, so it's ideal for bulk
post-processing of URIs, including munging. To specify a URI as a post-filter,
set the <code>$post</code> member variable to TRUE.
</p>
<pre>class HTMLPurifier_URIFilter_MyPostFilter extends HTMLPurifier_URIFilter
{
public $name = 'MyPostFilter';
public $post = true;
// ... extra code here
}
</pre>
<h2>Examples</h2>
<p>
Check the
<a href="http://repo.or.cz/w/htmlpurifier.git?a=tree;hb=HEAD;f=library/HTMLPurifier/URIFilter">URIFilter</a>
directory for more implementation examples, and see <a href="proposal-new-directives.txt">the
new directives proposal document</a> for ideas on what could be implemented
as a filter.
</p>
</body></html>
<!-- vim: et sw=4 sts=4
-->

View File

@@ -5,12 +5,11 @@
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="description" content="Describes the rationale for using UTF-8, the ramifications otherwise, and how to make the switch." />
<link rel="stylesheet" type="text/css" href="./style.css" />
<script defer="defer" type="text/javascript" src="./toc-gen.js"></script>
<style type="text/css">
.minor td {font-style:italic;}
</style>
<title>UTF-8 - HTML Purifier</title>
<title>UTF-8: The Secret of Character Encoding - HTML Purifier</title>
<!-- Note to users: this document, though professing to be UTF-8, attempts
to use only ASCII characters, because most webservers are configured
@@ -19,21 +18,27 @@ own advice for sake of portability. -->
</head><body>
<h1>UTF-8</h1>
<h1>UTF-8: The Secret of Character Encoding</h1>
<div id="filing">Filed under End-User</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://hp.jpsband.org/">HTML Purifier</a> End-User Documentation</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
<p>Character encoding and character sets, in truth, are not that
difficult to understand. But if you don't understand them, you are going
to be caught by surprise by some of HTML Purifier's behavior, namely
the fact that it operates UTF-8 or the limitations of the character
encoding transformations it does. This document will walk you through
<p>Character encoding and character sets are not that
difficult to understand, but so many people blithely stumble
through the worlds of programming without knowing what to actually
do about it, or say &quot;Ah, it's a job for those <em>internationalization</em>
experts.&quot; No, it is not! This document will walk you through
determining the encoding of your system and how you should handle
this information. It will stay away from excessive discussion on
the internals of character encoding, but offer the information in
asides that can easily be skipped.</p>
the internals of character encoding.</p>
<p>This document is not designed to be read in its entirety: it will
slowly introduce concepts that build on each other: you need not get to
the bottom to have learned something new. However, I strongly
recommend you read all the way to <strong>Why UTF-8?</strong>, because at least
at that point you'd have made a conscious decision not to migrate,
which can be a rewarding (but difficult) task.</p>
<blockquote class="aside">
<div class="label">Asides</div>
@@ -43,10 +48,54 @@ asides that can easily be skipped.</p>
with a greater understanding of the underlying issues.</p>
</blockquote>
<h2>Table of Contents</h2>
<ol id="toc">
<li><a href="#findcharset">Finding the real encoding</a></li>
<li><a href="#findmetacharset">Finding the embedded encoding</a></li>
<li><a href="#fixcharset">Fixing the encoding</a><ol>
<li><a href="#fixcharset-none">No embedded encoding</a></li>
<li><a href="#fixcharset-diff">Embedded encoding disagrees</a></li>
<li><a href="#fixcharset-server">Changing the server encoding</a><ol>
<li><a href="#fixcharset-server-php">PHP header() function</a></li>
<li><a href="#fixcharset-server-phpini">PHP ini directive</a></li>
<li><a href="#fixcharset-server-nophp">Non-PHP</a></li>
<li><a href="#fixcharset-server-htaccess">.htaccess</a></li>
<li><a href="#fixcharset-server-ext">File extensions</a></li>
</ol></li>
<li><a href="#fixcharset-xml">XML</a></li>
<li><a href="#fixcharset-internals">Inside the process</a></li>
</ol></li>
<li><a href="#whyutf8">Why UTF-8?</a><ol>
<li><a href="#whyutf8-i18n">Internationalization</a></li>
<li><a href="#whyutf8-user">User-friendly</a></li>
<li><a href="#whyutf8-forms">Forms</a><ol>
<li><a href="#whyutf8-forms-urlencoded">application/x-www-form-urlencoded</a></li>
<li><a href="#whyutf8-forms-multipart">multipart/form-data</a></li>
</ol></li>
<li><a href="#whyutf8-support">Well supported</a></li>
<li><a href="#whyutf8-htmlpurifier">HTML Purifiers</a></li>
</ol></li>
<li><a href="#migrate">Migrate to UTF-8</a><ol>
<li><a href="#migrate-db">Configuring your database</a><ol>
<li><a href="#migrate-db-legit">Legit method</a></li>
<li><a href="#migrate-db-binary">Binary</a></li>
</ol></li>
<li><a href="#migrate-editor">Text editor</a></li>
<li><a href="#migrate-bom">Byte Order Mark (headers already sent!)</a></li>
<li><a href="#migrate-fonts">Fonts</a><ol>
<li><a href="#migrate-fonts-obscure">Obscure scripts</a></li>
<li><a href="#migrate-fonts-occasional">Occasional use</a></li>
</ol></li>
<li><a href="#migrate-variablewidth">Dealing with variable width in functions</a></li>
</ol></li>
<li><a href="#externallinks">Further Reading</a></li>
</ol>
<h2 id="findcharset">Finding the real encoding</h2>
<p>In the beginning, there was ASCII, and things were simple. But they
weren't good, for no one could write in Cryllic or Thai. So there
weren't good, for no one could write in Cyrillic or Thai. So there
exploded a proliferation of character encodings to remedy the problem
by extending the characters ASCII could express. This ridiculously
simplified version of the history of character encodings shows us that
@@ -57,7 +106,7 @@ there are now many character encodings floating around.</p>
interpret raw zeroes and ones into real characters. It
usually does this by pairing numbers with characters.</p>
<p>There are many different types of character encodings floating
around, but the ones we deal most frequently with are ASCII,
around, but the ones we deal most frequently with are ASCII,
8-bit encodings, and Unicode-based encodings.</p>
<ul>
<li><strong>ASCII</strong> is a 7-bit encoding based on the
@@ -69,9 +118,8 @@ there are now many character encodings floating around.</p>
see a page on the web, chances are it's encoded in one
of these encodings.</li>
<li><strong>Unicode-based encodings</strong> implement the
Unicode standard and include UTF-8, UCS-2 and UTF-16.
They go beyond 8-bits (the first two are variable length,
while the second one uses 16-bits), and support almost
Unicode standard and include UTF-8, UTF-16 and UTF-32/UCS-4.
They go beyond 8-bits and support almost
every language in the world. UTF-8 is gaining traction
as the dominant international encoding of the web.</li>
</ul>
@@ -88,7 +136,7 @@ browser:</p>
<dd>View &gt; Encoding: bulleted item is unofficial name</dd>
</dl>
<p>Internet Explorer won't give you the mime (i.e. useful/real) name of the
<p>Internet Explorer won't give you the MIME (i.e. useful/real) name of the
character encoding, so you'll have to look it up using their description.
Some common ones:</p>
@@ -166,6 +214,12 @@ if your <code>META</code> tag claims that either:</p>
<h2 id="fixcharset">Fixing the encoding</h2>
<p class="aside">The advice given here is for pages being served as
vanilla <code>text/html</code>. Different practices must be used
for <code>application/xml</code> or <code>application/xml+xhtml</code>, see
<a href="http://www.w3.org/TR/2002/NOTE-xhtml-media-types-20020430/">W3C's
document on XHTML media types</a> for more information.</p>
<p>If your <code>META</code> encoding and your real encoding match,
savvy! You can skip this section. If they don't...</p>
@@ -181,7 +235,7 @@ of your real encoding.</p>
why the character encoding should be explicitly stated. When the
browser isn't told what the character encoding of a text is, it
has to guess: and sometimes the guess is wrong. Hackers can manipulate
this guess in order to slip XSS pass filters and then fool the
this guess in order to slip XSS past filters and then fool the
browser into executing it as active code. A great example of this
is the <a href="http://shiflett.org/archive/177">Google UTF-7
exploit</a>.</p>
@@ -252,7 +306,8 @@ languages</a>. The appropriate code is:</p>
<p>...replacing UTF-8 with whatever your embedded encoding is.
This code must come before any output, so be careful about
stray whitespace in your application.</p>
stray whitespace in your application (i.e., any whitespace before
output excluding whitespace within &lt;?php ?&gt; tags).</p>
<h4 id="fixcharset-server-phpini">PHP ini directive</h4>
@@ -263,8 +318,8 @@ header call: <code><a href="http://php.net/ini.core#ini.default-charset">default
<p>...will also do the trick. If PHP is running as an Apache module (and
not as FastCGI, consult
<a href="http://php.net/phpinfo">phpinfo</a>() for details), you can even use htaccess do apply this property
globally:</p>
<a href="http://php.net/phpinfo">phpinfo</a>() for details), you can even use htaccess to apply this property
across many PHP files:</p>
<pre><a href="http://php.net/configuration.changes#configuration.changes.apache">php_value</a> default_charset &quot;UTF-8&quot;</pre>
@@ -275,7 +330,7 @@ your own php.ini file, ask your support for details. Use:</p>
<h4 id="fixcharset-server-nophp">Non-PHP</h4>
<p>You may, for whatever reason, may need to set the character encoding
<p>You may, for whatever reason, need to set the character encoding
on non-PHP files, usually plain ol' HTML files. Doing this
is more of a hit-or-miss process: depending on the software being
used as a webserver and the configuration of that software, certain
@@ -310,10 +365,11 @@ to send anything at all:</p>
<pre><a href="http://httpd.apache.org/docs/1.3/mod/core.html#adddefaultcharset">AddDefaultCharset</a> Off</pre>
<p>...making your <code>META</code> tags the sole source of
character encoding information. In these cases, it is
<em>especially</em> important to make sure you have valid <code>META</code>
tags on your pages and all the text before them is ASCII.</p>
<p>...making your internal charset declaration (usually the <code>META</code> tags)
the sole source of character encoding
information. In these cases, it is <em>especially</em> important to make
sure you have valid <code>META</code> tags on your pages and all the
text before them is ASCII.</p>
<blockquote class="aside"><p>These directives can also be
placed in httpd.conf file for Apache, but
@@ -378,30 +434,32 @@ IIS to change character encodings, I'd be grateful.</p>
<p><code>META</code> tags are the most common source of embedded
encodings, but they can also come from somewhere else: XML
processing instructions. They look like:</p>
Declarations. They look like:</p>
<pre>&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;</pre>
<p>...and are most often found in XML documents (including XHTML).</p>
<p>For XHTML, this processing instruction theoretically
<p>For XHTML, this XML Declaration theoretically
overrides the <code>META</code> tag. In reality, this happens only when the
XHTML is actually served as legit XML and not HTML, which is almost
always never due to Internet Explorer's lack of support for
XHTML is actually served as legit XML and not HTML, which is almost always
never due to Internet Explorer's lack of support for
<code>application/xhtml+xml</code> (even though doing so is often
argued to be <a href="http://www.hixie.ch/advocacy/xhtml">good practice</a>).</p>
argued to be <a href="http://www.hixie.ch/advocacy/xhtml">good
practice</a> and is required by the XHTML 1.1 specification).</p>
<p>For XML, however, this processing instruction is extremely important.
<p>For XML, however, this XML Declaration is extremely important.
Since most webservers are not configured to send charsets for .xml files,
this is the only thing a parser has to go on. Furthermore, the default
for XML files is UTF-8, which often butts heads with more common
ISO-8859-1 encoding (you see this in garbled RSS feeds).</p>
<p>In short, if you use XHTML and have gone through the
trouble of adding the XML header, be sure to make sure it jives
with your <code>META</code> tags and HTTP headers.</p>
trouble of adding the XML Declaration, make sure it jives
with your <code>META</code> tags (which should only be present
if served in text/html) and HTTP headers.</p>
<h3>Inside the process</h3>
<h3 id="fixcharset-internals">Inside the process</h3>
<p>This section is not required reading,
but may answer some of your questions on what's going on in all
@@ -423,7 +481,7 @@ if we don't know it's character encoding? And how do we figure out
the character encoding, if we don't know the contents of the
<code>META</code> tag?</p>
<p>Fortunantely for us, the characters we need to write the
<p>Fortunately for us, the characters we need to write the
<code>META</code> are in ASCII, which is pretty much universal
over every character encoding that is in common use today. So,
all the web-browser has to do is parse all the way down until
@@ -431,7 +489,7 @@ it gets to the Content-Type tag, extract the character encoding
tag, then re-parse the document according to this new information.</p>
<p>Obviously this is complicated, so browsers prefer the simpler
and more efficient solution: get the character encoding from a
and more efficient solution: get the character encoding from a
somewhere other than the document itself, i.e. the HTTP headers,
much to the chagrin of HTML authors who can't set these headers.</p>
@@ -456,7 +514,7 @@ usage in one language sometimes requires the occasional special character
that, without surprise, is not available in your character set. Sometimes
developers get around this by adding support for multiple encodings: when
using Chinese, use Big5, when using Japanese, use Shift-JIS, when
using Greek, etc. Other times, they use character entities with great
using Greek, etc. Other times, they use character references with great
zeal.</p>
<p>UTF-8, however, obviates the need for any of these complicated
@@ -468,16 +526,16 @@ you don't have to use those user-unfriendly entities.</p>
<h3 id="whyutf8-user">User-friendly</h3>
<p>Websites encoded in Latin-1 (ISO-8859-1) which ocassionally need
<p>Websites encoded in Latin-1 (ISO-8859-1) which occasionally need
a special character outside of their scope often will use a character
entity to achieve the desired effect. For instance, &theta; can be
entity reference to achieve the desired effect. For instance, &theta; can be
written <code>&amp;theta;</code>, regardless of the character encoding's
support of Greek letters.</p>
<p>This works nicely for limited use of special characters, but
say you wanted this sentence of Chinese text: &#28608;&#20809;,
&#36889;&#20841;&#20491;&#23383;&#26159;&#29978;&#40636;&#24847;&#24605;.
The entity-ized version would look like this:</p>
The ampersand encoded version would look like this:</p>
<pre>&amp;#28608;&amp;#20809;, &amp;#36889;&amp;#20841;&amp;#20491;&amp;#23383;&amp;#26159;&amp;#29978;&amp;#40636;&amp;#24847;&amp;#24605;</pre>
@@ -495,7 +553,7 @@ an application that originally used ISO-8859-1 but switched to UTF-8
when it became far to cumbersome to support foreign languages. Bots
will now actually go through articles and convert character entities
to their corresponding real characters for the sake of user-friendliness
and searcheability. See
and searchability. See
<a href="http://meta.wikimedia.org/wiki/Help:Special_characters">Meta's
page on special characters</a> for more details.
</p></blockquote>
@@ -503,7 +561,7 @@ page on special characters</a> for more details.
<h3 id="whyutf8-forms">Forms</h3>
<p>While we're on the tack of users, how do non-UTF-8 web forms deal
with characters that our outside of their character set? Rather than
with characters that are outside of their character set? Rather than
discuss what UTF-8 does right, we're going to show what could go wrong
if you didn't use UTF-8 and people tried to use characters outside
of your character encoding.</p>
@@ -517,21 +575,24 @@ which may be used by POST, and is required when you want to upload
files.</p>
<p>The following is a summarization of notes from
<a href="http://ppewww.physics.gla.ac.uk/~flavell/charset/form-i18n.html">
<a href="http://web.archive.org/web/20060427015200/ppewww.ph.gla.ac.uk/~flavell/charset/form-i18n.html">
<code>FORM</code> submission and i18n</a>. That document contains lots
of useful information, but is written in a rambly manner, so
here I try to get right to the point.</p>
here I try to get right to the point. (Note: the original has
disappeared off the web, so I am linking to the Web Archive copy.)</p>
<h4 id="whyutf8-forms-urlencoded"><code>application/x-www-form-urlencoded</code></h4>
<p>This is the Content-Type that GET requests must use, and POST requests
use by default. It involves the ubiquituous percent encoding format that
use by default. It involves the ubiquitous percent encoding format that
looks something like: <code>%C3%86</code>. There is no official way of
determining the character encoding of such a request, since the percent
encoding operates on a byte level, so it is usually assumed that it
is the same as the encoding the page containing the form was submitted
in. You'll run into very few problems if you only use characters in
the character encoding you chose.</p>
in. (<a href="http://tools.ietf.org/html/rfc3986#section-2.5">RFC 3986</a>
recommends that textual identifiers be translated to UTF-8; however, browser
compliance is spotty.) You'll run into very few problems
if you only use characters in the character encoding you chose.</p>
<p>However, once you start adding characters outside of your encoding
(and this is a lot more common than you may think: take curly
@@ -542,7 +603,7 @@ browser you're using, they might:</p>
<ul>
<li>Replace the unsupported characters with useless question marks,</li>
<li>Attempt to fix the characters (example: smart quotes to regular quotes),</li>
<li>Replace the character with a character entity, or</li>
<li>Replace the character with a character entity reference, or</li>
<li>Send it anyway as a different character encoding mixed in
with the original encoding (usually Windows-1252 rather than
iso-8859-1 or UTF-8 interspersed in 8-bit)</li>
@@ -558,7 +619,7 @@ since UTF-8 supports every character.</p>
<h4 id="whyutf8-forms-multipart"><code>multipart/form-data</code></h4>
<p>Multipart form submission takes a way a lot of the ambiguity
<p>Multipart form submission takes away a lot of the ambiguity
that percent-encoding had: the server now can explicitly ask for
certain encodings, and the client can explicitly tell the server
during the form submission what encoding the fields are in.</p>
@@ -571,9 +632,9 @@ Each method has deficiencies, especially the former.</p>
<p>If you tell the browser to send the form in the same encoding as
the page, you still have the trouble of what to do with characters
that are outside of the character encoding's range. The behavior, once
again, varies: Firefox 2.0 entity-izes them while Internet Explorer
7.0 mangles them beyond intelligibility. For serious I18N purposes,
this is not an option.</p>
again, varies: Firefox 2.0 converts them to character entity references
while Internet Explorer 7.0 mangles them beyond intelligibility. For
serious internationalization purposes, this is not an option.</p>
<p>The other possibility is to set Accept-Encoding to UTF-8, which
begs the question: Why aren't you using UTF-8 for everything then?
@@ -604,22 +665,378 @@ hounding you about broken pages.</p>
<h3 id="whyutf8-htmlpurifier">HTML Purifier</h3>
<p>And finally, we get to HTML Purifier.</p>
<p>And finally, we get to HTML Purifier. HTML Purifier is built to
deal with UTF-8: any indications otherwise are the result of an
encoder that converts text from your preferred encoding to UTF-8, and
back again. HTML Purifier never touches anything else, and leaves
it up to the module iconv to do the dirty work.</p>
<p>This approach, however, is not perfect. iconv is blithely unaware
of HTML character entities. HTML Purifier, in order to
protect against sophisticated escaping schemes, normalizes all character
and numeric entity references before processing the text. This leads to
one important ramification:</p>
<p><strong>Any character that is not supported by the target character
set, regardless of whether or not it is in the form of a character
entity reference or a raw character, will be silently ignored.</strong></p>
<p>Example of this principle at work: say you have <code>&amp;theta;</code>
in your HTML, but the output is in Latin-1 (which, understandably,
does not understand Greek), the following process will occur (assuming you've
set the encoding correctly using %Core.Encoding):</p>
<ul>
<li>The <code>Encoder</code> will transform the text from ISO 8859-1 to UTF-8
(note that theta is preserved here since it doesn't actually use
any non-ASCII characters): <code>&amp;theta;</code></li>
<li>The <code>EntityParser</code> will transform all named and numeric
character entities to their corresponding raw UTF-8 equivalents:
<code>&theta;</code></li>
<li>HTML Purifier processes the code: <code>&theta;</code></li>
<li>The <code>Encoder</code> now transforms the text back from UTF-8
to ISO 8859-1. Since Greek is not supported by ISO 8859-1, it
will be either ignored or replaced with a question mark:
<code>?</code></li>
</ul>
<p>This behaviour is quite unsatisfactory. It is a deal-breaker for
international applications, and it can be mildly annoying for the provincial
soul who occasionally needs a special character. Since 1.4.0, HTML
Purifier has provided a slightly more palatable workaround using
%Core.EscapeNonASCIICharacters. The process now looks like:</p>
<ul>
<li>The <code>Encoder</code> transforms encoding to UTF-8: <code>&amp;theta;</code></li>
<li>The <code>EntityParser</code> transforms entities: <code>&theta;</code></li>
<li>HTML Purifier processes the code: <code>&theta;</code></li>
<li>The <code>Encoder</code> replaces all non-ASCII characters
with numeric entity reference: <code>&amp;#952;</code></li>
<li>For good measure, <code>Encoder</code> transforms encoding back to
original (which is strictly unnecessary for 99% of encodings
out there): <code>&amp;#952;</code> (remember, it's all ASCII!)</li>
</ul>
<p>...which means that this is only good for an occasional foray into
the land of Unicode characters, and is totally unacceptable for Chinese
or Japanese texts. The even bigger kicker is that, supposing the
input encoding was actually ISO-8859-7, which <em>does</em> support
theta, the character would get converted into a character entity reference
anyway! (The Encoder does not discriminate).</p>
<p>The current functionality is about where HTML Purifier will be for
the rest of eternity. HTML Purifier could attempt to preserve the original
form of the character references so that they could be substituted back in, only the
DOM extension kills them off irreversibly. HTML Purifier could also attempt
to be smart and only convert non-ASCII characters that weren't supported
by the target encoding, but that would require reimplementing iconv
with HTML awareness, something I will not do.</p>
<p>So there: either it's UTF-8 or crippled international support. Your pick! (and I'm
not being sarcastic here: some people could care less about other languages).</p>
<h2 id="migrate">Migrate to UTF-8</h2>
<h3 id="migrate-editor">Text editor</h3>
<p>So, you've decided to bite the bullet, and want to migrate to UTF-8.
Note that this is not for the faint-hearted, and you should expect
the process to take longer than you think it will take.</p>
<p>The general idea is that you convert all existing text to UTF-8,
and then you set all the headers and META tags we discussed earlier
to UTF-8. There are many ways going about doing this: you could
write a conversion script that runs through the database and re-encodes
everything as UTF-8 or you could do the conversion on the fly when someone
reads the page. The details depend on your system, but I will cover
some of the more subtle points of migration that may trip you up.</p>
<h3 id="migrate-db">Configuring your database</h3>
<h3 id="migrate-convert">Convert old text</h3>
<p>Most modern databases, the most prominent open-source ones being MySQL
4.1+ and PostgreSQL, support character encodings. If you're switching
to UTF-8, logically speaking, you'd want to make sure your database
knows about the change too. There are some caveats though:</p>
<h4 id="migrate-db-legit">Legit method</h4>
<p>Standardization in terms of SQL syntax for specifying character
encodings is notoriously spotty. Refer to your respective database's
documentation on how to do this properly.</p>
<p>For <a href="http://dev.mysql.com/doc/refman/5.0/en/charset-conversion.html">MySQL</a>, <code>ALTER</code> will magically perform the
character encoding conversion for you. However, you have
to make sure that the text inside the column is what is says it is:
if you had put Shift-JIS in an ISO 8859-1 column, MySQL will irreversibly mangle
the text when you try to convert it to UTF-8. You'll have to convert
it to a binary field, convert it to a Shift-JIS field (the real encoding),
and then finally to UTF-8. Many a website had pages irreversibly mangled
because they didn't realize that they'd been deluding themselves about
the character encoding all along; don't become the next victim.</p>
<p>For <a href="http://www.postgresql.org/docs/8.2/static/multibyte.html">PostgreSQL</a>, there appears to be no direct way to change the
encoding of a database (as of 8.2). You will have to dump the data, and then reimport
it into a new table. Make sure that your client encoding is set properly:
this is how PostgreSQL knows to perform an encoding conversion.</p>
<p>Many times, you will be also asked about the &quot;collation&quot; of
the new column. Collation is how a DBMS sorts text, like ordering
B, C and A into A, B and C (the problem gets surprisingly complicated
when you get to languages like Thai and Japanese). If in doubt,
going with the default setting is usually a safe bet.</p>
<p>Once the conversion is all said and done, you still have to remember
to set the client encoding (your encoding) properly on each database
connection using <code>SET NAMES</code> (which is standard SQL and is
usually supported).</p>
<h4 id="migrate-db-binary">Binary</h4>
<p>Due to the aforementioned compatibility issues, a more interoperable
way of storing UTF-8 text is to stuff it in a binary datatype.
<code>CHAR</code> becomes <code>BINARY</code>, <code>VARCHAR</code> becomes
<code>VARBINARY</code> and <code>TEXT</code> becomes <code>BLOB</code>.
Doing so can save you some huge headaches:</p>
<ul>
<li>The syntax for binary data types is very portable,</li>
<li>MySQL 4.0 has <em>no</em> support for character encodings, so
if you want to support it you <em>have</em> to use binary,</li>
<li>MySQL, as of 5.1, has no support for four byte UTF-8 characters,
which represent characters beyond the basic multilingual
plane, and</li>
<li>You will never have to worry about your DBMS being too smart
and attempting to convert your text when you don't want it to.</li>
</ul>
<p>MediaWiki, a very prominent international application, uses binary fields
for storing their data because of point three.</p>
<p>There are drawbacks, of course:</p>
<ul>
<li>Database tools like PHPMyAdmin won't be able to offer you inline
text editing, since it is declared as binary,</li>
<li>It's not semantically correct: it's really text not binary
(lying to the database),</li>
<li>Unless you use the not-very-portable wizardry mentioned above,
you have to change the encoding yourself (usually, you'd do
it on the fly), and</li>
<li>You will not have collation.</li>
</ul>
<p>Choose based on your circumstances.</p>
<h3 id="migrate-editor">Text editor</h3>
<p>For more flat-file oriented systems, you will often be tasked with
converting reams of existing text and HTML files into UTF-8, as well as
making sure that all new files uploaded are properly encoded. Once again,
I can only point vaguely in the right direction for converting your
existing files: make sure you backup, make sure you use
<a href="http://php.net/ref.iconv">iconv</a>(), and
make sure you know what the original character encoding of the files
is (or are, depending on the tidiness of your system).</p>
<p>However, I can proffer more specific advice on the subject of
text editors. Many text editors have notoriously spotty Unicode support.
To find out how your editor is doing, you can check out <a
href="http://www.alanwood.net/unicode/utilities_editors.html">this list</a>
or <a href="http://en.wikipedia.org/wiki/Comparison_of_text_editors#Encoding_support">Wikipedia's list.</a>
I personally use Notepad++, which works like a charm when it comes to UTF-8.
Usually, you will have to <strong>explicitly</strong> tell the editor through some dialogue
(usually Save as or Format) what encoding you want it to use. An editor
will often offer &quot;Unicode&quot; as a method of saving, which is
ambiguous. Make sure you know whether or not they really mean UTF-8
or UTF-16 (which is another flavor of Unicode).</p>
<p>The two things to look out for are whether or not the editor
supports <strong>font mixing</strong> (multiple
fonts in one document) and whether or not it adds a <strong>BOM</strong>.
Font mixing is important because fonts rarely have support for every
language known to mankind: in order to be flexible, an editor must
be able to take a little from here and a little from there, otherwise
all your Chinese characters will come as nice boxes. We'll discuss
BOM below.</p>
<h3 id="migrate-bom">Byte Order Mark (headers already sent!)</h3>
<p>The BOM, or <a href="http://en.wikipedia.org/wiki/Byte_Order_Mark">Byte
Order Mark</a>, is a magical, invisible character placed at
the beginning of UTF-8 files to tell people what the encoding is and
what the endianness of the text is. It is also unnecessary.</p>
<p>Because it's invisible, it often
catches people by surprise when it starts doing things it shouldn't
be doing. For example, this PHP file:</p>
<pre><strong>BOM</strong>&lt;?php
header('Location: index.php');
?&gt;</pre>
<p>...will fail with the all too familiar <strong>Headers already sent</strong>
PHP error. And because the BOM is invisible, this culprit will go unnoticed.
My suggestion is to only use ASCII in PHP pages, but if you must, make
sure the page is saved WITHOUT the BOM.</p>
<blockquote class="aside">
<p>The headers the error is referring to are <strong>HTTP headers</strong>,
which are sent to the browser before any HTML to tell it various
information. The moment any regular text (and yes, a BOM counts as
ordinary text) is output, the headers must be sent, and you are
not allowed to send anymore. Thus, the error.</p>
</blockquote>
<p>If you are reading in text files to insert into the middle of another
page, it is strongly advised (but not strictly necessary) that you replace out the UTF-8 byte
sequence for BOM <code>&quot;\xEF\xBB\xBF&quot;</code> before inserting it in,
via:</p>
<pre>$text = str_replace(&quot;\xEF\xBB\xBF&quot;, '', $text);</pre>
<h3 id="migrate-fonts">Fonts</h3>
<p>Generally speaking, people who are having trouble with fonts fall
into two categories:</p>
<ul>
<li>Those who want to
use an extremely obscure language for which there is very little
support even among native speakers of the language, and</li>
<li>Those where the primary language of the text is
well-supported but there are occasional characters
that aren't supported.</li>
</ul>
<p>Yes, there's always a chance where an English user happens across
a Sinhalese website and doesn't have the right font. But an English user
who happens not to have the right fonts probably has no business reading Sinhalese
anyway. So we'll deal with the other two edge cases.</p>
<h4 id="migrate-fonts-obscure">Obscure scripts</h4>
<p>If you run a Bengali website, you may get comments from users who
would like to read your website but get heaps of question marks or
other meaningless characters. Fixing this problem requires the
installation of a font or language pack which is often highly
dependent on what the language is. <a href="http://bn.wikipedia.org/wiki/%E0%A6%89%E0%A6%87%E0%A6%95%E0%A6%BF%E0%A6%AA%E0%A7%87%E0%A6%A1%E0%A6%BF%E0%A6%AF%E0%A6%BC%E0%A6%BE:Bangla_script_display_and_input_help">Here is an example</a>
of such a help file for the Bengali language; I am sure there are
others out there too. You just have to point users to the appropriate
help file.</p>
<h4 id="migrate-fonts-occasional">Occasional use</h4>
<p>A prime example of when you'll see some very obscure Unicode
characters embedded in what otherwise would be very bland ASCII are
letters of the
<a href="http://en.wikipedia.org/wiki/International_Phonetic_Alphabet">International
Phonetic Alphabet (IPA)</a>, use to designate pronunciations in a very standard
manner (you probably see them all the time in your dictionary). Your
average font probably won't have support for all of the IPA characters
like &#664; (bilabial click) or &#658; (voiced postalveolar fricative).
So what's a poor browser to do? Font mix! Smart browsers like Mozilla Firefox
and Internet Explorer 7 will borrow glyphs from other fonts in order
to make sure that all the characters display properly.</p>
<p>But what happens when the browser isn't smart and happens to be the
most widely used browser in the entire world? Microsoft IE 6
is not smart enough to borrow from other fonts when a character isn't
present, so more often than not you'll be slapped with a nice big &#65533;.
To get things to work, MSIE 6 needs a little nudge. You could configure it
to use a different font to render the text, but you can achieve the same
effect by selectively changing the font for blocks of special characters
to known good Unicode fonts.</p>
<p>Fortunately, the folks over at Wikipedia have already done all the
heavy lifting for you. Get the CSS from the horses mouth here:
<a href="http://en.wikipedia.org/wiki/MediaWiki:Common.css">Common.css</a>,
and search for &quot;.IPA&quot; There are also a smattering of
other classes you can use for other purposes, check out
<a href="http://meta.wikimedia.org/wiki/Help:Special_characters#Displaying_Special_Characters">this page</a>
for more details. For you lazy ones, this should work:</p>
<pre>.Unicode {
font-family: Code2000, &quot;TITUS Cyberbit Basic&quot;, &quot;Doulos SIL&quot;,
&quot;Chrysanthi Unicode&quot;, &quot;Bitstream Cyberbit&quot;,
&quot;Bitstream CyberBase&quot;, Thryomanes, Gentium, GentiumAlt,
&quot;Lucida Grande&quot;, &quot;Arial Unicode MS&quot;, &quot;Microsoft Sans Serif&quot;,
&quot;Lucida Sans Unicode&quot;;
font-family /**/:inherit; /* resets fonts for everyone but IE6 */
}</pre>
<p>The standard usage goes along the lines of <code>&lt;span class=&quot;Unicode&quot;&gt;Crazy
Unicode stuff here&lt;/span&gt;</code>. Characters in the
<a href="http://en.wikipedia.org/wiki/Windows_Glyph_List_4">Windows Glyph List</a>
usually don't need to be fixed, but for anything else you probably
want to play it safe. Unless, of course, you don't care about IE6
users.</p>
<h3 id="migrate-variablewidth">Dealing with variable width in functions</h3>
<p>When people claim that PHP6 will solve all our Unicode problems, they're
misinformed. It will not fix any of the aforementioned troubles. It will,
however, fix the problem we are about to discuss: processing UTF-8 text
in PHP.</p>
<p>PHP (as of PHP5) is blithely unaware of the existence of UTF-8 (with a few
notable exceptions). Sometimes, this will cause problems, other times,
this won't. So far, we've avoided discussing the architecture of
UTF-8, so, we must first ask, what is UTF-8? Yes, it supports Unicode,
and yes, it is variable width. Other traits:</p>
<ul>
<li>Every character's byte sequence is unique and will never be found
inside the byte sequence of another character,</li>
<li>UTF-8 may use up to four bytes to encode a character,</li>
<li>UTF-8 text must be checked for well-formedness,</li>
<li>Pure ASCII is also valid UTF-8, and</li>
<li>Binary sorting will sort UTF-8 in the same order as Unicode.</li>
</ul>
<p>Each of these traits affect different domains of text processing
in different ways. It is beyond the scope of this document to explain
what precisely these implications are. PHPWact provides
a very good <a href="http://www.phpwact.org/php/i18n/utf-8">reference document</a>
on what to expect from each function, although coverage is spotty in
some areas. Their more general notes on
<a href="http://www.phpwact.org/php/i18n/charsets">character sets</a>
are also worth looking at for information on UTF-8. Some rules of thumb
when dealing with Unicode text:</p>
<ul>
<li>Do not EVER use functions that:<ul>
<li>...convert case (strtolower, strtoupper, ucfirst, ucwords)</li>
<li>...claim to be case-insensitive (str_ireplace, stristr, strcasecmp)</li>
</ul></li>
<li>Think twice before using functions that:<ul>
<li>...count characters (strlen will return bytes, not characters;
str_split and word_wrap may corrupt)</li>
<li>...convert characters to entity references (UTF-8 doesn't need entities)</li>
<li>...do very complex string processing (*printf)</li>
</ul></li>
</ul>
<p>Note: this list applies to UTF-8 encoded text only: if you have
a string that you are 100% sure is ASCII, be my guest and use
<code>strtolower</code> (HTML Purifier uses this function.)</p>
<p>Regardless, always think in bytes, not characters. If you use strpos()
to find the position of a character, it will be in bytes, but this
usually won't matter since substr() also operates with byte indices!</p>
<p>You'll also need to make sure your UTF-8 is well-formed and will
probably need replacements for some of these functions. I recommend
using Harry Fuecks' <a href="http://phputf8.sourceforge.net/">PHP
UTF-8</a> library, rather than use mb_string directly. HTML Purifier
also defines a few useful UTF-8 compatible functions: check out
<code>Encoder.php</code> in the <code>/library/HTMLPurifier/</code>
directory.</p>
<h2 id="externallinks">Further Reading</h2>
<p>Well, that's it. Hopefully this document has served as a very
practical springboard into knowledge of how UTF-8 works. You may have
decided that you don't want to migrate yet: that's fine, just know
what will happen to your output and what bug reports you may receive.</p>
<p>Many other developers have already discussed the subject of Unicode,
UTF-8 and internationalization, and I would like to defer to them for
a more in-depth look into character sets and encodings.</p>
@@ -637,4 +1054,7 @@ a more in-depth look into character sets and encodings.</p>
</ul>
</body>
</html>
</html>
<!-- vim: et sw=4 sts=4
-->

View File

@@ -15,7 +15,7 @@
<div id="filing">Filed under End-User</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://hp.jpsband.org/">HTML Purifier</a> End-User Documentation</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
<p>Clients like their YouTube videos. It gives them a warm fuzzy feeling when
they see a neat little embedded video player on their websites that can play
@@ -37,7 +37,7 @@ from a specific website, it probably is okay. If no amount of pleading will
convince the people upstairs that they should just settle with just linking
to their movies, you may find this technique very useful.</p>
<h2>Sample</h2>
<h2>Looking in</h2>
<p>Below is custom code that allows users to embed
YouTube videos. This is not favoritism: this trick can easily be adapted for
@@ -67,57 +67,27 @@ into your documents. YouTube's code goes like this:</p>
</ol>
<p>What point 2 means is that if we have code like <code>&lt;span
class=&quot;embed-youtube&quot;&gt;AyPzM5WK8ys&lt;/span&gt;</code> your
class=&quot;youtube-embed&quot;&gt;AyPzM5WK8ys&lt;/span&gt;</code> your
application can reconstruct the full object from this small snippet that
passes through HTML Purifier <em>unharmed</em>.</p>
passes through HTML Purifier <em>unharmed</em>.
<a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/Filter/YouTube.php">Show me the code!</a></p>
<pre>
&lt;?php
<p>And the corresponding usage:</p>
class HTMLPurifierX_PreserveYouTube extends HTMLPurifier
{
function purify($html, $config = null) {
$pre_regex = '#&lt;object[^&gt;]+&gt;.+?'.
'http://www.youtube.com/v/([A-Za-z0-9]+).+?&lt;/object&gt;#';
$pre_replace = '&lt;span class=&quot;youtube-embed&quot;&gt;\1&lt;/span&gt;';
$html = preg_replace($pre_regex, $pre_replace, $html);
$html = parent::purify($html, $config);
$post_regex = '#&lt;span class=&quot;youtube-embed&quot;&gt;([A-Za-z0-9]+)&lt;/span&gt;#';
$post_replace = '&lt;object width=&quot;425&quot; height=&quot;350&quot; '.
'data=&quot;http://www.youtube.com/v/\1&quot;&gt;'.
'&lt;param name=&quot;movie&quot; value=&quot;http://www.youtube.com/v/\1&quot;&gt;&lt;/param&gt;'.
'&lt;param name=&quot;wmode&quot; value=&quot;transparent&quot;&gt;&lt;/param&gt;'.
'&lt;!--[if IE]&gt;'.
'&lt;embed src=&quot;http://www.youtube.com/v/\1&quot;'.
'type=&quot;application/x-shockwave-flash&quot;'.
'wmode=&quot;transparent&quot; width=&quot;425&quot; height=&quot;350&quot; /&gt;'.
'&lt;![endif]--&gt;'.
'&lt;/object&gt;';
$html = preg_replace($post_regex, $post_replace, $html);
return $html;
}
}
<pre>&lt;?php
$config-&gt;set('Filter.YouTube', true);
?&gt;</pre>
$purifier = new HTMLPurifierX_PreserveYouTube();
$html_still_with_youtube = $purifier->purify($html_with_youtube);
?&gt;
</pre>
<p>There is a bit going on here, so let's explain.</p>
<p>There is a bit going in the two code snippets, so let's explain.</p>
<ol>
<li>The class uses the prefix <code>HTMLPurifierX</code> because it's
userspace code. Don't use <code>HTMLPurifier</code> in front of your
class, since it might clobber another class in the library.</li>
<li>In order to keep the interface compatible, we've extended HTMLPurifier
into a new class that preserves the YouTube videos. This means that
all you have to do is replace all instances of
<code>new HTMLPurifier</code> to <code>new
HTMLPurifierX_PreserveYouTube</code>. There's other ways to go about
doing this: if you were calling a function that wrapped HTML Purifier,
you could paste the PHP right there. If you wanted to be really
fancy, you could make a decorator for HTMLPurifier.</li>
<li>This is a Filter object, which intercepts the HTML that is
coming into and out of the purifier. You can add as many
filter objects as you like. <code>preFilter()</code>
processes the code before it gets purified, and <code>postFilter()</code>
processes the code afterwards. So, we'll use <code>preFilter()</code> to
replace the object tag with a <code>span</code>, and <code>postFilter()</code>
to restore it.</li>
<li>The first preg_replace call replaces any YouTube code users may have
embedded into the benign span tag. Span is used because it is inline,
and objects are inline too. We are very careful to be extremely
@@ -165,17 +135,19 @@ it is important that you are cognizant of the risk.</p>
<p>This should go without saying, but if you're going to adapt this code
for Google Video or the like, make sure you do it <em>right</em>. It's
extremely easy to allow a character too many in the final section and
extremely easy to allow a character too many in <code>postFilter()</code> and
suddenly you're introducing XSS into HTML Purifier's XSS free output. HTML
Purifier may be well written, but it cannot guard against vulnerabilities
introduced after it has finished.</p>
<h2>Future plans</h2>
<h2>Help out!</h2>
<p>This functionality is part of the core library, using the
HTMLPurifier_Filter class to acheive the desired effect. Our implementation
is slightly different, and this page will be updated to reflect that
once 1.4.0 is released.</p>
<p>If you write a filter for your favorite video destination (or anything
like that, for that matter), send it over and it might get included
with the core!</p>
</body>
</html>
</html>
<!-- vim: et sw=4 sts=4
-->

View File

@@ -1,14 +1,23 @@
<?php exit;
<?php
// This file demonstrates basic usage of HTMLPurifier.
require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';
// replace this with the path to the HTML Purifier library
require_once '../../library/HTMLPurifier.auto.php';
$purifier = new HTMLPurifier();
$config = HTMLPurifier_Config::createDefault();
// configuration goes here:
$config->set('Core.Encoding', 'UTF-8'); // replace with your encoding
$config->set('HTML.Doctype', 'XHTML 1.0 Transitional'); // replace with your doctype
$purifier = new HTMLPurifier($config);
// untrusted input HTML
$html = '<b>Simple and short';
$pure_html = $purifier->purify($html);
echo $pure_html;
echo '<pre>' . htmlspecialchars($pure_html) . '</pre>';
?>
// vim: et sw=4 sts=4

View File

@@ -1,136 +0,0 @@
<?php
// using _REQUEST because we accept GET and POST requests
$content = empty($_REQUEST['xml']) ? 'text/html' : 'application/xhtml+xml';
header("Content-type:$content;charset=UTF-8");
// prevent PHP versions with shorttags from barfing
echo '<?xml version="1.0" encoding="UTF-8" ?>
';
function getFormMethod() {
return (isset($_REQUEST['post'])) ? 'post' : 'get';
}
if (empty($_REQUEST['strict'])) {
?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<?php
} else {
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<?php
}
?>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>HTML Purifier Live Demo</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body>
<h1>HTML Purifier Live Demo</h1>
<?php
require_once '../../library/HTMLPurifier.auto.php';
if (!empty($_REQUEST['html'])) { // start result
if (strlen($_REQUEST['html']) > 50000) {
?>
<p>Request exceeds maximum allowed text size of 50kb.</p>
<?php
} else { // start main processing
$html = get_magic_quotes_gpc() ? stripslashes($_REQUEST['html']) : $_REQUEST['html'];
$config = HTMLPurifier_Config::createDefault();
$config->set('Core', 'TidyFormat', !empty($_REQUEST['tidy']));
$config->set('HTML', 'Strict', !empty($_REQUEST['strict']));
$purifier = new HTMLPurifier($config);
$pure_html = $purifier->purify($html);
?>
<p>Here is your purified HTML:</p>
<div style="border:5px solid #CCC;margin:0 10%;padding:1em;">
<?php if(getFormMethod() == 'get') { ?>
<div style="float:right;">
<a href="http://validator.w3.org/check?uri=referer"><img
src="http://www.w3.org/Icons/valid-xhtml10"
alt="Valid XHTML 1.0 Transitional" height="31" width="88" style="border:0;" /></a>
</div>
<?php } ?>
<?php
echo $pure_html;
?>
<div style="clear:both;"></div>
</div>
<p>Here is the source code of the purified HTML:</p>
<pre><?php
echo htmlspecialchars($pure_html, ENT_COMPAT, 'UTF-8');
?></pre>
<?php
if (getFormMethod() == 'post') { // start POST validation notice
?>
<p>If you would like to validate the code with
<a href="http://validator.w3.org/#validate-by-input">W3C's
validator</a>, copy and paste the <em>entire</em> demo page's source.</p>
<?php
} // end POST validation notice
} // end main processing
// end result
} else {
?>
<p>Welcome to the live demo. Enter some HTML and see how HTML Purifier
will filter it.</p>
<?php
}
?>
<form id="filter" action="demo.php<?php
echo '?' . getFormMethod();
if (isset($_REQUEST['profile']) || isset($_REQUEST['XDEBUG_PROFILE'])) {
echo '&amp;XDEBUG_PROFILE=1';
} ?>" method="<?php echo getFormMethod(); ?>">
<fieldset>
<legend>HTML Purifier Input (<?php echo getFormMethod(); ?>)</legend>
<textarea name="html" cols="60" rows="15"><?php
if (isset($html)) {
echo htmlspecialchars(
HTMLPurifier_Encoder::cleanUTF8($html), ENT_COMPAT, 'UTF-8');
}
?></textarea>
<?php if (getFormMethod() == 'get') { ?>
<p><strong>Warning:</strong> GET request method can only hold
8129 characters (probably less depending on your browser).
If you need to test anything
larger than that, try the <a href="demo.php?post">POST form</a>.</p>
<?php } ?>
<?php if (extension_loaded('tidy')) { ?>
<div>Nicely format output with Tidy? <input type="checkbox" value="1"
name="tidy"<?php if (!empty($_REQUEST['tidy'])) echo ' checked="checked"'; ?> /></div>
<?php } ?>
<div>XHTML 1.0 Strict output? <input type="checkbox" value="1"
name="strict"<?php if (!empty($_REQUEST['strict'])) echo ' checked="checked"'; ?> /></div>
<div>Serve as application/xhtml+xml? (not for IE) <input type="checkbox" value="1"
name="xml"<?php if (!empty($_REQUEST['xml'])) echo ' checked="checked"'; ?> /></div>
<div>
<input type="submit" value="Submit" name="submit" class="button" />
</div>
</fieldset>
</form>
<p>Return to <a href="http://hp.jpsband.org/">HTML Purifier's home page</a>.
Try the form in <a href="demo.php?get">GET</a> and <a href="demo.php?post">POST</a> request
flavors (GET is easy to validate with W3C, but POST allows larger inputs).</p>
</body>
</html>

9
docs/fixquotes.htc Normal file
View File

@@ -0,0 +1,9 @@
<public:attach event="oncontentready" onevent="init();" />
<script>
function init() {
element.innerHTML = '&#8220;'+element.innerHTML+'&#8221;';
}
</script>
<!-- vim: et sw=4 sts=4
-->

View File

@@ -13,7 +13,7 @@
<h1>Documentation</h1>
<p><strong><a href="http://hp.jpsband.org/">HTML Purifier</a></strong> has documentation for all types of people.
<p><strong><a href="http://htmlpurifier.org/">HTML Purifier</a></strong> has documentation for all types of people.
Here is an index of all of them.</p>
<h2>End-user</h2>
@@ -31,9 +31,18 @@ information for casual developers using HTML Purifier.</p>
<dt><a href="enduser-slow.html">Speeding up HTML Purifier</a></dt>
<dd>Explains how to speed up HTML Purifier through caching or inbound filtering.</dd>
<dt><a href="enduser-utf8.html">UTF-8</a></dt>
<dt><a href="enduser-utf8.html">UTF-8: The Secret of Character Encoding</a></dt>
<dd>Describes the rationale for using UTF-8, the ramifications otherwise, and how to make the switch.</dd>
<dt><a href="enduser-tidy.html">Tidy</a></dt>
<dd>Tutorial for tweaking HTML Purifier's Tidy-like behavior.</dd>
<dt><a href="enduser-customize.html">Customize</a></dt>
<dd>Tutorial for customizing HTML Purifier's tag and attribute sets.</dd>
<dt><a href="enduser-uri-filter.html">URI Filters</a></dt>
<dd>Tutorial for creating custom URI filters.</dd>
</dl>
<h2>Development</h2>
@@ -42,9 +51,6 @@ conventions.</p>
<dl>
<dt><a href="dev-code-quality.html">Code Quality Issues</a></dt>
<dd>Discusses code quality issues and places that need to be refactored.</dd>
<dt><a href="dev-progress.html">Implementation Progress</a></dt>
<dd>Tables detailing HTML element and CSS property implementation coverage.</dd>
@@ -54,6 +60,16 @@ conventions.</p>
<dt><a href="dev-optimization.html">Optimization</a></dt>
<dd>Discusses possible methods of optimizing HTML Purifier.</dd>
<dt><a href="dev-flush.html">Flushing the Purifier</a></dt>
<dd>Discusses when to flush HTML Purifier's various caches.</dd>
<dt><a href="dev-advanced-api.html">Advanced API</a></dt>
<dd>Specification for HTML Purifier's advanced API for defining
custom filtering behavior.</dd>
<dt><a href="dev-config-schema.html">Config Schema</a></dt>
<dd>Describes config schema framework in HTML Purifier.</dd>
</dl>
<h2>Proposals</h2>
@@ -82,8 +98,8 @@ the code. They may be upgraded to HTML files or stay as TXT scratchpads.</p>
<table class="table">
<thead><tr>
<th width="10%">Type</th>
<th width="20%">Name</th>
<th style="width:10%">Type</th>
<th style="width:20%">Name</th>
<th>Description</th>
</tr></thead>
@@ -101,6 +117,18 @@ the code. They may be upgraded to HTML files or stay as TXT scratchpads.</p>
<td>Common security issues that may still arise (half-baked).</td>
</tr>
<tr>
<td>Development</td>
<td><a href="dev-config-bcbreaks.txt">Config BC Breaks</a></td>
<td>Backwards-incompatible changes in HTML Purifier 4.0.0</td>
</tr>
<tr>
<td>Development</td>
<td><a href="dev-code-quality.txt">Code Quality Issues</a></td>
<td>Enumerates code quality issues and places that need to be refactored.</td>
</tr>
<tr>
<td>Proposal</td>
<td><a href="proposal-filter-levels.txt">Filter levels</a></td>
@@ -119,10 +147,16 @@ the code. They may be upgraded to HTML files or stay as TXT scratchpads.</p>
<td>Assorted configuration options that could be implemented.</td>
</tr>
<tr>
<td>Proposal</td>
<td><a href="proposal-css-extraction.txt">CSS extraction</a></td>
<td>Taking the inline CSS out of documents and into <code>style</code>.</td>
</tr>
<tr>
<td>Reference</td>
<td><a href="ref-loose-vs-strict.txt">Loose vs.Strict</a></td>
<td>Differences between HTML Strict and Transitional versions.</td>
<td><a href="ref-content-models.txt">Handling Content Model Changes</a></td>
<td>Discusses how to tidy up content model changes using custom ChildDef classes.</td>
</tr>
<tr>
@@ -133,14 +167,8 @@ the code. They may be upgraded to HTML files or stay as TXT scratchpads.</p>
<tr>
<td>Reference</td>
<td><a href="ref-strictness.txt">Strictness</a></td>
<td>Short essay on how loose definition isn't really loose.</td>
</tr>
<tr>
<td>Reference</td>
<td><a href="ref-xhtml-1.1.txt">XHTML 1.1</a></td>
<td>What we'd have to do to support XHTML 1.1.</td>
<td><a href="ref-html-modularization.txt">Modularization of HTMLDefinition</a></td>
<td>Provides a high-level overview of the concepts behind HTMLModules.</td>
</tr>
<tr>
@@ -153,6 +181,8 @@ the code. They may be upgraded to HTML files or stay as TXT scratchpads.</p>
</table>
<div id="version">$Id$</div>
</body>
</html>
</html>
<!-- vim: et sw=4 sts=4
-->

View File

@@ -15,7 +15,7 @@
<div id="filing">Filed under Proposals</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://hp.jpsband.org/">HTML Purifier</a> End-User Documentation</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
<p>Your website probably has a color-scheme.
<span style="color:#090; background:#FFF;">Green on white</span>,
@@ -42,7 +42,8 @@ into the mix.</li>
something like that?</li>
</ol>
<div id="version">$Id$</div>
</body>
</html>
<!-- vim: et sw=4 sts=4
-->

View File

@@ -7,33 +7,17 @@ value is used for. This means decentralized configuration declarations that
are nevertheless error checking and a centralized configuration object.
Directives are divided into namespaces, indicating the major portion of
functionality they cover (although there may be overlaps. Please consult
functionality they cover (although there may be overlaps). Please consult
the documentation in ConfigDef for more information on these namespaces.
Since configuration is dependant on context, internal classes require a
configuration object to be passed as a parameter. (They also require a
Context object).
Context object). A majority of classes do not need the config object,
but for those who do, it is a lifesaver.
In relation to HTMLDefinition and CSSDefinition, there could be a special class
of directives that influence the *construction* of the Definition object.
A theoretical call pattern would look like:
Definition objects are complex datatypes influenced by their respective
directive namespaces (HTMLDefinition with HTML and CSSDefinition with CSS).
If any of these directives is updated, HTML Purifier forces the definition
to be regenerated.
1. Client calls Config->getHTMLDefinition()
2. Config calls HTMLDefinition->createNew(this)
3. HTMLDefinition constructs itself with base configuration
4. HTMLDefinition calls Config->get('HTML')
5. Config returns array of directives
6. HTMLDefinition performs operations and changes specified by directives
7. HTMLPurifier returns constructed definition
8. Config caches definition so it doesn't have to be generated again
9. Config returns definition
You could also override Config's copy of the definition with your own
custom copy, which OVERRIDES all directives. Only the base, vanilla copy
is the Singleton, the object actually interfaced with is a operated-upon
clone of that object. Also, if an update to the directives would update
the definition, you'd have to force reconstruction.
In practice, the pulling directives from the config object are
solely need-based, and the flex points are littered throughout the
setup() function. Some sort of refactoring is likely in order.
vim: et sw=4 sts=4

View File

@@ -0,0 +1,34 @@
Extracting inline CSS from HTML Purifier
voodoofied: Assigning semantics to elements
Sander Tekelenburg brought to my attention the poor programming style of
inline CSS in HTML documents. In an ideal world, we wouldn't be using inline
CSS at all: everything would be assigned using semantic class attributes
from an external stylesheet.
With ExtractStyleBlocks and CSSTidy, this is now possible (when allowed, users
can specify a style element which gets extracted from the user-submitted HTML, which
the application can place in the head of the HTML document). But there still
is the issue of inline CSS that refuses to go away.
The basic idea behind this feature is assign every element a unique identifier,
and then move all of the CSS data to a style-sheet. This HTML:
<div style="text-align:center">Big <span style="color:red;">things</span>!</div>
into
<div id="hp-12345">Big <span id="hp-12346">things</span>!</div>
and a stylesheet that is:
#hp-12345 {text-align:center;}
#hp-12346 {color:red;}
Beyond that, HTML Purifier can magically merge common CSS values together,
and a whole manner of other heuristic things. HTML Purifier should also
make it easy for an admin to re-style the HTML semantically. Speed is not
an issue. Also, better WYSIWYG editors are needed.
vim: et sw=4 sts=4

211
docs/proposal-errors.txt Normal file
View File

@@ -0,0 +1,211 @@
Considerations for ErrorCollection
Presently, HTML Purifier takes a code-execution centric approach to handling
errors. Errors are organized and grouped according to which segment of the
code triggers them, not necessarily the portion of the input document that
triggered the error. This means that errors are pseudo-sorted by category,
rather than location in the document.
One easy way to "fix" this problem would be to re-sort according to line number.
However, the "category" style information we derive from naively following
program execution is still useful. After all, each of the strategies which
can report errors still process the document mostly linearly. Furthermore,
not only do they process linearly, but the way they pass off operations to
sub-systems mirrors that of the document. For example, AttrValidator will
linearly proceed through elements, and on each element will use AttrDef to
validate those contents. From there, the attribute might have more
sub-components, which have execution passed off accordingly.
In fact, each strategy handles a very specific class of "error."
RemoveForeignElements - element tokens
MakeWellFormed - element token ordering
FixNesting - element token ordering
ValidateAttributes - attributes of elements
The crucial point is that while we care about the hierarchy governing these
different errors, we *don't* care about any other information about what actually
happens to the elements. This brings up another point: if HTML Purifier fixes
something, this is not really a notice/warning/error; it's really a suggestion
of a way to fix the aforementioned defects.
In short, the refactoring to take this into account kinda sucks.
Errors should not be recorded in order that they are reported. Instead, they
should be bound to the line (and preferably element) in which they were found.
This means we need some way to uniquely identify every element in the document,
which doesn't presently exist. An easy way of adding this would be to track
line columns. An important ramification of this is that we *must* use the
DirectLex implementation.
1. Implement column numbers for DirectLex [DONE!]
2. Disable error collection when not using DirectLex [DONE!]
Next, we need to re-orient all of the error declarations to place CurrentToken
at utmost important. Since this is passed via Context, it's not always clear
if that's available. ErrorCollector should complain HARD if it isn't available.
There are some locations when we don't have a token available. These include:
* Lexing - this can actually have a row and column, but NOT correspond to
a token
* End of document errors - bump this to the end
Actually, we *don't* have to complain if CurrentToken isn't available; we just
set it as a document-wide error. And actually, nothing needs to be done here.
Something interesting to consider is whether or not we care about the locations
of attributes and CSS properties, i.e. the sub-objects that compose these things.
In terms of consistency, at the very least attributes should have column/line
numbers attached to them. However, this may be overkill, as attributes are
uniquely identifiable. You could go even further, with CSS, but they are also
uniquely identifiable.
Bottom-line is, however, this information must be available, in form of the
CurrentAttribute and CurrentCssProperty (theoretical) context variables, and
it must be used to organize the errors that the sub-processes may throw.
There is also a hierarchy of sorts that may make merging this into one context
variable more sense, if it hadn't been for HTML's reasonably rigid structure.
A CSS property will never contain an HTML attribute. So we won't ever get
recursive relations, and having multiple depths won't ever make sense. Leave
this be.
We already have this information, and consequently, using start and end is
*unnecessary*, so long as the context variables are set appropriately. We don't
care if an error was thrown by an attribute transform or an attribute definition;
to the end user these are the same (for a developer, they are different, but
they're better off with a stack trace (which we should add support for) in such
cases).
3. Remove start()/end() code. Don't get rid of recursion, though [DONE]
4. Setup ErrorCollector to use context information to setup hierarchies.
This may require a different internal format. Use objects if it gets
complex. [DONE]
ASIDE
More on this topic: since we are now binding errors to lines
and columns, a particular error can have three relationships to that
specific location:
1. The token at that location directly
RemoveForeignElements
AttrValidator (transforms)
MakeWellFormed
2. A "component" of that token (i.e. attribute)
AttrValidator (removals)
3. A modification to that node (i.e. contents from start to end
token) as a whole
FixNesting
This needs to be marked accordingly. In the presentation, it might
make sense keep (3) separate, have (2) a sublist of (1). (1) can
be a closing tag, in which case (3) makes no sense at all, OR it
should be related with its opening tag (this may not necessarily
be possible before MakeWellFormed is run).
So, the line and column counts as our identifier, so:
$errors[$line][$col] = ...
Then, we need to identify case 1, 2 or 3. They are identified as
such:
1. Need some sort of semaphore in RemoveForeignElements, etc.
2. If CurrentAttr/CurrentCssProperty is non-null
3. Default (FixNesting, MakeWellFormed)
One consideration about (1) is that it usually is actually a
(3) modification, but we have no way of knowing about that because
of various optimizations. However, they can probably be treated
the same. The other difficulty is that (3) is never a line and
column; rather, it is a range (i.e. a duple) and telling the user
the very start of the range may confuse them. For example,
<b>Foo<div>bar</div></b>
^ ^
The node being operated on is <b>, so the error would be assigned
to the first caret, with a "node reorganized" error. Then, the
ChildDef would have submitted its own suggestions and errors with
regard to what's going in the internals. So I suppose this is
ok. :-)
Now, the structure of the earlier mentioned ... would be something
like this:
object {
type = (token|attr|property),
value, // appropriate for type
errors => array(),
sub-errors = [recursive],
}
This helps us keep things agnostic. It is also sufficiently complex
enough to warrant an object.
So, more wanking about the object format is in order. The way HTML Purifier is
currently setup, the only possible hierarchy is:
token -> attr -> css property
These relations do not exist all of the time; a comment or end token would not
ever have any attributes, and non-style attributes would never have CSS properties
associated with them.
I believe that it is worth supporting multiple paths. At some point, we might
have a hierarchy like:
* -> syntax
-> token -> attr -> css property
-> url
-> css stylesheet <style>
et cetera. Now, one of the practical implications of this is that every "node"
on our tree is well-defined, so in theory it should be possible to either 1.
create a separate class for each error struct, or 2. embed this information
directly into HTML Purifier's token stream. Embedding the information in the
token stream is not a terribly good idea, since tokens can be removed, etc.
So that leaves us with 1... and if we use a generic interface we can cut down
on a lot of code we might need. So let's leave it like this.
~~~~
Then we setup suggestions.
5. Setup a separate error class which tells the user any modifications
HTML Purifier made.
Some information about this:
Our current paradigm is to tell the user what HTML Purifier did to the HTML.
This is the most natural mode of operation, since that's what HTML Purifier
is all about; it was not meant to be a validator.
However, most other people have experience dealing with a validator. In cases
where HTML Purifier unambiguously does the right thing, simply giving the user
the correct version isn't a bad idea, but problems arise when:
- The user has such bad HTML we do something odd, when we should have just
flagged the HTML as an error. Such examples are when we do things like
remove text from directly inside a <table> tag. It was probably meant to
be in a <td> tag or be outside the table, but we're not smart enough to
realize this so we just remove it. In such a case, we should tell the user
that there was foreign data in the table, but then we shouldn't "demand"
the user remove the data; it's more of a "here's a possible way of
rectifying the problem"
- Giving line context for input is hard enough, but feasible; giving output
line context will be extremely difficult due to shifting lines; we'd probably
have to track what the tokens are and then find the appropriate out context
and it's not guaranteed to work etc etc etc.
````````````
Don't forget to spruce up output.
6. Output needs to automatically give line and column numbers, basically
"at line" on steroids. Look at W3C's output; it's ok. [PARTIALLY DONE]
- We need a standard CSS to apply (check demo.css for some starting
styling; some buttons would also be hip)
vim: et sw=4 sts=4

View File

@@ -2,23 +2,16 @@
Filter Levels
When one size *does not* fit all
The more I think about it, the less sense it makes for maintaining one huge
monolithic HTMLDefinition class. There's simply so much variation that
could go into this definition: the set of HTML good for blog entries is
definitely too large for HTML that would be allowed in blog comments. Going
from Transitional to Strict requires changes to the definition.
It makes little sense to constrain users to one set of HTML elements and
attributes and tell them that they are not allowed to mold this in
any fashion. Many users demand to be able to custom-select which elements
and attributes they want. This is fine: because HTML Purifier keeps close
track of what elements are safe to use, there is no way for them to
accidently allow an XSS-able tag.
Allowing users to specify their own whitelists is one step (implemented, btw),
but I have doubts on only doing this. Simply put, the typical programmer is too
lazy to actually go through the trouble of investigating which tags, attributes
and properties to allow. HTMLDefinition makes a big part of what HTMLPurifier
is.
The idea, then, is to setup fundamentally different set of definitions, which
can further be customized using simpler configuration options. Alternatively,
they could be implemented as configuration profiles, which simply load
a set of recommended directives to acheive a desired affect (no simpler
config options though).
However, combing through the HTML spec to make your own whitelist can
be a daunting task. HTML Purifier ought to offer pre-canned filter levels
that amateur users can select based on what they think is their use-case.
Here are some fuzzy levels you could set:
@@ -39,13 +32,17 @@ Here are some fuzzy levels you could set:
One final note: when you start axing tags that are more commonly used, you
run the risk of accidentally destroying user data, especially if the data
is incoming from a WYSIWYG eidtor that hasn't been synced accordingly. This may
is incoming from a WYSIWYG editor that hasn't been synced accordingly. This may
make forbidden element to text transformations desirable (for example, images).
== Element Risk Analysis ==
Although none of the currently supported elements presents a security
threat per-say, some can cause problems for page layouts or be
extremely complicated.
Legend:
[danger level] - regular tags / uncommon tags ~ deprecated tags
[danger level]* - rare tags
@@ -114,6 +111,10 @@ Partially presentational - table.cellpadding, table.cellspacing,
== CSS Risk Analysis ==
Currently, there is no support for fine-grained "allowed CSS" specification,
mainly because I'm lazy, partially because no one has asked for it. However,
this will be added eventually.
There are certain CSS elements that are extremely useful inline, but then
as you get to more presentation oriented styling it may not always be
appropriate to inline them.
@@ -126,8 +127,11 @@ any CSS properties that are not currently implemented (such as position).
Dangerous, can go outside container - float
Easy to abuse - font-size, font-family (font), width
Colored - background-color (background), border-color (border), color
(see proposal-colors.html)
Dramatic - border, list-style-position (list-style), margin, padding,
text-align, text-indent, text-transform, vertical-align, line-height
Dramatic elements substantially change the look of text in ways that should
probably have been reserved to other areas.
vim: et sw=4 sts=4

View File

@@ -1,42 +1,6 @@
We are going to model our I18N/L10N off of MediaWiki's system. Their's is
obviously quite complicated, so we're going to simplify it a bit for our needs.
== Structure ==
First, you have a Language object. This object contains all the localisable
message strings, as well as other important language-specific settings and
custom behavior (uppercasing, lowercasing, printing dates, formatting
numbers, etc.)
The object is constructed from two sources: subclassed versions of itself
(classes) and Message files (messages).
== General use ==
You load a language object by calling the Language::factory() function.
This function the class file for the object (taking in account fallback
languages by using the fallback langauge's object but overloading the
language key) and returns that object. Nothing else happens.
When a message/etc is requested, a lazy load initializor is called. Now the
real work starts. We're first going to take the scenario that the language
is not cached. The system loads the Messages file by:
require( $filename );
$cache = compact( self::$mLocalisationKeys );
...where self::$mLocalisationKeys is the name of variables that could be used
in the localization file. This lets you use things like:
$fallback = false;
$rtl = false;
...and easily siphon them into arrays.
Then, we load the $fallback language (if not set, English) to fill in the gaps in
the messages. There is specialized behavior for certain keys, as they can be
mergeable maps, lists or alias lists (not sure what the last one is).
== Caching ==
MediaWiki has lots of caching mechanisms built in, which make the code somewhat
@@ -96,3 +60,5 @@ Neat functionality:
- Roman numeral formatting
Items marked with a + likely need to be addressed by HTML Purifier
vim: et sw=4 sts=4

View File

@@ -2,7 +2,8 @@
Configuration Ideas
Here are some theoretical configuration ideas that we could implement some
time. Note the naming convention: %Namespace.Directive
time. Note the naming convention: %Namespace.Directive. If you want one
implemented, give us a ring, and we'll move it up the priority chain.
%Attr.RewriteFragments - if there's %Attr.IDPrefix we may want to transparently
rewrite the URLs we parse too. However, we can only do it when it's a pure
@@ -15,15 +16,13 @@ time. Note the naming convention: %Namespace.Directive
%Attr.ClassBlacklist. When it's Whitelist, only allow those in
%Attr.ClassWhitelist.
%Attr.MaxWidth,
%Attr.MaxWidth,
%Attr.MaxHeight - caps for width and height related checks.
(the hack in Pixels for an image crashing attack could be replaced by this)
%URI.AddRelNofollow - will add rel="nofollow" to all links, preventing the
spread of ill-gotten pagerank
%URI.RelativeToAbsolute - transforms all relative URIs to absolute form
%URI.HostBlacklistRegex - regexes that if matching the host are disallowed
%URI.HostWhitelist - domain names that are excluded from the host blacklist
%URI.HostPolicy - determines whether or not its reject all and then whitelist
@@ -42,3 +41,4 @@ time. Note the naming convention: %Namespace.Directive
absolute DNS. While this is actually the preferred method according to
the RFC, most people opt to use a relative domain name relative to . (root).
vim: et sw=4 sts=4

218
docs/proposal-plists.txt Normal file
View File

@@ -0,0 +1,218 @@
THE UNIVERSAL DESIGN PATTERN: PROPERTIES
Steve Yegge
Implementation:
get(name)
put(name, value)
has(name)
remove(name)
iteration, with filtering [this will be our namespaces]
parent
Representations:
- Keys are strings
- It's nice to not need to quote keys (if we formulate our own language,
consider this)
- Property not present representation (key missing)
- Frequent removal/re-add may have null help. If null is valid, use
another value. (PHP semantics are weird here)
Data structures:
- LinkedHashMap is wonderful (O(1) access and maintains order)
- Using a special property that points to the parent is usual
- Multiple inheritance possible, need rules for which to lookup first
- Iterative inheritance is best
- Consider performance!
Deletion
- Tricky problem with inheritance
- Distinguish between "not found" and "look in my parent for the property"
[Maybe HTML Purifier won't allow deletion]
Read/write asymmetry (it's correct!)
Read-only plists
- Allow ability to freeze [this is what we have already]
- Don't overuse it
Performance:
- Intern strings (PHP does this already)
- Don't be case-insensitive
- If all properties in a plist are known a-priori, you can use a "perfect"
hash function. Often overkill.
- Copy-on-read caching "plundering" reduces lookup, but uses memory and can
grow stale. Use as last resort.
- Refactoring to fields. Watch for API compatibility, system complexity,
and lack of flexibility.
- Refrigerator: external data-structure to hold plists
Transient properties:
[Don't need to worry about this]
- Use a separate plist for transient properties
- Non-numeric override; numeric should ADD
- Deletion: removeTransientProperty() and transientlyRemoveProperty()
Persistence:
- XML/JSON are good
- Text-based is good for readability, maintainability and bootstrapping
- Compressed binary format for network transport [not necessary]
- RDBMS or XML database
Querying: [not relevant]
- XML database is nice for XPath/XQuery
- jQuery for JSON
- Just load it all into a program
Backfills/Data integrity:
- Use usual methods
- Lazy backfill is a nice hack
Type systems:
- Flags: ReadOnly, Permanent, DontEnum
- Typed properties isn't that useful [It's also Not-PHP]
- Seperate meta-list of directive properties IS useful
- Duck typing is useful for systems designed fully around properties pattern
Trade-off:
+ Flexibility
+ Extensibility
+ Unit-testing/prototype-speed
- Performance
- Data integrity
- Navagability/Query-ability
- Reversability (hard to go back)
HTML Purifier
We are not happy with our current system of defining configuration directives,
because it has become clear that things will get a lot nicer if we allow
multiple namespaces, and there are some features that naturally lend themselves
to inheritance, which we do not really support well.
One of the considered implementation changes would be to go from a structure
like:
array(
'Namespace' => array(
'Directive' => 'val1',
'Directive2' => 'val2',
)
)
to:
array(
'Namespace.Directive' => 'val1',
'Namespace.Directive2' => 'val2',
)
The below implementation takes more memory, however, and it makes it a bit
complicated to grab all values from a namespace.
The alternate implementation choice is to allow nested plists. This keeps
iteration easy, but is problematic for inheritance (it would be difficult
to distinguish a plist from an array) and retrieval (when specifying multiple
namespaces we would need some multiple de-referencing).
----
We can bite the performance hit, and just do iteration with filter
(the strncmp call should be relatively cheap). Then, users should be able
to optimize doing something like:
$config = HTMLPurifier_Config::createDefault();
if (!file_exists('config.php')) {
// set up $config
$config->save('config.php');
} else {
$config->load('config.php');
}
Or maybe memcache, or something. This means that "// set up $config" must
not have any dynamic parts, or the user has to invalidate the cache when
they do update it. We have to think about this a little more carefully; the
file call might be more expensive.
----
This might get expensive, however, when we actually care about iterating
over the configuration and want the actual values. So what about nesting the
lists?
"ns.sub.directive" => values['ns']['sub']['directive']
We can distinguish between plists and arrays by using ArrayObjects for the
plists, and regular arrays for the arrays? Alternatively, use ArrayObjects
for the arrays, and regular arrays for the plists.
----
Implementation demands, and what has caused them:
1. DefinitionCache, the HTML, CSS and URI namespaces have caches attached to them
Results:
- getBatchSerial()
- getBatch() : in general, the ability to traverse just a namespace
2. AutoFormat/Filter, this is a plugin architecture, directives not hard-coded
- getBatch()
3. Configuration form
- Namespaces used to organize directives
Other than that, we have a pure plist. PERHAPS we should maintain separate things
for these different demands.
Issue 2: Directives for configuring the plugins are regular plists, but
when enabling them, while it's "plist-ish", what you're really doing is adding
them to an array of "autoformatters"/"filters" to enable. We can setup
magic BC as well as in the new interface, but there should also be an
add('AutoFormat', 'AutoParagraph'); which does the right thing.
One thing to consider is whether or not inheritance rules will apply to these.
I'd say yes. That means that they're still plisty, in fact, the underlying
implementation will probably be a plist. However, they will get their OWN
plists, and will NOT support nesting.
Issue 1: Our current implementation is generally not efficient; md5(serialize($foo))
is pretty expensive. So, I don't think there will be any problems if it
gets "less" efficient, as long as we give users a properly fast alternative;
DefinitionRev gives us a way to do this, by simply telling the user they must
update it whenever they update Configuration directives as well. (There are
obvious BC concerns here).
In such a case, we simply iterate over our plist (performing full retrievals
for each value), grab the entries we care about, and then serialize and hash.
It's going to be slow either way, due to the ability of plists to inherit.
If we ksort(), we don't have to traverse the entire array, however, the
cost of a ksort() call may not be worth it.
At this point, last time, I started worrying about the performance implications
of allowing inheritance, and wondering whether or not I wanted to squash
the plist. At first blush, our code might be under the assumption that
accessing properties is cheap; but actually we prefer to copy out the value
into a member variable if it's going to be used many times. With this is mind
I don't think CPU consumption from a few nested function calls is going to
be a problem. We *are* going to enforce a function only interface.
The next issue at hand is how we're going to manage the "special" plists,
which should still be able to be inherited. Basically, it means that multiple
plists would be attached to the configuration object, which is not the
best for memory performance. The alternative is to keep them all in one
big plist, and then eat the one-time cost of traversing the entire plist
to grab the appropriate values.
I think at this point we can write the generic interface, and then set up separate
plists if that ends up being necessary for performance (it probably won't.) Now
lets code our generic plist implementation.
----
Iterating over the plist presents some problems. The way we've chosen to solve
this is to squash all of the parents.
----
But I don't need iteration.
vim: et sw=4 sts=4

View File

@@ -0,0 +1,50 @@
Handling Content Model Changes
1. Context
The distinction between Transitional and Strict document types is somewhat
of an anomaly in the lineage of XHTML document types (following 1.0, no
doctypes do not have flavors: instead, modularization is used to let
document authors vary their elements). This transition is usually quite
straight-forward, as W3C usually deprecates attributes or elements, which
are quite easily handled using tag and attribute transforms.
However, for two elements, <blockquote>, <body> and <address>, W3C elected
to also change the content model. <blockquote> and <body> originally
accepted both inline and block elements, but in the strict doctype they
only allow block elements. With <address>, the situation is inverted:
<p> tags were now forbidden from appearing within this tag.
2. Current situation
Currently, HTML Purifier treats <blockquote> specially during Tidy mode
using a custom ChildDef class StrictBlockquote. StrictBlockquote
operates similarly to Required, except that when it encounters an inline
element, it will wrap it in a block tag (as specified by
%HTML.BlockWrapper, the default is <p>). The naming suggests it can
only be used for <blockquote>s, although it may be possible to
genericize it to work on other cases of this nature (this would be of
little practical application, as no other element in XHTML 1.1 or earlier
has a block-only content model).
Tidy currently contains no custom, lenient implementation for <address>.
If one were to be written, it would likely operate on the principle that,
when a <p> tag were to be encountered, it would be replaced with a
leading and trailing <br /> tag (the contents of <p>, being inline, are
not an issue). There is no prior work with this sort of operation.
3. Outside applicability
There are a number of other elements that contain restrictive content
models, such as <ul> or <span> (the latter is restrictive in that it
does not allow block elements). In the former case, an errant node
is eliminated completely, in the latter case, the text of the node
would is preserved (as the parent node does allow PCDATA). Custom
content model implementations probably are not the best way of handling
these cases, instead, node bubbling should be implemented instead.
vim: et sw=4 sts=4

30
docs/ref-css-length.txt Normal file
View File

@@ -0,0 +1,30 @@
CSS Length Reference
To bound, or not to bound, that is the question
It's quite a reasonable request, really, and it's already been implemented
for HTML. That is, length bounding. It makes little sense to let users
define text blocks that have a font-size of 63,360 inches (that's a mile,
by the way) or a width of forty-fold the parent container.
But it's a little more complicated then that. There are multiple units
one can use, and we have to a little unit conversion to get things working.
Here's what we have:
Absolute:
1 in ~= 2.54 cm
1 cm = 10 mm
1 pt = 1/72 in
1 pc = 12 pt
Relative:
1 em ~= 10.0667 px
1 ex ~= 0.5 em, though Mozilla Firefox says 1 ex = 6px
1 px ~= 1 pt
Watch out: font-sizes can also be nested to get successively larger
(although I do not relish having to keep track of context font-sizes,
this may be necessary, especially for some of the more advanced features
for preventing things like white on white).
vim: et sw=4 sts=4

View File

@@ -15,7 +15,7 @@
<div id="filing">Filed under Reference</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://hp.jpsband.org/">HTML Purifier</a> End-User Documentation</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
<p>Many thanks to the DevNetwork community for answering questions,
theorizing about design, and offering encouragement during
@@ -40,6 +40,8 @@ the development of this library in these forum threads:</p>
<p>...as well as any I may have forgotten.</p>
<div id="version">$Id$</div>
</body>
</html>
</html>
<!-- vim: et sw=4 sts=4
-->

View File

@@ -0,0 +1,166 @@
The Modularization of HTMLDefinition in HTML Purifier
WARNING: This document was drafted before the implementation of this
system, and some implementation details may have evolved over time.
HTML Purifier uses the modularization of XHTML
<http://www.w3.org/TR/xhtml-modularization/> to organize the internals
of HTMLDefinition into a more manageable and extensible fashion. Rather
than have one super-object, HTMLDefinition is split into HTMLModules,
each of which are responsible for defining elements, their attributes,
and other properties (for a more indepth coverage, see
/library/HTMLPurifier/HTMLModule.php's docblock comments). These modules
are managed by HTMLModuleManager.
Modules that we don't support but could support are:
* 5.6. Table Modules
o 5.6.1. Basic Tables Module [?]
* 5.8. Client-side Image Map Module [?]
* 5.9. Server-side Image Map Module [?]
* 5.12. Target Module [?]
* 5.21. Name Identification Module [deprecated]
These modules would be implemented as "unsafe":
* 5.2. Core Modules
o 5.2.1. Structure Module
* 5.3. Applet Module
* 5.5. Forms Modules
o 5.5.1. Basic Forms Module
o 5.5.2. Forms Module
* 5.10. Object Module
* 5.11. Frames Module
* 5.13. Iframe Module
* 5.14. Intrinsic Events Module
* 5.15. Metainformation Module
* 5.16. Scripting Module
* 5.17. Style Sheet Module
* 5.19. Link Module
* 5.20. Base Module
We will not be using W3C's XML Schemas or DTDs directly due to the lack
of robust tools for handling them (the main problem is that all the
current parsers are usually PHP 5 only and solely-validating, not
correcting).
This system may be generalized and ported over for CSS.
== General Use-Case ==
The outwards API of HTMLDefinition has been largely preserved, not
only for backwards-compatibility but also by design. Instead,
HTMLDefinition can be retrieved "raw", in which it loads a structure
that closely resembles the modules of XHTML 1.1. This structure is very
dynamic, making it easy to make cascading changes to global content
sets or remove elements in bulk.
However, once HTML Purifier needs the actual definition, it retrieves
a finalized version of HTMLDefinition. The finalized definition involves
processing the modules into a form that it is optimized for multiple
calls. This final version is immutable and, even if editable, would
be extremely hard to change.
So, some code taking advantage of the XHTML modularization may look
like this:
<?php
$config = HTMLPurifier_Config::createDefault();
$def =& $config->getHTMLDefinition(true); // reference to raw
$def->addElement('marquee', 'Block', 'Flow', 'Common');
$purifier = new HTMLPurifier($config);
$purifier->purify($html); // now the definition is finalized
?>
== Inclusions ==
One of the nice features of HTMLDefinition is that piggy-backing off
of global attribute and content sets is extremely easy to do.
=== Attributes ===
HTMLModule->elements[$element]->attr stores attribute information for the
specific attributes of $element. This is quite close to the final
API that HTML Purifier interfaces with, but there's an important
extra feature: attr may also contain a array with a member index zero.
<?php
HTMLModule->elements[$element]->attr[0] = array('AttrSet');
?>
Rather than map the attribute key 0 to an array (which should be
an AttrDef), it defines a number of attribute collections that should
be merged into this elements attribute array.
Furthermore, the value of an attribute key, attribute value pair need
not be a fully fledged AttrDef object. They can also be a string, which
signifies a AttrDef that is looked up from a centralized registry
AttrTypes. This allows more concise attribute definitions that look
more like W3C's declarations, as well as offering a centralized point
for modifying the behavior of one attribute type. And, of course, the
old method of manually instantiating an AttrDef still works.
=== Attribute Collections ===
Attribute collections are stored and processed in the AttrCollections
object, which is responsible for performing the inclusions signified
by the 0 index. These attribute collections, too, are mutable, by
using HTMLModule->attr_collections. You may add new attributes
to a collection or define an entirely new collection for your module's
use. Inclusions can also be cumulative.
Attribute collections allow us to get rid of so called "global attributes"
(which actually aren't so global).
=== Content Models and ChildDef ===
An implementation of the above-mentioned attributes and attribute
collections was applied to the ChildDef system. HTML Purifier uses
a proprietary system called ChildDef for performance and flexibility
reasons, but this does not line up very well with W3C's notion of
regexps for defining the allowed children of an element.
HTMLPurifier->elements[$element]->content_model and
HTMLPurifier->elements[$element]->content_model_type store information
about the final ChildDef that will be stored in
HTMLPurifier->elements[$element]->child (we use a different variable
because the two forms are sufficiently different).
$content_model is an abstract, string representation of the internal
state of ChildDef, while $content_model_type is a string identifier
of which ChildDef subclass to instantiate. $content_model is processed
by substituting all content set identifiers (capitalized element names)
with their contents. It is then parsed and passed into the appropriate
ChildDef class, as defined by the ContentSets->getChildDef() or the
custom fallback HTMLModule->getChildDef() for custom child definitions
not in the core.
You'll need to use these facilities if you plan on referencing a content
set like "Inline" or "Block", and using them is recommended even if you're
not due to their conciseness.
A few notes on $content_model: it's structure can be as complicated
as you want, but the pipe symbol (|) is reserved for defining possible
choices, due to the content sets implementation. For example, a content
model that looks like:
"Inline -> Block -> a"
...when the Inline content set is defined as "span | b" and the Block
content set is defined as "div | blockquote", will expand into:
"span | b -> div | blockquote -> a"
The custom HTMLModule->getChildDef() function will need to be able to
then feed this information to ChildDef in a usable manner.
=== Content Sets ===
Content sets can be altered using HTMLModule->content_sets, an associative
array of content set names to content set contents. If the content set
already exists, your values are appended on to it (great for, say,
registering the font tag as an inline element), otherwise it is
created. They are substituted into content_model.
vim: et sw=4 sts=4

View File

@@ -1,37 +0,0 @@
Loose versus Strict
Changes from one doctype to another
There are changes. Wow, how insightful. Not everything changed is relevant
to HTML Purifier, though, so let's take a look:
== Major incompatibilities ==
[done] BLOCKQUOTE changes from 'flow' to 'block'
current behavior: inline inner contents should not be nuked, block-ify as necessary
[partially-done] U, S, STRIKE cut
current behavior: removed completely
projected behavior: replace with appropriate inline span + CSS
[done] ADDRESS from potpourri to Inline (removes p tags)
current behavior: block tags silently dropped
ideal behavior: replace tags with something like <br>. (not high priority)
== Things we can loosen up ==
Tags DIR, MENU, CENTER, ISINDEX, FONT, BASEFONT? allowed in loose
current behavior: transform to strict-valid forms
Attributes allowed in loose (see attribute transforms in 'dev-progress.html')
current behavior: projected to transform into strict-valid forms
== Periphery issues ==
A tag's attribute 'target' (for selecting frames) cut
current behavior: not allowed at all
projected behavior: use loose doctype if needed, needs valid values
[done] OL/LI tag's attribute 'start'/'value' (for renumbering lists) cut
current behavior: no substitute, just delete when in strict, allow in loose
Attribute 'name' deprecated in favor of 'id'
current behavior: dropped silently
projected behavior: create proper AttrTransform (currently not allowed at all)
[done] PRE tag allows SUB/SUP? (strict dtd comment vs syntax, loose disallows)
current behavior: disallow as usual

View File

@@ -18,5 +18,9 @@ HTML Purifier context.
<listing>, monospace pre-variant (extremely rare)
<plaintext>, escapes all tags to the end of document
<ruby> and friends, (more research needed, appears to be XHTML 1.1 markup)
<xmp>, monospace, replace with pre
These should be put into their own Tidy module, not loaded by default(?). These
all qualify as "lenient" transforms.
vim: et sw=4 sts=4

View File

@@ -1,37 +0,0 @@
Is HTML Purifier Strict or Transitional?
A little bit of helpful guidance
Despite the fact that HTML Purifier professes to support both transitional and
strict HTML, it rejects a lot of attributes and elements that are actually, indeed,
valid. You can investigate progress.html to find out precisely what we
are doing to these *deprecated* attributes.
However, users have found that Strict HTML imposes some quite unreasonable
restrictions on certain things. The start and value attributes in ol and
li (respectively) perhaps are the most contested. There's is currently no
widely supported browser method short of JavaScript that can replace these
two deprecated elements. It behooves us to allow these deprecated
attributes when the output is transitional.
Fortunantely, that's the only real bugger case. The others have near-perfect
CSS equivalents, and were presentational anyway. However, the other question
pops up: should we always convert these to the CSS forms when 1. the spec
allows them anyway and 2. older browsers support them better? After all, the
whole point about CSS is to seperate styling from content, so inline styling
doesn't solve that problem.
It's an icky question, and we'll have to deal with it as more and more
transforms get implemented. As of right now, however, we currently support
these loose-only constructs in loose mode:
- <ul start="1">, <li value="1"> attributes
- <u>, <strike>, <s> tags
- flow children in <blockquote>
- mixed children in <address>
The changed child definitions as well as the ul.start li.value are the most
compelling reasons why loose should be used. We may want offer disabling <u>,
<strike> and <s> by themselves. We may also want to offer no pre-emptive
deprecated conversions. This all must be unified.

View File

@@ -2,8 +2,25 @@
Web Hypertext Application Technology Working Group
WHATWG
I don't think we need to worry about them. Untrusted users shouldn't be
submitting applications, eh? But if some interesting attribute pops up in
their spec, and might be worth supporting, stick it here.
== HTML 5 ==
(none so far, as you can see)
URL: http://www.whatwg.org/specs/web-apps/current-work/
HTML 5 defines a kaboodle of new elements and attributes, as well as
some well-defined, "quirks mode" HTML parsing. Although WHATWG professes
to be targeted towards web applications, many of their semantic additions
would be quite useful in regular documents. Eventually, HTML
Purifier will need to audit their lists and figure out what changes need
to be made. This process is complicated by the fact that the WHATWG
doesn't buy into W3C's modularization of XHTML 1.1: we may need
to remodularize HTML 5 (probably done by section name). No sense in
committing ourselves till the spec stabilizes, though.
More immediately speaking though, however, is the well-defined parsing
behavior that HTML 5 adds. While I have little interest in writing
another DirectLex parser, other parsers like ph5p
<http://jero.net/lab/ph5p/> can be adapted to DOMLex to support much more
flexible HTML parsing (a cool feature I've seen is how they resolve
<b>bold<i>both</b>italic</i>).
vim: et sw=4 sts=4

View File

@@ -1,21 +0,0 @@
Getting XHTML 1.1 Working
It's quite simple, according to <http://www.w3.org/TR/xhtml11/changes.html>
1. Scratch lang entirely in favor of xml:lang
2. Scratch name entirely in favor of id (partially-done)
3. Support Ruby <http://www.w3.org/TR/2001/REC-ruby-20010531/>
...but that's only an informative section. More things to do:
1. Scratch style attribute (it's deprecated)
2. Be module-aware (this might entail intelligent grouping in the definition
and allowing users to specifically remove certain modules (see 5))
3. Cross-reference minimal content models with existing DTDs and determine
changes (todo)
4. Watch out for the Legacy Module
<http://www.w3.org/TR/2001/REC-xhtml-modularization-20010410/abstract_modules.html#s_legacymodule>
5. Let users specify their own custom modules
6. Study Modularization document
<http://www.w3.org/TR/2001/REC-xhtml-modularization-20010410/>

10
docs/specimens/LICENSE Normal file
View File

@@ -0,0 +1,10 @@
Licensing of Specimens
Some files in this directory have different licenses:
windows-live-mail-desktop-beta.html - donated by laacz, public domain
img.png - LGPL, from <http://commons.wikimedia.org/wiki/Image:Pastille_chrome.png>
All other files are by me, and are licensed under LGPL.
vim: et sw=4 sts=4

View File

@@ -0,0 +1,165 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>HTML align attribute to CSS - HTML Purifier Specimen</title>
<style type="text/css">
div.container {position:relative;height:110px;}
div.container.legend .test {text-align:center;line-height:100px;}
div.test {width:100px;height:100px;border:1px solid black;
position:absolute;top:10px;}
div.test.html {left:10px;}
div.test.css {left:140px;}
table {background:#F00;}
img {border:1px solid #000;}
hr {width:50px;}
div.segment {width:250px; float:left; margin-top:1em;}
</style>
</head>
<body>
<h1>HTML align attribute to CSS</h1>
<p>Inspect source for methodology.</p>
<div class="container legend">
<div class="test html">
HTML
</div>
<div class="test css">
CSS
</div>
</div>
<div class="segment">
<h2>table.align</h2>
<h3>left</h3>
<div class="container">
<div class="test html">
a<table align="left"><tr><td>O</td></tr></table>a
</div>
<div class="test css">
a<table style="float:left;"><tr><td>O</td></tr></table>a
</div>
</div>
<h3>center</h3>
<div class="container">
<div class="test html">
a<table align="center"><tr><td>O</td></tr></table>a
</div>
<div class="test css">
a<table style="margin-left:auto; margin-right:auto;"><tr><td>O</td></tr></table>a
</div>
</div>
<h3>right</h3>
<div class="container">
<div class="test html">
a<table align="right"><tr><td>O</td></tr></table>a
</div>
<div class="test css">
a<table style="float:right;"><tr><td>O</td></tr></table>a
</div>
</div>
</div>
<!-- ################################################################## -->
<div class="segment">
<h2>img.align</h2>
<h3>left</h3>
<div class="container">
<div class="test html">
a<img src="img.png" align="left">a
</div>
<div class="test css">
a<img src="img.png" style="float:left;">a
</div>
</div>
<h3>right</h3>
<div class="container">
<div class="test html">
a<img src="img.png" align="right">a
</div>
<div class="test css">
a<img src="img.png" style="float:right;">a
</div>
</div>
<h3>bottom</h3>
<div class="container">
<div class="test html">
a<img src="img.png" align="bottom">a
</div>
<div class="test css">
a<img src="img.png" style="vertical-align:baseline;">a
</div>
</div>
<h3>middle</h3>
<div class="container">
<div class="test html">
a<img src="img.png" align="middle">a
</div>
<div class="test css">
a<img src="img.png" style="vertical-align:middle;">a
</div>
</div>
<h3>top</h3>
<div class="container">
<div class="test html">
a<img src="img.png" align="top">a
</div>
<div class="test css">
a<img src="img.png" style="vertical-align:top;">a
</div>
</div>
</div>
<!-- ################################################################## -->
<div class="segment">
<h2>hr.align</h2>
<h3>left</h3>
<div class="container">
<div class="test html">
<hr align="left" />
</div>
<div class="test css">
<hr style="margin-right:auto; margin-left:0; text-align:left;" />
</div>
</div>
<h3>center</h3>
<div class="container">
<div class="test html">
<hr align="center" />
</div>
<div class="test css">
<hr style="margin-right:auto; margin-left:auto; text-align:center;" />
</div>
</div>
<h3>right</h3>
<div class="container">
<div class="test html">
<hr align="right" />
</div>
<div class="test css">
<hr style="margin-right:0; margin-left:auto; text-align:right;" />
</div>
</div>
</div>
</body>
</html>

BIN
docs/specimens/img.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 2.1 KiB

View File

@@ -0,0 +1,129 @@
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii">
<meta name=Generator content="Microsoft Word 12 (filtered medium)">
<!--[if !mso]>
<style>
v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
..shape {behavior:url(#default#VML);}
</style>
<![endif]-->
<style>
<!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
{font-family:Verdana;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:10.0pt;
font-family:"Verdana","sans-serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p.MsoAcetate, li.MsoAcetate, div.MsoAcetate
{mso-style-priority:99;
mso-style-link:"Balloon Text Char";
margin:0cm;
margin-bottom:.0001pt;
font-size:8.0pt;
font-family:"Tahoma","sans-serif";}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Verdana","sans-serif";
color:windowtext;}
span.BalloonTextChar
{mso-style-name:"Balloon Text Char";
mso-style-priority:99;
mso-style-link:"Balloon Text";
font-family:"Tahoma","sans-serif";}
..MsoChpDefault
{mso-style-type:export-only;}
@page Section1
{size:612.0pt 792.0pt;
margin:70.85pt 70.85pt 70.85pt 70.85pt;}
div.Section1
{page:Section1;}
-->
</style>
<!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="2050" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang=NL link=blue vlink=purple>
<div class=Section1>
<p class=MsoNormal><img width=1277 height=994 id="Picture_x0020_1"
src="cid:image001.png@01C8CBDF.5D1BAEE0"><o:p></o:p></p>
<p class=MsoNormal><o:p>&nbsp;</o:p></p>
<p class=MsoNormal><b>Name<o:p></o:p></b></p>
<p class=MsoNormal>E-mail : <a href="mailto:mail@example.com"><span
style='color:windowtext'>mail@example.com</span></a><o:p></o:p></p>
<p class=MsoNormal><o:p>&nbsp;</o:p></p>
<p class=MsoNormal><b>Company<o:p></o:p></b></p>
<p class=MsoNormal>Address 1<o:p></o:p></p>
<p class=MsoNormal>Address 2<o:p></o:p></p>
<p class=MsoNormal><o:p>&nbsp;</o:p></p>
<p class=MsoNormal>Telefoon&nbsp; : +xx xx xxx xxx xx <span style='color:black'><o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US style='color:black'>Fax&nbsp; : +xx xx xxx xx xx<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US style='color:black'>Internet : </span><span
style='color:black'><a href="http://www.example.com/"><span lang=EN-US
style='color:black'>http://www.example.com</span></a></span><span
lang=EN-US style='color:black'><o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US style='color:black'>Kamer van koophandel
xxxxxxxxx<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US style='color:black'><o:p>&nbsp;</o:p></span></p>
<p class=MsoNormal><span lang=EN-US style='font-size:7.5pt;color:black'>Op deze
e-mail is een disclaimer van toepassing, ga naar </span><span lang=EN-US
style='font-size:7.5pt'><a
href="http://www.example.com/disclaimer"><span
style='color:black'>www.example.com/disclaimer</span></a><br>
<span style='color:black'>A disclaimer is applicable to this email, please
refer to </span><a href="http://www.example.com/disclaimer"><span
style='color:black'>www.example.com/disclaimer</span></a><o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US><o:p>&nbsp;</o:p></span></p>
</div>
</body>
</html>

View File

@@ -0,0 +1,74 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML ChildAreas="4" xmlns:canvas><HEAD>
<META http-equiv=Content-Type content=text/html;charset=windows-1257>
<STYLE></STYLE>
<META content="MSHTML 6.00.6000.16414" name=GENERATOR></HEAD>
<BODY id=MailContainerBody
style="PADDING-RIGHT: 10px; PADDING-LEFT: 10px; FONT-SIZE: 10pt; COLOR: #000000; PADDING-TOP: 15px; FONT-FAMILY: Arial"
bgColor=#ff6600 leftMargin=0 background="" topMargin=0
name="Compose message area" acc_role="text" CanvasTabStop="false">
<DIV
style="BORDER-TOP: #dddddd 1px solid; FONT-SIZE: 10pt; WIDTH: 100%; MARGIN-RIGHT: 10px; PADDING-TOP: 5px; BORDER-BOTTOM: #dddddd 1px solid; FONT-FAMILY: Verdana; HEIGHT: 25px; BACKGROUND-COLOR: #ffffff"><NOBR><SPAN
title="View a slideshow of the pictures in this e-mail message."
style="PADDING-RIGHT: 20px"><A style="COLOR: #0088e4"
href="http://g.msn.com/5meen_us/171?path=/photomail/{6fc0065f-ffdd-4ca6-9a4c-cc5a93dc122f}&amp;image=47D7B182CFEFB10!127&amp;imagehi=47D7B182CFEFB10!125&amp;CID=323550092004883216">Play
slideshow </A></SPAN><SPAN style="COLOR: #909090"><SPAN>|</SPAN><SPAN
style="PADDING-LEFT: 20px"> Download the highest quality version of a picture by
clicking the + above it </SPAN></SPAN></NOBR></DIV>
<DIV
style="PADDING-RIGHT: 5px; PADDING-LEFT: 7px; PADDING-BOTTOM: 2px; WIDTH: 100%; PADDING-TOP: 2px">
<OL>
<LI><IMG title="Angry smile emoticon"
style="FLOAT: none; MARGIN: 0px; POSITION: static" tabIndex=-1
alt="Angry smile emoticon" src="cid:49F0C856199E4D688D2D740680733D74@wc"
MSNNonUserImageOrEmoticon="true">Un ka <FONT style="BACKGROUND-COLOR: #800000"
color=#cc99ff><STRONG>Tev</STRONG></FONT> iet, un ko tu dari?
<LI>Aha!</LI></OL>
<UL>
<LI>Buletets
<LI>
<DIV align=justify><A title=http://laacz.lv/blog/
href="http://laacz.lv/blog/">http://laacz.lv/blog/</A> un <A
title=http://google.com/ href="http://google.com/">gugle</A></DIV>
<LI>Sarakstucitis</LI></UL></DIV><SPAN><SPAN xmlns:canvas="canvas-namespace-id"
layoutEmptyTextWellFont="Tahoma"><SPAN
style="MARGIN-BOTTOM: 15px; OVERFLOW: visible; HEIGHT: 16px"></SPAN><SPAN
style="MARGIN-BOTTOM: 25px; VERTICAL-ALIGN: top; OVERFLOW: visible; MARGIN-RIGHT: 25px; HEIGHT: 234px">
<TABLE style="DISPLAY: inline">
<TBODY>
<TR>
<TD>
<DIV
style="FONT-WEIGHT: bold; FONT-SIZE: 12pt; FONT-FAMILY: arial; TEXT-ALIGN: center"><A
id=HiresARef
title="Click here to view or download a high resolution version of this picture"
style="COLOR: #0088e4; TEXT-DECORATION: none"
href="http://byfiles.storage.msn.com/x1pMvt0I80jTgT6DuaCpEMbprX3nk3jNv_vjigxV_EYVSMyM_PKgEvDEUtuNhQC-F-23mTTcKyqx6eGaeK2e_wMJ0ikwpDdFntk4SY7pfJUv2g2Ck6R2S2vAA?download">+</A></DIV>
<DIV
title="Click here to view the full image using the online photo viewer."
style="DISPLAY: inline; OVERFLOW: hidden; WIDTH: 140px; HEIGHT: 140px"><A
href="http://g.msn.com/5meen_us/171?path=/photomail/{6fc0065f-ffdd-4ca6-9a4c-cc5a93dc122f}&amp;image=47D7B182CFEFB10!127&amp;imagehi=47D7B182CFEFB10!125&amp;CID=323550092004883216"
border="0"><IMG
style="MARGIN-TOP: 15px; DISPLAY: inline-block; MARGIN-LEFT: 0px"
height=109 src="cid:006A71303B80404E9FB6184E55D6A446@wc" width=140
border=0></A></DIV></TD></TR>
<TR>
<TD>
<DIV
style="FONT-SIZE: 10pt; WIDTH: 140px; FONT-FAMILY: verdana; TEXT-ALIGN: center"><EM><STRONG>This
<U>is </U></STRONG><U>tit</U>le</EM> fo<STRONG>r <FONT
face="Arial Black">t<FONT color=#800000 size=7>h<U>i</U></FONT>s
</FONT>picture</STRONG></DIV></TD></TR></TBODY></TABLE></SPAN></SPAN></SPAN>
<DIV
style="PADDING-RIGHT: 5px; PADDING-LEFT: 7px; PADDING-BOTTOM: 2px; WIDTH: 100%; PADDING-TOP: 2px; HEIGHT: 50px">
<DIV>&nbsp;</DIV></DIV>
<DIV
style="BORDER-TOP: #dddddd 1px solid; FONT-SIZE: 10pt; MARGIN-BOTTOM: 10px; WIDTH: 100%; COLOR: #909090; MARGIN-RIGHT: 10px; PADDING-TOP: 9px; FONT-FAMILY: Verdana; HEIGHT: 42px; BACKGROUND-COLOR: #ffffff"><NOBR><SPAN
title="Join Windows Live to share photos using Windows Live Photo E-mail.">Online
pictures are available for 30 days. <A style="COLOR: #0088e4"
href="http://g.msn.com/5meen_us/175">Get Windows Live Mail desktop to create
your own photo e-mails. </A></SPAN></NOBR></DIV></BODY></HTML>

View File

@@ -25,6 +25,7 @@ h4 {font-family:sans-serif; font-size:0.9em; font-weight:bold; }
.aside {margin-left:2em; font-family:sans-serif; font-size:0.9em; }
blockquote .label {font-weight:bold; font-size:1em; margin:0 0 .1em;
border-bottom:1px solid #CCC;}
.emphasis {font-weight:bold; text-align:center; font-size:1.3em;}
/* A regular table */
.table {border-collapse:collapse; border-bottom:2px solid #888; margin-left:2em; }
@@ -32,6 +33,9 @@ blockquote .label {font-weight:bold; font-size:1em; margin:0 0 .1em;
.table thead th:first-child {-moz-border-radius-topleft:1em;}
.table tbody td {border-bottom:1px solid #CCC; padding-right:0.6em;padding-left:0.6em;}
/* A quick table*/
table.quick tbody th {text-align:right; padding-right:1em;}
/* Category of the file */
#filing {font-weight:bold; font-size:smaller; }
@@ -42,3 +46,31 @@ blockquote .label {font-weight:bold; font-size:1em; margin:0 0 .1em;
/* Contains, without exception, $Id$, for SVN version info. */
#version {text-align:right; font-style:italic; margin:2em 0;}
#toc ol ol {list-style-type:lower-roman;}
#toc ol {list-style-type:decimal;}
#toc {list-style-type:upper-alpha;}
q {
behavior: url(fixquotes.htc); /* IE fix */
quotes: '\201C' '\201D' '\2018' '\2019';
}
q:before {
content: open-quote;
}
q:after {
content: close-quote;
}
/* Marks off implementation details interesting only to the person writing
the class described in the spec. */
.technical {margin-left:2em; }
.technical:before {content:"Technical note: "; font-weight:bold; color:#061; }
/* Marks off sections that are lacking. */
.fixme {margin-left:2em; }
.fixme:before {content:"Fix me: "; font-weight:bold; color:#C00; }
#applicability {margin: 1em 5%; font-style:italic;}
/* vim: et sw=4 sts=4 */

View File

@@ -0,0 +1,91 @@
<?php
/**
* Decorator/extender XSLT processor specifically for HTML documents.
*/
class ConfigDoc_HTMLXSLTProcessor
{
/**
* Instance of XSLTProcessor
*/
protected $xsltProcessor;
public function __construct($proc = false)
{
if ($proc === false) $proc = new XSLTProcessor();
$this->xsltProcessor = $proc;
}
/**
* @note Allows a string $xsl filename to be passed
*/
public function importStylesheet($xsl)
{
if (is_string($xsl)) {
$xsl_file = $xsl;
$xsl = new DOMDocument();
$xsl->load($xsl_file);
}
return $this->xsltProcessor->importStylesheet($xsl);
}
/**
* Transforms an XML file into compatible XHTML based on the stylesheet
* @param $xml XML DOM tree, or string filename
* @return string HTML output
* @todo Rename to transformToXHTML, as transformToHTML is misleading
*/
public function transformToHTML($xml)
{
if (is_string($xml)) {
$dom = new DOMDocument();
$dom->load($xml);
} else {
$dom = $xml;
}
$out = $this->xsltProcessor->transformToXML($dom);
// fudges for HTML backwards compatibility
// assumes that document is XHTML
$out = str_replace('/>', ' />', $out); // <br /> not <br/>
$out = str_replace(' xmlns=""', '', $out); // rm unnecessary xmlns
if (class_exists('Tidy')) {
// cleanup output
$config = array(
'indent' => true,
'output-xhtml' => true,
'wrap' => 80
);
$tidy = new Tidy;
$tidy->parseString($out, $config, 'utf8');
$tidy->cleanRepair();
$out = (string) $tidy;
}
return $out;
}
/**
* Bulk sets parameters for the XSL stylesheet
* @param array $options Associative array of options to set
*/
public function setParameters($options)
{
foreach ($options as $name => $value) {
$this->xsltProcessor->setParameter('', $name, $value);
}
}
/**
* Forward any other calls to the XSLT processor
*/
public function __call($name, $arguments)
{
call_user_func_array(array($this->xsltProcessor, $name), $arguments);
}
}
// vim: et sw=4 sts=4

164
extras/FSTools.php Normal file
View File

@@ -0,0 +1,164 @@
<?php
/**
* Filesystem tools not provided by default; can recursively create, copy
* and delete folders. Some template methods are provided for extensibility.
*
* @note This class must be instantiated to be used, although it does
* not maintain state.
*/
class FSTools
{
private static $singleton;
/**
* Returns a global instance of FSTools
*/
public static function singleton()
{
if (empty(FSTools::$singleton)) FSTools::$singleton = new FSTools();
return FSTools::$singleton;
}
/**
* Sets our global singleton to something else; useful for overloading
* functions.
*/
public static function setSingleton($singleton)
{
FSTools::$singleton = $singleton;
}
/**
* Recursively creates a directory
* @param string $folder Name of folder to create
* @note Adapted from the PHP manual comment 76612
*/
public function mkdirr($folder)
{
$folders = preg_split("#[\\\\/]#", $folder);
$base = '';
for($i = 0, $c = count($folders); $i < $c; $i++) {
if(empty($folders[$i])) {
if (!$i) {
// special case for root level
$base .= DIRECTORY_SEPARATOR;
}
continue;
}
$base .= $folders[$i];
if(!is_dir($base)){
$this->mkdir($base);
}
$base .= DIRECTORY_SEPARATOR;
}
}
/**
* Copy a file, or recursively copy a folder and its contents; modified
* so that copied files, if PHP, have includes removed
* @note Adapted from http://aidanlister.com/repos/v/function.copyr.php
*/
public function copyr($source, $dest)
{
// Simple copy for a file
if (is_file($source)) {
return $this->copy($source, $dest);
}
// Make destination directory
if (!is_dir($dest)) {
$this->mkdir($dest);
}
// Loop through the folder
$dir = $this->dir($source);
while ( false !== ($entry = $dir->read()) ) {
// Skip pointers
if ($entry == '.' || $entry == '..') {
continue;
}
if (!$this->copyable($entry)) {
continue;
}
// Deep copy directories
if ($dest !== "$source/$entry") {
$this->copyr("$source/$entry", "$dest/$entry");
}
}
// Clean up
$dir->close();
return true;
}
/**
* Overloadable function that tests a filename for copyability. By
* default, everything should be copied; you can restrict things to
* ignore hidden files, unreadable files, etc. This function
* applies to copyr().
*/
public function copyable($file)
{
return true;
}
/**
* Delete a file, or a folder and its contents
* @note Adapted from http://aidanlister.com/repos/v/function.rmdirr.php
*/
public function rmdirr($dirname)
{
// Sanity check
if (!$this->file_exists($dirname)) {
return false;
}
// Simple delete for a file
if ($this->is_file($dirname) || $this->is_link($dirname)) {
return $this->unlink($dirname);
}
// Loop through the folder
$dir = $this->dir($dirname);
while (false !== $entry = $dir->read()) {
// Skip pointers
if ($entry == '.' || $entry == '..') {
continue;
}
// Recurse
$this->rmdirr($dirname . DIRECTORY_SEPARATOR . $entry);
}
// Clean up
$dir->close();
return $this->rmdir($dirname);
}
/**
* Recursively globs a directory.
*/
public function globr($dir, $pattern, $flags = 0)
{
$files = $this->glob("$dir/$pattern", $flags);
if ($files === false) $files = array();
$sub_dirs = $this->glob("$dir/*", GLOB_ONLYDIR);
if ($sub_dirs === false) $sub_dirs = array();
foreach ($sub_dirs as $sub_dir) {
$sub_files = $this->globr($sub_dir, $pattern, $flags);
$files = array_merge($files, $sub_files);
}
return $files;
}
/**
* Allows for PHP functions to be called and be stubbed.
* @warning This function will not work for functions that need
* to pass references; manually define a stub function for those.
*/
public function __call($name, $args)
{
return call_user_func_array($name, $args);
}
}
// vim: et sw=4 sts=4

141
extras/FSTools/File.php Normal file
View File

@@ -0,0 +1,141 @@
<?php
/**
* Represents a file in the filesystem
*
* @warning Be sure to distinguish between get() and write() versus
* read() and put(), the former operates on the entire file, while
* the latter operates on a handle.
*/
class FSTools_File
{
/** Filename of file this object represents */
protected $name;
/** Handle for the file */
protected $handle = false;
/** Instance of FSTools for interfacing with filesystem */
protected $fs;
/**
* Filename of file you wish to instantiate.
* @note This file need not exist
*/
public function __construct($name, $fs = false)
{
$this->name = $name;
$this->fs = $fs ? $fs : FSTools::singleton();
}
/** Returns the filename of the file. */
public function getName() {return $this->name;}
/** Returns directory of the file without trailing slash */
public function getDirectory() {return $this->fs->dirname($this->name);}
/**
* Retrieves the contents of a file
* @todo Throw an exception if file doesn't exist
*/
public function get()
{
return $this->fs->file_get_contents($this->name);
}
/** Writes contents to a file, creates new file if necessary */
public function write($contents)
{
return $this->fs->file_put_contents($this->name, $contents);
}
/** Deletes the file */
public function delete()
{
return $this->fs->unlink($this->name);
}
/** Returns true if file exists and is a file. */
public function exists()
{
return $this->fs->is_file($this->name);
}
/** Returns last file modification time */
public function getMTime()
{
return $this->fs->filemtime($this->name);
}
/**
* Chmod a file
* @note We ignore errors because of some weird owner trickery due
* to SVN duality
*/
public function chmod($octal_code)
{
return @$this->fs->chmod($this->name, $octal_code);
}
/** Opens file's handle */
public function open($mode)
{
if ($this->handle) $this->close();
$this->handle = $this->fs->fopen($this->name, $mode);
return true;
}
/** Closes file's handle */
public function close()
{
if (!$this->handle) return false;
$status = $this->fs->fclose($this->handle);
$this->handle = false;
return $status;
}
/** Retrieves a line from an open file, with optional max length $length */
public function getLine($length = null)
{
if (!$this->handle) $this->open('r');
if ($length === null) return $this->fs->fgets($this->handle);
else return $this->fs->fgets($this->handle, $length);
}
/** Retrieves a character from an open file */
public function getChar()
{
if (!$this->handle) $this->open('r');
return $this->fs->fgetc($this->handle);
}
/** Retrieves an $length bytes of data from an open data */
public function read($length)
{
if (!$this->handle) $this->open('r');
return $this->fs->fread($this->handle, $length);
}
/** Writes to an open file */
public function put($string)
{
if (!$this->handle) $this->open('a');
return $this->fs->fwrite($this->handle, $string);
}
/** Returns TRUE if the end of the file has been reached */
public function eof()
{
if (!$this->handle) return true;
return $this->fs->feof($this->handle);
}
public function __destruct()
{
if ($this->handle) $this->close();
}
}
// vim: et sw=4 sts=4

View File

@@ -0,0 +1,11 @@
<?php
/**
* This is a stub include that automatically configures the include path.
*/
set_include_path(dirname(__FILE__) . PATH_SEPARATOR . get_include_path() );
require_once 'HTMLPurifierExtras.php';
require_once 'HTMLPurifierExtras.autoload.php';
// vim: et sw=4 sts=4

View File

@@ -0,0 +1,15 @@
<?php
/**
* @file
* Legacy autoloader for systems lacking spl_autoload_register
*
* Must be separate to prevent deprecation warning on PHP 7.2
*/
function __autoload($class)
{
return HTMLPurifierExtras::autoload($class);
}
// vim: et sw=4 sts=4

View File

@@ -0,0 +1,23 @@
<?php
/**
* @file
* Convenience file that registers autoload handler for HTML Purifier.
*
* @warning
* This autoloader does not contain the compatibility code seen in
* HTMLPurifier_Bootstrap; the user is expected to make any necessary
* changes to use this library.
*/
if (function_exists('spl_autoload_register')) {
spl_autoload_register(array('HTMLPurifierExtras', 'autoload'));
if (function_exists('__autoload')) {
// Be polite and ensure that userland autoload gets retained
spl_autoload_register('__autoload');
}
} elseif (!function_exists('__autoload')) {
require dirname(__FILE__) . '/HTMLPurifierExtras.autoload-legacy.php';
}
// vim: et sw=4 sts=4

View File

@@ -0,0 +1,31 @@
<?php
/**
* Meta-class for HTML Purifier's extra class hierarchies, similar to
* HTMLPurifier_Bootstrap.
*/
class HTMLPurifierExtras
{
public static function autoload($class)
{
$path = HTMLPurifierExtras::getPath($class);
if (!$path) return false;
require $path;
return true;
}
public static function getPath($class)
{
if (
strncmp('FSTools', $class, 7) !== 0 &&
strncmp('ConfigDoc', $class, 9) !== 0
) return false;
// Custom implementations can go here
// Standard implementation:
return str_replace('_', '/', $class) . '.php';
}
}
// vim: et sw=4 sts=4

32
extras/README Normal file
View File

@@ -0,0 +1,32 @@
HTML Purifier Extras
The Method Behind The Madness!
The extras/ folder in HTML Purifier contains--you guessed it--extra things
for HTML Purifier. Specifically, these are two extra libraries called
FSTools and ConfigSchema. They're extra for a reason: you don't need them
if you're using HTML Purifier for normal usage: filtering HTML. However,
if you're a developer, and would like to test HTML Purifier, or need to
use one of HTML Purifier's maintenance scripts, chances are they'll need
these libraries. Who knows: maybe you'll find them useful too!
Here are the libraries:
FSTools
-------
Short for File System Tools, this is a poor-man's object-oriented wrapper for
the filesystem. It currently consists of two classes:
- FSTools: This is a singleton that contains a manner of useful functions
such as recursive glob, directory removal, etc, as well as the ability
to call arbitrary native PHP functions through it like $FS->fopen(...).
This makes it a lot simpler to mock these filesystem calls for unit testing.
- FSTools_File: This object represents a single file, and has almost any
method imaginable one would need.
Check the files themselves for more information.
vim: et sw=4 sts=4

View File

@@ -5,6 +5,7 @@
*/
set_include_path(dirname(__FILE__) . PATH_SEPARATOR . get_include_path() );
require_once 'HTMLPurifier.php';
require_once 'HTMLPurifier/Bootstrap.php';
require_once 'HTMLPurifier.autoload.php';
?>
// vim: et sw=4 sts=4

View File

@@ -0,0 +1,14 @@
<?php
/**
* @file
* Legacy autoloader for systems lacking spl_autoload_register
*
*/
spl_autoload_register(function($class)
{
return HTMLPurifier_Bootstrap::autoload($class);
});
// vim: et sw=4 sts=4

View File

@@ -0,0 +1,25 @@
<?php
/**
* @file
* Convenience file that registers autoload handler for HTML Purifier.
* It also does some sanity checks.
*/
if (function_exists('spl_autoload_register') && function_exists('spl_autoload_unregister')) {
// We need unregister for our pre-registering functionality
HTMLPurifier_Bootstrap::registerAutoload();
if (function_exists('__autoload')) {
// Be polite and ensure that userland autoload gets retained
spl_autoload_register('__autoload');
}
} elseif (!function_exists('__autoload')) {
require dirname(__FILE__) . '/HTMLPurifier.autoload-legacy.php';
}
// phpcs:ignore PHPCompatibility.IniDirectives.RemovedIniDirectives.zend_ze1_compatibility_modeRemoved
if (ini_get('zend.ze1_compatibility_mode')) {
trigger_error("HTML Purifier is not compatible with zend.ze1_compatibility_mode; please turn it off", E_USER_ERROR);
}
// vim: et sw=4 sts=4

View File

@@ -0,0 +1,4 @@
<?php
if (!defined('HTMLPURIFIER_PREFIX')) {
define('HTMLPURIFIER_PREFIX', dirname(__FILE__));
}

View File

@@ -1,21 +1,25 @@
<?php
/**
* Function wrapper for HTML Purifier for quick use.
* @note This function only includes the library when it is called. While
* this is efficient for instances when you only use HTML Purifier
* on a few of your pages, it murders bytecode caching. You still
* need to add HTML Purifier to your path.
* @file
* Defines a function wrapper for HTML Purifier for quick use.
* @note ''HTMLPurifier()'' is NOT the same as ''new HTMLPurifier()''
*/
function HTMLPurifier($html, $config = null) {
/**
* Purify HTML.
* @param string $html String HTML to purify
* @param mixed $config Configuration to use, can be any value accepted by
* HTMLPurifier_Config::create()
* @return string
*/
function HTMLPurifier($html, $config = null)
{
static $purifier = false;
if (!$purifier) {
require_once 'HTMLPurifier.php';
$purifier = new HTMLPurifier();
}
return $purifier->purify($html, $config);
}
?>
// vim: et sw=4 sts=4

View File

@@ -0,0 +1,235 @@
<?php
/**
* @file
* This file was auto-generated by generate-includes.php and includes all of
* the core files required by HTML Purifier. Use this if performance is a
* primary concern and you are using an opcode cache. PLEASE DO NOT EDIT THIS
* FILE, changes will be overwritten the next time the script is run.
*
* @version 4.15.0
*
* @warning
* You must *not* include any other HTML Purifier files before this file,
* because 'require' not 'require_once' is used.
*
* @warning
* This file requires that the include path contains the HTML Purifier
* library directory; this is not auto-set.
*/
require 'HTMLPurifier.php';
require 'HTMLPurifier/Arborize.php';
require 'HTMLPurifier/AttrCollections.php';
require 'HTMLPurifier/AttrDef.php';
require 'HTMLPurifier/AttrTransform.php';
require 'HTMLPurifier/AttrTypes.php';
require 'HTMLPurifier/AttrValidator.php';
require 'HTMLPurifier/Bootstrap.php';
require 'HTMLPurifier/Definition.php';
require 'HTMLPurifier/CSSDefinition.php';
require 'HTMLPurifier/ChildDef.php';
require 'HTMLPurifier/Config.php';
require 'HTMLPurifier/ConfigSchema.php';
require 'HTMLPurifier/ContentSets.php';
require 'HTMLPurifier/Context.php';
require 'HTMLPurifier/DefinitionCache.php';
require 'HTMLPurifier/DefinitionCacheFactory.php';
require 'HTMLPurifier/Doctype.php';
require 'HTMLPurifier/DoctypeRegistry.php';
require 'HTMLPurifier/ElementDef.php';
require 'HTMLPurifier/Encoder.php';
require 'HTMLPurifier/EntityLookup.php';
require 'HTMLPurifier/EntityParser.php';
require 'HTMLPurifier/ErrorCollector.php';
require 'HTMLPurifier/ErrorStruct.php';
require 'HTMLPurifier/Exception.php';
require 'HTMLPurifier/Filter.php';
require 'HTMLPurifier/Generator.php';
require 'HTMLPurifier/HTMLDefinition.php';
require 'HTMLPurifier/HTMLModule.php';
require 'HTMLPurifier/HTMLModuleManager.php';
require 'HTMLPurifier/IDAccumulator.php';
require 'HTMLPurifier/Injector.php';
require 'HTMLPurifier/Language.php';
require 'HTMLPurifier/LanguageFactory.php';
require 'HTMLPurifier/Length.php';
require 'HTMLPurifier/Lexer.php';
require 'HTMLPurifier/Node.php';
require 'HTMLPurifier/PercentEncoder.php';
require 'HTMLPurifier/PropertyList.php';
require 'HTMLPurifier/PropertyListIterator.php';
require 'HTMLPurifier/Queue.php';
require 'HTMLPurifier/Strategy.php';
require 'HTMLPurifier/StringHash.php';
require 'HTMLPurifier/StringHashParser.php';
require 'HTMLPurifier/TagTransform.php';
require 'HTMLPurifier/Token.php';
require 'HTMLPurifier/TokenFactory.php';
require 'HTMLPurifier/URI.php';
require 'HTMLPurifier/URIDefinition.php';
require 'HTMLPurifier/URIFilter.php';
require 'HTMLPurifier/URIParser.php';
require 'HTMLPurifier/URIScheme.php';
require 'HTMLPurifier/URISchemeRegistry.php';
require 'HTMLPurifier/UnitConverter.php';
require 'HTMLPurifier/VarParser.php';
require 'HTMLPurifier/VarParserException.php';
require 'HTMLPurifier/Zipper.php';
require 'HTMLPurifier/AttrDef/CSS.php';
require 'HTMLPurifier/AttrDef/Clone.php';
require 'HTMLPurifier/AttrDef/Enum.php';
require 'HTMLPurifier/AttrDef/Integer.php';
require 'HTMLPurifier/AttrDef/Lang.php';
require 'HTMLPurifier/AttrDef/Switch.php';
require 'HTMLPurifier/AttrDef/Text.php';
require 'HTMLPurifier/AttrDef/URI.php';
require 'HTMLPurifier/AttrDef/CSS/Number.php';
require 'HTMLPurifier/AttrDef/CSS/AlphaValue.php';
require 'HTMLPurifier/AttrDef/CSS/Background.php';
require 'HTMLPurifier/AttrDef/CSS/BackgroundPosition.php';
require 'HTMLPurifier/AttrDef/CSS/Border.php';
require 'HTMLPurifier/AttrDef/CSS/Color.php';
require 'HTMLPurifier/AttrDef/CSS/Composite.php';
require 'HTMLPurifier/AttrDef/CSS/DenyElementDecorator.php';
require 'HTMLPurifier/AttrDef/CSS/Filter.php';
require 'HTMLPurifier/AttrDef/CSS/Font.php';
require 'HTMLPurifier/AttrDef/CSS/FontFamily.php';
require 'HTMLPurifier/AttrDef/CSS/Ident.php';
require 'HTMLPurifier/AttrDef/CSS/ImportantDecorator.php';
require 'HTMLPurifier/AttrDef/CSS/Length.php';
require 'HTMLPurifier/AttrDef/CSS/ListStyle.php';
require 'HTMLPurifier/AttrDef/CSS/Multiple.php';
require 'HTMLPurifier/AttrDef/CSS/Percentage.php';
require 'HTMLPurifier/AttrDef/CSS/TextDecoration.php';
require 'HTMLPurifier/AttrDef/CSS/URI.php';
require 'HTMLPurifier/AttrDef/HTML/Bool.php';
require 'HTMLPurifier/AttrDef/HTML/Nmtokens.php';
require 'HTMLPurifier/AttrDef/HTML/Class.php';
require 'HTMLPurifier/AttrDef/HTML/Color.php';
require 'HTMLPurifier/AttrDef/HTML/ContentEditable.php';
require 'HTMLPurifier/AttrDef/HTML/FrameTarget.php';
require 'HTMLPurifier/AttrDef/HTML/ID.php';
require 'HTMLPurifier/AttrDef/HTML/Pixels.php';
require 'HTMLPurifier/AttrDef/HTML/Length.php';
require 'HTMLPurifier/AttrDef/HTML/LinkTypes.php';
require 'HTMLPurifier/AttrDef/HTML/MultiLength.php';
require 'HTMLPurifier/AttrDef/URI/Email.php';
require 'HTMLPurifier/AttrDef/URI/Host.php';
require 'HTMLPurifier/AttrDef/URI/IPv4.php';
require 'HTMLPurifier/AttrDef/URI/IPv6.php';
require 'HTMLPurifier/AttrDef/URI/Email/SimpleCheck.php';
require 'HTMLPurifier/AttrTransform/Background.php';
require 'HTMLPurifier/AttrTransform/BdoDir.php';
require 'HTMLPurifier/AttrTransform/BgColor.php';
require 'HTMLPurifier/AttrTransform/BoolToCSS.php';
require 'HTMLPurifier/AttrTransform/Border.php';
require 'HTMLPurifier/AttrTransform/EnumToCSS.php';
require 'HTMLPurifier/AttrTransform/ImgRequired.php';
require 'HTMLPurifier/AttrTransform/ImgSpace.php';
require 'HTMLPurifier/AttrTransform/Input.php';
require 'HTMLPurifier/AttrTransform/Lang.php';
require 'HTMLPurifier/AttrTransform/Length.php';
require 'HTMLPurifier/AttrTransform/Name.php';
require 'HTMLPurifier/AttrTransform/NameSync.php';
require 'HTMLPurifier/AttrTransform/Nofollow.php';
require 'HTMLPurifier/AttrTransform/SafeEmbed.php';
require 'HTMLPurifier/AttrTransform/SafeObject.php';
require 'HTMLPurifier/AttrTransform/SafeParam.php';
require 'HTMLPurifier/AttrTransform/ScriptRequired.php';
require 'HTMLPurifier/AttrTransform/TargetBlank.php';
require 'HTMLPurifier/AttrTransform/TargetNoopener.php';
require 'HTMLPurifier/AttrTransform/TargetNoreferrer.php';
require 'HTMLPurifier/AttrTransform/Textarea.php';
require 'HTMLPurifier/ChildDef/Chameleon.php';
require 'HTMLPurifier/ChildDef/Custom.php';
require 'HTMLPurifier/ChildDef/Empty.php';
require 'HTMLPurifier/ChildDef/List.php';
require 'HTMLPurifier/ChildDef/Required.php';
require 'HTMLPurifier/ChildDef/Optional.php';
require 'HTMLPurifier/ChildDef/StrictBlockquote.php';
require 'HTMLPurifier/ChildDef/Table.php';
require 'HTMLPurifier/DefinitionCache/Decorator.php';
require 'HTMLPurifier/DefinitionCache/Null.php';
require 'HTMLPurifier/DefinitionCache/Serializer.php';
require 'HTMLPurifier/DefinitionCache/Decorator/Cleanup.php';
require 'HTMLPurifier/DefinitionCache/Decorator/Memory.php';
require 'HTMLPurifier/HTMLModule/Bdo.php';
require 'HTMLPurifier/HTMLModule/CommonAttributes.php';
require 'HTMLPurifier/HTMLModule/Edit.php';
require 'HTMLPurifier/HTMLModule/Forms.php';
require 'HTMLPurifier/HTMLModule/Hypertext.php';
require 'HTMLPurifier/HTMLModule/Iframe.php';
require 'HTMLPurifier/HTMLModule/Image.php';
require 'HTMLPurifier/HTMLModule/Legacy.php';
require 'HTMLPurifier/HTMLModule/List.php';
require 'HTMLPurifier/HTMLModule/Name.php';
require 'HTMLPurifier/HTMLModule/Nofollow.php';
require 'HTMLPurifier/HTMLModule/NonXMLCommonAttributes.php';
require 'HTMLPurifier/HTMLModule/Object.php';
require 'HTMLPurifier/HTMLModule/Presentation.php';
require 'HTMLPurifier/HTMLModule/Proprietary.php';
require 'HTMLPurifier/HTMLModule/Ruby.php';
require 'HTMLPurifier/HTMLModule/SafeEmbed.php';
require 'HTMLPurifier/HTMLModule/SafeObject.php';
require 'HTMLPurifier/HTMLModule/SafeScripting.php';
require 'HTMLPurifier/HTMLModule/Scripting.php';
require 'HTMLPurifier/HTMLModule/StyleAttribute.php';
require 'HTMLPurifier/HTMLModule/Tables.php';
require 'HTMLPurifier/HTMLModule/Target.php';
require 'HTMLPurifier/HTMLModule/TargetBlank.php';
require 'HTMLPurifier/HTMLModule/TargetNoopener.php';
require 'HTMLPurifier/HTMLModule/TargetNoreferrer.php';
require 'HTMLPurifier/HTMLModule/Text.php';
require 'HTMLPurifier/HTMLModule/Tidy.php';
require 'HTMLPurifier/HTMLModule/XMLCommonAttributes.php';
require 'HTMLPurifier/HTMLModule/Tidy/Name.php';
require 'HTMLPurifier/HTMLModule/Tidy/Proprietary.php';
require 'HTMLPurifier/HTMLModule/Tidy/XHTMLAndHTML4.php';
require 'HTMLPurifier/HTMLModule/Tidy/Strict.php';
require 'HTMLPurifier/HTMLModule/Tidy/Transitional.php';
require 'HTMLPurifier/HTMLModule/Tidy/XHTML.php';
require 'HTMLPurifier/Injector/AutoParagraph.php';
require 'HTMLPurifier/Injector/DisplayLinkURI.php';
require 'HTMLPurifier/Injector/Linkify.php';
require 'HTMLPurifier/Injector/PurifierLinkify.php';
require 'HTMLPurifier/Injector/RemoveEmpty.php';
require 'HTMLPurifier/Injector/RemoveSpansWithoutAttributes.php';
require 'HTMLPurifier/Injector/SafeObject.php';
require 'HTMLPurifier/Lexer/DOMLex.php';
require 'HTMLPurifier/Lexer/DirectLex.php';
require 'HTMLPurifier/Node/Comment.php';
require 'HTMLPurifier/Node/Element.php';
require 'HTMLPurifier/Node/Text.php';
require 'HTMLPurifier/Strategy/Composite.php';
require 'HTMLPurifier/Strategy/Core.php';
require 'HTMLPurifier/Strategy/FixNesting.php';
require 'HTMLPurifier/Strategy/MakeWellFormed.php';
require 'HTMLPurifier/Strategy/RemoveForeignElements.php';
require 'HTMLPurifier/Strategy/ValidateAttributes.php';
require 'HTMLPurifier/TagTransform/Font.php';
require 'HTMLPurifier/TagTransform/Simple.php';
require 'HTMLPurifier/Token/Comment.php';
require 'HTMLPurifier/Token/Tag.php';
require 'HTMLPurifier/Token/Empty.php';
require 'HTMLPurifier/Token/End.php';
require 'HTMLPurifier/Token/Start.php';
require 'HTMLPurifier/Token/Text.php';
require 'HTMLPurifier/URIFilter/DisableExternal.php';
require 'HTMLPurifier/URIFilter/DisableExternalResources.php';
require 'HTMLPurifier/URIFilter/DisableResources.php';
require 'HTMLPurifier/URIFilter/HostBlacklist.php';
require 'HTMLPurifier/URIFilter/MakeAbsolute.php';
require 'HTMLPurifier/URIFilter/Munge.php';
require 'HTMLPurifier/URIFilter/SafeIframe.php';
require 'HTMLPurifier/URIScheme/data.php';
require 'HTMLPurifier/URIScheme/file.php';
require 'HTMLPurifier/URIScheme/ftp.php';
require 'HTMLPurifier/URIScheme/http.php';
require 'HTMLPurifier/URIScheme/https.php';
require 'HTMLPurifier/URIScheme/mailto.php';
require 'HTMLPurifier/URIScheme/news.php';
require 'HTMLPurifier/URIScheme/nntp.php';
require 'HTMLPurifier/URIScheme/tel.php';
require 'HTMLPurifier/VarParser/Flexible.php';
require 'HTMLPurifier/VarParser/Native.php';

View File

@@ -0,0 +1,30 @@
<?php
/**
* @file
* Emulation layer for code that used kses(), substituting in HTML Purifier.
*/
require_once dirname(__FILE__) . '/HTMLPurifier.auto.php';
function kses($string, $allowed_html, $allowed_protocols = null)
{
$config = HTMLPurifier_Config::createDefault();
$allowed_elements = array();
$allowed_attributes = array();
foreach ($allowed_html as $element => $attributes) {
$allowed_elements[$element] = true;
foreach ($attributes as $attribute => $x) {
$allowed_attributes["$element.$attribute"] = true;
}
}
$config->set('HTML.AllowedElements', $allowed_elements);
$config->set('HTML.AllowedAttributes', $allowed_attributes);
if ($allowed_protocols !== null) {
$config->set('URI.AllowedSchemes', $allowed_protocols);
}
$purifier = new HTMLPurifier($config);
return $purifier->purify($string);
}
// vim: et sw=4 sts=4

View File

@@ -0,0 +1,11 @@
<?php
/**
* @file
* Convenience stub file that adds HTML Purifier's library file to the path
* without any other side-effects.
*/
set_include_path(dirname(__FILE__) . PATH_SEPARATOR . get_include_path() );
// vim: et sw=4 sts=4

View File

@@ -1,12 +1,11 @@
<?php
/*!
* @mainpage
*
/*! @mainpage
*
* HTML Purifier is an HTML filter that will take an arbitrary snippet of
* HTML and rigorously test, validate and filter it into a version that
* is safe for output onto webpages. It achieves this by:
*
*
* -# Lexing (parsing into tokens) the document,
* -# Executing various strategies on the tokens:
* -# Removing all elements not in the whitelist,
@@ -14,16 +13,14 @@
* -# Fixing the nesting of the nodes, and
* -# Validating attributes of the nodes; and
* -# Generating HTML from the purified tokens.
*
*
* However, most users will only need to interface with the HTMLPurifier
* class, so this massive amount of infrastructure is usually concealed.
* If you plan on working with the internals, be sure to include
* HTMLPurifier_ConfigSchema and HTMLPurifier_Config.
* and HTMLPurifier_Config.
*/
/*
HTML Purifier 1.4.0 - Standards Compliant HTML Filtering
Copyright (C) 2006 Edward Z. Yang
HTML Purifier 4.15.0 - Standards Compliant HTML Filtering
Copyright (C) 2006-2008 Edward Z. Yang
This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
@@ -40,131 +37,261 @@
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
*/
// almost every class has an undocumented dependency to these, so make sure
// they get included
require_once 'HTMLPurifier/ConfigSchema.php';
require_once 'HTMLPurifier/Config.php';
require_once 'HTMLPurifier/Context.php';
require_once 'HTMLPurifier/Lexer.php';
require_once 'HTMLPurifier/Generator.php';
require_once 'HTMLPurifier/Strategy/Core.php';
require_once 'HTMLPurifier/Encoder.php';
/**
* Main library execution class.
*
* Facade that performs calls to the HTMLPurifier_Lexer,
* HTMLPurifier_Strategy and HTMLPurifier_Generator subsystems in order to
* purify HTML.
*
* @todo We need an easier way to inject strategies, it'll probably end
* up getting done through config though.
* Facade that coordinates HTML Purifier's subsystems in order to purify HTML.
*
* @note There are several points in which configuration can be specified
* for HTML Purifier. The precedence of these (from lowest to
* highest) is as follows:
* -# Instance: new HTMLPurifier($config)
* -# Invocation: purify($html, $config)
* These configurations are entirely independent of each other and
* are *not* merged (this behavior may change in the future).
*
* @todo We need an easier way to inject strategies using the configuration
* object.
*/
class HTMLPurifier
{
var $version = '1.4.0';
var $config;
var $filters;
var $lexer, $strategy, $generator;
/**
* Final HTMLPurifier_Context of last run purification. Might be an array.
* @public
* Version of HTML Purifier.
* @type string
*/
var $context;
public $version = '4.15.0';
/**
* Constant with version of HTML Purifier.
*/
const VERSION = '4.15.0';
/**
* Global configuration object.
* @type HTMLPurifier_Config
*/
public $config;
/**
* Array of extra filter objects to run on HTML,
* for backwards compatibility.
* @type HTMLPurifier_Filter[]
*/
private $filters = array();
/**
* Single instance of HTML Purifier.
* @type HTMLPurifier
*/
private static $instance;
/**
* @type HTMLPurifier_Strategy_Core
*/
protected $strategy;
/**
* @type HTMLPurifier_Generator
*/
protected $generator;
/**
* Resultant context of last run purification.
* Is an array of contexts if the last called method was purifyArray().
* @type HTMLPurifier_Context
*/
public $context;
/**
* Initializes the purifier.
* @param $config Optional HTMLPurifier_Config object for all instances of
* the purifier, if omitted, a default configuration is
* supplied (which can be overridden on a per-use basis).
*
* @param HTMLPurifier_Config|mixed $config Optional HTMLPurifier_Config object
* for all instances of the purifier, if omitted, a default
* configuration is supplied (which can be overridden on a
* per-use basis).
* The parameter can also be any type that
* HTMLPurifier_Config::create() supports.
*/
function HTMLPurifier($config = null) {
public function __construct($config = null)
{
$this->config = HTMLPurifier_Config::create($config);
$this->lexer = HTMLPurifier_Lexer::create();
$this->strategy = new HTMLPurifier_Strategy_Core();
$this->generator = new HTMLPurifier_Generator();
$this->strategy = new HTMLPurifier_Strategy_Core();
}
/**
* Adds a filter to process the output. First come first serve
* @param $filter HTMLPurifier_Filter object
*
* @param HTMLPurifier_Filter $filter HTMLPurifier_Filter object
*/
function addFilter($filter) {
public function addFilter($filter)
{
trigger_error(
'HTMLPurifier->addFilter() is deprecated, use configuration directives' .
' in the Filter namespace or Filter.Custom',
E_USER_WARNING
);
$this->filters[] = $filter;
}
/**
* Filters an HTML snippet/document to be XSS-free and standards-compliant.
*
* @param $html String of HTML to purify
* @param $config HTMLPurifier_Config object for this operation, if omitted,
* defaults to the config object specified during this
*
* @param string $html String of HTML to purify
* @param HTMLPurifier_Config $config Config object for this operation,
* if omitted, defaults to the config object specified during this
* object's construction. The parameter can also be any type
* that HTMLPurifier_Config::create() supports.
* @return Purified HTML
*
* @return string Purified HTML
*/
function purify($html, $config = null) {
public function purify($html, $config = null)
{
// :TODO: make the config merge in, instead of replace
$config = $config ? HTMLPurifier_Config::create($config) : $this->config;
// implementation is partially environment dependant, partially
// configuration dependant
$lexer = HTMLPurifier_Lexer::create($config);
$context = new HTMLPurifier_Context();
$html = HTMLPurifier_Encoder::convertToUTF8($html, $config, $context);
for ($i = 0, $size = count($this->filters); $i < $size; $i++) {
$html = $this->filters[$i]->preFilter($html, $config, $context);
// setup HTML generator
$this->generator = new HTMLPurifier_Generator($config, $context);
$context->register('Generator', $this->generator);
// set up global context variables
if ($config->get('Core.CollectErrors')) {
// may get moved out if other facilities use it
$language_factory = HTMLPurifier_LanguageFactory::instance();
$language = $language_factory->create($config, $context);
$context->register('Locale', $language);
$error_collector = new HTMLPurifier_ErrorCollector($context);
$context->register('ErrorCollector', $error_collector);
}
// setup id_accumulator context, necessary due to the fact that
// AttrValidator can be called from many places
$id_accumulator = HTMLPurifier_IDAccumulator::build($config, $context);
$context->register('IDAccumulator', $id_accumulator);
$html = HTMLPurifier_Encoder::convertToUTF8($html, $config, $context);
// setup filters
$filter_flags = $config->getBatch('Filter');
$custom_filters = $filter_flags['Custom'];
unset($filter_flags['Custom']);
$filters = array();
foreach ($filter_flags as $filter => $flag) {
if (!$flag) {
continue;
}
if (strpos($filter, '.') !== false) {
continue;
}
$class = "HTMLPurifier_Filter_$filter";
$filters[] = new $class;
}
foreach ($custom_filters as $filter) {
// maybe "HTMLPurifier_Filter_$filter", but be consistent with AutoFormat
$filters[] = $filter;
}
$filters = array_merge($filters, $this->filters);
// maybe prepare(), but later
for ($i = 0, $filter_size = count($filters); $i < $filter_size; $i++) {
$html = $filters[$i]->preFilter($html, $config, $context);
}
// purified HTML
$html =
$html =
$this->generator->generateFromTokens(
// list of tokens
$this->strategy->execute(
// list of un-purified tokens
$this->lexer->tokenizeHTML(
$lexer->tokenizeHTML(
// un-purified HTML
$html, $config, $context
$html,
$config,
$context
),
$config, $context
),
$config, $context
$config,
$context
)
);
for ($i = $size - 1; $i >= 0; $i--) {
$html = $this->filters[$i]->postFilter($html, $config, $context);
for ($i = $filter_size - 1; $i >= 0; $i--) {
$html = $filters[$i]->postFilter($html, $config, $context);
}
$html = HTMLPurifier_Encoder::convertFromUTF8($html, $config, $context);
$this->context =& $context;
return $html;
}
/**
* Filters an array of HTML snippets
* @param $config Optional HTMLPurifier_Config object for this operation.
*
* @param string[] $array_of_html Array of html snippets
* @param HTMLPurifier_Config $config Optional config object for this operation.
* See HTMLPurifier::purify() for more details.
* @return Array of purified HTML
*
* @return string[] Array of purified HTML
*/
function purifyArray($array_of_html, $config = null) {
public function purifyArray($array_of_html, $config = null)
{
$context_array = array();
foreach ($array_of_html as $key => $html) {
$array_of_html[$key] = $this->purify($html, $config);
$array = array();
foreach($array_of_html as $key=>$value){
if (is_array($value)) {
$array[$key] = $this->purifyArray($value, $config);
} else {
$array[$key] = $this->purify($value, $config);
}
$context_array[$key] = $this->context;
}
$this->context = $context_array;
return $array_of_html;
return $array;
}
/**
* Singleton for enforcing just one HTML Purifier in your system
*
* @param HTMLPurifier|HTMLPurifier_Config $prototype Optional prototype
* HTMLPurifier instance to overload singleton with,
* or HTMLPurifier_Config instance to configure the
* generated version with.
*
* @return HTMLPurifier
*/
public static function instance($prototype = null)
{
if (!self::$instance || $prototype) {
if ($prototype instanceof HTMLPurifier) {
self::$instance = $prototype;
} elseif ($prototype) {
self::$instance = new HTMLPurifier($prototype);
} else {
self::$instance = new HTMLPurifier();
}
}
return self::$instance;
}
/**
* Singleton for enforcing just one HTML Purifier in your system
*
* @param HTMLPurifier|HTMLPurifier_Config $prototype Optional prototype
* HTMLPurifier instance to overload singleton with,
* or HTMLPurifier_Config instance to configure the
* generated version with.
*
* @return HTMLPurifier
* @note Backwards compatibility, see instance()
*/
public static function getInstance($prototype = null)
{
return HTMLPurifier::instance($prototype);
}
}
?>
// vim: et sw=4 sts=4

View File

@@ -0,0 +1,229 @@
<?php
/**
* @file
* This file was auto-generated by generate-includes.php and includes all of
* the core files required by HTML Purifier. This is a convenience stub that
* includes all files using dirname(__FILE__) and require_once. PLEASE DO NOT
* EDIT THIS FILE, changes will be overwritten the next time the script is run.
*
* Changes to include_path are not necessary.
*/
$__dir = dirname(__FILE__);
require_once $__dir . '/HTMLPurifier.php';
require_once $__dir . '/HTMLPurifier/Arborize.php';
require_once $__dir . '/HTMLPurifier/AttrCollections.php';
require_once $__dir . '/HTMLPurifier/AttrDef.php';
require_once $__dir . '/HTMLPurifier/AttrTransform.php';
require_once $__dir . '/HTMLPurifier/AttrTypes.php';
require_once $__dir . '/HTMLPurifier/AttrValidator.php';
require_once $__dir . '/HTMLPurifier/Bootstrap.php';
require_once $__dir . '/HTMLPurifier/Definition.php';
require_once $__dir . '/HTMLPurifier/CSSDefinition.php';
require_once $__dir . '/HTMLPurifier/ChildDef.php';
require_once $__dir . '/HTMLPurifier/Config.php';
require_once $__dir . '/HTMLPurifier/ConfigSchema.php';
require_once $__dir . '/HTMLPurifier/ContentSets.php';
require_once $__dir . '/HTMLPurifier/Context.php';
require_once $__dir . '/HTMLPurifier/DefinitionCache.php';
require_once $__dir . '/HTMLPurifier/DefinitionCacheFactory.php';
require_once $__dir . '/HTMLPurifier/Doctype.php';
require_once $__dir . '/HTMLPurifier/DoctypeRegistry.php';
require_once $__dir . '/HTMLPurifier/ElementDef.php';
require_once $__dir . '/HTMLPurifier/Encoder.php';
require_once $__dir . '/HTMLPurifier/EntityLookup.php';
require_once $__dir . '/HTMLPurifier/EntityParser.php';
require_once $__dir . '/HTMLPurifier/ErrorCollector.php';
require_once $__dir . '/HTMLPurifier/ErrorStruct.php';
require_once $__dir . '/HTMLPurifier/Exception.php';
require_once $__dir . '/HTMLPurifier/Filter.php';
require_once $__dir . '/HTMLPurifier/Generator.php';
require_once $__dir . '/HTMLPurifier/HTMLDefinition.php';
require_once $__dir . '/HTMLPurifier/HTMLModule.php';
require_once $__dir . '/HTMLPurifier/HTMLModuleManager.php';
require_once $__dir . '/HTMLPurifier/IDAccumulator.php';
require_once $__dir . '/HTMLPurifier/Injector.php';
require_once $__dir . '/HTMLPurifier/Language.php';
require_once $__dir . '/HTMLPurifier/LanguageFactory.php';
require_once $__dir . '/HTMLPurifier/Length.php';
require_once $__dir . '/HTMLPurifier/Lexer.php';
require_once $__dir . '/HTMLPurifier/Node.php';
require_once $__dir . '/HTMLPurifier/PercentEncoder.php';
require_once $__dir . '/HTMLPurifier/PropertyList.php';
require_once $__dir . '/HTMLPurifier/PropertyListIterator.php';
require_once $__dir . '/HTMLPurifier/Queue.php';
require_once $__dir . '/HTMLPurifier/Strategy.php';
require_once $__dir . '/HTMLPurifier/StringHash.php';
require_once $__dir . '/HTMLPurifier/StringHashParser.php';
require_once $__dir . '/HTMLPurifier/TagTransform.php';
require_once $__dir . '/HTMLPurifier/Token.php';
require_once $__dir . '/HTMLPurifier/TokenFactory.php';
require_once $__dir . '/HTMLPurifier/URI.php';
require_once $__dir . '/HTMLPurifier/URIDefinition.php';
require_once $__dir . '/HTMLPurifier/URIFilter.php';
require_once $__dir . '/HTMLPurifier/URIParser.php';
require_once $__dir . '/HTMLPurifier/URIScheme.php';
require_once $__dir . '/HTMLPurifier/URISchemeRegistry.php';
require_once $__dir . '/HTMLPurifier/UnitConverter.php';
require_once $__dir . '/HTMLPurifier/VarParser.php';
require_once $__dir . '/HTMLPurifier/VarParserException.php';
require_once $__dir . '/HTMLPurifier/Zipper.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS.php';
require_once $__dir . '/HTMLPurifier/AttrDef/Clone.php';
require_once $__dir . '/HTMLPurifier/AttrDef/Enum.php';
require_once $__dir . '/HTMLPurifier/AttrDef/Integer.php';
require_once $__dir . '/HTMLPurifier/AttrDef/Lang.php';
require_once $__dir . '/HTMLPurifier/AttrDef/Switch.php';
require_once $__dir . '/HTMLPurifier/AttrDef/Text.php';
require_once $__dir . '/HTMLPurifier/AttrDef/URI.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/Number.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/AlphaValue.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/Background.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/BackgroundPosition.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/Border.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/Color.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/Composite.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/DenyElementDecorator.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/Filter.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/Font.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/FontFamily.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/Ident.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/ImportantDecorator.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/Length.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/ListStyle.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/Multiple.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/Percentage.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/TextDecoration.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/URI.php';
require_once $__dir . '/HTMLPurifier/AttrDef/HTML/Bool.php';
require_once $__dir . '/HTMLPurifier/AttrDef/HTML/Nmtokens.php';
require_once $__dir . '/HTMLPurifier/AttrDef/HTML/Class.php';
require_once $__dir . '/HTMLPurifier/AttrDef/HTML/Color.php';
require_once $__dir . '/HTMLPurifier/AttrDef/HTML/ContentEditable.php';
require_once $__dir . '/HTMLPurifier/AttrDef/HTML/FrameTarget.php';
require_once $__dir . '/HTMLPurifier/AttrDef/HTML/ID.php';
require_once $__dir . '/HTMLPurifier/AttrDef/HTML/Pixels.php';
require_once $__dir . '/HTMLPurifier/AttrDef/HTML/Length.php';
require_once $__dir . '/HTMLPurifier/AttrDef/HTML/LinkTypes.php';
require_once $__dir . '/HTMLPurifier/AttrDef/HTML/MultiLength.php';
require_once $__dir . '/HTMLPurifier/AttrDef/URI/Email.php';
require_once $__dir . '/HTMLPurifier/AttrDef/URI/Host.php';
require_once $__dir . '/HTMLPurifier/AttrDef/URI/IPv4.php';
require_once $__dir . '/HTMLPurifier/AttrDef/URI/IPv6.php';
require_once $__dir . '/HTMLPurifier/AttrDef/URI/Email/SimpleCheck.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/Background.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/BdoDir.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/BgColor.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/BoolToCSS.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/Border.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/EnumToCSS.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/ImgRequired.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/ImgSpace.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/Input.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/Lang.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/Length.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/Name.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/NameSync.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/Nofollow.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/SafeEmbed.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/SafeObject.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/SafeParam.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/ScriptRequired.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/TargetBlank.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/TargetNoopener.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/TargetNoreferrer.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/Textarea.php';
require_once $__dir . '/HTMLPurifier/ChildDef/Chameleon.php';
require_once $__dir . '/HTMLPurifier/ChildDef/Custom.php';
require_once $__dir . '/HTMLPurifier/ChildDef/Empty.php';
require_once $__dir . '/HTMLPurifier/ChildDef/List.php';
require_once $__dir . '/HTMLPurifier/ChildDef/Required.php';
require_once $__dir . '/HTMLPurifier/ChildDef/Optional.php';
require_once $__dir . '/HTMLPurifier/ChildDef/StrictBlockquote.php';
require_once $__dir . '/HTMLPurifier/ChildDef/Table.php';
require_once $__dir . '/HTMLPurifier/DefinitionCache/Decorator.php';
require_once $__dir . '/HTMLPurifier/DefinitionCache/Null.php';
require_once $__dir . '/HTMLPurifier/DefinitionCache/Serializer.php';
require_once $__dir . '/HTMLPurifier/DefinitionCache/Decorator/Cleanup.php';
require_once $__dir . '/HTMLPurifier/DefinitionCache/Decorator/Memory.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Bdo.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/CommonAttributes.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Edit.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Forms.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Hypertext.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Iframe.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Image.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Legacy.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/List.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Name.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Nofollow.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/NonXMLCommonAttributes.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Object.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Presentation.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Proprietary.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Ruby.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/SafeEmbed.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/SafeObject.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/SafeScripting.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Scripting.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/StyleAttribute.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Tables.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Target.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/TargetBlank.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/TargetNoopener.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/TargetNoreferrer.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Text.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Tidy.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/XMLCommonAttributes.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Tidy/Name.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Tidy/Proprietary.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Tidy/XHTMLAndHTML4.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Tidy/Strict.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Tidy/Transitional.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Tidy/XHTML.php';
require_once $__dir . '/HTMLPurifier/Injector/AutoParagraph.php';
require_once $__dir . '/HTMLPurifier/Injector/DisplayLinkURI.php';
require_once $__dir . '/HTMLPurifier/Injector/Linkify.php';
require_once $__dir . '/HTMLPurifier/Injector/PurifierLinkify.php';
require_once $__dir . '/HTMLPurifier/Injector/RemoveEmpty.php';
require_once $__dir . '/HTMLPurifier/Injector/RemoveSpansWithoutAttributes.php';
require_once $__dir . '/HTMLPurifier/Injector/SafeObject.php';
require_once $__dir . '/HTMLPurifier/Lexer/DOMLex.php';
require_once $__dir . '/HTMLPurifier/Lexer/DirectLex.php';
require_once $__dir . '/HTMLPurifier/Node/Comment.php';
require_once $__dir . '/HTMLPurifier/Node/Element.php';
require_once $__dir . '/HTMLPurifier/Node/Text.php';
require_once $__dir . '/HTMLPurifier/Strategy/Composite.php';
require_once $__dir . '/HTMLPurifier/Strategy/Core.php';
require_once $__dir . '/HTMLPurifier/Strategy/FixNesting.php';
require_once $__dir . '/HTMLPurifier/Strategy/MakeWellFormed.php';
require_once $__dir . '/HTMLPurifier/Strategy/RemoveForeignElements.php';
require_once $__dir . '/HTMLPurifier/Strategy/ValidateAttributes.php';
require_once $__dir . '/HTMLPurifier/TagTransform/Font.php';
require_once $__dir . '/HTMLPurifier/TagTransform/Simple.php';
require_once $__dir . '/HTMLPurifier/Token/Comment.php';
require_once $__dir . '/HTMLPurifier/Token/Tag.php';
require_once $__dir . '/HTMLPurifier/Token/Empty.php';
require_once $__dir . '/HTMLPurifier/Token/End.php';
require_once $__dir . '/HTMLPurifier/Token/Start.php';
require_once $__dir . '/HTMLPurifier/Token/Text.php';
require_once $__dir . '/HTMLPurifier/URIFilter/DisableExternal.php';
require_once $__dir . '/HTMLPurifier/URIFilter/DisableExternalResources.php';
require_once $__dir . '/HTMLPurifier/URIFilter/DisableResources.php';
require_once $__dir . '/HTMLPurifier/URIFilter/HostBlacklist.php';
require_once $__dir . '/HTMLPurifier/URIFilter/MakeAbsolute.php';
require_once $__dir . '/HTMLPurifier/URIFilter/Munge.php';
require_once $__dir . '/HTMLPurifier/URIFilter/SafeIframe.php';
require_once $__dir . '/HTMLPurifier/URIScheme/data.php';
require_once $__dir . '/HTMLPurifier/URIScheme/file.php';
require_once $__dir . '/HTMLPurifier/URIScheme/ftp.php';
require_once $__dir . '/HTMLPurifier/URIScheme/http.php';
require_once $__dir . '/HTMLPurifier/URIScheme/https.php';
require_once $__dir . '/HTMLPurifier/URIScheme/mailto.php';
require_once $__dir . '/HTMLPurifier/URIScheme/news.php';
require_once $__dir . '/HTMLPurifier/URIScheme/nntp.php';
require_once $__dir . '/HTMLPurifier/URIScheme/tel.php';
require_once $__dir . '/HTMLPurifier/VarParser/Flexible.php';
require_once $__dir . '/HTMLPurifier/VarParser/Native.php';

View File

@@ -0,0 +1,71 @@
<?php
/**
* Converts a stream of HTMLPurifier_Token into an HTMLPurifier_Node,
* and back again.
*
* @note This transformation is not an equivalence. We mutate the input
* token stream to make it so; see all [MUT] markers in code.
*/
class HTMLPurifier_Arborize
{
public static function arborize($tokens, $config, $context) {
$definition = $config->getHTMLDefinition();
$parent = new HTMLPurifier_Token_Start($definition->info_parent);
$stack = array($parent->toNode());
foreach ($tokens as $token) {
$token->skip = null; // [MUT]
$token->carryover = null; // [MUT]
if ($token instanceof HTMLPurifier_Token_End) {
$token->start = null; // [MUT]
$r = array_pop($stack);
//assert($r->name === $token->name);
//assert(empty($token->attr));
$r->endCol = $token->col;
$r->endLine = $token->line;
$r->endArmor = $token->armor;
continue;
}
$node = $token->toNode();
$stack[count($stack)-1]->children[] = $node;
if ($token instanceof HTMLPurifier_Token_Start) {
$stack[] = $node;
}
}
//assert(count($stack) == 1);
return $stack[0];
}
public static function flatten($node, $config, $context) {
$level = 0;
$nodes = array($level => new HTMLPurifier_Queue(array($node)));
$closingTokens = array();
$tokens = array();
do {
while (!$nodes[$level]->isEmpty()) {
$node = $nodes[$level]->shift(); // FIFO
list($start, $end) = $node->toTokenPair();
if ($level > 0) {
$tokens[] = $start;
}
if ($end !== NULL) {
$closingTokens[$level][] = $end;
}
if ($node instanceof HTMLPurifier_Node_Element) {
$level++;
$nodes[$level] = new HTMLPurifier_Queue();
foreach ($node->children as $childNode) {
$nodes[$level]->push($childNode);
}
}
}
$level--;
if ($level && isset($closingTokens[$level])) {
while ($token = array_pop($closingTokens[$level])) {
$tokens[] = $token;
}
}
} while ($level > 0);
return $tokens;
}
}

Some files were not shown because too many files have changed in this diff Show More