Release 4.9.1 (sic)

Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
Add test case for removing empty list items.
2025-08-03 12:47:56 +02:00 · 2017-03-08 00:22:36 -08:00 · 2017-03-08 00:11:32 -08:00 · 2017-03-07 17:52:41 -08:00 · 2017-03-07 17:34:59 -08:00 · 2017-03-06 23:27:30 -08:00
58 changed files with 981 additions and 187 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -1,5 +1,6 @@
 /.gitattributes export-ignore
 /.gitignore export-ignore
+/.travis.yml export-ignore
 /Doxyfile export-ignore
 /art/ export-ignore
 /benchmarks/ export-ignore
--- a/.travis.yml
+++ b/.travis.yml
@@ -0,0 +1,11 @@
+language: php
+php:
+    - '5.4'
+    - '5.5'
+    - '5.6'
+    - '7.0'
+before_script:
+    - git clone --depth=50 https://github.com/ezyang/simpletest.git
+    - cp test-settings.travis.php test-settings.php
+script:
+    - php tests/index.php
--- a/2
+++ b/2
@@ -31,7 +31,7 @@ PROJECT_NAME           = HTMLPurifier
 # This could be handy for archiving the generated documentation or
 # if some version control system is used.

-PROJECT_NUMBER         = 4.8.0
+PROJECT_NUMBER         = 4.9.1

 # The OUTPUT_DIRECTORY tag is used to specify the (relative or absolute)
 # base path where the generated documentation will be put.
--- a/39
+++ b/39
@@ -9,6 +9,45 @@ NEWS ( CHANGELOG and HISTORY )                                     HTMLPurifier
    . Internal change
 ==========================

+4.9.1, released 2017-03-08
+! %URI.DefaultScheme can now be set to null, in which case
+  all relative paths are removed.
+! New CSS properties: min-width, max-width, min-height, max-height (#94)
+! Transparency (rgba) and hsl/hsla supported where color CSS is present.
+  Thanks @fxbt for contributing the patch. (#118)
+- When idn_to_ascii is defined, we might accept malformed
+  hostnames.  Apply validation to the result in such cases.
+- Close directory when done in Serializer DefinitionCache (#100)
+- Deleted some asserts to keep linters from choking (#97)
+- Rework Serializer cache behavior to avoid chmod'ing if possible (#32)
+- Embedded semicolons in strings in CSS are now handled correctly!
+- We accidentally dropped certain Unicode characters if there was
+  one or more invalid characters.  This has been fixed, thanks
+  to mpyw <ryosuke_i_628@yahoo.co.jp>
+- Fix for "Don't truncate upon encountering </div> when using DOMLex"
+  caused a regression with HTML 4.01 Strict parsing with libxml 2.9.1
+  (and maybe later versions, but known OK with libxml 2.9.4).  The
+  fix is to go about handling truncation a bit more cleverly so that
+  we can wrap with divs (sidestepping the bug) while still slurping out
+  the rest of the text in case it ran off the end.  (#78)
+- Fix PREG_BACKTRACK_LIMIT_ERROR in HTMLPurifier_Filter_ExtractStyle.
+  Thanks @breathbath for contributing the report and fix (#120)
+- Fix entity decoding algorithm to be more conservative about
+  decoding entities that are missing trailing semicolon.
+  To get old behavior, set %Core.LegacyEntityDecoder to true.
+  (#119)
+- Workaround libxml bug when HTML tags are embedded inside
+  script tags.  To disable workaround set %Core.AggressivelyRemoveScript
+  to false. (#83)
+# By default, when a link has a target attribute associated
+  with it, we now also add rel="noopener" in order to
+  prevent the new window from being able to overwrite
+  the original frame.  To disable this protection,
+  set %HTML.TargetNoopener to FALSE.
+
+4.9.0 was cut on Git but never properly released; when we did the
+real release we decided to skip this version number.
+
 4.8.0, released 2016-07-16
 # By default, when a link has a target attribute associated
  with it, we now also add rel="noreferrer" in order to
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-HTML Purifier
+HTML Purifier [![Build Status](https://secure.travis-ci.org/ezyang/htmlpurifier.svg?branch=master)](http://travis-ci.org/ezyang/htmlpurifier)
 =============

 HTML Purifier is an HTML filtering solution that uses a unique combination
--- a/2
+++ b/2
@@ -1 +1 @@
-4.8.0
+4.9.1
--- a/17
+++ b/17
@@ -1,9 +1,8 @@
-HTML Purifier 4.8.0 is a bugfix release, collecting a year
-of accumulated bug fixes.  In particular, we fixed some minor
-bugs and now declare full PHP 7 compatibility. The primary
-backwards-incompatible change is that HTML Purifier will now
-add rel="noreferrer" to all links with target attributes
-(you can disable this with %HTML.TargetNoReferrer.)  Other
-changes: new configuration options %CSS.AllowDuplicates and
-%Attr.ID.HTML5; border-radius is partially supported when
-%CSS.AllowProprietary, and tel URIs are supported by default.
+HTML Purifier 4.9.0 is a maintenance release, collecting a year
+of accumulated bug fixes plus a few new features.  New features
+include support for min/max-width/height CSS, and rgba/hsl/hsla
+in color specifications.  Major bugfixes include improvements
+in the Serializer cache to avoid chmod'ing directories, better
+entity decoding (we won't accidentally decode entities that occur
+in URLs) and rel="noopener" on links with target attributes,
+to prevent them from overwriting the original frame.
--- a/composer.json
+++ b/composer.json
@@ -15,6 +15,9 @@
    "require": {
        "php": ">=5.2"
    },
+    "require-dev": {
+        "simpletest/simpletest": "^1.1"
+    },
    "autoload": {
        "psr-0": { "HTMLPurifier": "library/" },
        "files": ["library/HTMLPurifier.composer.php"]
--- a/configdoc/usage.xml
+++ b/configdoc/usage.xml
@@ -6,7 +6,7 @@
  </file>
  <file name="HTMLPurifier/Lexer.php">
   <line>85</line>
-   <line>315</line>
+   <line>326</line>
  </file>
  <file name="HTMLPurifier/Lexer/DirectLex.php">
   <line>67</line>
@@ -24,32 +24,32 @@
 </directive>
 <directive id="CSS.Proprietary">
  <file name="HTMLPurifier/CSSDefinition.php">
-   <line>319</line>
+   <line>323</line>
  </file>
 </directive>
 <directive id="CSS.AllowTricky">
  <file name="HTMLPurifier/CSSDefinition.php">
-   <line>323</line>
+   <line>327</line>
  </file>
 </directive>
 <directive id="CSS.Trusted">
  <file name="HTMLPurifier/CSSDefinition.php">
-   <line>327</line>
+   <line>331</line>
  </file>
 </directive>
 <directive id="CSS.AllowImportant">
  <file name="HTMLPurifier/CSSDefinition.php">
-   <line>331</line>
+   <line>335</line>
  </file>
 </directive>
 <directive id="CSS.AllowedProperties">
  <file name="HTMLPurifier/CSSDefinition.php">
-   <line>460</line>
+   <line>464</line>
  </file>
 </directive>
 <directive id="CSS.ForbiddenProperties">
  <file name="HTMLPurifier/CSSDefinition.php">
-   <line>476</line>
+   <line>480</line>
  </file>
 </directive>
 <directive id="Cache.DefinitionImpl">
@@ -79,19 +79,19 @@
 </directive>
 <directive id="Core.Encoding">
  <file name="HTMLPurifier/Encoder.php">
-   <line>374</line>
-   <line>422</line>
+   <line>380</line>
+   <line>428</line>
  </file>
 </directive>
 <directive id="Test.ForceNoIconv">
  <file name="HTMLPurifier/Encoder.php">
-   <line>382</line>
-   <line>433</line>
+   <line>388</line>
+   <line>439</line>
  </file>
 </directive>
 <directive id="Core.EscapeNonASCIICharacters">
  <file name="HTMLPurifier/Encoder.php">
-   <line>423</line>
+   <line>429</line>
  </file>
 </directive>
 <directive id="Output.CommentScriptContents">
@@ -124,7 +124,7 @@
   <line>122</line>
  </file>
  <file name="HTMLPurifier/Lexer.php">
-   <line>297</line>
+   <line>308</line>
  </file>
 </directive>
 <directive id="Output.Newline">
@@ -172,7 +172,8 @@
   <line>234</line>
  </file>
  <file name="HTMLPurifier/Lexer.php">
-   <line>302</line>
+   <line>313</line>
+   <line>352</line>
  </file>
  <file name="HTMLPurifier/HTMLModule/Image.php">
   <line>37</line>
@@ -232,6 +233,11 @@
   <line>276</line>
  </file>
 </directive>
+ <directive id="HTML.TargetNoopener">
+  <file name="HTMLPurifier/HTMLModuleManager.php">
+   <line>279</line>
+  </file>
+ </directive>
 <directive id="Attr.IDBlacklist">
  <file name="HTMLPurifier/IDAccumulator.php">
   <line>27</line>
@@ -255,14 +261,41 @@
   <line>62</line>
  </file>
 </directive>
+ <directive id="Core.LegacyEntityDecoder">
+  <file name="HTMLPurifier/Lexer.php">
+   <line>215</line>
+   <line>337</line>
+  </file>
+ </directive>
 <directive id="Core.ConvertDocumentToFragment">
  <file name="HTMLPurifier/Lexer.php">
-   <line>313</line>
+   <line>324</line>
  </file>
 </directive>
 <directive id="Core.RemoveProcessingInstructions">
  <file name="HTMLPurifier/Lexer.php">
-   <line>334</line>
+   <line>347</line>
+  </file>
+ </directive>
+ <directive id="Core.AggressivelyRemoveScript">
+  <file name="HTMLPurifier/Lexer.php">
+   <line>351</line>
+  </file>
+ </directive>
+ <directive id="Core.RemoveScriptContents">
+  <file name="HTMLPurifier/Lexer.php">
+   <line>352</line>
+  </file>
+  <file name="HTMLPurifier/Strategy/RemoveForeignElements.php">
+   <line>35</line>
+  </file>
+ </directive>
+ <directive id="Core.HiddenElements">
+  <file name="HTMLPurifier/Lexer.php">
+   <line>353</line>
+  </file>
+  <file name="HTMLPurifier/Strategy/RemoveForeignElements.php">
+   <line>36</line>
  </file>
 </directive>
 <directive id="URI.">
@@ -313,7 +346,7 @@
 </directive>
 <directive id="Core.ColorKeywords">
  <file name="HTMLPurifier/AttrDef/CSS/Color.php">
-   <line>19</line>
+   <line>29</line>
  </file>
  <file name="HTMLPurifier/AttrDef/HTML/Color.php">
   <line>19</line>
@@ -423,13 +456,13 @@
 </directive>
 <directive id="Cache.SerializerPath">
  <file name="HTMLPurifier/DefinitionCache/Serializer.php">
-   <line>183</line>
+   <line>185</line>
  </file>
 </directive>
 <directive id="Cache.SerializerPermissions">
  <file name="HTMLPurifier/DefinitionCache/Serializer.php">
-   <line>200</line>
-   <line>219</line>
+   <line>202</line>
+   <line>218</line>
  </file>
 </directive>
 <directive id="Filter.ExtractStyleBlocks.TidyImpl">
@@ -439,12 +472,12 @@
 </directive>
 <directive id="Filter.ExtractStyleBlocks.Scope">
  <file name="HTMLPurifier/Filter/ExtractStyleBlocks.php">
-   <line>122</line>
+   <line>125</line>
  </file>
 </directive>
 <directive id="Filter.ExtractStyleBlocks.Escaping">
  <file name="HTMLPurifier/Filter/ExtractStyleBlocks.php">
-   <line>327</line>
+   <line>330</line>
  </file>
 </directive>
 <directive id="HTML.SafeIframe">
@@ -534,16 +567,6 @@
   <line>32</line>
  </file>
 </directive>
- <directive id="Core.RemoveScriptContents">
-  <file name="HTMLPurifier/Strategy/RemoveForeignElements.php">
-   <line>35</line>
-  </file>
- </directive>
- <directive id="Core.HiddenElements">
-  <file name="HTMLPurifier/Strategy/RemoveForeignElements.php">
-   <line>36</line>
-  </file>
- </directive>
 <directive id="URI.HostBlacklist">
  <file name="HTMLPurifier/URIFilter/HostBlacklist.php">
   <line>25</line>
--- a/library/HTMLPurifier.includes.php
+++ b/library/HTMLPurifier.includes.php
@@ -7,7 +7,7 @@
 * primary concern and you are using an opcode cache. PLEASE DO NOT EDIT THIS
 * FILE, changes will be overwritten the next time the script is run.
 *
- * @version 4.8.0
+ * @version 4.9.1
 *
 * @warning
 *      You must *not* include any other HTML Purifier files before this file,
@@ -137,6 +137,7 @@ require 'HTMLPurifier/AttrTransform/SafeObject.php';
 require 'HTMLPurifier/AttrTransform/SafeParam.php';
 require 'HTMLPurifier/AttrTransform/ScriptRequired.php';
 require 'HTMLPurifier/AttrTransform/TargetBlank.php';
+require 'HTMLPurifier/AttrTransform/TargetNoopener.php';
 require 'HTMLPurifier/AttrTransform/TargetNoreferrer.php';
 require 'HTMLPurifier/AttrTransform/Textarea.php';
 require 'HTMLPurifier/ChildDef/Chameleon.php';
@@ -176,6 +177,7 @@ require 'HTMLPurifier/HTMLModule/StyleAttribute.php';
 require 'HTMLPurifier/HTMLModule/Tables.php';
 require 'HTMLPurifier/HTMLModule/Target.php';
 require 'HTMLPurifier/HTMLModule/TargetBlank.php';
+require 'HTMLPurifier/HTMLModule/TargetNoopener.php';
 require 'HTMLPurifier/HTMLModule/TargetNoreferrer.php';
 require 'HTMLPurifier/HTMLModule/Text.php';
 require 'HTMLPurifier/HTMLModule/Tidy.php';
--- a/library/HTMLPurifier.php
+++ b/library/HTMLPurifier.php
@@ -19,7 +19,7 @@
 */

 /*
-    HTML Purifier 4.8.0 - Standards Compliant HTML Filtering
+    HTML Purifier 4.9.1 - Standards Compliant HTML Filtering
    Copyright (C) 2006-2008 Edward Z. Yang

    This library is free software; you can redistribute it and/or
@@ -58,12 +58,12 @@ class HTMLPurifier
     * Version of HTML Purifier.
     * @type string
     */
-    public $version = '4.8.0';
+    public $version = '4.9.1';

    /**
     * Constant with version of HTML Purifier.
     */
-    const VERSION = '4.8.0';
+    const VERSION = '4.9.1';

    /**
     * Global configuration object.
--- a/library/HTMLPurifier.safe-includes.php
+++ b/library/HTMLPurifier.safe-includes.php
@@ -131,6 +131,7 @@ require_once $__dir . '/HTMLPurifier/AttrTransform/SafeObject.php';
 require_once $__dir . '/HTMLPurifier/AttrTransform/SafeParam.php';
 require_once $__dir . '/HTMLPurifier/AttrTransform/ScriptRequired.php';
 require_once $__dir . '/HTMLPurifier/AttrTransform/TargetBlank.php';
+require_once $__dir . '/HTMLPurifier/AttrTransform/TargetNoopener.php';
 require_once $__dir . '/HTMLPurifier/AttrTransform/TargetNoreferrer.php';
 require_once $__dir . '/HTMLPurifier/AttrTransform/Textarea.php';
 require_once $__dir . '/HTMLPurifier/ChildDef/Chameleon.php';
@@ -170,6 +171,7 @@ require_once $__dir . '/HTMLPurifier/HTMLModule/StyleAttribute.php';
 require_once $__dir . '/HTMLPurifier/HTMLModule/Tables.php';
 require_once $__dir . '/HTMLPurifier/HTMLModule/Target.php';
 require_once $__dir . '/HTMLPurifier/HTMLModule/TargetBlank.php';
+require_once $__dir . '/HTMLPurifier/HTMLModule/TargetNoopener.php';
 require_once $__dir . '/HTMLPurifier/HTMLModule/TargetNoreferrer.php';
 require_once $__dir . '/HTMLPurifier/HTMLModule/Text.php';
 require_once $__dir . '/HTMLPurifier/HTMLModule/Tidy.php';
--- a/library/HTMLPurifier/Arborize.php
+++ b/library/HTMLPurifier/Arborize.php
@@ -19,8 +19,8 @@ class HTMLPurifier_Arborize
            if ($token instanceof HTMLPurifier_Token_End) {
                $token->start = null; // [MUT]
                $r = array_pop($stack);
-                assert($r->name === $token->name);
-                assert(empty($token->attr));
+                //assert($r->name === $token->name);
+                //assert(empty($token->attr));
                $r->endCol = $token->col;
                $r->endLine = $token->line;
                $r->endArmor = $token->armor;
@@ -32,7 +32,7 @@ class HTMLPurifier_Arborize
                $stack[] = $node;
            }
        }
-        assert(count($stack) == 1);
+        //assert(count($stack) == 1);
        return $stack[0];
    }

--- a/library/HTMLPurifier/AttrDef.php
+++ b/library/HTMLPurifier/AttrDef.php
@@ -86,7 +86,13 @@ abstract class HTMLPurifier_AttrDef
     */
    protected function mungeRgb($string)
    {
-        return preg_replace('/rgb\((\d+)\s*,\s*(\d+)\s*,\s*(\d+)\)/', 'rgb(\1,\2,\3)', $string);
+        $p = '\s*(\d+(\.\d+)?([%]?))\s*';
+
+        if (preg_match('/(rgba|hsla)\(/', $string)) {
+            return preg_replace('/(rgba|hsla)\('.$p.','.$p.','.$p.','.$p.'\)/', '\1(\2,\5,\8,\11)', $string);
+        }
+
+        return preg_replace('/(rgb|hsl)\('.$p.','.$p.','.$p.'\)/', '\1(\2,\5,\8)', $string);
    }

    /**
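
The rewritten `mungeRgb()` above can be tried in isolation. The following is a minimal sketch, assuming the same pattern and replacement strings as the hunk; the function name `normalizeColorFunction` is hypothetical and not part of HTML Purifier. Each use of `$p` contributes three capture groups, which is why the replacement refers to groups 2, 5, 8 and 11.

```php
<?php
// Standalone sketch of the whitespace-stripping regex from the hunk above.
// normalizeColorFunction() is a hypothetical name used only for illustration.
function normalizeColorFunction(string $string): string
{
    // One numeric parameter: digits, optional fraction, optional percent sign.
    // Each use of $p adds three capture groups; only the outermost is reused.
    $p = '\s*(\d+(\.\d+)?([%]?))\s*';

    if (preg_match('/(rgba|hsla)\(/', $string)) {
        // Four parameters: the outer groups are 2, 5, 8 and 11.
        return preg_replace(
            '/(rgba|hsla)\(' . $p . ',' . $p . ',' . $p . ',' . $p . '\)/',
            '\1(\2,\5,\8,\11)',
            $string
        );
    }

    // Three parameters: the outer groups are 2, 5 and 8.
    return preg_replace(
        '/(rgb|hsl)\(' . $p . ',' . $p . ',' . $p . '\)/',
        '\1(\2,\5,\8)',
        $string
    );
}

echo normalizeColorFunction('color: rgba(255, 0, 0, 0.5)'), "\n"; // color: rgba(255,0,0,0.5)
echo normalizeColorFunction('rgb( 10% , 20% , 30% )'), "\n";      // rgb(10%,20%,30%)
```

Stripping the inner whitespace first presumably keeps the commas as the only separators inside the parentheses, so later splitting on spaces does not break a color value apart.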
--- a/library/HTMLPurifier/AttrDef/CSS.php
+++ b/library/HTMLPurifier/AttrDef/CSS.php
@@ -27,13 +27,38 @@ class HTMLPurifier_AttrDef_CSS extends HTMLPurifier_AttrDef
        $definition = $config->getCSSDefinition();
        $allow_duplicates = $config->get("CSS.AllowDuplicates");

-        // we're going to break the spec and explode by semicolons.
-        // This is because semicolon rarely appears in escaped form
-        // Doing this is generally flaky but fast
-        // IT MIGHT APPEAR IN URIs, see HTMLPurifier_AttrDef_CSSURI
-        // for details

-        $declarations = explode(';', $css);
+        // According to the CSS2.1 spec, the places where a
+        // non-delimiting semicolon can appear are in strings and
+        // escape sequences.   So here is some dumb hack to
+        // handle quotes.
+        $len = strlen($css);
+        $accum = "";
+        $declarations = array();
+        $quoted = false;
+        for ($i = 0; $i < $len; $i++) {
+            $c = strcspn($css, ";'\"", $i);
+            $accum .= substr($css, $i, $c);
+            $i += $c;
+            if ($i == $len) break;
+            $d = $css[$i];
+            if ($quoted) {
+                $accum .= $d;
+                if ($d == $quoted) {
+                    $quoted = false;
+                }
+            } else {
+                if ($d == ";") {
+                    $declarations[] = $accum;
+                    $accum = "";
+                } else {
+                    $accum .= $d;
+                    $quoted = $d;
+                }
+            }
+        }
+        if ($accum != "") $declarations[] = $accum;
+
        $propvalues = array();
        $new_declarations = '';
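
To see what the quote-aware loop changes compared with a plain `explode(';', $css)`, here is a minimal standalone sketch of the same scanning logic. The wrapper function `splitCssDeclarations` is a hypothetical name introduced for illustration; the body follows the added lines in the hunk above.

```php
<?php
// Sketch: split a CSS declaration block on semicolons, ignoring semicolons
// that sit inside single- or double-quoted strings.
// splitCssDeclarations() is a hypothetical name used only for illustration.
function splitCssDeclarations(string $css): array
{
    $len = strlen($css);
    $accum = '';
    $declarations = array();
    $quoted = false; // false, or the quote character that opened the string

    for ($i = 0; $i < $len; $i++) {
        // Copy everything up to the next semicolon or quote character.
        $c = strcspn($css, ";'\"", $i);
        $accum .= substr($css, $i, $c);
        $i += $c;
        if ($i == $len) {
            break;
        }
        $d = $css[$i];
        if ($quoted) {
            // Inside a string: keep the character, close on a matching quote.
            $accum .= $d;
            if ($d == $quoted) {
                $quoted = false;
            }
        } elseif ($d == ';') {
            // Unquoted semicolon: the declaration is complete.
            $declarations[] = $accum;
            $accum = '';
        } else {
            // Opening quote: remember which character must close it.
            $accum .= $d;
            $quoted = $d;
        }
    }
    if ($accum != '') {
        $declarations[] = $accum;
    }
    return $declarations;
}

var_export(splitCssDeclarations('font-family: "a;b"; color: red'));
```

For that input the sketch yields two declarations, whereas `explode(';', ...)` would cut the quoted font name in half and yield three pieces. Like the original, it does not handle backslash-escaped quotes inside strings.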

--- a/library/HTMLPurifier/AttrDef/CSS/Color.php
+++ b/library/HTMLPurifier/AttrDef/CSS/Color.php
@@ -6,6 +6,16 @@
 class HTMLPurifier_AttrDef_CSS_Color extends HTMLPurifier_AttrDef
 {

+    /**
+     * @type HTMLPurifier_AttrDef_CSS_AlphaValue
+     */
+    protected $alpha;
+
+    public function __construct()
+    {
+        $this->alpha = new HTMLPurifier_AttrDef_CSS_AlphaValue();
+    }
+
    /**
     * @param string $color
     * @param HTMLPurifier_Config $config
@@ -29,59 +39,104 @@ class HTMLPurifier_AttrDef_CSS_Color extends HTMLPurifier_AttrDef
            return $colors[$lower];
        }

-        if (strpos($color, 'rgb(') !== false) {
-            // rgb literal handling
+        if (preg_match('#(rgb|rgba|hsl|hsla)\(#', $color, $matches) === 1) {
            $length = strlen($color);
            if (strpos($color, ')') !== $length - 1) {
                return false;
            }
-            $triad = substr($color, 4, $length - 4 - 1);
-            $parts = explode(',', $triad);
-            if (count($parts) !== 3) {
+
+            // get used function : rgb, rgba, hsl or hsla
+            $function = $matches[1];
+
+            $parameters_size = 3;
+            $alpha_channel = false;
+            if (substr($function, -1) === 'a') {
+                $parameters_size = 4;
+                $alpha_channel = true;
+            }
+
+            /*
+             * Allowed types for values :
+             * parameter_position => [type => max_value]
+             */
+            $allowed_types = [
+                1 => ['percentage' => 100, 'integer' => 255],
+                2 => ['percentage' => 100, 'integer' => 255],
+                3 => ['percentage' => 100, 'integer' => 255],
+            ];
+            $allow_different_types = false;
+
+            if (strpos($function, 'hsl') !== false) {
+                $allowed_types = [
+                    1 => ['integer' => 360],
+                    2 => ['percentage' => 100],
+                    3 => ['percentage' => 100],
+                ];
+                $allow_different_types = true;
+            }
+
+            $values = trim(str_replace($function, '', $color), ' ()');
+
+            $parts = explode(',', $values);
+            if (count($parts) !== $parameters_size) {
                return false;
            }
-            $type = false; // to ensure that they're all the same type
+
+            $type = false;
            $new_parts = array();
+            $i = 0;
+
            foreach ($parts as $part) {
+                $i++;
                $part = trim($part);
+
                if ($part === '') {
                    return false;
                }
-                $length = strlen($part);
-                if ($part[$length - 1] === '%') {
-                    // handle percents
-                    if (!$type) {
-                        $type = 'percentage';
-                    } elseif ($type !== 'percentage') {
+
+                // different check for alpha channel
+                if ($alpha_channel === true && $i === count($parts)) {
+                    $result = $this->alpha->validate($part, $config, $context);
+
+                    if ($result === false) {
                        return false;
                    }
-                    $num = (float)substr($part, 0, $length - 1);
-                    if ($num < 0) {
-                        $num = 0;
-                    }
-                    if ($num > 100) {
-                        $num = 100;
-                    }
-                    $new_parts[] = "$num%";
+
+                    $new_parts[] = (string)$result;
+                    continue;
+                }
+
+                if (substr($part, -1) === '%') {
+                    $current_type = 'percentage';
                } else {
-                    // handle integers
-                    if (!$type) {
-                        $type = 'integer';
-                    } elseif ($type !== 'integer') {
-                        return false;
-                    }
-                    $num = (int)$part;
-                    if ($num < 0) {
-                        $num = 0;
-                    }
-                    if ($num > 255) {
-                        $num = 255;
-                    }
-                    $new_parts[] = (string)$num;
+                    $current_type = 'integer';
+                }
+
+                if (!array_key_exists($current_type, $allowed_types[$i])) {
+                    return false;
+                }
+
+                if (!$type) {
+                    $type = $current_type;
+                }
+
+                if ($allow_different_types === false && $type != $current_type) {
+                    return false;
+                }
+
+                $max_value = $allowed_types[$i][$current_type];
+
+                if ($current_type == 'integer') {
+                    // Return value between range 0 -> $max_value
+                    $new_parts[] = (int)max(min($part, $max_value), 0);
+                } elseif ($current_type == 'percentage') {
+                    $new_parts[] = (float)max(min(rtrim($part, '%'), $max_value), 0) . '%';
                }
            }
-            $new_triad = implode(',', $new_parts);
-            $color = "rgb($new_triad)";
+
+            $new_values = implode(',', $new_parts);
+
+            $color = $function . '(' . $new_values . ')';
        } else {
            // hexadecimal handling
            if ($color[0] === '#') {
@@ -100,6 +155,7 @@ class HTMLPurifier_AttrDef_CSS_Color extends HTMLPurifier_AttrDef
        }
        return $color;
    }
+
 }

 // vim: et sw=4 sts=4
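
The range clamping that the new color validator applies to each parameter can be sketched on its own. This is a simplified illustration, not the library's code: `clampColorPart` is a hypothetical helper, and it casts to a number before comparing instead of passing strings to `min()`/`max()` as the hunk does.

```php
<?php
// Sketch: clamp one color parameter into its allowed range.
// clampColorPart() is a hypothetical helper used only for illustration.
// Percentages are clamped to 0..100; integers to 0..$maxInteger
// (255 for rgb channels, 360 for an hsl hue, as in the hunk above).
function clampColorPart(string $part, int $maxInteger = 255): string
{
    $part = trim($part);

    if (substr($part, -1) === '%') {
        $value = (float) rtrim($part, '%');
        return max(min($value, 100.0), 0.0) . '%';
    }

    $value = (int) $part;
    return (string) max(min($value, $maxInteger), 0);
}

echo clampColorPart('300'), "\n";      // 255
echo clampColorPart('120%'), "\n";     // 100%
echo clampColorPart('400', 360), "\n"; // 360
```

Out-of-range values are clamped instead of rejected, which matches the behavior of the replaced rgb-only code.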
--- a/library/HTMLPurifier/AttrDef/URI/Host.php
+++ b/library/HTMLPurifier/AttrDef/URI/Host.php
@@ -97,7 +97,7 @@ class HTMLPurifier_AttrDef_URI_Host extends HTMLPurifier_AttrDef

        // PHP 5.3 and later support this functionality natively
        if (function_exists('idn_to_ascii')) {
-            return idn_to_ascii($string);
+            $string = idn_to_ascii($string);

        // If we have Net_IDNA2 support, we can support IRIs by
        // punycoding them. (This is the most portable thing to do,
@@ -123,13 +123,14 @@ class HTMLPurifier_AttrDef_URI_Host extends HTMLPurifier_AttrDef
                    }
                }
                $string = implode('.', $new_parts);
-                if (preg_match("/^($domainlabel\.)*$toplabel\.?$/i", $string)) {
-                    return $string;
-                }
            } catch (Exception $e) {
                // XXX error reporting
            }
        }
+        // Try again
+        if (preg_match("/^($domainlabel\.)*$toplabel\.?$/i", $string)) {
+            return $string;
+        }
        return false;
    }
 }
--- a/library/HTMLPurifier/AttrTransform/TargetNoopener.php
+++ b/library/HTMLPurifier/AttrTransform/TargetNoopener.php
@@ -0,0 +1,37 @@
+<?php
+
+// must be called POST validation
+
+/**
+ * Adds rel="noopener" to any links which target a different window
+ * than the current one.  This is used to prevent malicious websites
+ * from silently replacing the original window, which could be used
+ * to do phishing.
+ * This transform is controlled by %HTML.TargetNoopener.
+ */
+class HTMLPurifier_AttrTransform_TargetNoopener extends HTMLPurifier_AttrTransform
+{
+    /**
+     * @param array $attr
+     * @param HTMLPurifier_Config $config
+     * @param HTMLPurifier_Context $context
+     * @return array
+     */
+    public function transform($attr, $config, $context)
+    {
+        if (isset($attr['rel'])) {
+            $rels = explode(' ', $attr['rel']);
+        } else {
+            $rels = array();
+        }
+        if (isset($attr['target']) && !in_array('noopener', $rels)) {
+            $rels[] = 'noopener';
+        }
+        if (!empty($rels) || isset($attr['rel'])) {
+            $attr['rel'] = implode(' ', $rels);
+        }
+
+        return $attr;
+    }
+}
+
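
The effect of the new transform on an attribute array can be checked without loading the library. The sketch below copies the body of `transform()` into a free function; the name `addNoopener` is hypothetical, and the `$config` and `$context` arguments are dropped because the logic above does not use them.

```php
<?php
// Sketch of the rel="noopener" logic from TargetNoopener::transform().
// addNoopener() is a hypothetical free function used only for illustration.
function addNoopener(array $attr): array
{
    $rels = isset($attr['rel']) ? explode(' ', $attr['rel']) : array();

    // Only links that open in another browsing context need the protection.
    if (isset($attr['target']) && !in_array('noopener', $rels)) {
        $rels[] = 'noopener';
    }
    if (!empty($rels) || isset($attr['rel'])) {
        $attr['rel'] = implode(' ', $rels);
    }
    return $attr;
}

var_export(addNoopener(array('href' => '/', 'target' => '_blank')));
var_export(addNoopener(array('target' => '_blank', 'rel' => 'nofollow')));
```

Existing `rel` tokens are preserved and `noopener` is appended once; links without a `target` attribute pass through unchanged.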
--- a/library/HTMLPurifier/CSSDefinition.php
+++ b/library/HTMLPurifier/CSSDefinition.php
@@ -225,6 +225,10 @@ class HTMLPurifier_CSSDefinition extends HTMLPurifier_Definition
        );
        $max = $config->get('CSS.MaxImgLength');

+        $this->info['min-width'] =
+        $this->info['max-width'] =
+        $this->info['min-height'] =
+        $this->info['max-height'] =
        $this->info['width'] =
        $this->info['height'] =
            $max === null ?
--- a/library/HTMLPurifier/ChildDef/Table.php
+++ b/library/HTMLPurifier/ChildDef/Table.php
@@ -203,7 +203,7 @@ class HTMLPurifier_ChildDef_Table extends HTMLPurifier_ChildDef
                    $current_tr_tbody->children[] = $node;
                    break;
                case '#PCDATA':
-                    assert($node->is_whitespace);
+                    //assert($node->is_whitespace);
                    if ($current_tr_tbody === null) {
                        $ret[] = $node;
                    } else {
--- a/library/HTMLPurifier/Config.php
+++ b/library/HTMLPurifier/Config.php
@@ -21,7 +21,7 @@ class HTMLPurifier_Config
     * HTML Purifier's version
     * @type string
     */
-    public $version = '4.8.0';
+    public $version = '4.9.1';

    /**
     * Whether or not to automatically finalize
--- a/library/HTMLPurifier/ConfigSchema/schema.ser
+++ b/library/HTMLPurifier/ConfigSchema/schema.ser
--- a/library/HTMLPurifier/ConfigSchema/schema/Core.AggressivelyRemoveScript.txt
+++ b/library/HTMLPurifier/ConfigSchema/schema/Core.AggressivelyRemoveScript.txt
@@ -0,0 +1,16 @@
+Core.AggressivelyRemoveScript
+TYPE: bool
+VERSION: 4.9.0
+DEFAULT: true
+--DESCRIPTION--
+<p>
+    This directive enables aggressive pre-filter removal of
+    script tags.  This is not necessary for security,
+    but it can help work around a bug in libxml where embedded
+    HTML elements inside script sections cause the parser to
+    choke.  To revert to pre-4.9.0 behavior, set this to false.
+    This directive has no effect if %Core.Trusted is true,
+    %Core.RemoveScriptContents is false, or %Core.HiddenElements
+    does not contain script.
+</p>
+--# vim: et sw=4 sts=4
--- a/library/HTMLPurifier/ConfigSchema/schema/Core.LegacyEntityDecoder.txt
+++ b/library/HTMLPurifier/ConfigSchema/schema/Core.LegacyEntityDecoder.txt
@@ -0,0 +1,36 @@
+Core.LegacyEntityDecoder
+TYPE: bool
+VERSION: 4.9.0
+DEFAULT: false
+--DESCRIPTION--
+<p>
+    Prior to HTML Purifier 4.9.0, entities were decoded by performing
+    a global search replace for all entities whose decoded versions
+    did not have special meanings under HTML, and replaced them with
+    their decoded versions.  We would match all entities, even if they did
+    not have a trailing semicolon, but only if there weren't any trailing
+    alphanumeric characters.
+</p>
+<table>
+<tr><th>Original</th><th>Text</th><th>Attribute</th></tr>
+<tr><td>&amp;yen;</td><td>&yen;</td><td>&yen;</td></tr>
+<tr><td>&amp;yen</td><td>&yen;</td><td>&yen;</td></tr>
+<tr><td>&amp;yena</td><td>&amp;yena</td><td>&amp;yena</td></tr>
+<tr><td>&amp;yen=</td><td>&yen;=</td><td>&yen;=</td></tr>
+</table>
+<p>
+    In HTML Purifier 4.9.0, we changed the behavior of entity parsing
+    to match entities that had missing trailing semicolons in fewer
+    cases, to more closely match HTML5 parsing behavior:
+</p>
+<table>
+<tr><th>Original</th><th>Text</th><th>Attribute</th></tr>
+<tr><td>&amp;yen;</td><td>&yen;</td><td>&yen;</td></tr>
+<tr><td>&amp;yen</td><td>&yen;</td><td>&yen;</td></tr>
+<tr><td>&amp;yena</td><td>&yen;a</td><td>&amp;yena</td></tr>
+<tr><td>&amp;yen=</td><td>&yen;=</td><td>&amp;yen=</td></tr>
+</table>
+<p>
+    This flag reverts back to pre-HTML Purifier 4.9.0 behavior.
+</p>
+--# vim: et sw=4 sts=4
--- a/library/HTMLPurifier/ConfigSchema/schema/HTML.TargetNoopener.txt
+++ b/library/HTMLPurifier/ConfigSchema/schema/HTML.TargetNoopener.txt
@@ -0,0 +1,10 @@
+--# vim: et sw=4 sts=4
+HTML.TargetNoopener
+TYPE: bool
+VERSION: 4.8.0
+DEFAULT: TRUE
+--DESCRIPTION--
+If enabled, noopener rel attributes are added to links which have
+a target attribute associated with them.  This prevents malicious
+destinations from overwriting the original window.
+--# vim: et sw=4 sts=4
--- a/library/HTMLPurifier/ConfigSchema/schema/URI.DefaultScheme.txt
+++ b/library/HTMLPurifier/ConfigSchema/schema/URI.DefaultScheme.txt
@@ -1,5 +1,5 @@
 URI.DefaultScheme
-TYPE: string
+TYPE: string/null
 DEFAULT: 'http'
 --DESCRIPTION--

@@ -7,4 +7,9 @@ DEFAULT: 'http'
    Defines through what scheme the output will be served, in order to
    select the proper object validator when no scheme information is present.
 </p>
+
+<p>
+    Starting with HTML Purifier 4.9.0, the default scheme can be null, in
+    which case we reject all URIs which do not have explicit schemes.
+</p>
 --# vim: et sw=4 sts=4
--- a/library/HTMLPurifier/DefinitionCache/Serializer.php
+++ b/library/HTMLPurifier/DefinitionCache/Serializer.php
@@ -112,6 +112,7 @@ class HTMLPurifier_DefinitionCache_Serializer extends HTMLPurifier_DefinitionCac
            }
            unlink($dir . '/' . $filename);
        }
+        closedir($dh);
        return true;
    }

@@ -142,6 +143,7 @@ class HTMLPurifier_DefinitionCache_Serializer extends HTMLPurifier_DefinitionCac
                unlink($dir . '/' . $filename);
            }
        }
+        closedir($dh);
        return true;
    }

@@ -198,11 +200,8 @@ class HTMLPurifier_DefinitionCache_Serializer extends HTMLPurifier_DefinitionCac
        if ($result !== false) {
            // set permissions of the new file (no execute)
            $chmod = $config->get('Cache.SerializerPermissions');
-            if ($chmod === null) {
-                // don't do anything
-            } else {
-                $chmod = $chmod & 0666;
-                chmod($file, $chmod);
+            if ($chmod !== null) {
+                chmod($file, $chmod & 0666);
            }
        }
        return $result;
@@ -217,6 +216,11 @@ class HTMLPurifier_DefinitionCache_Serializer extends HTMLPurifier_DefinitionCac
    {
        $directory = $this->generateDirectoryPath($config);
        $chmod = $config->get('Cache.SerializerPermissions');
+        if ($chmod === null) {
+            // TODO: This races
+            if (is_dir($directory)) return true;
+            return mkdir($directory);
+        }
        if (!is_dir($directory)) {
            $base = $this->generateBaseDirectoryPath($config);
            if (!is_dir($base)) {
@@ -229,25 +233,14 @@ class HTMLPurifier_DefinitionCache_Serializer extends HTMLPurifier_DefinitionCac
            } elseif (!$this->_testPermissions($base, $chmod)) {
                return false;
            }
-            if ($chmod === null) {
+            if (!mkdir($directory, $chmod)) {
                trigger_error(
-                    'Base directory ' . $base . ' does not exist,
-                    please create or change using %Cache.SerializerPath',
+                    'Could not create directory ' . $directory . '',
                    E_USER_WARNING
                );
                return false;
            }
-            if ($chmod !== null) {
-                mkdir($directory, $chmod);
-            } else {
-                mkdir($directory);
-            }
            if (!$this->_testPermissions($directory, $chmod)) {
-                trigger_error(
-                    'Base directory ' . $base . ' does not exist,
-                    please create or change using %Cache.SerializerPath',
-                    E_USER_WARNING
-                );
                return false;
            }
        } elseif (!$this->_testPermissions($directory, $chmod)) {
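The "TODO: This races" comment in the null-permissions branch above refers to the gap between is_dir() and mkdir(): another process can create the directory in between, so mkdir() fails even though the directory now exists. Below is a minimal sketch of one common way to tolerate that. It is not what this commit does, and ensure_directory is a hypothetical helper name, not part of HTML Purifier.

```php
<?php
// Hypothetical helper (not part of HTML Purifier): create $directory if
// needed, treating "another process created it first" as success.
function ensure_directory($directory)
{
    if (is_dir($directory)) {
        return true;
    }
    // Suppress the warning mkdir() raises when the directory already exists.
    if (@mkdir($directory)) {
        return true;
    }
    // mkdir() failed: report success only if the directory exists anyway.
    clearstatcache(true, $directory);
    return is_dir($directory);
}

// Usage: the path below is an arbitrary temporary location for the demo.
$dir = sys_get_temp_dir() . '/hp-cache-demo-' . uniqid();
var_dump(ensure_directory($dir)); // first call creates the directory
var_dump(ensure_directory($dir)); // second call finds it already there
rmdir($dir);
```

The final is_dir() check is what makes losing the race harmless: whichever process calls mkdir() second still sees a usable directory.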
--- a/library/HTMLPurifier/Encoder.php
+++ b/library/HTMLPurifier/Encoder.php
@@ -101,6 +101,14 @@ class HTMLPurifier_Encoder
     * It will parse according to UTF-8 and return a valid UTF8 string, with
     * non-SGML codepoints excluded.
     *
+     * Specifically, it will permit:
+     * \x{9}\x{A}\x{D}\x{20}-\x{7E}\x{A0}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}
+     * Source: https://www.w3.org/TR/REC-xml/#NT-Char
+     * Arguably this function should be modernized to the HTML5 set
+     * of allowed characters:
+     * https://www.w3.org/TR/html5/syntax.html#preprocessing-the-input-stream
+     * which simultaneously expands and restricts the set of allowed characters.
+     *
     * @param string $str The string to clean
     * @param bool $force_php
     * @return string
@@ -122,15 +130,12 @@ class HTMLPurifier_Encoder
     *       function that needs to be able to understand UTF-8 characters.
     *       As of right now, only smart lossless character encoding converters
     *       would need that, and I'm probably not going to implement them.
-     *       Once again, PHP 6 should solve all our problems.
     */
    public static function cleanUTF8($str, $force_php = false)
    {
        // UTF-8 validity is checked since PHP 4.3.5
        // This is an optimization: if the string is already valid UTF-8, no
        // need to do PHP stuff. 99% of the time, this will be the case.
-        // The regexp matches the XML char production, as well as well as excluding
-        // non-SGML codepoints U+007F to U+009F
        if (preg_match(
            '/^[\x{9}\x{A}\x{D}\x{20}-\x{7E}\x{A0}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]*$/Du',
            $str
@@ -255,6 +260,7 @@ class HTMLPurifier_Encoder
                                // 7F-9F is not strictly prohibited by XML,
                                // but it is non-SGML, and thus we don't allow it
                                (0xA0 <= $mUcs4 && 0xD7FF >= $mUcs4) ||
+                                (0xE000 <= $mUcs4 && 0xFFFD >= $mUcs4) ||
                                (0x10000 <= $mUcs4 && 0x10FFFF >= $mUcs4)
                            )
                        ) {
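The permitted character set documented above can be exercised directly with the same pattern that cleanUTF8() uses on its fast path. This is a small sketch for illustration only; is_permitted is a hypothetical wrapper, not a function in HTML Purifier.

```php
<?php
// Same pattern as the fast-path check in cleanUTF8().
$pattern = '/^[\x{9}\x{A}\x{D}\x{20}-\x{7E}\x{A0}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]*$/Du';

// Hypothetical wrapper: true only if every character is in the permitted set.
// Under the /u modifier preg_match() returns false for malformed UTF-8,
// so malformed input is rejected here as well.
function is_permitted($pattern, $str)
{
    return preg_match($pattern, $str) === 1;
}

var_dump(is_permitted($pattern, "h\xC3\xA9llo\n")); // bool(true): U+00E9 and newline are allowed
var_dump(is_permitted($pattern, "a\x7Fb"));         // bool(false): U+007F is excluded
var_dump(is_permitted($pattern, "\xEE\x80\x80"));   // bool(true): U+E000 is in E000-FFFD
var_dump(is_permitted($pattern, "\xC3\x28"));       // bool(false): malformed UTF-8
```

The third case is the range that the hunk above adds to the pure-PHP fallback, which brings the fallback in line with the regular expression.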
--- a/library/HTMLPurifier/EntityParser.php
+++ b/library/HTMLPurifier/EntityParser.php
@@ -16,6 +16,138 @@ class HTMLPurifier_EntityParser
     */
    protected $_entity_lookup;

+    /**
+     * Callback regex string for entities in text.
+     * @type string
+     */
+    protected $_textEntitiesRegex;
+
+    /**
+     * Callback regex string for entities in attributes.
+     * @type string
+     */
+    protected $_attrEntitiesRegex;
+
+    /**
+     * Regex that tests whether a string begins with a semicolon-optional entity.
+     */
+    protected $_semiOptionalPrefixRegex;
+
+    public function __construct() {
+        // From
+        // http://stackoverflow.com/questions/15532252/why-is-reg-being-rendered-as-without-the-bounding-semicolon
+        $semi_optional = "quot|QUOT|lt|LT|gt|GT|amp|AMP|AElig|Aacute|Acirc|Agrave|Aring|Atilde|Auml|COPY|Ccedil|ETH|Eacute|Ecirc|Egrave|Euml|Iacute|Icirc|Igrave|Iuml|Ntilde|Oacute|Ocirc|Ograve|Oslash|Otilde|Ouml|REG|THORN|Uacute|Ucirc|Ugrave|Uuml|Yacute|aacute|acirc|acute|aelig|agrave|aring|atilde|auml|brvbar|ccedil|cedil|cent|copy|curren|deg|divide|eacute|ecirc|egrave|eth|euml|frac12|frac14|frac34|iacute|icirc|iexcl|igrave|iquest|iuml|laquo|macr|micro|middot|nbsp|not|ntilde|oacute|ocirc|ograve|ordf|ordm|oslash|otilde|ouml|para|plusmn|pound|raquo|reg|sect|shy|sup1|sup2|sup3|szlig|thorn|times|uacute|ucirc|ugrave|uml|uuml|yacute|yen|yuml";
+
+        // NB: three empty captures to put the fourth match in the right
+        // place
+        $this->_semiOptionalPrefixRegex = "/&()()()($semi_optional)/";
+
+        $this->_textEntitiesRegex =
+            '/&(?:'.
+            // hex
+            '[#]x([a-fA-F0-9]+);?|'.
+            // dec
+            '[#]0*(\d+);?|'.
+            // string (mandatory semicolon)
+            // NB: order matters: match semicolon preferentially
+            '([A-Za-z_:][A-Za-z0-9.\-_:]*);|'.
+            // string (optional semicolon)
+            "($semi_optional)".
+            ')/';
+
+        $this->_attrEntitiesRegex =
+            '/&(?:'.
+            // hex
+            '[#]x([a-fA-F0-9]+);?|'.
+            // dec
+            '[#]0*(\d+);?|'.
+            // string (mandatory semicolon)
+            // NB: order matters: match semicolon preferentially
+            '([A-Za-z_:][A-Za-z0-9.\-_:]*);|'.
+            // string (optional semicolon)
+            // don't match if the next character is an equals sign, a
+            // semicolon, or alphanumeric (URL-like)
+            "($semi_optional)(?![=;A-Za-z0-9])".
+            ')/';
+
+    }
+
+    /**
+     * Substitute entities with the parsed equivalents.  Use this on
+     * textual data in an HTML document (as opposed to attributes.)
+     *
+     * @param string $string String to have entities parsed.
+     * @return string Parsed string.
+     */
+    public function substituteTextEntities($string)
+    {
+        return preg_replace_callback(
+            $this->_textEntitiesRegex,
+            array($this, 'entityCallback'),
+            $string
+        );
+    }
+
+    /**
+     * Substitute entities with the parsed equivalents.  Use this on
+     * attribute contents in documents.
+     *
+     * @param string $string String to have entities parsed.
+     * @return string Parsed string.
+     */
+    public function substituteAttrEntities($string)
+    {
+        return preg_replace_callback(
+            $this->_attrEntitiesRegex,
+            array($this, 'entityCallback'),
+            $string
+        );
+    }
+
+    /**
+     * Callback for substituteTextEntities() and substituteAttrEntities().
+     *
+     * @param array $matches  PCRE matches array, with 0 the entire match, and
+     *                  index 1, 2, 3 or 4 set with a hex value, dec value, name
+     *                  with semicolon, or semicolon-optional name (respectively).
+     * @return string Replacement string.
+     */
+
+    protected function entityCallback($matches)
+    {
+        $entity = $matches[0];
+        $hex_part = @$matches[1];
+        $dec_part = @$matches[2];
+        $named_part = empty($matches[3]) ? @$matches[4] : $matches[3];
+        if ($hex_part) {
+            return HTMLPurifier_Encoder::unichr(hexdec($hex_part));
+        } elseif ($dec_part) {
+            return HTMLPurifier_Encoder::unichr((int) $dec_part);
+        } else {
+            if (!$this->_entity_lookup) {
+                $this->_entity_lookup = HTMLPurifier_EntityLookup::instance();
+            }
+            if (isset($this->_entity_lookup->table[$named_part])) {
+                return $this->_entity_lookup->table[$named_part];
+            } else {
+                // The full name is not a known entity, so test whether
+                // a semicolon-optional entity matches a prefix of it.
+                // Only do this when the mandatory-semicolon branch (index 3)
+                // matched; otherwise the recursion would loop forever.
+                if (!empty($matches[3])) {
+                    return preg_replace_callback(
+                        $this->_semiOptionalPrefixRegex,
+                        array($this, 'entityCallback'),
+                        $entity
+                    );
+                }
+                return $entity;
+            }
+        }
+    }
+
+    // LEGACY CODE BELOW
+
    /**
     * Callback regex string for parsing entities.
     * @type string
@@ -144,7 +276,7 @@ class HTMLPurifier_EntityParser
                $entity;
        } else {
            return isset($this->_special_ent2dec[$matches[3]]) ?
-                $this->_special_ent2dec[$matches[3]] :
+                $this->_special_dec2str[$this->_special_ent2dec[$matches[3]]] :
                $entity;
        }
    }
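The difference between the text and attribute patterns built in the constructor above is easiest to see on a URL-like value. The sketch below rebuilds both patterns with a shortened entity list; the short $semi_optional list and the build_entity_regex helper are assumptions made for brevity and are not part of HTML Purifier.

```php
<?php
// Shortened, illustrative subset of the semicolon-optional entity names.
$semi_optional = 'amp|copy|gt|lt|quot|reg';

// Hypothetical helper that mirrors the shape of the two patterns above.
function build_entity_regex($semi_optional, $is_attr)
{
    // In attributes, refuse a semicolon-less entity that is followed by
    // '=', ';' or an alphanumeric character (it is probably a URL parameter).
    $guard = $is_attr ? '(?![=;A-Za-z0-9])' : '';
    return '/&(?:' .
        '[#]x([a-fA-F0-9]+);?|' .            // hex
        '[#]0*(\d+);?|' .                    // dec
        '([A-Za-z_:][A-Za-z0-9.\-_:]*);|' .  // name, mandatory semicolon
        "($semi_optional)" . $guard .        // name, optional semicolon
        ')/';
}

$text_regex = build_entity_regex($semi_optional, false);
$attr_regex = build_entity_regex($semi_optional, true);

$url = 'page.php?a=1&copy=2';
var_dump(preg_match($text_regex, $url));              // int(1): "&copy" is treated as an entity
var_dump(preg_match($attr_regex, $url));              // int(0): "&copy=" is left alone
var_dump(preg_match($attr_regex, 'Tom &amp Jerry'));  // int(1): followed by a space, so decoded
```

This is why the lexer routes attribute values through substituteAttrEntities(): query strings such as "&copy=2" would otherwise be rewritten.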
--- a/library/HTMLPurifier/Filter/ExtractStyleBlocks.php
+++ b/library/HTMLPurifier/Filter/ExtractStyleBlocks.php
@@ -95,7 +95,10 @@ class HTMLPurifier_Filter_ExtractStyleBlocks extends HTMLPurifier_Filter
        if ($tidy !== null) {
            $this->_tidy = $tidy;
        }
-        $html = preg_replace_callback('#<style(?:\s.*)?>(.+)</style>#isU', array($this, 'styleCallback'), $html);
+        // NB: this must be NON-greedy because if we have
+        // <style>foo</style>  <style>bar</style>
+        // we must not grab foo</style>  <style>bar
+        $html = preg_replace_callback('#<style(?:\s.*)?>(.*)<\/style>#isU', array($this, 'styleCallback'), $html);
        $style_blocks = $this->_styleMatches;
        $this->_styleMatches = array(); // reset
        $context->register('StyleBlocks', $style_blocks); // $context must not be reused
--- a/library/HTMLPurifier/HTMLModule/TargetNoopener.php
+++ b/library/HTMLPurifier/HTMLModule/TargetNoopener.php
@@ -0,0 +1,21 @@
+<?php
+
+/**
+ * Module adds the target-based noopener attribute transformation to <a> tags.
+ * It is enabled by HTML.TargetNoopener.
+ */
+class HTMLPurifier_HTMLModule_TargetNoopener extends HTMLPurifier_HTMLModule
+{
+    /**
+     * @type string
+     */
+    public $name = 'TargetNoopener';
+
+    /**
+     * @param HTMLPurifier_Config $config
+     */
+    public function setup($config) {
+        $a = $this->addBlankElement('a');
+        $a->attr_transform_post[] = new HTMLPurifier_AttrTransform_TargetNoopener();
+    }
+}
--- a/library/HTMLPurifier/HTMLModuleManager.php
+++ b/library/HTMLPurifier/HTMLModuleManager.php
@@ -271,11 +271,14 @@ class HTMLPurifier_HTMLModuleManager
        if ($config->get('HTML.TargetBlank')) {
            $modules[] = 'TargetBlank';
        }
-        // NB: HTML.TargetNoreferrer must be AFTER HTML.TargetBlank
+        // NB: HTML.TargetNoreferrer and HTML.TargetNoopener must be AFTER HTML.TargetBlank
        // so that its post-attr-transform gets run afterwards.
        if ($config->get('HTML.TargetNoreferrer')) {
            $modules[] = 'TargetNoreferrer';
        }
+        if ($config->get('HTML.TargetNoopener')) {
+            $modules[] = 'TargetNoopener';
+        }

        // merge in custom modules
        $modules = array_merge($modules, $this->userModules);
--- a/library/HTMLPurifier/Lexer.php
+++ b/library/HTMLPurifier/Lexer.php
@@ -169,21 +169,24 @@ class HTMLPurifier_Lexer
            '&#x27;' => "'"
        );

+    public function parseText($string, $config) {
+        return $this->parseData($string, false, $config);
+    }
+
+    public function parseAttr($string, $config) {
+        return $this->parseData($string, true, $config);
+    }
+
    /**
     * Parses special entities into the proper characters.
     *
     * This string will translate escaped versions of the special characters
     * into the correct ones.
     *
-     * @warning
-     * You should be able to treat the output of this function as
-     * completely parsed, but that's only because all other entities should
-     * have been handled previously in substituteNonSpecialEntities()
-     *
     * @param string $string String character data to be parsed.
     * @return string Parsed character data.
     */
-    public function parseData($string)
+    public function parseData($string, $is_attr, $config)
    {
        // following functions require at least one character
        if ($string === '') {
@@ -209,7 +212,15 @@ class HTMLPurifier_Lexer
        }

        // hmm... now we have some uncommon entities. Use the callback.
-        $string = $this->_entity_parser->substituteSpecialEntities($string);
+        if ($config->get('Core.LegacyEntityDecoder')) {
+            $string = $this->_entity_parser->substituteSpecialEntities($string);
+        } else {
+            if ($is_attr) {
+                $string = $this->_entity_parser->substituteAttrEntities($string);
+            } else {
+                $string = $this->_entity_parser->substituteTextEntities($string);
+            }
+        }
        return $string;
    }

@@ -323,7 +334,9 @@ class HTMLPurifier_Lexer
        }

        // expand entities that aren't the big five
-        $html = $this->_entity_parser->substituteNonSpecialEntities($html);
+        if ($config->get('Core.LegacyEntityDecoder')) {
+            $html = $this->_entity_parser->substituteNonSpecialEntities($html);
+        }

        // clean into wellformed UTF-8 string for an SGML context: this has
        // to be done after entity expansion because the entities sometimes
@@ -335,6 +348,12 @@ class HTMLPurifier_Lexer
            $html = preg_replace('#<\?.+?\?>#s', '', $html);
        }

+        if ($config->get('Core.AggressivelyRemoveScript') &&
+            !($config->get('HTML.Trusted') || !$config->get('Core.RemoveScriptContents')
+            || empty($config->get('Core.HiddenElements')["script"]))) {
+            $html = preg_replace('#<script[^>]*>.*?</script>#i', '', $html);
+        }
+
        return $html;
    }

--- a/library/HTMLPurifier/Lexer/DOMLex.php
+++ b/library/HTMLPurifier/Lexer/DOMLex.php
@@ -72,12 +72,20 @@ class HTMLPurifier_Lexer_DOMLex extends HTMLPurifier_Lexer
        $doc->loadHTML($html);
        restore_error_handler();

+        $body = $doc->getElementsByTagName('html')->item(0)-> // <html>
+                      getElementsByTagName('body')->item(0);  // <body>
+
+        $div = $body->getElementsByTagName('div')->item(0); // <div>
        $tokens = array();
-        $this->tokenizeDOM(
-            $doc->getElementsByTagName('html')->item(0)-> // <html>
-            getElementsByTagName('body')->item(0), //   <body>
-            $tokens
-        );
+        $this->tokenizeDOM($div, $tokens, $config);
+        // If the div has a sibling, that means we tripped across
+        // a premature </div> tag.  So remove the div we parsed,
+        // and then tokenize the rest of body.  We can't tokenize
+        // the sibling directly as we'll lose the tags in that case.
+        if ($div->nextSibling) {
+            $body->removeChild($div);
+            $this->tokenizeDOM($body, $tokens, $config);
+        }
        return $tokens;
    }

@@ -88,7 +96,7 @@ class HTMLPurifier_Lexer_DOMLex extends HTMLPurifier_Lexer
     * @param HTMLPurifier_Token[] $tokens   Array-list of already tokenized tokens.
     * @return HTMLPurifier_Token of node appended to previously passed tokens.
     */
-    protected function tokenizeDOM($node, &$tokens)
+    protected function tokenizeDOM($node, &$tokens, $config)
    {
        $level = 0;
        $nodes = array($level => new HTMLPurifier_Queue(array($node)));
@@ -97,7 +105,7 @@ class HTMLPurifier_Lexer_DOMLex extends HTMLPurifier_Lexer
            while (!$nodes[$level]->isEmpty()) {
                $node = $nodes[$level]->shift(); // FIFO
                $collect = $level > 0 ? true : false;
-                $needEndingTag = $this->createStartNode($node, $tokens, $collect);
+                $needEndingTag = $this->createStartNode($node, $tokens, $collect, $config);
                if ($needEndingTag) {
                    $closingNodes[$level][] = $node;
                }
@@ -127,7 +135,7 @@ class HTMLPurifier_Lexer_DOMLex extends HTMLPurifier_Lexer
     * @return bool if the token needs an endtoken
     * @todo data and tagName properties don't seem to exist in DOMNode?
     */
-    protected function createStartNode($node, &$tokens, $collect)
+    protected function createStartNode($node, &$tokens, $collect, $config)
    {
        // intercept non element nodes. WE MUST catch all of them,
        // but we're not getting the character reference nodes because
@@ -151,7 +159,7 @@ class HTMLPurifier_Lexer_DOMLex extends HTMLPurifier_Lexer
                    }
                }
            }
-            $tokens[] = $this->factory->createText($this->parseData($data));
+            $tokens[] = $this->factory->createText($this->parseText($data, $config));
            return false;
        } elseif ($node->nodeType === XML_COMMENT_NODE) {
            // this is code is only invoked for comments in script/style in versions
@@ -252,7 +260,7 @@ class HTMLPurifier_Lexer_DOMLex extends HTMLPurifier_Lexer
     * @param HTMLPurifier_Context $context
     * @return string
     */
-    protected function wrapHTML($html, $config, $context)
+    protected function wrapHTML($html, $config, $context, $use_div = true)
    {
        $def = $config->getDefinition('HTML');
        $ret = '';
@@ -271,7 +279,11 @@ class HTMLPurifier_Lexer_DOMLex extends HTMLPurifier_Lexer
        $ret .= '<html><head>';
        $ret .= '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />';
        // No protection if $html contains a stray </div>!
-        $ret .= '</head><body>' . $html . '</body></html>';
+        $ret .= '</head><body>';
+        if ($use_div) $ret .= '<div>';
+        $ret .= $html;
+        if ($use_div) $ret .= '</div>';
+        $ret .= '</body></html>';
        return $ret;
    }
 }
--- a/library/HTMLPurifier/Lexer/DirectLex.php
+++ b/library/HTMLPurifier/Lexer/DirectLex.php
@@ -129,12 +129,12 @@ class HTMLPurifier_Lexer_DirectLex extends HTMLPurifier_Lexer
                // We are not inside tag and there still is another tag to parse
                $token = new
                HTMLPurifier_Token_Text(
-                    $this->parseData(
+                    $this->parseText(
                        substr(
                            $html,
                            $cursor,
                            $position_next_lt - $cursor
-                        )
+                        ), $config
                    )
                );
                if ($maintain_line_numbers) {
@@ -154,11 +154,11 @@ class HTMLPurifier_Lexer_DirectLex extends HTMLPurifier_Lexer
                // Create Text of rest of string
                $token = new
                HTMLPurifier_Token_Text(
-                    $this->parseData(
+                    $this->parseText(
                        substr(
                            $html,
                            $cursor
-                        )
+                        ), $config
                    )
                );
                if ($maintain_line_numbers) {
@@ -324,8 +324,8 @@ class HTMLPurifier_Lexer_DirectLex extends HTMLPurifier_Lexer
                $token = new
                HTMLPurifier_Token_Text(
                    '<' .
-                    $this->parseData(
-                        substr($html, $cursor)
+                    $this->parseText(
+                        substr($html, $cursor), $config
                    )
                );
                if ($maintain_line_numbers) {
@@ -429,7 +429,7 @@ class HTMLPurifier_Lexer_DirectLex extends HTMLPurifier_Lexer
            if ($value === false) {
                $value = '';
            }
-            return array($key => $this->parseData($value));
+            return array($key => $this->parseAttr($value, $config));
        }

        // setup loop environment
@@ -518,7 +518,7 @@ class HTMLPurifier_Lexer_DirectLex extends HTMLPurifier_Lexer
                if ($value === false) {
                    $value = '';
                }
-                $array[$key] = $this->parseData($value);
+                $array[$key] = $this->parseAttr($value, $config);
                $cursor++;
            } else {
                // boolattr
--- a/library/HTMLPurifier/Lexer/PH5P.php
+++ b/library/HTMLPurifier/Lexer/PH5P.php
@@ -21,7 +21,7 @@ class HTMLPurifier_Lexer_PH5P extends HTMLPurifier_Lexer_DOMLex
    public function tokenizeHTML($html, $config, $context)
    {
        $new_html = $this->normalize($html, $config, $context);
-        $new_html = $this->wrapHTML($new_html, $config, $context);
+        $new_html = $this->wrapHTML($new_html, $config, $context, false /* no div */);
        try {
            $parser = new HTML5($new_html);
            $doc = $parser->save();
@@ -34,9 +34,9 @@ class HTMLPurifier_Lexer_PH5P extends HTMLPurifier_Lexer_DOMLex
        $tokens = array();
        $this->tokenizeDOM(
            $doc->getElementsByTagName('html')->item(0)-> // <html>
-                getElementsByTagName('body')->item(0) //   <body>
+                  getElementsByTagName('body')->item(0) //   <body>
            ,
-            $tokens
+            $tokens, $config
        );
        return $tokens;
    }
@@ -1515,6 +1515,7 @@ class HTML5
                // Consume the maximum number of characters possible, with the
                // consumed characters case-sensitively matching one of the
                // identifiers in the first column of the entities table.
+
                $e_name = $this->characters('0-9A-Za-z;', $this->char + 1);
                $len = strlen($e_name);

@@ -1547,7 +1548,7 @@ class HTML5

        // Return a character token for the character corresponding to the
        // entity name (as given by the second column of the entities table).
-        return html_entity_decode('&' . $entity . ';', ENT_QUOTES, 'UTF-8');
+        return html_entity_decode('&' . rtrim($entity, ';') . ';', ENT_QUOTES, 'UTF-8');
    }

    private function emitToken($token)
--- a/library/HTMLPurifier/Strategy/MakeWellFormed.php
+++ b/library/HTMLPurifier/Strategy/MakeWellFormed.php
@@ -165,7 +165,7 @@ class HTMLPurifier_Strategy_MakeWellFormed extends HTMLPurifier_Strategy
                        if (empty($zipper->front)) break;
                        $token = $zipper->prev($token);
                        // indicate that other injectors should not process this token,
-                        // but we need to reprocess it
+                        // but we need to reprocess it.  See Note [Injector skips]
                        unset($token->skip[$i]);
                        $token->rewind = $i;
                        if ($token instanceof HTMLPurifier_Token_Start) {
@@ -210,6 +210,7 @@ class HTMLPurifier_Strategy_MakeWellFormed extends HTMLPurifier_Strategy
                if ($token instanceof HTMLPurifier_Token_Text) {
                    foreach ($this->injectors as $i => $injector) {
                        if (isset($token->skip[$i])) {
+                            // See Note [Injector skips]
                            continue;
                        }
                        if ($token->rewind !== null && $token->rewind !== $i) {
@@ -367,6 +368,7 @@ class HTMLPurifier_Strategy_MakeWellFormed extends HTMLPurifier_Strategy
            if ($ok) {
                foreach ($this->injectors as $i => $injector) {
                    if (isset($token->skip[$i])) {
+                        // See Note [Injector skips]
                        continue;
                    }
                    if ($token->rewind !== null && $token->rewind !== $i) {
@@ -422,6 +424,7 @@ class HTMLPurifier_Strategy_MakeWellFormed extends HTMLPurifier_Strategy
                $token->start = $current_parent;
                foreach ($this->injectors as $i => $injector) {
                    if (isset($token->skip[$i])) {
+                        // See Note [Injector skips]
                        continue;
                    }
                    if ($token->rewind !== null && $token->rewind !== $i) {
@@ -534,12 +537,17 @@ class HTMLPurifier_Strategy_MakeWellFormed extends HTMLPurifier_Strategy
     */
    protected function processToken($token, $injector = -1)
    {
+        // Zend OpCache miscompiles $token = array($token), so
+        // avoid this pattern.  See: https://github.com/ezyang/htmlpurifier/issues/108
+
        // normalize forms of token
        if (is_object($token)) {
-            $token = array(1, $token);
+            $tmp = $token;
+            $token = array(1, $tmp);
        }
        if (is_int($token)) {
-            $token = array($token);
+            $tmp = $token;
+            $token = array($tmp);
        }
        if ($token === false) {
            $token = array(1);
@@ -561,7 +569,12 @@ class HTMLPurifier_Strategy_MakeWellFormed extends HTMLPurifier_Strategy
        list($old, $r) = $this->zipper->splice($this->token, $delete, $token);

        if ($injector > -1) {
-            // determine appropriate skips
+            // See Note [Injector skips]
+            // Determine appropriate skips.  Here's what the code does:
+            //  *If* we deleted one or more tokens, copy the skips
+            //  of those tokens into the skips of the new tokens (in $token).
+            //  Also, mark the newly inserted tokens as having come from
+            //  $injector.
            $oldskip = isset($old[0]) ? $old[0]->skip : array();
            foreach ($token as $object) {
                $object->skip = $oldskip;
@@ -597,4 +610,50 @@ class HTMLPurifier_Strategy_MakeWellFormed extends HTMLPurifier_Strategy
    }
 }

+// Note [Injector skips]
+// ~~~~~~~~~~~~~~~~~~~~~
+// When I originally designed this class, the idea behind the 'skip'
+// property of HTMLPurifier_Token was to help avoid infinite loops
+// in injector processing.  For example, suppose you wrote an injector
+// that bolded swear words.  Naively, you might write it so that
+// whenever you saw ****, you replaced it with <strong>****</strong>.
+//
+// When this happens, we will reprocess all of the tokens with the
+// other injectors.  Now there is an opportunity for infinite loop:
+// if we rerun the swear-word injector on these tokens, we might
+// see **** and then reprocess again to get
+// <strong><strong>****</strong></strong> ad infinitum.
+//
+// Thus, the idea of a skip is that once we process a token with
+// an injector, we mark all of those tokens as having "come from"
+// the injector, and we never run the injector again on these
+// tokens.
+//
+// There were two more complications, however:
+//
+//  - With HTMLPurifier_Injector_RemoveEmpty, we noticed that if
+//    you had <b><i></i></b>, after you removed the <i></i>, you
+//    really would like this injector to go back and reprocess
+//    the <b> tag, discovering that it is now empty and can be
+//    removed.  So we reintroduced the possibility of infinite looping
+//    by adding a "rewind" function, which let you go back to an
+//    earlier point in the token stream and reprocess it with injectors.
+//    Needless to say, we need to UN-skip the token so it gets
+//    reprocessed.
+//
+//  - Suppose that you successfully process a token, replace it with
+//    one with your skip mark, but now another injector wants to
+//    process the skipped token with another token.  Should you continue
+//    to skip that new token, or reprocess it?  If you reprocess,
+//    you can end up with an infinite loop where one injector converts
+//    <a> to <b>, and then another injector converts it back.  So
+//    we inherit the skips, but for some reason, I thought that we
+//    should inherit the skip from the first of the tokens
+//    that we deleted.  Why?  Well, it seems to work OK.
+//
+// If I were to redesign this functionality, I would absolutely not
+// go about doing it this way: the semantics are just not very well
+// defined, and in any case you probably wanted to operate on trees,
+// not token streams.
+
 // vim: et sw=4 sts=4
--- a/library/HTMLPurifier/Token.php
+++ b/library/HTMLPurifier/Token.php
@@ -26,7 +26,7 @@ abstract class HTMLPurifier_Token
    public $armor = array();

    /**
-     * Used during MakeWellFormed.
+     * Used during MakeWellFormed.  See Note [Injector skips]
     * @type
     */
    public $skip;
--- a/library/HTMLPurifier/URI.php
+++ b/library/HTMLPurifier/URI.php
@@ -85,11 +85,13 @@ class HTMLPurifier_URI
            $def = $config->getDefinition('URI');
            $scheme_obj = $def->getDefaultScheme($config, $context);
            if (!$scheme_obj) {
-                // something funky happened to the default scheme object
-                trigger_error(
-                    'Default scheme object "' . $def->defaultScheme . '" was not readable',
-                    E_USER_WARNING
-                );
+                if ($def->defaultScheme !== null) {
+                    // something funky happened to the default scheme object
+                    trigger_error(
+                        'Default scheme object "' . $def->defaultScheme . '" was not readable',
+                        E_USER_WARNING
+                    );
+                } // suppress error if it's null
                return false;
            }
        }
--- a/test-settings.travis.php
+++ b/test-settings.travis.php
@@ -0,0 +1,72 @@
+<?php
+
+// This file is the configuration for Travis testing.
+
+// Note: The only external library you *need* is SimpleTest; everything else
+//       is optional.
+
+// We've got a lot of tests, so we recommend turning the limit off.
+set_time_limit(0);
+
+// Turning off output buffering will prevent mysterious errors from core dumps.
+$data = @ob_get_clean();
+if ($data !== false && $data !== '') {
+    echo "Output buffer contains data [".urlencode($data)."]\n";
+    exit;
+}
+
+// -----------------------------------------------------------------------------
+// REQUIRED SETTINGS
+
+// Note on running SimpleTest:
+//      You want the Git copy of SimpleTest, found here:
+//          https://github.com/simpletest/simpletest/
+//
+//      If SimpleTest is borked with HTML Purifier, please contact me or
+//      the SimpleTest devs; I am a developer for SimpleTest so I should be
+//      able to quickly assess a fix. SimpleTest's problem is my problem!
+
+// Where is SimpleTest located? Remember to include a trailing slash!
+$simpletest_location = dirname(__FILE__) . '/simpletest/';
+
+// -----------------------------------------------------------------------------
+// OPTIONAL SETTINGS
+
+// Note on running PHPT:
+//      Vanilla PHPT from https://github.com/tswicegood/PHPT_Core should
+//      work fine on Linux w/o multitest.
+//
+//      To do multitest or Windows testing, you'll need some more
+//      patches at https://github.com/ezyang/PHPT_Core
+//
+//      I haven't tested the Windows setup in a while so I don't know if
+//      it still works.
+
+// Should PHPT tests be enabled?
+$GLOBALS['HTMLPurifierTest']['PHPT'] = false;
+
+// If PHPT isn't in your Path via PEAR, set that here:
+// set_include_path('/path/to/phpt/Core/src' . PATH_SEPARATOR . get_include_path());
+
+// Where is CSSTidy located? (Include trailing slash. Leave false to disable.)
+$csstidy_location    = false;
+
+// For tests/multitest.php, which versions to test?
+$versions_to_test    = array();
+
+// Stable PHP binary to use when invoking maintenance scripts.
+$php = 'php';
+
+// For tests/multitest.php, what is the multi-version executable? It must
+// accept an extra parameter (version number) before all other arguments
+$phpv = false;
+
+// Should PEAR tests be run? If you've got a valid PEAR installation, set this
+// to true (or, if it's not in the include path, to its install directory).
+$GLOBALS['HTMLPurifierTest']['PEAR'] = false;
+
+// If PEAR is enabled, what PEAR tests should be run? (Note: you will
+// need to ensure these libraries are installed)
+$GLOBALS['HTMLPurifierTest']['Net_IDNA2'] = true;
+
+// vim: et sw=4 sts=4
--- a/tests/HTMLPurifier/AttrDef/CSS/BackgroundTest.php
+++ b/tests/HTMLPurifier/AttrDef/CSS/BackgroundTest.php
@@ -12,12 +12,18 @@ class HTMLPurifier_AttrDef_CSS_BackgroundTest extends HTMLPurifier_AttrDefHarnes
        $this->assertDef($valid);
        $this->assertDef('url(\'chess.png\') #333 50% top repeat fixed', $valid);
        $this->assertDef(
-            'rgb(34, 56, 33) url(chess.png) repeat fixed top',
-            'rgb(34,56,33) url("chess.png") repeat fixed top'
+            'rgb(34%, 56%, 33%) url(chess.png) repeat fixed top',
+            'rgb(34%,56%,33%) url("chess.png") repeat fixed top'
+        );
+        $this->assertDef(
+            'rgba(74, 12, 85, 0.35) repeat fixed bottom',
+            'rgba(74,12,85,.35) repeat fixed bottom'
+        );
+        $this->assertDef(
+            'hsl(244, 47.4%, 88.1%) right center',
+            'hsl(244,47.4%,88.1%) right center'
        );
-
    }
-
 }

 // vim: et sw=4 sts=4
--- a/tests/HTMLPurifier/AttrDef/CSS/ColorTest.php
+++ b/tests/HTMLPurifier/AttrDef/CSS/ColorTest.php
@@ -11,13 +11,33 @@ class HTMLPurifier_AttrDef_CSS_ColorTest extends HTMLPurifier_AttrDefHarness
        $this->assertDef('#fff');
        $this->assertDef('#eeeeee');
        $this->assertDef('#808080');
+
        $this->assertDef('rgb(255, 0, 0)', 'rgb(255,0,0)'); // rm spaces
        $this->assertDef('rgb(100%,0%,0%)');
        $this->assertDef('rgb(50.5%,23.2%,43.9%)'); // decimals okay
+        $this->assertDef('rgb(-5,0,0)', 'rgb(0,0,0)'); // negative values
+        $this->assertDef('rgb(295,0,0)', 'rgb(255,0,0)'); // max values
+        $this->assertDef('rgb(12%,150%,0%)', 'rgb(12%,100%,0%)'); // percentage max values
+
+        $this->assertDef('rgba(255, 0, 0, 0)', 'rgba(255,0,0,0)'); // rm spaces
+        $this->assertDef('rgba(100%,0%,0%,.4)');
+        $this->assertDef('rgba(38.1%,59.7%,1.8%,0.7)', 'rgba(38.1%,59.7%,1.8%,.7)'); // decimals okay
+
+        $this->assertDef('hsl(275, 45%, 81%)', 'hsl(275,45%,81%)'); // rm spaces
+        $this->assertDef('hsl(100,0%,0%)');
+        $this->assertDef('hsl(38,59.7%,1.8%)', 'hsl(38,59.7%,1.8%)'); // decimals okay
+        $this->assertDef('hsl(-11,-15%,25%)', 'hsl(0,0%,25%)'); // negative values
+        $this->assertDef('hsl(380,125%,0%)', 'hsl(360,100%,0%)'); // max values
+
+        $this->assertDef('hsla(100, 74%, 29%, 0)', 'hsla(100,74%,29%,0)'); // rm spaces
+        $this->assertDef('hsla(154,87%,21%,.4)');
+        $this->assertDef('hsla(45,94.3%,4.1%,0.7)', 'hsla(45,94.3%,4.1%,.7)'); // decimals okay

        $this->assertDef('#G00', false);
        $this->assertDef('cmyk(40, 23, 43, 23)', false);
-        $this->assertDef('rgb(0%, 23, 68%)', false);
+        $this->assertDef('rgb(0%, 23, 68%)', false); // no mixed type
+        $this->assertDef('rgb(231, 144, 28.2%)', false); // no mixed type
+        $this->assertDef('hsl(18%,12%,89%)', false); // integer, percentage, percentage

        // clip numbers outside sRGB gamut
        $this->assertDef('rgb(200%, -10%, 0%)', 'rgb(100%,0%,0%)');
--- a/tests/HTMLPurifier/AttrDef/CSSTest.php
+++ b/tests/HTMLPurifier/AttrDef/CSSTest.php
@@ -27,6 +27,7 @@ class HTMLPurifier_AttrDef_CSSTest extends HTMLPurifier_AttrDefHarness
        $this->assertDef('background-color:rgb(0,0,255);');
        $this->assertDef('background-color:transparent;');
        $this->assertDef('background:#333 url("chess.png") repeat fixed 50% top;');
+        $this->assertDef('background:#333 url("che;ss.png") repeat fixed 50% top;');
        $this->assertDef('color:#F00;');
        $this->assertDef('border-top-color:#F00;');
        $this->assertDef('border-color:#F00 #FF0;');
@@ -61,6 +62,10 @@ class HTMLPurifier_AttrDef_CSSTest extends HTMLPurifier_AttrDefHarness
        $this->assertDef('width:50px;');
        $this->assertDef('width:auto;');
        $this->assertDef('width:-50px;', false);
+        $this->assertDef('min-width:50%;');
+        $this->assertDef('min-width:50px;');
+        $this->assertDef('min-width:auto;');
+        $this->assertDef('min-width:-50px;', false);
        $this->assertDef('text-decoration:underline;');
        $this->assertDef('font-family:sans-serif;');
        $this->assertDef("font-family:Gill, 'Times New Roman', sans-serif;");
--- a/tests/HTMLPurifier/AttrDef/URI/HostTest.php
+++ b/tests/HTMLPurifier/AttrDef/URI/HostTest.php
@@ -38,7 +38,7 @@ class HTMLPurifier_AttrDef_URI_HostTest extends HTMLPurifier_AttrDefHarness
        $this->assertDef('f-.top', false);
        $this->assertDef('1a');

-        $this->assertDef("\xE4\xB8\xAD\xE6\x96\x87.com.cn", false);
+        $this->assertDef("\xE4\xB8\xAD\xE6\x96\x87.com.cn", 'xn--fiq228c.com.cn', true);

    }

--- a/tests/HTMLPurifier/AttrDef/URITest.php
+++ b/tests/HTMLPurifier/AttrDef/URITest.php
@@ -81,6 +81,12 @@ class HTMLPurifier_AttrDef_URITest extends HTMLPurifier_AttrDefHarness
        $this->assertDef('http://example.com/foo/bar');
    }

+    public function testDefaultSchemeNull()
+    {
+        $this->config->set('URI.DefaultScheme', null);
+        $this->assertDef('foo', false);
+    }
+
    public function testAltSchemeNotRemoved()
    {
        $this->assertDef('mailto:this-looks-like-a-path@example.com');
--- a/tests/HTMLPurifier/AttrDefHarness.php
+++ b/tests/HTMLPurifier/AttrDefHarness.php
@@ -13,14 +13,18 @@ class HTMLPurifier_AttrDefHarness extends HTMLPurifier_Harness
    }

    // cannot be used for accumulator
-    public function assertDef($string, $expect = true)
+    public function assertDef($string, $expect = true, $or_false = false)
    {
        // $expect can be a string or bool
        $result = $this->def->validate($string, $this->config, $this->context);
        if ($expect === true) {
-            $this->assertIdentical($string, $result);
+            if (!($or_false && $result === false)) {
+                $this->assertIdentical($string, $result);
+            }
        } else {
-            $this->assertIdentical($expect, $result);
+            if (!($or_false && $result === false)) {
+                $this->assertIdentical($expect, $result);
+            }
        }
    }

--- a/tests/HTMLPurifier/AttrValidator_ErrorsTest.php
+++ b/tests/HTMLPurifier/AttrValidator_ErrorsTest.php
@@ -10,7 +10,8 @@ class HTMLPurifier_AttrValidator_ErrorsTest extends HTMLPurifier_ErrorsHarness
        $this->language = HTMLPurifier_LanguageFactory::instance()->create($config, $this->context);
        $this->context->register('Locale', $this->language);
        $this->collector = new HTMLPurifier_ErrorCollector($this->context);
-        $this->context->register('Generator', new HTMLPurifier_Generator($config, $this->context));
+        $gen = new HTMLPurifier_Generator($config, $this->context);
+        $this->context->register('Generator', $gen);
    }

    protected function invoke($input)
--- a/tests/HTMLPurifier/DefinitionCacheFactoryTest.php
+++ b/tests/HTMLPurifier/DefinitionCacheFactoryTest.php
@@ -30,7 +30,8 @@ class HTMLPurifier_DefinitionCacheFactoryTest extends HTMLPurifier_Harness
        $this->factory->addDecorator('Memory');
        $cache = $this->factory->create('Test', $this->config);
        $cache_real = new HTMLPurifier_DefinitionCache_Decorator_Memory();
-        $cache_real = $cache_real->decorate(new HTMLPurifier_DefinitionCache_Serializer('Test'));
+        $ser = new HTMLPurifier_DefinitionCache_Serializer('Test');
+        $cache_real = $cache_real->decorate($ser);
        $this->assertEqual($cache, $cache_real);
    }

@@ -39,7 +40,8 @@ class HTMLPurifier_DefinitionCacheFactoryTest extends HTMLPurifier_Harness
        $this->factory->addDecorator(new HTMLPurifier_DefinitionCache_Decorator_Memory());
        $cache = $this->factory->create('Test', $this->config);
        $cache_real = new HTMLPurifier_DefinitionCache_Decorator_Memory();
-        $cache_real = $cache_real->decorate(new HTMLPurifier_DefinitionCache_Serializer('Test'));
+        $ser = new HTMLPurifier_DefinitionCache_Serializer('Test');
+        $cache_real = $cache_real->decorate($ser);
        $this->assertEqual($cache, $cache_real);
    }

--- a/tests/HTMLPurifier/EncoderTest.php
+++ b/tests/HTMLPurifier/EncoderTest.php
@@ -23,6 +23,7 @@ class HTMLPurifier_EncoderTest extends HTMLPurifier_Harness
        $this->assertCleanUTF8('Normal string.');
        $this->assertCleanUTF8("Test\tAllowed\nControl\rCharacters");
        $this->assertCleanUTF8("null byte: \0", 'null byte: ');
+        $this->assertCleanUTF8("あ（い）う（え）お\0", "あ（い）う（え）お"); // test for issue #122
        $this->assertCleanUTF8("\1\2\3\4\5\6\7", '');
        $this->assertCleanUTF8("\x7F", ''); // one byte invalid SGML char
        $this->assertCleanUTF8("\xC2\x80", ''); // two byte invalid SGML
--- a/tests/HTMLPurifier/Filter/ExtractStyleBlocksTest.php
+++ b/tests/HTMLPurifier/Filter/ExtractStyleBlocksTest.php
@@ -256,6 +256,12 @@ text-align:center
        $this->assertCleanCSS("a .foo #ID div.cl#foo {\nbackground:url(\"http://foo/BAR\")\n}");
    }

+    public function test_extractStyleBlocks_backtracking()
+    {
+        $goo = str_repeat("a", 1000000); // 1M to trigger, sometimes it's less!
+        $this->assertExtractStyleBlocks("<style></style>" . $goo, $goo, array(''));
+    }
+
 }

 // vim: et sw=4 sts=4
--- a/tests/HTMLPurifier/HTMLModule/TargetBlankTest.php
+++ b/tests/HTMLPurifier/HTMLModule/TargetBlankTest.php
@@ -13,7 +13,14 @@ class HTMLPurifier_HTMLModule_TargetBlankTest extends HTMLPurifier_HTMLModuleHar
    {
        $this->assertResult(
            '<a href="http://google.com">a</a><a href="/local">b</a><a href="mailto:foo@example.com">c</a>',
-            '<a href="http://google.com" target="_blank" rel="noreferrer">a</a><a href="/local">b</a><a href="mailto:foo@example.com">c</a>'
+            '<a href="http://google.com" target="_blank" rel="noreferrer noopener">a</a><a href="/local">b</a><a href="mailto:foo@example.com">c</a>'
+        );
+    }
+
+    public function testTargetBlankNoDupe() {
+        $this->assertResult(
+            '<a href="http://google.com" target="_blank">a</a>',
+            '<a href="http://google.com" target="_blank" rel="noreferrer noopener">a</a>'
        );
    }

--- a/tests/HTMLPurifier/HTMLModule/TargetNoopenerTest.php
+++ b/tests/HTMLPurifier/HTMLModule/TargetNoopenerTest.php
@@ -0,0 +1,51 @@
+<?php
+
+class HTMLPurifier_HTMLModule_TargetNoopenerTest extends HTMLPurifier_HTMLModuleHarness
+{
+
+    public function setUp()
+    {
+        parent::setUp();
+        $this->config->set('HTML.TargetNoreferrer', false);
+        $this->config->set('HTML.TargetNoopener', true);
+        $this->config->set('Attr.AllowedFrameTargets', '_blank');
+    }
+
+    public function testNoreferrer()
+    {
+        $this->assertResult(
+            '<a href="http://google.com" target="_blank">x</a>',
+            '<a href="http://google.com" target="_blank" rel="noopener">x</a>'
+        );
+    }
+
+    public function testNoreferrerNoDupe()
+    {
+        $this->config->set('Attr.AllowedRel', 'noopener');
+        $this->assertResult(
+            '<a href="http://google.com" target="_blank" rel="noopener">x</a>',
+            '<a href="http://google.com" target="_blank" rel="noopener">x</a>'
+        );
+    }
+
+    public function testTargetBlankNoreferrer()
+    {
+        $this->config->set('HTML.TargetBlank', true);
+        $this->assertResult(
+            '<a href="http://google.com">x</a>',
+            '<a href="http://google.com" target="_blank" rel="noopener">x</a>'
+        );
+    }
+
+    public function testNoTarget()
+    {
+        $this->assertResult(
+            '<a href="http://google.com">x</a>',
+            '<a href="http://google.com">x</a>'
+        );
+    }
+
+
+}
+
+// vim: et sw=4 sts=4
--- a/tests/HTMLPurifier/HTMLModule/TargetNoreferrerTest.php
+++ b/tests/HTMLPurifier/HTMLModule/TargetNoreferrerTest.php
@@ -7,6 +7,7 @@ class HTMLPurifier_HTMLModule_TargetNoreferrerTest extends HTMLPurifier_HTMLModu
    {
        parent::setUp();
        $this->config->set('HTML.TargetNoreferrer', true);
+        $this->config->set('HTML.TargetNoopener', false);
        $this->config->set('Attr.AllowedFrameTargets', '_blank');
    }

@@ -36,6 +37,14 @@ class HTMLPurifier_HTMLModule_TargetNoreferrerTest extends HTMLPurifier_HTMLModu
        );
    }

+    public function testNoTarget()
+    {
+        $this->assertResult(
+            '<a href="http://google.com">x</a>',
+            '<a href="http://google.com">x</a>'
+        );
+    }
+

 }

--- a/tests/HTMLPurifier/HTMLT/t78.htmlt
+++ b/tests/HTMLPurifier/HTMLT/t78.htmlt
@@ -0,0 +1,7 @@
+--INI--
+HTML.Doctype = HTML 4.01 Strict
+--HTML--
+<b>Vetgedrukt</b> <i>Schuingedrukt</i> <span>Hou</span><iframe></iframe><script></script> jij ook zo van vakjesdenken?
+--EXPECT--
+<b>Vetgedrukt</b> <i>Schuingedrukt</i> <span>Hou</span> jij ook zo van vakjesdenken?
+--# vim: et sw=4 sts=4
--- a/tests/HTMLPurifier/Injector/RemoveEmptyTest.php
+++ b/tests/HTMLPurifier/Injector/RemoveEmptyTest.php
@@ -78,6 +78,11 @@ class HTMLPurifier_Injector_RemoveEmptyTest extends HTMLPurifier_InjectorHarness
        $this->assertResult('<b>&nbsp;   &nbsp;</b>', '');
    }

+    public function testRemoveLi()
+    {
+        $this->assertResult("<ul><li>\n\n\n</li></ul>", '');
+    }
+
    public function testDontRemoveNbsp()
    {
        $this->config->set('AutoFormat.RemoveEmpty.RemoveNbsp', true);
--- a/tests/HTMLPurifier/LexerTest.php
+++ b/tests/HTMLPurifier/LexerTest.php
@@ -46,11 +46,11 @@ class HTMLPurifier_LexerTest extends HTMLPurifier_Harness

    // HTMLPurifier_Lexer->parseData() -----------------------------------------

-    public function assertParseData($input, $expect = true)
+    public function assertParseData($input, $expect = true, $is_attr = false)
    {
        if ($expect === true) $expect = $input;
        $lexer = new HTMLPurifier_Lexer();
-        $this->assertIdentical($expect, $lexer->parseData($input));
+        $this->assertIdentical($expect, $lexer->parseData($input, $is_attr, $this->config));
    }

    public function test_parseData_plainText()
@@ -95,7 +95,58 @@ class HTMLPurifier_LexerTest extends HTMLPurifier_Harness

    public function test_parseData_improperEntityFaultToleranceTest()
    {
-        $this->assertParseData('&#x2D;');
+        $this->assertParseData('&#x2D;', '-');
+    }
+
+    public function test_parseData_noTrailingSemi()
+    {
+        $this->assertParseData('&ampA', '&A');
+    }
+
+    public function test_parseData_noTrailingSemiAttr()
+    {
+        $this->assertParseData('&ampA', '&ampA', true);
+    }
+
+    public function test_parseData_T119()
+    {
+        $this->assertParseData('&ampA', '&ampA', true);
+    }
+
+    public function test_parseData_T119b()
+    {
+        $this->assertParseData('&trade=', true, true);
+    }
+
+    public function test_parseData_legacy1()
+    {
+        $this->config->set('Core.LegacyEntityDecoder', true);
+        $this->assertParseData('&ampa', true);
+        $this->assertParseData('&amp=', "&=");
+        $this->assertParseData('&ampa', true, true);
+        $this->assertParseData('&amp=', "&=", true);
+        $this->assertParseData('&lta', true);
+        $this->assertParseData('&lt=', "<=");
+        $this->assertParseData('&lta', true, true);
+        $this->assertParseData('&lt=', "<=", true);
+    }
+
+    public function test_parseData_nonlegacy1()
+    {
+        $this->assertParseData('&ampa', "&a");
+        $this->assertParseData('&amp=', "&=");
+        $this->assertParseData('&ampa', true, true);
+        $this->assertParseData('&amp=', true, true);
+        $this->assertParseData('&lta', "<a");
+        $this->assertParseData('&lt=', "<=");
+        $this->assertParseData('&lta', true, true);
+        $this->assertParseData('&lt=', true, true);
+        $this->assertParseData('&lta;', "<a;");
+    }
+
+    public function test_parseData_noTrailingSemiNever()
+    {
+        $this->assertParseData('&imath');
    }

    // HTMLPurifier_Lexer->extractBody() ---------------------------------------
@@ -814,13 +865,21 @@ div {}
    public function test_tokenizeHTML_prematureDivClose()
    {
        $this->assertTokenization(
-            '</div>dontdie',
+            '</div>dont<b>die</b>',
            array(
                new HTMLPurifier_Token_End('div'),
-                new HTMLPurifier_Token_Text('dontdie')
+                new HTMLPurifier_Token_Text('dont'),
+                new HTMLPurifier_Token_Start('b'),
+                new HTMLPurifier_Token_Text('die'),
+                new HTMLPurifier_Token_End('b'),
            ),
            array(
-                'DOMLex' => $alt = array(new HTMLPurifier_Token_Text('dontdie')),
+                'DOMLex' => $alt = array(
+                    new HTMLPurifier_Token_Text('dont'),
+                    new HTMLPurifier_Token_Start('b'),
+                    new HTMLPurifier_Token_Text('die'),
+                    new HTMLPurifier_Token_End('b')
+                ),
                'PH5P' => $alt
            )
        );
--- a/tests/HTMLPurifier/Strategy/ValidateAttributesTest.php
+++ b/tests/HTMLPurifier/Strategy/ValidateAttributesTest.php
@@ -189,7 +189,7 @@ class HTMLPurifier_Strategy_ValidateAttributesTest extends
    {
        $this->config->set('Attr.AllowedFrameTargets', '_top');
        $this->config->set('HTML.Doctype', 'XHTML 1.0 Transitional');
-        $this->assertResult('<a href="foo" target="_top" rel="noreferrer" />');
+        $this->assertResult('<a href="foo" target="_top" rel="noreferrer noopener" />');
    }

    public function testRemoveTargetWhenNotSupported()
--- a/tests/index.php
+++ b/tests/index.php
@@ -212,6 +212,12 @@ if ($AC['file']) {

 if ($AC['dry']) $reporter->makeDry();

-$test->run($reporter);
+$result = $test->run($reporter);
+
+if ($result) {
+    exit(0); // Success!
+} else {
+    exit(1); // Abject failure.
+}

 // vim: et sw=4 sts=4
Author	SHA1	Message	Date
Edward Z. Yang	de82f9845f	Release 4.9.1 (sic) Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>	2017-03-08 00:22:36 -08:00
Edward Z. Yang	9d2d75d8bc	Add test case for removing empty list items. Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>	2017-03-08 00:11:32 -08:00
Edward Z. Yang	74f123a84c	Fix #83 . Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>	2017-03-07 17:52:41 -08:00
Edward Z. Yang	7e11c271b9	Revamp entity decoding to be more like HTML5. See %Core.LegacyEntityDecoder for more details. Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>	2017-03-07 17:34:59 -08:00
Edward Z. Yang	66bbae73a9	Comment on why it's a non-greedy match. Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>	2017-03-06 23:27:30 -08:00
Edward Z. Yang	5886326cd0	Test for catastrophic backtracking. Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>	2017-03-06 23:26:55 -08:00
Edward Z. Yang	564af61809	Usage/includes update. Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>	2017-03-06 23:06:56 -08:00
Edward Z. Yang	b19dcb0ba5	CHANGELOG for #120 fix, and remove the array_filter. Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>	2017-03-06 23:06:24 -08:00
Edward Z. Yang	586abc63e4	CHANGELOG for rgba/hsl/hsla patch. Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>	2017-03-06 23:03:33 -08:00
Edward Z. Yang	5b6a3f55bf	Merge pull request #121 from breathbath/master Fixing PREG_BACKTRACK_LIMIT_ERROR in HTMLPurifier_Filter_ExtractStyle…	2017-03-06 23:01:34 -08:00
Edward Z. Yang	0c31b22240	Merge pull request #118 from fxbt/master Add hsl, hsla and rgba support for css color attribute definition	2017-03-06 23:01:06 -08:00
Edward Z. Yang	5662efc936	Fix #78 . Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>	2017-03-06 22:54:54 -08:00
Edward Z. Yang	353c96f156	Document skips in more detail, #116 . Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>	2017-03-06 20:31:28 -08:00
Edward Z. Yang	4047a6230b	Extra cleanup on cleanUTF8. Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>	2017-03-06 16:31:02 -08:00
Andrey Pozolotin	9195cb7a2e	Added escape sequense	2017-03-06 16:28:53 -08:00
Andrey Pozolotin	39c4c359ad	Fixing PREG_BACKTRACK_LIMIT_ERROR in HTMLPurifier_Filter_ExtractStyleBlocks	2017-03-06 16:28:53 -08:00
Edward Z. Yang	bb3f86e80a	Merge pull request #123 from mpyw-forks/fix/#122/surrogate-pair-range Fix surrogate pair range	2017-03-03 23:13:30 -08:00
mpyw	d16e73e63e	Add test for #122	2017-03-04 15:40:44 +09:00
mpyw	f145f64bf4	Fix #122 : correct surrogate pair range	2017-03-04 15:38:01 +09:00
Andrey Pozolotin	5fdec87fe9	Added escape sequense	2017-03-01 17:52:00 +01:00
Andrey Pozolotin	4462559459	Fixing PREG_BACKTRACK_LIMIT_ERROR in HTMLPurifier_Filter_ExtractStyleBlocks	2017-03-01 17:46:03 +01:00
f.godfrin	12185143ef	Use a constructor and a property for the alpha check	2017-02-10 21:03:11 +01:00
f.godfrin	17a90a951a	Better regex for mungeRgb	2017-02-10 00:40:56 +01:00
f.godfrin	0bab4b9fd0	Fix mungeRgb to handle percent, float and hsl values	2017-02-10 00:38:05 +01:00
f.godfrin	bd92f3531b	Remove double %	2017-02-09 23:37:36 +01:00
f.godfrin	0d5ab2fe13	Include hsl and hsla support	2017-02-09 23:34:19 +01:00
f.godfrin	d41a59e422	Add rgba support for css color attribute definition	2017-02-09 22:18:15 +01:00
Bastian Hofmann	8e4cacf0a7	Refactor HTML.Noopener to HTML.TargetNoopener so that it behaves like HTML.TargetNoreferrer and is active by default if a target is set	2017-02-03 16:54:51 -08:00
Bastian Hofmann	c82051c3e1	Add HTML.Noopener to add a noopener rel to every external link This has performance benefits https://jakearchibald.com/2016/performance-benefits-of-rel-noopener/ but most importantly also security benefits https://mathiasbynens.github.io/rel-noopener/ Adresses https://github.com/ezyang/htmlpurifier/issues/96	2017-02-03 16:54:51 -08:00
Edward Z. Yang	d4a96463ef	export-ignore .travis.yml Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>	2017-01-19 09:28:40 -08:00
Edward Z. Yang	1b7d684d07	Remove $a = array($a) which is miscompiled by Zend OpCache. Fixes #108. Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>	2017-01-04 14:35:52 -05:00
Edward Z. Yang	5070404376	Handle semicolons in strings in CSS correctly. Fixes http://htmlpurifier.org/phorum/read.php?3,7522,8096 Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>	2016-10-29 00:01:19 -07:00
Edward Z. Yang	cef27f750d	Add missing changelog entries. Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>	2016-10-27 17:31:10 -07:00
Edward Z. Yang	59463c5c39	Allow %URI.DefaultScheme to be null. Fixes #103. Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>	2016-10-27 17:30:44 -07:00
Edward Z. Yang	d19d648a26	[ci skip] Add a Travis build badge. Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>	2016-10-27 02:02:29 -07:00
Edward Z. Yang	20b40a5441	Travis support. Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>	2016-10-27 02:00:47 -07:00
Edward Z. Yang	34d252cbbc	Update usage.xml. Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>	2016-10-27 02:00:47 -07:00
Edward Z. Yang	8b28e571fe	Handle case when IDNAs are supported. Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>	2016-10-27 02:00:46 -07:00
Edward Z. Yang	3ae21ce511	PHP 7.0 warnings fix: don't pass rvalue by reference. Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>	2016-10-27 02:00:46 -07:00
Edward Z. Yang	3ba9133b21	Don't assume that idn_to_ascii does validation. Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>	2016-10-27 02:00:46 -07:00
Edward Z. Yang	dc8702160c	Merge pull request #101 from yankos/hotfix/directory_not_close FIX directory not closing	2016-10-15 23:14:10 -07:00
yan_kos	4dc68aa920	FIX directory not closing #100	2016-10-15 16:20:47 +03:00
Edward Z. Yang	08eee90e15	Delete asserts, fixes #97 . Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>	2016-10-02 00:14:41 -07:00
Edward Z. Yang	1ef4375dbb	Proposed fix to Serializer code. Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>	2016-09-05 15:24:08 -07:00
Edward Z. Yang	6a221a3045	Merge pull request #94 from zobzn/css-min-max-width css definition (min-width, max-width, min-height, max-height)	2016-09-05 14:57:44 -07:00
zema	246fc8946a	css properties: min-width, max-width, min-height, max-height	2016-09-05 10:45:58 +03:00
Edward Z. Yang	1ce2fde400	Merge pull request #91 from apsdsm/fix-permissions-bug changed chmod behaviour in Serializer	2016-07-29 03:25:41 -07:00
Nick del Pozo	1f982d279f	rollback change to permissions	2016-07-29 08:56:36 +09:00
Nick del Pozo	8be8cee9b3	changed chmod behaviour in Serializer	2016-07-27 12:56:03 +09:00
@@ -1 +1 @@
-4.8.0
+4.9.1