mirror of
https://github.com/ezyang/htmlpurifier.git
synced 2025-08-05 13:47:24 +02:00
New directive %Core.AllowHostnameUnderscore
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
This commit is contained in:
@@ -47,10 +47,23 @@ class HTMLPurifier_AttrDef_URI_Host extends HTMLPurifier_AttrDef
|
||||
// This doesn't match I18N domain names, but we don't have proper IRI support,
|
||||
// so force users to insert Punycode.
|
||||
|
||||
// There is not a good sense in which underscores should be
|
||||
// allowed, since it's technically not! (And if you go as
|
||||
// far to allow everything as specified by the DNS spec...
|
||||
// well, that's literally everything, modulo some space limits
|
||||
// for the components and the overall name (which, by the way,
|
||||
// we are NOT checking!). So we (arbitrarily) decide this:
|
||||
// let's allow underscores wherever we would have allowed
|
||||
// hyphens, if they are enabled. This is a pretty good match
|
||||
// for browser behavior, for example, a large number of browsers
|
||||
// cannot handle foo_.example.com, but foo_bar.example.com is
|
||||
// fairly well supported.
|
||||
$underscore = $config->get('Core.AllowHostnameUnderscore') ? '_' : '';
|
||||
|
||||
// The productions describing this are:
|
||||
$a = '[a-z]'; // alpha
|
||||
$an = '[a-z0-9]'; // alphanum
|
||||
$and = '[a-z0-9-]'; // alphanum | "-"
|
||||
$and = "[a-z0-9-$underscore]"; // alphanum | "-"
|
||||
// domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
|
||||
$domainlabel = "$an($and*$an)?";
|
||||
// toplabel = alpha | alpha *( alphanum | "-" ) alphanum
|
||||
|
Binary file not shown.
@@ -0,0 +1,16 @@
|
||||
Core.AllowHostnameUnderscore
|
||||
TYPE: bool
|
||||
VERSION: 4.6.0
|
||||
DEFAULT: false
|
||||
--DESCRIPTION--
|
||||
<p>
|
||||
By RFC 1123, underscores are not permitted in host names.
|
||||
(This is in contrast to the specification for DNS, RFC
|
||||
2181, which allows underscores.)
|
||||
However, most browsers do the right thing when faced with
|
||||
an underscore in the host name, and so some poorly written
|
||||
websites are written with the expectation this should work.
|
||||
Setting this parameter to true relaxes our allowed character
|
||||
check so that underscores are permitted.
|
||||
</p>
|
||||
--# vim: et sw=4 sts=4
|
Reference in New Issue
Block a user