From 0960cf6aceeed14f3faa4f3e0d1be78d6c16051c Mon Sep 17 00:00:00 2001 From: "Edward Z. Yang" Date: Mon, 20 Nov 2006 02:47:00 +0000 Subject: [PATCH] [1.2.0] Converted enduser-id.txt to HTML. Fixed summary in index. Added extra style .subsubtitle git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@539 48356398-32a2-884e-a903-53898d9a118a --- docs/enduser-id.html | 146 +++++++++++++++++++++++++++++++++++++++++++ docs/enduser-id.txt | 124 ------------------------------------ docs/index.html | 9 ++- docs/style.css | 5 +- 4 files changed, 157 insertions(+), 127 deletions(-) create mode 100644 docs/enduser-id.html delete mode 100644 docs/enduser-id.txt diff --git a/docs/enduser-id.html b/docs/enduser-id.html new file mode 100644 index 00000000..6e474be8 --- /dev/null +++ b/docs/enduser-id.html @@ -0,0 +1,146 @@ + + + + + + + +IDs - HTML Purifier + + + +

IDs

+
What they are, why you should(n't) wear them, and how to deal with it
+ +
Filed under End-User
+
Return to the index.
+ +

Prior to HTML Purifier 1.2.0, this library blithely accepted user input that +looked like this:

+ +
<a id="fragment">Anchor</a>
+ +

...presenting an attractive vector for those that would destroy standards +compliance: simply set the ID to one that is already used elsewhere in the +document and voila: validation breaks. There was a half-hearted attempt to +prevent this by allowing users to blacklist IDs, but I suspect that no one +really bothered, and thus, with the release of 1.2.0, IDs are now removed +by default.

+ +

IDs, however, are quite useful functionality to have, so if users start +complaining about broken anchors you'll probably want to turn them back on +with %HTML.EnableAttrID. But before you go mucking around with the config +object, it's probably worth taking some precautions to keep your page +validating. Why?

+ +
    +
  1. Standards-compliant pages are good
  2. +
  3. Duplicated IDs interfere with anchors. If there are two id="foobar"s in a + document, which spot does a browser presented with the fragment #foobar go + to? Most browsers opt for the first appearing ID, making it impossible + to reference the second section. Similarly, duplicated IDs can hijack + client-side scripting that relies on the IDs of elements.
  4. +
+ +
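The collision described in the list above can be checked for mechanically. What follows is a minimal sketch, assuming a regular expression is good enough to spot id attributes in the final page; find_duplicate_ids is a hypothetical helper and not part of HTML Purifier.

```php
<?php
// Hypothetical helper (not part of HTML Purifier): report every ID that
// appears more than once in a finished page.
function find_duplicate_ids($html)
{
    // Rough match for id="value" or id='value'; a real HTML parser would be
    // more robust, and this pattern also catches attributes such as data-id.
    preg_match_all('/\bid\s*=\s*(["\'])(.*?)\1/i', $html, $matches);

    $duplicates = array();
    foreach (array_count_values($matches[2]) as $id => $count) {
        if ($count > 1) {
            // array_count_values turns numeric-looking keys into integers
            $duplicates[] = (string) $id;
        }
    }
    return $duplicates;
}

// The template already uses "foobar", and so does the user's anchor.
$page = '<h2 id="foobar">Template</h2>'
      . '<a id="foobar">Anchor</a>'
      . '<p id="intro">Text</p>';

$duplicates = find_duplicate_ids($page);
print_r($duplicates);
```

A check like this only tells you that the page has already stopped validating; the methods described next aim to prevent the collision in the first place.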

You have (currently) four ways of dealing with the problem.

+ + + +

Blacklisting IDs

+
Good for pages with single content source and stable templates
+ +

In keeping with the +KISS principle, let us +deal with the most obvious solution: preventing users from using any IDs that +appear elsewhere in the document. The method is simple:

+ +
$config->set('HTML', 'EnableAttrID', true);
+$config->set('Attr', 'IDBlacklist', array(
+    'list', 'of', 'attributes', 'that', 'are', 'forbidden'
+));
+ +

That being said, there are some notable drawbacks. First of all, you have to +know precisely which IDs are being used by the HTML surrounding the user code. +This is easier said than done: quite often the page designer and the system +coder work separately, so the designer has to constantly be talking with the +coder whenever he decides to add a new anchor. Miss one and you open yourself +to possible standards-compliance issues.
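One way to soften this drawback is to derive the blacklist from the template itself rather than maintaining it by hand. The sketch below assumes the template is available as a string and that a regular expression is an acceptable stand-in for a real parser; collect_template_ids is a hypothetical helper, not something HTML Purifier provides.

```php
<?php
// Hypothetical helper (not part of HTML Purifier): gather every ID used by
// the surrounding template so the list can be fed to %Attr.IDBlacklist.
function collect_template_ids($templateHtml)
{
    // Rough match for id="value" or id='value'; it will also pick up
    // attributes such as data-id, which only makes the blacklist stricter.
    preg_match_all('/\bid\s*=\s*(["\'])(.*?)\1/i', $templateHtml, $matches);

    // Drop repeats and reindex so the result is a plain list.
    return array_values(array_unique($matches[2]));
}

$template = '<div id="header"></div>'
          . '<div id="content"></div>'
          . '<div id="header"></div>';

$blacklist = collect_template_ids($template);
print_r($blacklist);

// With HTML Purifier loaded, the list would then be passed on as shown in
// the text:
// $config->set('HTML', 'EnableAttrID', true);
// $config->set('Attr', 'IDBlacklist', $blacklist);
```

This only helps with IDs that live in the template; it does nothing for the multiple-submission problem.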

+ +

Furthermore, this position becomes untenable when a single web page must hold +multiple portions of user-submitted content. Since there's obviously no way +to find out beforehand what IDs users will use, the blacklist is helpless. +And since HTML Purifier validates each segment separately, perhaps doing +so at different times, it would be extremely difficult to dynamically update +the blacklist between runs.

+ +

Finally, simply destroying the ID is extremely user-unfriendly behavior: after +all, the user might have specified a duplicate ID by accident.

+ +

Thus, we get to our second method.

+ + + +

Namespacing IDs

+
Lazy developer's way, but needs user education
+ +

This method, too, is quite simple: add a prefix to all user IDs. With this +code:

+ +
$config->set('HTML', 'EnableAttrID', true);
+$config->set('Attr', 'IDPrefix', 'user_');
+ +

...this:

+ +
<a id="foobar">Anchor!</a>
+ +

...turns into:

+ +
<a id="user_foobar">Anchor!</a>
+ +

As long as you don't have any IDs that start with user_, collisions are +guaranteed not to happen. The drawback is obvious: if a user submits +id="foobar", they probably expect to be able to reference their page with +#foobar. You'll have to tell them, "No, that doesn't work, you have to add +user_ to the beginning."

+ +

And yes, things get hairier. Even with a nice prefix, we still have done +nothing about multiple HTML Purifier outputs on one page. Thus, we have +a second configuration value to piggy-back off of: %Attr.IDPrefixLocal:

+ +
$config->set('Attr', 'IDPrefixLocal', 'comment' . $id . '_');
+ +

This new directive does nothing but append to the regular IDPrefix, but is +special in that it is volatile: its value is determined at run-time and +cannot possibly be cordoned off into, say, a .ini config file. What to +put into the directive is up to you, but I would recommend the ID number +the text has been assigned in the database. Whatever you pick, however, it +has to be unique and stable for the text you are validating. Note, however, +that we require that %Attr.IDPrefix be set before you use this directive.
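To make the combined effect concrete, here is a small sketch of the string handling implied by the description above: the local prefix is tacked on to the regular prefix, and the result goes in front of the user's ID. The function prefixed_id and the value 42 are hypothetical and stand in for what the library does internally; this is an illustration, not the library's code.

```php
<?php
// Hypothetical stand-in for the effect of %Attr.IDPrefix together with
// %Attr.IDPrefixLocal, as described in the text.
function prefixed_id($idPrefix, $idPrefixLocal, $userId)
{
    // Regular prefix first, then the local prefix, then the user's own ID.
    return $idPrefix . $idPrefixLocal . $userId;
}

$commentId = 42; // assumed: the database ID of the text being purified

$result = prefixed_id('user_', 'comment' . $commentId . '_', 'foobar');
echo $result, "\n";
```

Under these assumptions a user who wrote id="foobar" in comment 42 would have to link to #user_comment42_foobar, which is why the prefix needs to be both unique and stable for each piece of text.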

+ +

And also remember: the user has to know what this prefix is too!

+ + + +

Abstinence

+ +

You may not want to bother. That's okay too, just don't enable IDs.

+ +

Personally, I would take this road whenever user-submitted content would +possibly be shown together on one page. Why a blog comment would need to use +anchors is beyond me.

+ + + +

Denial

+ +

To revert to pre-1.2.0 behavior, simply:

+ +
$config->set('HTML', 'EnableAttrID', true);
+ +

Don't come crying to me when your page mysteriously stops validating, though.

+ +
$Id$
+ + + \ No newline at end of file diff --git a/docs/enduser-id.txt b/docs/enduser-id.txt deleted file mode 100644 index f8f26584..00000000 --- a/docs/enduser-id.txt +++ /dev/null @@ -1,124 +0,0 @@ - -IDs - What they are, why you should(n't) wear them, and how to deal with it - -Prior to HTML Purifier 1.2.0, this library blithely accepted user input that -looked like this: - - Anchor - -...presenting an attractive vector for those that would destroy standards -compliance: simply set the ID to one that is already used elsewhere in the -document and voila: validation breaks. There was a half-hearted attempt to -prevent this by allowing users to blacklist IDs, but I suspect that no one -really bothered, and thus, with the release of 1.2.0, IDs are now *removed* -by default. - -IDs, however, are quite useful functionality to have, so if users start -complaining about broken anchors you'll probably want to turn them back on -with %HTML.EnableAttrID. But before you go mucking around with the config -object, it's probably worth to take some precautions to keep your page -validating. Why? - -1. Standards-compliant pages are good -2. Duplicated IDs interfere with anchors. If there are two id="foobar"s in a - document, which spot does a browser presented with the fragment #foobar go - to? Most browsers opt for the first appearing ID, making it impossible - to references the second section. Similarly, duplicated IDs can hijack - client-side scripting that relies on the IDs of elements. - -You have (currently) four ways of dealing with the problem. - - - -Road #1: Blacklisting IDs - Good for pages with single content source and stable templates - -Keeping in terms with the KISS (Keep It Simple, Stupid) principle, let us -deal with the most obvious solution: preventing users from using any IDs that -appear elsewhere on the document. 
The method is simple: - - $config->set('HTML', 'EnableAttrID', true); - $config->set('Attr', 'IDBlacklist' array( - 'list', 'of', 'attributes', 'that', 'are', 'forbidden' - )); - -That being said, there are some notable drawbacks. First of all, you have to -know precisely which IDs are being used by the HTML surrounding the user code. -This is easier said than done: quite often the page designer and the system -coder work separately, so the designer has to constantly be talking with the -coder whenever he decides to add a new anchor. Miss one and you open yourself -to possible standards-compliance issues. - -Furthermore, this position becomes untenable when a single web page must hold -multiple portions of user-submitted content. Since there's obviously no way -to find out before-hand what IDs users will use, the blacklist is helpless. -And even since HTML Purifier validates each segment seperately, perhaps doing -so at different times, it would be extremely difficult to dynamically update -the blacklist inbetween runs. - -Finally, simply destroying the ID is extremely un-userfriendly behavior: after -all, they might have simply specified a duplicate ID by accident. - -Thus, we get to our second method. - - - -Road #2: Namespacing IDs - Lazy developer's way, but needs user education - -This method, too, is quite simple: add a prefix to all user IDs. With this -code: - - $config->set('HTML', 'EnableAttrID', true); - $config->set('Attr', 'IDPrefix', 'user_'); - -...this: - - Anchor! - -...turns into: - - Anchor! - -As long as you don't have any IDs that start with user_, collisions are -guaranteed not to happen. The drawback is obvious: if a user submits -id="foobar", they probably expect to be able to reference their page with -#foobar. You'll have to tell them, "No, that doesn't work, you have to add -user_ to the beginning." - -And yes, things get hairier. Even with a nice prefix, we still have done -nothing about multiple HTML Purifier outputs on one page. 
Thus, we have -a second configuration value to piggy-back off of: %Attr.IDPrefixLocal: - - $config->set('Attr', 'IDPrefixLocal', 'comment' . $id . '_'); - -This new attributes does nothing but append on to regular IDPrefix, but is -special in that it is volatile: it's value is determined at run-time and -cannot possibly be cordoned into, say, a .ini config file. As for what to -put into the directive, is up to you, but I would recommend the ID number -the text has been assigned in the database. Whatever you pick, however, it -has to be unique and stable for the text you are validating. Note, however, -that we require that %Attr.IDPrefix be set before you use this directive. - -And also remember: the user has to know what this prefix is too! - - - -Path #3: Abstinence - -You may not want to bother. That's okay too, just don't enable IDs. - -Personally, I would take this road whenever user-submitted content would be -possibly be shown together on one page. Why a blog comment would need to use -anchors is beyond me. - - - -Path #4: Denial - -To revert back to pre-1.2.0 behavior, simply: - - $config->set('HTML', 'EnableAttrID', true); - -Don't come crying to me when your page mysteriously stops validating, though. diff --git a/docs/index.html b/docs/index.html index 854e189c..a5a70385 100644 --- a/docs/index.html +++ b/docs/index.html @@ -3,7 +3,7 @@ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> - + Documentation - HTML Purifier @@ -20,6 +20,13 @@ Here is an index of all of them.

End-user documentation that contains articles, tutorials and useful information for casual developers using HTML Purifier.

+
+ +
IDs
+
Explains various methods for allowing IDs in documents safely in HTML Purifier.
+ +
+

Development

Developer documentation detailing code issues, roadmaps and project conventions.

diff --git a/docs/style.css b/docs/style.css index 8c3aee83..bc7e85a4 100644 --- a/docs/style.css +++ b/docs/style.css @@ -14,8 +14,9 @@ h4 {font-family:sans-serif; font-size:0.9em; font-weight:bold; } /* For witty quips */ .subtitled {margin-bottom:0em;} -.subtitle {font-size:.8em; margin-bottom:1em; text-align:center; - font-style:italic; margin-top:-.2em;} +.subtitle , .subsubtitle {font-size:.8em; margin-bottom:1em; + font-style:italic; margin-top:-.2em;text-align:center;} +.subsubtitle {text-align:left;margin-left:2em;} /* Used for special "See also" links. */ .reference {font-style:italic;margin-left:2em;}