From 4d2ec806ac84c82803faf7cfb2e8f853d56a6bdb Mon Sep 17 00:00:00 2001
From: "Edward Z. Yang"
Date: Mon, 17 Apr 2006 21:32:53 +0000
Subject: [PATCH] Add a security document, detailing issues that white-listing won't resolve.

git-svn-id: http://htmlpurifier.org/svnroot/html_purifier/trunk@45 48356398-32a2-884e-a903-53898d9a118a
---
 docs/security.txt | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)
 create mode 100644 docs/security.txt

diff --git a/docs/security.txt b/docs/security.txt
new file mode 100644
index 00000000..7b7963e4
--- /dev/null
+++ b/docs/security.txt
@@ -0,0 +1,34 @@
+== Possible Security Issues ==
+
+Like anything that claims to afford security, HTML_Purifier can be circumvented
+through negligence of people. This class will do its job: no more, no less,
+and it's up to you to provide it the proper information and proper context
+to be effective. Things to remember:
+
+1. UTF-8. Currently, the parser runs under the assumption that it is dealing
+with UTF-8. Not ISO-8859-1 or Windows-1252, UTF-8. And definitely not "no
+character encoding explicitly stated" or UTF-7. If you're not using UTF-8 as
+your character encoding, you should switch. Now. (in future versions, however,
+I may make the character encoding configurable, but there's only so much I
+can do). Make sure any input is properly converted to UTF-8, or the parser
+will mangle it badly (though it won't be a security risk if you're outputting
+it as UTF-8).
+
+2. XHTML 1.0. This is what the parser is outputting. For the most part, it's
+compatible with HTML 4.01, but XHTML enforces some very nice things that all
+web developers should use. Regardless, NO DOCTYPE is a NO. Quirks mode has
+waaaay too many quirks for a little parser to handle.
+
+3. [PROJECTED] IDs. They need to be unique, but without some knowledge of the
+rest of the document, it's difficult to know what's unique. I project default
+behavior being a customizable prefix to all ID declarations in the document,
+so make sure you don't use that prefix. Might cause problems for multiple
+instances of HTML escaped output too (especially when it comes to caching).
+Best to just zap them completely, perhaps. This will be configurable, and you'll
+have to pick the correct one.
+
+4. [PROJECTED] Links. We're not going to try for spam protection (although
+some hooks for such a module might be nice) but we may offer the ability to
+only accept relative URLs. Pick the one that's right for you.
+
+5. [PROJECTED] CSS. What a knotty issue. Probably will have to be configurable.
\ No newline at end of file
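
Note on item 1 of docs/security.txt: the document asks callers to convert input
to UTF-8 before it reaches the parser. A minimal sketch of that step follows. It
is written in Python purely for illustration and is not part of HTML_Purifier;
the function name normalize_to_utf8 and its source_encoding parameter are
hypothetical, and the sketch assumes the caller already knows which encoding the
input arrived in.

```python
def normalize_to_utf8(raw: bytes, source_encoding: str = "utf-8") -> bytes:
    """Return `raw` re-encoded as UTF-8, or raise ValueError.

    Hypothetical helper, not an HTML_Purifier API. The caller must state
    the encoding the bytes arrived in; nothing is guessed or sniffed.
    """
    try:
        # Strict decoding: malformed byte sequences raise instead of being
        # silently replaced, so bad input never reaches the parser.
        text = raw.decode(source_encoding, errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError(f"input is not valid {source_encoding}") from exc
    except LookupError as exc:
        # Raised by Python when the encoding name itself is unknown.
        raise ValueError(f"unknown encoding: {source_encoding}") from exc
    return text.encode("utf-8")


if __name__ == "__main__":
    # Windows-1252 input is converted before being handed to a parser.
    legacy = "café".encode("cp1252")
    print(normalize_to_utf8(legacy, "cp1252"))
```

Rejecting undecodable input, rather than replacing bad bytes, is a design choice
made for this sketch; a caller could just as reasonably decide to substitute a
replacement character and continue.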