diff --git a/docs/enduser-overview.txt b/docs/enduser-overview.txt
index 1e9f13c0..3ebccd21 100644
--- a/docs/enduser-overview.txt
+++ b/docs/enduser-overview.txt
@@ -36,7 +36,7 @@ forgiving lexer. You may also be interested in the unit tests located in the
 tests/ folder, which provide a living document on how exactly the filter deals
 with malformed input.
 
-In summary:
+In summary (see corresponding classes for more details):
 
 1. Parse document into an array of tag and text tokens (Lexer)
 2. Remove all elements not on whitelist and transform certain other elements
diff --git a/docs/enduser-security.txt b/docs/enduser-security.txt
index e7c9a8ce..d33f473c 100644
--- a/docs/enduser-security.txt
+++ b/docs/enduser-security.txt
@@ -6,45 +6,17 @@ through negligence of people. This class will do its job: no more, no less,
 and it's up to you to provide it the proper information and proper context
 to be effective. Things to remember:
 
-1. Character Encoding: UTF-8.
- This segment will soon be obsoleted by enduser-utf8.html
-Currently, the parser runs under the assumption that it is dealing
-with UTF-8. Not ISO-8859-1 or Windows-1252, UTF-8. And definitely not "no
-character encoding explicitly stated" or UTF-7. If you're not using UTF-8 as
-your character encoding, make sure you configure HTML Purifier or switch
-to UTF-8. Now. Also, make sure any input is properly converted to UTF-8, or
-the parser will mangle it badly (though it won't be a security risk if you're
-outputting it as UTF-8 though). Character encoding is, in general, a knotty
-issue, but do yourself a favor and learn about it:
-
+1. Character Encoding: see enduser-utf8.html for more info.
 
-2. Doctype: XHTML 1.0 Transitional
-This is what the parser is outputting. For the most
-part, it's compatible with HTML 4.01, but XHTML enforces some very nice things
-that all web developers should use. Regardless, NO DOCTYPE is a NO. Quirks mode
-has waaaay too many quirks for a little parser to handle. We did not select
-strict in order to prevent ourselves from being too draconic on users, but
-this may be configurable in the future. Do you want standards compliance?
-The doctype is a good place to start.
+2. Doctype: document pending feature completion
+Not strictly necessary, actually. More in-depth discussion once we figure
+out how to get strict loose mode working.
 
-3. IDs
- This segment is obsoleted by enduser-id.html
-They need to be unique, but without some knowledge of the
-rest of the document, it's difficult to know what's unique. %Attr.IDBlacklist
-needs to be set: we may want to consider disallowing IDs by default to
-save lazy programmers.
+3. IDs: see enduser-id.html for more info
 
-4. [PROJECTED] Links
-We're not going to try for spam protection (although
-some hooks for such a module might be nice) but we may offer the ability to
-only accept relative URLs. Pick the one that's right for you.
+4. Links: document pending feature completion
+Rudimentary blacklisting, we should also allow only relative URIs. We
+need a doc to explain the stuff.
 
-5. CSS
-While we can prevent the most flagrant cases from affecting your
-layout (such as absolutely positioned elements), no amount of code is going
-to protect your pages from being attacked by garish colors and plain old
-bad taste. A neat feature would be the ability to define acceptable colors
-in a document, but that's not likely to be implemented for a while. In the
-meantime, be sure to make sure that floated elements (permitted, since they
-can be quite useful) can't mess up your layout. Once again, we may want to
-disable this by default to protect lazy developers.
+5. CSS: document pending
+Explain which CSS styles we blocked and why.