mirror of
				https://github.com/ezyang/htmlpurifier.git
				synced 2025-10-26 10:06:02 +01:00 
			
		
		
		
	git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/branches/strict@1026 48356398-32a2-884e-a903-53898d9a118a
		
			
				
	
	
		
			117 lines
		
	
	
		
			4.6 KiB
		
	
	
	
		
			HTML
		
	
	
	
	
	
			
		
		
	
	
		
	
	
	
	
	
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="description" content="Explains how to speed up HTML Purifier through caching or inbound filtering." />
<link rel="stylesheet" type="text/css" href="./style.css" />

<title>Speeding up HTML Purifier - HTML Purifier</title>

</head><body>

<h1 class="subtitled">Speeding up HTML Purifier</h1>
<div class="subtitle">...also known as the HELP ME LIBRARY IS TOO SLOW MY PAGE TAKE TOO LONG page</div>

<div id="filing">Filed under End-User</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>

<p>HTML Purifier is a very powerful library. But with great power comes great
responsibility, in the form of longer execution times.  Remember, this
library isn't lightly grazing over submitted HTML: it's deconstructing
the whole thing, rigorously checking the parts, and then putting it back
together.</p>

<p>So, if it turns out that HTML Purifier is too slow for outbound
filtering, you've got a few options:</p>

<h2>Inbound filtering</h2>

<p>Filter the HTML when the user submits it. Since the user is already
submitting something, an extra half a second tacked on to the load time
probably isn't going to be a huge problem.  Displaying the content is
then a simple matter of outputting it directly from your
database/filesystem. The trouble with this method is that your user
loses the original text and, when editing, will be handling the
filtered text.  While this may be a good thing, especially if you're
using a WYSIWYG editor, it can also result in data loss if a user makes
a typo, because whatever the filter dropped is gone for good.</p>

<p>Example (non-functional):</p>

<pre>&lt;?php
    /**
     * FORM SUBMISSION PAGE
     * display_error($message) : displays nice error page with message
     * display_success() : displays a nice success page
     * display_form() : displays the HTML submission form
     * database_insert($html) : inserts data into database as new row
     */
    if (!empty($_POST)) {
        require_once '/path/to/library/HTMLPurifier.auto.php';
        require_once 'HTMLPurifier.func.php';
        $dirty_html = isset($_POST['html']) ? $_POST['html'] : false;
        if (!$dirty_html) {
            display_error('You must write some HTML!');
            exit; // stop here so nothing is purified or inserted
        }
        // purify once, at submission time
        $html = HTMLPurifier($dirty_html);
        database_insert($html);
        display_success();
        // notice that $dirty_html is *not* saved
    } else {
        display_form();
    }
?&gt;</pre>

<h2>Caching the filtered output</h2>

<p>Accept the submitted text and put it unaltered into the database, but
also generate a filtered version and stash that in the database.
Serve the filtered version to readers, and the unaltered version to
editors.  If need be, you can invalidate the cache and have the
filtered version regenerated on the first page view.  Pros? Full data
retention. Cons? It's more complicated, and it opens other editors up to
XSS if they are using a WYSIWYG editor (to fix that, they'd have to be
able to get their hands on the *really* original text served in
plaintext mode).</p>

<p>Example (non-functional):</p>

<pre>&lt;?php
    /**
     * VIEW PAGE
     * display_error($message) : displays nice error page with message
     * cache_get($id) : retrieves HTML from fast cache (db or file)
     * cache_insert($id, $html) : inserts good HTML into cache system
     * database_get($id) : retrieves raw HTML from database
     */
    $id = isset($_GET['id']) ? (int) $_GET['id'] : false;
    if (!$id) {
        display_error('Must specify ID.');
        exit;
    }
    $html = cache_get($id); // filesystem or database
    if ($html === false) {
        // cache didn't have the HTML, generate it
        $raw_html = database_get($id);
        require_once '/path/to/library/HTMLPurifier.auto.php';
        require_once 'HTMLPurifier.func.php';
        $html = HTMLPurifier($raw_html);
        cache_insert($id, $html);
    }
    echo $html;
?&gt;</pre>
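
<p>The view page above covers only half of the scheme. A minimal sketch
of the other half, the save and edit side, might look like the following.
Everything in it is an assumption made for illustration: the
<code>PageStore</code> class and its methods are hypothetical, its arrays
stand in for your database and cache, and <code>strip_tags</code> is only
a placeholder so the sketch runs without the library (it is not a
sanitizer; real code would pass <code>'HTMLPurifier'</code> from
<code>HTMLPurifier.func.php</code> instead).</p>

```php
<?php
/**
 * SAVE / EDIT SIDE (sketch)
 * PageStore is hypothetical; its arrays stand in for the database
 * (raw HTML) and the cache (filtered HTML).
 */
class PageStore
{
    private $raw = array();    // id => unaltered HTML, as submitted
    private $cache = array();  // id => filtered HTML, ready to serve
    private $filter;           // callable that turns raw HTML into safe HTML

    public function __construct($filter)
    {
        $this->filter = $filter;
    }

    // Store the submitted text unaltered and invalidate the cached copy.
    public function save($id, $raw_html)
    {
        $this->raw[$id] = $raw_html;
        unset($this->cache[$id]);
    }

    // Serve the filtered version, regenerating it on a cache miss.
    public function view($id)
    {
        if (!isset($this->cache[$id])) {
            $this->cache[$id] = call_user_func($this->filter, $this->raw[$id]);
        }
        return $this->cache[$id];
    }

    // Serve the unaltered version to an editor, escaped so the browser
    // shows it as plain text instead of interpreting it.
    public function editForm($id)
    {
        $escaped = htmlspecialchars($this->raw[$id], ENT_QUOTES, 'UTF-8');
        return '<textarea name="html">' . $escaped . '</textarea>';
    }
}

// Placeholder filter: strip_tags is NOT a sanitizer. Real code would pass
// 'HTMLPurifier' after loading HTMLPurifier.auto.php and HTMLPurifier.func.php.
$store = new PageStore('strip_tags');
$store->save(1, '<b>Hello</b>');
echo $store->view(1), "\n";       // filtered copy, now cached
echo $store->editForm(1), "\n";   // raw text, escaped for editing
$store->save(1, '<i>Edited</i>'); // cache entry for id 1 is dropped
echo $store->view(1), "\n";       // regenerated on this first view
```

<p>Escaping the raw text with <code>htmlspecialchars()</code> is what
serves it "in plaintext mode": the editor sees the markup as text rather
than having the browser interpret it.</p>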

<h2>Summary</h2>

<p>In short, inbound filtering is the simple option and caching is the
robust option (albeit with bigger storage requirements).</p>

<p>There is a third option, independent of the two we've discussed: profile
and optimize HTML Purifier yourself. Be sure to report back your results
if you decide to do that! Especially if you port HTML Purifier to C++.
<tt>;-)</tt></p>
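
<p>If you do go down the profiling road, a rough first measurement could
look like the sketch below. <code>time_filter()</code> is a hypothetical
helper, not part of HTML Purifier, and <code>strip_tags</code> is again
just a placeholder workload so the sketch runs on its own.</p>

```php
<?php
/**
 * Time a filter over a sample document.
 * time_filter() is a hypothetical helper, not part of HTML Purifier.
 */
function time_filter($filter, $html, $runs = 10)
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        call_user_func($filter, $html);
    }
    // mean wall-clock seconds per call
    return (microtime(true) - $start) / $runs;
}

// Placeholder workload. To measure the real thing, load
// HTMLPurifier.auto.php and pass array(new HTMLPurifier(), 'purify')
// as $filter instead of 'strip_tags'.
$sample  = str_repeat('<p>Some <b>sample</b> markup.</p>', 200);
$seconds = time_filter('strip_tags', $sample);
printf("%.3f ms per call\n", $seconds * 1000);
```

<p>Wall-clock timing like this only tells you whether the library is your
bottleneck at all; finding out where the time goes inside it takes a real
profiler.</p>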

</body>
</html>