mirror of
				https://github.com/ezyang/htmlpurifier.git
				synced 2025-10-25 10:36:59 +02:00 
			
		
		
		
	git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1253 48356398-32a2-884e-a903-53898d9a118a
		
			
				
	
	
		
			153 lines
		
	
	
		
			7.2 KiB
		
	
	
	
		
			HTML
		
	
	
	
	
	
			
		
		
	
	
			153 lines
		
	
	
		
			7.2 KiB
		
	
	
	
		
			HTML
		
	
	
	
	
	
| <?xml version="1.0" encoding="UTF-8"?>
 | |
| <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 | |
|     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
 | |
| <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
 | |
| <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
 | |
| <meta name="description" content="Explains how to safely allow the embedding of flash from trusted sites in HTML Purifier." />
 | |
| <link rel="stylesheet" type="text/css" href="./style.css" />
 | |
| 
 | |
| <title>Embedding YouTube Videos - HTML Purifier</title>
 | |
| 
 | |
| </head><body>
 | |
| 
 | |
| <h1 class="subtitled">Embedding YouTube Videos</h1>
 | |
| <div class="subtitle">...as well as other dangerous active content</div>
 | |
| 
 | |
| <div id="filing">Filed under End-User</div>
 | |
| <div id="index">Return to the <a href="index.html">index</a>.</div>
 | |
| <div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
 | |
| 
 | |
| <p>Clients like their YouTube videos. It gives them a warm fuzzy feeling when
 | |
| they see a neat little embedded video player on their websites that can play
 | |
| the latest clips from their documentary "Fido and the Bones of Spring".
 | |
| All joking aside, the ability to embed YouTube videos or other active
 | |
| content in their pages is something that a lot of people like.</p>
 | |
| 
 | |
| <p>This is a <em>bad</em> idea. The moment you embed anything untrusted,
 | |
| you will definitely be slammed by a manner of nasties that can be
 | |
| embedded in things from your run of the mill Flash movie to
 | |
| <a href="http://blog.spywareguide.com/2006/12/myspace_phish_attack_leads_use.html">Quicktime movies</a>.
 | |
| Even <code>img</code> tags, which HTML Purifier allows by default, can be
 | |
| dangerous. Be distrustful of anything that tells a browser to load content
 | |
| from another website automatically.</p>
 | |
| 
 | |
| <p>Luckily for us, however, whitelisting saves the day. Sure, letting users
 | |
| include any old random flash file could be dangerous, but if it's
 | |
| from a specific website, it probably is okay. If no amount of pleading will
 | |
| convince the people upstairs that they should just settle with just linking
 | |
| to their movies, you may find this technique very useful.</p>
 | |
| 
 | |
| <h2>Looking in</h2>
 | |
| 
 | |
| <p>Below is custom code that allows users to embed
 | |
| YouTube videos. This is not favoritism: this trick can easily be adapted for
 | |
| other forms of embeddable content.</p>
 | |
| 
 | |
| <p>Usually, websites like YouTube give us boilerplate code that you can insert
 | |
| into your documents. YouTube's code goes like this:</p>
 | |
| 
 | |
| <pre>
 | |
| <object width="425" height="350">
 | |
|   <param name="movie" value="http://www.youtube.com/v/AyPzM5WK8ys" />
 | |
|   <param name="wmode" value="transparent" />
 | |
|   <embed src="http://www.youtube.com/v/AyPzM5WK8ys"
 | |
|          type="application/x-shockwave-flash"
 | |
|          wmode="transparent" width="425" height="350" />
 | |
| </object>
 | |
| </pre>
 | |
| 
 | |
| <p>There are two things to note about this code:</p>
 | |
| 
 | |
| <ol>
 | |
|     <li><code><embed></code> is not recognized by W3C, so if you want
 | |
|         standards-compliant code, you'll have to get rid of it.</li>
 | |
|     <li>The code is exactly the same for all instances, except for the
 | |
|         identifier <tt>AyPzM5WK8ys</tt> which tells us which movie file
 | |
|         to retrieve.</li>
 | |
| </ol>
 | |
| 
 | |
| <p>What point 2 means is that if we have code like <code><span
 | |
| class="embed-youtube">AyPzM5WK8ys</span></code> your
 | |
| application can reconstruct the full object from this small snippet that
 | |
| passes through HTML Purifier <em>unharmed</em>.
 | |
| <a href="http://htmlpurifier.org/svnroot/htmlpurifier/trunk/library/HTMLPurifier/Filter/YouTube.php">Show me the code!</a></p>
 | |
| 
 | |
| <p>And the corresponding usage:</p>
 | |
| 
 | |
| <pre><?php
 | |
|     // assuming $purifier is an instance of HTMLPurifier
 | |
|     require_once 'HTMLPurifier/Filter/YouTube.php';
 | |
|     $purifier->addFilter(new HTMLPurifier_Filter_YouTube());
 | |
| ?></pre>
 | |
| 
 | |
| <p>There is a bit going in the two code snippets, so let's explain.</p>
 | |
| 
 | |
| <ol>
 | |
|     <li>This is a Filter object, which intercepts the HTML that is
 | |
|         coming into and out of the purifier. You can add as many
 | |
|         filter objects as you like. <code>preFilter()</code>
 | |
|         processes the code before it gets purified, and <code>postFilter()</code>
 | |
|         processes the code afterwards. So, we'll use <code>preFilter()</code> to
 | |
|         replace the object tag with a <code>span</code>, and <code>postFilter()</code>
 | |
|         to restore it.</li>
 | |
|     <li>The first preg_replace call replaces any YouTube code users may have
 | |
|         embedded into the benign span tag. Span is used because it is inline,
 | |
|         and objects are inline too. We are very careful to be extremely
 | |
|         restrictive on what goes inside the span tag, as if an errant code
 | |
|         gets in there it could get messy.</li>
 | |
|     <li>The HTML is then purified as usual.</li>
 | |
|     <li>Then, another preg_replace replaces the span tag with a fully fledged
 | |
|         object. Note that the embed is removed, and, in its place, a data
 | |
|         attribute was added to the object. This makes the tag standards
 | |
|         compliant! It also breaks Internet Explorer, so we add in a bit of
 | |
|         conditional comments with the old embed code to make it work again.
 | |
|         It's all quite convoluted but works.</li>
 | |
| </ol>
 | |
| 
 | |
| <h2>Warning</h2>
 | |
| 
 | |
| <p>There are a number of possible problems with the code above, depending
 | |
| on how you look at it.</p>
 | |
| 
 | |
| <h3>Cannot change width and height</h3>
 | |
| 
 | |
| <p>The width and height of the final YouTube movie cannot be adjusted. This
 | |
| is because I am lazy. If you really insist on letting users change the size
 | |
| of the movie, what you need to do is package up the attributes inside the
 | |
| span tag (along with the movie ID). It gets complicated though: a malicious
 | |
| user can specify an outrageously large height and width and attempt to crash
 | |
| the user's operating system/browser. You need to either cap it by limiting
 | |
| the amount of digits allowed in the regex or using a callback to check the
 | |
| number.</p>
 | |
| 
 | |
| <h3>Trusts media's host's security</h3>
 | |
| 
 | |
| <p>By allowing this code onto our website, we are trusting that YouTube has
 | |
| tech-savvy enough people not to allow their users to inject malicious
 | |
| code into the Flash files.  An exploit on YouTube means an exploit on your
 | |
| site.  Even though YouTube is run by the reputable Google, it
 | |
| <a href="http://ha.ckers.org/blog/20061213/google-xss-vuln/">doesn't</a>
 | |
| mean they are
 | |
| <a href="http://ha.ckers.org/blog/20061208/xss-in-googles-orkut/">invulnerable.</a>
 | |
| You're putting a certain measure of the job on an external provider (just as
 | |
| you have by entrusting your user input to HTML Purifier), and
 | |
| it is important that you are cognizant of the risk.</p>
 | |
| 
 | |
| <h3>Poorly written adaptations compromise security</h3>
 | |
| 
 | |
| <p>This should go without saying, but if you're going to adapt this code
 | |
| for Google Video or the like, make sure you do it <em>right</em>. It's
 | |
| extremely easy to allow a character too many in <code>postFilter()</code> and
 | |
| suddenly you're introducing XSS into HTML Purifier's XSS free output. HTML
 | |
| Purifier may be well written, but it cannot guard against vulnerabilities
 | |
| introduced after it has finished.</p>
 | |
| 
 | |
| <h2>Help out!</h2>
 | |
| 
 | |
| <p>If you write a filter for your favorite video destination (or anything
 | |
| like that, for that matter), send it over and it might get included
 | |
| with the core!</p>
 | |
| 
 | |
| </body>
 | |
| </html>
 |