Replace cheerio with parse5 (#934)

* Replace cheerio with parse5
* Convert to dependency injection for HTML parser
* Add options.domParser to HTML serializer
* Fallback to native DOMParser if present and no option provided
* Error if no DOM parser is available (option or native)
* Update tests to pass parse5 as config option
* Update test so it passes.
  Cheerio interprets `<p><hr /></p>` as one `p` node with a child `hr` node, but both parse5 and native DOMParser interpret it as 3 nodes: a blank `p` node, `hr` node, and second blank `p` node. Update test expectation to match new API.
* Remove cheerio-esque compatibility conversion.
* Use `application/xml` in native DOMParser
  Using `text/html` causes it to wrap the fragment in html, body, etc
* Change error message to single line.
  Was inserting an undesired newline char
* Add documentation for new `domParser` option to html serializer
  Also boyscout missing documentation for `defaultBlockType` option
* Rename `domParser` option to `parseHtml`
  Rename the option to make it clearer what it does, since it accepts a function and not a `DOMParser` analogue object.
2025-08-30 18:39:51 +02:00 · 2017-07-20 12:46:02 -04:00
parent cf85c6e3fb
commit 4bbf7487ea
26 changed files with 186 additions and 262 deletions
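
The parser lookup order described in the commit message (explicit option, then native `DOMParser`, then an error) can be sketched roughly as below. This is only an illustration under stated assumptions: `resolveParseHtml` and its `scope` parameter are hypothetical names, not the serializer's actual code, and the error text is made up.

```javascript
// Hypothetical helper illustrating the lookup order from the commit message.
// The function name and the `scope` parameter are assumptions for this sketch,
// not part of the Html serializer's API.
function resolveParseHtml(options = {}, scope = globalThis) {
  // 1. An explicitly injected parser function wins.
  if (typeof options.parseHtml === 'function') {
    return options.parseHtml
  }

  // 2. Otherwise fall back to a native DOMParser when the environment has one.
  //    `application/xml` is used because, per the commit message, `text/html`
  //    wraps the fragment in html, body, etc.
  if (typeof scope.DOMParser === 'function') {
    return html => new scope.DOMParser().parseFromString(html, 'application/xml')
  }

  // 3. No parser available: fail with a single-line message (illustrative wording).
  throw new Error('No HTML parser available: pass a `parseHtml` function or run where DOMParser exists.')
}
```

In an environment without a native `DOMParser`, such as Node, a caller would presumably pass a parser function like `parse5.parseFragment` as the `parseHtml` option, which is what the updated tests in this commit appear to do.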
--- a/docs/walkthroughs/saving-and-loading-html-content.md
+++ b/docs/walkthroughs/saving-and-loading-html-content.md
@@ -58,7 +58,7 @@ const rules = [
        return {
          kind: 'block',
          type: 'paragraph',
-          nodes: next(el.children)
+          nodes: next(el.childNodes)
        }
      }
    }
@@ -68,7 +68,7 @@ const rules = [

 If you've worked with the [`Raw`](../reference/serializers/raw.md) serializer before, the return value of the `deserialize` should look familiar! It's just the same raw JSON format.

-The `el` argument that the `deserialize` function receives is just a [`cheerio`](https://github.com/cheeriojs/cheerio) element object. And the `next` argument is a function that will deserialize any `cheerio` element(s) we pass it, which is how you recurse through each nodes children.
+The `el` argument that the `deserialize` function receives is just a DOM element. And the `next` argument is a function that will deserialize any element(s) we pass it, which is how you recurse through each node's children.

 Okay, that's `deserialize`, now let's define the `serialize` property of the paragraph rule as well:

@@ -80,7 +80,7 @@ const rules = [
        return {
          kind: 'block',
          type: 'paragraph',
-          nodes: next(el.children)
+          nodes: next(el.childNodes)
        }
      }
    },
@@ -119,7 +119,7 @@ const rules = [
      return {
        kind: 'block',
        type: type,
-        nodes: next(el.children)
+        nodes: next(el.childNodes)
      }
    },
    // Switch serialize to handle more blocks...
@@ -137,7 +137,7 @@ const rules = [

 Now each of our block types is handled.

-You'll notice that even though code blocks are nested in a `<pre>` and a `<code>` element, we don't need to specifically handle that case in our `deserialize` function, because the `Html` serializer will automatically recurse through `el.children` if no matching deserializer is found. This way, unknown tags will just be skipped over in the tree, instead of their contents omitted completely.
+You'll notice that even though code blocks are nested in a `<pre>` and a `<code>` element, we don't need to specifically handle that case in our `deserialize` function, because the `Html` serializer will automatically recurse through `el.childNodes` if no matching deserializer is found. This way, unknown tags will just be skipped over in the tree, instead of their contents omitted completely.

 Okay. So now our serializer can handle blocks, but we need to add our marks to it as well. Let's do that with a new rule...

@@ -164,7 +164,7 @@ const rules = [
      return {
        kind: 'block',
        type: type,
-        nodes: next(el.children)
+        nodes: next(el.childNodes)
      }
    },
    serialize(object, children) {
@@ -184,7 +184,7 @@ const rules = [
      return {
        kind: 'mark',
        type: type,
-        nodes: next(el.children)
+        nodes: next(el.childNodes)
      }
    },
    serialize(object, children) {