Fix Plainify edge cases

This commit replaces the main part of `helpers.StripHTML` with Go's implementation in its html/template package.

It's a little slower, but correctness is more important:

```bash
BenchmarkStripHTMLOld-10    	  680316	      1764 ns/op	     728 B/op	       4 allocs/op
BenchmarkStripHTMLNew-10    	  384520	      3099 ns/op	    2089 B/op	      10 allocs/op
```

Fixes #9199
Fixes #9909
Closes #9410
This commit is contained in:
Bjørn Erik Pedersen
2022-05-25 10:56:14 +02:00
parent cd0112a05a
commit 3854a6fa6c
10 changed files with 103 additions and 85 deletions

View File

@@ -201,7 +201,7 @@ func newPageContentOutput(p *pageState, po *pageOutput) (*pageContentOutput, err
})
cp.initPlain = cp.initMain.Branch(func() (any, error) {
cp.plain = helpers.StripHTML(string(cp.content))
cp.plain = tpl.StripHTML(string(cp.content))
cp.plainWords = strings.Fields(cp.plain)
cp.setWordCounts(p.m.isCJKLanguage)