mirror of https://github.com/nextapps-de/flexsearch.git synced 2025-08-28 16:20:04 +02:00

add readme part 1.5 of 2

This commit is contained in:
Thomas Wilkerling
2025-03-29 18:57:26 +01:00
parent 2b1771fd6d
commit b75fff8937
8 changed files with 771 additions and 1086 deletions

README.md (371 changes)

@@ -15,13 +15,13 @@ FlexSearch v0.8: [Overview and Migration Guide](doc/0.8.0.md)
[Basic Start](#load-library)  • 
[API Reference](#api-overview)  • 
[Encoder](doc/encoder.md) &ensp;&bull;&ensp;
[Document Search](doc/document-search.md) &ensp;&bull;&ensp;
[Persistent Indexes](doc/persistent.md) &ensp;&bull;&ensp;
[Using Worker](doc/worker.md) &ensp;&bull;&ensp;
[Tag Search](doc/document-search.md#tag-search) &ensp;&bull;&ensp;
[Resolver](doc/resolver.md) &ensp;&bull;&ensp;
[Changelog](CHANGELOG.md)

<!--
> [!NOTE]
@@ -108,7 +108,7 @@ Benchmarks:
<details>
<summary>Latest Benchmark Results</summary>
<br>

The benchmark was measured in terms per second; higher values are better (except for the test "Memory").
The memory value refers to the amount of memory which was additionally allocated during search.
@@ -250,6 +250,7 @@ Extern Projects & Plugins:
- [Module (ESM)](#module-esm)
- [Node.js](#nodejs)
- [Basic Usage and Variants](#basic-usage-and-variants)
- [Common Code Examples (Browser, Node.js)](#common-code-examples)
- [API Overview](#api-overview)
- [Options](doc/options.md)
- [Index Options](doc/options.md)
@@ -261,20 +262,21 @@ Extern Projects & Plugins:
- [Context Search](#context-search)
- [Document Search (Multi-Field Search)](doc/document-search.md)
- [Multi-Tag Search](doc/document-search.md)
- [Phonetic Search (Fuzzy Search)](#fuzzy-search)
- [Tokenizer (Partial Search)](#tokenizer-partial-match)
- [Encoder](doc/encoder.md)
- [Universal Charset Collection](doc/encoder.md)
- [Latin Charset Encoder Presets](doc/encoder.md)
- [Language Specific Preset](doc/encoder.md)
- [Custom Encoder](doc/encoder.md#custom-encoder)
- [Non-Blocking Runtime Balancer (Async)](doc/async.md)
- [Worker Indexes](doc/worker.md)
- [Resolver (Complex Queries)](doc/resolver.md)
- [Boolean Operations (and, or, xor, not)](doc/resolver.md)
- [Boost](doc/resolver.md)
- [Limit / Offset](doc/resolver.md)
- [Resolve](doc/resolver.md)
- [Auto-Balanced Cache by Popularity/Last Query](#auto-balanced-cache-by-popularity)
- [Export / Import Indexes](doc/export-import.md)
- [Fast-Boot Serialization](doc/export-import.md#fast-boot-serialization-for-server-side-rendering-php-python-ruby-rust-java-go-nodejs-)
- [Persistent Indexes](doc/persistent.md)
@@ -285,14 +287,12 @@ Extern Projects & Plugins:
- [SQLite](doc/persistent-sqlite.md)
- [Clickhouse](doc/persistent-clickhouse.md)
- [Result Highlighting](doc/result-highlighting.md)
- [Custom Score Function](doc/customization.md)
- [Custom Builds](doc/custom-builds.md)
- [Extended Keystores (In-Memory)](doc/keystore.md)

## Load Library (Node.js, ESM, Legacy Browser)

<!--
> Please do not use the `/src/` folder of this repo. It isn't meant to be used directly; instead it needs [conditional compilation](https://en.wikipedia.org/wiki/Conditional_compilation). You can easily perform a <a href="#builds">custom build</a>, but you shouldn't use the source folder for production. You will need at least some kind of compiler which resolves the compiler flags within the code, like [Closure Compiler](https://github.com/google/closure-compiler) (Advanced Compilation) or the [Babel conditional-compile](https://github.com/brianZeng/babel-plugin-conditional-compile) plugin. The `/dist/` folder contains every version you will probably need, including unminified ESM modules.
-->

```bash
npm install flexsearch
```
@@ -301,7 +301,7 @@ The **_dist_** folder is located in: `node_modules/flexsearch/dist/`
<details>
<summary>Download Builds</summary>
<br>
<table>
<tr></tr>
<tr>
@@ -397,8 +397,9 @@ The **_dist_** folder is located in: `node_modules/flexsearch/dist/`
<a name="bundles"></a>
<details>
<summary>Compare Bundles: Light, Compact, Bundle</summary>
<br>

> The Node.js package includes all features.

<table>
<tr></tr>
@@ -419,7 +420,7 @@ The **_dist_** folder is located in: `node_modules/flexsearch/dist/`
<tr></tr>
<tr>
<td>
<a href="doc/async.md">Async Processing</a>
</td>
<td></td>
<td></td>
@@ -428,7 +429,7 @@ The **_dist_** folder is located in: `node_modules/flexsearch/dist/`
<tr></tr>
<tr>
<td>
<a href="doc/worker.md">Workers (Web + Node.js)</a>
</td>
<td></td>
<td>-</td>
@@ -437,7 +438,7 @@ The **_dist_** folder is located in: `node_modules/flexsearch/dist/`
<tr></tr>
<tr>
<td>
<a href="#context-search">Context Search</a>
</td>
<td></td>
<td></td>
@@ -446,7 +447,7 @@ The **_dist_** folder is located in: `node_modules/flexsearch/dist/`
<tr></tr>
<tr>
<td>
<a href="doc/document-search.md">Document Search</a>
</td>
<td></td>
<td></td>
@@ -455,7 +456,7 @@ The **_dist_** folder is located in: `node_modules/flexsearch/dist/`
<tr></tr>
<tr>
<td>
<a href="doc/document-search.md#store">Document Store</a>
</td>
<td></td>
<td></td>
@@ -473,16 +474,7 @@ The **_dist_** folder is located in: `node_modules/flexsearch/dist/`
<tr></tr>
<tr>
<td>
<a href="doc/cache.md">Auto-Balanced Cache by Popularity/Last Queries</a>
</td>
<td></td>
<td></td>
@@ -491,7 +483,7 @@ The **_dist_** folder is located in: `node_modules/flexsearch/dist/`
<tr></tr>
<tr>
<td>
<a href="doc/document-search.md#tag-search">Tag Search</a>
</td>
<td></td>
<td></td>
@@ -509,7 +501,7 @@ The **_dist_** folder is located in: `node_modules/flexsearch/dist/`
<tr></tr>
<tr>
<td>
<a href="#fuzzy-search">Phonetic Search (Fuzzy Search)</a>
</td>
<td></td>
<td></td>
@@ -517,28 +509,28 @@ The **_dist_** folder is located in: `node_modules/flexsearch/dist/`
</tr>
<tr></tr>
<tr>
<td><a href="doc/encoder.md">Encoder</a></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr></tr>
<tr>
<td><a href="doc/export-import.md">Export / Import Indexes</a></td>
<td></td>
<td></td>
<td>-</td>
</tr>
<tr></tr>
<tr>
<td><a href="doc/resolver.md">Resolver</a></td>
<td></td>
<td>-</td>
<td>-</td>
</tr>
<tr></tr>
<tr>
<td><a href="doc/persistent.md">Persistent Index (IndexedDB)</a></td>
<td></td>
<td>-</td>
<td>-</td>
@@ -700,15 +692,15 @@ const index = new FlexSearch.Index(/* ... */);
Or require FlexSearch members separately by:

```js
const { Index, Document, Encoder, Charset, Resolver, Worker } = require("flexsearch");
const index = new Index(/* ... */);
```

When using ESM instead of CommonJS:

```js
import { Index, Document, Encoder, Charset, Resolver, Worker } from "flexsearch";
const index = new Index(/* ... */);
```
Language packs are accessible via: Language packs are accessible via:
@@ -746,19 +738,17 @@ index.add(id, text);
const result = index.search(text, options);
```

```js
document.add(doc);
const result = document.search(text, options);
```

```js
await worker.add(id, text);
const result = await worker.search(text, options);
```

> Every method called on a `Worker` index is treated as async. You will get back a `Promise`, or you can alternatively provide a callback function as the last parameter.
### Common Code Examples
@@ -766,7 +756,7 @@ The documentation will refer to several examples. A list of all examples:
<a name="examples-nodejs"></a>
<details>
<summary>Examples Node.js (CommonJS)</summary><br>

- [basic](example/nodejs-commonjs/basic)
- [basic-suggestion](example/nodejs-commonjs/basic-suggestion)
@@ -786,7 +776,7 @@ The documentation will refer to several examples. A list of all examples:
</details>

<details>
<summary>Examples Node.js (ESM/Module)</summary><br>

- [basic](example/nodejs-esm/basic)
- [basic-suggestion](example/nodejs-esm/basic-suggestion)
@@ -808,7 +798,7 @@ The documentation will refer to several examples. A list of all examples:
<a name="examples-browser"></a>
<details>
<summary>Examples Browser (Legacy)</summary><br>

- [basic](example/browser-legacy/basic)
- [basic-suggestion](example/browser-legacy/basic-suggestion)
@@ -823,7 +813,7 @@ The documentation will refer to several examples. A list of all examples:
</details>

<details>
<summary>Examples Browser (ESM/Module)</summary><br>

- [basic](example/browser-module/basic)
- [basic-suggestion](example/browser-module/basic-suggestion)
@@ -888,19 +878,19 @@ Global Members:
`Document` Methods:

- document.<a href="#document.add">__add__</a>(\<id\>, document)\
- ~~document.<a href="#document.append">__append__</a>(\<id\>, document)~~\
- document.<a href="#document.update">__update__</a>(\<id\>, document)\
- document.<a href="#document.remove">__remove__</a>(id)\
- document.<a href="#document.remove">__remove__</a>(document)\
- document.<a href="#document.search">__search__</a>(string, \<limit\>, \<options\>)\
- document.<a href="#document.search">__search__</a>(options)\
- document.<a href="#document.searchCache">__searchCache__</a>(...)\
- document.<a href="#document.contain">__contain__</a>(id)\
- document.<a href="#document.clear">__clear__</a>()\
- document.<a href="#index.cleanup">__cleanup__</a>()\
- document.<a href="#document.get">__get__</a>(id)\
- document.<a href="#document.get">__set__</a>(\<id\>, document)\
- <small>_async_</small> document.<a href="#document.export">__export__</a>(handler)
@@ -970,12 +960,14 @@ Methods `export` and also `import` are always async as well as every method you
---

`Charset` Universal Encoder Preset:

- Charset.<a href="#charset">__Exact__</a>
- Charset.<a href="#charset">__Default__</a>
- Charset.<a href="#charset">__Normalize__</a>
- Charset.<a href="#charset">__Dedupe__</a>

`Charset` Latin-specific Encoder Preset:

- Charset.<a href="#charset">__LatinBalance__</a>
- Charset.<a href="#charset">__LatinAdvanced__</a>
@@ -1064,53 +1056,55 @@ Encoding is one of the most important task and heavily influence:
<tr>
<td>Option</td>
<td>Description</td>
<td>Charset Type</td>
<td>Compression Ratio</td>
</tr>
<tr>
<td><code>Exact</code></td>
<td>Bypass encoding and take exact input</td>
<td>Universal (multi-lang)</td>
<td>0%</td>
</tr>
<tr></tr>
<tr>
<td><code>Normalize (Default)</code></td>
<td>Case-insensitive encoding<br>Charset normalization<br>Letter deduplication</td>
<td>Universal (multi-lang)</td>
<td>~ 7%</td>
</tr>
<tr></tr>
<tr>
<td><code>LatinBalance</code></td>
<td>Case-insensitive encoding<br>Charset normalization<br>Letter deduplication<br>Phonetic basic transformation</td>
<td>Latin</td>
<td>~ 30%</td>
</tr>
<tr></tr>
<tr>
<td><code>LatinAdvanced</code></td>
<td>Case-insensitive encoding<br>Charset normalization<br>Letter deduplication<br>Phonetic advanced transformation</td>
<td>Latin</td>
<td>~ 45%</td>
</tr>
<tr></tr>
<tr>
<td><code>LatinExtra</code></td>
<td>Case-insensitive encoding<br>Charset normalization<br>Letter deduplication<br>Soundex-like transformation</td>
<td>Latin</td>
<td>~ 60%</td>
</tr>
<tr></tr>
<tr>
<td><code>LatinSoundex</code></td>
<td>Full Soundex transformation</td>
<td>Latin</td>
<td>~ 70%</td>
</tr>
<tr></tr>
<tr>
<td><code>function(str) => [str]</code></td>
<td>Pass a custom encoding function to the <code>Encoder</code></td>
<td>Latin</td>
<td></td>
</tr>
</table>
@@ -1121,22 +1115,22 @@ Encoding is one of the most important task and heavily influence:
#### Create a new index

```js
const index = new Index();
```

Create a new index and choose one of the [Presets](#presets):

```js
const index = new Index("match");
```

Create a new index with custom options:

```js
const index = new Index({
  tokenize: "forward",
  resolution: 9,
  fastupdate: true
});
```
@@ -1150,6 +1144,19 @@ var index = new FlexSearch({
});
```
Create a new index and assign an [Encoder](doc/encoder.md):

```js
// import { Charset } from "./dist/module/charset.js";
import { Charset } from "flexsearch";

const index = new Index({
  tokenize: "forward",
  encoder: Charset.LatinBalance
});
```
The resolution refers to the maximum number of scoring slots into which the content is divided.

> A formula to determine a well-balanced value for the `resolution` is: $2*floor(\sqrt{content.length})$ where `content` is the value pushed by `index.add()`. Here the maximum length of all contents should be used.
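The formula above can be wrapped in a small helper; `balancedResolution` is a hypothetical name for illustration, not part of the FlexSearch API:

```js
// Hypothetical helper: derive a balanced `resolution` value from the
// contents you plan to index, using 2 * floor(sqrt(maxLength)).
function balancedResolution(contents){
  // use the maximum length across all contents, as recommended above
  const maxLength = Math.max(...contents.map(str => str.length));
  return 2 * Math.floor(Math.sqrt(maxLength));
}

// e.g. a longest content of 36 characters yields a resolution of 12
console.log(balancedResolution(["short text", "a considerably longer content string"]));
```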
@@ -1182,6 +1189,7 @@ Limit the result:
index.search("John", 10);
```

<a name="index.contain"></a>
#### Check existence of already indexed IDs

You can check if an ID was already indexed by:
@@ -1192,34 +1200,6 @@ if(index.contain(1)){
}
```
<a name="index.update"></a>
#### Update item from an index
@@ -1238,24 +1218,136 @@ index.update(0, "Max Miller");
index.remove(0);
```

<a name="chaining"></a>
### Chaining

Simply chain methods like:

```js
const index = Index.create().addMatcher({'â': 'a'}).add(0, 'foo').add(1, 'bar');
```

```js
index.remove(0).update(1, 'foo').add(2, 'foobar');
```
## Suggestions

Any query on each of the index types supports the option `suggest: true`. Within some of the `Resolver` stages (and, not, xor) you can also add this option for the same purpose.

When suggestion is enabled, results are allowed which do not perfectly match the given query, e.g. when one term was not included. The suggestion search keeps track of the scoring, therefore the first result entry is the closest one to a perfect match.

```js
const index = Index.create().add(1, "cat dog bird");
const result = index.search("cat fish");
// result => []
```

Same query with suggestion enabled:

```js
const result = index.search("cat fish", { suggest: true });
// result => [ 1 ]
```

At least one match (or partial match) has to be found to get back any result:

```js
const result = index.search("horse fish", { suggest: true });
// result => []
```
## Fuzzy-Search

Fuzzy search describes a basic concept of making queries more tolerant. FlexSearch provides several methods to achieve fuzziness:

1. Use a tokenizer: `forward`, `reverse` or `full`
2. Use one of the built-in encoders `simple` > `balance` > `advanced` > `extra` > `soundex` (sorted by fuzziness)
3. Use one of the language-specific presets, e.g. `/lang/en.js` for en-US specific content
4. Enable suggestions by passing the search option `suggest: true`

Additionally, you can apply a custom `Mapper`, `Replacer`, `Stemmer`, `Filter`, or assign a custom `normalize(str)`, `prepare(str)` or `finalize(arr)` function to the Encoder.
### Compare Built-In Encoder Presets

Original term which was indexed: "Struldbrugs"
<table>
<tr>
<th align="left">Encoder:</th>
<th><code>Exact</code></th>
<th><code>Normalize (Default)</code></th>
<th><code>LatinBalance</code></th>
<th><code>LatinAdvanced</code></th>
<th><code>LatinExtra</code></th>
<th><code>LatinSoundex</code></th>
</tr>
<tr>
<th align="left">Index Size</th>
<th>3.1 Mb</th>
<th>1.9 Mb</th>
<th>1.7 Mb</th>
<th>1.6 Mb</th>
<th>1.1 Mb</th>
<th>0.7 Mb</th>
</tr>
<tr>
<td align="left">Struldbrugs</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td align="left">strũlldbrųĝgs</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td align="left">strultbrooks</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td align="left">shtruhldbrohkz</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td align="left">zdroltbrykz</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td align="left">struhlbrogger</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</table>
The index size was measured after indexing the book "Gulliver's Travels".
<a name="context-search"></a>
## Context Search
@@ -1292,6 +1384,24 @@ var index = new FlexSearch({
> The contextual index requires an <a href="#memory">additional amount of memory</a> depending on depth.
## Auto-Balanced Cache (By Popularity)

You need to initialize the cache and its limit of available cache slots during the creation of the index:

```js
const index = new Index({ cache: 100 });
```

> The method `.searchCache(query)` is available for each type of index.

```js
const results = index.searchCache(query);
```

> The cache automatically balances stored entries by their popularity.

The cache also stores the latest queries. A common scenario is autocomplete or instant search while typing.
## Index Memory Allocation

The book "Gulliver's Travels" (Jonathan Swift, 1726) was indexed for this test.
@@ -1331,9 +1441,18 @@ You can pass a preset during creation/initialization of the index.
## Best Practices

### Page-Load / Fast-Boot

There are several options to optimize either the page load or the boot-up and population of an index on the server side:

- Use [Fast-Boot Serialization](doc/export-import.md#fast-boot-serialization-for-server-side-rendering-php-python-ruby-rust-java-go-nodejs-) for small and simple indexes
- Use the [Non-Blocking Runtime Balancer (Async)](doc/async.md) for populating larger amounts of content while running other processes in parallel
- Use [Worker Indexes](doc/worker.md) to distribute the workload to dedicated, balanced threads
- Use [Persistent Indexes](doc/persistent.md) when targeting a zero-latency boot-up

### Use numeric IDs

It is recommended to use ID values of type `number` as references when adding content to the index. The reserved byte length of passed IDs influences the memory consumption significantly. If stringified numeric IDs are included in your datasets, consider converting them with `parseInt(...)` before pushing them to the index.
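A minimal sketch of that conversion (the dataset and field names are made up for illustration):

```js
// Dataset with stringified numeric IDs (hypothetical example data)
const docs = [
  { id: "1001", content: "first entry" },
  { id: "1002", content: "second entry" }
];

// Convert each ID to a number once, before pushing to the index;
// numeric keys reserve significantly less memory than string keys.
const normalized = docs.map(doc => ({ ...doc, id: parseInt(doc.id, 10) }));

console.log(typeof normalized[0].id); // => "number"
```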
---


@@ -1,17 +0,0 @@
## Auto-Balanced Cache (By Popularity)
You need to initialize the cache and its limit during the creation of the index:
```js
const index = new Index({ cache: 100 });
```
```js
const results = index.searchCache(query);
```
A common scenario for using a cache is an autocomplete or instant search when typing.
> When passing a number as a limit, the cache automatically balances stored entries by their popularity.
> When just passing `true`, the cache is unbounded and actually performs 2-3 times faster (because the balancer does not have to run).


@@ -12,7 +12,7 @@ You can't resolve build flags with:
- rollup
- Terser

You can run any of the basic builds located in the `/dist/` folder, e.g.:

```bash
npm run build:bundle


@@ -1,10 +1,24 @@
# Document Search (Field-Search)
Whereas the simple `Index` can just consume id-content pairs, the `Document`-Index is able to process more complex data structures like JSON.

Technically, a `Document`-Index is a layer on top of several standard indexes. You can create multiple independent Document-Indexes in parallel; each of them can optionally use the `Worker` or `Persistent` model.

FlexSearch Documents also provide these features:

- Document Store including Enrichment
- Multi-Field-Search
- Multi-Tag-Search
- Resolver (Chain Complex Queries)
- Result Highlighting
- Export/Import
- Worker
- Persistent
## The Document Descriptor
When creating a `Document`-Index you need to define a document descriptor in the field `document`. This descriptor includes all specific information about how the document data should be indexed.
Assuming our document has a simple data structure like this:
```json
{
@@ -13,42 +27,32 @@ Assuming our document has a data structure like this:
}
```
An appropriate Document Descriptor always has to define at least two things:

1. the property `id`, which describes the location of the document ID within a document item
2. the property `index` (or `tag`), containing one or more fields from the document which should be indexed for searching
```js
// create a document index
const index = new Document({
  document: {
    id: "id",
    index: "content"
  }
});

// add documents to the index
index.add({
  id: 0,
  content: "some text"
});
```
As briefly explained above, the field `id` describes where the ID or unique key lives inside your documents. When not passed, it will always take the field `id` from the top level of your data.

The property `index` takes all fields you would like to have indexed. When selecting just one field, you can pass a string.

The next example will add the 2 fields `title` and `content` to the index:
```js
var docs = [{
@@ -69,7 +73,7 @@ const index = new Document({
});
```

Add both fields to the document descriptor and pass individual [Index-Options](options.md) for each field:

```js
const index = new Document({
@@ -77,47 +81,37 @@ const index = new Document({
  index: [{
    field: "title",
    tokenize: "forward",
    encoder: Charset.LatinAdvanced,
    resolution: 9
  },{
    field: "content",
    tokenize: "forward",
    encoder: Charset.LatinAdvanced,
    resolution: 3
  }]
});
```
Field options inherit from the top-level options when passed, e.g.:
```js
const index = new Document({
    tokenize: "forward",
    encoder: Charset.LatinAdvanced,
    resolution: 9,
    document: {
        id: "id",
        index: [{
            field: "title"
        },{
            field: "content",
            resolution: 3
        }]
    }
});
```
> Assigning the `Encoder` instance to the top-level configuration will share the encoder across all fields. You should avoid this when the fields don't contain the same type of content (e.g. one field contains terms, another contains numeric IDs).
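As a sketch of the alternative, each field can get its own `Encoder` preset so that differently-typed contents stay separated (the field names and preset choices here are illustrative assumptions):

```javascript
import { Document, Charset } from "flexsearch";

const index = new Document({
    document: {
        id: "id",
        index: [{
            field: "title",
            encoder: Charset.LatinAdvanced // fuzzy matching for terms
        },{
            field: "sku",
            encoder: Charset.Exact // exact matching for codes/IDs
        }]
    }
});
```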
### Nested Data Fields (Complex Objects)
@@ -136,7 +130,7 @@ Assume the document array looks more complex (has nested branches etc.), e.g.:
}
```

Then use the colon-separated notation `root:child:child` as the name for each field to define the hierarchy which corresponds to the document:
```js
const index = new Document({
@@ -150,9 +144,11 @@ const index = new Document({
    }
});
```
> [!TIP]
> Just add fields you want to query against. Do not add fields to the index which you only need in the result (but never query against). For this purpose you can store documents independently of its index (read below).

To query against one or multiple specific fields you have to pass the exact key of the field you have defined in the document descriptor as a field name (with colon syntax):
```js
index.search(query, {
@@ -176,6 +172,20 @@ index.search(query, [
Using field-specific options:
```js
index.search("some query", [{
    field: "record:title",
    limit: 100,
    suggest: true
},{
    field: "record:content:header",
    limit: 100,
    suggest: false
}]);
```
You can also perform a search through the same field with different queries:
```js
index.search([{
    field: "record:title",
@@ -190,15 +200,11 @@ index.search([{
}]);
```

> When passing field-specific options you need to provide the full configuration for each field. They are not inherited like the document descriptor.
### Complex Documents

You need to follow 2 rules for your documents:

1. The document cannot start with an Array __at the root__. This will introduce sequential data and isn't supported yet. See below for a workaround for such data.
```js
[ // <-- not allowed as document start!
@@ -209,7 +215,7 @@ You need to follow 2 rules for your documents:
]
```

2. The document ID can't be nested __inside an Array__ (none of the parent fields may be an Array either). This will introduce sequential data and isn't supported yet. See below for a workaround for such data.
```js
{
@@ -255,27 +261,29 @@ The corresponding document descriptor (when all fields should be indexed) looks
const index = new Document({
    document: {
        id: "meta:id",
        index: [
            "contents:body:title",
            "contents:body:footer"
        ],
        tag: [
            "meta:tag",
            "contents:keywords"
        ]
    }
});
```
Remember, when searching you have to use the same colon-separated string as the key from your field definition.
```js
index.search(query, {
    index: "contents:body:title"
});
```
### Not Supported Documents (Sequential Data)

This example breaks both rules described above:
```js
[ // <-- not allowed as document start!
@@ -303,90 +311,83 @@ This example breaks both rules from above:
]
```

You need to unroll your data within a simple loop before adding it to the index. A workaround for the data structure from above could look like this:
```js
const index = new Document({
    document: {
        id: "id",
        index: [
            "body:title",
            "body:footer"
        ],
        tag: [
            "tag",
            "keywords"
        ]
    }
});

function add(sequential_data){
    for(let x = 0, item; x < sequential_data.length; x++){
        item = sequential_data[x];
        for(let y = 0, record; y < item.records.length; y++){
            record = item.records[y];
            // append tag to each record
            record.tag = item.tag;
            // add to index
            index.add(record);
        }
    }
}

// now just use the add() helper method as usual:
add([{
    // sequential structured data
    // take the data example above
}]);
```
### Add/Update/Remove Documents

Add a document to the index:

```js
index.add({
    id: 0,
    title: "Foo",
    content: "Bar"
});
```

Update index:

```js
index.update({
    id: 0,
    title: "Foo",
    content: "Foobar"
});
```

Remove a document and all of its contents from the index by ID:

```js
index.remove(id);
```

Or by the document data:

```js
index.remove(doc);
```

<!--
### Join / Append Arrays

On the complex example above, the field `keywords` is an array but here the markup did not have brackets like `keywords[]`. That will also detect the array, but instead of appending each entry to a new context, the array will be joined into one large string and added to the index.
@@ -396,8 +397,9 @@ The difference of both kinds of adding array contents is the relevance when sear
So assuming the keywords from the example above are pre-sorted by relevance to their popularity, then you want to keep this order (the information of relevance). For this purpose do not add brackets to the notation. Otherwise, it would take the entries in a new scoring context (the old order is getting lost).

Also you can leave out the bracket notation for better performance and a smaller memory footprint. Use it when you do not need the granularity of relevance by the entries.
-->
## Field-Search

Search through all fields:
@@ -417,13 +419,7 @@ Search through a given set of fields:
index.search(query, { index: ["title", "content"] });
```

Pass custom options and/or queries to each field:
```js
index.search([{
@@ -439,11 +435,21 @@ index.search([{
}]);
```

### Limit & Offset

> By default, every query is limited to 100 entries. Unbounded queries lead to issues. You need to set the limit as an option to adjust the size.

You can set the limit and the offset for each query:
```js
index.search(query, { limit: 20, offset: 100 });
```
> You cannot pre-count the size of the result-set. That's a limit by the design of FlexSearch. When you really need a count of all results you are able to page through, then just assign a high enough limit, get back all results and apply your paging offset manually (this also works server-side). FlexSearch is fast enough that this isn't an issue.
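Since the result size cannot be pre-counted, manual paging as described above can be sketched like this (the `paginate` helper is hypothetical, not part of FlexSearch):

```javascript
// fetch once with a high enough limit, then slice pages locally
function paginate(results, page, pageSize){
    const offset = page * pageSize;
    return results.slice(offset, offset + pageSize);
}

// stand-in for a large id list returned by index.search()
const all = Array.from({ length: 250 }, (_, i) => i);
console.log(paginate(all, 0, 100).length); // 100
console.log(paginate(all, 2, 100).length); // 50
```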
[See all available field-search options](options.md)
## The Result Set
Schema of the result-set:
@@ -566,51 +572,98 @@ index.search(query, {
This gives you results which are tagged with one of the given tags.
> Multiple tags will apply as the boolean "or" by default. It just needs one of the tags to exist.

This is another situation where the `bool` property is still supported. When you like to switch the default "or" logic of the tag search into "and", e.g.:
```js
index.search(query, {
    index: "content",
    tag: ["dog", "animal"],
    bool: "and"
});
```
You will just get results which contain both tags (in this example there is just one record which has the tags "dog" and "animal").
<a name="tag-search"></a>
## Multi-Tag-Search

Assume this document schema (a dataset from IMDB):
```js
{
    "tconst": "tt0000001",
    "titleType": "short",
    "primaryTitle": "Carmencita",
    "originalTitle": "Carmencita",
    "isAdult": 0,
    "startYear": "1894",
    "endYear": "",
    "runtimeMinutes": "1",
    "genres": [
        "Documentary",
        "Short"
    ]
}
```

An appropriate document descriptor could look like:

```js
import { Document, Charset } from "flexsearch";

const flexsearch = new Document({
    encoder: Charset.Normalize,
    resolution: 3,
    document: {
        id: "tconst",
        //store: true, // document store
        index: [{
            field: "primaryTitle",
            tokenize: "forward"
        },{
            field: "originalTitle",
            tokenize: "forward"
        }],
        tag: [
            "startYear",
            "genres"
        ]
    }
});
```
The field contents of `primaryTitle` and `originalTitle` are tokenized by the `forward` tokenizer. The field contents of `startYear` and `genres` are added as tags.
Get all entries of a specific tag:
```js
const result = flexsearch.search({
    //enrich: true, // enrich documents
    tag: { "genres": "Documentary" },
    limit: 1000,
    offset: 0
});
``` ```
Get entries of multiple tags (intersection):

```js
const result = flexsearch.search({
    //enrich: true, // enrich documents
    tag: {
        "genres": ["Documentary", "Short"],
        "startYear": "1894"
    }
});
```

Combine tags with queries (intersection):
```js
const result = flexsearch.search({
    query: "Carmen", // forward tokenizer
    tag: {
        "genres": ["Documentary", "Short"],
        "startYear": "1894"
    }
});
```
Alternative declaration:
```js
const result = flexsearch.search("Carmen", {
    tag: [{
        field: "genres",
        tag: ["Documentary", "Short"]
    },{
        field: "startYear",
        tag: "1894"
    }]
});
```
## Document Store
@@ -770,99 +823,6 @@ By passing the search option `merge: true` the result set will be merged into (g
}]
```
## Filter Fields (Index / Tags / Datastore)

```js
@@ -898,7 +858,6 @@ const flexsearch = new Document({
});
```

## Custom Fields (Index / Tags / Datastore)

Dataset example:
@@ -979,3 +938,6 @@ const result = flexsearch.search({
});
```
### Best Practices: Merge Documents
[Read here](encoder.md#merge-documents)

doc/encoder-workflow.svg (new file, 26 KiB, diff suppressed)

File diff suppressed because it is too large
@@ -1,109 +0,0 @@
## Fuzzy-Search
Fuzzy search describes a basic concept of making queries more tolerant. FlexSearch provides several methods to achieve fuzziness:
1. Use a tokenizer: `forward`, `reverse` or `full`
2. Don't forget to use any of the builtin encoders `simple` > `balance` > `advanced` > `extra` > `soundex` (sorted by fuzziness)
3. Use one of the language specific presets e.g. `/lang/en.js` for en-US specific content
4. Enable suggestions by passing the search option `suggest: true`
Additionally, you can apply a custom `Mapper`, `Replacer`, `Stemmer` or `Filter`, or assign a custom `normalize(str)`, `prepare(str)` or `finalize(arr)` function to the Encoder.
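A sketch of wiring some of these hooks into a custom `Encoder` (the concrete stopwords and mappings are illustrative assumptions, not recommended presets):

```javascript
import { Encoder, Index } from "flexsearch";

const encoder = new Encoder({
    // stopwords: completely filtered out from indexing
    filter: new Set(["and", "the", "of"]),
    // partial normalization: strip or replace word endings
    stemmer: new Map([["ization", "ize"], ["ing", ""]]),
    // replace all occurrences regardless of their position
    matcher: new Map([["xvi", "16"]]),
    // custom normalization applied before tokenization
    normalize: str => str.toLowerCase()
});

// pass the custom encoder to an index:
const index = new Index({ encoder });
```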
### Compare Fuzzy-Search Encoding
Original term which was indexed: "Struldbrugs"
<table>
<tr>
<th align="left">Encoder:</th>
<th><code>LatinExact</code></th>
<th><code>LatinDefault</code></th>
<th><code>LatinSimple</code></th>
<th><code>LatinBalance</code></th>
<th><code>LatinAdvanced</code></th>
<th><code>LatinExtra</code></th>
<th><code>LatinSoundex</code></th>
</tr>
<tr>
<th align="left">Index Size</th>
<th>3.1 Mb</th>
<th>1.9 Mb</th>
<th>1.8 Mb</th>
<th>1.7 Mb</th>
<th>1.6 Mb</th>
<th>1.1 Mb</th>
<th>0.7 Mb</th>
</tr>
<tr>
<td align="left">Struldbrugs</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td align="left">struldbrugs</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td align="left">strũldbrųĝgs</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td align="left">strultbrooks</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td align="left">shtruhldbrohkz</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td align="left">zdroltbrykz</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td align="left">struhlbrogger</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</table>
The index size was measured after indexing the book "Gulliver's Travels".


@@ -402,3 +402,43 @@
<td>"or"</td>
</tr>
</table>
## Encoder Options
<table>
<tr></tr>
<tr>
<td>Field</td>
<td>Category</td>
<td>Description</td>
</tr>
<tr>
<td><b>encode</b></td>
<td>charset</td>
<td>The encoder function. Has to return an array of separated words (or an empty string).</td>
</tr>
<tr></tr>
<tr>
<td><b>rtl</b></td>
<td>charset</td>
<td>A boolean property which indicates right-to-left encoding.</td>
</tr>
<tr></tr>
<tr>
<td><b>filter</b></td>
<td>language</td>
<td>Filters are also known as "stopwords"; they completely filter out words from being indexed.</td>
</tr>
<tr></tr>
<tr>
<td><b>stemmer</b></td>
<td>language</td>
<td>A stemmer removes word endings and is a kind of "partial normalization". A word ending is only matched when the word length is bigger than the matched partial.</td>
</tr>
<tr></tr>
<tr>
<td><b>matcher</b></td>
<td>language</td>
<td>A matcher replaces all occurrences of a given string regardless of its position and is also a kind of "partial normalization".</td>
</tr>
</table>
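To illustrate the `encode` contract from the table above, a standalone sketch of such a function (the split pattern is an illustrative choice; the contract only requires returning an array of separated words):

```javascript
// returns an array of separated words (lowercased),
// with empty strings filtered out
function encode(str){
    return str.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

console.log(encode("Hello, World!")); // ["hello", "world"]
```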