
update readme part 2 of 2

Commit 2ef17eacaf (parent b75fff8937) by Thomas Wilkerling, 2025-03-30 16:49:53 +02:00
8 changed files with 395 additions and 363 deletions

---

**README.md**

FlexSearch v0.8: [Overview and Migration Guide](doc/0.8.0.md)
[Resolver](doc/resolver.md)  • 
[Changelog](CHANGELOG.md)
## Please Support this Project
FlexSearch has been helping developers around the world build powerful, efficient search functionality for years. Maintaining and improving the library requires significant time and resources. If you've found this project valuable, please consider supporting it with a donation. Thanks a lot for your continued support!
- [Persistent Options](doc/options.md)
- [Encoder Options](doc/options.md)
- [Resolver Options](doc/options.md)
- [Presets](#presets)
- [Context Search](#context-search)
- [Fast-Update Mode](#fast-update-mode)
- [Suggestions](#suggestions)
- [Document Search (Multi-Field Search)](doc/document-search.md)
- [Multi-Tag Search](doc/document-search.md)
- [Phonetic Search (Fuzzy Search)](#fuzzy-search)
- [Tokenizer (Partial Search)](#tokenizer-partial-match)
- [Charset Collection](#charset-collection)
- [Encoder](doc/encoder.md)
- [Universal Charset Collection](doc/encoder.md)
- [Latin Charset Encoder Presets](doc/encoder.md)
- [Custom Score Function](doc/customization.md)
- [Custom Builds](doc/custom-builds.md)
- [Extended Keystores (In-Memory)](doc/keystore.md)
- [Best Practices](#best-practices)
- [Page-Load / Fast-Boot](#page-load--fast-boot)
- [Use numeric IDs](#use-numeric-ids)
## Load Library (Node.js, ESM, Legacy Browser)
<a name="api"></a>
## API Overview
Constructors:
- new [**Index**](#basic-usage)(\<options\>) : <small>_index_</small>
- new [**Document**](doc/document-search.md)(options) : <small>_document_</small>
- new [**Worker**](doc/worker.md)(\<options\>) : <small>_worker_</small>
- new [**Encoder**](doc/encoder.md)(\<options\>, \<options\>, ...) : <small>_encoder_</small>
- new [**Resolver**](doc/resolver.md)(\<options\>) : <small>_resolver_</small>
- new [**IndexedDB**](doc/persistent-indexeddb.md)(\<options\>) : <small>_indexeddb_</small>
---
Global Members:
- [**Charset**](#charset-collection)
- [**Language**](doc/encoder.md#built-in-language-packs) (Legacy Browser Only)
---
`Index` / `Worker`-Index Methods:
- index.[**add**](#add-text-item-to-an-index)(id, string)
- ~~index.**append**(id, string)~~
- index.[**update**](#update-item-from-an-index)(id, string)
- index.[**remove**](#remove-item-from-an-index)(id)
- index.[**search**](#search-items)(string, \<limit\>, \<options\>)
- index.[**search**](#search-items)(options)
- index.[**searchCache**](#auto-balanced-cache-by-popularity)(...)
- index.[**contain**](#check-existence-of-already-indexed-ids)(id)
- index.[**clear**](#clear-all-items-from-an-index)()
- index.[**cleanup**](#fast-update-mode)()
- <small>_async_</small> index.[**export**](doc/export-import.md)(handler)
- <small>_async_</small> index.[**import**](doc/export-import.md)(key, data)
- <small>_async_</small> index.[**serialize**](doc/export-import.md#fast-boot-serialization-for-server-side-rendering-php-python-ruby-rust-java-go-nodejs-)(boolean)
- <small>_async_</small> index.[**mount**](doc/persistent.md)(db)
- <small>_async_</small> index.[**commit**](doc/persistent.md)(boolean)
- <small>_async_</small> index.[**destroy**](doc/persistent.md#delete-store--migration)()
---
`Document` Methods:
- document.[**add**](doc/document-search.md#addupdateremove-documents)(\<id\>, document)
- ~~document.**append**(\<id\>, document)~~
- document.[**update**](doc/document-search.md#addupdateremove-documents)(\<id\>, document)
- document.[**remove**](doc/document-search.md#addupdateremove-documents)(id)
- document.[**remove**](doc/document-search.md#addupdateremove-documents)(document)
- document.[**search**](doc/document-search.md#document-search-field-search)(string, \<limit\>, \<options\>)
- document.[**search**](doc/document-search.md#document-search-field-search)(options)
- document.[**searchCache**](#auto-balanced-cache-by-popularity)(...)
- document.[**contain**](doc/document-search.md)(id)
- document.[**clear**](doc/document-search.md)()
- document.[**cleanup**](#fast-update-mode)()
- document.[**get**](doc/document-search.md#document-store)(id)
- document.[**set**](doc/document-search.md#document-store)(\<id\>, document)
- <small>_async_</small> document.[**export**](doc/export-import.md)(handler)
- <small>_async_</small> document.[**import**](doc/export-import.md)(key, data)
- <small>_async_</small> document.[**mount**](doc/persistent.md)(db)
- <small>_async_</small> document.[**commit**](doc/persistent.md)(boolean)
- <small>_async_</small> document.[**destroy**](doc/persistent.md#delete-store--migration)()
`Document` Properties:
- document.[**store**](doc/document-search.md#document-store)
---
Async Equivalents (Non-Blocking Balanced):
- <small>_async_</small> [**.addAsync**](doc/async.md)( ... , \<callback\>)
- <small>_async_</small> ~~[**.appendAsync**](doc/async.md)( ... , \<callback\>)~~
- <small>_async_</small> [**.updateAsync**](doc/async.md)( ... , \<callback\>)
- <small>_async_</small> [**.removeAsync**](doc/async.md)( ... , \<callback\>)
- <small>_async_</small> [**.searchAsync**](doc/async.md)( ... , \<callback\>)
Async methods will return a `Promise`; additionally, you can pass a callback function as the last parameter.
The methods `.export()` and `.import()` are always async, as is every method called on a `Worker`-based or `Persistent` index.
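A short sketch of both call styles (method names as listed above):

```js
// Promise style:
const results = await index.searchAsync("John");

// Callback style (callback passed as the last parameter):
index.searchAsync("John", function(results){
    console.log(results);
});
```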
---
`Encoder` Methods:
- encoder.[**encode**](doc/encoder.md)(string)
- encoder.[**assign**](doc/encoder.md)(options, \<options\>, ...)
- encoder.[**addFilter**](doc/encoder.md#add-language-specific-stemmer-andor-filter)(string)
- encoder.[**addStemmer**](doc/encoder.md#add-language-specific-stemmer-andor-filter)(string => boolean)
- encoder.[**addMapper**](doc/encoder.md)(char, char)
- encoder.[**addMatcher**](doc/encoder.md)(string, string)
- encoder.[**addReplacer**](doc/encoder.md)(regex, string)
---
`Resolver` Methods:
- resolver.[**and**](doc/resolver.md)(options)
- resolver.[**or**](doc/resolver.md)(options)
- resolver.[**xor**](doc/resolver.md)(options)
- resolver.[**not**](doc/resolver.md)(options)
- resolver.[**boost**](doc/resolver.md)(number)
- resolver.[**limit**](doc/resolver.md)(number)
- resolver.[**offset**](doc/resolver.md)(number)
- resolver.[**resolve**](doc/resolver.md)(\<options\>)
`Resolver` Properties:
- resolver.[**result**](doc/resolver.md)
---
`StorageInterface` Methods:
- <small>_async_</small> db.[**mount**](doc/persistent.md)(index, \<options\>)
- <small>_async_</small> db.[**open**](doc/persistent.md)()
- <small>_async_</small> db.[**close**](doc/persistent.md)()
- <small>_async_</small> db.[**destroy**](doc/persistent.md)()
- <small>_async_</small> db.[**clear**](doc/persistent.md)()
---
`Charset` Universal Encoder Preset:
- Charset.[**Exact**](#charset-collection)
- Charset.[**Default**](#charset-collection)
- Charset.[**Normalize**](#charset-collection)
`Charset` Latin-specific Encoder Preset:
- Charset.[**LatinBalance**](#charset-collection)
- Charset.[**LatinAdvanced**](#charset-collection)
- Charset.[**LatinExtra**](#charset-collection)
- Charset.[**LatinSoundex**](#charset-collection)
---
`Language` Encoder Preset:
- [**en**](doc/encoder.md#built-in-language-packs)
- [**de**](doc/encoder.md#built-in-language-packs)
- [**fr**](doc/encoder.md#built-in-language-packs)
## Options
- [Index Options](doc/options.md)
- [Context Options](doc/options.md)
- [Document Options](doc/options.md)
- [Encoder Options](doc/encoder.md#property-overview)
- [Resolver Options](doc/options.md)
- [Search Options](doc/options.md)
- [Document Search Options](doc/options.md)
- [Worker Options](doc/options.md)
- [Persistent Options](doc/options.md)
<a name="tokenize"></a>
## Tokenizer (Partial Match)
The tokenizer is one of the most important options and heavily influences memory consumption, query performance, and matching capabilities. Try to choose the uppermost of these tokenizers which covers your requirements:
<table>
<tr>
<td><b>Option</b></td>
<td><b>Description</b></td>
<td><b>Match Example</b></td>
<td>Memory Factor (n = length of term)</td>
</tr>
<tr>
<td><code>"strict"</code><br><code>"exact"</code><br><code>"default"</code></td>
<td>index the full term</td>
<td><a>foobar</a></td>
<td>1</td>
</tr>
<tr></tr>
<tr>
<td><code>"forward"</code></td>
<td>index term in forward direction (supports right-to-left by Index option <code>rtl: true</code>)</td>
<td><a>fo</a>obar<br><a>foob</a>ar<br></td>
<td>n</td>
</tr>
<tr></tr>
<tr>
<td><code>"reverse"</code><br><code>"bidirectional"</code></td>
<td>index term in both directions</td>
<td><a>fo</a>obar<br><a>foob</a>ar<br>foob<a>ar</a><br>fo<a>obar</a></td>
<td>2n - 1</td>
</tr>
<tr></tr>
<tr>
<td><code>"full"</code></td>
<td>index every consecutive partial</td>
<td>fo<a>oba</a>r<br>f<a>oob</a>ar</td>
<td>n * (n - 1)</td>
</tr>
</table>
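The tokenizer is selected via the `tokenize` option on index creation, e.g. (a minimal sketch):

```js
const index = new Index({
    tokenize: "forward"
});
```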
<a name="charset"></a>
## Charset Collection
Encoding is one of the most important tasks and heavily influences memory consumption, query performance, and matching capabilities.
<table>
<tr>
<td><b>Preset</b></td>
<td><b>Description</b></td>
<td><b>Charset</b></td>
<td><b>Index Size</b></td>
</tr>
<tr></tr>
<tr>
<td><code>Normalize</code><br><code>Default</code></td>
<td>Case in-sensitive encoding<br>Charset normalization<br>Letter deduplication</td>
<td>Universal (multi-lang)</td>
<td>~ 7%</td>
</tr>
<tr></tr>
<tr>
<td><code>function(str) => [str]</code></td>
<td>Pass a custom encoding function to the <code>Encoder</code></td>
<td>Latin</td>
<td></td>
<td></td>
</tr>
</table>
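A preset from the collection is assigned via the `encoder` option, e.g. (a minimal sketch; the import style may differ depending on the build you load):

```js
import { Index, Charset } from "flexsearch";

const index = new Index({
    encoder: Charset.LatinBalance
});
```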
## Basic Usage
<a name="flexsearch.create"></a>
#### Create a new index
```js
const index = new Index({
    // options, e.g. raise the scoring resolution
    // (the original example's options were elided):
    resolution: 9
});
```
The resolution refers to the maximum number of scoring slots into which the content is divided.
> A formula to determine a well-balanced value for the `resolution` is: $2\cdot\lfloor\sqrt{content.length}\rfloor$, where content is the value pushed by `index.add()` and the maximum length of all contents should be used. For example, when the longest content is 100 characters long, a balanced resolution is $2\cdot\lfloor\sqrt{100}\rfloor = 20$.
<a href="#options">See all available custom options.</a>
<a name="index.add"></a>
#### Add text item to an index
Every content which should be added to the index needs an ID. When your content has no ID, you need to create one by passing an index, a count or something else as an ID (a value of type `number` is highly recommended). Those IDs are unique references to a given content. This is important when you update or remove content through existing IDs. When referencing is not a concern, you can simply use something like `count++`.
```js
index.add(0, "John Doe");
```
<a name="index.search"></a>
#### Search items
> Index.__search(string | options, \<limit\>, \<options\>)__
Limit the result:
```js
index.search("John", 10);
```
<a name="index.contain"></a>
#### Check existence of already indexed IDs
You can check if an ID was already indexed by:
```js
if(index.contain(1)){
    console.log("ID was found in index");
}
```
<a name="index.update"></a>
#### Update item from an index
> Index.__update(id, string)__
```js
index.update(0, "Max Miller");
```
<a name="index.remove"></a>
#### Remove item from an index
> Index.__remove(id)__
```js
index.remove(0);
```
#### Clear all items from an index
> Index.__clear()__
```js
index.clear();
```
### Chaining
Simply chain methods like:
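A minimal sketch (assuming `add`, `update` and `remove` return the index instance):

```js
const index = new Index();

index.add(0, "cat")
     .add(1, "dog")
     .update(0, "tiger")
     .remove(1);
```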
Original term which was indexed: "Struldbrugs"
The index size was measured after indexing the book "Gulliver's Travels".
## Fast-Update Mode
The default mode is highly optimized for search performance and for adding content to the index. Whenever you need to `update` or `remove` existing contents of an index, you can enable an additional register which boosts those tasks to a high-performance level as well. This register takes an extra amount of memory (~30% increase of index size).
```js
const index = new Index({
    fastupdate: true
});
```
```js
const index = new Document({
    fastupdate: true
});
```
> `Worker`-Index and `Persistent`-Index do not support the `fastupdate` option by design.
When using `fastupdate`, the index won't fully clean up when removing items; a bare remainder of the structure will persist. This is not a memory issue, because this remainder takes less than 1% of the index size. However, the internal performance of key lookups loses efficiency because of unused (empty) keys in the index.
In most cases this is not an issue, but you can trigger a `cleanup` task which will find those empty index slots and remove them:
```js
index.cleanup();
```
> The `cleanup` method has no effect when not using `fastupdate: true`.
<a name="context-search"></a>
## Context Search
The basic idea of this concept is to limit relevance by context instead of calculating relevance through the whole distance of the corresponding document. The context acts like a bidirectional moving window of 2 pointers (terms), which initially can have a maximum distance of the value passed via the option `depth` and dynamically grows on search when the query did not match any results.
<img src="https://cdn.jsdelivr.net/gh/nextapps-de/flexsearch@master/doc/contextual-index.svg?v=4" width="100%">
</p>
<a name="contextual_enable"></a>
### Enable Context-Search
Create an index and use the default context:
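A minimal sketch, assuming `context: true` enables the default context and a custom `depth` can be set via the Context Options:

```js
// use the default context:
const index = new Index({
    tokenize: "strict",
    context: true
});

// or configure the moving window, e.g. a depth of 2:
const index2 = new Index({
    tokenize: "strict",
    context: {
        depth: 2,
        bidirectional: true
    }
});
```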

---

## The Result Set
Schema of the default result-set:
> `fields[] => { field, result[] => id }`
Schema of an enriched result-set:
> `fields[] => { field, result[] => { id, doc }}`
The top-level scope of the result set is an array of fields on which the query was applied. Each of these fields has a record (object) with the two properties `field` and `result`. The `result` is an array of IDs, or gets enriched with the stored document data (when the index was created with `store: true`).
A default non-enriched result set looks like:
```js
[{
    field: "fieldname",
    result: [0, 1, 2]
}]
```
An enriched result set looks like:
```js
[{
    field: "fieldname",
    result: [{
        id: 0,
        doc: { /* stored document */ }
    }]
}]
```
### Merge Document Results
Schema of the merged result-set:
> `result[] => { id, doc, field[] }`
By passing the search option `merge: true` all fields of the result set will be merged (grouped by ID):
```js
[{
    id: 1001,
    doc: {/* stored document */},
    field: ["fieldname-1", "fieldname-2"]
},{
    id: 1002,
    doc: {/* stored document */},
    field: ["fieldname-3"]
}]
```
### Pluck Single Fields
When using `pluck` instead of `field` you can explicitly select just one field and get back a flat representation:
```js
index.search(query, {
    pluck: "title",
    enrich: true
});
```
```js
[
{ id: 0, doc: { /* document */ }},
{ id: 1, doc: { /* document */ }}
]
```
This result set is a replacement for "boolean search". Instead of applying your boolean logic to a nested object, you can apply your logic yourself on top of the result set dynamically. This opens up huge capabilities in how you process the results. The results from the fields aren't squashed into one result anymore; that keeps important information, like the name of the field and the relevance of each field's results, which no longer get mixed.
> A field search will apply a query with the boolean "or" logic by default. Each field has its own result to the given query.
There is one situation where the `bool` property is still supported: when you'd like to switch the default "or" logic of the field search to "and", e.g.:
```js
index.search(query, {
    index: ["title", "content"],
    bool: "and"
});
```
You will just get results which contain the query in both fields. That's it.
### Tags
Like the property `index` within a document descriptor just define a property `tag`:
```js
const index = new Document({
    document: {
        id: "id",
        tag: "species",
        index: "content"
    }
});
```
```js
index.add({
    id: 0,
    species: "cat",
    content: "Some content ..."
});
```
Your data can also include multiple tags as an array:
```js
index.add({
    id: 1,
    species: ["fish", "dog"],
    content: "Some content ..."
});
```
You can perform a tag-specific search by:
```js
index.search(query, {
    tag: { species: "fish" }
});
```
This just gives you results which were tagged with the given tag.
Use multiple tags when searching:
```js
index.search(query, {
    tag: { species: ["cat", "dog"] }
});
```
This gives you results which were tagged with at least one of the given tags.
Get back all tagged results without passing any query:
```js
index.search({
    tag: { species: "cat" }
});
```
### Multi-Tag Search
Assume this document schema (a dataset from IMDB):
```js
// sample record (reconstructed from the examples below):
{
    "tconst": "tt0000001",
    "titleType": "short",
    "primaryTitle": "Carmencita",
    "originalTitle": "Carmencita",
    "startYear": "1894",
    "genres": ["Documentary", "Short"]
}
```
An appropriate document descriptor could look like:
```js
import { Document, Charset } from "flexsearch";

const index = new Document({
    encoder: Charset.Normalize,
    resolution: 3,
    document: {
        id: "tconst",
        // reconstructed descriptor; fields assumed from the examples below:
        index: [{
            field: "primaryTitle",
            tokenize: "forward"
        },{
            field: "originalTitle",
            tokenize: "forward"
        }],
        tag: ["startYear", "genres"]
    }
});
```

The field contents of `primaryTitle` and `originalTitle` are encoded by the forward tokenizer.
Get all entries of a specific tag:
```js
const result = index.search({
    //enrich: true, // enrich documents
    tag: { "genres": "Documentary" },
    limit: 1000
});
```
Get entries of multiple tags (intersection):
```js
const result = index.search({
    //enrich: true, // enrich documents
    tag: {
        "genres": ["Documentary", "Short"]
    }
});
```
Combine tags with queries (intersection):
```js
const result = index.search({
    query: "Carmen", // forward tokenizer
    tag: {
        "genres": ["Documentary", "Short"]
    }
});
```
Alternative declaration:
```js
const result = index.search("Carmen", {
    tag: [{
        field: "genres",
        tag: ["Documentary", "Short"]
    }]
});
```
### Configure Document Store (Recommended)
You can configure independently what should be indexed and what should be stored. This can reduce the required index space significantly. Indexed fields do not need to be included in the stored data (also the ID isn't necessary to keep in the store).
It is recommended to add just those fields to the store which you'll need in the final result for further processing.
A short example of configuring a document store:

```js
const index = new Document({
    document: {
        id: "id",
        index: "content",
        store: ["author", "email"]
    }
});

// illustrative document:
index.add({
    id: 0,
    author: "Jon Doe",
    email: "john@mail.com",
    content: "Some content to index ..."
});
```
Both fields "author" and "email" are not indexed, whereas the indexed field "content" is not included in the stored data.
## Filter Fields (Index / Tags / Datastore)
You can pass a function to the field option property `filter`. This function just has to return `true` if the document should be indexed.
```js
const index = new Document({
    document: {
        id: "id",
        index: [{
            field: "content",
            filter: function(data){
                // return true when the document should be indexed
                // (illustrative completion of the elided example):
                return !!data.content;
            }
        }]
    }
});
```
## Custom Fields (Index / Tags / Datastore)
You can pass a function to the field option property `custom` to either:
1. change and/or extend the original input string
2. create a new "virtual" field which is not included in document data
Dataset example:
```js
{
    "id": 10001,
    "firstname": "john",
    "lastname": "doe",
    "city": "berlin",
    "street": "alexanderplatz",
    "number": "10",
    "zip": "10178"
}
```
You can apply custom fields derived from document data or by any external data:
```js
const index = new Document({
    document: {
        id: "id",
        index: [{
            // a "virtual" field assembled from document data (reconstructed):
            field: "location",
            custom: function(data){
                return data.zip + " " + data.city + " " + data.street + " " + data.number;
            }
        }],
        tag: ["city"]
    }
});
```
Perform a query against the custom field as usual:
```js
const result = index.search({
    query: "10178 Berlin Alexanderplatz",
    field: "location"
});
```
```js
const result = index.search({
    query: "john doe",
    tag: { "city": "Berlin" }
});
```

---

Search capabilities highly depend on language processing. The `Encoder` class is one of the most important core functionalities of FlexSearch.
> Encoders are basically responsible for "fuzziness". [Read here about Phonetic Search/Fuzzy Search](../README.md#fuzzy-search)
### Default Encoder
1. Charset.LatinBalance
2. Charset.LatinAdvanced
3. Charset.LatinExtra
4. Charset.LatinSoundex
### Basic Usage
```js
const encoder = new Encoder({
    // ... encoder options (the original example's options were elided)
});
```
Adding rules to an existing `Encoder` instance:
```js
encoder.addStemmer("ly", "");
encoder.addReplacer(/[´`ʼ]/g, "'");
```
Using a custom filter:
```js
encoder.addFilter(function(str){
    // return true to keep the content
    return str.length > 1;
});
```
Shortcut for just assigning one encoder configuration to an index:
```js
const index = new Index({
    // e.g. a preset, an options object or an Encoder instance:
    encoder: Charset.Normalize
});
```
### Property Overview
<table>
<tr></tr>
<tr>
<th align="left">Property</th>
<th width="50%" align="left">Description</th>
<th align="left">Values</th>
</tr>
<tr>
<td><code>normalize</code></td>
<td>The normalization stage simplifies the input content, e.g. by replacing "é" with "e"</td>
<td>
<code>true</code> enable normalization (default)<br>
<code>false</code> disable normalization<br>
<code>function(str) => str</code> custom function
</td>
</tr>
<tr></tr>
<tr>
<td><code>prepare</code></td>
<td>The preparation stage is a custom function applied directly after normalization</td>
<td>
<code>function(str) => str</code> custom function
</td>
</tr>
<tr></tr>
<tr>
<td><code>finalize</code></td>
<td>The finalization stage is a custom function executed as the last task in the encoding pipeline (it gets an array of tokens and needs to return an array of tokens)</td>
<td>
<code>function([str]) => [str]</code> custom function
</td>
</tr>
<tr></tr>
<tr>
<td><code>filter</code></td>
<td>The stop-word filter is a blacklist of words to be excluded from indexing entirely (e.g. "and", "to" or "be"). Also very useful when using <a href="../README.md#context-search">Context Search</a></td>
<td>
<code>Set(["and", "to", "be"])</code><br>
<code>function(str) => bool</code> custom function<hr style="margin: 5px">
<code>encoder.addFilter("and")</code>
</td>
</tr>
<tr></tr>
<tr>
<td><code>stemmer</code></td>
<td>The stemmer normalizes several linguistic mutations of the same word (e.g. "run" and "running", or "property" and "properties"). Also very useful when using <a href="../README.md#context-search">Context Search</a></td>
<td>
<code>Map([["ing", ""], ["ies", "y"]])</code><hr style="margin: 5px">
<code>encoder.addStemmer("ing", "")</code>
</td>
</tr>
<tr></tr>
<tr>
<td><code>mapper</code></td>
<td>The mapper replaces a single char (e.g. "é" with "e")</td>
<td>
<code>Map([["é", "e"], ["ß", "ss"]])</code><hr style="margin: 5px">
<code>encoder.addMapper("é", "e")</code>
</td>
</tr>
<tr></tr>
<tr>
<td><code>matcher</code></td>
<td>The matcher does the same as the mapper, but replaces char sequences instead of single chars</td>
<td>
<code>Map([["and", "&"], ["usd", "$"]])</code><hr style="margin: 5px">
<code>encoder.addMatcher("and", "&")</code>
</td>
</tr>
<tr></tr>
<tr>
<td><code>replacer</code></td>
<td>The replacer takes custom regular expressions and can't be optimized in the same way as the mapper or matcher. Use it as a last resort when no other replacement can achieve the same result.</td>
<td>
<code>[/[^a-z0-9]/g, "", /([^aeo])h(.)/g, "$1$2"]</code><hr style="margin: 5px">
<code>encoder.addReplacer(/[^a-z0-9]/g, "")</code>
</td>
</tr>
</table>
> [!TIP]
> The methods `.addMapper()`, `.addMatcher()` and `.addReplacer()` might be confusing. For this reason each rule will automatically resolve to the right type, even when added through just one of these methods. You can simplify your setup, e.g. by just using `.addReplacer()` for all 3 rule types.
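A short sketch of that simplification (the rule type is inferred from each rule automatically):

```js
encoder.addReplacer("é", "e");       // single char: resolves to a mapper
encoder.addReplacer("and", "&");     // char sequence: resolves to a matcher
encoder.addReplacer(/[´`ʼ]/g, "'");  // regular expression: a real replacer
```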
## Custom Encoder
Since it is very simple to create a custom `Encoder`, you are welcome to create your own.
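For example, a minimal sketch (assuming the custom function is passed as the `encode` option and returns an array of terms):

```js
const encoder = new Encoder({
    encode: function(content){
        // split on non-letters and lowercase each term (illustrative)
        return content.toLowerCase().split(/[^a-z]+/);
    }
});
```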
If you get some good results please feel free to share your encoder.
### Encoder Processing Workflow
1. Charset Normalization
2. Custom Preparation
3. Split Content (into terms, apply includes/excludes)
4. Filter: Pre-Filter
5. Stemmer (substitute term endings)
6. Filter: Post-Filter
7. Replace Chars (Mapper)
8. Letter Deduplication
9. Matcher (substitute partials)
10. Custom Regex (Replacer)
11. Custom Finalize
This workflow schema might help you to understand each step in the iteration:
<br><br>
<img src="encoder-workflow.svg" style="max-width: 650px" width="100%">
<a name="rtl"></a>
## Right-To-Left Support
> Set the tokenizer at least to "reverse" or "full" when using RTL.
> [!NOTE]
> When a string is already encoded/interpreted as Right-To-Left you don't need to use this option. It is just useful when the source content wasn't encoded as RTL.
Just set the property `rtl: true` when creating the `Encoder`:
```js
const encoder = new Encoder({ rtl: true });
```
<a name="cjk"></a>
## CJK Word Break (Chinese, Japanese, Korean)
You can also pass a custom encoder function to apply some linguistic transformations.
```js
const index = new Index();
index.add(0, "一个单词");
```
```js
const results = index.search("单词");
```
- German: `de`
- French: `fr`
### Import Language Packs: ES6 Modules

The simplest way to assign charset/language-specific encoding via modules is:
```js
import EnglishPreset from "flexsearch/lang/en";

const index = new Index({
    // assign the language preset (reconstructed example):
    encoder: EnglishPreset
});
```
#### Import Language Packs: ES5 Legacy Browser
When loading language packs, make sure that the library was loaded before:
```js
const index = FlexSearch.Index({
    // assign the loaded language preset (reconstructed example):
    encoder: FlexSearch.Language.en
});
```
#### Import Language Packs: Node.js
In Node.js, all built-in language packs are available by their scope:
```js
const EnglishPreset = require("flexsearch/lang/en");
const index = new Index({
encoder: EnglishPreset
});
```
### Share Encoders
Assigning an `Encoder` instance to the top-level configuration will share the encoder across all fields. You should avoid this when the fields don't contain the same type of content (e.g. one field contains terms, another contains numeric IDs).
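A minimal sketch of sharing one encoder (assuming presets can be passed to the `Encoder` constructor as options):

```js
const encoder = new Encoder(Charset.Normalize);

const index = new Document({
    // the same Encoder instance is shared by all fields:
    encoder: encoder,
    document: {
        id: "id",
        index: ["title", "content"]
    }
});
```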

---

When using Server-Side-Rendering you can create a different export which instantly boots up the index.
> When your index is too large you should use the default export/import mechanism.
You'll need JavaScript to create the serialized output; alternatively, just create a small Node.js script to build the output.
As the first step populate the FlexSearch index with your contents.
You have two options:
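A minimal sketch of the serialization flow (assuming `serialize()` resolves to a string of JavaScript code, per the API overview):

```js
import { Index } from "flexsearch";

const index = new Index();
index.add(1, "some");
index.add(2, "content");

// create the serialized output:
const serialized = await index.serialize();
```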

---

## Big In-Memory Keystores
The default maximum keystore limit for the In-Memory index is 2^24 of distinct terms/partials being stored (cardinality).
An additional register can be enabled which divides the index into self-balanced partitions.
The extended keystore is supported by any type of index.
```js
const index = new Index({
    // e.g. set the keystore range to 8 bit:
    // 2^8 * 2^24 = 2^32 keys in total
    keystore: 8
});
```

You can theoretically store up to 2^88 keys (64-Bit address range).
The internal ID arrays scale automatically by using a Proxy when the limit of 2^31 is reached.
> Persistent storages have no keystore limit by default.
> You should not enable the keystore when using persistent indexes, as long as you don't stress the buffer too hard before calling `index.commit()`.

---

Alternatively, mount a store at index creation:
```js
const index = new Index({
    db: new IndexedDB("my-store")
});

// wait for the db response before accessing the first time:
await index.db;
```

Auto-Commit is enabled by default and will process changes asynchronously in batches.
You can fully disable the auto-commit feature and perform commits manually:
```js
const index = new Index({
    db: new IndexedDB("my-store"),
    commit: false
});
```

---

## Result Highlighting
Demo: <a href="https://raw.githack.com/nextapps-de/flexsearch/master/demo/autocomplete.html" target="_blank">Auto-Complete</a>
> Result highlighting can only be enabled on a `Document` index with an enabled document store (pass the option `store` on creation).
Alternatively, simply upgrade id-content-pairs to a flat document on-the-fly when calling `.add()`, as shown below.
```js
// create the document index; a simple descriptor upgrades
// id-content-pairs to a flat document (reconstructed example):
const index = new Document({
    document: {
        id: "id",
        store: true,
        index: "content"
    }
});

index.add({ id: 0, content: "some content to highlight" });
```
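A sketch of a highlighted query, assuming the `highlight` search option accepts a template in which `$1` is replaced by the matched term:

```js
const result = index.search({
    query: "content",
    highlight: "<b>$1</b>",
    enrich: true
});
```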

---

<a name="webworker"></a>
## Worker Parallelism (Browser + Node.js)
The internal worker model is distributed by document fields and will solve subtasks in parallel.
When using a document index, just apply the option `worker`:
```js
const index = new Document({
    index: ["tag", "name", "title", "text"],
    worker: true
});
```

When you perform a field search through all fields, this task is balanced across all workers simultaneously.
### Worker Index
Above we have seen that documents create workers automatically for each field. You can also create a `Worker`-Index directly. It's the same as using `Index` instead of `Document`.
> A `Worker`-Index always returns a `Promise` for all methods called on the index.
#### ES6 Module (Bundle):
When using one of the bundles from `/dist/` you can create a Worker-Index:
```js
import { Worker } from "./dist/flexsearch.bundle.module.min.js";
const index = new Worker({/* options */ });
await index.add(1, "some");
await index.add(2, "content");
await index.add(3, "to");
await index.add(4, "index");
```
#### ES6 Module (Non-Bundle):
When not using a bundle you can take the worker file from `/dist/` folder as follows:
```js
import Worker from "./dist/module/worker.js";
const index = new Worker({/* options */ });
await index.add(1, "some");
await index.add(2, "content");
await index.add(3, "to");
await index.add(4, "index");
```
#### Browser Legacy (Bundle):
When loading a legacy bundle via script tag (non-modules):
```js
const index = new FlexSearch.Worker({/* options */ });
await index.add(1, "some");
await index.add(2, "content");
await index.add(3, "to");
await index.add(4, "index");
```
### Worker Threads (Node.js)
The worker model for Node.js is based on native worker threads and works exactly the same way:
```js
const { Document } = require("flexsearch");

// sketch: same document options as the browser example above
const index = new Document({
    index: ["tag", "name", "title", "text"],
    worker: true
});
```

Alternatively, create a `Worker`-Index directly:

```js
const { Worker } = require("flexsearch");
const index = new Worker({ /* options */ });
```
## The Worker Async Model (Best Practices)
A worker will always perform async. On a query method call, you should always handle the returned promise (e.g. use `await`) or pass a callback function as the last parameter.
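A minimal sketch of both patterns:

```js
// handle the returned promise:
const result = await index.search("some query");

// or pass a callback as the last parameter:
index.search("some query", function(result){
    // process the result ...
});
```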