mirror of https://github.com/nextapps-de/flexsearch.git synced 2025-09-02 18:33:17 +02:00

push v0.8 to master

# Conflicts:
#	README.md
This commit is contained in:
Thomas Wilkerling
2025-03-17 01:14:11 +01:00
parent 44153a67ad
commit 1820491b05
27 changed files with 6591 additions and 287 deletions


@@ -1,5 +1,34 @@
# Changelog
### v0.8.0
- Persistent indexes support for: `IndexedDB` (Browser), `Redis`, `SQLite`, `Postgres`, `MongoDB`, `Clickhouse`
- Enhanced language customization via the new `Encoder` class
- Result Highlighting
- Query performance achieves results up to 4.5 times faster compared to the previous generation v0.7.x, while also improving the quality of results
- Enhanced support for larger indexes or larger result sets
- Improved offset and limit processing achieves up to 100 times faster traversal performance through large datasets
- Support for a larger In-Memory index with extended key size (the default maximum keystore limit is 2^24)
- Greatly enhanced performance of the whole text encoding pipeline
- Improved indexing of numeric content (Triplets)
- Intermediate result sets and `Resolver`
- Basic Resolver: `and`, `or`, `xor`, `not`, `limit`, `offset`, `boost`, `resolve`
- Improved charset collection
- New charset preset `soundex` which further reduces memory consumption while also increasing "fuzziness"
- Performance gain when polling tasks to the index by using "Event-Loop-Caches"
- Up to 100 times faster deletion/replacement when not using the additional "fastupdate" register
- Regex Pre-Compilation (transforms hundreds of regex rules into just a few)
- Extended support for multiple tags (DocumentIndex)
- Custom Fields ("Virtual Fields")
- Custom Filter
- Custom Score Function
- Added French language preset (stop-word filter, stemmer)
- Enhanced Worker Support
- Export / Import index in chunks
- Improved Build System + Bundler (Supported: CommonJS, ESM, Global Namespace); the import of language packs is now supported for Node.js
- Fully covering index.d.ts type definitions
- Fast-Boot Serialization optimized for Server-Side-Rendering (PHP, Python, Ruby, Rust, Java, Go, Node.js, ...)
### v0.7.0
- Bidirectional Context (the order of words can now vary, does not increase memory when using bidirectional context)

3393
README.md

File diff suppressed because it is too large

1857
doc/0.8.0.md Normal file

File diff suppressed because it is too large

174
doc/custom-builds.md Normal file

@@ -0,0 +1,174 @@
## Custom Builds
The `/src/` folder of this repository requires compilation to resolve the build flags. These are your options:
- Closure Compiler (Advanced Compilation) (used by this library <a href="task/build.js">here</a>)
- Babel + Plugin `babel-plugin-conditional-compile` (used by this library <a href="task/babel.min.json">here</a>)
You can't resolve build flags with:
- Webpack
- esbuild
- rollup
- Terser
These are some of the basic builds located in the `/dist/` folder:
```bash
npm run build:bundle
npm run build:light
npm run build:module
npm run build:es5
```
Perform a custom build (UMD bundle) by passing build flags:
```bash
npm run build:custom SUPPORT_DOCUMENT=true SUPPORT_TAGS=true LANGUAGE_OUT=ECMASCRIPT5 POLYFILL=true
```
Perform a custom build in ESM module format:
```bash
npm run build:custom RELEASE=custom.module SUPPORT_DOCUMENT=true SUPPORT_TAGS=true
```
Perform a debug build:
```bash
npm run build:custom DEBUG=true SUPPORT_DOCUMENT=true SUPPORT_TAGS=true
```
> On custom builds, each build flag not explicitly passed defaults to `false`.

The custom build will be saved to `dist/flexsearch.custom.xxxx.min.js`, or when the format is module, to `dist/flexsearch.custom.module.xxxx.min.js` (the "xxxx" is a hash derived from the build flags used).
<a name="build-flags" id="builds"></a>
### Supported Build Flags
<table>
<tr>
<td>Flag</td>
<td>Values</td>
<td>Info</td>
</tr>
<tr>
<td colspan="3"><br><b>Feature Flags</b></td>
</tr>
<tr>
<td>SUPPORT_WORKER</td>
<td>true, false</td>
<td></td>
</tr>
<tr></tr>
<tr>
<td>SUPPORT_ENCODER</td>
<td>true, false</td>
<td></td>
</tr>
<tr></tr>
<tr>
<td>SUPPORT_CHARSET</td>
<td>true, false</td>
<td></td>
</tr>
<tr></tr>
<tr>
<td>SUPPORT_CACHE</td>
<td>true, false</td>
<td></td>
</tr>
<tr></tr>
<tr>
<td>SUPPORT_ASYNC</td>
<td>true, false</td>
<td>Asynchronous Rendering (supports Promises)</td>
</tr>
<tr></tr>
<tr>
<td>SUPPORT_STORE</td>
<td>true, false</td>
<td></td>
</tr>
<tr></tr>
<tr>
<td>SUPPORT_SUGGESTION</td>
<td>true, false</td>
<td></td>
</tr>
<tr></tr>
<tr>
<td>SUPPORT_SERIALIZE</td>
<td>true, <b>false</b></td>
<td></td>
</tr>
<tr></tr>
<tr>
<td>SUPPORT_DOCUMENT</td>
<td>true, false</td>
<td></td>
</tr>
<tr></tr>
<tr>
<td>SUPPORT_TAGS</td>
<td>true, false</td>
<td></td>
</tr>
<tr></tr>
<tr>
<td>SUPPORT_PERSISTENT</td>
<td>true, false</td>
<td></td>
</tr>
<tr></tr>
<tr>
<td>SUPPORT_KEYSTORE</td>
<td>true, false</td>
<td></td>
</tr>
<tr></tr>
<tr>
<td>SUPPORT_COMPRESSION</td>
<td>true, false</td>
<td></td>
</tr>
<tr></tr>
<tr>
<td>SUPPORT_RESOLVER</td>
<td>true, false</td>
<td></td>
</tr>
<tr>
<td colspan="3"><br><b>Compiler Flags</b></td>
</tr>
<tr>
<td>DEBUG</td>
<td>true, <b>false</b></td>
<td>Output debug information to the console (default: false)</td>
</tr>
<tr></tr>
<tr>
<td>RELEASE<br><br><br><br><br></td>
<td><b>custom</b><br>custom.module<br>bundle<br>bundle.module<br>es5<br>light<br>compact</td>
<td></td>
</tr>
<tr></tr>
<tr>
<td>POLYFILL</td>
<td>true, <b>false</b></td>
<td>Include Polyfills (based on LANGUAGE_OUT)</td>
</tr>
<tr></tr>
<tr>
<td>PROFILER</td>
<td>true, <b>false</b></td>
<td>Used only for automated performance tests</td>
</tr>
<tr></tr>
<tr>
<td>LANGUAGE_OUT<br><br><br><br><br><br><br><br><br><br><br></td>
<td>ECMASCRIPT3<br>ECMASCRIPT5<br>ECMASCRIPT_2015<br>ECMASCRIPT_2016<br>ECMASCRIPT_2017<br>ECMASCRIPT_2018<br>ECMASCRIPT_2019<br>ECMASCRIPT_2020<br>ECMASCRIPT_2021<br>ECMASCRIPT_2022<br>ECMASCRIPT_NEXT<br>STABLE</td>
<td>Target language</td>
</tr>
</table>

39
doc/customization.md Normal file

@@ -0,0 +1,39 @@
## Custom Score Function
```js
const index = new FlexSearchIndex({
    resolution: 10,
    score: function(content, term, term_index, partial, partial_index){
        // return a number between 0 and "resolution" - 1
        // a score of 0 is the highest (best) score
        // for a resolution of 10 you can return 0 - 9
        // ...
        return 3;
    }
});
```
A common situation is having predefined labels which relate to some kind of order, e.g. importance or priority. A priority label could be `high`, `moderate` or `low`, so you can derive the scoring from those properties. Another example is content that is already ordered, where you would like to keep this order as relevance.
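As an illustration, a hypothetical score function deriving the score slot from such a priority label might look like this (a sketch; the `PRIORITY_SCORE` mapping and the `scoreFromPriority` helper are assumptions for this example, not part of FlexSearch):

```javascript
// Hypothetical mapping from priority labels to score slots
// (0 = best) for an index with resolution 10:
const PRIORITY_SCORE = { high: 0, moderate: 4, low: 9 };

function scoreFromPriority(label, resolution = 10){
    const score = PRIORITY_SCORE[label];
    // unknown labels fall back to the worst slot
    return score === undefined ? resolution - 1 : score;
}
```

Inside the `score` option you would look up the label of the document currently being indexed and return the resulting slot.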
You probably won't need the parameters passed to the score function, but when needed, here they are explained:
1. `content` is the whole content as an array of terms (encoded)
2. `term` is the current term which is actually processed (encoded)
3. `term_index` is the index of the term in the content array
4. `partial` is the current partial of a term which is actually processed
5. `partial_index` is the index position of the partial within the term
The partial parameters are empty when using the tokenizer `strict`. Let's take an example using the tokenizer `full`.
The content: "This is an ex[amp]()le of partial encoding"<br>
The highlighted part marks the partial currently being processed. Your score function will then be called with these parameters:
```js
function score(content, term, term_index, partial, partial_index){
    // content       = ["this", "is", "an", "example", "of", "partial", "encoding"]
    // term          = "example"
    // term_index    = 3
    // partial       = "amp"
    // partial_index = 2
}
```

246
doc/document-search.md Normal file

@@ -0,0 +1,246 @@
## Merge Document Results
By default, the result set of Field-Search has a structure grouped by field names:
```js
[{
    field: "fieldname-1",
    result: [{
        id: 1001,
        doc: {/* stored document */}
    }]
},{
    field: "fieldname-2",
    result: [{
        id: 1001,
        doc: {/* stored document */}
    }]
},{
    field: "fieldname-3",
    result: [{
        id: 1002,
        doc: {/* stored document */}
    }]
}]
```
By passing the search option `merge: true` the result set will be merged into one grouped by id:
```js
[{
    id: 1001,
    doc: {/* stored document */},
    field: ["fieldname-1", "fieldname-2"]
},{
    id: 1002,
    doc: {/* stored document */},
    field: ["fieldname-3"]
}]
```
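Conceptually, `merge: true` performs a group-by on the id. A minimal plain-JS sketch of the transformation (for illustration only, not the actual FlexSearch implementation):

```javascript
// Merge a field-grouped result set into an id-grouped one.
function mergeResults(fieldResults){
    const byId = new Map();
    for(const { field, result } of fieldResults){
        for(const { id, doc } of result){
            let entry = byId.get(id);
            if(!entry){
                entry = { id, doc, field: [] };
                byId.set(id, entry);
            }
            entry.field.push(field);
        }
    }
    return [...byId.values()];
}
```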
## Multi-Tag-Search
Assume this document schema (a dataset from IMDB):
```js
{
    "tconst": "tt0000001",
    "titleType": "short",
    "primaryTitle": "Carmencita",
    "originalTitle": "Carmencita",
    "isAdult": 0,
    "startYear": "1894",
    "endYear": "",
    "runtimeMinutes": "1",
    "genres": [
        "Documentary",
        "Short"
    ]
}
```
An appropriate document descriptor could look like:
```js
import LatinEncoder from "./charset/latin/simple.js";

const flexsearch = new Document({
    encoder: LatinEncoder,
    resolution: 3,
    document: {
        id: "tconst",
        //store: true, // document store
        index: [{
            field: "primaryTitle",
            tokenize: "forward"
        },{
            field: "originalTitle",
            tokenize: "forward"
        }],
        tag: [
            "startYear",
            "genres"
        ]
    }
});
```
The field contents of `primaryTitle` and `originalTitle` get indexed with the `forward` tokenizer. The field contents of `startYear` and `genres` are added as tags.
Get all entries of a specific tag:
```js
const result = flexsearch.search({
    //enrich: true, // enrich documents
    tag: { "genres": "Documentary" },
    limit: 1000,
    offset: 0
});
```
Get entries of multiple tags (intersection):
```js
const result = flexsearch.search({
    //enrich: true, // enrich documents
    tag: {
        "genres": ["Documentary", "Short"],
        "startYear": "1894"
    }
});
```
Combine tags with queries (intersection):
```js
const result = flexsearch.search({
    query: "Carmen", // forward tokenizer
    tag: {
        "genres": ["Documentary", "Short"],
        "startYear": "1894"
    }
});
```
Alternative declaration:
```js
const result = flexsearch.search("Carmen", {
    tag: [{
        field: "genres",
        tag: ["Documentary", "Short"]
    },{
        field: "startYear",
        tag: "1894"
    }]
});
```
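Conceptually, combining multiple tags boils down to intersecting the id sets stored per tag. A minimal sketch with an assumed data layout (illustrative only, not the actual FlexSearch internals):

```javascript
// Intersect the id sets of all requested tags.
// tagIndex layout (assumed): { field: { tagValue: Set<id> } }
function intersectTags(tagIndex, wanted){
    let result = null;
    for(const [field, tags] of Object.entries(wanted)){
        for(const tag of [].concat(tags)){
            const ids = (tagIndex[field] && tagIndex[field][tag]) || new Set();
            result = result === null
                ? new Set(ids)
                : new Set([...result].filter(id => ids.has(id)));
        }
    }
    return result || new Set();
}
```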
## Filter Fields (Index / Tags / Datastore)
```js
const flexsearch = new Document({
    document: {
        id: "id",
        index: [{
            field: "somefield",
            filter: function(data){
                // return false to filter out
                // return anything else to keep
                return true;
            }
        }],
        tag: [{
            field: "city",
            filter: function(data){
                // return false to filter out
                // return anything else to keep
                return true;
            }
        }],
        store: [{
            field: "anotherfield",
            filter: function(data){
                // return false to filter out
                // return anything else to keep
                return true;
            }
        }]
    }
});
```
## Custom Fields (Index / Tags / Datastore)
Dataset example:
```js
{
    "id": 10001,
    "firstname": "John",
    "lastname": "Doe",
    "city": "Berlin",
    "street": "Alexanderplatz",
    "number": "1a",
    "postal": "10178"
}
```
You can add custom fields derived from the data or from anything else:
```js
const flexsearch = new Document({
    document: {
        id: "id",
        index: [{
            // custom field:
            field: "fullname",
            custom: function(data){
                // return custom string
                return data.firstname + " " +
                       data.lastname;
            }
        },{
            // custom field:
            field: "location",
            custom: function(data){
                return data.street + " " +
                       data.number + ", " +
                       data.postal + " " +
                       data.city;
            }
        }],
        tag: [{
            // existing field
            field: "city"
        },{
            // custom field:
            field: "category",
            custom: function(data){
                let tags = [];
                // push one or multiple tags
                // ....
                return tags;
            }
        }],
        store: [{
            field: "anotherfield",
            custom: function(data){
                // return a falsy value to filter out
                // return anything else to keep in store
                return data;
            }
        }]
    }
});
```
> A custom function can also act as a filter by returning `false`.
Perform a query against the custom field as usual:
```js
const result = flexsearch.search({
    query: "10178 Berlin Alexanderplatz",
    field: "location"
});
```
```js
const result = flexsearch.search({
    query: "john doe",
    tag: { "city": "Berlin" }
});
```

183
doc/encoder.md Normal file

@@ -0,0 +1,183 @@
## Encoder
Search capabilities highly depend on language processing. The old workflow wasn't really practicable. The new `Encoder` class is a huge improvement and fully replaces the encoding part. Some FlexSearch options were moved to the new `Encoder` instance.
New Encoding Pipeline:
1. charset normalization
2. custom preparation
3. split into terms (apply includes/excludes)
4. filter (pre-filter)
5. matcher (substitute terms)
6. stemmer (substitute term endings)
7. filter (post-filter)
8. replace chars (mapper)
9. custom regex (replacer)
10. letter deduplication
11. apply finalize
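For illustration only, a heavily simplified run through some of these stages could look like this (a sketch, not the actual implementation; the real pipeline is configurable and the stages are partly optional):

```javascript
// Simplified sketch of a few pipeline stages:
function encode(str, { filter = new Set(), mapper = new Map() } = {}){
    return str
        .toLowerCase()                         // 1. normalization
        .split(/\s+/)                          // 3. split into terms
        .filter(term => !filter.has(term))     // 4. pre-filter
        .map(term => [...term]                 // 8. replace chars (mapper)
            .map(c => mapper.get(c) || c)
            .join(""))
        .map(term =>                           // 10. letter deduplication
            term.replace(/(.)\1+/g, "$1"));
}
```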
### Example
```js
const encoder = new Encoder({
    normalize: true,
    dedupe: true,
    cache: true,
    include: {
        letter: true,
        number: true,
        symbol: false,
        punctuation: false,
        control: false,
        char: "@"
    }
});
```
You can use an `exclude` definition __instead__ of an `include` definition:
```js
const encoder = new Encoder({
    exclude: {
        letter: false,
        number: false,
        symbol: true,
        punctuation: true,
        control: true
    }
});
```
Instead of using `include` or `exclude` you can pass a regular expression to the field `split`:
```js
const encoder = new Encoder({
    split: /\s+/
});
```
> The definitions `include` and `exclude` are a replacement for `split`. You can only define one of these three.
Adding custom functions to the encoder pipeline:
```js
const encoder = new Encoder({
    normalize: function(str){
        return str.toLowerCase();
    },
    prepare: function(str){
        return str.replace(/&/g, " and ");
    },
    finalize: function(arr){
        return arr.filter(term => term.length > 2);
    }
});
```
Assign encoder to an index:
```js
const index = new Index({
    encoder: encoder
});
```
Define language specific transformations:
```js
const encoder = new Encoder({
    replacer: [
        /[´`ʼ]/g, "'"
    ],
    filter: new Set([
        "and"
    ]),
    matcher: new Map([
        ["xvi", "16"]
    ]),
    stemmer: new Map([
        ["ly", ""]
    ]),
    mapper: new Map([
        ["é", "e"]
    ])
});
```
Or use predefined language and extend it with custom options:
```js
import EnglishBookPreset from "./lang/en.js";
const encoder = new Encoder(EnglishBookPreset, {
    filter: false
});
```
Equivalent:
```js
import EnglishBookPreset from "./lang/en.js";
const encoder = new Encoder(EnglishBookPreset);
encoder.assign({ filter: false });
```
Assign extensions to the encoder instance:
```js
import LatinEncoderPreset from "./charset/latin/simple.js";
import EnglishBookPreset from "./lang/en.js";
// stack definitions to the encoder instance
const encoder = new Encoder()
    .assign(LatinEncoderPreset)
    .assign(EnglishBookPreset)
    // override preset options ...
    .assign({ minlength: 3 });
    // assign further presets ...
```
> When adding an extension to the encoder, every previously assigned configuration stays intact, very much like mixins; this also applies when assigning custom functions.
Add custom transformations to an existing index:
```js
import LatinEncoderPreset from "./charset/latin/default.js";
const encoder = new Encoder(LatinEncoderPreset);
encoder.addReplacer(/[´`ʼ]/g, "'");
encoder.addFilter("and");
encoder.addMatcher("xvi", "16");
encoder.addStemmer("ly", "");
encoder.addMapper("é", "e");
```
Shortcut for just assigning one encoder configuration to an index:
```js
import LatinEncoderPreset from "./charset/latin/default.js";
const index = new Index({
    encoder: LatinEncoderPreset
});
```
### Custom Encoder
Since it is very simple to create a custom Encoder, you are welcome to create your own, e.g.:
```js
function customEncoder(content){
    // split the content into terms/tokens,
    // apply your changes to each term/token and
    // return an Array of terms/tokens
    const tokens = content.toLowerCase()
                          .split(/\s+/)
                          .filter(term => term.length > 1);
    return tokens;
}
const index = new Index({
    // set to strict when your tokenization was already done
    tokenize: "strict",
    encode: customEncoder
});
```
If you get some good results please feel free to share your encoder.

99
doc/export-import.md Normal file

@@ -0,0 +1,99 @@
## Import / Export (In-Memory)
> Persistent-Indexes and Worker-Indexes don't support Import/Export.
Export an `Index` or `Document-Index` to the folder `/export/`:
```js
import { promises as fs } from "fs";
await index.export(async function(key, data){
    await fs.writeFile("./export/" + key, data, "utf8");
});
```
Import from folder `/export/` into an `Index` or `Document-Index`:
```js
const index = new Index({/* keep old config and place it here */});
const files = await fs.readdir("./export/");
for(let i = 0; i < files.length; i++){
    const data = await fs.readFile("./export/" + files[i], "utf8");
    await index.import(files[i], data);
}
```
> You'll need to use the same configuration you used before the export. Any change to the configuration requires re-indexing.
## Fast-Boot Serialization for Server-Side-Rendering (PHP, Python, Ruby, Rust, Java, Go, Node.js, ...)
> This is an experimental feature with limited support which might be dropped in a future release. You're welcome to give feedback.
When using Server-Side-Rendering you can create a different kind of export which boots up instantly. Especially with server-side rendered content, this can help restore a __<u>static</u>__ index on page load. Document-Indexes aren't supported by this method yet.
> When your index is too large you should use the default export/import mechanism.
As the first step populate the FlexSearch index with your contents.
You have two options:
### 1. Create a function as string
```js
const fn_string = index.serialize();
```
The contents of `fn_string` is a valid JavaScript function declared as `inject(index)`. Store it or place it somewhere in your code.
This function basically looks like:
```js
function inject(index){
    index.reg = new Set([/* ... */]);
    index.map = new Map([/* ... */]);
    index.ctx = new Map([/* ... */]);
}
```
You can save this function by e.g. `fs.writeFileSync("inject.js", fn_string);` or place it as string in your SSR-generated markup.
After creating the index on the client side, just call the inject function:
```js
const index = new Index({/* use same configuration! */});
inject(index);
```
That's it.
> You'll need to use the same configuration you used before the export. Any change to the configuration requires re-indexing.
### 2. Create just a function body as string
Alternatively, you can use a lazy function declaration by passing `false` to the serialize function:
```js
const fn_body = index.serialize(false);
```
You will get just the function body which looks like:
```js
index.reg = new Set([/* ... */]);
index.map = new Map([/* ... */]);
index.ctx = new Map([/* ... */]);
```
Now you can place this in your code directly (name your index variable `index`), or you can create an inject function from it, e.g.:
```js
const inject = new Function("index", fn_body);
```
This function is callable like the above example:
```js
const index = new Index();
inject(index);
```
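To illustrate the mechanism with mock data (a real body string comes from `index.serialize(false)`; the values below are placeholders), the `new Function` route behaves like this on any plain object:

```javascript
// Mock serialized body; real ones are produced by index.serialize(false).
const fn_body = `
    index.reg = new Set([1, 2, 3]);
    index.map = new Map([["foo", [1]]]);
    index.ctx = new Map();
`;

// compile the body into a function taking one parameter named "index"
const inject = new Function("index", fn_body);

const index = {}; // stands in for a freshly created Index
inject(index);
// index.reg, index.map and index.ctx are now restored
```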


@@ -0,0 +1,87 @@
<?xml version="1.0" encoding="UTF-8"?>
<svg width="717px" height="276px" viewBox="0 0 717 276" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<title>FlexSearch</title>
<g id="Page-1" stroke="none" stroke-width="1" fill="none" fill-rule="evenodd">
<g id="Artboard" transform="translate(-442.000000, -355.000000)">
<g id="flexsearch" transform="translate(442.820000, 435.016000)">
<path d="M369.16,89.984 C381.362667,89.984 387.464,84.864 387.464,74.624 C387.464,69.6746667 386.674667,65.792 385.096,62.976 C383.517333,60.16 380.509333,57.8986667 376.072,56.192 L360.968,49.408 C356.104,47.5306667 353.672,44.8 353.672,41.216 L353.672,39.04 C353.672,37.3333333 354.461333,35.84 356.04,34.56 C357.618667,33.28 359.944,32.64 363.016,32.64 L382.344,32.64 C383.538667,32.64 384.136,32.0853333 384.136,30.976 L384.136,26.496 C384.136,25.3013333 383.538667,24.704 382.344,24.704 L361.352,24.704 C349.832,24.704 344.072,29.952 344.072,40.448 C344.072,44.4586667 345.010667,47.68 346.888,50.112 C348.765333,52.544 351.88,54.656 356.232,56.448 L371.336,62.848 C375.688,64.5546667 377.864,67.6693333 377.864,72.192 L377.864,74.624 C377.864,79.5733333 374.536,82.048 367.88,82.048 L346.376,82.048 C345.181333,82.048 344.584,82.6026667 344.584,83.712 L344.584,88.192 C344.584,89.3866667 345.181333,89.984 346.376,89.984 L369.16,89.984 Z M456.24,89.984 C457.434667,89.984 458.032,89.4293333 458.032,88.32 L458.032,83.712 C458.032,82.5173333 457.434667,81.92 456.24,81.92 L430.128,81.92 C422.96,81.92 419.376,78.208 419.376,70.784 L419.376,62.976 C419.376,62.208 419.802667,61.824 420.656,61.824 L456.88,61.824 C458.074667,61.824 458.672,61.184 458.672,59.904 L458.672,43.648 C458.672,37.6746667 457.008,33.024 453.68,29.696 C450.352,26.368 445.744,24.704 439.856,24.704 L429.104,24.704 C423.216,24.704 418.586667,26.368 415.216,29.696 C411.845333,33.024 410.16,37.6746667 410.16,43.648 L410.16,71.04 C410.16,77.0133333 411.845333,81.664 415.216,84.992 C418.586667,88.32 423.216,89.984 429.104,89.984 L456.24,89.984 Z M420.656,55.04 C419.802667,55.04 419.376,54.6133333 419.376,53.76 L419.376,43.904 C419.376,36.5653333 422.96,32.896 430.128,32.896 L438.832,32.896 C446,32.896 449.584,36.5653333 449.584,43.904 L449.584,53.76 C449.584,54.6133333 449.157333,55.04 448.304,55.04 L420.656,55.04 Z M507.736,89.984 C514.306667,89.984 518.658667,87.7226667 520.792,83.2 L521.304,83.2 L521.304,88.192 
C521.304,89.3866667 521.901333,89.984 523.096,89.984 L528.856,89.984 C530.050667,89.984 530.648,89.3866667 530.648,88.192 L530.648,26.496 C530.648,25.3013333 530.050667,24.704 528.856,24.704 L501.08,24.704 C495.192,24.704 490.562667,26.368 487.192,29.696 C483.821333,33.024 482.136,37.6746667 482.136,43.648 L482.136,71.04 C482.136,77.0133333 483.821333,81.664 487.192,84.992 C490.562667,88.32 495.192,89.984 501.08,89.984 L507.736,89.984 Z M502.104,81.792 C495.021333,81.792 491.48,78.08 491.48,70.656 L491.48,44.032 C491.48,36.608 495.021333,32.896 502.104,32.896 L520.024,32.896 C520.877333,32.896 521.304,33.28 521.304,34.048 L521.304,70.016 C521.304,73.7706667 520.28,76.672 518.232,78.72 C516.184,80.768 513.24,81.792 509.4,81.792 L502.104,81.792 Z M564.864,89.984 C566.058667,89.984 566.656,89.3866667 566.656,88.192 L566.656,45.824 C566.656,41.8133333 567.68,38.656 569.728,36.352 C571.776,34.048 574.72,32.896 578.56,32.896 L585.472,32.896 C586.069333,32.9813333 586.496,32.8746667 586.752,32.576 C587.008,32.2773333 587.136,31.8293333 587.136,31.232 L587.136,26.496 C587.136,25.3013333 586.581333,24.704 585.472,24.704 L580.48,24.704 C577.237333,24.704 574.421333,25.3866667 572.032,26.752 C569.642667,28.1173333 568.021333,29.7386667 567.168,31.616 L566.656,31.616 L566.656,26.496 C566.656,25.3013333 566.058667,24.704 564.864,24.704 L559.104,24.704 C557.909333,24.704 557.312,25.3013333 557.312,26.496 L557.312,88.192 C557.312,89.3866667 557.909333,89.984 559.104,89.984 L564.864,89.984 Z M642.88,89.984 C644.074667,89.984 644.672,89.4293333 644.672,88.32 L644.672,83.584 C644.672,82.3893333 644.074667,81.792 642.88,81.792 L625.344,81.792 C618.261333,81.792 614.72,78.08 614.72,70.656 L614.72,44.032 C614.72,36.608 618.261333,32.896 625.344,32.896 L642.88,32.896 C644.074667,32.896 644.672,32.3413333 644.672,31.232 L644.672,26.496 C644.672,25.3013333 644.074667,24.704 642.88,24.704 L624.32,24.704 C618.432,24.704 613.802667,26.368 610.432,29.696 C607.061333,33.024 605.376,37.6746667 
605.376,43.648 L605.376,71.04 C605.376,77.0133333 607.061333,81.664 610.432,84.992 C613.802667,88.32 618.432,89.984 624.32,89.984 L642.88,89.984 Z M674.28,89.984 C675.474667,89.984 676.072,89.3866667 676.072,88.192 L676.072,44.672 C676.072,40.9173333 677.096,38.016 679.144,35.968 C681.192,33.92 684.136,32.896 687.976,32.896 L695.144,32.896 C702.312,32.896 705.896,36.608 705.896,44.032 L705.896,88.192 C705.896,89.3866667 706.493333,89.984 707.688,89.984 L713.448,89.984 C714.642667,89.984 715.24,89.3866667 715.24,88.192 L715.24,43.648 C715.24,37.6746667 713.618667,33.024 710.376,29.696 C707.133333,26.368 702.610667,24.704 696.808,24.704 L689.64,24.704 C683.069333,24.704 678.717333,26.9653333 676.584,31.488 L676.072,31.488 L676.072,1.792 C676.072,0.597333333 675.474667,0 674.28,0 L668.52,0 C667.325333,0 666.728,0.597333333 666.728,1.792 L666.728,88.192 C666.728,89.3866667 667.325333,89.984 668.52,89.984 L674.28,89.984 Z" id="search" fill="#999" fill-rule="nonzero"></path>
<path d="M43.52,7.68 C45.2266667,7.68 46.08,8.576 46.08,10.368 L46.08,10.368 L46.08,19.072 C46.08,20.864 45.2266667,21.76 43.52,21.76 L43.52,21.76 L17.792,21.76 C16.768,21.76 16.256,22.2293333 16.256,23.168 L16.256,23.168 L16.256,43.008 C16.256,44.032 16.768,44.544 17.792,44.544 L17.792,44.544 L39.68,44.544 C41.472,44.544 42.368,45.44 42.368,47.232 L42.368,47.232 L42.368,55.808 C42.368,57.6853333 41.472,58.624 39.68,58.624 L39.68,58.624 L17.792,58.624 C16.768,58.624 16.256,59.0933333 16.256,60.032 L16.256,60.032 L16.256,87.296 C16.256,89.088 15.36,89.984 13.568,89.984 L13.568,89.984 L2.688,89.984 C0.896,89.984 0,89.088 0,87.296 L0,87.296 L0,10.368 C0,8.576 0.896,7.68 2.688,7.68 L2.688,7.68 Z M81.168,7.68 C82.96,7.68 83.856,8.576 83.856,10.368 L83.856,10.368 L83.856,74.368 C83.856,75.3066667 84.368,75.776 85.392,75.776 L85.392,75.776 L112.4,75.776 C114.106667,75.776 114.96,76.7146667 114.96,78.592 L114.96,78.592 L114.96,87.296 C114.96,89.088 114.106667,89.984 112.4,89.984 L112.4,89.984 L70.288,89.984 C68.496,89.984 67.6,89.088 67.6,87.296 L67.6,87.296 L67.6,10.368 C67.6,8.576 68.496,7.68 70.288,7.68 L70.288,7.68 Z M266.984,7.68 C269.117333,7.68 270.525333,8.576 271.208,10.368 L271.208,10.368 L283.368,37.888 L283.88,37.888 L295.912,10.368 C296.765333,8.576 298.130667,7.68 300.008,7.68 L300.008,7.68 L312.552,7.68 C313.405333,7.68 314.024,8 314.408,8.64 C314.792,9.28 314.813333,9.94133333 314.472,10.624 L314.472,10.624 L295.016,47.872 L314.856,87.04 C315.282667,87.8933333 315.325333,88.5973333 314.984,89.152 C314.642667,89.7066667 314.002667,89.984 313.064,89.984 L313.064,89.984 L300.264,89.984 C298.472,89.984 297.234667,89.1306667 296.552,87.424 L296.552,87.424 L283.88,59.776 L283.368,59.776 L270.568,87.424 C269.8,89.1306667 268.52,89.984 266.728,89.984 L266.728,89.984 L254.056,89.984 C253.202667,89.984 252.605333,89.7066667 252.264,89.152 C251.922667,88.5973333 251.922667,87.936 252.264,87.168 L252.264,87.168 L272.104,47.872 L252.904,10.496 C252.562667,9.81333333 
252.562667,9.17333333 252.904,8.576 C253.245333,7.97866667 253.842667,7.68 254.696,7.68 L254.696,7.68 Z" id="flex" fill="#4986FF" fill-rule="nonzero"></path>
<!--
<path d="M229.17,75.884 C230.274569,75.884 231.17,76.7794305 231.17,77.884 L231.17,87.964 C231.17,89.0685695 230.274569,89.964 229.17,89.964 L136.07,89.964 C134.965431,89.964 134.07,89.0685695 134.07,87.964 L134.07,77.884 C134.07,76.7794305 134.965431,75.884 136.07,75.884 L229.17,75.884 Z M229.17,41.184 C230.274569,41.184 231.17,42.0794305 231.17,43.184 L231.17,53.264 C231.17,54.3685695 230.274569,55.264 229.17,55.264 L136.07,55.264 C134.965431,55.264 134.07,54.3685695 134.07,53.264 L134.07,43.184 C134.07,42.0794305 134.965431,41.184 136.07,41.184 L229.17,41.184 Z M229.17,7.684 C230.274569,7.684 231.17,8.5794305 231.17,9.684 L231.17,19.764 C231.17,20.8685695 230.274569,21.764 229.17,21.764 L136.07,21.764 C134.965431,21.764 134.07,20.8685695 134.07,19.764 L134.07,9.684 C134.07,8.5794305 134.965431,7.684 136.07,7.684 L229.17,7.684 Z" id="Shape" fill="#FF7300"></path>
-->
</g>
<g id="flexsearch" transform="translate(442.820000, 355.016000)">
<path d="M275.275454,208.024387 C277.228076,209.977008 277.228076,213.142833 275.275454,215.095454 C251.566743,238.804165 220.015227,251.739774 187.17553,251.996115 L186.175454,252 C158.267443,252 131.240582,242.84323 109.10845,225.696888 L60.3379221,274.468037 C59.5568735,275.249086 58.2905435,275.249086 57.5094949,274.468037 L49.0242136,265.982756 C48.243165,265.201707 48.243165,263.935378 49.0242136,263.154329 L97.0814271,215.096 L97.08,215.095454 L104.178542,208 L106.261974,210.083151 C127.837358,230.604849 155.981619,241.768069 185.258678,241.996425 L186.18,242 C216.751149,242 246.150834,230.07794 268.204387,208.024387 C270.157008,206.071765 273.322833,206.071765 275.275454,208.024387 Z M275.275454,36.9045456 C277.228076,38.857167 277.228076,42.0229919 275.275454,43.9756134 C273.322833,45.9282348 270.157008,45.9282348 268.204387,43.9756134 C222.903569,-1.32520446 149.456431,-1.32520446 104.155613,43.9756134 C102.202992,45.9282348 99.037167,45.9282348 97.0845456,43.9756134 C95.1319241,42.0229919 95.1319241,38.857167 97.0845456,36.9045456 C146.290606,-12.3015152 226.069394,-12.3015152 275.275454,36.9045456 Z" id="Shape" fill="#999"></path>
</g>
</g>
</g>
<g transform="scale(0.33) translate(135, 66)">
<g>
<path fill="none" stroke="#FF7300" stroke-width="40" stroke-linecap="round" stroke-dashoffset="0" stroke-dasharray="240px 950px" fill-rule="nonzero" d="M300,220 L540,220">
<!--
<animateTransform id="p1"
attributeName="transform"
values="0 0; 0 -50; 0 -50; 0 0"
dur="4s"
type="translate"
repeatCount="indefinite"
/>
<animateTransform id="p1"
attributeName="transform"
values="0 -50; 0 0; 0 0; 0 -50"
dur="4s"
type="translate"
repeatCount="indefinite"
/>
-->
<animate id="p1"
attributeName="stroke-dashoffset"
begin="0.1s"
values="0px;90px;00px;40px;0px;"
dur="2.1s"
calcMode="linear"
repeatCount="45656"
/>
</path>
<path fill="none" stroke="#FF7300" stroke-width="40" stroke-linecap="round" stroke-dashoffset="0" stroke-dasharray="240px 240px" fill-rule="nonzero" d="M300,320 L540,320">
<animate id="p1"
attributeName="stroke-dashoffset"
begin="0s"
values="0px;110px;10px;70px;0px;"
dur="1.9s"
calcMode="linear"
repeatCount="45656"
/>
</path>
<path fill="none" stroke="#FF7300" stroke-width="40" stroke-linecap="round" stroke-dashoffset="0" stroke-dasharray="240px 950px" fill-rule="nonzero" d="M300,420 L540,420">
<!--
<animateTransform id="p1"
attributeName="transform"
values="0 216; 0 276; 0 276; 0 216"
dur="4s"
type="translate"
repeatCount="99999"
/>
<animateTransform id="p1"
attributeName="transform"
values="0 276; 0 216; 0 216; 0 276"
dur="4s"
type="translate"
repeatCount="99999"
/>
-->
<animate id="p1"
attributeName="stroke-dashoffset"
begin="0.2s"
values="0px;30px;20px;110px;0px"
dur="1.65s"
calcMode="linear"
repeatCount="45656"
/>
</path>
</g>
</g>
</svg>


109
doc/fuzzy-search.md Normal file

@@ -0,0 +1,109 @@
## Fuzzy-Search
Fuzzy search describes a basic concept of making queries more tolerant. FlexSearch provides several methods to achieve fuzziness:
1. Use a tokenizer: `forward`, `reverse` or `full`
2. Use one of the built-in encoders `simple` > `balance` > `advanced` > `extra` > `soundex` (sorted by fuzziness)
3. Use one of the language-specific presets, e.g. `/lang/en.js` for en-US specific content
4. Enable suggestions by passing the search option `suggest: true`
Additionally, you can apply a custom `Mapper`, `Replacer`, `Stemmer` or `Filter`, or assign a custom `normalize(str)`, `prepare(str)` or `finalize(arr)` function to the Encoder.
### Compare Fuzzy-Search Encoding
Original term which was indexed: "Struldbrugs"
<table>
<tr>
<th align="left">Encoder:</th>
<th><code>LatinExact</code></th>
<th><code>LatinDefault</code></th>
<th><code>LatinSimple</code></th>
<th><code>LatinBalance</code></th>
<th><code>LatinAdvanced</code></th>
<th><code>LatinExtra</code></th>
<th><code>LatinSoundex</code></th>
</tr>
<tr>
<th align="left">Index Size</th>
<th>3.1 Mb</th>
<th>1.9 Mb</th>
<th>1.8 Mb</th>
<th>1.7 Mb</th>
<th>1.6 Mb</th>
<th>1.1 Mb</th>
<th>0.7 Mb</th>
</tr>
<tr>
<td align="left">Struldbrugs</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td align="left">struldbrugs</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td align="left">strũldbrųĝgs</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td align="left">strultbrooks</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td align="left">shtruhldbrohkz</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td align="left">zdroltbrykz</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td align="left">struhlbrogger</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</table>
The index size was measured after indexing the book "Gulliver's Travels".

17
doc/keystore.md Normal file
View File

@@ -0,0 +1,17 @@
## Big In-Memory Keystores
The default maximum keystore limit for the In-Memory index is 2^24 distinct terms/partials being stored (the so-called "cardinality"). An additional register can be enabled which divides the index into self-balancing partitions.
```js
const index = new FlexSearchIndex({
// e.g. set keystore range to 8-Bit:
// 2^8 * 2^24 = 2^32 keys total
keystore: 8
});
```
You can theoretically store up to 2^88 keys (64-bit address range).
The internal ID arrays scale automatically via Proxy when the limit of 2^31 is reached.
> Persistent storages have no keystore limit by default. You should not enable the keystore when using persistent indexes, as long as you do not stress the buffer too hard before calling `index.commit()`.

213
doc/persistent.md Normal file
View File

@@ -0,0 +1,213 @@
## Persistent Indexes
FlexSearch provides a new Storage Adapter where indexes are delegated to persistent storages.
Supported:
- [IndexedDB (Browser)](persistent-indexeddb.md)
- [Redis](persistent-redis.md)
- [SQLite](persistent-sqlite.md)
- [Postgres](persistent-postgres.md)
- [MongoDB](persistent-mongodb.md)
- [Clickhouse](persistent-clickhouse.md)
The `.export()` and `.import()` methods are still available for non-persistent In-Memory indexes.
All search capabilities are available on persistent indexes, such as:
- Context-Search
- Suggestions
- Cursor-based Queries (Limit/Offset)
- Scoring (supports a resolution of up to 32767 slots)
- Document-Search
- Partial Search
- Multi-Tag-Search
- Boost Fields
- Custom Encoder
- Resolver
- Tokenizer (Strict, Forward, Reverse, Full)
- Document Store (incl. enrich results)
- Worker Threads to run in parallel
- Auto-Balanced Cache (top queries + last queries)
All persistent variants are optimized for larger indexes under heavy workload. Almost every task is streamlined to run in batch/parallel, getting the most out of the selected database engine. Whereas the In-Memory index can't share its data between different nodes when running in a cluster, every persistent storage can handle this by default.
Examples Node.js:
- [nodejs-commonjs](../example/nodejs-commonjs):
- [basic-persistent](../example/nodejs-commonjs/basic-persistent)
- [document-persistent](../example/nodejs-commonjs/document-persistent)
- [nodejs-esm](../example/nodejs-esm):
- [basic-persistent](../example/nodejs-esm/basic-persistent)
- [document-persistent](../example/nodejs-esm/document-persistent)
Examples Browser:
- [browser-legacy](../example/browser-legacy):
- [basic-persistent](../example/browser-legacy/basic-persistent)
- [document-persistent](../example/browser-legacy/document-persistent)
- [browser-module](../example/browser-module):
- [basic-persistent](../example/browser-module/basic-persistent)
- [document-persistent](../example/browser-module/document-persistent)
```js
import FlexSearchIndex from "./index.js";
import Database from "./db/indexeddb/index.js";
// create an index
const index = new FlexSearchIndex();
// create db instance with optional prefix
const db = new Database("my-store");
// mount and await before transferring data
await index.mount(db);
// update the index as usual
index.add(1, "content...");
index.update(2, "content...");
index.remove(3);
// changes are automatically committed by default
// when you need to wait for task completion, you
// can use the commit method explicitly:
await index.commit();
```
Alternatively mount a store by index creation:
```js
const index = new FlexSearchIndex({
db: new Database("my-store")
});
// await the db response before accessing it the first time
await index.db;
// apply changes to the index
// ...
```
Query against a persistent storage just as usual:
```js
const result = await index.search("gulliver");
```
Auto-Commit is enabled by default and will process changes asynchronously in batch.
You can fully disable the auto-commit feature and perform commits manually:
```js
const index = new FlexSearchIndex({
db: new Database("my-store"),
commit: false
});
// update the index
index.add(1, "content...");
index.update(2, "content...");
index.remove(3);
// transfer all changes to the db
await index.commit();
```
You can also call the commit method manually when the `commit: true` option is set.
### Benchmark
The benchmark was measured in "terms per second".
<table>
<tr>
<th align="left">Store</th>
<th>Add</th>
<th>Search 1</th>
<th>Search N</th>
<th>Replace</th>
<th>Remove</th>
<th>Not Found</th>
<th>Scaling</th>
</tr>
<tr>
<td></td>
<td align="right"><sub>terms per sec</sub></td>
<td align="right"><sub>terms per sec</sub></td>
<td align="right"><sub>terms per sec</sub></td>
<td align="right"><sub>terms per sec</sub></td>
<td align="right"><sub>terms per sec</sub></td>
<td align="right"><sub>terms per sec</sub></td>
<td></td>
</tr>
<!--
<tr>
<td align="left">Memory</td>
<td align="right">28,345,405</td>
<td align="right">65,180,102</td>
<td align="right">12,098,298</td>
<td align="right">19,099,981</td>
<td align="right">36,164,827</td>
<td align="right">143,369,175</td>
<td align="right">No</td>
</tr>
-->
<tr>
<td align="left">IndexedDB</td>
<td align="right">123,298</td>
<td align="right">83,823</td>
<td align="right">62,370</td>
<td align="right">57,410</td>
<td align="right">171,053</td>
<td align="right">425,744</td>
<td align="right">No</td>
</tr>
<tr>
<td align="left">Redis</td>
<td align="right">1,566,091</td>
<td align="right">201,534</td>
<td align="right">859,463</td>
<td align="right">117,013</td>
<td align="right">129,595</td>
<td align="right">875,526</td>
<td align="right">Yes</td>
</tr>
<tr>
<td align="left">SQLite</td>
<td align="right">269,812</td>
<td align="right">29,627</td>
<td align="right">129,735</td>
<td align="right">174,445</td>
<td align="right">1,406,553</td>
<td align="right">122,566</td>
<td align="right">No</td>
</tr>
<tr>
<td align="left">Postgres</td>
<td align="right">354,894</td>
<td align="right">24,329</td>
<td align="right">76,189</td>
<td align="right">324,546</td>
<td align="right">3,702,647</td>
<td align="right">50,305</td>
<td align="right">Yes</td>
</tr>
<tr>
<td align="left">MongoDB</td>
<td align="right">515,938</td>
<td align="right">19,684</td>
<td align="right">81,558</td>
<td align="right">243,353</td>
<td align="right">485,192</td>
<td align="right">67,751</td>
<td align="right">Yes</td>
</tr>
<tr>
<td align="left">Clickhouse</td>
<td align="right">1,436,992</td>
<td align="right">11,507</td>
<td align="right">22,196</td>
<td align="right">931,026</td>
<td align="right">3,276,847</td>
<td align="right">16,644</td>
<td align="right">Yes</td>
</tr>
</table>
__Search 1:__ Single term query<br>
__Search N:__ Multi term query (Context-Search)
The benchmark was executed against a single client.

204
doc/resolver.md Normal file
View File

@@ -0,0 +1,204 @@
## Resolver
Retrieve an unresolved result:
```js
const raw = index.search("a short query", {
resolve: false
});
```
You can apply and chain different resolver methods to the raw result, e.g.:
```js
raw.and( ... )
.and( ... )
.boost(2)
.or( ... , ... )
.limit(100)
.xor( ... )
.not( ... )
// final resolve
.resolve({
limit: 10,
offset: 0,
enrich: true
});
```
The default resolver:
```js
const raw = index.search("a short query", {
resolve: false
});
const result = raw.resolve();
```
Or use declaration style:
```js
import Resolver from "./resolver.js";
const raw = new Resolver({
index: index,
query: "a short query"
});
const result = raw.resolve();
```
### Chainable Boolean Operations
The basic concept explained:
```js
// 1. get one or multiple unresolved results
const raw1 = index.search("a short query", {
resolve: false
});
const raw2 = index.search("another query", {
resolve: false,
boost: 2
});
// 2. apply and chain resolver operations
const raw3 = raw1.and(raw2, /* ... */);
// you can access the aggregated result by raw3.result
console.log("The aggregated result is:", raw3.result)
// apply further operations ...
// 3. resolve final result
const result = raw3.resolve({
limit: 100,
offset: 0
});
console.log("The final result is:", result)
```
Use inline queries:
```js
const result = index.search("further query", {
// set resolve to false on the first query
resolve: false,
boost: 2
})
.or( // union
index.search("a query")
.and( // intersection
index.search("another query", {
boost: 2
})
)
)
.not( // exclusion
index.search("some query")
)
// resolve the result
.resolve({
limit: 100,
offset: 0
});
```
```js
import Resolver from "./resolver.js";
const result = new Resolver({
index: index,
query: "further query",
boost: 2
})
.or({
and: [{ // inner expression
index: index,
query: "a query"
},{
index: index,
query: "another query",
boost: 2
}]
})
.not({ // exclusion
index: index,
query: "some query"
})
.resolve({
limit: 100,
offset: 0
});
```
When all queries are made against the same index, you can omit the index in every declaration that follows the initial `new Resolver()` call:
```js
import Resolver from "./resolver.js";
const result = new Resolver({
index: index,
query: "a query"
})
.and({ query: "another query", boost: 2 })
.or ({ query: "further query", boost: 2 })
.not({ query: "some query" })
.resolve(100);
```
<!--
### Custom Result Decoration
```js
import highlight from "./resolve/highlight.js";
import collapse from "./resolve/collapse.js";
const raw = index.search("a short query", {
resolve: false
});
// resolve result for display
const template = highlight(raw, {
wrapper: "<ul>$1</ul>",
item: "<li>$1</li>",
text: "$1",
highlight: "<b>$1</b>"
});
document.body.appendChild(template);
// resolve result for further processing
const result = collapse(raw);
```
Alternatively:
```js
const template = highlight(raw, {
wrapper: function(){
const wrapper = document.createElement("ul");
return wrapper;
},
item: function(wrapper){
const item = document.createElement("li");
wrapper.append(item);
},
text: function(item, content){
const node = document.createTextNode(content);
item.append(node);
},
highlight: function(item, content){
const node = document.createElement("b");
node.textContent = content;
item.append(node);
}
});
document.body.appendChild(template);
```
-->
### Custom Resolver
```js
function CustomResolver(raw){
// console.log(raw)
let output;
// generate output ...
return output;
}
const result = index.search("a short query", {
resolve: CustomResolver
});
```

View File

@@ -0,0 +1,62 @@
## Result Highlighting
Result highlighting can only be enabled when using a Document-Index with an enabled Data-Store. Even when you just want to add id-content-pairs, you'll need to use a DocumentIndex for this feature (just define a simple document descriptor as shown below).
```js
import { Document, Charset } from "flexsearch";
// create the document index
const index = new Document({
document: {
store: true,
index: [{
field: "title",
tokenize: "forward",
encoder: Charset.LatinBalance
}]
}
});
// add data
index.add({
"id": 1,
"title": "Carmencita"
});
index.add({
"id": 2,
"title": "Le clown et ses chiens"
});
// perform a query
const result = index.search({
query: "karmen or clown or not found",
suggest: true,
// set enrich to true (required)
enrich: true,
// highlight template
// $1 is a placeholder for the matched partial
highlight: "<b>$1</b>"
});
```
The result will look like:
```js
[{
"field": "title",
"result": [{
"id": 1,
"doc": {
"id": 1,
"title": "Carmencita"
},
"highlight": "<b>Carmen</b>cita"
},{
"id": 2,
"doc": {
"id": 2,
"title": "Le clown et ses chiens"
},
"highlight": "Le <b>clown</b> et ses chiens"
}
]
}]
```

134
doc/worker.md Normal file
View File

@@ -0,0 +1,134 @@
## External Worker Configuration
When using Worker while __also__ assigning custom functions to the options, e.g.:
- Custom Encoder
- Custom Encoder methods (normalize, prepare, finalize)
- Custom Score (function)
- Custom Filter (function)
- Custom Fields (function)
... then you'll need to move your __field configuration__ into a file which exports the configuration as a `default` export. The field configuration is not the whole Document-Descriptor.
When not using custom functions in combination with Worker, you can skip this part.
Since every field resolves into a dedicated Worker, every field which includes custom functions needs its own configuration file accordingly.
Let's take this document descriptor:
```js
{
document: {
index: [{
// this is the field configuration
// ---->
field: "custom_field",
custom: function(data){
return "custom field content";
}
// <------
}]
}
};
```
The configuration which needs to be available as a default export is:
```js
{
field: "custom_field",
custom: function(data){
return "custom field content";
}
};
```
You're welcome to make suggestions on how to improve the handling of external configuration.
### Example Node.js:
An external configuration for one WorkerIndex, let's assume it is located in `./custom_field.js`:
```js
const { Charset } = require("flexsearch");
const { LatinSimple } = Charset;
// it requires a default export:
module.exports = {
encoder: LatinSimple,
tokenize: "forward",
// custom function:
custom: function(data){
return "custom field content";
}
};
```
Create the Worker Index with the configuration above:
```js
const { Document } = require("flexsearch");
const flexsearch = new Document({
worker: true,
document: {
index: [{
// the field name needs to be set here
field: "custom_field",
// path to your config from above:
config: "./custom_field.js",
}]
}
});
```
### Browser (ESM)
An external configuration for one WorkerIndex, let's assume it is located in `./custom_field.js`:
```js
import { Charset } from "./dist/flexsearch.bundle.module.min.js";
const { LatinSimple } = Charset;
// it requires a default export:
export default {
encoder: LatinSimple,
tokenize: "forward",
// custom function:
custom: function(data){
return "custom field content";
}
};
```
Create Worker Index with the configuration above:
```js
import { Document } from "./dist/flexsearch.bundle.module.min.js";
// you will need to await the response!
const flexsearch = await new Document({
worker: true,
document: {
index: [{
// the field name needs to be set here
field: "custom_field",
// Absolute URL to your config from above:
config: "http://localhost/custom_field.js"
}]
}
});
```
Here it needs the __absolute URL__, because the WorkerIndex context is of type `Blob` and you can't use relative URLs from within this context.
### Test Case
As a test, the whole IMDB data collection was indexed, consisting of:
JSON Documents: 9,273,132<br>
Fields: 83,458,188<br>
Tokens: 128,898,832<br>
The index configuration used has 2 fields (with a bidirectional context of `depth: 1`), 1 custom field, 2 tags and a full datastore of all input JSON documents.
A non-Worker Document index requires 181 seconds to index all contents.<br>
The Worker index takes just 32 seconds to index them all, by processing every field and tag in parallel. For such a large amount of content this is quite an impressive result.
### CSP-friendly Worker (Browser)
When just using a worker by passing the option `worker: true`, the worker will be created via code generation under the hood. This might cause issues with strict CSP settings.
You can overcome this issue by passing the file path to the worker file, like `worker: "./worker.js"`. The original worker file is located at `src/worker/worker.js`.

2
index.d.ts vendored
View File

@@ -209,8 +209,8 @@ declare module "flexsearch" {
}>;
type SearchResults =
DefaultSearchResults |
IntermediateSearchResults |
EnrichedSearchResults |
Resolver |
Promise<SearchResults>;
/**

View File

@@ -4,12 +4,6 @@
"name": "flexsearch-clickhouse",
"version": "0.1.0",
"main": "index.js",
"scripts": {},
"files": [
"index.js",
"README.md",
"LICENSE"
],
"dependencies": {
"clickhouse": "^2.6.0"
}

View File

@@ -4,12 +4,6 @@
"name": "flexsearch-mongodb",
"version": "0.1.0",
"main": "index.js",
"scripts": {},
"files": [
"index.js",
"README.md",
"LICENSE"
],
"dependencies": {
"mongodb": "^6.13.0"
}

View File

@@ -4,12 +4,6 @@
"name": "flexsearch-postgres",
"version": "0.1.0",
"main": "index.js",
"scripts": {},
"files": [
"index.js",
"README.md",
"LICENSE"
],
"dependencies": {
"pg-promise": "^11.10.2"
}

View File

@@ -4,12 +4,6 @@
"name": "flexsearch-redis",
"version": "0.1.0",
"main": "index.js",
"scripts": {},
"files": [
"index.js",
"README.md",
"LICENSE"
],
"dependencies": {
"redis": "^4.7.0"
}

View File

@@ -4,12 +4,6 @@
"name": "flexsearch-sqlite",
"version": "0.1.0",
"main": "index.js",
"scripts": {},
"files": [
"index.js",
"README.md",
"LICENSE"
],
"dependencies": {
"sqlite3": "^5.1.7"
}