mirror of https://github.com/nextapps-de/flexsearch.git synced 2025-09-03 10:53:41 +02:00

improved intersection strategy

Thomas Wilkerling
2021-06-02 10:57:18 +02:00
parent 8b9be00251
commit d1491da259
12 changed files with 825 additions and 708 deletions


@@ -128,7 +128,6 @@ These profiles are covering standard use cases. It is recommended to apply custo
- New memory-friendly strategy for indexes (switchable, saves up to 50% of memory for each index, slightly decrease performance)
- Better scoring calculation (one of the biggest concerns of the old implementation was that the order of arrays processed in the intersection has affected the order of relevance in the final result)
- Fix resolution (the resolution in the old implementation was not fully stretched through the whole range in some cases)
- Fix threshold (the threshold in the old implementation often has almost no effect, especially when using contextual index)
- Skip words (optionally, automatically skip words from the context chain which are too short)
- Hugely improves performance of long queries (up to 450x faster!) and also memory allocation (up to 250x less memory)
- New fast-update strategy (optionally, hugely improves performance of all updates and removals of indexed contents up to 2850x)
@@ -150,7 +149,6 @@ These profiles are covering standard use cases. It is recommended to apply custo
- Enhanced Field Search
- Improved sorting by relevance (score)
- Added Context Scoring (context index has its own resolution)
- Extern Stores
- Enhanced charset normalization
- Improved bundler (support for inline WebWorker)
@@ -166,7 +164,6 @@ A full configuration example for a context-based index:
var index = new Index({
tokenize: "strict",
resolution: 9,
threshold: 0,
minlength: 3,
optimize: "memory",
fastupdate: true,
@@ -174,13 +171,12 @@ var index = new Index({
context: {
depth: 1,
resolution: 3,
threshold: 0,
bidirectional: true
}
});
```
The parameters `resolution` and `threshold` could be also set independently for the contextual index also, e.g. set those values more aggressive on contextual index only.
The `resolution` could be set also for the contextual index.
A full configuration example for a document based index:
@@ -189,7 +185,6 @@ const index = new Document({
tokenize: "forward",
optimize: "memory",
resolution: 9,
threshold: 0,
cache: 100,
worker: true,
document: {
@@ -202,19 +197,16 @@ const index = new Document({
field: "title",
tokenize: "forward",
optimize: "memory",
resolution: 9,
threshold: 0
resolution: 9
},{
field: "content",
tokenize: "strict",
optimize: "memory",
resolution: 9,
threshold: 3,
minlength: 3,
context: {
depth: 1,
resolution: 3,
threshold: 0
resolution: 3
}
}]
}
@@ -452,19 +444,16 @@ const index = new Document({
field: "title",
tokenize: "forward",
optimize: "memory",
resolution: 9,
threshold: 0
resolution: 9
},{
field: "content",
tokenize: "strict",
optimize: "memory",
resolution: 9,
threshold: 3,
minlength: 3,
context: {
depth: 1,
resolution: 3,
threshold: 0
resolution: 3
}
}]
});
@@ -477,7 +466,6 @@ const index = new Document({
tokenize: "strict",
optimize: "memory",
resolution: 9,
threshold: 0,
document: {
key: "id",
index:[{
@@ -485,19 +473,17 @@ const index = new Document({
tokenize: "forward"
},{
field: "content",
threshold: 3,
minlength: 3,
context: {
depth: 1,
resolution: 3,
threshold: 0
resolution: 3
}
}]
}
});
```
Note: The context options from the field "content" also gets inherited by the corresponding field options, whereas this field options was inherited by the global option (so threshold would be "3" if not set in context options).
Note: The context options from the field "content" also gets inherited by the corresponding field options, whereas this field options was inherited by the global option.
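The inheritance order in the note can be pictured as two merge steps: global options flow into the field options, and the resolved field options flow into the context options. The sketch below only models that precedence with plain object spreading. The helper `inherit` is hypothetical and not part of the FlexSearch API, and the assumption that every unset key falls through to the next level is an illustration, not a statement about the library's internals.

```js
// Hypothetical helper (not a FlexSearch function): values set on the child
// win, everything else falls back to the parent.
function inherit(parent, child) {
  return { ...parent, ...child };
}

// Option values taken from the example above.
const globalOptions = { tokenize: "strict", optimize: "memory", resolution: 9 };
const fieldOptions = { minlength: 3 };
const contextOptions = { depth: 1, resolution: 3 };

// Step 1: the field "content" inherits from the global options.
const field = inherit(globalOptions, fieldOptions);

// Step 2: the context inherits from the resolved field options.
const context = inherit(field, contextOptions);

console.log(field.resolution);   // 9, taken from the global options
console.log(context.resolution); // 3, set explicitly on the context
console.log(context.minlength);  // 3, taken from the field options
```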
### Nested Data Fields
@@ -1065,28 +1051,7 @@ Your results are now looking like:
Both field "author" and "email" are not indexed.
### Extern Stores
When the data already exist in your application runtime, then you did not need to add those to the store again. You can assign your data as "extern store":
```js
const data = [{ ... }, { ... }, { ... }];
const index = new Document({
document: {
index: "content",
extern: data // <--- extern store
}
});
index.add(data[0]);
```
> Entries from an extern store are not being managed/changed automatically by FlexSearch. When removing items from the index, the corresponding data item from the extern dataset stays untouched. Please consider, using the method `index.set(id, data)` will change extern stores also.
When you didn't use the data anywhere in your application (just for searching) then it is better to use an internal store and just select fields you need in the results, which costs you less memory.
## WebWorker
## Worker Parallelism (Browser + Node.js)
The whole worker implementation has changed by also keeping Node.js support in mind. The good news is worker will also get supported by Node.js by the library.
@@ -1217,7 +1182,21 @@ index.searchAsync(query).then(callback);
index.searchAsync(query).then(callback);
```
When using `await` you can prioritize the order ("first task completed") and solve requests one by one and process sub-tasks in parallel:
Or when you have just one callback when all requests are done, simply use `Promise.all()` which also prioritize "all tasks completed":
```js
Promise.all([
index.searchAsync(query).then(callback),
index.searchAsync(query).then(callback),
index.searchAsync(query).then(callback)
]).then(callback);
```
Inside the callback of `Promise.all()` you will also get an array of results as the first parameter respectively for each query you put into.
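To make the ordering of that result array concrete, here is a minimal sketch. The `index` object is a stand-in written for this example, not a real FlexSearch index. It only assumes that `searchAsync(query)` returns a Promise that resolves to an array of ids. The point is that `results[i]` always belongs to the i-th query, no matter which request finishes first.

```js
// Stand-in index (assumption): mimics only the shape of `searchAsync`.
const data = { 1: "foo bar", 2: "bar baz", 3: "foo baz" };
const index = {
  searchAsync(query) {
    const ids = Object.keys(data)
      .filter((id) => data[id].includes(query))
      .map(Number);
    return Promise.resolve(ids);
  }
};

const queries = ["foo", "bar", "qux"];

// Resolves once every search has finished; the array keeps the order in
// which the promises were passed, not the order in which they completed.
const allDone = Promise.all(queries.map((query) => index.searchAsync(query)));

allDone.then((results) => {
  results.forEach((ids, i) => console.log(queries[i], ids));
});
```

If any single request rejects, `Promise.all()` rejects as a whole, so add a `.catch()` when partial results are acceptable.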
When using `await` you can prioritize the order (prioritize "first task completed") and solve requests one by one and just process the sub-tasks in parallel:
```js
await index.searchAsync(query);
@@ -1545,7 +1524,6 @@ index = new Index({
tokenize: "strict",
optimize: "memory",
resolution: 1,
threshold: 0,
minlength: 3,
fastupdate: false,
context: false
@@ -1555,28 +1533,25 @@ index = new Index({
### Compare Impact of Memory Allocation
by default a lexical index is very small:<br>
`depth: 0, bidirectional: 0, resolution: 3, threshold: 0, minlength: 0` => 2.1 Mb
`depth: 0, bidirectional: 0, resolution: 3, minlength: 0` => 2.1 Mb
a higher resolution will increase the memory allocation:<br>
`depth: 0, bidirectional: 0, resolution: 9, threshold: 0, minlength: 0` => 2.9 Mb
`depth: 0, bidirectional: 0, resolution: 9, minlength: 0` => 2.9 Mb
using the contextual index will increase the memory allocation:<br>
`depth: 1, bidirectional: 0, resolution: 9, threshold: 0, minlength: 0` => 12.5 Mb
`depth: 1, bidirectional: 0, resolution: 9, minlength: 0` => 12.5 Mb
a higher contextual depth will increase the memory allocation:<br>
`depth: 2, bidirectional: 0, resolution: 9, threshold: 0, minlength: 0` => 21.5 Mb
`depth: 2, bidirectional: 0, resolution: 9, minlength: 0` => 21.5 Mb
a higher minlength will decrease memory allocation:<br>
`depth: 2, bidirectional: 0, resolution: 9, threshold: 0, minlength: 3` => 19.0 Mb
`depth: 2, bidirectional: 0, resolution: 9, minlength: 3` => 19.0 Mb
using bidirectional will decrease memory allocation:<br>
`depth: 2, bidirectional: 1, resolution: 9, threshold: 0, minlength: 3` => 17.9 Mb
a higher threshold will decrease memory allocation:<br>
`depth: 2, bidirectional: 1, resolution: 9, threshold: 5, minlength: 3` => 5.8 Mb
`depth: 2, bidirectional: 1, resolution: 9, minlength: 3` => 17.9 Mb
enable the option "fastupdate" will increase memory allocation:<br>
`depth: 2, bidirectional: 1, resolution: 9, threshold: 5, minlength: 3` => 6.3 Mb
`depth: 2, bidirectional: 1, resolution: 9, minlength: 3` => 6.3 Mb
### Full Comparison Table
@@ -1608,14 +1583,6 @@ FlexSearch provides you many parameters you can use to adjust the optimal balanc
<td>+2 (per level)</td>
</tr>
<tr></tr>
<tr>
<td>threshold</td>
<td>-4 (per level)</td>
<td>-3 (per level)</td>
<td>+2 (per level)</td>
<td>0</td>
</tr>
<tr></tr>
<tr>
<td>depth</td>
<td>+4 (per level)</td>

File diff suppressed because one or more lines are too long
