# FlexSearch v0.8 (Preview)
```bash
npm install git+https://github.com/nextapps-de/flexsearch.git#v0.8-preview
```
## What's New
- Persistent indexes support for: `IndexedDB` (Browser), `Redis`, `SQLite`, `Postgres`, `MongoDB`, `Clickhouse`
- Enhanced language customization via the new `Encoder` class
- Single-term search is up to 7 times faster; the overall benchmark score has doubled
- Enhanced support for larger indexes or larger result sets
- Improved offset and limit processing achieves up to 100 times faster traversal through large datasets
- Support for larger In-Memory indexes with extended key size (the default maximum keystore limit is 2^24)
- Greatly enhanced performance of the whole text encoding pipeline
- Improved indexing of numeric content (Triplets)
- Intermediate result sets and `Resolver`
- Basic Resolver: `and`, `or`, `xor`, `not`, `limit`, `offset`, `enrich`, `resolve`, Output formatter
- Improved charset collection
- New charset preset `soundex` which further reduces memory consumption while also increasing "fuzziness"
- Performance gain when polling tasks to the index by using "Event-Loop-Caches"
- Up to 100 times faster deletion/replacement when not using the additional "fastupdate" register
- Regex Pre-Compilation (transforms hundreds of regex rules into just a few)
- Extended support for multiple tags (DocumentIndex)
- Custom Fields ("Virtual Fields")
- Custom Filter
- Custom Score Function
- Added French language preset (stop-word filter, stemmer)
- Enhanced Worker Support
- Improved Build System + Bundler (Supported: CommonJS, ESM, Global Namespace)
- Fully covering `index.d.ts` type definitions
Compare Benchmark: [0.7.0](https://nextapps-de.github.io/flexsearch/test/flexsearch-0.7.0/) vs. [0.8.0](https://nextapps-de.github.io/flexsearch/test/flexsearch-0.8.0/)
## Persistent Indexes
FlexSearch provides a new storage adapter through which indexes are delegated to persistent storage.
Supported:
- [IndexedDB (Browser)](db/indexeddb/)
- [Redis](db/redis/)
- [SQLite](db/sqlite/)
- [Postgres](db/postgres/)
- [MongoDB](db/mongo/)
- [Clickhouse](db/clickhouse/)
The `.export()` and `.import()` methods are still available for non-persistent In-Memory indexes.
All search capabilities are available on persistent indexes like:
- Context-Search
- Suggestions
- Cursor-based Queries (Limit/Offset)
- Scoring (supports a resolution of up to 32767 slots)
- Document-Search
- Partial Search
- Multi-Tag-Search
- Boost Fields
- Custom Encoder
- Resolver
- Tokenizer (Strict, Forward, Reverse, Full)
- Document Store (incl. enrich results)
- Worker Threads to run in parallel
- Auto-Balanced Cache (top queries + last queries)
All persistent variants are optimized for larger indexes under heavy workload. Almost every task is streamlined to run in batch/parallel, getting the most out of the selected database engine. Whereas the In-Memory index can't share its data between different nodes when running in a cluster, every persistent storage can handle this by default.
### Example
```js
import FlexSearchIndex from "./index.js";
import Database from "./db/indexeddb/index.js";
// create an index
const index = new FlexSearchIndex();
// create db instance with optional prefix
const db = new Database("my-store");
// mount and await before transferring data
await index.mount(db);
// update the index as usual
index.add(1, "content...");
index.update(2, "content...");
index.remove(3);
// changes are automatically committed by default
// when you need to wait for task completion, you
// can use the commit method explicitly:
await index.commit();
```
Alternatively, mount a store at index creation:
```js
const index = new FlexSearchIndex({
db: new Database("my-store")
});
// await the db response before the first access
await index.db;
// apply changes to the index
// ...
```
Query against a persistent storage just as usual:
```js
const result = await index.search("gulliver");
```
Auto-Commit is enabled by default and will process changes asynchronously in batch.
You can fully disable the auto-commit feature and perform them manually:
```js
const index = new FlexSearchIndex({
db: new Database("my-store"),
commit: false
});
// update the index
index.add(1, "content...");
index.update(2, "content...");
index.remove(3);
// transfer all changes to the db
await index.commit();
```
You can also call the commit method manually, even when the `commit: true` option is set.
### Benchmark
The benchmark was measured in "terms per second".
| Store | Add | Search 1 | Search N | Replace | Remove | Not Found | Scaling |
|:---|---:|---:|---:|---:|---:|---:|:---|
| IndexedDB | 123,298 | 83,823 | 62,370 | 57,410 | 171,053 | 425,744 | No |
| Redis | 1,566,091 | 201,534 | 859,463 | 117,013 | 129,595 | 875,526 | Yes |
| SQLite | 269,812 | 29,627 | 129,735 | 174,445 | 1,406,553 | 122,566 | No |
| Postgres | 354,894 | 24,329 | 76,189 | 324,546 | 3,702,647 | 50,305 | Yes |
| MongoDB | 515,938 | 19,684 | 81,558 | 243,353 | 485,192 | 67,751 | Yes |
| Clickhouse | 1,436,992 | 11,507 | 22,196 | 931,026 | 3,276,847 | 16,644 | Yes |
__Search 1:__ Single term query
__Search N:__ Multi term query (Context-Search)
The benchmark was executed against a single client.
## Encoder
Search capabilities highly depend on language processing. The old workflow wasn't really practicable. The new `Encoder` class is a huge improvement and fully replaces the encoding part. Some FlexSearch options were moved to the new `Encoder` instance.
New Encoding Pipeline:
1. charset normalization
2. custom preparation
3. split into terms (apply includes/excludes)
4. filter (pre-filter)
5. matcher (substitute terms)
6. stemmer (substitute term endings)
7. filter (post-filter)
8. replace chars (mapper)
9. custom regex (replacer)
10. letter deduplication
11. apply finalize
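The pipeline stages can be pictured as a plain standalone function. This is a simplified illustration with made-up rule sets, not FlexSearch's internal implementation; steps 2, 7, 9 and 11 are omitted for brevity:

```javascript
// Simplified standalone sketch of the encoding pipeline.
// All rule sets below are illustrative examples only.
function encodePipeline(str){
    // 1. charset normalization
    str = str.normalize("NFKD").toLowerCase();
    // 3. split into terms (include letters + numbers)
    let terms = str.split(/[^a-z0-9]+/).filter(t => t.length);
    // 4. pre-filter (stop-words)
    const filter = new Set(["and", "the"]);
    terms = terms.filter(t => !filter.has(t));
    // 5. matcher (substitute whole terms)
    const matcher = new Map([["xvi", "16"]]);
    terms = terms.map(t => matcher.get(t) || t);
    // 6. stemmer (substitute term endings)
    terms = terms.map(t => t.replace(/ly$/, ""));
    // 8. replace chars (mapper)
    const mapper = new Map([["é", "e"]]);
    terms = terms.map(t => [...t].map(c => mapper.get(c) || c).join(""));
    // 10. letter deduplication (collapse repeats)
    return terms.map(t => t.replace(/(.)\1+/g, "$1"));
}
```

For example, `encodePipeline("Slowly and Surely")` yields `["slow", "sure"]`: the stop-word is filtered out and the illustrative stemmer rule strips the "ly" endings.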
### Example
```js
const encoder = new Encoder({
normalize: true,
dedupe: true,
cache: true,
include: {
letter: true,
number: true,
symbol: false,
punctuation: false,
control: false,
char: "@"
}
});
```
```js
const encoder = new Encoder({
normalize: function(str){
return str.toLowerCase();
},
prepare: function(str){
return str.replace(/&/g, " and ");
},
exclude: {
letter: false,
number: false,
symbol: true,
punctuation: true,
control: true
}
});
```
Define language specific transformations:
```js
const encoder = new Encoder({
replacer: [
/[´`’ʼ]/g, "'"
],
filter: new Set([
"and",
]),
matcher: new Map([
["xvi", "16"]
]),
stemmer: new Map([
["ly", ""]
]),
mapper: new Map([
["é", "e"]
])
});
```
Or use predefined language and add custom options:
```js
import EnglishBookPreset from "./lang/en.js";
const encoder = new Encoder({
assign: EnglishBookPreset,
filter: false
});
```
Equivalent:
```js
import EnglishBookPreset from "./lang/en.js";
const encoder = new Encoder(EnglishBookPreset);
encoder.assign({ filter: false });
```
Assign extensions to the encoder instance:
```js
import LatinEncoder from "./lang/latin/simple.js";
import EnglishBookPreset from "./lang/en.js";
// stack definitions to the encoder instance
const encoder = new Encoder()
.assign(LatinEncoder)
.assign(EnglishBookPreset)
// override preset options ...
.assign({ minlength: 3 });
// assign further presets ...
```
Add custom transformations to an existing index:
```js
import LatinEncoder from "./lang/latin/default.js";
const encoder = new Encoder(LatinEncoder);
encoder.addReplacer(/[´`’ʼ]/g, "'");
encoder.addFilter("and");
encoder.addMatcher("xvi", "16");
encoder.addStemmer("ly", "");
encoder.addMapper("é", "e");
```
## Resolver
Retrieve an unresolved result:
```js
const raw = index.search("a short query", {
resolve: false
});
```
You can apply and chain different resolver methods to the raw result, e.g.:
```js
raw.and( ... )
.and( ... )
.boost(2)
.or( ... , ... )
.limit(100)
.xor( ... )
.not( ... )
// final resolve
.resolve({
limit: 10,
offset: 0,
enrich: true
});
```
The default resolver:
```js
const raw = index.search("a short query", {
resolve: false
});
const result = raw.resolve();
```
Or use declaration style:
```js
import Resolver from "./resolver.js";
const raw = new Resolver({
index: index,
query: "a short query"
});
const result = raw.resolve();
```
### Chainable Boolean Operations
The basic concept explained:
```js
// 1. get one or multiple unresolved results
const raw1 = index.search("a short query", {
resolve: false
});
const raw2 = index.search("another query", {
resolve: false,
boost: 2
});
// 2. apply and chain resolver operations
const raw3 = raw1.and(raw2, /* ... */);
// you can access the aggregated result by raw3.result
console.log("The aggregated result is:", raw3.result)
// apply further operations ...
// 3. resolve final result
const result = raw3.resolve({
limit: 100,
offset: 0
});
console.log("The final result is:", result)
```
Use inline queries:
```js
const result = index.search("further query", {
// set resolve to false on the first query
resolve: false,
boost: 2
})
.or( // union
index.search("a query")
.and( // intersection
index.search("another query", {
boost: 2
})
)
)
.not( // exclusion
index.search("some query")
)
// resolve the result
.resolve({
limit: 100,
offset: 0
});
```
Or use a fully declarative style (also recommended when run in parallel):
```js
import Resolver from "./resolver.js";
const result = new Resolver({
index: index,
query: "further query",
boost: 2
})
.or({
and: [{ // inner expression
index: index,
query: "a query"
},{
index: index,
query: "another query",
boost: 2
}]
})
.not({ // exclusion
index: index,
query: "some query"
})
.resolve({
limit: 100,
offset: 0
});
```
When all queries are made against the same index, you can skip the index in every declaration following the initial `new Resolver()` call:
```js
import Resolver from "./resolver.js";
const result = new Resolver({
index: index,
query: "a query"
})
.and({ query: "another query", boost: 2 })
.or ({ query: "further query", boost: 2 })
.not({ query: "some query" })
.resolve(100);
```
### Custom Result Decoration
```js
import highlight from "./resolve/highlight.js";
import collapse from "./resolve/collapse.js";
const raw = index.search("a short query", {
resolve: false
});
// resolve result for display
const template = highlight(raw, {
wrapper: "<ul>$1</ul>",
item: "<li>$1</li>",
text: "$1",
highlight: "<b>$1</b>"
});
document.body.appendChild(template);
// resolve result for further processing
const result = collapse(raw);
```
Alternatively:
```js
const template = highlight(raw, {
wrapper: function(){
const wrapper = document.createElement("ul");
return wrapper;
},
item: function(wrapper){
const item = document.createElement("li");
wrapper.append(item);
},
text: function(item, content){
const node = document.createTextNode(content);
item.append(node);
},
highlight: function(item, content){
const node = document.createElement("b");
node.textContent = content;
item.append(node);
}
});
document.body.appendChild(template);
```
### Custom Resolver
```js
function CustomResolver(raw){
// console.log(raw)
let output;
// generate output ...
return output;
}
const result = index.search("a short query", {
resolve: CustomResolver
});
```
## Big In-Memory Keystores
The default maximum keystore limit for the In-Memory index is 2^24 distinct terms/partials being stored (the so-called "cardinality"). An additional register can be enabled which divides the index into self-balanced partitions.
```js
const index = new FlexSearchIndex({
// e.g. set keystore range to 8-Bit:
// 2^8 * 2^24 = 2^32 keys total
keystore: 8
});
```
You can theoretically store up to 2^88 keys (64-bit address range).
The internal ID arrays scale automatically via Proxy when the limit of 2^31 is reached.
> Persistent storages have no keystore limit by default. You should not enable the keystore when using persistent indexes, as long as you do not stress the buffer too hard before calling `index.commit()`.
## Multi-Tag-Search
Assume this document schema (a dataset from IMDB):
```js
{
"tconst": "tt0000001",
"titleType": "short",
"primaryTitle": "Carmencita",
"originalTitle": "Carmencita",
"isAdult": 0,
"startYear": "1894",
"endYear": "",
"runtimeMinutes": "1",
"genres": [
"Documentary",
"Short"
]
}
```
An appropriate document descriptor could look like:
```js
import LatinEncoder from "./lang/latin/simple.js";
const flexsearch = new Document({
encoder: new Encoder(LatinEncoder),
resolution: 3,
document: {
id: "tconst",
//store: true, // document store
index: [{
field: "primaryTitle",
tokenize: "forward"
},{
field: "originalTitle",
tokenize: "forward"
}],
tag: [
"startYear",
"genres"
]
}
});
```
The field contents of `primaryTitle` and `originalTitle` are indexed using the forward tokenizer. The field contents of `startYear` and `genres` are added as tags.
Get all entries of a specific tag:
```js
const result = flexsearch.search({
//enrich: true, // enrich documents
tag: { "genres": "Documentary" },
limit: 1000,
offset: 0
});
```
Get entries of multiple tags (intersection):
```js
const result = flexsearch.search({
//enrich: true, // enrich documents
tag: {
"genres": ["Documentary", "Short"],
"startYear": "1894"
}
});
```
Combine tags with queries (intersection):
```js
const result = flexsearch.search({
query: "Carmen", // forward tokenizer
tag: {
"genres": ["Documentary", "Short"],
"startYear": "1894"
}
});
```
Alternative declaration:
```js
const result = flexsearch.search("Carmen", {
tag: [{
field: "genres",
tag: ["Documentary", "Short"]
},{
field: "startYear",
tag: "1894"
}]
});
```
## Filter Fields (Index / Tags / Datastore)
```js
const flexsearch = new Document({
document: {
id: "id",
index: [{
// custom field:
field: "somefield",
filter: function(data){
// return false to filter out
// return anything else to keep
return true;
}
}],
tag: [{
field: "city",
filter: function(data){
// return false to filter out
// return anything else to keep
return true;
}
}],
store: [{
field: "anotherfield",
filter: function(data){
// return false to filter out
// return anything else to keep
return true;
}
}]
}
});
```
## Custom Fields (Index / Tags / Datastore)
Dataset example:
```js
{
"id": 10001,
"firstname": "John",
"lastname": "Doe",
"city": "Berlin",
"street": "Alexanderplatz",
"number": "1a",
"postal": "10178"
}
```
You can apply custom fields derived from data or by anything else:
```js
const flexsearch = new Document({
document: {
id: "id",
index: [{
// custom field:
field: "fullname",
custom: function(data){
// return custom string
return data.firstname + " " +
data.lastname;
}
},{
// custom field:
field: "location",
custom: function(data){
return data.street + " " +
data.number + ", " +
data.postal + " " +
data.city;
}
}],
tag: [{
// existing field
field: "city"
},{
// custom field:
field: "category",
custom: function(data){
let tags = [];
// push one or multiple tags
// ....
return tags;
}
}],
store: [{
field: "anotherfield",
custom: function(data){
// return a falsy value to filter out
// return anything else to keep in store
return data;
}
}]
}
});
```
> Filtering is also available in custom functions by returning `false`.
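For instance, a custom field function can combine both behaviors. The following is a hypothetical sketch (the field shape mirrors the `fullname` example above):

```javascript
// Hypothetical sketch: a custom field function that also filters.
// Returning false drops the entry; any other return value is
// used as the field content.
function fullnameOrSkip(data){
    if(!data.firstname || !data.lastname){
        return false; // filter out incomplete entries
    }
    return data.firstname + " " + data.lastname;
}
```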
Perform a query against the custom field as usual:
```js
const result = flexsearch.search({
query: "10178 Berlin Alexanderplatz",
field: "location"
});
```
```js
const result = flexsearch.search({
query: "john doe",
tag: { "city": "Berlin" }
});
```
## Custom Score Function
```js
const index = new FlexSearchIndex({
resolution: 10,
score: function(content, term, term_index, partial, partial_index){
// you'll need to return a number between 0 and "resolution"
// the score starts at 0, which is the highest score
// for a resolution of 10 you can return 0 - 9
// ...
return 3;
}
});
```
A common situation is having predefined labels which relate to some kind of order, e.g. importance or priority. A priority label could be `high`, `moderate` or `low`, so you can derive the scoring from those properties. Another example is when you already have an ordering and would like to keep this order as relevance.
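A hypothetical sketch of that label-based approach (the mapping would be used inside the `score` function; all names here are illustrative):

```javascript
// Map priority labels to score slots for a resolution of 10
// (slot 0 = highest relevance, slot 9 = lowest).
const priority_slot = { high: 0, moderate: 4, low: 9 };
function scoreByPriority(label){
    // unknown labels fall back to the lowest relevance slot
    return priority_slot[label] ?? 9;
}
```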
The parameters from the score function explained:
1. `content` is the whole content as an array of terms (encoded)
2. `term` is the current term which is actually processed (encoded)
3. `term_index` is the index of the term in the content array
4. `partial` is the current partial of a term which is actually processed
5. `partial_index` is the index position of the partial within the term
Partial params are empty when using the tokenizer `strict`. Let's take an example using the tokenizer `full`.
The content: "This is an ex**amp**le of partial encoding"
The highlighted part marks the partial which is currently processed. Your score function will then be called with these parameters:
```js
function score(content, term, term_index, partial, partial_index){
// content       = ["this", "is", "an", "example", "of", "partial", "encoding"]
// term          = "example"
// term_index    = 3
// partial       = "amp"
// partial_index = 2
}
```
## Merge Document Results
By default, the result set of Field-Search has a structure grouped by field names:
```js
[{
field: "fieldname-1",
result: [{
id: 1001,
doc: {/* stored document */}
}]
},{
field: "fieldname-2",
result: [{
id: 1001,
doc: {/* stored document */}
}]
},{
field: "fieldname-3",
result: [{
id: 1002,
doc: {/* stored document */}
}]
}]
```
By passing the search option `merge: true` the result set will be merged into:
```js
[{
id: 1001,
doc: {/* stored document */}
field: ["fieldname-1", "fieldname-2"]
},{
id: 1002,
doc: {/* stored document */}
field: ["fieldname-3"]
}]
```
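The merge can be pictured as a plain transformation of the grouped result. This is a standalone sketch of the shape change, not the library's internal code:

```javascript
// Standalone sketch: collapse field-wise results into one
// entry per document id, collecting the matching field names.
function mergeResults(grouped){
    const byId = new Map();
    for(const { field, result } of grouped){
        for(const { id, doc } of result){
            let entry = byId.get(id);
            if(!entry){
                entry = { id, doc, field: [] };
                byId.set(id, entry);
            }
            entry.field.push(field);
        }
    }
    return [...byId.values()];
}
```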
## Extern Worker Configuration
When using Worker and __also__ assigning custom functions to the options, e.g.:
- Custom Encoder
- Custom Encoder methods (normalize, prepare, finalize)
- Custom Score (function)
- Custom Filter (function)
- Custom Fields (function)
... then you'll need to move your __field configuration__ into a file which exports the configuration as a `default` export. The field configuration is not the whole Document-Descriptor.
When not using custom functions in combination with Worker you can skip this part.
Since every field resolves into a dedicated Worker, every field which includes custom functions needs its own configuration file accordingly.
Let's take this document descriptor:
```js
{
document: {
index: [{
// this is the field configuration
// ---->
field: "custom_field",
custom: function(data){
return "custom field content";
}
// <------
}]
}
};
```
The configuration which needs to be available as a default export is:
```js
{
field: "custom_field",
custom: function(data){
return "custom field content";
}
};
```
You're welcome to make suggestions on how to improve the handling of extern configurations.
### Example Node.js
An extern configuration for one WorkerIndex; let's assume it is located in `./custom_field.js`:
```js
const { Charset } = require("flexsearch");
const EncoderPreset = Charset["latin:simple"];
// it requires a default export:
module.exports = {
encoder: EncoderPreset,
tokenize: "forward",
// custom function:
custom: function(data){
return "custom field content";
}
};
```
Create a Worker index using the configuration above:
```js
const { Document } = require("flexsearch");
const flexsearch = new Document({
worker: true,
document: {
index: [{
// the field name needs to be set here
field: "custom_field",
// path to your config from above:
config: "./custom_field.js",
}]
}
});
```
### Browser (ESM)
An extern configuration for one WorkerIndex; let's assume it is located in `./custom_field.js`:
```js
import { Charset } from "./dist/flexsearch.bundle.module.min.js";
const EncoderPreset = Charset["latin:simple"];
// it requires a default export:
export default {
encoder: EncoderPreset,
tokenize: "forward",
// custom function:
custom: function(data){
return "custom field content";
}
};
```
Create a Worker index with the configuration above:
```js
import { Document } from "./dist/flexsearch.bundle.module.min.js";
// you will need to await the response!
const flexsearch = await new Document({
worker: true,
document: {
index: [{
// the field name needs to be set here
field: "custom_field",
// Absolute URL to your config from above:
config: "http://localhost/custom_field.js"
}]
}
});
```
Here the __absolute URL__ is required, because the WorkerIndex context is of type `Blob` and relative URLs can't be used from within this context.
### Test Case
As a test, the whole IMDB data collection was indexed, consisting of:
- JSON documents: 9,273,132
- Fields: 83,458,188
- Tokens: 128,898,832
The index configuration used has 2 fields (with a bidirectional context of `depth: 1`), 1 custom field, 2 tags and a full datastore of all input JSON documents.
A non-Worker Document index requires 181 seconds to index all contents.
The Worker index takes just 32 seconds to index them all, processing every field and tag in parallel. For such large content, that is quite an impressive result.
## Fuzzy-Search
Fuzzy search describes a basic concept of making queries more tolerant. Something like Levenshtein distance can't be added because of the core architecture. Instead, FlexSearch provides several methods to achieve fuzziness:
1. Use a tokenizer: `forward`, `reverse` or `full`
2. Use one of the built-in encoders `simple` > `balanced` > `advanced` > `extra` > `soundex` (sorted by fuzziness)
3. Use one of the language-specific presets, e.g. `/lang/en.js` for en-US specific content
4. Enable suggestions by passing the search option `suggest: true`
Additionally, you can apply a custom `Mapper`, `Replacer`, `Stemmer`, `Filter` or assign a custom `normalize` or `prepare` function to the Encoder.
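Putting these methods together, a fuzziness-oriented setup might look like this (a sketch; the `soundex` preset path is an assumption based on the charset paths used elsewhere in this document):

```js
import SoundexPreset from "./lang/latin/soundex.js"; // assumed path
const index = new Index({
    // 1. tokenizer for partial matching
    tokenize: "forward",
    // 2. phonetic encoder preset (highest fuzziness)
    encoder: new Encoder(SoundexPreset)
});
// 4. enable suggestions per query
const result = index.search("struldbrugs", { suggest: true });
```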
### Compare Fuzzy-Search Encoding
Original term which was indexed: "Struldbrugs"
| Encoder | exact | default | simple | balanced | advanced | extra | soundex |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| Index Size | 3.1 Mb | 1.9 Mb | 1.8 Mb | 1.7 Mb | 1.6 Mb | 1.1 Mb | 0.7 Mb |
| Struldbrugs | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| struldbrugs | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| strũldbrųĝgs | | | ✓ | ✓ | ✓ | ✓ | ✓ |
| strultbrooks | | | | ✓ | ✓ | ✓ | ✓ |
| shtruhldbrohkz | | | | | ✓ | ✓ | ✓ |
| zdroltbrykz | | | | | | ✓ | ✓ |
| struhlbrogger | | | | | | | ✓ |
The index size was measured after indexing the book "Gulliver's Travels".
### Custom Encoder
Since it is very simple to create a custom Encoder, you are welcome to create your own, e.g.:
```js
function customEncoder(content){
const tokens = [];
// split content into terms/tokens
// apply your changes to each term/token
// you will need to return an Array of terms/tokens
// so just iterate through the input string and
// push tokens to the array
// ...
return tokens;
}
const index = new Index({
// set to strict when your tokenization was already done
tokenize: "strict",
encode: customEncoder
});
```
If you get some good results please feel free to share your encoder.
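A concrete version of such an encoder might look like this (a standalone sketch with illustrative rules, not one of the built-in encoders):

```javascript
// Minimal custom encoder: lowercase, split on non-alphanumerics,
// drop single-character tokens.
function simpleEncoder(content){
    return content
        .toLowerCase()
        .split(/[^a-z0-9]+/)
        .filter(t => t.length > 1);
}
```

For example, `simpleEncoder("Hello, World!")` yields `["hello", "world"]` and could be passed as the `encode` option shown above.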
## Load Library (Node.js, ESM, Legacy Browser)
> Do not use the "src" folder of this repo. It isn't meant to be used directly; instead it needs compilation. You can easily perform a custom build, but don't use the source folder for production. You will need at least some kind of compiler which resolves the compiler flags within the code. The "dist" folder contains every version you'll probably need, including unminified ESM modules.
```bash
npm install flexsearch
```
The **_dist_** files are located in `node_modules/flexsearch/dist/`.
> All debug versions provide debug information through the console and give helpful advice in certain situations. Do not use them in production, since they are special builds containing extra debugging processes which noticeably reduce performance.
The abbreviations used at the end of the filenames indicate:
- `bundle` All features included, FlexSearch is available on `window.FlexSearch`
- `light` Only basic features are included, FlexSearch is available on `window.FlexSearch`
- `es5` bundle has support for EcmaScript5, FlexSearch is available on `window.FlexSearch`
- `module` indicates that this bundle is a Javascript module (ESM), FlexSearch members are available by `import { Index, Document, Worker, Encoder, Charset } from "./flexsearch.bundle.module.min.js"` or alternatively using the default export `import FlexSearch from "./flexsearch.bundle.module.min.js"`
- `min` bundle is minified
- `debug` bundle has enabled debug mode and contains additional code just for debugging purposes (do not use for production)
### Non-Module Bundles (ES5 Legacy)
> Non-Module Bundles export all their features to the public namespace "FlexSearch" e.g. `window.FlexSearch.Index` or `window.FlexSearch.Document`.
Load the bundle by a script tag:
```html
<!-- adjust the path to your setup -->
<script src="dist/flexsearch.bundle.min.js"></script>
```
### Module (ESM)
When using modules you can choose from 2 variants: `flexsearch.xxx.module.min.js` has all features bundled ready for production, whereas the folder `/dist/module/` exports all features in the same structure as the source code, but with compiler flags resolved.
Also, each variant comes in two versions:
1. A debug version for development
2. A pre-compiled minified version for production
Use the bundled version exported as a module (default export):
```html
<script type="module">
// adjust the path to your setup
import FlexSearch from "./dist/flexsearch.bundle.module.min.js";
</script>
```
Or import FlexSearch members separately by:
```html
<script type="module">
import { Index, Document, Worker, Encoder, Charset } from "./dist/flexsearch.bundle.module.min.js";
</script>
```
Use non-bundled modules:
```html
<script type="module">
// illustrative path; pick the module you need from dist/module/
import Index from "./dist/module/index.js";
</script>
```
Also, pre-compiled non-bundled production-ready modules are located in `dist/module-min/`, whereas the debug version is located at `dist/module-debug/`.
You can also load modules via CDN:
```html
<script type="module">
// example CDN path; version and path may differ
import FlexSearch from "https://cdn.jsdelivr.net/npm/flexsearch/dist/flexsearch.bundle.module.min.js";
</script>
```
### Node.js
Install FlexSearch via NPM:
```bash
npm install flexsearch
```
```js
const { Index, Document, Encoder } = require("flexsearch");
const index = new Index(/* ... */);
```
When you are using ESM in Node.js, just use the modules explained in the section above.
## Migration
- The index option property "minlength" has moved to the `Encoder` class
- The index option flag "optimize" was removed
- The index option flag "lang" was replaced by the `Encoder` class method `.assign()`
- Boost can no longer be applied upfront when indexing; instead, use the boost property dynamically on a query
- All definitions of the old text encoding process were replaced by similar definitions (Array changed to Set, Object changed to Map). You can use helper methods like `.addMatcher(char_match, char_replace)` which add everything properly.
- The default value for `fastupdate` is `false` when not passed via options
- The method `index.encode()` has moved to `index.encoder.encode()`
- The options `charset` and `lang` were removed from the index (replaced by `Encoder.assign({...})`)
- Every charset collection (files in folder `/lang/**.js`) is now exported as a config object (instead of a function). This config needs to be passed to the constructor `new Encoder(config)` or can be added to an existing instance via `encoder.assign(config)`. The reason was to keep the default encoder configuration when having multiple document indexes.
- The property `bool` from DocumentOptions was removed (replaced by `Resolver`)
- The static methods `FlexSearch.registerCharset()` and `FlexSearch.registerLanguage()` were removed; those collections are now exported as `FlexSearch.Charset` and `FlexSearch.Language`, which can be accessed as a module via `import { Charset, Language } from "flexsearch"`
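As a sketch of the encoder-related migration steps (assuming the v0.8 API as documented above; the old option shape is abbreviated):

```js
// v0.7 (old):
// const index = new Index({ minlength: 3, lang: "en" });

// v0.8: encoding-related options moved to the Encoder
import EnglishPreset from "./lang/en.js";
const index = new Index({
    encoder: new Encoder(EnglishPreset).assign({ minlength: 3 })
});
```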