# FlexSearch v0.8 (Preview)
```bash
npm install git+https://github.com/nextapps-de/flexsearch.git#v0.8-preview
```
## What's New
- Persistent indexes support for: `IndexedDB` (Browser), `Redis`, `SQLite`, `Postgres`, `MongoDB`, `Clickhouse`
- Enhanced language customization via the new `Encoder` class
- Single-term search is up to 7 times faster; the overall benchmark score has doubled
- Enhanced support for larger indexes or larger result sets
- Improved offset and limit processing achieves up to 100 times faster traversal through large datasets
- Support for larger In-Memory indexes with extended key size (the default maximum keystore limit is 2^24)
- Greatly enhanced performance of the whole text encoding pipeline
- Improved indexing of numeric content (Triplets)
- Intermediate result sets and `Resolver`
- Basic Resolver: `and`, `or`, `xor`, `not`, `limit`, `offset`, `enrich`, `resolve`, Output formatter
- Improved charset collection
- New charset preset `soundex` which further reduces memory consumption while also increasing "fuzziness"
- Performance gain when polling tasks to the index by using "Event-Loop-Caches"
- Up to 100 times faster deletion/replacement when not using the additional "fastupdate" register
- Regex Pre-Compilation (transforms hundreds of regex rules into just a few)
- Extended support for multiple tags (DocumentIndex)
- Custom Fields ("Virtual Fields")
- Custom Filter
- Custom Score Function
- Added French language preset (stop-word filter, stemmer)
- Enhanced Worker Support
- Improved Build System + Bundler (Supported: CommonJS, ESM, Global Namespace)
- Fully covering `index.d.ts` type definitions
Compare Benchmark: [0.7.0](https://nextapps-de.github.io/flexsearch/test/flexsearch-0.7.0/) vs. [0.8.0](https://nextapps-de.github.io/flexsearch/test/flexsearch-0.8.0/)
## Persistent Indexes
FlexSearch provides a new storage adapter through which indexes are delegated to persistent storage.
Supported:
- [IndexedDB (Browser)](db/indexeddb/)
- [Redis](db/redis/)
- [SQLite](db/sqlite/)
- [Postgres](db/postgres/)
- [MongoDB](db/mongo/)
- [Clickhouse](db/clickhouse/)
The `.export()` and `.import()` methods are still available for non-persistent In-Memory indexes.
All search capabilities are available on persistent indexes like:
- Context-Search
- Suggestions
- Cursor-based Queries (Limit/Offset)
- Scoring (supports a resolution of up to 32767 slots)
- Document-Search
- Partial Search
- Multi-Tag-Search
- Boost Fields
- Custom Encoder
- Resolver
- Tokenizer (Strict, Forward, Reverse, Full)
- Document Store (incl. enrich results)
- Worker Threads to run in parallel
- Auto-Balanced Cache (top queries + last queries)
All persistent variants are optimized for larger indexes under heavy workload. Almost every task is streamlined to run in batch/parallel, getting the most out of the selected database engine. Whereas the In-Memory index can't share its data between different nodes when running in a cluster, every persistent storage can handle this by default.
### Example
```js
import FlexSearchIndex from "./index.js";
import Database from "./db/indexeddb/index.js";
// create an index
const index = new FlexSearchIndex();
// create db instance with optional prefix
const db = new Database("my-store");
// mount and await before transferring data
await index.mount(db);
// update the index as usual
index.add(1, "content...");
index.update(2, "content...");
index.remove(3);
// changes are automatically committed by default
// when you need to wait for task completion, you
// can use the commit method explicitly:
await index.commit();
```
Alternatively, mount a store at index creation:
```js
const index = new FlexSearchIndex({
db: new Database("my-store")
});
// await the db response before the first access
await index.db;
// apply changes to the index
// ...
```
Query against a persistent storage just as usual:
```js
const result = await index.search("gulliver");
```
Auto-Commit is enabled by default and will process changes asynchronously in batch.
You can fully disable the auto-commit feature and perform them manually:
```js
const index = new FlexSearchIndex({
db: new Database("my-store"),
commit: false
});
// update the index
index.add(1, "content...");
index.update(2, "content...");
index.remove(3);
// transfer all changes to the db
await index.commit();
```
You can also call the commit method manually, even when the `commit: true` option is set.
### Benchmark
The benchmark was measured in "terms per second".
| Store | Add | Search 1 | Search N | Replace | Remove | Not Found | Scaling |
|:---|---:|---:|---:|---:|---:|---:|:---|
| IndexedDB | 123,298 | 83,823 | 62,370 | 57,410 | 171,053 | 425,744 | No |
| Redis | 1,566,091 | 201,534 | 859,463 | 117,013 | 129,595 | 875,526 | Yes |
| SQLite | 269,812 | 29,627 | 129,735 | 174,445 | 1,406,553 | 122,566 | No |
| Postgres | 354,894 | 24,329 | 76,189 | 324,546 | 3,702,647 | 50,305 | Yes |
| MongoDB | 515,938 | 19,684 | 81,558 | 243,353 | 485,192 | 67,751 | Yes |
| Clickhouse | 1,436,992 | 11,507 | 22,196 | 931,026 | 3,276,847 | 16,644 | Yes |
__Search 1:__ Single term query
__Search N:__ Multi term query (Context-Search)
The benchmark was executed against a single client.
## Encoder
Search capabilities highly depend on language processing. The old workflow wasn't really practicable. The new `Encoder` class is a huge improvement and fully replaces the encoding part. Some FlexSearch options were moved to the new `Encoder` instance.
New Encoding Pipeline:
1. charset normalization
2. custom preparation
3. split into terms (apply includes/excludes)
4. filter (pre-filter)
5. matcher (substitute terms)
6. stemmer (substitute term endings)
7. filter (post-filter)
8. replace chars (mapper)
9. custom regex (replacer)
10. letter deduplication
11. apply finalize
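The pipeline stages can be pictured as a plain standalone function. This is a simplified illustration with made-up rule sets, not FlexSearch's internal implementation; steps 2, 7, 9 and 11 are omitted for brevity:

```javascript
// Simplified standalone sketch of the encoding pipeline.
// All rule sets below are illustrative examples only.
function encodePipeline(str){
    // 1. charset normalization
    str = str.normalize("NFKD").toLowerCase();
    // 3. split into terms (include letters + numbers)
    let terms = str.split(/[^a-z0-9]+/).filter(t => t.length);
    // 4. pre-filter (stop-words)
    const filter = new Set(["and", "the"]);
    terms = terms.filter(t => !filter.has(t));
    // 5. matcher (substitute whole terms)
    const matcher = new Map([["xvi", "16"]]);
    terms = terms.map(t => matcher.get(t) || t);
    // 6. stemmer (substitute term endings)
    terms = terms.map(t => t.replace(/ly$/, ""));
    // 8. replace chars (mapper)
    const mapper = new Map([["é", "e"]]);
    terms = terms.map(t => [...t].map(c => mapper.get(c) || c).join(""));
    // 10. letter deduplication (collapse repeats)
    return terms.map(t => t.replace(/(.)\1+/g, "$1"));
}
```

For example, `encodePipeline("Slowly and Surely")` yields `["slow", "sure"]`: the stop-word is filtered out and the illustrative stemmer rule strips the "ly" endings.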
### Example
```js
const encoder = new Encoder({
normalize: true,
dedupe: true,
cache: true,
include: {
letter: true,
number: true,
symbol: false,
punctuation: false,
control: false,
char: "@"
}
});
```
```js
const encoder = new Encoder({
normalize: function(str){
return str.toLowerCase();
},
prepare: function(str){
return str.replace(/&/g, " and ");
},
exclude: {
letter: false,
number: false,
symbol: true,
punctuation: true,
control: true
}
});
```
Define language specific transformations:
```js
const encoder = new Encoder({
replacer: [
/[´`’ʼ]/g, "'"
],
filter: new Set([
"and",
]),
matcher: new Map([
["xvi", "16"]
]),
stemmer: new Map([
["ly", ""]
]),
mapper: new Map([
["é", "e"]
])
});
```
Or use predefined language and add custom options:
```js
import EnglishBookPreset from "./lang/en.js";
const encoder = new Encoder({
assign: EnglishBookPreset,
filter: false
});
```
Equivalent:
```js
import EnglishBookPreset from "./lang/en.js";
const encoder = new Encoder(EnglishBookPreset);
encoder.assign({ filter: false });
```
Assign extensions to the encoder instance:
```js
import LatinEncoder from "./lang/latin/simple.js";
import EnglishBookPreset from "./lang/en.js";
// stack definitions to the encoder instance
const encoder = new Encoder()
.assign(LatinEncoder)
.assign(EnglishBookPreset)
// override preset options ...
.assign({ minlength: 3 });
// assign further presets ...
```
Add custom transformations to an existing index:
```js
import LatinEncoder from "./lang/latin/default.js";
const encoder = new Encoder(LatinEncoder);
encoder.addReplacer(/[´`’ʼ]/g, "'");
encoder.addFilter("and");
encoder.addMatcher("xvi", "16");
encoder.addStemmer("ly", "");
encoder.addMapper("é", "e");
```
## Resolver
Retrieve an unresolved result:
```js
const raw = index.search("a short query", {
resolve: false
});
```
You can apply and chain different resolver methods to the raw result, e.g.:
```js
raw.and( ... )
.and( ... )
.boost(2)
.or( ... , ... )
.limit(100)
.xor( ... )
.not( ... )
// final resolve
.resolve({
limit: 10,
offset: 0,
enrich: true
});
```
The default resolver:
```js
const raw = index.search("a short query", {
resolve: false
});
const result = raw.resolve();
```
Or use declaration style:
```js
import Resolver from "./resolver.js";
const raw = new Resolver({
index: index,
query: "a short query"
});
const result = raw.resolve();
```
### Chainable Boolean Operations
The basic concept explained:
```js
// 1. get one or multiple unresolved results
const raw1 = index.search("a short query", {
resolve: false
});
const raw2 = index.search("another query", {
resolve: false,
boost: 2
});
// 2. apply and chain resolver operations
const raw3 = raw1.and(raw2, /* ... */);
// you can access the aggregated result by raw3.result
console.log("The aggregated result is:", raw3.result)
// apply further operations ...
// 3. resolve final result
const result = raw3.resolve({
limit: 100,
offset: 0
});
console.log("The final result is:", result)
```
Use inline queries:
```js
const result = index.search("further query", {
// set resolve to false on the first query
resolve: false,
boost: 2
})
.or( // union
index.search("a query")
.and( // intersection
index.search("another query", {
boost: 2
})
)
)
.not( // exclusion
index.search("some query")
)
// resolve the result
.resolve({
limit: 100,
offset: 0
});
```
Or use a fully declarative style (also recommended when run in parallel):
```js
import Resolver from "./resolver.js";
const result = new Resolver({
index: index,
query: "further query",
boost: 2
})
.or({
and: [{ // inner expression
index: index,
query: "a query"
},{
index: index,
query: "another query",
boost: 2
}]
})
.not({ // exclusion
index: index,
query: "some query"
})
.resolve({
limit: 100,
offset: 0
});
```
When all queries are made against the same index, you can skip the index in every declaration following the initial `new Resolver()` call:
```js
import Resolver from "./resolver.js";
const result = new Resolver({
index: index,
query: "a query"
})
.and({ query: "another query", boost: 2 })
.or ({ query: "further query", boost: 2 })
.not({ query: "some query" })
.resolve(100);
```
### Custom Result Decoration
```js
import highlight from "./resolve/highlight.js";
import collapse from "./resolve/collapse.js";
const raw = index.search("a short query", {
resolve: false
});
// resolve result for display
const template = highlight(raw, {
wrapper: "<ul>$1</ul>",
item: "<li>$1</li>",
text: "$1",
highlight: "<b>$1</b>"
});
document.body.appendChild(template);
// resolve result for further processing
const result = collapse(raw);
```
Alternatively:
```js
const template = highlight(raw, {
wrapper: function(){
const wrapper = document.createElement("ul");
return wrapper;
},
item: function(wrapper){
const item = document.createElement("li");
wrapper.append(item);
},
text: function(item, content){
const node = document.createTextNode(content);
item.append(node);
},
highlight: function(item, content){
const node = document.createElement("b");
node.textContent = content;
item.append(node);
}
});
document.body.appendChild(template);
```
### Custom Resolver
```js
function CustomResolver(raw){
// console.log(raw)
let output;
// generate output ...
return output;
}
const result = index.search("a short query", {
resolve: CustomResolver
});
```
## Big In-Memory Keystores
The default maximum keystore limit for the In-Memory index is 2^24 distinct terms/partials being stored (the so-called "cardinality"). An additional register can be enabled which divides the index into self-balanced partitions.
```js
const index = new FlexSearchIndex({
// e.g. set keystore range to 8-Bit:
// 2^8 * 2^24 = 2^32 keys total
keystore: 8
});
```
You can theoretically store up to 2^88 keys (64-bit address range).
The internal ID arrays scale automatically via Proxy when the limit of 2^31 is reached.
> Persistent storages have no keystore limit by default. You should not enable the keystore when using persistent indexes, as long as you do not stress the buffer too hard before calling `index.commit()`.
## Multi-Tag-Search
Assume this document schema (a dataset from IMDB):
```js
{
"tconst": "tt0000001",
"titleType": "short",
"primaryTitle": "Carmencita",
"originalTitle": "Carmencita",
"isAdult": 0,
"startYear": "1894",
"endYear": "",
"runtimeMinutes": "1",
"genres": [
"Documentary",
"Short"
]
}
```
An appropriate document descriptor could look like:
```js
import LatinEncoder from "./lang/latin/simple.js";
const flexsearch = new Document({
encoder: new Encoder(LatinEncoder),
resolution: 3,
document: {
id: "tconst",
//store: true, // document store
index: [{
field: "primaryTitle",
tokenize: "forward"
},{
field: "originalTitle",
tokenize: "forward"
}],
tag: [
"startYear",
"genres"
]
}
});
```
The field contents of `primaryTitle` and `originalTitle` are indexed using the forward tokenizer. The field contents of `startYear` and `genres` are added as tags.
Get all entries of a specific tag:
```js
const result = flexsearch.search({
//enrich: true, // enrich documents
tag: { "genres": "Documentary" },
limit: 1000,
offset: 0
});
```
Get entries of multiple tags (intersection):
```js
const result = flexsearch.search({
//enrich: true, // enrich documents
tag: {
"genres": ["Documentary", "Short"],
"startYear": "1894"
}
});
```
Combine tags with queries (intersection):
```js
const result = flexsearch.search({
query: "Carmen", // forward tokenizer
tag: {
"genres": ["Documentary", "Short"],
"startYear": "1894"
}
});
```
Alternative declaration:
```js
const result = flexsearch.search("Carmen", {
tag: [{
field: "genres",
tag: ["Documentary", "Short"]
},{
field: "startYear",
tag: "1894"
}]
});
```
## Filter Fields (Index / Tags / Datastore)
```js
const flexsearch = new Document({
document: {
id: "id",
index: [{
// custom field:
field: "somefield",
filter: function(data){
// return false to filter out
// return anything else to keep
return true;
}
}],
tag: [{
field: "city",
filter: function(data){
// return false to filter out
// return anything else to keep
return true;
}
}],
store: [{
field: "anotherfield",
filter: function(data){
// return false to filter out
// return anything else to keep
return true;
}
}]
}
});
```
## Custom Fields (Index / Tags / Datastore)
Dataset example:
```js
{
"id": 10001,
"firstname": "John",
"lastname": "Doe",
"city": "Berlin",
"street": "Alexanderplatz",
"number": "1a",
"postal": "10178"
}
```
You can apply custom fields derived from data or by anything else:
```js
const flexsearch = new Document({
document: {
id: "id",
index: [{
// custom field:
field: "fullname",
custom: function(data){
// return custom string
return data.firstname + " " +
data.lastname;
}
},{
// custom field:
field: "location",
custom: function(data){
return data.street + " " +
data.number + ", " +
data.postal + " " +
data.city;
}
}],
tag: [{
// existing field
field: "city"
},{
// custom field:
field: "category",
custom: function(data){
let tags = [];
// push one or multiple tags
// ....
return tags;
}
}],
store: [{
field: "anotherfield",
custom: function(data){
// return a falsy value to filter out
// return anything else to keep in store
return data;
}
}]
}
});
```
> Filtering is also available in custom functions by returning `false`.
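For instance, a custom field function can combine both behaviors. The following is a hypothetical sketch (the field shape mirrors the `fullname` example above):

```javascript
// Hypothetical sketch: a custom field function that also filters.
// Returning false drops the entry; any other return value is
// used as the field content.
function fullnameOrSkip(data){
    if(!data.firstname || !data.lastname){
        return false; // filter out incomplete entries
    }
    return data.firstname + " " + data.lastname;
}
```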
Perform a query against the custom field as usual:
```js
const result = flexsearch.search({
query: "10178 Berlin Alexanderplatz",
field: "location"
});
```
```js
const result = flexsearch.search({
query: "john doe",
tag: { "city": "Berlin" }
});
```
## Custom Score Function
```js
const index = new FlexSearchIndex({
resolution: 10,
score: function(content, term, term_index, partial, partial_index){
// you'll need to return a number between 0 and "resolution"
// the score starts at 0, which is the highest score
// for a resolution of 10 you can return 0 - 9
// ...
return 3;
}
});
```
A common situation is having predefined labels which relate to some kind of order, e.g. importance or priority. A priority label could be `high`, `moderate` or `low`, so you can derive the scoring from those properties. Another example is when you already have an ordering and would like to keep this order as relevance.
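A hypothetical sketch of that label-based approach (the mapping would be used inside the `score` function; all names here are illustrative):

```javascript
// Map priority labels to score slots for a resolution of 10
// (slot 0 = highest relevance, slot 9 = lowest).
const priority_slot = { high: 0, moderate: 4, low: 9 };
function scoreByPriority(label){
    // unknown labels fall back to the lowest relevance slot
    return priority_slot[label] ?? 9;
}
```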
The parameters from the score function explained:
1. `content` is the whole content as an array of terms (encoded)
2. `term` is the current term which is actually processed (encoded)
3. `term_index` is the index of the term in the content array
4. `partial` is the current partial of a term which is actually processed
5. `partial_index` is the index position of the partial within the term
Partial params are empty when using the tokenizer `strict`. Let's take an example using the tokenizer `full`.
The content: "This is an ex**amp**le of partial encoding"
The highlighted part marks the partial which is currently processed. Your score function will then be called with these parameters:
```js
function score(content, term, term_index, partial, partial_index){
// content       = ["this", "is", "an", "example", "of", "partial", "encoding"]
// term          = "example"
// term_index    = 3
// partial       = "amp"
// partial_index = 2
}
```
## Merge Document Results
By default, the result set of Field-Search has a structure grouped by field names:
```js
[{
field: "fieldname-1",
result: [{
id: 1001,
doc: {/* stored document */}
}]
},{
field: "fieldname-2",
result: [{
id: 1001,
doc: {/* stored document */}
}]
},{
field: "fieldname-3",
result: [{
id: 1002,
doc: {/* stored document */}
}]
}]
```
By passing the search option `merge: true` the result set will be merged into:
```js
[{
id: 1001,
doc: {/* stored document */}
field: ["fieldname-1", "fieldname-2"]
},{
id: 1002,
doc: {/* stored document */}
field: ["fieldname-3"]
}]
```
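The merge can be pictured as a plain transformation of the grouped result. This is a standalone sketch of the shape change, not the library's internal code:

```javascript
// Standalone sketch: collapse field-wise results into one
// entry per document id, collecting the matching field names.
function mergeResults(grouped){
    const byId = new Map();
    for(const { field, result } of grouped){
        for(const { id, doc } of result){
            let entry = byId.get(id);
            if(!entry){
                entry = { id, doc, field: [] };
                byId.set(id, entry);
            }
            entry.field.push(field);
        }
    }
    return [...byId.values()];
}
```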
## Extern Worker Configuration
When using Worker and __also__ assigning custom functions to the options, e.g.:
- Custom Encoder
- Custom Encoder methods (normalize, prepare, finalize)
- Custom Score (function)
- Custom Filter (function)
- Custom Fields (function)
... then you'll need to move your __field configuration__ into a file which exports the configuration as a `default` export. The field configuration is not the whole Document-Descriptor.
When not using custom functions in combination with Worker you can skip this part.
Since every field resolves into a dedicated Worker, every field which includes custom functions needs its own configuration file accordingly.
Let's take this document descriptor:
```js
{
document: {
index: [{
// this is the field configuration
// ---->
field: "custom_field",
custom: function(data){
return "custom field content";
}
// <------
}]
}
};
```
The configuration which needs to be available as a default export is:
```js
{
field: "custom_field",
custom: function(data){
return "custom field content";
}
};
```
You're welcome to make suggestions on how to improve the handling of extern configurations.
### Example Node.js
An extern configuration for one WorkerIndex; let's assume it is located in `./custom_field.js`:
```js
const { Charset } = require("flexsearch");
const EncoderPreset = Charset["latin:simple"];
// it requires a default export:
module.exports = {
encoder: EncoderPreset,
tokenize: "forward",
// custom function:
custom: function(data){
return "custom field content";
}
};
```
Create a Worker index using the configuration above:
```js
const { Document } = require("flexsearch");
const flexsearch = new Document({
worker: true,
document: {
index: [{
// the field name needs to be set here
field: "custom_field",
// path to your config from above:
config: "./custom_field.js",
}]
}
});
```
### Browser (ESM)
An extern configuration for one WorkerIndex; let's assume it is located in `./custom_field.js`:
```js
import { Charset } from "./dist/flexsearch.bundle.module.min.js";
const EncoderPreset = Charset["latin:simple"];
// it requires a default export:
export default {
encoder: EncoderPreset,
tokenize: "forward",
// custom function:
custom: function(data){
return "custom field content";
}
};
```
Create a Worker index with the configuration above:
```js
import { Document } from "./dist/flexsearch.bundle.module.min.js";
// you will need to await the response!
const flexsearch = await new Document({
worker: true,
document: {
index: [{
// the field name needs to be set here
field: "custom_field",
// Absolute URL to your config from above:
config: "http://localhost/custom_field.js"
}]
}
});
```
Here the __absolute URL__ is required, because the WorkerIndex context is of type `Blob` and relative URLs can't be used from within this context.
### Test Case
As a test, the whole IMDB data collection was indexed, consisting of:
- JSON documents: 9,273,132
- Fields: 83,458,188
- Tokens: 128,898,832
The index configuration used has 2 fields (with a bidirectional context of `depth: 1`), 1 custom field, 2 tags and a full datastore of all input JSON documents.
A non-Worker Document index requires 181 seconds to index all contents.
The Worker index takes just 32 seconds to index them all, processing every field and tag in parallel. For such large content, that is quite an impressive result.
## Fuzzy-Search
Fuzzy search describes a basic concept of making queries more tolerant. Something like Levenshtein distance can't be added because of the core architecture. Instead, FlexSearch provides several methods to achieve fuzziness:
1. Use a tokenizer: `forward`, `reverse` or `full`
2. Use one of the built-in encoders `simple` > `balanced` > `advanced` > `extra` > `soundex` (sorted by fuzziness)
3. Use one of the language-specific presets, e.g. `/lang/en.js` for en-US specific content
4. Enable suggestions by passing the search option `suggest: true`
Additionally, you can apply a custom `Mapper`, `Replacer`, `Stemmer`, `Filter` or assign a custom `normalize` or `prepare` function to the Encoder.
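Putting these methods together, a fuzziness-oriented setup might look like this (a sketch; the `soundex` preset path is an assumption based on the charset paths used elsewhere in this document):

```js
import SoundexPreset from "./lang/latin/soundex.js"; // assumed path
const index = new Index({
    // 1. tokenizer for partial matching
    tokenize: "forward",
    // 2. phonetic encoder preset (highest fuzziness)
    encoder: new Encoder(SoundexPreset)
});
// 4. enable suggestions per query
const result = index.search("struldbrugs", { suggest: true });
```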
### Compare Fuzzy-Search Encoding
Original term which was indexed: "Struldbrugs"
| Encoder | exact | default | simple | balanced | advanced | extra | soundex |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| Index Size | 3.1 Mb | 1.9 Mb | 1.8 Mb | 1.7 Mb | 1.6 Mb | 1.1 Mb | 0.7 Mb |
| Struldbrugs | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| struldbrugs | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| strũldbrųĝgs | | | ✓ | ✓ | ✓ | ✓ | ✓ |
| strultbrooks | | | | ✓ | ✓ | ✓ | ✓ |
| shtruhldbrohkz | | | | | ✓ | ✓ | ✓ |
| zdroltbrykz | | | | | | ✓ | ✓ |
| struhlbrogger | | | | | | | ✓ |
The index size was measured after indexing the book "Gulliver's Travels".
### Custom Encoder
Since it is very simple to create a custom Encoder, you are welcome to create your own, e.g.:
```js
function customEncoder(content){
const tokens = [];
// split content into terms/tokens
// apply your changes to each term/token
// you will need to return an Array of terms/tokens
// so just iterate through the input string and
// push tokens to the array
// ...
return tokens;
}
const index = new Index({
// set to strict when your tokenization was already done
tokenize: "strict",
encode: customEncoder
});
```
If you get some good results please feel free to share your encoder.
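A concrete version of such an encoder might look like this (a standalone sketch with illustrative rules, not one of the built-in encoders):

```javascript
// Minimal custom encoder: lowercase, split on non-alphanumerics,
// drop single-character tokens.
function simpleEncoder(content){
    return content
        .toLowerCase()
        .split(/[^a-z0-9]+/)
        .filter(t => t.length > 1);
}
```

For example, `simpleEncoder("Hello, World!")` yields `["hello", "world"]` and could be passed as the `encode` option shown above.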
## Load Library (Node.js, ESM, Legacy Browser)
> Do not use the "src" folder of this repo. It isn't meant to be used directly; instead it needs compilation. You can easily perform a custom build, but don't use the source folder for production. You will need at least some kind of compiler which resolves the compiler flags within the code. The "dist" folder contains every version you'll probably need, including unminified ESM modules.
```bash
npm install flexsearch
```
The **_dist_** files are located in `node_modules/flexsearch/dist/`.
> All debug versions provide debug information through the console and give helpful advice in certain situations. Do not use them in production, since they are special builds containing extra debugging processes which noticeably reduce performance.
The abbreviations used at the end of the filenames indicate:
- `bundle` All features included, FlexSearch is available on `window.FlexSearch`
- `light` Only basic features are included, FlexSearch is available on `window.FlexSearch`
- `es5` bundle has support for EcmaScript5, FlexSearch is available on `window.FlexSearch`
- `module` indicates that this bundle is a Javascript module (ESM), FlexSearch members are available by `import { Index, Document, Worker, Encoder, Charset } from "./flexsearch.bundle.module.min.js"` or alternatively using the default export `import FlexSearch from "./flexsearch.bundle.module.min.js"`
- `min` bundle is minified
- `debug` bundle has enabled debug mode and contains additional code just for debugging purposes (do not use for production)
### Non-Module Bundles (ES5 Legacy)
> Non-Module Bundles export all their features to the public namespace "FlexSearch" e.g. `window.FlexSearch.Index` or `window.FlexSearch.Document`.
Load the bundle by a script tag:
```html
<!-- adjust the path to your setup -->
<script src="dist/flexsearch.bundle.min.js"></script>
```
### Module (ESM)
When using modules you can choose from 2 variants: `flexsearch.xxx.module.min.js` has all features bundled ready for production, whereas the folder `/dist/module/` exports all features in the same structure as the source code, but with compiler flags resolved.
Also, each variant comes in two versions:
1. A debug version for development
2. A pre-compiled minified version for production
Use the bundled version exported as a module (default export):
```html
<script type="module">
// adjust the path to your setup
import FlexSearch from "./dist/flexsearch.bundle.module.min.js";
</script>
```
Or import FlexSearch members separately by:
```html
<script type="module">
import { Index, Document, Worker, Encoder, Charset } from "./dist/flexsearch.bundle.module.min.js";
</script>
```
Use non-bundled modules:
```html
<script type="module">
// illustrative path; pick the module you need from dist/module/
import Index from "./dist/module/index.js";
</script>
```
Also, pre-compiled non-bundled production-ready modules are located in `dist/module-min/`, whereas the debug version is located at `dist/module-debug/`.
You can also load modules via CDN:
```html
<script type="module">
// example CDN path; version and path may differ
import FlexSearch from "https://cdn.jsdelivr.net/npm/flexsearch/dist/flexsearch.bundle.module.min.js";
</script>
```
### Node.js
Install FlexSearch via NPM:
```bash
npm install flexsearch
```
```js
const { Index, Document, Encoder } = require("flexsearch");
const index = new Index(/* ... */);
```
When you are using ESM in Node.js, just use the modules explained in the section above.
## Migration
- The index option property "minlength" has moved to the `Encoder` class
- The index option flag "optimize" was removed
- The index option flag "lang" was replaced by the `Encoder` class method `.assign()`
- Boost can no longer be applied upfront when indexing; instead, use the boost property dynamically on a query
- All definitions of the old text encoding process were replaced by similar definitions (Array changed to Set, Object changed to Map). You can use helper methods like `.addMatcher(char_match, char_replace)` which add everything properly.
- The default value for `fastupdate` is `false` when not passed via options
- The method `index.encode()` has moved to `index.encoder.encode()`
- The options `charset` and `lang` were removed from the index (replaced by `Encoder.assign({...})`)
- Every charset collection (files in folder `/lang/**.js`) is now exported as a config object (instead of a function). This config needs to be passed to the constructor `new Encoder(config)` or can be added to an existing instance via `encoder.assign(config)`. The reason was to keep the default encoder configuration when having multiple document indexes.
- The property `bool` from DocumentOptions was removed (replaced by `Resolver`)
- The static methods `FlexSearch.registerCharset()` and `FlexSearch.registerLanguage()` were removed; those collections are now exported as `FlexSearch.Charset` and `FlexSearch.Language`, which can be accessed as a module via `import { Charset, Language } from "flexsearch"`
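As a sketch of the encoder-related migration steps (assuming the v0.8 API as documented above; the old option shape is abbreviated):

```js
// v0.7 (old):
// const index = new Index({ minlength: 3, lang: "en" });

// v0.8: encoding-related options moved to the Encoder
import EnglishPreset from "./lang/en.js";
const index = new Index({
    encoder: new Encoder(EnglishPreset).assign({ minlength: 3 })
});
```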