diff --git a/README.md b/README.md
index 9aca3b6..22f51d8 100644
--- a/README.md
+++ b/README.md
@@ -13,7 +13,6 @@
### FlexSearch v0.7.0 (Beta)
-=======
Beta is now available. Please test the new version and post back issues and suggestions. The Beta will be pushed to the master branch in 2 weeks.
@@ -29,13 +28,15 @@ Source Code v0.7.0-beta available here:
Web's fastest and most memory-flexible full-text search library with zero dependencies.
-When it comes to raw search speed FlexSearch outperforms every single searching library out there and also provides flexible search capabilities like multi-field search, phonetic transformations or partial matching.
+Installation Guide • API Reference • Custom Builds • Flexsearch Server • Changelog
+
+When it comes to raw search speed, FlexSearch outperforms every single search library out there and also provides flexible search capabilities like multi-field search, phonetic transformations or partial matching.
+
Depending on the used options it also provides the most memory-efficient index. FlexSearch introduces a new scoring algorithm called "contextual index" based on a pre-scored lexical dictionary architecture which actually performs queries up to 1,000,000 times faster compared to other libraries.
FlexSearch also provides you a non-blocking asynchronous processing model as well as web workers to perform any updates or queries on the index in parallel through dedicated balanced threads.
-FlexSearch Server is available here: https://github.com/nextapps-de/flexsearch-server.
-
-Installation Guide • API Reference • Custom Builds • Flexsearch Server • Changelog
+FlexSearch Server is available here:
+https://github.com/nextapps-de/flexsearch-server.
Supported Platforms:
- Browser
diff --git a/doc/0.7.0-lang.md b/doc/0.7.0-lang.md
new file mode 100644
index 0000000..d6f407e
--- /dev/null
+++ b/doc/0.7.0-lang.md
@@ -0,0 +1,258 @@
+## Documentation 0.7.0-rev2
+
+### Language Handler
+
+Handling languages was completely replaced by a more generic approach. All language-specific definitions have been extracted and optimized for maximum dead-code elimination when using a compiler/bundler. Each language consists of 5 definitions, which are divided into two groups:
+
+1. Charset
+ 1. ___encode___, type: `function(string):string[]`
+ 2. ___rtl___, type: `boolean`
+2. Language
+ 1. ___matcher___, type: `{string: string}`
+ 2. ___stemmer___, type: `{string: string}`
+ 3. ___filter___, type: `string[]`
+
+The charset contains the encoding logic, the language contains the stemmer, the stopword filter and the matchers. Multiple language definitions can use the same charset encoder. This separation also lets you manage different language definitions for special use cases (e.g. names, cities, dialects/slang, etc.).
+
+To fully describe a custom language __on the fly__ you need to pass:
+
+```js
+const index = FlexSearch({
+ // mandatory:
+ encode: (str) => [str],
+ // optionally:
+ rtl: false,
+ stemmer: {},
+ matcher: {},
+ filter: []
+});
+```
+
+When passing no parameter it uses the `latin:default` schema by default.
+
+| Field | Category | Description |
+|---|---|---|
+| encode | charset | The encoder function. Has to return an array of separated words (or an empty string). |
+| rtl | charset | A boolean property which indicates right-to-left encoding. |
+| filter | language | Filters are also known as "stopwords"; they completely exclude words from being indexed. |
+| stemmer | language | A stemmer removes word endings and is a kind of "partial normalization". A word ending is only matched when the word length is greater than the matched partial. |
+| matcher | language | A matcher replaces all occurrences of a given string regardless of its position and is also a kind of "partial normalization". |
+
+### 1. Language Packs: ES6 Modules
+
+The simplest way to assign charset/language-specific encoding via modules is:
+
+```js
+import charset from "./dist/module/lang/latin/soundex.js";
+import lang from "./dist/module/lang/en.js";
+
+const index = FlexSearch({
+ charset: charset,
+ lang: lang
+});
+```
+
+Just import the __default export__ from each module and assign them accordingly.
+
+The fully qualified example from above is:
+
+```js
+import { encode, rtl, tokenize } from "./dist/module/lang/latin/soundex.js";
+import { stemmer, filter, matcher } from "./dist/module/lang/en.js";
+
+const index = FlexSearch({
+ encode: encode,
+ // assign forced tokenizer first:
+ tokenize: tokenize || "forward",
+ rtl: rtl,
+ stemmer: stemmer,
+ matcher: matcher,
+ filter: filter
+});
+```
+
+The example above shows the standard interface which is exported by each charset/language at minimum.
+
+__Note:__ Some of the encoder variants limit the use of the built-in tokenizer (e.g. soundex). To be safe, prioritize the forced tokenizer and fall back to your choice, e.g. `tokenize || "forward"`.
+
+#### Encoder Variants
+
+You remember the encoding variants like `simple`, `advanced`, `extra`, or `balanced`? These are also supported and provide several variants of encoding (which differ in performance and degree of normalization).
+
+It is pretty straightforward to use an encoder variant:
+
+```js
+import advanced from "./dist/module/lang/latin/advanced.js";
+import { encode } from "./dist/module/lang/latin/extra.js";
+
+const index_advanced = FlexSearch({
+ // apply all definitions:
+ charset: advanced
+});
+
+const index_extra = FlexSearch({
+ // just apply the encoder:
+ encode: encode
+});
+```
+
+#### Available Latin Encoders
+
+1. default
+2. simple
+3. advanced
+4. extra
+5. balance
+6. soundex
+
+You can assign a charset by passing its name during initialization, e.g. `charset: "latin"` for the default charset encoder or `charset: "latin:soundex"` for an encoder variant.
+
+#### Dialect / Slang
+
+Language definitions (especially matchers) can also be used to normalize dialects and slang of a specific language.
+
+### 2. Language Packs: ES5 Modules
+
+You need to make the charset and/or language definitions available by:
+
+1. All charset definitions are included in the `flexsearch.min.js` build by default, but no language-specific definitions are included
+2. You can load packages located in `/dist/lang/` (files refer to languages, folders are charsets)
+3. You can make a custom build
+
+When loading language packs, make sure that the library was loaded before:
+
+```html
+<!-- the paths below are illustrative, adjust them to your setup: -->
+<script src="dist/flexsearch.min.js"></script>
+<script src="dist/lang/latin/soundex.min.js"></script>
+<script src="dist/lang/en.min.js"></script>
+```
+
+Because you are loading packs as external packages (non-ES6 modules), you have to initialize them via shortcuts:
+
+```js
+const index = FlexSearch({
+ charset: "latin:soundex",
+ lang: "en"
+});
+```
+
+> Use the `charset:variant` notation to assign a charset and its variants. Just passing the charset without a variant will automatically resolve to `charset:default`.
+
+You can also override existing definitions, e.g.:
+
+```js
+const index = FlexSearch({
+ charset: "latin",
+ lang: "en",
+ matcher: {}
+});
+```
+
+Passed definitions will __not__ extend default definitions, they will replace them. When you like to extend a definition, create a new language file and put in all the content, as sketched below.
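+
+A minimal sketch of such a full replacement (the stopword list here is purely illustrative, not the library's default):
+
+```js
+const index = FlexSearch({
+    charset: "latin",
+    lang: "en",
+    // replaces the default English filter entirely:
+    filter: ["a", "an", "the", "to", /* ... your full stopword list ... */ "of"]
+});
+```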
+
+#### Encoder Variants
+
+It is pretty straightforward to use an encoder variant:
+
+```html
+<!-- the paths below are illustrative, adjust them to your setup: -->
+<script src="dist/flexsearch.min.js"></script>
+<script src="dist/lang/latin/advanced.min.js"></script>
+<script src="dist/lang/latin/extra.min.js"></script>
+```
+
+```js
+const index_advanced = FlexSearch({
+ charset: "latin:advanced"
+});
+
+const index_extra = FlexSearch({
+ charset: "latin:extra"
+});
+```
+
+Again use the `charset:variant` notation to define charset and its variants.
+
+### Partial Tokenizer
+
+In FlexSearch you can't provide your own partial tokenizer, because it is a direct dependency of the core unit. The built-in tokenizer of FlexSearch splits each word into chunks by different patterns:
+
+1. strict (supports contextual index)
+2. forward
+3. reverse / both
+4. full
+5. ngram (supports contextual index, coming soon)
+
+### Language Processing Pipeline
+
+This is the default pipeline provided by FlexSearch:
+
+_(pipeline schema diagram)_
+
+#### Custom Pipeline
+
+At first, take a look into the default pipeline in `src/common.js`. It is very simple and straightforward. The pipeline processes as some sort of inversion of control: the final encoder implementation has to handle charset as well as language-specific transformations. This workaround is left over from many tests.
+
+Inject the default pipeline by e.g.:
+
+```js
+this.pipeline(
+
+ /* string: */ str.toLowerCase(),
+ /* normalize: */ false,
+ /* split: */ split,
+ /* collapse: */ false
+);
+```
+
+Use the pipeline schema from above to understand the iteration and the difference between pre-encoding and post-encoding. Stemmers and matchers need to be applied after charset normalization but before language transformations; the same goes for filters.
+
+Here is a good example of extending pipelines: `src/lang/latin/extra.js` → `src/lang/latin/advanced.js` → `src/lang/latin/simple.js`.
+
+### How to contribute?
+
+Search for your language in `src/lang/`; if it exists, you can extend it or provide variants (like dialect/slang). If the language doesn't exist, create a new file and check if any of the existing charsets (e.g. latin) fits your language. When no charset exists, you need to provide a charset as a base for the language.
+
+A new charset should provide at least:
+
+1. `encode` A function which normalizes the charset of a passed text content (removes special chars, applies lingual transformations, etc.) and __returns an array of separated words__. Stemmers, matchers or stopword filters also need to be applied here. When the language has no words, make sure to provide something similar, e.g. each Chinese sign could also be a "word". Don't return the whole text content without splitting.
+2. `rtl` A boolean flag which indicates right-to-left encoding
+
+Basically, the charset just needs to provide an encoder function along with an indicator for right-to-left encoding:
+
+```js
+export function encode(str){ return [str] }
+export const rtl = false;
+```
diff --git a/doc/0.7.0.md b/doc/0.7.0.md
index d6f407e..9d9f145 100644
--- a/doc/0.7.0.md
+++ b/doc/0.7.0.md
@@ -1,258 +1,1288 @@
-## Documentation 0.7.0-rev2
+# FlexSearch v0.7.0
-### Language Handler
+A long journey finally comes to its end. This document gives you some insights into the current state plus an overview of the newly introduced features and changes.
-Handling languages was completely replaced by a more generic approach. All language-specific definitions has excluded and was optimized for maximum dead-code elimination when using compiler/bundler. Each language exists of 5 definitions, which are divided into two groups:
+I ended up building the whole library from scratch again, starting from a blank file. FlexSearch was one of my first open source libraries, from which I've learned much about how to make a codebase ready for continuously adding new features. Putting features onto the old codebase over and over again ends in a structural disaster. I also started an implementation in Rust, where I finally recognized that the old codebase had to be thrown away completely.
-1. Charset
- 1. ___encode___, type: `function(string):string[]`
- 2. ___rtl___, type: `boolean`
-2. Language
- 1. ___matcher___, type: `{string: string}`
- 2. ___stemmer___, type: `{string: string}`
- 3. ___filter___, type: `string[]`
+My first step was addressing each issue and goal as "acceptance criteria". Most of them are about matching and scoring, but keeping up the performance was also very important. The criteria for scoring are really hard; there is actually no library out there which could pass those tests. Half of them are currently covered, which is already a really great capability. Don't worry, the final release will not be delayed by the completion of those criteria.
-The charset contains the encoding logic, the language contains stemmer, stopword filter and matchers. Multiple language definitions can use the same charset encoder. Also this separation let you manage different language definitions for special use cases (e.g. names, cities, dialects/slang, etc.).
+Let's talk about the current state. The most difficult part (aka "the core development") is almost finished. I'm very satisfied with the result, which is an improvement in every single aspect. To be honest, it is still complex of course, probably more complex than the old generation was, so please don't expect too much regarding the simplicity of the code. My biggest focus was on code structure and process flow.
-To fully describe a custom language __on the fly__ you need to pass:
+FlexSearch basically is now divided into two Classes:
+
+1. Index
+2. Document
+
+The index is a pure (flat) implementation, greatly optimized to perform fast. A document usually consists of multiple indexes (one for each field in the document).
+
+In a document every index can have its own configuration, except the `encoder`. When you need custom encoders on specific fields you need to create multiple indexes or documents for this purpose.
+
+You can use an instance of Index directly (very much like the old FlexSearch when not indexing documents, instead just ID and text contents).
+
+The method signature stays almost unchanged:
```js
-const index = FlexSearch({
- // mandatory:
- encode: (str) => [str],
- // optionally:
- rtl: false,
- stemmer: {},
- matcher: {},
- filter: []
+var index = new Index(options);
+index.add(id, text);
+index.search(text, limit);
+index.search(text, options);
+index.search(text, limit, options);
+index.search(options);
+```
+
+```js
+var document = new Document(options);
+document.add(doc);
+document.add(id, doc);
+document.search(text, limit);
+document.search(text, options);
+document.search(text, limit, options);
+document.search(options);
+```
+
+## Builtin Profiles
+
+1. `memory` (primarily optimized for memory)
+2. `performance` (primarily optimized for performance)
+3. `match` (primarily optimized for matching)
+4. `score` (primarily optimized for scoring)
+5. `default` (the default balanced profile)
+
+These profiles cover the standard use cases. It is recommended to apply a custom configuration instead of using profiles to get the best out of your situation. Every profile could be optimized further for its specific task, e.g. an extreme performance-oriented configuration or extreme memory optimization and so on.
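+
+A minimal sketch of applying a profile (the string shorthand and the `preset` option name are assumptions, not confirmed by this document):
+
+```js
+// assumed: pass a profile as a string shorthand ...
+const index = new Index("performance");
+// ... or as a "preset" option combined with custom overrides:
+const index2 = new Index({ preset: "match", tokenize: "forward" });
+```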
+
+## Improvements
+
+The whole "context" strategy has greatly improved.
+
+- Bidirectional Context (the order of words can now vary; does not increase memory when using bidirectional context)
+- New memory-friendly strategy for indexes (switchable, saves up to 50% of memory for each index, slightly decreases performance)
+- Better scoring calculation (one of the biggest concerns of the old implementation was that the order of arrays processed in the intersection affected the order of relevance in the final result)
+- Fixed resolution (the resolution in the old implementation was not fully stretched through the whole range in some cases)
+- Fixed threshold (the threshold in the old implementation often had almost no effect, especially when using the contextual index)
+- Skip words (optionally, automatically skip words from the context chain which are too short)
+- Hugely improved performance of long queries (up to 450x faster!) and also memory allocation (up to 250x less memory)
+- New fast-update strategy (optional, hugely improves performance of all updates and removals of indexed contents up to 2850x)
+- Improved auto-balanced cache (keeps and expires cache entries by popularity)
+- Append contents to already existing entries (already indexed documents or contents)
+- New method "contain" to check if an ID was already indexed
+- Access documents directly from the internal store (read/write)
+- Suggestions are hugely improved, falling back from context search all the way down to a single-term match
+- The document descriptor now has array support (optionally adds array entries via the new `append` under the hood to provide a unique relevance context for each entry)
+- The document storage handler was improved
+- Results from a document index are now grouped by field (this is one of the few bigger breaking changes which needs migration of your old code)
+- Boolean search has a new concept (use it in combination with the new result structure)
+
+A full configuration for a context-based index now looks like:
+
+```js
+var index = new Index({
+ tokenize: "strict",
+ resolution: 9,
+ threshold: 0,
+ minlength: 3,
+ optimize: "memory",
+ fastupdate: true,
+ context: {
+ depth: 1,
+ resolution: 3,
+ threshold: 0,
+ bidirectional: true
+ }
});
```
-When passing no parameter it uses the `latin:default` schema by default.
+The parameters `resolution` and `threshold` can also be set independently for the contextual index, e.g. set those values more aggressively on the contextual index only.
+
+## Index Stack-flow
+
+The process flow of an index could be switched between "memory-optimized" and "default".
+
+> The default flow performs slightly faster, because it provides two additional optimizations: 1. "fast fail" (stop early when term was not found) and 2. "fast success" (return early when the result has reached the limit).
+
+Default flow:
+`{term} => [score] => [ids]`
+
+Memory-optimized flow:
+`[score] => {term} => [ids]`
+
+The contextual index has the same schema as the lexical index above, but goes one level deeper:
+
+Default flow:
+`{keyword} => {term} => [score] => [ids]`
+
+Memory-optimized flow:
+`[score] => {keyword} => {term} => [ids]`
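+
+As a rough illustration only (the object literals below just mimic the described shapes; they are not the actual internal API):
+
+```js
+// default flow: {term} => [score] => [ids]
+const lexical_default = {
+    "term": [[1, 5], [7]] // ids grouped by score slot
+};
+
+// memory-optimized flow: [score] => {term} => [ids]
+const lexical_memory = [
+    { "term": [1, 5] }, // score slot 0
+    { "term": [7] }     // score slot 1
+];
+```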
+
+## Tokenizer
+
+The tokenizers are also available again. They affect the capability of matching partials (parts of a term).
+
+1. `strict`
+2. `forward`
+3. `reverse`
+4. `full`
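+
+For example (a minimal sketch), the "forward" tokenizer lets a query match partials typed from the beginning of an indexed term:
+
+```js
+const index = new Index({ tokenize: "forward" });
+
+index.add(1, "searching");
+index.search("sear"); // matches id 1
+```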
+
+## Changes
+
+I decided to use less parameter variation to make the code cleaner and also to make the type definitions more practicable.
+
+### Async
+
+The "async" options was removed, instead you can call each method in its async version, e.g. `index.addAsync` or `index.searchAsync`.
+
+The advantage is you can now use both variations on the same index, whereas the old version is just performing asynchronous for all methods when the option flag was set.
+
+You can assign callbacks to each async function:
+
+```js
+index.addAsync(id, content, function(){
+ console.log("Task Done");
+});
+```
+
+```js
+index.searchAsync(query, function(result){
+ console.log("Results: ", result);
+});
+```
+
+Or do not pass a callback function and get back a `Promise` instead:
+
+```js
+index.addAsync(id, content).then(function(){
+ console.log("Task Done");
+});
+```
+
+```js
+index.searchAsync(query).then(function(result){
+ console.log("Results: ", result);
+});
+```
+
+Or use `async` and `await`:
+
+```js
+async function add(){
+ await index.addAsync(id, content);
+ console.log("Task Done");
+}
+```
+
+```js
+async function search(){
+    const results = await index.searchAsync(query);
+    console.log("Results: ", results);
+}
+```
+
+### Auto-Balanced Cache (By Popularity)
+
+The cache was improved and has a new strategy for balancing/expiring cache entries by popularity.
+
+Also, to prevent recursive inner calls to the same function, the cache gets its own method (`searchCache`). Again, this makes it possible to switch between both kinds of queries (cached/uncached) on the same index.
+
+You still need to initialize the cache and its limit during the creation of the index:
+
+```js
+const index = new Index({ cache: 100 });
+```
+
+```js
+const results = index.searchCache(query);
+```
+
+A common scenario for using a cache is an autocomplete or instant search when typing.
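+
+A minimal sketch of wiring the cached search to a text input (the DOM element and rendering are illustrative only):
+
+```js
+const input = document.getElementById("search");
+
+input.addEventListener("input", function(){
+    // serve repeated queries from the auto-balanced cache:
+    const results = index.searchCache(this.value);
+    // render the results ...
+});
+```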
+
+### Append Contents
+
+You can now append contents to an existing index like:
+
+```js
+index.append(id, content);
+```
+
+This will not overwrite the old indexed contents as `index.update(id, content)` does. Keep in mind that `index.add(id, content)` will also perform an "update" under the hood when the id was already indexed.
+
+Appended contents will have their own context and also their own full `resolution`. Therefore, the relevance isn't being stacked but gets its own context.
+
+Let us take this example:
+
+```js
+index.add(0, "some index");
+index.append(0, "some appended content");
+
+index.add(1, "some text");
+index.append(1, "index appended content");
+```
+
+When you query `index.search("index")` then you will get index id 1 as the first entry in the result, because the context starts from zero for the appended data (isn't stacked to the old context) and here "index" is the first term.
+
+If you don't want this behavior, then just use `index.add(id, content)` and provide the full content at once.
+
+#### Check existence of already indexed IDs
+
+You can check if an ID was already indexed by:
+
+```js
+if(index.contain(1)){
+ console.log("ID is already in index");
+}
+```
+
+## Document Index
+
+One of the key improvements is how documents are indexed and processed. Such an index gets its own class `Document` which contains instances of `Index` for each field under the hood. One advantage is that you can query every `Index` of a document directly if you like. That comes close to the old "tag" feature, but runs significantly faster, e.g. when just querying against one field. This way, using a document logically divides and distributes your contents across multiple indexes and performs faster when querying against one field compared to a non-document approach where you put all your data into one index. Of course, a query through more than one field can't beat the non-document index performance-wise.
+
+### Document Descriptor
+
+Every document needs an ID. When your documents have no ID, you need to create one by passing an index or count or something else as an ID (a value of type `number` is highly recommended). Those IDs are unique references to a given content. This is important when you update or add content through existing IDs. When referencing is not a concern, you can simply use something like `count++`, as sketched below.
+
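+A minimal sketch, assuming `docs` is your own array of documents without IDs:
+
+```js
+let count = 0;
+
+for(const doc of docs){
+    // the running counter becomes the unique reference to this document:
+    index.add(count++, doc);
+}
+```
+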
+Assuming our document has a data structure like this:
+
+```json
+{
+ "id": 0,
+ "content": "some text"
+}
+```
+
+Old syntax FlexSearch v0.6.3 (___not supported anymore___):
+
+```js
+const index = new Document({
+ doc: {
+ id: "id",
+ field: ["content"]
+ }
+});
+```
+
+> The document descriptor has slightly changed: there is no `field` branch anymore; instead, everything is applied one level higher, so `key` becomes a main member of the options.
+
+For the new syntax, create a new document instance by providing a document descriptor `doc` and a `key` in your options.
+
+```js
+const index = new Document({
+ key: "id",
+ doc: ["content"]
+});
+
+index.add({
+ id: 0,
+ content: "some text"
+});
+```
+
+The option `key` was renamed and represents the old `id` field. It describes where the ID lives inside your documents. The key gets the value `id` by default when not passed, so you can shorten this to:
+
+```js
+const index = new Document({
+ tokenize: "strict",
+ doc: ["content"]
+});
+```
+
+The member `doc` takes a list of fields which you want to be indexed from your documents. When just selecting one field, you can pass a string. When also using the default key `id`, this shortens to just:
+
+```js
+const index = new Document({ doc: "content" });
+index.add({ id: 0, content: "some text" });
+```
+
+Assuming you have several fields, you can pass custom options for each field also:
+
+```js
+const index = new Document({
+ key: "id",
+ doc: {
+ "title": {
+ tokenize: "forward",
+ optimize: "memory",
+ resolution: 9,
+ threshold: 0
+ },
+ "content": {
+ tokenize: "strict",
+ optimize: "memory",
+ resolution: 9,
+ threshold: 3,
+ minlength: 3,
+ context: {
+ depth: 1,
+ resolution: 3,
+ threshold: 0
+ }
+ }
+ }
+});
+```
+
+Field options get inherited when global options were also passed, e.g.:
+
+```js
+const index = new Document({
+ tokenize: "strict",
+ optimize: "memory",
+ resolution: 9,
+ threshold: 0,
+ // document specific options:
+ key: "id",
+ doc: {
+ "title": {
+ tokenize: "forward"
+ },
+ "content": {
+ threshold: 3,
+ minlength: 3,
+ context: {
+ depth: 1,
+ resolution: 3,
+ threshold: 0
+ }
+ }
+ }
+});
+```
+
+Note: The context options of the field "content" also get inherited from the corresponding field options, whereas these field options were inherited from the global options (so the threshold would be "3" if not set in the context options).
+
+#### Nested Data Fields
+
+```json
+{
+ "record": {
+ "id": 0,
+ "title": "some title",
+ "content": {
+ "header": "some text",
+ "footer": "some text"
+ }
+ }
+}
+```
+
+> Use the colon ":" syntax to name each field hierarchically starting from the root, e.g.:
+
+```js
+const index = new Document({
+ key: "record:id",
+ doc: [
+ "record:title",
+ "record:content:header",
+ "record:content:footer"
+ ]
+});
+```
+
+> Just add fields you want to query against. Do not add fields to the index which you just need in the result (but do not query against). For this purpose you can store documents independently of the index (read below).
+
+Same in object notation for field-specific options:
+
+```js
+const index = new Document({
+ key: "record:id",
+ doc: {
+ "record:title": {
+ tokenize: "forward"
+ },
+ "record:content:header": {
+ threshold: 3
+ }
+ }
+});
+```
+
+When you want to query through a field you have to pass the exact key of the field you have defined in the `doc` as a field name (with colon syntax):
+
+```js
+index.search(query, {
+ field: [
+ "record:title",
+ "record:content:header",
+ "record:content:footer"
+ ],
+});
+```
+
+or also:
+
+```js
+index.search({
+ field: {
+ "record:title": {
+ query: "some query",
+ limit: 100,
+ suggest: true
+ },
+ "record:content:header": {
+ query: "some other query",
+ limit: 50
+ }
+ }
+});
+```
+
+#### Complex Documents
+
+```json
+[
+ {
+ "tag": "cat",
+ "records": [
+ {
+ "id": 0,
+ "body": {
+ "title": "some title",
+ "footer": "some text"
+ },
+ "keywords": ["some", "key", "words"]
+ },
+ {
+ "id": 1,
+ "body": {
+ "title": "some title",
+ "footer": "some text"
+ },
+ "keywords": ["some", "key", "words"]
+ }
+ ]
+ }
+]
+```
+
+Please notice that this complex structure holds its records in a nested array which also includes the `key`.
+
+```js
+const index = new Document({
+ key: "records[]:id",
+ tag: "tag",
+ doc: [
+ "records[]:body:title",
+ "records[]:body:footer",
+ "records[]:keywords"
+ ]
+});
+```
+
+Again, when searching you have to use the same colon-separated-string from your field definition.
+
+```js
+index.search(query, {
+ field: "records[]:body:title"
+});
+```
+
+#### Join / Append Arrays
+
+In the complex example above, the field `keywords` is an array, but here the notation does not have brackets like `keywords[]`. The array will still be detected, but instead of appending each entry to its own context, the array will be joined into one large string and added to the index.
+
+The difference between both kinds of adding array contents is the relevance when searching. When adding each item of an array via append to its own context with the syntax `field[]`, the relevance of the last entry is on par with that of the first entry. When you leave out the brackets in the notation, it will join the array into one string. Here the first entry has the highest relevance, whereas the last entry has the lowest relevance.
+
+So assuming the keywords from the example above are pre-sorted by relevance/popularity, you want to keep this order (the relevance information). For this purpose do not add brackets to the notation. Otherwise, it would take the entries into a new scoring context (the old order gets lost).
+
+You can also leave out the bracket notation for better performance and a smaller memory footprint. Use it when you do not need the granularity of relevance for the entries.
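+
+A minimal sketch contrasting both notations for the same field:
+
+```js
+// joins the "keywords" array into one string (keeps the pre-sorted relevance):
+const joined = new Document({ key: "id", doc: ["keywords"] });
+
+// appends each entry with its own context (equal relevance for every entry):
+const appended = new Document({ key: "id", doc: ["keywords[]"] });
+```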
+
+### Field Search
+
+Search through all fields:
+
+```js
+index.search(query);
+```
+
+Search through a specific field:
+
+```js
+index.search(query, { field: "title" });
+```
+
+Search through a given set of fields:
+
+```js
+index.search(query, { field: ["title", "content"] });
+```
+
+Pass custom modifiers to each field:
+
+```js
+index.search(query, {
+ field: {
+ title: {
+ threshold: 0,
+ limit: 50
+ },
+ content: {
+ threshold: 3,
+ limit: 100,
+ suggest: true
+ }
+ }
+});
+```
+
+Or pass custom query to each field:
+
+```js
+index.search({
+ field: {
+ title: {
+ query: "some title",
+ threshold: 0,
+ limit: 50
+ },
+ content: {
+ query: "some content",
+ threshold: 3,
+ limit: 100,
+ suggest: true
+ }
+ }
+});
+```
+
+### New Result Set
+
+One of the few breaking changes which needs migration of your old implementation is the result set. I was thinking about it for a long time and came to the conclusion that this new structure might look weird at first, but it also comes with some nice new capabilities.
+
+Schema of the result-set:
+
+> `fields[] => { field, result[] => { document }}`
+
+The outer array contains the fields the query was applied to. Each of these fields holds a record (object) with the 2 properties "field" and "result". The "result" is also an array and includes the results for this specific field. The result could be an array of IDs or could be enriched with stored document data.
+
+A non-enriched result set now looks like:
+
+```js
+[{
+ field: "title",
+ result: [0, 1, 2]
+},{
+ field: "content",
+ result: [3, 4, 5]
+}]
+```
+
+An enriched result set now looks like:
+
+```js
+[{
+ field: "title",
+ result: [
+ { id: 0, doc: { /* document */ }},
+ { id: 1, doc: { /* document */ }},
+ { id: 2, doc: { /* document */ }}
+ ]
+},{
+ field: "content",
+ result: [
+ { id: 3, doc: { /* document */ }},
+ { id: 4, doc: { /* document */ }},
+ { id: 5, doc: { /* document */ }}
+ ]
+}]
+```
+
+When using `pluck` instead of "field" you can explicitly select just one field and get back a flat representation:
+
+```js
+index.search(query, { pluck: "title", enrich: true });
+```
+
+```js
+[
+ { id: 0, doc: { /* document */ }},
+ { id: 1, doc: { /* document */ }},
+ { id: 2, doc: { /* document */ }}
+]
+```
+
+Ok, but why? This change is basically based on "boolean search". Instead of applying your boolean logic to a nested object (which almost ends in structural hell), you can apply your logic yourself on top of the result set dynamically. This opens up huge capabilities for how you process the results. Therefore, the results from the fields aren't squashed into one result anymore. That keeps some important information, like the name of the field as well as the relevance of each field's results, which don't get mixed anymore.
+
+> A field search will apply a query with the boolean "or" logic by default. Each field has its own result to the given query.
+
+There is only one situation where the `bool` property is still supported: when you like to switch the default "or" logic of the field search to "and", e.g.:
+
+```js
+index.search(query, {
+ field: ["title", "content"],
+ bool: "and"
+});
+```
+
+You will just get results which contain the query in both fields. That's it.
+
+The new group-by-field result also lets you pass a custom query to each field:
+
+```js
+index.search({
+ field: {
+ title: {
+ query: "some title",
+ threshold: 0,
+ limit: 50
+ },
+ content: {
+ query: "some content",
+ threshold: 3,
+ limit: 100,
+ suggest: true
+ }
+ }
+});
+```
+
+## Document Stores
+
+> Never use a store when: 1. you do not need the original data to process your query results, or 2. you already have the contents/documents stored elsewhere (outside the index).
+
+Only a document index has a store. You can use a document index instead of a flat index to get this functionality also when just storing ID-content-pairs.
+
+This will add the whole original content to the store (text as string or documents as objects):
+
+```js
+const index = new Document({ doc: "content", store: true });
+index.add({ id: 0, content: "some text" });
+```
+
+### Access documents from internal store
+
+You can get indexed documents from the store:
+
+```js
+var data = index.get(1);
+```
+
+You can update/change store contents directly without changing the index by:
+
+```js
+index.set(1, data);
+```
+
+To update the store and also update the index, just use `index.update`, `index.add` or `index.append`.
+
+When you perform a query, whether on a document index or a flat index, you will always get back an array of IDs.
+
+Optionally you can enrich the query results automatically with stored contents by:
+
+```js
+index.search(query, { enrich: true });
+```
+
+Your results now look like:
+
+```js
+[{
+ id: 0,
+ doc: { /* content from store */ }
+},{
+ id: 1,
+ doc: { /* content from store */ }
+}]
+```
+
+### Configure Storage (Recommended)
+
+This will add just specific fields from a document to the store (the ID doesn't need to be kept in the store):
+
+```js
+const index = new Document({
+ doc: "content",
+ store: ["author", "email"]
+});
+
+index.add(id, content);
+```
+
+You can configure independently what should be indexed and what should be stored. It is highly recommended to make use of this whenever you can.
+
+Here is a useful example of configuring doc and store:
+
+```js
+const index = new Document({
+ doc: "content",
+ store: ["author", "email"]
+});
+
+index.add({
+ id: 0,
+ author: "Jon Doe",
+ email: "john@mail.com",
+ content: "Some content for the index ..."
+});
+```
+
+You can query through the contents and will get back the stored values instead:
+
+```js
+index.search("some content", { enrich: true });
+```
+
+Your results now look like:
+
+```js
+[{
+ field: "content",
+ result: [{
+ id: 0,
+ doc: {
+ author: "Jon Doe",
+ email: "john@mail.com",
+ }
+ }]
+}]
+```
+
+Both fields "author" and "email" are not indexed.
+
+## WebWorker
+
+The whole worker implementation has changed, also keeping Node.js support in mind. The good news is that workers will also be supported in Node.js by the library.
+
+One important change is how workers divide their tasks and how contents are distributed. One big issue was that in the old model, workers cycle for each task (round robin). Theoretically that provides an optimal balance of workload and storage. But it breaks the internal architecture of this search library, and almost every performance optimization gets lost.
+
+Let us take an example. Assuming you have 4 workers and you add 4 contents to the index, then each content is delegated to one worker (a perfect balance, but each index becomes a partial index).
+
+```js
+const index = new Index({ worker: 4 });
+index.add(1, "some")
+ .add(2, "content")
+ .add(3, "to")
+ .add(4, "index");
+```
+
+```
+Worker 1: { 1: "some" }
+Worker 2: { 2: "content" }
+Worker 3: { 3: "to" }
+Worker 4: { 4: "index" }
+```
+
+The issue starts when you query a term. Each of the workers has to resolve the search on its own index and has to delegate the results back to apply the intersection calculation. That's the problem. None of the workers can solve a search task completely; they have to transmit intermediate results back. Therefore, no optimization path could be applied early, because every worker has to send back the full (non-limited) result first.
+
+The new worker model from v0.7.0 is divided into "fields" from the document (1 worker = 1 field index). This way a worker becomes able to solve tasks (subtasks) completely. The downside of this paradigm is that workers might not be perfectly balanced in storing contents (fields may have contents of different length). On the other hand, there is no indication that balancing the storage gives any advantage (they all require the same amount in total).
+
+```js
+const index = new Index({
+ doc: ["tag", "name", "title", "text"],
+ worker: 4
+});
+
+index.add({
+ id: 1, tag: "cat", name: "Tom", title: "some", text: "some"
+}).add({
+ id: 2, tag: "dog", name: "Ben", title: "title", text: "content"
+}).add({
+ id: 3, tag: "cat", name: "Max", title: "to", text: "to"
+}).add({
+ id: 4, tag: "dog", name: "Tim", title: "index", text: "index"
+});
+```
+
+```
+Worker 1: { 1: "cat", 2: "dog", 3: "cat", 4: "dog" }
+Worker 2: { 1: "Tom", 2: "Ben", 3: "Max", 4: "Tim" }
+Worker 3: { 1: "some", 2: "title", 3: "to", 4: "index" }
+Worker 4: { 1: "some", 2: "content", 3: "to", 4: "index" }
+```
+
+When you perform a field search through all fields then this task is perfectly balanced through all workers, which can solve their subtasks independently.
+
+The main thread has to solve a last intersection calculation as before. On this step it needs to apply the "bool" and "paging" logic, and also "suggestions". I'm thinking about moving this workload from the main thread to another worker, so all computations would be performed completely in the background; the index on the main thread would just hold the configuration and its document store (when using a store).
+
+## Engines (Custom Indexes)
+
+FlexSearch supports custom engines for the index. The default engine is `Index`, which extends the abstract class `Engine` under the hood. This abstract class provides some basic methods like the cache and async handlers. A custom engine just needs to implement the 5 standard functions `add`, `append`, `update`, `remove` and `search`. Then you can use this new engine fully integrated into the FlexSearch workflow (Document Handler, Worker, Async Handler, Cache, Document Storage, Paging). That is also a demonstration of the powerful flexibility of extending FlexSearch.
+
+This is the standard engine which automatically gets applied when not set by default:
+
+```js
+import Index from "./index.js";
+import Document from "./document.js";
+const index = new Document({ engine: Index });
+```
+
+Just for understanding: the abstract class `Engine` (engine.js) provides some basic methods (you can override these in your custom index):
+
+```js
+Engine.prototype.addAsync;
+Engine.prototype.appendAsync;
+Engine.prototype.searchAsync;
+Engine.prototype.updateAsync;
+Engine.prototype.removeAsync;
+Engine.prototype.searchCache;
+```
+
+Define your custom index:
+
+```js
+import IndexInterface from "./interface.js";
+import Engine from "./engine.js";
+
+/**
+ * @constructor
+ * @implements IndexInterface
+ * @extends Engine
+ */
+function CustomIndex(){
+ // provide some properties if your implementation needs it
+ this.property = true;
+}
+
+// implement all methods your index should provide/need respectively,
+// but keep method arguments and type definitions from the standard interface:
+
+CustomIndex.prototype.add = function(id, content){};
+CustomIndex.prototype.append = function(id, content){};
+CustomIndex.prototype.update = function(id, content){};
+CustomIndex.prototype.search = function(query){};
+CustomIndex.prototype.remove = function(id){};
+
+// you can also implement your own methods besides the standard interface:
+
+CustomIndex.prototype.export = function(){};
+CustomIndex.prototype.import = function(){};
+```
+
+The standard interface `IndexInterface` for an index can be found in `interface.js`. You will just need it for the type validation of your IDE or your build tool, or as a template for your implementation, because it contains all necessary method information as JSDoc.
+
+Create your engine as follows when it should extend the abstract class:
+
+```js
+import Engine from "./engine.js";
+var MyIndex = new Engine(CustomIndex);
+```
+
+Or when it does not extend the abstract class:
+
+```js
+var MyIndex = CustomIndex();
+```
+
+Or export it as an ES6 module:
+
+```js
+export default new Engine(CustomIndex);
+```
+
+> You do not need to extend the abstract class `Engine`, but then you will not have the standard helper functions for cache or async when using the index directly. Probably this might not be an issue.
+
+Now you can assign your created engine to the document:
+
+```js
+var index = new Document({ engine: MyIndex });
+index.add(document);
+index.search(query);
+```
+
+Or just use this directly as a "flat" index:
+
+```js
+var index = new MyIndex();
+index.add(id, content);
+index.search(query);
+```
+
+> You cannot assign a custom engine for the document itself. The passed engine is just used for the internal indexes. Otherwise, you would bypass the whole library.
+
+
+
+### Custom Index "Bulk"
+
+I will provide a simple custom engine from another project, "BulkSearch". This index is completely different from FlexSearch and focuses on memory allocation and partial-matching capabilities. The implementation is very simple compared to the contextual index from FlexSearch.
+
+This is also a good demonstration of how to use custom engines and is also useful for everyone who absolutely focuses on memory and partial-matching capabilities. A high matching capability in FlexSearch has a significant memory cost, whereas the "bulk" index has zero additional memory cost. The smallest memory footprint of a "bulk" index is three times smaller than that of FlexSearch and does not increase with any of the modifiers. That's the complete opposite of the FlexSearch approach, which also means the performance of such a bulk index is several leagues behind FlexSearch.
+
+## Benchmark (Search)
+
+### Contextual Search
+
+FlexSearch v0.6.3 (fastest profile):
+
+```
+query-single 4313828 op/s, Memory: 1
+query-multi 1526028 op/s, Memory: 1
+query-long 57181 op/s, Memory: 8
+query-dupes 1460489 op/s, Memory: 1
+not-found 2423155 op/s, Memory: 1
+```
+
+FlexSearch v0.7.0 (equivalent profile):
+
+```
+query-single 7344119 op/s, Memory: 1
+query-multi 2460401 op/s, Memory: 1
+query-long 931957 op/s, Memory: 1
+query-dupes 2137628 op/s, Memory: 1
+not-found 3028110 op/s, Memory: 1
+```
+
+This is a performance gain up to 16x faster.
+
+### Lexical Search
+
+FlexSearch v0.6.3 (fastest profile):
+
+```
+query-single 4154241 op/s, Memory: 1
+query-multi 175687 op/s, Memory: 3
+query-long 1453 op/s, Memory: 516
+query-dupes 969917 op/s, Memory: 1
+not-found 2289013 op/s, Memory: 1
+```
+
+There was a performance leak when using extra long queries (for this test I've picked a worst-case scenario).
+
+FlexSearch v0.7.0 (equivalent profile):
+
+```
+query-single 7362096 op/s, Memory: 1
+query-multi 580524 op/s, Memory: 4
+query-long 645983 op/s, Memory: 2
+query-dupes 2136893 op/s, Memory: 1
+not-found 3061433 op/s, Memory: 1
+```
+
+This is a performance gain up to 450x faster, also reduced memory allocation up to 250x.
+
+### Search + Cache
+
+FlexSearch v0.6.3:
+
+```
+query-single 2342487 op/s, Memory: 1
+query-multi 2445660 op/s, Memory: 1
+query-long 3823374 op/s, Memory: 1
+query-dupes 4162607 op/s, Memory: 1
+not-found 3858238 op/s, Memory: 1
+```
+
+A fun fact is that the new version is almost as fast as the old version with cache enabled.
+
+FlexSearch v0.7.0:
+
+```
+query-single 29266333 op/s, Memory: 1
+query-multi 35164612 op/s, Memory: 1
+query-long 33610046 op/s, Memory: 1
+query-dupes 30240771 op/s, Memory: 1
+not-found 36181951 op/s, Memory: 1
+```
+
+This is a performance gain up to 14 times faster.
+
+## Benchmark (Add, Update, Delete)
+
+One part which also got massive improvements is the update and removal of indexed contents via `index.update(id, content)` or `index.remove(id)`. That was the worst-case scenario for FlexSearch.
+
+The new option flag "fastupdate" makes use of an additional register and pushes the performance of all updates and removals of __already indexed contents__ by a factor of up to 2850x. This additional register comes with a moderate memory cost (+5%). When your index needs to update __already indexed contents__ frequently, then this option is highly recommended. When just adding new contents (with new IDs), this option is useless and the extra memory cost isn't worth it.
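+
+A minimal sketch of enabling the flag:
+
+```js
+const index = new Index({ fastupdate: true });
+
+index.add(1, "some content");
+index.update(1, "changed content"); // now significantly faster
+index.remove(1);                    // now significantly faster
+```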
+
+FlexSearch v0.6.3 (fastest profile):
+
+```
+add 84788 op/s, Memory: 166
+update 717 op/s, Memory: 785
+remove 1186 op/s, Memory: 535
+```
+
+FlexSearch v0.7.0 (equivalent profile):
+
+```
+add 261529 op/s, Memory: 3
+update 3043 op/s, Memory: 113
+remove 5572 op/s, Memory: 530
+```
+
+This is a performance gain up to 5x faster.
+
+FlexSearch v0.7.0 + "fastupdate" enabled:
+
+```
+add 261172 op/s, Memory: 3
+update 238025 op/s, Memory: 1
+remove 3430364 op/s, Memory: 1
+```
+
+This is a performance gain up to 2850x faster.
+
+## Contextual Search
+
+The advantage of using a contextual search is the scoring of relevance, which takes the distance between terms in the indexed documents into account. That brings relevance search to a completely new level compared to TF-IDF. In fact, TF-IDF tells nothing about the relevance of a query which consists of multiple terms. TF-IDF is just useful for one-term queries and is also used by FlexSearch as a fallback for this purpose.
+
+The context starts with a query which has more than one term and grows with each additional term. Often you will need 3 or 4 words to get the __absolutely perfect match__ in a complex document. A nice bonus is the performance boost you will get by internally cutting down the intersection calculations on multiple-term queries.
+
+The contextual index is an additional index to the pre-scored lexical standard index. This addition comes with a memory cost.
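+
+A minimal sketch of enabling the contextual index (see the full configuration example further above for all available parameters):
+
+```js
+const index = new Index({
+    context: {
+        depth: 2,           // range of surrounding terms taken into account
+        bidirectional: true // the order of the terms may vary
+    }
+});
+```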
+
+## Memory Allocation
+
+The book "Gulliver's Travels Swift Jonathan 1726" was fully indexed for the examples below.
+
+The most memory-optimized meaningful setting will allocate just 1.2 Mb for the whole book indexed! This is probably the tiniest memory footprint you will get from a search library.
+
+```js
+import { encode } from "./lang/latin/extra.js";
+
+index = new Index({
+ encode: encode,
+ tokenize: "strict",
+ optimize: "memory",
+ resolution: 1,
+ threshold: 0,
+ minlength: 3,
+ fastupdate: false,
+ context: false
+});
+```
+
+### Compare Impact of Memory Allocation
+
+by default a lexical index is very small:
+`depth: 0, bidirectional: 0, resolution: 3, threshold: 0, minlength: 0` => 2.1 Mb
+
+a higher resolution will increase the memory allocation:
+`depth: 0, bidirectional: 0, resolution: 9, threshold: 0, minlength: 0` => 2.9 Mb
+
+using the contextual index will increase the memory allocation:
+`depth: 1, bidirectional: 0, resolution: 9, threshold: 0, minlength: 0` => 12.5 Mb
+
+a higher contextual depth will increase the memory allocation:
+`depth: 2, bidirectional: 0, resolution: 9, threshold: 0, minlength: 0` => 21.5 Mb
+
+a higher minlength will decrease memory allocation:
+`depth: 2, bidirectional: 0, resolution: 9, threshold: 0, minlength: 3` => 19.0 Mb
+
+using bidirectional will decrease memory allocation:
+`depth: 2, bidirectional: 1, resolution: 9, threshold: 0, minlength: 3` => 17.9 Mb
+
+a higher threshold will decrease memory allocation:
+`depth: 2, bidirectional: 1, resolution: 9, threshold: 5, minlength: 3` => 5.8 Mb
+
+enabling the option "fastupdate" will increase memory allocation:
+`depth: 2, bidirectional: 1, resolution: 9, threshold: 5, minlength: 3` => 6.3 Mb
+
+### Full Comparison Table
+
+Every search library constantly has to balance these 4 properties:
+
+1. Memory Allocation
+2. Performance
+3. Matching Capabilities
+4. Relevance Order (Scoring)
+
+FlexSearch provides you many parameters you can use to adjust the optimal balance for your specific use-case.
-
- Field |
- Category |
- Description |
- encode |
- charset |
- The encoder function. Has to return an array of separated words (or an empty string). |
- rtl |
- charset |
- A boolean property which indicates right-to-left encoding. |
- filter |
- language |
- Filter are also known as "stopwords", they completely filter out words from being indexed. |
- stemmer |
- language |
- Stemmer removes word endings and is a kind of "partial normalization". A word ending just matched when the word length is bigger than the matched partial. |
- matcher |
- language |
- Matcher replaces all occurrences of a given string regardless of its position and is also a kind of "partial normalization". |
+
+| Modifier | Memory Impact * | Performance Impact ** | Matching Impact ** | Scoring Impact ** |
+|---|---|---|---|---|
+| resolution | +1 (per level) | +1 (per level) | 0 | +3 (per level) |
+| threshold | -4 (per level) | +2 (per level) | -3 (per level) | 0 |
+| depth | +4 (per level) | +2 (per level) | -10 + depth | +10 |
+| minlength | -2 (per level) | +2 (per level) | -3 (per level) | +3 (per level) |
+| bidirectional | -2 | 0 | +3 (per level) | 0 |
+| fastupdate | +1 | +10 (update, remove) | 0 | 0 |
+| optimize: "memory" | -5 | -1 | 0 | -1 |
+| encoder: "icase" | 0 | 0 | 0 | 0 |
+| encoder: "simple" | -2 | -1 | +2 | 0 |
+| encoder: "advanced" | -3 | -2 | +4 | 0 |
+| encoder: "extra" | -5 | -5 | +6 | 0 |
+| encoder: "soundex" | -6 | -2 | +8 | 0 |
+| tokenize: "strict" | 0 | 0 | 0 | 0 |
+| tokenize: "forward" | +3 | -2 | +5 | 0 |
+| tokenize: "reverse" | +5 | -4 | +7 | 0 |
+| tokenize: "full" | +8 | -5 | +10 | 0 |
+| store: true | +5 (per document) | 0 | 0 | 0 |
+| store: [fields] | +2 (per field) | 0 | 0 | 0 |
+| cache: true | +10 | +10 | 0 | 0 |
+| cache: 100 | +1 | +9 | 0 | 0 |
+| type of ids: number | 0 | 0 | 0 | 0 |
+| type of ids: string | +3 | -3 | 0 | 0 |
+
+* range from -10 to 10, lower is better (-10 => big decrease, 0 => unchanged, +10 => big increase)
+** range from -10 to 10, higher is better
-### 1. Language Packs: ES6 Modules
+---
-The most simple way to assign charset/language specific encoding via modules is:
-
-```js
-import charset from "./dist/module/lang/latin/soundex.js";
-import lang from "./dist/module/lang/en.js";
-
-const index = FlexSearch({
- charset: charset,
- lang: lang
-});
-```
-
-Just import the __default export__ by each module and assign them accordingly.
-
-The full qualified example from above is:
-
-```js
-import { encode, rtl, tokenize } from "./dist/module/lang/latin/soundex.js";
-import { stemmer, filter, matcher } from "./dist/module/lang/en.js";
-
-const index = FlexSearch({
- encode: encode,
- // assign forced tokenizer first:
- tokenize: tokenize || "forward",
- rtl: rtl,
- stemmer: stemmer,
- matcher: matcher,
- filter: filter
-});
-```
-
-The example above is the standard interface which is at least exported from each charset/language.
-
-__Note:__ Some of the encoder variants limit the use of built-in tokenizer (e.g. soundex). To be save prioritize the forced tokenizer and fall back to your choice, e.g. `tokenize || "forward"`.
-
-#### Encoder Variants
-
-You remember the encoding variants like `simple`, `advanced`, `extra`, or `balanced`? These are also supported and provides you several variants of encoding (which differs in performance and degree of normalization).
-
-It is pretty straight forward when using a encoder variant:
-
-```js
-import advanced from "./dist/module/lang/latin/advanced.js";
-import { encode } from "./dist/module/lang/latin/extra.js";
-
-const index_advanced = FlexSearch({
- // apply all definitions:
- charset: advanced
-});
-
-const index_extra = FlexSearch({
- // just apply the encoder:
- encode: encode
-});
-```
-
-#### Available Latin Encoders
-
-1. default
-2. simple
-3. advanced
-4. extra
-5. balance
-6. soundex
-
-You can assign a charset by passing the charset during initialization, e.g. `charset: "latin"` for the default charset encoder or `charset: "latin:soundex"` for a encoder variant.
-
-#### Dialect / Slang
-
-Language definitions (especially matchers) also could be used to normalize dialect and slang of a specific language.
-
-### 2. Language Packs: ES5 Modules
-
-You need to make the charset and/or language definitions available by:
-
-1. All charset definitions are included in the `flexsearch.min.js` build by default, but no language-specific definitions are included
-2. You can load packages located in `/dist/lang/` (files refers to languages, folders are charsets)
-3. You can make a custom build
-
-When loading language packs, make sure that the library was loaded before:
-
-```html
-
-
-
-```
-
-Because you loading packs as external packages (non-ES6-modules) you have to initialize them by shortcuts:
-
-```js
-const index = FlexSearch({
- charset: "latin:soundex",
- lang: "en"
-});
-```
-
-> Use the `charset:variant` notation to assign charset and its variants. When just passing the charset without a variant will automatically resolve as `charset:default`.
-
-You can also override existing definitions, e.g.:
-
-```js
-const index = FlexSearch({
- charset: "latin",
- lang: "en",
- matcher: {}
-});
-```
-
-Passed definitions will __not__ extend default definitions, they will replace them. When you like to extend a definition just create a new language file and put in all the content.
-
-#### Encoder Variants
-
-It is pretty straight forward when using an encoder variant:
-
-```html
-
-
-
-
-```
-
-```js
-const index_advanced = FlexSearch({
- charset: "latin:advanced"
-});
-
-const index_extra = FlexSearch({
- charset: "latin:extra"
-});
-```
-
-Again use the `charset:variant` notation to define charset and its variants.
-
-### Partial Tokenizer
-
-In FlexSearch you can't provide your own partial tokenizer, because it is a direct dependency to the core unit. The built-in tokenizer of FlexSearch splits each word into chunks by different patterns:
-
-1. strict (supports contextual index)
-2. forward
-3. reverse / both
-4. full
-5. ngram (supports contextual index, coming soon)
-
-### Language Processing Pipeline
-
-This is the default pipeline provided by FlexSearch:
-
-
-
-
-
-#### Custom Pipeline
-
-At first take a look into the default pipeline in `src/common.js`. It is very simple and straight forward. The pipeline will process as some sort of inversion of control, the final encoder implementation has to handle charset and also language specific transformations. This workaround has left over from many tests.
-
-Inject the default pipeline by e.g.:
-
-```js
-this.pipeline(
-
- /* string: */ str.toLowerCase(),
- /* normalize: */ false,
- /* split: */ split,
- /* collapse: */ false
-);
-```
-
-Use the pipeline schema from above to understand the iteration and the difference of pre-encoding and post-encoding. Stemmer and matchers needs to be applied after charset normalization but before language transformations, filters also.
-
-Here is a good example of extending pipelines: `src/lang/latin/extra.js` → `src/lang/latin/advanced.js` → `src/lang/latin/simple.js`.
-
-### How to contribute?
-
-Search for your language in `src/lang/`, if it exists you can extend or provide variants (like dialect/slang). If the language doesn't exist create a new file and check if any of the existing charsets (e.g. latin) fits to your language. When no charset exist, you need to provide a charset as a base for the language.
-
-A new charset should provide at least:
-
-1. `encode` A function which normalize the charset of a passed text content (remove special chars, lingual transformations, etc.) and __returns an array of separated words__. Also stemmer, matcher or stopword filter needs to be applied here. When the language has no words make sure to provide something similar, e.g. each chinese sign could also be a "word". Don't return the whole text content without split.
-3. `rtl` A boolean flag which indicates right-to-left encoding
-
-Basically the charset needs just to provide an encoder function along with an indicator for right-to-left encoding:
-
-```js
-export function encode(str){ return [str] }
-export const rtl = false;
-```
+Copyright 2019 Nextapps GmbH
+Released under the Apache 2.0 License