From 376da38349d4a8e4a9b3cc9c304558e30b6bd7be Mon Sep 17 00:00:00 2001
From: Thomas Wilkerling
Date: Sun, 24 Feb 2019 21:43:43 +0100
Subject: [PATCH] Update Readme

---
 README.md | 298 +++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 204 insertions(+), 94 deletions(-)

diff --git a/README.md b/README.md
index 61eeed2..0a37e60 100644
--- a/README.md
+++ b/README.md
@@ -80,8 +80,8 @@ All Features: Presets - ✔ - ✔ + ✓ + ✓ -
@@ -89,8 +89,8 @@ All Features: Async Search - ✔ - ✔ + ✓ + ✓ -
@@ -98,7 +98,7 @@ All Features: Web-Workers (not available in Node.js) - ✔ + ✓ - -
@@ -107,17 +107,17 @@ All Features: Contextual Indexes - ✔ - ✔ - ✔ + ✓ + ✓ + ✓ Index Documents (Field-Search) - ✔ - ✔ + ✓ + ✓ -
@@ -125,8 +125,8 @@ All Features: Logical Operators - ✔ - ✔ + ✓ + ✓ -
@@ -134,7 +134,7 @@ All Features: Where / Find / Tags - ✔ + ✓ - -
@@ -143,25 +143,25 @@ All Features: Partial Matching - ✔ - ✔ - ✔ + ✓ + ✓ + ✓ Relevance Scoring - ✔ - ✔ - ✔ + ✓ + ✓ + ✓ Auto-Balanced Cache by Popularity - ✔ + ✓ - -
@@ -170,7 +170,7 @@ All Features: Pagination - ✔ + ✓ - -
@@ -179,7 +179,7 @@ All Features: Suggestions - ✔ + ✓ - -
@@ -188,8 +188,8 @@ All Features: Phonetic Matching - ✔ - ✔ + ✓ + ✓ -
@@ -197,9 +197,9 @@ All Features: Customizable: Matcher, Encoder, Tokenizer, Stemmer, Filter - ✔ - ✔ - ✔ + ✓ + ✓ + ✓ File Size (gzip)
@@ -221,12 +221,12 @@ Comparison: Rank - Library Name - Library Version - Single Phrase (op/s) - Multi Phrase (op/s) - Not Found (op/s) + Rank + Library Name + Library Version + Single Phrase (op/s) + Multi Phrase (op/s) + Not Found (op/s) 1
@@ -316,11 +316,11 @@ Comparison: Rank - Library Name - Library Version - Index Size * - Memory Allocation ** + Rank + Library Name + Library Version + Index Size * + Memory Allocation ** 1
@@ -432,22 +432,22 @@ The index consists of an in-memory pre-scored dictionary as its base. The bigges - - + + - - + + - - + + - - + +
Type | Complexity | Type | Complexity
Each single term query: | 1 | Each single term query: | 1
Lexical Pre-Scored Dictionary (Solo): | TERM_COUNT * TERM_MATCHES | Lexical Pre-Scored Dictionary (Solo): | TERM_COUNT * TERM_MATCHES
Lexical Pre-Scored Dictionary + Context-based Map: | TERM_MATCHES / TERM_COUNT | Lexical Pre-Scored Dictionary + Context-based Map: | TERM_MATCHES / TERM_COUNT
@@ -458,9 +458,9 @@ The complexity for one single term is always 1. - - - + + +
@@ -762,6 +762,8 @@ index.search("John", { }); ``` +See all available custom search options. #### Pagination
@@ -1409,6 +1411,8 @@ var results = index.search([{ ``` --> +See all available field-search options. ## Logical Operators
@@ -1616,10 +1620,12 @@ var results = index.search("John", { > The default sorting order is from lowest to highest. + Sort by a custom function: ```js
@@ -1681,7 +1687,7 @@ Create index and just set a limit of cache entries: var index = new FlexSearch({ profile: "score", - cache: 10000 + cache: 1000 }); ```
@@ -1690,7 +1696,7 @@ var index = new FlexSearch({ > When just using "true" the cache is unbounded and perform actually 2-3 times faster (because the balancer do not have to run). -## WebWorker Sharding (Browser only) +## Web-Worker (Browser only) Worker get its own dedicated memory and also run in their own dedicated thread without blocking the UI while processing. Especially for larger indexes, web worker improves speed and available memory a lot. FlexSearch index was tested with a 250 Mb text file including 10 Million words.
@@ -1734,18 +1740,20 @@ index.search("John Doe").then(function(results){ ## Options -FlexSearch ist highly customizable. Make use of the the right options can really improve your results as well as memory economy or query time. +FlexSearch ist highly customizable. Make use of the the right options can really improve your results as well as memory economy and query time. + +### Initialize Index
BulkSearch | FlexSearch | BulkSearch | FlexSearch
Access
- - - + + + - - + - - + - - + @@ -1785,7 +1793,7 @@ FlexSearch ist highly customizable. Make use of the the right - + - + - + - + - + - + - + @@ -1853,7 +1861,7 @@ FlexSearch ist highly customizable. Make use of the the right - + - + - +
Option | Values | Description | Option | Values | Description
profile
+ profile
"memory"
"speed"
"match"
@@ -1759,8 +1767,8 @@ FlexSearch ist highly customizable. Make use of the the right
tokenize
+ tokenize
"strict"
"forward"
"reverse"
@@ -1774,8 +1782,8 @@ FlexSearch ist highly customizable. Make use of the the right
split
+ split
RegExp
string
encode
encode
false
"icase"
@@ -1799,7 +1807,7 @@ FlexSearch ist highly customizable. Make use of the the right
cache
cache
false
true
@@ -1809,7 +1817,7 @@ FlexSearch ist highly customizable. Make use of the the right
async
async
true
false @@ -1818,7 +1826,7 @@ FlexSearch ist highly customizable. Make use of the the right
worker
worker
false
{number} @@ -1827,7 +1835,7 @@ FlexSearch ist highly customizable. Make use of the the right
depth
depth
false
{number} @@ -1836,7 +1844,7 @@ FlexSearch ist highly customizable. Make use of the the right
threshold
threshold
false
{number} @@ -1845,7 +1853,7 @@ FlexSearch ist highly customizable. Make use of the the right
resolution
resolution
{number}
stemmer
stemmer
false
{string}
@@ -1863,7 +1871,7 @@ FlexSearch ist highly customizable. Make use of the the right
filter
filter
false
{string}
@@ -1873,7 +1881,7 @@ FlexSearch ist highly customizable. Make use of the the right
rtl
rtl
true
false @@ -1882,13 +1890,115 @@ FlexSearch ist highly customizable. Make use of the the right
+ +### Custom Search + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Option | Values | Description
limit | number | Sets the limit of results.
suggest | true, false | Enables suggestions in results.
where | object | Use a where-clause for non-indexed fields.
field | string, Array<string> | Sets the document fields which should be searched. When no field is set, all fields will be searched. Custom options per field are also supported.
bool | "and", "or" | Sets the used logical operator when searching through multiple fields.
page | true, false, cursor | Enables paginated results.
+ +You can also override these following index settings via custom search (v0.7.0): + +- encode +- split +- tokenize +- threshold +- cache +- async + +Custom-Search options will override index options. + + +### Field-Search (v0.7.0) + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Option | Values | Description
limit | number | Sets the limit of results per field.
suggest | true, false | Enables suggestions in results per field.
bool | "and", "or", "not" | Sets the used logical operator when searching through multiple fields.
boost | number | Enables boosting fields.
+
+You can also override these following index settings per field via custom field-search:
+
+- encode
+- split
+- tokenize
+- threshold
+
+Field-Search options will override custom-search options and index options.
+
 ## Depth, Threshold, Resolution?
-Whereas __depth is the minimum relevance for the context-based index__, __threshold is the minimum relevance for the lexical index__. The threshold score is an enhanced variation of a conventional scoring calculation, it uses on document distance and partial distance instead of TF-IDF. The final scoring value is based on 3 kinds of distance.
+Whereas __depth__ is the minimum relevance for the __contextual index__, __threshold__ is the minimum relevance for the __lexical index__. The threshold score is an enhanced variation of a conventional scoring calculation, it uses on document distance and partial distance instead of TF-IDF. The final scoring value is based on 3 kinds of distance.
 Resolution on the other hand specify the max scoring value. The final score value is an integer value, so resolution affect how many segments the score may have. When the resolution is 1, then there exist just one scoring level for all matched terms. To get more differentiated results you need to raise the resolution.
-> The difference of both (_resolution_ - _threshold_) affects the performance on higher values.
+> The difference of both affects the performance on higher values (complexity = _resolution_ - _threshold_).
 The combination of resolution and threshold gives you a good controlling of your matches as well as performance, e.g. when the resolution is 25 and the threshold is 22, then the result only contains matches which are super relevant. The goal should always be just have items in result which are really needed. On top, that also improves performance a lot.
@@ -1900,10 +2010,10 @@ Tokenizer effects the required memory also as query time and flexibility of part - - - - + + + + @@ -1951,10 +2061,10 @@ Encoding effects the required memory also as query time and phonetic matches. Tr
Option | Description | Example | Memory Factor (n = length of word) | Option | Description | Example | Memory Factor (n = length of word)
"strict"
- - - - + + + + @@ -2007,11 +2117,11 @@ Encoding effects the required memory also as query time and phonetic matches. Tr
Option | Description | False-Positives | Compression | Option | Description | False-Positives | Compression
false
- - - - - + + + + + @@ -2117,8 +2227,8 @@ The required memory for the index depends on several options:
Query | icase | simple | advanced | extra | Query | icase | simple | advanced | extra
björn
- - + + @@ -2145,8 +2255,8 @@ The required memory for the index depends on several options: - - + + @@ -2175,8 +2285,8 @@ The required memory for the index depends on several options: - - + + @@ -2564,8 +2674,8 @@ node compile SUPPORT_WORKER=true
Encoding | Memory usage of every ~ 100,000 indexed word | Encoding | Memory usage of every ~ 100,000 indexed word
false | 90 kb
Mode | Multiplied with: (n = average length of indexed words) | Mode | Multiplied with: (n = average length of indexed words)
"strict"* n * (n - 1)
Contextual Index | Multiply the sum above with: | Contextual Index | Multiply the sum above with:
- - + +
Flag | Values | Flag | Values
DEBUG