mirror of https://github.com/pirate/ArchiveBox.git synced 2025-08-31 10:01:52 +02:00

Updated Setting up Search (markdown)

Nick Sweeting
2024-05-10 14:08:03 -07:00
parent 01e8800589
commit 9926933f5f

@@ -48,6 +48,8 @@ However, there are some fundamental limitations of scanning through every file o
<br/>
<a name="ripgrep"></a>
### `ripgrep` *(the default)*
If you do not already have `ripgrep` installed, follow the [instructions here](https://github.com/BurntSushi/ripgrep#installation) to get it.
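Before relying on the default backend, it can help to confirm that the `rg` binary is actually on your `PATH`. The snippet below is a minimal sketch of such a check; `check_ripgrep` is a hypothetical helper name used only for illustration and is not part of ArchiveBox.

```shell
# Report whether the ripgrep binary (`rg`) is available on PATH.
# `check_ripgrep` is a hypothetical helper, not an ArchiveBox command.
check_ripgrep() {
    if command -v rg >/dev/null 2>&1; then
        echo "found: $(command -v rg)"
    else
        echo "missing: install ripgrep before using the default search backend"
        return 1
    fi
}

# Run the check without aborting the shell if ripgrep is absent.
check_ripgrep || true
```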
@@ -78,6 +80,8 @@ archivebox list --filter-type=search 'text to search for'
<br/>
<a name="ripgrep-all"></a>
### `ripgrep-all` (aka `rga`)
Works the same as `ripgrep`, except that it can also search inside more binary file types, such as PDFs, eBooks, Office documents, zip, tar.gz, etc.
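One plausible way to switch over is to point ArchiveBox at the `rga` binary through the `RIPGREP_BINARY` setting, mirroring the `archivebox config --set RIPGREP_BINARY=...` command this page uses for other ripgrep-compatible tools. This is a sketch under that assumption; `use_rga` is a hypothetical helper name, and the command should be run from inside your ArchiveBox data directory.

```shell
# Assumption: ArchiveBox accepts `rga` as a value for RIPGREP_BINARY,
# in the same way as other ripgrep-compatible binaries.
# `use_rga` is a hypothetical helper, not an ArchiveBox command.
use_rga() {
    if ! command -v rga >/dev/null 2>&1; then
        echo "skipped: rga not found on PATH"
        return 0
    fi
    if ! command -v archivebox >/dev/null 2>&1; then
        echo "skipped: archivebox not found on PATH"
        return 0
    fi
    # Point the ripgrep search backend at the rga binary.
    archivebox config --set RIPGREP_BINARY=rga
}

use_rga
```

After switching, the same `archivebox list --filter-type=search 'text to search for'` query can be used to check that results still come back.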
@@ -97,6 +101,8 @@ archivebox list --filter-type=search 'text to search for'
<br/>
<a name="ugrep"></a>
### `ugrep`
`ugrep` has not been tested by the ArchiveBox team, but it is very similar to `ripgrep` and may work as a drop-in replacement, with some caveats. (Contributions to improve support are welcome.)
@@ -123,6 +129,8 @@ archivebox config --set RIPGREP_BINARY=ugrep+
<br/><br/>
<a name="sonic"></a>
### `sonic` ⭐️ (the recommended upgrade path for most people)
[Sonic](https://github.com/valeriansaliou/sonic) is a fast, lightweight, Rust-based alternative to super-heavy traditional search backends like Elasticsearch. It is capable of normalizing natural language search queries, fuzzy matching, and searching Unicode, without needing to maintain a duplicate document store index of all the searchable text.
@@ -172,6 +180,8 @@ docker compose run archivebox list --filter-type=search 'some text to search'
<br/>
<a name="fts5"></a>
### `SQLite FTS5`
This is a [recently added](https://github.com/ArchiveBox/ArchiveBox/pull/1241) experimental option that uses a separate SQLite3 database (similar to the one ArchiveBox already uses for Snapshot metadata) to provide full-text search.