mirror of https://github.com/pirate/ArchiveBox.git synced 2025-08-30 09:39:52 +02:00

Updated Setting up Search (markdown)

Nick Sweeting
2024-05-10 14:08:03 -07:00
parent 01e8800589
commit 9926933f5f

@@ -48,6 +48,8 @@ However, there are some fundamental limitations of scanning through every file o
<br/>
<a name="ripgrep"></a>
### `ripgrep` *(the default)*
If you do not already have `ripgrep` installed, follow the [instructions here](https://github.com/BurntSushi/ripgrep#installation) to get it.
@@ -78,6 +80,8 @@ archivebox list --filter-type=search 'text to search for'
<br/>
<a name="ripgrep-all"></a>
### `ripgrep-all` (aka `rga`)
The same as ripgrep except that it supports searching more binary filetypes like PDFs, eBooks, Office documents, zip, tar.gz, etc.
@@ -97,6 +101,8 @@ archivebox list --filter-type=search 'text to search for'
<br/>
<a name="ugrep"></a>
### `ugrep`
Not tested by the ArchiveBox team, but it's very similar to `ripgrep` and may work as a drop-in replacement, with some caveats (contributions to improve support are welcome).
@@ -123,6 +129,8 @@ archivebox config --set RIPGREP_BINARY=ugrep+
<br/><br/>
<a name="sonic"></a>
### `sonic` ⭐️ (the recommended upgrade path for most people)
[Sonic](https://github.com/valeriansaliou/sonic) is a fast, lightweight, Rust-based alternative to super-heavy traditional search backends like Elasticsearch. It is capable of normalizing natural language search queries, fuzzy matching, and searching Unicode, without needing to maintain a duplicate document store index of all the searchable text.
@@ -172,6 +180,8 @@ docker compose run archivebox list --filter-type=search 'some text to search'
<br/>
<a name="fts5"></a>
### `SQLite FTS5`
This is a [recently added](https://github.com/ArchiveBox/ArchiveBox/pull/1241) experimental option that uses a separate SQLite3 Database (similar to the one ArchiveBox already uses for Snapshot metadata) to provide full-text search.