
Updated Setting up Search (markdown)

Nick Sweeting
2024-05-07 01:09:20 -07:00
parent 78dbe1f0ef
commit 466cfd8bae

### `sonic` ⭐️ (the recommended upgrade path for most people)
[Sonic](https://github.com/valeriansaliou/sonic) is a fast, lightweight, Rust-based alternative to heavyweight traditional search backends like Elasticsearch. It can normalize natural-language search queries, perform fuzzy matching, and search Unicode text, all without maintaining a duplicate document store of the searchable text.

It is the recommended backend for most ArchiveBox users who need to scale beyond what `ripgrep` can provide. Internally it functions as an index store: it keeps only the IDs of the Snapshots, alongside a highly compressed index of their text, rather than the text itself. This lets it search terabytes of archived data while keeping an index that is only a fraction of that size.
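
To illustrate what "storing only the IDs" means, here is a toy inverted index in bash (requires bash 4+ for associative arrays). This is a hypothetical sketch of the general idea, not Sonic's actual data structure or ArchiveBox's integration code: each term maps to the IDs of the Snapshots that contain it, the text itself is thrown away after indexing, and a query returns only IDs, which the caller would then use to load the matching Snapshots. The IDs `snap1`..`snap3` and the functions `index_text` and `query` are made up for illustration.

```bash
#!/usr/bin/env bash
# Toy inverted index: term -> space-separated list of snapshot IDs.
# Hypothetical illustration only; NOT how Sonic stores its index.
declare -A INDEX

# index_text ID TEXT: lowercase TEXT, split it on non-alphanumeric characters,
# and record ID under each term. TEXT itself is never stored.
index_text() {
  local id="$1" term
  for term in $(printf '%s\n' "$2" | tr '[:upper:]' '[:lower:]' | tr -cs '[:alnum:]' '\n'); do
    case " ${INDEX[$term]} " in
      *" $id "*) ;;  # this ID is already recorded for this term
      *) INDEX[$term]="${INDEX[$term]:+${INDEX[$term]} }$id" ;;
    esac
  done
}

# query TERM: print the IDs of snapshots containing TERM (exact, case-insensitive match).
query() {
  local term
  term=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  printf '%s\n' "${INDEX[$term]}"
}

index_text snap1 "Rust search engines are fast"
index_text snap2 "Elasticsearch is a search engine"
index_text snap3 "Fast archiving with ArchiveBox"

query search  # prints: snap1 snap2
query FAST    # prints: snap1 snap3
```

Note that `query engine` returns only `snap2`: this toy does no normalization or fuzzy matching, which are exactly the features described above that a real backend like Sonic adds on top of the basic term-to-ID mapping.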

*ArchiveBox has supported Sonic for years, and it is the most thoroughly tested backend for users who need to scale beyond `ripgrep`.*

Using [Sonic with ArchiveBox in Docker Compose](https://github.com/ArchiveBox/ArchiveBox/blob/dev/docker-compose.yml) is the easiest way to get started, though you can also use it without Docker by [installing it manually](https://github.com/valeriansaliou/sonic#installation).
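
If you install Sonic manually instead of using Docker, ArchiveBox also needs to know how to reach it. The sketch below is a minimal example under stated assumptions: it assumes Sonic is listening on `localhost` at its default channel port `1491` with the default `auth_password` (`SecretPassword`) from Sonic's sample `config.cfg`, and it assumes the `SEARCH_BACKEND_HOST_NAME`, `SEARCH_BACKEND_PORT`, and `SEARCH_BACKEND_PASSWORD` option names; confirm them against the output of `archivebox config` for your version before relying on them.

```bash
# run these from inside your ArchiveBox data directory
# (assumed option names and Sonic defaults; adjust to match your Sonic config.cfg)
archivebox config --set SEARCH_BACKEND_ENGINE=sonic
archivebox config --set SEARCH_BACKEND_HOST_NAME=localhost
archivebox config --set SEARCH_BACKEND_PORT=1491
archivebox config --set SEARCH_BACKEND_PASSWORD=SecretPassword
# backfill any existing snapshots into the new Sonic index
archivebox update --index-only
```

The password set here must match the `auth_password` in Sonic's own config, otherwise ArchiveBox will fail to authenticate to the Sonic channel.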
```bash
# edit docker-compose.yml to uncomment the lines that enable sonic
nano docker-compose.yml
# make sure ArchiveBox is configured to use the Sonic backend
docker compose run archivebox config --set SEARCH_BACKEND_ENGINE=sonic
# restart the containers to apply the changes and start the Sonic container
docker compose down
docker compose up
# ...
docker compose logs sonic
docker compose run archivebox version
# backfill any existing archivebox data into the Sonic index (may take an hour or longer depending on storage speed and collection size)
docker compose run archivebox update --index-only
# then test it out: