mirror of https://github.com/pirate/ArchiveBox.git synced 2025-08-29 17:19:53 +02:00

Updated Usage (markdown)

Nick Sweeting
2020-07-29 18:20:36 -04:00
parent b9b1448acd
commit 1a7639e4db

@@ -17,19 +17,20 @@
- [[Troubleshooting]]: Resources if you encounter any problems
- [Screenshots](https://github.com/pirate/ArchiveBox#Screenshots): See what the CLI and outputted HTML look like
## Overview
The `./archive` binary is a shortcut to `bin/archivebox`. Piping RSS, JSON, [Netscape](https://msdn.microsoft.com/en-us/library/aa753582(v=vs.85).aspx), or TXT lists of links into the `./archive` command will add them to your archive folder, and create a locally-stored browsable archive for each new URL.
The archiver produces an [output folder](#Disk-Layout) `output/` containing `index.html`, `index.json`, and archived copies of all the sites, organized by the timestamp at which each was bookmarked. It's powered by [Chrome headless](https://developers.google.com/web/updates/2017/04/headless-chrome), good ol' `wget`, and a few other common Unix tools.
## CLI Usage
<img src="https://i.imgur.com/biVfFYr.png" width="30%" align="right">
All three of these ways of running ArchiveBox are equivalent and interchangeable:
- `archivebox [subcommand] [...args]` (using `pip install archivebox`)
- `docker run -v $PWD:/data nikisweeting/archivebox [subcommand] [...args]` (using the official Docker image)
- `docker-compose run archivebox [subcommand] [...args]` (using the official Docker image in a Docker Compose project)

You can also share a single ArchiveBox data directory between Docker and non-Docker instances. For example, you can run the server in a container while still executing CLI commands on the host.

For more examples see the [[Docker]] page.
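The three invocations above are interchangeable, so a small shell function can pick whichever one is available on the current machine. This is a sketch under stated assumptions:
- `archivebox_cmd` is a hypothetical name, not an ArchiveBox command.
- It assumes you run it from inside your data folder.
- The `--rm` and `-i` flags are standard `docker run` options added here, not taken from the ArchiveBox docs. The `-i` flag keeps stdin attached so that piped URL lists reach the container.

```bash
# Hypothetical helper, not an ArchiveBox command: run an ArchiveBox subcommand
# against the data folder in $PWD, preferring a local `pip install archivebox`
# and falling back to the official Docker image.
archivebox_cmd() {
  if command -v archivebox >/dev/null 2>&1; then
    archivebox "$@"
  else
    # --rm removes the container on exit; -i keeps stdin attached so that
    # piped URL lists (e.g. `echo URL | archivebox_cmd add`) reach ArchiveBox
    docker run --rm -i -v "$PWD":/data nikisweeting/archivebox "$@"
  fi
}
```

With it, `echo 'https://example.com' | archivebox_cmd add` should behave the same on a machine with a pip install as on one that only has Docker.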
- [Run ArchiveBox with configuration options](#Run-ArchiveBox-with-configuration-options)
- [Import a single URL or list of URLs via stdin](#Import-a-single-URL-or-list-of-URLs-via-stdin)
@@ -52,9 +53,17 @@ If you're using Docker, also make sure to read the Configuration section on the
### Import a single URL or list of URLs via stdin
```bash
archivebox add 'https://example.com'
# or
echo 'https://example.com' | archivebox add
```
---
### Import a list of URLs from a file or feed
```bash
archivebox add < urls_to_archive.txt
# or
curl https://getpocket.com/users/USERNAME/feed/all | archivebox add
```
You can also pipe in RSS, XML, Netscape, or any of the other supported import formats via stdin.
@@ -63,27 +72,14 @@ You can also pipe in RSS, XML, Netscape, or any of the other supported import fo
### Import list of links exported from browser or another service
```bash
archivebox add < ~/Downloads/browser_bookmarks_export.html
# or
archivebox add < ~/Downloads/pinboard_bookmarks.json
# or
archivebox add < ~/Downloads/other_links.txt
```
Piping a file into `archivebox add` does not archive the file itself. ArchiveBox parses it as a list of URLs and archives the links *inside of it*, so only use this for *lists of links* to archive, not for HTML files or other content you want added directly to the archive. You can also add `--depth=1` to any of these commands to recursively archive each URL along with every URL one hop away from it (e.g. a page plus all of its outlinks).
---
@@ -93,21 +89,27 @@ This uses the `archivebox-export-browser-history` helper script to parse your br
Specify the type of the browser as the first argument, and optionally the path to the SQLite history file as the second argument.
```bash
./bin/export-browser-history --chrome
archivebox add < output/sources/chrome_history.json
# or
./bin/export-browser-history --firefox
archivebox add < output/sources/firefox_history.json
# or
./bin/export-browser-history --safari
archivebox add < output/sources/safari_history.json
```
---
## UI Usage
```bash
archivebox server
# then, in a second terminal (the server runs in the foreground;
# `open` is macOS-only, use `xdg-open` on Linux):
open http://127.0.0.1:8000
```
If you prefer the static HTML UI to the interactive UI provided by the server, you can instead open `./index.html` (in your data folder) in a browser. You should see something [like this](https://archive.sweeting.me).
You can sort by column, search using the box in the upper right, and see the total number of links at the bottom.
@@ -120,12 +122,13 @@ Click the Favicon under the "Files" column to go to the details page for each li
## Disk Layout
The `OUTPUT_DIR` folder (usually whatever folder you run `archivebox` in) contains the UI HTML and archived data, with the structure outlined below.
```yaml
- output/
- index.sqlite3 # Main index of all archived URLs
- index.json # Redundant JSON version of the same main index
- index.html # Redundant static HTML version of the same main index
- archive/
- 155243135/ # Archived links are stored in folders by timestamp
@@ -154,12 +157,11 @@ Those numbers are from running it single-threaded on my i5 machine with 50mbps d
Storage requirements go up immensely if you're using `FETCH_MEDIA=True` and are archiving many pages with audio & video.
You can run it in parallel by manually splitting your URLs into separate chunks:
```bash
archivebox add < urls_chunk_1.txt &
archivebox add < urls_chunk_2.txt &
archivebox add < urls_chunk_3.txt &
```
Users have reported running it with 50k+ bookmarks with success (though it will take more RAM while running).
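The manual chunking above can also be scripted. The following is a minimal sketch, assuming a plain-text file with one URL per line. Splitting a Netscape HTML or JSON export by lines would break its structure. `add_in_chunks` and its default chunk size of 1000 are hypothetical. Run it from inside your data folder, like any other `archivebox` command.

```bash
# Hypothetical helper, not an ArchiveBox command: split a plain-text list of
# URLs (one per line) into chunks and run one `archivebox add` per chunk.
add_in_chunks() {
  local urls_file="$1"
  local chunk_size="${2:-1000}"  # URLs per chunk; 1000 is an arbitrary default
  local workdir chunk
  workdir="$(mktemp -d)" || return 1

  # POSIX split writes urls_chunk_aa, urls_chunk_ab, ... into the temp dir
  split -l "$chunk_size" "$urls_file" "$workdir/urls_chunk_" || { rm -r "$workdir"; return 1; }

  for chunk in "$workdir"/urls_chunk_*; do
    [ -e "$chunk" ] || continue  # glob matched nothing (empty input file)
    archivebox add < "$chunk" &  # import this chunk in the background
  done
  wait  # block until the background imports (and any other jobs of this shell) exit
  rm -r "$workdir"
}
```

For example, `add_in_chunks urls_to_archive.txt 5000` would start one `archivebox add` process per 5,000 URLs. Each chunk becomes its own concurrent process, so choose a chunk size that yields only a handful of them.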