mirror of
https://github.com/pirate/ArchiveBox.git
synced 2025-08-22 14:13:01 +02:00
docs: Update usage page
40
Docker.md
@@ -10,22 +10,23 @@ Make sure you have Docker installed and set up on your machine before following

- [Overview](#)
- [Docker Compose](#docker-compose) (recommended way)
+ [Setup](#setup)
+ [Usage](#usage)
+ [Accessing the data](#accessing-the-data)
+ [Configuration](#configuration)
- [Setup](#setup)
- [Usage](#usage)
- [Accessing the data](#accessing-the-data)
- [Configuration](#configuration)
- [Plain Docker](#docker)
+ [Setup](#setup-1)
+ [Usage](#usage-1)
+ [Accessing the data](#accessing-the-data-1)
+ [Configuration](#configuration-1)
- [Setup](#setup-1)
- [Usage](#usage-1)
- [Accessing the data](#accessing-the-data-1)
- [Configuration](#configuration-1)

**Official Docker Hub image:**
https://hub.docker.com/r/nikisweeting/archivebox

**Usage:**

```bash
echo 'https://example.com' | docker run -i -v ~/ArchiveBox:/data nikisweeting/archivebox
echo 'https://example.com' | docker run -i -v ~/ArchiveBox:/data nikisweeting/archivebox add
```

---
@@ -37,6 +38,7 @@ echo 'https://example.com' | docker run -i -v ~/ArchiveBox:/data nikisweeting/ar
An example [`docker-compose.yml`](https://github.com/pirate/ArchiveBox/blob/master/docker-compose.yml) config with ArchiveBox and an Nginx server to serve the archive is included in the project root. You can edit it as you see fit, or just run it as it comes out-of-the-box.

Just make sure you have a Docker version that's [new enough](https://docs.docker.com/compose/compose-file/) to support `version: 3` format:

```bash
docker --version
Docker version 18.09.1, build 4c52b90 # must be >= 17.04.0
@@ -59,21 +61,25 @@ First, make sure you're `cd`'ed into the same folder as your `docker-compose.yml
To add new URLs, you can use docker-compose just like the normal `./archive` CLI.

**To add an individual link or list of links**, pass in URLs via stdin.

```bash
echo "https://example.com" | docker-compose exec -T archivebox /bin/archive
```

**To import links from a file** you can either `cat` the file and pass it via stdin like above, or move it into your data folder so that ArchiveBox can access it from within the container.

```bash
mv ~/Downloads/bookmarks.html data/sources/bookmarks.html
docker-compose exec archivebox /bin/archive /data/sources/bookmarks.html
```

**To pull in links from a feed or remote file**, pass the URL or path to the feed as an argument.

```bash
docker-compose exec archivebox /bin/archive https://example.com/some/feed.rss
```
Passing a URL as an argument here does not archive the specified URL, it downloads it and archives the links *inside* of it, so only use it for RSS feeds or other *lists of links* you want to add. To add an individual link you want to archive use the instruction above and pass via stdin instead of by argument.

Passing a URL as an argument here does not archive the specified URL, it downloads it and archives the links _inside_ of it, so only use it for RSS feeds or other _lists of links_ you want to add. To add an individual link you want to archive use the instruction above and pass via stdin instead of by argument.

### Accessing the data

@@ -88,6 +94,7 @@ ArchiveBox running with docker-compose accepts all the same environment variable
The recommended way to pass in config variables is to edit the `environment:` section in `docker-compose.yml` directly or add an `env_file: ./path/to/ArchiveBox.conf` line before `environment:` to import variables from an env file.

Example of adding config options to `docker-compose.yml`:

```yaml
...

@@ -114,6 +121,7 @@ If you want to access your archive server with HTTPS, put a reverse proxy like N
### Setup

Fetch and run the ArchiveBox Docker image to create your initial archive.

```bash
echo 'https://example.com' | docker run -i -v ~/ArchiveBox:/data nikisweeting/archivebox
```
@@ -125,6 +133,7 @@ Make sure the data folder you use host is either a new, uncreated path, or if it
### Usage

**To add a single URL to the archive** or a list of links from a file, pipe them in via stdin. This will archive each link passed in.

```bash
echo 'https://example.com' | docker run -i -v ~/ArchiveBox:/data nikisweeting/archivebox
# or
@@ -132,19 +141,23 @@ cat bookmarks.html | docker run -i -v ~/ArchiveBox:/data nikisweeting/archivebox
```

**To add a list of pages via feed URL or remote file,** pass the URL of the feed as an argument.

```bash
docker run -v ~/ArchiveBox:/data nikisweeting/archivebox /bin/archive 'https://example.com/some/rss/feed.xml'
```
Passing a URL as an argument here does not archive the specified URL, it downloads it and archives the links *inside* of it, so only use it for RSS feeds or other *lists of links* you want to add. To add an individual link use the instruction above and pass via stdin instead of by argument.

Passing a URL as an argument here does not archive the specified URL, it downloads it and archives the links _inside_ of it, so only use it for RSS feeds or other _lists of links_ you want to add. To add an individual link use the instruction above and pass via stdin instead of by argument.

### Accessing the data

#### Using a bind folder

Use the flag:

```bash
-v /full/path/to/folder/on/host:/data
```

This will use the folder `/full/path/to/folder/on/host` on your host to store the ArchiveBox output.

#### Using a named Docker data volume
@@ -152,7 +165,9 @@ This will use the folder `/full/path/to/folder/on/host` on your host to store th
```bash
docker volume create archivebox-data
```

Then use the flag:

```bash
-v archivebox-data:/data
```
@@ -161,6 +176,7 @@ You can mount your data volume using standard docker tools, or access the conten
`/var/lib/docker/volumes/archivebox-data/_data` (on most Linux systems)

On a Mac you'll have to enter the base Docker Linux VM first to access the volume data:

```bash
screen ~/Library/Containers/com.docker.docker/Data/vms/0/tty
cd /var/lib/docker/volumes/archivebox-data/_data
@@ -171,11 +187,13 @@ cd /var/lib/docker/volumes/archivebox-data/_data
ArchiveBox in Docker accepts all the same environment variables as normal, see the list on the [[Configuration]] page.

To pass environment variables when running, you can use the `env` command.

```bash
echo 'https://example.com' | docker run -i -v ~/ArchiveBox:/data nikisweeting/archivebox env FETCH_SCREENSHOT=False /bin/archive
```

Or you can create an `ArchiveBox.env` file (copy from the default `etc/ArchiveBox.conf.default`) and pass it in like so:

```bash
docker run -i -v ~/ArchiveBox:/data --env-file=ArchiveBox.env nikisweeting/archivebox
```
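
As a rough sketch of the env-file route, the snippet below writes a small `ArchiveBox.env` by hand instead of copying the default file. The three options are the ones used in examples elsewhere in these docs; the values are illustrative assumptions, not recommended settings. The `docker run` line is left as a comment because it needs Docker and a data folder.

```bash
# Write a minimal env file; Docker's --env-file expects one KEY=value per line,
# and lines starting with '#' are treated as comments.
cat > ArchiveBox.env <<'EOF'
# Options borrowed from examples elsewhere in these docs (values are illustrative)
FETCH_SCREENSHOT=False
FETCH_MEDIA=True
MEDIA_TIMEOUT=500
EOF

# Show the options that would be handed to the container
grep -v '^#' ArchiveBox.env

# Then pass the file when running (requires Docker, so shown as a comment):
#   echo 'https://example.com' | docker run -i -v ~/ArchiveBox:/data --env-file=ArchiveBox.env nikisweeting/archivebox
```

Keeping the options in a file rather than on the command line makes it easier to reuse the same settings across runs.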
52
Usage.md
@@ -1,6 +1,6 @@
# Usage

▶️ *Make sure the dependencies are [fully installed](https://github.com/pirate/ArchiveBox/wiki/Install) before running any ArchiveBox commands.*
▶️ _Make sure the dependencies are [fully installed](https://github.com/pirate/ArchiveBox/wiki/Install) before running any ArchiveBox commands._

**ArchiveBox API Reference:**

@@ -12,6 +12,7 @@
- [Disk Layout](#Disk-Layout): Description of the archive folder structure and contents.

**Related:**

- [[Docker]]: Learn about ArchiveBox usage with Docker and Docker Compose
- [[Configuration]]: Learn about the various archive method options
- [[Scheduled Archiving]]: Learn how to set up automatic daily archiving
@@ -21,15 +22,29 @@

## Overview

The `./archive` binary is a shortcut to `bin/archivebox`. Piping RSS, JSON, [Netscape](https://msdn.microsoft.com/en-us/library/aa753582(v=vs.85).aspx), or TXT lists of links into the `./archive` command will add them to your archive folder, and create a locally-stored browsable archive for each new URL.
The `./archive` binary is a shortcut to `bin/archivebox`.
You can create/initialize an archive folder anywhere you want. You can do this with:

The archiver produces an [output folder](#Disk-Layout) `output/` containing `index.html`, `index.json`, and archived copies of all the sites organized by timestamp bookmarked. It's powered by [Chrome headless](https://developers.google.com/web/updates/2017/04/headless-chrome), good 'ol `wget`, and a few other common Unix tools.
```bash
archivebox init
```

You can add URLs using `stdin` or `args` along with the `add` command:

```bash
archivebox add https://example.com
archivebox add https://my-rss-feed.com --depth=1
```

Passing RSS, JSON, [Netscape](<https://msdn.microsoft.com/en-us/library/aa753582(v=vs.85).aspx>), or TXT lists of links into the `./archive add` command will add them to your archive folder, and create a locally-stored browsable archive for each new URL.

The archiver will create new files for the index and the archived copies of all the sites organized by timestamp. It's powered by [Chrome headless](https://developers.google.com/web/updates/2017/04/headless-chrome), good 'ol `wget`, and a few other common Unix tools.

## CLI Usage

<img src="https://i.imgur.com/biVfFYr.png" width="30%" align="right">

`./archive` refers to the executable shortcut in the root of the project, but you can also call ArchiveBox via `./bin/archivebox`. If you add `/path/to/ArchiveBox/bin` to your shell `$PATH` then you can call `archivebox` from anywhere on your system.
`archivebox` refers to the executable that is available when you install this project using pip.

If you're using Docker, the CLI interface is similar but needs to be prefixed by `docker-compose exec ...` or `docker run ...`, for examples see the [[Docker]] page.

@@ -42,22 +57,32 @@ If you're using Docker, the CLI interface is similar but needs to be prefixed by
---

### Run ArchiveBox with configuration options

You can set environment variables in your shell profile, a config file, or by using the `env` command.

```bash
env FETCH_MEDIA=True MEDIA_TIMEOUT=500 ./archive ...
```

See [[Configuration]] page for more details about the available options and ways to pass config.
If you're using Docker, also make sure to read the Configuration section on the [[Docker]] page.

---

### Import a single URL or list of URLs via stdin
### Import a single URL

```bash
echo 'https://example.com' | ./archive
echo 'https://example.com' | archivebox add
# or
cat urls_to_archive.txt | ./archive
archivebox add https://example.com
```

### Import a list of URLs from a txt file

```bash
cat urls_to_archive.txt | archivebox add
```

You can also pipe in RSS, XML, Netscape, or any of the other supported import formats via stdin.

---
@@ -72,24 +97,26 @@ You can also pipe in RSS, XML, Netscape, or any of the other supported import fo
./archive ~/Downloads/other_links.txt
```

Passing a file as an argument here does not archive the file, it parses it as a list of URLs and archives the links *inside of it*, so only use it for *lists of links* to archive, not HTML files or other content you want added directly to the archive.
Passing a file as an argument here does not archive the file, it parses it as a list of URLs and archives the links _inside of it_, so only use it for _lists of links_ to archive, not HTML files or other content you want added directly to the archive.

---

### Import list of URLs from a remote RSS feed or file

ArchiveBox will download the URL to a local file in `output/sources/` and attempt to autodetect the format and import any URLs found. Currently, Netscape HTML, JSON, RSS, and plain-text link lists are supported.

```bash
./archive https://example.com/feed.rss
echo https://my-rss-feed | archivebox add --depth=1
# or
./archive https://example.com/links.txt
archivebox add https://my-rss-feed --depth=1
```

Passing a URL as an argument here does not archive the specified URL, it downloads it and archives the links *inside* of it, so only use it for RSS feeds or other *lists of links* you want to add. To add an individual link use the instruction above and pass the URL via stdin instead of as an argument.
Passing a URL as an argument here does archive the specified URL; it downloads it and archives the links _inside_ of it, so you can use it for RSS feeds or other _lists of links_ you want to add.

---

### Import list of links from browser history

```bash
./bin/archivebox-export-browser-history --chrome
./archive output/sources/chrome_history.json
@@ -150,12 +177,14 @@ Those numbers are from running it single-threaded on my i5 machine with 50mbps d
Storage requirements go up immensely if you're using `FETCH_MEDIA=True` and are archiving many pages with audio & video.

You can run it in parallel by using the `resume` feature, or by manually splitting export.html into multiple files:

```bash
./archive export.html 1498800000 & # second argument is timestamp to resume downloading from
./archive export.html 1498810000 &
./archive export.html 1498820000 &
./archive export.html 1498830000 &
```
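
For the manual-splitting route, here is a minimal sketch, assuming a plain-text list with one URL per line. The file name `urls.txt` and the `urls_part_` prefix are hypothetical, and a Netscape HTML export such as `export.html` may need a format-aware split instead of a line-based one.

```bash
# Hypothetical input: a plain-text list with one URL per line.
printf '%s\n' \
    'https://example.com/a' \
    'https://example.com/b' \
    'https://example.com/c' \
    'https://example.com/d' \
    'https://example.com/e' > urls.txt

# Split into chunks of at most 2 lines each: urls_part_aa, urls_part_ab, ...
split -l 2 urls.txt urls_part_

# List the resulting chunk files
ls urls_part_*

# Each chunk could then be handed to its own background process, e.g.:
#   for part in urls_part_*; do ./archive "$part" & done; wait
```

The chunk size trades off the number of parallel processes against the work each one does; choose it to suit your machine.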

Users have reported running it with 50k+ bookmarks with success (though it will take more RAM while running).

If you already imported a huge list of bookmarks and want to import only new
@@ -163,7 +192,6 @@ bookmarks, you can use the `ONLY_NEW` environment variable. This is useful if
you want to import a bookmark dump periodically and want to skip broken links
which are already in the index.


## Python API Usage

```python