mirror of https://github.com/pirate/ArchiveBox.git synced 2025-08-23 06:33:18 +02:00

Merge branch 'django'

Nick Sweeting
2020-07-29 23:53:03 -04:00
29 changed files with 996 additions and 129 deletions

1
.gitignore vendored Normal file

@@ -0,0 +1 @@
_build/

@@ -1,3 +1,5 @@
# Changelog
▶️ *If you're having an issue with a breaking change, or migrating your data between versions, open an [issue](https://github.com/pirate/ArchiveBox/issues) to get help.* ▶️ *If you're having an issue with a breaking change, or migrating your data between versions, open an [issue](https://github.com/pirate/ArchiveBox/issues) to get help.*
**`ArchiveBox` was previously named `Pocket Archive Stream` and then `Bookmark Archiver`.** **`ArchiveBox` was previously named `Pocket Archive Stream` and then `Bookmark Archiver`.**

@@ -1,3 +1,5 @@
# Chromium Install
By default, ArchiveBox looks for any existing installed version of Chrome/Chromium and uses it if found. You can optionally install a specific version and set the environment variable `CHROME_BINARY` to force ArchiveBox to use that one, e.g.: By default, ArchiveBox looks for any existing installed version of Chrome/Chromium and uses it if found. You can optionally install a specific version and set the environment variable `CHROME_BINARY` to force ArchiveBox to use that one, e.g.:
- `CHROME_BINARY=google-chrome-beta` - `CHROME_BINARY=google-chrome-beta`
@@ -6,7 +8,7 @@ By default, ArchiveBox looks for any existing installed version of Chrome/Chromi
If you don't already have Chrome installed, I recommend installing Chromium instead of Google Chrome, as it's the open-source fork of Chrome that doesn't send as much tracking data to Google. If you don't already have Chrome installed, I recommend installing Chromium instead of Google Chrome, as it's the open-source fork of Chrome that doesn't send as much tracking data to Google.
#### Check for existing Chrome/Chromium install **Check for existing Chrome/Chromium install:**
<img src="https://i.imgur.com/FxFoIMH.jpg" width="25%" align="right"/> <img src="https://i.imgur.com/FxFoIMH.jpg" width="25%" align="right"/>

@@ -1,5 +1,6 @@
▶️ *The default ArchiveBox config file can be found here: [`etc/ArchiveBox.conf.default`](https://github.com/pirate/ArchiveBox/blob/master/etc/ArchiveBox.conf.default).* # Configuration
▶️ *The default ArchiveBox config file can be found here: [`etc/ArchiveBox.conf.default`](https://github.com/pirate/ArchiveBox/blob/master/etc/ArchiveBox.conf.default).*
Configuration is done through environment variables. You can pass in settings using all the usual environment variable methods: e.g. by using the `env` command, exporting variables in your shell profile, or sourcing a `.env` file before running the command. Configuration is done through environment variables. You can pass in settings using all the usual environment variable methods: e.g. by using the `env` command, exporting variables in your shell profile, or sourcing a `.env` file before running the command.
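The three methods mentioned above can be sketched as follows. This is a minimal illustration, not the project's own tooling: `sh -c` stands in for the real archive command so the snippet runs anywhere, the temporary env file is a hypothetical placeholder, and the option names are ones used elsewhere in these docs.

```bash
#!/usr/bin/env bash
# Sketch only: `sh -c ...` stands in for the real archive command.

# 1. One-off override with `env`: applies to this single command only.
one_off=$(env FETCH_PDF=False sh -c 'printf %s "$FETCH_PDF"')

# 2. Export in the current shell (or shell profile): applies to every later command.
export RESOLUTION=1440,900
exported=$(sh -c 'printf %s "$RESOLUTION"')

# 3. Source a .env file; `set -a` auto-exports every variable the file assigns.
env_file=$(mktemp)    # hypothetical config file, created here for the demo
printf 'CHROME_BINARY=chromium-browser\n' > "$env_file"
set -a
. "$env_file"
set +a
sourced=$(sh -c 'printf %s "$CHROME_BINARY"')

echo "env: $one_off | export: $exported | sourced: $sourced"
```

The `set -a` / `set +a` pair avoids having to prefix every line of the file with `export`.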
@@ -384,31 +385,3 @@ Path or name of the curl binary to use.
<img src="https://i.imgur.com/almAbwK.png" width="100%"/> <img src="https://i.imgur.com/almAbwK.png" width="100%"/>
---
# Creating a Config File
*Note: If you're using Docker, see the [[Docker]] page for configuration instructions.*
To set up a persistent config:
1. Copy `etc/ArchiveBox.conf.default` to `~/.ArchiveBox.conf`
```bash
cp ArchiveBox/etc/ArchiveBox.conf.default ~/.ArchiveBox.conf
```
2. Edit your options inside `~/.ArchiveBox.conf`, e.g.:
```bash
CHROME_BINARY=google-chrome-stable
RESOLUTION=1440,900
FETCH_PDF=False
```
3. Source your config file when you run your archive script:
```bash
eval export $(grep -v '^#' ~/path/to/your/ArchiveBox.conf); ./archive https://example.com/rss/feed.xml
```
Improving this process is on the roadmap; in future versions you'll be able to pass a config file directly to the archive command.

71
Contents.rst Normal file

@@ -0,0 +1,71 @@
Intro
#####
.. toctree::
:maxdepth: 1
README.md
Getting Started
###############
.. toctree::
:maxdepth: 2
Quickstart.md
Install.md
Docker.md
General
#######
.. toctree::
:maxdepth: 2
Usage.md
Configuration.md
Troubleshooting.md
Security-Overview.md
Publishing-Your-Archive.md
Scheduled-Archiving.md
Chromium-Install.md
API Reference
#############
.. toctree::
:maxdepth: 1
Configuration Options <Configuration.md>
Data Folder Layout <Usage.md>
Command Line Interface <Usage.md>
Web Interface <Usage.md>
Python API <modules>
REST API <modules>
.. - [Configuration Options](Configuration.md)
.. - [Data Folder Layout](Configuration.md)
.. - [Command Line Interface](Usage.md)
.. - [Web Interface](Usage.md)
.. - [Python API](modules)
.. - REST API (Coming soon...)
Meta
####
.. toctree::
:maxdepth: 1
Roadmap.md
Changelog.md
Donations.md
.. toctree::
:maxdepth: 3
Web-Archiving-Community.md

@@ -1,4 +1,6 @@
# Overview # Docker
## Overview
Running ArchiveBox with Docker allows you to manage it in a container without exposing it to the rest of your system. Usage with Docker is similar to usage of ArchiveBox normally, with a few small differences. Running ArchiveBox with Docker allows you to manage it in a container without exposing it to the rest of your system. Usage with Docker is similar to usage of ArchiveBox normally, with a few small differences.
@@ -8,39 +10,41 @@ Make sure you have Docker installed and set up on your machine before following
- [Overview](#) - [Overview](#)
- [Docker Compose](#docker-compose) (recommended way) - [Docker Compose](#docker-compose) (recommended way)
+ [Setup](#setup) - [Setup](#setup)
+ [Usage](#usage) - [Usage](#usage)
+ [Accessing the data](#accessing-the-data) - [Accessing the data](#accessing-the-data)
+ [Configuration](#configuration) - [Configuration](#configuration)
- [Plain Docker](#docker) - [Plain Docker](#docker)
+ [Setup](#setup-1) - [Setup](#setup-1)
+ [Usage](#usage-1) - [Usage](#usage-1)
+ [Accessing the data](#accessing-the-data-1) - [Accessing the data](#accessing-the-data-1)
+ [Configuration](#configuration-1) - [Configuration](#configuration-1)
**Official Docker Hub image:** **Official Docker Hub image:**
https://hub.docker.com/r/nikisweeting/archivebox https://hub.docker.com/r/nikisweeting/archivebox
**Usage:** **Usage:**
```bash ```bash
echo 'https://example.com' | docker run -i -v ~/ArchiveBox:/data nikisweeting/archivebox echo 'https://example.com' | docker run -i -v ~/ArchiveBox:/data nikisweeting/archivebox add
``` ```
--- ---
<img src="https://i.imgur.com/knwOtky.png" height="40px" align="right"> <img src="https://i.imgur.com/knwOtky.png" height="40px" align="right">
# Docker Compose ## Docker Compose
An example [`docker-compose.yml`](https://github.com/pirate/ArchiveBox/blob/master/docker-compose.yml) config with ArchiveBox and an Nginx server to serve the archive is included in the project root. You can edit it as you see fit, or just run it as it comes out-of-the-box. An example [`docker-compose.yml`](https://github.com/pirate/ArchiveBox/blob/master/docker-compose.yml) config with ArchiveBox and an Nginx server to serve the archive is included in the project root. You can edit it as you see fit, or just run it as it comes out-of-the-box.
Just make sure you have a Docker version that's [new enough](https://docs.docker.com/compose/compose-file/) to support `version: 3` format: Just make sure you have a Docker version that's [new enough](https://docs.docker.com/compose/compose-file/) to support `version: 3` format:
```bash ```bash
docker --version docker --version
Docker version 18.09.1, build 4c52b90 # must be >= 17.04.0 Docker version 18.09.1, build 4c52b90 # must be >= 17.04.0
``` ```
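If you would rather script the version check than eyeball it, something like the following could work. It is a sketch under assumptions: `parse_docker_version` and `version_ge` are hypothetical helper names, the comparison relies on `sort -V` being available, and the sample line is used as a fallback when no `docker` binary is found.

```bash
#!/usr/bin/env bash
# Sketch: check that a Docker version string meets the 17.04.0 minimum.

# Pull "18.09.1" out of a line like "Docker version 18.09.1, build 4c52b90".
parse_docker_version() {
    printf '%s\n' "$1" | sed -E 's/^Docker version ([0-9][0-9.]*).*/\1/'
}

# Succeed when version $1 >= version $2 (relies on `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="17.04.0"
# Use the real binary when present; otherwise fall back to the sample line.
if command -v docker >/dev/null 2>&1; then
    line=$(docker --version)
else
    line="Docker version 18.09.1, build 4c52b90"
fi
installed=$(parse_docker_version "$line")

if version_ge "$installed" "$required"; then
    echo "Docker $installed is new enough"
else
    echo "Docker $installed is older than $required" >&2
fi
```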
## Setup ### Setup
```bash ```bash
git clone https://github.com/pirate/ArchiveBox && cd ArchiveBox git clone https://github.com/pirate/ArchiveBox && cd ArchiveBox
@@ -50,43 +54,48 @@ docker-compose up -d
Then open [`http://127.0.0.1:8098`](http://127.0.0.1:8098) or `data/index.html` to view the archive (HTTP, not HTTPS). Then open [`http://127.0.0.1:8098`](http://127.0.0.1:8098) or `data/index.html` to view the archive (HTTP, not HTTPS).
## Usage ### Usage
First, make sure you're `cd`'ed into the same folder as your `docker-compose.yml` file (e.g. the project root) and that your containers have been started with `docker-compose up -d`. First, make sure you're `cd`'ed into the same folder as your `docker-compose.yml` file (e.g. the project root) and that your containers have been started with `docker-compose up -d`.
To add new URLs, you can use docker-compose just like the normal `./archive` CLI. To add new URLs, you can use docker-compose just like the normal `./archive` CLI.
**To add an individual link or list of links**, pass in URLs via stdin. **To add an individual link or list of links**, pass in URLs via stdin.
```bash ```bash
echo "https://example.com" | docker-compose exec -T archivebox /bin/archive echo "https://example.com" | docker-compose exec -T archivebox /bin/archive
``` ```
**To import links from a file** you can either `cat` the file and pass it via stdin like above, or move it into your data folder so that ArchiveBox can access it from within the container. **To import links from a file** you can either `cat` the file and pass it via stdin like above, or move it into your data folder so that ArchiveBox can access it from within the container.
```bash ```bash
mv ~/Downloads/bookmarks.html data/sources/bookmarks.html mv ~/Downloads/bookmarks.html data/sources/bookmarks.html
docker-compose exec archivebox /bin/archive /data/sources/bookmarks.html docker-compose exec archivebox /bin/archive /data/sources/bookmarks.html
``` ```
**To pull in links from a feed or remote file**, pass the URL or path to the feed as an argument. **To pull in links from a feed or remote file**, pass the URL or path to the feed as an argument.
```bash ```bash
docker-compose exec archivebox /bin/archive https://example.com/some/feed.rss docker-compose exec archivebox /bin/archive https://example.com/some/feed.rss
``` ```
Passing a URL as an argument here does not archive the specified URL, it downloads it and archives the links *inside* of it, so only use it for RSS feeds or other *lists of links* you want to add. To add an individual link you want to archive use the instruction above and pass via stdin instead of by argument.
## Accessing the data Passing a URL as an argument here does not archive the specified URL, it downloads it and archives the links _inside_ of it, so only use it for RSS feeds or other _lists of links_ you want to add. To add an individual link you want to archive use the instruction above and pass via stdin instead of by argument.
### Accessing the data
The outputted archive data is stored in `data/` (relative to the project root), or whatever folder path you specified in the `docker-compose.yml` `volumes:` section. Make sure the `data/` folder on the host has permissions initially set to `777` so that the ArchiveBox command is able to set it to the specified `OUTPUT_PERMISSIONS` config setting on the first run. The outputted archive data is stored in `data/` (relative to the project root), or whatever folder path you specified in the `docker-compose.yml` `volumes:` section. Make sure the `data/` folder on the host has permissions initially set to `777` so that the ArchiveBox command is able to set it to the specified `OUTPUT_PERMISSIONS` config setting on the first run.
To access your archive, you can open `data/index.html` directly, or you can use the provided Nginx server running inside docker on [`http://127.0.0.1:8098`](http://127.0.0.1:8098). To access your archive, you can open `data/index.html` directly, or you can use the provided Nginx server running inside docker on [`http://127.0.0.1:8098`](http://127.0.0.1:8098).
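The permissions note above can be sketched as a small preparation step. `DATA_DIR` is a hypothetical variable introduced for illustration; it defaults to `./data` to match the compose setup described here.

```bash
#!/usr/bin/env bash
# Sketch: prepare a host data folder before the first run.
# DATA_DIR is a hypothetical variable; it defaults to ./data as in the compose setup.
DATA_DIR="${DATA_DIR:-./data}"

mkdir -p "$DATA_DIR"
# Wide open at first; ArchiveBox applies OUTPUT_PERMISSIONS on the first run.
chmod 777 "$DATA_DIR"

# Read back the mode string to confirm the change took effect.
mode=$(ls -ld "$DATA_DIR" | cut -c1-10)
echo "$DATA_DIR permissions: $mode"
```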
## Configuration ### Configuration
ArchiveBox running with docker-compose accepts all the same environment variables as normal, see the full list on the [[Configuration]] page. ArchiveBox running with docker-compose accepts all the same environment variables as normal, see the full list on the [[Configuration]] page.
The recommended way to pass in config variables is to edit the `environment:` section in `docker-compose.yml` directly or add an `env_file: ./path/to/ArchiveBox.conf` line before `environment:` to import variables from an env file. The recommended way to pass in config variables is to edit the `environment:` section in `docker-compose.yml` directly or add an `env_file: ./path/to/ArchiveBox.conf` line before `environment:` to import variables from an env file.
Example of adding config options to `docker-compose.yml`: Example of adding config options to `docker-compose.yml`:
```yml
```yaml
... ...
services: services:
@@ -107,11 +116,12 @@ If you want to access your archive server with HTTPS, put a reverse proxy like N
--- ---
# Docker ## Docker
## Setup ### Setup
Fetch and run the ArchiveBox Docker image to create your initial archive. Fetch and run the ArchiveBox Docker image to create your initial archive.
```bash ```bash
echo 'https://example.com' | docker run -i -v ~/ArchiveBox:/data nikisweeting/archivebox echo 'https://example.com' | docker run -i -v ~/ArchiveBox:/data nikisweeting/archivebox
``` ```
@@ -120,9 +130,10 @@ Replace `~/ArchiveBox` in the command above with the full path to a folder to us
Make sure the data folder you use on the host is either a new, uncreated path, or if it already exists make sure it has permissions initially set to `777` so that the ArchiveBox command is able to set it to the specified `OUTPUT_PERMISSIONS` config setting on the first run. Make sure the data folder you use on the host is either a new, uncreated path, or if it already exists make sure it has permissions initially set to `777` so that the ArchiveBox command is able to set it to the specified `OUTPUT_PERMISSIONS` config setting on the first run.
## Usage ### Usage
**To add a single URL to the archive** or a list of links from a file, pipe them in via stdin. This will archive each link passed in. **To add a single URL to the archive** or a list of links from a file, pipe them in via stdin. This will archive each link passed in.
```bash ```bash
echo 'https://example.com' | docker run -i -v ~/ArchiveBox:/data nikisweeting/archivebox echo 'https://example.com' | docker run -i -v ~/ArchiveBox:/data nikisweeting/archivebox
# or # or
@@ -130,27 +141,33 @@ cat bookmarks.html | docker run -i -v ~/ArchiveBox:/data nikisweeting/archivebox
``` ```
**To add a list of pages via feed URL or remote file,** pass the URL of the feed as an argument. **To add a list of pages via feed URL or remote file,** pass the URL of the feed as an argument.
```bash ```bash
docker run -v ~/ArchiveBox:/data nikisweeting/archivebox /bin/archive 'https://example.com/some/rss/feed.xml' docker run -v ~/ArchiveBox:/data nikisweeting/archivebox /bin/archive 'https://example.com/some/rss/feed.xml'
``` ```
Passing a URL as an argument here does not archive the specified URL, it downloads it and archives the links *inside* of it, so only use it for RSS feeds or other *lists of links* you want to add. To add an individual link use the instruction above and pass via stdin instead of by argument.
## Accessing the data Passing a URL as an argument here does not archive the specified URL, it downloads it and archives the links _inside_ of it, so only use it for RSS feeds or other _lists of links_ you want to add. To add an individual link use the instruction above and pass via stdin instead of by argument.
### Using a bind folder ### Accessing the data
#### Using a bind folder
Use the flag: Use the flag:
```bash ```bash
-v /full/path/to/folder/on/host:/data -v /full/path/to/folder/on/host:/data
``` ```
This will use the folder `/full/path/to/folder/on/host` on your host to store the ArchiveBox output. This will use the folder `/full/path/to/folder/on/host` on your host to store the ArchiveBox output.
### Using a named Docker data volume #### Using a named Docker data volume
```bash ```bash
docker volume create archivebox-data docker volume create archivebox-data
``` ```
Then use the flag: Then use the flag:
```bash ```bash
-v archivebox-data:/data -v archivebox-data:/data
``` ```
@@ -159,21 +176,24 @@ You can mount your data volume using standard docker tools, or access the conten
`/var/lib/docker/volumes/archivebox-data/_data` (on most Linux systems) `/var/lib/docker/volumes/archivebox-data/_data` (on most Linux systems)
On a Mac you'll have to enter the base Docker Linux VM first to access the volume data: On a Mac you'll have to enter the base Docker Linux VM first to access the volume data:
```bash ```bash
screen ~/Library/Containers/com.docker.docker/Data/vms/0/tty screen ~/Library/Containers/com.docker.docker/Data/vms/0/tty
cd /var/lib/docker/volumes/archivebox-data/_data cd /var/lib/docker/volumes/archivebox-data/_data
``` ```
## Configuration ### Configuration
ArchiveBox in Docker accepts all the same environment variables as normal, see the list on the [[Configuration]] page. ArchiveBox in Docker accepts all the same environment variables as normal, see the list on the [[Configuration]] page.
To pass environment variables when running, you can use the env command. To pass environment variables when running, you can use the env command.
```bash ```bash
echo 'https://example.com' | docker run -i -v ~/ArchiveBox:/data nikisweeting/archivebox env FETCH_SCREENSHOT=False /bin/archive echo 'https://example.com' | docker run -i -v ~/ArchiveBox:/data nikisweeting/archivebox env FETCH_SCREENSHOT=False /bin/archive
``` ```
Or you can create an `ArchiveBox.env` file (copy from the default `etc/ArchiveBox.conf.default`) and pass it in like so: Or you can create an `ArchiveBox.env` file (copy from the default `etc/ArchiveBox.conf.default`) and pass it in like so:
```bash ```bash
docker run -i -v ~/ArchiveBox:/data --env-file=ArchiveBox.env nikisweeting/archivebox docker run -i -v ~/ArchiveBox:/data --env-file=ArchiveBox.env nikisweeting/archivebox
``` ```

@@ -1,3 +1,5 @@
# Install
ArchiveBox only has a few main dependencies apart from `python3`, and they can all be installed using your normal package manager. It usually takes 1min to get up and running if you use the [helper script](#automatic-setup), or about 5min if you install everything [manually](#manual-setup). ArchiveBox only has a few main dependencies apart from `python3`, and they can all be installed using your normal package manager. It usually takes 1min to get up and running if you use the [helper script](#automatic-setup), or about 5min if you install everything [manually](#manual-setup).
<img src="https://lh4.googleusercontent.com/KWaqSJ_J9nSaGZugZWGR_mC18xxbGj2pVScriSzP8hX7KiUSw6L3VVL8rhDxQKIwxaCsfSFUO1B2pipEM4h7L-HJOGXo7yZK8a3DBVERwqfEZ8GxpeHPwh8P4LSkqVjPGRx5XYs" width="20%" align="right"/> <img src="https://lh4.googleusercontent.com/KWaqSJ_J9nSaGZugZWGR_mC18xxbGj2pVScriSzP8hX7KiUSw6L3VVL8rhDxQKIwxaCsfSFUO1B2pipEM4h7L-HJOGXo7yZK8a3DBVERwqfEZ8GxpeHPwh8P4LSkqVjPGRx5XYs" width="20%" align="right"/>

19
Makefile Normal file

@@ -0,0 +1,19 @@
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
SOURCEDIR = .
BUILDDIR = _build
# Put it first so that "make" without argument is like "make help".
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
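To illustrate the catch-all rule: with `SPHINXOPTS` and `O` left empty, `make html` ends up running `sphinx-build -M html "." "_build"`. A minimal shell sketch of that expansion follows; the `sphinx_cmd` helper is hypothetical and only prints the command rather than running Sphinx.

```bash
#!/usr/bin/env bash
# Sketch: print the command the Makefile's catch-all target runs for a given target.
# The defaults mirror the Makefile variables; sphinx_cmd is a hypothetical helper.
SPHINXBUILD="${SPHINXBUILD:-sphinx-build}"
SOURCEDIR="${SOURCEDIR:-.}"
BUILDDIR="${BUILDDIR:-_build}"

sphinx_cmd() {
    printf '%s -M %s "%s" "%s"\n' "$SPHINXBUILD" "$1" "$SOURCEDIR" "$BUILDDIR"
}

sphinx_cmd html    # what `make html` runs
sphinx_cmd clean   # what `make clean` runs
```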

@@ -1,4 +1,4 @@
## Publishing Your Archive # Publishing Your Archive
The archive produced by `./archive` is suitable for serving on any provider that can host static HTML (e.g. GitHub Pages!). The archive produced by `./archive` is suitable for serving on any provider that can host static HTML (e.g. GitHub Pages!).
@@ -19,16 +19,15 @@ Make sure you're not running any content as CGI or PHP, you only want to serve s
URLs look like: `https://archive.example.com/archive/1493350273/en.wikipedia.org/wiki/Dining_philosophers_problem.html` URLs look like: `https://archive.example.com/archive/1493350273/en.wikipedia.org/wiki/Dining_philosophers_problem.html`
**Security WARNING & Content Disclaimer** ## Security Concerns
Re-hosting other people's content has security implications for any other sites sharing your hosting domain. Make sure you understand Re-hosting other people's content has security implications for any other sites sharing your hosting domain. Make sure you understand the dangers of hosting unknown archived CSS & JS files [on your shared domain](https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy).
the dangers of hosting unknown archived CSS & JS files [on your shared domain](https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy). Due to the security risk of serving some malicious JS you archived by accident, it's best to put this on a domain or subdomain of its own to keep cookies separate and slightly mitigate [CSRF attacks](https://en.wikipedia.org/wiki/Cross-site_request_forgery) and other nastiness.
Due to the security risk of serving some malicious JS you archived by accident, it's best to put this on a domain or subdomain
of its own to keep cookies separate and slightly mitigate [CSRF attacks](https://en.wikipedia.org/wiki/Cross-site_request_forgery) and other nastiness. ## Copyright Concerns
Be aware that some sites you archive may not allow you to rehost their content publicly for copyright reasons, it's up to you to host responsibly and respond to takedown requests appropriately.
You may also want to blacklist your archive in `/robots.txt` if you don't want to be publicly associated with all the links you archive via search engine results. You may also want to blacklist your archive in `/robots.txt` if you don't want to be publicly associated with all the links you archive via search engine results.
Be aware that some sites you archive may not allow you to rehost their content publicly for copyright reasons,
it's up to you to host responsibly and respond to takedown requests appropriately.
Please modify the `FOOTER_INFO` config variable to add your contact info to the footer of your index. Please modify the `FOOTER_INFO` config variable to add your contact info to the footer of your index.
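The `robots.txt` suggestion above can be sketched as follows. `ARCHIVE_ROOT` is a hypothetical variable standing for whatever folder your web server serves as the site root, and the `./output` default is an assumption. Note that `robots.txt` is only advisory: well-behaved crawlers honor it, but it is not access control.

```bash
#!/usr/bin/env bash
# Sketch: ask all crawlers to skip the whole archive.
# ARCHIVE_ROOT is a hypothetical variable pointing at the folder your web server serves.
ARCHIVE_ROOT="${ARCHIVE_ROOT:-./output}"
mkdir -p "$ARCHIVE_ROOT"

# The quoted heredoc delimiter keeps the contents literal.
cat > "$ARCHIVE_ROOT/robots.txt" <<'EOF'
User-agent: *
Disallow: /
EOF

cat "$ARCHIVE_ROOT/robots.txt"
```

The file has to be reachable at `/robots.txt` on the domain or subdomain that hosts the archive for crawlers to find it.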

@@ -1,3 +1,5 @@
# Quickstart
<div align="center"> <div align="center">
<img src="https://i.imgur.com/ZbHpEf8.jpg" width="40%"/> <img src="https://i.imgur.com/ZbHpEf8.jpg" width="40%"/>
</div> </div>

1
README.md Symbolic link

@@ -0,0 +1 @@
../README.md

@@ -1,4 +1,4 @@
## Roadmap # Roadmap
<img src="https://i.imgur.com/es97GGV.png" width="20%" align="right"/> <img src="https://i.imgur.com/es97GGV.png" width="20%" align="right"/>
@@ -8,7 +8,7 @@
--- ---
# Planned Specification ## Planned Specification
To see how this spec has been scheduled / implemented / released so far, read these pull requests: To see how this spec has been scheduled / implemented / released so far, read these pull requests:
- ✅ [v0.2.x](https://github.com/pirate/ArchiveBox/tree/483a3bef9e2b1a7b80611947a3be99b0cf4f9959) - ✅ [v0.2.x](https://github.com/pirate/ArchiveBox/tree/483a3bef9e2b1a7b80611947a3be99b0cf4f9959)

@@ -1,4 +1,6 @@
## Schedule daily importing of new links into your archive # Scheduled Archiving
## Using Cron
To schedule regular archiving you can use any task scheduler like `cron`, `at`, `systemd`, etc. To schedule regular archiving you can use any task scheduler like `cron`, `at`, `systemd`, etc.
@@ -8,7 +10,9 @@ ones as necessary.
For some example configs, see the [`etc/cron.d`](https://github.com/pirate/ArchiveBox/blob/master/etc/cron.d) and [`etc/supervisord`](https://github.com/pirate/ArchiveBox/blob/master/etc/supervisord) folders. For some example configs, see the [`etc/cron.d`](https://github.com/pirate/ArchiveBox/blob/master/etc/cron.d) and [`etc/supervisord`](https://github.com/pirate/ArchiveBox/blob/master/etc/supervisord) folders.
## Example: Import Firefox browser history every 24 hours ## Examples
### Example: Import Firefox browser history every 24 hours
This example exports your browser history and archives it once a day: This example exports your browser history and archives it once a day:
@@ -26,7 +30,7 @@ cd /opt/ArchiveBox
0 0 * * * www-data /opt/ArchiveBox/bin/firefox_custom.sh 0 0 * * * www-data /opt/ArchiveBox/bin/firefox_custom.sh
``` ```
## Example: Import an RSS feed from Pocket every 12 hours ### Example: Import an RSS feed from Pocket every 12 hours
This example imports your Pocket bookmark feed and archives any new links once a day: This example imports your Pocket bookmark feed and archives any new links once a day:

@@ -1,3 +1,5 @@
# Security Overview
## Usage Modes ## Usage Modes
ArchiveBox has three common usage modes outlined below. ArchiveBox has three common usage modes outlined below.

@@ -1,3 +1,5 @@
# Troubleshooting
▶️ *If you need help or have a question, you can open an [issue](https://github.com/pirate/ArchiveBox/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc) or reach out on [Twitter](https://github.com/theSquashSH).* ▶️ *If you need help or have a question, you can open an [issue](https://github.com/pirate/ArchiveBox/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc) or reach out on [Twitter](https://github.com/theSquashSH).*
What are you having an issue with?: What are you having an issue with?:
@@ -9,11 +11,11 @@ What are you having an issue with?:
--- ---
### Installing ## Installing
Make sure you've followed the Manual Setup guide in the [[Install]] instructions first. Then check here for help depending on what component you need help with: Make sure you've followed the Manual Setup guide in the [[Install]] instructions first. Then check here for help depending on what component you need help with:
#### Python ### Python
On some Linux distributions the python3 package might not be recent enough. On some Linux distributions the python3 package might not be recent enough.
If this is the case for you, resort to installing a recent enough version manually. If this is the case for you, resort to installing a recent enough version manually.
@@ -22,7 +24,7 @@ add-apt-repository ppa:fkrull/deadsnakes && apt update && apt install python3.6
``` ```
If you still need help, [the official Python docs](https://docs.python.org/3.6/using/unix.html) are a good place to start. If you still need help, [the official Python docs](https://docs.python.org/3.6/using/unix.html) are a good place to start.
#### Chromium/Google Chrome ### Chromium/Google Chrome
For more info, see the [[Chromium Install]] page. For more info, see the [[Chromium Install]] page.
@@ -62,7 +64,7 @@ env CHROME_BINARY=/path/from/step/1/chromium-browser ./archive bookmarks_export.
``` ```
#### Wget & Curl ### Wget & Curl
If you're missing `wget` or `curl`, simply install them using `apt` or your package manager of choice. If you're missing `wget` or `curl`, simply install them using `apt` or your package manager of choice.
See the "Manual Setup" instructions for more details. See the "Manual Setup" instructions for more details.
@@ -71,14 +73,14 @@ If wget times out or randomly fails to download some sites that you have confirm
upgrade wget to the most recent version with `brew upgrade wget` or `apt upgrade wget`. There is upgrade wget to the most recent version with `brew upgrade wget` or `apt upgrade wget`. There is
a bug in versions `<=1.19.1_1` that caused wget to fail for perfectly valid sites. a bug in versions `<=1.19.1_1` that caused wget to fail for perfectly valid sites.
### Archiving ## Archiving
#### No links parsed from export file ### No links parsed from export file
Please open an [issue](https://github.com/pirate/ArchiveBox/issues) with a description of where you got the export, and Please open an [issue](https://github.com/pirate/ArchiveBox/issues) with a description of where you got the export, and
preferably your export file attached (you can redact the links). We'll fix the parser to support your format. preferably your export file attached (you can redact the links). We'll fix the parser to support your format.
#### Lots of skipped sites ### Lots of skipped sites
If you ran the archiver once, it won't re-download sites on subsequent runs; it will only download new links. If you ran the archiver once, it won't re-download sites on subsequent runs; it will only download new links.
If you haven't already run it, make sure you have a working internet connection and that the parsed URLs look correct. If you haven't already run it, make sure you have a working internet connection and that the parsed URLs look correct.
@@ -86,22 +88,22 @@ You can check the `archive.py` output or `index.html` to see what links it's dow
If you're still having issues, try deleting or moving the `output/archive` folder (back it up first!) and running `./archive` again. If you're still having issues, try deleting or moving the `output/archive` folder (back it up first!) and running `./archive` again.
#### Lots of errors ### Lots of errors
Make sure you have all the dependencies installed and that you're able to visit the links from your browser normally. Make sure you have all the dependencies installed and that you're able to visit the links from your browser normally.
Open an [issue](https://github.com/pirate/ArchiveBox/issues) with a description of the errors if you're still having problems. Open an [issue](https://github.com/pirate/ArchiveBox/issues) with a description of the errors if you're still having problems.
#### Lots of broken links from the index ### Lots of broken links from the index
Not all sites can be effectively archived with each method, that's why it's best to use a combination of `wget`, PDFs, and screenshots. Not all sites can be effectively archived with each method, that's why it's best to use a combination of `wget`, PDFs, and screenshots.
If it seems like more than 10-20% of sites in the archive are broken, open an [issue](https://github.com/pirate/ArchiveBox/issues) If it seems like more than 10-20% of sites in the archive are broken, open an [issue](https://github.com/pirate/ArchiveBox/issues)
with some of the URLs that failed to be archived and I'll investigate. with some of the URLs that failed to be archived and I'll investigate.
#### Removing unwanted links from the index ### Removing unwanted links from the index
If you accidentally added lots of unwanted links to the index and they slow down your archiving, you can use the `bin/purge` script to remove them; it removes everything matching the Python regexes you pass to it, e.g. `bin/purge -r 'amazon\.com' -r 'google\.com'`. It will prompt before removing links from the index, but for extra safety you might want to back up `index.json` first (or put it under version control). If you accidentally added lots of unwanted links to the index and they slow down your archiving, you can use the `bin/purge` script to remove them; it removes everything matching the Python regexes you pass to it, e.g. `bin/purge -r 'amazon\.com' -r 'google\.com'`. It will prompt before removing links from the index, but for extra safety you might want to back up `index.json` first (or put it under version control).
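The backup step suggested above can be sketched like this. `backup_index` is a hypothetical helper, and `output/index.json` is an assumed location for the index; the purge flags are the ones shown in the paragraph above. The purge itself only runs when both the index and the script are actually present.

```bash
#!/usr/bin/env bash
# Sketch: back up index.json before purging links.
# backup_index is a hypothetical helper; output/index.json is an assumed location.
backup_index() {
    index_file="$1"
    backup_file="$index_file.$(date +%Y%m%d-%H%M%S).bak"
    # Copy preserving timestamps, then print the backup path for the caller.
    cp -p "$index_file" "$backup_file" && printf '%s\n' "$backup_file"
}

# Only run against a real checkout.
if [ -f output/index.json ] && [ -x bin/purge ]; then
    backup_index output/index.json
    bin/purge -r 'amazon\.com' -r 'google\.com'
fi
```

Timestamped backup names mean repeated purges never overwrite an earlier copy.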
### Hosting the Archive ## Hosting the Archive
If you're having issues trying to host the archive via nginx, make sure you already have nginx running with SSL. If you're having issues trying to host the archive via nginx, make sure you already have nginx running with SSL.
If you don't, google around, there are plenty of tutorials to help get that set up. Open an [issue](https://github.com/pirate/ArchiveBox/issues) If you don't, google around, there are plenty of tutorials to help get that set up. Open an [issue](https://github.com/pirate/ArchiveBox/issues)

@@ -1,21 +1,24 @@
# Usage
▶️ _Make sure the dependencies are [fully installed](https://github.com/pirate/ArchiveBox/wiki/Install) before running any ArchiveBox commands._
**ArchiveBox API Reference:**
<img src="https://i.imgur.com/aQZZcku.png" width="20%" align="right"/>
- [Overview](#Overview): Program structure and outline of basic archiving process.
- [CLI Usage](#CLI-Usage): Docs and examples for the ArchiveBox command line interface.
- [UI Usage](#UI-Usage): Docs and screenshots for the outputted HTML archive interface.
- [Disk Layout](#Disk-Layout): Description of the archive folder structure and contents.
**Related:**
- [[Docker]]: Learn about ArchiveBox usage with Docker and Docker Compose
- [[Configuration]]: Learn about the various archive method options
- [[Scheduled Archiving]]: Learn how to set up automatic daily archiving
- [[Publishing Your Archive]]: Learn how to host your archive for others to access
- [[Troubleshooting]]: Resources if you encounter any problems
- [Screenshots](https://github.com/pirate/ArchiveBox#Screenshots): See what the CLI and outputted HTML look like
## CLI Usage
@@ -35,16 +38,18 @@ You can share a single archivebox data directory between Docker and non-Docker i
For more examples see the [[Docker]] page.
- [Run ArchiveBox with configuration options](#Run-ArchiveBox-with-configuration-options)
- [Import a single URL or list of URLs via stdin](#Import-a-single-URL-or-list-of-URLs-via-stdin)
- [Import list of links exported from browser or another service](#Import-list-of-links-exported-from-browser-or-another-service)
- [Import list of URLs from a remote RSS feed or file](#Import-list-of-URLs-from-a-remote-RSS-feed-or-file)
- [Import list of links from browser history](#Import-list-of-links-from-browser-history)
---
### Run ArchiveBox with configuration options
You can set environment variables in your shell profile, a config file, or by using the `env` command.
```bash
# via the CLI
archivebox config --set TIMEOUT=3600
@@ -62,26 +67,29 @@ If you're using Docker, also make sure to read the Configuration section on the
---
### Import a single URL
```bash
archivebox add 'https://example.com'
# or
echo 'https://example.com' | archivebox add
```
You can also add `--depth=1` to any of these commands if you want to recursively archive the URLs and all URLs one hop away (e.g. all the outlinks on a page + the page).
### Import a list of URLs from a file or feed
### Import a list of URLs from a txt file
```bash ```bash
cat urls_to_archive.txt | archivebox add
# or
archivebox add < urls_to_archive.txt
# or
curl https://getpocket.com/users/USERNAME/feed/all | archivebox add
```
You can also pipe in RSS, XML, Netscape, or any of the other supported import formats via stdin.
---
### Import list of links exported from browser or another service
```bash
archivebox add < ~/Downloads/browser_bookmarks_export.html
# or
@@ -90,13 +98,11 @@ archivebox add < ~/Downloads/pinboard_bookmarks.json
archivebox add < ~/Downloads/other_links.txt
```
You can also add `--depth=1` to any of these commands if you want to recursively archive the URLs and all URLs one hop away. (e.g. all the outlinks on a page + the page).
---
### Import list of links from browser history
Look in the `bin/` folder of this repo to find the `archivebox-export-browser-history` helper script, which parses your browser's SQLite history database for URLs.
Specify the type of the browser as the first argument, and optionally the path to the SQLite history file as the second argument.
```bash ```bash
@@ -174,9 +180,25 @@ archivebox add < urls_chunk_1.txt &
archivebox add < urls_chunk_2.txt &
archivebox add < urls_chunk_3.txt &
```
(though this may not be faster if you have a very large collection/main index)
Users have reported running it with 50k+ bookmarks with success (though it will take more RAM while running).
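The chunked import above assumes the `urls_chunk_N.txt` files already exist. A minimal sketch of one way to produce them from a single URL list follows; `split_into_chunks` and `write_chunks` are hypothetical helpers, not part of ArchiveBox, and the round-robin split is just one reasonable choice.

```python
from pathlib import Path

def split_into_chunks(urls, n_chunks):
    """Distribute urls round-robin into n_chunks lists of near-equal size."""
    chunks = [[] for _ in range(n_chunks)]
    for i, url in enumerate(urls):
        chunks[i % n_chunks].append(url)
    return chunks

def write_chunks(source, n_chunks=3):
    """Write urls_chunk_1.txt ... urls_chunk_N.txt next to the source file."""
    lines = Path(source).read_text().splitlines()
    urls = [line.strip() for line in lines if line.strip()]
    paths = []
    for i, chunk in enumerate(split_into_chunks(urls, n_chunks), start=1):
        path = Path(source).with_name(f"urls_chunk_{i}.txt")
        path.write_text("\n".join(chunk) + "\n")
        paths.append(path)
    return paths
```

Calling `write_chunks('urls_to_archive.txt')` would then leave three files that can be fed to the parallel `archivebox add` commands shown earlier.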
If you already imported a huge list of bookmarks and want to import only new
bookmarks, you can use the `ONLY_NEW` environment variable. This is useful if
you want to import a bookmark dump periodically and want to skip broken links
which are already in the index.
## Python API Usage
```python
from archivebox.main import add, info, remove, check_data_folder
out_dir = '~/path/to/my/data/folder'
check_data_folder(out_dir=out_dir)
add('https://example.com', index_only=True, out_dir=out_dir)
info(out_dir=out_dir)
remove('https://example.com', delete=True, yes=True, out_dir=out_dir)
```
For more information see the Python API Reference.

@@ -1,3 +1,5 @@
# Web Archiving Community
<div align="center">
<!--💬 **Join the [`#ArchiveBox` channel](http://webchat.freenode.net?channels=ArchiveBox&uio=d4) via IRC on [FreeNode.net](http://webchat.freenode.net?channels=ArchiveBox&uio=d4) to chat with us!**-->
@@ -12,7 +14,6 @@ The internet archiving community is surprisingly far-reaching and almost univers
Whether you want to learn which organizations are the big players in the web archiving space, want to find a specific open source tool for your web archiving need, or just want to see where archivists hang out online, this is my attempt at an index of the entire web archiving community.
## Contents
<img src="https://i.imgur.com/duS8Lm7.png" width="200px" align="right"/>

142
archivebox.cli.rst Normal file

@@ -0,0 +1,142 @@
archivebox.cli package
======================

Submodules
----------

archivebox.cli.archivebox module
--------------------------------

.. automodule:: archivebox.cli.archivebox
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.cli.archivebox\_add module
-------------------------------------

.. automodule:: archivebox.cli.archivebox_add
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.cli.archivebox\_config module
----------------------------------------

.. automodule:: archivebox.cli.archivebox_config
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.cli.archivebox\_help module
--------------------------------------

.. automodule:: archivebox.cli.archivebox_help
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.cli.archivebox\_info module
--------------------------------------

.. automodule:: archivebox.cli.archivebox_info
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.cli.archivebox\_init module
--------------------------------------

.. automodule:: archivebox.cli.archivebox_init
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.cli.archivebox\_list module
--------------------------------------

.. automodule:: archivebox.cli.archivebox_list
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.cli.archivebox\_manage module
----------------------------------------

.. automodule:: archivebox.cli.archivebox_manage
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.cli.archivebox\_remove module
----------------------------------------

.. automodule:: archivebox.cli.archivebox_remove
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.cli.archivebox\_schedule module
------------------------------------------

.. automodule:: archivebox.cli.archivebox_schedule
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.cli.archivebox\_server module
----------------------------------------

.. automodule:: archivebox.cli.archivebox_server
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.cli.archivebox\_shell module
---------------------------------------

.. automodule:: archivebox.cli.archivebox_shell
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.cli.archivebox\_update module
----------------------------------------

.. automodule:: archivebox.cli.archivebox_update
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.cli.archivebox\_version module
-----------------------------------------

.. automodule:: archivebox.cli.archivebox_version
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.cli.logging module
-----------------------------

.. automodule:: archivebox.cli.logging
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.cli.tests module
---------------------------

.. automodule:: archivebox.cli.tests
   :members:
   :undoc-members:
   :show-inheritance:

Module contents
---------------

.. automodule:: archivebox.cli
   :members:
   :undoc-members:
   :show-inheritance:

22
archivebox.config.rst Normal file

@@ -0,0 +1,22 @@
archivebox.config package
=========================

Submodules
----------

archivebox.config.stubs module
------------------------------

.. automodule:: archivebox.config.stubs
   :members:
   :undoc-members:
   :show-inheritance:

Module contents
---------------

.. automodule:: archivebox.config
   :members:
   :undoc-members:
   :show-inheritance:

@@ -0,0 +1,30 @@
archivebox.core.migrations package
==================================

Submodules
----------

archivebox.core.migrations.0001\_initial module
-----------------------------------------------

.. automodule:: archivebox.core.migrations.0001_initial
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.core.migrations.0002\_auto\_20190417\_0739 module
------------------------------------------------------------

.. automodule:: archivebox.core.migrations.0002_auto_20190417_0739
   :members:
   :undoc-members:
   :show-inheritance:

Module contents
---------------

.. automodule:: archivebox.core.migrations
   :members:
   :undoc-members:
   :show-inheritance:

93
archivebox.core.rst Normal file

@@ -0,0 +1,93 @@
archivebox.core package
=======================

Subpackages
-----------

.. toctree::

   archivebox.core.migrations

Submodules
----------

archivebox.core.admin module
----------------------------

.. automodule:: archivebox.core.admin
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.core.apps module
---------------------------

.. automodule:: archivebox.core.apps
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.core.models module
-----------------------------

.. automodule:: archivebox.core.models
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.core.settings module
-------------------------------

.. automodule:: archivebox.core.settings
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.core.tests module
----------------------------

.. automodule:: archivebox.core.tests
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.core.urls module
---------------------------

.. automodule:: archivebox.core.urls
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.core.views module
----------------------------

.. automodule:: archivebox.core.views
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.core.welcome\_message module
---------------------------------------

.. automodule:: archivebox.core.welcome_message
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.core.wsgi module
---------------------------

.. automodule:: archivebox.core.wsgi
   :members:
   :undoc-members:
   :show-inheritance:

Module contents
---------------

.. automodule:: archivebox.core
   :members:
   :undoc-members:
   :show-inheritance:

86
archivebox.extractors.rst Normal file

@@ -0,0 +1,86 @@
archivebox.extractors package
=============================

Submodules
----------

archivebox.extractors.archive\_org module
-----------------------------------------

.. automodule:: archivebox.extractors.archive_org
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.extractors.dom module
--------------------------------

.. automodule:: archivebox.extractors.dom
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.extractors.favicon module
------------------------------------

.. automodule:: archivebox.extractors.favicon
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.extractors.git module
--------------------------------

.. automodule:: archivebox.extractors.git
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.extractors.media module
----------------------------------

.. automodule:: archivebox.extractors.media
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.extractors.pdf module
--------------------------------

.. automodule:: archivebox.extractors.pdf
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.extractors.screenshot module
---------------------------------------

.. automodule:: archivebox.extractors.screenshot
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.extractors.title module
----------------------------------

.. automodule:: archivebox.extractors.title
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.extractors.wget module
---------------------------------

.. automodule:: archivebox.extractors.wget
   :members:
   :undoc-members:
   :show-inheritance:

Module contents
---------------

.. automodule:: archivebox.extractors
   :members:
   :undoc-members:
   :show-inheritance:

54
archivebox.index.rst Normal file

@@ -0,0 +1,54 @@
archivebox.index package
========================

Submodules
----------

archivebox.index.csv module
---------------------------

.. automodule:: archivebox.index.csv
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.index.html module
----------------------------

.. automodule:: archivebox.index.html
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.index.json module
----------------------------

.. automodule:: archivebox.index.json
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.index.schema module
------------------------------

.. automodule:: archivebox.index.schema
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.index.sql module
---------------------------

.. automodule:: archivebox.index.sql
   :members:
   :undoc-members:
   :show-inheritance:

Module contents
---------------

.. automodule:: archivebox.index
   :members:
   :undoc-members:
   :show-inheritance:

78
archivebox.parsers.rst Normal file

@@ -0,0 +1,78 @@
archivebox.parsers package
==========================

Submodules
----------

archivebox.parsers.generic\_json module
---------------------------------------

.. automodule:: archivebox.parsers.generic_json
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.parsers.generic\_rss module
--------------------------------------

.. automodule:: archivebox.parsers.generic_rss
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.parsers.generic\_txt module
--------------------------------------

.. automodule:: archivebox.parsers.generic_txt
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.parsers.medium\_rss module
-------------------------------------

.. automodule:: archivebox.parsers.medium_rss
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.parsers.netscape\_html module
----------------------------------------

.. automodule:: archivebox.parsers.netscape_html
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.parsers.pinboard\_rss module
---------------------------------------

.. automodule:: archivebox.parsers.pinboard_rss
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.parsers.pocket\_html module
--------------------------------------

.. automodule:: archivebox.parsers.pocket_html
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.parsers.shaarli\_rss module
--------------------------------------

.. automodule:: archivebox.parsers.shaarli_rss
   :members:
   :undoc-members:
   :show-inheritance:

Module contents
---------------

.. automodule:: archivebox.parsers
   :members:
   :undoc-members:
   :show-inheritance:

58
archivebox.rst Normal file

@@ -0,0 +1,58 @@
archivebox package
==================

Subpackages
-----------

.. toctree::

   archivebox.cli
   archivebox.config
   archivebox.core
   archivebox.extractors
   archivebox.index
   archivebox.parsers

Submodules
----------

archivebox.main module
----------------------

.. automodule:: archivebox.main
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.manage module
------------------------

.. automodule:: archivebox.manage
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.system module
------------------------

.. automodule:: archivebox.system
   :members:
   :undoc-members:
   :show-inheritance:

archivebox.util module
----------------------

.. automodule:: archivebox.util
   :members:
   :undoc-members:
   :show-inheritance:

Module contents
---------------

.. automodule:: archivebox
   :members:
   :undoc-members:
   :show-inheritance:

134
conf.py Normal file

@@ -0,0 +1,134 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# http://www.sphinx-doc.org/en/master/config
# -- Path setup --------------------------------------------------------------
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import sys
import django
import recommonmark
from recommonmark.transform import AutoStructify
os.environ['USE_CHROME'] = 'False'
PYTHON_DIR = os.path.abspath(os.path.join(os.path.dirname(os.path.abspath(__file__)), '..', 'archivebox'))
sys.path.insert(0, os.path.abspath('.'))
sys.path.insert(0, os.path.abspath('../'))
sys.path.insert(0, PYTHON_DIR)
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "core.settings")
django.setup()
VERSION = open(os.path.join(PYTHON_DIR, 'VERSION'), 'r').read().strip()
# -- Project information -----------------------------------------------------
project = 'ArchiveBox'
copyright = '2020, Nick Sweeting'
author = 'Nick Sweeting'
github_url = 'https://github.com/pirate/ArchiveBox'
github_doc_root = 'https://github.com/pirate/ArchiveBox/tree/master/docs/'
language = 'en'
# The full version, including alpha/beta/rc tags
release = VERSION
# -- General configuration ---------------------------------------------------
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.napoleon',
    'sphinx.ext.viewcode',
    'recommonmark',
]
source_suffix = {
    '.rst': 'restructuredtext',
    '.txt': 'markdown',
    '.md': 'markdown',
}
master_doc = 'index'
napoleon_google_docstring = True
napoleon_use_param = True
napoleon_use_ivar = False
napoleon_use_rtype = True
napoleon_include_special_with_doc = False
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = [
    '_build',
    'Thumbs.db',
    '.DS_Store',
    'data',
    'output',
    'templates',
    'tests',
    'migrations',
]
# -- Options for HTML output -------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_logo = 'logo.png'
html_theme = 'sphinx_rtd_theme'
html_theme_options = {
    'navigation_depth': 5,
    'collapse_navigation': False,
    'sticky_navigation': True,
}
html_show_sphinx = False
texinfo_documents = [
    (master_doc, 'archivebox', 'archivebox Documentation',
     author, 'archivebox', 'The open-source self-hosted internet archive.',
     'Miscellaneous'),
]
autodoc_default_flags = ['members']
autodoc_member_order = 'bysource'
extensions += ['sphinx.ext.autosummary',]
autosummary_generate = True
pygments_style = 'sphinx'
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
man_pages = [
    (master_doc, 'archivebox', 'archivebox Documentation',
     [author], 1)
]
# At the bottom of conf.py
def setup(app):
    app.add_config_value('recommonmark_config', {
        # 'url_resolver': lambda url: github_doc_root + url,
        'auto_toc_tree_section': 'Documentation',
    }, True)
    app.add_transform(AutoStructify)

40
index.rst Normal file

@@ -0,0 +1,40 @@
.. sidebar:: Welcome to ArchiveBox!

   Just getting started?
      Check out the `Quickstart <Quickstart.html>`_ guide.

   Need help with something?
      Ping us on `Twitter <https://twitter.com/theSquashSH>`_ or `Github <https://github.com/pirate/ArchiveBox/issues>`_.

   Want to join the community?
      See our `Community Wiki <https://github.com/pirate/ArchiveBox/wiki/Web-Archiving-Community>`_ page.

.. image:: logo.png
   :width: 200px
   :align: center
   :alt: ArchiveBox Logo

==========
ArchiveBox
==========

"The open-source self-hosted internet archive."

`Website <https://archivebox.io>`_ | `Github <https://github.com/pirate/ArchiveBox>`_ | `Source <https://github.com/pirate/ArchiveBox/tree/master>`_ | `Bug Tracker <https://github.com/pirate/ArchiveBox/issues>`_

.. code-block:: bash

   mkdir my-archive; cd my-archive/
   pip install archivebox
   archivebox init
   archivebox add https://example.com
   archivebox info

=============
Documentation
=============

.. toctree::
   :maxdepth: 2

   Contents.rst

BIN
logo.png Normal file

Binary file not shown.

Size: 49 KiB

7
modules.rst Normal file

@@ -0,0 +1,7 @@
archivebox
==========

.. toctree::
   :maxdepth: 4

   archivebox