diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..69fa449 --- /dev/null +++ b/.gitignore @@ -0,0 +1 @@ +_build/ diff --git a/Changelog.md b/Changelog.md index df3d114..7edfa67 100644 --- a/Changelog.md +++ b/Changelog.md @@ -1,3 +1,5 @@ +# Changelog + ▶️ *If you're having an issue with a breaking change, or migrating your data between versions, open an [issue](https://github.com/pirate/ArchiveBox/issues) to get help.* **`ArchiveBox` was previously named `Pocket Archive Stream` and then `Bookmark Archiver`.** @@ -110,4 +112,4 @@ See the [releases](https://github.com/pirate/ArchiveBox/releases) page for versi - added Pocket-format export support --- - - v0.0.0 released: created Pocket Archive Stream 2017/05/05 \ No newline at end of file + - v0.0.0 released: created Pocket Archive Stream 2017/05/05 diff --git a/Chromium-Install.md b/Chromium-Install.md index 6e8bfbf..9bed8ec 100644 --- a/Chromium-Install.md +++ b/Chromium-Install.md @@ -1,3 +1,5 @@ +# Chromium Install + By default, ArchiveBox looks for any existing installed version of Chrome/Chromium and uses it if found. You can optionally install a specific version and set the environment variable `CHROME_BINARY` to force ArchiveBox to use that one, e.g.: - `CHROME_BINARY=google-chrome-beta` @@ -6,7 +8,7 @@ By default, ArchiveBox looks for any existing installed version of Chrome/Chromi If you don't already have Chrome installed, I recommend installing Chromium instead of Google Chrome, as it's the open-source fork of Chrome that doesn't send as much tracking data to Google. -#### Check for existing Chrome/Chromium install +**Check for existing Chrome/Chromium install:** @@ -48,4 +50,4 @@ apt install google-chrome-beta ## Troubleshooting -If you encounter problems setting up Google Chrome or Chromium, see the [Troubleshooting](https://github.com/pirate/ArchiveBox/wiki/Troubleshooting#chromiumgoogle-chrome) page. 
\ No newline at end of file +If you encounter problems setting up Google Chrome or Chromium, see the [Troubleshooting](https://github.com/pirate/ArchiveBox/wiki/Troubleshooting#chromiumgoogle-chrome) page. diff --git a/Configuration.md b/Configuration.md index b16eba7..bcf24cd 100644 --- a/Configuration.md +++ b/Configuration.md @@ -1,5 +1,6 @@ -▶️ *The default ArchiveBox config file can be found here: [`etc/ArchiveBox.conf.default`](https://github.com/pirate/ArchiveBox/blob/master/etc/ArchiveBox.conf.default).* +# Configuration +▶️ *The default ArchiveBox config file can be found here: [`etc/ArchiveBox.conf.default`](https://github.com/pirate/ArchiveBox/blob/master/etc/ArchiveBox.conf.default).* Configuration is done through environment variables. You can pass in settings using all the usual environment variable methods: e.g. by using the `env` command, exporting variables in your shell profile, or sourcing a `.env` file before running the command. @@ -384,31 +385,3 @@ Path or name of the curl binary to use. - ---- - - -# Creating a Config File - -*Note: If you're using Docker, see the [[Docker]] page for configuration instructions.* - -To set up a persistent config: - -1. Copy `etc/ArchiveBox.conf.default` to `~/.ArchiveBox.conf` -```bash -cp ArchiveBox/etc/ArchiveBox.conf.default ~/.ArchiveBox.conf -``` - -2. Edit your options inside `~/.ArchiveBox.conf`, e.g.: -```bash -CHROME_BINARY=google-chrome-stable -RESOLUTION=1440,900 -FETCH_PDF=False -``` - -3. Source your config file when you run your archive script: -```bash -eval export $(grep -v '^#' ~/path/to/your/ArchiveBox.conf); ./archive https://example.com/rss/feed.xml -``` - -Improving this process is on the roadmap, in future versions you'll be able to pass a config file directly to the archive command. \ No newline at end of file diff --git a/Contents.rst b/Contents.rst new file mode 100644 index 0000000..06ba8f0 --- /dev/null +++ b/Contents.rst @@ -0,0 +1,71 @@ +Intro +##### + +.. 
toctree:: + :maxdepth: 1 + + README.md + + +Getting Started +############### + +.. toctree:: + :maxdepth: 2 + + Quickstart.md + Install.md + Docker.md + + +General +####### + +.. toctree:: + :maxdepth: 2 + + Usage.md + Configuration.md + Troubleshooting.md + Security-Overview.md + Publishing-Your-Archive.md + Scheduled-Archiving.md + Chromium-Install.md + + +API Reference +############# + +.. toctree:: + :maxdepth: 1 + + Configuration Options + Data Folder Layout + Command Line Interface + Web Interface + Python API + REST API + +.. - [Configuration Options](Configuration.md) +.. - [Data Folder Layout](Configuration.md) +.. - [Command Line Interface](Usage.md) +.. - [Web Interface](Usage.md) +.. - [Python API](modules) +.. - REST API (Coming soon...) + + +Meta +#### + +.. toctree:: + :maxdepth: 1 + + Roadmap.md + Changelog.md + Donations.md + + +.. toctree:: + :maxdepth: 3 + + Web-Archiving-Community.md diff --git a/Docker.md b/Docker.md index ddd7371..01c0dac 100644 --- a/Docker.md +++ b/Docker.md @@ -1,46 +1,50 @@ -# Overview +# Docker -Running ArchiveBox with Docker allows you to manage it in a container without exposing it to the rest of your system. Usage with Docker is similar to usage of ArchiveBox normally, with a few small differences. +## Overview -Make sure you have Docker installed and set up on your machine before following these instructions. If you don't already have Docker installed, follow the official install instructions for Linux, macOS, or Windows here: https://docs.docker.com/install/#supported-platforms. +Running ArchiveBox with Docker allows you to manage it in a container without exposing it to the rest of your system. Usage with Docker is similar to usage of ArchiveBox normally, with a few small differences. - +Make sure you have Docker installed and set up on your machine before following these instructions. 
If you don't already have Docker installed, follow the official install instructions for Linux, macOS, or Windows here: https://docs.docker.com/install/#supported-platforms. + + - [Overview](#) - [Docker Compose](#docker-compose) (recommended way) - + [Setup](#setup) - + [Usage](#usage) - + [Accessing the data](#accessing-the-data) - + [Configuration](#configuration) + - [Setup](#setup) + - [Usage](#usage) + - [Accessing the data](#accessing-the-data) + - [Configuration](#configuration) - [Plain Docker](#docker) - + [Setup](#setup-1) - + [Usage](#usage-1) - + [Accessing the data](#accessing-the-data-1) - + [Configuration](#configuration-1) + - [Setup](#setup-1) + - [Usage](#usage-1) + - [Accessing the data](#accessing-the-data-1) + - [Configuration](#configuration-1) **Official Docker Hub image:** https://hub.docker.com/r/nikisweeting/archivebox **Usage:** + ```bash -echo 'https://example.com' | docker run -i -v ~/ArchiveBox:/data nikisweeting/archivebox +echo 'https://example.com' | docker run -i -v ~/ArchiveBox:/data nikisweeting/archivebox add ``` --- -# Docker Compose +## Docker Compose -An example [`docker-compose.yml`](https://github.com/pirate/ArchiveBox/blob/master/docker-compose.yml) config with ArchiveBox and an Nginx server to serve the archive is included in the project root. You can edit it as you see fit, or just run it as it comes out-of-the-box. +An example [`docker-compose.yml`](https://github.com/pirate/ArchiveBox/blob/master/docker-compose.yml) config with ArchiveBox and an Nginx server to serve the archive is included in the project root. You can edit it as you see fit, or just run it as it comes out-of-the-box. 
Just make sure you have a Docker version that's [new enough](https://docs.docker.com/compose/compose-file/) to support `version: 3` format: + ```bash docker --version Docker version 18.09.1, build 4c52b90 # must be >= 17.04.0 ``` -## Setup +### Setup ```bash git clone https://github.com/pirate/ArchiveBox && cd ArchiveBox @@ -50,43 +54,48 @@ docker-compose up -d Then open [`http://127.0.0.1:8098`](http://127.0.0.1:8098) or `data/index.html` to view the archive (HTTP, not HTTPS). -## Usage +### Usage First, make sure you're `cd`'ed into the same folder as your `docker-compose.yml` file (e.g. the project root) and that your containers have been started with `docker-compose up -d`. To add new URLs, you can use docker-compose just like the normal `./archive` CLI. **To add an individual link or list of links**, pass in URLs via stdin. + ```bash echo "https://example.com" | docker-compose exec -T archivebox /bin/archive ``` **To import links from a file** you can either `cat` the file and pass it via stdin like above, or move it into your data folder so that ArchiveBox can access it from within the container. + ```bash mv ~/Downloads/bookmarks.html data/sources/bookmarks.html docker-compose exec archivebox /bin/archive /data/sources/bookmarks.html ``` **To pull in links from a feed or remote file**, pass the URL or path to the feed as an argument. + ```bash docker-compose exec archivebox /bin/archive https://example.com/some/feed.rss ``` -Passing a URL as an argument here does not archive the specified URL, it downloads it and archives the links *inside* of it, so only use it for RSS feeds or other *lists of links* you want to add. To add an individual link you want to archive use the instruction above and pass via stdin instead of by argument. -## Accessing the data +Passing a URL as an argument here does not archive the specified URL, it downloads it and archives the links _inside_ of it, so only use it for RSS feeds or other _lists of links_ you want to add. 
To add an individual link you want to archive use the instruction above and pass via stdin instead of by argument. -The outputted archive data is stored in `data/` (relative to the project root), or whatever folder path you specified in the `docker-compose.yml` `volumes:` section. Make sure the `data/` folder on the host has permissions initially set to `777` so that the ArchiveBox command is able to set it to the specified `OUTPUT_PERMISSIONS` config setting on the first run. +### Accessing the data + +The outputted archive data is stored in `data/` (relative to the project root), or whatever folder path you specified in the `docker-compose.yml` `volumes:` section. Make sure the `data/` folder on the host has permissions initially set to `777` so that the ArchiveBox command is able to set it to the specified `OUTPUT_PERMISSIONS` config setting on the first run. To access your archive, you can open `data/index.html` directly, or you can use the provided Nginx server running inside docker on [`http://127.0.0.1:8098`](http://127.0.0.1:8098). -## Configuration +### Configuration ArchiveBox running with docker-compose accepts all the same environment variables as normal, see the full list on the [[Configuration]] page. The recommended way to pass in config variables is to edit the `environment:` section in `docker-compose.yml` directly or add an `env_file: ./path/to/ArchiveBox.conf` line before `environment:` to import variables from an env file. Example of adding config options to `docker-compose.yml`: -```yml + +```yaml ... services: @@ -103,15 +112,16 @@ services: You can also specify an env file via CLI when running compose using `docker-compose --env-file=/path/to/config.env ...` although you must specify the variables in the `environment:` section that you want to have passed down to the ArchiveBox container from the passed env file. 
-If you want to access your archive server with HTTPS, put a reverse proxy like Nginx or Caddy in front of `http://127.0.0.1:8098` to do SSL termination. You can find many instructions to do this online if you search "SSL reverse proxy". +If you want to access your archive server with HTTPS, put a reverse proxy like Nginx or Caddy in front of `http://127.0.0.1:8098` to do SSL termination. You can find many instructions to do this online if you search "SSL reverse proxy". --- -# Docker +## Docker -## Setup +### Setup Fetch and run the ArchiveBox Docker image to create your initial archive. + ```bash echo 'https://example.com' | docker run -i -v ~/ArchiveBox:/data nikisweeting/archivebox ``` @@ -120,9 +130,10 @@ Replace `~/ArchiveBox` in the command above with the full path to a folder to us Make sure the data folder you use host is either a new, uncreated path, or if it already exists make sure it has permissions initially set to `777` so that the ArchiveBox command is able to set it to the specified `OUTPUT_PERMISSIONS` config setting on the first run. -## Usage +### Usage + +**To add a single URL to the archive** or a list of links from a file, pipe them in via stdin. This will archive each link passed in. -**To add a single URL to the archive** or a list of links from a file, pipe them in via stdin. This will archive each link passed in. ```bash echo 'https://example.com' | docker run -i -v ~/ArchiveBox:/data nikisweeting/archivebox # or @@ -130,27 +141,33 @@ cat bookmarks.html | docker run -i -v ~/ArchiveBox:/data nikisweeting/archivebox ``` **To add a list of pages via feed URL or remote file,** pass the URL of the feed as an argument. + ```bash docker run -v -v ~/ArchiveBox:/data nikisweeting/archivebox /bin/archive 'https://example.com/some/rss/feed.xml' ``` -Passing a URL as an argument here does not archive the specified URL, it downloads it and archives the links *inside* of it, so only use it for RSS feeds or other *lists of links* you want to add. 
To add an individual link use the instruction above and pass via stdin instead of by argument. -## Accessing the data +Passing a URL as an argument here does not archive the specified URL, it downloads it and archives the links _inside_ of it, so only use it for RSS feeds or other _lists of links_ you want to add. To add an individual link use the instruction above and pass via stdin instead of by argument. -### Using a bind folder +### Accessing the data + +#### Using a bind folder Use the flag: + ```bash -v /full/path/to/folder/on/host:/data ``` + This will use the folder `/full/path/to/folder/on/host` on your host to store the ArchiveBox output. -### Using a named Docker data volume +#### Using a named Docker data volume ```bash docker volume create archivebox-data ``` + Then use the flag: + ```bash -v archivebox-data:/data ``` @@ -159,21 +176,24 @@ You can mount your data volume using standard docker tools, or access the conten `/var/lib/docker/volumes/archivebox-data/_data` (on most Linux systems) On a Mac you'll have to enter the base Docker Linux VM first to access the volume data: + ```bash screen ~/Library/Containers/com.docker.docker/Data/vms/0/tty cd /var/lib/docker/volumes/archivebox-data/_data ``` -## Configuration +### Configuration ArchiveBox in Docker accepts all the same environment variables as normal, see the list on the [[Configuration]] page. To pass environment variables when running, you can use the env command. 
+ ```bash echo 'https://example.com' | docker run -i -v ~/ArchiveBox:/data nikisweeting/archivebox env FETCH_SCREENSHOT=False /bin/archive ``` Or you can create an `ArchiveBox.env` file (copy from the default `etc/ArchiveBox.conf.default`) and pass it in like so: + ```bash docker run -i -v --env-file=ArchiveBox.env nikisweeting/archivebox -``` \ No newline at end of file +``` diff --git a/Install.md b/Install.md index 6a6afff..2bc7cf3 100644 --- a/Install.md +++ b/Install.md @@ -1,3 +1,5 @@ +# Install + ArchiveBox only has a few main dependencies apart from `python3`, and they can all be installed using your normal package manager. It usually takes 1min to get up and running if you use the [helper script](#automatic-setup), or about 5min if you install everything [manually](#manual-setup). @@ -131,4 +133,4 @@ You may optionally specify a second argument to `archive.py export.html 15324242 First, if you don't already have docker installed, follow the official install instructions for Linux, macOS, or Windows https://docs.docker.com/install/#supported-platforms. -Then see the [[Docker]] page for next steps. \ No newline at end of file +Then see the [[Docker]] page for next steps. diff --git a/Makefile b/Makefile new file mode 100644 index 0000000..5128596 --- /dev/null +++ b/Makefile @@ -0,0 +1,19 @@ +# Minimal makefile for Sphinx documentation +# + +# You can set these variables from the command line. +SPHINXOPTS = +SPHINXBUILD = sphinx-build +SOURCEDIR = . +BUILDDIR = _build + +# Put it first so that "make" without argument is like "make help". +help: + @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) + +.PHONY: help Makefile + +# Catch-all target: route all unknown targets to Sphinx using the new +# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). 
+%: Makefile + @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) diff --git a/Publishing-Your-Archive.md b/Publishing-Your-Archive.md index dc32ee0..56808bf 100644 --- a/Publishing-Your-Archive.md +++ b/Publishing-Your-Archive.md @@ -1,4 +1,4 @@ -## Publishing Your Archive +# Publishing Your Archive The archive produced by `./archive` is suitable for serving on any provider that can host static html (e.g. github pages!). @@ -19,16 +19,15 @@ Make sure you're not running any content as CGI or PHP, you only want to serve s Urls look like: `https://archive.example.com/archive/1493350273/en.wikipedia.org/wiki/Dining_philosophers_problem.html` -**Security WARNING & Content Disclaimer** +## Security Concerns -Re-hosting other people's content has security implications for any other sites sharing your hosting domain. Make sure you understand -the dangers of hosting unknown archived CSS & JS files [on your shared domain](https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy). -Due to the security risk of serving some malicious JS you archived by accident, it's best to put this on a domain or subdomain -of its own to keep cookies separate and slightly mitigate [CSRF attacks](https://en.wikipedia.org/wiki/Cross-site_request_forgery) and other nastiness. +Re-hosting other people's content has security implications for any other sites sharing your hosting domain. Make sure you understand the dangers of hosting unknown archived CSS & JS files [on your shared domain](https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy). +Due to the security risk of serving some malicious JS you archived by accident, it's best to put this on a domain or subdomain of its own to keep cookies separate and slightly mitigate [CSRF attacks](https://en.wikipedia.org/wiki/Cross-site_request_forgery) and other nastiness. 
+ +## Copyright Concerns + +Be aware that some sites you archive may not allow you to rehost their content publicly for copyright reasons, it's up to you to host responsibly and respond to takedown requests appropriately. You may also want to blacklist your archive in `/robots.txt` if you don't want to be publicly assosciated with all the links you archive via search engine results. -Be aware that some sites you archive may not allow you to rehost their content publicly for copyright reasons, -it's up to you to host responsibly and respond to takedown requests appropriately. - -Please modify the `FOOTER_INFO` config variable to add your contact info to the footer of your index. \ No newline at end of file +Please modify the `FOOTER_INFO` config variable to add your contact info to the footer of your index. diff --git a/Quickstart.md b/Quickstart.md index d5f49b4..f392555 100644 --- a/Quickstart.md +++ b/Quickstart.md @@ -1,3 +1,5 @@ +# Quickstart +
@@ -72,4 +74,4 @@ Open `output/index.html` to view your archive. (favicons will appear next to ea - Read [[Configuration]] to learn about the various archive method options - Read [[Scheduled Archiving]] to learn how to set up automatic daily archiving - Read [[Publishing Your Archive]] if you want to host your archive for others to access online - - Read [[Troubleshooting]] if you encounter any problems \ No newline at end of file + - Read [[Troubleshooting]] if you encounter any problems diff --git a/README.md b/README.md new file mode 120000 index 0000000..32d46ee --- /dev/null +++ b/README.md @@ -0,0 +1 @@ +../README.md \ No newline at end of file diff --git a/Roadmap.md b/Roadmap.md index 746dd7f..d6af8a6 100644 --- a/Roadmap.md +++ b/Roadmap.md @@ -1,4 +1,4 @@ -## Roadmap +# Roadmap @@ -8,7 +8,7 @@ --- -# Planned Specification +## Planned Specification To see how this spec has been scheduled / implemented / released so far, read these pull requests: - ✅ [v0.2.x](https://github.com/pirate/ArchiveBox/tree/483a3bef9e2b1a7b80611947a3be99b0cf4f9959) @@ -691,4 +691,4 @@ services: --- -**IMPORTANT**: *Please don't work on any of these major long-term tasks without [contacting me first](https://nicksweeting.com/blog#Contact-Me), work is already in progress for many of these, and I may have to reject your PR if it doesn't align with the existing work!* \ No newline at end of file +**IMPORTANT**: *Please don't work on any of these major long-term tasks without [contacting me first](https://nicksweeting.com/blog#Contact-Me), work is already in progress for many of these, and I may have to reject your PR if it doesn't align with the existing work!* diff --git a/Scheduled-Archiving.md b/Scheduled-Archiving.md index f8b4e7e..c261e5b 100644 --- a/Scheduled-Archiving.md +++ b/Scheduled-Archiving.md @@ -1,4 +1,6 @@ -## Schedule daily importing of new links into your archive +# Scheduled Archiving + +## Using Cron To schedule regular archiving you can use any task scheduler 
like `cron`, `at`, `sytsemd`, etc. @@ -8,7 +10,9 @@ ones as necessary. For some example configs, see the [`etc/cron.d`](https://github.com/pirate/ArchiveBox/blob/master/etc/cron.d) and [`etc/supervisord`](https://github.com/pirate/ArchiveBox/blob/master/etc/supervisord) folders. -## Example: Import Firefox browser history every 24 hours +## Examples + +### Example: Import Firefox browser history every 24 hours This example exports your browser history and archives it once a day: @@ -26,7 +30,7 @@ cd /opt/ArchiveBox 0 24 * * * www-data /opt/ArchiveBox/bin/firefox_custom.sh ``` -## Example: Import an RSS feed from Pocket every 12 hours +### Example: Import an RSS feed from Pocket every 12 hours This example imports your Pocket bookmark feed and archives any new links once a day: @@ -43,4 +47,4 @@ cd /opt/ArchiveBox **Then create a new file `/etc/cron.d/ArchiveBox-Pocket` to tell cron to run your script every 12 hours:** ```bash 0 12 * * * www-data /opt/ArchiveBox/bin/pocket_custom.sh -``` \ No newline at end of file +``` diff --git a/Security-Overview.md b/Security-Overview.md index aa192aa..7a32e33 100644 --- a/Security-Overview.md +++ b/Security-Overview.md @@ -1,3 +1,5 @@ +# Security Overview + ## Usage Modes ArchiveBox has three common usage modes outlined below. @@ -76,4 +78,4 @@ How much are you planning to archive? Only a few bookmarked articles, or thousa Are you publishing your archive? If so, make sure you're only serving it as HTML and not accidentally running it as php or cgi, and put it on its own domain not shared with other services. This is done in order to avoid cookies leaking between your main domain and domains hosting content you don't control. Many companies put user provided files on separate domains like googleusercontent.com and github.io to avoid this problem. -Published archives automatically include a `robots.txt` `Dissallow: /` to block search engines from indexing them. 
You may still wish to publish your contact info in the index footer though using [`FOOTER_INFO`](https://github.com/pirate/ArchiveBox/wiki/Configuration#FOOTER_INFO) so that you can respond to any DMCA and copyright takedown notices if you accidentally rehost copyrighted content. \ No newline at end of file +Published archives automatically include a `robots.txt` `Dissallow: /` to block search engines from indexing them. You may still wish to publish your contact info in the index footer though using [`FOOTER_INFO`](https://github.com/pirate/ArchiveBox/wiki/Configuration#FOOTER_INFO) so that you can respond to any DMCA and copyright takedown notices if you accidentally rehost copyrighted content. diff --git a/Troubleshooting.md b/Troubleshooting.md index e53ef45..c712592 100644 --- a/Troubleshooting.md +++ b/Troubleshooting.md @@ -1,3 +1,5 @@ +# Troubleshooting + ▶️ *If you need help or have a question, you can open an [issue](https://github.com/pirate/ArchiveBox/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc) or reach out on [Twitter](https://github.com/theSquashSH).* What are you having an issue with?: @@ -9,11 +11,11 @@ What are you having an issue with?: --- -### Installing +## Installing Make sure you've followed the Manual Setup guide in the [[Install]] instructions first. Then check here for help depending on what component you need help with: -#### Python +### Python On some Linux distributions the python3 package might not be recent enough. If this is the case for you, resort to installing a recent enough version manually. @@ -22,7 +24,7 @@ add-apt-repository ppa:fkrull/deadsnakes && apt update && apt install python3.6 ``` If you still need help, [the official Python docs](https://docs.python.org/3.6/using/unix.html) are a good place to start. -#### Chromium/Google Chrome +### Chromium/Google Chrome For more info, see the [[Chromium Install]] page. @@ -62,7 +64,7 @@ env CHROME_BINARY=/path/from/step/1/chromium-browser ./archive bookmarks_export. 
``` -#### Wget & Curl +### Wget & Curl If you're missing `wget` or `curl`, simply install them using `apt` or your package manager of choice. See the "Manual Setup" instructions for more details. @@ -71,14 +73,14 @@ If wget times out or randomly fails to download some sites that you have confirm upgrade wget to the most recent version with `brew upgrade wget` or `apt upgrade wget`. There is a bug in versions `<=1.19.1_1` that caused wget to fail for perfectly valid sites. -### Archiving +## Archiving -#### No links parsed from export file +### No links parsed from export file Please open an [issue](https://github.com/pirate/ArchiveBox/issues) with a description of where you got the export, and preferrably your export file attached (you can redact the links). We'll fix the parser to support your format. -#### Lots of skipped sites +### Lots of skipped sites If you ran the archiver once, it wont re-download sites subsequent times, it will only download new links. If you haven't already run it, make sure you have a working internet connection and that the parsed URLs look correct. @@ -86,23 +88,23 @@ You can check the `archive.py` output or `index.html` to see what links it's dow If you're still having issues, try deleting or moving the `output/archive` folder (back it up first!) and running `./archive` again. -#### Lots of errors +### Lots of errors Make sure you have all the dependencies installed and that you're able to visit the links from your browser normally. Open an [issue](https://github.com/pirate/ArchiveBox/issues) with a description of the errors if you're still having problems. -#### Lots of broken links from the index +### Lots of broken links from the index Not all sites can be effectively archived with each method, that's why it's best to use a combination of `wget`, PDFs, and screenshots. 
If it seems like more than 10-20% of sites in the archive are broken, open an [issue](https://github.com/pirate/ArchiveBox/issues) with some of the URLs that failed to be archived and I'll investigate. -#### Removing unwanted links from the index +### Removing unwanted links from the index If you accidentally added lots of unwanted links into index and they slow down your archiving, you can use the `bin/purge` script to remove them from your index, which removes everything matching python regexes you pass into it. E.g: `bin/purge -r 'amazon\.com' -r 'google\.com'`. It would prompt before removing links from index, but for extra safety you might want to back up `index.json` first (or put in undex version control). -### Hosting the Archive +## Hosting the Archive If you're having issues trying to host the archive via nginx, make sure you already have nginx running with SSL. If you don't, google around, there are plenty of tutorials to help get that set up. Open an [issue](https://github.com/pirate/ArchiveBox/issues) -if you have problem with a particular nginx config. \ No newline at end of file +if you have problem with a particular nginx config. diff --git a/Usage.md b/Usage.md index d94b02d..0633785 100644 --- a/Usage.md +++ b/Usage.md @@ -1,21 +1,24 @@ -▶️ *Make sure the dependencies are [fully installed](https://github.com/pirate/ArchiveBox/wiki/Install) before running any ArchiveBox commands.* +# Usage + +▶️ _Make sure the dependencies are [fully installed](https://github.com/pirate/ArchiveBox/wiki/Install) before running any ArchiveBox commands._ **ArchiveBox API Reference:** - + - - [Overview](#Overview): Program structure and outline of basic archiving process. - - [CLI Usage](#CLI-Usage): Docs and examples for the ArchiveBox command line interface. - - [UI Usage](#UI-Usage): Docs and screenshots for the outputted HTML archive interface. - - [Disk Layout](#Disk-Layout): Description of the archive folder structure and contents. 
+- [Overview](#Overview): Program structure and outline of basic archiving process. +- [CLI Usage](#CLI-Usage): Docs and examples for the ArchiveBox command line interface. +- [UI Usage](#UI-Usage): Docs and screenshots for the outputted HTML archive interface. +- [Disk Layout](#Disk-Layout): Description of the archive folder structure and contents. **Related:** - - [[Docker]]: Learn about ArchiveBox usage with Docker and Docker Compose - - [[Configuration]]: Learn about the various archive method options - - [[Scheduled Archiving]]: Learn how to set up automatic daily archiving - - [[Publishing Your Archive]]: Learn how to host your archive for others to access - - [[Troubleshooting]]: Resources if you encounter any problems - - [Screenshots](https://github.com/pirate/ArchiveBox#Screenshots): See what the CLI and outputted HTML look like + +- [[Docker]]: Learn about ArchiveBox usage with Docker and Docker Compose +- [[Configuration]]: Learn about the various archive method options +- [[Scheduled Archiving]]: Learn how to set up automatic daily archiving +- [[Publishing Your Archive]]: Learn how to host your archive for others to access +- [[Troubleshooting]]: Resources if you encounter any problems +- [Screenshots](https://github.com/pirate/ArchiveBox#Screenshots): See what the CLI and outputted HTML look like ## CLI Usage @@ -35,16 +38,18 @@ You can share a single archivebox data directory between Docker and non-Docker i For more examples see the [[Docker]] page. 
- - [Run ArchiveBox with configuration options](#Run-ArchiveBox-with-configuration-options) - - [Import a single URL or list of URLs via stdin](#Import-a-single-URL-or-list-of-URLs-via-stdin) - - [Import list of links exported from browser or another service](#Import-list-of-links-exported-from-browser-or-another-service) - - [Import list of URLs from a remote RSS feed or file](#Import-list-of-URLs-from-a-remote-RSS-feed-or-file) - - [Import list of links from browser history](#Import-list-of-links-from-browser-history) +- [Run ArchiveBox with configuration options](#Run-ArchiveBox-with-configuration-options) +- [Import a single URL or list of URLs via stdin](#Import-a-single-URL-or-list-of-URLs-via-stdin) +- [Import list of links exported from browser or another service](#Import-list-of-links-exported-from-browser-or-another-service) +- [Import list of URLs from a remote RSS feed or file](#Import-list-of-URLs-from-a-remote-RSS-feed-or-file) +- [Import list of links from browser history](#Import-list-of-links-from-browser-history) --- ### Run ArchiveBox with configuration options +You can set environment variables in your shell profile, a config file, or by using the `env` command. + ```bash # via the CLI archivebox config --set TIMEOUT=3600 @@ -62,26 +67,29 @@ If you're using Docker, also make sure to read the Configuration section on the --- -### Import a single URL or list of URLs via stdin +### Import a single URL + ```bash +<<<<<<< HEAD archivebox add 'https://example.com' # or echo 'https://example.com' | archivebox add ``` ---- -### Import a list of URLs from a file or feed +You can also add `--depth=1` to any of these commands if you want to recursively archive the URLs and all URLs one hop away. (e.g. all the outlinks on a page + the page). 
+ +### Import a list of URLs from a txt file + ```bash +cat urls_to_archive.txt | archivebox add +# or archivebox add < urls_to_archive.txt # or curl https://getpocket.com/users/USERNAME/feed/all | archivebox add ``` + You can also pipe in RSS, XML, Netscape, or any of the other supported import formats via stdin. ---- - -### Import list of links exported from browser or another service - ```bash archivebox add < ~/Downloads/browser_bookmarks_export.html # or @@ -90,13 +98,11 @@ archivebox add < ~/Downloads/pinboard_bookmarks.json archivebox add < ~/Downloads/other_links.txt ``` -You can also add `--depth=1` to any of these commands if you want to recursively archive the URLs and all URLs one hop away. (e.g. all the outlinks on a page + the page). - --- ### Import list of links from browser history -This uses the `archivebox-export-browser-history` helper script to parse your browser's SQLite history database for URLs. +Look in the `bin/` folder of this repo to find a script to parse your browser's SQLite history database for URLs. Specify the type of the browser as the first argument, and optionally the path to the SQLite history file as the second argument. ```bash @@ -124,7 +130,7 @@ you can open `./index.html` in a browser. You should see something [like this]( You can sort by column, search using the box in the upper right, and see the total number of links at the bottom. -Click the Favicon under the "Files" column to go to the details page for each link. +Click the Favicon under the "Files" column to go to the details page for each link.
@@ -147,7 +153,7 @@ The `OUTPUT_DIR` folder (usually whatever folder you run `archivebox` in), conta - index.html # Archive method outputs: - - warc/ + - warc/ - media/ - git/ ... @@ -164,7 +170,7 @@ The `OUTPUT_DIR` folder (usually whatever folder you run `archivebox` in), conta ### Large Archives I've found it takes about an hour to download 1000 articles, and they'll take up roughly 1GB. -Those numbers are from running it single-threaded on my i5 machine with 50mbps down. YMMV. +Those numbers are from running it single-threaded on my i5 machine with 50mbps down. YMMV. Storage requirements go up immensely if you're using `FETCH_MEDIA=True` and are archiving many pages with audio & video. @@ -174,9 +180,25 @@ archivebox add < urls_chunk_1.txt & archivebox add < urls_chunk_2.txt & archivebox add < urls_chunk_3.txt & ``` +(though this may not be faster if you have a very large collection/main index) + Users have reported running it with 50k+ bookmarks with success (though it will take more RAM while running). If you already imported a huge list of bookmarks and want to import only new bookmarks, you can use the `ONLY_NEW` environment variable. This is useful if you want to import a bookmark dump periodically and want to skip broken links which are already in the index. + +## Python API Usage + +```python +from archivebox.main import add, info, remove, check_data_folder + +out_dir = '~/path/to/my/data/folder' +check_data_folder(out_dir=out_dir) +add('https://example.com', index_only=True, out_dir=out_dir) +info(out_dir=out_dir) +remove('https://example.com', delete=True, yes=True, out_dir=out_dir) +``` + +For more information see the Python API Reference. diff --git a/Web-Archiving-Community.md b/Web-Archiving-Community.md index ad2cdd9..5434c62 100644 --- a/Web-Archiving-Community.md +++ b/Web-Archiving-Community.md @@ -1,3 +1,5 @@ +# Web Archiving Community +
@@ -12,7 +14,6 @@ The internet archiving community is surprisingly far-reaching and almost univers Whether you want to learn which organizations are the big players in the web archiving space, want to find a specific open source tool for your web archiving need, or just want to see where archivists hang out online, this is my attempt at an index of the entire web archiving community. -## Contents @@ -394,4 +395,4 @@ You can find more organizations and initiatives on these other lists: [![](https://img.shields.io/badge/Donate-Archive.org-%23115D76.svg)](https://archive.org/donate/)

^   Back to Top   ^ -
\ No newline at end of file +
diff --git a/archivebox.cli.rst b/archivebox.cli.rst new file mode 100644 index 0000000..7c6a357 --- /dev/null +++ b/archivebox.cli.rst @@ -0,0 +1,142 @@ +archivebox.cli package +====================== + +Submodules +---------- + +archivebox.cli.archivebox module +-------------------------------- + +.. automodule:: archivebox.cli.archivebox + :members: + :undoc-members: + :show-inheritance: + +archivebox.cli.archivebox\_add module +------------------------------------- + +.. automodule:: archivebox.cli.archivebox_add + :members: + :undoc-members: + :show-inheritance: + +archivebox.cli.archivebox\_config module +---------------------------------------- + +.. automodule:: archivebox.cli.archivebox_config + :members: + :undoc-members: + :show-inheritance: + +archivebox.cli.archivebox\_help module +-------------------------------------- + +.. automodule:: archivebox.cli.archivebox_help + :members: + :undoc-members: + :show-inheritance: + +archivebox.cli.archivebox\_info module +-------------------------------------- + +.. automodule:: archivebox.cli.archivebox_info + :members: + :undoc-members: + :show-inheritance: + +archivebox.cli.archivebox\_init module +-------------------------------------- + +.. automodule:: archivebox.cli.archivebox_init + :members: + :undoc-members: + :show-inheritance: + +archivebox.cli.archivebox\_list module +-------------------------------------- + +.. automodule:: archivebox.cli.archivebox_list + :members: + :undoc-members: + :show-inheritance: + +archivebox.cli.archivebox\_manage module +---------------------------------------- + +.. automodule:: archivebox.cli.archivebox_manage + :members: + :undoc-members: + :show-inheritance: + +archivebox.cli.archivebox\_remove module +---------------------------------------- + +.. automodule:: archivebox.cli.archivebox_remove + :members: + :undoc-members: + :show-inheritance: + +archivebox.cli.archivebox\_schedule module +------------------------------------------ + +.. 
automodule:: archivebox.cli.archivebox_schedule + :members: + :undoc-members: + :show-inheritance: + +archivebox.cli.archivebox\_server module +---------------------------------------- + +.. automodule:: archivebox.cli.archivebox_server + :members: + :undoc-members: + :show-inheritance: + +archivebox.cli.archivebox\_shell module +--------------------------------------- + +.. automodule:: archivebox.cli.archivebox_shell + :members: + :undoc-members: + :show-inheritance: + +archivebox.cli.archivebox\_update module +---------------------------------------- + +.. automodule:: archivebox.cli.archivebox_update + :members: + :undoc-members: + :show-inheritance: + +archivebox.cli.archivebox\_version module +----------------------------------------- + +.. automodule:: archivebox.cli.archivebox_version + :members: + :undoc-members: + :show-inheritance: + +archivebox.cli.logging module +----------------------------- + +.. automodule:: archivebox.cli.logging + :members: + :undoc-members: + :show-inheritance: + +archivebox.cli.tests module +--------------------------- + +.. automodule:: archivebox.cli.tests + :members: + :undoc-members: + :show-inheritance: + + +Module contents +--------------- + +.. automodule:: archivebox.cli + :members: + :undoc-members: + :show-inheritance: diff --git a/archivebox.config.rst b/archivebox.config.rst new file mode 100644 index 0000000..b71af50 --- /dev/null +++ b/archivebox.config.rst @@ -0,0 +1,22 @@ +archivebox.config package +========================= + +Submodules +---------- + +archivebox.config.stubs module +------------------------------ + +.. automodule:: archivebox.config.stubs + :members: + :undoc-members: + :show-inheritance: + + +Module contents +--------------- + +.. 
automodule:: archivebox.config + :members: + :undoc-members: + :show-inheritance: diff --git a/archivebox.core.migrations.rst b/archivebox.core.migrations.rst new file mode 100644 index 0000000..72c2291 --- /dev/null +++ b/archivebox.core.migrations.rst @@ -0,0 +1,30 @@ +archivebox.core.migrations package +================================== + +Submodules +---------- + +archivebox.core.migrations.0001\_initial module +----------------------------------------------- + +.. automodule:: archivebox.core.migrations.0001_initial + :members: + :undoc-members: + :show-inheritance: + +archivebox.core.migrations.0002\_auto\_20190417\_0739 module +------------------------------------------------------------ + +.. automodule:: archivebox.core.migrations.0002_auto_20190417_0739 + :members: + :undoc-members: + :show-inheritance: + + +Module contents +--------------- + +.. automodule:: archivebox.core.migrations + :members: + :undoc-members: + :show-inheritance: diff --git a/archivebox.core.rst b/archivebox.core.rst new file mode 100644 index 0000000..8b4682c --- /dev/null +++ b/archivebox.core.rst @@ -0,0 +1,93 @@ +archivebox.core package +======================= + +Subpackages +----------- + +.. toctree:: + + archivebox.core.migrations + +Submodules +---------- + +archivebox.core.admin module +---------------------------- + +.. automodule:: archivebox.core.admin + :members: + :undoc-members: + :show-inheritance: + +archivebox.core.apps module +--------------------------- + +.. automodule:: archivebox.core.apps + :members: + :undoc-members: + :show-inheritance: + +archivebox.core.models module +----------------------------- + +.. automodule:: archivebox.core.models + :members: + :undoc-members: + :show-inheritance: + +archivebox.core.settings module +------------------------------- + +.. automodule:: archivebox.core.settings + :members: + :undoc-members: + :show-inheritance: + +archivebox.core.tests module +---------------------------- + +.. 
automodule:: archivebox.core.tests + :members: + :undoc-members: + :show-inheritance: + +archivebox.core.urls module +--------------------------- + +.. automodule:: archivebox.core.urls + :members: + :undoc-members: + :show-inheritance: + +archivebox.core.views module +---------------------------- + +.. automodule:: archivebox.core.views + :members: + :undoc-members: + :show-inheritance: + +archivebox.core.welcome\_message module +--------------------------------------- + +.. automodule:: archivebox.core.welcome_message + :members: + :undoc-members: + :show-inheritance: + +archivebox.core.wsgi module +--------------------------- + +.. automodule:: archivebox.core.wsgi + :members: + :undoc-members: + :show-inheritance: + + +Module contents +--------------- + +.. automodule:: archivebox.core + :members: + :undoc-members: + :show-inheritance: diff --git a/archivebox.extractors.rst b/archivebox.extractors.rst new file mode 100644 index 0000000..a8ba6a3 --- /dev/null +++ b/archivebox.extractors.rst @@ -0,0 +1,86 @@ +archivebox.extractors package +============================= + +Submodules +---------- + +archivebox.extractors.archive\_org module +----------------------------------------- + +.. automodule:: archivebox.extractors.archive_org + :members: + :undoc-members: + :show-inheritance: + +archivebox.extractors.dom module +-------------------------------- + +.. automodule:: archivebox.extractors.dom + :members: + :undoc-members: + :show-inheritance: + +archivebox.extractors.favicon module +------------------------------------ + +.. automodule:: archivebox.extractors.favicon + :members: + :undoc-members: + :show-inheritance: + +archivebox.extractors.git module +-------------------------------- + +.. automodule:: archivebox.extractors.git + :members: + :undoc-members: + :show-inheritance: + +archivebox.extractors.media module +---------------------------------- + +.. 
automodule:: archivebox.extractors.media + :members: + :undoc-members: + :show-inheritance: + +archivebox.extractors.pdf module +-------------------------------- + +.. automodule:: archivebox.extractors.pdf + :members: + :undoc-members: + :show-inheritance: + +archivebox.extractors.screenshot module +--------------------------------------- + +.. automodule:: archivebox.extractors.screenshot + :members: + :undoc-members: + :show-inheritance: + +archivebox.extractors.title module +---------------------------------- + +.. automodule:: archivebox.extractors.title + :members: + :undoc-members: + :show-inheritance: + +archivebox.extractors.wget module +--------------------------------- + +.. automodule:: archivebox.extractors.wget + :members: + :undoc-members: + :show-inheritance: + + +Module contents +--------------- + +.. automodule:: archivebox.extractors + :members: + :undoc-members: + :show-inheritance: diff --git a/archivebox.index.rst b/archivebox.index.rst new file mode 100644 index 0000000..49ab62c --- /dev/null +++ b/archivebox.index.rst @@ -0,0 +1,54 @@ +archivebox.index package +======================== + +Submodules +---------- + +archivebox.index.csv module +--------------------------- + +.. automodule:: archivebox.index.csv + :members: + :undoc-members: + :show-inheritance: + +archivebox.index.html module +---------------------------- + +.. automodule:: archivebox.index.html + :members: + :undoc-members: + :show-inheritance: + +archivebox.index.json module +---------------------------- + +.. automodule:: archivebox.index.json + :members: + :undoc-members: + :show-inheritance: + +archivebox.index.schema module +------------------------------ + +.. automodule:: archivebox.index.schema + :members: + :undoc-members: + :show-inheritance: + +archivebox.index.sql module +--------------------------- + +.. automodule:: archivebox.index.sql + :members: + :undoc-members: + :show-inheritance: + + +Module contents +--------------- + +.. 
automodule:: archivebox.index + :members: + :undoc-members: + :show-inheritance: diff --git a/archivebox.parsers.rst b/archivebox.parsers.rst new file mode 100644 index 0000000..d3b902c --- /dev/null +++ b/archivebox.parsers.rst @@ -0,0 +1,78 @@ +archivebox.parsers package +========================== + +Submodules +---------- + +archivebox.parsers.generic\_json module +--------------------------------------- + +.. automodule:: archivebox.parsers.generic_json + :members: + :undoc-members: + :show-inheritance: + +archivebox.parsers.generic\_rss module +-------------------------------------- + +.. automodule:: archivebox.parsers.generic_rss + :members: + :undoc-members: + :show-inheritance: + +archivebox.parsers.generic\_txt module +-------------------------------------- + +.. automodule:: archivebox.parsers.generic_txt + :members: + :undoc-members: + :show-inheritance: + +archivebox.parsers.medium\_rss module +------------------------------------- + +.. automodule:: archivebox.parsers.medium_rss + :members: + :undoc-members: + :show-inheritance: + +archivebox.parsers.netscape\_html module +---------------------------------------- + +.. automodule:: archivebox.parsers.netscape_html + :members: + :undoc-members: + :show-inheritance: + +archivebox.parsers.pinboard\_rss module +--------------------------------------- + +.. automodule:: archivebox.parsers.pinboard_rss + :members: + :undoc-members: + :show-inheritance: + +archivebox.parsers.pocket\_html module +-------------------------------------- + +.. automodule:: archivebox.parsers.pocket_html + :members: + :undoc-members: + :show-inheritance: + +archivebox.parsers.shaarli\_rss module +-------------------------------------- + +.. automodule:: archivebox.parsers.shaarli_rss + :members: + :undoc-members: + :show-inheritance: + + +Module contents +--------------- + +.. 
automodule:: archivebox.parsers + :members: + :undoc-members: + :show-inheritance: diff --git a/archivebox.rst b/archivebox.rst new file mode 100644 index 0000000..b96e694 --- /dev/null +++ b/archivebox.rst @@ -0,0 +1,58 @@ +archivebox package +================== + +Subpackages +----------- + +.. toctree:: + + archivebox.cli + archivebox.config + archivebox.core + archivebox.extractors + archivebox.index + archivebox.parsers + +Submodules +---------- + +archivebox.main module +---------------------- + +.. automodule:: archivebox.main + :members: + :undoc-members: + :show-inheritance: + +archivebox.manage module +------------------------ + +.. automodule:: archivebox.manage + :members: + :undoc-members: + :show-inheritance: + +archivebox.system module +------------------------ + +.. automodule:: archivebox.system + :members: + :undoc-members: + :show-inheritance: + +archivebox.util module +---------------------- + +.. automodule:: archivebox.util + :members: + :undoc-members: + :show-inheritance: + + +Module contents +--------------- + +.. automodule:: archivebox + :members: + :undoc-members: + :show-inheritance: diff --git a/conf.py b/conf.py new file mode 100644 index 0000000..d4daedd --- /dev/null +++ b/conf.py @@ -0,0 +1,134 @@ +# Configuration file for the Sphinx documentation builder. +# +# This file only contains a selection of the most common options. For a full +# list see the documentation: +# http://www.sphinx-doc.org/en/master/config + +# -- Path setup -------------------------------------------------------------- + +# If extensions (or modules to document with autodoc) are in another directory, +# add these directories to sys.path here. If the directory is relative to the +# documentation root, use os.path.abspath to make it absolute, like shown here. 
+# +import os +import sys + +import django +import recommonmark +from recommonmark.transform import AutoStructify + +os.environ['USE_CHROME'] = 'False' + +PYTHON_DIR = os.path.abspath(os.path.join(os.path.dirname(os.path.abspath(__file__)), '..', 'archivebox')) + +sys.path.insert(0, os.path.abspath('.')) +sys.path.insert(0, os.path.abspath('../')) +sys.path.insert(0, PYTHON_DIR) +os.environ.setdefault("DJANGO_SETTINGS_MODULE", "core.settings") +django.setup() + +VERSION = open(os.path.join(PYTHON_DIR, 'VERSION'), 'r').read().strip() + +# -- Project information ----------------------------------------------------- + +project = 'ArchiveBox' +copyright = '2020, Nick Sweeting' +author = 'Nick Sweeting' +github_url = 'https://github.com/pirate/ArchiveBox' +github_doc_root = 'https://github.com/pirate/ArchiveBox/tree/master/docs/' +language = 'en' + +# The full version, including alpha/beta/rc tags +release = VERSION + + +# -- General configuration --------------------------------------------------- + +# Add any Sphinx extension module names here, as strings. They can be +# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom +# ones. +extensions = [ + 'sphinx.ext.autodoc', + 'sphinx.ext.napoleon', + 'sphinx.ext.viewcode', + 'recommonmark', +] + +source_suffix = { + '.rst': 'restructuredtext', + '.txt': 'markdown', + '.md': 'markdown', +} +master_doc = 'index' +napoleon_google_docstring = True +napoleon_use_param = True +napoleon_use_ivar = False +napoleon_use_rtype = True +napoleon_include_special_with_doc = False + +# Add any paths that contain templates here, relative to this directory. +templates_path = ['_templates'] + +# List of patterns, relative to source directory, that match files and +# directories to ignore when looking for source files. +# This pattern also affects html_static_path and html_extra_path. 
+exclude_patterns = [ + '_build', + 'Thumbs.db', + '.DS_Store', + 'data', + 'output', + 'templates', + 'tests', + 'migrations', +] + + +# -- Options for HTML output ------------------------------------------------- + +# The theme to use for HTML and HTML Help pages. See the documentation for +# a list of builtin themes. +# +html_logo = 'logo.png' +html_theme = 'sphinx_rtd_theme' +html_theme_options = { + 'navigation_depth': 5, + 'collapse_navigation': False, + 'sticky_navigation': True, +} +html_show_sphinx = False + +texinfo_documents = [ + (master_doc, 'archivebox', 'archivebox Documentation', + author, 'archivebox', 'The open-source self-hosted internet archive.', + 'Miscellaneous'), +] + +autodoc_default_flags = ['members'] +autodoc_member_order = 'bysource' +extensions += ['sphinx.ext.autosummary',] +autosummary_generate = True + +pygments_style = 'sphinx' + +# Add any paths that contain custom static files (such as style sheets) here, +# relative to this directory. They are copied after the builtin static files, +# so a file named "default.css" will overwrite the builtin "default.css". +html_static_path = ['_static'] + + +man_pages = [ + (master_doc, 'archivebox', 'archivebox Documentation', + [author], 1) +] + + + + +# At the bottom of conf.py +def setup(app): + app.add_config_value('recommonmark_config', { + # 'url_resolver': lambda url: github_doc_root + url, + 'auto_toc_tree_section': 'Documentation', + }, True) + app.add_transform(AutoStructify) diff --git a/index.rst b/index.rst new file mode 100644 index 0000000..86d821a --- /dev/null +++ b/index.rst @@ -0,0 +1,40 @@ +.. sidebar:: Welcome to ArchiveBox! + + Just getting started? + Check out the `Quickstart `_ guide. + Need help with something? + Ping us on `Twitter `_ or `Github `_. + Want to join the community? + See our `Community Wiki `_ page. + + .. 
image:: logo.png + :width: 200px + :align: center + :alt: ArchiveBox Logo + +========== +ArchiveBox +========== + + "The open-source self-hosted internet archive." + +`Website `_ | `Github `_ | `Source `_ | `Bug Tracker `_ + +.. code-block:: bash + + mkdir my-archive; cd my-archive/ + pip install archivebox + + archivebox init + archivebox add https://example.com + archivebox info + + +============= +Documentation +============= + +.. toctree:: + :maxdepth: 2 + + Contents.rst diff --git a/logo.png b/logo.png new file mode 100644 index 0000000..209787e Binary files /dev/null and b/logo.png differ diff --git a/modules.rst b/modules.rst new file mode 100644 index 0000000..2ef953f --- /dev/null +++ b/modules.rst @@ -0,0 +1,7 @@ +archivebox +========== + +.. toctree:: + :maxdepth: 4 + + archivebox