mirror of https://github.com/pirate/ArchiveBox.git synced 2025-08-27 08:14:38 +02:00

Created Configuration (markdown)

Nick Sweeting
2018-12-31 19:53:07 -05:00
parent b590512f2e
commit 3bb035f58a

Configuration.md Normal file

@@ -0,0 +1,36 @@
You can tweak parameters via environment variables, or by editing `config.py` directly:
```bash
env CHROME_BINARY=google-chrome-stable RESOLUTION=1440,900 FETCH_PDF=False ./archive ~/Downloads/bookmarks_export.html
```
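
As an illustration of how an override such as `FETCH_PDF=False` could become a Python value, here is a minimal sketch. It is not the code in `config.py`: the helper name `env_flag` and the rule that only the string `true` (in any letter case) counts as enabled are assumptions made for this example.

```python
import os


def env_flag(name, default, environ=None):
    """Return a boolean option from the environment, or the default if unset.

    Hypothetical helper: the real config.py may parse values differently.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    # Assumption: any spelling of "true" enables the option, anything else disables it.
    return raw.strip().lower() == "true"


# Simulate `env FETCH_PDF=False ./archive ...` without touching the real environment.
fake_env = {"FETCH_PDF": "False"}
fetch_pdf = env_flag("FETCH_PDF", True, fake_env)  # overridden by the environment
fetch_dom = env_flag("FETCH_DOM", True, fake_env)  # unset, so the default applies
```

Passing the environment in as a dictionary keeps the helper easy to try out without exporting real variables.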

**Shell Options:**
- colorize console output: `USE_COLOR` values: [`True`]/`False`
- show progress bar: `SHOW_PROGRESS` values: [`True`]/`False`
- archive permissions: `OUTPUT_PERMISSIONS` values: [`755`]/`644`/`...`

**Dependency Options:**
- path to Chrome: `CHROME_BINARY` values: [`chromium-browser`]/`/usr/local/bin/google-chrome`/`...`
- path to wget: `WGET_BINARY` values: [`wget`]/`/usr/local/bin/wget`/`...`

**Archive Options:**
- maximum allowed download time per link: `TIMEOUT` values: [`60`]/`30`/`...`
- import only new links: `ONLY_NEW` values: `True`/[`False`]
- archive methods (values: [`True`]/`False`):
  - fetch page with wget: `FETCH_WGET`
  - fetch images/css/js with wget: `FETCH_WGET_REQUISITES` (`True` is highly recommended)
  - print page as PDF: `FETCH_PDF`
  - fetch a screenshot of the page: `FETCH_SCREENSHOT`
  - fetch a DOM dump of the page: `FETCH_DOM`
  - fetch a favicon for the page: `FETCH_FAVICON`
  - submit the page to archive.org: `SUBMIT_ARCHIVE_DOT_ORG`
- screenshot resolution: `RESOLUTION` values: [`1440,900`]/`1024,768`/`...`
- user agent: `WGET_USER_AGENT` values: [`Wget/1.19.1`]/`"Mozilla/5.0 ..."`/`...`
- chrome profile: `CHROME_USER_DATA_DIR` values: [`~/Library/Application\ Support/Google/Chrome/Default`]/`/tmp/chrome-profile`/`...`
  To capture sites that require a login, specify the path to a Chrome profile, which supplies the cookies needed for the logged-in session. If you don't have an existing Chrome profile, create one with `chromium-browser --disable-gpu --user-data-dir=/tmp/chrome-profile` and log into the sites you need. Then set `CHROME_USER_DATA_DIR=/tmp/chrome-profile` so that ArchiveBox uses that profile.
- output directory: `OUTPUT_DIR` values: [`$REPO_DIR/output`]/`/srv/www/bookmarks`/`...` Optionally output the archives to an alternative directory.

(See defaults & more at the top of `config.py`)
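
The non-boolean options listed above, such as `RESOLUTION` and `OUTPUT_PERMISSIONS`, arrive from the environment as plain strings. The sketch below shows one way such strings could be converted. The function names are hypothetical, and it assumes that `RESOLUTION` is ordered width then height and that `OUTPUT_PERMISSIONS` is a chmod-style octal mode; check `config.py` for the actual handling.

```python
def parse_resolution(value):
    """Split a 'WIDTH,HEIGHT' string such as '1440,900' into two integers.

    Assumption: the first number is the width and the second is the height.
    """
    width, height = (int(part) for part in value.split(","))
    return width, height


def parse_permissions(value):
    """Interpret a chmod-style string such as '755' as an octal file mode."""
    return int(value, 8)


# The default values documented in the list above.
width, height = parse_resolution("1440,900")
mode = parse_permissions("755")
```

Reading `755` as base 8 matters: the result equals the Python literal `0o755`, which is the form functions like `os.chmod` expect.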

To change the look and feel of the generated HTML index, edit the HTML files in `archiver/templates/`.

The Chrome/Chromium dependency is _optional_ and only required for screenshot, PDF, and DOM dump output; it can be safely ignored if those three methods are disabled.
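
The rule for when Chrome is needed can be written as a small check. This is a sketch only: `chrome_required` and `CHROME_METHODS` are hypothetical names, and the options are passed in as a plain dictionary rather than read from `config.py`.

```python
# The three archive methods that depend on Chrome/Chromium, per the note above.
CHROME_METHODS = ("FETCH_PDF", "FETCH_SCREENSHOT", "FETCH_DOM")


def chrome_required(options):
    """Return True if any Chrome-backed archive method is enabled.

    Missing keys count as enabled, matching the documented default of True.
    """
    return any(options.get(name, True) for name in CHROME_METHODS)


all_disabled = {name: False for name in CHROME_METHODS}
pdf_only = dict(all_disabled, FETCH_PDF=True)

needs_chrome_when_disabled = chrome_required(all_disabled)
needs_chrome_for_pdf = chrome_required(pdf_only)
```

With all three methods set to `False` the check reports that Chrome is not needed; enabling any one of them brings the dependency back.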