From d1c499f14aedd87e88637159fada75c39a85a022 Mon Sep 17 00:00:00 2001
From: Nick Sweeting
Date: Fri, 11 Jan 2019 08:02:16 -0500
Subject: [PATCH] Updated Configuration (markdown)

---
 Configuration.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/Configuration.md b/Configuration.md
index 81c4b5d..901a33f 100644
--- a/Configuration.md
+++ b/Configuration.md
@@ -24,6 +24,7 @@ env CHROME_BINARY=google-chrome-stable RESOLUTION=1440,900 FETCH_PDF=False ./arc
 ### Archive Options
 
  - maximum allowed download time per link: `TIMEOUT` values: [`60`]/`30`/`...`
+ - maximum allowed download time per media file: `MEDIA_TIMEOUT` values: [`3600`]/`120`/`...`
  - import only new links: `ONLY_NEW` values `True`/[`False`]
  - archive methods (values: [`True`]/`False`):
    - fetch page with wget: `FETCH_WGET`
@@ -31,10 +32,15 @@ env CHROME_BINARY=google-chrome-stable RESOLUTION=1440,900 FETCH_PDF=False ./arc
    - print page as PDF: `FETCH_PDF`
    - fetch a screenshot of the page: `FETCH_SCREENSHOT`
    - fetch a DOM dump of the page: `FETCH_DOM`
+   - fetch git repositories on the page: `FETCH_GIT`
+   - fetch a WARC dump of the page: `FETCH_WARC`
+   - fetch all audio and video on the page: `FETCH_MEDIA`
+   - fetch a DOM dump of the page: `FETCH_DOM`
    - fetch a favicon for the page: `FETCH_FAVICON`
    - submit the page to archive.org: `SUBMIT_ARCHIVE_DOT_ORG`
  - screenshot: `RESOLUTION` values: [`1440,900`]/`1024,768`/`...`
  - user agent: `WGET_USER_AGENT` values: [`Wget/1.19.1`]/`"Mozilla/5.0 ..."`/`...`
+ - git domains: `GIT_DOMAINS` values: [`github.com,bitbucket.org,gitlab.com`]/`git.example.com`/`...`
  - chrome profile: `CHROME_USER_DATA_DIR` values: [`~/Library/Application\ Support/Google/Chrome/Default`]/`/tmp/chrome-profile`/`...`
     To capture sites that require a user to be logged in, you must specify a path to a chrome profile (which loads the cookies needed for the user to be logged in).
     If you don't have an existing chrome profile, create one with `chromium-browser --disable-gpu --user-data-dir=/tmp/chrome-profile`, and log into the sites you need. Then set `CHROME_USER_DATA_DIR=/tmp/chrome-profile` to make ArchiveBox use that profile.
  - output directory: `OUTPUT_DIR` values: [`$REPO_DIR/output`]/`/srv/www/bookmarks`/`...`
     Optionally output the archives to an alternative directory.
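
The options this patch documents can presumably be combined in one invocation, following the `env NAME=value ...` pattern quoted in the hunk headers. The following is a minimal sketch under stated assumptions: `run_with_options` is a hypothetical helper, the values are illustrative picks from the documented alternatives, and `sh -c 'echo ...'` stands in for the archive script, whose name is truncated above.

```shell
#!/bin/sh
# Hypothetical helper (not part of ArchiveBox): runs any command with a set
# of the documented options placed in that command's environment only.
run_with_options() {
    env TIMEOUT=30 \
        MEDIA_TIMEOUT=120 \
        FETCH_MEDIA=True \
        FETCH_GIT=True \
        GIT_DOMAINS=github.com,git.example.com \
        CHROME_USER_DATA_DIR=/tmp/chrome-profile \
        "$@"
}

# Stand-in for the archive script: print the values the child process sees.
run_with_options sh -c 'echo "MEDIA_TIMEOUT=$MEDIA_TIMEOUT GIT_DOMAINS=$GIT_DOMAINS"'
```

Because `env` sets the variables only for the child process, the calling shell's environment is left unchanged and each run can use different values.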