From 446d12b62ea16711ddea1af6f409332c324a33df Mon Sep 17 00:00:00 2001
From: Nick Sweeting
Date: Thu, 14 Mar 2024 00:19:12 -0700
Subject: [PATCH] Updated Chromium Install (markdown)

---
 Chromium-Install.md | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/Chromium-Install.md b/Chromium-Install.md
index 2fed02a..3af7155 100644
--- a/Chromium-Install.md
+++ b/Chromium-Install.md
@@ -67,6 +67,13 @@ If you encounter problems setting up Google Chrome or Chromium, see the [Trouble
 
 You may choose to set up a Chrome/Chromium user profile in order to use your cookies/sessions to log into sites behind authentication/paywall during archiving.
 
+*Note: not all extractors use Chrome (e.g. `wget`, `mercury`, `media`), so [`COOKIES_FILE`](https://github.com/ArchiveBox/ArchiveBox/wiki/Configuration/#cookies_file) should be set up as well after this.*
+
+> [!WARNING]
+> **Make sure you use separate credentials dedicated to archiving,** e.g. don't log in with your normal daily Facebook/Instagram/Youtube/etc. accounts as server responses and page content will often contain your name/email/PII, session cookies, private tokens, etc.!
+>
+> You need to use a separate account to make sure you don't leak your account info to any future viewers of your snapshots (even if you keep your archive data private for now, you may want to share a snapshot in the future, and they're very hard to sanitize after-the-fact!).
+
 ### Docker Setup
 
 If using ArchiveBox in Docker, the easiest way to set up session credentials is by attaching the ArchiveBox browser to a virtual window server in a sidecar container, and logging in to your sites over VNC (less complicated than it sounds).
@@ -121,11 +128,6 @@ docker compose add 'https://example.com/some/site/requiring/login.html'
 # make sure the content appears as your logged-in user would see it
 ```
 
-*Note: not all extractors use Chrome (e.g. `wget`, `mercury`, `media`), so `COOKIES_FILE` should be set up as well.
-
-> [!WARNING]
-> Make sure you use separate credentials dedicated to archiving, e.g. don't log in with your normal daily Facebook/Instagram/Youtube/etc. accounts as server responses and page content will often contain your name/email/PII, session cookies, private tokens, etc.! You need to use a separate account to make sure you don't leak your account info to any future viewers of your snapshots (even if you keep your archive data private for now, you may want to share a snapshot in the future, and they're very hard to sanitize after-the-fact!).
-
 ### Non-Docker Setup (Local Host)
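The note in this patch says `COOKIES_FILE` should be set up for the extractors that don't go through Chrome. Below is a minimal sketch for sanity-checking a cookies file before pointing ArchiveBox at it. It assumes `COOKIES_FILE` expects a Netscape-format `cookies.txt` (the format `wget --load-cookies` reads) and uses only Python's standard `http.cookiejar.MozillaCookieJar`. The helper name `summarize_cookies_file` is hypothetical and is not part of ArchiveBox.

```python
from http.cookiejar import MozillaCookieJar


def summarize_cookies_file(path: str) -> dict[str, int]:
    """Count cookies per domain in a Netscape-format cookies.txt file.

    Hypothetical helper (not part of ArchiveBox). Raises
    http.cookiejar.LoadError if the first line lacks the
    "# Netscape HTTP Cookie File" header or a line is malformed,
    and OSError if the file cannot be opened.
    """
    jar = MozillaCookieJar()
    # Keep session cookies (no expiry) and already-expired cookies too,
    # so the summary reflects everything that is in the file.
    jar.load(path, ignore_discard=True, ignore_expires=True)

    # Tally how many cookies each domain contributes.
    counts: dict[str, int] = {}
    for cookie in jar:
        counts[cookie.domain] = counts.get(cookie.domain, 0) + 1
    return counts
```

For example, `summarize_cookies_file("cookies.txt")` returns a dict mapping each cookie domain to the number of cookies it holds. A `LoadError` usually means the export is missing the Netscape header line. The file holds live session tokens, so the same advice about dedicated archiving accounts applies to it.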