1
0
mirror of https://github.com/pirate/ArchiveBox.git synced 2025-08-22 14:13:01 +02:00

Updated Security Overview (markdown)

Nick Sweeting
2024-05-03 19:07:11 -07:00
parent f99dca1092
commit b57b378207

@@ -126,7 +126,21 @@ More info:
### Publishing ### Publishing
Are you [publishing your archive](https://github.com/ArchiveBox/ArchiveBox/wiki/Publishing-Your-Archive)? If so, make sure you use the built-in `archivebox server` or only serve the static export as static HTML (don't accidentally serve it as PHP or CGI or you may execute malicious archived files by accident). Regardless of how you serve it, make sure to put it on its own domain not shared with other services. This is done in order to avoid cookies leaking between your main domain and domains hosting content you don't control. A common practice is to put user provided / untrusted archived content on completely separate top-level domains from anything else (like how Google and Github do with `googleusercontent.com` and `github.io`). > [!CAUTION]
> Re-hosting untrusted archived content on a domain can potentially compromise *all apps on that domain*!
> (including other subdomains)
Make sure you thoroughly understand the dangers of [hosting untrusted HTML/JS/CSS that may be captured during archiving](https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy), and how viewing it can enable [CSRF attacks](https://en.wikipedia.org/wiki/Cross-site_request_forgery) across all apps on the same domain. If a logged-in user happens to visit an archived page with malicious Javascript embedded, it would allow the JS to hijack any cookies on the domain and pretend to be them, potentially exfiltrating or modifying other Snapshots/data on your server.
(This is why we don't support serving ArchiveBox from a subdirectory like `myapps.example.com/archivebox/`, it's too dangerous to share domains)
The industry standard approach is to use a separate domain for untrusted content, for example Github uses `githubusercontent.com` and Google uses `googleusercontent.com` for all user-uploaded files. If hosting ArchiveBox publicly, do the same and keep it on an isolated domain in order to mitigate potential damage of leaked cookies, CORS, and CSRF attack.
To protect the Admin dashboard, it's also recommended to serve all content under `/archive/` on a separate domain from `/admin/`. We do this on our servers using a simple redirect rule in nginx/cloudflare like so:
- https://demo.archivebox.io: only serves `/`, redirects `/archive/*` to `demo-static.`
- https://demo-static.archivebox.io: only serves `/archive/`, redirects everything else to `demo.`
Published archives automatically include a `robots.txt` `Dissallow: /` to block search engines from indexing them. You may still wish to publish your contact info in the index footer though using [`FOOTER_INFO`](https://github.com/ArchiveBox/ArchiveBox/wiki/Configuration#FOOTER_INFO) so that you can respond to any DMCA and copyright takedown notices if you accidentally rehost copyrighted content. Published archives automatically include a `robots.txt` `Dissallow: /` to block search engines from indexing them. You may still wish to publish your contact info in the index footer though using [`FOOTER_INFO`](https://github.com/ArchiveBox/ArchiveBox/wiki/Configuration#FOOTER_INFO) so that you can respond to any DMCA and copyright takedown notices if you accidentally rehost copyrighted content.