
Updated Home (markdown)

Nick Sweeting
2019-03-05 12:26:01 -05:00
parent 243db316b8
commit aebfda61a6

@@ -53,11 +53,13 @@ organized by timestamp bookmarked. It's Powered by [headless](https://developer
Wget doesn't work on sites you need to be logged into, but chrome headless does, see the [Configuration](#configuration) section for `CHROME_USER_DATA_DIR`.
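For example, a minimal sketch of pointing the archiver at an existing Chrome profile so headless Chrome is already logged in to the sites you want to capture (the profile path below is an assumption, substitute your own profile directory):

```bash
env CHROME_USER_DATA_DIR="$HOME/.config/chromium" ./archive export.html
```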
### Large Exports
I've found it takes about an hour to download 1000 articles, and they'll take up roughly 1GB.
Those numbers are from running it single-threaded on my i5 machine with 50mbps down. YMMV.
Storage requirements go up immensely if you're using `FETCH_MEDIA=True` and are archiving many pages with audio & video.
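As a rough sketch, you can turn media fetching off for large runs to keep storage down (the env-var invocation style here is an assumption, matching how the other options are set):

```bash
env FETCH_MEDIA=False ./archive export.html
```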
You can run it in parallel by using the `resume` feature, or by manually splitting export.html into multiple files:
```bash
./archive export.html 1498800000 &  # second argument is timestamp to resume downloading from
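# (illustrative sketch, not from the wiki) alternatively, split export.html into
# smaller files and archive each chunk in its own background process;
# the split filenames below are hypothetical
./archive export_part1.html &
./archive export_part2.html &
wait  # block until all background jobs have finished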