Wget doesn't work on sites you need to be logged into, but chrome headless does; see the [Configuration](#configuration) section for `CHROME_USER_DATA_DIR`.
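
For example, you can point chrome headless at an existing browser profile so pages are fetched with your login cookies. A minimal sketch, assuming config options are passed as environment variables and that the profile lives in the default Linux Chromium location (adjust the path for your OS):

```bash
# Reuse an existing Chrome/Chromium profile (cookies, sessions) when archiving.
# The profile path below is only a typical Linux default -- substitute your own.
env CHROME_USER_DATA_DIR="$HOME/.config/chromium" ./archive export.html
```
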
### Large Exports

I've found it takes about an hour to download 1000 articles, and they'll take up roughly 1GB. Those numbers are from running it single-threaded on my i5 machine with 50mbps down. YMMV.

Storage requirements go up immensely if you're using `FETCH_MEDIA=True` and are archiving many pages with audio & video.
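
If you don't need audio & video, leaving media fetching off keeps the archive much smaller. A minimal sketch, assuming config options are passed as environment variables as in the Configuration section:

```bash
# Skip audio/video downloads to keep disk usage down.
env FETCH_MEDIA=False ./archive export.html
```
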
You can run it in parallel by using the `resume` feature, or by manually splitting export.html into multiple files:

```bash
./archive export.html 1498800000 & # second argument is timestamp to resume downloading from
```
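
A rough sketch of the manual-splitting approach, assuming one bookmark per line as in Netscape-format exports (the chunk size and `chunk_` prefix are arbitrary):

```bash
# Split the export into ~500-line chunks, then archive each chunk in parallel.
split -l 500 export.html chunk_
for f in chunk_*; do
  ./archive "$f" &
done
wait  # block until all background archive jobs finish
```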