
@@ -128,7 +155,7 @@ The `output/` folder containing the UI HTML and archived data has the structure
- index.html
# Archive method outputs:
- - warc/
+ - warc/
- media/
- git/
...
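If you want to browse that UI locally, one generic option (not specific to this project, just Python's built-in static file server) is:

```bash
# Serve the output/ folder so index.html and the archived snapshots
# can be browsed over HTTP (port 8000 is an arbitrary choice)
cd output/
python3 -m http.server 8000
# then visit http://localhost:8000/index.html
```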
@@ -145,17 +172,19 @@ The `output/` folder containing the UI HTML and archived data has the structure
### Large Archives
I've found it takes about an hour to download 1000 articles, and they'll take up roughly 1GB.
-Those numbers are from running it single-threaded on my i5 machine with 50mbps down. YMMV.
+Those numbers are from running it single-threaded on my i5 machine with 50Mbps down. YMMV.
Storage requirements go up immensely if you're using `FETCH_MEDIA=True` and are archiving many pages with audio & video.
You can run it in parallel by using the `resume` feature, or by manually splitting export.html into multiple files:
+
```bash
./archive export.html 1498800000 & # second argument is timestamp to resume downloading from
./archive export.html 1498810000 &
./archive export.html 1498820000 &
./archive export.html 1498830000 &
```
+
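If you'd rather split `export.html` itself instead of juggling resume timestamps, something along these lines may work (a rough sketch, assuming the export is in the usual Netscape bookmark format with one `<DT><A HREF=...>` entry per line, and that the link parser doesn't need the surrounding header):

```bash
# Split the bookmark entries into chunks of 250 links each (part_aa, part_ab, ...)
grep '<DT>' export.html | split -l 250 - part_

# Archive each chunk in its own background process
for f in part_*; do
  ./archive "$f" &
done
wait
```

Whether the parser accepts bare `<DT>` lines may vary between versions, so test with one small chunk before launching them all.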
Users have reported running it with 50k+ bookmarks with success (though it will take more RAM while running).

If you already imported a huge list of bookmarks and want to import only new
@@ -163,7 +192,6 @@ bookmarks, you can use the `ONLY_NEW` environment variable. This is useful if
you want to import a bookmark dump periodically and want to skip broken links
which are already in the index.
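For example (a sketch, assuming the same invocation style as the commands above; the cron line is a hypothetical illustration of a periodic import):

```bash
# Only archive links that aren't already in the index
env ONLY_NEW=True ./archive export.html

# Hypothetical periodic import via cron (paths are placeholders; adjust to your setup):
# 0 3 * * *  cd /path/to/bookmark-archiver && env ONLY_NEW=True ./archive export.html
```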
-
## Python API Usage
```python