mirror of https://github.com/pirate/ArchiveBox.git synced 2025-09-02 19:02:37 +02:00

Updated Roadmap (markdown)

Nick Sweeting
2019-04-03 00:17:16 -04:00
parent 71b1537a4a
commit 61ac05bf57

@@ -130,8 +130,8 @@ Initialize a new "collection" folder, aka a complete archive containing an Archi
### `$ archivebox add` ### `$ archivebox add`
#### `--skip=[existing|none]` #### `--only-new`
Controls whether to skip links that have been previously archived. To re-archive links and take a new snapshot every time they're added, pass `none`. Controls whether to only add new links or also retry previously failed/skipped links.
#### `--mirror` #### `--mirror`
Archive an entire site (finding all linked pages below it on the same domain) Archive an entire site (finding all linked pages below it on the same domain)
@@ -226,8 +226,8 @@ USE_CHROME=False
#### `(no args)` #### `(no args)`
Update the index and go through each page, retrying any that failed previously. Update the index and go through each page, retrying any that failed previously.
#### `--skip=[none|existing]` #### `--only-new`
By default it always retries previously failed pages, set this to `existing` to only archive newly added links. By default it always retries previously failed/skipped pages. Pass this flag to archive only newly added links, without going through the whole archive and attempting to fix previously failed links.
#### `--resume=[timestamp]` #### `--resume=[timestamp]`
Resume the update process from a specific URL timestamp. Resume the update process from a specific URL timestamp.
@@ -235,6 +235,49 @@ Resume the update process from a specific URL timestamp.
#### `--snapshot` #### `--snapshot`
[TODO] by default ArchiveBox never re-archives pages after the first successful archive, if you want to take a new snapshot of every page even if there's an existing version, pass this option. [TODO] by default ArchiveBox never re-archives pages after the first successful archive, if you want to take a new snapshot of every page even if there's an existing version, pass this option.
### `$ archivebox list [--csv=COLUMNS] [--json] [--filter=REGEX] [--before=TIMESTAMP] [--after=TIMESTAMP]`
#### `--csv=COLUMNS`
Print the output in CSV format with the specified columns, e.g. `--csv=timestamp,base_url,is_archived`.
#### `--json`
Print the output in JSON format, with all link attributes included.
#### `--filter=REGEX`
Print only URLs matching the specified regex, e.g. `--filter='.*github.com.*'`.
#### `--before=TIMESTAMP` / `--after=TIMESTAMP`
Print only URLs before or after a given timestamp, e.g. `--before=1554263415.2` or `--after=1554260000`.
```bash
$ archivebox list --sort=timestamp
http://www.iana.org/domains/example
https://github.com/pirate/ArchiveBox/wiki
https://github.com/pirate/ArchiveBox/commit/0.4.0
https://github.com/pirate/ArchiveBox
https://archivebox.io
```
```bash
$ archivebox list --sort=timestamp --csv=timestamp,url
timestamp,url
1554260947,http://www.iana.org/domains/example
1554263415,https://github.com/pirate/ArchiveBox/wiki
1554263415.0,https://github.com/pirate/ArchiveBox/commit/0.4.0
1554263415.1,https://github.com/pirate/ArchiveBox
1554263415.2,https://archivebox.io
```
```bash
$ archivebox list --sort=timestamp --csv=timestamp,url --after=1554263415.0
timestamp,url
1554263415,https://github.com/pirate/ArchiveBox/wiki
1554263415.0,https://github.com/pirate/ArchiveBox/commit/0.4.0
1554263415.1,https://github.com/pirate/ArchiveBox
1554263415.2,https://archivebox.io
```
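The filtering behaviour described for `--filter`, `--before`, and `--after` can be sketched in Python as follows. This is only an illustration under stated assumptions, not ArchiveBox's implementation: `filter_links` and `LINKS` are hypothetical names, timestamps are compared as floats, both bounds are treated as inclusive (which matches the `--after=1554263415.0` output shown above), and the regex is applied with `re.search`.

```python
import re
from typing import List, Optional, Tuple

# Sample rows copied from the CSV example above: (timestamp, url).
LINKS: List[Tuple[str, str]] = [
    ("1554260947", "http://www.iana.org/domains/example"),
    ("1554263415", "https://github.com/pirate/ArchiveBox/wiki"),
    ("1554263415.0", "https://github.com/pirate/ArchiveBox/commit/0.4.0"),
    ("1554263415.1", "https://github.com/pirate/ArchiveBox"),
    ("1554263415.2", "https://archivebox.io"),
]


def filter_links(
    links: List[Tuple[str, str]],
    pattern: Optional[str] = None,
    before: Optional[float] = None,
    after: Optional[float] = None,
) -> List[Tuple[str, str]]:
    """Hypothetical helper: keep links matching the regex and timestamp bounds."""
    regex = re.compile(pattern) if pattern else None
    selected = []
    for timestamp, url in links:
        ts = float(timestamp)  # assumption: timestamps compare numerically
        if regex is not None and not regex.search(url):
            continue  # URL does not match the regex filter
        if before is not None and ts > before:
            continue  # newer than the inclusive upper bound
        if after is not None and ts < after:
            continue  # older than the inclusive lower bound
        selected.append((timestamp, url))
    return selected


if __name__ == "__main__":
    # Roughly equivalent to: --filter='.*github.com.*' --after=1554263415.1
    for timestamp, url in filter_links(LINKS, r".*github.com.*", after=1554263415.1):
        print(f"{timestamp},{url}")
```

Comparing timestamps numerically means `1554263415` and `1554263415.0` are treated as the same instant, so both rows pass an `--after=1554263415.0` bound; a string comparison would order them differently.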
### `$ archivebox server [--bind=0.0.0.0:8000]` ### `$ archivebox server [--bind=0.0.0.0:8000]`
```bash ```bash