
Updated Configuration (markdown)

Nick Sweeting
2021-07-06 23:04:07 -04:00
parent 6b75c03434
commit 921dc04fdb

True
```
You can also use this to **whitelist** certain patterns and exclude everything else by wrapping the pattern in a negative lookahead, `(?!`*pattern*`)`, which negates it. For example, to archive only URLs on `*.example.org`, you could use:
```python
>>> import re
>>> URL_BLACKLIST = r'(?!http(s)?:\/\/(.+)?example\.org\/?.*)'
>>> pattern = re.compile(URL_BLACKLIST, re.IGNORECASE)
>>> bool(pattern.match('https://example.org/example.php?abc=123'))
False  # this URL would not be excluded (i.e. it will be archived)
>>> bool(pattern.match('https://abc.example.org'))
False  # this URL would not be excluded (i.e. it will be archived)
>>> bool(pattern.match('https://test.youtube.com/example.php?abc=123'))
True  # but this would be excluded and not archived, because it does not match *.example.org
```
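To see the whitelist pattern's effect on a whole batch of URLs, the sketch below filters a list the way a blacklist is meant to work: any URL the pattern matches is skipped, and the rest are kept. The helper `filter_archivable` and the `STRICT_PATTERN` variant are hypothetical and are not ArchiveBox's own code. The sketch assumes the pattern is applied with `re.match` (anchored at the start of the URL) and `re.IGNORECASE`, as in the examples above. It also illustrates a caveat: because `(.+)?` can match any text before `example.org`, the loose pattern also lets through hosts like `notexample.org` and URLs that merely mention `example.org` in their query string. The stricter variant only accepts `example.org` or its subdomains as the hostname.

```python
import re

# Whitelist pattern from the example above: the negative lookahead matches
# (i.e. blacklists) every URL that does NOT look like *.example.org.
LOOSE_PATTERN = r'(?!http(s)?:\/\/(.+)?example\.org\/?.*)'

# Hypothetical stricter variant: the lookahead only succeeds when the
# hostname itself is example.org or one of its subdomains (optional port).
STRICT_PATTERN = r'(?!https?://([^/?#]+\.)?example\.org(?::\d+)?(?:[/?#]|$))'


def filter_archivable(urls, blacklist):
    """Hypothetical helper: keep only the URLs the blacklist regex does NOT match."""
    pattern = re.compile(blacklist, re.IGNORECASE)
    # A match object (even an empty match) is truthy, so matched URLs are dropped.
    return [url for url in urls if not pattern.match(url)]


urls = [
    'https://example.org/example.php?abc=123',
    'https://abc.example.org',
    'https://test.youtube.com/example.php?abc=123',
    'https://notexample.org/',
    'https://evil.com/?next=example.org',
]

print(filter_archivable(urls, LOOSE_PATTERN))
# keeps the two example.org URLs, but also notexample.org and evil.com
print(filter_archivable(urls, STRICT_PATTERN))
# keeps only 'https://example.org/example.php?abc=123' and 'https://abc.example.org'
```

With the loose pattern, only the `youtube.com` URL is excluded. The stricter variant restricts `[^/?#]+` to the hostname portion, so text in the path or query string can no longer satisfy the whitelist.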
*Related options:*
[`SAVE_MEDIA`](#SAVE_MEDIA), [`SAVE_GIT`](#SAVE_GIT), [`GIT_DOMAINS`](#GIT_DOMAINS)