mirror of
https://github.com/pirate/ArchiveBox.git
synced 2025-08-18 04:11:57 +02:00
Updated Configuration (markdown)
@@ -128,7 +128,7 @@ A regex expression used to exclude all URLs that don't match the given pattern from archiving.
 
 When building your whitelist, you can check whether a given URL matches your regex expression in `python` like so:
 
 ```python
 >>> import re
->>> URL_WHITELIST = r'^http(s)?:\/\/(.+)?example\.org\/?.*$' # replace this with your regex to test
+>>> URL_WHITELIST = r'^http(s)?:\/\/(.+)?example\.com\/?.*$' # replace this with your regex to test
 >>> test_url = 'https://test.example.com/example.php?abc=123'
 >>> bool(re.compile(URL_WHITELIST, re.IGNORECASE | re.UNICODE | re.MULTILINE).search(test_url))
 True # this URL would be archived

@@ -138,7 +138,7 @@ True # this URL would be archived
 
 False # this URL would be excluded from archiving
 ```
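Outside the REPL, the same check can be run as a script; this is a minimal sketch using the `example.com` pattern from the updated line (the non-matching Wikipedia URL is an arbitrary illustration, not from the wiki page):

```python
import re

# Pattern from the example above: matches example.com and any of its subdomains
URL_WHITELIST = r'^http(s)?:\/\/(.+)?example\.com\/?.*$'

pattern = re.compile(URL_WHITELIST, re.IGNORECASE | re.UNICODE | re.MULTILINE)

# A URL on a subdomain of example.com matches the whitelist
matching = bool(pattern.search('https://test.example.com/example.php?abc=123'))

# A URL on an unrelated domain does not
non_matching = bool(pattern.search('https://en.wikipedia.org/wiki/Example'))

print(matching)      # True  -> this URL would be archived
print(non_matching)  # False -> this URL would be excluded
```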
-This option is useful for recursively archiving all the pages on a given domain (aka crawling/spidering), without following links to external domains.
+This option is useful for **recursive archiving** of all the pages under a given domain or subfolder (aka crawling/spidering), without following links to external domains / parent folders.
 
 ```bash
 # temporarily enforce a whitelist by setting the option as an environment variable
 export URL_WHITELIST='^http(s)?:\/\/(.+)?example\.com\/?.*$'
 ```
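To illustrate the crawling behaviour described in the changed line, here is a rough sketch (not ArchiveBox's internal code; the URLs and link list are made up for the example) of how a whitelist pattern keeps a recursive crawl on a single domain:

```python
import re

# Same pattern as the export above: only example.com (and its subdomains) pass
URL_WHITELIST = r'^http(s)?:\/\/(.+)?example\.com\/?.*$'
allowed = re.compile(URL_WHITELIST, re.IGNORECASE | re.UNICODE | re.MULTILINE)

# Hypothetical links discovered while spidering a page on example.com
discovered = [
    'https://example.com/about',
    'https://blog.example.com/post/1',
    'https://external-site.net/page',
]

# Links to external domains are dropped, so the crawl never leaves example.com
to_archive = [url for url in discovered if allowed.search(url)]
print(to_archive)
```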