
Updated Troubleshooting (markdown)

Nick Sweeting
2019-03-12 20:43:48 -04:00
parent 953e7e2f24
commit af37c449f2

@@ -13,7 +13,7 @@ What are you having an issue with?:
Make sure you've followed the Manual Setup guide in the [[Install]] instructions first. Then check here for help depending on what component you need help with:
-**Python:**
+#### Python
On some Linux distributions the python3 package might not be recent enough.
If this is the case for you, resort to installing a recent enough version manually.
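As a quick check before resorting to a manual install (a minimal sketch; the deadsnakes PPA line is the same one this page already recommends, run as root or via `sudo`):

```bash
# See whether the distro's python3 is recent enough (3.6+ at the time of this page)
python3 --version

# If it's too old on Ubuntu/Debian, pull a newer one from the deadsnakes PPA
add-apt-repository ppa:fkrull/deadsnakes && apt update && apt install python3.6
```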
@@ -22,7 +22,7 @@ add-apt-repository ppa:fkrull/deadsnakes && apt update && apt install python3.6
```
If you still need help, [the official Python docs](https://docs.python.org/3.6/using/unix.html) are a good place to start.
-**Chromium/Google Chrome:**
+#### Chromium/Google Chrome
For more info, see the [[Chromium Install]] page.
@@ -62,7 +62,7 @@ env CHROME_BINARY=/path/from/step/1/chromium-browser ./archive bookmarks_export.
```
-**Wget & Curl:**
+#### Wget & Curl
If you're missing `wget` or `curl`, simply install them using `apt` or your package manager of choice.
See the "Manual Setup" instructions for more details.
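To make the `CHROME_BINARY` override from the hunk above concrete (a hedged sketch: the binary path and the `bookmarks_export.html` filename are illustrative, not prescribed by this page):

```bash
# Find where the manually-installed Chromium ended up
which chromium-browser

# Point the archiver at that binary for this run (path and filename are examples)
env CHROME_BINARY=/usr/bin/chromium-browser ./archive bookmarks_export.html

# wget and curl, if missing, come straight from the package manager
apt install wget curl
```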
@@ -73,12 +73,12 @@ a bug in versions `<=1.19.1_1` that caused wget to fail for perfectly valid site
### Archiving
-**No links parsed from export file:**
+#### No links parsed from export file
Please open an [issue](https://github.com/pirate/ArchiveBox/issues) with a description of where you got the export, and
preferably your export file attached (you can redact the links). We'll fix the parser to support your format.
-**Lots of skipped sites:**
+#### Lots of skipped sites
If you've run the archiver once, it won't re-download sites on subsequent runs; it will only download new links.
If you haven't already run it, make sure you have a working internet connection and that the parsed URLs look correct.
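The skip behavior described above is incremental by design, so re-running over the same export is cheap (a sketch, assuming a hypothetical export file named `bookmarks_export.html`):

```bash
# A second run skips links already present in the archive
# and only downloads ones it hasn't seen before
./archive bookmarks_export.html
```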
@@ -86,18 +86,18 @@ You can check the `archive.py` output or `index.html` to see what links it's dow
If you're still having issues, try deleting or moving the `output/archive` folder (back it up first!) and running `./archive` again.
-**Lots of errors:**
+#### Lots of errors
Make sure you have all the dependencies installed and that you're able to visit the links from your browser normally.
Open an [issue](https://github.com/pirate/ArchiveBox/issues) with a description of the errors if you're still having problems.
-**Lots of broken links from the index:**
+#### Lots of broken links from the index
Not all sites can be effectively archived with every method; that's why it's best to use a combination of `wget`, PDFs, and screenshots.
If it seems like more than 10-20% of sites in the archive are broken, open an [issue](https://github.com/pirate/ArchiveBox/issues)
with some of the URLs that failed to be archived and I'll investigate.
-**Removing unwanted links from the index:**
+#### Removing unwanted links from the index
If you accidentally added lots of unwanted links to the index and they slow down your archiving, you can use the `bin/purge` script to remove them from your index; it removes everything matching the Python regexes you pass to it, e.g. `bin/purge -r 'amazon\.com' -r 'google\.com'`. It will prompt before removing links from the index, but for extra safety you may want to back up `index.json` first (or put it under version control).
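To make the two recovery steps from this hunk concrete (a minimal sketch using the default paths the page mentions):

```bash
# "Lots of skipped sites": move the archive folder aside (back it up first!)
# and re-run the archiver
mv output/archive output/archive.bak
./archive

# "Removing unwanted links": back up the index, then purge by regex;
# bin/purge prompts before actually removing anything
cp index.json index.json.bak
bin/purge -r 'amazon\.com' -r 'google\.com'
```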