From 983caae7f266977db4dcef946cdcdb33ab60c5fc Mon Sep 17 00:00:00 2001 From: Nick Sweeting Date: Mon, 31 Dec 2018 19:59:32 -0500 Subject: [PATCH] Created Roadmap (markdown) --- Roadmap.md | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) create mode 100644 Roadmap.md diff --git a/Roadmap.md b/Roadmap.md new file mode 100644 index 0000000..610fcf3 --- /dev/null +++ b/Roadmap.md @@ -0,0 +1,22 @@ +## Roadmap + +[*Official Roadmap*](https://github.com/pirate/ArchiveBox/issues/120). + +If you feel like contributing a PR, some of these tasks are pretty easy. Feel free to open an issue if you need help getting started in any way! + +**Major upcoming changes:** + + - finalize python packaging to allow installing via pip and importing individual componenets + - add an optional web GUI for managing sources, adding new links, and viewing the archive + +**Minor upcoming changes:** + - download closed-captions text from youtube videos + - body text extraction using [fathom](https://hacks.mozilla.org/2017/04/fathom-a-framework-for-understanding-web-pages/) + - auto-tagging based on important extracted words + - audio & video archiving with `youtube-dl` + - full-text indexing with elasticsearch/elasticlunr/ag + - video closed-caption downloading on Youtube for full-text indexing of video content + - automatic text summaries of article with nlp summarization library + - featured image extraction + - http support (from my https-only domain) + - try wgetting dead sites from archive.org (https://github.com/hartator/wayback-machine-downloader) \ No newline at end of file