From 3da559cb4454f0ffc1a1ce74e383ecaeb51d7aca Mon Sep 17 00:00:00 2001
From: Nick Sweeting
Date: Fri, 1 Mar 2019 04:29:33 -0500
Subject: [PATCH] Updated Roadmap (markdown)

---
 Roadmap.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/Roadmap.md b/Roadmap.md
index 03dcc3f..3291d0d 100644
--- a/Roadmap.md
+++ b/Roadmap.md
@@ -14,11 +14,11 @@ If you feel like contributing a PR, some of these tasks are pretty easy. Feel f
 
 ### Minor upcoming changes
 - support pushing pages to multiple 3rd party services using ArchiveNow instead of just archive.org
-    - body text extraction using [fathom](https://hacks.mozilla.org/2017/04/fathom-a-framework-for-understanding-web-pages/)
-    - auto-tagging based on important extracted words
-    - full-text indexing with elasticsearch/elasticlunr/ag
-    - video closed-caption downloading on Youtube for full-text indexing of video content
-    - automatic text summaries of article with nlp summarization library
-    - featured image extraction
-    - try wgetting dead sites from archive.org (https://github.com/hartator/wayback-machine-downloader)
+    - body text extraction to markdown (using [fathom](https://hacks.mozilla.org/2017/04/fathom-a-framework-for-understanding-web-pages/)?)
+    - featured image / thumbnail extraction
+    - auto-tagging links based on important/frequent keywords in extracted text (like pocket)
+    - automatic article summary paragraphs from extracted text with nlp summarization library
+    - full-text search of extracted text with elasticsearch/elasticlunr/ag
+    - download closed-caption subtitles from Youtube and other video sites for full-text indexing of video content
+    - try pulling dead sites from archive.org and other sources if original is down (https://github.com/hartator/wayback-machine-downloader)
 - And more in the [issues list](https://github.com/pirate/ArchiveBox/issues/)...
\ No newline at end of file