mirror of https://github.com/pirate/ArchiveBox.git synced 2025-09-01 18:32:40 +02:00

Updated Roadmap (markdown)

Nick Sweeting
2019-03-01 04:29:33 -05:00
parent d7cae9412c
commit 3da559cb44

@@ -14,11 +14,11 @@ If you feel like contributing a PR, some of these tasks are pretty easy. Feel f
 ### Minor upcoming changes
 - support pushing pages to multiple 3rd party services using ArchiveNow instead of just archive.org
-- body text extraction using [fathom](https://hacks.mozilla.org/2017/04/fathom-a-framework-for-understanding-web-pages/)
-- auto-tagging based on important extracted words
-- full-text indexing with elasticsearch/elasticlunr/ag
-- video closed-caption downloading on Youtube for full-text indexing of video content
-- automatic text summaries of article with nlp summarization library
-- featured image extraction
-- try wgetting dead sites from archive.org (https://github.com/hartator/wayback-machine-downloader)
+- body text extraction to markdown (using [fathom](https://hacks.mozilla.org/2017/04/fathom-a-framework-for-understanding-web-pages/)?)
+- featured image / thumbnail extraction
+- auto-tagging links based on important/frequent keywords in extracted text (like pocket)
+- automatic article summary paragraphs from extracted text with nlp summarization library
+- full-text search of extracted text with elasticsearch/elasticlunr/ag
+- download closed-caption subtitles from Youtube and other video sites for full-text indexing of video content
+- try pulling dead sites from archive.org and other sources if original is down (https://github.com/hartator/wayback-machine-downloader)
 - And more in the [issues list](https://github.com/pirate/ArchiveBox/issues/)...