From 7e32d5236b40533789dae504bb43afadee67289f Mon Sep 17 00:00:00 2001
From: Nick Sweeting
Date: Wed, 27 Feb 2019 13:05:40 -0500
Subject: [PATCH] Updated Web Archiving Community (markdown)

---
 Web-Archiving-Community.md | 123 +++++++++++++++++++++++++++++--------
 1 file changed, 99 insertions(+), 24 deletions(-)

diff --git a/Web-Archiving-Community.md b/Web-Archiving-Community.md
index 0901735..def05dc 100644
--- a/Web-Archiving-Community.md
+++ b/Web-Archiving-Community.md
@@ -1,11 +1,31 @@
 ▶️ *Just getting started and want to learn more about why Web Archiving is important? Check out this article: [On the Importance of Web Archiving](https://parameters.ssrc.org/2018/09/on-the-importance-of-web-archiving/).*

+## Contents
+
+ - [The Master Lists](#The-Master-Lists)
+ - [Web Archiving Software](#Web-Archiving-Projects)
+   + Bookmarking+archiving services
+   + Software maintained by well known Archiving institutions
+   + Public Archiving Services
+   + ArchiveBox Alternatives
+   + Smaller Utilities
+ - [Reading List](#Reading-List)
+   + Blogs
+   + Articles
+   + ArchiveBox Discussions in News & Social Media
+ - [Communities](#Communities)
+   + Most Active Communities
+   + Web Archiving Communities
+   + General Archiving Foundations, Coalitions, Initiatives, and Institutes
+
 ## The Master Lists
- - [The Awesome Web Archiving List](https://github.com/iipc/awesome-web-archiving) (IIPC)
- - [Wiki of Web Archiving Tools](http://coptr.digipres.org/Category:Tools) (COPTR)
+Lists of great archiving institutions and software maintained by other people.
+
+ - **[The Awesome Web Archiving List](https://github.com/iipc/awesome-web-archiving) (IIPC)**
+ - **[Wiki of Web Archiving Tools](http://coptr.digipres.org/Category:Tools) (COPTR)**
  - [ArchiveTeam's List of Software](https://www.archiveteam.org/index.php?title=Software) (ArchiveTeam.org)
  - [List of Web Archiving Initiatives](https://en.wikipedia.org/wiki/List_of_Web_archiving_initiatives) (Wikipedia.org)
  - [Directory of Archiving Organizations](https://www2.archivists.org/assoc-orgs) (American Society of Archivists)
@@ -13,41 +33,77 @@

 ## Web Archiving Projects

+---
+
+
+
+
+
+
+
+
 ### Big Name Bookmarking + Archiving Services
- - [Pocket Premium](https://getpocket.com) Bookmarking tool that provides an archiving service in their paid version, run by Mozilla
- - [Pinboard](https://pinboard.in) Bookmarking tool that provides archiving in a paid version, run by a single independent developer
+
+ - **[Pocket Premium](https://getpocket.com) Bookmarking tool that provides an archiving service in their paid version, run by Mozilla**
+ - **[Pinboard](https://pinboard.in) Bookmarking tool that provides archiving in a paid version, run by a single independent developer**
  - [Wallabag](https://wallabag.org) / [Wallabag.it](https://wallabag.it) Self-hostable web archiving server that can import via RSS
  - [Shaarli](https://github.com/shaarli/Shaarli) Self-hostable bookmark tagging, archiving, and sharing service

+---
+
 ### From the Archive.org & Archive-It teams
- - [Archive.org](https://archive.org) The O.G. wayback machine provided publicly by the Internet Archive (Archive.org)
- - [Archive.it](https://archive-it.org) commercial Wayback-Machine solution
- - [Brozzler](https://github.com/internetarchive/brozzler) chrome headless crawler + WARC archiver maintained by Archive.org
+
+
+
+
+ - **[Archive.org](https://archive.org) The O.G. wayback machine provided publicly by the Internet Archive (Archive.org)**
+ - **[Archive.it](https://archive-it.org) commercial Wayback-Machine solution**
+ - **[Brozzler](https://github.com/internetarchive/brozzler) chrome headless crawler + WARC archiver maintained by Archive.org**
  - [WarcProx](https://github.com/internetarchive/warcprox) warc proxy recording and playback utility
  - [WarcTools](https://github.com/internetarchive/warctools) utilities for dealing with WARCs
  - [More on their Github...](https://github.com/internetarchive)

+---
+
 ### From the Webrecorder team
- - [Webrecorder.io](https://webrecorder.io/) An open-source personal archiving server that uses pywb under the hood
- - [pywb](https://github.com/webrecorder/pywb) The python wayback machine, the codebase forked off archive.org that powers webrecorder
+
+
+
+ - **[Webrecorder.io](https://webrecorder.io/) An open-source personal archiving server that uses pywb under the hood**
+ - **[pywb](https://github.com/webrecorder/pywb) The python wayback machine, the codebase forked off archive.org that powers webrecorder**
  - [warcit](https://github.com/webrecorder/warcit) Create a warc file out of a folder full of assets
  - [WebArchivePlayer](https://github.com/ikreymer/webarchiveplayer#auto-load-warcs) A tool for replaying web archives
  - [warcio](https://github.com/webrecorder/warcio) fast streaming asynchronous WARC reader and writer
  - [More on their Github...](https://github.com/webrecorder)

+---
+
 ### From the Old Dominion University: Web Science Team
- - [ipwb](https://github.com/oduwsdl/ipwb) A distributed web archiving solution using pywb with ipfs for storage
- - [archivenow](https://github.com/oduwsdl/archivenow) tool that pushes urls into all the online archive services like Archive.is and Archive.org
+
+
+ - **[ipwb](https://github.com/oduwsdl/ipwb) A distributed web archiving solution using pywb with ipfs for storage**
+ - **[archivenow](https://github.com/oduwsdl/archivenow) tool that pushes urls into all the online archive services like Archive.is and Archive.org**
  - [WAIL](https://github.com/n0tan3rd/wail) Electron app version of the original [wail](https://github.com/machawk1/wail) for creating and interacting with web archives
- - [warcreate](https://github.com/machawk1/warcreate) a Chrome extension for creating WARCs from any webpage
+ - **[warcreate](https://github.com/machawk1/warcreate) a Chrome extension for creating WARCs from any webpage**
  - [More on their Github...](https://github.com/oduwsdl)

+---
+
 ### From the Archives Unleashed Team
+
+
  - [AUT](https://github.com/archivesunleashed/aut) Archives Unleashed Toolkit for analyzing web archives (formerly WarcBase)
  - [Warclight](https://github.com/archivesunleashed/warclight) A Rails engine for finding and searching web archives
  - [More on their Github...](https://github.com/archivesunleashed)

+---
+
 ### Other Public Archiving Services
+
+
+
+
  - https://archive.is / https://archive.today
  - https://archive.st
  - https://timetravel.mementoweb.org/
@@ -57,11 +113,13 @@
  - https://megalodon.jp/
  - Google, Bing, DuckDuckGo, and other search engine caches

-### Other ArchiveBox alternatives
- - [Polarized](https://getpolarized.io/) a desktop application for bookmarking, annotating, and archiving articles offline
- - [Hypothes.is](https://web.hypothes.is/) a web/pdf/ebook annotation tool that also archives content
- - [Reminiscence](https://github.com/kanishka-linux/reminiscence/) extremely similar to ArchiveBox, uses a Django backend + UI and provides auto tagging and summary features with NLTK
- - [Shaarchiver](https://github.com/nodiscc/shaarchiver) very similar project that archives Firefox, Shaarli, or Delicious bookmarks and all linked media, generating a markdown/HTML index
+---
+
+### Other ArchiveBox Alternatives
+ - **[Polarized](https://getpolarized.io/) a desktop application for bookmarking, annotating, and archiving articles offline**
+ - **[Hypothes.is](https://web.hypothes.is/) a web/pdf/ebook annotation tool that also archives content**
+ - **[Reminiscence](https://github.com/kanishka-linux/reminiscence/) extremely similar to ArchiveBox, uses a Django backend + UI and provides auto tagging and summary features with NLTK**
+ - **[Shaarchiver](https://github.com/nodiscc/shaarchiver) very similar project that archives Firefox, Shaarli, or Delicious bookmarks and all linked media, generating a markdown/HTML index**
  - [ReadableWebProxy](https://github.com/fake-name/ReadableWebProxy) A proxying archiver that downloads content from sites and can snapshot multiple versions of sites over time
  - [Memex by Worldbrain.io](https://github.com/WorldBrain/Memex) a browser extension that saves all your history and does full-text search
  - [Perkeep](https://perkeep.org/) "Perkeep lets you permanently keep your stuff, for life."
@@ -74,6 +132,8 @@
  - [Erised](https://github.com/marvelm/erised) Super simple CLI utility to bookmark and archive webpages
  - [Zotero](https://www.zotero.org/) collect, organize, cite, and share research (mainly for technical/scientific papers & citations)

+---
+
 ### Smaller Utilities

 Random helpful utiltities for web archiving, WARC creation and replay, and more...
@@ -88,11 +148,16 @@ Random helpful utiltities for web archiving, WARC creation and replay, and more.
  - https://en.wikipedia.org/wiki/Furl
  - [And many more on the other lists...](#the-master-lists)

+---
+
 ## Reading List

 A collection of blog posts and articles about internet archiving, contact me / open an issue if you want to add a link here!

 ### Blogs
+
+
+
  - https://blog.archive.org
  - https://netpreserveblog.wordpress.com
  - https://ws-dl.blogspot.com
@@ -126,10 +191,12 @@ A collection of blog posts and articles about internet archiving, contact me / o

 If any of these links are dead, you can find an archived version on https://archive.sweeting.me.
-## ArchiveBox Discussions in News & Social Media
+### ArchiveBox Discussions in News & Social Media

- - [ProductHunt](https://www.producthunt.com/posts/archivebox)
- - [AlternativeTo](https://alternativeto.net/software/archivebox/)
+
+
+ - **[ProductHunt](https://www.producthunt.com/posts/archivebox)**
+ - **[AlternativeTo](https://alternativeto.net/software/archivebox/)**
  - [Hacker News Discussion](https://news.ycombinator.com/item?id=14272133)
  - [Reddit r/selfhosted Discussion #1](https://www.reddit.com/r/selfhosted/comments/69eoi3/pocket_stream_archive_your_own_personal_wayback/)
  - [Reddit r/selfhosted Discussion #2](https://www.reddit.com/r/selfhosted/comments/an2368/archivebox_the_opensource_selfhosted_web_archive/)
@@ -139,14 +206,18 @@ If any of these links are dead, you can find an archived version on https://arch
  - [Recurse Center: The Joy of Computing](https://joy.recurse.com/posts/26-bookmark-archiver-your-own-self-hosted-way-back-machine)
  - [Python Trending Twiter](https://twitter.com/pythontrending/status/1092492387182628865)

+---
+
 ## Communities

 ### Most Active Communities

- - [The Internet Archive (Archive.org)](https://archive.org/iathreads/forums.php) (USA)
- - [International Internet Preservation Consortium (IIPC)](http://netpreserve.org/) (International)
- - [The Archive Team](https://www.archiveteam.org/), [URL Team](https://www.archiveteam.org/index.php?title=URLTeam), [r/ArchiveTeam](https://reddit.com/r/ArchiveTeam) (International)
- - [r/DataHoarder](https://www.reddit.com/r/DataHoarder), [r/Archivists](https://www.reddit.com/r/Archivists/), [r/DHExchange](https://www.reddit.com/r/DHExchange/) (International)
+
+
+ - **[The Internet Archive (Archive.org)](https://archive.org/iathreads/forums.php)** (USA)
+ - **[International Internet Preservation Consortium (IIPC)](http://netpreserve.org/)** (International)
+ - **[The Archive Team](https://www.archiveteam.org/), [URL Team](https://www.archiveteam.org/index.php?title=URLTeam), [r/ArchiveTeam](https://reddit.com/r/ArchiveTeam)** (International)
+ - **[r/DataHoarder](https://www.reddit.com/r/DataHoarder), [r/Archivists](https://www.reddit.com/r/Archivists/), [r/DHExchange](https://www.reddit.com/r/DHExchange/)** (International)
  - [The Eye](https://the-eye.eu) Non-profit working on content archival and long-term preservation (Europe)
  - [Digital Preservation Coalition](https://www.dpconline.org/about) & their [Software Tool Registry (COPTR)](http://coptr.digipres.org/Main_Page) (UK & Wales)
  - [Archives Unleashed Project](https://archivesunleashed.org/about-project/) and [UAP Github](https://github.com/archivesunleashed) (Canada)
@@ -154,6 +225,8 @@ If any of these links are dead, you can find an archived version on https://arch

 ### Web Archiving Communities

+
+
  - [Canadian Web Archiving Coalition](https://www.carl-abrc.ca/advancing-research/digital-preservation/cwac/) (Canada)
  - [Web Archives for Historical Research Group](https://uwaterloo.ca/web-archive-group/about) (Canada)
  - [Smithsonian Institution Archives: Digital Curation](https://siarchives.si.edu/what-we-do/digital-curation) (Washington D.C., USA)
@@ -170,6 +243,8 @@ If any of these links are dead, you can find an archived version on https://arch

 ### General Archiving Foundations, Coalitions, Initiatives, and Institutes

+
+
  - [Community Archives and Heritage Group](https://www.communityarchives.org.uk/content/about/history-and-purpose) (UK & Ireland)
  - [Open Preservation Foundation (OPF)](https://openpreservation.org/about/organisation/) (UK & Europe)
  - [Software Preservation Network](https://www.softwarepreservationnetwork.org/about/) (International)