mirror of https://github.com/pirate/ArchiveBox.git synced 2025-09-03 11:22:37 +02:00

Use feedparser for RSS parsing in generic_rss and pinboard_rss parsers

The feedparser package has 20 years of history and is very good at parsing
RSS and Atom, so use it instead of ad-hoc regex and XML parsing.

The medium_rss and shaarli_rss parsers weren't touched because they are
probably unnecessary. (The special parser for pinboard is only needed because
of how tags work.)

Doesn't include tests because I haven't figured out how to run them in the
docker development setup.

Fixes #1171
jim winstead
2024-02-25 12:34:51 -08:00
parent 7b042c854a
commit 9f462a87a8
3 changed files with 34 additions and 50 deletions


@@ -15,6 +15,7 @@ dependencies = [
 "dateparser>=1.0.0",
 "django-extensions>=3.0.3",
 "django>=3.1.3,<3.2",
+"feedparser>=6.0.11",
 "ipython>5.0.0",
 "mypy-extensions>=0.4.3",
 "python-crontab>=2.5.1",