Creating an RSS feed from HTML in Python
To add the blog, copy and paste this URL into your RSS reader, or click it: https://ljmartin.github.io/blog/feed.xml
As (some) people (like me) use Twitter less, I wanted a better way to share blog entries. Recently I've been following blogs on a free, open-source RSS reader called NetNewsWire, so it made sense to start an RSS feed, too.
This blog is built using Emacs: I copy an HTML file and edit the fields for each entry. That means there's no associated RSS feed. A million tools purport to automate this, but none worked for me until I found this helpful post. It uses a 9-year-old Python script that's been maintained this whole time, and was originally inspired by a script from compchemist Andrew Dalke in 2003!
So here's what this looks like. Hopefully it's useful for someone else too:
from bs4 import BeautifulSoup # HTML parser
from rfeed import Feed, Item # RSS generator
from datetime import datetime
import os
import re
# Get all the HTML files in the blog section
dirname = '/Users/ljmartin/Documents/GitHub/ljmartin.github.io/blog/'
htmls = [x for x in os.listdir(dirname) if x.endswith('.html')]
htmls.sort(reverse=True) # sort newest to oldest
#add each blog item:
items = []
for f in htmls:
    # keep only files named like '15_rss.html' (two digits, then an underscore)
    if not re.search(r'[0-9][0-9]_', f):
        continue
    print(f)
    with open(os.path.join(dirname, f)) as of:
        soup = BeautifulSoup(of, 'html.parser')
    txt = soup.find_all('main')[0] # the post body
    t = soup.find('meta', attrs={"itemprop": "datePublished"})
    date = t['content'] # expected as YYYY-MM-DD
    items.append(
        Item(
            author = 'Lewis J. Martin',
            pubDate = datetime.strptime(date, '%Y-%m-%d'),
            description = str(txt) # serialize the <main> tag to an HTML string
        ))
#wrap it into a 'feed'
feed = Feed(
    title = 'LJM CompMedChem',
    link = "https://ljmartin.github.io/",
    items = items,
    description = 'Sideprojects and code snippets for compchemistry'
)
#and write:
rss = feed.rss().replace("–", "--") # replace en dashes with plain ASCII hyphens
# write RSS feed to feed.xml
with open("feed.xml", "w") as file:
    file.write(rss)
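Before pointing a reader at the new feed, it can help to check that the output parses as XML and that the items come out newest first. The following is only a minimal sketch using the standard library, not part of the script above: `summarize_feed` is a hypothetical helper, the `SAMPLE` string is a made-up two-item feed standing in for the contents of feed.xml, and it assumes the `pubDate` values are in the RFC 822 style that RSS 2.0 specifies.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def summarize_feed(xml_text):
    """Return the channel title and the parsed pubDate of every item."""
    channel = ET.fromstring(xml_text).find("channel")
    title = channel.findtext("title")
    dates = [
        parsedate_to_datetime(item.findtext("pubDate"))
        for item in channel.findall("item")
    ]
    return title, dates


# Hypothetical two-item feed (invented dates), standing in for feed.xml.
SAMPLE = """<rss version="2.0">
  <channel>
    <title>LJM CompMedChem</title>
    <link>https://ljmartin.github.io/</link>
    <description>Sideprojects and code snippets for compchemistry</description>
    <item>
      <description>&lt;main&gt;newer post&lt;/main&gt;</description>
      <pubDate>Tue, 15 Aug 2023 00:00:00 GMT</pubDate>
    </item>
    <item>
      <description>&lt;main&gt;older post&lt;/main&gt;</description>
      <pubDate>Sat, 01 Jul 2023 00:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""

title, dates = summarize_feed(SAMPLE)
# The feed is built from filenames sorted in reverse, so dates should descend.
newest_first = dates == sorted(dates, reverse=True)
print(title, len(dates), "items, newest first:", newest_first)
```

To run the same check on the real output, read feed.xml into a string and pass it to `summarize_feed` in place of `SAMPLE`. If `newest_first` comes back false, the filename order and the `datePublished` values probably disagree for some post.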