Creating an RSS feed from HTML in Python

To add the blog, copy and paste this URL into your RSS reader, or click it: https://ljmartin.github.io/blog/feed.xml

As (some) people (like me) use Twitter less, I wanted a better way to share blog entries. Recently I've been following blogs on a free, open-source RSS reader called NetNewsWire, so it made sense to start an RSS feed, too.

This blog is built using Emacs: I copy an HTML file and edit the fields for each entry. That means there's no associated RSS feed. A million tools purport to automate this, but none worked for me until I found this helpful post. It uses a 9-year-old Python script that's been maintained this whole time, and was originally inspired by a script from compchemist Andrew Dalke in 2003!

So here's what this looks like. Hopefully it's useful for someone else too:

from bs4 import BeautifulSoup   # HTML parser
from rfeed import Feed, Item   # RSS generator
from datetime import datetime
import os
import re

# Get all the HTML files in the blog section
dirname = '/Users/ljmartin/Documents/GitHub/ljmartin.github.io/blog/'
htmls = [x for x in os.listdir(dirname) if x.endswith('.html')]
htmls.sort(reverse=True) # sort newest to oldest

#add each blog item:
items = []
for f in htmls:
 
    #keep only files whose name contains two digits followed by an underscore, e.g. '15_rss.html'
    if not re.search(r'[0-9][0-9]_', f):
        continue
        
    print(f)
    with open(os.path.join(dirname, f)) as of:
        soup = BeautifulSoup(of, 'html.parser')
        txt = soup.find_all('main')[0]
        t = soup.find('meta', attrs={"itemprop": "datePublished"})
        date = t['content']
        items.append(
            Item(
                author = 'Lewis J. Martin',
                pubDate = datetime.strptime(date, '%Y-%m-%d'),
                description = str(txt)   # pass the <main> HTML as a plain string
            ))
        
        
#wrap it into a 'feed'
feed = Feed(
    title = 'LJM CompMedChem',
    link = "https://ljmartin.github.io/",
    items = items,
    description = 'Sideprojects and code snippets for compchemistry'
)

#and write:
rss = feed.rss().replace("–", "--")     # replace en dashes with an ASCII "--"
# write RSS feed to feed.xml
with open("feed.xml", "w") as file:
    file.write(rss)
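
Before pushing the feed, it can be worth checking that it parses and that the items come out newest first. Here is a minimal sketch of such a check using only the standard library. It is not part of the script above: the `summarize_feed` helper and the `SAMPLE_FEED` string are made up for illustration, and the sample assumes the usual RSS 2.0 layout (a `channel` containing `item` elements with `pubDate` and `description`).

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical two-item feed, standing in for the contents of feed.xml.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <link>https://example.com/</link>
    <description>Example feed</description>
    <item>
      <description>&lt;main&gt;Newer post&lt;/main&gt;</description>
      <pubDate>Tue, 02 Jan 2024 00:00:00 GMT</pubDate>
    </item>
    <item>
      <description>&lt;main&gt;Older post&lt;/main&gt;</description>
      <pubDate>Mon, 01 Jan 2024 00:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>
"""


def summarize_feed(xml_text):
    """Return (ISO date, description length) for each <item>, in feed order."""
    channel = ET.fromstring(xml_text).find("channel")
    summary = []
    for item in channel.findall("item"):
        # RSS dates use the RFC 822 style, which email.utils can parse.
        published = parsedate_to_datetime(item.findtext("pubDate"))
        description = item.findtext("description", default="")
        summary.append((published.date().isoformat(), len(description)))
    return summary


entries = summarize_feed(SAMPLE_FEED)
dates = [date for date, _ in entries]
# ISO dates sort correctly as strings, so this checks newest-to-oldest order.
newest_first = dates == sorted(dates, reverse=True)
print(entries, "newest first:", newest_first)
```

To run the same check on the real feed, pass the text of feed.xml to `summarize_feed` instead of the sample string.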