Extracting reviews from Goodreads into Markdown pages

In early 2020, it became really difficult to read books. I'm slowly starting to get past that at last, but while Goodreads was super fun for a few years, right now tracking and counting and rating just makes reading feel like a chore and Not Fun as a hobby.

I still would like to keep a record of what I read, though. And I used to write book reviews on this blog before getting onto Goodreads. It seems like a nice way to keep a record of what I read without it becoming a number thing. I'll probably still write Goodreads reviews for small authors since I know it can help, but I feel less pressed about having everything there.

I also wanted to save the reviews I wrote only on Goodreads here, so I wrote a script to migrate them into Pelican-friendly Markdown pages, since that's what powers this blog now. I decided to keep to a single entry per year rather than one entry per book, since I had a few good reading years in there (and others with only a single review!).

Step 1: CSV export of the books

First, if you go to 'My books' and find the 'Tools' menu at the bottom of the left-side menu on Goodreads, you'll find a page to export your library. You may have to try a couple of times: my first export only had a handful of books, and the second one looked more comprehensive although the number of books was still off by two, but what can you do.
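Since an export can come out incomplete, a quick count of the rows can help decide whether to try again. This is only a minimal sketch: `summarise_export` is a made-up helper name, and it assumes the export has a 'Date Read' column, which is the one the script in Step 2 relies on.

```python
import csv


def summarise_export(path):
    """Count the rows in the export and how many have a 'Date Read' value."""
    total = 0
    with_date_read = 0

    with open(path, newline='', encoding='utf-8') as csvfile:
        for row in csv.DictReader(csvfile):
            total += 1
            # An empty 'Date Read' usually means the book isn't marked as
            # read, although Goodreads data can be inconsistent here.
            if (row.get('Date Read') or '').strip():
                with_date_read += 1

    return total, with_date_read
```

Calling `summarise_export('goodreads_library_export.csv')` returns the two counts as a tuple, which can then be compared against the numbers shown on the Goodreads shelves.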

Step 2: Python script to create the Markdown pages

This is the script I wrote to extract only the books I've actually read. I don't really care about DNF (did not finish) and to-read, right now. I also hardcoded the years relevant to me. May someone find something helpful in here!

import csv
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Review:
    title: str
    author: str
    # Kept as the raw string from the CSV, it is never parsed into a date
    date_read: str
    review: str
    rating: int


def get_reviews(year):
    reviews = []

    with open('goodreads_library_export.csv', newline='',
              encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile)

        for row in reader:
            try:
                r = Review(row['Title'],
                           row['Author'],
                           row['Date Read'],
                           row['My Review'],
                           int(row['My Rating']))
            except ValueError:
                # When the int() cast fails, usually it means the CSV is
                # corrupted for that line. In my case, it was for a few to-read
                # records so I ignore them rather than attempt to fix the
                # original CSV. You can print the row here if you want to check
                # what's failing.
                # Skip the row: falling through would reuse the previous
                # review (or fail if the very first row is the broken one).
                continue

            if year is None or year in r.date_read:
                reviews.append(r)

    return reviews


def rating_or_review(review):
    # If I wrote a review, return that
    if review.review:
        return review.review

    # Otherwise, make the rating into words.
    if review.rating >= 4:
        return "I really enjoyed it."
    elif review.rating == 3:
        return "It was fine."
    else:
        return "Wasn't for me."


def format_reviews(reviews, year):
    with open(f'book-reviews-{year}.md', 'w') as f:
        f.write(f"Title: Book reviews: Year {year}\n")
        f.write(f"Date: {datetime.now().isoformat()}\n")
        f.write("tags: book review\n\n")

        for r in reviews:
            # A couple of abandoned books sneaked in with a '0' rating, and I'm
            # not interested in preserving those
            if r.rating != 0:
                f.write(f"## {r.title} by {r.author}\n\n")
                f.write(f"{rating_or_review(r)}\n")
                f.write("\n")


for year in range(2013, 2023):
    reviews = get_reviews(str(year))
    # Chronological order
    reviews = sorted(reviews, key=lambda r: r.date_read)
    format_reviews(reviews, year)
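
The script keeps 'Date Read' as a plain string, which is enough for the year check and for sorting as long as the dates are written year first. If real date objects are wanted, something like the sketch below could work. It assumes the export writes dates as YYYY/MM/DD, which is worth checking against your own file, and `parse_date_read` is a hypothetical helper that is not part of the script above.

```python
from datetime import date, datetime


def parse_date_read(value):
    """Turn a 'Date Read' string into a date, or None if empty or malformed."""
    value = (value or '').strip()
    if not value:
        return None

    try:
        # Assumption: dates look like 2020/01/15 in the export.
        return datetime.strptime(value, '%Y/%m/%d').date()
    except ValueError:
        return None
```

With this, the year filter could compare `parse_date_read(row['Date Read']).year` to an integer instead of searching for the year inside a string, at the cost of handling the `None` case for books without a read date.
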

The hardest part was probably to decide what text to convert a rating into, since I didn't want to keep numbers!

Step 3: Checking the output looks right and recalling fond memories

I used the 'Year in Books' pages on Goodreads to compare the results. There was some funkiness, like a book I read in 2011 that shows up in the correct 'Year in Books' but without any shelves and with a date read of 2020, even though I don't remember messing with it. The review also shows as Jan 2020 in the Goodreads UI. In the CSV, 2020 turned out to be the date_added value (which is definitely wrong) while the date_read field is empty. Maybe some data migration funkiness happened on the Goodreads side at some point during the last 12 years. Other than that, I found one duplicate and a couple of intra-Goodreads links that didn't work.
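
Books like that one fall through the script, since an empty read date never matches any year. A small check along these lines could list them so they can be fixed by hand. It is a sketch under assumptions: `rated_without_date_read` is a made-up name, and 'Date Added' is assumed to be the CSV header behind the date_added value mentioned above.

```python
import csv


def rated_without_date_read(path):
    """Return (title, date_added) pairs for rated rows with no 'Date Read'."""
    missing = []

    with open(path, newline='', encoding='utf-8') as csvfile:
        for row in csv.DictReader(csvfile):
            rating = (row.get('My Rating') or '').strip()
            date_read = (row.get('Date Read') or '').strip()
            # A non-zero rating with no read date suggests a book that was
            # read but gets skipped by a filter on 'Date Read'.
            if rating not in ('', '0') and not date_read:
                missing.append((row.get('Title', ''),
                                row.get('Date Added', '')))

    return missing
```

Running it on `goodreads_library_export.csv` gives a short list to compare against the 'Year in Books' pages.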

I still have to clean up the file for 2022. I was getting annoyed with tracking myself so I didn't write reviews, but if it's for the blog I wouldn't mind adding a few notes. And I need to decide if I want to post my 2023 reviews as I go, or batch them in some way!

jpichon.net
