In early 2020, it became really difficult to read books. I'm slowly starting to get past that at last, but while Goodreads was super fun for a few years, right now tracking and counting and rating just makes reading feel like a chore and Not Fun as a hobby.
I still would like to keep a record of what I read, though. And I used to write book reviews on this blog before getting onto Goodreads. It seems like a nice way to keep a record of what I read without it becoming a number thing. I'll probably still write Goodreads reviews for small authors since I know it can help, but I feel less pressed about having everything there.
I also wanted to save the reviews I wrote only on Goodreads here, so I wrote a script to migrate them into Pelican-friendly Markdown pages, since that's what powers this blog now. I decided to keep a single entry per year rather than one entry per book, since I had a few good reading years in there (and others with only a single review!).
Step 1: CSV export of the books
First, if you go to 'My books' and find the 'Tools' menu at the bottom of the left-side menu on Goodreads, you'll find a page to export your library. You may have to try a couple of times: my first export only had a handful of books. The second one looked more comprehensive, although the number of books was off by two, but what can you do.
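Since an export can come out incomplete, a quick sanity check before going further can help. This is only a sketch: `summarize_export` is a name made up for illustration, and it assumes the export has a `Date Read` column whose value starts with a four-digit year.

```python
import csv
from collections import Counter


def summarize_export(path):
    """Return (total rows, Counter of rows per 'Date Read' year)."""
    total = 0
    per_year = Counter()
    # newline='' lets the csv module handle line breaks inside quoted fields.
    with open(path, newline='', encoding='utf-8') as csvfile:
        for row in csv.DictReader(csvfile):
            total += 1
            date_read = (row.get('Date Read') or '').strip()
            # Assumption: the date starts with a four-digit year.
            # Rows without a read date are counted under 'no date'.
            per_year[date_read[:4] if date_read else 'no date'] += 1
    return total, per_year
```

Calling `summarize_export('goodreads_library_export.csv')` and printing the two values gives a total to compare against the number Goodreads shows for your library, plus a rough per-year breakdown.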
Step 2: Python script to create the Markdown pages
This is the script I wrote to extract only the books I've actually read. I don't really care about DNF (did not finish) and to-read books right now. I also hardcoded the years relevant to me. May someone find something helpful in here!
import csv
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Review:
    title: str
    author: str
    # Kept as the raw 'Date Read' string from the CSV (not parsed into a date).
    date_read: str
    review: str
    rating: int
def get_reviews(year):
    reviews = []
    # newline='' lets the csv module handle line breaks inside quoted reviews.
    with open('goodreads_library_export.csv', newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            try:
                r = Review(row['Title'],
                           row['Author'],
                           row['Date Read'],
                           row['My Review'],
                           int(row['My Rating']))
            except ValueError:
                # When the int() cast fails, usually it means the CSV is
                # corrupted for that line. In my case, it was for a few to-read
                # records so I ignore them rather than attempt to fix the
                # original CSV. You can print the row here if you want to check
                # what's failing.
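                # Skip the row: otherwise `r` would be undefined, or left over
                # from the previous row, when we reach the check below.
                continue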
            if year is None or year in r.date_read:
                reviews.append(r)
    return reviews
def rating_or_review(review):
    # If I wrote a review, return that
    if review.review:
        return review.review
    # Otherwise, make the rating into words.
    if review.rating >= 4:
        return "I really enjoyed it."
    elif review.rating == 3:
        return "It was fine."
    else:
        return "Wasn't for me."
def format_reviews(reviews, year):
    with open(f'book-reviews-{year}.md', 'w') as f:
        f.write(f"Title: Book reviews: Year {year}\n")
        f.write(f"Date: {datetime.now().isoformat()}\n")
        f.write("tags: book review\n\n")
        for r in reviews:
            # A couple of abandoned books sneaked in with a '0' rating, and I'm
            # not interested in preserving those
            if r.rating != 0:
                f.write(f"## {r.title} by {r.author}\n\n")
                f.write(f"{rating_or_review(r)}\n")
                f.write("\n")
for year in range(2013, 2023):
    reviews = get_reviews(str(year))
    # Chronological order
    reviews = sorted(reviews, key=lambda r: r.date_read)
    format_reviews(reviews, year)
The hardest part was probably deciding what text to convert a rating into, since I didn't want to keep numbers!
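One thing the script glosses over: `Date Read` is used exactly as it comes out of the CSV, which is a string. That's enough for filtering by year and sorting, but if you'd rather have real date objects, something like this might work. It's a sketch, not what I ran: `parse_date_read` is a hypothetical helper, and it assumes the export writes dates as YYYY/MM/DD.

```python
from datetime import date, datetime


def parse_date_read(value):
    """Turn a 'Date Read' cell into a date, or None if empty or unparseable."""
    value = (value or '').strip()
    if not value:
        return None
    try:
        # Assumption: the export writes dates as YYYY/MM/DD.
        return datetime.strptime(value, '%Y/%m/%d').date()
    except ValueError:
        # Unexpected format: treat it like a missing date.
        return None
```

Returning `None` for anything odd keeps the bad rows visible instead of crashing halfway through the file.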
Step 3: Checking the output looks right and recalling fond memories
I used the 'Year in Books' pages on Goodreads to compare the results. There was some funkiness sometimes, like a book read in 2011 showing up in 'Year in Books' but without any shelves and with a date read showing as 2020, even though I don't remember messing with it. The review also shows as Jan 2020 on the Goodreads UI despite appearing in the correct 'Year in Books'. 2020 turned out to be date_added (which is definitely wrong) while the date_read field is empty. Maybe some data migration funkiness on the Goodreads side at some point during the last 12 years. Other than that, there was one duplicate and a couple of intra-Goodreads links that didn't work.
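Books with an empty date read are easy to miss, because the per-year filter silently drops them. A small check along these lines can list them for a manual look. It's a sketch using the same CSV columns as the script; `rated_but_undated` is a made-up name, and treating a rating of '0' as "no rating" is an assumption.

```python
import csv


def rated_but_undated(path):
    """Return titles of rows that have a rating or review but no 'Date Read'."""
    titles = []
    with open(path, newline='', encoding='utf-8') as csvfile:
        for row in csv.DictReader(csvfile):
            has_date = bool((row.get('Date Read') or '').strip())
            rating = (row.get('My Rating') or '').strip()
            has_review = bool((row.get('My Review') or '').strip())
            # Assumption: '0' means no rating was given.
            has_rating = rating not in ('', '0')
            if not has_date and (has_rating or has_review):
                titles.append(row.get('Title', ''))
    return titles
```

Anything this returns is a book I rated or reviewed that won't land in any year's page, so it's worth checking against 'Year in Books' by hand.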
I still have to clean up the file for 2022. I was getting annoyed with tracking myself so I didn't write reviews, but if it's for the blog I wouldn't mind adding a few notes. And I need to decide if I want to post my 2023 reviews as I go, or batch them in some way!