Another Refactoring Story: ActiveRecord Lists

I’ve now tried to write this post like three times, and one thing that’s clear to me again is why you don’t see more write-ups of even moderately complicated real-ish problems. It’s hard to get enough of the context across to make the code sensible without being overwhelming.

But there’s a Rails and ActiveRecord structure that this example gets to that I think is useful, so I’m going to try again. (Spoiler alert: the trick is refactoring ActiveRecord has_many lists to wrap them in a dedicated list class.)

Here’s the context…

Elmer is a project tracking tool and the main unit is the Card. I’ve talked about Elmer’s code a few times already.

Cards have some data attached. For our purposes here, the important attribute is status, representing where the card is along its path. The current path is “attic” to “unstarted” to “started” to “done”. Cards can move back and forth through the status path, so a card could be “done” but then it might retreat back to “started” for some reason or another.

For project data collection purposes, Elmer keeps track of when a card enters a new state. There’s a class called Calculator that takes a card and calculates how many days it was in progress, how many business days, and whether it’s currently part of the active project.

For historical comparison purposes, I added functionality by which I can get the historical data for a card, to give me its status in the project and its estimated size for any day that the card has existed. The idea is to use this data to see if the projected end date is moving or not.

However, with this feature added, Elmer now has two completely independent ways of determining the history of a card’s status:

Cards have attributes called date_added, date_started and date_finished. These attributes are updated when the card’s state changes using state-change methods called advance! and retreat.
Cards also have a list of associated objects called CardSnapshot. Each CardSnapshot represents a a date and the status or size of a card on that date. The entire list of snapshots is a history of the card over time. These snapshots are created or updated using ActiveRecord callbacks when the card is saved if the save changes the size or status. Complicating things, there is only one snapshot per day, if the size or status changes again on the same day, the existing snapshot for that day is updated rather than a new one created. I do this because Elmer doesn’t care about units of time shorter than a day.

These two mechanisms are redundant — the information in the card attributes can be completely gleaned by examining the Card Snapshots.

This is duplication in the strict sense of what Don’t Repeat Yourself (DRY) means. If I want to know when a card was started, I have two sources of truth, and it’s not hard to see how they might get out of sync. Just updating the status without using the special advance! or retreat! methods would do it.

Using the attributes on Card has two problems. One is that the attribute names are hard-coded to match the names of the statuses, which would be a problem if and when I ever decide to change the status names.

More importantly, there’s a requirement for the historical data that I’m not sure how to do with just the attributes — the ability to recreate the state of the card as it appeared on a particular day, no matter how the card has moved back or forth since then.

In other words, if the card was done on Tuesday, but we revert it back to started on Wednesday then finish it again on Thursday, if we look back and generate historical graphs then Tuesday’s historical data should reflect the card being done, and Wednesday’s shouldn’t, even though the card is finished again in the future.

I don’t think you can do this with the attributes alone, going back and forth past a given status wipes out older times you might have entered the status. You can do this with the data history, but it’s not clear to me as I start exactly how I want to manage it. Specifically, the Calculator still depends on the Card’s present-day state in a way that confounds doing historical data.

So, the plan is to replace the attributes with the snapshots, first of all to remove the duplication and then see what that gives us.

I’d also like to preserve the API — I have a lot of code that depends on the existence of getters like card.date_started, and I’d prefer not to change it.

Luckily for me, at this point the basic behavior of both the original attributes and the card snapshots is well covered by tests, so I’m pretty confident that I can start with my existing tests and if they all pass, I’m likely fine.

What I want to do is replace all attribute getters like date_started with new versions that look through the list of snapshots to infer the correct date.

First, I removed all the attribute columns via a migration

class MoveDateToSnapshot < ActiveRecord::Migration[7.0]
  def change
    remove_column(:cards, :date_added, :date)
    remove_column(:cards, :date_deleted, :date)
    remove_column(:cards, :date_finished, :date)
    remove_column(:cards, :date_started, :date)
  end
end

There’s no real data yet, and even if there was, all the running projects already have snapshots, so nothing should be lost here.

Replacing the attributes calls for a bunch of similar methods that all have the same pattern.

The simple logic here would be to search for the one and only snapshot in the list whose status matches what we are looking for and return its date.

However, in the actual data, the card might have gone back and forth over the actual status leading to multiple snapshots with the same status, or there might be zero cards with the same status if the card has jumped through multiple statuses on the same day.

Because I’m me, I metaprogammed them¹ — this is the quick to green, un-refactored version:

Status.sorted_statuses.each do |a_status_value|
  define_method(:"date_#{a_status_value.verbed}") do
    return nil unless achieved?(a_status_value)
    card_snapshots
	  .select { _1.advance? }
	  .sort_by { [-_1.status, _1.snapshot_date] }
      .select { _1.status_value.after_or_eq?(a_status_value) }
      .last
	  &.snapshot_date
  end
end

This gives us a set of attribute methods for each status that look like this sample:

def date_started
  return nil if unless achieved?(Status["started"])
  card_snapshots
    .select { _1.advance? }
    .sort_by { [-_1.status, _1.snapshot_date] }
    .select { _1.status_value.after_or_eq?(Status["started"]) }
    .last
	&.snapshot_date
end

So, the code:

Returns nil if the card has not currently achieved the status — in the sample method, if the current state of the card is before :started, return nil
Then we filter out the card snapshots to try and find the most recent relevant one: first we limit our list to only snapshots that are an advance?, meaning that they have moved the status forward from the previous snapshot.
Then we sort such that the latest statuses are first, with ties broken by snapshot date, filter down to the remaining snapshots that are after or equal the one we are testing, and pick the last one (trust me), and grab its snapshot_date.

All told, this should handle the weird cases where there are more than one or less than one matching snapshot. I think…

The exact details of the algorithm don’t matter — even though I just spent 500 words explaining them. What’s important is this:

All the history calculations are now completely separated from the card itself — all the data we need for the calculation is encapsulated in the list of snapshots. We do use the current status of the card, but, importantly, that’s equivalent to the status of the most recent snapshot.

In other words, we can refactor these attributes to this separate class:

class CardSnapshotList
  include Enumerable
  include HasValueObject
  attr_reader :card_snapshots

  value_object_for(:status, Status)
  value_object_for(:size, Size, default_value: "sm")

  def initialize(*card_snapshots)
    @card_snapshots = card_snapshots.flatten
  end

  delegate :each, :empty?, to: :card_snapshots

  delegate :status, :project, :in_project?, :size, to: :last
  def last
    card_snapshots.max_by(&:snapshot_date)
  end

  def achieved?(a_status_value)
    status_value.after?(a_status_value)
  end

  Status.sorted_statuses.each do |a_status_value|
    define_method(:"date_#{a_status_value.verbed}") do
      return nil unless achieved?(a_status_value)
      last_snapshot_at_or_beyond(a_status_value)&.snapshot_date
    end
  end

  def last_snapshot_at_or_beyond(a_status_value)
    in_status_order.select { _1.advance? }
      .select { _1.status_value.after_or_eq?(a_status_value) }
      .last
  end

  def in_status_order
    sort_by { [-_1.status, _1.snapshot_date] }
  end
end

We also need a little glue in the Card class to make the attributes continue to delegate appropriately:

def card_snapshot_list
  CardSnapshotList.new(card_snapshots)
end

Status.sorted_statuses.each do |a_status_value|
  delegate :"date_#{a_status_value.verbed}", to: :card_snapshot_list
end

This all works, the tests pass. There is probably a performance issue eventually about how advance? is calculated, but that should be solvable should it ever be an issue.

So…

We’ve fixed our issues with having duplicate sources of truth, and the code much better suited to handle different kinds of statuses over time.

The code has gotten more complicated, but so has the business logic — if we weren’t also trying to allow statuses to go back and forth, the code would still be pretty simple.

Yes, there’s metaprogramming, and I know, but if you wanted to get rid of it you could replace all the dynamic date_started methods with something like date_of_status(:started), it’d all work pretty much the same.

More to the point, we’ve completely solved our issue with being able to recreate historical status exactly as it looked on an arbitrary day.

Here’s how:

Before this refactor, the Calculator class that does all the of calculations on a card to get the duration of time spent on it, and so on, takes a Card as an argument.

But with all the timing data now siphoned off into CardSnapshotList, we can pass that class a CardSnapshotList instead. Everything still works exactly same because we’ve preserved the method interface, both classes respond to date_started and card_snapshots.

And once we can do duration calculations on a whole set of snapshots, we can do duration calculations on a partial set of snapshots…

def at(previous_date)
  CardSnapshotList.new(card_snapshots.select { _1.snapshot_date <= previous_date })
end

The method can go either in Card or CardSnapshotList, it works either way.

All we need to do is pass a truncated list of the card snapshots through the day we care about, and we’ll have preserved the state of the card as it looked on that day.

This trick — encapsulating lists of objects, especially has_many ActiveRecord relations — is a useful one that I don’t see suggested very often. (I’ve often felt it’d be interesting for Rails to create list classes for ActiveRecord models by default). Hopefully this example starts to show you why that is.

In many cases there’s some kind of custom list traversal with a domain-specific meaning. For example, you might have

A custom way of calculating a total value of the list — for example a list of Elmer cards might have custom size calculation.
Other custom ways of converting the list to a single value, as this case converts a list of snapshots to single start values for each status.
A custom way of filtering, for example a list of Elmer cards might have a constructor or factory that takes a search.
Custom conversions: a list of Elmer cards might need to convert to a list of their statuses.

The point is that if those domain specific features of lists are useful in one context, there is a really good chance they are useful in others — especially if the ActiveRecord class has a relationship with multiple parent classes

I notice, for example, that the Elmer Project class right now has four methods that start with either cards.select or cards.map. And one method that starts with people.map. The Elmer Team class also has_many cards and has_many people, and it’s highly likely that Team is going to need most or all of those methods on a Team list of cards once I do cross-project team wide displays. (This doesn’t even count searching through cards, which I haven’t written yet). Creating a CardList class means that behavior is encapsulated. And this example shows that once the behavior is encapsulated, it’s also possible that new behavior will be easier to write and manage.

Let’s just do this. Here’s an ActiveRecord concern:

concern :HasListClass do
  class_methods do
    def list_class_for(attribute, class_name = nil)
      class_name ||= "#{attribute.to_s.classify}List"

      define_method :"#{attribute.to_s.singularize}_list" do
        class_name.constantize.new(send(attribute))
      end
    end
  end
end

The classify method is an ActiveModel feature that converts the attribute name from something like :card_snapshots to the default class name CardSnapshotList, including converting the name to a singular noun. (It’s what ActiveRecord already uses for table names from has_many declarations.) The defined method does exactly what our glue method did above: creates an instance of the list class with the internal list as an argument.

Then our existing Card code can change to:

includes HasListClass
list_class_for(:card_snapshots)

And the method card_snapshot_list still exists and works.

It’s not much of a change, but we’re creating a new convention, and now I can easily adjust the project class

includes HasListClass
list_class_for(:people)
list_class_for(:cards)

This gives me methods called card_list and person_list (the CardList and PersonList classes have to exist for the methods to work).

Wait, while I’m here, let’s take it another step, we’re going to want common features to all the list classes, here’s a parent class pulling out from CardSnapshotList

class ActiveRecordList
  include Enumerable

  attr_reader :elements

  def self.from(elements_or_list)
    if elements_or_list.is_a?(self)
      return elements_or_list
    end
    new(elements_or_list)
  end

  def initialize(*elements)
    @elements = elements.flatten
  end

  delegate :each, :empty?, to: :elements
end

Now I can make CardSnapshotList a subclass of ActiveRecord list and remove the duplicate lines from that class, and I’ve got a foundation of common behavior for my list classes that will make them instantly Enumerable and instantly useful.

I can then move all the cards.select and cards.map method from Project to the new CardList class. It works, but there are some references that need to change from, say project.cards_at to project.card_list.at. I think I’m okay with that change, if it bothers me, I can always add a delegate call.

And that’s refactoring to encapsulate a list of objects in ActiveRecord.

I also did setters, which I needed temporarily to make the tests pass, but they were very slow and it wound up being easier to change the tests to create and update snapshots directly. I’m not showing the setters here because they would up being unrelated to the general point. ↩︎

Related Pages