Noel Rappin Writes Here

How Not To Use Static Typing In Ruby

Posted on September 16, 2024



Last time, I took a short example and examined in some detail what you would gain by adding static typing to it and what it would cost to use static typing.

What I didn’t do was explain how I might handle the problem without static typing.

For reference, here’s the example again. Consider this to be part of a larger system and don’t worry too much about the rest of the world:

class CheckoutService
  def checkout(user, items, amount, status)
    # do some things
    ManagePayment.new.manage_payment(user, items, amount, status)
  end
end

class ManagePayment
  def manage_payment(user, items, amount, status)
    # make the user pay
    HandleShipping.new.handle_shipping(user, status)
  end
end

class HandleShipping
  def handle_shipping(user, status)
    send_item_to(user.address)
  end
end

The problem, as originally presented to me, was:

“Even though address isn’t used until the third step, none of the steps should happen if the user input doesn’t have an address”.

A statically typed system solves this problem by preventing you from passing an object to user if it isn’t of the class User, which presumably has an address attribute. A dynamically typed system needs to do something else.

In the previous article we talked about what other potential data validation errors might need to be handled, but in this one, I’m going to focus on that fast-fail if the type isn’t correct. The other potential validation issues still need to be handled, but that’s true in both cases, so I’ll focus on the structures that are specific to dynamic typing.

Also, I’m not going to worry about editor tooling here. Static typing is better for that, though some of these options do get tooling support from Solargraph, Ruby LSP, or RubyMine.

Option 1: Do Nothing

Really. This is an option. Okay, technically it doesn’t conform to the constraint, but in the real world, in this situation you should at least raise the question of how much damage is actually done if the code does not fast-fail.

I realize that this is kind of cheating for the purposes of the problem, but my whole point here is that there are scenarios where the tradeoff of the flexibility might be worth the occasional error (in practice this would mean waiting until the code gets to the actual call in handle_shipping to catch the error):

  • In some cases, something else is doing the data validation and you are pretty sure that the incoming data is going to be valid. (This is perhaps another thing that is more often true in the smaller team / less complex code setup). This is the “if a non-User gets here, 10 other terrible things will have already happened, and this code path is the least of our problems” scenario.
  • In a related case, you have a setup where something else will fail loudly in this code – in this example, you might have a call to address (or some other User-specific attribute) before the other calls happen, so it will naturally fail.
  • It might not actually be terrible for the point of failure to be in handle_shipping – the problem is set up so it is, but if this code isn’t actually about money or irreversible real-world outcomes, you should make sure that the constraint is real and not an obstacle. The simpler code that waits to catch the error may have benefits down the road. Malka Older has a quote in this book about “the human tendency to romanticize the imposition of unnecessary obstacles”, and wow does that apply to developers. Fight it.
  • Alternately, it might be easier to mitigate an error after it happens than to prevent it before it happens.
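To make the cost of doing nothing concrete, here’s a minimal sketch of what the late failure looks like. The classes are trimmed stand-ins for the example above, and the LOG constant is a hypothetical addition that exists only to show which steps ran before the error.

```ruby
LOG = [] # hypothetical: records which steps actually ran

class HandleShipping
  def handle_shipping(user, status)
    # The first use of user.address is where a bad user finally blows up
    LOG << "shipping to #{user.address}"
  end
end

class ManagePayment
  def manage_payment(user, items, amount, status)
    LOG << "charged #{amount}" # the payment step runs before shipping
    HandleShipping.new.handle_shipping(user, status)
  end
end

class CheckoutService
  def checkout(user, items, amount, status)
    ManagePayment.new.manage_payment(user, items, amount, status)
  end
end

begin
  # A String sneaks in where a user was expected
  CheckoutService.new.checkout("not a user", [], 100, :new)
rescue NoMethodError => e
  LOG << "failed: #{e.class}"
end
```

The error does surface, but only after the payment step has run. Whether that ordering is acceptable is the question the bullets above are asking.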

Anyway, doing nothing is an option here.

Option 1a: YARD

This is somewhere between doing nothing and actually type checking, but the YARD documentation tool does allow you to annotate with type information:

class CheckoutService

  # @param user [User]
  # @param items [Array<Item>]
  # @param amount [Integer]
  # @param status [Symbol]
  # @return [void]
  def checkout(user, items, amount, status)
    # do some things
    ManagePayment.new.manage_payment(user, items, amount, status)
  end
end

This doesn’t actually give you runtime tooling, but RubyMine does parse this and will give you editor hints based on the YARD comments (other tools might as well), plus you get nifty documentation to boot.

I feel there’s a very strong chance that we’ll get some kind of YARD -> RBS bridge sooner or later.

Option 2: Actually Type Check

You can trivially do a real runtime type-check in Ruby:

class CheckoutService
  def checkout(user, items, amount, status)
    raise RuntimeError unless user.kind_of?(User)
    # do some things
    ManagePayment.new.manage_payment(user, items, amount, status)
  end
end

And while I grant you that it’s less elegant than a static type declaration, it will prevent what we are told we want to prevent.

Most Ruby style guides will tell you to avoid this usage (see pages 361-2 of the Pickaxe, for example) for two basic reasons:

  • You are getting all the costs of static typing in terms of making the code less flexible and almost none of the benefits – Ruby editors and tools likely won’t accept that user is a User even after the guard clause.
  • If you are doing any logic based on the class of an object, you are potentially replicating what a late-binding system does and you should just use method calls.

Have I done this in Ruby? Yes. Most often, as we’ll see, because it’s useful in factory methods that do coercion. Less frequently, as a way to avoid monkey patching code I don’t own. But there’s almost always a better way than integrating the type check with the business logic.

Option 3: Modified Type Check

Most Ruby style guides will tell you that if you want to do logic based on the type of an object, don’t use the class – classes aren’t types in Ruby – but check whether the object responds to a method:

class CheckoutService
  def checkout(user, items, amount, status)
    raise RuntimeError unless user.respond_to?(:address)
    # do some things
    ManagePayment.new.manage_payment(user, items, amount, status)
  end
end

Pickaxe suggests this on page 362, including this joke:

“Will you get thrown out of the duck typing club if you check the parameter against a class? No, you won’t. The duck typing club doesn’t check to see whether you’re a member anyway”

I wish I could take credit for the joke, but it predates me and sounds like Dave.

From a Ruby style standpoint, respond_to? is considered incrementally better than kind_of?. Why? Because you keep the potential of dynamically extending the code. If we eventually have Customer objects here, the code will still work as long as they respond to address, but the code will still fail if you pass in a String or nil or something else random.

There are a couple of downsides, the main one being that if you are dependent on more than one method of user this gets unwieldy quickly.
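One way to picture the multi-method version is a small helper that checks a list of required methods. This is only a sketch: ensure_user_like!, REQUIRED_USER_METHODS, and the email method are hypothetical names, not part of the original example.

```ruby
# Hypothetical list of the methods this workflow needs from a user
REQUIRED_USER_METHODS = %i[address email].freeze

def ensure_user_like!(object)
  # Collect every required method the object does not respond to
  missing = REQUIRED_USER_METHODS.reject { |name| object.respond_to?(name) }
  return object if missing.empty?

  raise RuntimeError, "#{object.inspect} is missing: #{missing.join(', ')}"
end

# A stand-in class that is not a User but quacks like one
Customer = Struct.new(:address, :email)

ensure_user_like!(Customer.new("1 Main St", "c@example.com")) # passes
```

The list of required methods has to be kept in sync with what the code actually calls, which is a fair illustration of why this gets unwieldy.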

I find that I don’t actually use this one very often in practice; I usually jump to the next step.

Option 4: Coercion to Existing Classes

One way to look at static typing is that it’s preventative — you are dealing with complex and potentially messy data by limiting the kinds of data that can be passed to a method. You are building a fence around your code.

That’s obviously appealing, but the result is that you are placing a burden on any of the method’s callers to adjust their data to the shape you are expecting.

There are a couple of potential problems here. The theoretical problem is that this way of managing data inverts what is supposed to be the flow of knowledge in an object-oriented program, where a class is supposed to manage its own data. Another theoretical problem is that your wall might be too strict, that actually useful usages might be prevented. The practical problem is that if there are a lot of callers to a method, that’s potentially a lot of sites that are adding complexity to adjust their data to match the call site.

Alternately, we could make our method more welcoming in what data it accepts. This puts the complexity involved in making sure the data is the correct shape inside the method or class. Rather than a fence, this is a toll road in, a road in which you do have to get past a barrier.

One place this might work in our example is the status parameter, which presumably could be a string or a symbol.

Strings and symbols are a particularly fraught type issue in Ruby because they are so similar and because external data sources often don’t recognize symbols.

A very common type problem in Ruby goes like this:

  • Ruby code treats an attribute as a symbol
  • The attribute round-trips to a database or a JSON payload and comes back as a string.
  • The Ruby code does an equality test on the data against a symbol, which always fails and won’t raise an error. (Or the data is used as the key in a hash and never matches the symbol key, which is basically the same thing.)

I don’t consider this a reason to use static typing in general, but I do consider it an inconvenience in the way Ruby interacts with the outside world that needs to be dealt with.
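The round trip described above is easy to reproduce with the standard library’s JSON module. A minimal sketch, with a made-up order hash:

```ruby
require "json"

order = { status: :paid }

# Round trip through JSON: the symbol value comes back as a string
restored = JSON.parse(JSON.generate(order))
status = restored["status"]

status == :paid              # false, and no error is raised
{ paid: "ship it" }[status]  # nil, the string never matches the symbol key
status.to_sym == :paid       # true once the value is normalized
```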

I frequently normalize potential string/symbol data as it comes in — usually this would be in the initializer, but I don’t have an initializer in this example yet.

class CheckoutService
  def checkout(user, items, amount, status)
    status = status&.to_sym
    ManagePayment.new.manage_payment(user, items, amount, status)
  end
end

We could do something similar for amount, to convert it to BigDecimal or use the money gem.
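As a rough sketch of the BigDecimal version, assuming a hypothetical coerce_amount helper (the name and the decision to go through to_s are illustration choices, not something from the original code):

```ruby
require "bigdecimal"

# Hypothetical helper: accepts an Integer, Float, String, or BigDecimal
def coerce_amount(value)
  return value if value.is_a?(BigDecimal)

  # Going through to_s avoids passing a Float directly;
  # BigDecimal raises ArgumentError for non-numeric strings
  BigDecimal(value.to_s)
end

coerce_amount("12.50") # a BigDecimal equal to 12.5
coerce_amount(12)      # a BigDecimal equal to 12
```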

I will also do things like call symbolize_keys on a Hash to normalize the hash keys, or alternately use the Rails HashWithIndifferentAccess so I don’t need to care what the input type is.
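Outside of Rails, plain Ruby’s Hash#transform_keys gets you most of the way there. A small sketch with a made-up payload:

```ruby
# Keys as they might arrive from parsed JSON (hypothetical payload)
params = { "status" => "paid", "amount" => 100 }

# Plain-Ruby normalization; ActiveSupport's symbolize_keys is similar
normalized = params.transform_keys(&:to_sym)

normalized[:status]  # "paid"
normalized["status"] # nil, only symbol keys remain
```

Unlike HashWithIndifferentAccess, this picks one key type and sticks to it, so it only helps if the normalization happens at the boundary.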

The common argument against using things like HashWithIndifferentAccess is that it encourages or at least enables sloppiness on the part of the programmers. In the specific case of string/symbol, I don’t think that’s a problem – Rails developers have been using HashWithIndifferentAccess for 20 years, and I’d bet there’s a very high number who use it regularly and don’t know it exists.

In the general case, you do need to be careful — I used to do this a lot in code:

def thing(user_or_id)
  user = user_or_id.is_a?(Integer) ? User.find(user_or_id) : user_or_id
  # more stuff
end

I liked it, but the problem is that if somebody accidentally passes in an id for a different class, this code will blithely convert to a user and you get a very subtle bug.
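Here’s a sketch of that bug in plain Ruby. User and Company are Struct stand-ins for ActiveRecord models, and USERS is a hypothetical in-memory replacement for the database.

```ruby
# Stand-ins for ActiveRecord models (hypothetical data)
User = Struct.new(:id, :email)
Company = Struct.new(:id, :name)

USERS = { 7 => User.new(7, "ana@example.com") }.freeze

# Minimal replacement for ActiveRecord's User.find
def User.find(id)
  USERS.fetch(id)
end

def thing(user_or_id)
  user_or_id.is_a?(Integer) ? User.find(user_or_id) : user_or_id
end

company = Company.new(7, "Acme")

# The caller meant "company 7" but passed a bare Integer,
# so this quietly returns user 7 with no error at all
wrong_user = thing(company.id)
```

Nothing in the Integer says which table it came from, which is the whole problem.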

This is where somebody tells me about their TypeScript code where every object’s ID was a unique type so you avoid that problem, and I tell them about Rails Global ID, which effectively allows you to do the same thing.

You could get fancy with this with a little monkey patching… This uses the object system to ensure that we’ve got a User object given a User or a GlobalID.

class GlobalID
  def find_if(klass)
    raise RuntimeError unless klass == model_class
    find
  end
end

class ActiveRecord::Base
  def find_if(klass)
    raise RuntimeError unless klass == self.class
    self
  end
end

def thing(user_or_global_id)
  user = user_or_global_id.find_if(User)
end

I like this. The main problem is that you are less likely to be dealing in Rails GlobalIDs, but I could see this being valuable if you have a method that is called both normally (with a User) and from a background job (with a GlobalID).

You can do coercion for our non-literal classes as well.

Let’s say we had a method like this:

class User
  def self.from(object)
    case object
    when User
      object
    else
      raise RuntimeError, "#{object} isn't a user"
    end
  end
end

(I’m using the case statement here because then I don’t need to use kind_of?)

Now, I take my original method and run my user through that:

class CheckoutService
  def checkout(user, items, amount, status)
    vetted_user = User.from(user)
    ManagePayment.new.manage_payment(vetted_user, items, amount, status)
  end
end

Okay, big whoop, it’s the same type check but buried behind some abstraction… But I could extend it to be more forgiving about what types got passed in.

class User
  def self.from(object)
    if object.is_a?(GlobalID)
      object = object.find
    end
    case object
    when Company
      object.contact_user
    when String
      User.find_by(email: object)
    when User
      object
    else
      raise RuntimeError, "#{object} isn't a user"
    end
  end
end

(The Global ID check is separate from the case statement because if the Global ID is for a Company, you’d still want the conversion to happen.)
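If you want to poke at this outside of Rails, here is a plain-Ruby sketch of the same factory. It leaves out the GlobalID branch, and REGISTRY is a hypothetical in-memory stand-in for User.find_by. One difference to be aware of: this version raises on an unknown email, where find_by would return nil.

```ruby
# Plain-Ruby stand-in; in Rails this would be an ActiveRecord model
Company = Struct.new(:name, :contact_user)

class User
  REGISTRY = {} # hypothetical in-memory replacement for User.find_by

  attr_reader :email

  def initialize(email)
    @email = email
    REGISTRY[email] = self
  end

  def self.from(object)
    case object
    when Company
      object.contact_user
    when String
      # Unlike find_by, fail loudly when the email is unknown
      REGISTRY.fetch(object) { raise RuntimeError, "no user with email #{object}" }
    when User
      object
    else
      raise RuntimeError, "#{object.inspect} isn't a user"
    end
  end
end

ana = User.new("ana@example.com")
acme = Company.new("Acme", ana)

User.from(ana)               # the same User object
User.from(acme)              # the company's contact user
User.from("ana@example.com") # looked up by email
```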

And so on, depending on how casual you want to be about accepting values.

And, for what it’s worth, if you do want some static typing, and you type check the coercion method…

# RBS signature, for example in sig/user.rbs
class User
  def self.from: (untyped object) -> User
end

You get a lot of the benefit of static typing – your tooling will be able to treat the vetted_user as a User, but you don’t have to limit the set of parameters that your method takes.

(I’m talking myself into a very counterintuitive position where I could imagine not type checking parameters to methods, but at least occasionally type checking the return values on the theory you get a lot of the benefit of tooling without losing much flexibility. I don’t completely believe this yet, but I’d be into trying it once.)

Validated Objects

A slightly different structure that was pointed out to me allows you to incorporate all your validations into the type system by having a ValidUser and InvalidUser class. There are a lot of ways to do this, and an extension of what we’ve got looks like this:

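class User
  def self.from(object)
    if object.is_a?(GlobalID)
      object = object.find
    end
    # Capture the result of the case expression; otherwise the coerced
    # value is thrown away and the original argument gets validated.
    user =
      case object
      when Company
        object.contact_user
      when String
        User.find_by(email: object)
      when User
        object
      else
        raise RuntimeError, "can't convert #{object} to a User"
      end
    # ValidUser and InvalidUser are assumed to be wrapper classes
    user.valid? ? ValidUser.new(user) : InvalidUser.new(user)
  end
end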

I think we’ll talk more about this in a future post about getting rid of if statements, but the idea here is that then the InvalidUser acts like more of a null object and prevents the future operations from happening.
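As a very rough sketch of the null-object idea, assuming a hypothetical ship method and a hypothetical vet helper (neither is from the original example), the two wrappers might look something like this:

```ruby
# Stand-in User whose only validation is "has an address"
User = Struct.new(:address) do
  def valid?
    !address.to_s.empty?
  end
end

class ValidUser
  def initialize(user)
    @user = user
  end

  # Performs the real operation
  def ship(shipper)
    shipper.call(@user.address)
  end
end

class InvalidUser
  def initialize(user)
    @user = user
  end

  # Null-object behavior: same message, but nothing happens
  def ship(_shipper)
    nil
  end
end

# Hypothetical helper that picks the wrapper
def vet(user)
  user.valid? ? ValidUser.new(user) : InvalidUser.new(user)
end

shipped = []
shipper = ->(address) { shipped << address }

vet(User.new("1 Main St")).ship(shipper) # records the address
vet(User.new(nil)).ship(shipper)         # silently does nothing
```

The calling code never has to ask whether the user is valid; it sends the same message either way.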

A digression about initializers

If you have a quote-unquote “service object” that effectively has one main public method (which is basically what we have here), there are three ways to structure the code in Ruby:

Class method:

class Service
  def self.call(arg1, arg2)
  end
end

Instance method, empty initializer:

class Service
  def initialize
  end

  def call(arg1, arg2)
  end
end

Instance method, full initializer:

class Service
  def initialize(arg1, arg2)
  end

  def call
  end
end

You should almost always use the last version.

There are a couple of reasons, the relevant one here is that it allows us to ensure that our data normalization happens.

In our example, we’re potentially coercing the user, the items, the status, the amount… it’s a lot to do in the actual business logic class. It’s easier, and clearer to do this:

class CheckoutService
  attr_reader :user, :items, :amount, :status

  def initialize(user, items, amount, status)
    @user = User.from(user)
    @items = items
    @amount = amount
    @status = status&.to_sym
  end

  def checkout
    # do some things
    ManagePayment.new.manage_payment(user, items, amount, status)
  end
end

There are a couple of advantages to this structure. A big one is that if we add a second public method besides checkout, we don’t have to re-do all the data shaping.

There aren’t very many ways to enforce order of methods in Ruby, but initialize is one of them – the object has to be initialized before it’s used, so you can guarantee the data is validated before checkout.
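A trimmed sketch of that guarantee, with a bare-bones User.from standing in for the fuller coercion method and only two of the four arguments:

```ruby
class User
  # Minimal stand-in for the coercion method shown earlier
  def self.from(object)
    return object if object.is_a?(User)

    raise RuntimeError, "#{object.inspect} isn't a user"
  end
end

class CheckoutService
  attr_reader :user, :status

  def initialize(user, status)
    @user = User.from(user)  # fails here, before any business logic
    @status = status&.to_sym
  end

  def checkout
    :checked_out # placeholder for the real work
  end
end

service = CheckoutService.new(User.new, "new")
service.status   # :new
service.checkout # only reachable with a vetted user
```

Calling CheckoutService.new with a non-User raises immediately, so there is no instance on which checkout could be called with bad data.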

And… if you are into the partial static typing idea, you can type the instance variables and not the arguments to initialize, and you can still get a lot of the tooling benefit of typing.

Option 5: Coercion to New Class

One downside of putting all this initialization and whatever in the initializer is that this set of objects is effectively shared across the three services. Assuming we don’t want to duplicate the validation each time we call the new service, another option is to create a new class that contains all the validation and then use that:

class CheckoutOrder
  attr_reader :user, :items, :amount, :status

  def initialize(user:, items:, amount:, status:)
    @user = User.from(user)
    @items = items
    @amount = amount
    @status = status&.to_sym
  end
end

This time I’ve made the arguments keyword arguments — my normal practice if there are more than two arguments.

Then:

class CheckoutService
  attr_reader :checkout_order

  def initialize(user, items, amount, status)
    @checkout_order = CheckoutOrder.new(user:, items:, amount:, status:)
  end

  def checkout
    # do some things
    ManagePayment.new(checkout_order).manage_payment
  end
end

class ManagePayment
  attr_reader :checkout_order

  def initialize(checkout_order)
    @checkout_order = checkout_order
  end

  def manage_payment
    HandleShipping.new(checkout_order).handle_shipping
  end
end

class HandleShipping
  attr_reader :checkout_order

  def initialize(checkout_order)
    @checkout_order = checkout_order
  end

  def handle_shipping
    send_item_to(checkout_order.user.address)
  end
end

This passes the full order object to the HandleShipping class, which in the previous examples didn’t actually use all the attributes. I don’t have a problem with that, but it is a difference in the code.

Like the other examples, this code will flag the type error immediately on the attempt to create the CheckoutOrder object, and basically any code path into the other methods will need to be checked when that object is created. I guess technically, you’d consider validating that the initialize methods of the other classes receive a CheckoutOrder.
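If you did want the downstream classes to validate what they receive, one option is to reuse the coercion pattern. This is a sketch under assumptions: CheckoutOrder.from is a hypothetical method, the Hash branch is just one possible extension, and the class is trimmed to two attributes.

```ruby
class CheckoutOrder
  attr_reader :user, :status

  def initialize(user:, status:)
    @user = user # the full version would call User.from(user)
    @status = status&.to_sym
  end

  # Hypothetical coercion entry point, mirroring User.from
  def self.from(object)
    case object
    when CheckoutOrder
      object
    when Hash
      # Accept string or symbol keys
      new(**object.transform_keys(&:to_sym))
    else
      raise RuntimeError, "can't convert #{object.inspect} to a CheckoutOrder"
    end
  end
end

class HandleShipping
  attr_reader :checkout_order

  def initialize(checkout_order)
    @checkout_order = CheckoutOrder.from(checkout_order)
  end
end

order = CheckoutOrder.new(user: "ana", status: "new")
HandleShipping.new(order).checkout_order.status # :new
```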

This could be combined with the valid/invalid factory to give a ValidCheckoutOrder vs. InvalidCheckoutOrder if you wanted.

What I like about this one is that it makes it easy to have the common validation logic shared across the different parts of this workflow. My experience is that these kinds of value classes tend to wind up attracting useful functionality that would otherwise awkwardly be attached to one of the other objects.

The downside is the increased complexity, so you’d want to do this in cases where there is shared validation logic across multiple users of these classes.

Next

This got super long, so I just want to conclude with these statements:

  • You can do runtime type checking in Ruby if you must
  • You can also do implicit type management by using the object system to shape the data as you need it.

Next post will go more into the extracting class techniques and why they are valuable.






Copyright 2024 Noel Rappin

All opinions and thoughts expressed or shared in this article or post are my own and are independent of and should not be attributed to my current employer, Chime Financial, Inc., or its subsidiaries.