#### Errors and Partial Functions

##### July 10, 2019
programming

What is an error? and should you model them with exceptions or return values? Here’s an overview for what an error is and how you should write code to handle it.

# A Motivating Example

Let’s start by considering the Spin-Wheel action within the game of “Wheel of Fortune”. Most people would divide the possible outcomes in the following ways:

• Valid Outputs:
• $500 •$1000
• JACKPOT
• BANKRUPT
• LOSE_TURN
• Errors:
• WEAK_SPIN
• WHEEL_ON_FIRE

This raises the question - “Why do we think of WEAK_SPIN as an error?” Wouldn’t it be just as valid to model the Spin-Wheel action so that it returns a SpinWheelResult instead of a Wedge, which could then have WheelLogicViolation as a return value, making WEAK_SPIN an actual output value instead of an error.

This is one illustration of the idea that nothing is normatively an error - they are simply different branches of behavior that need to be modeled - and it is a matter of perspective that we consider them to be errors.

SpinWheel => Wedge | Error
SpinWheel => SpinWheelResult = Wedge | LogicViolation


Indeed, if errors are simply other branches of behavior, it raises another question: Why do we even need error constructs? We already have mechanisms such as if/else statements and polymorphism that allow us to handle branches - why don’t we just use those for everything?

Errors are ultimately a structural concept derived from a useful underlying formalization with common patterns for handling these branches. As we’ll explore later on, it’s exactly right that we’ll want to use the language features we’re already comfortable with in order to handle them.

# Partial Functions

A function f: Domain -> Codomain is a mapping for elements of a Domain to elements of a Codomain.

A total function is one that is defined for all elements of the Domain, in contrast to a partial function, which is undefined for certain inputs.

A simple example of this is division /: (R,R) -> R, which is undefined for the input denominator of 0. Of course, if you had instead defined division as /: (R,R\{0}) -> R, it would be a total function.

## Arbitrariness of Totality

Totality and Partiality are ultimately modeling properties whose ultimate expression is limited by the strength of your type system. The more complex Domain-Sets you can express in your type system, the more functions you can directly express as Total without having to resort to some of the other tricks we’ll discuss later.

But in many ways, the functions you define are independent from your type system modeling - the two division operators above are both “division” in the sense that matters, but one of them better precludes inputs that are nonsensical for that operator.

## Partiality-Error Equivalence

The second insight is that our concept of “Error” maps 1-for-1 with any cause of Partiality for a Partial Function.

For example, a DivideByZero error is simply a cause of Partiality for a division operator that allows 0 in its Domain.

## Pure Partiality

There are two major kinds of Partiality that exist when writing software. The first is Pure Partiality, which is concerned with the errors that relate to input values with no corresponding mapped output value in the Codomain.

There are two sources for this:

• Static Enforcement: The point of describing your Domain and Codomain is to only have those values enter and exit your function. A dynamic type system, while allowing you to describe your function, does nothing to prevent invalid values from runtime evaluation.
• Example: passing a string into the division function in Python
• Imprecise Definition: Your Domain may be specified in a way such that there are values with no corresponding mapped values in the Codomain.
• Example: passing in zero to the denominator of the division function in Java
• Example: querying a database for an ID that doesn’t exist

In a sense, these two kinds of “Pure Partiality” errors are variants of each other. Errors are ultimately something that happen when your software runs, so if your Type System doesn’t statically prevent invalid values from being passed in, you can think of every function you write as being of type f: Any -> Any, which means that you have an extremely wide variety of Imprecise Definition errors to handle at runtime.

This is one of the reasons I believe type-inferred static typing to be a major advantage in language design. A dynamic language is fundamentally typed at runtime with a single type Any, which means that to properly handle the Imprecise Definition errors, you need to assert the correct type and then handle other inputs for every function. It’s simply less code to use a static type system, which gives you a concise way to make the type assertion and enforces types at compile time, which obviates the need to write the error handling logic at all.

It’s important not to go off the deep end with strong type systems, however, as a consequence of the Halting Problem is that no matter how powerful your type system is, there will always be a function whose “exact” Domain cannot be expressed, since a compiler is ultimately a program and cannot enforce the Domain of an undecidable function.

## IO Partiality

There is a second kind of Partiality of deep and fundamental importance, which is that our functions run on machines and do things in the real world, which has two main sources:

• Physical Layer: every function ultimately runs against a real machine despite our virtualized programming model. Therefore, every single function is Partial, since there are physical constraints (such as OutOfMemory) that can cause potentially any function to fail.
• Side-Effect: Virtually every function that is of practical use will perform some side-effect in the real world that may fail.

# Good Error Code

When you consider something as simple as the division operator, your language runtime must still do something when it evaluates a denominator of 0. If you were to imagine a language without nulls, exceptions, or any other such mechanism, evaluating a denominator of 0 might just trigger a total process shutdown.

In a vague sense, every function is “total” in that every result will have some action taken by the language runtime.

The essence of robust software, however, is not to eliminate errors, as all errors are simply branches of behavior based on how we model our domains, but rather to exhaustively handle all branches in intentional fashion, instead of just letting errors occur and defaulting to the language runtime’s behavior (which may or may not be correct).

To that end, error handling mechanisms exist to help you convert Partial Functions into Total Functions, and your goal should be to “Total-ize” your functions with these mechanisms.

Consider:

public static Integer divide(Rational numerator, Rational denominator) throws ArithmeticException


This signature is essentially /: (Q,Q) -> Integer || ArithmeticException, which is Total!

## Modularity Matters

Error handling code is still code, which means those remediation routines should follow the same Modularity principles that all good code follows.

There are three heuristics that are particularly useful:

• Authority
• End-to-End Principle
• Volatility Risk

### Authority

The first major question to ask is: Is the system with enough knowledge and context the one to properly handle the error?

Let’s say you were trying to send a package with UPS.

If the error was “a truck broke down”, the entire contract I have with UPS is that I don’t want to care about how a package was transported - UPS would be the correct authority for handling that error (by re-routing the package, etc.) and it should never bubble up to me.

On the other hand, if the error was “the address is invalid”, the only possible authority for remedy-ing this error is me since I’m the only one who really knows who I am trying to send the package to. Magical auto-correct of addresses can be attempted, but at the end of the day, I am the authority for this error.

#### Physical Layer

There’s an interesting corollary of the Authority Heuristic in terms of how to deal with the Physical Layer Failures.

The truth about the Physical Layer is that those errors are both inevitable and unhandleable. Code running on a machine cannot remediate failures of the machine, which makes those errors not even worth modeling.

The reality is that errors are errors within their context, but there are always other systems that watch systems, and to those systems - errors are simply eventualities to be managed. For example, the processes of a webservice are managed by a daemon, which are monitored by a metrics system, which are watched by a human, which is usually arrayed in a self-replicating redundant form.

WHEEL_ON_FIRE (from the Wheel of Fortune case study) is a perfect example of this - the meta-context of the game (i.e. the TV show production) will have processes in place to handle errors that cascade through all layers (Turn off the TV broadcast and display a “We Are Experiencing Technical Difficulties” message, while people try and figure out what to do to get the game state fixed).

When inevitable and unhandleable errors occur, Process Teardown should be how we proceed, and recovery must be delegated to the meta-system (whether daemon or human).

This then means that all code should be written as if it could fail and terminate at any moment of time, and that the resulting state should be such that the meta-system’s recovery process will result in correct application state.

### End-to-End Principle

When combined with the Authority Heuristic, the End-to-End principle is critical for understanding Retries.

Devs typically think of retries as error handling, but it turns out that they do not help with error handling.

A retry policy can reduce the propagation of errors but cannot eliminate any error. A retry policy will either generate an infinite loop (infinite retries == infinite loop) or the error will simply propagate outward more rarely (but will still propagate).

Fundamentally, some system has the authority to execute full and total remediation of the error, and no retry policy can be inserted in between that gets around the fact that that system must understand what to do with that error.

Retries are ultimately a reliability optimization, not a substitute for robustness.

### Volatility Risk

Error handling code is code, which means that the same consideration of avoiding entangling components that are vulnerable to change you would apply to your happy path should apply here. Your error values and their associated remediation should encapsulate so that the enclosing scope has no need to understand the error.

A common area for this is DB IO:

val myVal = query_database_for_id(1)


This method results in SQLException propagating to who knows where, meaning anyone could write catch (SQLException e) and now the fact you are using a SQL database has leaked across your code.

# Ideal Error Handling

What should an ideal form of error handling do?

• It should be first-class and comprehensive, allowing developers to take control of all causes of partiality using the same code branching mechanisms they have used for everything else.
• Developers should be forced to model and handle errors exhaustively (i.e. write and handle total functions). In order of importance:
• Statically checked types
• safe types (no implicit nulls)
• Statically enforce that all return values be used.

## Sentinel Values

I call it my billion-dollar mistake. It was the invention of the null reference in 1965.

— Tony Hoare

The history of error handling starts with nulls and sentinel values (like false or -1), which suffers from a few main problems:

1. You typically had to rely on documentation in order to understand whether or not there were error values at all.
2. Dynamic languages frequently result in people returning incorrectly-typed error values, which often cause problems when unhandled.
3. Finally, sentinel values that were properly typed resulted in the sinister behavior of semantically invalid values being used when they shouldn’t.

## Exceptions

Exceptions were the next step with two major drivers:

1. A separate value with explicit control flow implications made it very difficult for people to silently and accidentally ignore an error.
2. Unchecked exceptions also allow us to avoid contaminating literally every single function signature with a error return type because of the partiality causes that result from physical-layer failures, such as OutOfMemory.

Unchecked exceptions aren’t going to be our “ideal” since they don’t force people to handle errors. Checked exceptions do, but they suffer from the same problem as Golang, which is that they are essentially a second return type standing off to the side, which makes them substantially more verbose to express and handle.

I prefer an error model with the following primitives:

• A stdlib PartialFunctionError datatype should exist and be extensible (IOPartial & PurePartial) to describe error values.
• GADT support should exist to express sum-types with Errors and pure types.
• An Error Monad should exist (and supersede Option/Maybe/Try - covering both Pure and IO partiality) and be constructible from a sum-type with PartialFunctionError in order to convert functions from partial to total.
• Exceptions or the concept of Process Teardown should continue to exist, but should be used exclusively by the language runtime to initiate process teardown in the presence of physical layer errors (OOM, CPU failure, etc.)

This hits all of the major points:

• You use GADTs and first-class pattern matching, as well as a Monad structure to destructure and handle errors.
• Your compiler forces people to use the error values, which are then impossible to misuse due to compiler-enforced destructuring.
• Panics are extremely limited and prevent partiality from infecting every single function signature or type.

## Code Example: Division

If you look at a basic division example, a typical method might have a signature like this:

public static Integer divide(Rational numerator, Rational denominator) throws DivideByZeroException


Whereas I find an approach like this more compelling (intentionally made more verbose in order to illustrate what’s happening):

public static Try<Integer> divide(Rational numerator, Rational denominator) {
if (denominator.isZero()) {
return Try.of(() -> { throw new DivideByZeroException; });
} else {
return Try.of(numerator / denominator)
}
}


The compiler forces you to handle this robustly:

_.divide(1, 0) + 1; // Doesn't compile
_.divide(1, 0).map(d -> d.toString).getOrElse("Oops"); // "Oops"


But you can still chain together these operations without too much rigamarole either:

_.divide(10,5).flatMap(d -> _.divide(d, 0)).map(d -> d.toString).getOrElse("Oops");
// "Oops"


If you want to experiment more with this style, I’ve put together a couple of basic exercises that you can try out.

programming

programming

programming