Functional Programming in Python Pt. 1
One common question programmers have is “Why functional programming (FP)?” We were taught Object Oriented Programming (OOP) in college and primarily work with OOP in our professional lives. You may have learned functional programming in languages like Haskell or OCaml and found it too demanding to use in practice. This blog will show you how functional programming is closely related to the design of modern programming languages and how to apply it to your day-to-day work.
This blog is heavily based on the book “Functional Programming in Scala” (referenced as “the book” below) by Paul Chiusano and Runar Bjarnason. I found their work extremely enlightening and am grateful for their contribution.
What is Functional Programming?
A Taste of FP
Let’s start with a quick overview of FP for those who might be unfamiliar with it. At its core, FP means constructing programs based on pure functions. Pure functions are functions with no side effects. Side effects are anything the function does other than returning the result, such as modifying a global variable or writing to a file. However, most programs need to have some side effects when they interact with the real world. What FP does is postpone side effects as much as possible. We want the structure (or skeleton) of our project to be side-effect free, only producing side effects when absolutely necessary.
You might wonder why we’re so concerned about side effects, given that OOP is full of them and seems to work well. Let me borrow an example from the book to illustrate this.
Suppose we are modeling a Cafe:
class Cafe {
    def buyCoffee(cc: CreditCard) -> Coffee:
        cup = new Coffee()
        cc.charge(cup.price)
        return cup
}
This program looks natural to OOP programmers: when buying coffee, we initiate a charge on a credit card instance and return the produced coffee. Here, cc.charge is what we call a “side effect” because it mutates the CreditCard instance passed into it.
There are two issues with this interface:
- It violates Separation of Concerns. The buyCoffeemethod shouldn’t need to know how the payment is implemented. If we need to change thecc.chargesignature, we also have to modify thebuyCoffeemethod. This tight coupling betweenCafeandCreditCardmakes the code difficult to test.
- The coupling prevents composition. For example, if a customer wants to buy two cups of coffee, the current implementation would force us to issue two separate payment requests to the credit card service, which is inefficient.
In contrast, the FP style eliminates the side effect by returning a Charge object:
class Cafe {
    def buyCoffee(cc: CreditCard) -> (Coffee, Charge):
        cup = new Coffee()
        return cup, Charge(cc, cup.price)
}
Notice how we’ve added a layer of abstraction between Cafe and CreditCard with the Charge class, effectively postponing the side effects. Instead of immediately issuing a payment request, we return a Charge object that represents the payment request. This allows us to easily combine multiple payment requests by extending our Charge class:
@dataclass
class Charge:
    cc: CreditCard
    amount: Double
    def combine(other: Charge) -> Charge:
        if cc == other.cc:
            Charge(cc, amount + other.amount)
With Charge, buying multiple coffees becomes straightforward:
class Cafe {
    def buyCoffee(cc: CreditCard) -> (Coffee, Charge):
        #...
    def buyCoffees(cc: CreditCard, n: int) -> (List[Coffee], Charge):
        # leverage buyCoffee 
        purchases = [buyCoffee(cc) for i in range(n)]
        coffees, charges = purchases.unzip()
        
        # combining charges is simple
        combinedCharge = charges.reduce(lambda c1, c2: c1.combine(c2))
        # returns a list of coffee, but only a single charge
        return coffees, combinedCharge
}
While this is a simple example, it demonstrates how FP can improve code quality and maintain Separation of Concerns. In my experience, tightly coupled code appears frequently, and code quality can deteriorate quickly. It’s valuable to keep the concept of “side effects” in mind when programming or conducting code reviews.
Pure Functions
While “pure functions are functions without side effects” is a good working definition, the formal definition is that a pure function’s output is solely determined by its input, regardless of any external states. The output is literally equivalent to the function called with the input, so you can directly replace the function call with the output. This property is called Referential Transparency.
Our first buyCoffee implementation breaks referential transparency because we can’t replace Cafe.buyCoffee with a Coffee instance, since Cafe.buyCoffee also issues a payment request. The second buyCoffee implementation maintains referential transparency.
To achieve referential transparency, we typically avoid mutating data in-place since that would be a side effect. Here’s a simple example using Python’s list operations:
a = [1, 2, 3]
# approach 1, using List.pop() to modify in-place
def list_pop_1(a):
    a.pop()
print(a) # [1, 2]
# approach 2, creating a new list
def list_pop_2(a):
    return a[:-1]
a = list_pop_2(a)
print(a) # [1, 2]
Notice that list_pop_1 breaks referential transparency by mutating external state. The list returned by list_pop_2 is a new list. We can verify this using Python’s id() function, which returns the memory address:
id(a) # 4365453504
b = list_pop_2(a)
id(b) # 4365452224
The second approach aligns with FP principles. However, since it returns a new list, you might wonder if this means the program requires more memory. The answer is interesting…
Data Sharing
Do pure functions require more memory?
The answer is both yes and no. When we call a[:-1], we’re not creating a new list every time. Instead, the immutable list is created once and reused afterward. In the code below, you can see that calling a[:-1] multiple times yields the same memory address:
a = [1, 2, 3]
print(id(a)) # 4365453504
print(id(a[:-1])) # 4365452224
print(id(a[:-1])) # 4365452224 same id.
Interestingly, when a[:-1] is assigned to a variable (indicating potential mutation), Python creates a new immutable list. It’s as if b claims the original list slice and makes it mutable, so Python needs to create a new immutable one for reading.
b = a[:-1] 
print(id(b)) # 4365452224 still the same id!
print(id(a[:-1])) # 4365452992 wow! new id
This behavior is called data sharing, specifically Copy on Write (CoW). This pattern is widely used in computer science. I first encountered it while learning about the fork() system call in college.
With data sharing, calling list_pop_2(a) multiple times takes only constant memory, as long as the result is only read and not written to.
Error Handling
Is try/catch the best way of error handling?
In later programming languages like Go and Rust, the standard approach to error handling is returning the error, as opposed to earlier programming languages like C++, JavaScript, and Python, where try/catch is the common pattern.
// Golang
func divide(a, b float64) (float64, error) {
    if b == 0 {
        return 0, errors.New("division by zero")
    }
    return a / b, nil
}
// Usage
result, err := divide(10, 0)
if err != nil {
    fmt.Println("Error:", err)
    return
}
fmt.Println("Result:", result)
# Python
def divide(a: float, b: float):
    try:
        return a / b
    except ZeroDivisionError:
        raise ValueError("division by zero")
# Usage
try:
    result = divide(10, 0)
    print("Result:", result)
except ValueError as e:
    print("Error:", e)
I deal with Go code on a daily basis at work, and I’m often frustrated by this return-based error handling because I frequently encounter this pattern:
func doThis():
    res1, err := firstDoThis()
    if err != nil {
        return nil, err
    }
    res2, err := thenDoThis()
    if err != nil {
        return nil, err
    }
    res3, ... you get the idea
You might wonder why Go and Rust chose this pattern. Here’s the reasoning:
Referential Transparency
First, throwing an error breaks referential transparency, making the function impure. Consider this code:
def failingFn():
    result = divide(10, 0)
    try:
        return result
    except Exception as e:
        return 42
The function should throw the ValueError from divide. However, let’s try substituting result for divide(10,0) to test referential transparency:
def failingFn():
    try:
        return divide(10, 0)
    except Exception as e:
        return 42
Now the function returns 42. The effect is different, so it’s not referentially transparent. The key point is that throwing an exception is context-dependent (i.e., whether it’s inside a try block). Therefore, divide isn’t pure because the effect of the function changes based on the context, even with the same input.
Type Safety
Another clear drawback of exception-based error handling is type safety. The function signature in Go explicitly tells what kind of error the caller should expect, while in Python you have to read the docstring (which is not reliable) or the implementation of the function. It’s unclear what errors the caller should expect, and there’s no way for the compiler to help because exceptions aren’t part of the type checking system. Errors can only be detected at runtime.
Total Functions vs Partial Functions
Looking back at the divide function, we can see that for some valid inputs, the return value is undefined. Specifically, even though 0 is a valid input for the denominator (since it’s a float), the function doesn’t return a float as expected (thus breaking the type system). This makes it a partial function.
Partial functions are functions that aren’t defined for all inputs. By contrast, total functions work for every input. The Go version of divide is a total function because for all float inputs, it always returns a (float, error) tuple.
Lifting and Fall Through
Returning errors can fully replicate the process of throwing exceptions while making the process more explicit. If you don’t want to handle an error immediately, you can let it fall through to the next level by returning it.
Laziness
Consider this function:
li = [1, 2, 3, 4]
plus_10 = list(map(lambda x: x + 10, li))
even = list(filter(lambda x: x % 2 == 0, plus_10))
times_3 = list(map(lambda x: x * 3, even))
It adds 10 to each element, filters out odd numbers, and multiplies by three. We can represent the pipeline in a DAG:
For each step, three things happen:
- The list is broken into elements for computation via iterator
- Each element is processed and yields a new element
- The elements are collected into a list
The “collecting into list” step creates a blocking stage. We have to wait for all elements to be processed and collected into a temporary list for the next step, even though the list is broken into elements immediately.
Can we avoid creating these temporary lists?
Iterators
Python provides an elegant solution. map and filter can accept and return iterators, allowing us to avoid building temporary lists.
def add_10(x):
    print(f"Adding 10 to {x}")
    return x + 10
def is_even(x):
    print(f"Checking if {x} is even")
    return x % 2 == 0
def multiply_3(x):
    print(f"Multiplying {x} by 3")
    return x * 3
li = [1, 2, 3, 4]
plus_10 = map(add_10, li)
even = filter(is_even, plus_10)
times_3 = map(multiply_3, even)
# Nothing printed yet because no computation has happened
print("About to compute results...")
result = list(times_3)  # Now you'll see the print statements
The program outputs:
About to compute results...
Adding 10 to 1
Checking if 11 is even
Adding 10 to 2
Checking if 12 is even
Multiplying 12 by 3
Adding 10 to 3
Checking if 13 is even
Adding 10 to 4
Checking if 14 is even
Multiplying 14 by 3
As you can see from the output, map and filter functions don’t perform the actual computation, but only create a description of the computation. The actual computation happens only when the iterators are materialized into a list. This feature helps us avoid intermediate temporary data structures, improving performance.
First-class Loops
With iterators, we can build loop-like logic in a functional programming way without sacrificing performance. The above program is equivalent to this loop-based version:
ret = []
for ele in li:
    ele += 10
    if ele % 2 != 0:
        continue
    ele *= 3
    ret.append(ele)
However, the loop-based program isn’t composable—it’s a large chunk of code that can’t be easily reused elsewhere. It’s also harder to reason about compared to map and filter. That’s why we sometimes describe iterators (or streams) as “first-class loops” since they let us interleave and compose map and filter operations. This exemplifies the philosophy of functional programming: using small, composable building blocks to create complex systems.
Generators
Here’s another fascinating topic.
Python’s list can create a finite list of elements:
list = [1, 2, 3, 4]
But what if we want to create a list with an infinite number of elements (e.g., all natural numbers)? Is that possible?
list = [1, 2, 3, 4, 5 ...]
Actually, yes! We don’t need to materialize an infinite list; we just need to describe it. This is what generators do. They describe a sequence of elements and materialize them on demand:
def natural_numbers():
    n = 1
    while True:
        yield n
        n += 1
# Usage:
numbers = natural_numbers()
for n in numbers:
    print(n)
    if n > 5:
        break
Let’s look at a practical example. Imagine you need to assign a unique ID to each instance you create. A simple implementation might use a global variable:
id = 1
class A:
    def __init__(self):
        self.id = id
        id += 1
While this works, it’s not ideal because the behavior of id isn’t constrained. Anyone can reset id to 0 and disrupt the system. A better approach is to create an infinite sequence using a generator and assign IDs by taking elements from this sequence in order:
def idGenerator():
    n = 1
    while True: 
    # The infinite loop is lazily evaluated, so it's safe
        yield n
        n += 1
generator = idGenerator()
class A:
    def __init__(self):
        # get the next element from the sequence
        self.id = generator.__next__()
            
a = A()
print(a.id) # 1
b = A()
print(b.id) # 2
This concept might remind you of PRG (pseudorandom generator) in cryptography, which generates a sequence of numbers that no differentiator can distinguish from a truly random sequence. For generating IDs (especially in distributed systems), you could also use a random generator, which is essentially an implementation of PRG.
The term “generator” is quite fitting, as it truly generates a sequence of values on demand.
Epilogue
We’ve covered the fundamental concepts of functional programming here. In future blog posts, you can expect to learn more about how FP influences interface design, testing, and other aspects of software development.
Highly recommend reading “Functional Programming in Scala” - it’s truly insightful and inspiring.