Intermediate Python

Today we’re going to be talking about some features of Python that might be considered “ergonomic.”

You can get by (and have gotten by) without using these, as they aren’t fundamental in the way that loops, functions, and classes are.

Learning these features gives you additional ways to solve problems, which can be more efficient or elegant than alternatives.

Some topics that fall into this category are:

  • Comprehensions (which you’ve already seen)
  • Decorators (which you’ve seen a little bit of)
  • Exceptions & Context Managers
  • Generators
  • Type Hints

Exceptions & Context Managers

Motivation

No matter how good your code is, there’s always going to be a chance that something goes wrong.

A common example is a file not being available when you try to open it. (Perhaps it is missing or you don’t have permission to access it.)

You could imagine code like this:

file = open("file.txt", "w")
file.write("Hello, world!\n")
file.write("Second Write")
file.close()

If you’re worried about open failing, you might end up with code like this:

file = open("file.txt")
# check if file is open (not a real method, just an example)
if file.is_open():
    file.write("Hello, world!\n")
    file.write("Second Write")
    file.close()

But what if other methods fail? If we’re writing over a network, the disk is full, or the file is locked by another process?

# this code does not work, demo purposes
file = open("file.txt")
if file.is_open():
    success = file.write("Hello, world!\n")
    if success:
        success = file.write("Second Write")

    if not success:
        # handle error

    # close file no matter what
    file.close()

This can get very messy very quickly.

Exceptions Syntax Review

In Python, we can use exceptions as an alternate control flow path. What this means is that instead of executing code sequentially, we can jump to a different part of the program if an exception is raised.

The two key pieces of syntax are raise, and try-except.

raise & Exception Types

When a raise statement is encountered, normal execution stops and the program jumps to the nearest enclosing except block that matches the exception.

raise ExceptionType("message")

An exception can be any class that inherits from BaseException. Common built-in exceptions include:

  • Exception
  • ValueError
  • TypeError
  • KeyError
  • IndexError
  • FileNotFoundError
  • NotImplementedError

You can also create your own by subclassing Exception or any other relevant type.

class TooManyTokens(Exception):
    pass

...

if len(tokens) > MAX_TOKENS:
    raise TooManyTokens(f"Expected at most {MAX_TOKENS} tokens, got {len(tokens)}")

try-except

try:
    # code that might raise an exception
except ExceptionType:
    # code to run if an exception is raised
except OtherExceptionType as e:
    # code to run if an exception is raised
    # in this example, e will be the exception object so you can
    # use it/log it/etc.
else:
    # code to run if no exception is raised
finally:
    # code to run no matter what
  • only one except block will be run, the first one that matches the exception type
  • an except block can match multiple exception types by providing a tuple of types or by using a base class (see the short example after this list)
  • else and finally are not required and often omitted, but can be useful.
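
For example, here’s a minimal sketch of both forms of matching; the load_setting helper and the settings dict are made up for illustration:

def load_setting(settings, key):
    # hypothetical helper: may raise KeyError or ValueError
    return int(settings[key])

settings = {"retries": "three"}

try:
    retries = load_setting(settings, "retries")
except (KeyError, ValueError) as e:
    # one handler matches either exception type
    print(f"bad setting: {e!r}")
except OSError as e:
    # a base class matches all of its subclasses,
    # e.g. FileNotFoundError and PermissionError
    print(f"could not read settings: {e}")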

Our file handling example can be rewritten to use exceptions:

file = None
try:
    file = open("file.txt", "w")
    file.write("Hello, world!\n")
    file.write("Second Write")
except OSError as e:
    # handle the error (IOError is just an alias of OSError in Python 3)
    print(f"file error: {e}")
finally:
    # runs whether or not an exception was raised
    if file is not None:
        file.close()

Now if any of these lines raises an exception, the file will still be closed. (The file = None check guards against open itself failing, in which case there is nothing to close.)

Context Managers

Context managers are a way to automatically run some code when entering and exiting a block of code.

They are commonly used to manage resources like files, locks, and database connections and can be thought of as related to exceptions in that they provide an alternate way to work with errors.

If a Python object has __enter__ and __exit__ methods, it can be used as a context manager.

Let’s rewrite our file handling example to use a context manager.

__enter__ is called when entering the with block, and __exit__ is called when exiting the block.

These are called no matter what: even if an exception is raised inside the block, __exit__ will still run.

If you don’t need any custom error handling and just want to make sure the file gets closed, you can rewrite the example above as:

with open("file.txt", "w") as file:
    file.write("Hello, world!\n")
    file.write("Second Write")

Any exception raised in the with block is passed to __exit__; the file object's __exit__ method closes the file, and the exception then propagates as usual (unless __exit__ returns True to suppress it).

Other common uses of context managers are:

  • with db_connection: - Ensures the database connection is closed when the block exits.
  • with db.transaction.atomic(): - Ensures a block of code runs within a database transaction, so changes can be rolled back if an exception is raised. (A concrete sketch using sqlite3 follows.)
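
Here is a runnable version of the transaction idea using the standard library’s sqlite3 module (the db_connection and db.transaction.atomic names above are just placeholders): a sqlite3 connection used as a context manager commits on success and rolls back if an exception is raised. Note that it does not close the connection.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")

# the connection's context manager commits the transaction on success...
with conn:
    conn.execute("INSERT INTO users VALUES (1, 'Sarah')")

# ...and rolls back if an exception is raised inside the block
try:
    with conn:
        conn.execute("INSERT INTO users VALUES (2, 'Kyle')")
        raise RuntimeError("something went wrong")
except RuntimeError:
    pass

print(conn.execute("SELECT name FROM users").fetchall())  # [('Sarah',)]
conn.close()  # the context manager does not close the connection itself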

Writing Your Own Context Managers

As mentioned, if a Python object has __enter__ and __exit__ methods, it can be used as a context manager.

You can write your own context managers by implementing these methods.

class MyContextManager:
    def __enter__(self):
        print("entering block")
        # code to run when entering the block

        # return value is assigned to the variable in the `as` clause
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # code to run when exiting the block
        # exc_type, exc_value, and traceback are the exception info
        # if an exception was raised, otherwise they are None
        if exc_type is not None:
            print("exception was raised")
        else:
            print("exiting normally")

        # return True to suppress the exception propagating
        return True
with MyContextManager() as context:
    print("inside block")
entering block
inside block
exiting normally
with MyContextManager() as context:
    7 / 0
entering block
exception was raised

Tips / Further Reading

Generators

Generators are a special type of function whose execution can be paused and resumed (re-entered).

They are an incredibly powerful tool for writing efficient code, especially when dealing with large amounts of data.

Iterables

Recall that Python objects use dunder methods to implement most of their functionality.

You may recall that when implementing a class you can define __getitem__ and __setitem__ to make your class subscriptable.

ll = [1, 2, 3]
print(ll[0])
print(ll.__getitem__(0))  # you wouldn't write this, but it's the same thing

ll[0] = 4
ll.__setitem__(0, 4)  # you wouldn't write this, but it's the same thing
1
1

A for loop is similarly syntactic sugar for calling two other methods: __iter__ and __next__.

ll = [1, 2, 3, 4]
for item in ll:
    print(item)
1
2
3
4
ll = [1, 2, 3, 4]
iterator = ll.__iter__()
while True:
    try:
        item = iterator.__next__()
    except StopIteration:
        break
    print(item)
1
2
3
4
  • __iter__ returns an iterator object, a special intermediary object that tracks the current position in the iterable.
  • __next__ returns the next item in the iterable, and raises StopIteration when there are no more items.

You could write your own classes that implement these methods to make them iterable. But today we’ll look at another way to make iterables: generators.
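
Before moving on to generators, here’s a quick sketch of that first approach (Countdown is a made-up class, purely for illustration):

class Countdown:
    # counts down from start to 1; acts as both its own iterable and iterator
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # a for loop calls this first; this object is its own iterator
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

for n in Countdown(3):
    print(n)  # 3, 2, 1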

Motivation for Generators

Let’s say that you want to write a function that will return many values, but you only intend for one to be used at a time.

def permute(word):
    if len(word) == 1:
        return [word]
    else:
        result = []
        for i in range(len(word)):
            for perm in permute(word[:i] + word[i + 1 :]):
                result.append(word[i] + perm)
        return result
permute("abc")
['abc', 'acb', 'bac', 'bca', 'cab', 'cba']

Great, but the size of the returned list will be \(n!\), where \(n\) is the length of the string.

It’s also possible we’re searching for a particular permutation, so we won’t actually need all of the results and it’d be nice to stop early.

results = permute("too long")
print(len(results))
40320

Generators

Often, we only need one item at a time, and we don’t want to store all of the results in memory.

This is the case with a lot of data processing tasks, where we might have millions of records, but only need to process one at a time.

Generators are special functions that return an iterator.

Let’s take a look at range:

def stop_cond(x):
    # contrived stop condition; perhaps you're searching for a value that fits some criteria
    return x == 17


for x in range(10_000_000):
    print(x)
    # we don't actually need all of the values
    if stop_cond(x):
        print("found it!")
        break
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
found it!

If range returned a list, it would need to allocate a large list, which is both slower and more memory intensive.

Instead, range behaves something like this generator (the real range is a lazy sequence type rather than a generator, but the idea is the same):

# simplified form with one parameter
def range(n):
    i = 0
    while i < n:
        yield i
        i += 1

yield is a special keyword that produces a value from the function but, unlike return, doesn’t exit it.

The next time a value is requested (by a for loop, or by calling next() on the generator), execution resumes right where it left off.
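
To see the pause/resume behavior directly, here’s a small sketch that steps through the generator by hand with next() (renamed my_range so it doesn’t shadow the built-in range):

def my_range(n):
    i = 0
    while i < n:
        yield i
        i += 1

gen = my_range(3)
print(next(gen))  # 0 -- runs until the first yield, then pauses
print(next(gen))  # 1 -- resumes right after the yield
print(next(gen))  # 2
# one more next(gen) would raise StopIteration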

Let’s rewrite permute as a generator:

def ipermute(word):
    if len(word) == 1:
        yield word
    else:
        for i in range(len(word)):
            for perm in ipermute(word[:i] + word[i + 1 :]):
                yield word[i] + perm

for perm in ipermute("abc"):
    print(perm)
abc
acb
bac
bca
cab
cba

Generator Expressions

You can use generator expressions to create generators without having to write a function.

squares = (x**2 for x in range(1000000))

This looks like a list comprehension, but has parentheses instead of brackets.

It creates a generator that will return the squares of the numbers from 0 to 999999.
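
Like any generator, it only computes values as they’re requested; a small sketch:

squares = (x**2 for x in range(1000000))

print(next(squares))  # 0 -- computed on demand
print(next(squares))  # 1
print(sum(squares))   # consumes the rest without ever building a list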

itertools

The itertools module contains many useful functions for working with iterators, all of which return lazy iterators rather than building lists.

Useful functions include:

  • itertools.permutations - permutations of an iterable
  • itertools.combinations - combinations without replacement
  • itertools.product - cartesian product of multiple iterables (like nested for loops)
  • itertools.chain - concatenate iterators
  • itertools.islice - slice an iterator the way you would a list
  • itertools.groupby - group items by a key
  • itertools.tee - create multiple iterators from one
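
A few of these in action (a short sketch; note that itertools.permutations yields tuples, so we join them back into strings):

import itertools

# a lazy stand-in for our permute(): take just the first 3 permutations
for p in itertools.islice(itertools.permutations("abc"), 3):
    print("".join(p))  # abc, acb, bac

# product behaves like nested for loops
for x, y in itertools.product([1, 2], "ab"):
    print(x, y)  # 1 a, 1 b, 2 a, 2 b

# chain concatenates iterators
print(list(itertools.chain([1, 2], (3, 4))))  # [1, 2, 3, 4]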

Typing

The biggest change to Python in recent years is the addition of type annotations.

Motivation

Python is a dynamically typed language, which means that the type of a variable is determined at runtime.

It also means the type can change:

x = 1
x = "hello"  # no error

This is a common source of bugs, since it can be difficult to keep track of what type a variable is.

x = f() # f used to return an int, but now returns a string

Static Typing

Many languages require variable definitions and function signatures to include type annotations.

// C
int f(int x) {
    return x + 1;
}
// Rust
fn f(x: i32) -> i32 {
    x + 1
}

This is called static typing, because the type is checked at compile time.

Type Annotations

Python 3.5 introduced type annotations, which are optional type hints that can be added to your code. Every version of Python since 3.5 has added new features to the type system, but as of 3.10 many of the rough edges have been smoothed out.

def f(x: int) -> int:
    return x + 1

Two new pieces of syntax:

  • After a variable definition (typically a function parameter) you can add a colon and the type.
  • Return type annotations are placed after the closing parenthesis of the function signature, using the -> syntax (e.g. -> int). A short sketch of both follows this list.
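
A small sketch showing annotations on plain variables and on parameters with defaults (the names here are made up):

count: int = 0    # annotated variable with an initial value
threshold: float  # an annotation alone is allowed; no value is assigned

def greet(name: str, excited: bool = False) -> str:
    punctuation = "!" if excited else "."
    return f"Hello, {name}{punctuation}"

greeting: str = greet("Ada", excited=True)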

Types

You can annotate with any of the built-in types:

  • int
  • float
  • str
  • bool
  • None
  • etc.

The compound (container) types, which can be used directly as annotations since Python 3.9 (on 3.7-3.8 you need from __future__ import annotations, or the older typing.List / typing.Dict forms):

  • list
  • dict
  • set
  • tuple

These allow for annotating the type of the elements in the container:

def f(x: list[int]) -> dict[str, int]:
    return {str(i): i for i in x}
  • list[int] - a list of ints
  • dict[str, int] - a dictionary with str keys and int values
  • tuple[int, str] - a 2-tuple with an int and a str
  • set[tuple[int, int, int]] - set of 3-tuples of ints

And finally, there are a lot of helper types in the typing module:

  • typing.Any - any type
  • typing.Optional[int] - an int or None
  • typing.Union[int, str] - an int or a str
  • typing.Callable[[int, str], bool] - a function that takes an int and a str and returns a bool
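
A sketch using a few of these helpers (the functions are invented for illustration):

from typing import Callable, Optional, Union

def parse_id(raw: Union[int, str]) -> int:
    # accepts either an int or a numeric string
    return int(raw)

def find_user(user_id: int) -> Optional[str]:
    # returns the user's name, or None if the user is unknown
    users = {1: "Sarah Connor"}
    return users.get(user_id)

def run_check(check: Callable[[int, str], bool]) -> bool:
    return check(1, "Sarah Connor")

print(parse_id("42"))                        # 42
print(find_user(2))                          # None
print(run_check(lambda uid, name: uid > 0))  # True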

You can also union types together with | (as of Python 3.10):

def f(x: int | str) -> int | str:
    return x

This also works as an alternate syntax for Optional:

def f(x: int | None) -> int | None:
    return x

Type Checking

One thing to be aware of: by themselves, these annotations don’t do anything! They’re just hints for the programmer (and for external tools).

# bad type annotations
def f(x: list) -> str:
    return {"a": 1}

f(27.5) 

Every type annotation is wrong in the above example, but Python will not complain at “compile” time nor at runtime.

If you want to check your types, you can use a tool like mypy:

https://mypy-lang.org/

Running mypy on the above code will give you output like:

$ mypy test.py
test.py:3: error: Incompatible return value type (got "Dict[str, int]", expected "str")
test.py:5: error: Argument 1 to "f" has incompatible type "float"; expected "list"
Found 2 errors in 1 file (checked 1 source file)

Runtime Type Checking

Some libraries make use of type annotations at runtime: the built-in dataclasses module uses them to generate boilerplate (though it doesn’t check the types), while pydantic, FastAPI, and typer use them for runtime validation.

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str

# these are type-checked
user = User(id=1, name="Sarah Connor", email="sarah@hotmail.com")
try:
    # email is declared as str, so passing None fails validation
    user = User(id=1, name="Sarah Connor", email=None)
except Exception as e:
    print(e)
1 validation error for User
email
  none is not an allowed value (type=type_error.none.not_allowed)

This allows you to catch errors earlier, and can result in less boilerplate code.

More on Types

You’ll definitely encounter type annotations in library documentation, and in more modern codebases.

Norms around their usage are evolving, but it’s worth getting into the habit of using them. It can make your code more clear and easier to maintain. It can help you find bugs before they happen or more easily reason about expected behavior in an unfamiliar codebase.

More on typing: https://docs.python.org/3/library/typing.html

If you’re using VSCode’s Python extension, it integrates nicely with type annotations and can be configured to warn you about type errors and optionally run tools like mypy to check your types.

Conclusion

A common symptom of the “intermediate” stage of knowing a language is that people tend to overuse features that are available to them.

Please keep in mind that there’s nothing inherently better about choosing these features over the alternatives; strive to make your code readable and maintainable above all else.

More Python

  • Functional Programming - decorators, functools
  • Metaclasses
  • Async I/O - asyncio, async/await
  • Bridging Python to other languages - C API, Cython, CFFI, PyO3

Next Time: Interactive Debugging Lab

I’ll be sending out some setup instructions in advance, since I’d like the next talk to be more interactive.