Type evaluation

Type evaluation is a mechanism for replacing complex overloads and version checks. It provides a restricted subset of Python that can be executed by type checkers to customize the behavior of a particular function.

Motivation

Consider the definition of round() in typeshed:

@overload
def round(number: SupportsRound[Any]) -> int: ...
@overload
def round(number: SupportsRound[Any], ndigits: None) -> int: ...
@overload
def round(number: SupportsRound[_T], ndigits: SupportsIndex) -> _T: ...

With type evaluation, this could instead be written as:

@evaluated
def round(number: SupportsRound[_T], ndigits: SupportsIndex | None = None):
    if ndigits is None:
        return int
    else:
        return _T

This makes it easier to see at a glance what the difference is between various overloads.

Other features of type evaluation, as proposed here, include customizable error messages, branching on the type of an argument, and branching on whether an argument was provided as a positional or keyword argument.

Type evaluation functions can replace most complex overloads with simpler, more readable code. They solve a number of problems:

  • Type evaluation functions provide ways to implement several type system features that have been previously requested, including:

    • Marking a function, parameter, or parameter type as deprecated.

    • Accepting Sequence[str] but not str.

    • Checking whether two generic arguments are overlapping.

  • Error messages involving overloads are often hard to read. Type evaluation functions enable the author of a function to provide custom error messages that clearly point out the issue in the user’s code.

  • Complex overloads can be difficult to understand and write. Type evaluation functions provide a more natural interface that is closer to how such functions are written at runtime.

  • The precise behavior of overloads is not specified and varies across type checkers. The behavior of type evaluation functions is more precisely specified.

Specification

This section specifies how type evaluation works, without commentary. Discussion, with motivating use cases, is provided in the “Discussion” section below. The examples in this section are meant to clarify the semantics only.

Type-evaluated functions may be declared both at runtime and in stub files, similar to the existing @overload mechanism. At runtime, the evaluated function must immediately precede the function implementation:

@evaluated
def round(number: SupportsRound[_T], ndigits: SupportsIndex | None = None):
    if ...

def round(number, ndigits=None):
    return number.__round__(ndigits)

In stubs, the implementation is omitted.

When a type checker encounters a call to a function for which a type evaluation has been provided, it should do the following:

  • Validate that the arguments to the call are compatible with the type annotations on the parameters to the evaluation function, as with a normal call.

  • Symbolically evaluate the body of the type evaluation until it reaches a return statement, which provides the type that the call should return. During this symbolic evaluation, each argument is set to the value it has at the call site that is being evaluated.

  • If execution reaches a return statement, return the type provided by that statement. Otherwise, return the type set in the evaluation function’s return annotation, or Any if there is no return annotation.

Type checkers are encouraged to provide a strictness option that produces an error if an evaluation function is missing a type annotation on a parameter or return type. However, no error should be provided if the return annotation is missing and all branches (including error branches) return a type.

The default value of a parameter to an evaluation function may be either ... or any value that is valid inside Literal[...]. If an argument with default X is not provided in a call, the type of the argument within the evaluation function is Literal[X]. If the default is ..., the type is the parameter’s annotation instead.
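
The default-value rule above can be modeled with a few lines of ordinary Python. This is only a sketch: the helper name type_inside_evaluator and the _NOT_PROVIDED sentinel are hypothetical, and types are represented as plain strings rather than real type objects.

```python
_NOT_PROVIDED = object()  # hypothetical sentinel: no argument at the call site


def type_inside_evaluator(
    annotation: str, default: object, argument_type: object = _NOT_PROVIDED
) -> str:
    """Return the type a parameter has inside the evaluation function."""
    if argument_type is not _NOT_PROVIDED:
        # An argument was passed, so its own type is used.
        return str(argument_type)
    if default is ...:
        # A default of ... falls back to the parameter's annotation.
        return annotation
    # Any other default X is seen as Literal[X].
    return f"Literal[{default!r}]"


# Mirrors a signature of the form (x: int = ..., y: int = 1) called with no arguments.
print(type_inside_evaluator("int", ...))
print(type_inside_evaluator("int", 1))
```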

Simple examples to demonstrate the semantics:

@evaluated
def always_returns(x: int):
    return str

always_returns("x")  # error: "x" is not an int
always_returns()  # error: not enough arguments
reveal_type(always_returns(1))  # str

@evaluated
def always_errors(x: int):
    show_error("error")

x = always_errors(1)  # error
reveal_type(x)  # Any

@evaluated
def always_errors_with_type(x: int) -> str:
    show_error("error")

x = always_errors_with_type(1)  # error
reveal_type(x)  # str

@evaluated
def with_defaults(x: int = ..., y: int = 1) -> None:
    reveal_type(x)
    reveal_type(y)

with_defaults()  # x is "int", y is "Literal[1]"
with_defaults(1)  # x and y are both "Literal[1]"

Supported features

The body of a type evaluation uses a restricted subset of Python. The only supported features are:

  • if statements and else blocks. These can only contain conditions of the form specified below.

  • return statements with return values that are interpretable as type annotations. This indicates the type that the function returns in a particular condition.

  • pass statements, which do nothing.

  • Calls to show_error(), which cause the type checker to emit an error. These are discussed further below.

  • Calls to reveal_type(arg), where arg is one of the arguments to the type evaluation function. These cause the type checker to emit a message showing the current type of arg. This is a debugging feature.

Conditions in if statements may contain:

  • A call to one of the following functions, which are covered in more detail below:

    • is_provided(), which returns whether a parameter was explicitly provided in a call.

    • is_positional(), which returns whether a parameter was provided through a positional argument.

    • is_keyword(), which returns whether a parameter was provided through a keyword argument.

    • is_of_type(), which returns whether a parameter is of a particular type.

  • Expressions of the form arg <op> <constant>, where <op> is one of is, is not, ==, or !=. This is equivalent to (not) is_of_type(arg, Literal[<constant>], exclude_any=True). <constant> may be any value that is valid inside Literal (None, a string, a bool, an int, or an enum member).

  • Version and platform checks that are otherwise valid in stubs, as specified in PEP 484.

  • Multiple conditions combined with and or or.

  • A negation of another condition with not.

show_error()

The show_error() special function has the following signature:

def show_error(message: str, /, *, argument: Any | None = ...): ...

The message parameter must be a string literal. Calls to this function cause the type checker to emit an error that includes the given message. Execution continues past the show_error() call as normal.

If the argument parameter is provided, it must be one of the parameters to the function, indicating the parameter that is causing the error. The type checker may use this information to produce a more precise error (for example, by pointing the error caret at the specified argument in the call site).

is_provided(), is_positional(), and is_keyword()

These special functions have the following signatures:

def is_provided(arg: Any, /) -> bool: ...
def is_positional(arg: Any, /) -> bool: ...
def is_keyword(arg: Any, /) -> bool: ...

arg must be one of the parameters to the function. is_provided() returns True if the parameter was explicitly provided in the call; that is, the default value was not used. Similarly, is_positional() returns True if the parameter was provided as a positional argument, and is_keyword() returns True if the parameter was provided as a keyword argument.

Parameters in Python can be provided in three ways, which we call argument kinds for the purpose of this specification:

  • POSITIONAL: at the call site, either a single positional argument or a variadic one (*args)

  • KEYWORD: at the call site, either a single keyword argument or a variadic one (**kwargs)

  • DEFAULT: no value provided at the call site; the default defined in the function is used

Static analyzers must add a fourth kind in the presence of calls with *args and **kwargs:

  • UNKNOWN: the kind cannot be statically determined. This can happen in the following situations:

    • A positional-only parameter with a default in a call with *args of unknown size.

    • A keyword-only parameter with a default in a call with **kwargs of unknown size.

    • A positional-or-keyword parameter that matches either of the above conditions.

    • A positional-or-keyword parameter (with or without a default) in a call with both *args and **kwargs.

The three special functions map to these kinds as follows:

  • is_provided(): kind is POSITIONAL or KEYWORD

  • is_positional(): kind is POSITIONAL

  • is_keyword(): kind is KEYWORD

Thus, there is no way to distinguish between DEFAULT and UNKNOWN, and a parameter for which is_provided() returns False in the type evaluator may actually be provided at runtime.
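
The mapping from argument kinds to the three special functions can be sketched as follows. The ArgKind enum is a hypothetical name introduced for illustration, and the functions here take a kind directly instead of a parameter; a real type checker would derive the kind from the call site.

```python
from enum import Enum


class ArgKind(Enum):
    # Hypothetical names mirroring the kinds defined in this section.
    POSITIONAL = "positional"
    KEYWORD = "keyword"
    DEFAULT = "default"
    UNKNOWN = "unknown"


def is_provided(kind: ArgKind) -> bool:
    # True only when the argument is known to have been passed.
    return kind in (ArgKind.POSITIONAL, ArgKind.KEYWORD)


def is_positional(kind: ArgKind) -> bool:
    return kind is ArgKind.POSITIONAL


def is_keyword(kind: ArgKind) -> bool:
    return kind is ArgKind.KEYWORD


# DEFAULT and UNKNOWN give identical answers for all three functions,
# so an evaluation function cannot tell them apart.
for kind in (ArgKind.DEFAULT, ArgKind.UNKNOWN):
    assert not (is_provided(kind) or is_positional(kind) or is_keyword(kind))
```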

For variadic parameters (*args and **kwargs), the kind is DEFAULT if no arguments are provided to the parameter; otherwise, if arguments may be provided, it is POSITIONAL (for *args) or KEYWORD (for **kwargs). If the type checker can prove that a variadic argument is empty, is_provided() may return False. (For example, given a definition def f(*args) and a call f(*()), is_provided(args) may return False.)

Examples:

@evaluated
def reject_arg(arg: int = 0) -> None:
    if is_provided(arg):
        show_error("error")

args: Any = ...
kwargs: Any = ...
reject_arg()  # ok
reject_arg(0)  # error
reject_arg(arg=0)  # error
reject_arg(*args)  # ok
reject_arg(**kwargs)  # ok

@evaluated
def reject_star_args(*args: int) -> None:
    if is_provided(args):
        show_error("error")

reject_star_args()  # ok
reject_star_args(1)  # error
reject_star_args(*(1,))  # error
reject_star_args(*())  # may error, depending on type checker

@evaluated
def reject_star_kwargs(**kwargs: int) -> None:
    if is_provided(kwargs):
        show_error("error")

reject_star_kwargs()  # ok
reject_star_kwargs(x=1)  # error
reject_star_kwargs(**{"x": 1})  # error
reject_star_kwargs(**{})  # may error, depending on type checker

@evaluated
def reject_keyword(arg: int = 0) -> None:
    if is_keyword(arg):
        show_error("error")

reject_keyword()  # ok
reject_keyword(0)  # ok
reject_keyword(arg=0)  # error
reject_keyword(*args)  # ok
reject_keyword(**kwargs)  # ok

@evaluated
def reject_positional(arg: int = 0) -> None:
    if is_positional(arg):
        show_error("error")

reject_positional()  # ok
reject_positional(0)  # error
reject_positional(arg=0)  # ok
reject_positional(*args)  # ok
reject_positional(**kwargs)  # ok

@evaluated
def invalid(arg: object) -> None:
    if is_provided(x):  # error, not a function parameter
        show_error("error")

is_of_type()

The special is_of_type() function has the following signature:

def is_of_type(arg: object, type: Any, /, *, exclude_any: bool = True) -> bool: ...

arg must be one of the parameters to the function and type must be a form that the type checker would accept in a type annotation.

If exclude_any is False, is_of_type(x, T) returns True if x is compatible with T; that is, if the type checker would accept an assignment _: T = x.

If the exclude_any parameter is True (the default), normal type checking rules are modified so that Any is no longer compatible with any other type, but only with another Any. All other types are still compatible with Any.

Examples:

@evaluated
def length_or_none(s: str | None = None):
    if is_of_type(s, str, exclude_any=False):
        return int
    else:
        return None

any: Any = ...
opt: str | None = ...
reveal_type(length_or_none("x"))  # int
reveal_type(length_or_none(None))  # None
reveal_type(length_or_none(opt))  # int | None
reveal_type(length_or_none(any))  # int

@evaluated
def length_or_none2(s: str | None):
    if is_of_type(s, str):
        return int
    elif is_of_type(s, None):
        return None
    else:
        return Any

reveal_type(length_or_none2("x"))  # int
reveal_type(length_or_none2(None))  # None
reveal_type(length_or_none2(opt))  # int | None
reveal_type(length_or_none2(any))  # Any

@evaluated
def nested_any(s: Sequence[Any]):
    if is_of_type(s, str):
        show_error("error")
    elif is_of_type(s, Sequence[str]):
        return str
    else:
        return int

anyseq: Sequence[Any] = ...
nested_any("x")  # error
reveal_type(nested_any(["x"]))  # str
reveal_type(nested_any([1]))  # int
reveal_type(nested_any(any))  # int
reveal_type(nested_any(anyseq))  # int
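
The effect of exclude_any can be illustrated with a toy model in plain Python. This is a sketch under simplifying assumptions, not how any type checker implements it: types are ordinary classes, compatibility is reduced to issubclass(), and the ANY sentinel is a hypothetical stand-in for typing.Any. Nested cases such as Sequence[Any] are not modeled.

```python
class AnyType:
    """Hypothetical stand-in for typing.Any in this toy model."""


ANY = AnyType()


def is_of_type(arg_type, target, *, exclude_any=True):
    # arg_type and target are either plain classes or the ANY sentinel.
    if target is ANY:
        # Every type, including Any itself, is still compatible with Any.
        return True
    if arg_type is ANY:
        # Normally Any is compatible with everything; with exclude_any
        # it is compatible only with another Any.
        return not exclude_any
    return issubclass(arg_type, target)


print(is_of_type(ANY, str))                     # strict default
print(is_of_type(ANY, str, exclude_any=False))  # normal compatibility rules
```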

Interaction with unions

Type checkers should apply normal type narrowing rules to arguments that are of Union types. If only some members of a Union match a condition, both branches of the conditional are taken, with the parameter type narrowed appropriately in each case. The return type of the function is the union of the two branches.

For example:

@evaluated
def switch_types(arg: str | int):
    if is_of_type(arg, str):
        return int
    else:
        return str

reveal_type(switch_types(1))  # str
reveal_type(switch_types("x"))  # int
union: int | str
reveal_type(switch_types(union))  # int | str
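
One way to picture the union rule is to evaluate the function once per union member and join the results. The sketch below does this for switch_types() using ordinary classes; the helper names switch_types_member and evaluate_union are hypothetical, and a union is modeled simply as a frozenset of classes.

```python
from typing import FrozenSet


def switch_types_member(arg_type: type) -> type:
    # Mirrors the body of switch_types() for a single, non-union type.
    if issubclass(arg_type, str):
        return int
    return str


def evaluate_union(members: FrozenSet[type]) -> FrozenSet[type]:
    # Each member takes whichever branch it matches; the call's return
    # type is the union of the types returned by those branches.
    return frozenset(switch_types_member(member) for member in members)


print(evaluate_union(frozenset({int, str})) == frozenset({int, str}))
```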

Generic evaluators

If any type variables appear in the parameters of the type evaluation function, the type checker should first solve those and use the solution in the body of the function:

@evaluated
def identity(x: T):
    return T

reveal_type(identity(int()))  # int

As a result, is_of_type() checks that use a type variable work:

@evaluated
def safe_upcast(typ: Type[T1], value: object):
    if is_of_type(value, T1):
        return T1
    show_error("unsafe cast")
    return Any

reveal_type(safe_upcast(object, 1))  # object
reveal_type(safe_upcast(int, 1))  # int
safe_upcast(str, 1)  # error

Type compatibility

The type of an evaluated function is compatible with a Callable with the same arguments and returning the Union of the possible return types, and with any Callable for which the evaluation function would return a compatible type given the same arguments.

Examples:

@evaluated
def maybe_path(path: str | None):
    if path is None:
        return None
    else:
        return Path

_: Callable[[str | None], Path | None] = maybe_path  # ok
_: Callable[[None], None] = maybe_path  # ok
_: Callable[[str], Path] = maybe_path  # ok
_: Callable[[str | None], Path] = maybe_path  # error
_: Callable[[str], Path | None] = maybe_path  # ok
_: Callable[[Literal["x"]], Path] = maybe_path  # ok
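
The compatibility rule can be checked mechanically in a simplified model. In the sketch below a type is a set of atomic type names, so unions are set unions and compatibility is the subset relation; Literal types and subclassing are deliberately left out. The names ACCEPTED, maybe_path_result, and compatible_with_callable are hypothetical.

```python
from typing import FrozenSet

Types = FrozenSet[str]  # a type is modeled as a set of atomic type names

ACCEPTED: Types = frozenset({"str", "None"})  # annotation of maybe_path's parameter


def maybe_path_result(arg: Types) -> Types:
    # Evaluate maybe_path() for every member of the argument type.
    return frozenset("None" if member == "None" else "Path" for member in arg)


def compatible_with_callable(param: Types, ret: Types) -> bool:
    # The Callable's parameter must be acceptable to the evaluator, and the
    # type the evaluator returns for it must fit the Callable's return type.
    return param <= ACCEPTED and maybe_path_result(param) <= ret


# Callable[[str | None], Path] is rejected because None may be returned.
print(compatible_with_callable(frozenset({"str", "None"}), frozenset({"Path"})))
```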

Runtime behavior

At runtime, the @evaluated decorator returns a dummy function that raises an error when called, similar to @overload. In order to support dynamic type checkers, it also stores the original function, keyed by its fully qualified name.

A helper function is provided to retrieve all registered evaluation functions for a given fully qualified name:

def get_type_evaluations(
    fully_qualified_name: str
) -> Sequence[Callable[..., Any]]: ...

For example, if method B.c in module a has an evaluation function, get_type_evaluations("a.B.c") will retrieve it.

Dummy implementations are provided for the various helper functions (is_provided(), is_positional(), is_keyword(), is_of_type(), and show_error()). These raise an error if called at runtime.

The reveal_type() function has a runtime implementation that simply returns its argument.
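
A minimal sketch of the runtime behavior described in this section might look like the following. The _REGISTRY dictionary, the choice of NotImplementedError, and the use of __module__ plus __qualname__ to build the key are assumptions made for illustration; the specification only requires that the original function be stored under its fully qualified name.

```python
from typing import Any, Callable, Dict, List, Sequence

# Hypothetical storage: fully qualified name -> registered evaluation functions.
_REGISTRY: Dict[str, List[Callable[..., Any]]] = {}


def evaluated(func: Callable[..., Any]) -> Callable[..., Any]:
    key = f"{func.__module__}.{func.__qualname__}"
    _REGISTRY.setdefault(key, []).append(func)

    def dummy(*args: Any, **kwargs: Any) -> Any:
        # Like @overload, the decorated function is not meant to be called.
        raise NotImplementedError(f"{key} is a type evaluation function")

    return dummy


def get_type_evaluations(fully_qualified_name: str) -> Sequence[Callable[..., Any]]:
    return _REGISTRY.get(fully_qualified_name, [])


@evaluated
def example(x):  # the body is never executed at runtime
    return int
```

A dynamic type checker could then look up the stored function by name and interpret its body itself.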

Discussion

Interaction with Any

Below is an evaluation function for a simplified version of the open() builtin:

@evaluated
def open(mode: str):
    if is_of_type(mode, Literal["r", "w"]):
        return TextIO
    elif is_of_type(mode, Literal["rb", "wb"]):
        return BinaryIO
    else:
        return IO[Any]

What should open() return if the type of the mode argument is Any? With the equivalent code expressed using overloads, existing type checkers do not agree: pyright picks the first overload that matches and returns TextIO, since Any is compatible with Literal["r", "w"]; mypy and pyanalyze see that multiple overloads might match and return Any. There are good reasons for both choices, as discussed by Eric Traut. In particular, mypy’s behavior is sounder for a type checker, but pyright’s behavior helps generate better autocompletion suggestions in a language server.

Type evaluation functions potentially have the same ambiguity, so in order to provide predictable behavior across type checkers, we need to specify a single behavior.

As specified above, our choice is to treat Any specially by default within evaluation functions, making it incompatible with other types, both within is/== comparisons and within the is_of_type primitive. This behavior makes it easiest to write evaluation functions that read naturally and behave as desired. In particular, this choice makes open(Any) return IO[Any], which is both the most intuitive and the most useful result.

The most natural alternative is to make is_of_type() follow normal type compatibility rules, where Any is compatible with everything. But this would create confusing behavior for evaluation functions like the one for open():

  • open() would return TextIO if mode is Any, which is too precise in general.

  • The order of the BinaryIO and TextIO checks would matter: the function would behave differently if the two checks were flipped. This would be a subtle behavior that is not obvious to readers of the code.

  • There would be no obvious way to provide a customized fallback behavior for Any. Technically, a check like is_of_type(mode, Literal["r"]) and is_of_type(mode, Literal["w"]) could be used to check for Any (only Any is compatible with both literals), but this would be obscure and unreadable.

  • It would be difficult to show an error for a particular parameter value. For example, a stub for open() might want to show a warning if the deprecated rU mode is used. The obvious way to do that would be to write if mode == "rU": show_error(...), but if this returned true for Any, we would show the error for mode: Any.

As an additional example, consider functions that take some object or None and return either None or a transformed version of the object, like this:

@evaluated
def maybe_path(path: str | None):
    if path is None:
        return None
    else:
        return Path

Functions of this form are fairly common, and it is natural to write them with the trivial branch (None) first, both in the implementation and in the evaluation function. But if path is None would be true for Any, the evaluation function would return None, which is bad both for type checkers and for autocomplete suggestions.

One downside of this behavior is that type checkers may incorrectly flag is None checks after a maybe_path() call as unreachable. However, such checks are usually only enabled in a strict mode, and Any should be rare in strictly typed code. Type checkers could also provide a mechanism that labels types derived from an evaluation function that used Any to disable diagnostics about unreachable code.

Another alternative would be to use a mechanism similar to mypy-style overload resolution: conditions that match due to Any would essentially match neither branch and simply return Any. This behavior would avoid returning any overly precise types, but it would be useless for autocompletion suggestions and would remove a lot of useful type precision. For example, there would be no way for the open() evaluation function to produce IO[Any].

Argument kind functions

The three argument kind functions is_provided(), is_positional(), and is_keyword() are useful in various ways:

  • Functions implemented in C sometimes change behavior depending on the presence of an argument, without a meaningful default. For example, dict.pop(key) returns the key’s value type (or else it raises an exception), but dict.pop(key, default) returns either the value type or the type of default. Currently overloads are necessary to represent this behavior, but is_provided() provides an alternative.

  • It is common for new versions of Python to add or remove parameters. For example, zip() gained a strict= keyword argument in Python 3.10. Using is_provided() with a sys.version_info check, we can provide an error if the parameter is used in an older version, without duplicating the entire function definition.

  • Similarly, new versions of Python often change parameters from positional-or-keyword to positional-only or vice versa. Version checks can be used with is_positional() or is_keyword() to reflect such changes in the stub.

  • Library authors who want to evolve an API sometimes want to make a function parameter keyword-only. An evaluation function can be used to warn users who pass the parameter positionally without changing the runtime parameter kind, so that users have time to adapt before the runtime code is broken.

As an example, this is the current implementation of sum() in typeshed:

if sys.version_info >= (3, 8):
    @overload
    def sum(__iterable: Iterable[_T]) -> _T | Literal[0]: ...
    @overload
    def sum(__iterable: Iterable[_T], start: _S) -> _T | _S: ...

else:
    @overload
    def sum(__iterable: Iterable[_T]) -> _T | Literal[0]: ...
    @overload
    def sum(__iterable: Iterable[_T], __start: _S) -> _T | _S: ...

This is how it could be implemented using @evaluated:

@evaluated
def sum(__iterable: Iterable[_T], start: _S = ...):
    if not is_provided(start):
        return _T | Literal[0]
    if sys.version_info < (3, 8) and is_keyword(start):
        show_error("start is a positional-only argument in Python <3.8", argument=start)
    return _T | _S

Generic evaluators

The specification for generic evaluators allows creating an evaluator that checks whether two types have any overlap:

T1 = TypeVar("T1")
T2 = TypeVar("T2")

@evaluated
def safe_contains(elt: T1, container: Container[T2]) -> bool:
    if not is_of_type(elt, T2) and not is_of_type(container, Container[T1]):
        show_error("Element cannot be a member of container")

lst: List[int]
safe_contains("x", lst)  # error
safe_contains(True, lst)  # ok (bool is a subclass of int)
safe_contains(object(), lst)  # ok (List[int] is a subclass of Container[object])

Thus, type evaluation provides a way to implement checks similar to mypy’s strict equality flag directly in stubs.

Compatibility

The proposal is fully backward compatible.

Type evaluation functions are going to be most frequently useful in library stubs, where it is often important that multiple type checkers can parse the stub. In order to unblock usage of the new feature in stubs, type checker authors could simply ignore the body of evaluation functions and rely on the signature. This would still allow other type checkers to fully use the evaluation function.

Possible extensions

The following features may be useful, but are deferred for now for simplicity.

Error categories

It may be useful to provide hints to the type checker about the severity of a show_error() call. For example, deprecation warnings could be marked so that the user can control whether to show them.

One possibility is to add a keyword-only argument category: str = ... to show_error(). We would specify some standard categories that can be used in typeshed:

  • deprecation (for deprecated behavior)

  • python_version (for wrong Python version)

  • platform (for wrong sys.platform)

  • warning (for miscellaneous non-blocking issues)

Type checkers could add support for additional categories as desired. Other type checkers would be expected to silently ignore unrecognized category strings.

Reusable error messages

Because show_error() requires a string literal as the message, typeshed would contain a lot of hardcoded string messages about version changes.

Some possible solutions include:

  • Allow the message to be a variable of Literal type instead of a string literal. However, this would not allow customizing an error message to include e.g. the name of the argument or the Python version when some behavior changed.

  • Allow the message to be a call to .format() on a string literal or Literal variable, where all the arguments are function arguments or literals: show_error(NEW_IN_VERSION.format(arg, "3.10")).

  • Allow the message to be a call to another evaluation function that returns a string literal instead of a type. This would allow even more complex logic for emitting the error message.

The last option could look like this:

@evaluated
def added_in_py_version(feature: str, version: str):
    return f"{feature} was added in Python {version}"

@evaluated
def zip(strict: bool = False):
    if is_provided(strict) and sys.version_info < (3, 10):
        show_error(
            added_in_py_version("strict", "3.10"),
            argument=strict
        )

Adding attributes

A common pattern in type checker plugins is for the plugin to add some extra attribute to the object. For example, @functools.total_ordering inserts various dunder methods into the class it decorates.

We could add an add_attributes() primitive that, given a type and a dictionary of attributes, modifies the type to add these attributes.

Usage could look like this:

@evaluated
def total_ordering(cls: Type[T]):
    return add_attributes(
        cls,
        {"__eq__": Callable[[T, T], bool]}
    )

Status

A partial implementation of this feature is available in pyanalyze:

from typing import Union

from pyanalyze.extensions import evaluated, is_provided

@evaluated
def simple_evaluated(x: int, y: str = ""):
    if is_provided(y):
        return int
    else:
        return str

def simple_evaluated(*args: object) -> Union[int, str]:
    if len(args) >= 2:
        return 1
    else:
        return "x"

Currently unsupported features include:

  • Type compatibility for evaluated functions.

  • Overloaded evaluated functions.

Areas that need more thought include:

  • Interaction with overloads. It should be possible to register multiple evaluation functions for a function, treating them as overloads.

  • Interaction with __init__ and self types. How does an eval function set the self type of a function? Perhaps we can have the return type have special meaning just for __init__ methods.