# Type evaluation

Type evaluation is a mechanism for replacing complex overloads and version checks. It provides a restricted subset of Python that can be executed by type checkers to customize the behavior of a particular function.

## Motivation

Consider the definition of `round()` in typeshed:

    @overload
    def round(number: SupportsRound[Any]) -> int: ...
    @overload
    def round(number: SupportsRound[Any], ndigits: None) -> int: ...
    @overload
    def round(number: SupportsRound[_T], ndigits: SupportsIndex) -> _T: ...

With type evaluation, this could instead be written as:

    @evaluated
    def round(number: SupportsRound[_T], ndigits: SupportsIndex | None = None):
        if ndigits is None:
            return int
        else:
            return _T

This makes it easier to see at a glance what the difference is between various overloads.

Other features of type evaluation, as proposed here, include customizable error messages, branching on the type of an argument, and branching on whether an argument was provided as a positional or keyword argument.

Type evaluation functions can replace most complex overloads with simpler, more readable code. They solve a number of problems:

- Type evaluation functions provide ways to implement several type system features that have been previously requested, including:
  - Marking a function, parameter, or parameter type as deprecated.
  - Accepting `Sequence[str]` but not `str`.
  - Checking whether two generic arguments are overlapping.
- Error messages involving overloads are often hard to read. Type evaluation functions enable the author of a function to provide custom error messages that clearly point out the issue in the user's code.
- Complex overloads can be difficult to understand and write. Type evaluation functions provide a more natural interface that is closer to how such functions are written at runtime.
- The precise behavior of overloads is not specified and varies across type checkers. The behavior of type evaluation functions is more precisely specified.

## Specification

This section specifies how type evaluation works, without commentary. Discussion, with motivating use cases, is provided in the "Discussion" section below. The examples in this section are meant to clarify the semantics only.

Type-evaluated functions may be declared both at runtime and in stub files, similar to the existing `@overload` mechanism. At runtime, the evaluated function must immediately precede the function implementation:

    @evaluated
    def round(number: SupportsRound[_T], ndigits: SupportsIndex | None = None):
        if ...

    def round(number, ndigits=None):
        return number.__round__(ndigits)

In stubs, the implementation is omitted.

When a type checker encounters a call to a function for which a type evaluation has been provided, it should do the following:

- Validate that the arguments to the call are compatible with the type annotations on the parameters to the evaluation function, as with a normal call.
- Symbolically evaluate the body of the type evaluation until it reaches a `return` statement, which provides the type that the call should return. During this symbolic evaluation, each argument is set to the value it has at the call site that is being evaluated.
- If execution reached a `return` statement, return the type provided by that statement. Otherwise, return the type set in the evaluation function's return annotation, or `Any` if there is no return annotation.

Type checkers are encouraged to provide a strictness option that produces an error if an evaluation function is missing a type annotation on a parameter or return type. However, no error should be provided if the return annotation is missing and all branches (including error branches) return a type.

The default value of a parameter to an evaluation function may be either `...` or any value that is valid inside `Literal[...]`. If an argument with default `X` is not provided in a call, the type of the argument within the evaluation function is `Literal[X]`.
If the default is `...`, the type is the parameter's annotation instead.

Simple examples to demonstrate the semantics:

    @evaluated
    def always_returns(x: int):
        return str

    always_returns("x")  # error: "x" is not an int
    always_returns()  # error: not enough arguments
    reveal_type(always_returns(1))  # str

    @evaluated
    def always_errors(x: int):
        show_error("error")

    x = always_errors(1)  # error
    reveal_type(x)  # Any

    @evaluated
    def always_errors_with_type(x: int) -> str:
        show_error("error")

    x = always_errors_with_type(1)  # error
    reveal_type(x)  # str

    @evaluated
    def with_defaults(x: int = ..., y: int = 1) -> None:
        reveal_type(x)
        reveal_type(y)

    with_defaults()  # x is "int", y is "Literal[1]"
    with_defaults(1)  # x and y are both "Literal[1]"

### Supported features

The body of a type evaluation uses a restricted subset of Python. The only supported features are:

- `if` statements and `else` blocks. These can only contain conditions of the form specified below.
- `return` statements with return values that are interpretable as type annotations. This indicates the type that the function returns in a particular condition.
- `pass` statements, which do nothing.
- Calls to `show_error()`, which cause the type checker to emit an error. These are discussed further below.
- Calls to `reveal_type(arg)`, where `arg` is one of the arguments to the type evaluation function. These cause the type checker to emit a message showing the current type of `arg`. This is a debugging feature.

Conditions in `if` statements may contain:

- A call to one of the following functions, which are covered in more detail below:
  - `is_provided()`, which returns whether a parameter was explicitly provided in a call.
  - `is_positional()`, which returns whether a parameter was provided through a positional argument.
  - `is_keyword()`, which returns whether a parameter was provided through a keyword argument.
  - `is_of_type()`, which returns whether a parameter is of a particular type.
- Expressions of the form `arg <op> <constant>`, where `<op>` is one of `is`, `is not`, `==`, or `!=`. This is equivalent to `(not) is_of_type(arg, Literal[<constant>], exclude_any=True)`. `<constant>` may be any value that is valid inside `Literal` (`None`, a string, a bool, an int, or an enum member).
- Version and platform checks that are otherwise valid in stubs, as specified in PEP 484.
- Multiple conditions combined with `and` or `or`.
- A negation of another condition with `not`.

### show_error()

The `show_error()` special function has the following signature:

    def show_error(message: str, /, *, argument: Any | None = ...): ...

The `message` parameter must be a string literal. Calls to this function cause the type checker to emit an error that includes the given message. Execution continues past the `show_error()` call as normal.

If the `argument` parameter is provided, it must be one of the parameters to the function, indicating the parameter that is causing the error. The type checker may use this information to produce a more precise error (for example, by pointing the error caret at the specified argument in the call site).

### is_provided(), is_positional(), and is_keyword()

These special functions have the following signatures:

    def is_provided(arg: Any, /) -> bool: ...
    def is_positional(arg: Any, /) -> bool: ...
    def is_keyword(arg: Any, /) -> bool: ...

`arg` must be one of the parameters to the function. `is_provided()` returns True if the parameter was explicitly provided in the call; that is, the default value was not used. Similarly, `is_positional()` returns True if the parameter was provided as a positional argument, and `is_keyword()` returns True if the parameter was provided as a keyword argument.
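
For a single concrete call, the three predicates can be pictured with a small runtime model. The sketch below is illustrative only: `argument_sources()` and `example()` are hypothetical names that are not part of this proposal, and the model relies on `inspect.signature` to work out whether each parameter is filled positionally, by keyword, or by its default.

```python
import inspect
from typing import Any, Callable, Dict


def argument_sources(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Dict[str, str]:
    """Hypothetical helper: report how each parameter of func is filled by this call."""
    signature = inspect.signature(func)
    signature.bind(*args, **kwargs)  # raises TypeError if the call itself is invalid

    sources: Dict[str, str] = {}
    unused_positional = len(args)
    unused_keywords = set(kwargs)
    for name, param in signature.parameters.items():
        if param.kind is inspect.Parameter.VAR_POSITIONAL:
            # *args counts as provided only if positional arguments are left over.
            sources[name] = "positional" if unused_positional else "default"
            unused_positional = 0
        elif param.kind is inspect.Parameter.VAR_KEYWORD:
            # **kwargs counts as provided only if keyword arguments are left over.
            sources[name] = "keyword" if unused_keywords else "default"
        elif param.kind is not inspect.Parameter.KEYWORD_ONLY and unused_positional:
            sources[name] = "positional"
            unused_positional -= 1
        elif param.kind is not inspect.Parameter.POSITIONAL_ONLY and name in unused_keywords:
            sources[name] = "keyword"
            unused_keywords.discard(name)
        else:
            sources[name] = "default"
    return sources


def example(a: int, b: int = 0, *rest: int, c: int = 1, **extra: int) -> None:
    """Stand-in function used only to exercise the helper."""


print(argument_sources(example, 1))
print(argument_sources(example, 1, b=2))
print(argument_sources(example, 1, 2, 3, c=4, d=5))
```

In this model, `is_provided(x)` corresponds to a source other than `"default"`, `is_positional(x)` to `"positional"`, and `is_keyword(x)` to `"keyword"`. The second call reports `b` as `"keyword"`; the third reports `rest` as `"positional"` and `extra` as `"keyword"`. A type checker works from the call site rather than from runtime values, so it cannot always make this determination.
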

Parameters in Python can be provided in three ways, which we call _argument kinds_ for the purpose of this specification:

- `POSITIONAL`: at the call site, either a single positional argument or a variadic one (`*args`)
- `KEYWORD`: at the call site, either a single keyword argument or a variadic one (`**kwargs`)
- `DEFAULT`: no value provided at the call site; the default defined in the function is used

Static analyzers must add a fourth kind in the presence of calls with `*args` and `**kwargs`:

- `UNKNOWN`: the kind cannot be statically determined. This can happen in the following situations:
  - A positional-only parameter with a default in a call with `*args` of unknown size.
  - A keyword-only parameter with a default in a call with `**kwargs` of unknown size.
  - A positional-or-keyword parameter that matches either of the above conditions.
  - A positional-or-keyword parameter (with or without a default) in a call with both `*args` and `**kwargs`.

The three special functions map to these kinds as follows:

- `is_provided()`: kind is `POSITIONAL` or `KEYWORD`
- `is_positional()`: kind is `POSITIONAL`
- `is_keyword()`: kind is `KEYWORD`

Thus, there is no way to distinguish between `DEFAULT` and `UNKNOWN`, and a parameter for which `is_provided()` returns False in the type evaluator may actually be provided at runtime.

For variadic parameters (`*args` and `**kwargs`), the kind is either `DEFAULT` if no arguments are provided to the parameter, or either `POSITIONAL` (for `*args`) or `KEYWORD` (for `**kwargs`) if arguments may be provided. If the type checker can prove that a variadic argument is empty, `is_provided()` may return False. (For example, given a definition `def f(*args)` and a call `f(*())`, `is_provided(args)` may return False.)

Examples:

    @evaluated
    def reject_arg(arg: int = 0) -> None:
        if is_provided(arg):
            show_error("error")

    args: Any = ...
    kwargs: Any = ...
    reject_arg()  # ok
    reject_arg(0)  # error
    reject_arg(arg=0)  # error
    reject_arg(*args)  # ok
    reject_arg(**kwargs)  # ok

    @evaluated
    def reject_star_args(*args: int) -> None:
        if is_provided(args):
            show_error("error")

    reject_star_args()  # ok
    reject_star_args(1)  # error
    reject_star_args(*(1,))  # error
    reject_star_args(*())  # may error, depending on type checker

    @evaluated
    def reject_star_kwargs(**kwargs: int) -> None:
        if is_provided(kwargs):
            show_error("error")

    reject_star_kwargs()  # ok
    reject_star_kwargs(x=1)  # error
    reject_star_kwargs(**{"x": 1})  # error
    reject_star_kwargs(**{})  # may error, depending on type checker

    @evaluated
    def reject_keyword(arg: int = 0) -> None:
        if is_keyword(arg):
            show_error("error")

    reject_keyword()  # ok
    reject_keyword(0)  # ok
    reject_keyword(arg=0)  # error
    reject_keyword(*args)  # ok
    reject_keyword(**kwargs)  # ok

    @evaluated
    def reject_positional(arg: int = 0) -> None:
        if is_positional(arg):
            show_error("error")

    reject_positional()  # ok
    reject_positional(0)  # error
    reject_positional(arg=0)  # ok
    reject_positional(*args)  # ok
    reject_positional(**kwargs)  # ok

    @evaluated
    def invalid(arg: object) -> None:
        if is_provided(x):  # error, not a function parameter
            show_error("error")

### is_of_type()

The special `is_of_type()` function has the following signature:

    def is_of_type(arg: object, type: Any, /, *, exclude_any: bool = True) -> bool: ...

`arg` must be one of the parameters to the function and `type` must be a form that the type checker would accept in a type annotation.

If `exclude_any` is False, `is_of_type(x, T)` returns True if `x` is compatible with `T`; that is, if the type checker would accept an assignment `_: T = x`.

If the `exclude_any` parameter is True (the default), normal type checking rules are modified so that `Any` is no longer compatible with any other type, but only with another `Any`. All other types are still compatible with `Any`.
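
The effect of `exclude_any` can be illustrated with a toy model. This is only a sketch under simplifying assumptions: types are represented by plain Python classes plus a hypothetical `ANY` sentinel, compatibility is reduced to `issubclass()`, and `toy_is_of_type()` is an illustrative name rather than the proposed primitive.

```python
from typing import Union


class _AnyType:
    """Stand-in for typing.Any in this toy model."""

    def __repr__(self) -> str:
        return "ANY"


ANY = _AnyType()
ToyType = Union[type, _AnyType]


def toy_is_of_type(arg: ToyType, target: ToyType, *, exclude_any: bool = True) -> bool:
    """Toy compatibility check over plain classes plus ANY (hypothetical model)."""
    if target is ANY:
        # Every type, including ANY itself, is compatible with ANY.
        return True
    if arg is ANY:
        # Normal rules: ANY is compatible with everything.
        # With exclude_any=True: ANY is compatible only with ANY (handled above).
        return not exclude_any
    return issubclass(arg, target)


print(toy_is_of_type(bool, int))                    # True: bool is a subclass of int
print(toy_is_of_type(ANY, str))                     # False: ANY is excluded by default
print(toy_is_of_type(ANY, str, exclude_any=False))  # True: normal compatibility rules
print(toy_is_of_type(str, ANY))                     # True: anything is compatible with ANY
```
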

Examples:

    @evaluated
    def length_or_none(s: str | None = None):
        if is_of_type(s, str, exclude_any=False):
            return int
        else:
            return None

    any: Any = ...
    opt: int | None = ...
    reveal_type(length_or_none("x"))  # int
    reveal_type(length_or_none(None))  # None
    reveal_type(length_or_none(opt))  # int | None
    reveal_type(length_or_none(any))  # int

    @evaluated
    def length_or_none2(s: str | None):
        if is_of_type(s, str):
            return int
        elif is_of_type(s, None):
            return None
        else:
            return Any

    reveal_type(length_or_none2("x"))  # int
    reveal_type(length_or_none2(None))  # None
    reveal_type(length_or_none2(opt))  # int | None
    reveal_type(length_or_none2(any))  # Any

    @evaluated
    def nested_any(s: Sequence[Any]):
        if is_of_type(s, str):
            show_error("error")
        elif is_of_type(s, Sequence[str]):
            return str
        else:
            return int

    anyseq: Sequence[Any] = ...
    nested_any("x")  # error
    reveal_type(nested_any(["x"]))  # str
    reveal_type(nested_any([1]))  # int
    reveal_type(nested_any(any))  # int
    reveal_type(nested_any(anyseq))  # int

### Interaction with unions

Type checkers should apply normal type narrowing rules to arguments that are of Union types. If only some members of a Union match a condition, both branches of the conditional are taken, with the parameter type narrowed appropriately in each case. The return type of the function is the union of the two branches.
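
One way to picture this rule is to evaluate the function once per union member and collect the results. The sketch below assumes that each union member is a plain class and that narrowing amounts to handling each member separately; `evaluate_over_union()` and `swap_str_and_int()` are hypothetical names used only for illustration.

```python
from typing import Callable, FrozenSet, Iterable


def evaluate_over_union(
    evaluate_member: Callable[[type], type], members: Iterable[type]
) -> FrozenSet[type]:
    """Run the evaluator once per union member and collect the distinct results."""
    return frozenset(evaluate_member(member) for member in members)


def swap_str_and_int(arg: type) -> type:
    """Toy evaluator body: str arguments yield int, anything else yields str."""
    if issubclass(arg, str):
        return int
    else:
        return str


# A single-member "union" takes only one branch.
print(evaluate_over_union(swap_str_and_int, [int]) == frozenset({str}))            # True
# A two-member union takes both branches, and the results are combined.
print(evaluate_over_union(swap_str_and_int, [int, str]) == frozenset({int, str}))  # True
```
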

For example:

    @evaluated
    def switch_types(arg: str | int):
        if is_of_type(arg, str):
            return int
        else:
            return str

    reveal_type(switch_types(1))  # str
    reveal_type(switch_types("x"))  # int
    union: int | str
    reveal_type(switch_types(union))  # int | str

### Generic evaluators

If any type variables appear in the parameters of the type evaluation function, the type checker should first solve those and use the solution in the body of the function:

    @evaluated
    def identity(x: T):
        return T

    reveal_type(identity(int()))  # int

As a result, `is_of_type()` checks that use a type variable work:

    @evaluated
    def safe_upcast(typ: Type[T1], value: object):
        if is_of_type(value, T1):
            return T1
        show_error("unsafe cast")
        return Any

    reveal_type(safe_upcast(object, 1))  # object
    reveal_type(safe_upcast(int, 1))  # int
    safe_upcast(str, 1)  # error

### Type compatibility

The type of an evaluated function is compatible with a `Callable` with the same arguments and returning the `Union` of the possible return types, and with any `Callable` for which the evaluation function would return a compatible type given the same arguments.

Examples:

    @evaluated
    def maybe_path(path: str | None):
        if path is None:
            return None
        else:
            return Path

    _: Callable[[str | None], Path | None] = maybe_path  # ok
    _: Callable[[None], None] = maybe_path  # ok
    _: Callable[[str], Path] = maybe_path  # ok
    _: Callable[[str | None], Path] = maybe_path  # error
    _: Callable[[str], Path | None] = maybe_path  # ok
    _: Callable[[Literal["x"]], Path] = maybe_path  # ok

### Runtime behavior

At runtime, the `@evaluated` decorator returns a dummy function that raises an error when called, similar to `@overload`. In order to support dynamic type checkers, it also stores the original function, keyed by its fully qualified name.

A helper function is provided to retrieve all registered evaluation functions for a given fully qualified name:

    def get_type_evaluations(
        fully_qualified_name: str
    ) -> Sequence[Callable[..., Any]]: ...
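
The registration behavior described here could be sketched roughly as follows. This is not a reference implementation: the `_EVALUATIONS` registry, the way the key is built from `__module__` and `__qualname__`, and the choice of `NotImplementedError` are all assumptions made for illustration.

```python
import functools
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List, Sequence

# Hypothetical registry: fully qualified name -> registered evaluation functions.
_EVALUATIONS: DefaultDict[str, List[Callable[..., Any]]] = defaultdict(list)


def evaluated(func: Callable[..., Any]) -> Callable[..., Any]:
    """Store func under its fully qualified name and return a dummy that raises."""
    key = f"{func.__module__}.{func.__qualname__}"
    _EVALUATIONS[key].append(func)

    @functools.wraps(func)
    def dummy(*args: Any, **kwargs: Any) -> Any:
        raise NotImplementedError(f"{key} is a type evaluation function")

    return dummy


def get_type_evaluations(fully_qualified_name: str) -> Sequence[Callable[..., Any]]:
    """Return every evaluation function registered under the given name."""
    return tuple(_EVALUATIONS.get(fully_qualified_name, ()))


class B:
    @evaluated
    def c(self, x: int = 0):
        return int


# functools.wraps copies __module__ and __qualname__ onto the dummy.
name = f"{B.c.__module__}.{B.c.__qualname__}"
print(len(get_type_evaluations(name)))  # 1
```

Keeping a list per name leaves room for registering more than one evaluation function under the same name.
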

For example, if method `B.c` in module `a` has an evaluation function, `get_type_evaluations("a.B.c")` will retrieve it.

Dummy implementations are provided for the various helper functions (`is_provided()`, `is_positional()`, `is_keyword()`, `is_of_type()`, and `show_error()`). These raise an error if called at runtime.

The `reveal_type()` function has a runtime implementation that simply returns its argument.

## Discussion

### Interaction with Any

Below is an evaluation function for a simplified version of the `open()` builtin:

    @evaluated
    def open(mode: str):
        if is_of_type(mode, Literal["r", "w"]):
            return TextIO
        elif is_of_type(mode, Literal["rb", "wb"]):
            return BinaryIO
        else:
            return IO[Any]

What should `open()` return if the type of the `mode` argument is `Any`? With the equivalent code expressed using overloads, existing type checkers do not agree: pyright picks the first overload that matches and returns `TextIO`, since `Any` is compatible with `Literal["r", "w"]`; mypy and pyanalyze see that multiple overloads might match and return `Any`. There are good reasons for both choices, as discussed [here](https://github.com/microsoft/pyright/issues/2521#issuecomment-956823577) by Eric Traut. In particular, mypy's behavior is more sound for a type checker, but pyright's behavior helps generate better autocompletion suggestions in a language server.

Type evaluation functions potentially have the same ambiguity, so in order to provide predictable behavior across type checkers, we need to specify a single behavior.

As specified above, our choice is to treat `Any` specially by default within evaluation functions, making it incompatible with other types, both within `is`/`==` comparisons and within the `is_of_type` primitive. This behavior makes it easiest to write evaluation functions that read naturally and behave as desired. In particular, this choice makes `open(Any)` return `IO[Any]`, which is both the most intuitive and the most useful result.
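
The consequence of this choice for the simplified `open()` evaluator above can be traced with a small model. The sketch is illustrative only: the `ANY` and `PLAIN_STR` sentinels, `matches_literal()`, and `open_result()` are hypothetical names, argument types are modeled as either a literal string or one of the two sentinels, and result types are represented by their names.

```python
from typing import Tuple

ANY = object()        # stand-in for an argument whose type is Any
PLAIN_STR = object()  # stand-in for an argument whose type is a non-literal str


def matches_literal(mode: object, options: Tuple[str, ...], *, exclude_any: bool) -> bool:
    """Toy version of is_of_type(mode, Literal[...]) for this example only."""
    if mode is ANY:
        return not exclude_any
    return isinstance(mode, str) and mode in options


def open_result(mode: object, *, exclude_any: bool = True) -> str:
    """Mirror the branches of the simplified open() evaluator."""
    if matches_literal(mode, ("r", "w"), exclude_any=exclude_any):
        return "TextIO"
    elif matches_literal(mode, ("rb", "wb"), exclude_any=exclude_any):
        return "BinaryIO"
    else:
        return "IO[Any]"


print(open_result("r"))                     # TextIO
print(open_result("wb"))                    # BinaryIO
print(open_result(PLAIN_STR))               # IO[Any]
print(open_result(ANY))                     # IO[Any]
print(open_result(ANY, exclude_any=False))  # TextIO
```

With the default behavior, an `Any` argument falls through to the final branch; under normal compatibility rules it would be captured by whichever check happens to come first.
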

The most natural alternative is to make `is_of_type()` follow normal type compatibility rules, where `Any` is compatible with everything. But this would create confusing behavior for evaluation functions like the one for `open()`:

- `open()` would return `TextIO` if `mode` is `Any`, which is too precise in general.
- The order of the `BinaryIO` and `TextIO` checks would matter: the function would behave differently if the two checks were flipped. This would be a subtle behavior that is not obvious to readers of the code.
- There would be no obvious way to provide a customized fallback behavior for `Any`. Technically, a check like `is_of_type(mode, Literal["r"]) and is_of_type(mode, Literal["w"])` could be used to check for `Any` (only `Any` is compatible with both literals), but this would be obscure and unreadable.
- It would be difficult to show an error for a particular parameter value. For example, a stub for `open()` might want to show a warning if the deprecated `rU` mode is used. The obvious way to do that would be to write `if mode == "rU": show_error(...)`, but if this returned true for `Any`, we would show the error for `mode: Any`.

As an additional example, consider functions that take some object or `None` and return either `None` or a transformed version of the object, like this:

    @evaluated
    def maybe_path(path: str | None):
        if path is None:
            return None
        else:
            return Path

Functions of this form are fairly common, and it is natural to write them with the trivial branch (`None`) first, both in the implementation and in the evaluation function. But if `path is None` would be true for `Any`, the evaluation function would return `None`, which is bad both for type checkers and for autocomplete suggestions.

One downside of this behavior is that type checkers may incorrectly flag `is None` checks after a `maybe_path()` call as unreachable. However, such checks are usually only enabled in a strict mode, and `Any` should be rare in strictly typed code.
Type checkers could also provide a mechanism that labels types derived from an evaluation function that used `Any` to disable diagnostics about unreachable code.

Another alternative would be to use a mechanism similar to mypy-style overload resolution: conditions that match due to `Any` would essentially match neither branch and simply return `Any`. This behavior would avoid returning any overly precise types, but it would be useless for autocompletion suggestions and would remove a lot of useful type precision. For example, there would be no way for the `open()` evaluation function to produce `IO[Any]`.

### Argument kind functions

The three argument kind functions `is_provided()`, `is_positional()`, and `is_keyword()` are useful in various ways:

- Functions implemented in C sometimes change behavior depending on the presence of an argument, without a meaningful default. For example, `dict.pop(key)` returns the key's value type (or else it raises an exception), but `dict.pop(key, default)` returns either the value type or the type of `default`. Currently overloads are necessary to represent this behavior, but `is_provided()` provides an alternative.
- It is common for new versions of Python to add or remove parameters. For example, `zip()` gained a `strict=` keyword argument in Python 3.10. Using `is_provided()` with a `sys.version_info` check, we can provide an error if the parameter is used in an older version, without duplicating the entire function definition.
- Similarly, new versions of Python often change parameters from positional-or-keyword to positional-only or vice versa. Version checks can be used with `is_positional()` or `is_keyword()` to reflect such changes in the stub.
- Library authors who want to evolve an API sometimes want to make a function parameter keyword-only.
An evaluation function can be used to warn users who pass the parameter positionally without changing the runtime parameter kind, so that users have time to adapt before the runtime code is broken.

As an example, this is the current implementation of `sum()` in typeshed:

    if sys.version_info >= (3, 8):
        @overload
        def sum(__iterable: Iterable[_T]) -> _T | Literal[0]: ...
        @overload
        def sum(__iterable: Iterable[_T], start: _S) -> _T | _S: ...
    else:
        @overload
        def sum(__iterable: Iterable[_T]) -> _T | Literal[0]: ...
        @overload
        def sum(__iterable: Iterable[_T], __start: _S) -> _T | _S: ...

This is how it could be implemented using `@evaluated`:

    @evaluated
    def sum(__iterable: Iterable[_T], start: _S = ...):
        if not is_provided(start):
            return _T | Literal[0]
        if sys.version_info < (3, 8) and is_keyword(start):
            show_error("start is a positional-only argument in Python <3.8", argument=start)
        return _T | _S

### Generic evaluators

The specification for generic evaluators allows creating an evaluator that checks whether two types have any overlap:

    T1 = TypeVar("T1")
    T2 = TypeVar("T2")

    @evaluated
    def safe_contains(elt: T1, container: Container[T2]) -> bool:
        if not is_of_type(elt, T2) and not is_of_type(container, Container[T1]):
            show_error("Element cannot be a member of container")

    lst: List[int]
    safe_contains("x", lst)  # error
    safe_contains(True, lst)  # ok (bool is a subclass of int)
    safe_contains(object(), lst)  # ok (List[int] is a subclass of Container[object])

Thus, type evaluation provides a way to implement checks similar to mypy's [strict equality](https://mypy.readthedocs.io/en/stable/command_line.html#cmdoption-mypy-strict-equality) flag directly in stubs.

## Compatibility

The proposal is fully backward compatible.

Type evaluation functions are going to be most frequently useful in library stubs, where it is often important that multiple type checkers can parse the stub.
In order to unblock usage of the new feature in stubs, type checker authors could simply ignore the body of evaluation functions and rely on the signature. This would still allow other type checkers to fully use the evaluation function.

## Possible extensions

The following features may be useful, but are deferred for now for simplicity.

### Error categories

It may be useful to provide hints to the type checker about the severity of a `show_error()` call. For example, deprecation warnings could be marked so that the user can control whether to show them.

One possibility is to add a keyword-only argument `category: str = ...` to `show_error()`. We would specify some standard categories that can be used in typeshed:

- `deprecation` (for deprecated behavior)
- `python_version` (for wrong Python version)
- `platform` (for wrong sys.platform)
- `warning` (for miscellaneous non-blocking issues)

Type checkers could add support for additional categories as desired. Other type checkers would be expected to silently ignore unrecognized category strings.

### Reusable error messages

Because `show_error()` requires a string literal as the message, typeshed would contain a lot of hardcoded string messages about version changes. Some possible solutions include:

- Allow the message to be a variable of `Literal` type instead of a string literal. However, this would not allow customizing an error message to include e.g. the name of the argument or the Python version when some behavior changed.
- Allow the message to be a call to `.format()` on a string literal or `Literal` variable, where all the arguments are function arguments or literals: `show_error(NEW_IN_VERSION.format(arg, "3.10"))`.
- Allow the message to be a call to another evaluation function that returns a string literal instead of a type. This would allow even more complex logic for emitting the error message.

The last option could look like this:

    @evaluated
    def added_in_py_version(feature: str, version: str):
        return f"{feature} was added in Python {version}"

    @evaluated
    def zip(strict: bool = False):
        if is_provided(strict) and sys.version_info < (3, 10):
            show_error(
                added_in_py_version("strict", "3.10"),
                argument=strict
            )

### Adding attributes

A common pattern in type checker plugins is for the plugin to add some extra attribute to the object. For example, `@functools.total_ordering` inserts various dunder methods into the class it decorates.

We could add an `add_attributes()` primitive that, given a type and a dictionary of attributes, modifies the type to add these attributes.

Usage could look like this:

    @evaluated
    def total_ordering(cls: Type[T]):
        return add_attributes(
            cls,
            {"__eq__": Callable[[T, T], bool]}
        )

## Status

A partial implementation of this feature is available in pyanalyze:

    from typing import Union

    from pyanalyze.extensions import evaluated, is_provided

    @evaluated
    def simple_evaluated(x: int, y: str = ""):
        if is_provided(y):
            return int
        else:
            return str

    def simple_evaluated(*args: object) -> Union[int, str]:
        if len(args) >= 2:
            return 1
        else:
            return "x"

Currently unsupported features include:

- Type compatibility for evaluated functions.
- Overloaded evaluated functions.

Areas that need more thought include:

- Interaction with overloads. It should be possible to register multiple evaluation functions for a function, treating them as overloads.
- Interaction with `__init__` and self types. How does an eval function set the self type of a function? Perhaps we can have the return type have special meaning just for `__init__` methods.