PEP 465 – A dedicated infix operator for matrix multiplication
- Author:
- Nathaniel J. Smith <njs at pobox.com>
- Status:
- Final
- Type:
- Standards Track
- Created:
- 20-Feb-2014
- Python-Version:
- 3.5
- Post-History:
- 13-Mar-2014
- Resolution:
- Python-Dev message
Table of Contents
- Abstract
- Specification
- Motivation
- Executive summary
- Background: What’s wrong with the status quo?
- Why should matrix multiplication be infix?
- Transparent syntax is especially crucial for non-expert programmers
- But isn’t matrix multiplication a pretty niche requirement?
- So
@
is good for matrix formulas, but how common are those really? - But isn’t it weird to add an operator with no stdlib uses?
- Compatibility considerations
- Intended usage details
- Implementation details
- Rationale for specification details
- Rejected alternatives to adding a new operator
- Discussions of this PEP
- References
- Copyright
Abstract
This PEP proposes a new binary operator to be used for matrix
multiplication, called @
. (Mnemonic: @
is *
for
mATrices.)
Specification
A new binary operator is added to the Python language, together with the corresponding in-place version:
Op | Precedence/associativity | Methods |
---|---|---|
@ |
Same as * |
__matmul__ , __rmatmul__ |
@= |
n/a | __imatmul__ |
No implementations of these methods are added to the builtin or standard library types. However, a number of projects have reached consensus on the recommended semantics for these operations; see Intended usage details below for details.
For details on how this operator will be implemented in CPython, see Implementation details.
Motivation
Executive summary
In numerical code, there are two important operations which compete
for use of Python’s *
operator: elementwise multiplication, and
matrix multiplication. In the nearly twenty years since the Numeric
library was first proposed, there have been many attempts to resolve
this tension [13]; none have been really satisfactory.
Currently, most numerical Python code uses *
for elementwise
multiplication, and function/method syntax for matrix multiplication;
however, this leads to ugly and unreadable code in common
circumstances. The problem is bad enough that significant amounts of
code continue to use the opposite convention (which has the virtue of
producing ugly and unreadable code in different circumstances), and
this API fragmentation across codebases then creates yet more
problems. There does not seem to be any good solution to the
problem of designing a numerical API within current Python syntax –
only a landscape of options that are bad in different ways. The
minimal change to Python syntax which is sufficient to resolve these
problems is the addition of a single new infix operator for matrix
multiplication.
Matrix multiplication has a singular combination of features which distinguish it from other binary operations, which together provide a uniquely compelling case for the addition of a dedicated infix operator:
- Just as for the existing numerical operators, there exists a vast
body of prior art supporting the use of infix notation for matrix
multiplication across all fields of mathematics, science, and
engineering;
@
harmoniously fills a hole in Python’s existing operator system. @
greatly clarifies real-world code.@
provides a smoother onramp for less experienced users, who are particularly harmed by hard-to-read code and API fragmentation.@
benefits a substantial and growing portion of the Python user community.@
will be used frequently – in fact, evidence suggests it may be used more frequently than//
or the bitwise operators.@
allows the Python numerical community to reduce fragmentation, and finally standardize on a single consensus duck type for all numerical array objects.
Background: What’s wrong with the status quo?
When we crunch numbers on a computer, we usually have lots and lots of numbers to deal with. Trying to deal with them one at a time is cumbersome and slow – especially when using an interpreted language. Instead, we want the ability to write down simple operations that apply to large collections of numbers all at once. The n-dimensional array is the basic object that all popular numeric computing environments use to make this possible. Python has several libraries that provide such arrays, with numpy being at present the most prominent.
When working with n-dimensional arrays, there are two different ways we might want to define multiplication. One is elementwise multiplication:
[[1, 2], [[11, 12], [[1 * 11, 2 * 12],
[3, 4]] x [13, 14]] = [3 * 13, 4 * 14]]
and the other is matrix multiplication:
[[1, 2], [[11, 12], [[1 * 11 + 2 * 13, 1 * 12 + 2 * 14],
[3, 4]] x [13, 14]] = [3 * 11 + 4 * 13, 3 * 12 + 4 * 14]]
Elementwise multiplication is useful because it lets us easily and
quickly perform many multiplications on a large collection of values,
without writing a slow and cumbersome for
loop. And this works as
part of a very general schema: when using the array objects provided
by numpy or other numerical libraries, all Python operators work
elementwise on arrays of all dimensionalities. The result is that one
can write functions using straightforward code like a * b + c / d
,
treating the variables as if they were simple values, but then
immediately use this function to efficiently perform this calculation
on large collections of values, while keeping them organized using
whatever arbitrarily complex array layout works best for the problem
at hand.
Matrix multiplication is more of a special case. It’s only defined on 2d arrays (also known as “matrices”), and multiplication is the only operation that has an important “matrix” version – “matrix addition” is the same as elementwise addition; there is no such thing as “matrix bitwise-or” or “matrix floordiv”; “matrix division” and “matrix to-the-power-of” can be defined but are not very useful, etc. However, matrix multiplication is still used very heavily across all numerical application areas; mathematically, it’s one of the most fundamental operations there is.
Because Python syntax currently allows for only a single
multiplication operator *
, libraries providing array-like objects
must decide: either use *
for elementwise multiplication, or use
*
for matrix multiplication. And, unfortunately, it turns out
that when doing general-purpose number crunching, both operations are
used frequently, and there are major advantages to using infix rather
than function call syntax in both cases. Thus it is not at all clear
which convention is optimal, or even acceptable; often it varies on a
case-by-case basis.
Nonetheless, network effects mean that it is very important that we
pick just one convention. In numpy, for example, it is technically
possible to switch between the conventions, because numpy provides two
different types with different __mul__
methods. For
numpy.ndarray
objects, *
performs elementwise multiplication,
and matrix multiplication must use a function call (numpy.dot
).
For numpy.matrix
objects, *
performs matrix multiplication,
and elementwise multiplication requires function syntax. Writing code
using numpy.ndarray
works fine. Writing code using
numpy.matrix
also works fine. But trouble begins as soon as we
try to integrate these two pieces of code together. Code that expects
an ndarray
and gets a matrix
, or vice-versa, may crash or
return incorrect results. Keeping track of which functions expect
which types as inputs, and return which types as outputs, and then
converting back and forth all the time, is incredibly cumbersome and
impossible to get right at any scale. Functions that defensively try
to handle both types as input and DTRT, find themselves floundering
into a swamp of isinstance
and if
statements.
PEP 238 split /
into two operators: /
and //
. Imagine the
chaos that would have resulted if it had instead split int
into
two types: classic_int
, whose __div__
implemented floor
division, and new_int
, whose __div__
implemented true
division. This, in a more limited way, is the situation that Python
number-crunchers currently find themselves in.
In practice, the vast majority of projects have settled on the
convention of using *
for elementwise multiplication, and function
call syntax for matrix multiplication (e.g., using numpy.ndarray
instead of numpy.matrix
). This reduces the problems caused by API
fragmentation, but it doesn’t eliminate them. The strong desire to
use infix notation for matrix multiplication has caused a number of
specialized array libraries to continue to use the opposing convention
(e.g., scipy.sparse, pyoperators, pyviennacl) despite the problems
this causes, and numpy.matrix
itself still gets used in
introductory programming courses, often appears in StackOverflow
answers, and so forth. Well-written libraries thus must continue to
be prepared to deal with both types of objects, and, of course, are
also stuck using unpleasant funcall syntax for matrix multiplication.
After nearly two decades of trying, the numerical community has still
not found any way to resolve these problems within the constraints of
current Python syntax (see Rejected alternatives to adding a new
operator below).
This PEP proposes the minimum effective change to Python syntax that
will allow us to drain this swamp. It splits *
into two
operators, just as was done for /
: *
for elementwise
multiplication, and @
for matrix multiplication. (Why not the
reverse? Because this way is compatible with the existing consensus,
and because it gives us a consistent rule that all the built-in
numeric operators also apply in an elementwise manner to arrays; the
reverse convention would lead to more special cases.)
So that’s why matrix multiplication doesn’t and can’t just use *
.
Now, in the rest of this section, we’ll explain why it nonetheless
meets the high bar for adding a new operator.
Why should matrix multiplication be infix?
Right now, most numerical code in Python uses syntax like
numpy.dot(a, b)
or a.dot(b)
to perform matrix multiplication.
This obviously works, so why do people make such a fuss about it, even
to the point of creating API fragmentation and compatibility swamps?
Matrix multiplication shares two features with ordinary arithmetic operations like addition and multiplication on numbers: (a) it is used very heavily in numerical programs – often multiple times per line of code – and (b) it has an ancient and universally adopted tradition of being written using infix syntax. This is because, for typical formulas, this notation is dramatically more readable than any function call syntax. Here’s an example to demonstrate:
One of the most useful tools for testing a statistical hypothesis is the linear hypothesis test for OLS regression models. It doesn’t really matter what all those words I just said mean; if we find ourselves having to implement this thing, what we’ll do is look up some textbook or paper on it, and encounter many mathematical formulas that look like:
Here the various variables are all vectors or matrices (details for the curious: [5]).
Now we need to write code to perform this calculation. In current numpy, matrix multiplication can be performed using either the function or method call syntax. Neither provides a particularly readable translation of the formula:
import numpy as np
from numpy.linalg import inv, solve
# Using dot function:
S = np.dot((np.dot(H, beta) - r).T,
np.dot(inv(np.dot(np.dot(H, V), H.T)), np.dot(H, beta) - r))
# Using dot method:
S = (H.dot(beta) - r).T.dot(inv(H.dot(V).dot(H.T))).dot(H.dot(beta) - r)
With the @
operator, the direct translation of the above formula
becomes:
S = (H @ beta - r).T @ inv(H @ V @ H.T) @ (H @ beta - r)
Notice that there is now a transparent, 1-to-1 mapping between the symbols in the original formula and the code that implements it.
Of course, an experienced programmer will probably notice that this is
not the best way to compute this expression. The repeated computation
of Hβ − r should perhaps be factored out; and,
expressions of the form dot(inv(A), B)
should almost always be
replaced by the more numerically stable solve(A, B)
. When using
@
, performing these two refactorings gives us:
# Version 1 (as above)
S = (H @ beta - r).T @ inv(H @ V @ H.T) @ (H @ beta - r)
# Version 2
trans_coef = H @ beta - r
S = trans_coef.T @ inv(H @ V @ H.T) @ trans_coef
# Version 3
S = trans_coef.T @ solve(H @ V @ H.T, trans_coef)
Notice that when comparing between each pair of steps, it’s very easy to see exactly what was changed. If we apply the equivalent transformations to the code using the .dot method, then the changes are much harder to read out or verify for correctness:
# Version 1 (as above)
S = (H.dot(beta) - r).T.dot(inv(H.dot(V).dot(H.T))).dot(H.dot(beta) - r)
# Version 2
trans_coef = H.dot(beta) - r
S = trans_coef.T.dot(inv(H.dot(V).dot(H.T))).dot(trans_coef)
# Version 3
S = trans_coef.T.dot(solve(H.dot(V).dot(H.T)), trans_coef)
Readability counts! The statements using @
are shorter, contain
more whitespace, can be directly and easily compared both to each
other and to the textbook formula, and contain only meaningful
parentheses. This last point is particularly important for
readability: when using function-call syntax, the required parentheses
on every operation create visual clutter that makes it very difficult
to parse out the overall structure of the formula by eye, even for a
relatively simple formula like this one. Eyes are terrible at parsing
non-regular languages. I made and caught many errors while trying to
write out the ‘dot’ formulas above. I know they still contain at
least one error, maybe more. (Exercise: find it. Or them.) The
@
examples, by contrast, are not only correct, they’re obviously
correct at a glance.
If we are even more sophisticated programmers, and writing code that
we expect to be reused, then considerations of speed or numerical
accuracy might lead us to prefer some particular order of evaluation.
Because @
makes it possible to omit irrelevant parentheses, we can
be certain that if we do write something like (H @ V) @ H.T
,
then our readers will know that the parentheses must have been added
intentionally to accomplish some meaningful purpose. In the dot
examples, it’s impossible to know which nesting decisions are
important, and which are arbitrary.
Infix @
dramatically improves matrix code usability at all stages
of programmer interaction.
Transparent syntax is especially crucial for non-expert programmers
A large proportion of scientific code is written by people who are experts in their domain, but are not experts in programming. And there are many university courses run each year with titles like “Data analysis for social scientists” which assume no programming background, and teach some combination of mathematical techniques, introduction to programming, and the use of programming to implement these mathematical techniques, all within a 10-15 week period. These courses are more and more often being taught in Python rather than special-purpose languages like R or Matlab.
For these kinds of users, whose programming knowledge is fragile, the
existence of a transparent mapping between formulas and code often
means the difference between succeeding and failing to write that code
at all. This is so important that such classes often use the
numpy.matrix
type which defines *
to mean matrix
multiplication, even though this type is buggy and heavily
disrecommended by the rest of the numpy community for the
fragmentation that it causes. This pedagogical use case is, in fact,
the only reason numpy.matrix
remains a supported part of numpy.
Adding @
will benefit both beginning and advanced users with
better syntax; and furthermore, it will allow both groups to
standardize on the same notation from the start, providing a smoother
on-ramp to expertise.
But isn’t matrix multiplication a pretty niche requirement?
The world is full of continuous data, and computers are increasingly called upon to work with it in sophisticated ways. Arrays are the lingua franca of finance, machine learning, 3d graphics, computer vision, robotics, operations research, econometrics, meteorology, computational linguistics, recommendation systems, neuroscience, astronomy, bioinformatics (including genetics, cancer research, drug discovery, etc.), physics engines, quantum mechanics, geophysics, network analysis, and many other application areas. In most or all of these areas, Python is rapidly becoming a dominant player, in large part because of its ability to elegantly mix traditional discrete data structures (hash tables, strings, etc.) on an equal footing with modern numerical data types and algorithms.
We all live in our own little sub-communities, so some Python users may be surprised to realize the sheer extent to which Python is used for number crunching – especially since much of this particular sub-community’s activity occurs outside of traditional Python/FOSS channels. So, to give some rough idea of just how many numerical Python programmers are actually out there, here are two numbers: In 2013, there were 7 international conferences organized specifically on numerical Python [3] [4]. At PyCon 2014, ~20% of the tutorials appear to involve the use of matrices [6].
To quantify this further, we used Github’s “search” function to look
at what modules are actually imported across a wide range of
real-world code (i.e., all the code on Github). We checked for
imports of several popular stdlib modules, a variety of numerically
oriented modules, and various other extremely high-profile modules
like django and lxml (the latter of which is the #1 most downloaded
package on PyPI). Starred lines indicate packages which export
array- or matrix-like objects which will adopt @
if this PEP is
approved:
Count of Python source files on Github matching given search terms
(as of 2014-04-10, ~21:00 UTC)
================ ========== =============== ======= ===========
module "import X" "from X import" total total/numpy
================ ========== =============== ======= ===========
sys 2374638 63301 2437939 5.85
os 1971515 37571 2009086 4.82
re 1294651 8358 1303009 3.12
numpy ************** 337916 ********** 79065 * 416981 ******* 1.00
warnings 298195 73150 371345 0.89
subprocess 281290 63644 344934 0.83
django 62795 219302 282097 0.68
math 200084 81903 281987 0.68
threading 212302 45423 257725 0.62
pickle+cPickle 215349 22672 238021 0.57
matplotlib 119054 27859 146913 0.35
sqlalchemy 29842 82850 112692 0.27
pylab *************** 36754 ********** 41063 ** 77817 ******* 0.19
scipy *************** 40829 ********** 28263 ** 69092 ******* 0.17
lxml 19026 38061 57087 0.14
zlib 40486 6623 47109 0.11
multiprocessing 25247 19850 45097 0.11
requests 30896 560 31456 0.08
jinja2 8057 24047 32104 0.08
twisted 13858 6404 20262 0.05
gevent 11309 8529 19838 0.05
pandas ************** 14923 *********** 4005 ** 18928 ******* 0.05
sympy 2779 9537 12316 0.03
theano *************** 3654 *********** 1828 *** 5482 ******* 0.01
================ ========== =============== ======= ===========
These numbers should be taken with several grains of salt (see
footnote for discussion: [12]), but, to the extent they
can be trusted, they suggest that numpy
might be the single
most-imported non-stdlib module in the entire Pythonverse; it’s even
more-imported than such stdlib stalwarts as subprocess
, math
,
pickle
, and threading
. And numpy users represent only a
subset of the broader numerical community that will benefit from the
@
operator. Matrices may once have been a niche data type
restricted to Fortran programs running in university labs and military
clusters, but those days are long gone. Number crunching is a
mainstream part of modern Python usage.
In addition, there is some precedence for adding an infix operator to
handle a more-specialized arithmetic operation: the floor division
operator //
, like the bitwise operators, is very useful under
certain circumstances when performing exact calculations on discrete
values. But it seems likely that there are many Python programmers
who have never had reason to use //
(or, for that matter, the
bitwise operators). @
is no more niche than //
.
So @
is good for matrix formulas, but how common are those really?
We’ve seen that @
makes matrix formulas dramatically easier to
work with for both experts and non-experts, that matrix formulas
appear in many important applications, and that numerical libraries
like numpy are used by a substantial proportion of Python’s user base.
But numerical libraries aren’t just about matrix formulas, and being
important doesn’t necessarily mean taking up a lot of code: if matrix
formulas only occurred in one or two places in the average
numerically-oriented project, then it still wouldn’t be worth adding a
new operator. So how common is matrix multiplication, really?
When the going gets tough, the tough get empirical. To get a rough
estimate of how useful the @
operator will be, the table below
shows the rate at which different Python operators are actually used
in the stdlib, and also in two high-profile numerical packages – the
scikit-learn machine learning library, and the nipy neuroimaging
library – normalized by source lines of code (SLOC). Rows are sorted
by the ‘combined’ column, which pools all three code bases together.
The combined column is thus strongly weighted towards the stdlib,
which is much larger than both projects put together (stdlib: 411575
SLOC, scikit-learn: 50924 SLOC, nipy: 37078 SLOC). [7]
The dot
row (marked ******
) counts how common matrix multiply
operations are in each codebase.
==== ====== ============ ==== ========
op stdlib scikit-learn nipy combined
==== ====== ============ ==== ========
= 2969 5536 4932 3376 / 10,000 SLOC
- 218 444 496 261
+ 224 201 348 231
== 177 248 334 196
* 156 284 465 192
% 121 114 107 119
** 59 111 118 68
!= 40 56 74 44
/ 18 121 183 41
> 29 70 110 39
+= 34 61 67 39
< 32 62 76 38
>= 19 17 17 18
<= 18 27 12 18
dot ***** 0 ********** 99 ** 74 ****** 16
| 18 1 2 15
& 14 0 6 12
<< 10 1 1 8
// 9 9 1 8
-= 5 21 14 8
*= 2 19 22 5
/= 0 23 16 4
>> 4 0 0 3
^ 3 0 0 3
~ 2 4 5 2
|= 3 0 0 2
&= 1 0 0 1
//= 1 0 0 1
^= 1 0 0 0
**= 0 2 0 0
%= 0 0 0 0
<<= 0 0 0 0
>>= 0 0 0 0
==== ====== ============ ==== ========
These two numerical packages alone contain ~780 uses of matrix
multiplication. Within these packages, matrix multiplication is used
more heavily than most comparison operators (<
!=
<=
>=
). Even when we dilute these counts by including the stdlib
into our comparisons, matrix multiplication is still used more often
in total than any of the bitwise operators, and 2x as often as //
.
This is true even though the stdlib, which contains a fair amount of
integer arithmetic and no matrix operations, makes up more than 80% of
the combined code base.
By coincidence, the numeric libraries make up approximately the same
proportion of the ‘combined’ codebase as numeric tutorials make up of
PyCon 2014’s tutorial schedule, which suggests that the ‘combined’
column may not be wildly unrepresentative of new Python code in
general. While it’s impossible to know for certain, from this data it
seems entirely possible that across all Python code currently being
written, matrix multiplication is already used more often than //
and the bitwise operations.
But isn’t it weird to add an operator with no stdlib uses?
It’s certainly unusual (though extended slicing existed for some time
builtin types gained support for it, Ellipsis
is still unused
within the stdlib, etc.). But the important thing is whether a change
will benefit users, not where the software is being downloaded from.
It’s clear from the above that @
will be used, and used heavily.
And this PEP provides the critical piece that will allow the Python
numerical community to finally reach consensus on a standard duck type
for all array-like objects, which is a necessary precondition to ever
adding a numerical array type to the stdlib.
Compatibility considerations
Currently, the only legal use of the @
token in Python code is at
statement beginning in decorators. The new operators are both infix;
the one place they can never occur is at statement beginning.
Therefore, no existing code will be broken by the addition of these
operators, and there is no possible parsing ambiguity between
decorator-@ and the new operators.
Another important kind of compatibility is the mental cost paid by
users to update their understanding of the Python language after this
change, particularly for users who do not work with matrices and thus
do not benefit. Here again, @
has minimal impact: even
comprehensive tutorials and references will only need to add a
sentence or two to fully document this PEP’s changes for a
non-numerical audience.
Intended usage details
This section is informative, rather than normative – it documents the
consensus of a number of libraries that provide array- or matrix-like
objects on how @
will be implemented.
This section uses the numpy terminology for describing arbitrary
multidimensional arrays of data, because it is a superset of all other
commonly used models. In this model, the shape of any array is
represented by a tuple of integers. Because matrices are
two-dimensional, they have len(shape) == 2, while 1d vectors have
len(shape) == 1, and scalars have shape == (), i.e., they are “0
dimensional”. Any array contains prod(shape) total entries. Notice
that prod(()) == 1 (for the same reason that sum(()) == 0); scalars
are just an ordinary kind of array, not a special case. Notice also
that we distinguish between a single scalar value (shape == (),
analogous to 1
), a vector containing only a single entry (shape ==
(1,), analogous to [1]
), a matrix containing only a single entry
(shape == (1, 1), analogous to [[1]]
), etc., so the dimensionality
of any array is always well-defined. Other libraries with more
restricted representations (e.g., those that support 2d arrays only)
might implement only a subset of the functionality described here.
Semantics
The recommended semantics for @
for different inputs are:
- 2d inputs are conventional matrices, and so the semantics are
obvious: we apply conventional matrix multiplication. If we write
arr(2, 3)
to represent an arbitrary 2x3 array, thenarr(2, 3) @ arr(3, 4)
returns an array with shape (2, 4). - 1d vector inputs are promoted to 2d by prepending or appending a ‘1’
to the shape, the operation is performed, and then the added
dimension is removed from the output. The 1 is always added on the
“outside” of the shape: prepended for left arguments, and appended
for right arguments. The result is that matrix @ vector and vector
@ matrix are both legal (assuming compatible shapes), and both
return 1d vectors; vector @ vector returns a scalar. This is
clearer with examples.
arr(2, 3) @ arr(3, 1)
is a regular matrix product, and returns an array with shape (2, 1), i.e., a column vector.arr(2, 3) @ arr(3)
performs the same computation as the previous (i.e., treats the 1d vector as a matrix containing a single column, shape = (3, 1)), but returns the result with shape (2,), i.e., a 1d vector.arr(1, 3) @ arr(3, 2)
is a regular matrix product, and returns an array with shape (1, 2), i.e., a row vector.arr(3) @ arr(3, 2)
performs the same computation as the previous (i.e., treats the 1d vector as a matrix containing a single row, shape = (1, 3)), but returns the result with shape (2,), i.e., a 1d vector.arr(1, 3) @ arr(3, 1)
is a regular matrix product, and returns an array with shape (1, 1), i.e., a single value in matrix form.arr(3) @ arr(3)
performs the same computation as the previous, but returns the result with shape (), i.e., a single scalar value, not in matrix form. So this is the standard inner product on vectors.
An infelicity of this definition for 1d vectors is that it makes
@
non-associative in some cases ((Mat1 @ vec) @ Mat2
!=Mat1 @ (vec @ Mat2)
). But this seems to be a case where practicality beats purity: non-associativity only arises for strange expressions that would never be written in practice; if they are written anyway then there is a consistent rule for understanding what will happen (Mat1 @ vec @ Mat2
is parsed as(Mat1 @ vec) @ Mat2
, just likea - b - c
); and, not supporting 1d vectors would rule out many important use cases that do arise very commonly in practice. No-one wants to explain to new users why to solve the simplest linear system in the obvious way, they have to type(inv(A) @ b[:, np.newaxis]).flatten()
instead ofinv(A) @ b
, or perform an ordinary least-squares regression by typingsolve(X.T @ X, X @ y[:, np.newaxis]).flatten()
instead ofsolve(X.T @ X, X @ y)
. No-one wants to type(a[np.newaxis, :] @ b[:, np.newaxis])[0, 0]
instead ofa @ b
every time they compute an inner product, or(a[np.newaxis, :] @ Mat @ b[:, np.newaxis])[0, 0]
for general quadratic forms instead ofa @ Mat @ b
. In addition, sage and sympy (see below) use these non-associative semantics with an infix matrix multiplication operator (they use*
), and they report that they haven’t experienced any problems caused by it. - For inputs with more than 2 dimensions, we treat the last two
dimensions as being the dimensions of the matrices to multiply, and
‘broadcast’ across the other dimensions. This provides a convenient
way to quickly compute many matrix products in a single operation.
For example,
arr(10, 2, 3) @ arr(10, 3, 4)
performs 10 separate matrix multiplies, each of which multiplies a 2x3 and a 3x4 matrix to produce a 2x4 matrix, and then returns the 10 resulting matrices together in an array with shape (10, 2, 4). The intuition here is that we treat these 3d arrays of numbers as if they were 1d arrays of matrices, and then apply matrix multiplication in an elementwise manner, where now each ‘element’ is a whole matrix. Note that broadcasting is not limited to perfectly aligned arrays; in more complicated cases, it allows several simple but powerful tricks for controlling how arrays are aligned with each other; see [10] for details. (In particular, it turns out that when broadcasting is taken into account, the standard scalar * matrix product is a special case of the elementwise multiplication operator*
.)If one operand is >2d, and another operand is 1d, then the above rules apply unchanged, with 1d->2d promotion performed before broadcasting. E.g.,
arr(10, 2, 3) @ arr(3)
first promotes toarr(10, 2, 3) @ arr(3, 1)
, then broadcasts the right argument to create the aligned operationarr(10, 2, 3) @ arr(10, 3, 1)
, multiplies to get an array with shape (10, 2, 1), and finally removes the added dimension, returning an array with shape (10, 2). Similarly,arr(2) @ arr(10, 2, 3)
produces an intermediate array with shape (10, 1, 3), and a final array with shape (10, 3). - 0d (scalar) inputs raise an error. Scalar * matrix multiplication
is a mathematically and algorithmically distinct operation from
matrix @ matrix multiplication, and is already covered by the
elementwise
*
operator. Allowing scalar @ matrix would thus both require an unnecessary special case, and violate TOOWTDI.
Adoption
We group existing Python projects which provide array- or matrix-like types based on what API they currently use for elementwise and matrix multiplication.
Projects which currently use * for elementwise multiplication, and function/method calls for matrix multiplication:
The developers of the following projects have expressed an intention
to implement @
on their array-like types using the above
semantics:
- numpy
- pandas
- blaze
- theano
The following projects have been alerted to the existence of the PEP, but it’s not yet known what they plan to do if it’s accepted. We don’t anticipate that they’ll have any objections, though, since everything proposed here is consistent with how they already do things:
- pycuda
- panda3d
Projects which currently use * for matrix multiplication, and function/method calls for elementwise multiplication:
The following projects have expressed an intention, if this PEP is
accepted, to migrate from their current API to the elementwise-*
,
matmul-@
convention (i.e., this is a list of projects whose API
fragmentation will probably be eliminated if this PEP is accepted):
- numpy (
numpy.matrix
) - scipy.sparse
- pyoperators
- pyviennacl
The following projects have been alerted to the existence of the PEP, but it’s not known what they plan to do if it’s accepted (i.e., this is a list of projects whose API fragmentation may or may not be eliminated if this PEP is accepted):
- cvxopt
Projects which currently use * for matrix multiplication, and which don’t really care about elementwise multiplication of matrices:
There are several projects which implement matrix types, but from a
very different perspective than the numerical libraries discussed
above. These projects focus on computational methods for analyzing
matrices in the sense of abstract mathematical objects (i.e., linear
maps over free modules over rings), rather than as big bags full of
numbers that need crunching. And it turns out that from the abstract
math point of view, there isn’t much use for elementwise operations in
the first place; as discussed in the Background section above,
elementwise operations are motivated by the bag-of-numbers approach.
So these projects don’t encounter the basic problem that this PEP
exists to address, making it mostly irrelevant to them; while they
appear superficially similar to projects like numpy, they’re actually
doing something quite different. They use *
for matrix
multiplication (and for group actions, and so forth), and if this PEP
is accepted, their expressed intention is to continue doing so, while
perhaps adding @
as an alias. These projects include:
- sympy
- sage
Implementation details
New functions operator.matmul
and operator.__matmul__
are
added to the standard library, with the usual semantics.
A corresponding function PyObject* PyObject_MatrixMultiply(PyObject
*o1, PyObject *o2)
is added to the C API.
A new AST node is added named MatMult
, along with a new token
ATEQUAL
and new bytecode opcodes BINARY_MATRIX_MULTIPLY
and
INPLACE_MATRIX_MULTIPLY
.
Two new type slots are added; whether this is to PyNumberMethods
or a new PyMatrixMethods
struct remains to be determined.
Rationale for specification details
Choice of operator
Why @
instead of some other spelling? There isn’t any consensus
across other programming languages about how this operator should be
named [11]; here we discuss the various options.
Restricting ourselves only to symbols present on US English keyboards,
the punctuation characters that don’t already have a meaning in Python
expression context are: @
, backtick, $
, !
, and ?
. Of
these options, @
is clearly the best; !
and ?
are already
heavily freighted with inapplicable meanings in the programming
context, backtick has been banned from Python by BDFL pronouncement
(see PEP 3099), and $
is uglier, even more dissimilar to *
and
⋅, and has Perl/PHP baggage. $
is probably the
second-best option of these, though.
Symbols which are not present on US English keyboards start at a
significant disadvantage (having to spend 5 minutes at the beginning
of every numeric Python tutorial just going over keyboard layouts is
not a hassle anyone really wants). Plus, even if we somehow overcame
the typing problem, it’s not clear there are any that are actually
better than @
. Some options that have been suggested include:
- U+00D7 MULTIPLICATION SIGN:
A × B
- U+22C5 DOT OPERATOR:
A ⋅ B
- U+2297 CIRCLED TIMES:
A ⊗ B
- U+00B0 DEGREE:
A ° B
What we need, though, is an operator that means “matrix
multiplication, as opposed to scalar/elementwise multiplication”.
There is no conventional symbol with this meaning in either
programming or mathematics, where these operations are usually
distinguished by context. (And U+2297 CIRCLED TIMES is actually used
conventionally to mean exactly the wrong things: elementwise
multiplication – the “Hadamard product” – or outer product, rather
than matrix/inner product like our operator). @
at least has the
virtue that it looks like a funny non-commutative operator; a naive
user who knows maths but not programming couldn’t look at A * B
versus A × B
, or A * B
versus A ⋅ B
, or A * B
versus
A ° B
and guess which one is the usual multiplication, and which
one is the special case.
Finally, there is the option of using multi-character tokens. Some options:
- Matlab and Julia use a
.*
operator. Aside from being visually confusable with*
, this would be a terrible choice for us because in Matlab and Julia,*
means matrix multiplication and.*
means elementwise multiplication, so using.*
for matrix multiplication would make us exactly backwards from what Matlab and Julia users expect. - APL apparently used
+.×
, which by combining a multi-character token, confusing attribute-access-like . syntax, and a unicode character, ranks somewhere below U+2603 SNOWMAN on our candidate list. If we like the idea of combining addition and multiplication operators as being evocative of how matrix multiplication actually works, then something like+*
could be used – though this may be too easy to confuse with*+
, which is just multiplication combined with the unary+
operator. - PEP 211 suggested
~*
. This has the downside that it sort of suggests that there is a unary*
operator that is being combined with unary~
, but it could work. - R uses
%*%
for matrix multiplication. In R this forms part of a general extensible infix system in which all tokens of the form%foo%
are user-defined binary operators. We could steal the token without stealing the system. - Some other plausible candidates that have been suggested:
><
(= ascii drawing of the multiplication sign ×); the footnote operator[*]
or|*|
(but when used in context, the use of vertical grouping symbols tends to recreate the nested parentheses visual clutter that was noted as one of the major downsides of the function syntax we’re trying to get away from);^*
.
So, it doesn’t matter much, but @
seems as good or better than any
of the alternatives:
- It’s a friendly character that Pythoneers are already used to typing in decorators, but the decorator usage and the math expression usage are sufficiently dissimilar that it would be hard to confuse them in practice.
- It’s widely accessible across keyboard layouts (and thanks to its use in email addresses, this is true even of weird keyboards like those in phones).
- It’s round like
*
and ⋅. - The mATrices mnemonic is cute.
- The swirly shape is reminiscent of the simultaneous sweeps over rows and columns that define matrix multiplication
- Its asymmetry is evocative of its non-commutative nature.
- Whatever, we have to pick something.
Precedence and associativity
There was a long discussion [15] about
whether @
should be right- or left-associative (or even something
more exotic [18]). Almost all Python operators are
left-associative, so following this convention would be the simplest
approach, but there were two arguments that suggested matrix
multiplication might be worth making right-associative as a special
case:
First, matrix multiplication has a tight conceptual association with function application/composition, so many mathematically sophisticated users have an intuition that an expression like RSx proceeds from right-to-left, with first S transforming the vector x, and then R transforming the result. This isn’t universally agreed (and not all number-crunchers are steeped in the pure-math conceptual framework that motivates this intuition [16]), but at the least this intuition is more common than for other operations like 2⋅3⋅4 which everyone reads as going from left-to-right.
Second, if expressions like Mat @ Mat @ vec
appear often in code,
then programs will run faster (and efficiency-minded programmers will
be able to use fewer parentheses) if this is evaluated as Mat @ (Mat
@ vec)
then if it is evaluated like (Mat @ Mat) @ vec
.
However, weighing against these arguments are the following:
Regarding the efficiency argument, empirically, we were unable to find
any evidence that Mat @ Mat @ vec
type expressions actually
dominate in real-life code. Parsing a number of large projects that
use numpy, we found that when forced by numpy’s current funcall syntax
to choose an order of operations for nested calls to dot
, people
actually use left-associative nesting slightly more often than
right-associative nesting [17]. And anyway,
writing parentheses isn’t so bad – if an efficiency-minded programmer
is going to take the trouble to think through the best way to evaluate
some expression, they probably should write down the parentheses
regardless of whether they’re needed, just to make it obvious to the
next reader that they order of operations matter.
In addition, it turns out that other languages, including those with
much more of a focus on linear algebra, overwhelmingly make their
matmul operators left-associative. Specifically, the @
equivalent
is left-associative in R, Matlab, Julia, IDL, and Gauss. The only
exceptions we found are Mathematica, in which a @ b @ c
would be
parsed non-associatively as dot(a, b, c)
, and APL, in which all
operators are right-associative. There do not seem to exist any
languages that make @
right-associative and *
left-associative. And these decisions don’t seem to be controversial
– I’ve never seen anyone complaining about this particular aspect of
any of these other languages, and the left-associativity of *
doesn’t seem to bother users of the existing Python libraries that use
*
for matrix multiplication. So, at the least we can conclude from
this that making @
left-associative will certainly not cause any
disasters. Making @
right-associative, OTOH, would be exploring
new and uncertain ground.
And another advantage of left-associativity is that it is much easier
to learn and remember that @
acts like *
, than it is to
remember first that @
is unlike other Python operators by being
right-associative, and then on top of this, also have to remember
whether it is more tightly or more loosely binding than
*
. (Right-associativity forces us to choose a precedence, and
intuitions were about equally split on which precedence made more
sense. So this suggests that no matter which choice we made, no-one
would be able to guess or remember it.)
On net, therefore, the general consensus of the numerical community is
that while matrix multiplication is something of a special case, it’s
not special enough to break the rules, and @
should parse like
*
does.
(Non)-Definitions for built-in types
No __matmul__
or __matpow__
are defined for builtin numeric
types (float
, int
, etc.) or for the numbers.Number
hierarchy, because these types represent scalars, and the consensus
semantics for @
are that it should raise an error on scalars.
We do not – for now – define a __matmul__
method on the standard
memoryview
or array.array
objects, for several reasons. Of
course this could be added if someone wants it, but these types would
require quite a bit of additional work beyond __matmul__
before
they could be used for numeric work – e.g., they have no way to do
addition or scalar multiplication either! – and adding such
functionality is beyond the scope of this PEP. In addition, providing
a quality implementation of matrix multiplication is highly
non-trivial. Naive nested loop implementations are very slow and
shipping such an implementation in CPython would just create a trap
for users. But the alternative – providing a modern, competitive
matrix multiply – would require that CPython link to a BLAS library,
which brings a set of new complications. In particular, several
popular BLAS libraries (including the one that ships by default on
OS X) currently break the use of multiprocessing
[8].
Together, these considerations mean that the cost/benefit of adding
__matmul__
to these types just isn’t there, so for now we’ll
continue to delegate these problems to numpy and friends, and defer a
more systematic solution to a future proposal.
There are also non-numeric Python builtins which define __mul__
(str
, list
, …). We do not define __matmul__
for these
types either, because why would we even do that.
Non-definition of matrix power
Earlier versions of this PEP also proposed a matrix power operator,
@@
, analogous to **
. But on further consideration, it was
decided that the utility of this was sufficiently unclear that it
would be better to leave it out for now, and only revisit the issue if
– once we have more experience with @
– it turns out that @@
is truly missed. [14]
Rejected alternatives to adding a new operator
Over the past few decades, the Python numeric community has explored a
variety of ways to resolve the tension between matrix and elementwise
multiplication operations. PEP 211 and PEP 225, both proposed in 2000
and last seriously discussed in 2008 [9], were early
attempts to add new operators to solve this problem, but suffered from
serious flaws; in particular, at that time the Python numerical
community had not yet reached consensus on the proper API for array
objects, or on what operators might be needed or useful (e.g., PEP 225
proposes 6 new operators with unspecified semantics). Experience
since then has now led to consensus that the best solution, for both
numeric Python and core Python, is to add a single infix operator for
matrix multiply (together with the other new operators this implies
like @=
).
We review some of the rejected alternatives here.
Use a second type that defines __mul__ as matrix multiplication:
As discussed above (Background: What’s wrong with the status quo?),
this has been tried this for many years via the numpy.matrix
type
(and its predecessors in Numeric and numarray). The result is a
strong consensus among both numpy developers and developers of
downstream packages that numpy.matrix
should essentially never be
used, because of the problems caused by having conflicting duck types
for arrays. (Of course one could then argue we should only define
__mul__
to be matrix multiplication, but then we’d have the same
problem with elementwise multiplication.) There have been several
pushes to remove numpy.matrix
entirely; the only counter-arguments
have come from educators who find that its problems are outweighed by
the need to provide a simple and clear mapping between mathematical
notation and code for novices (see Transparent syntax is especially
crucial for non-expert programmers). But, of course, starting out
newbies with a dispreferred syntax and then expecting them to
transition later causes its own problems. The two-type solution is
worse than the disease.
Add lots of new operators, or add a new generic syntax for defining infix operators: In addition to being generally un-Pythonic and repeatedly rejected by BDFL fiat, this would be using a sledgehammer to smash a fly. The scientific python community has consensus that adding one operator for matrix multiplication is enough to fix the one otherwise unfixable pain point. (In retrospect, we all think PEP 225 was a bad idea too – or at least far more complex than it needed to be.)
Add a new @ (or whatever) operator that has some other meaning in
general Python, and then overload it in numeric code: This was the
approach taken by PEP 211, which proposed defining @
to be the
equivalent of itertools.product
. The problem with this is that
when taken on its own terms, it’s pretty clear that
itertools.product
doesn’t actually need a dedicated operator. It
hasn’t even been deemed worth of a builtin. (During discussions of
this PEP, a similar suggestion was made to define @
as a general
purpose function composition operator, and this suffers from the same
problem; functools.compose
isn’t even useful enough to exist.)
Matrix multiplication has a uniquely strong rationale for inclusion as
an infix operator. There almost certainly don’t exist any other
binary operations that will ever justify adding any other infix
operators to Python.
Add a .dot method to array types so as to allow “pseudo-infix” A.dot(B) syntax: This has been in numpy for some years, and in many cases it’s better than dot(A, B). But it’s still much less readable than real infix notation, and in particular still suffers from an extreme overabundance of parentheses. See Why should matrix multiplication be infix? above.
Use a ‘with’ block to toggle the meaning of * within a single code block: E.g., numpy could define a special context object so that we’d have:
c = a * b # element-wise multiplication
with numpy.mul_as_dot:
c = a * b # matrix multiplication
However, this has two serious problems: first, it requires that every
array-like type’s __mul__
method know how to check some global
state (numpy.mul_is_currently_dot
or whatever). This is fine if
a
and b
are numpy objects, but the world contains many
non-numpy array-like objects. So this either requires non-local
coupling – every numpy competitor library has to import numpy and
then check numpy.mul_is_currently_dot
on every operation – or
else it breaks duck-typing, with the above code doing radically
different things depending on whether a
and b
are numpy
objects or some other sort of object. Second, and worse, with
blocks are dynamically scoped, not lexically scoped; i.e., any
function that gets called inside the with
block will suddenly find
itself executing inside the mul_as_dot world, and crash and burn
horribly – if you’re lucky. So this is a construct that could only
be used safely in rather limited cases (no function calls), and which
would make it very easy to shoot yourself in the foot without warning.
Use a language preprocessor that adds extra numerically-oriented
operators and perhaps other syntax: (As per recent BDFL suggestion:
[1]) This suggestion seems based on the idea that
numerical code needs a wide variety of syntax additions. In fact,
given @
, most numerical users don’t need any other operators or
syntax; it solves the one really painful problem that cannot be solved
by other means, and that causes painful reverberations through the
larger ecosystem. Defining a new language (presumably with its own
parser which would have to be kept in sync with Python’s, etc.), just
to support a single binary operator, is neither practical nor
desirable. In the numerical context, Python’s competition is
special-purpose numerical languages (Matlab, R, IDL, etc.). Compared
to these, Python’s killer feature is exactly that one can mix
specialized numerical code with code for XML parsing, web page
generation, database access, network programming, GUI libraries, and
so forth, and we also gain major benefits from the huge variety of
tutorials, reference material, introductory classes, etc., which use
Python. Fragmenting “numerical Python” from “real Python” would be a
major source of confusion. A major motivation for this PEP is to
reduce fragmentation. Having to set up a preprocessor would be an
especially prohibitive complication for unsophisticated users. And we
use Python because we like Python! We don’t want
almost-but-not-quite-Python.
Use overloading hacks to define a “new infix operator” like *dot*, as in a well-known Python recipe: (See: [2]) Beautiful is better than ugly. This is… not beautiful. And not Pythonic. And especially unfriendly to beginners, who are just trying to wrap their heads around the idea that there’s a coherent underlying system behind these magic incantations that they’re learning, when along comes an evil hack like this that violates that system, creates bizarre error messages when accidentally misused, and whose underlying mechanisms can’t be understood without deep knowledge of how object oriented systems work.
Use a special “facade” type to support syntax like arr.M * arr:
This is very similar to the previous proposal, in that the .M
attribute would basically return the same object as arr *dot
would,
and thus suffers the same objections about ‘magicalness’. This
approach also has some non-obvious complexities: for example, while
arr.M * arr
must return an array, arr.M * arr.M
and
arr * arr.M
must return facade objects, or else arr.M * arr.M * arr
and arr * arr.M * arr
will not work. But this means that facade
objects must be able to recognize both other array objects and other
facade objects (which creates additional complexity for writing
interoperating array types from different libraries who must now
recognize both each other’s array types and their facade types). It
also creates pitfalls for users who may easily type arr * arr.M
or
arr.M * arr.M
and expect to get back an array object; instead,
they will get a mysterious object that throws errors when they attempt
to use it. Basically with this approach users must be careful to
think of .M*
as an indivisible unit that acts as an infix operator
– and as infix-operator-like token strings go, at least *dot*
is prettier looking (look at its cute little ears!).
Discussions of this PEP
Collected here for reference:
- Github pull request containing much of the original discussion and drafting: https://github.com/numpy/numpy/pull/4351
- sympy mailing list discussions of an early draft:
- sage-devel mailing list discussions of an early draft: https://groups.google.com/forum/#!topic/sage-devel/YxEktGu8DeM
- 13-Mar-2014 python-ideas thread: https://mail.python.org/pipermail/python-ideas/2014-March/027053.html
- numpy-discussion thread on whether to keep
@@
: http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069448.html - numpy-discussion threads on precedence/associativity of
@
: * http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069444.html * http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069605.html
References
Copyright
This document has been placed in the public domain.
Source: https://github.com/python/peps/blob/main/peps/pep-0465.rst
Last modified: 2023-09-09 17:39:29 GMT