Proposal to Extend NumPy Broadcasting for Enhanced Array Operations
Proposal to Extend NumPy Broadcasting for (D₁, D₂, ..., N, M) → (D₁, D₂, ..., K, M) When K is a Multiple of N
Proposal: Enhancing NumPy's Broadcasting Mechanism
Current Broadcasting Rule and Its Limitation
As per NumPy's current broadcasting rules, if two arrays have different shapes, the smaller array can be expanded only if one of its dimensions is 1. This allows memory-efficient expansion of data without copying. However, if a dimension is greater than 1, NumPy does not allow expansion to a larger size, even when the larger size is a multiple of the smaller size.
Example of Current Behavior (Allowed Expansion)
import numpy as np
A = np.array([[1, 2, 3]]) # Shape (1,3)
B = np.array([[4, 5, 6], # Shape (2,3)
[7, 8, 9]])
C = A + B # Broadcasting works because (1,3) can expand to (2,3)
print(C)
Output:
[[ 5 7 9]
[ 8 10 12]]
Here, A
has shape (1,3)
, which is automatically expanded to (2,3)
without copying because a dimension of size 1
can be stretched.
However, NumPy fails when trying to expand a dimension where N > 1
, even if the larger size is a multiple of N
.
Example of a Case That Fails (Even Though It Could Work)
A = np.array([[1, 2, 3], # Shape (2,3)
[4, 5, 6]])
B = np.array([[10, 20, 30], # Shape (4,3)
[40, 50, 60],
[70, 80, 90],
[100, 110, 120]])
C = A + B # Error! NumPy does not allow (2,3) to (4,3)
This fails with the error:
ValueError: operands could not be broadcast together with shapes (2,3) (4,3)
Why Should This Be Allowed?
If a larger dimension is an exact multiple of a smaller one, then logically, the smaller array can be repeated along that axis without physically copying data—just like NumPy does when broadcasting (1,M)
to (N,M)
.
In the above example:
A
has shape(2,3)
, andB
has shape(4,3)
.- Since
4
is a multiple of2
(4 % 2 == 0
),A
could be logically repeated2
times to become(4,3)
. - NumPy already does a similar expansion when a dimension is
1
, so why not extend the same logic?
Proposed Behavior (Expanding N → M When M % N == 0)
Allow an axis with size N
to be broadcast to size M
if and only if M
is an exact multiple of N
(M % N == 0
). This is just as memory-efficient as the current broadcasting rules because it can be done using stride tricks instead of copying data.
Example of the Proposed Behavior
If NumPy allowed this new form of broadcasting:
A = np.array([[1, 2, 3], # Shape (2,3)
[4, 5, 6]])
B = np.array([[10, 20, 30], # Shape (4,3)
[40, 50, 60],
[70, 80, 90],
[100, 110, 120]])
# Proposed new broadcasting rule
C = A + B
print(C)
Expected Output:
[[ 11 22 33]
[ 44 55 66]
[ 71 82 93]
[104 115 126]]
This works by logically repeating A
to match B
’s shape (4,3)
.
Why This is a Natural Extension of Broadcasting
- Memory Efficiency: Just like broadcasting
(1,M)
to(N,M)
, this expansion does not require physically copying data. Instead, strides can be adjusted to logically repeat elements, making this as efficient as current broadcasting. - Intuitiveness: Right now, broadcasting already surprises new users. If
(1,3)
can become(2,3)
, why not(2,3)
to(4,3)
when4
is a multiple of2
? This would be a more intuitive rule. - Extends Current Functionality: This is not a new concept—it extends the existing rule where
1
can be stretched to any value. We are generalizing it to any factor relationship (N → M
whenM % N == 0
).
Implementation Considerations
The logic behind NumPy’s current broadcasting already uses stride tricks for memory-efficient expansion. Extending it to handle (N, M) → (K, M)
(where K % N == 0
) would require:
- Updating
np.broadcast_shapes()
,np.broadcast_to()
, and related functions. - Extending the existing logic that handles expanding
1
to support factors as well. - Ensuring backward compatibility and maintaining performance.
Conclusion
I strongly believe this enhancement would make NumPy’s broadcasting more powerful, more intuitive, and just as efficient as before. The fact that we can implement this without unnecessary copying makes it a natural extension of the existing feature.
Get Involved
If you're interested in this proposal or have thoughts to share, please join the discussion on the NumPy mailing list or the GitHub issue tracker. The NumPy community welcomes feedback and contributions from developers who want to help shape the future of scientific computing in Python.
About the Author
This proposal was written by Shasang Thummar, a Python developer passionate about scientific computing and numerical analysis. You can reach out to him at shasangthummar4@gmail.com.