Commit Graph

6549 Commits

Author SHA1 Message Date
Rasmus Munk Larsen
a235ddef39 Get rid of code duplication for conj_helper. For packets where LhsType=RhsType a single generic implementation suffices. For scalars, the generic implementation of pconj automatically forwards to numext::conj, so much of the existing specialization can be avoided. For mixed types we still need specializations.
(cherry picked from commit 52a5f98212)
2021-06-24 23:30:42 +00:00
Antonio Sanchez
c2c0f6f64b Fix fix<> for gcc-4.9.3.
There's a missing `EIGEN_HAS_CXX14` -> `EIGEN_HAS_CXX14_VARIABLE_TEMPLATES`
replacement.

Fixes ##2267


(cherry picked from commit 35a367d557)
2021-06-21 17:26:07 +00:00
Antonio Sanchez
ee4e099aa2 Remove pset, replace with ploadu.
We can't make guarantees on alignment for existing calls to `pset`,
so we should default to loading unaligned.  But in that case, we should
just use `ploadu` directly. For loading constants, this load should hopefully
get optimized away.

This is causing segfaults in Google Maps.


(cherry picked from commit 12e8d57108)
2021-06-17 17:11:08 +00:00
Chip-Kerchner
9fc93ce31a EIGEN_STRONG_INLINE was NOT inlining in some critical needed areas (6.6X slowdown) when used with Tensorflow. Changing to EIGEN_ALWAYS_INLINE where appropiate.
(cherry picked from commit ef1fd341a8)
2021-06-16 22:14:17 +00:00
Antonio Sanchez
1374f49f28 Add missing ppc pcmp_lt_or_nan<Packet8bf>
(cherry picked from commit 9e94c59570)
2021-06-15 22:12:22 +00:00
Rasmus Munk Larsen
47722a66f2 Fix more enum arithmetic.
(cherry picked from commit 13fb5ab92c)
2021-06-15 16:40:35 +00:00
Antonio Sanchez
5e75331b9f Fix checking of version number for mingw.
MinGW spits out version strings like: `x86_64-w64-mingw32-g++ (GCC)
10-win32 20210110`, which causes the version extraction to fail.
Added support for this with tests.

Also added `make_unsigned` for `long long`, since mingw seems to
use that for `uint64_t`.

Related to #2268.  CMake and build passes for me after this.


(cherry picked from commit ad82d20cf6)
2021-06-12 00:02:26 +00:00
Rasmus Munk Larsen
1cb1ffd5b2 Use bit_cast to create -0.0 for floating point types to avoid compiler optimization changing sign with --ffast-math enabled.
(cherry picked from commit fc87e2cbaa)
2021-06-11 02:57:02 +00:00
Rasmus Munk Larsen
4b502a7215 Fix c++20 warnings about using enums in arithmetic expressions.
(cherry picked from commit f64b2954c7)
2021-06-11 02:35:19 +00:00
Cyril Kaiser
573570b6c9 Remove EIGEN_DEVICE_FUNC from CwiseBinaryOp's default copy constructor.
(cherry picked from commit 91cd67f057)
2021-05-26 19:45:25 +00:00
Antonio Sanchez
98cf1e076f Add missing NEON ptranspose implementations.
Unified implementation using only `vzip`.


(cherry picked from commit dba753a986)
2021-05-25 19:09:50 +00:00
Antonio Sanchez
ee2a8f7139 Modify Unary/Binary/TernaryOp evaluators to work for non-class types.
This used to work for non-class types (e.g. raw function pointers) in
Eigen 3.3.  This was changed in commit 11f55b29 to optimize the
evaluator:

> `sizeof((A-B).cwiseAbs2())` with A,B Vector4f is now 16 bytes, instead of 48 before this optimization.

though I cannot reproduce the 16 byte result.  Both before the change
and after, with multiple compilers/versions, I always get a result of 40 bytes.

https://godbolt.org/z/MsjTc1PGe

This change modifies the code slightly to allow non-class types.  The
final generated code is identical, and the expression remains 40 bytes
for the `abs2` sample case.

Fixes #2251


(cherry picked from commit ebb300d0b4)
2021-05-25 18:19:53 +00:00
Steve Bronder
4fbd01cd4b Adds macro for checking if C++14 variable templates are supported
(cherry picked from commit 1720057023)
2021-05-21 16:43:30 +00:00
Niall Murphy
a883a8797c Use derived object type in conservative_resize_like_impl
When calling conservativeResize() on a matrix with DontAlign flag, the
temporary variable used to perform the resize should have the same
Options as the original matrix to ensure that the correct override of
swap is called (i.e. PlainObjectBase::swap(DenseBase<OtherDerived> &
other). Calling the base class swap (i.e in DenseBase) results in
assertions errors or memory corruption.


(cherry picked from commit 391094c507)
2021-05-20 23:43:57 +00:00
guoqiangqi
2f908f8255 Changing the storage of the SSE complex packets to that of the wrapper. This should fix #2242 .
(cherry picked from commit 3d9051ea84)
2021-05-12 17:02:19 +00:00
Nathan Luehr
d1825cbb68 Device implementation of log for std::complex types.
(cherry picked from commit 7e6a1c129c)
2021-05-11 22:31:53 +00:00
Nathan Luehr
d9288f078d Fix ambiguity due to argument dependent lookup.
(cherry picked from commit 6753f0f197)
2021-05-11 22:00:36 +00:00
Rohit Santhanam
85ebd6aff8 Fix for issue where numext::imag and numext::real are used before they are defined.
(cherry picked from commit 39ec31c0ad)
2021-05-10 20:14:10 +00:00
Antonio Sanchez
2947c0cc84 Restore ABI compatibility for conj with 3.3, fix conflict with boost.
The boost library unfortunately specializes `conj` for various types and
assumes the original two-template-parameter version.  This changes
restores the second parameter.  This also restores ABI compatibility.

The specialization for `std::complex` is because `std::conj` is not
a device function. For custom complex scalar types, users should provide
their own `conj` implementation.

We may consider removing the unnecessary second parameter in the future - but
this will require modifying boost as well.

Fixes #2112.


(cherry picked from commit c0eb5f89a4)
2021-05-07 18:38:23 +00:00
Antonio Sanchez
42acbd5700 Fix numext::arg return type.
The cxx11 path for `numext::arg` incorrectly returned the complex type
instead of the real type, leading to compile errors. Fixed this and
added tests.

Related to !477, which uncovered the issue.


(cherry picked from commit 90e9a33e1c)
2021-05-07 17:52:07 +00:00
Christoph Hertzberg
9e0dc8f09b Revert addition of unused paddsub<Packet2cf>. This fixes #2242
(cherry picked from commit 722ca0b665)
2021-05-07 16:23:03 +00:00
Antonio Sanchez
fc2cc10842 Better CUDA complex division.
The original produced NaNs when dividing 0/b for subnormal b.
The `complex_divide_stable` was changed to use the more common
Smith's algorithm.


(cherry picked from commit 1c013be2cc)
2021-04-29 17:58:45 +00:00
Antonio Sanchez
a33855f6ee Add missing pcmp_lt_or_nan for NEON Packet4bf.
(cherry picked from commit 172db7bfc3)
2021-04-27 21:15:08 +00:00
Jakub Lichman
ac3c5aad31 Tests added and AVX512 bug fixed for pcmp_lt_or_nan
(cherry picked from commit d87648a6be)
2021-04-26 18:07:55 +00:00
Antonio Sanchez
8830d66c02 DenseStorage safely copy/swap.
Fixes #2229.

For dynamic matrices with fixed-sized storage, only copy/swap
elements that have been set.  Otherwise, this leads to inefficient
copying, and potential UB for non-initialized elements.


(cherry picked from commit d213a0bcea)
2021-04-22 21:05:50 +00:00
Rasmus Munk Larsen
54425a39b2 Make vectorized compute_inverse_size4 compile with AVX.
(cherry picked from commit 85a76a16ea)
2021-04-22 17:25:25 +00:00
Jakub Lichman
42a8bdd4d7 HasExp added for AVX512 Packet8d
(cherry picked from commit 2b1dfd1ba0)
2021-04-21 12:09:21 +02:00
Chip-Kerchner
28564957ac Fix taking address of rvalue compiler issue with TensorFlow (plus other warnings).
(cherry picked from commit 06c2760bd1)
2021-04-21 01:05:21 +00:00
Antonio Sanchez
ab7fe215f9 Fix ldexp for AVX512 (#2215)
Wrong shuffle was used.  Need to interleave low/high halves with a
`permute` instruction.

Fixes #2215.


(cherry picked from commit 1d79c68ba0)
2021-04-20 20:52:26 +00:00
David Tellenbach
1f4c0311cd Bump to 3.3.91 (3.4-rc1) 2021-04-18 23:43:12 +02:00
David Tellenbach
3e819d83bf Before 3.4 branch 2021-04-18 23:36:14 +02:00
Christoph Hertzberg
9357feedc7 Avoid using uninitialized inputs and if available, use slightly more efficient movsd instruction for pset1<Packet2cf>. 2021-04-13 01:36:59 +02:00
Christoph Hertzberg
1e1c8a735c Use EIGEN_HAS_CXX11 and EIGEN_COMP_CXXVER macros to detect C++ version for std::result_of and std::invoke_result.
Fixes #2209
2021-04-12 01:26:15 +00:00
Christoph Hertzberg
d58678069c Make iterators default constructible and assignable, by making... 2021-04-09 17:03:28 +00:00
Antonio Sanchez
fcb5106c6e Scaled epsilon the wrong way.
Should have been 0.5 to widen the bounds, since this is inverse
precision.  Setting to 0.5, however, leads to many more failing
tests at Google, so reverting to 1 for now.
2021-04-07 15:08:39 -07:00
Christoph Hertzberg
6197ce1a35 Replace -2147483648 by -0.0f or -0.0 constants (this should fix #2189).
Also, remove unnecessary `pgather` operations.
2021-04-07 11:25:27 +00:00
Rasmus Munk Larsen
22edb46823 Align local arrays to Packet boundary. 2021-04-06 16:22:36 +00:00
Antonio Sanchez
90187a33e1 Fix SelfAdjoingEigenSolver (#2191)
Adjust the relaxation step to use the condition
```
abs(subdiag[i]) <= epsilon * sqrt(abs(diag[i]) + abs(diag[i+1]))
```
for setting the subdiagonal entry to zero.

Also adjust Wilkinson shift for small `e = subdiag[end-1]` -
I couldn't find a reference for the original, and it was not
consistent with the Wilkinson definition.

Fixes #2191.
2021-04-05 11:19:09 -07:00
Rasmus Munk Larsen
3ddc0974ce Fix two bugs in commit 2021-04-02 22:06:27 +00:00
Chip Kerchner
c24bee6120 Fix address of temporary object errors in clang11.
This fixes the problem with taking the address of temporary objects which clang11 treats as errors.
2021-04-02 16:27:08 +00:00
Rasmus Munk Larsen
5bbc9cea93 Add an info() method to the SVDBase class to make it possible to tell the user that the computation failed, possibly due to invalid input.
Make Jacobi and divide-and-conquer fail fast and return info() == InvalidInput if the matrix contains NaN or +/-Inf.
2021-03-31 21:09:19 +00:00
Antonio Sanchez
78ee3d6261 Fix CUDA constexpr issues for numeric_limits.
Some CUDA/HIP constants fail on device with `constexpr` since they
internally rely on non-constexpr functions, e.g.
```
\#define CUDART_INF_F            __int_as_float(0x7f800000)
```
This fails for cuda-clang (though passes with nvcc). These constants are
currently used by `device::numeric_limits`.  For portability, we
need to remove `constexpr` from the affected functions.

For C++11 or higher, we should be able to rely on the `std::numeric_limits`
versions anyways, since the methods themselves are now `constexpr`, so
should be supported on device (clang/hipcc natively, nvcc with
`--expr-relaxed-constexpr`).
2021-03-30 18:01:27 +00:00
Antonio Sanchez
af1247fbc1 Use Index type in loop over coefficients.
Previously was `int`.  Brought up by Kyle Snow (Polaris Geospatial
Services) on the mailing list.
2021-03-29 17:40:55 +00:00
Antonio Sanchez
87729ea39f Eliminate round_impl double-promotion warnings for c++03. 2021-03-25 16:52:19 +00:00
Deven Desai
748489ef9c Un-defining EIGEN_HAS_CONSTEXPR on the HIP platform
The Eigen unit-tests started failing on the HIP/ROCm platform, after the following commit

e7b8643d70

```
In file included from /home/rocm-user/eigen/test/main.h:360:
In file included from /home/rocm-user/eigen/Eigen/QR:11:
In file included from /home/rocm-user/eigen/Eigen/Core:162:
/home/rocm-user/eigen/Eigen/src/Core/util/Meta.h:300:17: error: constexpr function never produces a constant expression [-Winvalid-constexpr]
  static float (max)() {
                ^
/home/rocm-user/eigen/Eigen/src/Core/util/Meta.h:304:12: note: non-constexpr function '__int_as_float' cannot be used in a constant expression
    return HIPRT_MAX_NORMAL_F;
           ^
/home/rocm-user/eigen/Eigen/src/Core/arch/HIP/hcc/math_constants.h:14:28: note: expanded from macro 'HIPRT_MAX_NORMAL_F'
#define HIPRT_MAX_NORMAL_F __int_as_float(0x7f7fffff)
                           ^
/opt/rocm/hip/include/hip/hcc_detail/device_functions.h:913:32: note: declared here
__device__ static inline float __int_as_float(int x) {
                               ^
```

The problem seems to that some of the constants defined in the HIP `math_constants.h` have a call to `__int_as_float` routine which is not declared `constexpr` in the HIP runtime header file.

Working around this issue for now, be skipping the const_expr support (enabled via the above commit) on HIP
2021-03-25 13:45:52 +00:00
Chip Kerchner
d59ef212e1 Fixed performance issues for complex VSX and P10 MMA in gebp_kernel (level 3). 2021-03-25 11:08:19 +00:00
Steve Bronder
e7b8643d70 Revert "Revert "Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size()""
This reverts commit 5f0b4a4010.
2021-03-24 18:14:56 +00:00
Christoph Hertzberg
69a4f70956 Revert "Uses _mm512_abs_pd for Packet8d pabs"
This reverts commit f019b97aca
2021-03-23 18:52:19 +00:00
David Tellenbach
4811e81966 Remove yet another comma at end of enum 2021-03-18 23:30:00 +01:00
Steve Bronder
f019b97aca Uses _mm512_abs_pd for Packet8d pabs 2021-03-18 15:47:52 +00:00