eigen

CFD/eigen

Author	SHA1	Message	Date
Antonio Sanchez	69adf26aa3	Modify googlehash use to account for namespace issues. The namespace declaration for googlehash is a configurable macro that can be disabled. In particular, it is disabled within google, causing compile errors since `dense_hash_map`/`sparse_hash_map` are then in the global namespace instead of in `::google`. Here we play a bit of gynastics to allow for both `google::_hash_map` and `_hash_map`, while limiting namespace polution. Symbols within the `::google` namespace are imported into `Eigen::google`. We also remove checks based on `_SPARSE_HASH_MAP_H_`, as this is fragile, and instead require `EIGEN_GOOGLEHASH_SUPPORT` to be defined.	2021-04-12 19:00:39 -07:00
Rasmus Munk Larsen	a2c0542010	Fix typo in TensorDimensions.h	2021-04-12 18:59:56 +00:00
Jens Wehner	f6fc66aa75	fixed doxygen for unsupported iterative solver module	2021-04-11 16:26:14 +00:00
Rohit Santhanam	2859db0220	This fixes an issue where the compiler was not choosing the GPU specific specialization of ScanLauncher. The issue was discovered when the GPU scan unit test was run and resulted in a segmentation fault. The segmantation fault occurred because the unit test allocated GPU memory and passed a pointer to that memory to the computation that it presumed would execute on the GPU. But because of the issue, the computation was scheduled to execute on the CPU so a situation was constructed where the CPU attempted to access a GPU memory location. The fix expands the GPU specific ScanLauncher specialization to handle cases where vectorization is enabled. Previously, the GPU specialization is chosen only if Vectorization is not used.	2021-04-08 15:14:48 +00:00
Steve Bronder	e7b8643d70	Revert "Revert "Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size()"" This reverts commit `5f0b4a4010`.	2021-03-24 18:14:56 +00:00
Jens Wehner	c0a889890f	Fixed output of complex matrices	2021-03-15 21:51:55 +00:00
Antonio Sanchez	543e34ab9d	Re-implement move assignments. The original swap approach leads to potential undefined behavior (reading uninitialized memory) and results in unnecessary copying of data for static storage. Here we pass down the move assignment to the underlying storage. Static storage does a one-way copy, dynamic storage does a swap. Modified the tests to no longer read from the moved-from matrix/tensor, since that can lead to UB. Added a test to ensure we do not access uninitialized memory in a move. Fixes: #2119	2021-03-10 16:55:20 +00:00
Antonio Sanchez	2468253c9a	Define EIGEN_CPLUSPLUS and replace most __cplusplus checks. The macro `__cplusplus` is not defined correctly in MSVC unless building with the the `/Zc:__cplusplus` flag. Instead, it defines `_MSVC_LANG` to the specified c++ standard version number. Here we introduce `EIGEN_CPLUSPLUS` which will contain the c++ version number both for MSVC and otherwise. This simplifies checks for supported features. Also replaced most instances of standard version checking via `__cplusplus` with the existing `EIGEN_COMP_CXXVER` macro for better clarity. Fixes: #2170	2021-03-05 18:33:18 +00:00
David Tellenbach	5f0b4a4010	Revert "Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size()" This reverts commit `6cbb3038ac` because it breaks clang-10 builds on x86 and aarch64 when C++11 is enabled.	2021-03-05 13:16:43 +01:00
Steve Bronder	6cbb3038ac	Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size()	2021-03-04 18:58:08 +00:00
Eugene Zhulenev	a6601070f2	Add log2 operation to TensorBase	2021-03-04 00:13:36 +00:00
Christoph Hertzberg	2660d01fa7	Inherit from `no_assignment_operator` to avoid implicit copy constructor warnings (cherry picked from commit 9bbb7ea4b54b1f307863be4ed8d105c38cdefe50)	2021-02-27 18:44:26 +01:00
Christoph Hertzberg	a3521d743c	Fix some enum-enum conversion warnings (cherry picked from commit 838f3d8ce22a5549ef10c7386fb03040721749a0)	2021-02-27 18:44:26 +01:00
Christoph Hertzberg	81b5fe2f0a	ReturnByValue is already non-copyable (cherry picked from commit abbf95045009619f37bd92b45433eedbfcbe41cf)	2021-02-27 18:44:26 +01:00
Christoph Hertzberg	4fb3459a23	Fix double-promotion warnings (cherry picked from commit c22c103e932e511e96645186831363585a44b7a3)	2021-02-27 18:44:26 +01:00
Jens Wehner	4bfcee47b9	Idrs iterative linear solver	2021-02-27 12:09:33 +00:00
Rasmus Munk Larsen	f284c8592b	Don't crash when attempting to slice an empty tensor.	2021-02-24 18:12:51 -08:00
Guoqiang QI	f44197fabd	Some improvements for kissfft from Martin Reinecke(pocketfft author): 1.Only computing about half of the factors and use complex conjugate symmetry for the rest instead of all to save time. 2.All twiddles are calculated in double because that gives the maximum achievable precision when doing float transforms. 3.Reducing all angles to the range 0<angle<pi/4 which gives even more precision.	2021-02-24 21:36:47 +00:00
Antonio Sanchez	5f9cfb2529	Add missing adolc isinf/isnan. Also modified cmake/FindAdolc.cmake to eliminate warnings, and added search paths to match install layout. Fixed: #2157	2021-02-19 22:26:56 +00:00
frgossen	33e0af0130	Return nan at poles of polygamma, digamma, and zeta if limit is not defined	2021-02-19 16:35:11 +00:00
David Tellenbach	36200b7855	Remove vim specific comments to recognoize correct file-type. As discussed in #2143 we remove editor specific comments.	2021-02-09 09:13:09 +01:00
Ralf Hannemann-Tamas	984d010b7b	add specialization of check_sparse_solving() for SuperLU solver, in order to test adjoint and transpose solves	2021-02-08 22:00:31 +00:00
Antonio Sanchez	3f4684f87d	Include `<cstdint>` in one place, remove custom typedefs Originating from [this SO issue](https://stackoverflow.com/questions/65901014/how-to-solve-this-all-error-2-in-this-case), some win32 compilers define `__int32` as a `long`, but MinGW defines `std::int32_t` as an `int`, leading to a type conflict. To avoid this, we remove the custom `typedef` definitions for win32. The Tensor module requires C++11 anyways, so we are guaranteed to have included `<cstdint>` already in `Eigen/Core`. Also re-arranged the headers to only include `<cstdint>` in one place to avoid this type of error again.	2021-01-26 14:23:05 -08:00
David Tellenbach	660c6b857c	Remove std::cerr in iterative solver since we don't have iostream. This fixes #2123	2021-01-21 11:40:05 +01:00
Maozhou, Ge	21a8a2487c	fix paddings of TensorVolumePatchOp	2021-01-15 11:51:49 +08:00
Antonio Sanchez	070d303d56	Add CUDA complex sqrt. This is to support scalar `sqrt` of complex numbers `std::complex<T>` on device, requested by Tensorflow folks. Technically `std::complex` is not supported by NVCC on device (though it is by clang), so the default `sqrt(std::complex<T>)` function only works on the host. Here we create an overload to add back the functionality. Also modified the CMake file to add `--relaxed-constexpr` (or equivalent) flag for NVCC to allow calling constexpr functions from device functions, and added support for specifying compute architecture for NVCC (was already available for clang).	2020-12-22 23:25:23 -08:00
Turing Eret	19e6496ce0	Replace call to FixedDimensions() with a singleton instance of FixedDimensions.	2020-12-16 07:34:44 -07:00
Turing Eret	bc7d1599fb	TensorStorage with FixedDimensions now has zero instance memory overhead. Removed m_dimension as instance member of TensorStorage with FixedDimensions and instead use the template parameter. This means that the sizeof a pure fixed-size storage is exactly equal to the data it is storing.	2020-12-14 07:19:34 -07:00
Antonio Sanchez	2dbac2f99f	Fix bad NEON fp16 check	2020-12-04 13:42:18 -08:00
Antonio Sanchez	e2f21465fe	Special function implementations for half/bfloat16 packets. Current implementations fail to consider half-float packets, only half-float scalars. Added specializations for packets on AVX, AVX512 and NEON. Added tests to `special_packetmath`. The current `special_functions` tests would fail for half and bfloat16 due to lack of precision. The NEON tests also fail with precision issues and due to different handling of `sqrt(inf)`, so special functions bessel, ndtri have been disabled. Tested with AVX, AVX512.	2020-12-04 10:16:29 -08:00
Rasmus Munk Larsen	71c85df4c1	Clean up the Tensor header and get rid of the EIGEN_SLEEP macro.	2020-12-02 11:04:04 -08:00
Antonio Sanchez	17268b155d	Add bit_cast for half/bfloat to/from uint16_t, fix TensorRandom The existing `TensorRandom.h` implementation makes the assumption that `half` (`bfloat16`) has a `uint16_t` member `x` (`value`), which is not always true. This currently fails on arm64, where `x` has type `__fp16`. Added `bit_cast` specializations to allow casting to/from `uint16_t` for both `half` and `bfloat16`. Also added tests in `half_float`, `bfloat16_float`, and `cxx11_tensor_random` to catch these errors in the future.	2020-11-18 20:32:35 +00:00
Antonio Sanchez	3669498f5a	Fix rule-of-3 for the Tensor module. Adds copy constructors to Tensor ops, inherits assignment operators from `TensorBase`. Addresses #1863	2020-11-18 18:14:53 +00:00
mehdi-goli	a725a3233c	[SYCL clean up the code] : removing exrta #pragma unroll in SYCL which was causing issues in embeded systems	2020-10-28 08:34:49 +00:00
Rasmus Munk Larsen	61fc78bbda	Get rid of nested template specialization in TensorReductionGpu.h, which was broken by `c6953f799b`.	2020-10-13 23:53:11 +00:00
Rasmus Munk Larsen	c6953f799b	Add packet generic ops `predux_fmin`, `predux_fmin_nan`, `predux_fmax`, and `predux_fmax_nan` that implement reductions with `PropagateNaN`, and `PropagateNumbers` semantics. Add (slow) generic implementations for most reductions.	2020-10-13 21:48:31 +00:00
David Tellenbach	8f8d77b516	Add EIGEN prefix for HAS_LGAMMA_R	2020-10-08 18:32:19 +02:00
Eugene Zhulenev	2279f2c62f	Use lgamma_r if it is available (update check for glibc 2.19+)	2020-10-08 00:26:45 +00:00
Rasmus Munk Larsen	b431024404	Don't make assumptions about NaN-propagation for pmin/pmax - it various across platforms. Change test to only test for NaN-propagation for pfmin/pfmax.	2020-10-07 19:05:18 +00:00
Zhuyie	e4b24e7fb2	Fix Eigen::ThreadPool::CurrentThreadId returning wrong thread id when EIGEN_AVOID_THREAD_LOCAL and NDEBUG are defined	2020-09-25 09:36:43 +00:00
Rasmus Munk Larsen	e55182ac09	Get rid of initialization logic for blueNorm by making the computed constants static const or constexpr. Move macro definition EIGEN_CONSTEXPR to Core and make all methods in NumTraits constexpr when EIGEN_HASH_CONSTEXPR is 1.	2020-09-18 17:38:58 +00:00
Deven Desai	603e213d13	Fixing a CUDA / P100 regression introduced by PR 181 PR 181 ( https://gitlab.com/libeigen/eigen/-/merge_requests/181 ) adds `__launch_bounds__(1024)` attribute to GPU kernels, that did not have that attribute explicitly specified. That PR seems to cause regressions on the CUDA platform. This PR/commit makes the changes in PR 181, to be applicable for HIP only	2020-08-20 00:29:57 +00:00
Deven Desai	46f8a18567	Adding an explicit launch_bounds(1024) attribute for GPU kernels. Starting with ROCm 3.5, the HIP compiler will change from HCC to hip-clang. This compiler change introduce a change in the default value of the `__launch_bounds__` attribute associated with a GPU kernel. (default value means the value assumed by the compiler as the `__launch_bounds attribute__` value, when it is not explicitly specified by the user) Currently (i.e. for HIP with ROCm 3.3 and older), the default value is 1024. That changes to 256 with ROCm 3.5 (i.e. hip-clang compiler). As a consequence of this change, if a GPU kernel with a `__luanch_bounds__` attribute of 256 is launched at runtime with a threads_per_block value > 256, it leads to a runtime error. This is leading to a couple of Eigen unit test failures with ROCm 3.5. This commit adds an explicit `__launch_bounds(1024)__` attribute to every GPU kernel that currently does not have it explicitly specified (and hence will end up getting the default value of 256 with the change to hip-clang)	2020-08-05 01:46:34 +00:00
Rasmus Munk Larsen	b92206676c	Inherit alignment trait from argument in TensorBroadcasting to avoid segfault when the argument is unaligned.	2020-07-28 19:19:37 +00:00
Antonio Sanchez	9cb8771e9c	Fix tensor casts for large packets and casts to/from std::complex The original tensor casts were only defined for `SrcCoeffRatio`:`TgtCoeffRatio` 1:1, 1:2, 2:1, 4:1. Here we add the missing 1:N and 8:1. We also add casting `Eigen::half` to/from `std::complex<T>`, which was missing to make it consistent with `Eigen:bfloat16`, and generalize the overload to work for any complex type. Tests were added to `basicstuff`, `packetmath`, and `cxx11_tensor_casts` to test all cast configurations.	2020-06-30 18:53:55 +00:00
Teng Lu	386d809bde	Support BFloat16 in Eigen	2020-06-20 19:16:24 +00:00
Ilya Tokar	231ce21535	Run two independent chains, when reducing tensors. Running two chains exposes more instruction level parallelism, by allowing to execute both chains at the same time. Results are a bit noisy, but for medium length we almost hit theoretical upper bound of 2x. BM_fullReduction_16T/3 [using 16 threads] 17.3ns ±11% 17.4ns ± 9% ~ (p=0.178 n=18+19) BM_fullReduction_16T/4 [using 16 threads] 17.6ns ±17% 17.0ns ±18% ~ (p=0.835 n=20+19) BM_fullReduction_16T/7 [using 16 threads] 18.9ns ±12% 18.2ns ±10% ~ (p=0.756 n=20+18) BM_fullReduction_16T/8 [using 16 threads] 19.8ns ±13% 19.4ns ±21% ~ (p=0.512 n=20+20) BM_fullReduction_16T/10 [using 16 threads] 23.5ns ±15% 20.8ns ±24% -11.37% (p=0.000 n=20+19) BM_fullReduction_16T/15 [using 16 threads] 35.8ns ±21% 26.9ns ±17% -24.76% (p=0.000 n=20+19) BM_fullReduction_16T/16 [using 16 threads] 38.7ns ±22% 27.7ns ±18% -28.40% (p=0.000 n=20+19) BM_fullReduction_16T/31 [using 16 threads] 146ns ±17% 74ns ±11% -49.05% (p=0.000 n=20+18) BM_fullReduction_16T/32 [using 16 threads] 154ns ±19% 84ns ±30% -45.79% (p=0.000 n=20+19) BM_fullReduction_16T/64 [using 16 threads] 603ns ± 8% 308ns ±12% -48.94% (p=0.000 n=17+17) BM_fullReduction_16T/128 [using 16 threads] 2.44µs ±13% 1.22µs ± 1% -50.29% (p=0.000 n=17+17) BM_fullReduction_16T/256 [using 16 threads] 9.84µs ±14% 5.13µs ±30% -47.82% (p=0.000 n=19+19) BM_fullReduction_16T/512 [using 16 threads] 78.0µs ± 9% 56.1µs ±17% -28.02% (p=0.000 n=18+20) BM_fullReduction_16T/1k [using 16 threads] 325µs ± 5% 263µs ± 4% -19.00% (p=0.000 n=20+16) BM_fullReduction_16T/2k [using 16 threads] 1.09ms ± 3% 0.99ms ± 1% -9.04% (p=0.000 n=20+20) BM_fullReduction_16T/4k [using 16 threads] 7.66ms ± 3% 7.57ms ± 3% -1.24% (p=0.017 n=20+20) BM_fullReduction_16T/10k [using 16 threads] 65.3ms ± 4% 65.0ms ± 3% ~ (p=0.718 n=20+20)	2020-06-16 15:55:11 -04:00
mehdi-goli	d3e81db6c5	Eigen moved the `scanLauncehr` function inside the internal namespace. This commit applies the following changes: - Moving the `scamLauncher` specialization inside internal namespace to fix compiler crash on TensorScan for SYCL backend. - Replacing `SYCL/sycl.hpp` to `CL/sycl.hpp` in order to follow SYCL 1.2.1 standard. - minor fixes: commenting out an unused variable to avoid compiler warnings.	2020-05-11 16:10:33 +01:00
Rasmus Munk Larsen	2fd8a5a08f	Add parallelization of TensorScanOp for types without packet ops. Clean up the code a bit and do a few micro-optimizations to improve performance for small tensors. Benchmark numbers for Tensor<uint32_t>: name old time/op new time/op delta BM_cumSumRowReduction_1T/8 [using 1 threads] 76.5ns ± 0% 61.3ns ± 4% -19.80% (p=0.008 n=5+5) BM_cumSumRowReduction_1T/64 [using 1 threads] 2.47µs ± 1% 2.40µs ± 1% -2.77% (p=0.008 n=5+5) BM_cumSumRowReduction_1T/256 [using 1 threads] 39.8µs ± 0% 39.6µs ± 0% -0.60% (p=0.008 n=5+5) BM_cumSumRowReduction_1T/4k [using 1 threads] 13.9ms ± 0% 13.4ms ± 1% -4.19% (p=0.008 n=5+5) BM_cumSumRowReduction_2T/8 [using 2 threads] 76.8ns ± 0% 59.1ns ± 0% -23.09% (p=0.016 n=5+4) BM_cumSumRowReduction_2T/64 [using 2 threads] 2.47µs ± 1% 2.41µs ± 1% -2.53% (p=0.008 n=5+5) BM_cumSumRowReduction_2T/256 [using 2 threads] 39.8µs ± 0% 34.7µs ± 6% -12.74% (p=0.008 n=5+5) BM_cumSumRowReduction_2T/4k [using 2 threads] 13.8ms ± 1% 7.2ms ± 6% -47.74% (p=0.008 n=5+5) BM_cumSumRowReduction_8T/8 [using 8 threads] 76.4ns ± 0% 61.8ns ± 3% -19.02% (p=0.008 n=5+5) BM_cumSumRowReduction_8T/64 [using 8 threads] 2.47µs ± 1% 2.40µs ± 1% -2.84% (p=0.008 n=5+5) BM_cumSumRowReduction_8T/256 [using 8 threads] 39.8µs ± 0% 28.3µs ±11% -28.75% (p=0.008 n=5+5) BM_cumSumRowReduction_8T/4k [using 8 threads] 13.8ms ± 0% 2.7ms ± 5% -80.39% (p=0.008 n=5+5) BM_cumSumColReduction_1T/8 [using 1 threads] 59.1ns ± 0% 80.3ns ± 0% +35.94% (p=0.029 n=4+4) BM_cumSumColReduction_1T/64 [using 1 threads] 3.06µs ± 0% 3.08µs ± 1% ~ (p=0.114 n=4+4) BM_cumSumColReduction_1T/256 [using 1 threads] 175µs ± 0% 176µs ± 0% ~ (p=0.190 n=4+5) BM_cumSumColReduction_1T/4k [using 1 threads] 824ms ± 1% 844ms ± 1% +2.37% (p=0.008 n=5+5) BM_cumSumColReduction_2T/8 [using 2 threads] 59.0ns ± 0% 90.7ns ± 0% +53.74% (p=0.029 n=4+4) BM_cumSumColReduction_2T/64 [using 2 threads] 3.06µs ± 0% 3.10µs ± 0% +1.08% (p=0.016 n=4+5) BM_cumSumColReduction_2T/256 [using 2 threads] 176µs ± 0% 189µs ±18% ~ (p=0.151 n=5+5) BM_cumSumColReduction_2T/4k [using 2 threads] 836ms ± 2% 611ms ±14% -26.92% (p=0.008 n=5+5) BM_cumSumColReduction_8T/8 [using 8 threads] 59.3ns ± 2% 90.6ns ± 0% +52.79% (p=0.008 n=5+5) BM_cumSumColReduction_8T/64 [using 8 threads] 3.07µs ± 0% 3.10µs ± 0% +0.99% (p=0.016 n=5+4) BM_cumSumColReduction_8T/256 [using 8 threads] 176µs ± 0% 80µs ±19% -54.51% (p=0.008 n=5+5) BM_cumSumColReduction_8T/4k [using 8 threads] 827ms ± 2% 180ms ±14% -78.24% (p=0.008 n=5+5)	2020-05-06 14:48:37 -07:00
Rasmus Munk Larsen	0e59f786e1	Fix accidental copy of loop variable.	2020-05-05 21:35:38 +00:00

1 2 3 4 5 ...

2338 Commits