* added some tests for product and swap
* overloaded .swap() for dynamic-sized matrices of the same size
  (equivalent to the GEMM BLAS routine)
* added a GEMM benchmark