* add fixed-size vectorized path
* add missing restrict keywords
* use innerStride()
* allow vectorization even if innerStride() > 1, provided PacketSize == 1 (think of rows of std::complex<double>)
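
The stride condition in the last item can be sketched as follows. This is only an illustration, not the code touched by this change: `packet_size` and `can_vectorize` are hypothetical names, and the packet sizes assume a 128-bit SIMD register.

```cpp
#include <complex>
#include <cstddef>

// Hypothetical trait: number of scalars held by one SIMD packet.
// Values are assumptions for a 128-bit register, not the library's own definitions.
template <typename Scalar>
struct packet_size { static constexpr int value = 1; };

template <>
struct packet_size<float> { static constexpr int value = 4; };

template <>
struct packet_size<double> { static constexpr int value = 2; };

// One std::complex<double> already fills 128 bits, so a packet holds a single scalar.
template <>
struct packet_size<std::complex<double>> { static constexpr int value = 1; };

// A packet load needs its scalars to be contiguous (inner stride of 1).
// When a packet holds exactly one scalar, the stride between scalars no
// longer matters, so the vectorized path can be used regardless.
template <typename Scalar>
constexpr bool can_vectorize(std::ptrdiff_t inner_stride) {
    return inner_stride == 1 || packet_size<Scalar>::value == 1;
}
```

Under these assumptions, a strided row of `double` falls back to the scalar path, while a strided row of `std::complex<double>` still qualifies for the vectorized one.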