release of tvmet (inactive for 2 years and developer unreachable) as the basis for eigen2, because it provides seemingly good expression template mechanisms, we want that, and it would take years to reinvent that wheel. We'll see. So this commit imports the last tvmet release.
376 lines
12 KiB
Plaintext
376 lines
12 KiB
Plaintext
/*
|
||
* $Id: compiler.dox,v 1.24 2005/03/09 12:05:19 opetzold Exp $
|
||
*/
|
||
|
||
/**
|
||
\page compiler Compiler Support
|
||
|
||
<p>Contents:</p>
|
||
- \ref requirements
|
||
- \ref gcc
|
||
- \ref kcc
|
||
- \ref pgCC
|
||
- \ref intel
|
||
- \ref vc71
|
||
|
||
\section requirements General compiler Requirements
|
||
|
||
This library is designed for portability - no compiler specific extensions
|
||
are used. Nevertheless, there are a few requirements: (These are all a part
|
||
of the C++ standard.)
|
||
|
||
- Support for the <tt>mutable</tt> keyword is required. This is used by the
|
||
CommaInitializer only.
|
||
|
||
- The <tt>typename</tt> keyword is used exhaustively here.
|
||
|
||
- The namespace concept is required. The tvmet library is itself is a
|
||
namespace. To avoid collisions of operators, there is also an element_wise
|
||
namespace within tvmet.
|
||
|
||
- Partial specialization is needed for the extrema functions min and max
|
||
to distinguish between vectors and matrices. This allows tvmet to return
|
||
an object with a specific behavior. (The location of an extremum in a
|
||
matrix has a (row, column) position whereas a vector extremum has only a
|
||
single index for its position).
|
||
|
||
\section gcc The GNU Compiler Collection
|
||
|
||
The <a href=http://gcc.gnu.org>GNU compiler</a> collection is mainly used for
|
||
developing this library. Moreover, it does compile the library the fastest.
|
||
|
||
\subsection gcc2953 GNU C++ Compiler v2.95.3
|
||
|
||
Gcc v2.95.3 is the last official release of the version 2 series from gnu.org.
|
||
Since this compiler features the \ref requirements it does work, but only
|
||
partial.
|
||
|
||
There are certain difficulties - see \ref ambiguous_overload (also, please
|
||
read about \ref gcc296). Furthermore, there are problems with functions and
|
||
operators declared in the namespace <code>element_wise</code> - the
|
||
compiler doesn't seem to find them--even though the compiler does know
|
||
about namespace tvmet. It appears to be a problem with nested namespaces and
|
||
the compiler's ability to perform function/operator lookup, especially during
|
||
regression tests: <code> matrix /= matrix </code> compiles inside a single
|
||
file but not at the regression tests--which is a contradiction in terms.
|
||
|
||
Porting to gcc v2.95.3 requires a lot of knowledge and effort--unfortunately,
|
||
I don't have enough of either. The examples do compile and the regression
|
||
tests build partially.
|
||
|
||
Matrix and vector operators are working, but don't expect too much.
|
||
|
||
\subsection gcc296 GNU C++ Compiler v2.96 (Rh7.x, MD8.x)
|
||
|
||
This compiler isn't an official release of the GNU Compiler group but shipped
|
||
by <a href=http://www.redhat.com>Red Hat</a> and Co.
|
||
|
||
Blitz++ is using a hasFastAccess() flag to perform a check for the use of
|
||
_bz_meta_vecAssign::fastAssign (without bounds checking) or
|
||
_bz_meta_vecAssign::assign (with bounds checking). This
|
||
isn't really necessary for operations on blitz::TinyVector, since it's
|
||
always true. Nevertheless, it is important for the produced asm code using
|
||
the gcc-c++-2.96-0.48mdk. Generally the code for Blitz++ using the gcc-2.96
|
||
is better than tvmet because of this (tested!).
|
||
|
||
I got into trouble with stl_relops.h where miscellaneous operators are defined. A
|
||
simple define of __SGI_STL_INTERNAL_RELOPS in the config header doesn't solve
|
||
the problem, only the commented out header version, see \ref
|
||
ambiguous_overload. Because of this problem, the regression tests don't
|
||
compile with this version. Projects with do not use the relational
|
||
operators are not affected.
|
||
|
||
It seems that the inlining performed by this compiler collection isn't very
|
||
smart. I got a lot of warnings: can't inline call to ... So, it would be
|
||
best to use the \ref gcc30x and later compilers.
|
||
|
||
\subsection gcc30x GNU C++ Compiler v3.0.x
|
||
|
||
These compiler produce better code than the \ref gcc296! Even the problems
|
||
with blitz++ fastAssign have vanished. And this compiler conforms to the
|
||
standard. The regression tests does compile and run successfully.
|
||
|
||
Due to the nature of ET and MT there is a need for a high level of inlining.
|
||
The v3.0.x seems to do this well as compared to the v2.9x compilers which
|
||
produce inline warnings.
|
||
|
||
This compiler works great with the
|
||
<a href=http://www.stlport.org">STLPort-4.5.3</a>
|
||
implementation of the STL/C++ Library, Tiny Vector and Matrix template library
|
||
and <a href=http://cppunit.sourceforge.net>cpp-unit</a>.
|
||
|
||
\subsection gcc31x GNU C++ Compiler v3.1
|
||
|
||
%tvmet does compile with this new GNU C++ compiler. The produced code looks
|
||
as good as the code created by \ref gcc30x. (Does anyone have time to make
|
||
a benchmark?)
|
||
|
||
The primary goal is conformance to the standard ISO/IEC 14882:1998.
|
||
|
||
\subsection gcc32x GNU C++ Compiler v3.2.x
|
||
|
||
The once again changed Application Binary Interface (ABI) doesn't affect
|
||
tvmet since it isn't a binary library--it's only compiled templates inside
|
||
the client code.
|
||
|
||
There are some problems with the GNU C++ compiler collection on the
|
||
regression test due to some bugs (IMO), \sa \ref regressiontest_failed.
|
||
|
||
\subsection gcc33x GNU C++ Compiler v3.3
|
||
|
||
Tested and works fine. Only some warnings on failed inlining which doesn't
|
||
concern tvmet directly.
|
||
|
||
Anyway, here the code from <tt>examples/ray.cc</tt> on gcc 3.3.3 using
|
||
<tt>-O2 -DTVMET_OPTIMIZE</tt>
|
||
|
||
\par Assembler (IA-32 Intel<65> Architecture):
|
||
\code
|
||
movl 16(%ebp), %edx
|
||
movl 12(%ebp), %ebx
|
||
movl 8(%ebp), %esi
|
||
fldl 8(%edx)
|
||
fldl 16(%edx)
|
||
fmull 16(%ebx)
|
||
fxch %st(1)
|
||
movl %ebx, -24(%ebp)
|
||
fmull 8(%ebx)
|
||
movl %edx, -32(%ebp)
|
||
fldl (%edx)
|
||
fmull (%ebx)
|
||
fxch %st(1)
|
||
movl %edx, -60(%ebp)
|
||
movl %edx, -12(%ebp)
|
||
faddp %st, %st(2)
|
||
faddp %st, %st(1)
|
||
fadd %st(0), %st
|
||
fstpl -56(%ebp)
|
||
movl -56(%ebp), %ecx
|
||
movl -52(%ebp), %eax
|
||
movl %ecx, -20(%ebp)
|
||
movl %eax, -16(%ebp)
|
||
movl %ecx, -40(%ebp)
|
||
movl %eax, -36(%ebp)
|
||
movl %ecx, -68(%ebp)
|
||
movl %eax, -64(%ebp)
|
||
fldl (%edx)
|
||
fmull -20(%ebp)
|
||
fsubrl (%ebx)
|
||
fstpl (%esi)
|
||
fldl 8(%edx)
|
||
fmull -20(%ebp)
|
||
fsubrl 8(%ebx)
|
||
fstpl 8(%esi)
|
||
fldl 16(%edx)
|
||
fmull -20(%ebp)
|
||
fsubrl 16(%ebx)
|
||
fstpl 16(%esi)
|
||
addl $64, %esp
|
||
popl %ebx
|
||
popl %esi
|
||
popl %ebp
|
||
ret
|
||
\endcode
|
||
|
||
\subsection gcc34x GNU C++ Compiler v3.4.x
|
||
|
||
The compiler 3.4.3 works fine, starting with tvmet release 1.7.1. The problem is
|
||
the correct syntax for the CommaInitializer template declaration and
|
||
implementation.
|
||
|
||
There is no assembler output for our <tt>examples/ray.cc</tt>, since I don't
|
||
have this compiler yet (yes, I need to update my linux system ;-)
|
||
|
||
|
||
\section kcc Kai C++
|
||
|
||
This has not been tested. Unfortunately Kai's compiler is no longer shipped
|
||
-- one should use the Intel compiler instead
|
||
(see <a href=http://www.kai.com>here</a>).
|
||
|
||
If you have used it successfully including regression and/or benchmark tests,
|
||
please give me an answer.
|
||
|
||
\section pgCC Portland Group Compiler Technology
|
||
|
||
\subsection pgCC32 Portland Group C++ 3.2
|
||
|
||
The <a href=http://www.pgroup.com>Portland Group</a> C++ compiler is shipped
|
||
with the RogueWave Standard C++ Library which provides conformance to the
|
||
standard. Unfortunately, the <cname> C library wrapper headers and the C++
|
||
overloads of the math functions are not provided on all platforms, see
|
||
<http://www.cug.com/roundup>. The download evaluation version 3.2-4 for
|
||
Linux is affected for example. At first glance, it does compile with pgCC
|
||
since it has has the great <a href=http://www.edg.com>EDG</a> front-end.
|
||
|
||
Maybe there is a solution with other standard library implementations like
|
||
<a href=http://www.stlport.org>STLPort</a> (On a quick try the STL Port
|
||
doesn't recognize the pgCC). If you know more about this, please let me know.
|
||
|
||
Anyway, the code produced is very poor even if I use high inlining levels
|
||
like the command line option -Minline=levels:100 which increases the compile
|
||
time dramatically! The benchmark tests have not been done. Unfortunately,
|
||
my trial period has expired. I haven't any idea if this compiler will pass
|
||
the regression tests.
|
||
|
||
\subsection pgCC51 Portland Group C++ 5.1
|
||
|
||
The <a href=http://www.pgroup.com>Portland Group</a> C++ compiler is shipped
|
||
with the <a href=http://www.stlport.org>STLport</a> Standard C++ Library, cool!
|
||
|
||
The code produced isn't very compact compared with the intel or gnu compiler.
|
||
Anyway it works, but the compiler time increases dramatically
|
||
even on higher inline levels.
|
||
|
||
|
||
\section intel Intel Compiler
|
||
|
||
\subsection icc5 Intel Compiler v5.0.1
|
||
|
||
This compiler complains even more than gcc-3.0.x regarding template
|
||
specifiers (e.g. correct spaces for template arguments to std::complex are
|
||
needed even when not instanced).
|
||
|
||
The produced code looks good but, I haven't done a benchmark to compare it
|
||
with the gcc-3.0.x since the compile time increases for the benchmark test
|
||
dramatically.
|
||
|
||
I have not run any regression tests due to the compile time needed by my
|
||
AMD K6/400 Linux box ...
|
||
|
||
\subsection icc6 Intel Compiler v6.0.x
|
||
|
||
Should work, but I haven't tested it.
|
||
|
||
\subsection icc7 Intel Compiler v7.x
|
||
|
||
This compiler is well supported by tvmet and passes the regression tests
|
||
without any failure - as opposed to the GNU C++ compiler collection.
|
||
|
||
\subsection icc8 Intel Compiler v8.x
|
||
|
||
No regression tests are done - reports are welcome. I'm not expecting
|
||
problems. Anyway, this versions uses pure macros for IEEE math isnan and
|
||
isinf. This prevents overwriting with tvmet's functions. Therefore
|
||
this functions are disabled after tvmet release 1.4.1. The code produced
|
||
is even on <tt>examples/ray.cc</tt> more compact than the \ref gcc33x.
|
||
|
||
Anyway, here the code from <tt>examples/ray.cc</tt> using
|
||
<tt>-O2 -DTVMET_OPTIMIZE</tt>
|
||
|
||
\par Assembler (IA-32 Intel<65> Architecture):
|
||
\code
|
||
movl 4(%esp), %ecx
|
||
movl 8(%esp), %edx
|
||
movl 12(%esp), %eax
|
||
fldl (%edx)
|
||
fmull (%eax)
|
||
fldl 8(%edx)
|
||
fmull 8(%eax)
|
||
fldl 16(%edx)
|
||
fmull 16(%eax)
|
||
faddp %st, %st(1)
|
||
faddp %st, %st(1)
|
||
fldl (%eax)
|
||
fxch %st(1)
|
||
fadd %st(0), %st
|
||
fmul %st, %st(1)
|
||
fxch %st(1)
|
||
fsubrl (%edx)
|
||
fstpl (%ecx)
|
||
fldl 8(%eax)
|
||
fmul %st(1), %st
|
||
fsubrl 8(%edx)
|
||
fstpl 8(%ecx)
|
||
fldl 16(%eax)
|
||
fmulp %st, %st(1)
|
||
fsubrl 16(%edx)
|
||
fstpl 16(%ecx)
|
||
ret
|
||
\endcode
|
||
|
||
|
||
|
||
|
||
\section vc71 Microsoft Visual C++ v7.1
|
||
|
||
\htmlonly
|
||
<script language="JavaScript">
|
||
var m_name="blbounnejapny";
|
||
var m_domain="hotmail.com";
|
||
var m_text='<a href="mailto:'+m_name+'@'+m_domain+'?subject=tvmet and Microsoft VC++">';
|
||
m_text+='Robi Carnecky</a>';
|
||
document.write(m_text);
|
||
</script>
|
||
\endhtmlonly
|
||
\latexonly
|
||
Robi Carnecky <blbounnejapny@hotmail.com>
|
||
\endlatexonly
|
||
has reported the success on tvmet using Visual C++ v7.1. At this
|
||
release of tvmet there are some warnings left - the work is on
|
||
progress.
|
||
|
||
The <a href="http://msdn.microsoft.com/visualc/vctoolkit2003/">Microsoft Visual C++ Toolkit 2003</a>
|
||
and Visual C++ prior 7.1 do not compile - you will get an undefined
|
||
internal error unfortunally.
|
||
|
||
Anyway, here the code from <tt>examples/ray.cc</tt>:
|
||
|
||
\par Assembler (IA-32 Intel<65> Architecture, no SSE2):
|
||
\code
|
||
push ebp
|
||
mov ebp, esp
|
||
and esp, -8 ; fffffff8H
|
||
sub esp, 28 ; 0000001cH
|
||
mov eax, DWORD PTR _ray$[ebp]
|
||
mov ecx, DWORD PTR _surfaceNormal$[ebp]
|
||
fld QWORD PTR [eax+16]
|
||
fmul QWORD PTR [ecx+16]
|
||
push ebx
|
||
fld QWORD PTR [eax+8]
|
||
push esi
|
||
fmul QWORD PTR [ecx+8]
|
||
push edi
|
||
mov edi, DWORD PTR $T35206[esp+52]
|
||
faddp ST(1), ST(0)
|
||
mov DWORD PTR $T35027[esp+60], edi
|
||
fld QWORD PTR [eax]
|
||
pop edi
|
||
fmul QWORD PTR [ecx]
|
||
faddp ST(1), ST(0)
|
||
fadd ST(0), ST(0)
|
||
fstp QWORD PTR $T35206[esp+36]
|
||
mov esi, DWORD PTR $T35206[esp+40]
|
||
mov edx, DWORD PTR $T35206[esp+36]
|
||
mov ebx, DWORD PTR $T35265[esp+40]
|
||
mov DWORD PTR $T35027[esp+44], edx
|
||
mov edx, DWORD PTR _reflection$[ebp]
|
||
mov DWORD PTR $T35027[esp+48], esi
|
||
fld QWORD PTR $T35027[esp+44]
|
||
fmul QWORD PTR [ecx]
|
||
pop esi
|
||
mov DWORD PTR $T35027[esp+36], ebx
|
||
pop ebx
|
||
fsubr QWORD PTR [eax]
|
||
fstp QWORD PTR [edx]
|
||
fld QWORD PTR $T35027[esp+36]
|
||
fmul QWORD PTR [ecx+8]
|
||
fsubr QWORD PTR [eax+8]
|
||
fstp QWORD PTR [edx+8]
|
||
fld QWORD PTR $T35027[esp+36]
|
||
fmul QWORD PTR [ecx+16]
|
||
fsubr QWORD PTR [eax+16]
|
||
fstp QWORD PTR [edx+16]
|
||
mov esp, ebp
|
||
pop ebp
|
||
\endcode
|
||
|
||
\sa \ref regressiontest_failed
|
||
\sa \ref install_win
|
||
*/
|
||
|
||
|
||
// Local Variables:
|
||
// mode:c++
|
||
// End:
|