eigen/tvmet-1.7.1/doc/compiler.dox

/*
 * $Id: compiler.dox,v 1.24 2005/03/09 12:05:19 opetzold Exp $
 */

/**
   \page compiler Compiler Support

   <p>Contents:</p>
   - \ref requirements
   - \ref gcc
   - \ref kcc
   - \ref pgCC
   - \ref intel
   - \ref vc71

   \section requirements General compiler Requirements

   This library is designed for portability - no compiler specific extensions
   are used. Nevertheless, there are a few requirements:  (These are all a part
   of the C++ standard.)

   - Support for the <tt>mutable</tt> keyword is required.  This is used by the
     CommaInitializer only.

   - The <tt>typename</tt> keyword is used exhaustively here.

   - The namespace concept is required.  The tvmet library is itself is a
     namespace. To avoid collisions of operators, there is also an element_wise
     namespace within tvmet.

   - Partial specialization is needed for the extrema functions min and max
     to distinguish between vectors and matrices. This allows tvmet to return
     an object with a specific behavior.  (The location of an extremum in a
     matrix has a (row, column) position whereas a vector extremum has only a
     single index for its position).

   \section gcc The GNU Compiler Collection

   The <a href=http://gcc.gnu.org>GNU compiler</a> collection is mainly used for
   developing this library. Moreover, it does compile the library the fastest.

   \subsection gcc2953 GNU C++ Compiler v2.95.3

   Gcc v2.95.3 is the last official release of the version 2 series from gnu.org.
   Since this compiler features the \ref requirements it does work, but only
   partial.

   There are certain difficulties - see \ref ambiguous_overload (also, please
   read about \ref gcc296).  Furthermore, there are problems with functions and
   operators declared in the namespace <code>element_wise</code> - the
   compiler doesn't seem to find them--even though the compiler does know
   about namespace tvmet.  It appears to be a problem with nested namespaces and
   the compiler's ability to perform function/operator lookup, especially during
   regression tests: <code> matrix /= matrix </code> compiles inside a single
   file but not at the regression tests--which is a contradiction in terms.

   Porting to gcc v2.95.3 requires a lot of knowledge and effort--unfortunately,
   I don't have enough of either.  The examples do compile and the regression
   tests build partially.

   Matrix and vector operators are working, but don't expect too much.

   \subsection gcc296 GNU C++ Compiler v2.96 (Rh7.x, MD8.x)

   This compiler isn't an official release of the GNU Compiler group but shipped
   by <a href=http://www.redhat.com>Red Hat</a> and Co.

   Blitz++ is using a hasFastAccess() flag to perform a check for the use of
   _bz_meta_vecAssign::fastAssign (without bounds checking) or
   _bz_meta_vecAssign::assign (with bounds checking). This
   isn't really necessary for operations on blitz::TinyVector, since it's
   always true. Nevertheless, it is important for the produced asm code using
   the gcc-c++-2.96-0.48mdk.  Generally the code for Blitz++ using the gcc-2.96
   is better than tvmet because of this (tested!).

   I got into trouble with stl_relops.h where miscellaneous operators are defined. A
   simple define of __SGI_STL_INTERNAL_RELOPS in the config header doesn't solve
   the problem, only the commented out header version, see \ref
   ambiguous_overload.  Because of this problem, the regression tests don't
   compile with this version.  Projects with do not use the relational
   operators are not affected.

   It seems that the inlining performed by this compiler collection isn't very
   smart. I got a lot of warnings: can't inline call to ... So, it would be
   best to use the \ref gcc30x and later compilers.

   \subsection gcc30x GNU C++ Compiler v3.0.x

   These compiler produce better code than the \ref gcc296! Even the problems
   with blitz++ fastAssign have vanished. And this compiler conforms to the
   standard.  The regression tests does compile and run successfully.

   Due to the nature of ET and MT there is a need for a high level of inlining.
   The v3.0.x seems to do this well as compared to the v2.9x compilers which
   produce inline warnings.

   This compiler works great with the
   <a href=http://www.stlport.org">STLPort-4.5.3</a>
   implementation of the STL/C++ Library, Tiny Vector and Matrix template library
   and <a href=http://cppunit.sourceforge.net>cpp-unit</a>.

   \subsection gcc31x GNU C++ Compiler v3.1

   %tvmet does compile with this new GNU C++ compiler. The produced code looks
   as good as the code created by \ref gcc30x.  (Does anyone have time to make
   a benchmark?)

   The primary goal is conformance to the standard ISO/IEC 14882:1998.

   \subsection gcc32x GNU C++ Compiler v3.2.x

   The once again changed Application Binary Interface (ABI) doesn't affect
   tvmet since it isn't a binary library--it's only compiled templates inside
   the client code.

   There are some problems with the GNU C++ compiler collection on the
   regression test due to some bugs (IMO), \sa \ref regressiontest_failed.

   \subsection gcc33x GNU C++ Compiler v3.3

   Tested and works fine. Only some warnings on failed inlining which doesn't
   concern tvmet directly.

   Anyway, here the code from <tt>examples/ray.cc</tt> on gcc 3.3.3 using
   <tt>-O2 -DTVMET_OPTIMIZE</tt>

   \par Assembler (IA-32 Intel<65> Architecture):
   \code
	movl	16(%ebp), %edx
	movl	12(%ebp), %ebx
	movl	8(%ebp), %esi
	fldl	8(%edx)
	fldl	16(%edx)
	fmull	16(%ebx)
	fxch	%st(1)
	movl	%ebx, -24(%ebp)
	fmull	8(%ebx)
	movl	%edx, -32(%ebp)
	fldl	(%edx)
	fmull	(%ebx)
	fxch	%st(1)
	movl	%edx, -60(%ebp)
	movl	%edx, -12(%ebp)
	faddp	%st, %st(2)
	faddp	%st, %st(1)
	fadd	%st(0), %st
	fstpl	-56(%ebp)
	movl	-56(%ebp), %ecx
	movl	-52(%ebp), %eax
	movl	%ecx, -20(%ebp)
	movl	%eax, -16(%ebp)
	movl	%ecx, -40(%ebp)
	movl	%eax, -36(%ebp)
	movl	%ecx, -68(%ebp)
	movl	%eax, -64(%ebp)
	fldl	(%edx)
	fmull	-20(%ebp)
	fsubrl	(%ebx)
	fstpl	(%esi)
	fldl	8(%edx)
	fmull	-20(%ebp)
	fsubrl	8(%ebx)
	fstpl	8(%esi)
	fldl	16(%edx)
	fmull	-20(%ebp)
	fsubrl	16(%ebx)
	fstpl	16(%esi)
	addl	$64, %esp
	popl	%ebx
	popl	%esi
	popl	%ebp
	ret
   \endcode

   \subsection gcc34x GNU C++ Compiler v3.4.x

   The compiler 3.4.3 works fine, starting with tvmet release 1.7.1. The problem is
   the correct syntax for the CommaInitializer template declaration and
   implementation.

   There is no assembler output for our <tt>examples/ray.cc</tt>, since I don't
   have this compiler yet (yes, I need to update my linux system ;-)


   \section kcc Kai C++

   This has not been tested.  Unfortunately Kai's compiler is no longer shipped
   -- one should use the Intel compiler instead
   (see <a href=http://www.kai.com>here</a>).

   If you have used it successfully including regression and/or benchmark tests,
   please give me an answer.

   \section pgCC Portland Group Compiler Technology

   \subsection pgCC32 Portland Group C++ 3.2

   The <a href=http://www.pgroup.com>Portland Group</a> C++ compiler is shipped
   with the RogueWave Standard C++ Library which provides conformance to the
   standard.  Unfortunately, the &lt;cname&gt; C library wrapper headers and the C++
   overloads of the math functions are not provided on all platforms, see
   <http://www.cug.com/roundup>.  The download evaluation version 3.2-4 for
   Linux is affected for example. At first glance, it does compile with pgCC
   since it has has the great <a href=http://www.edg.com>EDG</a> front-end.

   Maybe there is a solution with other standard library implementations like
   <a href=http://www.stlport.org>STLPort</a> (On a quick try the STL Port
   doesn't recognize the pgCC). If you know more about this, please let me know.

   Anyway, the code produced is very poor even if I use high inlining levels
   like the command line option -Minline=levels:100 which increases the compile
   time dramatically!  The benchmark tests have not been done.  Unfortunately,
   my trial period has expired.  I haven't any idea if this compiler will pass
   the regression tests.

   \subsection pgCC51 Portland Group C++ 5.1

   The <a href=http://www.pgroup.com>Portland Group</a> C++ compiler is shipped
   with the <a href=http://www.stlport.org>STLport</a> Standard C++ Library, cool!

   The code produced isn't very compact compared with the intel or gnu compiler.
   Anyway it works, but the compiler time increases dramatically
   even on higher inline levels.


   \section intel Intel Compiler

   \subsection icc5 Intel Compiler v5.0.1

   This compiler complains even more than gcc-3.0.x regarding template
   specifiers (e.g. correct spaces for template arguments to std::complex are
   needed even when not instanced).

   The produced code looks good but, I haven't done a benchmark to compare it
   with the gcc-3.0.x since the compile time increases for the benchmark test
   dramatically.

   I have not run any regression tests due to the compile time needed by my
   AMD K6/400 Linux box ...

   \subsection icc6 Intel Compiler v6.0.x

   Should work, but I haven't tested it.

   \subsection icc7 Intel Compiler v7.x

   This compiler is well supported by tvmet and passes the regression tests
   without any failure - as opposed to the GNU C++ compiler collection.

   \subsection icc8 Intel Compiler v8.x

   No regression tests are done - reports are welcome. I'm not expecting
   problems. Anyway, this versions uses pure macros for IEEE math isnan and
   isinf. This prevents overwriting with tvmet's functions. Therefore
   this functions are disabled after tvmet release 1.4.1. The code produced
   is even on <tt>examples/ray.cc</tt> more compact than the \ref gcc33x.

   Anyway, here the code from <tt>examples/ray.cc</tt> using
   <tt>-O2 -DTVMET_OPTIMIZE</tt>

   \par Assembler (IA-32 Intel<65> Architecture):
   \code
        movl      4(%esp), %ecx
        movl      8(%esp), %edx
        movl      12(%esp), %eax
        fldl      (%edx)
        fmull     (%eax)
        fldl      8(%edx)
        fmull     8(%eax)
        fldl      16(%edx)
        fmull     16(%eax)
        faddp     %st, %st(1)
        faddp     %st, %st(1)
        fldl      (%eax)
        fxch      %st(1)
        fadd      %st(0), %st
        fmul      %st, %st(1)
        fxch      %st(1)
        fsubrl    (%edx)
        fstpl     (%ecx)
        fldl      8(%eax)
        fmul      %st(1), %st
        fsubrl    8(%edx)
        fstpl     8(%ecx)
        fldl      16(%eax)
        fmulp     %st, %st(1)
        fsubrl    16(%edx)
        fstpl     16(%ecx)
        ret
   \endcode


   \section vc71 Microsoft Visual C++ v7.1

   \htmlonly
   <script language="JavaScript">
   var m_name="blbounnejapny";
   var m_domain="hotmail.com";
   var m_text='<a href="mailto:'+m_name+'@'+m_domain+'?subject=tvmet and Microsoft VC++">';
   m_text+='Robi Carnecky</a>';
   document.write(m_text);
   </script>
   \endhtmlonly
   \latexonly
   Robi Carnecky <blbounnejapny@hotmail.com>
   \endlatexonly
   has reported the success on tvmet using Visual C++ v7.1.   At this
   release of tvmet there are some warnings left - the work is on
   progress.

   The <a href="http://msdn.microsoft.com/visualc/vctoolkit2003/">Microsoft Visual C++ Toolkit 2003</a>
   and Visual C++ prior 7.1 do not compile - you will get an undefined
   internal error unfortunally.

   Anyway, here the code from <tt>examples/ray.cc</tt>:

   \par Assembler (IA-32 Intel<65> Architecture, no SSE2):
   \code
	push	ebp
	mov	ebp, esp
	and	esp, -8					; fffffff8H
	sub	esp, 28					; 0000001cH
	mov	eax, DWORD PTR _ray$[ebp]
	mov	ecx, DWORD PTR _surfaceNormal$[ebp]
	fld	QWORD PTR [eax+16]
	fmul	QWORD PTR [ecx+16]
	push	ebx
	fld	QWORD PTR [eax+8]
	push	esi
	fmul	QWORD PTR [ecx+8]
	push	edi
	mov	edi, DWORD PTR $T35206[esp+52]
	faddp	ST(1), ST(0)
	mov	DWORD PTR $T35027[esp+60], edi
	fld	QWORD PTR [eax]
	pop	edi
	fmul	QWORD PTR [ecx]
	faddp	ST(1), ST(0)
	fadd	ST(0), ST(0)
	fstp	QWORD PTR $T35206[esp+36]
	mov	esi, DWORD PTR $T35206[esp+40]
	mov	edx, DWORD PTR $T35206[esp+36]
	mov	ebx, DWORD PTR $T35265[esp+40]
	mov	DWORD PTR $T35027[esp+44], edx
	mov	edx, DWORD PTR _reflection$[ebp]
	mov	DWORD PTR $T35027[esp+48], esi
	fld	QWORD PTR $T35027[esp+44]
	fmul	QWORD PTR [ecx]
	pop	esi
	mov	DWORD PTR $T35027[esp+36], ebx
	pop	ebx
	fsubr	QWORD PTR [eax]
	fstp	QWORD PTR [edx]
	fld	QWORD PTR $T35027[esp+36]
	fmul	QWORD PTR [ecx+8]
	fsubr	QWORD PTR [eax+8]
	fstp	QWORD PTR [edx+8]
	fld	QWORD PTR $T35027[esp+36]
	fmul	QWORD PTR [ecx+16]
	fsubr	QWORD PTR [eax+16]
	fstp	QWORD PTR [edx+16]
	mov	esp, ebp
	pop	ebp
   \endcode

   \sa \ref regressiontest_failed
   \sa \ref install_win
*/


// Local Variables:
// mode:c++
// End: