Benoît Jacob ([info]bjacob) wrote,
@ 2007-08-26 09:19:00
Previous Entry  Add to memories!  Tell a Friend!  Next Entry
Entry tags:kde eigen eigen2 tvmet c++ matrix librar

The long road to Eigen 2
It's funny how none of the existing libraries for vector and matrix math seems to do all what one might want. Each of these libs (Blitz++, MTL, LAPACK, GSL, TVMET, GMM++...) is specialized for certain needs. Usually, an app will choose the most suitable one for its needs, and be happy. But KDE is a huge meta-project with lots of sub-projects, so it has a very broad range of mathematical needs, which is not covered by any of the specialized libraries.

Writing from scratch such a general math library would take many man-years. However, what one can do is to merge existing libraries into a single one.

A killer feature that we absolutely want is expression templates. It's a strange C++ technique that allows to do things like a = b + c without paying the cost of having a temporary object allocated on the stack and then copied into a. Basically, it allows to use natural mathematical notation without any overhead. It even allows things like

matrix.row(i) += matrix.row(j)
which would be impossible otherwise.

The best expression templates for a matrix library are currently found in TVMET. Unfortunately, TVMET has been stagnant for 2 years and has been very little used by other projects, which is probably due to being too abstract, aiming too much for theoretical perfection.

For Eigen2, I decided to use TVMET as my starting point, instead of Eigen1. Of course, the contents of Eigen1 will be incorporated into Eigen2 as soon as possible. However, for now my work has been focused in shaping TVMET for my needs. I've converted it from autotools to CMake, and I've removed tons of stuff that I don't need, shrinking it down from 12000 to 7000 lines of code, and I've still room to shrink it further, and I've already added some good stuff from Eigen1, such as sane fuzzy comparisons. I'm currently porting the unit-tests from CPPUnit to QTestLib, and by the way I'm improving them a lot now that there are sane fuzzy comparisons.

What's next? Once I'll have fully reshaped TVMET and incorporated Eigen1 into it, I'll have a working Eigen2 for tiny (i.e. fixed-size) vectors and matrices. For dynamic-size objects, both dense and sparse, I'll wrap GMM++. In order to make this possible, the expression templates from TVMET will have to be adapted, as for instance loop unrolling will no longer be possible.

It's a very exciting project that is bound to fill a very bad gap in the current existing software. It requires only knowledge of pure C++, and vague notions of matrix mathematics. Your help is more than welcome, as there's a lot to do. So don't hesitate, join me: eigen at lists tuxfamily org.


(Post a new comment)

Greate work !
(Anonymous)
2007-08-26 08:17 am UTC (link)
Eigen is a already very good library. Expression templates will make it just wonderful !

(Reply to this)(Thread)

Re: Greate work !
[info]bjacob
2007-08-26 04:25 pm UTC (link)
Thanks for the kind words!

(Reply to this)(Parent)

I was sort of looking for a good matrix framework
(Anonymous)
2007-08-26 10:56 am UTC (link)
For doing linear programming and integer programming solvers. The only good solver out there right now is unfortunately licensed under CPL, which makes it less useful than it should have been.

(Reply to this)(Thread)

Re: I was sort of looking for a good matrix framework
[info]bjacob
2007-08-26 04:28 pm UTC (link)
I'll expose in the Eigen2 API the various solver algorithms implemented in GMM++. I hope they'll satisfy your needs. I don't think this includes an integer solver, though. For that, you might either use a floating-point solver combined with an algorithm to approximate floating-point numbers by rational numbers, or try an Arithmetics package such as PARI.

(Reply to this)(Parent)


(Anonymous)
2007-08-26 12:21 pm UTC (link)
This sounds great to me. How fast do you expect it to perform?

(Reply to this)(Thread)


[info]bjacob
2007-08-26 04:17 pm UTC (link)
As fast as possible :) Seriously, there is no inherent limitation to how fast it can be. Expression templates are a huge optimization in themselves, as they allow to get all the syntactic sugar at zero cost. The question is then, will we get low-level optimizations, like using SSE assembly instructions. Someone has already volunteered on the Eigen list to do that, so I'm optimistic that we'll eventually get that as well.

(Reply to this)(Parent)

Interesting
(Anonymous)
2007-08-26 12:50 pm UTC (link)
Very nice work all right! C/C++ out of the box are a major pain to try and writeup any fast math work in and have the code (as apposed to only the comments) readable afterwards.

I tried using Blitz++ a few years ago (even submitted back a few patches). However, I ultimately just dropped it in favour of hacking things in with GSL. The code base just didn't leave me with a good feeling. It left me with that kind of evolved, not designed (or multiple conflicting designs), halfway between re-factoring, and now in maintenance only mode sense.

After reading your post, however, I had a look at some of the tvmet document. Looks good so far. Will have to take a peak at the code now. Perhaps the time for expression template goodness has finally come.

This could be a real boon for KDE. : )

(Reply to this)(Thread)

Re: Interesting
[info]bjacob
2007-08-26 04:14 pm UTC (link)
Thanks for your comment. Maybe you're interested in seeing how I'm changing TVMET, it's in KDE's SVN (readable at websvn.kde.org) at /branches/work/eigen2.

(Reply to this)(Parent)

Speed++
(Anonymous)
2007-08-26 01:25 pm UTC (link)
You've probably bookmarked http://ubiety.uwaterloo.ca/~tveldhui/papers/techniques/

I used to love expression templates! But then I realized that this is just excessive 'syntax morphing'. Expression templates simply turn "A=B+C+D" into a loop of "A[i]=B[i]+C[i]+D[i];".

But let's say you have a piece of code "A=B+C;D=A+E;". Unless you have a really smart compiler, you'll end up an extra loop and fetching A[i] twice.

At the moment we humans are still smarter than machines, so it makes much more sense for us to learn how to talk to them in a language that the machine can understand.

How much easier would life be if the Indian mathematicians had not only invented nothing (= 0) but also invented counting with only 8 fingers and using 2 thumbs for carries/exponents. And postfix notation.

On a more technical note, I find the speed of all libraries for medium sized (i.e. in L1-cache, out of register) vectors & matrixes rather dissappointing. For out of cache problems you are limited by memory bandwidth and should use a blocking algorithm.

Contrary to std::vector vectors in numerical applications change dimension less frequently, so it would seem natural to write code that specializes the algebra for different vector sizes.

If you want it really fast (think of storing spare vector offsets in the Instruction cache...), you could use a runtime assembler like RUNTAS (www.arithmex.com). It uses expression templates, so that one can put things like

int offset=rand();
int randreg=rand()%15;

_ASM( _add r10b,$BYTE[rq[randomreg]+rsi*8+$DWORD(offset)] );

in C++ source code. Expect it to assemble x86-specific code at around 6 cycles/codebyte.

Vectors could/should be lightweight 'handles' to some ALGEBRA<LINEAR,float> Algebra(params), that takes care of memory management and runtime generated code.

(Reply to this)(Thread)

Re: Speed++
(Anonymous)
2007-08-26 02:58 pm UTC (link)
Not to disagree with what you are saying, but template expressions do not just turn 'A=B+C+D' into 'A[i]=B[i]+C[i]+D[i]'. They turn 'A[i]=B[i]+C[i]+D[i]' plus a 'for' statement that is longer and more verbose than this entire expression into 'A=B+C+D'.

The value of getting rid of all this extraneous code (where extraneous is extra statements beyond those required to just expressing the algorithm -- in this case the 'for' statements and all the associated indices) is in making your code more maintainable (easier to understand, easier to modify, etc.), and it is huge in the long run.

(Reply to this)(Parent)

Re: Speed++
[info]bjacob
2007-08-26 04:25 pm UTC (link)
Thanks for the pointers, and the information on low-level optimization. As for expression templates, I am convinced that they are a good idea. In your example where A[i] is fetched twice, I understand your point, but it still doesn't bother me too much. I understand that expression templates don't guarantee optimal speed, but they at least greatly enhance it while allowing for a very neat API. Neat APIs are something that the KDE community values extremely high. So I'm all for low-level optimizations as long as they fit in the expressions template scheme.

(Reply to this)(Parent)(Thread)

Re: Speed++
(Anonymous)
2007-08-27 05:02 am UTC (link)
How much syntax massaging is really necessary for a neat API?

I'm convinced that expression templates are not a good idea for numerical problems. They only make code look more like what people are used to writing by hand on paper. And the underlying library itself is practically impossible to debug, if you didn't write it yourself. It will also catch a bunch of compiler bugs, so it keeps breaking with every compiler change.

The point is that the entire array A has to be reloaded from memory, so you end up with suboptimal code. If you do the loops by hand, you'll immediately see the optimization...

Nothing stops you from putting the 'expression' into a comment before 'unreadable code'...


// A=B+C;D=A+E;
for( int i=0;i<dimension;++i ){A[i]=B[i]+C[i];D[i]=A[i]+E[i];}



Also, depending on the depth of recursion, you can end up with really, really long compile times. Compilers tend to keep instances of templates as linked lists and not as arrays, so that you end up with algorithms with time-complexity of O(n^2). Or worse.

So, for math stuff, my suggestion is to either use a different language, or get used to getting your hands dirty. You can use intrinsics to get at 'prefetchnta', 'movnta', and similar assembly instructions to minimize cache pollution.

(Reply to this)(Parent)(Thread)

Re: Speed++
[info]bjacob
2007-08-27 06:24 am UTC (link)
I understand that expression templates are difficult to debug, so I'll write good internal documentation and extensive unit-tests.

TVMET is already compatible with many compilers including GCC from version 2.95 onwards and MSVC from version 7 onwards. So I'm not too much concerned about portability.

As to bad complexity with long arithmetic expression, I do see your point, so I'll have to remind my users to not use too long expressions. However with reasonable expressions I've always noticed short compile times.

As to the point witht the array A, again, I do see your point. I agree that my code will be suboptimal. So you found an example where my code gives suboptimal results... so what. I never believed an abstract high-level solution like expression templates would consistently give optimal speed. I just believe they tend to greatly improve performance _in general_, over what one would have with classical overloaded operators. I have other factors to balance with performance, and going back to 'for' loops everytime they want to add vectors just isn't an option for many application programmers.

As for the necessity of expression templates in order to have a nice API, my point is that I need to offer at least a way for the user to perform arithmetic operations without the overhead of allocating temporaries on the stack. So, without expression templates, I'd have to offer C-style methods alongside the arithmetic operators. For instance, instead of having only Matrix::operator*(const Matrix &other), I'd also have Matrix::multiply(const Matrix &other, Matrix *result). That does clutter the API. An even more evident example is the row/column operations. With expression templates, all I need to have in the API is two methods: Matrix::row(int) and Matrix::col(int). Without expression templates, I need methods to return a row as a vector, to do the same without returning by value, to read a row from a vector, to multiply a row by a number, to add a row multiplied by a number into another row, etc..., and then again the same thing for columns. And at the end of the day, even with 20 methods my API still wouldn't let the user do as many things with rows and columns as it would with expression templates.

This API niceness means a more stable API, which is an important aspect of maintaining a library in the long term.

(Reply to this)(Parent)(Thread)

Re: Speed++
[info]bjacob
2007-08-27 06:26 am UTC (link)
erratum: the required gcc version for TVMET is more like GCC 3.0.

(Reply to this)(Parent)

Re: Speed++
(Anonymous)
2007-08-27 01:51 pm UTC (link)
Accessing rows & columns in matrices is usually done to implement some higher algorithm, which is covered by BLAS libraries.

We frequently look at determinants and other scalar measures, but really few people will be interested in cout << MySquareMatrix.col(12345) << endl; other than for debugging purposes. :-)

I actually suspect that that overloading operators for non-scalars is somewhat of an anti-pattern, precisely because it conceals the complexity of the operation...

For example, my homemade BLAS library had an .T() that returns the transposed vectortype, so Vec1.T()*Vec2 would calculate the inner product and Vec1*Vec2.T() would return the outer product. Nice.

Unfortunately, Vec1*Vec2.T()*Vec3 is A LOT slower than Vec1*(Vec2.T()*Vec3) even though both expressions are algebraicly equivalent.

So having a nice API just creates another way to write slow code, if you have no idea how the internals work. But hiding the internals is exactly why you want a nice API, right?

Where is the point in maintaining a library in the long term, when it is suboptimal? The better you do it, the harder it will be to get out of the local optimum...

I'm not saying that you should do a bad job, so people will have a stronger incentive to really improve things with a quantum leap, I'm just saying: don't overdo it.

When I got interested in expression templates for speed improvements, I learned so much about how processors do things and about how compilers do things, that my conclusion is that we programmers are stuck in a local optimum, when we rely on compilers.

Now, I aim to write all critical code segments using the RUNTAS library. My L-BFGS implementation used to use matrices for the circular buffer. Now, it's a simple pointer.

I love consistent API's, but I don't think that overloading operators is the way to go.

(Reply to this)(Parent)(Thread)

Re: Speed++
[info]bjacob
2007-08-27 03:22 pm UTC (link)
I've seen enough use cases for accessing rows to know that your argument is invalid. For instance, if you want to compute the antecedent of a vector by a unitary operator. I know I've been doing this in libavogadro.

The argument that overloading operators is bad because it hides complexity is just the old argument against C++, nothing new. I don't worry about it. The user is supposed to be smart, that's all.

This story about local optima doesn't make any sense: how do you define 'local' here?

(Reply to this)(Parent)(Thread)

Re: Speed++
(Anonymous)
2007-08-27 06:43 pm UTC (link)
Do you think C++ is an optimal tool to do the job? Or is C++ optimal in the sense of all the existing C & C++ code?

http://troy-at-kde.livejournal.com/5647.html

(Reply to this)(Parent)(Thread)

Re: Speed++
[info]bjacob
2007-08-28 08:20 am UTC (link)
I've never said that C++ was optimal in any sense of the word.

But since you ask, yes, I think C++ is the best-suitable language for the job in the present situation (being basically the only object-oriented language that brings no inherent overhead, as long as you stay away from certain keywords such as "virtual", "dynamic_cast" etc.) and it is also the main language of the KDE community.

(Reply to this)(Parent)

Re: Speed++
(Anonymous)
2007-08-28 02:12 am UTC (link)
Found a quote from Terry Pratchett:

"As they say in Discworld, we are trying to unravel the Mighty Infinite using a language which was designed to tell one another where the fresh fruit was."

(Reply to this)(Parent)

Re: Speed++
(Anonymous)
2007-08-27 04:21 pm UTC (link)
Not to knock RUNTAS, but I'm left with the funny feeling that your code isn't going to run too well on the Alpha, Itanium, and SPARC boxes we have here (this is going to be a KDE library afterall)...

(Reply to this)(Parent)(Thread)

Re: Speed++
(Anonymous)
2007-08-27 06:36 pm UTC (link)
Only true until someone gives me an Alpha, Itanium and Sparc box and gives me the equivalent of the Intel Developers Manual Vol.I-III for these machines. :-)

(Reply to this)(Parent)

Re: Speed++
(Anonymous)
2007-08-28 09:18 am UTC (link)

Urgh, I just had to comment on this:

Nothing stops you from putting the 'expression' into a comment before 'unreadable code'...
// A=B+C;D=A+E;
for( int i=0;i<dimension;++i ){A[i]=B[i]+C[i];D[i]=A[i]+E[i];}

You talk of anti-pattern? This is premature optimization right there. But far worse, that is a comment that does not tell *why* a certain piece of code is there, but *what* it does. Which means that a) the reader is still mystified as to why this code is here and b) when the code changes but the comment isn't updated, the comment is worse than nothing.

When making an API, making one where the result is immediately readable is a huge boon. If it turns out that the above optimization is really necessary, you can add it with a comment that says (this is really just simple addition of matrixes, but profiling revealed the naive approach was a performance bottlenect, see unit test blablabla.)

You my few eurocent, even though this silly nation hasn't joined :/

(Reply to this)(Parent)


Create an Account
Forgot your login?
Login w/ OpenID
English • Español • Deutsch • Русский…