rss

Review time !

Category : C#, English, Google Summer of Code 2008, Mono, Programming

Now that Google Summer of Code is drawing to an end, it’s time to summarize what has been achieved during these two months. Before anything else I would to thank again everybody (and most noticeably the folks from Mono and Google) for this summer, it has been awesome :) !

Base API

First of all let’s see how we compare to Microsoft implementation with the output of corcompare. As you can see, a lot of the reported missings come from boilerplate attributes and such, so we are pretty much feature complete (except for PLinq).

In addition, I showed throughout this blog different examples coming from Microsoft CTP releases and running on our implementation. Each one showcased a different part of the available API with every time a good performance boost. In order we have parallelized ray tracer, various samples and PLinq stress test.

As remarked, PLinq lags a little behind and will requires a good deal of love to complete operators and get some internal reworking until it’s satisfactory.

Extensions

Even though most of the additions are still draft, we already have some interesting bits.

System.Threading namespace got two new classes : AtomicBoolean and InterlockedExtensions. AtomicBoolean is a simple wrapper that allows the same sort of operations provided by Interlocked.Exchange and Interlocked.CompareAndExchange but on a boolean type.

InterlockedExtensions only contains a work-in-progress MCAS instruction at the moment. MCAS (Multiple Compare-and-Swap) is a handy primitive that can be used in a lot of situation. It has a semantic close to the traditional Compare-and-swap instruction (Interlocked.CompareExchange on .NET) except that it works with multiple target. Think of it as a series of CAS executed atomically and failing if any of the CAS failed. A naive implementation could be :

public static bool MCas<t>(T[] locations, T[] expected, T[] newValues)
{
    for (int i = 0; i < locations.Length; i++)
        if (locations[i] != expected[i])
            return false;
    for (int i = 0; i < locations.Length; i++)
        locations[i] = newValues[i];

    return true;
}

It’s particularly useful with linked-list like data structures that maintain more than one reference to nodes.

In System.Threading.Collections we also have a first draft of a concurrent skip-list that can be used both as a standard list (like List) and as a hash map (like Dictionary). Skip list are very interesting structures that possess a lot of the advantages of balanced trees but with much less complexity. The current version uses fine-grained locking in some place but I plan to use the aforementioned MCAS primitive to remove them completely.

Finally we have the new System.Threading.Actors namespace which is also much like a playground to my experimentations for the moment. It will allows actors-based concurrency with .NET, a feature that is already present in concurrent languages like Erlang and Java-ish languages like Scala and Clojure. For the moment the model is close to a teared-down version of the one used in Scala but I have to look into Clojure’s model which seems lighter and doesn’t require pattern matching to be usable.

I still have more ideas that could be useful for concurrent programming like more lock-free data structures and a STM manager. STM, which stands for Software Transactional Memory, is a concept close to the transaction model you have in database for example. The principle is that, when you want to do a set of operations concurrently on shared data, you encapsulate them in a transaction where read/write are logged. If, in the middle of your operations, the data is changed by another thread you are able to ‘rollback’ the transaction and try again. If no thread got in the way the modification can be ‘committed’ to the shared data and thus be visible by other threads. In practice it should be possible to adopt a helping scheme where threads that are colliding with an existing transaction will help it finish before doing their own work.

In conclusion, concurrency is becoming a first-class citizens these days and just cannot be ignored anymore by programmers. Big actors like Microsoft are already pushing for that with ParallelFx and, probably, future evolution of C# itself (C#4 ?) so we mustn’t lag behind ;) .

More parallel benchmarking

1

Category : C#, English, Google Summer of Code 2008, Mono, Programming

This weekend I played with the Microsoft ParallelFx CTP’s samples, adapting and running them on current Mono ParallelFx implementation. I’m going to show you two of them and the results. I used the same computer than previously for running the samples.

Image Colorizer

The first one is an image colorizer. For the test, I used a .jpg file weighting 8,3 Mio and containing 13273085 pixels (4207×3155). The result of the processing is below :

For the same color region, the parallel-enabled processing was 1.77x faster than the synchronous one (parallel time : 4.3368363s, non-parallel time : 7.6746839s).

As can be seen on the two following cpu graphs, the parallel version take advantages of both core in my CPU :

(Parallel version running)


(Non-parallel version running)

Benchmark sample

The second sample is a benchmark program containing a bunch of test. Each test uses one of the available API in ParallelFx. For the run, I enabled the 3 following benchmarks :

  • Matrix multiplication : two random-filled matrix are multiplied together. Both are 500 by 500 matrix.
  • Tree sum : all nodes of a binary-tree are added together. I set tree depth to 24 and each node’s value was randomly chosen.
  • Searching : a random word is searched among all files present in a directory. I used a novel with each chapter split in a distinct .html file. The content of the directory was duplicated so that its total size was 12,6 Mio.

Each of these tests were run 7 times by the program and the time given back was the average of each run with the two extreme values being removed. Following are the results with, in parentheses, the ParallelFx API used by the test.

  • Matrix multiplication : 2,25x speedup (Parallel.For).
  • Tree sum : 1,58x speedup (Future).
  • Searching : 1,37x speedup (Parallel.ForEach).

Conclusion

So we can see that ParallelFx always provide a boost but that the overall improvement seems to depend on the component used and how it is used. It would be interesting to compare these results with the ones yielded by Microsoft implementation and by the Mono implementation but running on .NET. I will probably do that at the end of GSoC as a conclusion of the work put during the summer ;) .

In addition these samples showed some bugs. Indeed, I had to disable two tests in the benchmark sample because one of them returned different results between parallel/non-parallel version and the other was causing a stack overflow in Mono runtime (I still need to find out if it’s Mono fault or mine).

Report for GSoC first two weeks

Category : C#, English, Google Summer of Code 2008, Mono, Programming

Even though GSoC coding period started two weeks ago, I totally forgot to blog about it so here it is.

First week started very slowly as I had a bunch of school exams to take care of. During that time I set up a Windows VM to play with the library and wrote a first batch of stubs with some simple logic (getter and setter for instance).

This week was much more interesting because of two things : the Google Code Camp at my school and the ParallelFX’s team releasing a new CTP (beta) of the library. GCC was an event organized at my school by sam and Dave. The concept was simple : gather some geeks together in a big place for the night, with pizza and big amount of coffee/DarkDog, courtesy of Google, to hack on what we wanted. The association of these two events resulted in some productive things done.

At first I had planned to start the Scheduler part but with the new release (and thus the API additions) I preferred to stub/code the new parts like System.Threading.Collections which contains lock-free implementations of stack and queue, SpinWait/SpinLock which, as the name implies, provide locking facilities based on processor spinning and LazyInit/WriteOnce which, respectively, allows lazy evaluation of expression and readonly-like variable (which can be be set only one time and anytime). I pushed these changes this morning, together with a start of unit tests, in the Mono SoC repository.

The next three weeks will be at a slow pace as I have to prepare school stuff and my final exams in two weeks.