Archive for the Google Summer of Code 2008 Category

02
mar

Exploring other concurrency models

So, as you know ParallelFx helps programmers getting their software working nicely with multi core systems. To that end, it gives you several way to parallelize your code.

Nowadays in ParallelFx

The most simple and barebone approach is the Task API which is more or less a transcription of the standard .NET threading API (with some cool twist like continuations). Contrary to the ThreadPool or standard Thread, Tasks try to alleviate the work of the OS scheduler by using the ideal amount of workers. Thanks to that, thread context switches are avoided and it allows to get the more out of your processor cores in term of parallelism.

Closely building on that part, you have another nice abstraction called futures which let you think in term of computations rather than acting on results. Indeed, instead of directly manipulating results (and incidentally being bound by their computing time) you are encouraged to express how you would chain up result manipulation together and actually let the thing computes itself when it is really needed. That way, it allows everything to be done in background and be returned when ready.

Then, there are the Parallel class constructs that mimics classical imperative loops among which you find For, ForEach, While, etc. This way, imperative programmers feels at home and can get their loops parallelized easily (supposing there are no nasty side-effects and unprotected shared state manipulations of course).

Finally, PLinq allows query fans to get their Linq expressions running in parallel. Again, provided that there are the same conditions of thread-safety, this layer allows quick migration of existing code (basically some s/Enumerable/ParallelEnumerable/ and the addition of AsParallel() operator).

The Big Picture

The realization that starts to arise from the current work done on parallel system (especially those applied in imperative language) is that shared state is evil.

The problem with shared-state is that it inherently hurt down parallel performances. When using lock for instance, you prevent your code from scaling when the number of processor core increase as you waste CPU cycle to spin waiting for the locked resource to be freed (locked operations are also flawed in the sense that they can’t be composed together but that’s another subject).

Atomic primitives, like CAS, aren’t a panacea either as they perturb the caching mechanism of the CPU (forcing cores to flush their internal cache and go back fetching the new values) thus bringing poorer performances.

All in one, there is no miracle solution to this problem. It’s a consensus that we still have to maintain a certain amount of sharing mechanism cause doing a completely “pure” software with no side effects or mutation is either awkward or abnormal for the usual imperative programmer.

In fact there is an existing “solution” but it’s rather extreme : isolation. In a perfect world, all your parallel task would work on their stuff in a completely isolated fashion with walls around the task to prevent any kind of cross-thread access. That way no lock, no cache messing, etc can get in the way. Utopia he ?

Today in Mono.Threading.Extensions

Inheriting from the previous conclusions, there is an interest today in creating hybrid systems where isolation is the default while still allowing from time to time communication between each isolation domains.

At the moment, the two most popular in my sense are actors (popularized by Erlang) and software-transactional-memory.

The former rely on message passing between independent object to achieve concurrency while the latter works like a MCC database where each thread acts on its own copy of objects with the changes going back later on in the shared memory and in a transactional manner.

Brainstorming on these ideas, I implemented really simple version of the two mentioned systems, both relying on existing ParallelFx components.

Actors

My actor implementation uses the Task API and the existing work-stealing scheduler. To allow some sort of preemptive scheduling (and thus avoid deadlock when actors are waiting for message from another actor) some operation are abstracted through combinator methods like Loop, LoopWhile, AndThen, etc…

Following is an example of a process ring using 10 000 actors each passing around a message token to its neighbor seven times :

using System;
using System.Threading;
using Mono.Threading.Actors;

namespace ActorTest
{
  class Process
  {
	IActor actor;
	int round = 0;
	int id = sharedId++;

	const int MaxRound = 7;
	static int sharedId = 0;

	public Process(Func<Process> next)
	{
	  actor = Actor.Create (() => {
		  Combinators.Loop (() => {
			  if (actor.Receive ((m) => next().actor.Send(null, m.Message))) {
				round++;
				if (id % 10 == 0)
				  Console.WriteLine(id.ToString() + " : " + round.ToString ());
			  }
			  if (round > MaxRound) {
				Console.WriteLine("Finish");
				Environment.Exit (0);
			  }
			});
		});
	}

	public void Start()
	{
	  Console.WriteLine("Starting");
	  actor.Send(null, MessageArgs.Empty);
	}
  }

  public class MainClass
  {
	static Process mainProc;

	public static void Main()
	{
	  InitProcess(0);
	  mainProc.Start();
	  Thread.Sleep(50000);
	}

	static Process InitProcess(int count)
	{
	  if (count == 10000)
		return new Process(() => mainProc);
	  if (count > 0) {
		Process temp = InitProcess(count + 1);
		return new Process(() => temp);
	  }

	  Process tmp = InitProcess(count + 1);
	  mainProc = new Process(() => tmp);
	  return null;
	}
  }
}

While there are 10 000 actors here, only two threads are doing the actual processing on my dual-core system.

STM

The software-transactional-memory implementation manage copies of the shared objects that are passed to the transaction and later on uses MCas (multi compare-and-swap) to update the shared locations. MCas is itself based on an almost wait-free algorithm where threads try to cooperate rather than wait idly.

Again following is an example of transaction showing the traditional bank account problem with concurrent deposit/withdraw operations :

using System;
using System.Threading;
using System.Threading.Tasks;

using Mono.Threading.Transactions;

namespace StmTests
{
  public class StmTest
  {
	const int NbTimes = 100;

	class BankAccount : ICloneable
	{
	  int amount;

	  public BankAccount (int amount)
		{
		  this.amount = amount;
		}

	  public int Amount {
		get {
		  return amount;
		} set {
		  amount = value;
		}
	  }

	  public object Clone ()
	  {
		return new BankAccount(amount);
	  }
	}

	StmObject<BankAccount> amount =
	  new StmObject<BankAccount> (new BankAccount (100));

	public void Withdrawn (int number)
	{
	  Transaction.Create(OpeningMode.Write, amount, (acc) => {
		  acc.Amount -= number;
		}).Execute();
	  // Simulate activity
	  Thread.Sleep(0);
	  Console.WriteLine("Removed {0}€", number);
	}

	public void Deposit (int number)
	{
	  Transaction.Create(OpeningMode.Write, amount, (acc) => {
		  acc.Amount += number;
		}).Execute();
	  // Simulate activity
	  Thread.Sleep(0);
	  Console.WriteLine("Added {0}€", number);
	}

	public int Amount {
	  get {
		return amount.Value.Amount;
	  }
	}

	public static void Main ()
	{
	  StmTest test = new StmTest();

	  Console.WriteLine ("Base amount : " + test.Amount);

	  Task t1 = Task.StartNew(_ => Repeat ((o) => test.Withdrawn (5)));
	  Task t2 = Task.StartNew(_ => Repeat ((o) => test.Deposit (5)));

	  Task.WaitAll (t1, t2);

	  Console.WriteLine ("Final amount : " + test.Amount);
	}

	static void Repeat (Action<object> action)
	{
	  for (int i = 0; i < NbTimes; i++) {
		Task.StartNew (action);
	  }
	}
  }
}

The Future

The parallel team at Microsoft is already working some of the stuff I described above for inclusion in ParallelFx. The STM group now has its own blog and there are experimentations going on in concurrency-friendly DSL.

You may find yet more exciting stuff of what is brainstormed in this video of Joe Duffy and Erik Meijer.

Conclusion

Thoughts ? Comments ? Use cases sharing ? You can go wild in the comments ;-) .


free blog themes
14
jan

February previsions

Mono

Thanks to the change of school semester at my school, I’m going to have a break in February of almost 4 weeks during which we are supposed to participate to some activities chosen among a list.

The cool thing is that I managed to make « contributing to Mono » one of those activities. Thus, I will have 4 weeks to work on Mono and integrating SoC’s ParallelFx work into the mainline tree with the blessing of my school. Yay :-) !

FOSDEM

There are also a bunch of other nice things I will be able to do this February. Going to FOSDEM is one of those (I took the train tickets this afternoon) which mean I can now wear the pretty badge too :

I'm going to FOSDEM, the Free and Open Source Software Developers' European Meeting

Unless we have trains problems on the way, we should be there for the Fosdem’s Beer Event too, so if you want to discuss Mono, ParallelFx or whatever around an (excellent) Belgian beer be sure to come by ;-) .

Prologin

Finally, I was qualified to go into the semi-finals of Prologin, the French algorithmic contest for young people, this year again. It will be a good occasion to see old faces.

For the first time there are also proposing C# as a qualification language. Hopefully, it will mean no on-the-fly C++ learning for me this time :-) .


free blog themes
26
août

Review time !

Now that Google Summer of Code is drawing to an end, it’s time to summarize what has been achieved during these two months. Before anything else I would to thank again everybody (and most noticeably the folks from Mono and Google) for this summer, it has been awesome ) !

Base API

First of all let’s see how we compare to Microsoft implementation with the output of corcompare. As you can see, a lot of the reported missings come from boilerplate attributes and such, so we are pretty much feature complete (except for PLinq).

In addition, I showed throughout this blog different examples coming from Microsoft CTP releases and running on our implementation. Each one showcased a different part of the available API with every time a good performance boost. In order we have parallelized ray tracer, various samples and PLinq stress test.

As remarked, PLinq lags a little behind and will requires a good deal of love to complete operators and get some internal reworking until it’s satisfactory.

Extensions

Even though most of the additions are still draft, we already have some interesting bits.

System.Threading namespace got two new classes : AtomicBoolean and InterlockedExtensions. AtomicBoolean is a simple wrapper that allows the same sort of operations provided by Interlocked.Exchange and Interlocked.CompareAndExchange but on a boolean type.

InterlockedExtensions only contains a work-in-progress MCAS instruction at the moment. MCAS (Multiple Compare-and-Swap) is a handy primitive that can be used in a lot of situation. It has a semantic close to the traditional Compare-and-swap instruction (Interlocked.CompareExchange on .NET) except that it works with multiple target. Think of it as a series of CAS executed atomically and failing if any of the CAS failed. A naive implementation could be :

It’s particularly useful with linked-list like data structures that maintain more than one reference to nodes.

In System.Threading.Collections we also have a first draft of a concurrent skip-list that can be used both as a standard list (like List) and as a hash map (like Dictionary). Skip list are very interesting structures that possess a lot of the advantages of balanced trees but with much less complexity. The current version uses fine-grained locking in some place but I plan to use the aforementioned MCAS primitive to remove them completely.

Finally we have the new System.Threading.Actors namespace which is also much like a playground to my experimentations for the moment. It will allows actors-based concurrency with .NET, a feature that is already present in concurrent languages like Erlang and Java-ish languages like Scala and Clojure. For the moment the model is close to a teared-down version of the one used in Scala but I have to look into Clojure’s model which seems lighter and doesn’t require pattern matching to be usable.

I still have more ideas that could be useful for concurrent programming like more lock-free data structures and a STM manager. STM, which stands for Software Transactional Memory, is a concept close to the transaction model you have in database for example. The principle is that, when you want to do a set of operations concurrently on shared data, you encapsulate them in a transaction where read/write are logged. If, in the middle of your operations, the data is changed by another thread you are able to ‘rollback’ the transaction and try again. If no thread got in the way the modification can be ‘committed’ to the shared data and thus be visible by other threads. In practice it should be possible to adopt a helping scheme where threads that are colliding with an existing transaction will help it finish before doing their own work.

In conclusion, concurrency is becoming a first-class citizens these days and just cannot be ignored anymore by programmers. Big actors like Microsoft are already pushing for that with ParallelFx and, probably, future evolution of C# itself (C#4 ?) so we mustn’t lag behind ;) .


free blog themes
12
août

Call for ParallelFx test

Now that Summer of Code deadline is getting nearer, it’s a good time to give a try to ParallelFx. For that purpose I just issued updated test packages @ http://netherilshade.free.fr/mono/parallelfx/. Among them you will find stuff I already showed here like the raytracer, sudoku, or image colorizer.

Each of these package contains the most recent version of the System.Threading assembly (so no need to compile yourself trunk) and a README file explaining succinctly what the program do and how to launch it. In addition, feel free to test the library against other existing code making use of ParallelFx.

If you encounter any bug or bad performance, please send me an email at jeremie [DOT] laval [AT] gmail [DOT] com with your processor info (for Linux users the content of /proc/cpuinfo will do the trick), description of the problem and debug stack trace/reproducible test case (if any). Also, try to make the mail’s subject self-explanatory by adding a [Pfx] tag for example. I will triage them when I get back from holidays.

Happy testing :-) .


free blog themes
26
juil

Funny parallelism

For those who have some free leisure time this weekend and would like to explore a bit ParallelFx in a didactic manner I ported to Linux a Sudoku game coming from ParallelFx CTP samples which optionally uses ParallelFx to generate Sudoku grids.
For screenshot fans that’s how it looks like :
You can enable the use of [...]


free blog themes
25
juil

LINQ Raytracer but with a P twist

Following Marek’s annoucement about Mono’s compiler getting full C# 3.0 support and his proof-of-concept which none other than Luke Hoban LINQ-ified Ray tracer, I’m happy to say that by using Mono’s PLINQ and with as little modifications as this :
[sourcecode language='csharp']— linq-sequential.cs 2008-07-25 11:38:53.000000000 +0200
+++ plinq-parallel.cs 2008-07-25 11:39:04.000000000 +0200
@@ -1,9 +1,9 @@
-internal void RenderSequential(Scene scene)
+internal void RenderParallel(Scene [...]


free blog themes
22
juil

Current PLinq : mixed results

As described in the roadmap, PLinq implementation work has started a week ago. Some good bits are already there like what I consider to be the 3 most important operators (because most of the others can be expressed using these ones) : Select, Where and Aggregate. Following is a discussion on the specificities of PLinq, [...]


free blog themes
15
juil

Hello Monologue

After much RSS problems (and there are still some in fact), I’m now on Monologue. Thanks to Miguel and Michael for adding me :) .

Now for the usual presentation : my name is Jérémie Laval, I’m a french student participating in this year edition of Google Summer of Code for the Mono Project. The detail of my submission can be found in an older blog post.

To summarize, I’m doing an open-source implementation of ParallelFx, a framework designed to ease the development of parallel application on CLR platform. You can see the current results of my work here and here.

Apart from Mono and Summer of Code, I’m also working on two pet projects : D-Bus Explorer and Circ.

Finally, for more general informations, you can take a look at my « About » page.


free blog themes
15
juil

(Late) Zurich followup

So, as I said previously, I went to Google Zurich last Thursday. It was really fun, I indeed met a cool bunch of people and GSoC colleagues. It was really interesting to discuss with Petr (gda2tiles), Peter (Mercurial mentor), Arthur (aptitude-gtk) and the physicist guy whose name I forgot (so sorry :P ). By the way, thanks a ton to him for the laptop lend ;) .

My presentation went smoothly and people seemed interested (you can’t really tell for sure :) ). At least it sparkled some interesting questions. If you want the slides here they are : ParallelFx presentation. The other talks were both diverse and interesting too.

Of course, Google in itself was a blast. Unfortunately I had to catch the train so I wasn’t able to visit the whole building like the others but, even with the short glimpse I had, I’m pretty sure it must be fun to work at Google ;) . Still, I managed to retrieve a nice-looking Google tee-shirt. It’s rather funny to see that GSoC is the way to go if you want to freshen up your clothes stock ;) .


free blog themes
14
juil

More parallel benchmarking

This weekend I played with the Microsoft ParallelFx CTP’s samples, adapting and running them on current Mono ParallelFx implementation. I’m going to show you two of them and the results. I used the same computer than previously for running the samples.

Image Colorizer

The first one is an image colorizer. For the test, I used a .jpg file weighting 8,3 Mio and containing 13273085 pixels (4207×3155). The result of the processing is below :

For the same color region, the parallel-enabled processing was 1.77x faster than the synchronous one (parallel time : 4.3368363s, non-parallel time : 7.6746839s).

As can be seen on the two following cpu graphs, the parallel version take advantages of both core in my CPU :

(Parallel version running)


(Non-parallel version running)

Benchmark sample

The second sample is a benchmark program containing a bunch of test. Each test uses one of the available API in ParallelFx. For the run, I enabled the 3 following benchmarks :

  • Matrix multiplication : two random-filled matrix are multiplied together. Both are 500 by 500 matrix.
  • Tree sum : all nodes of a binary-tree are added together. I set tree depth to 24 and each node’s value was randomly chosen.
  • Searching : a random word is searched among all files present in a directory. I used a novel with each chapter split in a distinct .html file. The content of the directory was duplicated so that its total size was 12,6 Mio.

Each of these tests were run 7 times by the program and the time given back was the average of each run with the two extreme values being removed. Following are the results with, in parentheses, the ParallelFx API used by the test.

  • Matrix multiplication : 2,25x speedup (Parallel.For).
  • Tree sum : 1,58x speedup (Future).
  • Searching : 1,37x speedup (Parallel.ForEach).

Conclusion

So we can see that ParallelFx always provide a boost but that the overall improvement seems to depend on the component used and how it is used. It would be interesting to compare these results with the ones yielded by Microsoft implementation and by the Mono implementation but running on .NET. I will probably do that at the end of GSoC as a conclusion of the work put during the summer ;) .

In addition these samples showed some bugs. Indeed, I had to disable two tests in the benchmark sample because one of them returned different results between parallel/non-parallel version and the other was causing a stack overflow in Mono runtime (I still need to find out if it’s Mono fault or mine).


free blog themes