So, as you know, ParallelFx helps programmers get their software working nicely with multi-core systems. To that end, it gives you several ways to parallelize your code.

Nowadays in ParallelFx

The simplest and most bare-bones approach is the Task API, which is more or less a transcription of the standard .NET threading API (with some cool twists like continuations). Contrary to the ThreadPool or a plain Thread, Tasks try to alleviate the work of the OS scheduler by using an ideal number of workers. Thread context switches are thus avoided, which lets you get the most out of your processor cores in terms of parallelism.
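As a minimal sketch of the Task API with a continuation, here is what that looks like using the method names that shipped in the released TPL (Task.Factory.StartNew, ContinueWith); the early ParallelFx CTP described here used slightly different names:

```csharp
using System;
using System.Threading.Tasks;

class TaskDemo
{
	static void Main ()
	{
		// Start a task on the scheduler's worker pool
		Task<int> compute = Task.Factory.StartNew (() => 6 * 7);

		// Chain a continuation: it runs only once the first task has finished
		Task print = compute.ContinueWith (t =>
			Console.WriteLine ("Result: " + t.Result));

		print.Wait ();
	}
}
```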

Closely building on that, you have another nice abstraction called futures, which lets you think in terms of computations rather than results. Instead of directly manipulating results (and incidentally being bound by their computation time), you are encouraged to express how you would chain result manipulations together, and let the whole thing compute itself when it is really needed. That way, everything is done in the background and the value is returned when ready.
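In the released TPL a future is simply a Task&lt;T&gt; (the CTP had a dedicated Future&lt;T&gt; type); a rough sketch of chaining computations instead of acting on values directly:

```csharp
using System;
using System.Threading.Tasks;

class FutureDemo
{
	static int ExpensiveSum (int n)
	{
		int acc = 0;
		for (int i = 1; i <= n; i++)
			acc += i;
		return acc;
	}

	static void Main ()
	{
		// Describe the computation; it runs in the background
		Task<int> future = Task.Factory.StartNew (() => ExpensiveSum (1000));

		// Chain further manipulation of the not-yet-available result
		Task<int> doubled = future.ContinueWith (t => t.Result * 2);

		// Reading .Result blocks only if the value is not ready yet
		Console.WriteLine (doubled.Result);
	}
}
```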

Then there are the Parallel class constructs that mimic classical imperative loops, among which you find For, ForEach, While, etc. This way, imperative programmers feel at home and can get their loops parallelized easily (supposing there are no nasty side effects or unprotected shared-state manipulations, of course).
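A tiny sketch of such a loop, using the Parallel.For overload from the released library; each iteration writes only its own array slot, so there is no shared state to protect:

```csharp
using System;
using System.Threading.Tasks;

class ParallelLoopDemo
{
	static void Main ()
	{
		int[] squares = new int[10];

		// Iterations may run on different cores; each touches only squares[i]
		Parallel.For (0, squares.Length, i => squares[i] = i * i);

		Console.WriteLine (string.Join (",", squares));
	}
}
```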

Finally, PLinq allows query fans to get their Linq expressions running in parallel. Again, provided the same thread-safety conditions hold, this layer allows quick migration of existing code (basically some s/Enumerable/ParallelEnumerable/ and the addition of the AsParallel() operator).
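For instance, migrating a sequential query is usually just a matter of dropping in AsParallel(), as in this small sketch (the operator name is from the released PLINQ):

```csharp
using System;
using System.Linq;

class PlinqDemo
{
	static void Main ()
	{
		// AsParallel() routes the query through the ParallelEnumerable operators
		int total = Enumerable.Range (1, 100)
			.AsParallel ()
			.Where (n => n % 2 == 0)
			.Sum ();

		Console.WriteLine (total);
	}
}
```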

The Big Picture

The realization that is starting to arise from the current work on parallel systems (especially those applied to imperative languages) is that shared state is evil.

The problem with shared state is that it inherently hurts parallel performance. When using locks, for instance, you prevent your code from scaling as the number of processor cores increases, because you waste CPU cycles spin-waiting for the locked resource to be freed (lock-based operations are also flawed in the sense that they can't be composed together, but that's another subject).

Atomic primitives like CAS aren't a panacea either, as they perturb the CPU's caching mechanism (forcing cores to flush their internal caches and go back to fetch the new values), thus bringing poorer performance.
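To make the CAS pattern concrete, here is a lock-free counter sketch using .NET's Interlocked.CompareExchange: each thread retries until its swap wins, which is correct, but every successful CAS still forces the counter's cache line to bounce between cores:

```csharp
using System;
using System.Threading;

class CasDemo
{
	static int counter;

	// Lock-free increment: retry until our compare-and-swap succeeds
	static void Increment ()
	{
		int seen, desired;
		do {
			seen = counter;
			desired = seen + 1;
		} while (Interlocked.CompareExchange (ref counter, desired, seen) != seen);
	}

	static void Main ()
	{
		Thread[] threads = new Thread[4];
		for (int i = 0; i < threads.Length; i++) {
			threads[i] = new Thread (() => {
				for (int j = 0; j < 100000; j++)
					Increment ();
			});
			threads[i].Start ();
		}
		foreach (Thread t in threads)
			t.Join ();

		// Always 400000: correct without locks, but at a caching cost
		Console.WriteLine (counter);
	}
}
```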

All in all, there is no miracle solution to this problem. The consensus is that we still have to maintain a certain amount of sharing, because writing completely "pure" software with no side effects or mutation is either awkward or abnormal for the usual imperative programmer.

In fact there is an existing "solution", but it's rather extreme: isolation. In a perfect world, all your parallel tasks would work on their own stuff in a completely isolated fashion, with walls around each task to prevent any kind of cross-thread access. That way no locks, no cache messing, etc. can get in the way. Utopia, eh?

Today in Mono.Threading.Extensions

Building on the previous conclusions, there is an interest today in creating hybrid systems where isolation is the default while still allowing occasional communication between isolation domains.

At the moment, the two most popular, in my opinion, are actors (popularized by Erlang) and software transactional memory.

The former relies on message passing between independent objects to achieve concurrency, while the latter works like an MVCC database where each thread acts on its own copy of objects, with the changes going back later into shared memory in a transactional manner.

Brainstorming on these ideas, I implemented really simple versions of the two systems mentioned above, both relying on existing ParallelFx components.

Actors

My actor implementation uses the Task API and the existing work-stealing scheduler. To allow some sort of preemptive scheduling (and thus avoid deadlocks when actors are waiting for a message from another actor), some operations are abstracted through combinator methods like Loop, LoopWhile, AndThen, etc.

Following is an example of a process ring using 10,000 actors, each passing a message token around to its neighbor seven times:

using System;
using System.Threading;
using Mono.Threading.Actors;

namespace ActorTest
{
  class Process
  {
	IActor actor;
	int round = 0;
	int id = sharedId++;

	const int MaxRound = 7;
	static int sharedId = 0;

	public Process(Func<Process> next)
	{
	  actor = Actor.Create (() => {
		  Combinators.Loop (() => {
			  if (actor.Receive ((m) => next().actor.Send(null, m.Message))) {
				round++;
				if (id % 10 == 0)
				  Console.WriteLine(id.ToString() + " : " + round.ToString ());
			  }
			  if (round > MaxRound) {
				Console.WriteLine("Finish");
				Environment.Exit (0);
			  }
			});
		});
	}

	public void Start()
	{
	  Console.WriteLine("Starting");
	  actor.Send(null, MessageArgs.Empty);
	}
  }

  public class MainClass
  {
	static Process mainProc;

	public static void Main()
	{
	  InitProcess(0);
	  mainProc.Start();
	  Thread.Sleep(50000);
	}

	static Process InitProcess(int count)
	{
	  // The last actor in the ring points back to the first one
	  if (count == 10000)
		return new Process(() => mainProc);
	  // Intermediate actors point to the next process in the ring
	  if (count > 0) {
		Process temp = InitProcess(count + 1);
		return new Process(() => temp);
	  }

	  // count == 0: build the rest of the ring, then create its head
	  Process tmp = InitProcess(count + 1);
	  mainProc = new Process(() => tmp);
	  return null;
	}
  }
}

While there are 10,000 actors here, only two threads do the actual processing on my dual-core system.

STM

The software-transactional-memory implementation manages copies of the shared objects that are passed to the transaction, and later uses MCAS (multiple compare-and-swap) to update the shared locations. MCAS is itself based on an almost wait-free algorithm where threads try to cooperate rather than wait idly.

Again, following is an example of a transaction, showing the traditional bank account problem with concurrent deposit/withdraw operations:

using System;
using System.Threading;
using System.Threading.Tasks;

using Mono.Threading.Transactions;

namespace StmTests
{
  public class StmTest
  {
	const int NbTimes = 100;

	class BankAccount : ICloneable
	{
	  int amount;

	  public BankAccount (int amount)
		{
		  this.amount = amount;
		}

	  public int Amount {
		get {
		  return amount;
		} set {
		  amount = value;
		}
	  }

	  public object Clone ()
	  {
		return new BankAccount(amount);
	  }
	}

	StmObject<BankAccount> amount =
	  new StmObject<BankAccount> (new BankAccount (100));

	public void Withdrawn (int number)
	{
	  Transaction.Create(OpeningMode.Write, amount, (acc) => {
		  acc.Amount -= number;
		}).Execute();
	  // Simulate activity
	  Thread.Sleep(0);
	  Console.WriteLine("Removed {0}€", number);
	}

	public void Deposit (int number)
	{
	  Transaction.Create(OpeningMode.Write, amount, (acc) => {
		  acc.Amount += number;
		}).Execute();
	  // Simulate activity
	  Thread.Sleep(0);
	  Console.WriteLine("Added {0}€", number);
	}

	public int Amount {
	  get {
		return amount.Value.Amount;
	  }
	}

	public static void Main ()
	{
	  StmTest test = new StmTest();

	  Console.WriteLine ("Base amount : " + test.Amount);

	  Task t1 = Task.StartNew(_ => Repeat ((o) => test.Withdrawn (5)));
	  Task t2 = Task.StartNew(_ => Repeat ((o) => test.Deposit (5)));

	  Task.WaitAll (t1, t2);

	  Console.WriteLine ("Final amount : " + test.Amount);
	}

	static void Repeat (Action<object> action)
	{
	  // Keep track of the spawned tasks and wait for all of them,
	  // otherwise the final amount could be read before they finish
	  Task[] tasks = new Task[NbTimes];
	  for (int i = 0; i < NbTimes; i++) {
		tasks[i] = Task.StartNew (action);
	  }
	  Task.WaitAll (tasks);
	}
  }
}

The Future

The parallel team at Microsoft is already working on some of the stuff I described above for inclusion in ParallelFx. The STM group now has its own blog, and there are experiments going on with concurrency-friendly DSLs.

You may find yet more exciting brainstorming in this video of Joe Duffy and Erik Meijer.

Conclusion

Thoughts? Comments? Use cases to share? You can go wild in the comments ;-) .