Current PLinq : mixed results
As described in the roadmap, PLinq implementation work has started a week ago. Some good bits are already there like what I consider to be the 3 most important operators (because most of the others can be expressed using these ones) : Select
, Where
and Aggregate
. Following is a discussion on the specificities of PLinq, difficulties they raise and, at the end, some examples/benchmarks.
A thing to be distinguished in PLinq contrary to standard Linq is the way data is processed. With Linq you can pretty much assume that data is processed in order. For example in Mono, Select
is implemented like this internally :
static IEnumerable CreateSelectIterator (IEnumerable source, Func selector)
{
foreach (var element in source)
yield return selector (element);
}
So if the IEnumerable
's iterator is built to return data in order (like most standard collections in .NET) this order will be kept. More importantly, some Linq's operators would lose their purpose if data wasn't kept in order. Operators such as ElementAt
, Skip
, First
are dependent on indexing and thus can't work reliably on scrambled data.
Now the problem is that PLinq doesn't follow this intuitive ordering property and that for a simple reason : performances. Indeed, just like Parallel.For
split the domain to parallelize work, PLinq's queries are run on workers which just take randomly data from the source enumerable and return them back without reordering them (or, put in another way, ordered by which took the less processing time). By doing things like this we can yield optimum performances. On the other hand, most applications rely somehow on data ordering, be it implicitly or with operators like OrderBy
, which mean we can't just ignore that aspect of Linq.
The current way PLinq allows this ordering part is to have two separate extension methods for turning IEnumerable
into IParallelEnumerable
. The first, AsParallel()
, allows the engine to process the data as it wants, so a developer can't expect in which order its initial data will be given back to him. The second one, AsOrdered()
, enforces strict order preservation at each step of the processing and, in fact, allows the class of operators I cited before to work as expected. However if ordering is explicitly turned on, either by AsOrdered()
or OrderBy
operator, the performance penalty can be important, killing the interest to parallelize queries in the first place (which is why it's turned off by default).
At the moment only the first mode (unordered processing) is supported in Mono's PLinq. Adding support for the other mode will be quite challenging with operators like Where
because they tend to create gaps in the source enumerable, gaps which are very hard to track in a non-deterministic environment like ParallelFx. I still have no clear idea on how to handle this in Mono's ParallelFx without too much locking (if anyone has pointers go ahead ;) ) .
Finally to close this post on a lighter touch, I have some examples which works on current implementation. First of all simple query like this one :
ParallelEnumerable.Range(1, 2000)
.Where(i => i % 10 == 0)
.Select(i => i + 1)[/sourcecode]
Yield around 1.5x performance improvement when run on my dual-core system. The actual precise number vary importantly on external things like how much data is queried, how long an operator takes to process one item, how selective are the predicates used ...
A second example which yield very little improvement at the moment (an example of why PLinq isn't a panacea) is this sort-of adaptive image blurrer. It takes a pixel and filter the others according to how much close their color are to the selected pixel's one. If a pixel doesn't comply to this predicate it gets blurred with a bloom-like effect :
(In the screenshot blue and white colors are supposed to be more blurred and thus fade in the background)
The parallelized version of this program is actually slower by a little amount than its synchronous counterpart on Linux but a little quicker on Windows for about the same run-time of the synchronous version on both platforms.