[LUGSB] Multithreading or Light-weight processes ?

Sun Oct 19 13:55:23 EDT 2008

Chris Wright wrote:
> 2008/10/19 Arjun G. Menon <arjungmenon at gmail.com>:
>   
>> Hi LUGSB,
>>
>> I guess most of you must have worked with threads at some point and
>> are familiar with the difficulty involved in writing complex
>> applications that use them. I think the best way to overcome this
>> "thread problem"
>> (http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf) is
>> by using light-weight processes.
>>     
>
>   
That paper again!  I just saw that somewhere a couple of weeks ago.  The 
author writes about shared memory like it killed his father.  The 
coordination languages he discusses are cool, but I think his arguments 
against shared memory are overreaching.

Also, you've confused light-weight processes with message passing.  The 
term "light-weight process" is actually very vague (I don't think 
there's a universally accepted definition), but it usually refers to the 
mechanism that provides concurrent execution contexts, not to how those 
contexts communicate.  A light-weight process is not tied to either 
message passing or shared memory (and it could use both).

The technical report you cite criticizes both concurrency models (shared 
memory and message passing), suggesting instead a model that is not 
quite message passing.  I didn't really read any of the articles the 
author cited, though.

> You really need very good support for IPC in that case. Erlang has
> language support for message passing, so that eliminates most of the
> pain associated with IPC.
>
> Plus if you're using userspace processes, your IPC will probably be
> shared memory, so you'll need some means of protecting that. Plus
> you're relying on cooperative multitasking, unless you have language
> support. Though any time a process passes a message, you can pause
> that process and run the scheduler, so you don't need to worry about
> that overmuch. And the scheduler would be able to tell you which
> processes are taking the largest time slices when there is contention,
> and you can pepper the relevant methods with yields.
>
>   
Message passing APIs do not need to rely on shared memory to work: they 
could communicate with pipes for example.  That said, many probably do 
use shared memory for its efficiency.  It's ok though, because then a 
team of experts can hammer out all the locking difficulties in that one 
subsystem and nobody else has to worry about it.
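
To make the pipe point concrete, here is a minimal sketch in Python (my 
choice of language for illustration; nothing in this thread depends on 
it).  It assumes a Unix system with fork(), and ask_child is a made-up 
name.  The two processes exchange one small message over a pair of pipes 
and never share a byte of memory.

```python
import os


def ask_child(message: bytes) -> bytes:
    """Send one small message to a forked child over a pipe; return its reply.

    Hypothetical helper for illustration only (Unix, small messages).
    """
    to_child_r, to_child_w = os.pipe()
    to_parent_r, to_parent_w = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: keep only the ends it needs, read until EOF, reply, exit.
        try:
            os.close(to_child_w)
            os.close(to_parent_r)
            request = b""
            while chunk := os.read(to_child_r, 4096):
                request += chunk
            os.write(to_parent_w, request.upper())
        finally:
            os._exit(0)

    # Parent: send the request, then close the write end so the child sees EOF.
    os.close(to_child_r)
    os.close(to_parent_w)
    os.write(to_child_w, message)
    os.close(to_child_w)

    reply = b""
    while chunk := os.read(to_parent_r, 4096):
        reply += chunk
    os.close(to_parent_r)
    os.waitpid(pid, 0)
    return reply
```

Calling ask_child(b"hello") hands the bytes to the child through the 
kernel and gets the upper-cased copy back the same way.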

Cooperative scheduling is a non-starter.  All the major non-embedded 
operating systems provide preemptively-scheduled processes and threads.  
Chris, you're right that cooperative scheduling can simplify 
synchronization, but that only works on single-processor systems.  On 
SMP, shared memory is a world of hurt no matter what you do.
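
Here is a rough sketch of why cooperative scheduling simplifies 
synchronization on one processor.  It uses Python generators as stand-ins 
for cooperatively scheduled tasks; run_round_robin and demo are 
hypothetical names, not anyone's real scheduler.  A task can only be 
switched out at a yield, so the read-modify-write on the shared counter 
needs no lock.

```python
from collections import deque


def run_round_robin(tasks):
    """Run generator-based tasks in turn until all of them have finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run the task up to its next yield
        except StopIteration:
            continue            # task finished; do not reschedule it
        ready.append(task)


def demo(n_tasks=3, steps=5):
    """Have several tasks bump one shared counter with no locking at all."""
    shared = {"count": 0}
    trace = []

    def worker(name):
        for _ in range(steps):
            current = shared["count"]
            # No other task can run between the read above and the write
            # below, because control is only given up at the yield.
            shared["count"] = current + 1
            trace.append(name)
            yield

    run_round_robin(worker(i) for i in range(n_tasks))
    return shared["count"], trace
```

The guarantee disappears as soon as two tasks really run at the same 
time on different CPUs, which is the SMP problem.
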
> I was wondering, just now, how you'd handle the issue of multiple
> processes attempting to work on shared data. But you can simply make
> the shared data structure into another process, and manipulating it
> will consist of passing messages to it.
>
>   
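That idea (a process that owns the data and is manipulated only by 
messages) can be sketched roughly as follows.  This is an illustration 
under loud assumptions: counter_owner and run_demo are made-up names, and 
Python threads with queue.Queue mailboxes stand in for light-weight 
processes just to keep the example short.  The counter itself is a local 
variable that only the owner ever touches.

```python
import queue
import threading


def counter_owner(inbox: queue.Queue) -> None:
    """Own a counter privately; change it only in response to messages."""
    count = 0  # private state, never visible to any other task
    while True:
        op, reply_to = inbox.get()
        if op == "stop":
            break
        if op == "incr":
            count += 1
        elif op == "get":
            reply_to.put(count)


def run_demo(workers: int = 4, increments: int = 1000) -> int:
    """Let several workers update the counter purely by sending messages."""
    inbox = queue.Queue()
    owner = threading.Thread(target=counter_owner, args=(inbox,))
    owner.start()

    def worker() -> None:
        for _ in range(increments):
            inbox.put(("incr", None))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Every "incr" is already queued ahead of this request (FIFO order).
    reply = queue.Queue()
    inbox.put(("get", reply))
    total = reply.get()

    inbox.put(("stop", None))
    owner.join()
    return total
```

The only locking left is inside the queue implementation, which is the 
"one subsystem" trade-off discussed above.
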
>> I personally feel using light-weight processes for intra-application
>> multitasking is far superior to having concurrently running
>> threads that share the same block of memory. Light-weight processes
>> are far more secure than threads in the sense they don't share memory
>> and thus avoid a whole host of problems associated with it. IMO, they
>> are also easier to work with (while programming); I find the
>> message-passing IPC model simpler and more manageable.
>>
>> Additionally, when it comes to parallel computing, light-weight
>> processes are a win there too. There's no need for
>> complex algorithms that manage shared memory between CPUs when each
>> CPU can be assigned one or more L.W.-processes and they all interact by
>> message passing. I think on a well-designed OS, L.W.-procs should be
>> as efficient as threads.
>>     
>
>   
A lot of people agree that message passing is the right way to do 
parallel programming.  Distributed programs in particular virtually all 
rely on message passing, since nodes in a cluster cannot easily share 
memory.  MPI implementations provide powerful message passing 
facilities, but I can't speak for how efficient they are at passing 
messages between processes on a single machine.  Any inefficiencies in 
MPI probably aren't because of design problems in the OS, though.

> Additionally, light weight processes don't share memory, so there are
> some compiler optimizations you can do. And, for instance, you could
> run a garbage collector on just the memory associated with one light
> weight process, speeding up collection times without sacrificing
> accuracy.
>
> Also, switching between light weight processes takes something like 32
> instructions on x86 processors, so that's fast. But again, any sort of
> multithreading in the scheduler / runtime would require a lot of
> locking. It's just that you don't have to handle it manually. Since
> there's probably opportunity to use, develop, and improve lock-free
> messaging in light weight processes, and that's only one relatively
> small body of code to alter, that's a very good tradeoff.
>
>   
The cost of switching between processes (or threads within a process) 
includes more than just the context switch.  Choosing which process and 
thread to execute next (scheduling) is a complicated enough task that it 
probably dominates the context switch itself.  The Linux kernel 
scheduler has gone through several rewrites since the 2.4 days with the 
goal of reducing the time spent scheduling while still doing a good job 
choosing processes.

>> Some applications like Google Chrome already use L.W.-procs (for each
>> tab the user opens, a separate process is launched). It surprises me
>> that a lot more people don't use it already, given its many
>> advantages. Which model of multitasking do you think is better?
>> (especially in terms of programmer efficiency)
>>     
>
> In Chrome's case, the processes aren't very light weight. Chrome uses
> more memory than Firefox, and that's not easy.
>   
Yes, Chrome does not use light-weight processes; it uses full-blown 
processes.  Don't mistake either of those for message passing, though!  
Separate processes can absolutely use shared memory with the mmap() 
facility on Unix or a similar API on Windows.  I haven't looked at the 
Chrome source though, so I don't have any idea how the processes 
communicate.
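
As a small illustration of separate processes sharing memory, here is a 
sketch using Python's mmap module on Unix.  It says nothing about how 
Chrome works; child_writes_shared_int is a hypothetical name, and the 
example assumes fork() is available.  An anonymous mapping created before 
the fork is shared, so a value written by the child is visible to the 
parent.

```python
import mmap
import os
import struct


def child_writes_shared_int(value: int) -> int:
    """Fork a child that stores `value` in shared memory; return what the parent sees."""
    # Anonymous mapping (fileno -1); on Unix the default flag is MAP_SHARED,
    # so the pages stay shared between parent and child after fork().
    shared = mmap.mmap(-1, mmap.PAGESIZE)

    pid = os.fork()
    if pid == 0:
        # Child: write into the shared pages and exit immediately.
        try:
            shared[:8] = struct.pack("q", value)
        finally:
            os._exit(0)

    # Parent: waiting for the child is the only synchronization used here.
    os.waitpid(pid, 0)
    (seen,) = struct.unpack("q", shared[:8])
    shared.close()
    return seen
```

Note that waitpid() is doing the coordination here.  With both processes 
running at once you would need a real synchronization primitive, which is 
exactly where shared memory gets painful.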

As for my own opinion: shared memory is a bear, make no mistake, but it 
isn't going away.  Also, keep in mind that, while message passing solves 
the worst problems with shared memory concurrency, it does have plenty 
of pitfalls.

--Justin