UNIX Socket FAQ

A forum for questions and answers about network programming on Linux and all other Unix-like systems


#1 2013-01-02 08:42 PM

thinking
Member
Registered: 2005-09-15
Posts: 103

distributed local ipc

hi all,

I implemented a kind of demo library using SysV semaphores and shared memory to get a multi-writer/multi-reader circular buffer.
The nice thing about it is that it handles its participants in a very flexible way:
at any time a process can attach to the shm and its semaphores, use it (read/write), and detach.
The first one initializes everything, the last one destroys everything.

Why SysV?
I'm using the SEM_UNDO feature to know how many processes are using the shm at any given time.
This is needed to safely destroy everything at the end. I also added a read-counter to every written message, so when the read-counter of a message drops to 0,
the circular buffer gains free bytes that can be overwritten by new messages (I hope that was understandable?).
SEM_UNDO also helps in case of a process crash that isn't caused by my library itself (kill -9, a bug in the program using my library, ...).
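Roughly, the attach/detach counting looks like this (a minimal sketch, not the real library: the ftok() path is only an example and error handling is mostly left out):

Code:

#include <stdio.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

/* Count attached processes in semaphore 0; SEM_UNDO makes the kernel
 * undo our +1 automatically if we exit or crash. */
static int attach(key_t key)
{
    int semid = semget(key, 1, IPC_CREAT | 0600);
    struct sembuf op = { .sem_num = 0, .sem_op = 1, .sem_flg = SEM_UNDO };
    if (semid == -1 || semop(semid, &op, 1) == -1)
        return -1;
    return semid;
}

static void detach(int semid)
{
    struct sembuf op = { .sem_num = 0, .sem_op = -1, .sem_flg = SEM_UNDO };
    semop(semid, &op, 1);

    /* Last one out cleans up. (A real implementation has to close the
     * race between this check and a new process attaching, e.g. with a
     * second "lock" semaphore.) */
    if (semctl(semid, 0, GETVAL) == 0)
        semctl(semid, 0, IPC_RMID);
}

int main(void)
{
    key_t key = ftok("/tmp/demo-ipc", 'b');   /* file must exist; path is an example */
    int semid = attach(key);
    if (semid == -1) { perror("attach"); return 1; }
    /* ... attach the shm segment and use the circular buffer here ... */
    detach(semid);
    return 0;
}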

I haven't tested/stressed it much, but it seems to work.

About SysV semaphores I've read that
1) they are a kind of outdated semaphore
2) the SEM_UNDO feature may cause problems (see the BUGS section of man semop - but I don't think I have the kind of problem where the value drops below 0)

So I thought about alternatives and was very surprised that there don't seem to be any portable ones.

My questions:
1) Can anyone think of an alternative way to get a reliable and flexible local communication framework?
The easiest one I could think of is multicasting on the lo interface; this should also work on Windows, but it may be complicated to make reliable because of UDP and such.
2) Would it be possible to use lock-free/wait-free algorithms for such distributed local behaviour? I don't have experience with this, so I don't know whether I should look into it.
It's also very surprising to me that there aren't any lock-free libraries ready for use; everything I found seems to be highly theoretical or conceptual.
3) Any tips for getting this running on platforms without SysV support?
4) What really bugs me: what do other developers who use POSIX(-like) semaphores do when the application/library crashes? I.e., how do you know whether a semaphore that was left locked by a crash is no longer in use and could be destroyed by the next running instance? (See the sketch below.)
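For question 4, one mechanism I haven't tried but that looks relevant on Linux: a robust, process-shared mutex living in the shared memory segment. If the owner dies while holding it, the next pthread_mutex_lock() returns EOWNERDEAD instead of blocking forever. This is only a rough sketch, not part of my demo library; the shm name is made up, error handling is omitted, and it needs -pthread (and -lrt on older glibc):

Code:

#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <sys/mman.h>
#include <unistd.h>

struct shared_area {
    pthread_mutex_t lock;
    /* ... circular-buffer bookkeeping would go here ... */
};

int main(void)
{
    int fd = shm_open("/demo-robust", O_CREAT | O_RDWR, 0600);  /* name is made up */
    ftruncate(fd, sizeof(struct shared_area));
    struct shared_area *area = mmap(NULL, sizeof *area, PROT_READ | PROT_WRITE,
                                    MAP_SHARED, fd, 0);

    /* Only the first process should do this initialization: */
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    pthread_mutex_init(&area->lock, &attr);

    if (pthread_mutex_lock(&area->lock) == EOWNERDEAD) {
        /* Previous owner crashed while holding the lock: repair the
         * shared state, then mark the mutex usable again. */
        pthread_mutex_consistent(&area->lock);
    }
    /* ... critical section: touch the shared buffer ... */
    pthread_mutex_unlock(&area->lock);
    return 0;
}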

thx

Last edited by thinking (2013-01-02 08:45 PM)

Offline

#2 2013-01-03 03:45 PM

RobSeace
Administrator
From: Boston, MA
Registered: 2002-06-12
Posts: 3,847
Website

Re: distributed local ipc

Offline

#3 2013-01-04 12:25 AM

Nope
Administrator
From: Germany
Registered: 2004-01-24
Posts: 385
Website

Re: distributed local ipc

I'd also go with a client/server architecture. This special case sounds a lot like a good use for the Observer pattern.

I've done a lot with semaphores (POSIX), but mostly in threaded applications. I remember wanting to use system semaphores once to sync several processes, but I can't remember what I wanted to achieve, only that the POSIX implementation on Linux at that time didn't support them. Usually I can't count on the parts residing on the same machine anyway. So to ensure that it works whether the parts are on the same server or somewhere else in the network, a client/server approach is the only viable way.

Offline

#4 2013-01-04 03:28 AM

i3839
Oddministrator
From: Amsterdam
Registered: 2003-06-07
Posts: 2,239

Re: distributed local ipc

Offline

#5 2013-01-04 07:11 PM

Nope
Administrator
From: Germany
Registered: 2004-01-24
Posts: 385
Website

Re: distributed local ipc

I used inter-process futexes in the past. My dynamic pre-forking webserver used them to prevent the thundering herd problem of the old Linux socket accept. They used shared memory instead of the normal int. Basically the processes waited in a select() on the file descriptor of the futex. So for his program they might work.
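That setup presumably used the old FUTEX_FD operation to get a selectable descriptor; a plain futex wait/wake on a word in shared memory looks roughly like this (a sketch with made-up details; futex(2) has no glibc wrapper, so it goes through syscall()):

Code:

#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

static long futex(atomic_uint *uaddr, int op, unsigned val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

int main(void)
{
    /* The futex word must live in a MAP_SHARED mapping so that all
     * processes operate on the same memory. */
    atomic_uint *word = mmap(NULL, sizeof *word, PROT_READ | PROT_WRITE,
                             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    atomic_store(word, 0);

    if (fork() == 0) {                      /* child: the waiter */
        while (atomic_load(word) == 0)      /* loop: wake-ups can be spurious */
            futex(word, FUTEX_WAIT, 0);     /* sleep while the word is still 0 */
        printf("child woke up, word=%u\n", atomic_load(word));
        _exit(0);
    }

    sleep(1);                               /* parent: the waker */
    atomic_store(word, 1);                  /* publish the change ...  */
    futex(word, FUTEX_WAKE, 1);             /* ... and wake one waiter */
    wait(NULL);
    return 0;
}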

Offline

#6 2013-01-06 03:13 PM

thinking
Member
Registered: 2005-09-15
Posts: 103

Re: distributed local ipc

thx all

Currently I'm trying a server/client approach, but a bit different from what was suggested.
Server/client often means one server and many clients, which in my case means the server would be a centralized point of failure,
and it could get heavily loaded, because every message from one client may need to be distributed to all the other clients,
which would have to be done by that single point for every client.

My current idea is that every process hosts a server bound to a "random" port (=0); the actually bound port number is then announced via multicast to a group of peers.
So it works locally and across the network, producing a mesh of interconnected peers;
if one fails, everything else keeps working.

The first tests look promising, but I'll have to do a few more things before it's finally ready.
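The bind-to-port-0 and announce part looks roughly like this (a sketch; the multicast group and port are made-up values and error handling is left out):

Code:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    /* 1. TCP listener on an ephemeral port: port 0 means "kernel, pick one". */
    int lsock = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { .sin_family = AF_INET };   /* port 0, INADDR_ANY */
    bind(lsock, (struct sockaddr *)&addr, sizeof addr);
    listen(lsock, 16);

    socklen_t len = sizeof addr;
    getsockname(lsock, (struct sockaddr *)&addr, &len);    /* which port did we get? */
    uint16_t my_port = ntohs(addr.sin_port);

    /* 2. Announce the chosen port to the peer group via multicast. */
    int msock = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in group = { .sin_family = AF_INET,
                                 .sin_port = htons(5000) };   /* made-up group port */
    inet_pton(AF_INET, "239.255.0.1", &group.sin_addr);       /* made-up group addr */

    char msg[64];
    snprintf(msg, sizeof msg, "PEER %u", my_port);
    sendto(msock, msg, strlen(msg), 0, (struct sockaddr *)&group, sizeof group);

    /* ... accept() peer connections on lsock, and join the group on another
     *     socket to learn the ports announced by the other peers ... */
    close(msock);
    close(lsock);
    return 0;
}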

Last edited by thinking (2013-01-06 03:14 PM)

Offline

#7 2013-01-06 05:22 PM

RobSeace
Administrator
From: Boston, MA
Registered: 2002-06-12
Posts: 3,847
Website

Re: distributed local ipc

Offline

#8 2013-01-06 06:09 PM

Nope
Administrator
From: Germany
Registered: 2004-01-24
Posts: 385
Website

Re: distributed local ipc

Hmm, but in modern cluster-like systems every client does have the possibility to become master. An example would be elasticsearch, so perhaps you might want to look into that one. I think I saw a paper from kimchy (the developer) about how elasticsearch decides who's master.

Offline

#9 2013-01-06 07:09 PM

thinking
Member
Registered: 2005-09-15
Posts: 103

Re: distributed local ipc

@RobSeace
Wow, thx for your thoughts - this really helps me get a better view of the topic.

@Nope
Good tip.
I haven't found the paper yet, but I'll have a look at the source.

Offline

#10 2013-01-07 12:50 AM

i3839
Oddministrator
From: Amsterdam
Registered: 2003-06-07
Posts: 2,239

Re: distributed local ipc

If you can count on multicast mostly working, then I would go for a hybrid
approach: use multicast for everything, but have a few peer-to-peer links
per process, which are used to retransmit any packets that didn't originally
make it (and to handle any clients where multicast doesn't work for some
reason). As a last fallback, you can always multicast a retransmit request
and have some heuristic in place to limit the number of replies. Another
option, in case lots of small packets are being sent, is to always resend the
previous N packets for redundancy. All of this assumes there is a regular
stream of packets coming in; otherwise lost packets aren't detected soon
enough. This should scale very well, up until the individual clients can't
handle the total message stream any more.
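The "resend the previous N packets" variant could look like this on the wire (a sketch;
the layout, constants and field names are made up): every datagram carries its own
sequence number plus the last few payloads, so a receiver that missed one packet can
usually recover it from the next one without any retransmit round trip.

Code:

#include <stdint.h>
#include <string.h>

#define REDUNDANCY   3          /* how many older payloads to repeat (made up) */
#define MAX_PAYLOAD  256        /* assumed upper bound per message (made up) */

struct wire_msg {
    uint32_t first_seq;                          /* sequence number of slot 0 (newest) */
    uint16_t len[REDUNDANCY + 1];                /* payload length per slot */
    uint8_t  data[REDUNDANCY + 1][MAX_PAYLOAD];  /* newest message + previous N */
};

struct sender {
    uint32_t next_seq;
    uint16_t hist_len[REDUNDANCY + 1];
    uint8_t  hist[REDUNDANCY + 1][MAX_PAYLOAD];
};

/* Shift the history down, store the newest payload in slot 0 and build the
 * next datagram. Assumes len <= MAX_PAYLOAD. */
static void build_msg(struct sender *s, const void *payload, uint16_t len,
                      struct wire_msg *out)
{
    memmove(&s->hist[1], &s->hist[0], REDUNDANCY * MAX_PAYLOAD);
    memmove(&s->hist_len[1], &s->hist_len[0], REDUNDANCY * sizeof s->hist_len[0]);
    memcpy(s->hist[0], payload, len);
    s->hist_len[0] = len;

    out->first_seq = s->next_seq++;
    memcpy(out->len, s->hist_len, sizeof out->len);
    memcpy(out->data, s->hist, sizeof out->data);
    /* A receiver that tracks the highest sequence seen can pull a missed
     * message m out of slot (first_seq - m) of any later datagram, as long
     * as the gap is at most REDUNDANCY. */
}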

Another way of avoiding the port problem is to use one UDP socket per
process for all peers, but this has higher latency and uses more bandwidth.
The advantage of this over choosing one dynamic TCP server which everyone
uses until it goes down is that not all load is put on one server process and
it avoids one extra indirection. The downside is that you can't use TCP.

If you don't care too much about latency and can't use multicast, you can also
make a kind of tree or ring peer-to-peer structure (either UDP or TCP). But
then you have the problem of maintaining that structure in a robust way.

Offline

#11 2013-01-07 01:48 PM

RobSeace
Administrator
From: Boston, MA
Registered: 2002-06-12
Posts: 3,847
Website

Re: distributed local ipc

Offline

#12 2013-01-07 11:22 PM

i3839
Oddministrator
From: Amsterdam
Registered: 2003-06-07
Posts: 2,239

Re: distributed local ipc

I had mostly network load in mind, not CPU usage.

The days when CPU speed lagged behind bandwidth seem to be over. 10Gb Ethernet
should have been common for years by now. (I blame hard disks having been pegged
at around 100MB/s, which made anything faster than 1Gb/s pretty useless for
widespread use.)

There is always a problem when the reader(s) can't keep up with the sender(s).
Either you drop messages, or one slow reader can slow down everything and
cause a lot of memory usage for buffering messages that haven't been received
yet (also bad for latency). So I'd argue that dropping packets isn't always a bad
thing, though in this case it's probably better to disconnect or kill a consistently
too slow client if it can't keep up with the traffic. If you want guaranteed delivery
and handling of all packets, then the whole system will run only as fast as the
slowest node. You face this choice no matter what kind of communication channel
you use; the same is true for shared memory.
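To make that policy concrete: a bounded per-client send queue is one way to
express it (a sketch with made-up limits and a hypothetical disconnect callback):

Code:

#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP      1024     /* max buffered messages per client (made up) */
#define DROPS_ALLOWED  100      /* give up after this many forced drops (made up) */

struct client_queue {
    const void *msg[QUEUE_CAP];     /* ring of pending messages */
    size_t head, tail, used;
    size_t drops;
};

/* Returns false if the client is consistently too slow and should be
 * disconnected instead of slowing everyone else down. */
static bool enqueue(struct client_queue *q, const void *m,
                    void (*disconnect)(struct client_queue *))
{
    if (q->used == QUEUE_CAP) {
        /* Queue full: drop the oldest message rather than blocking the sender. */
        q->head = (q->head + 1) % QUEUE_CAP;
        q->used--;
        if (++q->drops > DROPS_ALLOWED) {
            disconnect(q);
            return false;
        }
    }
    q->msg[q->tail] = m;
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->used++;
    return true;
}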

The best approach and the right solution depend on the fine details of what your
library is used for, though. You have to choose what guarantees, if any, you
are prepared to give, and how you want to handle overload conditions.

Offline
