UNIX Socket FAQ

A forum for questions and answers about network programming on Linux and all other Unix-like systems


#1 2013-10-14 08:27 PM

RipRage
Member
From: England
Registered: 2010-01-06
Posts: 146

High Performance Data Structures

Hi Guys

Just looking for some information on ways to create high-performance data structures (linked lists, queues, etc.). I am working on a multi-threaded server design using overlapped input/output and completion ports. I have come up with a data structure that will be allocated for each connected client; I thought the best way to handle this would be to create a pool of nodes, say 25 at a time, and allocate them accordingly. I will post some code so you get more of an idea of what I am trying to achieve. I am looking for ways of making it as efficient as possible, since data structures are one particular area of C I am really interested in and want to learn how to do correctly!

/*
 * The following structures are used with input output completion ports 
 */
#define IO_SEND_CHAP 1
#define IO_RECV_CHAP 2

#define PACKET_SIZE 256

/* Per packet data */
typedef struct _PACKET_DATA
{
    OVERLAPPED overlapped;
    WSABUF wsabuf;
    char packet[PACKET_SIZE];	
    int len;
    char optcode;

} PACKET_DATA;

/* Per client data */
typedef struct _CLIENT_DATA
{
    u_char alloc:1;	        /* Identifier for UnmapViewOfFile */
    SOCKADDR_STORAGE address;			
    u_char secured:1;		/* Authorized via CHAP */
    long id;
    PACKET_DATA packet_data, *packet;
    SOCKET socket;

    struct _CLIENT_DATA *next;

} CLIENT_DATA;

#define LIST_SIZE 25

static CLIENT_DATA *g_active_list, *g_free_list;
static CRITICAL_SECTION g_list_lock;

#define INIT_LOCK() (InitializeCriticalSection(&g_list_lock))
#define LOCK() (EnterCriticalSection(&g_list_lock))
#define UNLOCK() (LeaveCriticalSection(&g_list_lock))
CLIENT_DATA *alloc_client_data (void)
{
	CLIENT_DATA *cd_tmp;
	HANDLE map_handle;

	size_t tot_len;

	LOCK();

	if (g_free_list == NULL)	/* Allocate a new block of nodes */
	{
		tot_len = (LIST_SIZE * sizeof(CLIENT_DATA));

		/* Virtual vs Heap ? */
		map_handle = CreateFileMapping(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE, 0, tot_len, NULL);
		if (map_handle == NULL)
		{
			windows_error(_T("CreateFileMapping()"));
			UNLOCK();	/* don't return while still holding the lock */
			return NULL;
		}

		g_free_list = (CLIENT_DATA *) MapViewOfFile(map_handle, FILE_MAP_ALL_ACCESS, 0, 0, tot_len);
		if (g_free_list == NULL)
		{
			windows_error(_T("MapViewOfFile()"));
			CloseHandle(map_handle);
			UNLOCK();	/* don't return while still holding the lock */
			return NULL;
		}

		CloseHandle(map_handle);	/* Finished with this */

		g_free_list->alloc = 1;

		for (cd_tmp = g_free_list; cd_tmp < g_free_list + LIST_SIZE - 1; cd_tmp++)
		{
			cd_tmp->packet = &cd_tmp->packet_data;
			cd_tmp->next = cd_tmp + 1;
		}

		cd_tmp->packet = &cd_tmp->packet_data;	/* don't forget the last node */
		cd_tmp->next = NULL;
	}

	cd_tmp = g_free_list;			/* Allocate first node */
	g_free_list = cd_tmp->next;		/* Pop it off the list */

	UNLOCK();

	return cd_tmp;
}

Thanks guys!


#2 2013-10-15 12:45 PM

RobSeace
Administrator
From: Boston, MA
Registered: 2002-06-12
Posts: 3,839

Re: High Performance Data Structures

I'm not sure why you need to link the nodes together in a linked list, since it appears the only time you need to traverse them is for marking the new head of the free list, which could be done by just traversing it like an array instead...  Indeed, you already have to traverse it like an array after allocation to fill in the "next" pointers!  If you got rid of those, you'd have a slightly smaller struct, plus you could get rid of that loop filling in "next" for each node...  You'd just need to notice when you're at the end of the array of allocated nodes to set the free list pointer to NULL rather than running off the end of the array...  There are various ways to handle it: set a pointer marking the end of the array at allocation time; set a counter of free nodes in the array, which you decrement each time you hand one out; maintain an index into the array of the next free node, rather than change the head pointer at all; etc...  I'm not sure which way is best...
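
Something like this is the general idea (just a sketch; "g_pool", "g_pool_len" and "g_next_free" are names I'm making up here, and I'm using plain calloc() for the block purely for illustration; returning nodes to the pool and freeing old blocks isn't shown)...

static CLIENT_DATA *g_pool;	/* one contiguous block of nodes           */
static size_t g_pool_len;	/* how many nodes are in the current block */
static size_t g_next_free;	/* index of the next unused node           */

CLIENT_DATA *alloc_client_data (void)
{
	CLIENT_DATA *cd;

	LOCK();

	if (g_pool == NULL || g_next_free == g_pool_len)
	{
		/* out of nodes: grab a fresh block and hand it out from index 0
		   (nodes still in use by clients keep their old addresses) */
		g_pool = calloc(LIST_SIZE, sizeof(*g_pool));
		if (g_pool == NULL)
		{
			UNLOCK();
			return NULL;
		}

		g_pool_len = LIST_SIZE;
		g_next_free = 0;
	}

	cd = &g_pool[g_next_free++];	/* no "next" pointers to fill in or maintain */

	UNLOCK();

	return cd;
}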

I'm not sure why you need the seemingly redundant "packet" pointer rather than just accessing "packet_data" directly, either...  But, if you really did for some reason, you could set that pointer later when you return the new node, rather than need to do it up front at allocation time...

Also, your libc malloc()/calloc() should be about as well optimized as you can hope for, so trying to outsmart it by going straight to mmap() (which I assume is what's behind that CreateFileMapping() stuff) may end up being unwise...  At least on Linux, malloc() and friends will use mmap() when it's optimal to do so...  And, it's likely that it knows better than you when that really is...  (For one thing, you really want to be allocating even multiples of your native page size when using mmap()...)
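
For example, on Linux you'd round the request up to a page multiple first, something like this (sketch; on Windows, GetSystemInfo() gives you dwPageSize instead of sysconf()):

long   page    = sysconf(_SC_PAGESIZE);		/* native page size, needs <unistd.h> */
size_t tot_len = LIST_SIZE * sizeof(CLIENT_DATA);

/* round up to the next multiple of the page size (always a power of two) */
tot_len = (tot_len + page - 1) & ~((size_t) page - 1);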

Also, you may want to look into rearranging the fields in your structs to make them naturally aligned...  Otherwise, your compiler will likely insert useless padding to make field access more efficient, bloating up the size of the structs unnecessarily...  Basically, you want every field to fall on an address that's aligned to sizeof(field)...  I.e.: pointers should be on an address that's aligned to whatever your native pointer size is (64 or 32 bits), ints on a 32-bit aligned address, shorts on a 16-bit aligned address, and chars you can't really misalign (but you can still cause unnecessary padding if you put them in between other fields that do need to be aligned)...  The easiest approach is to just group all like fields together from the top down in order of their size...  E.g.: all pointers at the top, followed by longs, followed by ints, then shorts, then chars at the end...
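
For example, on a typical 64-bit target (just to illustrate the idea; exact sizes depend on your compiler):

/* Badly ordered: the char fields force padding before the pointer and the int */
struct bad {
    char  flag;     /* 1 byte + 7 bytes of padding */
    void *ptr;      /* 8 bytes                     */
    char  tag;      /* 1 byte + 3 bytes of padding */
    int   len;      /* 4 bytes                     */
};                  /* typically 24 bytes          */

/* Same fields, largest first: no padding needed between fields */
struct good {
    void *ptr;      /* 8 bytes                       */
    int   len;      /* 4 bytes                       */
    char  flag;     /* 1 byte                        */
    char  tag;      /* 1 byte + 2 bytes tail padding */
};                  /* typically 16 bytes            */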


#3 2013-10-17 02:56 PM

RipRage
Member
From: England
Registered: 2010-01-06
Posts: 146

Re: High Performance Data Structures

#define REALLOC(x,b) HeapReAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, (x), (b))
#define MALLOC(x) HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, (x))
#define FREE(x) HeapFree(GetProcessHeap(), 0, (x))

/*
 * The following structures are used with input output completion ports 
 */
#define OP_RECV 0
#define OP_SEND 1

#define PACKET_SIZE 256

typedef struct _PACKET_DATA
{
	OVERLAPPED overlapped;
	WSABUF wsabuf;
	int len;
	char packet[PACKET_SIZE];
	u_char optcode:1;

} PACKET_DATA;

typedef struct _CLIENT_DATA
{
	PACKET_DATA packet_data;
	SOCKET socket;
	char address[INET6_ADDRSTRLEN];
	u_char secured:1;
	u_char free:1;
	
} CLIENT_DATA;

#define CD_POOL_SIZE 25

static CLIENT_DATA *g_cd_pool;
static unsigned int g_cd_counter, g_cd_end;
static CRITICAL_SECTION g_cd_pool_lock;

#define INIT_LOCK() (InitializeCriticalSection(&g_cd_pool_lock))
#define LOCK() (EnterCriticalSection(&g_cd_pool_lock))
#define UNLOCK()	 (LeaveCriticalSection(&g_cd_pool_lock))

Last edited by RipRage (2013-10-20 11:02 PM)


#4 2013-10-17 08:07 PM

RobSeace
Administrator
From: Boston, MA
Registered: 2002-06-12
Posts: 3,839

Re: High Performance Data Structures

I'm not sure of the point of the "index" field...  It would seem redundant, since you can calculate the array index by just performing pointer arithmetic of the node pointer minus "g_cd_pool"...  And, I'm not sure I know why you need access to the index value anyway...  I don't understand what that strange index testing is doing there in free_client()...  How could it ever NOT match the index unless you've got a serious bug of some sort??

The "alloc" field now seems somewhat misnamed, and is now just marking the last node on the array, right?  If so, shouldn't you set the previous end node's value to FALSE when growing the array?

Also, I'm unfamiliar with those Heap*Alloc() functions, but do they guarantee fully nulled-out memory being returned, as you'd expect to get from calloc()?  If not, you'll have spurious values for "alloc" and "free" and all the fields after initial allocation, so you can't really trust them anyway...

Also, when grabbing a new free node, you increment the counter first and then use that as the index of the free node to return, when I think you really want to use the current value of the counter as the index of the node to return now, and then increment the counter AFTER you've used it to index into the array, so it'll be set for the next use...  As it is now, you're never using the very first node on the array, it looks like to me...
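
I.e. something along these lines (just a sketch, reusing your names):

	LOCK();

	/* hand out the node at the CURRENT index... */
	cd_tmp = &g_cd_pool[g_cd_counter];

	/* ...and only bump the counter afterwards, ready for the next call */
	g_cd_counter++;

	UNLOCK();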


#5 2013-10-18 06:42 PM

RipRage
Member
From: England
Registered: 2010-01-06
Posts: 146

Re: High Performance Data Structures


#6 2013-10-19 02:01 PM

RobSeace
Administrator
From: Boston, MA
Registered: 2002-06-12
Posts: 3,839

Re: High Performance Data Structures

cd_tmp = g_cd_pool + size - 1;
g_cd_end = cd_tmp - g_cd_pool;

That's just needless arithmetic...  You can replace it all with:

g_cd_end = size - 1;
cd_tmp = (CLIENT_DATA *) REALLOC (g_cd_pool, size * sizeof(CLIENT_DATA));
if (cd_tmp == NULL)
      windows_error(_T("REALLOC()")); FREE(g_cd_pool);

Are you missing curly braces to enclose that FREE() call in the error-handling block?  As it stands now, it's always getting called, which I don't think is what you want, if that REALLOC() behaves like normal libc realloc()...
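
Presumably you meant something more like:

cd_tmp = (CLIENT_DATA *) REALLOC (g_cd_pool, size * sizeof(CLIENT_DATA));
if (cd_tmp == NULL)
{
      windows_error(_T("REALLOC()"));
      FREE(g_cd_pool);
      /* ...and presumably bail out here, rather than falling through */
}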

cd_tmp = g_cd_pool + (*cd_active - g_cd_pool);
cd_tmp->free = TRUE;

I think "*cd_active" can just be directly accessed, without needing to recalculate its place in the pool...  So, I'd get rid of "cd_tmp" entirely and just do:

(*cd_active)->free = TRUE;


#7 2013-10-19 03:36 PM

i3839
Oddministrator
From: Amsterdam
Registered: 2003-06-07
Posts: 2,239

Re: High Performance Data Structures

The whole pooling thing just makes your code more complicated for zero gain.
Just call malloc/calloc() once for each new client. Malloc's implementation already
does the whole "pooling" thing internally, so you won't gain any speed, and it probably
has a more multithreading-friendly implementation than your one-big-lock approach.
Besides that, client connect/disconnect events aren't frequent enough for the malloc
to take any noticeable time, especially with all the other stuff going on with new
connections.
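
I.e. the whole allocation side can be as simple as this (sketch):

/* one plain, zeroed allocation per connecting client (needs <stdlib.h>) */
CLIENT_DATA *cd = calloc(1, sizeof *cd);
if (cd == NULL)
	return NULL;	/* or however you want to handle allocation failure */

/* ... use cd for the lifetime of the connection ... */

free(cd);		/* when the client disconnects */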

I also don't understand why you don't use standard C functions like malloc() instead of
MS Windows specific ones. If the API is better, sure, or if you're writing non-portable code
anyway, go ahead. But apparently they're so ugly that you have to write your own wrapper
macros around them; you might as well use calloc() then. Or is the reason you use them that
you've heard they are per-thread and hence faster for multithreaded code, and you're just
prematurely optimising? (While serialising everything under your own one big global lock...)

Now, about your data structures:

I wouldn't have such a one-to-one mapping between packets and user data; surely clients
are allowed to send or receive more than one packet?

Are you sure you need to keep the packet data around when using blocking sockets?
Because if a packet is handled immediately, you only need to allocate the data on the
stack and don't have to keep it hanging around.

And do you need to keep "overlapped" and "wsabuf" around or are they only used when
doing a socket call and useless the rest of the time?

So, about high performance data structures: the fastest are the ones allocated on the stack.
For high performance multithreaded data structures the most important rule is that
different CPUs (threads) should work on different cache lines, to avoid cache line bouncing.
It's okay to share (mostly) read-only data between CPUs/threads, but for writes you want to
minimize sharing as much as possible. Everything happens at cache line granularity,
usually 64 byte blocks, so adjacent fields influence each other (and so do the trailing fields
of one structure and the leading fields of the next, if the structures aren't aligned on 64 bytes).
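
For example, to keep per-thread write-heavy data on separate cache lines (sketch using C11 alignas; MSVC's __declspec(align(64)) does the same job):

#include <stdalign.h>

#define CACHE_LINE 64	/* typical cache line size; check your target */

/* Each worker thread only writes to its own slot.  Aligning the struct to a
   cache line means two threads never write into the same 64-byte block, so
   the line doesn't bounce between CPUs. */
typedef struct
{
	alignas(CACHE_LINE) unsigned long packets;
	unsigned long bytes;
} PER_THREAD_STATS;

static PER_THREAD_STATS g_stats[8];	/* one slot per worker; size to your thread count */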

In your example, you want to make sure that the frequent operations (probably sending,
receiving and handling packets) happen as fast as possible without taking any global locks.

So whatever data structure you come up with, that is what it needs to be optimised for, even
if it means that creating or deleting a client is slightly slower. Somewhere you have to manage
all your threads; that's also the place to manage your clients. If you have one client per thread
it's simple; if you handle multiple clients with one thread, then keep a list per thread.
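
E.g. something like this (sketch; it assumes CLIENT_DATA gets a "next" pointer again for the per-thread list):

/* Per-worker state: only the owning thread ever touches its own list,
   so the common add/remove/walk operations need no lock at all. */
typedef struct
{
	CLIENT_DATA *clients;	/* this worker's clients, linked via "next" */
	unsigned int nclients;
} WORKER_STATE;

static WORKER_STATE g_workers[8];	/* one entry per worker thread */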

But if clients need to send packets to a lot of other clients all the time, then perhaps using
multiple threads isn't the best solution.

All in all you have to take a step back and look at the whole picture to see what the best
solution is. (But always listen to what Rob says!)


#8 2013-10-20 02:05 PM

RobSeace
Administrator
From: Boston, MA
Registered: 2002-06-12
Posts: 3,839

Re: High Performance Data Structures

After seeing the multiple different presumably Windows-specific (what other sane system uses fucking CamelCase for standard library functions?!) *alloc() replacements used, I just assumed that Windows' libc malloc() and friends were so poorly implemented that no one actually used them, or something...  Which wouldn't surprise me in the slightest...  It would be just like MS to try to get people to use their "better" replacement functions with totally non-standard APIs, just to make it harder to port to any other system...  And, in that case, it may even make sense to do your own pooling of allocated data like that, if the replacement allocators you're using don't do such a thing for you already... *shrug*

But, yeah, on any sane system, such a scheme is not something I'd ever worry too much about, at least not right off the bat...  If it can be demonstrated later that such a thing would improve efficiency noticeably, then maybe add it in later...  But, there are probably much bigger targets to go after first than that...


#9 2013-10-20 10:58 PM

RipRage
Member
From: England
Registered: 2010-01-06
Posts: 146

Re: High Performance Data Structures

DWORD WINAPI server_worker (void *param)
{
  CLIENT_DATA *client_data;
  PACKET_DATA *packet_data;

  OVERLAPPED *overlapped;
  DWORD bytes, flag, ret;

  int rc;

  HANDLE iocp = (HANDLE) param;

  for (;;)
  {
	/* Block here until the system notifies us that an I/O operation has completed */
	ret = GetQueuedCompletionStatus(iocp, &bytes, (PULONG_PTR)&client_data, 
	  &overlapped, INFINITE);
    
	packet_data = (PACKET_DATA *) overlapped;

	/* If the client has hung up then close the socket and free the node */
	if (bytes == 0 && (packet_data->optcode == OP_SEND || packet_data->optcode == OP_RECV))
	{
	  close_socket(client_data->socket); free_client(&client_data);
	  continue;
	}

	switch (packet_data->optcode)
	{
	case OP_RECV:

	  /* We have some data, echo it back to client */
	  ZeroMemory(&(packet_data->overlapped), sizeof(OVERLAPPED));

	  packet_data->wsabuf.buf = packet_data->packet;
	  packet_data->wsabuf.len = bytes;

	  packet_data->optcode = OP_SEND;

	  rc = WSASend(client_data->socket, &(packet_data->wsabuf), 1, &bytes, 
		0, &(packet_data->overlapped), NULL);

	  if (rc == -1 && WSAGetLastError() != WSA_IO_PENDING)
	  {
		close_socket(client_data->socket);
		free_client(&client_data);
	  }

	  break;

	case OP_SEND:

	  /* Here we check whether the call to WSASend() sent all the data, post
	     another WSASend() if there's still data left in the buffer, and once
	     all data has been sent, post another WSARecv() which is handled above */
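
	  /* One way to do it (sketch): if the send came up short, push the
	     remaining bytes, otherwise rearm the receive for the next packet */
	  if (bytes < packet_data->wsabuf.len)
	  {
		packet_data->wsabuf.buf += bytes;
		packet_data->wsabuf.len -= bytes;

		ZeroMemory(&(packet_data->overlapped), sizeof(OVERLAPPED));

		rc = WSASend(client_data->socket, &(packet_data->wsabuf), 1, &bytes, 
		  0, &(packet_data->overlapped), NULL);
	  }
	  else
	  {
		ZeroMemory(&(packet_data->overlapped), sizeof(OVERLAPPED));

		packet_data->wsabuf.buf = packet_data->packet;
		packet_data->wsabuf.len = sizeof(packet_data->packet);
		packet_data->optcode = OP_RECV;

		flag = 0;
		rc = WSARecv(client_data->socket, &(packet_data->wsabuf), 1, &bytes, 
		  &flag, &(packet_data->overlapped), NULL);
	  }

	  if (rc == -1 && WSAGetLastError() != WSA_IO_PENDING)
	  {
		close_socket(client_data->socket);
		free_client(&client_data);
	  }
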
	  break;
	}
  }
}

void iocp_server(const TCHAR *node, const TCHAR *service)
{
  CLIENT_DATA *client_data;
  SYSTEM_INFO system_info;

  DWORD bytes, flag, i;
  HANDLE thread, iocp;

  SOCKET listener;
  int ret;

  INIT_LOCK();

  start_winsock();

  /* Create our IOCP */
  iocp = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 0);
  if (iocp == NULL)
	windows_error(_T("CreateIoCompletionPort()"));

  /* Create worker threads to service the overlapped I/O requests
  NOTE: The thread handles are closed right away, because we 
  will not need them and the worker threads will continue to execute */
  GetSystemInfo(&system_info);

  for (i = 0; i < system_info.dwNumberOfProcessors; i++)
  {
	thread = CreateThread(NULL, 0, server_worker, iocp, 0, NULL);
	if (thread == NULL)
	  windows_error(_T("CreateThread()"));

	CloseHandle(thread);
  }

  listener = tcp_listen(node, service, WSA_FLAG_OVERLAPPED, CD_POOL_SIZE);

  /* Main Accept loop */
  for (;;)
  {
	client_data = alloc_client();

	client_data->socket = WSAAccept(listener, NULL, NULL, NULL, 0);
	if (client_data->socket == -1)
	  winsock_error(_T("WSAAccept()"));

	/* Add our new socket handle to the existing input output completion port 
	along with its associated key data */
	if (!CreateIoCompletionPort((HANDLE)client_data->socket, iocp, (ULONG_PTR)client_data, 0))
	  windows_error(_T("CreateIoCompletionPort()"));

	/* Set up overlapped information and call WSARecv */
	client_data->packet_data.wsabuf.len = sizeof(client_data->packet_data.packet);
	client_data->packet_data.wsabuf.buf = client_data->packet_data.packet;

	client_data->packet_data.optcode = OP_RECV;

	flag = 0;
	ret = WSARecv(client_data->socket, &client_data->packet_data.wsabuf, 1, &bytes, 
	  &flag, &client_data->packet_data.overlapped, NULL);

	if (ret == -1 && WSAGetLastError() != WSA_IO_PENDING)
	{
	  close_socket(client_data->socket);
	  free_client(&client_data);
	}
  }
}

Last edited by RipRage (2013-10-20 11:12 PM)


#10 2013-10-21 11:46 AM

RobSeace
Administrator
From: Boston, MA
Registered: 2002-06-12
Posts: 3,839

Re: High Performance Data Structures

Ugh...  I'm very glad I don't have to write code to run on Windows, that's all I can say...  I'd go mad within a week...  If my shift key didn't bust first from overuse typing out all the fucking CamelCase function names!


#11 2013-10-21 12:10 PM

RipRage
Member
From: England
Registered: 2010-01-06
Posts: 146

Re: High Performance Data Structures

It's horrible, isn't it! The way we use Java at uni at the moment is even worse: we have to use a mix of lower and upper case, e.g. getStringLength(). I fucking hate it! But if we don't stick to the rules we lose marks :( which I think is unfair as long as the code is readable. I say let people discover their own style! It's all part of the fun!


#12 2013-10-21 04:17 PM

i3839
Oddministrator
From: Amsterdam
Registered: 2003-06-07
Posts: 2,239

Re: High Performance Data Structures


#13 2013-10-21 08:40 PM

RobSeace
Administrator
From: Boston, MA
Registered: 2002-06-12
Posts: 3,839

Re: High Performance Data Structures


#14 2013-10-22 07:10 PM

RipRage
Member
From: England
Registered: 2010-01-06
Posts: 146

Re: High Performance Data Structures

Hey guys!

I've decided I'm going to dual-boot my machine back to Linux over the weekend :) (much more enjoyable to practise C on; still debating which distro to install: Debian vs Ubuntu, or perhaps Red Hat, which I've never used before). I also just wanted to ask for some ideas of things to program? (It can be anything really, as long as I can actually complete it and have a use for it...) I've also just recently purchased a good data structures book which I will try to master!


#15 2013-10-22 08:16 PM

RobSeace
Administrator
From: Boston, MA
Registered: 2002-06-12
Posts: 3,839

Re: High Performance Data Structures


#16 2013-10-23 12:00 PM

RipRage
Member
From: England
Registered: 2010-01-06
Posts: 146

Re: High Performance Data Structures

Thanks for your suggestions Rob, definitely something I will look into! But for now I've got to figure out what's gone wrong with Fedora: I've lost access to the Windows partition and grub can't seem to detect it, so I must have overwritten the boot loader, or it's something to do with this new UEFI crap. Oh the joys lol :(

Last edited by RipRage (2013-10-23 12:02 PM)

