After spending far more time than expected on getting microphone input and output to work correctly (fixing a bug in libsoundio along the way), plus a week at WWDC, I finally got to start implementing multiplayer.
Here is a video of what I have so far:
Useful Libraries
For networking in previous projects I used ENet, and because it always worked great, I am using it again :). ENet is a lightweight, multiplatform networking library on top of UDP that provides an easy-to-use interface to connect peers and send data around. It takes care of packet ordering and bandwidth management and even offers reliable packets when needed. It really just works.
As the data that is sent around needs to come from somewhere and should be as compact as possible while staying easy to read and write, I am also using protobuf. It consists of a data structure description language, a compiler that turns those descriptions into structures in a language of your choice, and a library to serialize and deserialize those structures to and from compact binary blobs as well as other formats.
General Architecture
As my end goal is a competitive multiplayer game, cheating will hopefully become a thing, and I will need ways to prevent it as much as possible. So the first choice I made is to use a server-authoritative client-server architecture. This means that there will always be one server instance that all clients connect to. This server instance verifies everything it gets from the clients and sends back corrected data if something is off.
It is quite common for multiplayer games to have the clients send their user input, as it doesn't change that often, but in my case most input is absolute data from head and hand tracking. Verifying that on the server side is more complicated, but I guess a first verification could be that the hands can only be a maximum distance from the head and that the head position can't change faster than a certain speed. Obstacles also need to be taken into account, and I am sure a lot more can be done. But to get going I currently just trust the clients to send good data and have the server pass it on to all clients. I will probably write another devlog about this topic once I get there.
My long-term goal is to provide dedicated servers, but as those are quite expensive, I am starting out with one of the players being the server (which obviously allows that player to cheat somewhat easily). I may also implement a mechanism for players to automatically reconnect to a new host if the previous one disconnects.
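The two checks mentioned above could look roughly like the following sketch. The limits, the function name and the tuple-based positions are all made up for illustration. Nothing like this exists in the game yet, and the real values would need tuning.

```python
import math

# Hypothetical limits; real values would need tuning against actual play data.
MAX_HAND_DISTANCE = 1.2   # metres between the head and either hand
MAX_HEAD_SPEED = 10.0     # metres per second


def is_plausible(prev_head, head, left_hand, right_hand, dt):
    """Return True if a tracked pose passes both sanity checks.

    Positions are (x, y, z) tuples in metres; dt is the time in seconds
    since prev_head was received.
    """
    # Hands cannot be further from the head than an arm can reach.
    for hand in (left_hand, right_hand):
        if math.dist(head, hand) > MAX_HAND_DISTANCE:
            return False
    # A non-positive time step makes the speed check meaningless, so reject it.
    if dt <= 0:
        return False
    # The head cannot move faster than the maximum speed.
    return math.dist(prev_head, head) / dt <= MAX_HEAD_SPEED
```

A server using this would probably clamp or discard a failing pose rather than disconnect the client right away, since tracking glitches can produce the same symptoms as cheating.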
Connection Handling
When a client wants to connect to the server, ENet takes care of notifying both sides and exchanging some initial data. After that the server automatically sends pings to its clients, which ENet needs internally. If a ping is not answered for a while, the server treats it as a disconnect. Of course a client can also disconnect gracefully: it sends a disconnect message to the server, and the server then tells the client that it is fine to close the connection. That only works if the client knows it wants to disconnect and has enough time for the exchange to happen. All of this is part of ENet and mostly works automatically.
If I were to only send reliable packets, the disconnect timeout would also work on the client if the server went down. With unreliable packets, which don't need an acknowledgement to be sent back, the client will never know whether the server is still there. To solve this I implemented my own timeout, which gets reset every time the client receives a packet. If the client doesn't receive anything for a while, it assumes that it has been disconnected.
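A minimal sketch of such a receive timeout, independent of ENet, might look like this. The class name, the method names and the injectable clock are my own assumptions for illustration, not the game's actual code.

```python
import time


class ReceiveTimeout:
    """Tracks how long it has been since the last packet arrived."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._clock = clock              # injectable so it can be faked in tests
        self._last_received = clock()

    def packet_received(self):
        # Call this for every incoming packet, reliable or not.
        self._last_received = self._clock()

    def timed_out(self):
        # True once nothing has arrived for longer than the timeout.
        return self._clock() - self._last_received > self.timeout_seconds
```

The client would call packet_received() from its receive handler and check timed_out() once per frame.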
To let other clients know about a player joining or disconnecting, the server broadcasts connect and disconnect messages to all clients. To be sure they actually arrive, these are sent as reliable packets. An important part of the connect messages is that they also include a unique client ID, assigned by the server and used to identify which client a message belongs to. ENet actually has its own concept of client IDs, but I didn't find much information on it, and in the end it seemed more flexible to just generate my own. I am currently generating them starting at 1 (0 is the server), counting up and reusing the ones of disconnected clients (not entirely sure why I do that, it just seemed nicer...).
While the clients take care of writing their own ID into messages, the server uses its own internal mapping (using the ENet peer's data pointer) and overwrites those IDs before sending the messages on to the clients, to prevent one client from posing as a different one.
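The ID scheme described above (start at 1, count up, reuse IDs of disconnected clients) can be sketched like this. Handing out the lowest free ID first is an assumption on my side, and the class and method names are hypothetical.

```python
import heapq


class ClientIdAllocator:
    """Hands out client ids starting at 1; 0 is reserved for the server."""

    def __init__(self):
        self._next_id = 1
        self._free_ids = []   # min-heap of ids released by disconnected clients

    def acquire(self):
        # Prefer the lowest released id before minting a new one.
        if self._free_ids:
            return heapq.heappop(self._free_ids)
        new_id = self._next_id
        self._next_id += 1
        return new_id

    def release(self, client_id):
        # Called when a client disconnects so its id can be reused.
        heapq.heappush(self._free_ids, client_id)
```

One thing to keep in mind with reuse is that a late packet addressed to an old ID could be attributed to the new owner, so never reusing IDs would be the safer variant.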
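The overwrite step can be illustrated with a small sketch. Here a plain dictionary stands in for the lookup through the ENet peer's data pointer, and messages are plain dictionaries instead of protobuf objects. Both are simplifications for illustration only.

```python
def stamp_sender_id(message, peer, peer_ids):
    """Return a copy of message whose id is the one the server assigned to peer.

    peer_ids maps a connection handle to its server-assigned id; whatever id
    the client wrote into the message is ignored.
    """
    stamped = dict(message)
    stamped["id"] = peer_ids[peer]
    return stamped


# Usage: peer "a" claims to be client 2, but the server knows better.
peer_ids = {"a": 1, "b": 2}
forged = {"id": 2, "head": (0.0, 1.7, 0.0)}
forwarded = stamp_sender_id(forged, "a", peer_ids)
```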
Data Structure
The data I send around is still very much WIP and will probably change a lot over time. The general idea right now is to have a super "message" (which is what protobuf calls its objects) that can contain one of three things: information about a client connecting or disconnecting, a player's current state (which currently consists of head and hand position and orientation), or speech data. More on speech in another devlog.
For this I use protobuf's "oneof" keyword and a couple of custom messages:
    message Packet
    {
        oneof content
        {
            Connection connection = 1;
            PlayerState playerState = 2;
            Speech speech = 3;
        }
    }
Those other types look like this:
    message Connection
    {
        enum State
        {
            CONNECTED = 0;
            DISCONNECTED = 1;
            REFUSED = 2;
        }
        uint32 id = 1;
        State state = 2;
        string message = 3;
    }

    message PlayerState
    {
        uint32 id = 1;
        Head head = 2;
        Hand leftHand = 3;
        Hand rightHand = 4;
    }

    message Speech
    {
        uint32 id = 1;
        bytes data = 2;
    }
The Hand and Head types both contain only a vector for position and a quaternion for orientation. I already use two different messages though, as Hand for example will most probably get some additional data for finger tracking in the future.
When to send Packets
The easiest approach would be to send a new position every frame, but frame rates can vary and, more importantly, at 90 fps many more packets are sent than usually needed. I solved this by implementing a timer and having clients send messages about 50 ms apart. I am thinking about dynamically adjusting this based on movement speed and distance to other players. As this is supposed to become a sword fighting game, precise hand movements are going to be somewhat important and 50 ms might not be good enough.
The time it takes for messages to make it to and from other clients is probably going to be a way bigger problem though.
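The fixed send interval described above can be sketched as a small accumulator that is fed the frame time. The class and method names are hypothetical, and carrying the remainder over to the next interval is my own choice rather than something the game necessarily does.

```python
class SendTimer:
    """Decides on which frames a state packet should go out."""

    def __init__(self, interval_seconds=0.05):
        self.interval_seconds = interval_seconds
        self._accumulated = 0.0

    def update(self, frame_delta_seconds):
        """Advance by one frame; return True when a packet is due."""
        self._accumulated += frame_delta_seconds
        if self._accumulated >= self.interval_seconds:
            # Keep the remainder so the average rate stays close to the target
            # even when the frame time does not divide the interval evenly.
            self._accumulated -= self.interval_seconds
            return True
        return False
```

At 90 fps this sends on roughly every fourth or fifth frame, which works out to about 20 packets per second instead of 90.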
Synchronizing Players
I am currently going for some standard techniques (as for example described here) which I can build upon when needed. This means that the player is fully simulated on the client for smooth movement and only corrected if the server sends back something different. Due to the lag between sending a message to the server and getting the corrected response, I'll have to include a packet identifier and only correct if the player position at the time the packet was sent was wrong, because the current position will most likely always differ. I don't have this yet though.
I started out by only moving other players when their message was received, but the result is not exactly smooth. I improved this by storing their previous position and orientation along with the new one and then interpolating between those over time until a new message is received. It still doesn't look great sometimes, but it is already a big improvement.
The main problem is that the other player I see in my game is way behind the real player. I might need some more advanced prediction for this in the future, but predictions might end up being wrong, resulting in much worse problems.
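Since this part isn't implemented yet, the following is only a sketch of how the packet identifier idea could work: remember the predicted position for each sent packet and compare the server's answer against that old prediction instead of the current position. All names, the tolerance value and the choice to return an offset are assumptions.

```python
import math


class PredictionHistory:
    """Remembers the locally predicted position for each packet sent."""

    def __init__(self, tolerance=0.01):
        self.tolerance = tolerance   # metres of allowed disagreement
        self._sent = {}              # sequence number -> predicted position
        self._next_sequence = 0

    def record(self, position):
        """Store the position sent with the next packet; return its sequence number."""
        sequence = self._next_sequence
        self._next_sequence += 1
        self._sent[sequence] = position
        return sequence

    def correction_for(self, sequence, server_position):
        """Return the offset to apply to the current position, or None."""
        predicted = self._sent.get(sequence)
        # Drop this entry and all older ones; they will not be referenced again.
        self._sent = {s: p for s, p in self._sent.items() if s > sequence}
        if predicted is None:
            return None
        if math.dist(predicted, server_position) <= self.tolerance:
            return None
        return tuple(s - p for s, p in zip(server_position, predicted))
```

Applying the returned offset to the current position is the simplest option. A more thorough approach would be to reset to the server position and replay the inputs sent since then.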
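The interpolation between the previous and the newest pose can be sketched as follows. Positions are blended linearly and orientations with a normalized lerp, which is a cheap approximation of slerp that I picked for brevity and not necessarily what the game uses. The 50 ms default and all function names are assumptions.

```python
import math


def lerp(a, b, t):
    """Linear interpolation between two equally sized tuples."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def nlerp(q0, q1, t):
    """Normalized lerp between two unit quaternions given as (x, y, z, w)."""
    # q and -q describe the same rotation; flip one to take the shorter path.
    if sum(a * b for a, b in zip(q0, q1)) < 0.0:
        q1 = tuple(-c for c in q1)
    blended = lerp(q0, q1, t)
    length = math.sqrt(sum(c * c for c in blended))
    return tuple(c / length for c in blended)


def interpolate_pose(previous, latest, elapsed, interval=0.05):
    """Blend from the previous to the latest (position, orientation) pair.

    elapsed is the time in seconds since latest was received; once a full
    interval has passed the result stays at latest.
    """
    t = min(max(elapsed / interval, 0.0), 1.0)
    return lerp(previous[0], latest[0], t), nlerp(previous[1], latest[1], t)
```

Clamping t to 1 means a remote player simply stops at the last known pose when packets arrive late, which is one reason the result can still look jerky at times.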
Abstraction in the Game
I often read that it is very hard to turn a singleplayer game into multiplayer, and as it turns out this is absolutely true. Fortunately I am just getting started and have a chance to build some useful abstractions now without having to change everything.
An instance of the game can have either a client or a server object that handles incoming and outgoing packets. In both cases there can be a local player. On the server it passes its packets directly to the server object to broadcast; on a client it sends them to the server and, once that is implemented, will also handle corrected incoming data. Then there is also a player class, which is instantiated for every unknown client ID and updated with new data whenever it is received. I might just have this class do a full movement simulation with physics on all clients and have the server pass its result on as the corrected data, but maybe there will be a different player version on the server in the future.
Testing
Most of my testing so far is just several instances on the same PC, so right now lag is almost nonexistent. I did already try it on my local network too, but obviously that isn't a much more realistic test. But it's still too early to worry about lag and dropped packets and such anyway :).