go concurrency correctness
Go has great concurrency support and as with all concurrent code, it’s very easy to write code that’s doesn’t perform as expected or intended. This is a look at correctness in concurrent go, and some of the considerations.
correctness
How do we define correctness, the dictionary definition is:
“the quality or state of being free from error; accuracy.”
But as a whole, correctness is a deep and well read computer science topic. It includes a basis for formally verifying a program; the act of proving or disproving an algorithm, with respect to a formal specification.
Both correctness as a compsci topic and in our case depends on the spec, where the spec is usually grounded in mathematical techniques.
We can consider concurrent correctness to be a program that contains concurrent (but not necessarily parallel) code and produces a logically expected and deterministic output.
why is correctness hard?
Humans are logical, often serial thinkers. We often consider that because some code block appears after another on the page, it will be executed in that order. This is empirically true for code
that runs in serial, notwithstanding goto
, if-else
, defer
and other control structures.
This model of code execution falls down when you start to consider parallel code, which by definition is more than one code block running at the same time. But introduce concurrency, and you’ve got a potentially more confusing half way house. Concurrency is a property of the code, and parallelism is a property of the runtime for the code.
Writing concurrent code does not necessarily mean that it will run in parallel (do more than one thing at once). Concurrent code just says that it could do more than one thing at once. Imagine concurrent code that runs on a machine with a processor that has a single core, with a single tthread; processor scheduling and time slicing may give the impression of parallel execution, but really it’s serial. The fact that the code is concurrent only means that it is built to handle and exploit multithreaded environments, not that it has to be run in one.
golang’s fork-join model
Go uses a fork-join model to achieve concurrency, a single goroutine will fork with the go func(...)
statement to run a task concurrently, and then expects a “join” operation to happen to allow
serial execution to continue. Go has a number of ways of achieving a “join”, but all of them are initiated by the new goroutine. There’s currently no way of stopping an executing goroutine without it
blocking on a join point.
So what is a join point? A join point is a point in the code where the goroutine indicates that it can pause, and the scheduler can run some other part goroutines before coming back to this one. Join points are often used to communicate between goroutines, or stop concurrent execution.
Some of the most common join-point creating operations in golang are: mutex.Lock()
, <-chan
, chan<-
, waitGroup.Wait()
All of these operations will pause the execution of the current goroutine, this allows the go scheduler to kick in and decide if the execution of the current goroutine can continue, or if it’s better to execute some other routine before continuing here.
It’s also worth noting that a goroutine coming to the end of it’s execution e.g. the go func(...)
returning can be considered to be an implicit join point. As it too allows the end of the
concurrent task execution.
why do join points matter?
Join points are important because they are the basis for us to reason about the execution of a program. Without an explicit join point, concurrent programs are guaranteed to be non-determinstic. Consider the following example:
|
|
This example is trivially small, but has multiple things wrong with it. It’s output is non-deterministic, meaning we do not know (for sure) what the output will be. Some possibilities:
output 1:
|
|
output 2:
|
|
output 3:
|
|
In the example there’s no way to know when anonymous function started with go func()
will be executed. This means that we can get output 1
or output 2
where the anonymous function is executed
before or after the fmt.Println("sync")
call. output 3
happens because the sleep blocks the main goroutine allowing the go runtime to consider which routines to schedule, but it does not
enforce the execution of other routines. By not enforcing the scheduling and execution of the anonymous routine, it’s entirely possible that the main routine continues execution and exits after
the sleep before the anonymous function is executed, this created output 3
.
The sleep blocks the execution of the main thread, but does not create a join point. The sleep can expire / complete before the anonymous function gets a chance to execute. Adding a sleep in this
situation only increases the chances that other goroutines will get a chance to execute, but does not enforce this. The lack of enforcement means that we cannot consider this example program to be
concurrently correct. Imagine the scenario where the system calls required to print async
take longer than one second. On a multicore machine the program could exit before completing the print
operation. This means there are more than just 3 possibilities, and it’s hard to enumerate them all.
Looking back at the criteria for correctness we see that it:
- ✔️ contains concurrent code
- ❌ produces a logically expected and deterministic output
TL;DR the sleep increases the chance of other goroutines being executed, but does not guarantee it. Without that guarantee, the program is not correct.
how can we make this program correct?
WITH JOIN POINTS!
Adding some synchronisation between the two goroutines ensures that one cannot be executed before the other. Let’s assume we want the output:
|
|
|
|
A closed channel always allows reading a value (the zero value). Once we’ve printed async
we close the channel, this creates the join point that synchronises the anonymous and main goroutines
allowing the main goroutine to continue, and we know that the output will be what we expect. The main goroutine cannot contine execution until it’s recieved the notification on the channel ch
that the anonymous routine has finished.
Without using a synchronisation method such as a channel or mutex we cannot be sure of the output of the program, and therefore consider it to be incorrect.
why does correctness matter?
Correctness is the only way to guarantee that your program is not hiding bugs that are only exposed by the scheduler. Often called “finiky” bugs, these are errors that occur when something in the codes environment changes; change of machine it’s running on, change of load the machine is under, change of runtime version etc. All of these external changes can cause the bug hidden by the programs lack of correctness to rear it’s head. And what appears to cause the bug is often not related to the actual bug; this makes the hard and time consuming to track down and fix.
Writing correct programs in the first place means you do not have this problem. There are some rules to assist this;
- Any goroutine that spawns others must include a method of cleaning up what it has spawned.
- Never use sleeps to wait for, schedule, or orchestrate logic between goroutines.
- Unless 2 routines directly communicate, there’s no guarantee of execution order.
- Use
sync.WaitGroup
,sync.Mutex
orchan
to communicate between routines. - Unless the main gouroutine blocks until all routines are finished there’s no guarantee the others will get the time to complete.