During the last year we've been developing a complex semi-realtime system in production. We decided to write it with Golang. We had little to no experience in Go, so as you might imagine it was not trivial.

Fast forward a year: the system is running in production and became one of the major pillars in ClimaCell's offering.

To be proficient means that you have enough experience to know what are the pitfalls of the platform you are using and how to avoid them.

I want to describe three pitfalls we've encountered in our quest with Golang, in hopes that it will help you to avoid them off the gate.

for-range mutability

Consider the following example:

package main

import (
	"fmt"
	"sync"
)

type A struct {
	id int
}

func main() {
	channel := make(chan A, 5)
	
	var wg sync.WaitGroup
	
	wg.Add(1)
	go func() {
		defer wg.Done()
		for a := range channel {
			wg.Add(1)
			go func() {
				defer wg.Done()
				fmt.Println(a.id)
			}()
		}

	}()
	
	for i := 0; i < 10; i++ {
		channel <- A{id:i}
	}
	close(channel)
	
	wg.Wait()
}

We have a channel holding struct instances. We iterate over the channel with  range operator. What do you think will be the output of this piece of code?

6
6
6
6
6
9
9
9
9
9

Weird isn't it? We expected to see the numbers 1-9 (not ordered of course).

What we actually see is the result of mutability of the loop variable:

In every iteration we get a struct instance to work with. Structs are value types - they are copied to the for iteration variable in each iteration. The key word here is copy. To avoid large memory print, instead of creating a new instance of the variable in each iteration, a single instance is created at the beginning of the loop, and in each iteration the data is copied on it.

Closures are the other part of the equation: closure in Go (like most languages), hold a reference of the objects in the closure (not copying the data), so the inner go routine takes a reference of the iterated object, meaning the all the go routines get the same reference to the same instance.

The solution

First of all be aware that this happens. Its not trivial since its a completely different behavior from other languages (for-each in C#, for-of in JS - in those the loop variable is immutable)

To avoid this pitfall, capture the variable within the scope of the loop, thus creating a new instance yourself, and then use it however you would like:

go func() {
	defer wg.Done()
	for a := range channel {
		wg.Add(1)
		go func(item A) {
			defer wg.Done()
			fmt.Println(item.id)
		}(a) // Capture happens here
	}
}()
	

Here, we use the function call of the inner go routine to capture a - effectively copying it. It can also be copied explicitly:

for a := range channel {
	wg.Add(1)
	item := a // Capture happens here
	go func() {
		defer wg.Done()
		fmt.Println(item.id)
	}()
}

Notes

  • For large data sets, note that capturing the loop variable will create large number of objects, each saved until the underlying go routine will be executed, so if the object contains several fields, consider capturing only required fields for the execution of the inner routine
  • for-range as an additional manifestation for arrays. It also creates an index loop variable. Note that the index loop variable is also mutable. I.e. to use it within a go routine, capture it that same way as you do with the value loop variable
  • On the current Go version (1.15), the initial code we saw will actually throw an error! Helping us avoid this issue and enforcing us to capture the data we need

Beware of :=

GoLang have two assignment operators, = and :=:

var num int
num = 3

name := "yossi"

:= is pretty useful, allowing to avoid variable declaration before assignment. Its actually a common practice in many typed languages today (like var in C#). Its very useful and keeps the code cleaner (my humble opinion).

But, as lovely as this is, when combined with a couple of other behaviors in GoLang ,scope and multiple return values, we can encounter unexpected behavior. Consider the following example:

package main

import (
	"fmt"
)

func main() {
	var data []string
	
	data, err := getData()
	if err != nil {
		panic("ERROR!")
	}
	
	for _, item := range data {
		fmt.Println(item)
	}
}

func getData() ([]string, error) {
	// Simulating getting the data from a datasource - lets say a DB.
	return []string{"there","are","no","strings","on","me"}, nil
}

In this example we read an array of strings from somewhere and print it:

there
are
no
strings
on
me

Please note the usage of := :

data, err := getData()

Note that even though data is already declared we can still use := since err is not. Nice shorthand which creates a much cleaner code.

Now lets modify the code a bit:

func main() {
	var data []string
	
	killswitch := os.Getenv("KILLSWITCH")
	
	if killswitch == "" {
		fmt.Println("kill switch is off")
		data, err := getData()
	
		if err != nil {
			panic("ERROR!")
		}
		
		fmt.Printf("Data was fetched! %d\n", len(data))
	} 	
	
	for _, item := range data {
		fmt.Println(item)
	}
}

What do you think will be the result for this piece of code?

kill switch is off
Data was fetched! 6

Weird, isn't it? Since the kill switch is off, we DO load the data - we even print the length of it. So why the code doesn't print it like before?

You guessed it - because of :=!

Scope in GoLang (like most modern laguages) is defined with {}. Here, this if creates a new scope:

if killswitch == "" {
	...		
} 	

Because we use :=, Go will treat both data and err as new variables! I.e. data within the if clause is actually a new variable, which is discarded when the scope closes.

We've encountered such behavior several times on initialization flows - which usually expose some sort of package variable, initialized exactly as described here, with a kill switch to allow us to disable certain behaviors on production. The  implementation above will cause an invalid state of the system.

The solution

Awareness - did I say it already? :)

On some cases the Go compiler will issue a warning or even an error, if the internal variable in the if clause is not used for example:

if killswitch == "" {
	fmt.Println("kill switch is off")
	data, err := getData()

	if err != nil {
		panic("ERROR!")
	}
} 

// Will issue an error :
data declared but not used

So be aware of warnings upon compilation.

Nevertheless, sometimes we do actually use the variable within the scope, so an error will not be issued.

Anyway, the best course of action is to try to avoid := shorthand - especially when it is related to multiple return values and error handling, and keep extra attention when deciding to use it:

func main() {
	var data []string
	var err error // Declaring err to make sure we can use = instead of :=
	
	killswitch := os.Getenv("KILLSWITCH")
	
	if killswitch == "" {
		fmt.Println("kill switch is off")
		data, err = getData()
	
		if err != nil {
			panic("ERROR!")
		}
		
		fmt.Printf("Data was fetched! %d\n", len(data))
	} 
	
	
	for _, item := range data {
		fmt.Println(item)
	}
}

Will result:

kill switch is off
Data was fetched! 6
there
are
no
strings
on
me

Remember, as the code evolves different people will modify it. Code that was not in a different scope before might be in the future. Keep your eye out when you modify existing code, especially when moving it to a different scope.

WorkerPool. Captain WorkerPool

Consider the following example:

package main

import (
	"fmt"
	"sync"
	"time"
)

type A struct {
	id int
}

func main() {
	start := time.Now()

	channel := make(chan A, 100)
	
	var wg sync.WaitGroup
	
	wg.Add(1)
	go func() {
		defer wg.Done()
		for a := range channel {
			process(a)
		}

	}()
	
	for i := 0; i < 100; i++ {
		channel <- A{id:i}
	}
	close(channel)
	
	wg.Wait()

	elapsed := time.Since(start)
	fmt.Printf("Took %s\n", elapsed)
}

func process(a A) {
	fmt.Printf("Start processing %v\n", a)
	time.Sleep(100 * time.Millisecond)
	fmt.Printf("Finish processing %v\n", a)
}

Same as before, we have a for-range loop on a channel. Lets say that the process function contains an algorithm we need to run and is not very fast. If we process, lets say 100000 items the code above will run for almost 3 hours (process runs 100ms in the example). So instead, lets do this:

package main

import (
	"fmt"
	"sync"
	"time"
)

type A struct {
	id int
}

func main() {
	start := time.Now()
	
	channel := make(chan A, 100)
	
	var wg sync.WaitGroup
	
	wg.Add(1)
	go func() {
		defer wg.Done()
		for a := range channel {
			wg.Add(1)
			go func(a A) {
				defer wg.Done()
				process(a)
			}(a)
		}

	}()
	
	for i := 0; i < 100; i++ {
		channel <- A{id:i}
	}
	close(channel)
	
	wg.Wait()
	
	elapsed := time.Since(start)
	fmt.Printf("Took %s\n", elapsed)
}

func process(a A) {
	fmt.Printf("Start processing %v\n", a)
	time.Sleep(100 * time.Millisecond)
	fmt.Printf("Finish processing %v\n", a)
}

Instead of processing the items in serial, we dispatch a Go routine for each item in the channel. We want to utilize Go's amazing concurrency handling to help us process the data faster:

Items Without Go routines With Go Routines
100 10s 100ms

Theoretically this will also work for 100K items, right?

Unfortunately, the answer is "it depends".

To understand why, we need to understand what happens when we dispatch a go routine. I won't go into it in depth since it is out of scope for this article. In short, the runtime creates an object which contains all the data relevant to the go routine and stores it. When execution of the go routine is done, it is evicted. The minimal size of a go routine object is 2K, but it can reach up to 1GB (on 64 bit machine).

By now, you probably know where we are going - the more go routines we create, the more objects we create, hence memory consumption is increasing. Furthermore, go routines needs execution time from the CPU to do the actual execution, so the less cores we have, more of these objects will remain in memory waiting for execution.

On low resource environments (Lambda functions, K8s pods with resitricted limits), both CPU and memory are limited, the code sample will create stress on the memory even at 100K go routines (Again, depending on how much memory is available for the instance). In our case, on Cloud function with 128MB of memory we were able to process ~100K items before crashing.

Note that the actual data we need from an application point of view is pretty small - in this case, a simple int. Most of the memory consumption is the go routine itself.

The solution

Worker pools!

A worker pool allows us to manage the number of go routines we have, keeping memory print low. Lets see the same example with worker pool:

package main

import (
	"fmt"
	"sync"
	"time"
)

type A struct {
	id int
}

func main() {
	start := time.Now()
	
	workerPoolSize := 100
	
	channel := make(chan A, 100)
	
	var wg sync.WaitGroup
	
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := 0;i < workerPoolSize;i++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				
				for a := range channel {
					process(a)
				}
			}()
		}

	}()
	
	// Feeding the channel
	for i := 0; i < 100000; i++ {
		channel <- A{id:i}
	}
	close(channel)
	
	wg.Wait()
	
	elapsed := time.Since(start)
	fmt.Printf("Took %s\n", elapsed)
}

func process(a A) {
	fmt.Printf("Start processing %v\n", a)
	time.Sleep(100 * time.Millisecond)
	fmt.Printf("Finish processing %v\n", a)
}

We limited the number of worker pool to 100 and for each one created a go routine:

go func() {
	defer wg.Done()
	for i := 0;i < workerPoolSize;i++ {
		wg.Add(1)
		go func() { // Go routine per worker
			defer wg.Done()
				
			for a := range channel {
				process(a)
			}
		}()
	}
}()

Think of the channel as a queue, where each worker go routine is a consumer of the queue. Go's channels allow mutliple go routines to listen to the same channel, where each item in the channel will be processed once.

The upside: we can now plan our environment as the memory print now is expected and can be measured:

the size of the worker pool * expected size of a single go routine (min 2K)

The downside: execution time will increase. When we limit memory usage we pay for it with increased execution time. Why? previously we dispatched a go routine per item to process - effectivly creating consumer per item. Virtually gives us infinite scale and high concurrency. In reality its not correct as the execution of the go routines depends on the availability of the cores running the application. What it does mean that we'll have to optimize the number of workers according to the platform we run on, but it makes sense to do so in high volume systems.

To summarize: Worker pools gives us more control on the execution of our code. They allow us predictability so we can plan and optimize both our code and the platform to scale to high throughput and high volume of data.

I would recommend to always use worker pools in cases where the application needs to iterate on sets of data - even small ones. Using worker pools we are now able to process millions of items on Cloud functions without even even close to the limits the platform enable, giving us enough breathing room to scale.

Notes

  • Number of workers should be configurable (env variable for example), to allow you to play with the number and reach the result you want on each platform you run on
  • Set the channel size  to at least the number of workers in the pool - this will allow the producer of data to fill the queue and will prevent the workers to wait idle while the data is generated. Make it configurable as well.

Conclusion

What makes us better professionals is the ability to learn from our mistakes. But learning from someone else's can be equally important.

If you reach this far - Thanks!

I hope what we've seen here will help you, dear reader, to avoid the mistakes we've made on our journey with GoLang.