Gotraceui—the manual
1. Introduction
The official implementation of the Go language depends on a runtime for memory management (via garbage collection) and for scheduling goroutines. The runtime does its work in the background, out of sight, but it provides a powerful tool for inspecting its actions: the runtime tracer.
The runtime tracer produces execution traces, which are exact (i.e., not sampled) streams of events describing in detail the execution of goroutines. They show when goroutines start or stop, where they block, what unblocks them, how long they spend stuck in syscalls, and more. Traces also contain information about the work of the garbage collector (GC), indicating its various phases, and how much CPU time goroutines have to contribute to assisting the GC. In short, the runtime tracer provides a complete view of the interactions between the runtime and our code.
These traces can be useful in two situations: when the runtime isn’t behaving as we expect, negatively affecting the performance of our code, and when we want to analyze how our goroutines interact with each other and the outside world.
Because the runtime is responsible for scheduling goroutines, it may make decisions that we don’t agree with. For example, it might take an unusually long time to schedule an important goroutine. This might become apparent due to unreliable performance and large tail latencies. When this happens, traces are the best way to make sure that the issue does indeed lie with Go and not our code.
More commonly, traces can be used to understand the behavior of our own code. Virtually all interactions between goroutines go through the runtime. Similarly, most interactions with the outside world (such as file or network I/O, or syscalls) go through the runtime, too. Execution traces, then, provide an accurate view into what our code is doing while it’s not spending time on the CPU executing instructions. In other words, they show us what our code is waiting on. This, too, can help diagnose performance issues (this time caused by us), deadlocks, and more.
Finally, the runtime/trace package allows us to annotate our code with regions, tasks, and logs. These help us group spans and provide additional information, making tracing useful for debugging CPU usage, too.
Unfortunately, the tracer hasn’t seen widespread adoption. This can be attributed to shortcomings in the official frontend, the trace viewer. The trace viewer isn’t a custom-made tool for viewing Go execution traces; instead, it is a repurposed copy of Chrome’s old tracing frontend, Catapult. It wasn’t made to handle the millions of events that can occur in even short traces, nor does it have any functionality that helps with understanding the specifics of the behavior of the Go runtime. Finally, the UI is just weird, using controls that are neither ergonomic nor intuitive. Go tries to work around these shortcomings by splitting large traces into smaller ones and by clever use of Catapult’s features. However, workarounds can only do so much and come with their own problems. For these reasons, the tracer is usually people’s last choice for debugging problems.
Gotraceui was created to address these problems. It was written from scratch, with a focus on displaying large Go execution traces and making them more accessible. It can handle millions of events in a single trace, overlay traces with CPU profiling data as well as memory usage, and annotate traces with metadata extracted from stack traces, among other things. Unlike the official frontend, Gotraceui provides both per-processor and per-goroutine timelines, with the latter often being much more useful for understanding the behavior of user code.
Even though the current version of Gotraceui is still in its early stages, lacking many of the envisioned features, it has already proven to be a useful tool in everyday use and people have diagnosed real problems with it without prior experience with Go’s traces.
The purpose of this manual is not just to explain every aspect of Gotraceui, but also to offer an introduction to execution traces themselves, and, where necessary, explain how the runtime works to make better sense of the traces it produces.
The authors hope that execution traces will become a standard debugging tool for most Go developers.
2. Install
2.1. NixOS, nixpkgs
Gotraceui can be installed from Nixpkgs.
The stable channel 23.11 contains version 0.3.0, while the unstable channel contains the latest released version (within reason, as it takes some time to update Nixpkgs after a new release of Gotraceui.)
2.2. Flatpak
An unofficial, third-party Flatpak exists at https://github.com/hdonnay/co.honnef.Gotraceui.
We’re not in control of it and you’d be installing it at your own risk.
The last time we looked at it (at commit ecf252fd56ce02071c7eb829de81726a0df98d51), it seemed safe.
2.3. Linux
Gotraceui hasn’t yet been packaged by most Linux distributions. You can build it yourself, as long as you have the following development dependencies installed:
- Wayland
- X11
- xkbcommon
- GLES
- EGL
- libXcursor
On Fedora, you can use the following command to install all required dependencies:
dnf install gcc pkg-config wayland-devel libX11-devel libxkbcommon-x11-devel mesa-libGLES-devel mesa-libEGL-devel libXcursor-devel vulkan-headers
On Ubuntu, you can use the following command to install all required dependencies:
apt install gcc pkg-config libwayland-dev libx11-dev libx11-xcb-dev libxkbcommon-x11-dev libgles2-mesa-dev libegl1-mesa-dev libffi-dev libxcursor-dev libvulkan-dev
After having installed all dependencies, you can run Gotraceui with
go run honnef.co/go/gotraceui/cmd/gotraceui@latest
2.4. macOS
The only dependency on macOS is Xcode.
After having installed Xcode, you can run Gotraceui with
go run honnef.co/go/gotraceui/cmd/gotraceui@latest
2.5. Windows
There are no external dependencies on Windows.
To run Gotraceui you can use
go run -ldflags="-H windowsgui" honnef.co/go/gotraceui/cmd/gotraceui@latest
3. System requirements
Gotraceui runs on Linux (X11 and Wayland), Windows, and macOS.
Execution traces are very dense in information and can contain millions of events in the span of seconds.
The format emitted by runtime/trace is optimized for small, low-overhead output and is highly compressed.
To be able to process and display a trace, Gotraceui has to parse and materialize it in memory.
Memory usage is roughly 30× the size of the input trace.
That is, a 300 MB trace file will need about 9 GB of memory to be loaded by Gotraceui.
For reference, an example 300 MB trace file was produced by tracing a busy Prometheus instance for one minute,
resulting in 66,044,021 events.
This represents an extreme example.
Many of your traces will be much smaller than that.
For example, tracing net/http’s tests produces only a 7.3 MB trace.
4. Adding tracing to your application
4.1. The runtime/trace package
The runtime/trace package provides the interface between user code and runtime tracing.
It allows recording traces, as well as adding additional information to them, in the form of user regions and tasks.
The package is well documented and we recommend that you read it for a complete overview.
For the purposes of this manual, we will quickly describe how to use runtime/trace to write trace files and how to add user regions.
To start tracing, use the Start function and pass it the desired destination, usually a file.
To stop tracing, use the Stop function.
It is crucial that Stop gets called, as otherwise an incomplete trace may get written, which may not be parseable at all.
For short-lived applications, Start is best called as early in your main function as possible, and Stop right before returning.
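The following is a minimal sketch of that pattern; the output file name trace.out is our own choice:

package main

import (
	"log"
	"os"
	"runtime/trace"
)

func main() {
	// Create the file the trace will be written to.
	f, err := os.Create("trace.out")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Start tracing as early as possible and make sure Stop runs before
	// main returns, so that a complete trace gets written.
	if err := trace.Start(f); err != nil {
		log.Fatal(err)
	}
	defer trace.Stop()

	// ... the rest of the program ...
}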
For long-lived applications, it is better to start and stop tracing on demand, for example via an API call of some sort, to capture an interesting time window.
One possible implementation is net/http/pprof, which is described in 4.2.
4.1.1. User annotations
It is possible to add your own information to traces by using user annotations, which encompass log messages, regions, and tasks.
Log messages show up as events in Gotraceui, consist of a category and a message, and can be emitted via Log or Logf.
Regions group events in a goroutine.
They can be used to, for example, denote distinct steps when handling an API request, such as querying the database, processing the results, and serializing them.
Regions can also be nested, for additional detail. They can be started with StartRegion and stopped with (*Region).End.
It is important that regions are stopped in a LIFO order to maintain valid nesting.
It is therefore recommended to use defer, like in the following example:
defer trace.StartRegion(ctx, "myTracedRegion").End()
There is also a helper function called WithRegion that wraps the execution of a function in a region.

Finally, tasks exist to group work that is happening across multiple goroutines.
For example, if an incoming API request causes multiple goroutines to do work on behalf of that request in parallel, a task will be able to tie all of them together.
Tasks are created similarly to regions, but with NewTask and (*Task).End, respectively.
Gotraceui does not currently display tasks.
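To see how the pieces fit together, here is a hedged sketch of a request handler that combines a task, regions, a log message, and WithRegion; handleRequest, queryDatabase, and process are hypothetical names:

package main

import (
	"context"
	"runtime/trace"
)

// queryDatabase and process are hypothetical helpers, used only for illustration.
func queryDatabase(ctx context.Context) []string { return nil }
func process(rows []string)                      {}

// handleRequest shows how tasks, regions, and log messages fit together.
func handleRequest(ctx context.Context, id int) {
	// A task groups work that may span multiple goroutines.
	ctx, task := trace.NewTask(ctx, "handleRequest")
	defer task.End()

	// Log messages show up as events in the trace.
	trace.Logf(ctx, "request", "handling request id=%d", id)

	// Regions group events within a single goroutine and may be nested.
	region := trace.StartRegion(ctx, "queryDatabase")
	rows := queryDatabase(ctx)
	region.End()

	// WithRegion wraps the execution of a function in a region.
	trace.WithRegion(ctx, "processResults", func() {
		process(rows)
	})
}

func main() {
	handleRequest(context.Background(), 1)
}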
4.2. net/http/pprof
net/http/pprof is a package that adds HTTP debugging endpoints to your application.
This is commonly used for acquiring CPU and memory profiles, but it can also be used to capture traces.
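If your application doesn’t already expose these endpoints, a minimal sketch looks like this; the localhost:6060 address is the conventional example and can be changed:

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof handlers on the default mux
)

func main() {
	// Serve the debug endpoints on a separate port.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... the rest of the application ...
	select {}
}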
To collect a 5-second trace you can run a command like
curl -o trace.out 'http://localhost:6060/debug/pprof/trace?seconds=5'
There is no single endpoint to capture a trace with CPU profiling enabled, but you could capture both a CPU profile and a trace in parallel, like this:
curl -o /dev/null 'http://localhost:6060/debug/pprof/profile?seconds=6' & curl -o trace.out 'http://localhost:6060/debug/pprof/trace?seconds=5'
We capture a slightly longer CPU profile to ensure it covers the entire duration of the trace.
4.3. Tracing tests
One way to acquire execution traces without having to modify your code is by using go test’s -trace flag, which writes an execution trace to a file.
This can be particularly useful in combination with running benchmarks.
You can additionally use the -cpuprofile flag to include CPU profiling samples in the trace.
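For example, the following command (the benchmark pattern and file names are placeholders) runs the benchmarks of the package in the current directory and writes a trace that also contains CPU samples:
go test -bench=. -trace=trace.out -cpuprofile=cpu.pprof .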
5. The user interface
The following sections will describe the various components of Gotraceui’s UI.
The Gotraceui UI consists of a main menu, a list of tabs, the main view displaying the current tab, and a side panel.
5.1. Timelines
When loading a trace, the initially selected tab (Timelines) displays the timelines view. This is an interactive view consisting of multiple elements, most prominently timelines representing the trace data.
5.1.1. Axis
The top of the timelines view shows the axis. The bold tick indicates the origin of the axis and displays the absolute time at that point in the trace. Ticks to the left and right of the origin show relative decrements and increments to this absolute time.
By default, the origin is placed at the center of the axis, as analyzing traces often involves looking at what happened before and after an event. The origin can be moved by clicking and dragging anywhere on the axis. Alternatively, the context menu of the axis allows quickly placing the origin at the beginning, middle, or end of the axis. Manually placed origins will stay in place when resizing the window or timelines view, while origins set via the context menu will stay at their relative position.
Additionally, the axis contains red and purple sections, which correspond to the garbage collector’s stop-the-world phase and general activity. Pressing O cycles through displaying the red section, both sections, or none of the sections across the entire view.
5.1.2. Memory plot
The axis is followed by the memory plot, which shows memory usage (more specifically the size of the heap) and the garbage collector goal, using green and purple respectively. Hovering anywhere on the plot will show a tooltip with the exact numeric values. The memory plot is separated from the timelines by a black border. This border can be dragged to resize the plot.
The plot’s context menu offers the following additional features:
- Hide/Show legends: hides or shows the labels for the minimum and maximum value.
- Hide/Show "Heap size" series: hides or shows the heap size.
- Hide/Show "Heap goal" series: hides or shows the heap goal.
- Set extents to global extrema: scales the plot so that the bottom represents the smallest measured value and the top represents the largest measured value, for the entire trace. Only the values of enabled series will be considered.
- Set extents to local extrema: works like the previous command, but only considers the currently visible portion of the trace.
- Auto-set extents to local extrema: automatically applies the previous command whenever the currently visible portion of the trace changes.
- Reset extents: resets the extents to their default, zero at the bottom and the global maximum at the top. It also disables auto-set extents.
5.1.3. Timelines, tracks, and spans
The main section of the timelines view consists of a number of horizontally stacked timelines. A timeline might show a processor, a goroutine, or phases of the garbage collector. Every timeline has a label, hovering over which may display a tooltip, and right-clicking which may open a context menu.
For example, for processors, the tooltip will show how much time was spent executing user code, doing garbage collection work, and being idle. Pressing Ctrl/⌘ + LMB on a label will zoom the view such that all spans in that timeline are visible. Pressing LMB on a goroutine label will open a panel with additional information about the goroutine (see 5.3 for more on panels.)
A timeline consists of one or more horizontally stacked tracks and each track consists of a series of spans. A span represents a state for some duration of time. For example, a goroutine may be blocked on a channel send operation for 100 ms, and this would be displayed as a single span. Tracks can visualize various things, such as the states of goroutines, call stacks, or user regions.
The space before the first and after the last span in a track is filled with whiskers, which are green and grey respectively. To differentiate goroutines that ended during the trace from goroutines that were still running by the end of the trace, the tracks of goroutines that have ended have a final, black span, indicating the end of the goroutine.
The view can be moved around by dragging with LMB, by using the scroll wheel, or by using the scrollbar. Holding Ctrl/⌘ while scrolling zooms in and out, centered around the cursor’s position. Holding Shift while scrolling swaps the axes. That is, scrolling vertically will scroll horizontally and vice versa. Dragging with Ctrl/⌘ + LMB selects a region of time to zoom to.
The Display menu contains commands for changing the way timelines are displayed, as well as commands for quick navigation.

Hovering over a span will show context-specific information about it, including its state and duration, but also additional information such as tags (see 5.1.3.2) or the reason for being in a certain state. Pressing LMB on a span will open a panel with additional information about the span, including a list of events that happened during that span. Pressing Ctrl/⌘ + LMB on a span will zoom to the span.
Spans have different colors depending on the states they represent. Different kinds of timelines and tracks use different color schemes. These are explained in 5.1.3.1.
Depending on the zoom level, individual spans may be too small to display. When that happens, Gotraceui groups small spans together in a merged span, which is rendered using a downsampled preview of the contained spans. Zooming into merged spans will progressively show higher resolution previews, until eventually the merged spans break up into individual spans.

Computing the previews for merged spans can sometimes take a long time, in which case we fall back to lower quality previews or a placeholder while the better preview gets computed in the background. As long as some of the visible previews aren’t displayed in the best quality possible, a dancing gopher will be shown.
All spans have context menus, which include at least a Zoom option, which acts identically to Ctrl/⌘ + LMB, and a Show span info option, which opens the span panel. Some spans have more options:
- Spans in processor timelines have a Scroll to goroutine option to scroll to the corresponding goroutine timeline.
- Blocked spans in goroutine timelines have a Scroll to unblocking goroutine option to scroll to the goroutine that unblocked the goroutine. For example, for a goroutine stuck in a channel receive, this will scroll to the sending goroutine.
- Running spans in goroutine timelines have a Scroll to processor option to scroll to the processor that the goroutine is running on at the time.
5.1.3.1. Span colors
Spans in processor timelines will have one of two colors: Green for spans that represent running user goroutines, and purple for spans that represent garbage collection work.
Spans in the first track of goroutine timelines can have many different colors, representing the many different states a goroutine can be in. You can find an exhaustive list of all goroutine states — and the corresponding span colors — in 6.1.5.
User regions are displayed in light pink. Stack traces are displayed either in a light shade of green if they’re from events, or in a lighter shade of green if they’ve been acquired via CPU sampling.
5.1.3.3. Stack traces and CPU sampling
In addition to sequences of runtime events and user regions, Gotraceui can also display tracks for stack traces. These can be enabled via Display > Show stack frames or by pressing the S key. When using the keyboard shortcut, the timeline that is currently under the cursor will stay in its current position. Other timelines will have to move, as their heights change due to the new tracks.
Each track represents one frame of the stack trace, with the frames sorted from bottom to top. In other words, the first displayed frame represents the entry point of the goroutine.
There are two kinds of stack traces in Gotraceui: stacks associated with runtime events, and CPU sampling. The first kind is straightforward to understand and conceptualize. Most events that cause state transitions into blocked states, which cause new spans to be created, have stack traces associated with them. These stacks are true for the entire duration of a span; a span that is blocked on a channel receive in some function will be blocked at the same place the whole time, for example.
The second kind is trickier to understand. When CPU profiling is enabled during tracing, the trace will include CPU samples. A CPU profiling sample states that at a specific point in time, a certain function was executing and how we got there. It doesn’t say anything about what happened right before or after the sample.
By default, samples occur at a frequency of 100 Hz, i.e., once every 10 ms. This means that there is 10 ms of uncertainty after a sample. The stack trace might’ve changed anywhere from 0 to 10 ms after the sample. The same function may even have been called repeatedly. All we really know is that at one point in time, the function was running.
Displaying spans that are infinitely small, however, wouldn’t be very useful. For that reason, Gotraceui displays CPU samples similarly to how it displays stack traces of events. A span for a CPU sample will start when the sample was made and it ends either when another sample is captured or when a state transition occurs. To differentiate these less accurate stack traces from others, they are displayed in a lighter color.
The power of CPU samples lies in spotting macroscopic patterns in code execution, on the scales of hundreds of milliseconds, if not seconds. A frame that gets sampled multiple times is likely to be spending more time executing than other frames. To further aid this macroscopic view, spans of consecutive CPU samples that describe the same frame are merged. This creates the usual layered representation of stacks that one might be familiar with from other tools such as flame graphs.
However, the data is woefully inadequate at small scales — the kind of scales at which execution trace data exists. You shouldn’t rely on sampled stack traces to fill in the gaps between two runtime events that happened 100 ms apart. It is important to look at either runtime events or CPU samples, but not both together. Runtime events show an exact history of what happened in the runtime, while CPU samples show a guess at what happened in user code.

It is also important to understand that CPU profiling samples happen at a fairly constant rate, which means all samples have the same uncertainty. Runtime events, however, can happen at arbitrary points. If a sample is followed by a runtime event 1 ms later, it will look much smaller than if it were followed by a runtime event 9 ms later, even though in the latter case we still don’t know what happened for the first 9 ms.
5.2. Links
Gotraceui’s UI contains many kinds of links, such as links to timestamps, goroutines, functions, etc.
Links are color-coded. Red links refer to timestamps and blue links refer to objects such as goroutines or spans.
Left-clicking on links executes their default action, and most links also have alternate actions that can be accessed using the context menu, or by holding certain modifier keys while left-clicking.
Clicking on a timestamp scrolls the timelines view to that point in time, specifically by scrolling the time to the origin configured in the axis. Links to processors, goroutines, and spans all share the same functionality:
- Clicking on one opens the corresponding information panel.
- Holding Ctrl/⌘ while clicking on one zooms to it.
- Holding Shift while clicking on one scrolls the timelines view to the corresponding timeline or beginning timestamp.
The context menu of span links allows scrolling to their beginnings or ends, in addition to opening their info and zooming to them.
Hovering over timestamp links highlights the timestamp in the timelines view. This either uses a red cursor, if the timestamp is visible, or it highlights the left or right border of the view to indicate the direction in which the timestamp lies.
5.3. Panels
Gotraceui uses a side panel to display additional information about entities such as goroutines and spans. Clicking on a supported entity opens the panel on the right side of the window.
Panels can be resized by dragging the black line. Clicking Back will go back to the previously displayed panel. This can be used repeatedly. Finally, clicking Tabify will turn a panel into a tab (see 5.4 for more information on tabs).
Depending on the type of panel, additional buttons may exist.
Panels consist of summary information at the top and tabs that display more specific information.
5.3.1. Goroutine panel
Clicking on goroutine labels or goroutine links opens the goroutine panel.
Goroutine panels display the following information:
- Basic information, such as the goroutine’s parent, its duration, or what function it was running.
- Statistics of the different states of spans.
- All spans in the goroutine.
- All events in the goroutine.
- The stack trace, showing where the goroutine was created, if that information is available.
Goroutine panels have two additional buttons for scrolling and zooming to the goroutine.
5.3.2. Span panel
Clicking on spans (either merged or unmerged ones) opens the span panel.
Span panels display the following information:
- Basic information, such as the start and end time.
- Statistics of the different states.
- A list of the individual spans.
- For goroutine spans, including user regions, events that occurred during the span.
- For unmerged spans, the stack trace.
Additionally, individual user region spans have a button labeled Select user region, which selects all user region spans with the same label. Spans selected that way have a Histogram tab, which displays a histogram of span durations. See 5.5 for more information on histograms.
Span panels have two additional buttons for scrolling & panning and zooming to the spans.
5.3.3. Function panel
Clicking on function links—such as the ones displayed in goroutine panels—opens the function panel.
Function panels display the following information:
- Basic information, such as how many goroutines ran the function.
- A list of all goroutines.
- A histogram, showing the durations of goroutines that ran the function. See 5.5 for more information on using histograms.
5.4. Tabs
The main UI uses tabs to display the major features of Gotraceui. These are the timelines view, the list of goroutines, heatmaps, and flame graphs. Furthermore, every panel can be converted to a tab using the Tabify button.
Most tabs can be closed by clicking on them with the middle mouse button, with the exception of the Timelines and Goroutines tabs.
When the tab bar contains more tabs than can be displayed, it can be scrolled by scrolling horizontally, or by holding Shift while scrolling vertically.
5.4.1. Goroutines
The Goroutines tab displays a tabular view of all goroutines in the trace.
5.4.2. Heatmaps
Gotraceui can display processor utilization using quantized heatmaps. These can be accessed via Analyze > Open processor utilization heatmap.
The X-axis shows time, the Y-axis shows utilization in percent, and color saturation represents the number of processors.
The size of a bucket can be adjusted using the arrow keys. The ← and → keys decrease and increase the amount of time represented by a bucket in steps of 10 ms. The ↓ and ↑ keys decrease and increase the range of percentage points per bucket.
By default, a ranked color palette is used, where each distinct value that occurred gets its own saturation. Compared to a linear palette, where the color is proportional to the value, a ranked palette makes it easier to spot outliers. On the flip side, a linear palette allows comparing absolute values just by looking at the color.


The bottom of the heatmap tab displays information about the currently hovered bucket: the range of time and the range of utilization represented by the bucket, as well as the number of processors in said bucket.
Please note that processor refers to the concept from the Go runtime, and not actual CPUs or CPU cores. While processor utilization is a good estimate for actual CPU utilization, it cannot account for the OS scheduler, nor for cgo.
5.4.3. Flame graphs
Gotraceui can display flame graphs based on CPU sampling as well as the stack traces associated with tracing events.
Flame graphs are a visualization of hierarchical data, created to visualize stack traces of profiled software so that the most frequent code-paths can be identified quickly and accurately. —Brendan Gregg
Our flame graphs follow the common conventions: colors don’t have any meaning and only serve to differentiate spans, and the order of spans on the X axis is alphabetical. Gotraceui assigns colors so that functions from the same package have the same hue, using different lightness values to differentiate neighboring spans.

There are two ways of opening flame graphs: the first is by using Analyze > Open flame graph. This will open a flame graph visualizing all of the CPU samples found in the trace. The second is by choosing the Open flame graph option in goroutine context menus. This will open a flame graph specific to that goroutine, displaying both CPU samples (under the root span titled Running) and stacks from goroutine state transitions (e.g. being blocked on a channel send), thus showing both on- and off-CPU time.
Flame graphs are interactive. Hovering over spans will display tooltips with useful information. Pressing Ctrl/⌘ + LMB on a span will zoom to it and Ctrl/⌘ + Z undoes zooming.
5.5. Histograms
Histograms are used for showing the distribution of data. They are similar to bar charts, but instead of discrete values, each bar represents a range of values. Gotraceui uses histograms to show the distribution of durations of goroutines and user regions.
They can be accessed in two ways: via function panels (see 5.3.3), and via span panels for user regions (see 5.3.2.)
By default, our histograms use 100 bins and don’t reject outliers. Both of these things can be changed by right-clicking on a histogram and selecting Change settings. The Filter outliers option removes outliers before computing the histogram. Outliers are defined as values that are larger than 2.5× the interquartile range.
Histograms are interactive. Hovering over a bin shows a tooltip describing the range represented by the bin, as well as the number of values in the bin. Double-clicking a bin, or drawing a selection with Ctrl/⌘ + LMB, focuses the histogram on the selected time range. You can reset the histogram by right-clicking and choosing Zoom out.
The Goroutines tab of function panels has a checkbox titled Filter list to range of durations selected in histogram. When this is enabled, focusing a time range in the histogram will filter the list of goroutines to those whose durations fall into the focused range. This is useful for finding goroutines that take abnormally long. For example, if you have a server application where each user request is served by a goroutine running a certain function, then you can use the histogram to focus on the time range you deem unacceptable—for example, requests might be intended to only take up to 10 ms and all goroutines that run for more than that are problematic.
5.6. Tables
Tables are used in various places. They allow sorting columns by clicking on their headers. They also allow resizing columns by dragging the divider between column headers. Without any modifier keys, dragging the divider will adjust the ratio between two columns. When holding the Shift key while dragging, the size of the left column will be adjusted without changing the size of the right column. This might increase the width of the table.
5.7. Mouse and keyboard controls
5.7.1. Global
Input | Function |
---|---|
Ctrl/⌘ + + | Increase UI scale |
Ctrl/⌘ + = | Increase UI scale |
Ctrl/⌘ + - | Decrease UI scale |
Ctrl/⌘ + 0 | Reset UI scale |
RMB (click) | Open context menu |
5.7.2. Timelines view
Input | Function |
---|---|
LMB (click) | Open timeline and span information |
LMB (drag) | Pan the timelines view |
Ctrl/⌘ + LMB (drag) | Zoom to selected area |
Ctrl/⌘ + LMB (click) | Zoom to clicked span or timeline |
Home | Scroll to top of timelines view |
Ctrl/⌘ + Home | Zoom to fit currently visible timelines |
Shift + Home | Jump to beginning of trace |
C | Toggle compact display |
G | Open timeline selector |
H | Open span highlighting dialog |
O | Toggle STW and GC overlays |
S | Toggle display of stack tracks |
T | Toggle displaying tooltips |
X | Toggle display of all timeline labels |
Ctrl/⌘ + Z | Undo navigation |
5.7.3. Heatmaps
Input | Function |
---|---|
← | Decrease X-axis bucket size by 10 ms |
→ | Increase X-axis bucket size by 10 ms |
↓ | Decrease Y-axis bucket size |
↑ | Increase Y-axis bucket size |
5.7.4. Histograms
Input | Function |
---|---|
LMB (double-click) | Zoom to selected bin |
Ctrl/⌘ + LMB (drag) | Zoom to selected bins |
5.7.5. Flame graphs
Input | Function |
---|---|
Ctrl/⌘ + LMB (click) | Zoom to clicked span |
Ctrl/⌘ + Z | Undo navigation |
6. The Go runtime
Execution traces are primarily about the interactions between goroutines and the runtime. To make sense of traces, then, it is helpful to understand how the runtime functions. The following sections explain the most important aspects of the runtime and their impact on Gotraceui’s visualization.
6.1. The scheduler
Go programs can have a very large number of goroutines, up to millions. Because it wouldn’t be feasible to map each goroutine to its own OS-level thread, Go has to distribute goroutines over a smaller number of threads. To do so, the scheduler has to decide which goroutines to run when, part of which involves tracking which goroutines can run. The activity related to this makes up a large part of what the trace captures and Gotraceui visualizes. It is thus helpful to understand how the scheduler works.
6.1.1. Machines, Processors, and Goroutines
The scheduler manages three resources: Machines, Processors, and Goroutines. In conversations, documentation, and source code these are usually referred to by their initials.
Machines correspond to operating system threads.
They are responsible for actually executing instructions.
Processors are, conceptually, tokens.
Machines need to hold processors to be allowed to run goroutines.
This serves two purposes:
First, it puts a bound on parallelism.
You wouldn’t want hundreds of threads to fight for CPU resources.
Second, it allows for an efficient implementation.
While processors are tokens in principle, they are also a concrete data structure that holds information necessary for efficiently running goroutines.
In other words, machines use processors to run goroutines.
The environment variable GOMAXPROCS controls the number of available processors.
It defaults to the number of CPU cores.
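For example, to cap the number of processors at four regardless of the number of CPU cores, you can set the variable when starting the program (the program name is a placeholder); calling runtime.GOMAXPROCS(4) early in main has the same effect:
GOMAXPROCS=4 ./myprogram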
In the context of execution traces and Gotraceui, goroutines are said to be running on processors, as the trace format is processor-centric. In fact, Gotraceui does not expose machines at all. If it did, it would show goroutines running on processors and processors running on machines.
If you’d like to learn more about the internals of the scheduler that aren’t necessary to understand traces but are nevertheless interesting, check out Daniel Morsing’s blog post on the topic.
6.1.2. Syscalls
Syscalls, short for system calls, are the primary way that processes use to communicate with the operating system’s kernel. They are used to, for example, manipulate files, spawn new processes, use the network, etc.
Syscalls are synchronous: once a thread executes a syscall it cannot do anything else until the syscall returns. When a goroutine executes a syscall it causes the whole machine to block—that machine will not be able to run any other goroutines until the syscall returns. When this happens, the machine loses its processor (after all, it is no longer able to run goroutines), and another machine may pick up the processor. Additionally, Go ensures that there are always enough unblocked machines to run all processors by creating new ones when necessary.
All of this work is fairly expensive.
That is why Go differentiates between "non-blocking" and "blocking" syscalls.
Non-blocking syscalls aren’t truly non-blocking; they’re just syscalls that return very quickly.
For example, the gettimeofday syscall usually returns within a few microseconds.
It would take longer to give up the processor and spawn a new thread than it would to wait for the syscall to return.
Additionally, the goroutine that invoked the syscall would have to wait its turn to be scheduled again, which might involve waiting for another goroutine to be preempted.
All in all, the cost of a cheap syscall would multiply tenfold.
Instead of doing all that, Go just waits to see if the syscall returns promptly.
Only if it doesn’t will Go go through the steps we described earlier.
In the execution trace, and thus Gotraceui, these two kinds of syscalls are represented differently. Short syscalls appear as instantaneous events during a span, while long syscalls appear as their own spans.
6.1.3. LockOSThread
By default, the scheduler moves goroutines between processors and processors between machines as necessary to maintain good performance.
For most Go programs, this is fine and indeed desirable.
However, when using cgo, it may be the case that the libraries you use depend on thread-local storage (TLS).
TLS allows storing per-thread state, which is little more than global variables scoped to threads.
Of course, if Go moves goroutines across threads, then thread-local state will not be available consistently.
To solve this problem, Go offers the runtime.LockOSThread function, which locks the current goroutine to the current thread.
From that point on, the goroutine will only ever run on that thread (unless UnlockOSThread is called), and no other goroutines will be allowed to run on it.
Because Gotraceui visualizes processors and goroutines but not machines, the use of LockOSThread is largely invisible.
In particular, thread-locked goroutines can still move between processors freely.
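For reference, here is a minimal sketch of how thread locking is typically used; renderLoop and the library it calls into are hypothetical:

package main

import "runtime"

// renderLoop stands in for code that calls into a C library which relies on
// thread-local storage and therefore must always run on the same thread.
func renderLoop() {
	// Lock this goroutine to its current OS thread for the duration of the
	// function; no other goroutine will run on that thread in the meantime.
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	// ... calls into the thread-sensitive library ...
}

func main() {
	renderLoop()
}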
6.1.4. Cooperative scheduling and preemption
This section will be expanded in the future
6.1.5. Goroutine states
All goroutines are in one of three states: runnable, running, and blocked. The runtime subdivides blocked into different reasons for being blocked, and Gotraceui introduces some of its own subdivisions to further increase the level of detail.
The following is an exhaustive list of states found in Gotraceui. Each state name’s background color corresponds to the span color used in Gotraceui.
- created: Newly created goroutines will be in this state before they get scheduled for the first time. It is a special case of the ready state.
- active: Active goroutines are those that are currently running.
- send, recv, select: These states describe the three ways in which goroutines can be blocked on channel communication.
- sync: This state is used by goroutines that are blocked on sync primitives, such as sync.Mutex.
- sync.Once: Blocked on a sync.Once. This is a special case of the sync state and detected by Gotraceui based on stack traces.
- sync.Cond: Blocked on a condition variable (sync.Cond).
- I/O: This state is entered by goroutines that are waiting for pollable I/O to complete. See 6.2 for more information.
- syscall: Goroutines enter this state when they invoke a blocking syscall. See 6.1.2 for an explanation of the difference between blocking and non-blocking syscalls in the context of Go.
- blocked: Blocked goroutines are waiting for something to happen, but we don’t know what. This usually happens for goroutines of the runtime that don’t emit more accurate information. User goroutines will usually have more specific states such as send.
- inactive: This state is one of Gotraceui’s custom states and is used for goroutines that are blocked or ready to run, but aren’t actually eager to run. For blocked goroutines, this is exclusively used by goroutines of the runtime that block on some lock to pace the amount of work they do. Goroutines that are technically in the ready state but are marked inactive are those that called runtime.Gosched or time.Sleep, as this indicates that they willingly gave up part of their share in CPU time, and their time spent waiting shouldn’t be considered scheduler latency.
- Blocked (GC): The goroutine is waiting to assist the garbage collector.
- ready: A goroutine in this state isn’t blocked on anything anymore and can start running as soon as it gets scheduled. A goroutine can be in this state because there aren’t any free processors to run it, or simply because the scheduler hasn’t gotten around to starting it yet. Goroutines can transition into this state from the active state if they get preempted, or from any of the various blocked states once they get unblocked. Time spent in this state is commonly called scheduler latency.
- GC (idle): A runtime.gcBgMarkWorker goroutine doing idle mark work.
- GC (dedicated): A runtime.gcBgMarkWorker goroutine doing dedicated mark work.
- GC (fractional): A runtime.gcBgMarkWorker goroutine doing fractional mark work.
- GC mark assist: Goroutines in the GC mark assist state are assisting the mark phase.
- GC sweep: Goroutines in the GC sweep state are sweeping memory.
- stuck: Goroutines in this state are stuck and can never make progress. This happens, for example, when receiving from a nil channel or using select with no cases (see the sketch after this list).
- done: This state is used for synthetic spans that indicate that goroutines have exited.
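As an illustration, both goroutines in the following sketch would end up in the stuck state:

package main

import "time"

func main() {
	// Both goroutines below can never make progress and would be shown in
	// the "stuck" state by Gotraceui.
	go func() {
		var ch chan int
		<-ch // receiving from a nil channel blocks forever
	}()
	go func() {
		select {} // a select with no cases blocks forever
	}()

	// Give the goroutines time to start before the program exits.
	time.Sleep(100 * time.Millisecond)
}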
6.2. Pollable vs. non-pollable I/O
To do I/O on files (we use files in the Unix sense, referring to actual files, pipes, network connections, etc.) we need to involve the operating system.
In 6.1.2 we established that this usually requires syscalls, and indeed the most straightforward way to read and write is to use the read and write syscalls (in the following we’ll focus on reading, but everything applies equally to writing.)
However, we’ve also established that syscalls block execution and force Go to create new threads.
This is especially true for I/O, which rarely finishes quickly.
But what happens if we’re working on a highly concurrent server that handles thousands of connections simultaneously?
We certainly do not want thousands of threads.
Threads are expensive to create, they need memory, the OS has to manage them, the Go runtime has to manage them, and so on.
Avoiding a large number of threads is one of the reasons Go doesn’t use one thread per goroutine in the first place.
Most operating systems provide a more efficient alternative to having thousands of threads waiting in syscalls: a combination of non-blocking I/O and some mechanism that can be polled to wait for files to become ready. Non-blocking means that instead of blocking, read syscalls fail immediately if there is no data available to be read. The expectation is that the program will try again in the future. Ideally this is combined with the aforementioned polling-based mechanism which tells the program when a file is ready to be read from. Using these two features means that we no longer need one thread per outstanding I/O operation. Instead, we need one thread to poll for events, and some number of threads to do I/O.
In a low-level language like C, we’d be responsible for managing non-blocking I/O, polling, retrying reads, and possibly using a thread pool.
In Go, all of this is hidden from view. To the programmer, all reads look like ordinary blocking calls: f.Read(b) will return once it has read some data.
In the background, however, the runtime uses the features we’ve described previously.
A read doesn’t map to a simple syscall. Instead, it calls into the runtime, which is responsible for reading, polling, and retrying.
During polling, the goroutine is put into a blocked state and the processor that was running it is free to run other goroutines.
Once the file becomes ready for reading, the goroutine is unblocked and scheduled so it can retry the read.
The runtime system responsible for polling is called the netpoller. It is called that because originally, it only supported network connections. Since then it has been expanded to operate on other kinds of files, too, such as pipes. However, not all kinds of files are pollable. For example, Go uses epoll on Linux, and epoll doesn’t support ordinary disk files. Which kinds of files are pollable also differs between operating systems, as each OS provides its own mechanism. For non-pollable files the netpoller cannot be used and Go falls back to normal, blocking syscalls, with all of the previously mentioned downsides. However, the most important use case of the netpoller is undoubtedly network connections, and these are pollable on all systems.
In Gotraceui, blocking on pollable I/O is represented using I/O spans, while non-pollable I/O is shown as syscalls in the way described in 6.1.2. For pollable I/O, span tooltips may additionally show the kind of I/O, such as network reads. This information is based on tags, which are described in 5.1.3.2.
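As an illustrative sketch (assuming Linux; the address and file path are placeholders), the two reads below end up in different states when they have to wait:

package main

import (
	"net"
	"os"
)

func main() {
	buf := make([]byte, 4096)

	// Network connections are pollable: if this read has to wait, the
	// goroutine blocks in the I/O state and the processor is free to run
	// other goroutines in the meantime.
	if conn, err := net.Dial("tcp", "example.com:80"); err == nil {
		conn.Write([]byte("GET / HTTP/1.0\r\nHost: example.com\r\n\r\n"))
		conn.Read(buf)
		conn.Close()
	}

	// Ordinary disk files are not pollable (e.g. with epoll on Linux), so a
	// read that has to wait shows up as a blocking syscall instead.
	if f, err := os.Open("/var/log/syslog"); err == nil {
		f.Read(buf)
		f.Close()
	}
}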

6.3. Garbage collection
Go uses a concurrent mark and sweep garbage collector. Its activity will interact with the scheduling of your goroutines in various ways, which we’ll explore in this section. We will focus on the details that matter for understanding execution traces. There are many more details to how the GC works and you’re encouraged to read the official documentation to learn more about it.
This section will be expanded in the future
6.4. The runtime’s goroutines
The runtime spawns several of its own goroutines that will show up in most traces. Most of these exist to help with the concurrent garbage collector.
- bgsweep is a low priority goroutine that sweeps spans when there are idle processors. This reduces the amount of sweeping that has to be done by other goroutines.
- bgscavenge periodically returns unused memory to the OS.
- Multiple gcBgMarkWorker goroutines implement the garbage collector’s concurrent mark phase. See 6.3 for more information on the garbage collector.
- forcegchelper periodically gets woken up and forces a garbage collection cycle to start. This ensures that garbage gets collected regularly even if the program isn’t allocating enough memory to hit the heap target.
- runfinq is the goroutine that is responsible for running finalizers. That means that this runtime goroutine will execute code provided by the user via runtime.SetFinalizer (see the sketch below).
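For example, a finalizer registered as in the following sketch (the resource type is hypothetical) will eventually be executed on the runfinq goroutine:

package main

import "runtime"

// resource is a hypothetical wrapper around some external handle.
type resource struct{ handle uintptr }

func main() {
	r := &resource{handle: 42}
	runtime.SetFinalizer(r, func(r *resource) {
		// This function runs on the runtime's runfinq goroutine during a
		// future GC cycle, once r has become unreachable.
	})

	// Finalizers only run as part of garbage collection.
	runtime.GC()
}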
7. Final words
The best way to get started with runtime tracing is to jump right in.
