How linkage works
Sometimes you want to limit access to a symbol, such as a variable, to a single translation unit. The approach shared by C and C++ is to declare the symbol as static within that translation unit, as in the following example:
// In lib.cpp
// Prevent count from being accessed outside of this translation unit
static int count = 0;
...
// In main.cpp
// Tell the compiler that count is defined in another translation unit
extern int count;
int main(int argc, char** argv) {
    return count;
}
This example fails to build. Each file compiles on its own, but the link step reports an error:
$ clang++ -c lib.cpp
$ clang++ main.cpp lib.o
Undefined symbols for architecture arm64:
"_count", referenced from:
_main in main-4c6fc0.o
ld: symbol(s) not found for architecture arm64
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
I have probably read errors like this thousands of times, but I had never actually thought about the fact that the error comes from the linker and not the compiler. How does the linker know that the C or C++ source marked the symbol as inaccessible from outside its translation unit? Let us take a look at some examples.
First, consider an example with a non-static function:
static int count = 0;
void incrementCount() {
    count++;
}
We can compile this example and then list the symbols in the resulting object file with nm. This symbol table is what the linker sees, and it uses it to decide which symbols other object files may reference.
$ clang++ -c lib.cpp
$ nm lib.o
0000000000000000 T __Z14incrementCountv
0000000000000038 b __ZL5count
0000000000000000 t ltmp0
0000000000000038 b ltmp1
0000000000000018 s ltmp2
The line 0000000000000000 T __Z14incrementCountv shows the symbol's address, type, and name, respectively. The type T indicates that the incrementCount function is part of the text section of the program and, because the letter is upper case, that the symbol is exported.
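To make the three columns concrete, here is a minimal sketch that splits one line of nm output into its address, type, and name, and then applies the upper-case rule. The names NmSymbol, parseNmLine, and isExternal are hypothetical helpers invented for this illustration. The sketch assumes the three-column layout shown above, so it does not handle undefined symbols, which nm prints without an address. It also treats every lower-case letter as local, which is a simplification: some nm implementations use a few lower-case letters for special cases such as weak symbols.

```cpp
#include <cctype>
#include <sstream>
#include <string>

// One parsed line of nm output: address, type letter, symbol name.
struct NmSymbol {
    std::string address;
    char type = '?';
    std::string name;
};

// Hypothetical helper: split a three-column nm line such as
// "0000000000000000 T __Z14incrementCountv" into its fields.
// Returns false if the line does not have the expected shape.
bool parseNmLine(const std::string& line, NmSymbol& out) {
    std::istringstream in(line);
    std::string type;
    if (!(in >> out.address >> type >> out.name) || type.size() != 1) {
        return false;
    }
    out.type = type[0];
    return true;
}

// Simplified rule: an upper-case type letter marks an external (exported)
// symbol, a lower-case letter marks a symbol local to the object file.
bool isExternal(const NmSymbol& sym) {
    return std::isupper(static_cast<unsigned char>(sym.type)) != 0;
}
```

Calling parseNmLine on the line discussed above and passing the result to isExternal should report the symbol as external, since its type letter is an upper-case T.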
Now consider another example with a static function:
static int count = 0;
static void incrementCount() {
    count++;
}
// Force the compiler not to skip incrementCount
void callIncrementCount() {
    incrementCount();
}
We can then follow the same compilation process and list the symbols as before:
$ clang++ -c lib.cpp
$ nm lib.o
0000000000000000 T __Z18callIncrementCountv
0000000000000014 t __ZL14incrementCountv
0000000000000068 b __ZL5count
0000000000000000 t ltmp0
0000000000000068 b ltmp1
0000000000000028 s ltmp2
The line 0000000000000014 t __ZL14incrementCountv is similar to the previous one, except that the type t is lower case. Lower case means the symbol is no longer exported, although it still belongs to the text section of the object file.
Not only is this interesting, but it also explains where the terms internal linkage and external linkage come from. Internal linkage means a symbol cannot be accessed from outside the translation unit in which it is defined. External linkage, the default for functions and non-const variables at namespace scope, means a symbol can be accessed from any translation unit.
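As a rough illustration of the two kinds of linkage side by side, the sketch below puts both into a single translation unit. All of the names (hiddenCounter, alsoHidden, visibleCounter, bumpAll) are hypothetical. The unnamed namespace is a C++-only alternative to static that the examples above do not use; it is included here on the assumption that you are compiling as C++.

```cpp
// Internal linkage: static at namespace scope (works in both C and C++).
static int hiddenCounter = 0;

// Internal linkage: anything inside an unnamed namespace (C++ only).
namespace {
int alsoHidden = 0;
}

// External linkage: a plain non-const global variable.
int visibleCounter = 0;

// External linkage: a plain non-static function. It can still use the
// internal symbols because it lives in the same translation unit.
int bumpAll() {
    ++hiddenCounter;
    ++alsoHidden;
    ++visibleCounter;
    return hiddenCounter + alsoHidden + visibleCounter;
}
```

If you compile this with -c and run nm on the object file, you would expect lower-case type letters for the first two variables and upper-case letters for visibleCounter and bumpAll, though the exact letters and mangled names depend on the platform and compiler.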