Go code can retrieve and use field and method names via the `reflect` package.
For that reason, historically we did not obfuscate names of fields and methods
underneath types that we detected as used for reflection, via e.g. `reflect.TypeOf`.
However, that caused a number of issues. Since we obfuscate and build one package
at a time, we could only detect when types were used for reflection in their own package
or in upstream packages. Use of reflection in downstream packages would be detected
too late, causing one package to obfuscate the names and the other not to, leading to a build failure.
A different approach is implemented here. All names are obfuscated now, but we collect
those types used for reflection, and at the end of a build in `package main`,
we inject a function into the runtime's `internal/abi` package to reverse the obfuscation
for those names which can be used for reflection.
This does mean that the obfuscation for these names is very weak, as the binary
contains a one-to-one mapping to their original names, but they cannot be obfuscated
without breaking too many Go packages out in the wild. There is also some amount
of overhead in `internal/abi` due to this, but we aim to make the overhead insignificant.
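To illustrate the idea of a one-to-one reverse mapping, here is a minimal sketch. It is not the function that is injected into `internal/abi`; the table contents and the `originalName` helper are hypothetical and only show the lookup shape.

```go
package main

import "fmt"

// reflectNames is a hypothetical table from obfuscated names back to their
// originals. Only names reachable via reflection would be listed in it.
var reflectNames = map[string]string{
	"kZ3xQ": "UserID",
	"pL9aB": "String",
}

// originalName returns the original name for an obfuscated one,
// or the input unchanged when the name is not in the table.
func originalName(name string) string {
	if orig, ok := reflectNames[name]; ok {
		return orig
	}
	return name
}

func main() {
	fmt.Println(originalName("kZ3xQ"))
	fmt.Println(originalName("unlisted"))
}
```

Names missing from the table pass through untouched, so only the reflected names pay for the weaker obfuscation.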
Fixes #884, #799, #817, #881, #858, #843, #842.
Closes #406.
We used it to detect GOOS-specific packages and ignore their load errors
without having to do a substring search.
However, it turns out that repeatedly loading the string slice
from gob files in the cache is rather slow, particularly since many
Go packages have dozens of GOOS-specific files which can be ignored.
            │      old      │                 new                 │
            │ cached-sec/op │ cached-sec/op  vs base              │
    Build-8     340.3m ± 1%     335.8m ± 2%  -1.32% (p=0.002 n=10)

            │     old     │                new                │
            │ mallocs/op  │ mallocs/op   vs base              │
    Build-8   35.73M ± 0%   35.09M ± 0%  -1.79% (p=0.000 n=10)
In looking at the cpu and memory profiles, it surfaced that we spent
a lot of time in garbage collection, and a significant amount of the
garbage was produced by gob decoding string slices.
listedPackage.Deps is a list of a package's transitive dependencies,
so as a Go build gets larger, the list also gets larger and larger.
Given that Imports is the list of direct dependencies,
we can reconstruct Deps ourselves when needed, which is not always.
Moreover, since we want to do lookups, we can build a map directly.
This doesn't directly result in a wall time speed-up,
but it does result in a significant reduction in allocations.
The gob files we store in the disk cache should also be a bit smaller.
            │      old      │                 new                 │
            │ cached-sec/op │ cached-sec/op  vs base              │
    Build-8     339.5m ± 2%     340.3m ± 1%       ~ (p=0.218 n=10)

            │     old     │                new                │
            │ mallocs/op  │ mallocs/op   vs base              │
    Build-8   38.08M ± 0%   35.73M ± 0%  -6.18% (p=0.000 n=10)
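As a rough illustration of rebuilding the transitive dependency set from direct imports, the sketch below walks Imports recursively into a map. The `pkg` type and the `transitiveDeps` helper are hypothetical stand-ins, not garble's actual code.

```go
package main

import "fmt"

// pkg is a hypothetical stand-in for a listed package.
// Only the direct imports are stored; Deps is not.
type pkg struct {
	ImportPath string
	Imports    []string
}

// transitiveDeps walks direct imports starting at root and returns
// the set of all dependencies, suitable for map lookups.
func transitiveDeps(all map[string]*pkg, root string) map[string]struct{} {
	deps := make(map[string]struct{})
	var visit func(path string)
	visit = func(path string) {
		p := all[path]
		if p == nil {
			return
		}
		for _, imp := range p.Imports {
			if _, seen := deps[imp]; seen {
				continue
			}
			deps[imp] = struct{}{}
			visit(imp)
		}
	}
	visit(root)
	return deps
}

func main() {
	all := map[string]*pkg{
		"main": {ImportPath: "main", Imports: []string{"a", "b"}},
		"a":    {ImportPath: "a", Imports: []string{"b", "c"}},
		"b":    {ImportPath: "b"},
		"c":    {ImportPath: "c"},
	}
	fmt.Println(len(transitiveDeps(all, "main")))
}
```

Because the set is only built on demand, packages that never need a dependency lookup avoid the cost entirely.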
Otherwise we miscalculate int sizes, type sizes, alignments, and so on.
Caught by the GOARCH=386 go test on CI, since the os package imports
internal/syscall/unix, which uses arch-dependent padding.
The different padding between our incorrect use of go/types
and the correct typechecking done by the compiler caused different
obfuscation of fields, as the struct types stringified differently,
and they are used as a hash salt for field name obfuscation.
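The size mismatch can be seen with `go/types` directly. The sketch below assumes the relevant knob is the `types.Sizes` value for the target GOARCH, obtained via `types.SizesFor`; the `intSize` helper is hypothetical.

```go
package main

import (
	"fmt"
	"go/types"
)

// intSize reports the size in bytes of the predeclared int type
// as the gc compiler lays it out for the given GOARCH.
func intSize(goarch string) int64 {
	sizes := types.SizesFor("gc", goarch)
	return sizes.Sizeof(types.Typ[types.Int])
}

func main() {
	fmt.Println(intSize("386"), intSize("amd64"))
}
```

Since struct layouts depend on these sizes, typechecking with the wrong architecture can yield different padding than the compiler computes.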
Recently, a patch changed the argument `-mod=` to `-mod=readonly`
as the former is not really a valid flag value, and broke with go.work.
However, the latter seems to break our tests on Go 1.22.6
when listing all of runtimeLinknamed:
    panic: failed to load missing runtime-linknamed packages: golang.org/x/crypto@v0.16.1-0.20231129163542-152cdb1503eb:
    reading http://127.0.0.1:43357/mod/golang.org/x/crypto/@v/v0.16.1-0.20231129163542-152cdb1503eb.mod: 404 Not Found
It seems like, somehow, listing std packages was trying to download
x/crypto from GOPROXY - which is a local server with testdata/mod,
and so it does not contain x/crypto. However, this is entirely wrong,
as std vendors dependencies, including this very version of x/crypto.
Reverting the change to `-mod=readonly` resolves this issue,
which explains why we hadn't encountered this surprising GOPROXY error,
but the revert would also break users of go.work files.
Luckily, we have a better alternative: rather than trying to override
the value of the flags by adding more arguments, delete them entirely.
The empty string is not a valid value for the -mod flag, and it fails when using a workspace too:
    go: -mod may only be set to readonly or vendor when in workspace mode, but it is set to ""
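Deleting flags rather than overriding them could look roughly like the sketch below. The `dropFlags` helper is hypothetical, and it makes the simplifying assumption that every named flag takes a value.

```go
package main

import (
	"fmt"
	"strings"
)

// dropFlags returns args without the named flags, given without dashes.
// It handles the "-flag=value", "--flag=value", and "-flag value" forms.
// Simplifying assumption: every named flag takes a value.
func dropFlags(args []string, names ...string) []string {
	drop := make(map[string]bool, len(names))
	for _, name := range names {
		drop[name] = true
	}
	var out []string
	for i := 0; i < len(args); i++ {
		arg := args[i]
		if !strings.HasPrefix(arg, "-") {
			out = append(out, arg)
			continue
		}
		name, _, hasValue := strings.Cut(strings.TrimLeft(arg, "-"), "=")
		if !drop[name] {
			out = append(out, arg)
			continue
		}
		if !hasValue {
			i++ // the value is the next argument; skip it too
		}
	}
	return out
}

func main() {
	args := []string{"-mod=vendor", "-trimpath", "-modfile", "alt.mod", "./..."}
	fmt.Println(dropFlags(args, "mod", "modfile"))
}
```

With the flags gone, the go command falls back to its own default for the current mode, which avoids picking a value that is invalid in workspace mode.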
gopls correctly pointed out that the err==nil check was never met,
as err was assigned and we returned early when err!=nil.
This was an oversight when I wrote this; when Encode fails,
we shouldn't return, because we still want to close the file.
We don't defer the Close call because we want to check its error;
add a comment explaining that.
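The pattern described above can be sketched as follows. The `writeGob` and `demo` helpers are hypothetical; they only show closing the file on every path while still reporting both errors.

```go
package main

import (
	"encoding/gob"
	"fmt"
	"os"
	"path/filepath"
)

// writeGob encodes v into a new file at path.
// The file is closed even when encoding fails, and Close is not deferred
// so that its error can be checked and returned.
func writeGob(path string, v any) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	encErr := gob.NewEncoder(f).Encode(v)
	closeErr := f.Close()
	if encErr != nil {
		return encErr
	}
	return closeErr
}

// demo writes a value to a temporary file and reads it back.
func demo() error {
	dir, err := os.MkdirTemp("", "gob-sketch")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "data.gob")
	if err := writeGob(path, []string{"fmt", "os"}); err != nil {
		return err
	}
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()
	var got []string
	if err := gob.NewDecoder(f).Decode(&got); err != nil {
		return err
	}
	if len(got) != 2 {
		return fmt.Errorf("decoded %d elements, want 2", len(got))
	}
	return nil
}

func main() {
	fmt.Println(demo())
}
```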
Panicking in small helpers or in funcs that don't return error
has proved useful to keep code easier to maintain,
particularly for cases that should typically never happen.
However, in these cases we can error just as easily.
In particular, I was getting a panic whenever I forgot
that I was running garble with Go master (1.23), which is over the top.
When updating Garble to support Go 1.22.0, CI on macOS spotted
that the syscall package was failing to build given that it uses
assembly code which is only allowed in some std packages.
That allowlist is based on import paths, and we were obfuscating
the syscall package's import path, so that was breaking GOOS=darwin.
As a fix, I added syscall to runtimeAndDeps to not obfuscate it.
That wasn't a great fix; it's not part of runtime and its dependencies,
and there's no reason we should avoid obfuscating the package contents.
Not obfuscating the contents in fact broke x/sys/unix,
as it contains a copy of syscall.Rlimit which it type converted with.
Undo that fix and reinstate the gogarble.txtar syscall test.
Implement the fix where we only leave syscall's import path alone.
Add a regression test, and add a note about adding x/net and x/sys
to check-third-party.sh so that we can catch these bugs earlier.
Fixes #830.
It seems like building with Go 1.22.0 for GOOS=darwin started
running into some issues with the syscall package's use of ABIInternal
in assembly source code:
    > exec garble build
    [stderr]
    # syscall
    [...].s:16: ABI selector only permitted when compiling runtime, reference was to "runtime.entersyscall"
The error can be reproduced from another platform like GOOS=linux
as long as we have any test that cross-compiles std to GOOS=darwin.
We had crossbuild.txtar which only ensured we covered GOOS=windows
and GOOS=linux, so add a third case to ensure macOS is covered too.
This will slow down the tests a bit, but is important for the sake
of ensuring that we catch these bugs early, even without macOS on CI.
In fact, we hadn't caught this earlier for Go 1.22 precisely because
on CI we only tested on Go tip with GOOS=linux, for the sake of speed.
Adding the rest of the package import paths from objabi.allowAsmABIPkgs
to our runtimeAndDeps generated map solves this error.
Go 1.21.0 was released in August 2023, so our upcoming release
will no longer support the Go 1.20 release series.
The first Go 1.22 release candidate is also due in December 2023,
less than a month from now, so dropping 1.20 will simplify 1.22 work.
Two new packages linknamed with the runtime package,
one new intrinsic function, and one that is being removed in Go 1.22
but we want to keep around as long as we support Go 1.21.
Also note that, since math/rand/v2 simply does not exist until Go 1.22,
we need to adjust appendListedPackages to not fail on older versions.
First, teach scripts/gen-go-std-tables.sh to omit test packages,
since runtime/metrics_test would always result in an error.
Instead, make transformLinkname explicitly skip that package,
leaving a comment about a potential improvement if needed.
Second, the only remaining "not found" error we had was "maps" on 1.20,
so rewrite that check based on ImportPath and GoVersionSemver.
Third, detect packages with the "exclude all Go files" error
by looking at CompiledGoFiles and IgnoredGoFiles, which is less brittle.
This means that we are no longer doing any filtering on pkg.Error.Err,
which means we are less likely to break with Go error message changes.
Fourth, the check on pkg.Incomplete is now obsolete given the above,
meaning that the CompiledGoFiles length check is plenty.
Finally, stop trying to be clever about how we print errors.
Now that we're no longer skipping packages based on pkg.Error values,
printing pkg.DepsErrors was causing duplicate messages in the output.
Simply print pkg.Error values with only minimal tweaks:
including the position if there is any, and avoiding double newlines.
Overall, this makes our logic a lot less complicated,
and garble still works the way we want it to.
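The file-list check from the third and fourth points above can be sketched as below. The `listedPackage` struct mirrors a few `go list -json` field names, while the `allFilesExcluded` helper and its exact condition are assumptions rather than garble's actual logic.

```go
package main

import "fmt"

// listedPackage holds a small subset of the fields printed by "go list -json".
type listedPackage struct {
	ImportPath      string
	CompiledGoFiles []string
	IgnoredGoFiles  []string
}

// allFilesExcluded reports whether build constraints excluded every Go file:
// nothing is left to compile, yet some files were ignored.
// It never inspects error message strings.
func allFilesExcluded(p listedPackage) bool {
	return len(p.CompiledGoFiles) == 0 && len(p.IgnoredGoFiles) > 0
}

func main() {
	p := listedPackage{
		ImportPath:     "example.com/winonly",
		IgnoredGoFiles: []string{"svc_windows.go"},
	}
	fmt.Println(allFilesExcluded(p))
}
```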
computeLinkerVariableStrings had an unused argument.
Only skip obfuscating the name "FS" in the "embed" package.
The reflect methods no longer use the transformer receiver type,
so that TODO now feels unnecessary. Many methods need to be aware
of what the current types.Package is, and that seems reasonable.
We no longer use writeFileExclusive for our own cache on disk,
so the TODO about using locking or atomic writes is no longer relevant.
This means we now have a unified cache directory for garble,
which is now documented in the README.
I considered using the same hash-based cache used for pkgCache,
but decided against it since that cache implementation only stores
regular files without any executable bits set.
We could force the cache package to do what we want here,
but I'm leaning against it for now given that single files work OK.
It is true that each garble process only obfuscates up to one package,
which is why we made them globals to begin with.
However, garble does quite a lot more now,
such as reversing the obfuscation of many packages at once.
Having a global "current package" variable makes mistakes easier.
Some funcs, like those in transformFuncs, are now transformer methods.
Packages like os and sync have started using go:linknames pointing to
packages outside their dependency tree, much like runtime already did.
This started causing warnings to be printed while obfuscating std:
    > exec garble build -o=out_rebuild ./stdimporter
    [stderr]
    # sync
    //go:linkname refers to syscall.hasWaitingReaders - add `import _ "syscall"` for garble to find the package
    # os
    //go:linkname refers to net.newUnixFile - add `import _ "net"` for garble to find the package
    > bincmp out_rebuild out
    PASS
Relax the restriction in listPackage so that any package in std
is now allowed to list packages in runtimeLinknamed,
which makes the warnings and any potential problems go away.
Also make these std test cases check that no warnings are printed,
since I only happened to notice this problem by chance.
This is in preparation for the switch to Go's cache package,
whose ActionID type is also a full sha256 hash with 32 bytes.
We were using "short" hashes as shown by `go tool buildid`,
since that was consistent and 15 bytes was generally enough.
First, rename "component" to "hash", since it's shorter and more useful.
A full build ID is two or four hashes joined with slashes.
Second, add sanity checks that buildIDHashLength is being followed.
Otherwise the use of []byte could lead to human error.
Third, move all the hash encoding and decoding logic together.
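A length-checked encode and decode pair might look like the sketch below. The `hashLength` constant and both helper names are hypothetical, and the use of unpadded URL-safe base64 is an assumption made for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// hashLength is the expected size in bytes of every hash: a full sha256.
const hashLength = sha256.Size

// encodeHash turns a raw hash into its string form.
// It panics on a wrong length, as that is a programmer error.
func encodeHash(h []byte) string {
	if len(h) != hashLength {
		panic(fmt.Sprintf("encodeHash: got %d bytes, want %d", len(h), hashLength))
	}
	return base64.RawURLEncoding.EncodeToString(h)
}

// decodeHash parses the string form and verifies the decoded length.
func decodeHash(s string) ([]byte, error) {
	h, err := base64.RawURLEncoding.DecodeString(s)
	if err != nil {
		return nil, err
	}
	if len(h) != hashLength {
		return nil, fmt.Errorf("decodeHash: got %d bytes, want %d", len(h), hashLength)
	}
	return h, nil
}

func main() {
	sum := sha256.Sum256([]byte("example"))
	s := encodeHash(sum[:])
	h, err := decodeHash(s)
	fmt.Println(len(s), len(h), err)
}
```

Keeping both directions next to each other makes it harder for a []byte of the wrong length to slip through unnoticed.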
A couple of new packages in runtimeAndDeps,
and go list's Package.DepsErrors may now include package build errors
which we want to ignore as we would print those as duplicates.
printOneCgoTraceback now returns a boolean rather than an int.
Since we need to have different logic based on the Go version,
and toolchainVersionSemver was only set for the main process,
move the string to the shared cache global.
This is a nice thing to do anyway, to reduce the number of globals.
While here, update actions/setup-go to v4, which starts caching
GOMODCACHE and GOCACHE by default now.
Disable it, because it still doesn't help in our case,
and GitHub's Actions caching is still really inefficient.
And update staticcheck too.
I mistakenly understood that, when the DepsErrors field has errors,
the Error field would contain an error as well.
That is not always the case; for example,
the imports_missing package in the added test script
had DepsErrors set but Error empty, causing a nil dereference panic.
Make the code more robust, and report both kinds of load errors.
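A nil-safe way to collect both kinds of load errors is sketched below. The struct fields follow the names printed by `go list -json`, but the `loadErrors` helper and its message format are hypothetical.

```go
package main

import "fmt"

// packageError holds a subset of the error fields from "go list -json".
type packageError struct {
	Pos string
	Err string
}

// listedPackage holds the two error-related fields discussed above.
type listedPackage struct {
	ImportPath string
	Error      *packageError
	DepsErrors []*packageError
}

// loadErrors collects messages from Error and DepsErrors.
// Either field may be unset independently of the other.
func loadErrors(p listedPackage) []string {
	var msgs []string
	format := func(e *packageError) string {
		if e.Pos != "" {
			return e.Pos + ": " + e.Err
		}
		return e.Err
	}
	if p.Error != nil {
		msgs = append(msgs, format(p.Error))
	}
	for _, e := range p.DepsErrors {
		if e != nil {
			msgs = append(msgs, format(e))
		}
	}
	return msgs
}

func main() {
	p := listedPackage{
		ImportPath: "example.com/imports_missing",
		DepsErrors: []*packageError{{Err: "package not found"}},
	}
	fmt.Println(loadErrors(p))
}
```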
Fixes #694.
We're building the linker binary for the host GOOS,
not the target GOOS that we happen to be building for.
I noticed that, after running `go test`, my garble cache
would contain both link and link.exe, which made no sense
as I run linux and not windows.
`go env` has GOHOSTOS to mirror GOOS, but there is no
GOHOSTEXE to mirror GOEXE, so we reconstruct it from
runtime.GOOS, which is equivalent to GOHOSTOS.
Add a regression test as well.
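Reconstructing the host executable suffix can be sketched as below. The `exeSuffix` helper is hypothetical; it assumes, as on the platforms Go supports today, that only windows uses a ".exe" suffix.

```go
package main

import (
	"fmt"
	"runtime"
)

// exeSuffix returns the executable file suffix for the given GOOS.
// Assumption: only windows uses ".exe".
func exeSuffix(goos string) string {
	if goos == "windows" {
		return ".exe"
	}
	return ""
}

func main() {
	// runtime.GOOS is the OS this program was built for, which for a tool
	// running on the host is the same as GOHOSTOS.
	fmt.Println("link" + exeSuffix(runtime.GOOS))
}
```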
When we use `go list` on the standard library, we need to be careful
about what flags are passed from the top-level build command,
because some flags are not going to be appropriate.
In particular, GOFLAGS=-modfile=... resulted in a failure,
reproduced via the GOFLAGS variable added to linker.txtar:
    go: inconsistent vendoring in /home/mvdan/tip/src:
        golang.org/x/crypto@v0.5.1-0.20230203195927-310bfa40f1e4: is marked as explicit in vendor/modules.txt, but not explicitly required in go.mod
        golang.org/x/net@v0.7.0: is marked as explicit in vendor/modules.txt, but not explicitly required in go.mod
        golang.org/x/sys@v0.5.1-0.20230208141308-4fee21c92339: is marked as explicit in vendor/modules.txt, but not explicitly required in go.mod
        golang.org/x/text@v0.7.1-0.20230207171107-30dadde3188b: is marked as explicit in vendor/modules.txt, but not explicitly required in go.mod
        To ignore the vendor directory, use -mod=readonly or -mod=mod.
        To sync the vendor directory, run:
            go mod vendor
To work around this problem, reset the -mod and -modfile flags when
calling "go list" on the standard library, as those are the only two
flags which alter how we load the main module in a build.
The code which builds a modified cmd/link has a similar problem;
it already reset GOOS and GOARCH, but it could similarly run into
problems if other env vars like GOFLAGS were set.
To be on the safe side, we also disable GOENV and GOEXPERIMENT,
which we borrow from Go's bootstrapping commands.
We obfuscate import paths as well as their declared names.
The compiler treats some packages and APIs in special ways,
and the way it detects those is by looking at import paths and names.
In the past, we have avoided obfuscating some names like embed.FS or
reflect.Value.MethodByName for this reason. Otherwise,
go:embed or the linker's deadcode elimination might be broken.
This matching by path and name also happens with compiler intrinsics.
Intrinsics allow the compiler to rewrite some standard library calls
with small and efficient assembly, depending on the target GOARCH.
For example, math/bits.TrailingZeros32 gets replaced with ssa.OpCtz32,
which on amd64 may result in using the TZCNTL instruction.
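For reference, a call that qualifies for this intrinsic looks like any other function call in source. The `lowestSetBit` wrapper below is hypothetical; whether the compiler actually substitutes an instruction depends on the target GOARCH and on the import path and name being recognized.

```go
package main

import (
	"fmt"
	"math/bits"
)

// lowestSetBit returns the index of the least significant set bit of x,
// or 32 when x is zero. The compiler may replace the call to
// bits.TrailingZeros32 with a single instruction on some architectures.
func lowestSetBit(x uint32) int {
	return bits.TrailingZeros32(x)
}

func main() {
	fmt.Println(lowestSetBit(8))
}
```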
We never noticed that we were breaking many of these intrinsics.
The intrinsics for funcs declared in the runtime and its dependencies
still worked properly, as we do not obfuscate those packages yet.
However, for other packages like math/bits and sync/atomic,
the intrinsics were being entirely disabled due to obfuscated names.
Skipping intrinsics is particularly bad for performance,
and it also leads to slightly larger binaries:
             │      old       │                 new                 │
             │     bin-B      │     bin-B       vs base             │
    Build-16   5.450Mi ± ∞ ¹    5.333Mi ± ∞ ¹  -2.15% (p=0.029 n=4)
Finally, the main reason we noticed that intrinsics were broken
is that apparently GOARCH=mips fails to link without them,
as some symbols end up being not defined at all.
This patch fixes builds for the MIPS family of architectures.
Rather than building and linking all of std for every GOARCH,
test that intrinsics work by asking the compiler to print which
intrinsics are being applied, and checking that math/bits gets them.
This fix is relatively unfortunate, as it means we stop obfuscating
about 120 function names and a handful of package paths.
However, fixing builds and intrinsics is much more important.
We can figure out better ways to deal with intrinsics in the future.
Fixes #646.
The added test case would panic, because we would try to hash a name
with a broken package's GarbleActionID, which was empty.
We skipped over all package errors in appendListedPackages because two
kinds of errors were OK in the standard library.
However, this also meant we ignored real errors we should stop at,
because obfuscating those user packages is pointless.
Add more assertions, check for the OK errors explicitly,
and fail on any other error immediately.
Note that, in the process, I also found a bug in cmd/go.
Uncovered by github.com/bytedance/sonic,
whose internal/loader package fails to build on Go 1.20.
This value is hard-coded in the linker and written in a header.
We could rewrite the final binary, like we used to do with import paths,
but that would require once again maintaining libraries to do so.
Instead, we're now modifying the linker to do what we want.
It's not particularly hard, as every Go install has its source code,
and rebuilding a slightly modified linker only takes a few seconds at most.
Thanks to `go build -overlay`, we only need to copy the files we modify,
and right now we're just modifying one file in the toolchain.
We use a git patch, as the change is fairly static and small,
and the patch is easier to understand and maintain.
The other side of this change is in the runtime,
as it also hard-codes the magic value when loading information.
We modify the code via syntax trees in that case, like `-tiny` does,
because the change is tiny (one literal) and the affected lines of code
are modified regularly between major Go releases.
Since rebuilding a slightly modified linker can take a few seconds,
and Go's build cache does not cache linked binaries,
we keep our own cached version of the rebuilt binary in `os.UserCacheDir`.
The feature isn't perfect, and will be improved in the future.
See the TODOs about the added dependency on `git`,
or how we are currently only able to cache one linker binary at once.
Fixes #622.
We were obfuscating reflect's package path and its declared names,
but the toolchain wants to detect the presence of method reflection
to turn down the aggressiveness of dead code elimination.
Given that the obfuscation broke the detection,
we could easily end up in crashes when making reflect calls:
    fatal error: unreachable method called. linker bug?
    goroutine 1 [running]:
    runtime.throw({0x50c9b3?, 0x2?})
        runtime/panic.go:1047 +0x5d fp=0xc000063660 sp=0xc000063630 pc=0x43245d
    runtime.unreachableMethod()
        runtime/iface.go:532 +0x25 fp=0xc000063680 sp=0xc000063660 pc=0x40a845
    runtime.call16(0xc00010a360, 0xc00000e0a8, 0x0, 0x0, 0x0, 0x8, 0xc000063bb0)
        runtime/wcS9OpRFL:728 +0x49 fp=0xc0000636a0 sp=0xc000063680 pc=0x45eae9
    runtime.reflectcall(0xc00001c120?, 0x1?, 0x1?, 0x18110?, 0xc0?, 0x1?, 0x1?)
        <autogenerated>:1 +0x3c fp=0xc0000636e0 sp=0xc0000636a0 pc=0x462e9c
Avoid obfuscating the three names which cause problems: "reflect",
"Method", and "MethodByName".
While here, we also teach obfuscatedImportPath to skip "runtime",
as I also saw that the toolchain detects it for many reasons.
That wasn't a problem yet, as we do not obfuscate the runtime,
but it was likely going to become a problem in the future.
We can drop the code that kicked in when GOGARBLE was empty.
We can also add the value in addGarbleToHash unconditionally,
as we never allow it to be empty.
In the tests, remove all GOGARBLE lines where it just meant "obfuscate
everything" or "obfuscate the entire main module".
cgo.txtar had "obfuscate everything" as a separate step,
so remove it entirely.
linkname.txtar started failing because the imported package did not
import strings, so listPackage errored out. This wasn't a problem when
strings itself wasn't obfuscated, as transformLinkname silently left
strings.IndexByte untouched. It is a problem when IndexByte does get
obfuscated. Make that kind of listPackage error visible, and fix it.
reflect.txtar started failing with "unreachable method" runtime throws.
It's not clear to me why; it appears that GOGARBLE=* makes the linker
think that ExportedMethodName is suddenly unreachable.
Work around the problem by making the method explicitly reachable,
and leave a TODO as a reminder to investigate.
Finally, gogarble.txtar no longer needs to test for GOPRIVATE.
The rest of the test is left the same, as we still want the various
values for GOGARBLE to continue to work just like before.
Fixes #594.
Some big changes landed in Go for the upcoming 1.20.
While here, remove the use of GOGC=off with make.bash,
as https://go.dev/cl/436235 makes that unnecessary now.
We were still leaking the filenames for assembly files.
In our existing asm.txtar test's output binary,
the string `test/main/garble_main_amd64.s` was present.
This leaked full import paths on one hand,
and the filenames of each assembly file on the other.
We avoid this in Go files by using `/*line` directives,
but those are not supported in assembly files.
Instead, obfuscate the paths in the temporary directory.
Note that we still need a separate temporary directory per package,
because otherwise any included header files might collide.
We must remove the `main` package panic in obfuscatedImportPath,
as we now need to use that function for all packages.
While here, remove the outdated comment about `-trimpath`.
Fixes #605.
One more package that further unblocks obfuscating the runtime.
The issue was the TODO we already had about go:linkname directives with
just one argument, which are used in the syscall package.
While here, factor out the obfuscation of linkname directives into
transformLinkname, as it was starting to get a bit complex.
We now support debug logging as well, while still being able to use
"early returns" for some cases where we bail out.
We also need listPackage to treat all runtime sub-packages like it does
runtime itself, as `runtime/internal/syscall` linknames into `syscall`
without it being a dependency as well.
Finally, add a regression test that, without the fix,
properly spots that the syscall package was not obfuscated:
    FAIL: testdata/script/gogarble.txtar:41: unexpected match for ["syscall.RawSyscall6"] in out
Updates #193.
This failed at link time because transformAsm did not know how to handle
the fact that the runtime package's assembly code implements the
`time.now` function via:
    TEXT time·now<ABIInternal>(SB),NOSPLIT,$16-24
First, we need transformAsm to happen for all packages, not just the
ones that we are obfuscating. This is because the runtime can implement
APIs in other packages which are themselves obfuscated, whereas runtime
may not itself be getting obfuscated. This is currently the case with
`GOGARBLE=*` as we do not yet support obfuscating the runtime.
Second, we need to teach replaceAsmNames to handle qualified names with
import paths. Not just to look up the right package information for the
name, but also to obfuscate the package path if necessary.
Third, we need to relax the Deps requirement on listPackage, since the
runtime package and its dependencies are always implicit dependencies.
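Splitting a qualified assembly symbol into its package path and name, as needed for the second point above, can be sketched as follows. The `splitAsmName` helper is hypothetical and is not how replaceAsmNames is implemented; it relies on the Go assembler convention of writing the package separator as a middle dot (U+00B7) and slashes in import paths as a division slash (U+2215).

```go
package main

import (
	"fmt"
	"strings"
)

// splitAsmName splits a symbol such as "time·now" into a package path
// and a name. An empty path means the current package, as in "·now".
func splitAsmName(sym string) (pkgPath, name string) {
	const middleDot = "·"     // U+00B7, separates the package from the name
	const divisionSlash = "∕" // U+2215, stands in for "/" in import paths
	i := strings.LastIndex(sym, middleDot)
	if i < 0 {
		return "", sym
	}
	pkgPath = strings.ReplaceAll(sym[:i], divisionSlash, "/")
	return pkgPath, sym[i+len(middleDot):]
}

func main() {
	fmt.Println(splitAsmName("time·now"))
}
```

Once the path is known, it can be used both to look up the right package information and to decide whether the path itself needs obfuscating.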
This is a big step towards being able to obfuscate the runtime, as there
is now just one package left that we cannot obfuscate outside the runtime.
Updates #193.
The generics issue has been fixed for the upcoming Go 1.20.
Include that version as a reminder for when we can drop Go 1.19.
The fs.SkipAll proposal is also implemented for Go 1.20.
The BinaryContentID comment was a little bit trickier.
We did get stamped VCS information some time ago,
but it only provides us with the current commit info and a dirty bit.
That is not enough for our use of the build cache,
because we want any uncommitted changes to garble to cause rebuilds.
I don't think we'll get any better than using garble's own build ID.
Reword the quasi-TODO to instead explain what we're doing and why.
See https://golang.org/issue/28749. The improved asm test would fail:
    go parse: $WORK/imported/imported_amd64.s:1:1: expected 'package', found TEXT (and 2 more errors)
because we would incorrectly parse a non-Go file as a Go file.
Add a workaround. The original reporter's reproducer with go-ethereum
works now, as this was the last hiccup.
Fixes #555.
The reverse feature relied on `GoFiles` from `go list`,
but that list may not be enough to typecheck a package:
    typecheck error: $WORK/main.go:3:15: undeclared name: longMain
`go help list` shows:
    GoFiles         []string   // .go source files (excluding CgoFiles, TestGoFiles, XTestGoFiles)
    CgoFiles        []string   // .go source files that import "C"
    CompiledGoFiles []string   // .go files presented to compiler (when using -compiled)
In other words, to mimic the same list of Go files fed to the compiler,
we want CompiledGoFiles.
Note that, since the cgo files show up as generated files,
we currently do not support reversing their filenames.
That is left as a TODO for now.
Updates #555.