Our test was essentially a "birthday paradox" simulation,
with 8 people in total and 7 possible birthdays.
Instead of checking whether two or more people share a birthday,
we tested whether six or more people shared a birthday.
Per a math/rand simulation with ten million iterations,
that test has a 0.13% flake rate.
That is low enough that we didn't immediately notice,
but high enough to realistically cause problems at some point.
My previous math in the test comment was clearly wrong.
First, we recently lowered the range by 1,
and `(1 / 7) ** 6` is three times higher as a probability.
Second, that's the probability of 6 people to share the same birthday,
without taking into account the extra two people in the total of 8.
That clearly drives the chances up, per the birthday paradox.
Double the number of people to 16,
and look for 14 or more to share a birthday.
The simulation returns a 0% chance at that point,
so it's extremely unlikely to cause issues.
Fixes#634.
The patch to the linker does this when generating the pclntab,
which is the binary section containing func names.
When `-tiny` is being used, we look for unexported funcs,
and set their names to the offset `0` - a shared empty string.
We also avoid including the original name in the binary,
which saves a significant amount of space.
The following stats were collected on GOOS=linux,
which show that `-tiny` is now about 4% smaller:
go build 1203067
garble build 782336
(old) garble -tiny build 688128
(new) garble -tiny build 659456
The test case is lifted from github.com/bytedance/sonic/internal/native,
which had the following line:
JMP github·com∕bytedance∕sonic∕internal∕native∕avx2·__quote(SB)
Updates #621.
This value is hard-coded in the linker and written in a header.
We could rewrite the final binary, like we used to do with import paths,
but that would require once again maintaining libraries to do so.
Instead, we're now modifying the linker to do what we want.
It's not particularly hard, as every Go install has its source code,
and rebuilding a slightly modified linker only takes a few seconds at most.
Thanks to `go build -overlay`, we only need to copy the files we modify,
and right now we're just modifying one file in the toolchain.
We use a git patch, as the change is fairly static and small,
and the patch is easier to understand and maintain.
The other side of this change is in the runtime,
as it also hard-codes the magic value when loading information.
We modify the code via syntax trees in that case, like `-tiny` does,
because the change is tiny (one literal) and the affected lines of code
are modified regularly between major Go releases.
Since rebuilding a slightly modified linker can take a few seconds,
and Go's build cache does not cache linked binaries,
we keep our own cached version of the rebuilt binary in `os.UserCacheDir`.
The feature isn't perfect, and will be improved in the future.
See the TODOs about the added dependency on `git`,
or how we are currently only able to cache one linker binary at once.
Fixes#622.
v0.8.0 was improved to obfuscate names so that they didn't all end up
with the same length. We went from a length fixed to 8 to a range
between 8 and 15.
Since the new mechanism chooses a length evenly between 8 and 15,
the average hashed name length went from 8 to 11.5.
While this improved obfuscation, it also increased the size of
obfuscated binaries by quite a bit. We do need to use a reasonably large
length to avoid collisions within packages, but we also don't want it to
be large enough to cause too much bloat.
Reduce the minimum and maximum from 8 and 15 to 6 and 12. This means
that the average goes fro 11.5 to 9, much closer to the old average.
We could consider a range of 4 to 12, but 4 bytes is short enough where
collisions become likely in practice, even if it's just the minimum.
And a range of 6 to 10 shrinks the range a bit too much.
name old time/op new time/op delta
Build-16 20.6s ± 0% 20.6s ± 0% ~ (p=0.421 n=5+5)
name old bin-B new bin-B delta
Build-16 5.77M ± 0% 5.66M ± 0% -1.92% (p=0.008 n=5+5)
name old cached-time/op new cached-time/op delta
Build-16 705ms ± 4% 703ms ± 2% ~ (p=0.690 n=5+5)
name old mallocs/op new mallocs/op delta
Build-16 25.0M ± 0% 25.0M ± 0% -0.08% (p=0.008 n=5+5)
name old sys-time/op new sys-time/op delta
Build-16 8.11s ± 2% 8.11s ± 2% ~ (p=0.841 n=5+5)
Updates #618.
We were obfuscating reflect's package path and its declared names,
but the toolchain wants to detect the presence of method reflection
to turn down the aggressiveness of dead code elimination.
Given that the obfuscation broke the detection,
we could easily end up in crashes when making reflect calls:
fatal error: unreachable method called. linker bug?
goroutine 1 [running]:
runtime.throw({0x50c9b3?, 0x2?})
runtime/panic.go:1047 +0x5d fp=0xc000063660 sp=0xc000063630 pc=0x43245d
runtime.unreachableMethod()
runtime/iface.go:532 +0x25 fp=0xc000063680 sp=0xc000063660 pc=0x40a845
runtime.call16(0xc00010a360, 0xc00000e0a8, 0x0, 0x0, 0x0, 0x8, 0xc000063bb0)
runtime/wcS9OpRFL:728 +0x49 fp=0xc0000636a0 sp=0xc000063680 pc=0x45eae9
runtime.reflectcall(0xc00001c120?, 0x1?, 0x1?, 0x18110?, 0xc0?, 0x1?, 0x1?)
<autogenerated>:1 +0x3c fp=0xc0000636e0 sp=0xc0000636a0 pc=0x462e9c
Avoid obfuscating the three names which cause problems: "reflect",
"Method", and "MethodByName".
While here, we also teach obfuscatedImportPath to skip "runtime",
as I also saw that the toolchain detects it for many reasons.
That wasn't a problem yet, as we do not obfuscate the runtime,
but it was likely going to become a problem in the future.
We can drop the code that kicked in when GOGARBLE was empty.
We can also add the value in addGarbleToHash unconditionally,
as we never allow it to be empty.
In the tests, remove all GOGARBLE lines where it just meant "obfuscate
everything" or "obfuscate the entire main module".
cgo.txtar had "obfuscate everything" as a separate step,
so remove it entirely.
linkname.txtar started failing because the imported package did not
import strings, so listPackage errored out. This wasn't a problem when
strings itself wasn't obfuscated, as transformLinkname silently left
strings.IndexByte untouched. It is a problem when IndexByte does get
obfuscated. Make that kind of listPackage error visible, and fix it.
reflect.txtar started failing with "unreachable method" runtime throws.
It's not clear to me why; it appears that GOGARBLE=* makes the linker
think that ExportedMethodName is suddenly unreachable.
Work around the problem by making the method explicitly reachable,
and leave a TODO as a reminder to investigate.
Finally, gogarble.txtar no longer needs to test for GOPRIVATE.
The rest of the test is left the same, as we still want the various
values for GOGARBLE to continue to work just like before.
Fixes#594.
Up to this point, -debugdir only included obfuscated Go files.
Include assembly files and their headers as well.
While here, ensure that -debugdir supports the standard library too,
and that it behaves correctly with build tags as well.
Also use more consistent names for path strings, to clarify which are
only the basename, and which are the obfuscated version.
Some big changes landed in Go for the upcoming 1.20.
While here, remove the use of GOGC=off with make.bash,
as https://go.dev/cl/436235 makes that unnecessary now.
We were still leaking the filenames for assembly files.
In our existing asm.txtar test's output binary,
the string `test/main/garble_main_amd64.s` was present.
This leaked full import paths on one hand,
and the filenames of each assembly file on the other.
We avoid this in Go files by using `/*line` directives,
but those are not supported in assembly files.
Instead, obfuscate the paths in the temporary directory.
Note that we still need a separate temporary directory per package,
because otherwise any included header files might collide.
We must remove the `main` package panic in obfuscatedImportPath,
as we now need to use that function for all packages.
While here, remove the outdated comment about `-trimpath`.
Fixes#605.
A main package can be imported in one edge case we weren't testing:
when the same main package has any tests, which implicitly import main.
Before the fix, we would panic:
> garble test -v ./...
# test/bar/somemaintest.test
panic: main packages should never need to obfuscate their import paths
The last change made it so that hashWithCustomSalt does not always end
up with 8 base64 characters, which is a good change for the sake of
avoiding easy patterns in obfuscated code.
However, the docs weren't updated accordingly, and it wasn't
particularly clear that the byte giving us randomness wasn't part of the
resulting base64-encoded name.
First, refactor the code to only feed as many checksum bytes to the
base64 encoder as necessary, which is 12.
This shrinks b64NameBuffer and saves us some base64 encoding work.
Second, use the first checksum byte that we don't use, the 13th,
as the source of the randomness.
Note how before we used a base64-encoded byte for the randomness,
which isn't great as that byte was only one of 63 characters,
whereas a checksum byte is one of 256.
Third, update the docs so that the code is as clear as possible.
This is particularly important given that we have no tests.
With debug prints in the gogarble.txt test, we can see that the
randomness in hash lengths is working as intended:
# test/main/stdimporter
hashLength = 13
hashLength = 8
hashLength = 12
hashLength = 15
hashLength = 10
hashLength = 15
hashLength = 9
hashLength = 8
hashLength = 15
hashLength = 15
hashLength = 12
hashLength = 10
hashLength = 13
hashLength = 13
hashLength = 8
hashLength = 15
hashLength = 11
Finally, add a regression test that will complain if we end up with
hashed names that reuse the same length too often.
Out of eight hashed names, the test will fail if six end up with the
same length, as that is incredibly unlikely given that each should pick
one of eight lengths with a fair distribution.
One more package that further unblocks obfuscating the runtime.
The issue was the TODO we already had about go:linkname directives with
just one argument, which are used in the syscall package.
While here, factor out the obfuscation of linkname directives into
transformLinkname, as it was starting to get a bit complex.
We now support debug logging as well, while still being able to use
"early returns" for some cases where we bail out.
We also need listPackage to treat all runtime sub-packages like it does
runtime itself, as `runtime/internal/syscall` linknames into `syscall`
without it being a dependency as well.
Finally, add a regression test that, without the fix,
properly spots that the syscall package was not obfuscated:
FAIL: testdata/script/gogarble.txtar:41: unexpected match for ["syscall.RawSyscall6"] in out
Updates #193.
See https://golang.org/issue/28749. The improved asm test would fail:
go parse: $WORK/imported/imported_amd64.s:1:1: expected 'package', found TEXT (and 2 more errors)
because we would incorrectly parse a non-Go file as a Go file.
Add a workaround. The original reporter's reproducer with go-ethereum
works now, as this was the last hiccup.
Fixes#555.
The reverse feature relied on `GoFiles` from `go list`,
but that list may not be enough to typecheck a package:
typecheck error: $WORK/main.go:3:15: undeclared name: longMain
`go help list` shows:
GoFiles []string // .go source files (excluding CgoFiles, TestGoFiles, XTestGoFiles)
CgoFiles []string // .go source files that import "C"
CompiledGoFiles []string // .go files presented to compiler (when using -compiled)
In other words, to mimic the same list of Go files fed to the compiler,
we want CompiledGoFiles.
Note that, since the cgo files show up as generated files,
we currently do not support reversing their filenames.
That is left as a TODO for now.
Updates #555.
Assembly files can include header files within the same Go module,
and those header files can include "defines" which refer to Go names.
Since those Go names are likely being obfuscated,
we need to replace them just like we do in assembly files.
The added mechanism is rather basic; we add two TODOs to improve it.
This should help when building projects like go-ethereum.
Fixes#553.
This way, the child process knows that it's running a toolchain command
via -toolexec without having to guess via filepath.IsAbs.
While here, improve the docs and tests a bit.
Following the best practices from upstream.
In particular, the "txt" extension is somewhat ambiguous.
This may cause some conflicts due to the git diff noise,
but hopefully we won't ever do this again.