We were still leaking the filenames for assembly files.
In our existing asm.txtar test's output binary,
the string `test/main/garble_main_amd64.s` was present.
This leaked full import paths on one hand,
and the filenames of each assembly file on the other.
We avoid this in Go files by using `/*line` directives,
but those are not supported in assembly files.
Instead, obfuscate the paths in the temporary directory.
Note that we still need a separate temporary directory per package,
because otherwise any included header files might collide.
We must remove the `main` package panic in obfuscatedImportPath,
as we now need to use that function for all packages.
While here, remove the outdated comment about `-trimpath`.
Fixes#605.
A main package can be imported in one edge case we weren't testing:
when the same main package has any tests, which implicitly import main.
Before the fix, we would panic:
> garble test -v ./...
# test/bar/somemaintest.test
panic: main packages should never need to obfuscate their import paths
The last change made it so that hashWithCustomSalt does not always end
up with 8 base64 characters, which is a good change for the sake of
avoiding easy patterns in obfuscated code.
However, the docs weren't updated accordingly, and it wasn't
particularly clear that the byte giving us randomness wasn't part of the
resulting base64-encoded name.
First, refactor the code to only feed as many checksum bytes to the
base64 encoder as necessary, which is 12.
This shrinks b64NameBuffer and saves us some base64 encoding work.
Second, use the first checksum byte that we don't use, the 13th,
as the source of the randomness.
Note how before we used a base64-encoded byte for the randomness,
which isn't great as that byte was only one of 63 characters,
whereas a checksum byte is one of 256.
Third, update the docs so that the code is as clear as possible.
This is particularly important given that we have no tests.
With debug prints in the gogarble.txt test, we can see that the
randomness in hash lengths is working as intended:
# test/main/stdimporter
hashLength = 13
hashLength = 8
hashLength = 12
hashLength = 15
hashLength = 10
hashLength = 15
hashLength = 9
hashLength = 8
hashLength = 15
hashLength = 15
hashLength = 12
hashLength = 10
hashLength = 13
hashLength = 13
hashLength = 8
hashLength = 15
hashLength = 11
Finally, add a regression test that will complain if we end up with
hashed names that reuse the same length too often.
Out of eight hashed names, the test will fail if six end up with the
same length, as that is incredibly unlikely given that each should pick
one of eight lengths with a fair distribution.
One more package that further unblocks obfuscating the runtime.
The issue was the TODO we already had about go:linkname directives with
just one argument, which are used in the syscall package.
While here, factor out the obfuscation of linkname directives into
transformLinkname, as it was starting to get a bit complex.
We now support debug logging as well, while still being able to use
"early returns" for some cases where we bail out.
We also need listPackage to treat all runtime sub-packages like it does
runtime itself, as `runtime/internal/syscall` linknames into `syscall`
without it being a dependency as well.
Finally, add a regression test that, without the fix,
properly spots that the syscall package was not obfuscated:
FAIL: testdata/script/gogarble.txtar:41: unexpected match for ["syscall.RawSyscall6"] in out
Updates #193.
This failed at link time because transformAsm did not know how to handle
the fact that the runtime package's assembly code implements the
`time.now` function via:
TEXT time·now<ABIInternal>(SB),NOSPLIT,$16-24
First, we need transformAsm to happen for all packages, not just the
ones that we are obfuscating. This is because the runtime can implement
APIs in other packages which are themselves obfuscated, whereas runtime
may not itself be getting obfuscated. This is currently the case with
`GOGARBLE=*` as we do not yet support obfuscating the runtime.
Second, we need to teach replaceAsmNames to handle qualified names with
import paths. Not just to look up the right package information for the
name, but also to obfuscate the package path if necessary.
Third, we need to relax the Deps requirement on listPackage, since the
runtime package and its dependencies are always implicit dependencies.
This is a big step towards being able to obfuscate the runtime, as there
is now just one package left that we cannot obfuscate outside the runtime.
Updates #193.
The generics issue has been fixed for the upcoming Go 1.20.
Include that version as a reminder for when we can drop Go 1.19.
The fs.SkipAll proposal is also implemented for Go 1.20.
The BinaryContentID comment was a little bit trickier.
We did get stamped VCS information some time ago,
but it only provides us with the current commit info and a dirty bit.
That is not enough for our use of the build cache,
because we want any uncommitted changes to garble to cause rebuilds.
I don't think we'll get any better than using garble's own build ID.
Reword the quasi-TODO to instead explain what we're doing and why.
See https://golang.org/issue/28749. The improved asm test would fail:
go parse: $WORK/imported/imported_amd64.s:1:1: expected 'package', found TEXT (and 2 more errors)
because we would incorrectly parse a non-Go file as a Go file.
Add a workaround. The original reporter's reproducer with go-ethereum
works now, as this was the last hiccup.
Fixes#555.
The reverse feature relied on `GoFiles` from `go list`,
but that list may not be enough to typecheck a package:
typecheck error: $WORK/main.go:3:15: undeclared name: longMain
`go help list` shows:
GoFiles []string // .go source files (excluding CgoFiles, TestGoFiles, XTestGoFiles)
CgoFiles []string // .go source files that import "C"
CompiledGoFiles []string // .go files presented to compiler (when using -compiled)
In other words, to mimic the same list of Go files fed to the compiler,
we want CompiledGoFiles.
Note that, since the cgo files show up as generated files,
we currently do not support reversing their filenames.
That is left as a TODO for now.
Updates #555.
Assembly files can include header files within the same Go module,
and those header files can include "defines" which refer to Go names.
Since those Go names are likely being obfuscated,
we need to replace them just like we do in assembly files.
The added mechanism is rather basic; we add two TODOs to improve it.
This should help when building projects like go-ethereum.
Fixes#553.
This way, the child process knows that it's running a toolchain command
via -toolexec without having to guess via filepath.IsAbs.
While here, improve the docs and tests a bit.
Following the best practices from upstream.
In particular, the "txt" extension is somewhat ambiguous.
This may cause some conflicts due to the git diff noise,
but hopefully we won't ever do this again.
I was wrongly assumed that, if `used` has an `Elem` method,
then `origin` must too. But it does not if it's a type parameter.
Add a test case too, which panicked before the fix.
Fixes#577.
While here, start the changelog for the upcoming release,
which will likely be a bugfix release as it's a bit early to drop 1.18.
We also bump staticcheck to get a version that supports 1.19.
I also noticed the "Go version X or newer" messages were slightly weird
and inconsistent. Our policy, per the README, is "Go version X or newer",
so the errors given to the user were unnecessarily confusing.
For example, now that Go 1.19 is out, we shouldn't simply recommend that
they upgrade to 1.18; we should recommend 1.18 or later.
When obfuscating the following piece of code:
func issue_573(s struct{ f int }) {
var _ *int = &s.f
/*x*/
}
the function body would roughly end up printed as:
we would roughly end up with:
var _ *int = &dZ4xYx3N
/*x*/.rbg1IM3V
Note that the /*x*/ comment got moved earlier in the source code.
This happens because the new identifiers are longer, so the printer
thinks that the selector now ends past the comment.
That would be fine - we don't really mind where comments end up,
because these non-directive comments end up being removed anyway.
However, the resulting syntax is wrong, as the period for the selector
must be on the first line rather than the second.
This is a go/printer bug that we should fix upstream,
but until then, we must work around it in Go 1.18.x and 1.19.x.
The fix is somewhat obvious in hindsight. To reduce the chances that
go/printer will trip over comments and produce invalid syntax,
get rid of most comments before we use the printer.
We still keep the removal of comments after printing,
since go/printer consumes some comments in ast.Node Doc fields.
Add the minimized unit test case above, and add the upstream project
that found this bug to check-third-party.
andybalholm/brotli helps cover a compression algorithm and ccgo code
generation from C to Go, and it's also a fairly popular module,
particular with HTTP implementations which want pure-Go brotli.
While here, fix the check-third-party script: it was setting GOFLAGS
a bit too late, so it may run `go get` on the wrong mod file.
Fixes#573.
Every now and then, I get test failures in the goenv test like:
> [!windows] cp $EXEC_PATH $NAME/garble$exe
> [!windows] exec $NAME/garble$exe build
[fork/exec with"double"quotes/garble: text file busy]
FAIL: testdata/scripts/goenv.txt:21: unexpected command failure
The root cause is https://go.dev/issue/22315, which isn't going to be
fixed anytime soon, as it is a race condition in Linux itself, triggered
by how heavily concurrent Go tends to be.
For now, try to make the race much less likely to happen.
The changes are all fairly minor and non-breaking,
and the last release was less than two months ago,
so a bugfix release sounds like the right choice.
That is, since Go 1.18.1, released back in April 2022.
We no longer need to worry about the buggy Go 1.18.0.
While here, use a clearer env var name; the settings are build settings.
Make the literals section easier to follow by using the word
"expression" more consistently.
The tiny section was misleading as well, as the README made it seem like
position information was stripped both with and without the flag.
Clarify what happens with that information in each case.
We obfuscate import paths in import declarations like:
"domain.com/somepkg"
by replacing them with the obfuscated package path:
somepkg "HPS4Mskq"
Note how we add a name to the import if there wasn't one,
so that references like somepkg.Foo keep working in the code.
This could break in some edge cases involving comments between imports
in the Go code, because go/printer is somewhat brittle with positions:
> garble build -tags buildtag
[stderr]
# test/main/importedpkg
:16: syntax error: missing import path
exit status 2
exit status 2
To prevent that, ensure the name has a reasonable position.
This was preventing github.com/gorilla/websocket from being obufscated.
It is a fairly popular library in Go, but we don't add it to
scripts/check-third-party.sh for now as wireguard already gives us
coverage over networking and cryptography.
These lines get executed for every identifier in every package in each
Go build, so one allocation per log.Printf call can quickly add up to
millions of allocations across a build.
Until https://go.dev/issue/53465 is fixed, the best way to avoid the
escaping due to `...any` is to not perform the function call at all.
name old time/op new time/op delta
Build-16 10.5s ± 1% 10.5s ± 2% ~ (p=0.604 n=9+10)
name old bin-B new bin-B delta
Build-16 5.52M ± 0% 5.52M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 506ms ±13% 500ms ± 7% ~ (p=0.739 n=10+10)
name old mallocs/op new mallocs/op delta
Build-16 31.7M ± 0% 30.1M ± 0% -5.33% (p=0.000 n=10+9)
name old sys-time/op new sys-time/op delta
Build-16 5.70s ± 5% 5.78s ± 6% ~ (p=0.278 n=9+10)
There used to be a reason to keep these maps separate, but ever since we
became better at obfuscating the standard library, that has gone away.
It's still a good idea to keep `go list -deps runtime` as a group,
but we can do that via a comment inside a joint map literal.
I also noticed that one comment still referred to cannotObfuscateNames,
which hasn't existed for some time. Fix that up.
It's also not documented how cachedOutput contains info for all deps,
so clarify that while we're improving the docs.
Finally, the reason we cannot obfuscate the syscall package was out of
date; it's not part of the runtime. It is a go:linkname bug.
A chunk from crypto/internal/boring has been split away as a separate
package very recently, shortly before 1.19rc1 is due for release.
See https://go.dev/cl/407135 for more information.
Makes garble work on the latest Go tip again.
It's not a problem to leak filenames like _cgo_gotypes.go,
but it is a problem when it includes the import path:
$ strings main | grep _cgo_gotypes
test/main/_cgo_gotypes.go
Here, "test/main" is the module path, which we want to hide.
We hadn't caught this before because the cgo.txt test did not check that
module paths aren't being leaked - it does now.
The fix is rather simple; we let printFile handle cgo-generated files.
We used to avoid that due to compiler errors, as the compiler only
allows some special cgo comment directives to work in cgo-generated
code, to prevent misuse in user code.
The fix is rather easy: the obfuscated filenames should begin with
"_cgo_" to appease the compiler's check.
Back in February 2021, we changed the obfuscation logic so that the
entire `garble build` process would use one shared temporary directory
across all package builds, reducing the amount of files we created in
the top-level system temporary directory.
However, we made one mistake: we didn't swap os.Remove for os.RemoveAll.
Ever since then, we've been leaving temporary files behind.
Add regression tests, which failed before the fix, and fix the bug.
Note that we need to test `garble reverse` as well, as it calls
toolexecCmd separately, so it needs its own cleanup as well.
The cleanup happens via the env var, which doesn't feel worse than
having toolexecCmd return an extra string or cleanup func.
While here, also test that we support TMPDIRs with special characters.
printFile is one of the functions to blame for most of the CPU cost and
allocations for garble itself, as reported by `perf record` for a clean build.
One contributor is how we print each file and then parse it again,
which we did for the sake of inserting line directives correctly.
With a bit of care, we can do this by tokenizing after printing,
as opposed to parsing into a full go/ast again.
This is moderately cheaper, but more than anything, allocates far less.
That is to be expected given how go/ast is a tree of pointers,
whereas go/scanner simply gives us a stream of tokens.
name old time/op new time/op delta
Build-16 10.4s ± 2% 10.3s ± 1% ~ (p=0.393 n=10+10)
name old bin-B new bin-B delta
Build-16 5.51M ± 0% 5.51M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 398ms ±12% 391ms ±10% ~ (p=0.529 n=10+10)
name old mallocs/op new mallocs/op delta
Build-16 34.4M ± 0% 31.8M ± 0% -7.65% (p=0.000 n=10+10)
name old sys-time/op new sys-time/op delta
Build-16 5.80s ± 6% 5.86s ± 4% ~ (p=0.218 n=10+10)
The new code is shorter, but perhaps a bit trickier,
so I also added more comments to explain what's going on.
Note how the time/op change is practically noise,
but mallocs/op goes down significantly, which is always a good sign.
Reuse a buffer and a map across loop iterations, because we can.
Make recordTypeDone only track named types, as that is enough to detect
type cycles. Without named types, there can be no cycles.
These two reduce allocs by a fraction of a percent:
name old time/op new time/op delta
Build-16 10.4s ± 2% 10.4s ± 1% ~ (p=0.739 n=10+10)
name old bin-B new bin-B delta
Build-16 5.51M ± 0% 5.51M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 391ms ± 9% 407ms ± 7% ~ (p=0.095 n=10+9)
name old mallocs/op new mallocs/op delta
Build-16 34.5M ± 0% 34.4M ± 0% -0.12% (p=0.000 n=10+10)
name old sys-time/op new sys-time/op delta
Build-16 5.87s ± 5% 5.82s ± 5% ~ (p=0.182 n=10+9)
It doesn't seem like much, but remember that these stats are for the
entire set of processes, where garble only accounts for about 10% of the
total wall time when compared to the compiler or linker. So a ~0.1%
decrease globally is still significant.
linkerVariableStrings is also indexed by *types.Var rather than types.Object,
since -ldflags=-X only supports setting the string value of variables.
This shouldn't make a significant difference in terms of allocs,
but at least the map is less prone to confusion with other object types.
To ensure the new code doesn't trip up on non-variables, we add test cases.
Finally, for the sake of clarity, index into the types.Info maps like
Defs and Uses rather than calling ObjectOf if we know whether the
identifier we have is a definition of a name or the use of a defined name.
This isn't better in terms of performance, as ObjectOf is a tiny method,
but just like with linkerVariableStrings before, the new code is clearer.
The _gomod_.go file inserted by the Go toolchain no longer shows up;
it's likely that either the -trimpath or -buildvcs=false flags are
preventing that extra bit of work from happening entirely.
The modinfo.txt test ensures that we're not breaking,
and the inner lines of code weren't hit as part of `go test`.
It also appears that we don't need to avoid obfuscating functions
defined with an `//export` directive. This is likely because cgo runs as
a pre-process step compared to the compiler, so us removing the
directive later does not make a difference.
We might need to revisit this in the future if we implement obfuscating
Go code instead of builds, e.g. `garble export`.
Just in case, I've expanded the cgo.txt test to also include one more
kind of cgo integration: an "import C" block including a C header file.
Either of these changes are slightly risky, as our tests don't cover all
edge cases. We've just done a release, so now is the time to try them.
Now that we've required Go 1.18 or later for some time,
stop supporting `// +build` directives entirely.
That should be fine, given that `go build` will fail too.
The TODO about ToObfuscate is also obsolete; see the added comment.
Finally, tweak the comments a bit after reading them again.
We don't use go/ast.Objects, as we use go/types instead.
Avoiding this work saves a bit of CPU and memory allocs.
name old time/op new time/op delta
Build-16 10.2s ± 1% 10.2s ± 1% ~ (p=0.937 n=6+6)
name old bin-B new bin-B delta
Build-16 5.47M ± 0% 5.47M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 328ms ±14% 321ms ± 6% ~ (p=0.589 n=6+6)
name old mallocs/op new mallocs/op delta
Build-16 34.8M ± 0% 34.0M ± 0% -2.26% (p=0.010 n=6+4)
name old sys-time/op new sys-time/op delta
Build-16 5.89s ± 3% 5.89s ± 3% ~ (p=0.937 n=6+6)
See golang/go#52463.
It appears that we already support obfuscating them,
and nothing seems to break when they are pulled in.
While here, add runtime/internal/syscall to runtimeAndDeps.
It first appeared in Go 1.18, but we missed adding it.
It seems like not having it there didn't cause any issues,
which makes sense given it's got almost zero Go code.
We also teach garble about the -work boolean build flag,
which has existed for multiple years but we forgot about.
It's likely that noone noticed as it's a rarely used flag.
If we don't quote it, paths containing spaces or quote characters will
fail. For instance, the added test without the fix fails:
> env NAME='with spaces'
> mkdir $NAME
> cp $EXEC_PATH $NAME/garble$exe
> exec $NAME/garble$exe build main.go
[stderr]
go tool compile: fork/exec $WORK/with: no such file or directory
exit status 1
Luckily, the fix is easy: we bundle Go's cmd/internal/quoted package,
which implements a QuotedJoin API for this very purpose.
Fixes#544.
First, I tried to follow my own past advice to only set GarbleActionID
if ToObfuscate is true. However, that broke at least three parts of
transformCompile, as the hash is used for more than I recalled.
Give up on that idea, because the current code is working as intended.
Better document what GarbleActionID is and what we use it for.
Second, now that https://go.dev/cl/348741 was shipped with Go 1.18,
using the logger when its output is io.Discard is already a no-op.
So we no longer need our debugf wrapper to apply the no-op logic.
First, two cleanups: unsafe and internal/abi are already in
runtimeAndDeps, so they are already not being obfuscated.
No need to repeat them in the other map.
Then, via trial and error, remove:
* runtime/pprof; it seems like we handle its runtime linknames well now.
* os/signal; unclear what "rebuilds don't work" meant, but it works now.
Our gogarble.txt test already does a full reproducible rebuild.
* crypto/x509/internal/macos; another linkname user that works now.
It is likely that we could remove one or two more packages already,
but it's best to move slowly and watch out for unexpected regressions.
When splitFlagsFromFiles saw "-p foo/bar.go",
it thought that was the first Go file, when in fact it's not.
We didn't notice because such import paths are pretty rare,
but they do exist, like github.com/nats-io/nats.go.
Before the fix, the added test case fails as expected:
> garble build -tags buildtag
[stderr]
# test/main/goextension.go
open test/main/goextension.go: no such file or directory
We could go through the trouble of teaching splitFlagsFromFiles about
all of the flags understood by the compiler and linker, but that feels
like far more code than the small alternative we went with.
And I'm pretty sure the alternative will work pretty reliably for now.
Fixes#539.
Our tests should already be pretty extensive,
and any bug fixes should result in more regression test cases,
but testing against a few diverse and popular third party modules
will help prevent unintended regressions while developing garble.
The list is short for now. More can be added later.
This adds protobuf and wireguard from the original issue,
but not cobra and logrus, as they aren't particularly complex nor add
significant variety on top of protobuf and wireguard.
While here, we remove the job that only runs crlf-test.sh,
as we don't really need a separate job for a tiny script.
Fixes#240.
It added packages which are only built with the boringcrypto build tag,
so trying to `go list` them will fail even though it doesn't matter.
While here, a few more minor cleanups:
1) Hide GarbleActionID and ToObfuscate from encoding/json, so that they
can't possibly collide with the fields consumed from `go list -json`.
2) Add test cases for `garble build` with packages that fail to load.
Note that this requires GOGARBLE=* to avoid its "does not match any
package to be built" error.
3) Remove the last use of interface{}, in a testdata file.
Fixes#531.
The seed obfuscator uses a type declaration in order to declare a function,
which returns a function with the same type.
This breaks when obfuscating literals inside generic functions, because
type declarations inside generic functions are not currently supported.
Therefore the obfuscator gets disabled until
https://github.com/golang/go/issues/47631 is fixed.