We rely on importcfg files to load type info for obfuscated packages.
We use this type information to remember what names we didn't obfuscate.
Unfortunately, indirect dependencies aren't listed in importcfg files,
so we relied on extra "go list -toolexec" calls to locate object files.
This worked fine, but added a significant amount of complexity.
The extra "go list -export -toolexec=garble" invocations weren't slow,
as they avoided rebuilding or re-obfuscating thanks to the build cache.
Still, it was hard to reason about how garble runs during a build
if we might have multiple layers of -toolexec invocations.
Instead, record the export files we encounter in an incremental map,
and persist it in the build cache via the gob file we're already using.
This way, each garble invocation knows where all object files are,
even those for indirect imports.
One wrinkle is that importcfg files can point to temporary object files.
In that case, figure out its final location in the build cache.
This requires hard-coding a bit of knowledge about how GOCACHE works,
but it seems relatively harmless given how it's very little code.
Plus, if GOCACHE ever changes, it will be obvious when our code breaks.
Finally, add a TODO about potentially saving even more work.
For packages that we alter, we parse and modify the importcfg file.
Parsing is necessary so we can locate obfuscated object files,
which we use to remember what identifiers were obfuscated.
Modifying the files is necessary when we obfuscate import paths,
and those import paths have entries in an importcfg file.
However, we made one crucial mistake when writing the code.
When handling importmap entries such as:
importmap golang.org/x/net/idna=vendor/golang.org/x/net/idna
we would name the two sides beforePath and afterPath, respectively.
They were added to importMap with afterPath as the key,
but when we iterated over the map to write a modified importcfg file,
we would then assume the key is beforepath.
All in all, we would end up writing the opposite direction:
importmap vendor/golang.org/x/net/idna=golang.org/x/net/idna
This would ultimately result in the importmap never being useful,
and some rather confusing error messages such as:
cannot find package golang.org/x/net/idna (using -importcfg)
Add a test case that reproduces this error,
and fix the code so it always uses beforePath as the key.
Note that we were also updating importCfgEntries with such entries.
I could not reproduce any failure when just removing that code,
nor could I explain why it was added there in the first place.
As such, remove that bit of code as well.
Finally, a reasonable question might be why we never noticed the bug.
In practice, such "importmap"s, represented as ImportMap by "go list",
only currently appear for packages vendored into the standard library.
Until very recently, we didn't support obfuscating most of std,
so we would usually not alter the affected importcfg files.
Now that we do parse and modify them, the bug surfaced.
Fixes#408.
In the added test case, "garble -literals build" would fail:
--- FAIL: TestScripts/literals (8.29s)
testscript.go:397:
> env GOPRIVATE=test/main
> garble -literals build
[stderr]
# test/main
Usz1FmFm.go:1: cannot call non-function string (type int), declared at Usz1FmFm.go:1
Usz1FmFm.go:1: string is not a type
Usz1FmFm.go:1: cannot call non-function append (type int), declared at Usz1FmFm.go:1
That is, for input code such as:
var append int
println("foo")
_ = append
We'd end up with obfuscated code like:
var append int
println(func() string {
// obfuscation...
x = append(x, ...)
// obfuscation...
return string(x)
})
_ = append
Which would then break, as the code is shadowing the "append" builtin.
To work around this, always obfuscate variable names, so we end up with:
var mwu1xuNz int
println(func() string {
// obfuscation...
x = append(x, ...)
// obfuscation...
return string(x)
})
_ = mwu1xuNz
This change shouldn't make the quality of our obfuscation stronger,
as local variable names do not currently end up in Go binaries.
However, this does make garble more consistent in treating identifiers,
and it completely avoids any issues related to shadowing builtins.
Moreover, this also paves the way for publishing obfuscated source code,
such as #369.
Fixes#417.
While investigating a bug report,
I noticed that garble was writing to the same temp file twice.
At best, writing to the same path on disk twice is wasteful,
as the design is careful to be deterministic and use unique paths.
At worst, the two writes could cause races at the filesystem level.
To prevent either of those situations,
we now create files with os.OpenFile and os.O_EXCL,
meaning that we will error if the file already exists.
That change uncovered a number of such unintended cases.
First, transformAsm would write obfuscated Go files twice.
This is because the Go toolchain actually runs:
[...]/asm -gensymabis [...] foo.s bar.s
[...]/asm [...] foo.s bar.s
That is, the first run is only meant to generate symbol ABIs,
which are then used by the compiler.
We need to obfuscate at that first stage,
because the symbol ABI descriptions need to use obfuscated names.
However, having already obfuscated the assembly on the first stage,
there is no need to do so again on the second stage.
If we detect gensymabis is missing, we simply reuse the previous files.
This first situation doesn't seem racy,
but obfuscating the Go assembly files twice is certainly unnecessary.
Second, saveKnownReflectAPIs wrote a gob file to the build cache.
Since the build cache can be kept between builds,
and since the build cache uses reproducible paths for each build,
running the same "garble build" twice could overwrite those files.
This could actually cause races at the filesystem level;
if two concurrent builds write to the same gob file on disk,
one of them could end up using a partially-written file.
Note that this is the only of the three cases not using temporary files.
As such, it is expected that the file may already exist.
In such a case, we simply avoid overwriting it rather than failing.
Third, when "garble build -a" was used,
and when we needed an export file not listed in importcfg,
we would end up calling roughly:
go list -export -toolexec=garble -a <dependency>
This meant we would re-build and re-obfuscate those packages.
Which is unfortunate, because the parent process already did via:
go build -toolexec=garble -a <main>
The repeated dependency builds tripped the new os.O_EXCL check,
as we would try to overwrite the same obfuscated Go files.
Beyond being wasteful, this could again cause subtle filesystem races.
To fix the problem, avoid passing flags like "-a" to nested go commands.
Overall, we should likely be using safer ways to write to disk,
be it via either atomic writes or locked files.
However, for now, catching duplicate writes is a big step.
I have left a self-assigned TODO for further improvements.
CI on the pull request found a failure on test-gotip.
The failure reproduces on master, so it seems to be related to gotip,
and not a regression introduced by this change.
For now, disable test-gotip until we can investigate.
That is, use a very specific build tag and git commit,
and ensure that neither ends up in the binary.
Luckily, we have nothing to do here.
We were already removing _gomod_.go from the build entirely,
and that is still the mechanism that "go build" uses to bundle the data.
Note that the test will still work if git is not installed,
but it will simply not check the VCS side.
Finally, we use "go version -m" to check the existing fields,
which is easier than calling the Go APIs directly.
It seems like "go test" passes on yesterday's Go master, now.
So, enable test-gotip again with that commit hash.
Fixes#385.
In particular, internal/abi now has some actual code,
so obfuscating those literals was breaking as expected.
Document how to update the list in the future as well.
The change above gets "go test" to just one test failure on:
go version devel go1.18-578ada410d Tue Nov 9 22:58:24 2021 +0000 linux/amd64
We also move the doc about why we disable GarbleLiterals,
so that it's next to where the disabling happens.
While here, we also rename GarbleLiterals to ObfuscateLiterals,
as we have been trying to move away from "to garble" as a verb.
Finally, limit the verbosity of diffoscope.
One test was failing for me, and diffoscope printed thousands of lines.
Not particularly useful when I'm trying to skim test results.
Usually, seeing a few dozen lines of output is enough.
Updates #385.
Spotted by our friend "go test -race":
WARNING: DATA RACE
Write at 0x0000010522d8 by goroutine 69:
mvdan.cc/garble.readFile()
garble/main_test.go:124 +0x23a
mvdan.cc/garble.binsubstr()
garble/main_test.go:141 +0xc4
github.com/rogpeppe/go-internal/testscript.(*TestScript).run()
github.com/rogpeppe/go-internal@v1.8.1-0.20211023094830-115ce09fd6b4/testscript/testscript.go:496 +0x9e8
[...]
Previous write at 0x0000010522d8 by goroutine 60:
mvdan.cc/garble.readFile()
garble/main_test.go:124 +0x23a
mvdan.cc/garble.binsubstr()
garble/main_test.go:141 +0xc4
github.com/rogpeppe/go-internal/testscript.(*TestScript).run()
github.com/rogpeppe/go-internal@v1.8.1-0.20211023094830-115ce09fd6b4/testscript/testscript.go:496 +0x9e8
[...]
This wasn't a data race that we spotted via failures in practice,
as it only affected test code since July.
The race is due to the fact that each test script runs as a parallel
sub-test within the same Go program, sharing all globals.
As such, a single "cached binary" global is read and written with races.
Moreover, note that the caching always missed.
I briefly rewrote the code to avoid the race via a sync.Map keyed by
absolute filenames, and while that removed the data race,
the caching never actually hit.
To have a cache hit, we need an absolute path to already be in the cache
and for it to not have been modified since it was last cached. That is:
modify-bin-1 foo
binsubstr foo 'abc' # miss
binsubstr foo 'def' # hit; use the cached "/tmp/[...]/foo" entry
modify-bin-2 foo
binsubstr foo 'abc' # miss
However, the test scripts don't do contiguous binsubstr calls like
these. Instead, they join repeated binsubstr calls:
modify-bin-1 foo
binsubstr foo 'abc' 'def' # miss
modify-bin-2 foo
binsubstr foo 'abc' # miss
For that reason, remove the extra code entirely.
I didn't notice any change to the performance of "go test -short"
with a warm build cache, with:
go test -c
./garble.test -test.short #warm cache
benchcmd -n 5 TestShort ./garble.test -test.short
name old time/op new time/op delta
TestShort 4.62s ±12% 4.35s ±12% ~ (p=0.310 n=5+5)
name old user-time/op new user-time/op delta
TestShort 16.8s ± 3% 16.7s ± 3% ~ (p=0.690 n=5+5)
name old sys-time/op new sys-time/op delta
TestShort 7.28s ± 1% 7.26s ± 2% ~ (p=0.841 n=5+5)
name old peak-RSS-bytes new peak-RSS-bytes delta
TestShort 305MB ± 0% 306MB ± 0% ~ (p=0.421 n=5+5)
Finally, start using "go test -race" on Linux on CI,
which should have made the PR back in July red before merging.
In Go 1.18 runtime/internal/sys is replaced with internal/goarch. This
change makes sure that when running garble after the runtime package
change, runtime stripping correctly removes the unused import.
Also bump go-internal version to work with go1.18, but disable go1.18 in
CI tests.
We recently improved the automatic detection of reflection.
However, note that a hint may still be needed in some cases,
in particular when the reflection usage happens in a downstream package.
Keep making that suggestion.
Also mention that we obfuscate one package at a time.
I've also written down #406 with more ideas on how we could polish this
rough edge for end users in the future.
Fixes#403.
Obfuscating newName arguments of linkname directives
with dots in the importpath didn't work before.
We had a test which covers this,
but the corresponding package wasn't actually obfuscated.
Also update linkname directives of public packages,
to allow the package where something is linknamed to to be
obfuscated regardless.
Public packages can now depend on private packages.
We can now use pruned module graphs in go.mod files,
and we no longer need to worry about runtime/internal/sys.
Note that I had to update testdata/mod slightly,
as the new pruned module graphs algorithm downloads an extra go.mod file.
This change also paves the way towards future Go 1.18 support.
Thanks to lu4p for cleaning up two TODOs as well.
Co-Authored-By: lu4p <lu4p@pm.me>
The "reverse" command had many levels of optional arguments:
garble [garble flags] reverse [build flags] [package] [files]
This was pretty confusing,
and could easily lead to people running the command incorrectly:
# note that output.txt isn't a Go package!
garble reverse output.txt
Moreover, it made the handling of Go build flags pretty confusing.
Should the command below work?
garble reverse -tags=mytag
It also made it easy to not notice that one must supply the main package
to properly reverse some text that it produced, like a panic message.
With the package path being implicit,
one could mistakenly provide the wrong package by running garble
in a directory containing a different package.
See #394.
After the last commit, I started seeing seemingly random test failures:
--- FAIL: TestScripts/cgo (1.17s)
testscript.go:397:
> env GOPRIVATE=test/main
> garble build
[stderr]
# runtime/internal/math
EOF
# internal/bytealg
EOF
exit status 2
[exit status 1]
FAIL: testdata/scripts/cgo.txt:3: unexpected command failure
--- FAIL: TestScripts/reflect (8.63s)
testscript.go:397:
> env GOPRIVATE=test/main
> garble build
[stderr]
# math
EOF
exit status 2
[exit status 1]
FAIL: testdata/scripts/reflect.txt:3: unexpected command failure
--- FAIL: TestScripts/linkname (8.72s)
testscript.go:397:
> env GOPRIVATE=test/main
> garble build
[stderr]
# math
EOF
exit status 2
[exit status 1]
FAIL: testdata/scripts/linkname.txt:3: unexpected command failure
After some investigation,
it turned out that concurrent "garble build" processes
were writing to the same build cache paths at the same time.
This is effectively a filesystem race,
and encoding/gob could error when reading partly written files.
To fix this problem,
use a cache path that changes according to the obfuscated build.
See garbleExportFile.
Note that the data we store in that file does not vary with obfuscation.
We could fix the filesystem race by adding locking around the old path.
However, we'll want to cache data that does vary with garble flags,
such as the -debugdir source code.
Functions which use reflection on one of their parameters are,
now added to knownReflectAPIs automatically.
This makes most explicit hints for reflection redundant.
Simple protobuf code now works correctly when obfuscated.
Fixes#162Fixes#373
Now that Go 1.17.x is out,
we no longer need to worry about users on Go 1.15.x.
Since Go 1.16, the best way to install programs has been "go install":
https://golang.org/doc/go1.16#go-command
This method does not interfere with the current module,
and allows selecting a version such as "latest" or "master".
Also bump the gotip commit to the first after the merge window was open,
which simply bumps the internal version from 1.17 to 1.18.
We're leaving GO_COMMIT there for now,
as the 1.18 merge window is currently very noisy with generics changes.
Before, we would just notice direct calls to reflect's TypeOf and
ValueOf. Any other uses of reflection, such as encoding/json or
google.golang.org/protobuf, would require hints as documented by the
README.
Issue #162 outlines some ways we could fix this issue in a general way,
automatically detecting what functions use reflection on their parameters,
even for third party API funcs.
However, that goal is pretty significant in terms of code and effort.
As a temporary improvement, we can expand the list of "known" reflection
APIs via a static table.
Since this table is keyed by "func full name" strings, we could
potentially include third party APIs, such as:
google.golang.org/protobuf/proto.Marshal
However, for now simply include all the std APIs we know about.
If we fail to do the proper fix for automatic detection in the future,
we can then fall back to expanding this global table for third parties.
Update the README's docs, to clarify that the hint is not always
necessary anymore.
Also update the reflect.txt test to stop using the hint for encoding/json,
and to also start testing text/template with a method call.
While at it, I noticed that we weren't testing the println outputs,
as they'd go to stderr - fix that too.
Updates #162.
Go 1.16 just added one flag, -overlay, as documented in
https://golang.org/doc/go1.16#go-command.
We forgot to update these tables for 1.16, though it probably doesn't
matter as this flag is pretty much only used by gopls.
We're just a couple of weeks away from the final release, so
double-check that we still work well with tip.
Update x/tools as well, as it has had some minor fixes for Go 1.17.
Otherwise, the added test case would fail, as we don't modify the C code
and so there would be a name mismatch.
In the far future we might start modifying Go names in C code,
similar to what we did for Go assembly,
but right now that seems out of scope and too complex.
An easier fix is to simply record those (hopefully few) names in ignoreObjects.
While at it, recordReflectArgs started to really outgrow its name, as it
also collected expressions used as constants for literal obfuscation.
Give it a better name.
Fixes#366.
I spent a couple of days trying to obfuscate all of std.
Ultimately I failed at making it fully work,
especially when it comes to the runtime package,
but I did fix a few problems along the way, as seen here.
First, fix the TODO to allow handleDirectives and transformGo to run on
runtime packages as well, if they are considered private. Note that this
is never true right now, but it matters once we remove runtimeRelated.
Second, modify parsedebugvars in a way that doesn't break typechecking.
We can remove AST nodes or even modify them in simple ways,
but if we add new AST nodes after typechecking,
those will lack type information.
We were replacing the entire body, running into that problem.
Instead, carefully cut the body to set some defaults,
but remove everything from the point GODEBUG is read.
Finally, add commented-out debug prints of transformed asm files.
For #193.
With the -literals flag, we try to convert some const declarations to
vars, as long as that doesn't break typechecking. We really only do that
for typed constant strings, really.
There was a quirk: if a numerical constant had a type and used iota, we
would not obfuscate its value, but we would still convert the
declaration from const to var. Since iotas only work within const
declarations, that would break compilation:
> garble -literals build
[stderr]
# test/main
FeWE3zwi.go:19: undefined: iota
exit status 2
To fix the problem, make the logic more conservative: only obfuscate
constant declarations where the values are typed strings, meaning that
any numerical constants are left entirely untouched.
This fixes the build of google.golang.org/protobuf/runtime/protoiface
with -literals turned on.
Back in early April we added initial support for Go 1.17,
working on a commit from master at that time. For that to work, we just
needed to add a couple of packages to runtimeRelated and tweak printFile
a bit to not break the new "//go:build" directives.
A significant amount of changes have landed since, though, and the tests
broke in multiple ways.
Most notably, the new register ABI is enabled by default for GOOS=amd64.
That affected garble indirectly in two ways: there's a new internal
package to add to runtimeRelated, and we must make reverse.txt more
clever in making its output constant across ABIs.
Another noticeable change is that Go 1.17 changes how its own version is
injected into the runtime package. It used to be via a constant in
runtime/internal/sys, such as:
const TheVersion = `devel ...`
Since we couldn't override such constants via the linker's -X flag,
we had to directly alter the declaration while compiling.
Thankfully, Go 1.17 simply uses a "var buildVersion string" in the
runtime package, and its value is injected by the linker.
This means we can now override it with the linker's -X flag.
We make the code to alter TheVersion for Go 1.16 a bit more clever,
to not break the package when building with Go 1.17.
Finally, our hack to work around ambiguous TOOLEXEC_IMPORTPATH values
now only kicks in for non-test packages, since Go 1.17 includes our
upstream fix. Otherwise, some tests would end up with the ".test"
variant suffix added a second time:
test/bar [test/bar.test] [test/bar [test/bar.test].test]
All the code to keep compatibility with Go 1.16.x remains in place.
We're still leaving TODOs to remind ourselves to remove it or simplify
it once we remove support for 1.16.x.
The 1.17 development freeze has already been in place for a month,
and beta1 is due to come this week, so it's unlikely that Go will change
in any considerable way at this point. Hence, we can say that support
for 1.17 is done.
Fixes#347.
Our recent work in fieldToAlias worked well when the embedded field
declaration (using an alias) was in the same package as the use of that
field. We would have the *ast.Ident for the field declaration, so
types.Info.Uses would give us the TypeName for the alias.
Unfortunately, if the declaration was in a dependency package, we did
not have that same *ast.Ident, as we weren't parsing the source code for
dependencies for type-checking. This resulted in us incorrectly
obfuscating the use of such an embedded field:
> garble build
[stderr]
# test/main
JtzmzxWf.go:4: unknown field 'ExternalForeignAlias' in struct literal of type _BdSNiEL.Vcs_smer
To fix this, look through the direct imports of the package defining the
field to find an alias under the exact same name. Not a foolproof
solution, as there's a TODO, but it should work for most cases.
Fixes the obfuscation of google.golang.org/grpc/internal/status, too.
Updates #349.
The added test case, which is obfuscating and linking os/user, would fail
before this fix:
> garble build
[stderr]
# test/main
/usr/lib/go/pkg/tool/linux_amd64/link: running gcc failed: exit status 1
/usr/bin/ld: $WORK/.tmp/go-link-073246656/go.o: in function `Chz0Yfs2._cgo_cmalloc':
go.go:(.text+0x993cc): undefined reference to `Chz0Yfs2.runtime_throw'
/usr/bin/ld: $WORK/.tmp/go-link-073246656/go.o: in function `Chz0Yfs2.tDfhQ8uK':
go.go:(.text+0x99801): undefined reference to `Chz0Yfs2._cgo_runtime_gostring'
/usr/bin/ld: go.go:(.text+0x9982a): undefined reference to `Chz0Yfs2._cgo_runtime_gostring'
/usr/bin/ld: go.go:(.text+0x99853): undefined reference to `Chz0Yfs2._cgo_runtime_gostring'
collect2: error: ld returned 1 exit status
The reason is that we would alter the linkname directives of cgo-generated
code, but we would not obfuscate the code itself at all.
The generated code would end up being transformed into:
//go:linkname zh_oKZIy runtime.throw
func runtime_throw(string)
One can clearly see the error there; handleDirectives obfuscated the
local linkname name, but since transformGo didn't run, the actual Go
declaration was not obfuscated in the same way. Thus, the linker fails
to find a function body for runtime_throw, and fails.
The solution is simple: handleDirectives assumes that it's running on
code being obfuscated, so only run it when transformGo is running.
We can also remove the cgo skip check in handleDirectives, as it never
runs on cgo-generated code now.
Fixes a number of build errors that have been noticed since
907aebd770.
When such an alias name was used to define an embedded field, we handled
that case gracefully via the code using:
tf.info.Uses[node].(*types.TypeName)
Unfortunately, when the same field name was used elsewhere, such as a
composite literal, tf.Info.Uses gave us a *types.Var, not a
*types.TypeName, meaning we could no longer tell if this was an alias,
or what it pointed to.
Thus, we failed to obfuscate the name properly in the added test case:
> garble build
[stderr]
# test/main/sub
xxWZf66u.go:36: unknown field 'foreignAlias' in struct literal of type smhWelwn
It doesn't seem like any of the go/types APIs allows us to obtain the
*types.TypeName directly in this scenario. Thus, use a trick that we
used before: after typechecking, but before obfuscating, record all
embedded struct field *types.Var which are aliases via a map, where the
value holds the *types.TypeName for the alias.
Updates #349.
We already added support for "//go:embed" with string and []byte,
by not obfuscating the "embed" import path.
However, embed.FS was still failing:
> garble build
[stderr]
# test/main
:13: go:embed cannot apply to var of type embed.WtKNvwbN
The compiler detects the type by matching its name to exactly "embed.FS",
so don't obfuscate the name "FS" either.
While at it, ensure that the embed code behaves the same with "go build".
Updates #349.
See the added comment for the rationale. For that same reason, I always
build Go itself via "GOGC=off ./make.bash", as it's noticeably faster.
Before this change:
$ go clean -cache && go test -short
PASS
ok mvdan.cc/garble 35.298s
$ go test -short
PASS
ok mvdan.cc/garble 2.703s
With the change:
$ go clean -cache && go test -short
PASS
ok mvdan.cc/garble 25.323s
$ go test -short
PASS
ok mvdan.cc/garble 2.469s
Incremental test runs with a warm cache are largely unaffected, as those
would run very few of those short-lived and allocation-heavy programs.
However, when the build cache isn't warm (such as when garble itself is
modified), we easily see savings of 20-30%.
We might revisit this in the future if Go's GC gets better in these
situations, which should make "go build" faster. For now, we run our
tests very often, so having them burn a bit less CPU is nice.
Before, "garble build -h" would print the same as "garble -h", which is
too much and confusing, as it doesn't tell us much about "build".
Now it's far better, and includes the output of "go build -h":
$ garble build -h
usage: garble [garble flags] build [arguments]
This command wraps "go build". Below is its help:
usage: go build [-o output] [build flags] [packages]
Run 'go help build' for details.
We do the same for "garble reverse -h", since it doesn't wrap a Go tool
command.
I added a second one recently without noticing, since the original was
too far below and I had written commands above it.
Deduplicate them, and put the new env line as early as possible, to
prevent further issues and confusion.
Every now and then, a CI run would fail:
FAIL: testdata/scripts/reflect.txt:7: unexpected match for ["main.go"] in main
These were rare, and very hard to reproduce or debug.
My best guess is that, since "main.go" is a short string and we use
random eight-character obfuscated filenames ending with ".go", it was
possible that the random filename happened to end in "main" in some
cases.
Given the base64 encoding, the chances of a single suffix collision are
about 0.000006%. Note, however, that a single obfuscated build will most
likely obfuscate many filenames, especially for the tests obfuscating
multiple packages. For a single CI run with many tests across three OSs,
the chances of any collision are likely very low, but realistic.
All this has a simple fix: use longer filenames to match with. We choose
"garble_main.go" since it's long enough, but also because it's still
clear it's a "main" Go file, and it's very unlikely to cause conflicts
with filenames in upstream Go given the "garble_" prefix.