Up until now, the new SSA reflection detection relied on call sites
to propagate which objects (named types, struct fields) used reflection.
For example, given the code:
json.Marshal(new(T))
we would first record that json.Marshal calls reflect.TypeOf,
and then that the user's code called json.Marshal with the type *T.
However, this would not catch a slight variation on the above:
var t T
reflect.TypeOf(t)
t.foo = struct{bar int}{}
Here, type T's fields such as "foo" and "bar" are not obfuscated,
since our logic sees the call site and marks the type T recursively.
However, the unnamed `struct{bar int}` type was still obfuscated,
causing errors such as:
cannot use struct{uKGvcJvD24 int}{} (value of type struct{uKGvcJvD24 int}) as struct{bar int} value in assignment
The solution is to teach the analysis about *ssa.Store instructions
in a similar way to how it already knows about *ssa.Call instructions.
If we see a store where the destination type is marked for reflection,
then we mark the source type as well, fixing the bug above.
This fixes obfuscating github.com/gogo/protobuf/proto.
A number of other Go modules fail with similar errors,
and they look like very similar bugs,
but this particular fix doesn't apply to them.
Future incremental fixes will try to deal with those extra cases.
Fixes#685.
Update CI to use a newer version of Go master,
now that we're already getting release candidates.
Look at the diffs between Go 1.20 and master of `go help build`
and `go help testflag`, and add two flags that were recently added.
While here, bump a hopeful TODO for a feature request,
since that one definitely did not happen for 1.21.
First, teach scripts/gen-go-std-tables.sh to omit test packages,
since runtime/metrics_test would always result in an error.
Instead, make transformLinkname explicitly skip that package,
leaving a comment about a potential improvement if needed.
Second, the only remaining "not found" error we had was "maps" on 1.20,
so rewrite that check based on ImportPath and GoVersionSemver.
Third, detect packages with the "exclude all Go files" error
by looking at CompiledGoFiles and IgnoredGoFiles, which is less brittle.
This means that we are no longer doing any filtering on pkg.Error.Err,
which means we are less likely to break with Go error message changes.
Fourth, the check on pkg.Incomplete is now obsolete given the above,
meaning that the CompiledGoFiles length check is plenty.
Finally, stop trying to be clever about how we print errors.
Now that we're no longer skipping packages based on pkg.Error values,
printing pkg.DepsErrors was causing duplicate messages in the output.
Simply print pkg.Error values with only minimal tweaks:
including the position if there is any, and avoiding double newlines.
Overall, this makes our logic a lot less complicated,
and garble still works the way we want it to.
computeLinkerVariableStrinsg had an unusedargument.
Only skip obfuscating the name "FS" in the "embed" package.
The reflect methods no longer use the transformer receiver type,
so that TODO now feels unnecessary. Many methods need to be aware
of what the current types.Package is, and that seems reasonable.
We no longer use writeFileExclusive for our own cache on disk,
so the TODO about using locking or atomic writes is no longer relevant.
Our cache is now robust against different Go package build inputs
which result in exactly the same build outputs.
In the past, this caused Go's cache to be filled but not ours.
In the present, our cache is just as resilient as Go's is.
This means we now have a unified cache directory for garble,
which is now documented in the README.
I considered using the same hash-based cache used for pkgCache,
but decided against it since that cache implementation only stores
regular files without any executable bits set.
We could force the cache package to do what we want here,
but I'm leaning against it for now given that single files work OK.
Some users had been running into "cannot load cache entry" errors,
which could happen if garble's cache files in GOCACHE were removed
when Go's own cache files were not.
Now that we've moved to our own separate cache directory,
and that we've refactored the codebase to depend less on globals
and no longer assume that we're loading info for the current package,
we can now compute a pkgCache entry for a dependency if needed.
We add a pkgCache.CopyFrom method to be able to append map entries
from one pkgCache to another without needing an encoding/gob roundtrip.
We also add a parseFiles helper, since we now have three bits of code
which need to parse a list of Go files from disk.
Fixes#708.
loadPkgCache also uses different pkgCache variables
depending on whether it finds a direct cache hit or not.
Now we only initialize an entirely new pkgCache
with the first two ReflectAPIs entries when we get a direct cache miss,
since a direct cache hit will already load those from the cache.
It is true that each garble process only obfuscates up to one package,
which is why we made them globals to begin with.
However, garble does quite a lot more now,
such as reversing the obfuscation of many packages at once.
Having a global "current package" variable makes mistakes easier.
Some funcs, like those in transformFuncs, are now transformer methods.
We've been obfuscating all linknamed names for a while now,
so the part in the docs about "recording" is no longer true.
All it does is transform the directives to use obfuscated names.
Give it a better name and rewrite the docs.
The name and docs on that func were wildly out of date,
since it no longer has anything to do with reflection at all.
We only use the linkerVariableStrings map with -literals,
so we can avoid the call entirely if the flag isn't set.
Neither of them has anything to do with transforming Go code;
they simply load or compute the information necessary for doing so.
Split typecheck into two functions as well.
The new typecheck function only does typechecking and nothing else.
A new comptueFieldToStruct func fills the fieldToStruct map,
which depends on typecheck, but is not needed when computing pkgCache.
This isolation also forces us to separate the code that fills pkgCache
from the code that fills the in-memory-only maps in transformer,
removing the need for the NOTE that we had left around as a reminder.
That is, stop reusing "transformer" as the receiver on methods,
and stop writing the results to the global curPkgCache struct.
Soon we will need to support computing pkgCache for any dependency,
not just the current package, to make the caching properly robust.
This allows us to fill reflectInspector with different values.
The explicit isolation also helps prevent bugs.
For instance, we were calling recursivelyRecordAsNotObfuscated from
transformCompile, which happens after we have loaded or saved pkgCache.
Meaning, the current package sees a larger pkgCache than its dependents.
In this particular case it wasn't causing any bugs,
since the two reflect types in question only had unexported fields,
but it's still good to treat pkgCache as read-only in transformCompile.
To properly make our cache robust, we'll need to be able to compute
cache entries for dependencies as needed if they are missing.
So we'll need to create more of these struct values in the code.
Rename cachedOutput to curPkgCache, to clarify that it relates
to the current package.
While here, remove the "known" prefix on all pkgCache fields.
All of the names still make perfect sense without it.
Per the TODOs that I left myself in the last commit.
As expected, this change allows tidying up the code a bit,
makes our use of caching a bit more consistent,
and also allows us to load the current package from the cache.
For each Go package we obfuscate, we need to store information about
how we obfuscated it, which is needed when obfuscating its dependents.
For example, if A depends on B to use the type B.Foo, A needs to know
whether or not B.Foo was obfuscated; it depends on B's use of reflect.
We record this information in a gob file, which is cached on disk.
To avoid rolling our own custom cache, and since garble is so closely
connected with cmd/go already, we piggybacked off of Go's GOCACHE.
In particular, for each build cache entry per `go list`'s Export field,
we would store a "garble" sibling file with that gob content.
However, this was brittle for two reasons:
1) We were doing this without cmd/go's permission or knowledge.
We were careful to use filename suffixes similar to Export files,
meaning that `go clean` and other commands would treat them the same.
However, this could confuse cmd/go at any point in the future.
2) cmd/go trims cache entries in GOCACHE regularly, to keep the size of
the build and test caches under control. Right now, this means that
every 24h, any file not accessed in the last five days is deleted.
However, that trimming heuristic is done per-file.
If the trimming removed Garble's sibling file but not the original
Export file, this could cause errors such as
"cannot load garble export file" which users already ran into.
Instead, start using github.com/rogpeppe/go-internal/cache,
an exported copy of cmd/go's own cache implementation for GOCACHE.
Since we need an entirely separate directory, we introduce GARBLE_CACHE,
defaulting to the "garble" directory inside the user's cache directory.
For example, on Linux this would be ~/.cache/garble.
Inside GARBLE_CACHE, our gob file cache will be under "build",
which helps clarify that this cache is used when obfuscating Go builds,
and allows placing other kinds of caches inside GARBLE_CACHE.
For example, we already have a need for storing linker binaries,
which for now still use their own caching mechanism.
This commit does not make our cache properly resistant to removed files.
The proof is that our seed.txtar testscript still fails the second case.
However, we do rewrite all of our caching logic away from Export files,
which in itself is a considerable refactor, and we add a few TODOs.
One notable change is how we load gob files from dependencies
when building the cache entry for the current package.
We used to load the gob files from all packages in the Deps field.
However, that is the list of all _transitive_ dependencies.
Since these gob files are already flat, meaning they contain information
about all of their transitive dependencies as well, we need only load
the gob files from the direct dependencies, the Imports field.
Performance is largely unchanged, since the behavior is similar.
However, the change from Deps to Imports saves us some work,
which can be seen in the reduced mallocs per obfuscated build.
It's unclear why the binary size isn't stable.
When reverting the Deps to Imports change, it then settles at 5.386Mi,
which is almost exactly in between the two measurements below.
I'm not sure why, but that metric appears to be slightly unstable.
goos: linux
goarch: amd64
pkg: mvdan.cc/garble
cpu: AMD Ryzen 7 PRO 5850U with Radeon Graphics
│ old │ new │
│ sec/op │ sec/op vs base │
Build-8 11.09 ± 1% 11.08 ± 1% ~ (p=0.796 n=10)
│ old │ new │
│ bin-B │ bin-B vs base │
Build-8 5.390Mi ± 0% 5.382Mi ± 0% -0.14% (p=0.000 n=10)
│ old │ new │
│ cached-sec/op │ cached-sec/op vs base │
Build-8 415.5m ± 4% 421.6m ± 1% ~ (p=0.190 n=10)
│ old │ new │
│ mallocs/op │ mallocs/op vs base │
Build-8 35.43M ± 0% 34.05M ± 0% -3.89% (p=0.000 n=10)
│ old │ new │
│ sys-sec/op │ sys-sec/op vs base │
Build-8 5.662 ± 1% 5.701 ± 2% ~ (p=0.280 n=10)
This is in preparation for the switch to Go's cache package,
whose ActionID type is also a full sha256 hash with 32 bytes.
We were using "short" hashes as shown by `go tool buildid`,
since that was consistent and 15 bytes was generally enough.
First, rename "component" to "hash", since it's shorter and more useful.
A full build ID is two or four hashes joined with slashes.
Second, add sanity checks that buildIDHashLength is being followed.
Otherwise the use of []byte could lead to human error.
Third, move all the hash encoding and decoding logic together.
We first called the typecheck method, which starts filling cachedOutput
with information from the current package, and later we would load the
gob files for all dependencies via loadCachedOutputs.
This was a bit confusing; instead, load the cached gob files first,
and then do all the operations which fill information for curPkg.
Similarly, we were waiting until the very end of transformCompile to
write curPkg's cachedOutput gob file to the disk cache.
We can write the file at an earlier point, before we have obfuscated and
re-printed all Go files for the current package.
We can also write the file before other work like processImportCfg.
None of these changes should affect garble's behavior,
but they will make the cache redesign for #708 easier.
See the added comment and test, which failed before the fix when
the encoded certificate was decoded.
It took a while to find the culprit in asn1, but in hindsight it's easy:
case reflect.Slice:
if t.Elem().Kind() == reflect.Uint8 {
return false, TagOctetString, false, true
}
if strings.HasSuffix(t.Name(), "SET") {
return false, TagSet, true, true
}
return false, TagSequence, true, true
Fixes#711.
printOneCgoTraceback now returns a boolean rather than an int.
Since we need to have different logic based on the Go version,
and toolchainVersionSemver was only set for the main process,
move the string to the shared cache global.
This is a nice thing to do anyway, to reduce the number of globals.
While here, update actions/setup-go to v4, which starts caching
GOMODCACHE and GOCACHE by default now.
Disable it, because it still doesn't help in our case,
and GitHub's Actions caching is still really inefficient.
And update staticcheck too.
The seedFlag.random field had never worked,
as my refactor in December 2021 never set it to true.
Even if the boolean was working, we only printed the random seed
when we failed. It's still useful to see it when a build succeeds,
for example when wanting to reproduce the same binary
or when wanting to reverse a panic from the produced binary.
Add a test this time.
Fixes#696.
Added in Go 1.19, types like sync/atomic.Uint64 are handy,
because they ensure proper alignment even on 32-bit GOOSes.
However, this was done via a magic `type align64 struct{}`,
which the compiler spotted by name.
To keep that magic working, do not obfuscate the name.
Neither package path was being obfuscated,
as both packages contain compiler intrinsics already.
Fixes#686.
We're building the linker binary for the host GOOS,
not the target GOOS that we happen to be building for.
I noticed that, after running `go test`, my garble cache
would contain both link and link.exe, which made no sense
as I run linux and not windows.
`go env` has GOHOSTOS to mirror GOOS, but there is no
GOHOSTEXE to mirror GOEXE, so we reconstruct it from
runtime.GOOS, which is equivalent to GOHOSTOS.
Add a regression test as well.
When we use `go list` on the standard library, we need to be careful
about what flags are passed from the top-level build command,
because some flags are not going to be appropriate.
In particular, GOFLAGS=-modfile=... resulted in a failure,
reproduced via the GOFLAGS variable added to linker.txtar:
go: inconsistent vendoring in /home/mvdan/tip/src:
golang.org/x/crypto@v0.5.1-0.20230203195927-310bfa40f1e4: is marked as explicit in vendor/modules.txt, but not explicitly required in go.mod
golang.org/x/net@v0.7.0: is marked as explicit in vendor/modules.txt, but not explicitly required in go.mod
golang.org/x/sys@v0.5.1-0.20230208141308-4fee21c92339: is marked as explicit in vendor/modules.txt, but not explicitly required in go.mod
golang.org/x/text@v0.7.1-0.20230207171107-30dadde3188b: is marked as explicit in vendor/modules.txt, but not explicitly required in go.mod
To ignore the vendor directory, use -mod=readonly or -mod=mod.
To sync the vendor directory, run:
go mod vendor
To work around this problem, reset the -mod and -modfile flags when
calling "go list" on the standard library, as those are the only two
flags which alter how we load the main module in a build.
The code which builds a modified cmd/link has a similar problem;
it already reset GOOS and GOARCH, but it could similarly run into
problems if other env vars like GOFLAGS were set.
To be on the safe side, we also disable GOENV and GOEXPERIMENT,
which we borrow from Go's bootstrapping commands.
At linker stage, we now encrypt funcInfo.entryoff value with a simple algorithm (1 xor + 1 mul).
This makes it harder to relate function metadata (e.g. name) to function itself in binary, almost without affecting performance.
The test package hack still appears to be needed as of Go 1.20.
Interestingly, the cgo filename check does not trigger at all anymore.
Presumably this means that it's not needed at all, and obfuscating code
generated by cgo appears to be fine. Go with that for now.
It assumes that reflect.Value has a field named "flag",
which wasn't the case with obfuscated builds as we obfuscated it.
We already treated the reflect package as special,
for instance when not obfuscating Method or MethodByName.
In a similar fashion, mark reflect's rtype and Value types to not be
obfuscated alongside their field names. Note that rtype is the
implementation behind the reflect.Type interface.
This fix is fairly manual and repetitive.
transformCompile, transformLinkname, and transformAsm should all
use the same mechanism to tell if names should be obfuscated.
However, they do not do that right now, and that refactor feels too
risky for a bugfix release. We add more TODOs instead.
We're not adding go-spew to scripts/check-third-party.sh since the
project is largely abandoned. It's not even a Go module yet.
The only broken bit from it is what we've added to our tests.
Fixes#676.
Go contains such inline comments, like:
crypto/aes/gcm_ppc64x.s:129: VPMSUMD IN, HL, XL // H.lo·H.lo
We didn't notice since these comments were somewhat rare.
While here, "//" does not need to be followed by a space.
The code turns out to be pretty easy with strings.Cut.
Fixes#672.
`go help build` now has -C, -cover, -coverpkg, and -pgo.
There are no new boolean flags, but note that -cover is now a common
build flag rather than just a test flag.
While here, sort the lists so that they are easier to skim for missing
items in the future.
Go master, the upcoming Go 1.21, has had its merge window open for over
two weeks at this point, and it seems calmer at this point.
ALso update staticcheck to its latest release, which supports Go 1.20.
This is not common, but it is done by a few projects.
Namely, github.com/goccy/go-json reached into reflect's guts,
which included a number of methods:
internal/runtime/rtype.go
11://go:linkname rtype_Align reflect.(*rtype).Align
19://go:linkname rtype_FieldAlign reflect.(*rtype).FieldAlign
27://go:linkname rtype_Method reflect.(*rtype).Method
35://go:linkname rtype_MethodByName reflect.(*rtype).MethodByName
[...]
Add tests for such go:linkname directives pointing at methods.
Note that there are two possible symbol string variants;
"pkg/path.(*Receiver).method" for methods with pointer receivers,
and "pkg/path.Receiver.method" for the rest.
We can't assume that the presence of two dots means a method either.
For example, a package path may be "pkg/path.with.dots",
and so "pkg/path.with.dots.SomeFunc" is the function "SomeFunc"
rather than the method "SomeFunc" on a type "dots".
To account for this ambiguity, rather than splitting on the last dot
like we used to, try to find a package path prefix by splitting on an
increasing number of first dots.
This can in theory still be ambiguous. For example,
we could have the package "pkg/path" expose the method "foo.bar",
and the package "pkg/path.foo" expose the func "bar".
Then, the symbol string "pkg/path.foo.bar" could mean either of them.
However, this seems extremely unlikely to happen in practice,
and I'm not sure that Go's toolchain would support it either.
I also noticed that goccy/go-json still failed to build after the fix.
The reason was that the type reflect.rtype wasn't being obfuscated.
We could, and likely should, teach our assembly and linkname
transformers about which names we chose not to obfuscate due to the use
of reflection. However, in this particular case, reflect's own types
can be obfuscated safely, so just do that.
Fixes#656.
Declare the ast.GenDecl node once,
which allows us to deduplicate and inline code.
resolveImportSpec doesn't need to be separate either.
We also don't need explicit panics for unexpected nils;
we can rely on nil checks inserted by the compiler.
Finally, move the bulk of the code outside of the scope.Names loop,
which unindents most of the logic.
While here, add a few more inline docs.
garble's -literals flag and its patching of the runtime may leave unused imports.
We used to try to detect those and remove the imports,
but that was still buggy with edge cases like dot imports or renamed imports.
Moreover, it was potentially incorrect.
Completely removing an import from a package means we don't run its init funcs,
which could have side effects changing the behavior of a program.
As an example, database/sql drivers are registered at init time.
Instead, for each import in an obfuscated Go file,
add an unnamed declaration which references the imported package.
This may not be necessary for all imported packages,
as only a minority become unused due to garble,
but it's also relatively harmless to do so.
Fixes#658.
We obfuscate import paths as well as their declared names.
The compiler treats some packages and APIs in special ways,
and the way it detects those is by looking at import paths and names.
In the past, we have avoided obfuscating some names like embed.FS or
reflect.Value.MethodByName for this reason. Otherwise,
go:embed or the linker's deadcode elimination might be broken.
This matching by path and name also happens with compiler intrinsics.
Intrinsics allow the compiler to rewrite some standard library calls
with small and efficient assembly, depending on the target GOARCH.
For example, math/bits.TrailingZeros32 gets replaced with ssa.OpCtz32,
which on amd64 may result in using the TZCNTL instruction.
We never noticed that we were breaking many of these intrinsics.
The intrinsics for funcs declared in the runtime and its dependencies
still worked properly, as we do not obfuscate those packages yet.
However, for other packages like math/bits and sync/atomic,
the intrinsics were being entirely disabled due to obfuscated names.
Skipping intrinsics is particularly bad for performance,
and it also leads to slightly larger binaries:
│ old │ new │
│ bin-B │ bin-B vs base │
Build-16 5.450Mi ± ∞ ¹ 5.333Mi ± ∞ ¹ -2.15% (p=0.029 n=4)
Finally, the main reason we noticed that intrinsics were broken
is that apparently GOARCH=mips fails to link without them,
as some symbols end up being not defined at all.
This patch fixes builds for the MIPS family of architectures.
Rather than building and linking all of std for every GOARCH,
test that intrinsics work by asking the compiler to print which
intrinsics are being applied, and checking that math/bits gets them.
This fix is relatively unfortunate, as it means we stop obfuscating
about 120 function names and a handful of package paths.
However, fixing builds and intrinsics is much more important.
We can figure out better ways to deal with intrinsics in the future.
Fixes#646.
Go's package runtime/internal/atomic contains references to functions
which refer to the current package by its package name alone:
src/runtime/internal/atomic/atomic_loong64.s: JMP atomic·Load(SB)
src/runtime/internal/atomic/atomic_loong64.s: JMP atomic·Load64(SB)
src/runtime/internal/atomic/atomic_loong64.s: JMP atomic·Load64(SB)
src/runtime/internal/atomic/atomic_mips64x.s: JMP atomic·Load(SB)
src/runtime/internal/atomic/atomic_mips64x.s: JMP atomic·Load64(SB)
src/runtime/internal/atomic/atomic_mips64x.s: JMP atomic·Load64(SB)
We could only handle unqualified or fully qualified references, like:
JMP ·Load64(SB)
JMP runtime∕internal∕atomic·Load64(SB)
Apparently, all three forms are equally valid.
Add a test case and fix it.
I checked whether referencing an imported package by its name worked;
it does not seem to be the case.
This feature appears to be restricted to the current package alone.
While here, we only need goPkgPath when we need to call listPackage.
Fixes#619.
To detect if an object is global, our previous code did:
obj.Pkg().Scope().Lookup(obj.Name()) == obj
However, that Lookup call is unnecessary.
It is a map lookup, which is cheap, but not free.
Instead, we can compare the scopes directly:
obj.Pkg().Scope() == obj.Parent()
While here, reuse the calls to obj.Pkg(),
and avoid using fmt.Sprintf in this hot path.
Looked here since perf on Linux with a garble build
reports that we spend about 2.6% of CPU time in recordedObjectString.
Overall provides a very minor improvement:
name old time/op new time/op delta
Build-16 21.8s ± 1% 21.7s ± 0% ~ (p=0.200 n=4+4)
name old bin-B new bin-B delta
Build-16 5.67M ± 0% 5.67M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 734ms ± 3% 722ms ± 3% ~ (p=0.486 n=4+4)
name old mallocs/op new mallocs/op delta
Build-16 25.0M ± 0% 24.7M ± 0% -1.28% (p=0.029 n=4+4)
name old sys-time/op new sys-time/op delta
Build-16 17.0s ± 1% 16.8s ± 1% ~ (p=0.200 n=4+4)
github.com/bytedance/sonic has many comments in its assembly files,
and one in particular caused us trouble:
internal/rt/asm_amd64.s:2:// Code generated by asm2asm, DO NOT EDIT·
Since we looked for "middle dot" characters anywhere,
we would try to load the package "EDIT", which clearly does not exist.
To fix that, start skipping over comments. Thankfully,
Go's assembly syntax only supports "//" line comments,
so it's enough to read one line at a time and skip them quickly.
We also longer need a regular expression to find #include lines.
While here, two other minor bits of cleanup.
First, rename transformGo to transformGoFile, for clarity.
Second, use ".s" extensions for obfuscated assembly filenames,
just like we do with the comiler and ".go" files.
Updates #621.
The patch to the linker does this when generating the pclntab,
which is the binary section containing func names.
When `-tiny` is being used, we look for unexported funcs,
and set their names to the offset `0` - a shared empty string.
We also avoid including the original name in the binary,
which saves a significant amount of space.
The following stats were collected on GOOS=linux,
which show that `-tiny` is now about 4% smaller:
go build 1203067
garble build 782336
(old) garble -tiny build 688128
(new) garble -tiny build 659456