Commit Graph

541 Commits (89b27fa7f944643333f013d6a4d18a6852246c55)
 

Author SHA1 Message Date
Daniel Martí 89b27fa7f9 tweaks thanks to code coverage info
The test package hack still appears to be needed as of Go 1.20.

Interestingly, the cgo filename check does not trigger at all anymore.
Presumably this means that it's not needed at all, and obfuscating code
generated by cgo appears to be fine. Go with that for now.
1 year ago
Daniel Martí e37f39054d update testscript to support code coverage with Go 1.20
See https://github.com/rogpeppe/go-internal/pull/201.
1 year ago
Daniel Martí 859a5877c9 internal/literals: re-enable the seed obfuscator
Now that we dropped Go 1.19, we can use it again,
since the referenced bug was fixed in Go 1.20.

This is a separate commit, as this change does alter the way we
obfuscate Go builds, so it's not just a cleanup.
1 year ago
Daniel Martí 658060851d drop bits of code to support Go 1.19 1 year ago
Daniel Martí b322876efe drop support for Go 1.19
Now that we're done with garble v0.9.x,
v0.10 will only support Go 1.20 as a minimum version.
1 year ago
pagran 9a8608f061
internal/literals: add benchmark to measure the run-time overhead 1 year ago
Daniel Martí 89facf1648 CHANGELOG: prepare for v0.9.3 1 year ago
Daniel Martí 5effe20c19 avoid breaking github.com/davecgh/go-spew
It assumes that reflect.Value has a field named "flag",
which wasn't the case with obfuscated builds as we obfuscated it.

We already treated the reflect package as special,
for instance when not obfuscating Method or MethodByName.
In a similar fashion, mark reflect's rtype and Value types to not be
obfuscated alongside their field names. Note that rtype is the
implementation behind the reflect.Type interface.

This fix is fairly manual and repetitive.
transformCompile, transformLinkname, and transformAsm should all
use the same mechanism to tell if names should be obfuscated.
However, they do not do that right now, and that refactor feels too
risky for a bugfix release. We add more TODOs instead.

We're not adding go-spew to scripts/check-third-party.sh since the
project is largely abandoned. It's not even a Go module yet.
The only broken bit from it is what we've added to our tests.

Fixes #676.
1 year ago
pagran b0f8dfb409
prevent "git apply" from walking parent directories
Even when setting git's current directory to a temporary directory,
it could find a git repository in a parent directory and still malfunction.
Use the --git-dir flag to ensure that walking doesn't happen at all.

While here, ensure that "git apply" is always applying our patches,
and add a regression test to linker.txtar when not testing with -short.
1 year ago
Daniel Martí 95ef0357da support inline and non-spaced comments in assembly
Go contains such inline comments, like:

	crypto/aes/gcm_ppc64x.s:129: VPMSUMD IN, HL, XL // H.lo·H.lo

We didn't notice since these comments were somewhat rare.
While here, "//" does not need to be followed by a space.

The code turns out to be pretty easy with strings.Cut.

Fixes #672.
1 year ago
pagran f7bde1d40e
don't use the current directory to patch and build cmd/link
We were noticing sporadic `go test` failures where running
an obfuscated binary could panic with:

    fatal error: invalid function symbol table

It turns out that this could happen when modinfo.txtar runs first;
when it patched the linker with `git -C apply`, its current directory
had a valid git repository initialized, and apparently that somehow
prevents the patches from being applied.

It's unclear whether this is git's fault or our own, but in any case,
using a temporary working directory is easier and fixes the bug.
Do the same for `go build`, just in case.
1 year ago
Daniel Martí 99d30c033e go.mod: update dependencies before a release 1 year ago
Daniel Martí 3e0d360777 README: update the minimum Go version required
We had dropped supprot for Go 1.18.x back in October,
but completely forgot about the README.
1 year ago
Daniel Martí 08302ed5fc prepare changelog for v0.9.2
I will do the tag tomorrow morning.
1 year ago
Daniel Martí 6d54a77771 update flag tables with Go 1.20
`go help build` now has -C, -cover, -coverpkg, and -pgo.

There are no new boolean flags, but note that -cover is now a common
build flag rather than just a test flag.

While here, sort the lists so that they are easier to skim for missing
items in the future.
1 year ago
Daniel Martí 4919be0658 switch to final Go 1.20 release and re-enable gotip in CI
Go master, the upcoming Go 1.21, has had its merge window open for over
two weeks at this point, and it seems calmer at this point.

ALso update staticcheck to its latest release, which supports Go 1.20.
1 year ago
Daniel Martí d0a6faa4e6 README: rewrite seeds section
A user correctly points out that some sentences were confusing, like:

	It can also make reverse-engineering harder, as an end user could
	guess what version of Go or garble you're using.

Rewrite the whole section. It now also explains what I meant by that
sentence with a practical example.

I also merged the bits about "provide your own seed" and "rotating the
seeds", because they're both talking about the same mechanism.

Fixes #666.
1 year ago
Daniel Martí 2ee9cf7a43 support go:linkname directives pointing at methods
This is not common, but it is done by a few projects.
Namely, github.com/goccy/go-json reached into reflect's guts,
which included a number of methods:

	internal/runtime/rtype.go
	11://go:linkname rtype_Align reflect.(*rtype).Align
	19://go:linkname rtype_FieldAlign reflect.(*rtype).FieldAlign
	27://go:linkname rtype_Method reflect.(*rtype).Method
	35://go:linkname rtype_MethodByName reflect.(*rtype).MethodByName
	[...]

Add tests for such go:linkname directives pointing at methods.
Note that there are two possible symbol string variants;
"pkg/path.(*Receiver).method" for methods with pointer receivers,
and "pkg/path.Receiver.method" for the rest.

We can't assume that the presence of two dots means a method either.
For example, a package path may be "pkg/path.with.dots",
and so "pkg/path.with.dots.SomeFunc" is the function "SomeFunc"
rather than the method "SomeFunc" on a type "dots".
To account for this ambiguity, rather than splitting on the last dot
like we used to, try to find a package path prefix by splitting on an
increasing number of first dots.

This can in theory still be ambiguous. For example,
we could have the package "pkg/path" expose the method "foo.bar",
and the package "pkg/path.foo" expose the func "bar".
Then, the symbol string "pkg/path.foo.bar" could mean either of them.
However, this seems extremely unlikely to happen in practice,
and I'm not sure that Go's toolchain would support it either.

I also noticed that goccy/go-json still failed to build after the fix.
The reason was that the type reflect.rtype wasn't being obfuscated.
We could, and likely should, teach our assembly and linkname
transformers about which names we chose not to obfuscate due to the use
of reflection. However, in this particular case, reflect's own types
can be obfuscated safely, so just do that.

Fixes #656.
1 year ago
Daniel Martí 09a17375e3 reduce verbosity in the last change
Declare the ast.GenDecl node once,
which allows us to deduplicate and inline code.
resolveImportSpec doesn't need to be separate either.

We also don't need explicit panics for unexpected nils;
we can rely on nil checks inserted by the compiler.

Finally, move the bulk of the code outside of the scope.Names loop,
which unindents most of the logic.

While here, add a few more inline docs.
1 year ago
pagran 7c50979899
ensure that all imports are used after obfuscation
garble's -literals flag and its patching of the runtime may leave unused imports.
We used to try to detect those and remove the imports,
but that was still buggy with edge cases like dot imports or renamed imports.

Moreover, it was potentially incorrect.
Completely removing an import from a package means we don't run its init funcs,
which could have side effects changing the behavior of a program.
As an example, database/sql drivers are registered at init time.

Instead, for each import in an obfuscated Go file,
add an unnamed declaration which references the imported package.
This may not be necessary for all imported packages,
as only a minority become unused due to garble,
but it's also relatively harmless to do so.

Fixes #658.
1 year ago
Daniel Martí f34be3f572 CHANGELOG: prepare for v0.9.1 release 1 year ago
Daniel Martí 0ec363d9c8 avoid breaking intrinsics when obfuscating names
We obfuscate import paths as well as their declared names.
The compiler treats some packages and APIs in special ways,
and the way it detects those is by looking at import paths and names.

In the past, we have avoided obfuscating some names like embed.FS or
reflect.Value.MethodByName for this reason. Otherwise,
go:embed or the linker's deadcode elimination might be broken.

This matching by path and name also happens with compiler intrinsics.
Intrinsics allow the compiler to rewrite some standard library calls
with small and efficient assembly, depending on the target GOARCH.
For example, math/bits.TrailingZeros32 gets replaced with ssa.OpCtz32,
which on amd64 may result in using the TZCNTL instruction.

We never noticed that we were breaking many of these intrinsics.
The intrinsics for funcs declared in the runtime and its dependencies
still worked properly, as we do not obfuscate those packages yet.
However, for other packages like math/bits and sync/atomic,
the intrinsics were being entirely disabled due to obfuscated names.

Skipping intrinsics is particularly bad for performance,
and it also leads to slightly larger binaries:

			 │      old      │                 new                 │
			 │     bin-B     │     bin-B      vs base              │
	Build-16   5.450Mi ± ∞ ¹   5.333Mi ± ∞ ¹  -2.15% (p=0.029 n=4)

Finally, the main reason we noticed that intrinsics were broken
is that apparently GOARCH=mips fails to link without them,
as some symbols end up being not defined at all.
This patch fixes builds for the MIPS family of architectures.

Rather than building and linking all of std for every GOARCH,
test that intrinsics work by asking the compiler to print which
intrinsics are being applied, and checking that math/bits gets them.

This fix is relatively unfortunate, as it means we stop obfuscating
about 120 function names and a handful of package paths.
However, fixing builds and intrinsics is much more important.
We can figure out better ways to deal with intrinsics in the future.

Fixes #646.
1 year ago
Daniel Martí 598d5182fb update x/tools version used in go:generate
Fixes running this go:generate line with Go tip.
1 year ago
Daniel Martí 0b096c9e75 generate go_std_tables.go in its entirety
No more having to manually run the script and adapting it to Go code.
1 year ago
Daniel Martí 33ceca7ef8 split runtimeAndDeps and runtimeLinknamed into a separate Go file
The next commit will start generating these via //go:generate,
so this first change keeps the diffs easier to review.
1 year ago
pagran d3c6ed6729
add support for obfuscation of dot-imported string constants 1 year ago
Daniel Martí d4e7abc28c reuse calls to testing.B.TempDir in the build benchmark
Multiple calls to TempDir give new unique temporary directories.
We don't need that, as we already used subdirectories.
1 year ago
Daniel Martí d9e74dabbb CI: add Go 1.20rc3, disable gotip for now
tip is now the start of Go 1.21 as of a couple of days ago.
However, the first week or two is when the biggest changes land,
which means that Go tip is far more prone to bugs and changes that might
break garble.

We'll start tracking tip again in a few weeks, once the dust has settled
and we can look at what changes might have broken garble.
1 year ago
Daniel Martí 71eda055c2
CHANGELOG: prepare for v0.9.0 1 year ago
Daniel Martí 294450fdc3 CHANGELOG: draft for v0.9.0 1 year ago
pagran e04c605c93 preallocate data in shuffle and split literal obfuscator 1 year ago
lu4p 5b2193351f Decrease binary size for -literals
Only string literals over 8 characters in length are now being
obfuscated. This leads to around 20% smaller binaries when building with
-literals.

Fixes #618
1 year ago
Daniel Martí 5dd6b2dc43 support assembly references to package names
Go's package runtime/internal/atomic contains references to functions
which refer to the current package by its package name alone:

	src/runtime/internal/atomic/atomic_loong64.s:   JMP atomic·Load(SB)
	src/runtime/internal/atomic/atomic_loong64.s:   JMP atomic·Load64(SB)
	src/runtime/internal/atomic/atomic_loong64.s:   JMP atomic·Load64(SB)
	src/runtime/internal/atomic/atomic_mips64x.s:   JMP atomic·Load(SB)
	src/runtime/internal/atomic/atomic_mips64x.s:   JMP atomic·Load64(SB)
	src/runtime/internal/atomic/atomic_mips64x.s:   JMP atomic·Load64(SB)

We could only handle unqualified or fully qualified references, like:

	JMP ·Load64(SB)
	JMP runtime∕internal∕atomic·Load64(SB)

Apparently, all three forms are equally valid.
Add a test case and fix it.

I checked whether referencing an imported package by its name worked;
it does not seem to be the case.
This feature appears to be restricted to the current package alone.

While here, we only need goPkgPath when we need to call listPackage.

Fixes #619.
1 year ago
Daniel Martí 7e95fcf1cb detect which objects are global in a simpler way
To detect if an object is global, our previous code did:

	obj.Pkg().Scope().Lookup(obj.Name()) == obj

However, that Lookup call is unnecessary.
It is a map lookup, which is cheap, but not free.
Instead, we can compare the scopes directly:

	obj.Pkg().Scope() == obj.Parent()

While here, reuse the calls to obj.Pkg(),
and avoid using fmt.Sprintf in this hot path.

Looked here since perf on Linux with a garble build
reports that we spend about 2.6% of CPU time in recordedObjectString.
Overall provides a very minor improvement:

	name      old time/op         new time/op         delta
	Build-16          21.8s ± 1%          21.7s ± 0%    ~     (p=0.200 n=4+4)

	name      old bin-B           new bin-B           delta
	Build-16          5.67M ± 0%          5.67M ± 0%    ~     (all equal)

	name      old cached-time/op  new cached-time/op  delta
	Build-16          734ms ± 3%          722ms ± 3%    ~     (p=0.486 n=4+4)

	name      old mallocs/op      new mallocs/op      delta
	Build-16          25.0M ± 0%          24.7M ± 0%  -1.28%  (p=0.029 n=4+4)

	name      old sys-time/op     new sys-time/op     delta
	Build-16          17.0s ± 1%          16.8s ± 1%    ~     (p=0.200 n=4+4)
1 year ago
Daniel Martí 8ea0708bca fail on user packages with build errors early
The added test case would panic, because we would try to hash a name
with a broken package's GarbleActionID, which was empty.

We skipped over all package errors in appendListedPackages because two
kinds of errors were OK in the standard library.
However, this also meant we ignored real errors we should stop at,
because obfuscating those user packages is pointless.

Add more assertions, check for the OK errors explicitly,
and fail on any other error immediately.
Note that, in the process, I also found a bug in cmd/go.

Uncovered by github.com/bytedance/sonic,
whose internal/loader package fails to build on Go 1.20.
1 year ago
Daniel Martí 1c4fe53fc1 skip over comments when obfuscating assembly
github.com/bytedance/sonic has many comments in its assembly files,
and one in particular caused us trouble:

	internal/rt/asm_amd64.s:2:// Code generated by asm2asm, DO NOT EDIT·

Since we looked for "middle dot" characters anywhere,
we would try to load the package "EDIT", which clearly does not exist.

To fix that, start skipping over comments. Thankfully,
Go's assembly syntax only supports "//" line comments,
so it's enough to read one line at a time and skip them quickly.
We also longer need a regular expression to find #include lines.

While here, two other minor bits of cleanup.
First, rename transformGo to transformGoFile, for clarity.
Second, use ".s" extensions for obfuscated assembly filenames,
just like we do with the comiler and ".go" files.

Updates #621.
1 year ago
Daniel Martí 5ede145791 make seed.txtar less likely to flake
Our test was essentially a "birthday paradox" simulation,
with 8 people in total and 7 possible birthdays.
Instead of checking whether two or more people share a birthday,
we tested whether six or more people shared a birthday.

Per a math/rand simulation with ten million iterations,
that test has a 0.13% flake rate.
That is low enough that we didn't immediately notice,
but high enough to realistically cause problems at some point.

My previous math in the test comment was clearly wrong.
First, we recently lowered the range by 1,
and `(1 / 7) ** 6` is three times higher as a probability.
Second, that's the probability of 6 people to share the same birthday,
without taking into account the extra two people in the total of 8.
That clearly drives the chances up, per the birthday paradox.

Double the number of people to 16,
and look for 14 or more to share a birthday.
The simulation returns a 0% chance at that point,
so it's extremely unlikely to cause issues.

Fixes #634.
1 year ago
pagran 45394ac13f
update linker patches from our fork 1 year ago
Daniel Martí 6f1959ca20 README: -tiny does a bit more now
On my machine, garble itself goes from about 5.7MiB to about 4.8 MiB
when being built with `garble build` and `garble build -tiny`,
which is roughly a 16% decrease in size.
The 2-5% figure is dated at this point, as we do better than that now.
1 year ago
pagran 22c177f088
remove all unexported func names with -tiny via the linker
The patch to the linker does this when generating the pclntab,
which is the binary section containing func names.
When `-tiny` is being used, we look for unexported funcs,
and set their names to the offset `0` - a shared empty string.

We also avoid including the original name in the binary,
which saves a significant amount of space.
The following stats were collected on GOOS=linux,
which show that `-tiny` is now about 4% smaller:

	go build                  1203067
	garble build               782336
	(old) garble -tiny build   688128
	(new) garble -tiny build   659456
1 year ago
Daniel Martí 417bcf27bb support referencing package paths with periods in assembly
The test case is lifted from github.com/bytedance/sonic/internal/native,
which had the following line:

	JMP github·com∕bytedance∕sonic∕internal∕native∕avx2·__quote(SB)

Updates #621.
1 year ago
pagran 6ace03322f
patch and rebuild cmd/link to modify the magic value in pclntab
This value is hard-coded in the linker and written in a header.
We could rewrite the final binary, like we used to do with import paths,
but that would require once again maintaining libraries to do so.

Instead, we're now modifying the linker to do what we want.
It's not particularly hard, as every Go install has its source code,
and rebuilding a slightly modified linker only takes a few seconds at most.

Thanks to `go build -overlay`, we only need to copy the files we modify,
and right now we're just modifying one file in the toolchain.
We use a git patch, as the change is fairly static and small,
and the patch is easier to understand and maintain.

The other side of this change is in the runtime,
as it also hard-codes the magic value when loading information.
We modify the code via syntax trees in that case, like `-tiny` does,
because the change is tiny (one literal) and the affected lines of code
are modified regularly between major Go releases.

Since rebuilding a slightly modified linker can take a few seconds,
and Go's build cache does not cache linked binaries,
we keep our own cached version of the rebuilt binary in `os.UserCacheDir`.

The feature isn't perfect, and will be improved in the future.
See the TODOs about the added dependency on `git`,
or how we are currently only able to cache one linker binary at once.

Fixes #622.
1 year ago
Daniel Martí 05d9b4ed26 avoid call to loadSharedCache for toolexec calls we skip
We were doing too much work for tools we don't need to wrap at all,
such as `go tool pack` or `go tool buildid`, which get run about as
often as the compiler does.

The overhead is small per call, but it adds up.

	name      old time/op         new time/op         delta
	Build-16          20.7s ± 0%          20.5s ± 1%    ~     (p=0.057 n=4+4)

	name      old bin-B           new bin-B           delta
	Build-16          5.67M ± 0%          5.66M ± 0%  -0.07%  (p=0.029 n=4+4)

	name      old cached-time/op  new cached-time/op  delta
	Build-16          707ms ± 0%          705ms ± 2%    ~     (p=0.886 n=4+4)

	name      old mallocs/op      new mallocs/op      delta
	Build-16          25.0M ± 0%          24.9M ± 0%  -0.26%  (p=0.029 n=4+4)

	name      old sys-time/op     new sys-time/op     delta
	Build-16          8.32s ± 2%          7.90s ± 3%  -5.05%  (p=0.029 n=4+4)
2 years ago
Daniel Martí 21a5eb1720 simplify types.TypeName panic logic
The logic

	if X || Y {
		if X {
			<X panic>
		}
		<Y panic>
	}

is equivalent to, and easier to read as:

	if X {
		<X panic>
	}
	if Y {
		<Y panic>
	}

It probably evolved unintentionally to be this way.
2 years ago
Daniel Martí 9a0d48c27e use errors.Is for listPackage errors 2 years ago
Daniel Martí 41df1f8725 slightly reduce the range of hashed name lengths
v0.8.0 was improved to obfuscate names so that they didn't all end up
with the same length. We went from a length fixed to 8 to a range
between 8 and 15.

Since the new mechanism chooses a length evenly between 8 and 15,
the average hashed name length went from 8 to 11.5.

While this improved obfuscation, it also increased the size of
obfuscated binaries by quite a bit. We do need to use a reasonably large
length to avoid collisions within packages, but we also don't want it to
be large enough to cause too much bloat.

Reduce the minimum and maximum from 8 and 15 to 6 and 12. This means
that the average goes fro 11.5 to 9, much closer to the old average.
We could consider a range of 4 to 12, but 4 bytes is short enough where
collisions become likely in practice, even if it's just the minimum.
And a range of 6 to 10 shrinks the range a bit too much.

	name      old time/op         new time/op         delta
	Build-16          20.6s ± 0%          20.6s ± 0%    ~     (p=0.421 n=5+5)

	name      old bin-B           new bin-B           delta
	Build-16          5.77M ± 0%          5.66M ± 0%  -1.92%  (p=0.008 n=5+5)

	name      old cached-time/op  new cached-time/op  delta
	Build-16          705ms ± 4%          703ms ± 2%    ~     (p=0.690 n=5+5)

	name      old mallocs/op      new mallocs/op      delta
	Build-16          25.0M ± 0%          25.0M ± 0%  -0.08%  (p=0.008 n=5+5)

	name      old sys-time/op     new sys-time/op     delta
	Build-16          8.11s ± 2%          8.11s ± 2%    ~     (p=0.841 n=5+5)

Updates #618.
2 years ago
Daniel Martí 7ead5998bc drop shadowed and unused global consts
I forgot these in the recent hash length randomness refactor.
2 years ago
Daniel Martí d955196470 avoid using math/rand's global funcs like Seed and Intn
Go 1.20 is starting to deprecate the use of math/rand's global state,
per https://go.dev/issue/56319 and https://go.dev/issue/20661.
The reasoning is sound:

	Deprecated: Programs that call Seed and then expect a specific sequence
	of results from the global random source (using functions such as Int)
	can be broken when a dependency changes how much it consumes from the
	global random source. To avoid such breakages, programs that need a
	specific result sequence should use NewRand(NewSource(seed)) to obtain a
	random generator that other packages cannot access.

Aside from the tests, we used math/rand only for obfuscating literals,
which caused a deterministic series of calls like Intn. Our call to Seed
was also deterministic, per either GarbleActionID or the -seed flag.

However, our determinism was fragile. If any of our dependencies or
other packages made any calls to math/rand's global funcs, then our
determinism could be broken entirely, and it's hard to notice.

Start using separate math/rand.Rand objects for each use case.
Also make uses of crypto/rand use "cryptorand" for consistency.

Note that this requires a bit of a refactor in internal/literals
to start passing around Rand objects. We also do away with unnecessary
short funcs, especially since math/rand's Read never errors,
and we can obtain a byte via math/rand's Uint32.
2 years ago
Daniel Martí fa29c14e4b README: clarify that -literals affects -ldflags=-X too 2 years ago
Daniel Martí 98466a1f64 scripts: use -trimpath for "go build" in check-third-party
Since "garble build" first performs a Go build with -trimpath,
it helps to also use the flag with the "go build" step.
This way we get more build cache hits, as building a Go package with and
without the flag results in two separate builds.

Before:

	$ go clean -cache && time ./scripts/check-third-party.sh

	real  0m41.844s
	user  2m17.791s
	sys   0m35.440s

After:

	$ go clean -cache && time ./scripts/check-third-party.sh

	real  0m33.983s
	user  1m50.596s
	sys   0m28.499s
2 years ago