Commit Graph

623 Commits (126618a0d5971456c7bba5a6f4081d432d812083)
 

Author SHA1 Message Date
Daniel Martí 09a17375e3 reduce verbosity in the last change
Declare the ast.GenDecl node once,
which allows us to deduplicate and inline code.
resolveImportSpec doesn't need to be separate either.

We also don't need explicit panics for unexpected nils;
we can rely on nil checks inserted by the compiler.

Finally, move the bulk of the code outside of the scope.Names loop,
which unindents most of the logic.

While here, add a few more inline docs.
1 year ago
pagran 7c50979899
ensure that all imports are used after obfuscation
garble's -literals flag and its patching of the runtime may leave unused imports.
We used to try to detect those and remove the imports,
but that was still buggy with edge cases like dot imports or renamed imports.

Moreover, it was potentially incorrect.
Completely removing an import from a package means we don't run its init funcs,
which could have side effects changing the behavior of a program.
As an example, database/sql drivers are registered at init time.

Instead, for each import in an obfuscated Go file,
add an unnamed declaration which references the imported package.
This may not be necessary for all imported packages,
as only a minority become unused due to garble,
but it's also relatively harmless to do so.

Fixes #658.
1 year ago
Daniel Martí f34be3f572 CHANGELOG: prepare for v0.9.1 release 1 year ago
Daniel Martí 0ec363d9c8 avoid breaking intrinsics when obfuscating names
We obfuscate import paths as well as their declared names.
The compiler treats some packages and APIs in special ways,
and the way it detects those is by looking at import paths and names.

In the past, we have avoided obfuscating some names like embed.FS or
reflect.Value.MethodByName for this reason. Otherwise,
go:embed or the linker's deadcode elimination might be broken.

This matching by path and name also happens with compiler intrinsics.
Intrinsics allow the compiler to rewrite some standard library calls
with small and efficient assembly, depending on the target GOARCH.
For example, math/bits.TrailingZeros32 gets replaced with ssa.OpCtz32,
which on amd64 may result in using the TZCNTL instruction.

We never noticed that we were breaking many of these intrinsics.
The intrinsics for funcs declared in the runtime and its dependencies
still worked properly, as we do not obfuscate those packages yet.
However, for other packages like math/bits and sync/atomic,
the intrinsics were being entirely disabled due to obfuscated names.

Skipping intrinsics is particularly bad for performance,
and it also leads to slightly larger binaries:

			 │      old      │                 new                 │
			 │     bin-B     │     bin-B      vs base              │
	Build-16   5.450Mi ± ∞ ¹   5.333Mi ± ∞ ¹  -2.15% (p=0.029 n=4)

Finally, the main reason we noticed that intrinsics were broken
is that apparently GOARCH=mips fails to link without them,
as some symbols end up being not defined at all.
This patch fixes builds for the MIPS family of architectures.

Rather than building and linking all of std for every GOARCH,
test that intrinsics work by asking the compiler to print which
intrinsics are being applied, and checking that math/bits gets them.

This fix is relatively unfortunate, as it means we stop obfuscating
about 120 function names and a handful of package paths.
However, fixing builds and intrinsics is much more important.
We can figure out better ways to deal with intrinsics in the future.

Fixes #646.
1 year ago
Daniel Martí 598d5182fb update x/tools version used in go:generate
Fixes running this go:generate line with Go tip.
1 year ago
Daniel Martí 0b096c9e75 generate go_std_tables.go in its entirety
No more having to manually run the script and adapting it to Go code.
1 year ago
Daniel Martí 33ceca7ef8 split runtimeAndDeps and runtimeLinknamed into a separate Go file
The next commit will start generating these via //go:generate,
so this first change keeps the diffs easier to review.
1 year ago
pagran d3c6ed6729
add support for obfuscation of dot-imported string constants 1 year ago
Daniel Martí d4e7abc28c reuse calls to testing.B.TempDir in the build benchmark
Multiple calls to TempDir give new unique temporary directories.
We don't need that, as we already used subdirectories.
1 year ago
Daniel Martí d9e74dabbb CI: add Go 1.20rc3, disable gotip for now
tip is now the start of Go 1.21 as of a couple of days ago.
However, the first week or two is when the biggest changes land,
which means that Go tip is far more prone to bugs and changes that might
break garble.

We'll start tracking tip again in a few weeks, once the dust has settled
and we can look at what changes might have broken garble.
1 year ago
Daniel Martí 71eda055c2
CHANGELOG: prepare for v0.9.0 1 year ago
Daniel Martí 294450fdc3 CHANGELOG: draft for v0.9.0 1 year ago
pagran e04c605c93 preallocate data in shuffle and split literal obfuscator 1 year ago
lu4p 5b2193351f Decrease binary size for -literals
Only string literals over 8 characters in length are now being
obfuscated. This leads to around 20% smaller binaries when building with
-literals.

Fixes #618
1 year ago
Daniel Martí 5dd6b2dc43 support assembly references to package names
Go's package runtime/internal/atomic contains references to functions
which refer to the current package by its package name alone:

	src/runtime/internal/atomic/atomic_loong64.s:   JMP atomic·Load(SB)
	src/runtime/internal/atomic/atomic_loong64.s:   JMP atomic·Load64(SB)
	src/runtime/internal/atomic/atomic_loong64.s:   JMP atomic·Load64(SB)
	src/runtime/internal/atomic/atomic_mips64x.s:   JMP atomic·Load(SB)
	src/runtime/internal/atomic/atomic_mips64x.s:   JMP atomic·Load64(SB)
	src/runtime/internal/atomic/atomic_mips64x.s:   JMP atomic·Load64(SB)

We could only handle unqualified or fully qualified references, like:

	JMP ·Load64(SB)
	JMP runtime∕internal∕atomic·Load64(SB)

Apparently, all three forms are equally valid.
Add a test case and fix it.

I checked whether referencing an imported package by its name worked;
it does not seem to be the case.
This feature appears to be restricted to the current package alone.

While here, we only need goPkgPath when we need to call listPackage.

Fixes #619.
1 year ago
Daniel Martí 7e95fcf1cb detect which objects are global in a simpler way
To detect if an object is global, our previous code did:

	obj.Pkg().Scope().Lookup(obj.Name()) == obj

However, that Lookup call is unnecessary.
It is a map lookup, which is cheap, but not free.
Instead, we can compare the scopes directly:

	obj.Pkg().Scope() == obj.Parent()

While here, reuse the calls to obj.Pkg(),
and avoid using fmt.Sprintf in this hot path.

Looked here since perf on Linux with a garble build
reports that we spend about 2.6% of CPU time in recordedObjectString.
Overall provides a very minor improvement:

	name      old time/op         new time/op         delta
	Build-16          21.8s ± 1%          21.7s ± 0%    ~     (p=0.200 n=4+4)

	name      old bin-B           new bin-B           delta
	Build-16          5.67M ± 0%          5.67M ± 0%    ~     (all equal)

	name      old cached-time/op  new cached-time/op  delta
	Build-16          734ms ± 3%          722ms ± 3%    ~     (p=0.486 n=4+4)

	name      old mallocs/op      new mallocs/op      delta
	Build-16          25.0M ± 0%          24.7M ± 0%  -1.28%  (p=0.029 n=4+4)

	name      old sys-time/op     new sys-time/op     delta
	Build-16          17.0s ± 1%          16.8s ± 1%    ~     (p=0.200 n=4+4)
1 year ago
Daniel Martí 8ea0708bca fail on user packages with build errors early
The added test case would panic, because we would try to hash a name
with a broken package's GarbleActionID, which was empty.

We skipped over all package errors in appendListedPackages because two
kinds of errors were OK in the standard library.
However, this also meant we ignored real errors we should stop at,
because obfuscating those user packages is pointless.

Add more assertions, check for the OK errors explicitly,
and fail on any other error immediately.
Note that, in the process, I also found a bug in cmd/go.

Uncovered by github.com/bytedance/sonic,
whose internal/loader package fails to build on Go 1.20.
1 year ago
Daniel Martí 1c4fe53fc1 skip over comments when obfuscating assembly
github.com/bytedance/sonic has many comments in its assembly files,
and one in particular caused us trouble:

	internal/rt/asm_amd64.s:2:// Code generated by asm2asm, DO NOT EDIT·

Since we looked for "middle dot" characters anywhere,
we would try to load the package "EDIT", which clearly does not exist.

To fix that, start skipping over comments. Thankfully,
Go's assembly syntax only supports "//" line comments,
so it's enough to read one line at a time and skip them quickly.
We also longer need a regular expression to find #include lines.

While here, two other minor bits of cleanup.
First, rename transformGo to transformGoFile, for clarity.
Second, use ".s" extensions for obfuscated assembly filenames,
just like we do with the comiler and ".go" files.

Updates #621.
1 year ago
Daniel Martí 5ede145791 make seed.txtar less likely to flake
Our test was essentially a "birthday paradox" simulation,
with 8 people in total and 7 possible birthdays.
Instead of checking whether two or more people share a birthday,
we tested whether six or more people shared a birthday.

Per a math/rand simulation with ten million iterations,
that test has a 0.13% flake rate.
That is low enough that we didn't immediately notice,
but high enough to realistically cause problems at some point.

My previous math in the test comment was clearly wrong.
First, we recently lowered the range by 1,
and `(1 / 7) ** 6` is three times higher as a probability.
Second, that's the probability of 6 people to share the same birthday,
without taking into account the extra two people in the total of 8.
That clearly drives the chances up, per the birthday paradox.

Double the number of people to 16,
and look for 14 or more to share a birthday.
The simulation returns a 0% chance at that point,
so it's extremely unlikely to cause issues.

Fixes #634.
1 year ago
pagran 45394ac13f
update linker patches from our fork 1 year ago
Daniel Martí 6f1959ca20 README: -tiny does a bit more now
On my machine, garble itself goes from about 5.7MiB to about 4.8 MiB
when being built with `garble build` and `garble build -tiny`,
which is roughly a 16% decrease in size.
The 2-5% figure is dated at this point, as we do better than that now.
1 year ago
pagran 22c177f088
remove all unexported func names with -tiny via the linker
The patch to the linker does this when generating the pclntab,
which is the binary section containing func names.
When `-tiny` is being used, we look for unexported funcs,
and set their names to the offset `0` - a shared empty string.

We also avoid including the original name in the binary,
which saves a significant amount of space.
The following stats were collected on GOOS=linux,
which show that `-tiny` is now about 4% smaller:

	go build                  1203067
	garble build               782336
	(old) garble -tiny build   688128
	(new) garble -tiny build   659456
1 year ago
Daniel Martí 417bcf27bb support referencing package paths with periods in assembly
The test case is lifted from github.com/bytedance/sonic/internal/native,
which had the following line:

	JMP github·com∕bytedance∕sonic∕internal∕native∕avx2·__quote(SB)

Updates #621.
1 year ago
pagran 6ace03322f
patch and rebuild cmd/link to modify the magic value in pclntab
This value is hard-coded in the linker and written in a header.
We could rewrite the final binary, like we used to do with import paths,
but that would require once again maintaining libraries to do so.

Instead, we're now modifying the linker to do what we want.
It's not particularly hard, as every Go install has its source code,
and rebuilding a slightly modified linker only takes a few seconds at most.

Thanks to `go build -overlay`, we only need to copy the files we modify,
and right now we're just modifying one file in the toolchain.
We use a git patch, as the change is fairly static and small,
and the patch is easier to understand and maintain.

The other side of this change is in the runtime,
as it also hard-codes the magic value when loading information.
We modify the code via syntax trees in that case, like `-tiny` does,
because the change is tiny (one literal) and the affected lines of code
are modified regularly between major Go releases.

Since rebuilding a slightly modified linker can take a few seconds,
and Go's build cache does not cache linked binaries,
we keep our own cached version of the rebuilt binary in `os.UserCacheDir`.

The feature isn't perfect, and will be improved in the future.
See the TODOs about the added dependency on `git`,
or how we are currently only able to cache one linker binary at once.

Fixes #622.
1 year ago
Daniel Martí 05d9b4ed26 avoid call to loadSharedCache for toolexec calls we skip
We were doing too much work for tools we don't need to wrap at all,
such as `go tool pack` or `go tool buildid`, which get run about as
often as the compiler does.

The overhead is small per call, but it adds up.

	name      old time/op         new time/op         delta
	Build-16          20.7s ± 0%          20.5s ± 1%    ~     (p=0.057 n=4+4)

	name      old bin-B           new bin-B           delta
	Build-16          5.67M ± 0%          5.66M ± 0%  -0.07%  (p=0.029 n=4+4)

	name      old cached-time/op  new cached-time/op  delta
	Build-16          707ms ± 0%          705ms ± 2%    ~     (p=0.886 n=4+4)

	name      old mallocs/op      new mallocs/op      delta
	Build-16          25.0M ± 0%          24.9M ± 0%  -0.26%  (p=0.029 n=4+4)

	name      old sys-time/op     new sys-time/op     delta
	Build-16          8.32s ± 2%          7.90s ± 3%  -5.05%  (p=0.029 n=4+4)
1 year ago
Daniel Martí 21a5eb1720 simplify types.TypeName panic logic
The logic

	if X || Y {
		if X {
			<X panic>
		}
		<Y panic>
	}

is equivalent to, and easier to read as:

	if X {
		<X panic>
	}
	if Y {
		<Y panic>
	}

It probably evolved unintentionally to be this way.
1 year ago
Daniel Martí 9a0d48c27e use errors.Is for listPackage errors 1 year ago
Daniel Martí 41df1f8725 slightly reduce the range of hashed name lengths
v0.8.0 was improved to obfuscate names so that they didn't all end up
with the same length. We went from a length fixed to 8 to a range
between 8 and 15.

Since the new mechanism chooses a length evenly between 8 and 15,
the average hashed name length went from 8 to 11.5.

While this improved obfuscation, it also increased the size of
obfuscated binaries by quite a bit. We do need to use a reasonably large
length to avoid collisions within packages, but we also don't want it to
be large enough to cause too much bloat.

Reduce the minimum and maximum from 8 and 15 to 6 and 12. This means
that the average goes fro 11.5 to 9, much closer to the old average.
We could consider a range of 4 to 12, but 4 bytes is short enough where
collisions become likely in practice, even if it's just the minimum.
And a range of 6 to 10 shrinks the range a bit too much.

	name      old time/op         new time/op         delta
	Build-16          20.6s ± 0%          20.6s ± 0%    ~     (p=0.421 n=5+5)

	name      old bin-B           new bin-B           delta
	Build-16          5.77M ± 0%          5.66M ± 0%  -1.92%  (p=0.008 n=5+5)

	name      old cached-time/op  new cached-time/op  delta
	Build-16          705ms ± 4%          703ms ± 2%    ~     (p=0.690 n=5+5)

	name      old mallocs/op      new mallocs/op      delta
	Build-16          25.0M ± 0%          25.0M ± 0%  -0.08%  (p=0.008 n=5+5)

	name      old sys-time/op     new sys-time/op     delta
	Build-16          8.11s ± 2%          8.11s ± 2%    ~     (p=0.841 n=5+5)

Updates #618.
2 years ago
Daniel Martí 7ead5998bc drop shadowed and unused global consts
I forgot these in the recent hash length randomness refactor.
2 years ago
Daniel Martí d955196470 avoid using math/rand's global funcs like Seed and Intn
Go 1.20 is starting to deprecate the use of math/rand's global state,
per https://go.dev/issue/56319 and https://go.dev/issue/20661.
The reasoning is sound:

	Deprecated: Programs that call Seed and then expect a specific sequence
	of results from the global random source (using functions such as Int)
	can be broken when a dependency changes how much it consumes from the
	global random source. To avoid such breakages, programs that need a
	specific result sequence should use NewRand(NewSource(seed)) to obtain a
	random generator that other packages cannot access.

Aside from the tests, we used math/rand only for obfuscating literals,
which caused a deterministic series of calls like Intn. Our call to Seed
was also deterministic, per either GarbleActionID or the -seed flag.

However, our determinism was fragile. If any of our dependencies or
other packages made any calls to math/rand's global funcs, then our
determinism could be broken entirely, and it's hard to notice.

Start using separate math/rand.Rand objects for each use case.
Also make uses of crypto/rand use "cryptorand" for consistency.

Note that this requires a bit of a refactor in internal/literals
to start passing around Rand objects. We also do away with unnecessary
short funcs, especially since math/rand's Read never errors,
and we can obtain a byte via math/rand's Uint32.
2 years ago
Daniel Martí fa29c14e4b README: clarify that -literals affects -ldflags=-X too 2 years ago
Daniel Martí 98466a1f64 scripts: use -trimpath for "go build" in check-third-party
Since "garble build" first performs a Go build with -trimpath,
it helps to also use the flag with the "go build" step.
This way we get more build cache hits, as building a Go package with and
without the flag results in two separate builds.

Before:

	$ go clean -cache && time ./scripts/check-third-party.sh

	real  0m41.844s
	user  2m17.791s
	sys   0m35.440s

After:

	$ go clean -cache && time ./scripts/check-third-party.sh

	real  0m33.983s
	user  1m50.596s
	sys   0m28.499s
2 years ago
Daniel Martí 352d2fa73d scripts: follow shfmt 2 years ago
Daniel Martí 82b955dfe1
CHANGELOG: release v0.8.0 2 years ago
Daniel Martí 4a0c2d0274 remove duplicate err!=nil check 2 years ago
Daniel Martí 567b41c2b9 CI: bump gotip to December 2022
Go 1.20 isn't far away at this point,
and we're well into the freeze.

Also note that a build error output got refined again.
2 years ago
Daniel Martí 89f873bba4 CHANGELOG: note the previous fix in the changelog 2 years ago
Daniel Martí 9c0cdc61ef avoid reflect method call panics with GOGARBLE=*
We were obfuscating reflect's package path and its declared names,
but the toolchain wants to detect the presence of method reflection
to turn down the aggressiveness of dead code elimination.

Given that the obfuscation broke the detection,
we could easily end up in crashes when making reflect calls:

	fatal error: unreachable method called. linker bug?

	goroutine 1 [running]:
	runtime.throw({0x50c9b3?, 0x2?})
		runtime/panic.go:1047 +0x5d fp=0xc000063660 sp=0xc000063630 pc=0x43245d
	runtime.unreachableMethod()
		runtime/iface.go:532 +0x25 fp=0xc000063680 sp=0xc000063660 pc=0x40a845
	runtime.call16(0xc00010a360, 0xc00000e0a8, 0x0, 0x0, 0x0, 0x8, 0xc000063bb0)
		runtime/wcS9OpRFL:728 +0x49 fp=0xc0000636a0 sp=0xc000063680 pc=0x45eae9
	runtime.reflectcall(0xc00001c120?, 0x1?, 0x1?, 0x18110?, 0xc0?, 0x1?, 0x1?)
		<autogenerated>:1 +0x3c fp=0xc0000636e0 sp=0xc0000636a0 pc=0x462e9c

Avoid obfuscating the three names which cause problems: "reflect",
"Method", and "MethodByName".

While here, we also teach obfuscatedImportPath to skip "runtime",
as I also saw that the toolchain detects it for many reasons.
That wasn't a problem yet, as we do not obfuscate the runtime,
but it was likely going to become a problem in the future.
2 years ago
Daniel Martí 481e3a1f09 default to GOGARBLE=*, stop using GOPRIVATE
We can drop the code that kicked in when GOGARBLE was empty.
We can also add the value in addGarbleToHash unconditionally,
as we never allow it to be empty.

In the tests, remove all GOGARBLE lines where it just meant "obfuscate
everything" or "obfuscate the entire main module".

cgo.txtar had "obfuscate everything" as a separate step,
so remove it entirely.

linkname.txtar started failing because the imported package did not
import strings, so listPackage errored out. This wasn't a problem when
strings itself wasn't obfuscated, as transformLinkname silently left
strings.IndexByte untouched. It is a problem when IndexByte does get
obfuscated. Make that kind of listPackage error visible, and fix it.

reflect.txtar started failing with "unreachable method" runtime throws.
It's not clear to me why; it appears that GOGARBLE=* makes the linker
think that ExportedMethodName is suddenly unreachable.
Work around the problem by making the method explicitly reachable,
and leave a TODO as a reminder to investigate.

Finally, gogarble.txtar no longer needs to test for GOPRIVATE.
The rest of the test is left the same, as we still want the various
values for GOGARBLE to continue to work just like before.

Fixes #594.
2 years ago
Daniel Martí ec32030be0
README: drop obsolete note about `//export` declarations (#615)
Since #550 in June 2022, we have stopped considering `//export`
directives when deciding what names to obfuscate.
We forgot to remove the caveat from the README.
2 years ago
Daniel Martí 41a09ad72e initial changelog draft for v0.8.0 2 years ago
Daniel Martí 1877dfb474 CI: stop tracking macos-latest for now
Since it's not yet clear whether it's a bug on garble's end,
and the regression was the MacOS version bump and not any recent change
we did.

For #609.
2 years ago
Daniel Martí ffb4987756
clarify that "garble version" does more now (#612)
Since earlier this year, it has also printed build settings.
2 years ago
Daniel Martí 7d591830cd teach -debugdir to produce assembly files too
Up to this point, -debugdir only included obfuscated Go files.
Include assembly files and their headers as well.
While here, ensure that -debugdir supports the standard library too,
and that it behaves correctly with build tags as well.

Also use more consistent names for path strings, to clarify which are
only the basename, and which are the obfuscated version.
2 years ago
Daniel Martí 12bc0349e6 make bincmp keep binaries around when it fails
Even if diffoscope is installed, because further investigation might be
needed, and some failures are rare or hard to reproduce.

Make GitHub Actions upload those artifacts,
so that a failed CI run on Windows or Mac due to bincmp
allows us to download and inspect those binaries locally.
2 years ago
Daniel Martí e61317e7ae fix garble with newer Go tip versions
Some big changes landed in Go for the upcoming 1.20.

While here, remove the use of GOGC=off with make.bash,
as https://go.dev/cl/436235 makes that unnecessary now.
2 years ago
Daniel Martí ff521782f1 obfuscate all assembly filenames
We were still leaking the filenames for assembly files.
In our existing asm.txtar test's output binary,
the string `test/main/garble_main_amd64.s` was present.
This leaked full import paths on one hand,
and the filenames of each assembly file on the other.

We avoid this in Go files by using `/*line` directives,
but those are not supported in assembly files.
Instead, obfuscate the paths in the temporary directory.
Note that we still need a separate temporary directory per package,
because otherwise any included header files might collide.

We must remove the `main` package panic in obfuscatedImportPath,
as we now need to use that function for all packages.

While here, remove the outdated comment about `-trimpath`.

Fixes #605.
2 years ago
Daniel Martí 416782340f support `garble test` in main packages
A main package can be imported in one edge case we weren't testing:
when the same main package has any tests, which implicitly import main.

Before the fix, we would panic:

        > garble test -v ./...
        # test/bar/somemaintest.test
        panic: main packages should never need to obfuscate their import paths
2 years ago
Daniel Martí b6a0284f84 immprove how hashWithCustomSalt comes up with its random lengths
The last change made it so that hashWithCustomSalt does not always end
up with 8 base64 characters, which is a good change for the sake of
avoiding easy patterns in obfuscated code.

However, the docs weren't updated accordingly, and it wasn't
particularly clear that the byte giving us randomness wasn't part of the
resulting base64-encoded name.

First, refactor the code to only feed as many checksum bytes to the
base64 encoder as necessary, which is 12.
This shrinks b64NameBuffer and saves us some base64 encoding work.

Second, use the first checksum byte that we don't use, the 13th,
as the source of the randomness.
Note how before we used a base64-encoded byte for the randomness,
which isn't great as that byte was only one of 63 characters,
whereas a checksum byte is one of 256.

Third, update the docs so that the code is as clear as possible.
This is particularly important given that we have no tests.

With debug prints in the gogarble.txt test, we can see that the
randomness in hash lengths is working as intended:

	# test/main/stdimporter
	hashLength = 13
	hashLength = 8
	hashLength = 12
	hashLength = 15
	hashLength = 10
	hashLength = 15
	hashLength = 9
	hashLength = 8
	hashLength = 15
	hashLength = 15
	hashLength = 12
	hashLength = 10
	hashLength = 13
	hashLength = 13
	hashLength = 8
	hashLength = 15
	hashLength = 11

Finally, add a regression test that will complain if we end up with
hashed names that reuse the same length too often.
Out of eight hashed names, the test will fail if six end up with the
same length, as that is incredibly unlikely given that each should pick
one of eight lengths with a fair distribution.
2 years ago
Azrotronik 73b77ce6be
randomize the length of hashes used for identifiers and filenames
Otherwise all of those names share the same exact length,
which can be a rather easy pattern to spot that garble was used.
2 years ago