Commit Graph

56 Commits (745d089a9dde5d8b5a76a048c960b0545b4da5d5)

Author SHA1 Message Date
Daniel Martí 7c2866356f support obfuscating the syscall package
One more package that further unblocks obfuscating the runtime.
The issue was the TODO we already had about go:linkname directives with
just one argument, which are used in the syscall package.

While here, factor out the obfuscation of linkname directives into
transformLinkname, as it was starting to get a bit complex.
We now support debug logging as well, while still being able to use
"early returns" for some cases where we bail out.

We also need listPackage to treat all runtime sub-packages like it does
runtime itself, as `runtime/internal/syscall` linknames into `syscall`
without it being a dependency as well.

Finally, add a regression test that, without the fix,
properly spots that the syscall package was not obfuscated:

	FAIL: testdata/script/gogarble.txtar:41: unexpected match for ["syscall.RawSyscall6"] in out

Updates #193.
2 years ago
Daniel Martí e71cb69dd8 support obfuscating the time package
This failed at link time because transformAsm did not know how to handle
the fact that the runtime package's assembly code implements the
`time.now` function via:

	TEXT time·now<ABIInternal>(SB),NOSPLIT,$16-24

First, we need transformAsm to happen for all packages, not just the
ones that we are obfuscating. This is because the runtime can implement
APIs in other packages which are themselves obfuscated, whereas runtime
may not itself be getting obfuscated. This is currently the case with
`GOGARBLE=*` as we do not yet support obfuscating the runtime.

Second, we need to teach replaceAsmNames to handle qualified names with
import paths. Not just to look up the right package information for the
name, but also to obfuscate the package path if necessary.

Third, we need to relax the Deps requirement on listPackage, since the
runtime package and its dependencies are always implicit dependencies.

This is a big step towards being able to obfuscate the runtime, as there
is now just one package left that we cannot obfuscate outside the runtime.

Updates #193.
2 years ago
Daniel Martí 5d926a8011 add support for the latest gotip
A new runtime package was added.
2 years ago
Daniel Martí 3c7141e801 update the state of a few TODOs related to upstream Go
The generics issue has been fixed for the upcoming Go 1.20.
Include that version as a reminder for when we can drop Go 1.19.

The fs.SkipAll proposal is also implemented for Go 1.20.

The BinaryContentID comment was a little bit trickier.
We did get stamped VCS information some time ago,
but it only provides us with the current commit info and a dirty bit.
That is not enough for our use of the build cache,
because we want any uncommitted changes to garble to cause rebuilds.

I don't think we'll get any better than using garble's own build ID.
Reword the quasi-TODO to instead explain what we're doing and why.
2 years ago
Daniel Martí ac0945eaa5 work around cmd/go issue relating to CompiledGoFiles
See https://golang.org/issue/28749. The improved asm test would fail:

	go parse: $WORK/imported/imported_amd64.s:1:1: expected 'package', found TEXT (and 2 more errors)

because we would incorrectly parse a non-Go file as a Go file.

Add a workaround. The original reporter's reproducer with go-ethereum
works now, as this was the last hiccup.

Fixes #555.
2 years ago
Daniel Martí e8e06f6ad6 support reverse on packages using cgo
The reverse feature relied on `GoFiles` from `go list`,
but that list may not be enough to typecheck a package:

	typecheck error: $WORK/main.go:3:15: undeclared name: longMain

`go help list` shows:

	GoFiles         []string   // .go source files (excluding CgoFiles, TestGoFiles, XTestGoFiles)
	CgoFiles        []string   // .go source files that import "C"
	CompiledGoFiles []string   // .go files presented to compiler (when using -compiled)

In other words, to mimic the same list of Go files fed to the compiler,
we want CompiledGoFiles.

Note that, since the cgo files show up as generated files,
we currently do not support reversing their filenames.
That is left as a TODO for now.

Updates #555.
2 years ago
Daniel Martí f9d99190d2 use -toolexec="garble toolexec"
This way, the child process knows that it's running a toolchain command
via -toolexec without having to guess via filepath.IsAbs.

While here, improve the docs and tests a bit.
2 years ago
Daniel Martí 3f9d77d9b6 join runtimeAndDeps into cannotObfuscate
There used to be a reason to keep these maps separate, but ever since we
became better at obfuscating the standard library, that has gone away.

It's still a good idea to keep `go list -deps runtime` as a group,
but we can do that via a comment inside a joint map literal.

I also noticed that one comment still referred to cannotObfuscateNames,
which hasn't existed for some time. Fix that up.

It's also not documented how cachedOutput contains info for all deps,
so clarify that while we're improving the docs.

Finally, the reason we cannot obfuscate the syscall package was out of
date; it's not part of the runtime. It is a go:linkname bug.
2 years ago
Daniel Martí f6ef988823 add crypto/internal/boring/bcache to runtimeLinknamed
A chunk from crypto/internal/boring has been split away as a separate
package very recently, shortly before 1.19rc1 is due for release.
See https://go.dev/cl/407135 for more information.

Makes garble work on the latest Go tip again.
2 years ago
Daniel Martí d6afdd08bb obfuscate net and runtime/debug
It appears that we already support obfuscating them,
and nothing seems to break when they are pulled in.

While here, add runtime/internal/syscall to runtimeAndDeps.
It first appeared in Go 1.18, but we missed adding it.
It seems like not having it there didn't cause any issues,
which makes sense given it's got almost zero Go code.

We also teach garble about the -work boolean build flag,
which has existed for multiple years but we forgot about.
It's likely that noone noticed as it's a rarely used flag.
2 years ago
Daniel Martí f0d79a38d4 remove a couple of easy TODOs
First, I tried to follow my own past advice to only set GarbleActionID
if ToObfuscate is true. However, that broke at least three parts of
transformCompile, as the hash is used for more than I recalled.
Give up on that idea, because the current code is working as intended.
Better document what GarbleActionID is and what we use it for.

Second, now that https://go.dev/cl/348741 was shipped with Go 1.18,
using the logger when its output is io.Discard is already a no-op.
So we no longer need our debugf wrapper to apply the no-op logic.
2 years ago
Daniel Martí d2a2f2012b obfuscate a few more std packages
First, two cleanups: unsafe and internal/abi are already in
runtimeAndDeps, so they are already not being obfuscated.
No need to repeat them in the other map.

Then, via trial and error, remove:

* runtime/pprof; it seems like we handle its runtime linknames well now.

* os/signal; unclear what "rebuilds don't work" meant, but it works now.
  Our gogarble.txt test already does a full reproducible rebuild.

* crypto/x509/internal/macos; another linkname user that works now.

It is likely that we could remove one or two more packages already,
but it's best to move slowly and watch out for unexpected regressions.
2 years ago
Daniel Martí 7fb390e403 fix support with the latest Go master version
It added packages which are only built with the boringcrypto build tag,
so trying to `go list` them will fail even though it doesn't matter.

While here, a few more minor cleanups:

1) Hide GarbleActionID and ToObfuscate from encoding/json, so that they
   can't possibly collide with the fields consumed from `go list -json`.

2) Add test cases for `garble build` with packages that fail to load.
   Note that this requires GOGARBLE=* to avoid its "does not match any
   package to be built" error.

3) Remove the last use of interface{}, in a testdata file.

Fixes #531.
2 years ago
Daniel Martí e3a59eae07 add missing context to two unmarshal errors
Returning a json or gob error directly to the user is generally not
helpful, as it lacks any form of context.
For example, from the json unmarshal of "go env -json":

	$ garble build
	invalid character 'w' looking for beginning of value

Also improve the error when the user ran garble in the wrong way,
resulting in no shared gob file. The context is now shorter, and we also
include the os.Open error in case it contains any useful details.

While here, apply Go tip's gofmt, which reformatted a godoc list.

For #523.
2 years ago
lu4p 1a0b028db7 all: drop support for Go 1.17
Now that we've released v0.6.0, that will be the last feature release to
feature support for Go 1.17. The upcoming v0.7.0 will be Go 1.18+.

Code-wise, the cleanup here isn't super noticeable,
but it will be easier to work on features like VCS-aware version
information and generics support without worrying about Go 1.17.
Plus, now CI is back to being much faster.

Note how "go 1.18" in go.mod makes "go mod tidy" more aggressive.
2 years ago
Daniel Martí 8b55dd4bd2 work around a build cache regression in the previous commit
The added comment in main.go explains the situation in detail.
The added test is a minimization of the scenario, which failed:

        > cd mod1
        > garble -seed=${SEED1} build -v gopkg.in/garbletest.v2
        > cd ../mod2
        > garble -seed=${SEED1} build -v
        [stderr]
        test/main/mod2
        # test/main/mod2
        cannot load garble export file for gopkg.in/garbletest.v2: open […]/go-build/ed/[…]-garble-ZV[…]-d: no such file or directory

To work around the problem, we'll always add each package's
GarbleActionID to its build artifact, even when not using -seed.
This will get us the previous behavior with regards to the build cache,
meaning that we fix the recent regression.
The added variable doesn't make it to the final binary either.

While here, improve the cached file loading error's context,
and add an extra sanity check for duplicates on ListedPackages.
2 years ago
Daniel Martí c1c90fee13 make obfuscation fully deterministic with -seed
The default behavior of garble is to seed via the build inputs,
including the build IDs of the entire Go build of each package.
This works well as a default, and does give us determinism,
but it means that building for different platforms
will result in different obfuscation per platform.

Instead, when -seed is provided, don't use any other hash seed or salt.
This means that a particular Go name will be obfuscated the same way
as long as the seed, package path, and name itself remain constant.

In other words, when the user supplies a custom -seed,
we assume they know what they're doing in terms of storage and rotation.

Expand the README docs with more examples and detail.

Fixes #449.
2 years ago
Daniel Martí a9a721e352 concentrate and simplify "to obfuscate" logic
Back in the day, we used to call toObfuscate anytime we needed to know
whether a package should be obfuscated.
More recently, we started computing via the ToObfuscate field,
which then gets shared with all sub-processes via sharedCache.

We still had two places that directly called toObfuscate.
Replace those with ToObfuscate, and inline toObfuscate into shared.go.

obfuscatedImportPath is also a potential footgun for main packages.
Some use cases always want the original "main" package name,
such as for use in the compiler's "-p main" flag,
while other cases want the obfuscated package import path,
such as the entries in importcfg files.

Since each of these call sites handles the edge case well,
obfuscatedImportPath now panics on main packages to avoid any misuse.

Finally, test that we never leak main package paths via ldflags.txt.
We never did, but it's good to make sure.

Overall, this avoids confusion and trims the size of main.go a bit.
2 years ago
Daniel Martí 70b1cb2fd8 CI: start enforcing vet and staticcheck
Fix a staticcheck warning about unused code,
as well as an unparam warning and a missing copyright header.

We also bump the action versions to their latest releases,
and drop unnecessary "name" fields for self-describing steps.

Note that we drop the "go env" commands, as setup-go does that now.

Finally, I did briefly try to add caching,
but then realised it didn't help us at all. Document why.
2 years ago
Daniel Martí 955c24856c properly record when type aliases are embedded as fields
There are two scenarios when it comes to embedding fields.
The first is easy, and we always handled it well:

	type Named struct { Foo int }

	type T struct { Named }

In this scenario, T ends up with an embedded field named "Named",
and a promoted field named "Foo".

Then there's the form with a type alias:

	type Named struct { Foo int }

	type Alias = Named

	type T struct { Alias }

This case is different: T ends up with an embedded field named "Alias",
and a promoted field named "Foo".
Note how the field gets its name from the referenced type,
even if said type is just an alias to another type.

This poses two problems.
First, we must obfuscate the field T.Alias as the name "Alias",
and not as the name "Named" that the alias points to.
Second, we must be careful of cases where Named and Alias are declared
in different packages, as they will obfuscate the same name differently.

Both of those problems compounded in the reported issue.
The actual reason is that quic-go has a type alias in the form of:

	type ConnectionState = qtls.ConnectionState

In other words, the entire problem boils down to a type alias which
points to a named type in a different package, where both types share
the same name. For example:

	package parent

	import "parent/p1"

	type T struct { p1.SameName }

	[...]

	package p1

	import "parent/p2"

	type SameName = p2.SameName

	[...]

	package p2

	type SameName struct { Foo int }

This broke garble because we had a heuristic to detect when an embedded
field was a type alias:

	// Instead, detect such a "foreign alias embed".
	// If we embed a final named type,
	// but the field name does not match its name,
	// then it must have been done via an alias.
	// We dig out the alias's TypeName via locateForeignAlias.
	if named.Obj().Name() != node.Name {

As the reader can deduce, this heuristic would incorrectly assume that
the snippet above does not embed a type alias, when in fact it does.
When obfuscating the field T.SameName, which uses a type alias,
we would correctly obfuscate the name "SameName",
but we would incorrectly obfuscate it with the package p2, not p1.
This would then result in build errors.

To fix this problem for good, we need to get rid of the heuristic.
Instead, we now mimic what was done for KnownCannotObfuscate,
but for embedded fields which use type aliases.
KnownEmbeddedAliasFields is now filled for each package
and stored in the cache as part of cachedOutput.
We can then detect the "embedded alias" case reliably,
even when the field is declared in an imported package.

On the plus side, we get to remove locateForeignAlias.
We also add a couple of TODOs to record further improvements.
Finally, add a test.

Fixes #466.
2 years ago
Daniel Martí 8652271db2 slightly simplify how we deal with linknamed runtime deps
Obfuscating the runtime only needs to list the linknamed packages,
and doesn't need to know about their dependencies directly.

Refactor the script to return a "flat" list that includes all packages
we need, except those that we know the runtime already pulled in.

This allows us to simplify the script and avoid passing -deps to cmd/go.
Performance is unaffected, but I reckon it's worthwhile given how much
we simplified the script.

Longer term, it's also best to avoid using -deps when we don't need it,
as cmd/go could avoid computing information we don't need.

	name              old time/op       new time/op       delta
	Build/NoCache-16        1.68s ± 1%        1.68s ± 0%    ~     (p=1.000 n=5+5)

	name              old bin-B         new bin-B         delta
	Build/NoCache-16        6.72M ± 0%        6.72M ± 0%  +0.01%  (p=0.008 n=5+5)

	name              old sys-time/op   new sys-time/op   delta
	Build/NoCache-16        1.88s ± 1%        1.89s ± 2%    ~     (p=0.548 n=5+5)

	name              old user-time/op  new user-time/op  delta
	Build/NoCache-16        19.9s ± 1%        19.8s ± 0%    ~     (p=0.421 n=5+5)
2 years ago
Daniel Martí 321fa85441 remove the use of -buildinfo=false
It looks like the flag will be scrapped from Go 1.18.
Stop using it before 1.18rc1 releases without it.

See: https://github.com/golang/go/issues/50501#issuecomment-1010225207
3 years ago
Daniel Martí 34cbd1b841 only list missing packages when obfuscating the runtime
We were listing all of std, which certainly worked,
but was quite slow at over 200 packages.
In practice, we can only be missing up to 20-30 packages.
It was a good change as it fixed a severe bug,
but it also introduced a fairly noticeable slow-down.

The numbers are clear; this change shaves off multiple seconds when
obfuscating the runtime with a cold cache:

	name              old time/op       new time/op       delta
	Build/NoCache-16        5.06s ± 1%        1.94s ± 1%  -61.64%  (p=0.008 n=5+5)

	name              old bin-B         new bin-B         delta
	Build/NoCache-16        6.70M ± 0%        6.71M ± 0%   +0.05%  (p=0.008 n=5+5)

	name              old sys-time/op   new sys-time/op   delta
	Build/NoCache-16        13.4s ± 2%         5.0s ± 2%  -62.45%  (p=0.008 n=5+5)

	name              old user-time/op  new user-time/op  delta
	Build/NoCache-16        60.6s ± 1%        19.8s ± 1%  -67.34%  (p=0.008 n=5+5)

Since we only want to call "go list" one extra time,
instead of once for every package we find out we're missing,
we want to know what packages we could be missing in advance.
Resurrect a smarter version of the runtime-related script.

Finally, remove the runtime-related.txt test script,
as it has now been superseeded by the sanity checks in listPackage.
That is, obfuscating the runtime package will now panic if we are
missing any necessary package information.

To double check that we get the runtime's linkname edge case right,
make gogarble.txt use runtime/debug.WriteHeapDump,
which is implemented via a direct runtime linkname.
This ensures we don't lose test coverage from runtime-related.txt.
3 years ago
Daniel Martí 4f0657a19a prepare for v0.5.0
While here, add a TODO I forgot about, and run gofumpt.

Also bump all test timeouts slightly,
as the Mac and Windows hosted runners are a bit slow
and I've hit failures twice recently.
3 years ago
Daniel Martí 5abd3c468d update cmd/go flags for 1.18
As of 1.18beta1. I used bash commands like:

	diff -u <(go1.17.5 help build) <(gotip help build)

While here, supply -buildinfo=false and -buildvcs=false to go build.
We already remove that information by discarding the _gomod_.go file,
but we might as well pass the flags too.
If anything, it lets the toolchain avoid the work entirely.
Note that we can't use these flags on Go 1.17 for now, though.

Add a TODO that came to mind while writing this, too.
3 years ago
Daniel Martí 93b2873c28 ensure the runtime is built in a reproducible way
We went to great lengths to ensure garble builds are reproducible.
This includes how the tool itself works,
as its behavior should be the same given the same inputs.

However, we made one crucial mistake with the runtime package.
It has go:linkname directives pointing at other packages,
and some of those pointed packages aren't its dependencies.

Imagine two scenarios where garble builds the runtime package:

1) We run "garble build runtime". The way we handle linkname directives
   calls listPackage on the target package, to obfuscate the target's
   import path and object name. However, since we only obtained build
   info of runtime and its deps, calls for some linknames such as
   listPackage("sync/atomic") will fail. The linkname directive will
   leave its target untouched.

2) We run "garble build std". Unlike the first scenario, all listPackage
   calls issued by runtime's linkname directives will succeed, so its
   linkname directive targets will be obfuscated.

At best, this can result in inconsistent builds, depending on how the
runtime package was built. At worst, the mismatching object names can
result in errors at link time, if the target packages are actually used.

The modified test reproduces the worst case scenario reliably,
when the fix is reverted:

	> env GOCACHE=${WORK}/gocache-empty
	> garble build -a runtime
	> garble build -o=out_rebuild ./stdimporter
	[stderr]
	# test/main/stdimporter
	JZzQivnl.NtQJu0H3: relocation target JZzQivnl.iioHinYT not defined
	JZzQivnl.NtQJu0H3.func9: relocation target JZzQivnl.yz5z0NaH not defined
	JZzQivnl.(*ypvqhKiQ).String: relocation target JZzQivnl.eVciBQeI not defined
	JZzQivnl.(*ypvqhKiQ).PkgPath: relocation target JZzQivnl.eVciBQeI not defined
	[...]

The fix consists of two steps. First, if we're building the runtime and
listPackage fails on a package, that means we ran into scenario 1 above.
To avoid the inconsistency, we fill ListedPackages with "go list [...] std".
This means we'll always build runtime as described in scenario 2 above.

Second, when building packages other than the runtime,
we only allow listPackage to succeed if we're listing a dependency of
the current package.
This ensures we won't run into similar reproducibility bugs in the future.

Finally, re-enable test-gotip on CI since this was the last test flake.
3 years ago
Daniel Martí 06d5972223
add -debug to aid in debugging (#431)
Not really meant for end users,
but they might still debug failures before filing bugs.

We add the -debug flag itself,
as well as machinery to deduplicate output lines.
There are quite a lot of them otherwise,
which aren't helpful and simply add noise.

In the future, if we always want to output a debug log line,
such as "choosing not to obfuscate here because X",
we can simply insert the unique position string.

Finally, turn all commented-out log.Printf calls to debugf.
Improve a few log lines to be more human-friendly,
and also add a few extras like how long it takes to load files.

We can improve the logging further in the future.
This seems like a good starting point.
3 years ago
Daniel Martí 5f74a1c9f0 unify the definition and storage of flag values
The parent garble process parses the original flags,
as provided by the user via the command line.
Previously, those got stored in the shared cache file,
so that child processes spawned by toolexec could see them.

Unfortunately, this made the code relatively easy to misuse.
A child process would always see flagLiterals as zero value,
given that it should never see such a flag argument directly.
Similarly, one would have to be careful with cached options,
as they could only be consumed after the cache file is loaded.

Simplify the situation by deduplicating the storage of flags.
Now, the parent passes all flags onto children via toolexec.

One exception is GarbleDir, which now becomes an env var.
This seems in line with other top-level dirs like GARBLE_SHARED.

Finally, we turn -seed into a flag.Value,
which lets us implement its "set" behavior as part of flag.Parse.

Overall, we barely reduce the amount of code involved,
but we certainly remove a couple of footguns.
As part of the cleanup, we also introduce appendFlags.
3 years ago
Daniel Martí fceb19f6da
deprecate using GOPRIVATE in favor of GOGARBLE (#427)
Piggybacking off of GOPRIVATE is great for a number of reasons:

* People tend to obfuscate private code, whose package paths will
  generally be in GOPRIVATE already

* Its meaning and syntax are well understood

* It allows all the flexibility we need without adding our own env var
  or config option

However, using GOPRIVATE directly has one main drawback.
It's fairly common to also want to obfuscate public dependencies,
to make the code in private packages even harder to follow.
However, using "GOPRIVATE=*" will result in two main downsides:

* GONOPROXY defaults to GOPRIVATE, so the proxy would be entirely disabled.
  Downloading modules, such as when adding or updating dependencies,
  or when the local cache is cold, can be less reliable.

* GONOSUMDB defaults to GOPRIVATE, so the sumdb would be entirely disabled.
  Adding entries to go.sum, such as when adding or updating dependencies,
  can be less secure.

We will continue to consume GOPRIVATE as a fallback,
but we now expect users to set GOGARBLE instead.
The new logic is documented in the README.

While here, rewrite some uses of "private" with "to obfuscate",
to make the code easier to follow and harder to misunderstand.

Fixes #276.
3 years ago
Daniel Martí b5bef981ee
stop relying on nested "go list -toolexec" calls (#422)
We rely on importcfg files to load type info for obfuscated packages.
We use this type information to remember what names we didn't obfuscate.
Unfortunately, indirect dependencies aren't listed in importcfg files,
so we relied on extra "go list -toolexec" calls to locate object files.

This worked fine, but added a significant amount of complexity.
The extra "go list -export -toolexec=garble" invocations weren't slow,
as they avoided rebuilding or re-obfuscating thanks to the build cache.
Still, it was hard to reason about how garble runs during a build
if we might have multiple layers of -toolexec invocations.

Instead, record the export files we encounter in an incremental map,
and persist it in the build cache via the gob file we're already using.
This way, each garble invocation knows where all object files are,
even those for indirect imports.

One wrinkle is that importcfg files can point to temporary object files.
In that case, figure out its final location in the build cache.
This requires hard-coding a bit of knowledge about how GOCACHE works,
but it seems relatively harmless given how it's very little code.
Plus, if GOCACHE ever changes, it will be obvious when our code breaks.

Finally, add a TODO about potentially saving even more work.
3 years ago
Daniel Martí caa9831a63
fail if we are unexpectedly overwriting files (#418)
While investigating a bug report,
I noticed that garble was writing to the same temp file twice.
At best, writing to the same path on disk twice is wasteful,
as the design is careful to be deterministic and use unique paths.
At worst, the two writes could cause races at the filesystem level.

To prevent either of those situations,
we now create files with os.OpenFile and os.O_EXCL,
meaning that we will error if the file already exists.
That change uncovered a number of such unintended cases.

First, transformAsm would write obfuscated Go files twice.
This is because the Go toolchain actually runs:

	[...]/asm -gensymabis [...] foo.s bar.s
	[...]/asm [...] foo.s bar.s

That is, the first run is only meant to generate symbol ABIs,
which are then used by the compiler.
We need to obfuscate at that first stage,
because the symbol ABI descriptions need to use obfuscated names.

However, having already obfuscated the assembly on the first stage,
there is no need to do so again on the second stage.
If we detect gensymabis is missing, we simply reuse the previous files.

This first situation doesn't seem racy,
but obfuscating the Go assembly files twice is certainly unnecessary.

Second, saveKnownReflectAPIs wrote a gob file to the build cache.
Since the build cache can be kept between builds,
and since the build cache uses reproducible paths for each build,
running the same "garble build" twice could overwrite those files.

This could actually cause races at the filesystem level;
if two concurrent builds write to the same gob file on disk,
one of them could end up using a partially-written file.

Note that this is the only of the three cases not using temporary files.
As such, it is expected that the file may already exist.
In such a case, we simply avoid overwriting it rather than failing.

Third, when "garble build -a" was used,
and when we needed an export file not listed in importcfg,
we would end up calling roughly:

	go list -export -toolexec=garble -a <dependency>

This meant we would re-build and re-obfuscate those packages.
Which is unfortunate, because the parent process already did via:

	go build -toolexec=garble -a <main>

The repeated dependency builds tripped the new os.O_EXCL check,
as we would try to overwrite the same obfuscated Go files.
Beyond being wasteful, this could again cause subtle filesystem races.
To fix the problem, avoid passing flags like "-a" to nested go commands.

Overall, we should likely be using safer ways to write to disk,
be it via either atomic writes or locked files.
However, for now, catching duplicate writes is a big step.
I have left a self-assigned TODO for further improvements.

CI on the pull request found a failure on test-gotip.
The failure reproduces on master, so it seems to be related to gotip,
and not a regression introduced by this change.
For now, disable test-gotip until we can investigate.
3 years ago
Daniel Martí 29356f30f7 update runtimeAndDeps for Go 1.18
In particular, internal/abi now has some actual code,
so obfuscating those literals was breaking as expected.
Document how to update the list in the future as well.

The change above gets "go test" to just one test failure on:

	go version devel go1.18-578ada410d Tue Nov 9 22:58:24 2021 +0000 linux/amd64

We also move the doc about why we disable GarbleLiterals,
so that it's next to where the disabling happens.

While here, we also rename GarbleLiterals to ObfuscateLiterals,
as we have been trying to move away from "to garble" as a verb.

Finally, limit the verbosity of diffoscope.
One test was failing for me, and diffoscope printed thousands of lines.
Not particularly useful when I'm trying to skim test results.
Usually, seeing a few dozen lines of output is enough.

Updates #385.
3 years ago
lu4p 88f238e558
Obfuscate more packages of the standard library (#312)
Also update linkname directives of public packages,
to allow the package where something is linknamed to to be
obfuscated regardless.

Public packages can now depend on private packages.
3 years ago
Daniel Martí c77bc9e5e1 start using "go install pkg@version" in docs
Now that Go 1.17.x is out,
we no longer need to worry about users on Go 1.15.x.

Since Go 1.16, the best way to install programs has been "go install":
https://golang.org/doc/go1.16#go-command

This method does not interfere with the current module,
and allows selecting a version such as "latest" or "master".
3 years ago
Daniel Martí 1d31a139f5 support aliases as embedded fields in dependencies
Our recent work in fieldToAlias worked well when the embedded field
declaration (using an alias) was in the same package as the use of that
field. We would have the *ast.Ident for the field declaration, so
types.Info.Uses would give us the TypeName for the alias.

Unfortunately, if the declaration was in a dependency package, we did
not have that same *ast.Ident, as we weren't parsing the source code for
dependencies for type-checking. This resulted in us incorrectly
obfuscating the use of such an embedded field:

	> garble build
	[stderr]
	# test/main
	JtzmzxWf.go:4: unknown field 'ExternalForeignAlias' in struct literal of type _BdSNiEL.Vcs_smer

To fix this, look through the direct imports of the package defining the
field to find an alias under the exact same name. Not a foolproof
solution, as there's a TODO, but it should work for most cases.

Fixes the obfuscation of google.golang.org/grpc/internal/status, too.

Updates #349.
3 years ago
Daniel Martí c9b0b07853 hash field names equally in all packages
Packages P1 and P2 can define identical struct types T1 and T2, and one
can convert from type T1 to T2 or vice versa.

The spec defines two identical struct types as:

	Two struct types are identical if they have the same sequence of
	fields, and if corresponding fields have the same names, and
	identical types, and identical tags. Non-exported field names
	from different packages are always different.

Unfortunately, garble broke this: since we obfuscated field names
differently depending on the package, cross-package conversions like the
case above would result in typechecking errors.

To fix this, implement Joe Tsai's idea: hash struct field names with the
string representation of the entire struct. This way, identical struct
types will have their field names obfuscated in the same way in all
packages across a build.

Note that we had to refactor "reverse" a bit to start using transformer,
since now it needs to keep track of struct types as well.

This failure was affecting the build of google.golang.org/protobuf,
since it makes regular use of cross-package struct conversions.

Note that the protobuf module still fails to build, but for other
reasons. The package that used to fail now succeeds, so the build gets a
bit further than before. #240 tracks adding relevant third-party Go
modules to CI, so we'll track the other remaining failures there.

Fixes #310.
3 years ago
Daniel Martí 3afc993266 use "go env -json" to collect env info all at once
In the worst case scenario, when GOPRIVATE isn't set at all, we would
run these three commands:

* "go env GOPRIVATE", to fetch GOPRIVATE itself
* "go list -m", for GOPRIVATE's fallback
* "go version", to check the version of Go being used

Now that we support Go 1.16 and later, all these three can be obtained
via "go env -json":

	$ go env -json GOPRIVATE GOMOD GOVERSION
	{
		"GOMOD": "/home/mvdan/src/garble/go.mod",
		"GOPRIVATE": "",
		"GOVERSION": "go1.16.3"
	}

Note that we don't get the module path directly, but we can use the
x/mod/modfile Go API to parse it from the GOMOD file cheaply.

Notably, this also simplifies our Go version checking logic, as now we
get just the version string without the "go version" prefix and
"GOOS/GOARCH" suffix we don't care about.

This makes our code a bit more maintainable and robust. When running a
short incremental build, we can also see a small speed-up, as saving two
"go" invocations can save a few milliseconds:

	name           old time/op       new time/op       delta
	Build/Cache-8        168ms ± 0%        166ms ± 1%  -1.26%  (p=0.009 n=6+6)

	name           old bin-B         new bin-B         delta
	Build/Cache-8        6.36M ± 0%        6.36M ± 0%  +0.12%  (p=0.002 n=6+6)

	name           old sys-time/op   new sys-time/op   delta
	Build/Cache-8        222ms ± 2%        219ms ± 3%    ~     (p=0.589 n=6+6)

	name           old user-time/op  new user-time/op  delta
	Build/Cache-8        857ms ± 1%        846ms ± 1%  -1.31%  (p=0.041 n=6+6)
3 years ago
Daniel Martí 10ec00b37a
make flags like -literals and GOPRIVATE affect hashing (#288)
In 6898d61637, we switched from using action IDs from "go list
-toolexec=garble" to those from the original "go list". We still wanted
the obfuscation and hashing to change if the version of garble changes,
so we hashed that "original action ID" with garble's own content ID, and
called the new hash "garble action ID".

While working on a different patch, I noticed something weird: with the
new mechanism, adding or removing flags like -literals did not alter
those hashes, unlike the old method. This is because the old method used
ownContentID, which includes such bits of information, but the new
method does not.

Change that, and add a test that locks in the behavior we want. In
seed.txt, we check that a single function name gets hashed in particular
ways in different scenarios.

Note that we use a mix of "cmp" and "! bincmp", since the former has no
negated form.

While at it, the seed.txt test is revamped a bit. Now, we only run with
-literals once, as this test is mainly about -seed. We also declare seed
strings once, as environment variables, which makes it easier to track
what each step is doing.
3 years ago
Daniel Martí 13e4ba2ae0 use "obfuscate" instead of "garble" in some more places
Mainly comments. "garble" refers to the tool, but the verb and adjective
is more intuitive as "obfuscate" and "obfuscated" instead of "garble"
and "garbled".
3 years ago
Andrew LeFevre 24518ceead fix failing to build user packages named "embed" 3 years ago
Daniel Martí 5d74ab07f5 all: replace uses of the deprecated ioutil
Now that we require Go 1.16, we can simplify code by removing ioutil.
3 years ago
Daniel Martí a328a487f8 improve the code comments around -seed
Explain why we print the random seed on failure, and why we accept
base64 padding when we don't use it.

While at it, the flagOptions.Random field is unused, so remove it. It
also seems wrong for it to exist; the random value is only for the seed
flag, which already has a field of its own.
3 years ago
lu4p 810bdf448e don't obfuscate the "embed" import path
Updates #263
3 years ago
Daniel Martí def9351b25
fix and re-enable "garble test" (#268)
With the many refactors building up to v0.1.0, we broke "garble test" as
we no longer dealt with test packages well.

Luckily, now that we can depend on TOOLEXEC_IMPORTPATH, we can support
the test command again, as we can always figure out what package we're
currently compiling, without having to track a "main" package.

Note that one major pitfall there is test packages, where
TOOLEXEC_IMPORTPATH does not agree with ImportPath from "go list -json".
However, we can still work around that with a bit of glue code, which is
also copiously documented.

The second change necessary is to consider test packages private
depending on whether their non-test package is private or not. This can
be done via the ForTest field in "go list -json".

The third change is to obfuscate "_testmain.go" files, which are the
code-generated main functions which actually run tests. We used to not
need to obfuscate them, since test function names are never obfuscated
and we used to not obfuscate import paths at compilation time. Now we do
rewrite import paths, so we must do that for "_testmain.go" too.

The fourth change is to re-enable test.txt, and expand it with more
sanity checks and edge cases.

Finally, document "garble test" again.

Fixes #241.
3 years ago
Daniel Martí 4e9ee17ec8
refactor "current package" with TOOLEXEC_IMPORTPATH (#266)
Now that we've dropped support for Go 1.15.x, we can finally rely on
this environment variable for toolexec calls, present in Go 1.16.

Before, we had hacky ways of trying to figure out the current package's
import path, mostly from the -p flag. The biggest rough edge there was
that, for main packages, that was simply the package name, and not its
full import path.

To work around that, we had a restriction on a single main package, so
we could work around that issue. That restriction is now gone.

The new code is simpler, especially because we can set curPkg in a
single place for all toolexec transform funcs.

Since we can always rely on curPkg not being nil now, we can also start
reusing listedPackage.Private and avoid the majority of repeated calls
to isPrivate. The function is cheap, but still not free.

isPrivate itself can also get simpler. We no longer have to worry about
the "main" edge case. Plus, the sanity check for invalid package paths
is now unnecessary; we only got malformed paths from goobj2, and we now
require exact matches with the ImportPath field from "go list -json".

Another effect of clearing up the "main" edge case is that -debugdir now
uses the right directory for main packages. We also start using
consistent debugdir paths in the tests, for the sake of being easier to
read and maintain.

Finally, note that commandReverse did not need the extra call to "go
list -toolexec", as the "shared" call stored in the cache is enough. We
still call toolexecCmd to get said cache, which should probably be
simplified in a future PR.

While at it, replace the use of the "-std" compiler flag with the
Standard field from "go list -json".
3 years ago
Daniel Martí ff0bea73b5
all: drop support for Go 1.15.x (#265)
This mainly cleans up the few bits of code where we explicitly kept
support for Go 1.15.x. With v0.1.0 released, we can drop support now,
since the next v0.2.0 release will only support Go 1.16.x.

Also updates all modules, including test ones, to 'go 1.16'.

Note that the TOOLEXEC_IMPORTPATH refactor is not done here, despite all
the TODOs about doing so when we drop 1.15 support. This is because that
refactor needs to be done carefully and might have side effects, so it's
best to keep it to a separate commit.

Finally, update the deps.
3 years ago
Daniel Martí 1267e2eced
fix link errors when importing crypto/ecdsa (#262)
First, we had some link errors such as:

	cannot find package J6OzO8GN (using -importcfg)

This was caused by the code that writes an updated importcfg, which did
not handle import maps well. That code is now fixed, and we also add an
obfuscatedImportPath method for clarity.

Once fixed, we ran into other link errors:

	Pw3g97ww.addVW: relocation target Pw3g97ww.addVWlarge not defined

After some digging, the cause of those is assembly code that we do not
yet support obfuscating. #261 tracks that.

Meanwhile, to fix "GOPRIVATE=* garble build" and to be able to have a
test for the original import path bug, we add the packages which use
that form of assembly code to runtimeRelated - math/big and
crypto/sha512. There might be more, but these were the ones found by
trying to link crypto/tls, a fairly common dependency.

Fixes #256.
3 years ago
Daniel Martí b03cd08c09
avoid one more call to 'go tool buildid' (#253)
We use it to get the content ID of garble's binary, which is used for
both the garble action IDs, as well as 'go tool compile -V=full'.

Since those two happen in separate processes, both used to call 'go tool
buildid' separately. Store it in the gob cache the first time, and reuse
it the second time.

Since each call to cmd/go costs about 10ms (new process, running its
many init funcs, etc), this results in a nice speed-up for our small
benchmark. Most builds will take many seconds though, so note that a
~15ms speedup there will likely not be noticeable.

While at it, simplify the buildInfo global, as now it just contains a
map representation of the -importcfg contents. It now has better names,
docs, and a simpler representation.

We also stop using the term "garbled import", as it was a bit confusing.
"obfuscated types.Package" is a much better description.

	name     old time/op       new time/op       delta
	Build-8        106ms ± 1%         92ms ± 0%  -14.07%  (p=0.010 n=6+4)

	name     old bin-B         new bin-B         delta
	Build-8        6.60M ± 0%        6.60M ± 0%   -0.01%  (p=0.002 n=6+6)

	name     old sys-time/op   new sys-time/op   delta
	Build-8        208ms ± 5%        149ms ± 3%  -28.27%  (p=0.004 n=6+5)

	name     old user-time/op  new user-time/op  delta
	Build-8        433ms ± 3%        384ms ± 3%  -11.35%  (p=0.002 n=6+6)
3 years ago
Daniel Martí 6898d61637
start using original action IDs (#251)
When we obfuscate a name, what we do is hash the name with the action ID
of the package that contains the name. To ensure that the hash changes
if the garble tool changes, we used the action ID of the obfuscated
build, which is different than the original action ID, as we include
garble's own content ID in "go tool compile -V=full" via -toolexec.

Let's call that the "obfuscated action ID". Remember that a content ID
is roughly the hash of a binary or object file, and an action ID
contains the hash of a package's source code plus the content IDs of its
dependencies.

This had the advantage that it did what we wanted. However, it had one
massive drawback: when we compile a package, we only have the obfuscated
action IDs of its dependencies. This is because one can't have the
content ID of dependent packages before they are built.

Usually, this is not a problem, because hashing a foreign name means it
comes from a dependency, where we already have the obfuscated action ID.
However, that's not always the case.

First, go:linkname directives can point to any symbol that ends up in
the binary, even if the package is not a dependency. So garble could
only support linkname targets belonging to dependencies. This is at the
root of why we could not obfuscate the runtime; it contains linkname
directives targeting the net package, for example, which depends on runtime.

Second, some other places did not have an easy access to obfuscated
action IDs, like transformAsm, which had to recover it from a temporary
file stored by transformCompile.

Plus, this was all pretty expensive, as each toolexec sub-process had to
make repeated calls to buildidOf with the object files of dependencies.
We even had to use extra calls to "go list" in the case of indirect
dependencies, as their export files do not appear in importcfg files.

All in all, the old method was complex and expensive. A better mechanism
is to use the original action IDs directly, as listed by "go list"
without garble in the picture.

This would mean that the hashing does not change if garble changes,
meaning weaker obfuscation. To regain that property, we define the
"garble action ID", which is just the original action ID hashed together
with garble's own content ID.

This is practically the same as the obfuscated build ID we used before,
but since it doesn't go through "go tool compile -V=full" and the
obfuscated build itself, we can work out *all* the garble action IDs
upfront, before the obfuscated build even starts.

This fixes all of our problems. Now we know all garble build IDs
upfront, so a bunch of hacks can be entirely removed. Plus, since we
know them upfront, we can also cache them and avoid repeated calls to
"go tool buildid".

While at it, make use of the new BuildID field in Go 1.16's "list -json
-export". This avoids the vast majority of "go tool buildid" calls, as
the only ones that remain are 2 on the garble binary itself.

The numbers for Go 1.16 look very good:

	name     old time/op       new time/op       delta
	Build-8        146ms ± 4%        101ms ± 1%  -31.01%  (p=0.002 n=6+6)

	name     old bin-B         new bin-B         delta
	Build-8        6.61M ± 0%        6.60M ± 0%   -0.09%  (p=0.002 n=6+6)

	name     old sys-time/op   new sys-time/op   delta
	Build-8        321ms ± 7%        202ms ± 6%  -37.11%  (p=0.002 n=6+6)

	name     old user-time/op  new user-time/op  delta
	Build-8        538ms ± 4%        414ms ± 4%  -23.12%  (p=0.002 n=6+6)
3 years ago
Daniel Martí 63c42c3cc7
support typechecking all of std (#236)
There was one bug keeping the command below from working:

	GOPRIVATE='*' garble build std

The bug is rather obscure; I'm still working on a minimal reproducer
that I can submit upstream, and I'm not yet convinced about where the
bug lives and how it can be fixed.

In short, the command would fail with:

	typecheck error: /go/src/crypto/ecdsa/ecdsa.go:122:12: cannot use asn1.SEQUENCE (constant 48 of type asn1.Tag) as asn1.Tag value in argument to b.AddASN1

Note that the error is ambiguous; there are two asn1 packages, but they
are actually mismatching. We can see that by manually adding debug
prints to go/types:

	constant: asn1.SEQUENCE (constant 48 of type golang.org/x/crypto/cryptobyte/asn1.Tag)
	argument type: vendor/golang.org/x/crypto/cryptobyte/asn1.Tag

It's clear that, for some reason, go/types ends up confused and loading
a vendored and non-vendored version of asn1. There also seems to be no
way to work around this with our lookup function, as it just receives an
import path as a parameter, and returns an object file reader.

For now, work around the issue by *not* using a custom lookup function
in this rare edge case involving vendored dependencies in std packages.
The added code has a lengthy comment explaining the reasoning.

I still intend to investigate this further, but there's no reason to
keep garble failing if we can work around the bug.

Fixes #223.
3 years ago