Go code can retrieve and use field and method names via the `reflect` package.
For that reason, historically we did not obfuscate names of fields and methods
underneath types that we detected as used for reflection, via e.g. `reflect.TypeOf`.
However, that caused a number of issues. Since we obfuscate and build one package
at a time, we could only detect when types were used for reflection in their own package
or in upstream packages. Use of reflection in downstream packages would be detected
too late, causing one package to obfuscate the names and the other not to, leading to a build failure.
A different approach is implemented here. All names are obfuscated now, but we collect
those types used for reflection, and at the end of a build in `package main`,
we inject a function into the runtime's `internal/abi` package to reverse the obfuscation
for those names which can be used for reflection.
This does mean that the obfuscation for these names is very weak, as the binary
contains a one-to-one mapping to their original names, but they cannot be obfuscated
without breaking too many Go packages out in the wild. There is also some amount
of overhead in `internal/abi` due to this, but we aim to make the overhead insignificant.
Fixes#884, #799, #817, #881, #858, #843, #842Closes#406
It is true that each garble process only obfuscates up to one package,
which is why we made them globals to begin with.
However, garble does quite a lot more now,
such as reversing the obfuscation of many packages at once.
Having a global "current package" variable makes mistakes easier.
Some funcs, like those in transformFuncs, are now transformer methods.
When obfuscating the following piece of code:
func issue_573(s struct{ f int }) {
var _ *int = &s.f
/*x*/
}
the function body would roughly end up printed as:
we would roughly end up with:
var _ *int = &dZ4xYx3N
/*x*/.rbg1IM3V
Note that the /*x*/ comment got moved earlier in the source code.
This happens because the new identifiers are longer, so the printer
thinks that the selector now ends past the comment.
That would be fine - we don't really mind where comments end up,
because these non-directive comments end up being removed anyway.
However, the resulting syntax is wrong, as the period for the selector
must be on the first line rather than the second.
This is a go/printer bug that we should fix upstream,
but until then, we must work around it in Go 1.18.x and 1.19.x.
The fix is somewhat obvious in hindsight. To reduce the chances that
go/printer will trip over comments and produce invalid syntax,
get rid of most comments before we use the printer.
We still keep the removal of comments after printing,
since go/printer consumes some comments in ast.Node Doc fields.
Add the minimized unit test case above, and add the upstream project
that found this bug to check-third-party.
andybalholm/brotli helps cover a compression algorithm and ccgo code
generation from C to Go, and it's also a fairly popular module,
particular with HTTP implementations which want pure-Go brotli.
While here, fix the check-third-party script: it was setting GOFLAGS
a bit too late, so it may run `go get` on the wrong mod file.
Fixes#573.
It's not a problem to leak filenames like _cgo_gotypes.go,
but it is a problem when it includes the import path:
$ strings main | grep _cgo_gotypes
test/main/_cgo_gotypes.go
Here, "test/main" is the module path, which we want to hide.
We hadn't caught this before because the cgo.txt test did not check that
module paths aren't being leaked - it does now.
The fix is rather simple; we let printFile handle cgo-generated files.
We used to avoid that due to compiler errors, as the compiler only
allows some special cgo comment directives to work in cgo-generated
code, to prevent misuse in user code.
The fix is rather easy: the obfuscated filenames should begin with
"_cgo_" to appease the compiler's check.
printFile is one of the functions to blame for most of the CPU cost and
allocations for garble itself, as reported by `perf record` for a clean build.
One contributor is how we print each file and then parse it again,
which we did for the sake of inserting line directives correctly.
With a bit of care, we can do this by tokenizing after printing,
as opposed to parsing into a full go/ast again.
This is moderately cheaper, but more than anything, allocates far less.
That is to be expected given how go/ast is a tree of pointers,
whereas go/scanner simply gives us a stream of tokens.
name old time/op new time/op delta
Build-16 10.4s ± 2% 10.3s ± 1% ~ (p=0.393 n=10+10)
name old bin-B new bin-B delta
Build-16 5.51M ± 0% 5.51M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 398ms ±12% 391ms ±10% ~ (p=0.529 n=10+10)
name old mallocs/op new mallocs/op delta
Build-16 34.4M ± 0% 31.8M ± 0% -7.65% (p=0.000 n=10+10)
name old sys-time/op new sys-time/op delta
Build-16 5.80s ± 6% 5.86s ± 4% ~ (p=0.218 n=10+10)
The new code is shorter, but perhaps a bit trickier,
so I also added more comments to explain what's going on.
Note how the time/op change is practically noise,
but mallocs/op goes down significantly, which is always a good sign.
Now that we've required Go 1.18 or later for some time,
stop supporting `// +build` directives entirely.
That should be fine, given that `go build` will fail too.
The TODO about ToObfuscate is also obsolete; see the added comment.
Finally, tweak the comments a bit after reading them again.
We don't use go/ast.Objects, as we use go/types instead.
Avoiding this work saves a bit of CPU and memory allocs.
name old time/op new time/op delta
Build-16 10.2s ± 1% 10.2s ± 1% ~ (p=0.937 n=6+6)
name old bin-B new bin-B delta
Build-16 5.47M ± 0% 5.47M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 328ms ±14% 321ms ± 6% ~ (p=0.589 n=6+6)
name old mallocs/op new mallocs/op delta
Build-16 34.8M ± 0% 34.0M ± 0% -2.26% (p=0.010 n=6+4)
name old sys-time/op new sys-time/op delta
Build-16 5.89s ± 3% 5.89s ± 3% ~ (p=0.937 n=6+6)
See golang/go#52463.
strings.Cut makes some string handling code more intuitive.
Note that we can't use it everywhere, as some places need LastIndexByte.
Start using x/exp/slices, too, which is our first use of generics.
Note that its API is experimental and may still change,
but since we are not a library, we can control its version updates.
I also noticed that we were using TrimSpace for importcfg files.
It's actually unnecessary if we swap strings.SplitAfter for Split,
as the only whitespace present was the trailing newline.
While here, I noticed an unused copy of printfWithoutPackage.
Now that we've released v0.6.0, that will be the last feature release to
feature support for Go 1.17. The upcoming v0.7.0 will be Go 1.18+.
Code-wise, the cleanup here isn't super noticeable,
but it will be easier to work on features like VCS-aware version
information and generics support without worrying about Go 1.17.
Plus, now CI is back to being much faster.
Note how "go 1.18" in go.mod makes "go mod tidy" more aggressive.
The default behavior of garble is to seed via the build inputs,
including the build IDs of the entire Go build of each package.
This works well as a default, and does give us determinism,
but it means that building for different platforms
will result in different obfuscation per platform.
Instead, when -seed is provided, don't use any other hash seed or salt.
This means that a particular Go name will be obfuscated the same way
as long as the seed, package path, and name itself remain constant.
In other words, when the user supplies a custom -seed,
we assume they know what they're doing in terms of storage and rotation.
Expand the README docs with more examples and detail.
Fixes#449.
Reuse buffers, so they only grow once.
Join a fmt.Sprintf with a string addition.
name old time/op new time/op delta
Build-16 9.39s ± 2% 9.28s ± 2% ~ (p=0.548 n=5+5)
name old bin-B new bin-B delta
Build-16 5.16M ± 0% 5.16M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 324ms ± 6% 317ms ± 2% ~ (p=0.421 n=5+5)
name old mallocs/op new mallocs/op delta
Build-16 29.4M ± 0% 29.3M ± 0% -0.25% (p=0.008 n=5+5)
name old sys-time/op new sys-time/op delta
Build-16 4.66s ± 2% 4.59s ± 2% ~ (p=0.310 n=5+5)
The parent garble process parses the original flags,
as provided by the user via the command line.
Previously, those got stored in the shared cache file,
so that child processes spawned by toolexec could see them.
Unfortunately, this made the code relatively easy to misuse.
A child process would always see flagLiterals as zero value,
given that it should never see such a flag argument directly.
Similarly, one would have to be careful with cached options,
as they could only be consumed after the cache file is loaded.
Simplify the situation by deduplicating the storage of flags.
Now, the parent passes all flags onto children via toolexec.
One exception is GarbleDir, which now becomes an env var.
This seems in line with other top-level dirs like GARBLE_SHARED.
Finally, we turn -seed into a flag.Value,
which lets us implement its "set" behavior as part of flag.Parse.
Overall, we barely reduce the amount of code involved,
but we certainly remove a couple of footguns.
As part of the cleanup, we also introduce appendFlags.
Piggybacking off of GOPRIVATE is great for a number of reasons:
* People tend to obfuscate private code, whose package paths will
generally be in GOPRIVATE already
* Its meaning and syntax are well understood
* It allows all the flexibility we need without adding our own env var
or config option
However, using GOPRIVATE directly has one main drawback.
It's fairly common to also want to obfuscate public dependencies,
to make the code in private packages even harder to follow.
However, using "GOPRIVATE=*" will result in two main downsides:
* GONOPROXY defaults to GOPRIVATE, so the proxy would be entirely disabled.
Downloading modules, such as when adding or updating dependencies,
or when the local cache is cold, can be less reliable.
* GONOSUMDB defaults to GOPRIVATE, so the sumdb would be entirely disabled.
Adding entries to go.sum, such as when adding or updating dependencies,
can be less secure.
We will continue to consume GOPRIVATE as a fallback,
but we now expect users to set GOGARBLE instead.
The new logic is documented in the README.
While here, rewrite some uses of "private" with "to obfuscate",
to make the code easier to follow and harder to misunderstand.
Fixes#276.
Remove some asthelper APIs that haven't been used for some time.
They can be recovered from the git history if needed again.
One type assertion in the literals package is always true.
Embedded field objects are handled near the top of transformGo, so the
extra !obj.Embedded() check was always true. Remove it.
We always obfuscate standalone funcs now, so the obfuscatedTypesPackage
check is no longer necessary. This was necessary when we used to not
obfuscate func names when they were used in linkname directives.
The workaround for test package imports in obfuscatedTypesPackage I had
to add a few commits ago no longer seems to be necessary. This might be
thanks to the simplification with functions in the paragraph just above.
It's impossible to run garble without -trimpath nowadays, as we error
before the build even starts:
$ go build -toolexec=garble
go tool compile: exit status 1
cannot open shared file, this is most likely due to not running "garble [command]"
When run as "garble build", the trimpath flag is always set. So the
check in alterTrimpath never triggers anymore, and couldn't be tested.
Finally, simplify the handling of comment syntax in printFile, and add a
few TODOs for other code paths not covered by our existing tests.
Total code coverage is up from 90.3% to 91.0%.
In printFile, we print and re-parse the modified AST to be able to have
reliable position information.
The re-parsing step can fail if something goes very wrong, such as a bug
in -literals. It should generally not happen. However, in rare cases it
has happened, and it's confusing for the end user to see syntax errors
pointing at an existing file on disk, when the code doesn't align
- since we're on a modified copy.
To prevent such confusion, use an empty filename. Syntax errors will
still not be terribly helpful, but they should be extremely rare and
promptly fixed, so that's not a huge concern.
For that same reason, we can't really add a good test here. We could
perhaps add a test that forces garble to mess up the src slice in some
way, but that would be a weird test, and not particularly worth it.
Fixes#286.
Just two minor tweaks were necessary to get "go test" to pass on:
go version devel go1.17-a25c584629 Tue Apr 6 04:48:09 2021 +0000 linux/amd64
Re-enable the CI for it, too. The config needed changing since the
set-env and add-path commands now use special files instead, due to some
security issues uncovered last winter.
It's possible that CI on master could suddenly break, if Go master
changes in some substantial way that requires more tweaks. If that turns
out to be an issue pretty often, we could always pin a specific git repo
commit and update it every few weeks.
The regular obfuscation process simply modifies some simple nodes, such
as identifiers and strings. In those cases, we modify the nodes
in-place, meaning that their positions remain the same. This hasn't
caused any problems.
Literal obfuscation is trickier. Since we replace one expression with an
entirely different one, we use cursor.Replace. The new expression is
entirely made up on the spot, so it lacks position information.
This was causing problems. For example, in the added test input:
> garble -literals build
[stderr]
# test/main
dgcm4t6w.go:3: misplaced compiler directive
dgcm4t6w.go:4: misplaced compiler directive
dgcm4t6w.go:3: misplaced compiler directive
dgcm4t6w.go:6: misplaced compiler directive
dgcm4t6w.go:7: misplaced compiler directive
dgcm4t6w.go:3: misplaced compiler directive
dgcm4t6w.go:9: misplaced compiler directive
dgcm4t6w.go:3: misplaced compiler directive
dgcm4t6w.go:3: too many errors
The build errors are because we'd move the compiler directives, which
makes the compiler unhappy as they must be directly followed by a
function declaration.
The root cause there seems to be that, since the replacement nodes lack
position information, go/printer would try to estimate its printing
position by adding to the last known position. Since -literals adds
code, this would result in the printer position increasing rapidly, and
potentially printing directive comments earlier than needed.
For now, making the replacement nodes have the same position as the
original node seems to stop go/printer from making this mistake.
It's possible that this workaround won't be bulletproof forever, but it
works well for now, and I don't see a simpler workaround right now.
It would be possible to use fancier mechanisms like go/ast.CommentMap or
dave/dst, but those are a significant amount of added complexity as well.
Fixes#285.
Fix up a few TODOs, and simplify the way we handle comments.
We now add whitespace around inline /*line*/ directives, to ensure we
don't break programs. A test case is added too.
We now add line directives to call sites, not function declarations,
since those are what actually shows up in stack traces.
It's unclear if we care about any other lines inside functions at all.
This also fixes reversing with -literals, since that feature adds a
significant amount of code which shuffles line numbers around.
Finally, we extend the tests with types, methods, and anonymous
functions, and we make all of them work well.
Updates #5.
In particular, the positions within function declarations, including the
positions of call sites to other functions.
Note that this isn't well tested just yet, particularly not with other
features like -literals. We can extend the tests and code over time.
This gets us the core basics.
The issue will be closed once the feature is documented for users, in a
follow-up PR.
Updates #5.
First, rename line_obfuscator.go to position.go. We obfuscate filenames,
not just line numbers, and "obfuscator" is a bit redundant.
Second, use "/*line :x*/" comments rather than the "//line :x" form, as
the former allows us to insert them in any position without adding
unnecessary newlines. This will be important for changing the position
of call sites, which will be important for "garble reverse".
Third, do not rely on go/ast to remove and add comments. Since they are
free-floating, we can very easily end up with misplaced comments,
especially as the literal obfuscator heavily modifies the AST.
The new method prints and re-parses the file, to ensure all node
positions are consistent with a buffer, buf1. Then, we copy the contents
into a new buffer, buf2, while inserting the comments that we need.
The new method also modifies line numbers at the very end of obfuscating
a Go file, instead of at the very beginning. That's going to be more
robust long-term, as we will also obfuscate line numbers for any
additions or modifications to the AST.
Fourth, detachedDirectives is unnecessary, as we can accomplish the same
with two simple prefix matches.
Finally, this means we can stop using detachedComments entirely, as
printFile already inserts the comments we need.
For #5.