|
|
|
|
// Copyright (c) 2019, The Garble Authors.
|
|
|
|
|
// See LICENSE for licensing information.
|
|
|
|
|
|
|
|
|
|
package main
|
|
|
|
|
|
|
|
|
|
import (
|
|
|
|
|
"bytes"
|
unify the definition and storage of flag values
The parent garble process parses the original flags,
as provided by the user via the command line.
Previously, those got stored in the shared cache file,
so that child processes spawned by toolexec could see them.
Unfortunately, this made the code relatively easy to misuse.
A child process would always see flagLiterals as zero value,
given that it should never see such a flag argument directly.
Similarly, one would have to be careful with cached options,
as they could only be consumed after the cache file is loaded.
Simplify the situation by deduplicating the storage of flags.
Now, the parent passes all flags onto children via toolexec.
One exception is GarbleDir, which now becomes an env var.
This seems in line with other top-level dirs like GARBLE_SHARED.
Finally, we turn -seed into a flag.Value,
which lets us implement its "set" behavior as part of flag.Parse.
Overall, we barely reduce the amount of code involved,
but we certainly remove a couple of footguns.
As part of the cleanup, we also introduce appendFlags.
3 years ago
|
|
|
|
"crypto/rand"
|
|
|
|
|
"encoding/base64"
|
|
|
|
|
"encoding/binary"
|
|
|
|
|
"encoding/gob"
|
use "go env -json" to collect env info all at once
In the worst case scenario, when GOPRIVATE isn't set at all, we would
run these three commands:
* "go env GOPRIVATE", to fetch GOPRIVATE itself
* "go list -m", for GOPRIVATE's fallback
* "go version", to check the version of Go being used
Now that we support Go 1.16 and later, all these three can be obtained
via "go env -json":
$ go env -json GOPRIVATE GOMOD GOVERSION
{
"GOMOD": "/home/mvdan/src/garble/go.mod",
"GOPRIVATE": "",
"GOVERSION": "go1.16.3"
}
Note that we don't get the module path directly, but we can use the
x/mod/modfile Go API to parse it from the GOMOD file cheaply.
Notably, this also simplifies our Go version checking logic, as now we
get just the version string without the "go version" prefix and
"GOOS/GOARCH" suffix we don't care about.
This makes our code a bit more maintainable and robust. When running a
short incremental build, we can also see a small speed-up, as saving two
"go" invocations can save a few milliseconds:
name old time/op new time/op delta
Build/Cache-8 168ms ± 0% 166ms ± 1% -1.26% (p=0.009 n=6+6)
name old bin-B new bin-B delta
Build/Cache-8 6.36M ± 0% 6.36M ± 0% +0.12% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build/Cache-8 222ms ± 2% 219ms ± 3% ~ (p=0.589 n=6+6)
name old user-time/op new user-time/op delta
Build/Cache-8 857ms ± 1% 846ms ± 1% -1.31% (p=0.041 n=6+6)
4 years ago
|
|
|
|
"encoding/json"
|
fail if we are unexpectedly overwriting files (#418)
While investigating a bug report,
I noticed that garble was writing to the same temp file twice.
At best, writing to the same path on disk twice is wasteful,
as the design is careful to be deterministic and use unique paths.
At worst, the two writes could cause races at the filesystem level.
To prevent either of those situations,
we now create files with os.OpenFile and os.O_EXCL,
meaning that we will error if the file already exists.
That change uncovered a number of such unintended cases.
First, transformAsm would write obfuscated Go files twice.
This is because the Go toolchain actually runs:
[...]/asm -gensymabis [...] foo.s bar.s
[...]/asm [...] foo.s bar.s
That is, the first run is only meant to generate symbol ABIs,
which are then used by the compiler.
We need to obfuscate at that first stage,
because the symbol ABI descriptions need to use obfuscated names.
However, having already obfuscated the assembly on the first stage,
there is no need to do so again on the second stage.
If we detect gensymabis is missing, we simply reuse the previous files.
This first situation doesn't seem racy,
but obfuscating the Go assembly files twice is certainly unnecessary.
Second, saveKnownReflectAPIs wrote a gob file to the build cache.
Since the build cache can be kept between builds,
and since the build cache uses reproducible paths for each build,
running the same "garble build" twice could overwrite those files.
This could actually cause races at the filesystem level;
if two concurrent builds write to the same gob file on disk,
one of them could end up using a partially-written file.
Note that this is the only of the three cases not using temporary files.
As such, it is expected that the file may already exist.
In such a case, we simply avoid overwriting it rather than failing.
Third, when "garble build -a" was used,
and when we needed an export file not listed in importcfg,
we would end up calling roughly:
go list -export -toolexec=garble -a <dependency>
This meant we would re-build and re-obfuscate those packages.
Which is unfortunate, because the parent process already did via:
go build -toolexec=garble -a <main>
The repeated dependency builds tripped the new os.O_EXCL check,
as we would try to overwrite the same obfuscated Go files.
Beyond being wasteful, this could again cause subtle filesystem races.
To fix the problem, avoid passing flags like "-a" to nested go commands.
Overall, we should likely be using safer ways to write to disk,
be it via either atomic writes or locked files.
However, for now, catching duplicate writes is a big step.
I have left a self-assigned TODO for further improvements.
CI on the pull request found a failure on test-gotip.
The failure reproduces on master, so it seems to be related to gotip,
and not a regression introduced by this change.
For now, disable test-gotip until we can investigate.
3 years ago
|
|
|
|
"errors"
|
|
|
|
|
"flag"
|
|
|
|
|
"fmt"
|
|
|
|
|
"go/ast"
|
|
|
|
|
"go/importer"
|
|
|
|
|
"go/parser"
|
|
|
|
|
"go/token"
|
|
|
|
|
"go/types"
|
|
|
|
|
"io"
|
fail if we are unexpectedly overwriting files (#418)
While investigating a bug report,
I noticed that garble was writing to the same temp file twice.
At best, writing to the same path on disk twice is wasteful,
as the design is careful to be deterministic and use unique paths.
At worst, the two writes could cause races at the filesystem level.
To prevent either of those situations,
we now create files with os.OpenFile and os.O_EXCL,
meaning that we will error if the file already exists.
That change uncovered a number of such unintended cases.
First, transformAsm would write obfuscated Go files twice.
This is because the Go toolchain actually runs:
[...]/asm -gensymabis [...] foo.s bar.s
[...]/asm [...] foo.s bar.s
That is, the first run is only meant to generate symbol ABIs,
which are then used by the compiler.
We need to obfuscate at that first stage,
because the symbol ABI descriptions need to use obfuscated names.
However, having already obfuscated the assembly on the first stage,
there is no need to do so again on the second stage.
If we detect gensymabis is missing, we simply reuse the previous files.
This first situation doesn't seem racy,
but obfuscating the Go assembly files twice is certainly unnecessary.
Second, saveKnownReflectAPIs wrote a gob file to the build cache.
Since the build cache can be kept between builds,
and since the build cache uses reproducible paths for each build,
running the same "garble build" twice could overwrite those files.
This could actually cause races at the filesystem level;
if two concurrent builds write to the same gob file on disk,
one of them could end up using a partially-written file.
Note that this is the only of the three cases not using temporary files.
As such, it is expected that the file may already exist.
In such a case, we simply avoid overwriting it rather than failing.
Third, when "garble build -a" was used,
and when we needed an export file not listed in importcfg,
we would end up calling roughly:
go list -export -toolexec=garble -a <dependency>
This meant we would re-build and re-obfuscate those packages.
Which is unfortunate, because the parent process already did via:
go build -toolexec=garble -a <main>
The repeated dependency builds tripped the new os.O_EXCL check,
as we would try to overwrite the same obfuscated Go files.
Beyond being wasteful, this could again cause subtle filesystem races.
To fix the problem, avoid passing flags like "-a" to nested go commands.
Overall, we should likely be using safer ways to write to disk,
be it via either atomic writes or locked files.
However, for now, catching duplicate writes is a big step.
I have left a self-assigned TODO for further improvements.
CI on the pull request found a failure on test-gotip.
The failure reproduces on master, so it seems to be related to gotip,
and not a regression introduced by this change.
For now, disable test-gotip until we can investigate.
3 years ago
|
|
|
|
"io/fs"
|
|
|
|
|
"log"
|
|
|
|
|
mathrand "math/rand"
|
|
|
|
|
"os"
|
|
|
|
|
"os/exec"
|
|
|
|
|
"path/filepath"
|
|
|
|
|
"regexp"
|
|
|
|
|
"runtime"
|
|
|
|
|
"runtime/debug"
|
reimplement import path obfuscation without goobj2 (#242)
We used to rely on a parallel implementation of an object file parser
and writer to be able to obfuscate import paths. After compiling each
package, we would parse the object file, replace the import paths, and
write the updated object file in-place.
That worked well, in most cases. Unfortunately, it had some flaws:
* Complexity. Even when most of the code is maintained in a separate
module, the import_obfuscation.go file was still close to a thousand
lines of code.
* Go compatibility. The object file format changes between Go releases,
so we were supporting Go 1.15, but not 1.16. Fixing the object file
package to work with 1.16 would probably break 1.15 support.
* Bugs. For example, we recently had to add a workaround for #224, since
import paths containing dots after the domain would end up escaped.
Another example is #190, which seems to be caused by the object file
parser or writer corrupting the compiled code and causing segfaults in
some rare edge cases.
Instead, let's drop that method entirely, and force the compiler and
linker to do the work for us. The steps necessary when compiling a
package to obfuscate are:
1) Replace its "package foo" lines with the obfuscated package path. No
need to separate the package path and name, since the obfuscated path
does not contain slashes.
2) Replace the "-p pkg/foo" flag with the obfuscated path.
3) Replace the "import" spec lines with the obfuscated package paths,
for those dependencies which were obfuscated.
4) Replace the "-importcfg [...]" file with a version that uses the
obfuscated paths instead.
The linker also needs that last step, since it also uses an importcfg
file to find object files.
There are three noteworthy drawbacks to this new method:
1) Since we no longer write object files, we can't use them to store
data to be cached. As such, the -debugdir flag goes back to using the
"-a" build flag to always rebuild all packages. On the plus side,
that caching didn't work very well; see #176.
2) The package name "main" remains in all declarations under it, not
just "func main", since we can only rename entire packages. This
seems fine, as it gives little information to the end user.
3) The -tiny mode no longer sets all lines to 0, since it did that by
modifying object files. As a temporary measure, we instead set all
top-level declarations to be on line 1. A TODO is added to hopefully
improve this again in the near future.
The upside is that we get rid of all the issues mentioned before. Plus,
garble now nearly works with Go 1.16, with the exception of two very
minor bugs that look fixable. A follow-up PR will take care of that and
start testing on 1.16.
Fixes #176.
Fixes #190.
4 years ago
|
|
|
|
"strconv"
|
|
|
|
|
"strings"
|
|
|
|
|
"time"
|
|
|
|
|
"unicode"
|
|
|
|
|
"unicode/utf8"
|
|
|
|
|
|
slight simplifications and alloc reductions
Reuse a buffer and a map across loop iterations, because we can.
Make recordTypeDone only track named types, as that is enough to detect
type cycles. Without named types, there can be no cycles.
These two reduce allocs by a fraction of a percent:
name old time/op new time/op delta
Build-16 10.4s ± 2% 10.4s ± 1% ~ (p=0.739 n=10+10)
name old bin-B new bin-B delta
Build-16 5.51M ± 0% 5.51M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 391ms ± 9% 407ms ± 7% ~ (p=0.095 n=10+9)
name old mallocs/op new mallocs/op delta
Build-16 34.5M ± 0% 34.4M ± 0% -0.12% (p=0.000 n=10+10)
name old sys-time/op new sys-time/op delta
Build-16 5.87s ± 5% 5.82s ± 5% ~ (p=0.182 n=10+9)
It doesn't seem like much, but remember that these stats are for the
entire set of processes, where garble only accounts for about 10% of the
total wall time when compared to the compiler or linker. So a ~0.1%
decrease globally is still significant.
linkerVariableStrings is also indexed by *types.Var rather than types.Object,
since -ldflags=-X only supports setting the string value of variables.
This shouldn't make a significant difference in terms of allocs,
but at least the map is less prone to confusion with other object types.
To ensure the new code doesn't trip up on non-variables, we add test cases.
Finally, for the sake of clarity, index into the types.Info maps like
Defs and Uses rather than calling ObjectOf if we know whether the
identifier we have is a definition of a name or the use of a defined name.
This isn't better in terms of performance, as ObjectOf is a tiny method,
but just like with linkerVariableStrings before, the new code is clearer.
3 years ago
|
|
|
|
"golang.org/x/exp/maps"
|
|
|
|
|
"golang.org/x/exp/slices"
|
|
|
|
|
"golang.org/x/mod/module"
|
|
|
|
|
"golang.org/x/mod/semver"
|
|
|
|
|
"golang.org/x/tools/go/ast/astutil"
|
|
|
|
|
|
|
|
|
|
"mvdan.cc/garble/internal/literals"
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
var flagSet = flag.NewFlagSet("garble", flag.ContinueOnError)
|
|
|
|
|
|
|
|
|
|
var (
|
unify the definition and storage of flag values
The parent garble process parses the original flags,
as provided by the user via the command line.
Previously, those got stored in the shared cache file,
so that child processes spawned by toolexec could see them.
Unfortunately, this made the code relatively easy to misuse.
A child process would always see flagLiterals as zero value,
given that it should never see such a flag argument directly.
Similarly, one would have to be careful with cached options,
as they could only be consumed after the cache file is loaded.
Simplify the situation by deduplicating the storage of flags.
Now, the parent passes all flags onto children via toolexec.
One exception is GarbleDir, which now becomes an env var.
This seems in line with other top-level dirs like GARBLE_SHARED.
Finally, we turn -seed into a flag.Value,
which lets us implement its "set" behavior as part of flag.Parse.
Overall, we barely reduce the amount of code involved,
but we certainly remove a couple of footguns.
As part of the cleanup, we also introduce appendFlags.
3 years ago
|
|
|
|
flagLiterals bool
|
|
|
|
|
flagTiny bool
|
|
|
|
|
flagDebug bool
|
unify the definition and storage of flag values
The parent garble process parses the original flags,
as provided by the user via the command line.
Previously, those got stored in the shared cache file,
so that child processes spawned by toolexec could see them.
Unfortunately, this made the code relatively easy to misuse.
A child process would always see flagLiterals as zero value,
given that it should never see such a flag argument directly.
Similarly, one would have to be careful with cached options,
as they could only be consumed after the cache file is loaded.
Simplify the situation by deduplicating the storage of flags.
Now, the parent passes all flags onto children via toolexec.
One exception is GarbleDir, which now becomes an env var.
This seems in line with other top-level dirs like GARBLE_SHARED.
Finally, we turn -seed into a flag.Value,
which lets us implement its "set" behavior as part of flag.Parse.
Overall, we barely reduce the amount of code involved,
but we certainly remove a couple of footguns.
As part of the cleanup, we also introduce appendFlags.
3 years ago
|
|
|
|
flagDebugDir string
|
|
|
|
|
flagSeed seedFlag
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
func init() {
|
|
|
|
|
flagSet.Usage = usage
|
unify the definition and storage of flag values
The parent garble process parses the original flags,
as provided by the user via the command line.
Previously, those got stored in the shared cache file,
so that child processes spawned by toolexec could see them.
Unfortunately, this made the code relatively easy to misuse.
A child process would always see flagLiterals as zero value,
given that it should never see such a flag argument directly.
Similarly, one would have to be careful with cached options,
as they could only be consumed after the cache file is loaded.
Simplify the situation by deduplicating the storage of flags.
Now, the parent passes all flags onto children via toolexec.
One exception is GarbleDir, which now becomes an env var.
This seems in line with other top-level dirs like GARBLE_SHARED.
Finally, we turn -seed into a flag.Value,
which lets us implement its "set" behavior as part of flag.Parse.
Overall, we barely reduce the amount of code involved,
but we certainly remove a couple of footguns.
As part of the cleanup, we also introduce appendFlags.
3 years ago
|
|
|
|
flagSet.BoolVar(&flagLiterals, "literals", false, "Obfuscate literals such as strings")
|
|
|
|
|
flagSet.BoolVar(&flagTiny, "tiny", false, "Optimize for binary size, losing some ability to reverse the process")
|
|
|
|
|
flagSet.BoolVar(&flagDebug, "debug", false, "Print debug logs to stderr")
|
|
|
|
|
flagSet.StringVar(&flagDebugDir, "debugdir", "", "Write the obfuscated source to a directory, e.g. -debugdir=out")
|
unify the definition and storage of flag values
The parent garble process parses the original flags,
as provided by the user via the command line.
Previously, those got stored in the shared cache file,
so that child processes spawned by toolexec could see them.
Unfortunately, this made the code relatively easy to misuse.
A child process would always see flagLiterals as zero value,
given that it should never see such a flag argument directly.
Similarly, one would have to be careful with cached options,
as they could only be consumed after the cache file is loaded.
Simplify the situation by deduplicating the storage of flags.
Now, the parent passes all flags onto children via toolexec.
One exception is GarbleDir, which now becomes an env var.
This seems in line with other top-level dirs like GARBLE_SHARED.
Finally, we turn -seed into a flag.Value,
which lets us implement its "set" behavior as part of flag.Parse.
Overall, we barely reduce the amount of code involved,
but we certainly remove a couple of footguns.
As part of the cleanup, we also introduce appendFlags.
3 years ago
|
|
|
|
flagSet.Var(&flagSeed, "seed", "Provide a base64-encoded seed, e.g. -seed=o9WDTZ4CN4w\nFor a random seed, provide -seed=random")
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
var rxGarbleFlag = regexp.MustCompile(`-(?:literals|tiny|debug|debugdir|seed)(?:$|=)`)
|
|
|
|
|
|
unify the definition and storage of flag values
The parent garble process parses the original flags,
as provided by the user via the command line.
Previously, those got stored in the shared cache file,
so that child processes spawned by toolexec could see them.
Unfortunately, this made the code relatively easy to misuse.
A child process would always see flagLiterals as zero value,
given that it should never see such a flag argument directly.
Similarly, one would have to be careful with cached options,
as they could only be consumed after the cache file is loaded.
Simplify the situation by deduplicating the storage of flags.
Now, the parent passes all flags onto children via toolexec.
One exception is GarbleDir, which now becomes an env var.
This seems in line with other top-level dirs like GARBLE_SHARED.
Finally, we turn -seed into a flag.Value,
which lets us implement its "set" behavior as part of flag.Parse.
Overall, we barely reduce the amount of code involved,
but we certainly remove a couple of footguns.
As part of the cleanup, we also introduce appendFlags.
3 years ago
|
|
|
|
type seedFlag struct {
|
|
|
|
|
random bool
|
|
|
|
|
bytes []byte
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
func (f seedFlag) present() bool { return len(f.bytes) > 0 }
|
|
|
|
|
|
unify the definition and storage of flag values
The parent garble process parses the original flags,
as provided by the user via the command line.
Previously, those got stored in the shared cache file,
so that child processes spawned by toolexec could see them.
Unfortunately, this made the code relatively easy to misuse.
A child process would always see flagLiterals as zero value,
given that it should never see such a flag argument directly.
Similarly, one would have to be careful with cached options,
as they could only be consumed after the cache file is loaded.
Simplify the situation by deduplicating the storage of flags.
Now, the parent passes all flags onto children via toolexec.
One exception is GarbleDir, which now becomes an env var.
This seems in line with other top-level dirs like GARBLE_SHARED.
Finally, we turn -seed into a flag.Value,
which lets us implement its "set" behavior as part of flag.Parse.
Overall, we barely reduce the amount of code involved,
but we certainly remove a couple of footguns.
As part of the cleanup, we also introduce appendFlags.
3 years ago
|
|
|
|
func (f seedFlag) String() string {
|
|
|
|
|
return base64.RawStdEncoding.EncodeToString(f.bytes)
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
func (f *seedFlag) Set(s string) error {
|
|
|
|
|
if s == "random" {
|
|
|
|
|
f.bytes = make([]byte, 16) // random 128 bit seed
|
|
|
|
|
if _, err := rand.Read(f.bytes); err != nil {
|
|
|
|
|
return fmt.Errorf("error generating random seed: %v", err)
|
|
|
|
|
}
|
|
|
|
|
} else {
|
|
|
|
|
// We expect unpadded base64, but to be nice, accept padded
|
|
|
|
|
// strings too.
|
|
|
|
|
s = strings.TrimRight(s, "=")
|
|
|
|
|
seed, err := base64.RawStdEncoding.DecodeString(s)
|
|
|
|
|
if err != nil {
|
|
|
|
|
return fmt.Errorf("error decoding seed: %v", err)
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if len(seed) < 8 {
|
|
|
|
|
return fmt.Errorf("-seed needs at least 8 bytes, have %d", len(seed))
|
|
|
|
|
}
|
|
|
|
|
f.bytes = seed
|
|
|
|
|
}
|
|
|
|
|
return nil
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
func usage() {
|
|
|
|
|
fmt.Fprintf(os.Stderr, `
|
|
|
|
|
Garble obfuscates Go code by wrapping the Go toolchain.
|
|
|
|
|
|
|
|
|
|
garble [garble flags] command [go flags] [go arguments]
|
|
|
|
|
|
|
|
|
|
For example, to build an obfuscated program:
|
|
|
|
|
|
|
|
|
|
garble build ./cmd/foo
|
|
|
|
|
|
|
|
|
|
Similarly, to combine garble flags and Go build flags:
|
|
|
|
|
|
|
|
|
|
garble -literals build -tags=purego ./cmd/foo
|
|
|
|
|
|
|
|
|
|
The following commands are supported:
|
|
|
|
|
|
|
|
|
|
build replace "go build"
|
|
|
|
|
test replace "go test"
|
|
|
|
|
reverse de-obfuscate output such as stack traces
|
|
|
|
|
version print the version and build settings of the garble binary
|
|
|
|
|
|
|
|
|
|
To learn more about a command, run "garble help <command>".
|
fix and re-enable "garble test" (#268)
With the many refactors building up to v0.1.0, we broke "garble test" as
we no longer dealt with test packages well.
Luckily, now that we can depend on TOOLEXEC_IMPORTPATH, we can support
the test command again, as we can always figure out what package we're
currently compiling, without having to track a "main" package.
Note that one major pitfall there is test packages, where
TOOLEXEC_IMPORTPATH does not agree with ImportPath from "go list -json".
However, we can still work around that with a bit of glue code, which is
also copiously documented.
The second change necessary is to consider test packages private
depending on whether their non-test package is private or not. This can
be done via the ForTest field in "go list -json".
The third change is to obfuscate "_testmain.go" files, which are the
code-generated main functions which actually run tests. We used to not
need to obfuscate them, since test function names are never obfuscated
and we used to not obfuscate import paths at compilation time. Now we do
rewrite import paths, so we must do that for "_testmain.go" too.
The fourth change is to re-enable test.txt, and expand it with more
sanity checks and edge cases.
Finally, document "garble test" again.
Fixes #241.
4 years ago
|
|
|
|
|
|
|
|
|
garble accepts the following flags before a command:
|
|
|
|
|
|
|
|
|
|
`[1:])
|
|
|
|
|
flagSet.PrintDefaults()
|
|
|
|
|
fmt.Fprintf(os.Stderr, `
|
|
|
|
|
|
|
|
|
|
For more information, see https://github.com/burrowers/garble.
|
|
|
|
|
`[1:])
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
func main() { os.Exit(main1()) }
|
|
|
|
|
|
|
|
|
|
var (
|
|
|
|
|
fset = token.NewFileSet()
|
|
|
|
|
sharedTempDir = os.Getenv("GARBLE_SHARED")
|
unify the definition and storage of flag values
The parent garble process parses the original flags,
as provided by the user via the command line.
Previously, those got stored in the shared cache file,
so that child processes spawned by toolexec could see them.
Unfortunately, this made the code relatively easy to misuse.
A child process would always see flagLiterals as zero value,
given that it should never see such a flag argument directly.
Similarly, one would have to be careful with cached options,
as they could only be consumed after the cache file is loaded.
Simplify the situation by deduplicating the storage of flags.
Now, the parent passes all flags onto children via toolexec.
One exception is GarbleDir, which now becomes an env var.
This seems in line with other top-level dirs like GARBLE_SHARED.
Finally, we turn -seed into a flag.Value,
which lets us implement its "set" behavior as part of flag.Parse.
Overall, we barely reduce the amount of code involved,
but we certainly remove a couple of footguns.
As part of the cleanup, we also introduce appendFlags.
3 years ago
|
|
|
|
parentWorkDir = os.Getenv("GARBLE_PARENT_WORK")
|
|
|
|
|
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
|
// origImporter is a go/types importer which uses the original versions
|
|
|
|
|
// of packages, without any obfuscation. This is helpful to make
|
|
|
|
|
// decisions on how to obfuscate our input code.
|
wrap types.Importer to canonicalize import paths
The docs for go/importer.ForCompiler say:
The lookup function is called each time the resulting importer
needs to resolve an import path. In this mode the importer can
only be invoked with canonical import paths (not relative or
absolute ones); it is assumed that the translation to canonical
import paths is being done by the client of the importer.
We use a lookup func for two reasons: first, to support modules, and
second, to be able to use our information from "go list -json -export".
However, go/types does not canonicalize import paths before calling
ImportFrom. This is somewhat understandable; it doesn't know whether an
importer was created with a lookup func, and ImportFrom only requires
the input path to be canonicalized in that scenario. When the lookup
func is nil, the importer canonicalizes by itself via go/build.Import.
Before this change, the added crossbuild test would fail:
> garble build net/http
[stderr]
# vendor/golang.org/x/crypto/chacha20
typecheck error: /usr/lib/go/src/vendor/golang.org/x/crypto/chacha20/chacha_generic.go:10:2: could not import crypto/cipher (can't find import: "crypto/cipher")
# vendor/golang.org/x/text/secure/bidirule
typecheck error: /usr/lib/go/src/vendor/golang.org/x/text/secure/bidirule/bidirule.go:12:2: could not import errors (can't find import: "errors")
# vendor/golang.org/x/crypto/cryptobyte
typecheck error: /usr/lib/go/src/vendor/golang.org/x/crypto/cryptobyte/asn1.go:8:16: could not import encoding/asn1 (can't find import: "encoding/asn1")
# vendor/golang.org/x/text/unicode/norm
typecheck error: /usr/lib/go/src/vendor/golang.org/x/text/unicode/norm/composition.go:7:8: could not import unicode/utf8 (can't find import: "unicode/utf8")
This is because we'd fall back to importer.Default, which only knows how
to find packages in $GOROOT/pkg. Those are missing for cross-builds,
unsurprisingly, as those built archives end up in the build cache.
After this change, we properly support importing std-vendored packages,
so we can get rid of the importer.Default workaround. And, by extension,
cross-builds now work as well.
Note that, in the added test script, the full build of the binary fails,
as there seems to be some sort of linker problem:
> garble build
[stderr]
# test/main
d9rqJyxo.uoqIiDs5: relocation target runtime.os9A16A3 not defined
We leave that as a TODO for now, as this change is subtle enough as it
is.
4 years ago
|
|
|
|
origImporter = importerWithMap(importer.ForCompiler(fset, "gc", func(path string) (io.ReadCloser, error) {
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
|
pkg, err := listPackage(path)
|
|
|
|
|
if err != nil {
|
|
|
|
|
return nil, err
|
|
|
|
|
}
|
|
|
|
|
return os.Open(pkg.Export)
|
wrap types.Importer to canonicalize import paths
The docs for go/importer.ForCompiler say:
The lookup function is called each time the resulting importer
needs to resolve an import path. In this mode the importer can
only be invoked with canonical import paths (not relative or
absolute ones); it is assumed that the translation to canonical
import paths is being done by the client of the importer.
We use a lookup func for two reasons: first, to support modules, and
second, to be able to use our information from "go list -json -export".
However, go/types does not canonicalize import paths before calling
ImportFrom. This is somewhat understandable; it doesn't know whether an
importer was created with a lookup func, and ImportFrom only requires
the input path to be canonicalized in that scenario. When the lookup
func is nil, the importer canonicalizes by itself via go/build.Import.
Before this change, the added crossbuild test would fail:
> garble build net/http
[stderr]
# vendor/golang.org/x/crypto/chacha20
typecheck error: /usr/lib/go/src/vendor/golang.org/x/crypto/chacha20/chacha_generic.go:10:2: could not import crypto/cipher (can't find import: "crypto/cipher")
# vendor/golang.org/x/text/secure/bidirule
typecheck error: /usr/lib/go/src/vendor/golang.org/x/text/secure/bidirule/bidirule.go:12:2: could not import errors (can't find import: "errors")
# vendor/golang.org/x/crypto/cryptobyte
typecheck error: /usr/lib/go/src/vendor/golang.org/x/crypto/cryptobyte/asn1.go:8:16: could not import encoding/asn1 (can't find import: "encoding/asn1")
# vendor/golang.org/x/text/unicode/norm
typecheck error: /usr/lib/go/src/vendor/golang.org/x/text/unicode/norm/composition.go:7:8: could not import unicode/utf8 (can't find import: "unicode/utf8")
This is because we'd fall back to importer.Default, which only knows how
to find packages in $GOROOT/pkg. Those are missing for cross-builds,
unsurprisingly, as those built archives end up in the build cache.
After this change, we properly support importing std-vendored packages,
so we can get rid of the importer.Default workaround. And, by extension,
cross-builds now work as well.
Note that, in the added test script, the full build of the binary fails,
as there seems to be some sort of linker problem:
> garble build
[stderr]
# test/main
d9rqJyxo.uoqIiDs5: relocation target runtime.os9A16A3 not defined
We leave that as a TODO for now, as this change is subtle enough as it
is.
4 years ago
|
|
|
|
}).(types.ImporterFrom).ImportFrom)
|
|
|
|
|
|
start using original action IDs (#251)
When we obfuscate a name, what we do is hash the name with the action ID
of the package that contains the name. To ensure that the hash changes
if the garble tool changes, we used the action ID of the obfuscated
build, which is different than the original action ID, as we include
garble's own content ID in "go tool compile -V=full" via -toolexec.
Let's call that the "obfuscated action ID". Remember that a content ID
is roughly the hash of a binary or object file, and an action ID
contains the hash of a package's source code plus the content IDs of its
dependencies.
This had the advantage that it did what we wanted. However, it had one
massive drawback: when we compile a package, we only have the obfuscated
action IDs of its dependencies. This is because one can't have the
content ID of dependent packages before they are built.
Usually, this is not a problem, because hashing a foreign name means it
comes from a dependency, where we already have the obfuscated action ID.
However, that's not always the case.
First, go:linkname directives can point to any symbol that ends up in
the binary, even if the package is not a dependency. So garble could
only support linkname targets belonging to dependencies. This is at the
root of why we could not obfuscate the runtime; it contains linkname
directives targeting the net package, for example, which depends on runtime.
Second, some other places did not have an easy access to obfuscated
action IDs, like transformAsm, which had to recover it from a temporary
file stored by transformCompile.
Plus, this was all pretty expensive, as each toolexec sub-process had to
make repeated calls to buildidOf with the object files of dependencies.
We even had to use extra calls to "go list" in the case of indirect
dependencies, as their export files do not appear in importcfg files.
All in all, the old method was complex and expensive. A better mechanism
is to use the original action IDs directly, as listed by "go list"
without garble in the picture.
This would mean that the hashing does not change if garble changes,
meaning weaker obfuscation. To regain that property, we define the
"garble action ID", which is just the original action ID hashed together
with garble's own content ID.
This is practically the same as the obfuscated build ID we used before,
but since it doesn't go through "go tool compile -V=full" and the
obfuscated build itself, we can work out *all* the garble action IDs
upfront, before the obfuscated build even starts.
This fixes all of our problems. Now we know all garble build IDs
upfront, so a bunch of hacks can be entirely removed. Plus, since we
know them upfront, we can also cache them and avoid repeated calls to
"go tool buildid".
While at it, make use of the new BuildID field in Go 1.16's "list -json
-export". This avoids the vast majority of "go tool buildid" calls, as
the only ones that remain are 2 on the garble binary itself.
The numbers for Go 1.16 look very good:
name old time/op new time/op delta
Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6)
name old bin-B new bin-B delta
Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6)
name old user-time/op new user-time/op delta
Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
4 years ago
|
|
|
|
// Basic information about the package being currently compiled or linked.
|
|
|
|
|
curPkg *listedPackage
|
|
|
|
|
)
|
|
|
|
|
|
wrap types.Importer to canonicalize import paths
The docs for go/importer.ForCompiler say:
The lookup function is called each time the resulting importer
needs to resolve an import path. In this mode the importer can
only be invoked with canonical import paths (not relative or
absolute ones); it is assumed that the translation to canonical
import paths is being done by the client of the importer.
We use a lookup func for two reasons: first, to support modules, and
second, to be able to use our information from "go list -json -export".
However, go/types does not canonicalize import paths before calling
ImportFrom. This is somewhat understandable; it doesn't know whether an
importer was created with a lookup func, and ImportFrom only requires
the input path to be canonicalized in that scenario. When the lookup
func is nil, the importer canonicalizes by itself via go/build.Import.
Before this change, the added crossbuild test would fail:
> garble build net/http
[stderr]
# vendor/golang.org/x/crypto/chacha20
typecheck error: /usr/lib/go/src/vendor/golang.org/x/crypto/chacha20/chacha_generic.go:10:2: could not import crypto/cipher (can't find import: "crypto/cipher")
# vendor/golang.org/x/text/secure/bidirule
typecheck error: /usr/lib/go/src/vendor/golang.org/x/text/secure/bidirule/bidirule.go:12:2: could not import errors (can't find import: "errors")
# vendor/golang.org/x/crypto/cryptobyte
typecheck error: /usr/lib/go/src/vendor/golang.org/x/crypto/cryptobyte/asn1.go:8:16: could not import encoding/asn1 (can't find import: "encoding/asn1")
# vendor/golang.org/x/text/unicode/norm
typecheck error: /usr/lib/go/src/vendor/golang.org/x/text/unicode/norm/composition.go:7:8: could not import unicode/utf8 (can't find import: "unicode/utf8")
This is because we'd fall back to importer.Default, which only knows how
to find packages in $GOROOT/pkg. Those are missing for cross-builds,
unsurprisingly, as those built archives end up in the build cache.
After this change, we properly support importing std-vendored packages,
so we can get rid of the importer.Default workaround. And, by extension,
cross-builds now work as well.
Note that, in the added test script, the full build of the binary fails,
as there seems to be some sort of linker problem:
> garble build
[stderr]
# test/main
d9rqJyxo.uoqIiDs5: relocation target runtime.os9A16A3 not defined
We leave that as a TODO for now, as this change is subtle enough as it
is.
4 years ago
|
|
|
|
type importerWithMap func(path, dir string, mode types.ImportMode) (*types.Package, error)
|
|
|
|
|
|
|
|
|
|
func (fn importerWithMap) Import(path string) (*types.Package, error) {
|
|
|
|
|
panic("should never be called")
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
func (fn importerWithMap) ImportFrom(path, dir string, mode types.ImportMode) (*types.Package, error) {
|
|
|
|
|
if path2 := curPkg.ImportMap[path]; path2 != "" {
|
|
|
|
|
path = path2
|
|
|
|
|
}
|
|
|
|
|
return fn(path, dir, mode)
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// uniqueLineWriter sits underneath log.SetOutput to deduplicate log lines.
|
|
|
|
|
// We log bits of useful information for debugging,
|
|
|
|
|
// and logging the same detail twice is not going to help the user.
|
|
|
|
|
// Duplicates are relatively normal, given that names tend to repeat.
|
|
|
|
|
type uniqueLineWriter struct {
|
|
|
|
|
out io.Writer
|
|
|
|
|
seen map[string]bool
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
func (w *uniqueLineWriter) Write(p []byte) (n int, err error) {
|
|
|
|
|
if !flagDebug {
|
|
|
|
|
panic("unexpected use of uniqueLineWriter with -debug unset")
|
|
|
|
|
}
|
|
|
|
|
if bytes.Count(p, []byte("\n")) != 1 {
|
|
|
|
|
panic(fmt.Sprintf("log write wasn't just one line: %q", p))
|
|
|
|
|
}
|
|
|
|
|
if w.seen[string(p)] {
|
|
|
|
|
return len(p), nil
|
|
|
|
|
}
|
|
|
|
|
if w.seen == nil {
|
|
|
|
|
w.seen = make(map[string]bool)
|
|
|
|
|
}
|
|
|
|
|
w.seen[string(p)] = true
|
|
|
|
|
return w.out.Write(p)
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// debugSince is like time.Since but resulting in shorter output.
|
|
|
|
|
// A build process takes at least hundreds of milliseconds,
|
|
|
|
|
// so extra decimal points in the order of microseconds aren't meaningful.
|
|
|
|
|
func debugSince(start time.Time) time.Duration {
|
|
|
|
|
return time.Since(start).Truncate(10 * time.Microsecond)
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
func main1() int {
|
|
|
|
|
defer func() {
|
|
|
|
|
if os.Getenv("GARBLE_WRITE_ALLOCS") != "true" {
|
|
|
|
|
return
|
|
|
|
|
}
|
|
|
|
|
var memStats runtime.MemStats
|
|
|
|
|
runtime.ReadMemStats(&memStats)
|
|
|
|
|
fmt.Fprintf(os.Stderr, "garble allocs: %d\n", memStats.Mallocs)
|
|
|
|
|
}()
|
|
|
|
|
if err := flagSet.Parse(os.Args[1:]); err != nil {
|
|
|
|
|
return 2
|
|
|
|
|
}
|
|
|
|
|
log.SetPrefix("[garble] ")
|
|
|
|
|
log.SetFlags(0) // no timestamps, as they aren't very useful
|
|
|
|
|
if flagDebug {
|
|
|
|
|
// TODO: cover this in the tests.
|
|
|
|
|
log.SetOutput(&uniqueLineWriter{out: os.Stderr})
|
|
|
|
|
} else {
|
|
|
|
|
log.SetOutput(io.Discard)
|
|
|
|
|
}
|
|
|
|
|
args := flagSet.Args()
|
|
|
|
|
if len(args) < 1 {
|
|
|
|
|
usage()
|
|
|
|
|
return 2
|
|
|
|
|
}
|
|
|
|
|
if err := mainErr(args); err != nil {
|
|
|
|
|
if code, ok := err.(errJustExit); ok {
|
|
|
|
|
return int(code)
|
|
|
|
|
}
|
|
|
|
|
fmt.Fprintln(os.Stderr, err)
|
|
|
|
|
|
|
|
|
|
// If the build failed and a random seed was used,
|
|
|
|
|
// the failure might not reproduce with a different seed.
|
|
|
|
|
// Print it before we exit.
|
unify the definition and storage of flag values
The parent garble process parses the original flags,
as provided by the user via the command line.
Previously, those got stored in the shared cache file,
so that child processes spawned by toolexec could see them.
Unfortunately, this made the code relatively easy to misuse.
A child process would always see flagLiterals as zero value,
given that it should never see such a flag argument directly.
Similarly, one would have to be careful with cached options,
as they could only be consumed after the cache file is loaded.
Simplify the situation by deduplicating the storage of flags.
Now, the parent passes all flags onto children via toolexec.
One exception is GarbleDir, which now becomes an env var.
This seems in line with other top-level dirs like GARBLE_SHARED.
Finally, we turn -seed into a flag.Value,
which lets us implement its "set" behavior as part of flag.Parse.
Overall, we barely reduce the amount of code involved,
but we certainly remove a couple of footguns.
As part of the cleanup, we also introduce appendFlags.
3 years ago
|
|
|
|
if flagSeed.random {
|
|
|
|
|
fmt.Fprintf(os.Stderr, "random seed: %s\n", base64.RawStdEncoding.EncodeToString(flagSeed.bytes))
|
|
|
|
|
}
|
|
|
|
|
return 1
|
|
|
|
|
}
|
|
|
|
|
return 0
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
type errJustExit int
|
|
|
|
|
|
|
|
|
|
func (e errJustExit) Error() string { return fmt.Sprintf("exit: %d", e) }
|
|
|
|
|
|
|
|
|
|
// toolchainVersionSemver is a semver-compatible version of the Go toolchain currently
|
|
|
|
|
// being used, as reported by "go env GOVERSION".
|
|
|
|
|
// Note that the version of Go that built the garble binary might be newer.
|
|
|
|
|
var toolchainVersionSemver string
|
|
|
|
|
|
|
|
|
|
func goVersionOK() bool {
|
|
|
|
|
const (
|
|
|
|
|
minGoVersionSemver = "v1.19.0"
|
|
|
|
|
suggestedGoVersion = "1.19.x"
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
// rxVersion looks for a version like "go1.2" or "go1.2.3"
|
|
|
|
|
rxVersion := regexp.MustCompile(`go\d+\.\d+(?:\.\d+)?`)
|
|
|
|
|
|
|
|
|
|
toolchainVersionFull := cache.GoEnv.GOVERSION
|
|
|
|
|
toolchainVersion := rxVersion.FindString(cache.GoEnv.GOVERSION)
|
|
|
|
|
if toolchainVersion == "" {
|
use "go env -json" to collect env info all at once
In the worst case scenario, when GOPRIVATE isn't set at all, we would
run these three commands:
* "go env GOPRIVATE", to fetch GOPRIVATE itself
* "go list -m", for GOPRIVATE's fallback
* "go version", to check the version of Go being used
Now that we support Go 1.16 and later, all these three can be obtained
via "go env -json":
$ go env -json GOPRIVATE GOMOD GOVERSION
{
"GOMOD": "/home/mvdan/src/garble/go.mod",
"GOPRIVATE": "",
"GOVERSION": "go1.16.3"
}
Note that we don't get the module path directly, but we can use the
x/mod/modfile Go API to parse it from the GOMOD file cheaply.
Notably, this also simplifies our Go version checking logic, as now we
get just the version string without the "go version" prefix and
"GOOS/GOARCH" suffix we don't care about.
This makes our code a bit more maintainable and robust. When running a
short incremental build, we can also see a small speed-up, as saving two
"go" invocations can save a few milliseconds:
name old time/op new time/op delta
Build/Cache-8 168ms ± 0% 166ms ± 1% -1.26% (p=0.009 n=6+6)
name old bin-B new bin-B delta
Build/Cache-8 6.36M ± 0% 6.36M ± 0% +0.12% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build/Cache-8 222ms ± 2% 219ms ± 3% ~ (p=0.589 n=6+6)
name old user-time/op new user-time/op delta
Build/Cache-8 857ms ± 1% 846ms ± 1% -1.31% (p=0.041 n=6+6)
4 years ago
|
|
|
|
// Go 1.15.x and older do not have GOVERSION yet.
|
|
|
|
|
// We could go the extra mile and fetch it via 'go toolchainVersion',
|
use "go env -json" to collect env info all at once
In the worst case scenario, when GOPRIVATE isn't set at all, we would
run these three commands:
* "go env GOPRIVATE", to fetch GOPRIVATE itself
* "go list -m", for GOPRIVATE's fallback
* "go version", to check the version of Go being used
Now that we support Go 1.16 and later, all these three can be obtained
via "go env -json":
$ go env -json GOPRIVATE GOMOD GOVERSION
{
"GOMOD": "/home/mvdan/src/garble/go.mod",
"GOPRIVATE": "",
"GOVERSION": "go1.16.3"
}
Note that we don't get the module path directly, but we can use the
x/mod/modfile Go API to parse it from the GOMOD file cheaply.
Notably, this also simplifies our Go version checking logic, as now we
get just the version string without the "go version" prefix and
"GOOS/GOARCH" suffix we don't care about.
This makes our code a bit more maintainable and robust. When running a
short incremental build, we can also see a small speed-up, as saving two
"go" invocations can save a few milliseconds:
name old time/op new time/op delta
Build/Cache-8 168ms ± 0% 166ms ± 1% -1.26% (p=0.009 n=6+6)
name old bin-B new bin-B delta
Build/Cache-8 6.36M ± 0% 6.36M ± 0% +0.12% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build/Cache-8 222ms ± 2% 219ms ± 3% ~ (p=0.589 n=6+6)
name old user-time/op new user-time/op delta
Build/Cache-8 857ms ± 1% 846ms ± 1% -1.31% (p=0.041 n=6+6)
4 years ago
|
|
|
|
// but we'd have to error anyway.
|
|
|
|
|
fmt.Fprintf(os.Stderr, "Go version is too old; please upgrade to Go %s or newer\n", suggestedGoVersion)
|
|
|
|
|
return false
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
toolchainVersionSemver = "v" + strings.TrimPrefix(toolchainVersion, "go")
|
|
|
|
|
if semver.Compare(toolchainVersionSemver, minGoVersionSemver) < 0 {
|
|
|
|
|
fmt.Fprintf(os.Stderr, "Go version %q is too old; please upgrade to Go %s or newer\n", toolchainVersionFull, suggestedGoVersion)
|
|
|
|
|
return false
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Ensure that the version of Go that built the garble binary is equal or
|
|
|
|
|
// newer than toolchainVersionSemver.
|
|
|
|
|
builtVersionFull := os.Getenv("GARBLE_TEST_GOVERSION")
|
|
|
|
|
if builtVersionFull == "" {
|
|
|
|
|
builtVersionFull = runtime.Version()
|
|
|
|
|
}
|
|
|
|
|
builtVersion := rxVersion.FindString(builtVersionFull)
|
|
|
|
|
if builtVersion == "" {
|
|
|
|
|
// If garble built itself, we don't know what Go version was used.
|
|
|
|
|
// Fall back to not performing the check against the toolchain version.
|
|
|
|
|
return true
|
|
|
|
|
}
|
|
|
|
|
builtVersionSemver := "v" + strings.TrimPrefix(builtVersion, "go")
|
|
|
|
|
if semver.Compare(builtVersionSemver, toolchainVersionSemver) < 0 {
|
|
|
|
|
fmt.Fprintf(os.Stderr, "garble was built with %q and is being used with %q; please rebuild garble with the newer version\n",
|
|
|
|
|
builtVersionFull, toolchainVersionFull)
|
|
|
|
|
return false
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
return true
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
func mainErr(args []string) error {
|
|
|
|
|
command, args := args[0], args[1:]
|
|
|
|
|
|
|
|
|
|
// Catch users reaching for `go build -toolexec=garble`.
|
|
|
|
|
if command != "toolexec" && len(args) == 1 && args[0] == "-V=full" {
|
|
|
|
|
return fmt.Errorf(`did you run "go [command] -toolexec=garble" instead of "garble [command]"?`)
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
switch command {
|
|
|
|
|
case "help":
|
|
|
|
|
if hasHelpFlag(args) || len(args) > 1 {
|
|
|
|
|
fmt.Fprintf(os.Stderr, "usage: garble help [command]\n")
|
|
|
|
|
return errJustExit(2)
|
|
|
|
|
}
|
|
|
|
|
if len(args) == 1 {
|
|
|
|
|
return mainErr([]string{args[0], "-h"})
|
|
|
|
|
}
|
|
|
|
|
usage()
|
|
|
|
|
return errJustExit(2)
|
|
|
|
|
case "version":
|
|
|
|
|
if hasHelpFlag(args) || len(args) > 0 {
|
|
|
|
|
fmt.Fprintf(os.Stderr, "usage: garble version\n")
|
|
|
|
|
return errJustExit(2)
|
|
|
|
|
}
|
|
|
|
|
info, ok := debug.ReadBuildInfo()
|
|
|
|
|
if !ok {
|
|
|
|
|
// The build binary was stripped of build info?
|
|
|
|
|
// Could be the case if garble built itself.
|
|
|
|
|
fmt.Println("unknown")
|
|
|
|
|
return nil
|
|
|
|
|
}
|
|
|
|
|
mod := &info.Main
|
|
|
|
|
if mod.Replace != nil {
|
|
|
|
|
mod = mod.Replace
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// For the tests.
|
|
|
|
|
if v := os.Getenv("GARBLE_TEST_BUILDSETTINGS"); v != "" {
|
|
|
|
|
var extra []debug.BuildSetting
|
|
|
|
|
if err := json.Unmarshal([]byte(v), &extra); err != nil {
|
|
|
|
|
return err
|
|
|
|
|
}
|
|
|
|
|
info.Settings = append(info.Settings, extra...)
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Until https://github.com/golang/go/issues/50603 is implemented,
|
|
|
|
|
// manually construct something like a pseudo-version.
|
|
|
|
|
// TODO: remove when this code is dead, hopefully in Go 1.20.
|
|
|
|
|
if mod.Version == "(devel)" {
|
|
|
|
|
var vcsTime time.Time
|
|
|
|
|
var vcsRevision string
|
|
|
|
|
for _, setting := range info.Settings {
|
|
|
|
|
switch setting.Key {
|
|
|
|
|
case "vcs.time":
|
|
|
|
|
// If the format is invalid, we'll print a zero timestamp.
|
|
|
|
|
vcsTime, _ = time.Parse(time.RFC3339Nano, setting.Value)
|
|
|
|
|
case "vcs.revision":
|
|
|
|
|
vcsRevision = setting.Value
|
|
|
|
|
if len(vcsRevision) > 12 {
|
|
|
|
|
vcsRevision = vcsRevision[:12]
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
if vcsRevision != "" {
|
|
|
|
|
mod.Version = module.PseudoVersion("", "", vcsTime, vcsRevision)
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
fmt.Printf("%s %s\n\n", mod.Path, mod.Version)
|
|
|
|
|
fmt.Printf("Build settings:\n")
|
|
|
|
|
for _, setting := range info.Settings {
|
|
|
|
|
if setting.Value == "" {
|
|
|
|
|
continue // do empty build settings even matter?
|
|
|
|
|
}
|
|
|
|
|
// The padding helps keep readability by aligning:
|
|
|
|
|
//
|
|
|
|
|
// veryverylong.key value
|
|
|
|
|
// short.key some-other-value
|
|
|
|
|
//
|
|
|
|
|
// Empirically, 16 is enough; the longest key seen is "vcs.revision".
|
|
|
|
|
fmt.Printf("%16s %s\n", setting.Key, setting.Value)
|
|
|
|
|
}
|
|
|
|
|
return nil
|
|
|
|
|
case "reverse":
|
|
|
|
|
return commandReverse(args)
|
fix and re-enable "garble test" (#268)
With the many refactors building up to v0.1.0, we broke "garble test" as
we no longer dealt with test packages well.
Luckily, now that we can depend on TOOLEXEC_IMPORTPATH, we can support
the test command again, as we can always figure out what package we're
currently compiling, without having to track a "main" package.
Note that one major pitfall there is test packages, where
TOOLEXEC_IMPORTPATH does not agree with ImportPath from "go list -json".
However, we can still work around that with a bit of glue code, which is
also copiously documented.
The second change necessary is to consider test packages private
depending on whether their non-test package is private or not. This can
be done via the ForTest field in "go list -json".
The third change is to obfuscate "_testmain.go" files, which are the
code-generated main functions which actually run tests. We used to not
need to obfuscate them, since test function names are never obfuscated
and we used to not obfuscate import paths at compilation time. Now we do
rewrite import paths, so we must do that for "_testmain.go" too.
The fourth change is to re-enable test.txt, and expand it with more
sanity checks and edge cases.
Finally, document "garble test" again.
Fixes #241.
4 years ago
|
|
|
|
case "build", "test":
|
|
|
|
|
cmd, err := toolexecCmd(command, args)
|
actually remove temporary directories after obfuscation
Back in February 2021, we changed the obfuscation logic so that the
entire `garble build` process would use one shared temporary directory
across all package builds, reducing the amount of files we created in
the top-level system temporary directory.
However, we made one mistake: we didn't swap os.Remove for os.RemoveAll.
Ever since then, we've been leaving temporary files behind.
Add regression tests, which failed before the fix, and fix the bug.
Note that we need to test `garble reverse` as well, as it calls
toolexecCmd separately, so it needs its own cleanup as well.
The cleanup happens via the env var, which doesn't feel worse than
having toolexecCmd return an extra string or cleanup func.
While here, also test that we support TMPDIRs with special characters.
3 years ago
|
|
|
|
defer os.RemoveAll(os.Getenv("GARBLE_SHARED"))
|
|
|
|
|
if err != nil {
|
|
|
|
|
return err
|
|
|
|
|
}
|
|
|
|
|
cmd.Stdout = os.Stdout
|
|
|
|
|
cmd.Stderr = os.Stderr
|
|
|
|
|
log.Printf("calling via toolexec: %s", cmd)
|
|
|
|
|
return cmd.Run()
|
|
|
|
|
|
|
|
|
|
case "toolexec":
|
|
|
|
|
// We're in a toolexec sub-process, not directly called by the user.
|
|
|
|
|
// Load the shared data and wrap the tool, like the compiler or linker.
|
|
|
|
|
if err := loadSharedCache(); err != nil {
|
|
|
|
|
return err
|
|
|
|
|
}
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
|
|
|
|
|
|
_, tool := filepath.Split(args[0])
|
|
|
|
|
if runtime.GOOS == "windows" {
|
|
|
|
|
tool = strings.TrimSuffix(tool, ".exe")
|
|
|
|
|
}
|
|
|
|
|
if len(args) == 2 && args[1] == "-V=full" {
|
|
|
|
|
return alterToolVersion(tool, args)
|
|
|
|
|
}
|
fix and re-enable "garble test" (#268)
With the many refactors building up to v0.1.0, we broke "garble test" as
we no longer dealt with test packages well.
Luckily, now that we can depend on TOOLEXEC_IMPORTPATH, we can support
the test command again, as we can always figure out what package we're
currently compiling, without having to track a "main" package.
Note that one major pitfall there is test packages, where
TOOLEXEC_IMPORTPATH does not agree with ImportPath from "go list -json".
However, we can still work around that with a bit of glue code, which is
also copiously documented.
The second change necessary is to consider test packages private
depending on whether their non-test package is private or not. This can
be done via the ForTest field in "go list -json".
The third change is to obfuscate "_testmain.go" files, which are the
code-generated main functions which actually run tests. We used to not
need to obfuscate them, since test function names are never obfuscated
and we used to not obfuscate import paths at compilation time. Now we do
rewrite import paths, so we must do that for "_testmain.go" too.
The fourth change is to re-enable test.txt, and expand it with more
sanity checks and edge cases.
Finally, document "garble test" again.
Fixes #241.
4 years ago
|
|
|
|
|
|
|
|
|
toolexecImportPath := os.Getenv("TOOLEXEC_IMPORTPATH")
|
|
|
|
|
curPkg = cache.ListedPackages[toolexecImportPath]
|
|
|
|
|
if curPkg == nil {
|
|
|
|
|
return fmt.Errorf("TOOLEXEC_IMPORTPATH not found in listed packages: %s", toolexecImportPath)
|
|
|
|
|
}
|
refactor "current package" with TOOLEXEC_IMPORTPATH (#266)
Now that we've dropped support for Go 1.15.x, we can finally rely on
this environment variable for toolexec calls, present in Go 1.16.
Before, we had hacky ways of trying to figure out the current package's
import path, mostly from the -p flag. The biggest rough edge there was
that, for main packages, that was simply the package name, and not its
full import path.
To work around that, we had a restriction on a single main package, so
we could work around that issue. That restriction is now gone.
The new code is simpler, especially because we can set curPkg in a
single place for all toolexec transform funcs.
Since we can always rely on curPkg not being nil now, we can also start
reusing listedPackage.Private and avoid the majority of repeated calls
to isPrivate. The function is cheap, but still not free.
isPrivate itself can also get simpler. We no longer have to worry about
the "main" edge case. Plus, the sanity check for invalid package paths
is now unnecessary; we only got malformed paths from goobj2, and we now
require exact matches with the ImportPath field from "go list -json".
Another effect of clearing up the "main" edge case is that -debugdir now
uses the right directory for main packages. We also start using
consistent debugdir paths in the tests, for the sake of being easier to
read and maintain.
Finally, note that commandReverse did not need the extra call to "go
list -toolexec", as the "shared" call stored in the cache is enough. We
still call toolexecCmd to get said cache, which should probably be
simplified in a future PR.
While at it, replace the use of the "-std" compiler flag with the
Standard field from "go list -json".
4 years ago
|
|
|
|
|
|
|
|
|
transform := transformFuncs[tool]
|
|
|
|
|
transformed := args[1:]
|
|
|
|
|
if transform != nil {
|
|
|
|
|
startTime := time.Now()
|
|
|
|
|
log.Printf("transforming %s with args: %s", tool, strings.Join(transformed, " "))
|
|
|
|
|
var err error
|
|
|
|
|
if transformed, err = transform(transformed); err != nil {
|
|
|
|
|
return err
|
|
|
|
|
}
|
|
|
|
|
log.Printf("transformed args for %s in %s: %s", tool, debugSince(startTime), strings.Join(transformed, " "))
|
|
|
|
|
} else {
|
|
|
|
|
log.Printf("skipping transform on %s with args: %s", tool, strings.Join(transformed, " "))
|
|
|
|
|
}
|
|
|
|
|
cmd := exec.Command(args[0], transformed...)
|
|
|
|
|
cmd.Stdout = os.Stdout
|
|
|
|
|
cmd.Stderr = os.Stderr
|
|
|
|
|
if err := cmd.Run(); err != nil {
|
|
|
|
|
return err
|
|
|
|
|
}
|
|
|
|
|
return nil
|
|
|
|
|
default:
|
|
|
|
|
return fmt.Errorf("unknown command: %q", command)
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
func hasHelpFlag(flags []string) bool {
|
|
|
|
|
for _, f := range flags {
|
|
|
|
|
switch f {
|
|
|
|
|
case "-h", "-help", "--help":
|
|
|
|
|
return true
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
return false
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// toolexecCmd builds an *exec.Cmd which is set up for running "go <command>"
|
|
|
|
|
// with -toolexec=garble and the supplied arguments.
|
|
|
|
|
//
|
|
|
|
|
// Note that it uses and modifies global state; in general, it should only be
|
|
|
|
|
// called once from mainErr in the top-level garble process.
|
|
|
|
|
func toolexecCmd(command string, args []string) (*exec.Cmd, error) {
|
|
|
|
|
// Split the flags from the package arguments, since we'll need
|
|
|
|
|
// to run 'go list' on the same set of packages.
|
|
|
|
|
flags, args := splitFlagsFromArgs(args)
|
|
|
|
|
if hasHelpFlag(flags) {
|
|
|
|
|
out, _ := exec.Command("go", command, "-h").CombinedOutput()
|
|
|
|
|
fmt.Fprintf(os.Stderr, `
|
|
|
|
|
usage: garble [garble flags] %s [arguments]
|
|
|
|
|
|
|
|
|
|
This command wraps "go %s". Below is its help:
|
|
|
|
|
|
|
|
|
|
%s`[1:], command, command, out)
|
|
|
|
|
return nil, errJustExit(2)
|
|
|
|
|
}
|
|
|
|
|
for _, flag := range flags {
|
|
|
|
|
if rxGarbleFlag.MatchString(flag) {
|
|
|
|
|
return nil, fmt.Errorf("garble flags must precede command, like: garble %s build ./pkg", flag)
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
start using original action IDs (#251)
When we obfuscate a name, what we do is hash the name with the action ID
of the package that contains the name. To ensure that the hash changes
if the garble tool changes, we used the action ID of the obfuscated
build, which is different than the original action ID, as we include
garble's own content ID in "go tool compile -V=full" via -toolexec.
Let's call that the "obfuscated action ID". Remember that a content ID
is roughly the hash of a binary or object file, and an action ID
contains the hash of a package's source code plus the content IDs of its
dependencies.
This had the advantage that it did what we wanted. However, it had one
massive drawback: when we compile a package, we only have the obfuscated
action IDs of its dependencies. This is because one can't have the
content ID of dependent packages before they are built.
Usually, this is not a problem, because hashing a foreign name means it
comes from a dependency, where we already have the obfuscated action ID.
However, that's not always the case.
First, go:linkname directives can point to any symbol that ends up in
the binary, even if the package is not a dependency. So garble could
only support linkname targets belonging to dependencies. This is at the
root of why we could not obfuscate the runtime; it contains linkname
directives targeting the net package, for example, which depends on runtime.
Second, some other places did not have an easy access to obfuscated
action IDs, like transformAsm, which had to recover it from a temporary
file stored by transformCompile.
Plus, this was all pretty expensive, as each toolexec sub-process had to
make repeated calls to buildidOf with the object files of dependencies.
We even had to use extra calls to "go list" in the case of indirect
dependencies, as their export files do not appear in importcfg files.
All in all, the old method was complex and expensive. A better mechanism
is to use the original action IDs directly, as listed by "go list"
without garble in the picture.
This would mean that the hashing does not change if garble changes,
meaning weaker obfuscation. To regain that property, we define the
"garble action ID", which is just the original action ID hashed together
with garble's own content ID.
This is practically the same as the obfuscated build ID we used before,
but since it doesn't go through "go tool compile -V=full" and the
obfuscated build itself, we can work out *all* the garble action IDs
upfront, before the obfuscated build even starts.
This fixes all of our problems. Now we know all garble build IDs
upfront, so a bunch of hacks can be entirely removed. Plus, since we
know them upfront, we can also cache them and avoid repeated calls to
"go tool buildid".
While at it, make use of the new BuildID field in Go 1.16's "list -json
-export". This avoids the vast majority of "go tool buildid" calls, as
the only ones that remain are 2 on the garble binary itself.
The numbers for Go 1.16 look very good:
name old time/op new time/op delta
Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6)
name old bin-B new bin-B delta
Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6)
name old user-time/op new user-time/op delta
Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
4 years ago
|
|
|
|
// Here is the only place we initialize the cache.
|
|
|
|
|
// The sub-processes will parse it from a shared gob file.
|
unify the definition and storage of flag values
The parent garble process parses the original flags,
as provided by the user via the command line.
Previously, those got stored in the shared cache file,
so that child processes spawned by toolexec could see them.
Unfortunately, this made the code relatively easy to misuse.
A child process would always see flagLiterals as zero value,
given that it should never see such a flag argument directly.
Similarly, one would have to be careful with cached options,
as they could only be consumed after the cache file is loaded.
Simplify the situation by deduplicating the storage of flags.
Now, the parent passes all flags onto children via toolexec.
One exception is GarbleDir, which now becomes an env var.
This seems in line with other top-level dirs like GARBLE_SHARED.
Finally, we turn -seed into a flag.Value,
which lets us implement its "set" behavior as part of flag.Parse.
Overall, we barely reduce the amount of code involved,
but we certainly remove a couple of footguns.
As part of the cleanup, we also introduce appendFlags.
3 years ago
|
|
|
|
cache = &sharedCache{}
|
start using original action IDs (#251)
When we obfuscate a name, what we do is hash the name with the action ID
of the package that contains the name. To ensure that the hash changes
if the garble tool changes, we used the action ID of the obfuscated
build, which is different than the original action ID, as we include
garble's own content ID in "go tool compile -V=full" via -toolexec.
Let's call that the "obfuscated action ID". Remember that a content ID
is roughly the hash of a binary or object file, and an action ID
contains the hash of a package's source code plus the content IDs of its
dependencies.
This had the advantage that it did what we wanted. However, it had one
massive drawback: when we compile a package, we only have the obfuscated
action IDs of its dependencies. This is because one can't have the
content ID of dependent packages before they are built.
Usually, this is not a problem, because hashing a foreign name means it
comes from a dependency, where we already have the obfuscated action ID.
However, that's not always the case.
First, go:linkname directives can point to any symbol that ends up in
the binary, even if the package is not a dependency. So garble could
only support linkname targets belonging to dependencies. This is at the
root of why we could not obfuscate the runtime; it contains linkname
directives targeting the net package, for example, which depends on runtime.
Second, some other places did not have an easy access to obfuscated
action IDs, like transformAsm, which had to recover it from a temporary
file stored by transformCompile.
Plus, this was all pretty expensive, as each toolexec sub-process had to
make repeated calls to buildidOf with the object files of dependencies.
We even had to use extra calls to "go list" in the case of indirect
dependencies, as their export files do not appear in importcfg files.
All in all, the old method was complex and expensive. A better mechanism
is to use the original action IDs directly, as listed by "go list"
without garble in the picture.
This would mean that the hashing does not change if garble changes,
meaning weaker obfuscation. To regain that property, we define the
"garble action ID", which is just the original action ID hashed together
with garble's own content ID.
This is practically the same as the obfuscated build ID we used before,
but since it doesn't go through "go tool compile -V=full" and the
obfuscated build itself, we can work out *all* the garble action IDs
upfront, before the obfuscated build even starts.
This fixes all of our problems. Now we know all garble build IDs
upfront, so a bunch of hacks can be entirely removed. Plus, since we
know them upfront, we can also cache them and avoid repeated calls to
"go tool buildid".
While at it, make use of the new BuildID field in Go 1.16's "list -json
-export". This avoids the vast majority of "go tool buildid" calls, as
the only ones that remain are 2 on the garble binary itself.
The numbers for Go 1.16 look very good:
name old time/op new time/op delta
Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6)
name old bin-B new bin-B delta
Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6)
name old user-time/op new user-time/op delta
Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
4 years ago
|
|
|
|
|
|
|
|
|
// Note that we also need to pass build flags to 'go list', such
|
|
|
|
|
// as -tags.
|
fail if we are unexpectedly overwriting files (#418)
While investigating a bug report,
I noticed that garble was writing to the same temp file twice.
At best, writing to the same path on disk twice is wasteful,
as the design is careful to be deterministic and use unique paths.
At worst, the two writes could cause races at the filesystem level.
To prevent either of those situations,
we now create files with os.OpenFile and os.O_EXCL,
meaning that we will error if the file already exists.
That change uncovered a number of such unintended cases.
First, transformAsm would write obfuscated Go files twice.
This is because the Go toolchain actually runs:
[...]/asm -gensymabis [...] foo.s bar.s
[...]/asm [...] foo.s bar.s
That is, the first run is only meant to generate symbol ABIs,
which are then used by the compiler.
We need to obfuscate at that first stage,
because the symbol ABI descriptions need to use obfuscated names.
However, having already obfuscated the assembly on the first stage,
there is no need to do so again on the second stage.
If we detect gensymabis is missing, we simply reuse the previous files.
This first situation doesn't seem racy,
but obfuscating the Go assembly files twice is certainly unnecessary.
Second, saveKnownReflectAPIs wrote a gob file to the build cache.
Since the build cache can be kept between builds,
and since the build cache uses reproducible paths for each build,
running the same "garble build" twice could overwrite those files.
This could actually cause races at the filesystem level;
if two concurrent builds write to the same gob file on disk,
one of them could end up using a partially-written file.
Note that this is the only of the three cases not using temporary files.
As such, it is expected that the file may already exist.
In such a case, we simply avoid overwriting it rather than failing.
Third, when "garble build -a" was used,
and when we needed an export file not listed in importcfg,
we would end up calling roughly:
go list -export -toolexec=garble -a <dependency>
This meant we would re-build and re-obfuscate those packages.
Which is unfortunate, because the parent process already did via:
go build -toolexec=garble -a <main>
The repeated dependency builds tripped the new os.O_EXCL check,
as we would try to overwrite the same obfuscated Go files.
Beyond being wasteful, this could again cause subtle filesystem races.
To fix the problem, avoid passing flags like "-a" to nested go commands.
Overall, we should likely be using safer ways to write to disk,
be it via either atomic writes or locked files.
However, for now, catching duplicate writes is a big step.
I have left a self-assigned TODO for further improvements.
CI on the pull request found a failure on test-gotip.
The failure reproduces on master, so it seems to be related to gotip,
and not a regression introduced by this change.
For now, disable test-gotip until we can investigate.
3 years ago
|
|
|
|
cache.ForwardBuildFlags, _ = filterForwardBuildFlags(flags)
|
|
|
|
|
if command == "test" {
|
fail if we are unexpectedly overwriting files (#418)
While investigating a bug report,
I noticed that garble was writing to the same temp file twice.
At best, writing to the same path on disk twice is wasteful,
as the design is careful to be deterministic and use unique paths.
At worst, the two writes could cause races at the filesystem level.
To prevent either of those situations,
we now create files with os.OpenFile and os.O_EXCL,
meaning that we will error if the file already exists.
That change uncovered a number of such unintended cases.
First, transformAsm would write obfuscated Go files twice.
This is because the Go toolchain actually runs:
[...]/asm -gensymabis [...] foo.s bar.s
[...]/asm [...] foo.s bar.s
That is, the first run is only meant to generate symbol ABIs,
which are then used by the compiler.
We need to obfuscate at that first stage,
because the symbol ABI descriptions need to use obfuscated names.
However, having already obfuscated the assembly on the first stage,
there is no need to do so again on the second stage.
If we detect gensymabis is missing, we simply reuse the previous files.
This first situation doesn't seem racy,
but obfuscating the Go assembly files twice is certainly unnecessary.
Second, saveKnownReflectAPIs wrote a gob file to the build cache.
Since the build cache can be kept between builds,
and since the build cache uses reproducible paths for each build,
running the same "garble build" twice could overwrite those files.
This could actually cause races at the filesystem level;
if two concurrent builds write to the same gob file on disk,
one of them could end up using a partially-written file.
Note that this is the only of the three cases not using temporary files.
As such, it is expected that the file may already exist.
In such a case, we simply avoid overwriting it rather than failing.
Third, when "garble build -a" was used,
and when we needed an export file not listed in importcfg,
we would end up calling roughly:
go list -export -toolexec=garble -a <dependency>
This meant we would re-build and re-obfuscate those packages.
Which is unfortunate, because the parent process already did via:
go build -toolexec=garble -a <main>
The repeated dependency builds tripped the new os.O_EXCL check,
as we would try to overwrite the same obfuscated Go files.
Beyond being wasteful, this could again cause subtle filesystem races.
To fix the problem, avoid passing flags like "-a" to nested go commands.
Overall, we should likely be using safer ways to write to disk,
be it via either atomic writes or locked files.
However, for now, catching duplicate writes is a big step.
I have left a self-assigned TODO for further improvements.
CI on the pull request found a failure on test-gotip.
The failure reproduces on master, so it seems to be related to gotip,
and not a regression introduced by this change.
For now, disable test-gotip until we can investigate.
3 years ago
|
|
|
|
cache.ForwardBuildFlags = append(cache.ForwardBuildFlags, "-test")
|
|
|
|
|
}
|
|
|
|
|
|
use "go env -json" to collect env info all at once
In the worst case scenario, when GOPRIVATE isn't set at all, we would
run these three commands:
* "go env GOPRIVATE", to fetch GOPRIVATE itself
* "go list -m", for GOPRIVATE's fallback
* "go version", to check the version of Go being used
Now that we support Go 1.16 and later, all these three can be obtained
via "go env -json":
$ go env -json GOPRIVATE GOMOD GOVERSION
{
"GOMOD": "/home/mvdan/src/garble/go.mod",
"GOPRIVATE": "",
"GOVERSION": "go1.16.3"
}
Note that we don't get the module path directly, but we can use the
x/mod/modfile Go API to parse it from the GOMOD file cheaply.
Notably, this also simplifies our Go version checking logic, as now we
get just the version string without the "go version" prefix and
"GOOS/GOARCH" suffix we don't care about.
This makes our code a bit more maintainable and robust. When running a
short incremental build, we can also see a small speed-up, as saving two
"go" invocations can save a few milliseconds:
name old time/op new time/op delta
Build/Cache-8 168ms ± 0% 166ms ± 1% -1.26% (p=0.009 n=6+6)
name old bin-B new bin-B delta
Build/Cache-8 6.36M ± 0% 6.36M ± 0% +0.12% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build/Cache-8 222ms ± 2% 219ms ± 3% ~ (p=0.589 n=6+6)
name old user-time/op new user-time/op delta
Build/Cache-8 857ms ± 1% 846ms ± 1% -1.31% (p=0.041 n=6+6)
4 years ago
|
|
|
|
if err := fetchGoEnv(); err != nil {
|
|
|
|
|
return nil, err
|
|
|
|
|
}
|
|
|
|
|
|
use "go env -json" to collect env info all at once
In the worst case scenario, when GOPRIVATE isn't set at all, we would
run these three commands:
* "go env GOPRIVATE", to fetch GOPRIVATE itself
* "go list -m", for GOPRIVATE's fallback
* "go version", to check the version of Go being used
Now that we support Go 1.16 and later, all these three can be obtained
via "go env -json":
$ go env -json GOPRIVATE GOMOD GOVERSION
{
"GOMOD": "/home/mvdan/src/garble/go.mod",
"GOPRIVATE": "",
"GOVERSION": "go1.16.3"
}
Note that we don't get the module path directly, but we can use the
x/mod/modfile Go API to parse it from the GOMOD file cheaply.
Notably, this also simplifies our Go version checking logic, as now we
get just the version string without the "go version" prefix and
"GOOS/GOARCH" suffix we don't care about.
This makes our code a bit more maintainable and robust. When running a
short incremental build, we can also see a small speed-up, as saving two
"go" invocations can save a few milliseconds:
name old time/op new time/op delta
Build/Cache-8 168ms ± 0% 166ms ± 1% -1.26% (p=0.009 n=6+6)
name old bin-B new bin-B delta
Build/Cache-8 6.36M ± 0% 6.36M ± 0% +0.12% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build/Cache-8 222ms ± 2% 219ms ± 3% ~ (p=0.589 n=6+6)
name old user-time/op new user-time/op delta
Build/Cache-8 857ms ± 1% 846ms ± 1% -1.31% (p=0.041 n=6+6)
4 years ago
|
|
|
|
if !goVersionOK() {
|
|
|
|
|
return nil, errJustExit(1)
|
use "go env -json" to collect env info all at once
In the worst case scenario, when GOPRIVATE isn't set at all, we would
run these three commands:
* "go env GOPRIVATE", to fetch GOPRIVATE itself
* "go list -m", for GOPRIVATE's fallback
* "go version", to check the version of Go being used
Now that we support Go 1.16 and later, all these three can be obtained
via "go env -json":
$ go env -json GOPRIVATE GOMOD GOVERSION
{
"GOMOD": "/home/mvdan/src/garble/go.mod",
"GOPRIVATE": "",
"GOVERSION": "go1.16.3"
}
Note that we don't get the module path directly, but we can use the
x/mod/modfile Go API to parse it from the GOMOD file cheaply.
Notably, this also simplifies our Go version checking logic, as now we
get just the version string without the "go version" prefix and
"GOOS/GOARCH" suffix we don't care about.
This makes our code a bit more maintainable and robust. When running a
short incremental build, we can also see a small speed-up, as saving two
"go" invocations can save a few milliseconds:
name old time/op new time/op delta
Build/Cache-8 168ms ± 0% 166ms ± 1% -1.26% (p=0.009 n=6+6)
name old bin-B new bin-B delta
Build/Cache-8 6.36M ± 0% 6.36M ± 0% +0.12% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build/Cache-8 222ms ± 2% 219ms ± 3% ~ (p=0.589 n=6+6)
name old user-time/op new user-time/op delta
Build/Cache-8 857ms ± 1% 846ms ± 1% -1.31% (p=0.041 n=6+6)
4 years ago
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
var err error
|
|
|
|
|
cache.ExecPath, err = os.Executable()
|
|
|
|
|
if err != nil {
|
|
|
|
|
return nil, err
|
|
|
|
|
}
|
|
|
|
|
|
ensure the runtime is built in a reproducible way
We went to great lengths to ensure garble builds are reproducible.
This includes how the tool itself works,
as its behavior should be the same given the same inputs.
However, we made one crucial mistake with the runtime package.
It has go:linkname directives pointing at other packages,
and some of those pointed packages aren't its dependencies.
Imagine two scenarios where garble builds the runtime package:
1) We run "garble build runtime". The way we handle linkname directives
calls listPackage on the target package, to obfuscate the target's
import path and object name. However, since we only obtained build
info of runtime and its deps, calls for some linknames such as
listPackage("sync/atomic") will fail. The linkname directive will
leave its target untouched.
2) We run "garble build std". Unlike the first scenario, all listPackage
calls issued by runtime's linkname directives will succeed, so its
linkname directive targets will be obfuscated.
At best, this can result in inconsistent builds, depending on how the
runtime package was built. At worst, the mismatching object names can
result in errors at link time, if the target packages are actually used.
The modified test reproduces the worst case scenario reliably,
when the fix is reverted:
> env GOCACHE=${WORK}/gocache-empty
> garble build -a runtime
> garble build -o=out_rebuild ./stdimporter
[stderr]
# test/main/stdimporter
JZzQivnl.NtQJu0H3: relocation target JZzQivnl.iioHinYT not defined
JZzQivnl.NtQJu0H3.func9: relocation target JZzQivnl.yz5z0NaH not defined
JZzQivnl.(*ypvqhKiQ).String: relocation target JZzQivnl.eVciBQeI not defined
JZzQivnl.(*ypvqhKiQ).PkgPath: relocation target JZzQivnl.eVciBQeI not defined
[...]
The fix consists of two steps. First, if we're building the runtime and
listPackage fails on a package, that means we ran into scenario 1 above.
To avoid the inconsistency, we fill ListedPackages with "go list [...] std".
This means we'll always build runtime as described in scenario 2 above.
Second, when building packages other than the runtime,
we only allow listPackage to succeed if we're listing a dependency of
the current package.
This ensures we won't run into similar reproducibility bugs in the future.
Finally, re-enable test-gotip on CI since this was the last test flake.
3 years ago
|
|
|
|
binaryBuildID, err := buildidOf(cache.ExecPath)
|
|
|
|
|
if err != nil {
|
|
|
|
|
return nil, err
|
|
|
|
|
}
|
|
|
|
|
cache.BinaryContentID = decodeHash(splitContentID(binaryBuildID))
|
|
|
|
|
|
|
|
|
|
if err := appendListedPackages(args, true); err != nil {
|
|
|
|
|
return nil, err
|
|
|
|
|
}
|
|
|
|
|
|
start using original action IDs (#251)
When we obfuscate a name, what we do is hash the name with the action ID
of the package that contains the name. To ensure that the hash changes
if the garble tool changes, we used the action ID of the obfuscated
build, which is different than the original action ID, as we include
garble's own content ID in "go tool compile -V=full" via -toolexec.
Let's call that the "obfuscated action ID". Remember that a content ID
is roughly the hash of a binary or object file, and an action ID
contains the hash of a package's source code plus the content IDs of its
dependencies.
This had the advantage that it did what we wanted. However, it had one
massive drawback: when we compile a package, we only have the obfuscated
action IDs of its dependencies. This is because one can't have the
content ID of dependent packages before they are built.
Usually, this is not a problem, because hashing a foreign name means it
comes from a dependency, where we already have the obfuscated action ID.
However, that's not always the case.
First, go:linkname directives can point to any symbol that ends up in
the binary, even if the package is not a dependency. So garble could
only support linkname targets belonging to dependencies. This is at the
root of why we could not obfuscate the runtime; it contains linkname
directives targeting the net package, for example, which depends on runtime.
Second, some other places did not have an easy access to obfuscated
action IDs, like transformAsm, which had to recover it from a temporary
file stored by transformCompile.
Plus, this was all pretty expensive, as each toolexec sub-process had to
make repeated calls to buildidOf with the object files of dependencies.
We even had to use extra calls to "go list" in the case of indirect
dependencies, as their export files do not appear in importcfg files.
All in all, the old method was complex and expensive. A better mechanism
is to use the original action IDs directly, as listed by "go list"
without garble in the picture.
This would mean that the hashing does not change if garble changes,
meaning weaker obfuscation. To regain that property, we define the
"garble action ID", which is just the original action ID hashed together
with garble's own content ID.
This is practically the same as the obfuscated build ID we used before,
but since it doesn't go through "go tool compile -V=full" and the
obfuscated build itself, we can work out *all* the garble action IDs
upfront, before the obfuscated build even starts.
This fixes all of our problems. Now we know all garble build IDs
upfront, so a bunch of hacks can be entirely removed. Plus, since we
know them upfront, we can also cache them and avoid repeated calls to
"go tool buildid".
While at it, make use of the new BuildID field in Go 1.16's "list -json
-export". This avoids the vast majority of "go tool buildid" calls, as
the only ones that remain are 2 on the garble binary itself.
The numbers for Go 1.16 look very good:
name old time/op new time/op delta
Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6)
name old bin-B new bin-B delta
Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6)
name old user-time/op new user-time/op delta
Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
4 years ago
|
|
|
|
sharedTempDir, err = saveSharedCache()
|
|
|
|
|
if err != nil {
|
|
|
|
|
return nil, err
|
|
|
|
|
}
|
|
|
|
|
os.Setenv("GARBLE_SHARED", sharedTempDir)
|
unify the definition and storage of flag values
The parent garble process parses the original flags,
as provided by the user via the command line.
Previously, those got stored in the shared cache file,
so that child processes spawned by toolexec could see them.
Unfortunately, this made the code relatively easy to misuse.
A child process would always see flagLiterals as zero value,
given that it should never see such a flag argument directly.
Similarly, one would have to be careful with cached options,
as they could only be consumed after the cache file is loaded.
Simplify the situation by deduplicating the storage of flags.
Now, the parent passes all flags onto children via toolexec.
One exception is GarbleDir, which now becomes an env var.
This seems in line with other top-level dirs like GARBLE_SHARED.
Finally, we turn -seed into a flag.Value,
which lets us implement its "set" behavior as part of flag.Parse.
Overall, we barely reduce the amount of code involved,
but we certainly remove a couple of footguns.
As part of the cleanup, we also introduce appendFlags.
3 years ago
|
|
|
|
wd, err := os.Getwd()
|
|
|
|
|
if err != nil {
|
|
|
|
|
return nil, err
|
|
|
|
|
}
|
|
|
|
|
os.Setenv("GARBLE_PARENT_WORK", wd)
|
|
|
|
|
|
|
|
|
|
if flagDebugDir != "" {
|
|
|
|
|
if !filepath.IsAbs(flagDebugDir) {
|
|
|
|
|
flagDebugDir = filepath.Join(wd, flagDebugDir)
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if err := os.RemoveAll(flagDebugDir); err != nil {
|
|
|
|
|
return nil, fmt.Errorf("could not empty debugdir: %v", err)
|
|
|
|
|
}
|
|
|
|
|
if err := os.MkdirAll(flagDebugDir, 0o755); err != nil {
|
|
|
|
|
return nil, err
|
unify the definition and storage of flag values
The parent garble process parses the original flags,
as provided by the user via the command line.
Previously, those got stored in the shared cache file,
so that child processes spawned by toolexec could see them.
Unfortunately, this made the code relatively easy to misuse.
A child process would always see flagLiterals as zero value,
given that it should never see such a flag argument directly.
Similarly, one would have to be careful with cached options,
as they could only be consumed after the cache file is loaded.
Simplify the situation by deduplicating the storage of flags.
Now, the parent passes all flags onto children via toolexec.
One exception is GarbleDir, which now becomes an env var.
This seems in line with other top-level dirs like GARBLE_SHARED.
Finally, we turn -seed into a flag.Value,
which lets us implement its "set" behavior as part of flag.Parse.
Overall, we barely reduce the amount of code involved,
but we certainly remove a couple of footguns.
As part of the cleanup, we also introduce appendFlags.
3 years ago
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
goArgs := []string{
|
|
|
|
|
command,
|
|
|
|
|
"-trimpath",
|
|
|
|
|
"-buildvcs=false",
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Pass the garble flags down to each toolexec invocation.
|
|
|
|
|
// This way, all garble processes see the same flag values.
|
|
|
|
|
// Note that we can end up with a single argument to `go` in the form of:
|
|
|
|
|
//
|
|
|
|
|
// -toolexec='/binary dir/garble' -tiny toolexec
|
|
|
|
|
//
|
|
|
|
|
// We quote the absolute path to garble if it contains spaces.
|
|
|
|
|
// We can add extra flags to the end of the same -toolexec argument.
|
|
|
|
|
var toolexecFlag strings.Builder
|
|
|
|
|
toolexecFlag.WriteString("-toolexec=")
|
|
|
|
|
quotedExecPath, err := cmdgoQuotedJoin([]string{cache.ExecPath})
|
|
|
|
|
if err != nil {
|
|
|
|
|
// Can only happen if the absolute path to the garble binary contains
|
|
|
|
|
// both single and double quotes. Seems extremely unlikely.
|
|
|
|
|
return nil, err
|
|
|
|
|
}
|
|
|
|
|
toolexecFlag.WriteString(quotedExecPath)
|
|
|
|
|
appendFlags(&toolexecFlag, false)
|
|
|
|
|
toolexecFlag.WriteString(" toolexec")
|
|
|
|
|
goArgs = append(goArgs, toolexecFlag.String())
|
|
|
|
|
|
reimplement import path obfuscation without goobj2 (#242)
We used to rely on a parallel implementation of an object file parser
and writer to be able to obfuscate import paths. After compiling each
package, we would parse the object file, replace the import paths, and
write the updated object file in-place.
That worked well, in most cases. Unfortunately, it had some flaws:
* Complexity. Even when most of the code is maintained in a separate
module, the import_obfuscation.go file was still close to a thousand
lines of code.
* Go compatibility. The object file format changes between Go releases,
so we were supporting Go 1.15, but not 1.16. Fixing the object file
package to work with 1.16 would probably break 1.15 support.
* Bugs. For example, we recently had to add a workaround for #224, since
import paths containing dots after the domain would end up escaped.
Another example is #190, which seems to be caused by the object file
parser or writer corrupting the compiled code and causing segfaults in
some rare edge cases.
Instead, let's drop that method entirely, and force the compiler and
linker to do the work for us. The steps necessary when compiling a
package to obfuscate are:
1) Replace its "package foo" lines with the obfuscated package path. No
need to separate the package path and name, since the obfuscated path
does not contain slashes.
2) Replace the "-p pkg/foo" flag with the obfuscated path.
3) Replace the "import" spec lines with the obfuscated package paths,
for those dependencies which were obfuscated.
4) Replace the "-importcfg [...]" file with a version that uses the
obfuscated paths instead.
The linker also needs that last step, since it also uses an importcfg
file to find object files.
There are three noteworthy drawbacks to this new method:
1) Since we no longer write object files, we can't use them to store
data to be cached. As such, the -debugdir flag goes back to using the
"-a" build flag to always rebuild all packages. On the plus side,
that caching didn't work very well; see #176.
2) The package name "main" remains in all declarations under it, not
just "func main", since we can only rename entire packages. This
seems fine, as it gives little information to the end user.
3) The -tiny mode no longer sets all lines to 0, since it did that by
modifying object files. As a temporary measure, we instead set all
top-level declarations to be on line 1. A TODO is added to hopefully
improve this again in the near future.
The upside is that we get rid of all the issues mentioned before. Plus,
garble now nearly works with Go 1.16, with the exception of two very
minor bugs that look fixable. A follow-up PR will take care of that and
start testing on 1.16.
Fixes #176.
Fixes #190.
4 years ago
|
|
|
|
if flagDebugDir != "" {
|
|
|
|
|
// In case the user deletes the debug directory,
|
|
|
|
|
// and a previous build is cached,
|
|
|
|
|
// rebuild all packages to re-fill the debug dir.
|
|
|
|
|
goArgs = append(goArgs, "-a")
|
|
|
|
|
}
|
|
|
|
|
if command == "test" {
|
|
|
|
|
// vet is generally not useful on obfuscated code; keep it
|
|
|
|
|
// disabled by default.
|
|
|
|
|
goArgs = append(goArgs, "-vet=off")
|
|
|
|
|
}
|
|
|
|
|
goArgs = append(goArgs, flags...)
|
|
|
|
|
goArgs = append(goArgs, args...)
|
|
|
|
|
|
|
|
|
|
return exec.Command("go", goArgs...), nil
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
var transformFuncs = map[string]func([]string) ([]string, error){
|
|
|
|
|
"asm": transformAsm,
|
|
|
|
|
"compile": transformCompile,
|
|
|
|
|
"link": transformLink,
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
var rxIncludeHeader = regexp.MustCompile(`#include\s+"([^"]+)"`)
|
|
|
|
|
|
|
|
|
|
func transformAsm(args []string) ([]string, error) {
|
|
|
|
|
flags, paths := splitFlagsFromFiles(args, ".s")
|
|
|
|
|
|
concentrate and simplify "to obfuscate" logic
Back in the day, we used to call toObfuscate anytime we needed to know
whether a package should be obfuscated.
More recently, we started computing via the ToObfuscate field,
which then gets shared with all sub-processes via sharedCache.
We still had two places that directly called toObfuscate.
Replace those with ToObfuscate, and inline toObfuscate into shared.go.
obfuscatedImportPath is also a potential footgun for main packages.
Some use cases always want the original "main" package name,
such as for use in the compiler's "-p main" flag,
while other cases want the obfuscated package import path,
such as the entries in importcfg files.
Since each of these call sites handles the edge case well,
obfuscatedImportPath now panics on main packages to avoid any misuse.
Finally, test that we never leak main package paths via ldflags.txt.
We never did, but it's good to make sure.
Overall, this avoids confusion and trims the size of main.go a bit.
3 years ago
|
|
|
|
// When assembling, the import path can make its way into the output object file.
|
support obfuscating the time package
This failed at link time because transformAsm did not know how to handle
the fact that the runtime package's assembly code implements the
`time.now` function via:
TEXT time·now<ABIInternal>(SB),NOSPLIT,$16-24
First, we need transformAsm to happen for all packages, not just the
ones that we are obfuscating. This is because the runtime can implement
APIs in other packages which are themselves obfuscated, whereas runtime
may not itself be getting obfuscated. This is currently the case with
`GOGARBLE=*` as we do not yet support obfuscating the runtime.
Second, we need to teach replaceAsmNames to handle qualified names with
import paths. Not just to look up the right package information for the
name, but also to obfuscate the package path if necessary.
Third, we need to relax the Deps requirement on listPackage, since the
runtime package and its dependencies are always implicit dependencies.
This is a big step towards being able to obfuscate the runtime, as there
is now just one package left that we cannot obfuscate outside the runtime.
Updates #193.
3 years ago
|
|
|
|
if curPkg.Name != "main" && curPkg.ToObfuscate {
|
refactor "current package" with TOOLEXEC_IMPORTPATH (#266)
Now that we've dropped support for Go 1.15.x, we can finally rely on
this environment variable for toolexec calls, present in Go 1.16.
Before, we had hacky ways of trying to figure out the current package's
import path, mostly from the -p flag. The biggest rough edge there was
that, for main packages, that was simply the package name, and not its
full import path.
To work around that, we had a restriction on a single main package, so
we could work around that issue. That restriction is now gone.
The new code is simpler, especially because we can set curPkg in a
single place for all toolexec transform funcs.
Since we can always rely on curPkg not being nil now, we can also start
reusing listedPackage.Private and avoid the majority of repeated calls
to isPrivate. The function is cheap, but still not free.
isPrivate itself can also get simpler. We no longer have to worry about
the "main" edge case. Plus, the sanity check for invalid package paths
is now unnecessary; we only got malformed paths from goobj2, and we now
require exact matches with the ImportPath field from "go list -json".
Another effect of clearing up the "main" edge case is that -debugdir now
uses the right directory for main packages. We also start using
consistent debugdir paths in the tests, for the sake of being easier to
read and maintain.
Finally, note that commandReverse did not need the extra call to "go
list -toolexec", as the "shared" call stored in the cache is enough. We
still call toolexecCmd to get said cache, which should probably be
simplified in a future PR.
While at it, replace the use of the "-std" compiler flag with the
Standard field from "go list -json".
4 years ago
|
|
|
|
flags = flagSetValue(flags, "-p", curPkg.obfuscatedImportPath())
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
flags = alterTrimpath(flags)
|
avoid reproducibility issues with full rebuilds
We were using temporary filenames for modified Go and assembly files.
For example, an obfuscated "encoding/json/encode.go" would end up as:
/tmp/garble-shared123/encode.go.456.go
where "123" and "456" are random numbers, usually longer.
This was usually fine for two reasons:
1) We would add "/tmp/garble-shared123/" to -trimpath, so the temporary
directory and its random number would be invisible.
2) We would add "//line" directives to the source files, replacing
the filename with obfuscated versions excluding any random number.
Unfortunately, this broke in multiple ways. Most notably, assembly files
do not have any line directives, and it's not clear that there's any
support for them. So the random number in their basename could end up in
the binary, breaking reproducibility.
Another issue is that the -trimpath addition described above was only
done for cmd/compile, not cmd/asm, so assembly filenames included the
randomized temporary directory.
To fix the issues above, the same "encoding/json/encode.go" would now
end up as:
/tmp/garble-shared123/encoding/json/encode.go
Such a path is still unique even though the "456" random number is gone,
as import paths are unique within a single build.
This fixes issues with the base name of each file, so we no longer rely
on line directives as the only way to remove the second original random
number.
We still rely on -trimpath to get rid of the temporary directory in
filenames. To fix its problem with assembly files, also amend the
-trimpath flag when running the assembler tool.
Finally, add a test that reproducible builds still work when a full
rebuild is done. We choose goprivate.txt for such a test as its
stdimporter package imports a number of std packages, including uses of
assembly and cgo.
For the time being, we don't use such a "full rebuild" reproducibility
test in other test scripts, as this step is expensive, rebuilding many
packages from scratch.
This issue went unnoticed for over a year because such random numbers
"123" and "456" were created when a package was obfuscated, and that
only happened once per package version as long as the build cache was
kept intact.
When clearing the build cache, or forcing a rebuild with -a, one gets
new random numbers, and thus a different binary resulting from the same
build input. That's not something that most users would do regularly,
and our tests did not cover that edge case either, until now.
Fixes #328.
4 years ago
|
|
|
|
|
|
|
|
|
// The assembler runs twice; the first with -gensymabis,
|
|
|
|
|
// where we continue below and we obfuscate all the source.
|
|
|
|
|
// The second time, without -gensymabis, we reconstruct the paths to the
|
|
|
|
|
// obfuscated source files and reuse them to avoid work.
|
fail if we are unexpectedly overwriting files (#418)
While investigating a bug report,
I noticed that garble was writing to the same temp file twice.
At best, writing to the same path on disk twice is wasteful,
as the design is careful to be deterministic and use unique paths.
At worst, the two writes could cause races at the filesystem level.
To prevent either of those situations,
we now create files with os.OpenFile and os.O_EXCL,
meaning that we will error if the file already exists.
That change uncovered a number of such unintended cases.
First, transformAsm would write obfuscated Go files twice.
This is because the Go toolchain actually runs:
[...]/asm -gensymabis [...] foo.s bar.s
[...]/asm [...] foo.s bar.s
That is, the first run is only meant to generate symbol ABIs,
which are then used by the compiler.
We need to obfuscate at that first stage,
because the symbol ABI descriptions need to use obfuscated names.
However, having already obfuscated the assembly on the first stage,
there is no need to do so again on the second stage.
If we detect gensymabis is missing, we simply reuse the previous files.
This first situation doesn't seem racy,
but obfuscating the Go assembly files twice is certainly unnecessary.
Second, saveKnownReflectAPIs wrote a gob file to the build cache.
Since the build cache can be kept between builds,
and since the build cache uses reproducible paths for each build,
running the same "garble build" twice could overwrite those files.
This could actually cause races at the filesystem level;
if two concurrent builds write to the same gob file on disk,
one of them could end up using a partially-written file.
Note that this is the only of the three cases not using temporary files.
As such, it is expected that the file may already exist.
In such a case, we simply avoid overwriting it rather than failing.
Third, when "garble build -a" was used,
and when we needed an export file not listed in importcfg,
we would end up calling roughly:
go list -export -toolexec=garble -a <dependency>
This meant we would re-build and re-obfuscate those packages.
Which is unfortunate, because the parent process already did via:
go build -toolexec=garble -a <main>
The repeated dependency builds tripped the new os.O_EXCL check,
as we would try to overwrite the same obfuscated Go files.
Beyond being wasteful, this could again cause subtle filesystem races.
To fix the problem, avoid passing flags like "-a" to nested go commands.
Overall, we should likely be using safer ways to write to disk,
be it via either atomic writes or locked files.
However, for now, catching duplicate writes is a big step.
I have left a self-assigned TODO for further improvements.
CI on the pull request found a failure on test-gotip.
The failure reproduces on master, so it seems to be related to gotip,
and not a regression introduced by this change.
For now, disable test-gotip until we can investigate.
3 years ago
|
|
|
|
newPaths := make([]string, 0, len(paths))
|
|
|
|
|
if !slices.Contains(args, "-gensymabis") {
|
fail if we are unexpectedly overwriting files (#418)
While investigating a bug report,
I noticed that garble was writing to the same temp file twice.
At best, writing to the same path on disk twice is wasteful,
as the design is careful to be deterministic and use unique paths.
At worst, the two writes could cause races at the filesystem level.
To prevent either of those situations,
we now create files with os.OpenFile and os.O_EXCL,
meaning that we will error if the file already exists.
That change uncovered a number of such unintended cases.
First, transformAsm would write obfuscated Go files twice.
This is because the Go toolchain actually runs:
[...]/asm -gensymabis [...] foo.s bar.s
[...]/asm [...] foo.s bar.s
That is, the first run is only meant to generate symbol ABIs,
which are then used by the compiler.
We need to obfuscate at that first stage,
because the symbol ABI descriptions need to use obfuscated names.
However, having already obfuscated the assembly on the first stage,
there is no need to do so again on the second stage.
If we detect gensymabis is missing, we simply reuse the previous files.
This first situation doesn't seem racy,
but obfuscating the Go assembly files twice is certainly unnecessary.
Second, saveKnownReflectAPIs wrote a gob file to the build cache.
Since the build cache can be kept between builds,
and since the build cache uses reproducible paths for each build,
running the same "garble build" twice could overwrite those files.
This could actually cause races at the filesystem level;
if two concurrent builds write to the same gob file on disk,
one of them could end up using a partially-written file.
Note that this is the only of the three cases not using temporary files.
As such, it is expected that the file may already exist.
In such a case, we simply avoid overwriting it rather than failing.
Third, when "garble build -a" was used,
and when we needed an export file not listed in importcfg,
we would end up calling roughly:
go list -export -toolexec=garble -a <dependency>
This meant we would re-build and re-obfuscate those packages.
Which is unfortunate, because the parent process already did via:
go build -toolexec=garble -a <main>
The repeated dependency builds tripped the new os.O_EXCL check,
as we would try to overwrite the same obfuscated Go files.
Beyond being wasteful, this could again cause subtle filesystem races.
To fix the problem, avoid passing flags like "-a" to nested go commands.
Overall, we should likely be using safer ways to write to disk,
be it via either atomic writes or locked files.
However, for now, catching duplicate writes is a big step.
I have left a self-assigned TODO for further improvements.
CI on the pull request found a failure on test-gotip.
The failure reproduces on master, so it seems to be related to gotip,
and not a regression introduced by this change.
For now, disable test-gotip until we can investigate.
3 years ago
|
|
|
|
for _, path := range paths {
|
|
|
|
|
name := hashWithPackage(curPkg, filepath.Base(path))
|
|
|
|
|
pkgDir := filepath.Join(sharedTempDir, curPkg.obfuscatedImportPath())
|
fail if we are unexpectedly overwriting files (#418)
While investigating a bug report,
I noticed that garble was writing to the same temp file twice.
At best, writing to the same path on disk twice is wasteful,
as the design is careful to be deterministic and use unique paths.
At worst, the two writes could cause races at the filesystem level.
To prevent either of those situations,
we now create files with os.OpenFile and os.O_EXCL,
meaning that we will error if the file already exists.
That change uncovered a number of such unintended cases.
First, transformAsm would write obfuscated Go files twice.
This is because the Go toolchain actually runs:
[...]/asm -gensymabis [...] foo.s bar.s
[...]/asm [...] foo.s bar.s
That is, the first run is only meant to generate symbol ABIs,
which are then used by the compiler.
We need to obfuscate at that first stage,
because the symbol ABI descriptions need to use obfuscated names.
However, having already obfuscated the assembly on the first stage,
there is no need to do so again on the second stage.
If we detect gensymabis is missing, we simply reuse the previous files.
This first situation doesn't seem racy,
but obfuscating the Go assembly files twice is certainly unnecessary.
Second, saveKnownReflectAPIs wrote a gob file to the build cache.
Since the build cache can be kept between builds,
and since the build cache uses reproducible paths for each build,
running the same "garble build" twice could overwrite those files.
This could actually cause races at the filesystem level;
if two concurrent builds write to the same gob file on disk,
one of them could end up using a partially-written file.
Note that this is the only of the three cases not using temporary files.
As such, it is expected that the file may already exist.
In such a case, we simply avoid overwriting it rather than failing.
Third, when "garble build -a" was used,
and when we needed an export file not listed in importcfg,
we would end up calling roughly:
go list -export -toolexec=garble -a <dependency>
This meant we would re-build and re-obfuscate those packages.
Which is unfortunate, because the parent process already did via:
go build -toolexec=garble -a <main>
The repeated dependency builds tripped the new os.O_EXCL check,
as we would try to overwrite the same obfuscated Go files.
Beyond being wasteful, this could again cause subtle filesystem races.
To fix the problem, avoid passing flags like "-a" to nested go commands.
Overall, we should likely be using safer ways to write to disk,
be it via either atomic writes or locked files.
However, for now, catching duplicate writes is a big step.
I have left a self-assigned TODO for further improvements.
CI on the pull request found a failure on test-gotip.
The failure reproduces on master, so it seems to be related to gotip,
and not a regression introduced by this change.
For now, disable test-gotip until we can investigate.
3 years ago
|
|
|
|
newPath := filepath.Join(pkgDir, name)
|
|
|
|
|
newPaths = append(newPaths, newPath)
|
|
|
|
|
}
|
|
|
|
|
return append(flags, newPaths...), nil
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
const missingHeader = "missing header path"
|
|
|
|
|
newHeaderPaths := make(map[string]string)
|
slight simplifications and alloc reductions
Reuse a buffer and a map across loop iterations, because we can.
Make recordTypeDone only track named types, as that is enough to detect
type cycles. Without named types, there can be no cycles.
These two reduce allocs by a fraction of a percent:
name old time/op new time/op delta
Build-16 10.4s ± 2% 10.4s ± 1% ~ (p=0.739 n=10+10)
name old bin-B new bin-B delta
Build-16 5.51M ± 0% 5.51M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 391ms ± 9% 407ms ± 7% ~ (p=0.095 n=10+9)
name old mallocs/op new mallocs/op delta
Build-16 34.5M ± 0% 34.4M ± 0% -0.12% (p=0.000 n=10+10)
name old sys-time/op new sys-time/op delta
Build-16 5.87s ± 5% 5.82s ± 5% ~ (p=0.182 n=10+9)
It doesn't seem like much, but remember that these stats are for the
entire set of processes, where garble only accounts for about 10% of the
total wall time when compared to the compiler or linker. So a ~0.1%
decrease globally is still significant.
linkerVariableStrings is also indexed by *types.Var rather than types.Object,
since -ldflags=-X only supports setting the string value of variables.
This shouldn't make a significant difference in terms of allocs,
but at least the map is less prone to confusion with other object types.
To ensure the new code doesn't trip up on non-variables, we add test cases.
Finally, for the sake of clarity, index into the types.Info maps like
Defs and Uses rather than calling ObjectOf if we know whether the
identifier we have is a definition of a name or the use of a defined name.
This isn't better in terms of performance, as ObjectOf is a tiny method,
but just like with linkerVariableStrings before, the new code is clearer.
3 years ago
|
|
|
|
var buf bytes.Buffer
|
|
|
|
|
for _, path := range paths {
|
|
|
|
|
// Read the entire file into memory.
|
|
|
|
|
// If we find issues with large files, we can use bufio.
|
|
|
|
|
content, err := os.ReadFile(path)
|
|
|
|
|
if err != nil {
|
|
|
|
|
return nil, err
|
|
|
|
|
}
|
|
|
|
|
offset := 0
|
|
|
|
|
for _, match := range rxIncludeHeader.FindAllSubmatchIndex(content, -1) {
|
|
|
|
|
start, end := offset+match[2], offset+match[3]
|
|
|
|
|
path := string(content[start:end])
|
|
|
|
|
if strings.ContainsAny(path, "\n\"") {
|
|
|
|
|
// If we failed to keep track of offsets, we could see a header
|
|
|
|
|
// path that contains quotes or newlines, which should not happen.
|
|
|
|
|
return nil, fmt.Errorf("bad offset tracking? %q", path)
|
|
|
|
|
}
|
|
|
|
|
newPath := newHeaderPaths[path]
|
|
|
|
|
switch newPath {
|
|
|
|
|
case missingHeader: // no need to try again
|
|
|
|
|
continue
|
|
|
|
|
case "": // first time we see this header
|
|
|
|
|
buf.Reset()
|
|
|
|
|
content, err := os.ReadFile(path)
|
|
|
|
|
if errors.Is(err, fs.ErrNotExist) {
|
|
|
|
|
newHeaderPaths[path] = missingHeader
|
|
|
|
|
continue // a header file provided by Go or the system
|
|
|
|
|
} else if err != nil {
|
|
|
|
|
return nil, err
|
|
|
|
|
}
|
|
|
|
|
replaceAsmNames(&buf, content)
|
|
|
|
|
|
|
|
|
|
// For now, we replace `foo.h` or `dir/foo.h` with `garbled_foo.h`.
|
|
|
|
|
// The different name ensures we don't use the unobfuscated file.
|
|
|
|
|
// This is far from perfect, but does the job for the time being.
|
|
|
|
|
// In the future, use a randomized name.
|
|
|
|
|
basename := filepath.Base(path)
|
|
|
|
|
newPath = "garbled_" + basename
|
|
|
|
|
|
|
|
|
|
if _, err := writeSourceFile(basename, newPath, buf.Bytes()); err != nil {
|
|
|
|
|
return nil, err
|
|
|
|
|
}
|
|
|
|
|
newHeaderPaths[path] = newPath
|
|
|
|
|
}
|
|
|
|
|
offset += len(newPath) - len(path)
|
|
|
|
|
// TODO: copying the bytes in a loop like this is far from optimal.
|
|
|
|
|
var newContent []byte
|
|
|
|
|
newContent = append(newContent, content[:start]...)
|
|
|
|
|
newContent = append(newContent, newPath...)
|
|
|
|
|
newContent = append(newContent, content[end:]...)
|
|
|
|
|
content = newContent
|
|
|
|
|
}
|
|
|
|
|
buf.Reset()
|
|
|
|
|
replaceAsmNames(&buf, content)
|
|
|
|
|
|
|
|
|
|
// With assembly files, we obfuscate the filename in the temporary
|
|
|
|
|
// directory, as assembly files do not support `/*line` directives.
|
|
|
|
|
basename := filepath.Base(path)
|
|
|
|
|
newName := hashWithPackage(curPkg, basename)
|
|
|
|
|
if path, err := writeSourceFile(basename, newName, buf.Bytes()); err != nil {
|
|
|
|
|
return nil, err
|
|
|
|
|
} else {
|
|
|
|
|
newPaths = append(newPaths, path)
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
return append(flags, newPaths...), nil
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
func replaceAsmNames(buf *bytes.Buffer, remaining []byte) {
|
|
|
|
|
// We need to replace all function references with their obfuscated name
|
|
|
|
|
// counterparts.
|
|
|
|
|
// Luckily, all func names in Go assembly files are immediately followed
|
|
|
|
|
// by the unicode "middle dot", like:
|
|
|
|
|
//
|
support obfuscating the time package
This failed at link time because transformAsm did not know how to handle
the fact that the runtime package's assembly code implements the
`time.now` function via:
TEXT time·now<ABIInternal>(SB),NOSPLIT,$16-24
First, we need transformAsm to happen for all packages, not just the
ones that we are obfuscating. This is because the runtime can implement
APIs in other packages which are themselves obfuscated, whereas runtime
may not itself be getting obfuscated. This is currently the case with
`GOGARBLE=*` as we do not yet support obfuscating the runtime.
Second, we need to teach replaceAsmNames to handle qualified names with
import paths. Not just to look up the right package information for the
name, but also to obfuscate the package path if necessary.
Third, we need to relax the Deps requirement on listPackage, since the
runtime package and its dependencies are always implicit dependencies.
This is a big step towards being able to obfuscate the runtime, as there
is now just one package left that we cannot obfuscate outside the runtime.
Updates #193.
3 years ago
|
|
|
|
// TEXT ·privateAdd(SB),$0-24
|
|
|
|
|
// TEXT runtime∕internal∕sys·Ctz64(SB), NOSPLIT, $0-12
|
|
|
|
|
const middleDot = '·'
|
|
|
|
|
middleDotLen := utf8.RuneLen(middleDot)
|
|
|
|
|
|
support obfuscating the time package
This failed at link time because transformAsm did not know how to handle
the fact that the runtime package's assembly code implements the
`time.now` function via:
TEXT time·now<ABIInternal>(SB),NOSPLIT,$16-24
First, we need transformAsm to happen for all packages, not just the
ones that we are obfuscating. This is because the runtime can implement
APIs in other packages which are themselves obfuscated, whereas runtime
may not itself be getting obfuscated. This is currently the case with
`GOGARBLE=*` as we do not yet support obfuscating the runtime.
Second, we need to teach replaceAsmNames to handle qualified names with
import paths. Not just to look up the right package information for the
name, but also to obfuscate the package path if necessary.
Third, we need to relax the Deps requirement on listPackage, since the
runtime package and its dependencies are always implicit dependencies.
This is a big step towards being able to obfuscate the runtime, as there
is now just one package left that we cannot obfuscate outside the runtime.
Updates #193.
3 years ago
|
|
|
|
// Note that import paths in assembly, like `runtime∕internal∕sys` above,
|
|
|
|
|
// use a Unicode slash rather than the ASCII one used by Go and `go list`.
|
|
|
|
|
// We need to convert to ASCII to find the right package information.
|
|
|
|
|
const asmPkgSlash = '∕'
|
|
|
|
|
const goPkgSlash = '/'
|
|
|
|
|
|
|
|
|
|
for {
|
|
|
|
|
i := bytes.IndexRune(remaining, middleDot)
|
|
|
|
|
if i < 0 {
|
|
|
|
|
buf.Write(remaining)
|
|
|
|
|
remaining = nil
|
|
|
|
|
break
|
|
|
|
|
}
|
|
|
|
|
|
support obfuscating the time package
This failed at link time because transformAsm did not know how to handle
the fact that the runtime package's assembly code implements the
`time.now` function via:
TEXT time·now<ABIInternal>(SB),NOSPLIT,$16-24
First, we need transformAsm to happen for all packages, not just the
ones that we are obfuscating. This is because the runtime can implement
APIs in other packages which are themselves obfuscated, whereas runtime
may not itself be getting obfuscated. This is currently the case with
`GOGARBLE=*` as we do not yet support obfuscating the runtime.
Second, we need to teach replaceAsmNames to handle qualified names with
import paths. Not just to look up the right package information for the
name, but also to obfuscate the package path if necessary.
Third, we need to relax the Deps requirement on listPackage, since the
runtime package and its dependencies are always implicit dependencies.
This is a big step towards being able to obfuscate the runtime, as there
is now just one package left that we cannot obfuscate outside the runtime.
Updates #193.
3 years ago
|
|
|
|
// The package name ends at the first rune which cannot be part of a Go
|
|
|
|
|
// import path, such as a comma or space.
|
|
|
|
|
pkgStart := i
|
|
|
|
|
for pkgStart >= 0 {
|
|
|
|
|
c, size := utf8.DecodeLastRune(remaining[:pkgStart])
|
|
|
|
|
if !unicode.IsLetter(c) && c != '_' && c != asmPkgSlash && !unicode.IsDigit(c) {
|
|
|
|
|
break
|
|
|
|
|
}
|
|
|
|
|
pkgStart -= size
|
|
|
|
|
}
|
|
|
|
|
asmPkgPath := string(remaining[pkgStart:i])
|
|
|
|
|
goPkgPath := strings.ReplaceAll(asmPkgPath, string(asmPkgSlash), string(goPkgSlash))
|
|
|
|
|
|
|
|
|
|
// Write the bytes before our unqualified `·foo` or qualified `pkg·foo`.
|
|
|
|
|
buf.Write(remaining[:pkgStart])
|
|
|
|
|
|
|
|
|
|
// If the name was qualified, fetch the package, and write the
|
|
|
|
|
// obfuscated import path if needed.
|
|
|
|
|
// Note that runtime/internal/startlinetest refers to runtime_test in
|
|
|
|
|
// one of its assembly files, and we currently do not always collect
|
|
|
|
|
// test packages in appendListedPackages for the sake of performance.
|
|
|
|
|
// We don't care about testing the runtime just yet, so work around it.
|
support obfuscating the time package
This failed at link time because transformAsm did not know how to handle
the fact that the runtime package's assembly code implements the
`time.now` function via:
TEXT time·now<ABIInternal>(SB),NOSPLIT,$16-24
First, we need transformAsm to happen for all packages, not just the
ones that we are obfuscating. This is because the runtime can implement
APIs in other packages which are themselves obfuscated, whereas runtime
may not itself be getting obfuscated. This is currently the case with
`GOGARBLE=*` as we do not yet support obfuscating the runtime.
Second, we need to teach replaceAsmNames to handle qualified names with
import paths. Not just to look up the right package information for the
name, but also to obfuscate the package path if necessary.
Third, we need to relax the Deps requirement on listPackage, since the
runtime package and its dependencies are always implicit dependencies.
This is a big step towards being able to obfuscate the runtime, as there
is now just one package left that we cannot obfuscate outside the runtime.
Updates #193.
3 years ago
|
|
|
|
lpkg := curPkg
|
|
|
|
|
if asmPkgPath != "" && asmPkgPath != "runtime_test" {
|
support obfuscating the time package
This failed at link time because transformAsm did not know how to handle
the fact that the runtime package's assembly code implements the
`time.now` function via:
TEXT time·now<ABIInternal>(SB),NOSPLIT,$16-24
First, we need transformAsm to happen for all packages, not just the
ones that we are obfuscating. This is because the runtime can implement
APIs in other packages which are themselves obfuscated, whereas runtime
may not itself be getting obfuscated. This is currently the case with
`GOGARBLE=*` as we do not yet support obfuscating the runtime.
Second, we need to teach replaceAsmNames to handle qualified names with
import paths. Not just to look up the right package information for the
name, but also to obfuscate the package path if necessary.
Third, we need to relax the Deps requirement on listPackage, since the
runtime package and its dependencies are always implicit dependencies.
This is a big step towards being able to obfuscate the runtime, as there
is now just one package left that we cannot obfuscate outside the runtime.
Updates #193.
3 years ago
|
|
|
|
var err error
|
|
|
|
|
lpkg, err = listPackage(goPkgPath)
|
|
|
|
|
if err != nil {
|
|
|
|
|
panic(err) // shouldn't happen
|
|
|
|
|
}
|
|
|
|
|
if lpkg.ToObfuscate {
|
|
|
|
|
// Note that we don't need to worry about asmPkgSlash here,
|
|
|
|
|
// because our obfuscated import paths contain no slashes right now.
|
|
|
|
|
buf.WriteString(lpkg.obfuscatedImportPath())
|
|
|
|
|
} else {
|
|
|
|
|
buf.WriteString(asmPkgPath)
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
support obfuscating the time package
This failed at link time because transformAsm did not know how to handle
the fact that the runtime package's assembly code implements the
`time.now` function via:
TEXT time·now<ABIInternal>(SB),NOSPLIT,$16-24
First, we need transformAsm to happen for all packages, not just the
ones that we are obfuscating. This is because the runtime can implement
APIs in other packages which are themselves obfuscated, whereas runtime
may not itself be getting obfuscated. This is currently the case with
`GOGARBLE=*` as we do not yet support obfuscating the runtime.
Second, we need to teach replaceAsmNames to handle qualified names with
import paths. Not just to look up the right package information for the
name, but also to obfuscate the package path if necessary.
Third, we need to relax the Deps requirement on listPackage, since the
runtime package and its dependencies are always implicit dependencies.
This is a big step towards being able to obfuscate the runtime, as there
is now just one package left that we cannot obfuscate outside the runtime.
Updates #193.
3 years ago
|
|
|
|
// Write the middle dot and advance the remaining slice.
|
|
|
|
|
buf.WriteRune(middleDot)
|
|
|
|
|
remaining = remaining[i+middleDotLen:]
|
|
|
|
|
|
support obfuscating the time package
This failed at link time because transformAsm did not know how to handle
the fact that the runtime package's assembly code implements the
`time.now` function via:
TEXT time·now<ABIInternal>(SB),NOSPLIT,$16-24
First, we need transformAsm to happen for all packages, not just the
ones that we are obfuscating. This is because the runtime can implement
APIs in other packages which are themselves obfuscated, whereas runtime
may not itself be getting obfuscated. This is currently the case with
`GOGARBLE=*` as we do not yet support obfuscating the runtime.
Second, we need to teach replaceAsmNames to handle qualified names with
import paths. Not just to look up the right package information for the
name, but also to obfuscate the package path if necessary.
Third, we need to relax the Deps requirement on listPackage, since the
runtime package and its dependencies are always implicit dependencies.
This is a big step towards being able to obfuscate the runtime, as there
is now just one package left that we cannot obfuscate outside the runtime.
Updates #193.
3 years ago
|
|
|
|
// The declared name ends at the first rune which cannot be part of a Go
|
|
|
|
|
// identifier, such as a comma or space.
|
|
|
|
|
nameEnd := 0
|
|
|
|
|
for nameEnd < len(remaining) {
|
|
|
|
|
c, size := utf8.DecodeRune(remaining[nameEnd:])
|
|
|
|
|
if !unicode.IsLetter(c) && c != '_' && !unicode.IsDigit(c) {
|
|
|
|
|
break
|
|
|
|
|
}
|
|
|
|
|
nameEnd += size
|
|
|
|
|
}
|
|
|
|
|
name := string(remaining[:nameEnd])
|
|
|
|
|
remaining = remaining[nameEnd:]
|
|
|
|
|
|
support obfuscating the time package
This failed at link time because transformAsm did not know how to handle
the fact that the runtime package's assembly code implements the
`time.now` function via:
TEXT time·now<ABIInternal>(SB),NOSPLIT,$16-24
First, we need transformAsm to happen for all packages, not just the
ones that we are obfuscating. This is because the runtime can implement
APIs in other packages which are themselves obfuscated, whereas runtime
may not itself be getting obfuscated. This is currently the case with
`GOGARBLE=*` as we do not yet support obfuscating the runtime.
Second, we need to teach replaceAsmNames to handle qualified names with
import paths. Not just to look up the right package information for the
name, but also to obfuscate the package path if necessary.
Third, we need to relax the Deps requirement on listPackage, since the
runtime package and its dependencies are always implicit dependencies.
This is a big step towards being able to obfuscate the runtime, as there
is now just one package left that we cannot obfuscate outside the runtime.
Updates #193.
3 years ago
|
|
|
|
if lpkg.ToObfuscate {
|
|
|
|
|
newName := hashWithPackage(lpkg, name)
|
|
|
|
|
if flagDebug { // TODO(mvdan): remove once https://go.dev/issue/53465 if fixed
|
|
|
|
|
log.Printf("asm name %q hashed with %x to %q", name, curPkg.GarbleActionID, newName)
|
|
|
|
|
}
|
|
|
|
|
buf.WriteString(newName)
|
|
|
|
|
} else {
|
|
|
|
|
buf.WriteString(name)
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// writeSourceFile is a mix between os.CreateTemp and os.WriteFile, as it writes a
|
avoid reproducibility issues with full rebuilds
We were using temporary filenames for modified Go and assembly files.
For example, an obfuscated "encoding/json/encode.go" would end up as:
/tmp/garble-shared123/encode.go.456.go
where "123" and "456" are random numbers, usually longer.
This was usually fine for two reasons:
1) We would add "/tmp/garble-shared123/" to -trimpath, so the temporary
directory and its random number would be invisible.
2) We would add "//line" directives to the source files, replacing
the filename with obfuscated versions excluding any random number.
Unfortunately, this broke in multiple ways. Most notably, assembly files
do not have any line directives, and it's not clear that there's any
support for them. So the random number in their basename could end up in
the binary, breaking reproducibility.
Another issue is that the -trimpath addition described above was only
done for cmd/compile, not cmd/asm, so assembly filenames included the
randomized temporary directory.
To fix the issues above, the same "encoding/json/encode.go" would now
end up as:
/tmp/garble-shared123/encoding/json/encode.go
Such a path is still unique even though the "456" random number is gone,
as import paths are unique within a single build.
This fixes issues with the base name of each file, so we no longer rely
on line directives as the only way to remove the second original random
number.
We still rely on -trimpath to get rid of the temporary directory in
filenames. To fix its problem with assembly files, also amend the
-trimpath flag when running the assembler tool.
Finally, add a test that reproducible builds still work when a full
rebuild is done. We choose goprivate.txt for such a test as its
stdimporter package imports a number of std packages, including uses of
assembly and cgo.
For the time being, we don't use such a "full rebuild" reproducibility
test in other test scripts, as this step is expensive, rebuilding many
packages from scratch.
This issue went unnoticed for over a year because such random numbers
"123" and "456" were created when a package was obfuscated, and that
only happened once per package version as long as the build cache was
kept intact.
When clearing the build cache, or forcing a rebuild with -a, one gets
new random numbers, and thus a different binary resulting from the same
build input. That's not something that most users would do regularly,
and our tests did not cover that edge case either, until now.
Fixes #328.
4 years ago
|
|
|
|
// named source file in sharedTempDir given an input buffer.
|
|
|
|
|
//
|
avoid reproducibility issues with full rebuilds
We were using temporary filenames for modified Go and assembly files.
For example, an obfuscated "encoding/json/encode.go" would end up as:
/tmp/garble-shared123/encode.go.456.go
where "123" and "456" are random numbers, usually longer.
This was usually fine for two reasons:
1) We would add "/tmp/garble-shared123/" to -trimpath, so the temporary
directory and its random number would be invisible.
2) We would add "//line" directives to the source files, replacing
the filename with obfuscated versions excluding any random number.
Unfortunately, this broke in multiple ways. Most notably, assembly files
do not have any line directives, and it's not clear that there's any
support for them. So the random number in their basename could end up in
the binary, breaking reproducibility.
Another issue is that the -trimpath addition described above was only
done for cmd/compile, not cmd/asm, so assembly filenames included the
randomized temporary directory.
To fix the issues above, the same "encoding/json/encode.go" would now
end up as:
/tmp/garble-shared123/encoding/json/encode.go
Such a path is still unique even though the "456" random number is gone,
as import paths are unique within a single build.
This fixes issues with the base name of each file, so we no longer rely
on line directives as the only way to remove the second original random
number.
We still rely on -trimpath to get rid of the temporary directory in
filenames. To fix its problem with assembly files, also amend the
-trimpath flag when running the assembler tool.
Finally, add a test that reproducible builds still work when a full
rebuild is done. We choose goprivate.txt for such a test as its
stdimporter package imports a number of std packages, including uses of
assembly and cgo.
For the time being, we don't use such a "full rebuild" reproducibility
test in other test scripts, as this step is expensive, rebuilding many
packages from scratch.
This issue went unnoticed for over a year because such random numbers
"123" and "456" were created when a package was obfuscated, and that
only happened once per package version as long as the build cache was
kept intact.
When clearing the build cache, or forcing a rebuild with -a, one gets
new random numbers, and thus a different binary resulting from the same
build input. That's not something that most users would do regularly,
and our tests did not cover that edge case either, until now.
Fixes #328.
4 years ago
|
|
|
|
// Note that the file is created under a directory tree following curPkg's
|
|
|
|
|
// import path, mimicking how files are laid out in modules and GOROOT.
|
|
|
|
|
func writeSourceFile(basename, obfuscated string, content []byte) (string, error) {
|
|
|
|
|
// Uncomment for some quick debugging. Do not delete.
|
|
|
|
|
// fmt.Fprintf(os.Stderr, "\n-- %s/%s --\n%s", curPkg.ImportPath, basename, content)
|
|
|
|
|
|
|
|
|
|
if flagDebugDir != "" {
|
|
|
|
|
pkgDir := filepath.Join(flagDebugDir, filepath.FromSlash(curPkg.ImportPath))
|
|
|
|
|
if err := os.MkdirAll(pkgDir, 0o755); err != nil {
|
|
|
|
|
return "", err
|
|
|
|
|
}
|
|
|
|
|
dstPath := filepath.Join(pkgDir, basename)
|
|
|
|
|
if err := os.WriteFile(dstPath, content, 0o666); err != nil {
|
|
|
|
|
return "", err
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
// We use the obfuscated import path to hold the temporary files.
|
|
|
|
|
// Assembly files do not support line directives to set positions,
|
|
|
|
|
// so the only way to not leak the import path is to replace it.
|
|
|
|
|
pkgDir := filepath.Join(sharedTempDir, curPkg.obfuscatedImportPath())
|
avoid reproducibility issues with full rebuilds
We were using temporary filenames for modified Go and assembly files.
For example, an obfuscated "encoding/json/encode.go" would end up as:
/tmp/garble-shared123/encode.go.456.go
where "123" and "456" are random numbers, usually longer.
This was usually fine for two reasons:
1) We would add "/tmp/garble-shared123/" to -trimpath, so the temporary
directory and its random number would be invisible.
2) We would add "//line" directives to the source files, replacing
the filename with obfuscated versions excluding any random number.
Unfortunately, this broke in multiple ways. Most notably, assembly files
do not have any line directives, and it's not clear that there's any
support for them. So the random number in their basename could end up in
the binary, breaking reproducibility.
Another issue is that the -trimpath addition described above was only
done for cmd/compile, not cmd/asm, so assembly filenames included the
randomized temporary directory.
To fix the issues above, the same "encoding/json/encode.go" would now
end up as:
/tmp/garble-shared123/encoding/json/encode.go
Such a path is still unique even though the "456" random number is gone,
as import paths are unique within a single build.
This fixes issues with the base name of each file, so we no longer rely
on line directives as the only way to remove the second original random
number.
We still rely on -trimpath to get rid of the temporary directory in
filenames. To fix its problem with assembly files, also amend the
-trimpath flag when running the assembler tool.
Finally, add a test that reproducible builds still work when a full
rebuild is done. We choose goprivate.txt for such a test as its
stdimporter package imports a number of std packages, including uses of
assembly and cgo.
For the time being, we don't use such a "full rebuild" reproducibility
test in other test scripts, as this step is expensive, rebuilding many
packages from scratch.
This issue went unnoticed for over a year because such random numbers
"123" and "456" were created when a package was obfuscated, and that
only happened once per package version as long as the build cache was
kept intact.
When clearing the build cache, or forcing a rebuild with -a, one gets
new random numbers, and thus a different binary resulting from the same
build input. That's not something that most users would do regularly,
and our tests did not cover that edge case either, until now.
Fixes #328.
4 years ago
|
|
|
|
if err := os.MkdirAll(pkgDir, 0o777); err != nil {
|
|
|
|
|
return "", err
|
|
|
|
|
}
|
|
|
|
|
dstPath := filepath.Join(pkgDir, obfuscated)
|
fail if we are unexpectedly overwriting files (#418)
While investigating a bug report,
I noticed that garble was writing to the same temp file twice.
At best, writing to the same path on disk twice is wasteful,
as the design is careful to be deterministic and use unique paths.
At worst, the two writes could cause races at the filesystem level.
To prevent either of those situations,
we now create files with os.OpenFile and os.O_EXCL,
meaning that we will error if the file already exists.
That change uncovered a number of such unintended cases.
First, transformAsm would write obfuscated Go files twice.
This is because the Go toolchain actually runs:
[...]/asm -gensymabis [...] foo.s bar.s
[...]/asm [...] foo.s bar.s
That is, the first run is only meant to generate symbol ABIs,
which are then used by the compiler.
We need to obfuscate at that first stage,
because the symbol ABI descriptions need to use obfuscated names.
However, having already obfuscated the assembly on the first stage,
there is no need to do so again on the second stage.
If we detect gensymabis is missing, we simply reuse the previous files.
This first situation doesn't seem racy,
but obfuscating the Go assembly files twice is certainly unnecessary.
Second, saveKnownReflectAPIs wrote a gob file to the build cache.
Since the build cache can be kept between builds,
and since the build cache uses reproducible paths for each build,
running the same "garble build" twice could overwrite those files.
This could actually cause races at the filesystem level;
if two concurrent builds write to the same gob file on disk,
one of them could end up using a partially-written file.
Note that this is the only of the three cases not using temporary files.
As such, it is expected that the file may already exist.
In such a case, we simply avoid overwriting it rather than failing.
Third, when "garble build -a" was used,
and when we needed an export file not listed in importcfg,
we would end up calling roughly:
go list -export -toolexec=garble -a <dependency>
This meant we would re-build and re-obfuscate those packages.
Which is unfortunate, because the parent process already did via:
go build -toolexec=garble -a <main>
The repeated dependency builds tripped the new os.O_EXCL check,
as we would try to overwrite the same obfuscated Go files.
Beyond being wasteful, this could again cause subtle filesystem races.
To fix the problem, avoid passing flags like "-a" to nested go commands.
Overall, we should likely be using safer ways to write to disk,
be it via either atomic writes or locked files.
However, for now, catching duplicate writes is a big step.
I have left a self-assigned TODO for further improvements.
CI on the pull request found a failure on test-gotip.
The failure reproduces on master, so it seems to be related to gotip,
and not a regression introduced by this change.
For now, disable test-gotip until we can investigate.
3 years ago
|
|
|
|
if err := writeFileExclusive(dstPath, content); err != nil {
|
|
|
|
|
return "", err
|
|
|
|
|
}
|
avoid reproducibility issues with full rebuilds
We were using temporary filenames for modified Go and assembly files.
For example, an obfuscated "encoding/json/encode.go" would end up as:
/tmp/garble-shared123/encode.go.456.go
where "123" and "456" are random numbers, usually longer.
This was usually fine for two reasons:
1) We would add "/tmp/garble-shared123/" to -trimpath, so the temporary
directory and its random number would be invisible.
2) We would add "//line" directives to the source files, replacing
the filename with obfuscated versions excluding any random number.
Unfortunately, this broke in multiple ways. Most notably, assembly files
do not have any line directives, and it's not clear that there's any
support for them. So the random number in their basename could end up in
the binary, breaking reproducibility.
Another issue is that the -trimpath addition described above was only
done for cmd/compile, not cmd/asm, so assembly filenames included the
randomized temporary directory.
To fix the issues above, the same "encoding/json/encode.go" would now
end up as:
/tmp/garble-shared123/encoding/json/encode.go
Such a path is still unique even though the "456" random number is gone,
as import paths are unique within a single build.
This fixes issues with the base name of each file, so we no longer rely
on line directives as the only way to remove the second original random
number.
We still rely on -trimpath to get rid of the temporary directory in
filenames. To fix its problem with assembly files, also amend the
-trimpath flag when running the assembler tool.
Finally, add a test that reproducible builds still work when a full
rebuild is done. We choose goprivate.txt for such a test as its
stdimporter package imports a number of std packages, including uses of
assembly and cgo.
For the time being, we don't use such a "full rebuild" reproducibility
test in other test scripts, as this step is expensive, rebuilding many
packages from scratch.
This issue went unnoticed for over a year because such random numbers
"123" and "456" were created when a package was obfuscated, and that
only happened once per package version as long as the build cache was
kept intact.
When clearing the build cache, or forcing a rebuild with -a, one gets
new random numbers, and thus a different binary resulting from the same
build input. That's not something that most users would do regularly,
and our tests did not cover that edge case either, until now.
Fixes #328.
4 years ago
|
|
|
|
return dstPath, nil
|
|
|
|
|
}
|
|
|
|
|
|
reimplement import path obfuscation without goobj2 (#242)
We used to rely on a parallel implementation of an object file parser
and writer to be able to obfuscate import paths. After compiling each
package, we would parse the object file, replace the import paths, and
write the updated object file in-place.
That worked well, in most cases. Unfortunately, it had some flaws:
* Complexity. Even when most of the code is maintained in a separate
module, the import_obfuscation.go file was still close to a thousand
lines of code.
* Go compatibility. The object file format changes between Go releases,
so we were supporting Go 1.15, but not 1.16. Fixing the object file
package to work with 1.16 would probably break 1.15 support.
* Bugs. For example, we recently had to add a workaround for #224, since
import paths containing dots after the domain would end up escaped.
Another example is #190, which seems to be caused by the object file
parser or writer corrupting the compiled code and causing segfaults in
some rare edge cases.
Instead, let's drop that method entirely, and force the compiler and
linker to do the work for us. The steps necessary when compiling a
package to obfuscate are:
1) Replace its "package foo" lines with the obfuscated package path. No
need to separate the package path and name, since the obfuscated path
does not contain slashes.
2) Replace the "-p pkg/foo" flag with the obfuscated path.
3) Replace the "import" spec lines with the obfuscated package paths,
for those dependencies which were obfuscated.
4) Replace the "-importcfg [...]" file with a version that uses the
obfuscated paths instead.
The linker also needs that last step, since it also uses an importcfg
file to find object files.
There are three noteworthy drawbacks to this new method:
1) Since we no longer write object files, we can't use them to store
data to be cached. As such, the -debugdir flag goes back to using the
"-a" build flag to always rebuild all packages. On the plus side,
that caching didn't work very well; see #176.
2) The package name "main" remains in all declarations under it, not
just "func main", since we can only rename entire packages. This
seems fine, as it gives little information to the end user.
3) The -tiny mode no longer sets all lines to 0, since it did that by
modifying object files. As a temporary measure, we instead set all
top-level declarations to be on line 1. A TODO is added to hopefully
improve this again in the near future.
The upside is that we get rid of all the issues mentioned before. Plus,
garble now nearly works with Go 1.16, with the exception of two very
minor bugs that look fixable. A follow-up PR will take care of that and
start testing on 1.16.
Fixes #176.
Fixes #190.
4 years ago
|
|
|
|
func transformCompile(args []string) ([]string, error) {
|
|
|
|
|
var err error
|
|
|
|
|
flags, paths := splitFlagsFromFiles(args, ".go")
|
always use the compiler's -dwarf=false flag (#96)
First, our original append line was completely ineffective; we never
used that "flags" slice again. Second, we only attempted to use the flag
when we obfuscated a package.
In fact, we never care about debugging information here, so for any
package we compile, we can add "-dwarf=false". At the moment, we compile
all packages, even if they aren't to be obfuscated, due to the lack of
access to the build cache.
As such, we save a significant amount of work. The numbers below were
obtained on a quiet machine with "go test -bench=. -benchtime=10x", six
times before and after the change.
name old time/op new time/op delta
Build-8 2.06s ± 4% 1.87s ± 2% -9.21% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build-8 1.51s ± 2% 1.46s ± 1% -3.12% (p=0.004 n=6+5)
name old user-time/op new user-time/op delta
Build-8 11.9s ± 2% 10.8s ± 1% -8.71% (p=0.002 n=6+6)
While at it, only do CI builds on pushes and PRs to the master branch,
so that my PRs created from the same repo don't trigger duplicate
builds.
5 years ago
|
|
|
|
|
|
|
|
|
// We will force the linker to drop DWARF via -w, so don't spend time
|
|
|
|
|
// generating it.
|
|
|
|
|
flags = append(flags, "-dwarf=false")
|
|
|
|
|
|
|
|
|
|
var files []*ast.File
|
|
|
|
|
for _, path := range paths {
|
|
|
|
|
file, err := parser.ParseFile(fset, path, nil, parser.SkipObjectResolution|parser.ParseComments)
|
|
|
|
|
if err != nil {
|
reimplement import path obfuscation without goobj2 (#242)
We used to rely on a parallel implementation of an object file parser
and writer to be able to obfuscate import paths. After compiling each
package, we would parse the object file, replace the import paths, and
write the updated object file in-place.
That worked well, in most cases. Unfortunately, it had some flaws:
* Complexity. Even when most of the code is maintained in a separate
module, the import_obfuscation.go file was still close to a thousand
lines of code.
* Go compatibility. The object file format changes between Go releases,
so we were supporting Go 1.15, but not 1.16. Fixing the object file
package to work with 1.16 would probably break 1.15 support.
* Bugs. For example, we recently had to add a workaround for #224, since
import paths containing dots after the domain would end up escaped.
Another example is #190, which seems to be caused by the object file
parser or writer corrupting the compiled code and causing segfaults in
some rare edge cases.
Instead, let's drop that method entirely, and force the compiler and
linker to do the work for us. The steps necessary when compiling a
package to obfuscate are:
1) Replace its "package foo" lines with the obfuscated package path. No
need to separate the package path and name, since the obfuscated path
does not contain slashes.
2) Replace the "-p pkg/foo" flag with the obfuscated path.
3) Replace the "import" spec lines with the obfuscated package paths,
for those dependencies which were obfuscated.
4) Replace the "-importcfg [...]" file with a version that uses the
obfuscated paths instead.
The linker also needs that last step, since it also uses an importcfg
file to find object files.
There are three noteworthy drawbacks to this new method:
1) Since we no longer write object files, we can't use them to store
data to be cached. As such, the -debugdir flag goes back to using the
"-a" build flag to always rebuild all packages. On the plus side,
that caching didn't work very well; see #176.
2) The package name "main" remains in all declarations under it, not
just "func main", since we can only rename entire packages. This
seems fine, as it gives little information to the end user.
3) The -tiny mode no longer sets all lines to 0, since it did that by
modifying object files. As a temporary measure, we instead set all
top-level declarations to be on line 1. A TODO is added to hopefully
improve this again in the near future.
The upside is that we get rid of all the issues mentioned before. Plus,
garble now nearly works with Go 1.16, with the exception of two very
minor bugs that look fixable. A follow-up PR will take care of that and
start testing on 1.16.
Fixes #176.
Fixes #190.
4 years ago
|
|
|
|
return nil, err
|
|
|
|
|
}
|
|
|
|
|
files = append(files, file)
|
|
|
|
|
}
|
|
|
|
|
tf := newTransformer()
|
|
|
|
|
if err := tf.typecheck(files); err != nil {
|
|
|
|
|
return nil, err
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
flags = alterTrimpath(flags)
|
|
|
|
|
|
fail if we are unexpectedly overwriting files (#418)
While investigating a bug report,
I noticed that garble was writing to the same temp file twice.
At best, writing to the same path on disk twice is wasteful,
as the design is careful to be deterministic and use unique paths.
At worst, the two writes could cause races at the filesystem level.
To prevent either of those situations,
we now create files with os.OpenFile and os.O_EXCL,
meaning that we will error if the file already exists.
That change uncovered a number of such unintended cases.
First, transformAsm would write obfuscated Go files twice.
This is because the Go toolchain actually runs:
[...]/asm -gensymabis [...] foo.s bar.s
[...]/asm [...] foo.s bar.s
That is, the first run is only meant to generate symbol ABIs,
which are then used by the compiler.
We need to obfuscate at that first stage,
because the symbol ABI descriptions need to use obfuscated names.
However, having already obfuscated the assembly on the first stage,
there is no need to do so again on the second stage.
If we detect gensymabis is missing, we simply reuse the previous files.
This first situation doesn't seem racy,
but obfuscating the Go assembly files twice is certainly unnecessary.
Second, saveKnownReflectAPIs wrote a gob file to the build cache.
Since the build cache can be kept between builds,
and since the build cache uses reproducible paths for each build,
running the same "garble build" twice could overwrite those files.
This could actually cause races at the filesystem level;
if two concurrent builds write to the same gob file on disk,
one of them could end up using a partially-written file.
Note that this is the only of the three cases not using temporary files.
As such, it is expected that the file may already exist.
In such a case, we simply avoid overwriting it rather than failing.
Third, when "garble build -a" was used,
and when we needed an export file not listed in importcfg,
we would end up calling roughly:
go list -export -toolexec=garble -a <dependency>
This meant we would re-build and re-obfuscate those packages.
Which is unfortunate, because the parent process already did via:
go build -toolexec=garble -a <main>
The repeated dependency builds tripped the new os.O_EXCL check,
as we would try to overwrite the same obfuscated Go files.
Beyond being wasteful, this could again cause subtle filesystem races.
To fix the problem, avoid passing flags like "-a" to nested go commands.
Overall, we should likely be using safer ways to write to disk,
be it via either atomic writes or locked files.
However, for now, catching duplicate writes is a big step.
I have left a self-assigned TODO for further improvements.
CI on the pull request found a failure on test-gotip.
The failure reproduces on master, so it seems to be related to gotip,
and not a regression introduced by this change.
For now, disable test-gotip until we can investigate.
3 years ago
|
|
|
|
// Note that if the file already exists in the cache from another build,
|
|
|
|
|
// we don't need to write to it again thanks to the hash.
|
|
|
|
|
// TODO: as an optimization, just load that one gob file.
|
stop relying on nested "go list -toolexec" calls (#422)
We rely on importcfg files to load type info for obfuscated packages.
We use this type information to remember what names we didn't obfuscate.
Unfortunately, indirect dependencies aren't listed in importcfg files,
so we relied on extra "go list -toolexec" calls to locate object files.
This worked fine, but added a significant amount of complexity.
The extra "go list -export -toolexec=garble" invocations weren't slow,
as they avoided rebuilding or re-obfuscating thanks to the build cache.
Still, it was hard to reason about how garble runs during a build
if we might have multiple layers of -toolexec invocations.
Instead, record the export files we encounter in an incremental map,
and persist it in the build cache via the gob file we're already using.
This way, each garble invocation knows where all object files are,
even those for indirect imports.
One wrinkle is that importcfg files can point to temporary object files.
In that case, figure out its final location in the build cache.
This requires hard-coding a bit of knowledge about how GOCACHE works,
but it seems relatively harmless given how it's very little code.
Plus, if GOCACHE ever changes, it will be obvious when our code breaks.
Finally, add a TODO about potentially saving even more work.
3 years ago
|
|
|
|
if err := loadCachedOutputs(); err != nil {
|
|
|
|
|
return nil, err
|
|
|
|
|
}
|
stop relying on nested "go list -toolexec" calls (#422)
We rely on importcfg files to load type info for obfuscated packages.
We use this type information to remember what names we didn't obfuscate.
Unfortunately, indirect dependencies aren't listed in importcfg files,
so we relied on extra "go list -toolexec" calls to locate object files.
This worked fine, but added a significant amount of complexity.
The extra "go list -export -toolexec=garble" invocations weren't slow,
as they avoided rebuilding or re-obfuscating thanks to the build cache.
Still, it was hard to reason about how garble runs during a build
if we might have multiple layers of -toolexec invocations.
Instead, record the export files we encounter in an incremental map,
and persist it in the build cache via the gob file we're already using.
This way, each garble invocation knows where all object files are,
even those for indirect imports.
One wrinkle is that importcfg files can point to temporary object files.
In that case, figure out its final location in the build cache.
This requires hard-coding a bit of knowledge about how GOCACHE works,
but it seems relatively harmless given how it's very little code.
Plus, if GOCACHE ever changes, it will be obvious when our code breaks.
Finally, add a TODO about potentially saving even more work.
3 years ago
|
|
|
|
|
|
|
|
|
tf.findReflectFunctions(files)
|
stop relying on nested "go list -toolexec" calls (#422)
We rely on importcfg files to load type info for obfuscated packages.
We use this type information to remember what names we didn't obfuscate.
Unfortunately, indirect dependencies aren't listed in importcfg files,
so we relied on extra "go list -toolexec" calls to locate object files.
This worked fine, but added a significant amount of complexity.
The extra "go list -export -toolexec=garble" invocations weren't slow,
as they avoided rebuilding or re-obfuscating thanks to the build cache.
Still, it was hard to reason about how garble runs during a build
if we might have multiple layers of -toolexec invocations.
Instead, record the export files we encounter in an incremental map,
and persist it in the build cache via the gob file we're already using.
This way, each garble invocation knows where all object files are,
even those for indirect imports.
One wrinkle is that importcfg files can point to temporary object files.
In that case, figure out its final location in the build cache.
This requires hard-coding a bit of knowledge about how GOCACHE works,
but it seems relatively harmless given how it's very little code.
Plus, if GOCACHE ever changes, it will be obvious when our code breaks.
Finally, add a TODO about potentially saving even more work.
3 years ago
|
|
|
|
newImportCfg, err := processImportCfg(flags)
|
|
|
|
|
if err != nil {
|
|
|
|
|
return nil, err
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Literal obfuscation uses math/rand, so seed it deterministically.
|
|
|
|
|
randSeed := curPkg.GarbleActionID
|
|
|
|
|
if flagSeed.present() {
|
|
|
|
|
randSeed = flagSeed.bytes
|
|
|
|
|
}
|
|
|
|
|
// log.Printf("seeding math/rand with %x\n", randSeed)
|
|
|
|
|
mathrand.Seed(int64(binary.BigEndian.Uint64(randSeed)))
|
|
|
|
|
|
|
|
|
|
if err := tf.prefillObjectMaps(files); err != nil {
|
|
|
|
|
return nil, err
|
|
|
|
|
}
|
|
|
|
|
|
concentrate and simplify "to obfuscate" logic
Back in the day, we used to call toObfuscate anytime we needed to know
whether a package should be obfuscated.
More recently, we started computing via the ToObfuscate field,
which then gets shared with all sub-processes via sharedCache.
We still had two places that directly called toObfuscate.
Replace those with ToObfuscate, and inline toObfuscate into shared.go.
obfuscatedImportPath is also a potential footgun for main packages.
Some use cases always want the original "main" package name,
such as for use in the compiler's "-p main" flag,
while other cases want the obfuscated package import path,
such as the entries in importcfg files.
Since each of these call sites handles the edge case well,
obfuscatedImportPath now panics on main packages to avoid any misuse.
Finally, test that we never leak main package paths via ldflags.txt.
We never did, but it's good to make sure.
Overall, this avoids confusion and trims the size of main.go a bit.
3 years ago
|
|
|
|
// If this is a package to obfuscate, swap the -p flag with the new package path.
|
|
|
|
|
// We don't if it's the main package, as that just uses "-p main".
|
|
|
|
|
// We only set newPkgPath if we're obfuscating the import path,
|
|
|
|
|
// to replace the original package name in the package clause below.
|
refactor "current package" with TOOLEXEC_IMPORTPATH (#266)
Now that we've dropped support for Go 1.15.x, we can finally rely on
this environment variable for toolexec calls, present in Go 1.16.
Before, we had hacky ways of trying to figure out the current package's
import path, mostly from the -p flag. The biggest rough edge there was
that, for main packages, that was simply the package name, and not its
full import path.
To work around that, we had a restriction on a single main package, so
we could work around that issue. That restriction is now gone.
The new code is simpler, especially because we can set curPkg in a
single place for all toolexec transform funcs.
Since we can always rely on curPkg not being nil now, we can also start
reusing listedPackage.Private and avoid the majority of repeated calls
to isPrivate. The function is cheap, but still not free.
isPrivate itself can also get simpler. We no longer have to worry about
the "main" edge case. Plus, the sanity check for invalid package paths
is now unnecessary; we only got malformed paths from goobj2, and we now
require exact matches with the ImportPath field from "go list -json".
Another effect of clearing up the "main" edge case is that -debugdir now
uses the right directory for main packages. We also start using
consistent debugdir paths in the tests, for the sake of being easier to
read and maintain.
Finally, note that commandReverse did not need the extra call to "go
list -toolexec", as the "shared" call stored in the cache is enough. We
still call toolexecCmd to get said cache, which should probably be
simplified in a future PR.
While at it, replace the use of the "-std" compiler flag with the
Standard field from "go list -json".
4 years ago
|
|
|
|
newPkgPath := ""
|
deprecate using GOPRIVATE in favor of GOGARBLE (#427)
Piggybacking off of GOPRIVATE is great for a number of reasons:
* People tend to obfuscate private code, whose package paths will
generally be in GOPRIVATE already
* Its meaning and syntax are well understood
* It allows all the flexibility we need without adding our own env var
or config option
However, using GOPRIVATE directly has one main drawback.
It's fairly common to also want to obfuscate public dependencies,
to make the code in private packages even harder to follow.
However, using "GOPRIVATE=*" will result in two main downsides:
* GONOPROXY defaults to GOPRIVATE, so the proxy would be entirely disabled.
Downloading modules, such as when adding or updating dependencies,
or when the local cache is cold, can be less reliable.
* GONOSUMDB defaults to GOPRIVATE, so the sumdb would be entirely disabled.
Adding entries to go.sum, such as when adding or updating dependencies,
can be less secure.
We will continue to consume GOPRIVATE as a fallback,
but we now expect users to set GOGARBLE instead.
The new logic is documented in the README.
While here, rewrite some uses of "private" with "to obfuscate",
to make the code easier to follow and harder to misunderstand.
Fixes #276.
3 years ago
|
|
|
|
if curPkg.Name != "main" && curPkg.ToObfuscate {
|
|
|
|
|
newPkgPath = curPkg.obfuscatedImportPath()
|
reimplement import path obfuscation without goobj2 (#242)
We used to rely on a parallel implementation of an object file parser
and writer to be able to obfuscate import paths. After compiling each
package, we would parse the object file, replace the import paths, and
write the updated object file in-place.
That worked well, in most cases. Unfortunately, it had some flaws:
* Complexity. Even when most of the code is maintained in a separate
module, the import_obfuscation.go file was still close to a thousand
lines of code.
* Go compatibility. The object file format changes between Go releases,
so we were supporting Go 1.15, but not 1.16. Fixing the object file
package to work with 1.16 would probably break 1.15 support.
* Bugs. For example, we recently had to add a workaround for #224, since
import paths containing dots after the domain would end up escaped.
Another example is #190, which seems to be caused by the object file
parser or writer corrupting the compiled code and causing segfaults in
some rare edge cases.
Instead, let's drop that method entirely, and force the compiler and
linker to do the work for us. The steps necessary when compiling a
package to obfuscate are:
1) Replace its "package foo" lines with the obfuscated package path. No
need to separate the package path and name, since the obfuscated path
does not contain slashes.
2) Replace the "-p pkg/foo" flag with the obfuscated path.
3) Replace the "import" spec lines with the obfuscated package paths,
for those dependencies which were obfuscated.
4) Replace the "-importcfg [...]" file with a version that uses the
obfuscated paths instead.
The linker also needs that last step, since it also uses an importcfg
file to find object files.
There are three noteworthy drawbacks to this new method:
1) Since we no longer write object files, we can't use them to store
data to be cached. As such, the -debugdir flag goes back to using the
"-a" build flag to always rebuild all packages. On the plus side,
that caching didn't work very well; see #176.
2) The package name "main" remains in all declarations under it, not
just "func main", since we can only rename entire packages. This
seems fine, as it gives little information to the end user.
3) The -tiny mode no longer sets all lines to 0, since it did that by
modifying object files. As a temporary measure, we instead set all
top-level declarations to be on line 1. A TODO is added to hopefully
improve this again in the near future.
The upside is that we get rid of all the issues mentioned before. Plus,
garble now nearly works with Go 1.16, with the exception of two very
minor bugs that look fixable. A follow-up PR will take care of that and
start testing on 1.16.
Fixes #176.
Fixes #190.
4 years ago
|
|
|
|
flags = flagSetValue(flags, "-p", newPkgPath)
|
|
|
|
|
}
|
|
|
|
|
|
always use the compiler's -dwarf=false flag (#96)
First, our original append line was completely ineffective; we never
used that "flags" slice again. Second, we only attempted to use the flag
when we obfuscated a package.
In fact, we never care about debugging information here, so for any
package we compile, we can add "-dwarf=false". At the moment, we compile
all packages, even if they aren't to be obfuscated, due to the lack of
access to the build cache.
As such, we save a significant amount of work. The numbers below were
obtained on a quiet machine with "go test -bench=. -benchtime=10x", six
times before and after the change.
name old time/op new time/op delta
Build-8 2.06s ± 4% 1.87s ± 2% -9.21% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build-8 1.51s ± 2% 1.46s ± 1% -3.12% (p=0.004 n=6+5)
name old user-time/op new user-time/op delta
Build-8 11.9s ± 2% 10.8s ± 1% -8.71% (p=0.002 n=6+6)
While at it, only do CI builds on pushes and PRs to the master branch,
so that my PRs created from the same repo don't trigger duplicate
builds.
5 years ago
|
|
|
|
newPaths := make([]string, 0, len(files))
|
|
|
|
|
|
|
|
|
|
for i, file := range files {
|
|
|
|
|
basename := filepath.Base(paths[i])
|
|
|
|
|
log.Printf("obfuscating %s", basename)
|
unify the definition and storage of flag values
The parent garble process parses the original flags,
as provided by the user via the command line.
Previously, those got stored in the shared cache file,
so that child processes spawned by toolexec could see them.
Unfortunately, this made the code relatively easy to misuse.
A child process would always see flagLiterals as zero value,
given that it should never see such a flag argument directly.
Similarly, one would have to be careful with cached options,
as they could only be consumed after the cache file is loaded.
Simplify the situation by deduplicating the storage of flags.
Now, the parent passes all flags onto children via toolexec.
One exception is GarbleDir, which now becomes an env var.
This seems in line with other top-level dirs like GARBLE_SHARED.
Finally, we turn -seed into a flag.Value,
which lets us implement its "set" behavior as part of flag.Parse.
Overall, we barely reduce the amount of code involved,
but we certainly remove a couple of footguns.
As part of the cleanup, we also introduce appendFlags.
3 years ago
|
|
|
|
if curPkg.ImportPath == "runtime" && flagTiny {
|
|
|
|
|
// strip unneeded runtime code
|
|
|
|
|
stripRuntime(basename, file)
|
|
|
|
|
tf.removeUnnecessaryImports(file)
|
|
|
|
|
}
|
stop loading obfuscated type information from deps
If package P1 imports package P2, P1 needs to know which names from P2
weren't obfuscated. For instance, if P2 declares T2 and does
"reflect.TypeOf(T2{...})", then P2 won't obfuscate the name T2, and
neither should P1.
This information should flow from P2 to P1, as P2 builds before
P1. We do this via obfuscatedTypesPackage; P1 loads the type information
of the obfuscated version of P2, and does a lookup for T2. If T2 exists,
then it wasn't obfuscated.
This mechanism has served us well, but it has downsides:
1) It wastes CPU; we load the type information for the entire package.
2) It's complex; for instance, we need KnownObjectFiles as an extra.
3) It makes our code harder to understand, as we load both the original
and obfuscated type informaiton.
Instead, we now have each package record what names were not obfuscated
as part of its cachedOuput file. Much like KnownObjectFiles, the map
records incrementally through the import graph, to avoid having to load
cachedOutput files for indirect dependencies.
We shouldn't need to worry about those maps getting large;
we only skip obfuscating declared names in a few uncommon scenarios,
such as the use of reflection or cgo's "//export".
Since go/types is relatively allocation-heavy, and the export files
contain a lot of data, we get a nice speed-up:
name old time/op new time/op delta
Build-16 11.5s ± 2% 11.1s ± 3% -3.77% (p=0.008 n=5+5)
name old bin-B new bin-B delta
Build-16 5.15M ± 0% 5.15M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 375ms ± 3% 341ms ± 6% -8.96% (p=0.008 n=5+5)
name old sys-time/op new sys-time/op delta
Build-16 283ms ±17% 289ms ±13% ~ (p=0.841 n=5+5)
name old user-time/op new user-time/op delta
Build-16 687ms ± 6% 664ms ± 7% ~ (p=0.548 n=5+5)
Fixes #456.
Updates #475.
3 years ago
|
|
|
|
tf.handleDirectives(file.Comments)
|
|
|
|
|
file = tf.transformGo(file)
|
refactor "current package" with TOOLEXEC_IMPORTPATH (#266)
Now that we've dropped support for Go 1.15.x, we can finally rely on
this environment variable for toolexec calls, present in Go 1.16.
Before, we had hacky ways of trying to figure out the current package's
import path, mostly from the -p flag. The biggest rough edge there was
that, for main packages, that was simply the package name, and not its
full import path.
To work around that, we had a restriction on a single main package, so
we could work around that issue. That restriction is now gone.
The new code is simpler, especially because we can set curPkg in a
single place for all toolexec transform funcs.
Since we can always rely on curPkg not being nil now, we can also start
reusing listedPackage.Private and avoid the majority of repeated calls
to isPrivate. The function is cheap, but still not free.
isPrivate itself can also get simpler. We no longer have to worry about
the "main" edge case. Plus, the sanity check for invalid package paths
is now unnecessary; we only got malformed paths from goobj2, and we now
require exact matches with the ImportPath field from "go list -json".
Another effect of clearing up the "main" edge case is that -debugdir now
uses the right directory for main packages. We also start using
consistent debugdir paths in the tests, for the sake of being easier to
read and maintain.
Finally, note that commandReverse did not need the extra call to "go
list -toolexec", as the "shared" call stored in the cache is enough. We
still call toolexecCmd to get said cache, which should probably be
simplified in a future PR.
While at it, replace the use of the "-std" compiler flag with the
Standard field from "go list -json".
4 years ago
|
|
|
|
if newPkgPath != "" {
|
reimplement import path obfuscation without goobj2 (#242)
We used to rely on a parallel implementation of an object file parser
and writer to be able to obfuscate import paths. After compiling each
package, we would parse the object file, replace the import paths, and
write the updated object file in-place.
That worked well, in most cases. Unfortunately, it had some flaws:
* Complexity. Even when most of the code is maintained in a separate
module, the import_obfuscation.go file was still close to a thousand
lines of code.
* Go compatibility. The object file format changes between Go releases,
so we were supporting Go 1.15, but not 1.16. Fixing the object file
package to work with 1.16 would probably break 1.15 support.
* Bugs. For example, we recently had to add a workaround for #224, since
import paths containing dots after the domain would end up escaped.
Another example is #190, which seems to be caused by the object file
parser or writer corrupting the compiled code and causing segfaults in
some rare edge cases.
Instead, let's drop that method entirely, and force the compiler and
linker to do the work for us. The steps necessary when compiling a
package to obfuscate are:
1) Replace its "package foo" lines with the obfuscated package path. No
need to separate the package path and name, since the obfuscated path
does not contain slashes.
2) Replace the "-p pkg/foo" flag with the obfuscated path.
3) Replace the "import" spec lines with the obfuscated package paths,
for those dependencies which were obfuscated.
4) Replace the "-importcfg [...]" file with a version that uses the
obfuscated paths instead.
The linker also needs that last step, since it also uses an importcfg
file to find object files.
There are three noteworthy drawbacks to this new method:
1) Since we no longer write object files, we can't use them to store
data to be cached. As such, the -debugdir flag goes back to using the
"-a" build flag to always rebuild all packages. On the plus side,
that caching didn't work very well; see #176.
2) The package name "main" remains in all declarations under it, not
just "func main", since we can only rename entire packages. This
seems fine, as it gives little information to the end user.
3) The -tiny mode no longer sets all lines to 0, since it did that by
modifying object files. As a temporary measure, we instead set all
top-level declarations to be on line 1. A TODO is added to hopefully
improve this again in the near future.
The upside is that we get rid of all the issues mentioned before. Plus,
garble now nearly works with Go 1.16, with the exception of two very
minor bugs that look fixable. A follow-up PR will take care of that and
start testing on 1.16.
Fixes #176.
Fixes #190.
4 years ago
|
|
|
|
file.Name.Name = newPkgPath
|
|
|
|
|
}
|
|
|
|
|
|
rework the position obfuscator (#282)
First, rename line_obfuscator.go to position.go. We obfuscate filenames,
not just line numbers, and "obfuscator" is a bit redundant.
Second, use "/*line :x*/" comments rather than the "//line :x" form, as
the former allows us to insert them in any position without adding
unnecessary newlines. This will be important for changing the position
of call sites, which will be important for "garble reverse".
Third, do not rely on go/ast to remove and add comments. Since they are
free-floating, we can very easily end up with misplaced comments,
especially as the literal obfuscator heavily modifies the AST.
The new method prints and re-parses the file, to ensure all node
positions are consistent with a buffer, buf1. Then, we copy the contents
into a new buffer, buf2, while inserting the comments that we need.
The new method also modifies line numbers at the very end of obfuscating
a Go file, instead of at the very beginning. That's going to be more
robust long-term, as we will also obfuscate line numbers for any
additions or modifications to the AST.
Fourth, detachedDirectives is unnecessary, as we can accomplish the same
with two simple prefix matches.
Finally, this means we can stop using detachedComments entirely, as
printFile already inserts the comments we need.
For #5.
4 years ago
|
|
|
|
src, err := printFile(file)
|
|
|
|
|
if err != nil {
|
|
|
|
|
return nil, err
|
|
|
|
|
}
|
|
|
|
|
// It is possible to end up in an edge case where two instances of the
|
|
|
|
|
// same package have different Action IDs, but their obfuscation and
|
|
|
|
|
// builds produce exactly the same results.
|
|
|
|
|
// In such an edge case, Go's build cache is smart enough for the second
|
|
|
|
|
// instance to reuse the first's build artifact.
|
|
|
|
|
// However, garble's caching via garbleExportFile is not as smart,
|
|
|
|
|
// as we base the location of these files purely based on Action IDs.
|
|
|
|
|
// Thus, the incremental build can fail to find garble's cached file.
|
|
|
|
|
// To sidestep this bug entirely, ensure that different action IDs never
|
|
|
|
|
// produce the same cached output when building with garble.
|
|
|
|
|
// Note that this edge case tends to happen when a -seed is provided,
|
|
|
|
|
// as then a package's Action ID is not used as an obfuscation seed.
|
|
|
|
|
// TODO(mvdan): replace this workaround with an actual fix if we can.
|
|
|
|
|
// This workaround is presumably worse on the build cache,
|
|
|
|
|
// as we end up with extra near-duplicate cached artifacts.
|
|
|
|
|
if i == 0 {
|
|
|
|
|
src = append(src, fmt.Sprintf(
|
|
|
|
|
"\nvar garbleActionID = %q\n", hashToString(curPkg.GarbleActionID),
|
|
|
|
|
)...)
|
|
|
|
|
}
|
rework the position obfuscator (#282)
First, rename line_obfuscator.go to position.go. We obfuscate filenames,
not just line numbers, and "obfuscator" is a bit redundant.
Second, use "/*line :x*/" comments rather than the "//line :x" form, as
the former allows us to insert them in any position without adding
unnecessary newlines. This will be important for changing the position
of call sites, which will be important for "garble reverse".
Third, do not rely on go/ast to remove and add comments. Since they are
free-floating, we can very easily end up with misplaced comments,
especially as the literal obfuscator heavily modifies the AST.
The new method prints and re-parses the file, to ensure all node
positions are consistent with a buffer, buf1. Then, we copy the contents
into a new buffer, buf2, while inserting the comments that we need.
The new method also modifies line numbers at the very end of obfuscating
a Go file, instead of at the very beginning. That's going to be more
robust long-term, as we will also obfuscate line numbers for any
additions or modifications to the AST.
Fourth, detachedDirectives is unnecessary, as we can accomplish the same
with two simple prefix matches.
Finally, this means we can stop using detachedComments entirely, as
printFile already inserts the comments we need.
For #5.
4 years ago
|
|
|
|
|
|
|
|
|
// We hide Go source filenames via "//line" directives,
|
|
|
|
|
// so there is no need to use obfuscated filenames here.
|
|
|
|
|
if path, err := writeSourceFile(basename, basename, src); err != nil {
|
reimplement import path obfuscation without goobj2 (#242)
We used to rely on a parallel implementation of an object file parser
and writer to be able to obfuscate import paths. After compiling each
package, we would parse the object file, replace the import paths, and
write the updated object file in-place.
That worked well, in most cases. Unfortunately, it had some flaws:
* Complexity. Even when most of the code is maintained in a separate
module, the import_obfuscation.go file was still close to a thousand
lines of code.
* Go compatibility. The object file format changes between Go releases,
so we were supporting Go 1.15, but not 1.16. Fixing the object file
package to work with 1.16 would probably break 1.15 support.
* Bugs. For example, we recently had to add a workaround for #224, since
import paths containing dots after the domain would end up escaped.
Another example is #190, which seems to be caused by the object file
parser or writer corrupting the compiled code and causing segfaults in
some rare edge cases.
Instead, let's drop that method entirely, and force the compiler and
linker to do the work for us. The steps necessary when compiling a
package to obfuscate are:
1) Replace its "package foo" lines with the obfuscated package path. No
need to separate the package path and name, since the obfuscated path
does not contain slashes.
2) Replace the "-p pkg/foo" flag with the obfuscated path.
3) Replace the "import" spec lines with the obfuscated package paths,
for those dependencies which were obfuscated.
4) Replace the "-importcfg [...]" file with a version that uses the
obfuscated paths instead.
The linker also needs that last step, since it also uses an importcfg
file to find object files.
There are three noteworthy drawbacks to this new method:
1) Since we no longer write object files, we can't use them to store
data to be cached. As such, the -debugdir flag goes back to using the
"-a" build flag to always rebuild all packages. On the plus side,
that caching didn't work very well; see #176.
2) The package name "main" remains in all declarations under it, not
just "func main", since we can only rename entire packages. This
seems fine, as it gives little information to the end user.
3) The -tiny mode no longer sets all lines to 0, since it did that by
modifying object files. As a temporary measure, we instead set all
top-level declarations to be on line 1. A TODO is added to hopefully
improve this again in the near future.
The upside is that we get rid of all the issues mentioned before. Plus,
garble now nearly works with Go 1.16, with the exception of two very
minor bugs that look fixable. A follow-up PR will take care of that and
start testing on 1.16.
Fixes #176.
Fixes #190.
4 years ago
|
|
|
|
return nil, err
|
|
|
|
|
} else {
|
|
|
|
|
newPaths = append(newPaths, path)
|
|
|
|
|
}
|
|
|
|
|
}
|
reimplement import path obfuscation without goobj2 (#242)
We used to rely on a parallel implementation of an object file parser
and writer to be able to obfuscate import paths. After compiling each
package, we would parse the object file, replace the import paths, and
write the updated object file in-place.
That worked well, in most cases. Unfortunately, it had some flaws:
* Complexity. Even when most of the code is maintained in a separate
module, the import_obfuscation.go file was still close to a thousand
lines of code.
* Go compatibility. The object file format changes between Go releases,
so we were supporting Go 1.15, but not 1.16. Fixing the object file
package to work with 1.16 would probably break 1.15 support.
* Bugs. For example, we recently had to add a workaround for #224, since
import paths containing dots after the domain would end up escaped.
Another example is #190, which seems to be caused by the object file
parser or writer corrupting the compiled code and causing segfaults in
some rare edge cases.
Instead, let's drop that method entirely, and force the compiler and
linker to do the work for us. The steps necessary when compiling a
package to obfuscate are:
1) Replace its "package foo" lines with the obfuscated package path. No
need to separate the package path and name, since the obfuscated path
does not contain slashes.
2) Replace the "-p pkg/foo" flag with the obfuscated path.
3) Replace the "import" spec lines with the obfuscated package paths,
for those dependencies which were obfuscated.
4) Replace the "-importcfg [...]" file with a version that uses the
obfuscated paths instead.
The linker also needs that last step, since it also uses an importcfg
file to find object files.
There are three noteworthy drawbacks to this new method:
1) Since we no longer write object files, we can't use them to store
data to be cached. As such, the -debugdir flag goes back to using the
"-a" build flag to always rebuild all packages. On the plus side,
that caching didn't work very well; see #176.
2) The package name "main" remains in all declarations under it, not
just "func main", since we can only rename entire packages. This
seems fine, as it gives little information to the end user.
3) The -tiny mode no longer sets all lines to 0, since it did that by
modifying object files. As a temporary measure, we instead set all
top-level declarations to be on line 1. A TODO is added to hopefully
improve this again in the near future.
The upside is that we get rid of all the issues mentioned before. Plus,
garble now nearly works with Go 1.16, with the exception of two very
minor bugs that look fixable. A follow-up PR will take care of that and
start testing on 1.16.
Fixes #176.
Fixes #190.
4 years ago
|
|
|
|
flags = flagSetValue(flags, "-importcfg", newImportCfg)
|
|
|
|
|
|
stop loading obfuscated type information from deps
If package P1 imports package P2, P1 needs to know which names from P2
weren't obfuscated. For instance, if P2 declares T2 and does
"reflect.TypeOf(T2{...})", then P2 won't obfuscate the name T2, and
neither should P1.
This information should flow from P2 to P1, as P2 builds before
P1. We do this via obfuscatedTypesPackage; P1 loads the type information
of the obfuscated version of P2, and does a lookup for T2. If T2 exists,
then it wasn't obfuscated.
This mechanism has served us well, but it has downsides:
1) It wastes CPU; we load the type information for the entire package.
2) It's complex; for instance, we need KnownObjectFiles as an extra.
3) It makes our code harder to understand, as we load both the original
and obfuscated type informaiton.
Instead, we now have each package record what names were not obfuscated
as part of its cachedOuput file. Much like KnownObjectFiles, the map
records incrementally through the import graph, to avoid having to load
cachedOutput files for indirect dependencies.
We shouldn't need to worry about those maps getting large;
we only skip obfuscating declared names in a few uncommon scenarios,
such as the use of reflection or cgo's "//export".
Since go/types is relatively allocation-heavy, and the export files
contain a lot of data, we get a nice speed-up:
name old time/op new time/op delta
Build-16 11.5s ± 2% 11.1s ± 3% -3.77% (p=0.008 n=5+5)
name old bin-B new bin-B delta
Build-16 5.15M ± 0% 5.15M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 375ms ± 3% 341ms ± 6% -8.96% (p=0.008 n=5+5)
name old sys-time/op new sys-time/op delta
Build-16 283ms ±17% 289ms ±13% ~ (p=0.841 n=5+5)
name old user-time/op new user-time/op delta
Build-16 687ms ± 6% 664ms ± 7% ~ (p=0.548 n=5+5)
Fixes #456.
Updates #475.
3 years ago
|
|
|
|
if err := writeGobExclusive(
|
|
|
|
|
garbleExportFile(curPkg),
|
|
|
|
|
cachedOutput,
|
|
|
|
|
); err != nil && !errors.Is(err, fs.ErrExist) {
|
|
|
|
|
return nil, err
|
|
|
|
|
}
|
|
|
|
|
|
reimplement import path obfuscation without goobj2 (#242)
We used to rely on a parallel implementation of an object file parser
and writer to be able to obfuscate import paths. After compiling each
package, we would parse the object file, replace the import paths, and
write the updated object file in-place.
That worked well, in most cases. Unfortunately, it had some flaws:
* Complexity. Even when most of the code is maintained in a separate
module, the import_obfuscation.go file was still close to a thousand
lines of code.
* Go compatibility. The object file format changes between Go releases,
so we were supporting Go 1.15, but not 1.16. Fixing the object file
package to work with 1.16 would probably break 1.15 support.
* Bugs. For example, we recently had to add a workaround for #224, since
import paths containing dots after the domain would end up escaped.
Another example is #190, which seems to be caused by the object file
parser or writer corrupting the compiled code and causing segfaults in
some rare edge cases.
Instead, let's drop that method entirely, and force the compiler and
linker to do the work for us. The steps necessary when compiling a
package to obfuscate are:
1) Replace its "package foo" lines with the obfuscated package path. No
need to separate the package path and name, since the obfuscated path
does not contain slashes.
2) Replace the "-p pkg/foo" flag with the obfuscated path.
3) Replace the "import" spec lines with the obfuscated package paths,
for those dependencies which were obfuscated.
4) Replace the "-importcfg [...]" file with a version that uses the
obfuscated paths instead.
The linker also needs that last step, since it also uses an importcfg
file to find object files.
There are three noteworthy drawbacks to this new method:
1) Since we no longer write object files, we can't use them to store
data to be cached. As such, the -debugdir flag goes back to using the
"-a" build flag to always rebuild all packages. On the plus side,
that caching didn't work very well; see #176.
2) The package name "main" remains in all declarations under it, not
just "func main", since we can only rename entire packages. This
seems fine, as it gives little information to the end user.
3) The -tiny mode no longer sets all lines to 0, since it did that by
modifying object files. As a temporary measure, we instead set all
top-level declarations to be on line 1. A TODO is added to hopefully
improve this again in the near future.
The upside is that we get rid of all the issues mentioned before. Plus,
garble now nearly works with Go 1.16, with the exception of two very
minor bugs that look fixable. A follow-up PR will take care of that and
start testing on 1.16.
Fixes #176.
Fixes #190.
4 years ago
|
|
|
|
return append(flags, newPaths...), nil
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// handleDirectives looks at all the comments in a file containing build
|
|
|
|
|
// directives, and does the necessary for the obfuscation process to work.
|
|
|
|
|
//
|
|
|
|
|
// Right now, this means recording what local names are used with go:linkname,
|
|
|
|
|
// and rewriting those directives to use obfuscated name from other packages.
|
rework the position obfuscator (#282)
First, rename line_obfuscator.go to position.go. We obfuscate filenames,
not just line numbers, and "obfuscator" is a bit redundant.
Second, use "/*line :x*/" comments rather than the "//line :x" form, as
the former allows us to insert them in any position without adding
unnecessary newlines. This will be important for changing the position
of call sites, which will be important for "garble reverse".
Third, do not rely on go/ast to remove and add comments. Since they are
free-floating, we can very easily end up with misplaced comments,
especially as the literal obfuscator heavily modifies the AST.
The new method prints and re-parses the file, to ensure all node
positions are consistent with a buffer, buf1. Then, we copy the contents
into a new buffer, buf2, while inserting the comments that we need.
The new method also modifies line numbers at the very end of obfuscating
a Go file, instead of at the very beginning. That's going to be more
robust long-term, as we will also obfuscate line numbers for any
additions or modifications to the AST.
Fourth, detachedDirectives is unnecessary, as we can accomplish the same
with two simple prefix matches.
Finally, this means we can stop using detachedComments entirely, as
printFile already inserts the comments we need.
For #5.
4 years ago
|
|
|
|
func (tf *transformer) handleDirectives(comments []*ast.CommentGroup) {
|
|
|
|
|
for _, group := range comments {
|
|
|
|
|
for _, comment := range group.List {
|
|
|
|
|
if !strings.HasPrefix(comment.Text, "//go:linkname ") {
|
|
|
|
|
continue
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// We can have either just one argument:
|
|
|
|
|
//
|
|
|
|
|
// //go:linkname localName
|
|
|
|
|
//
|
|
|
|
|
// Or two arguments, where the second may refer to a name in a
|
|
|
|
|
// different package:
|
|
|
|
|
//
|
|
|
|
|
// //go:linkname localName newName
|
|
|
|
|
// //go:linkname localName pkg.newName
|
rework the position obfuscator (#282)
First, rename line_obfuscator.go to position.go. We obfuscate filenames,
not just line numbers, and "obfuscator" is a bit redundant.
Second, use "/*line :x*/" comments rather than the "//line :x" form, as
the former allows us to insert them in any position without adding
unnecessary newlines. This will be important for changing the position
of call sites, which will be important for "garble reverse".
Third, do not rely on go/ast to remove and add comments. Since they are
free-floating, we can very easily end up with misplaced comments,
especially as the literal obfuscator heavily modifies the AST.
The new method prints and re-parses the file, to ensure all node
positions are consistent with a buffer, buf1. Then, we copy the contents
into a new buffer, buf2, while inserting the comments that we need.
The new method also modifies line numbers at the very end of obfuscating
a Go file, instead of at the very beginning. That's going to be more
robust long-term, as we will also obfuscate line numbers for any
additions or modifications to the AST.
Fourth, detachedDirectives is unnecessary, as we can accomplish the same
with two simple prefix matches.
Finally, this means we can stop using detachedComments entirely, as
printFile already inserts the comments we need.
For #5.
4 years ago
|
|
|
|
fields := strings.Fields(comment.Text)
|
|
|
|
|
localName := fields[1]
|
|
|
|
|
newName := ""
|
|
|
|
|
if len(fields) == 3 {
|
|
|
|
|
newName = fields[2]
|
rework the position obfuscator (#282)
First, rename line_obfuscator.go to position.go. We obfuscate filenames,
not just line numbers, and "obfuscator" is a bit redundant.
Second, use "/*line :x*/" comments rather than the "//line :x" form, as
the former allows us to insert them in any position without adding
unnecessary newlines. This will be important for changing the position
of call sites, which will be important for "garble reverse".
Third, do not rely on go/ast to remove and add comments. Since they are
free-floating, we can very easily end up with misplaced comments,
especially as the literal obfuscator heavily modifies the AST.
The new method prints and re-parses the file, to ensure all node
positions are consistent with a buffer, buf1. Then, we copy the contents
into a new buffer, buf2, while inserting the comments that we need.
The new method also modifies line numbers at the very end of obfuscating
a Go file, instead of at the very beginning. That's going to be more
robust long-term, as we will also obfuscate line numbers for any
additions or modifications to the AST.
Fourth, detachedDirectives is unnecessary, as we can accomplish the same
with two simple prefix matches.
Finally, this means we can stop using detachedComments entirely, as
printFile already inserts the comments we need.
For #5.
4 years ago
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
localName, newName = tf.transformLinkname(localName, newName)
|
|
|
|
|
fields[1] = localName
|
|
|
|
|
if len(fields) == 3 {
|
|
|
|
|
fields[2] = newName
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if flagDebug { // TODO(mvdan): remove once https://go.dev/issue/53465 if fixed
|
|
|
|
|
log.Printf("linkname %q changed to %q", comment.Text, strings.Join(fields, " "))
|
ensure the runtime is built in a reproducible way
We went to great lengths to ensure garble builds are reproducible.
This includes how the tool itself works,
as its behavior should be the same given the same inputs.
However, we made one crucial mistake with the runtime package.
It has go:linkname directives pointing at other packages,
and some of those pointed packages aren't its dependencies.
Imagine two scenarios where garble builds the runtime package:
1) We run "garble build runtime". The way we handle linkname directives
calls listPackage on the target package, to obfuscate the target's
import path and object name. However, since we only obtained build
info of runtime and its deps, calls for some linknames such as
listPackage("sync/atomic") will fail. The linkname directive will
leave its target untouched.
2) We run "garble build std". Unlike the first scenario, all listPackage
calls issued by runtime's linkname directives will succeed, so its
linkname directive targets will be obfuscated.
At best, this can result in inconsistent builds, depending on how the
runtime package was built. At worst, the mismatching object names can
result in errors at link time, if the target packages are actually used.
The modified test reproduces the worst case scenario reliably,
when the fix is reverted:
> env GOCACHE=${WORK}/gocache-empty
> garble build -a runtime
> garble build -o=out_rebuild ./stdimporter
[stderr]
# test/main/stdimporter
JZzQivnl.NtQJu0H3: relocation target JZzQivnl.iioHinYT not defined
JZzQivnl.NtQJu0H3.func9: relocation target JZzQivnl.yz5z0NaH not defined
JZzQivnl.(*ypvqhKiQ).String: relocation target JZzQivnl.eVciBQeI not defined
JZzQivnl.(*ypvqhKiQ).PkgPath: relocation target JZzQivnl.eVciBQeI not defined
[...]
The fix consists of two steps. First, if we're building the runtime and
listPackage fails on a package, that means we ran into scenario 1 above.
To avoid the inconsistency, we fill ListedPackages with "go list [...] std".
This means we'll always build runtime as described in scenario 2 above.
Second, when building packages other than the runtime,
we only allow listPackage to succeed if we're listing a dependency of
the current package.
This ensures we won't run into similar reproducibility bugs in the future.
Finally, re-enable test-gotip on CI since this was the last test flake.
3 years ago
|
|
|
|
}
|
|
|
|
|
comment.Text = strings.Join(fields, " ")
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
func (tf *transformer) transformLinkname(localName, newName string) (string, string) {
|
|
|
|
|
// obfuscate the local name, if the current package is obfuscated
|
|
|
|
|
if curPkg.ToObfuscate {
|
|
|
|
|
localName = hashWithPackage(curPkg, localName)
|
|
|
|
|
}
|
|
|
|
|
if newName == "" {
|
|
|
|
|
return localName, ""
|
|
|
|
|
}
|
|
|
|
|
// If the new name is of the form "pkgpath.Name", and we've obfuscated
|
|
|
|
|
// "Name" in that package, rewrite the directive to use the obfuscated name.
|
|
|
|
|
dotCnt := strings.Count(newName, ".")
|
|
|
|
|
if dotCnt < 1 {
|
|
|
|
|
// cgo-generated code uses linknames to made up symbol names,
|
|
|
|
|
// which do not have a package path at all.
|
|
|
|
|
// Replace the comment in case the local name was obfuscated.
|
|
|
|
|
return localName, newName
|
|
|
|
|
}
|
|
|
|
|
switch newName {
|
|
|
|
|
case "main.main", "main..inittask", "runtime..inittask":
|
|
|
|
|
// The runtime uses some special symbols with "..".
|
|
|
|
|
// We aren't touching those at the moment.
|
|
|
|
|
return localName, newName
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// If the package path has multiple dots, split on the last one.
|
|
|
|
|
lastDotIdx := strings.LastIndex(newName, ".")
|
|
|
|
|
pkgPath, foreignName := newName[:lastDotIdx], newName[lastDotIdx+1:]
|
|
|
|
|
|
|
|
|
|
lpkg, err := listPackage(pkgPath)
|
|
|
|
|
if err != nil {
|
default to GOGARBLE=*, stop using GOPRIVATE
We can drop the code that kicked in when GOGARBLE was empty.
We can also add the value in addGarbleToHash unconditionally,
as we never allow it to be empty.
In the tests, remove all GOGARBLE lines where it just meant "obfuscate
everything" or "obfuscate the entire main module".
cgo.txtar had "obfuscate everything" as a separate step,
so remove it entirely.
linkname.txtar started failing because the imported package did not
import strings, so listPackage errored out. This wasn't a problem when
strings itself wasn't obfuscated, as transformLinkname silently left
strings.IndexByte untouched. It is a problem when IndexByte does get
obfuscated. Make that kind of listPackage error visible, and fix it.
reflect.txtar started failing with "unreachable method" runtime throws.
It's not clear to me why; it appears that GOGARBLE=* makes the linker
think that ExportedMethodName is suddenly unreachable.
Work around the problem by making the method explicitly reachable,
and leave a TODO as a reminder to investigate.
Finally, gogarble.txtar no longer needs to test for GOPRIVATE.
The rest of the test is left the same, as we still want the various
values for GOGARBLE to continue to work just like before.
Fixes #594.
2 years ago
|
|
|
|
// TODO(mvdan): use errors.As or errors.Is instead
|
|
|
|
|
if strings.Contains(err.Error(), "path not found") {
|
|
|
|
|
// Probably a made up name like above, but with a dot.
|
|
|
|
|
return localName, newName
|
|
|
|
|
}
|
|
|
|
|
if strings.Contains(err.Error(), "refusing to list") {
|
|
|
|
|
fmt.Fprintf(os.Stderr,
|
|
|
|
|
"//go:linkname refers to %s - add `import _ %q` so garble can find the package",
|
|
|
|
|
newName, pkgPath)
|
|
|
|
|
return localName, newName
|
|
|
|
|
}
|
|
|
|
|
panic(err) // shouldn't happen
|
|
|
|
|
}
|
|
|
|
|
if lpkg.ToObfuscate {
|
|
|
|
|
// The name exists and was obfuscated; obfuscate the new name.
|
|
|
|
|
newForeignName := hashWithPackage(lpkg, foreignName)
|
|
|
|
|
newPkgPath := pkgPath
|
|
|
|
|
if pkgPath != "main" {
|
|
|
|
|
newPkgPath = lpkg.obfuscatedImportPath()
|
reimplement import path obfuscation without goobj2 (#242)
We used to rely on a parallel implementation of an object file parser
and writer to be able to obfuscate import paths. After compiling each
package, we would parse the object file, replace the import paths, and
write the updated object file in-place.
That worked well, in most cases. Unfortunately, it had some flaws:
* Complexity. Even when most of the code is maintained in a separate
module, the import_obfuscation.go file was still close to a thousand
lines of code.
* Go compatibility. The object file format changes between Go releases,
so we were supporting Go 1.15, but not 1.16. Fixing the object file
package to work with 1.16 would probably break 1.15 support.
* Bugs. For example, we recently had to add a workaround for #224, since
import paths containing dots after the domain would end up escaped.
Another example is #190, which seems to be caused by the object file
parser or writer corrupting the compiled code and causing segfaults in
some rare edge cases.
Instead, let's drop that method entirely, and force the compiler and
linker to do the work for us. The steps necessary when compiling a
package to obfuscate are:
1) Replace its "package foo" lines with the obfuscated package path. No
need to separate the package path and name, since the obfuscated path
does not contain slashes.
2) Replace the "-p pkg/foo" flag with the obfuscated path.
3) Replace the "import" spec lines with the obfuscated package paths,
for those dependencies which were obfuscated.
4) Replace the "-importcfg [...]" file with a version that uses the
obfuscated paths instead.
The linker also needs that last step, since it also uses an importcfg
file to find object files.
There are three noteworthy drawbacks to this new method:
1) Since we no longer write object files, we can't use them to store
data to be cached. As such, the -debugdir flag goes back to using the
"-a" build flag to always rebuild all packages. On the plus side,
that caching didn't work very well; see #176.
2) The package name "main" remains in all declarations under it, not
just "func main", since we can only rename entire packages. This
seems fine, as it gives little information to the end user.
3) The -tiny mode no longer sets all lines to 0, since it did that by
modifying object files. As a temporary measure, we instead set all
top-level declarations to be on line 1. A TODO is added to hopefully
improve this again in the near future.
The upside is that we get rid of all the issues mentioned before. Plus,
garble now nearly works with Go 1.16, with the exception of two very
minor bugs that look fixable. A follow-up PR will take care of that and
start testing on 1.16.
Fixes #176.
Fixes #190.
4 years ago
|
|
|
|
}
|
|
|
|
|
newName = newPkgPath + "." + newForeignName
|
|
|
|
|
}
|
|
|
|
|
return localName, newName
|
|
|
|
|
}
|
|
|
|
|
|
stop loading obfuscated type information from deps
If package P1 imports package P2, P1 needs to know which names from P2
weren't obfuscated. For instance, if P2 declares T2 and does
"reflect.TypeOf(T2{...})", then P2 won't obfuscate the name T2, and
neither should P1.
This information should flow from P2 to P1, as P2 builds before
P1. We do this via obfuscatedTypesPackage; P1 loads the type information
of the obfuscated version of P2, and does a lookup for T2. If T2 exists,
then it wasn't obfuscated.
This mechanism has served us well, but it has downsides:
1) It wastes CPU; we load the type information for the entire package.
2) It's complex; for instance, we need KnownObjectFiles as an extra.
3) It makes our code harder to understand, as we load both the original
and obfuscated type informaiton.
Instead, we now have each package record what names were not obfuscated
as part of its cachedOuput file. Much like KnownObjectFiles, the map
records incrementally through the import graph, to avoid having to load
cachedOutput files for indirect dependencies.
We shouldn't need to worry about those maps getting large;
we only skip obfuscating declared names in a few uncommon scenarios,
such as the use of reflection or cgo's "//export".
Since go/types is relatively allocation-heavy, and the export files
contain a lot of data, we get a nice speed-up:
name old time/op new time/op delta
Build-16 11.5s ± 2% 11.1s ± 3% -3.77% (p=0.008 n=5+5)
name old bin-B new bin-B delta
Build-16 5.15M ± 0% 5.15M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 375ms ± 3% 341ms ± 6% -8.96% (p=0.008 n=5+5)
name old sys-time/op new sys-time/op delta
Build-16 283ms ±17% 289ms ±13% ~ (p=0.841 n=5+5)
name old user-time/op new user-time/op delta
Build-16 687ms ± 6% 664ms ± 7% ~ (p=0.548 n=5+5)
Fixes #456.
Updates #475.
3 years ago
|
|
|
|
// processImportCfg parses the importcfg file passed to a compile or link step.
|
stop relying on nested "go list -toolexec" calls (#422)
We rely on importcfg files to load type info for obfuscated packages.
We use this type information to remember what names we didn't obfuscate.
Unfortunately, indirect dependencies aren't listed in importcfg files,
so we relied on extra "go list -toolexec" calls to locate object files.
This worked fine, but added a significant amount of complexity.
The extra "go list -export -toolexec=garble" invocations weren't slow,
as they avoided rebuilding or re-obfuscating thanks to the build cache.
Still, it was hard to reason about how garble runs during a build
if we might have multiple layers of -toolexec invocations.
Instead, record the export files we encounter in an incremental map,
and persist it in the build cache via the gob file we're already using.
This way, each garble invocation knows where all object files are,
even those for indirect imports.
One wrinkle is that importcfg files can point to temporary object files.
In that case, figure out its final location in the build cache.
This requires hard-coding a bit of knowledge about how GOCACHE works,
but it seems relatively harmless given how it's very little code.
Plus, if GOCACHE ever changes, it will be obvious when our code breaks.
Finally, add a TODO about potentially saving even more work.
3 years ago
|
|
|
|
// It also builds a new importcfg file to account for obfuscated import paths.
|
avoid one more call to 'go tool buildid' (#253)
We use it to get the content ID of garble's binary, which is used for
both the garble action IDs, as well as 'go tool compile -V=full'.
Since those two happen in separate processes, both used to call 'go tool
buildid' separately. Store it in the gob cache the first time, and reuse
it the second time.
Since each call to cmd/go costs about 10ms (new process, running its
many init funcs, etc), this results in a nice speed-up for our small
benchmark. Most builds will take many seconds though, so note that a
~15ms speedup there will likely not be noticeable.
While at it, simplify the buildInfo global, as now it just contains a
map representation of the -importcfg contents. It now has better names,
docs, and a simpler representation.
We also stop using the term "garbled import", as it was a bit confusing.
"obfuscated types.Package" is a much better description.
name old time/op new time/op delta
Build-8 106ms ± 1% 92ms ± 0% -14.07% (p=0.010 n=6+4)
name old bin-B new bin-B delta
Build-8 6.60M ± 0% 6.60M ± 0% -0.01% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build-8 208ms ± 5% 149ms ± 3% -28.27% (p=0.004 n=6+5)
name old user-time/op new user-time/op delta
Build-8 433ms ± 3% 384ms ± 3% -11.35% (p=0.002 n=6+6)
4 years ago
|
|
|
|
func processImportCfg(flags []string) (newImportCfg string, _ error) {
|
start using original action IDs (#251)
When we obfuscate a name, what we do is hash the name with the action ID
of the package that contains the name. To ensure that the hash changes
if the garble tool changes, we used the action ID of the obfuscated
build, which is different than the original action ID, as we include
garble's own content ID in "go tool compile -V=full" via -toolexec.
Let's call that the "obfuscated action ID". Remember that a content ID
is roughly the hash of a binary or object file, and an action ID
contains the hash of a package's source code plus the content IDs of its
dependencies.
This had the advantage that it did what we wanted. However, it had one
massive drawback: when we compile a package, we only have the obfuscated
action IDs of its dependencies. This is because one can't have the
content ID of dependent packages before they are built.
Usually, this is not a problem, because hashing a foreign name means it
comes from a dependency, where we already have the obfuscated action ID.
However, that's not always the case.
First, go:linkname directives can point to any symbol that ends up in
the binary, even if the package is not a dependency. So garble could
only support linkname targets belonging to dependencies. This is at the
root of why we could not obfuscate the runtime; it contains linkname
directives targeting the net package, for example, which depends on runtime.
Second, some other places did not have an easy access to obfuscated
action IDs, like transformAsm, which had to recover it from a temporary
file stored by transformCompile.
Plus, this was all pretty expensive, as each toolexec sub-process had to
make repeated calls to buildidOf with the object files of dependencies.
We even had to use extra calls to "go list" in the case of indirect
dependencies, as their export files do not appear in importcfg files.
All in all, the old method was complex and expensive. A better mechanism
is to use the original action IDs directly, as listed by "go list"
without garble in the picture.
This would mean that the hashing does not change if garble changes,
meaning weaker obfuscation. To regain that property, we define the
"garble action ID", which is just the original action ID hashed together
with garble's own content ID.
This is practically the same as the obfuscated build ID we used before,
but since it doesn't go through "go tool compile -V=full" and the
obfuscated build itself, we can work out *all* the garble action IDs
upfront, before the obfuscated build even starts.
This fixes all of our problems. Now we know all garble build IDs
upfront, so a bunch of hacks can be entirely removed. Plus, since we
know them upfront, we can also cache them and avoid repeated calls to
"go tool buildid".
While at it, make use of the new BuildID field in Go 1.16's "list -json
-export". This avoids the vast majority of "go tool buildid" calls, as
the only ones that remain are 2 on the garble binary itself.
The numbers for Go 1.16 look very good:
name old time/op new time/op delta
Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6)
name old bin-B new bin-B delta
Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6)
name old user-time/op new user-time/op delta
Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
4 years ago
|
|
|
|
importCfg := flagValue(flags, "-importcfg")
|
|
|
|
|
if importCfg == "" {
|
reimplement import path obfuscation without goobj2 (#242)
We used to rely on a parallel implementation of an object file parser
and writer to be able to obfuscate import paths. After compiling each
package, we would parse the object file, replace the import paths, and
write the updated object file in-place.
That worked well, in most cases. Unfortunately, it had some flaws:
* Complexity. Even when most of the code is maintained in a separate
module, the import_obfuscation.go file was still close to a thousand
lines of code.
* Go compatibility. The object file format changes between Go releases,
so we were supporting Go 1.15, but not 1.16. Fixing the object file
package to work with 1.16 would probably break 1.15 support.
* Bugs. For example, we recently had to add a workaround for #224, since
import paths containing dots after the domain would end up escaped.
Another example is #190, which seems to be caused by the object file
parser or writer corrupting the compiled code and causing segfaults in
some rare edge cases.
Instead, let's drop that method entirely, and force the compiler and
linker to do the work for us. The steps necessary when compiling a
package to obfuscate are:
1) Replace its "package foo" lines with the obfuscated package path. No
need to separate the package path and name, since the obfuscated path
does not contain slashes.
2) Replace the "-p pkg/foo" flag with the obfuscated path.
3) Replace the "import" spec lines with the obfuscated package paths,
for those dependencies which were obfuscated.
4) Replace the "-importcfg [...]" file with a version that uses the
obfuscated paths instead.
The linker also needs that last step, since it also uses an importcfg
file to find object files.
There are three noteworthy drawbacks to this new method:
1) Since we no longer write object files, we can't use them to store
data to be cached. As such, the -debugdir flag goes back to using the
"-a" build flag to always rebuild all packages. On the plus side,
that caching didn't work very well; see #176.
2) The package name "main" remains in all declarations under it, not
just "func main", since we can only rename entire packages. This
seems fine, as it gives little information to the end user.
3) The -tiny mode no longer sets all lines to 0, since it did that by
modifying object files. As a temporary measure, we instead set all
top-level declarations to be on line 1. A TODO is added to hopefully
improve this again in the near future.
The upside is that we get rid of all the issues mentioned before. Plus,
garble now nearly works with Go 1.16, with the exception of two very
minor bugs that look fixable. A follow-up PR will take care of that and
start testing on 1.16.
Fixes #176.
Fixes #190.
4 years ago
|
|
|
|
return "", fmt.Errorf("could not find -importcfg argument")
|
|
|
|
|
}
|
|
|
|
|
data, err := os.ReadFile(importCfg)
|
|
|
|
|
if err != nil {
|
reimplement import path obfuscation without goobj2 (#242)
We used to rely on a parallel implementation of an object file parser
and writer to be able to obfuscate import paths. After compiling each
package, we would parse the object file, replace the import paths, and
write the updated object file in-place.
That worked well, in most cases. Unfortunately, it had some flaws:
* Complexity. Even when most of the code is maintained in a separate
module, the import_obfuscation.go file was still close to a thousand
lines of code.
* Go compatibility. The object file format changes between Go releases,
so we were supporting Go 1.15, but not 1.16. Fixing the object file
package to work with 1.16 would probably break 1.15 support.
* Bugs. For example, we recently had to add a workaround for #224, since
import paths containing dots after the domain would end up escaped.
Another example is #190, which seems to be caused by the object file
parser or writer corrupting the compiled code and causing segfaults in
some rare edge cases.
Instead, let's drop that method entirely, and force the compiler and
linker to do the work for us. The steps necessary when compiling a
package to obfuscate are:
1) Replace its "package foo" lines with the obfuscated package path. No
need to separate the package path and name, since the obfuscated path
does not contain slashes.
2) Replace the "-p pkg/foo" flag with the obfuscated path.
3) Replace the "import" spec lines with the obfuscated package paths,
for those dependencies which were obfuscated.
4) Replace the "-importcfg [...]" file with a version that uses the
obfuscated paths instead.
The linker also needs that last step, since it also uses an importcfg
file to find object files.
There are three noteworthy drawbacks to this new method:
1) Since we no longer write object files, we can't use them to store
data to be cached. As such, the -debugdir flag goes back to using the
"-a" build flag to always rebuild all packages. On the plus side,
that caching didn't work very well; see #176.
2) The package name "main" remains in all declarations under it, not
just "func main", since we can only rename entire packages. This
seems fine, as it gives little information to the end user.
3) The -tiny mode no longer sets all lines to 0, since it did that by
modifying object files. As a temporary measure, we instead set all
top-level declarations to be on line 1. A TODO is added to hopefully
improve this again in the near future.
The upside is that we get rid of all the issues mentioned before. Plus,
garble now nearly works with Go 1.16, with the exception of two very
minor bugs that look fixable. A follow-up PR will take care of that and
start testing on 1.16.
Fixes #176.
Fixes #190.
4 years ago
|
|
|
|
return "", err
|
|
|
|
|
}
|
|
|
|
|
|
stop relying on nested "go list -toolexec" calls (#422)
We rely on importcfg files to load type info for obfuscated packages.
We use this type information to remember what names we didn't obfuscate.
Unfortunately, indirect dependencies aren't listed in importcfg files,
so we relied on extra "go list -toolexec" calls to locate object files.
This worked fine, but added a significant amount of complexity.
The extra "go list -export -toolexec=garble" invocations weren't slow,
as they avoided rebuilding or re-obfuscating thanks to the build cache.
Still, it was hard to reason about how garble runs during a build
if we might have multiple layers of -toolexec invocations.
Instead, record the export files we encounter in an incremental map,
and persist it in the build cache via the gob file we're already using.
This way, each garble invocation knows where all object files are,
even those for indirect imports.
One wrinkle is that importcfg files can point to temporary object files.
In that case, figure out its final location in the build cache.
This requires hard-coding a bit of knowledge about how GOCACHE works,
but it seems relatively harmless given how it's very little code.
Plus, if GOCACHE ever changes, it will be obvious when our code breaks.
Finally, add a TODO about potentially saving even more work.
3 years ago
|
|
|
|
var packagefiles, importmaps [][2]string
|
avoid one more call to 'go tool buildid' (#253)
We use it to get the content ID of garble's binary, which is used for
both the garble action IDs, as well as 'go tool compile -V=full'.
Since those two happen in separate processes, both used to call 'go tool
buildid' separately. Store it in the gob cache the first time, and reuse
it the second time.
Since each call to cmd/go costs about 10ms (new process, running its
many init funcs, etc), this results in a nice speed-up for our small
benchmark. Most builds will take many seconds though, so note that a
~15ms speedup there will likely not be noticeable.
While at it, simplify the buildInfo global, as now it just contains a
map representation of the -importcfg contents. It now has better names,
docs, and a simpler representation.
We also stop using the term "garbled import", as it was a bit confusing.
"obfuscated types.Package" is a much better description.
name old time/op new time/op delta
Build-8 106ms ± 1% 92ms ± 0% -14.07% (p=0.010 n=6+4)
name old bin-B new bin-B delta
Build-8 6.60M ± 0% 6.60M ± 0% -0.01% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build-8 208ms ± 5% 149ms ± 3% -28.27% (p=0.004 n=6+5)
name old user-time/op new user-time/op delta
Build-8 433ms ± 3% 384ms ± 3% -11.35% (p=0.002 n=6+6)
4 years ago
|
|
|
|
|
|
|
|
|
for _, line := range strings.Split(string(data), "\n") {
|
|
|
|
|
if line == "" || strings.HasPrefix(line, "#") {
|
|
|
|
|
continue
|
|
|
|
|
}
|
|
|
|
|
verb, args, found := strings.Cut(line, " ")
|
|
|
|
|
if !found {
|
|
|
|
|
continue
|
|
|
|
|
}
|
|
|
|
|
switch verb {
|
|
|
|
|
case "importmap":
|
|
|
|
|
beforePath, afterPath, found := strings.Cut(args, "=")
|
|
|
|
|
if !found {
|
|
|
|
|
continue
|
|
|
|
|
}
|
stop relying on nested "go list -toolexec" calls (#422)
We rely on importcfg files to load type info for obfuscated packages.
We use this type information to remember what names we didn't obfuscate.
Unfortunately, indirect dependencies aren't listed in importcfg files,
so we relied on extra "go list -toolexec" calls to locate object files.
This worked fine, but added a significant amount of complexity.
The extra "go list -export -toolexec=garble" invocations weren't slow,
as they avoided rebuilding or re-obfuscating thanks to the build cache.
Still, it was hard to reason about how garble runs during a build
if we might have multiple layers of -toolexec invocations.
Instead, record the export files we encounter in an incremental map,
and persist it in the build cache via the gob file we're already using.
This way, each garble invocation knows where all object files are,
even those for indirect imports.
One wrinkle is that importcfg files can point to temporary object files.
In that case, figure out its final location in the build cache.
This requires hard-coding a bit of knowledge about how GOCACHE works,
but it seems relatively harmless given how it's very little code.
Plus, if GOCACHE ever changes, it will be obvious when our code breaks.
Finally, add a TODO about potentially saving even more work.
3 years ago
|
|
|
|
importmaps = append(importmaps, [2]string{beforePath, afterPath})
|
|
|
|
|
case "packagefile":
|
|
|
|
|
importPath, objectPath, found := strings.Cut(args, "=")
|
|
|
|
|
if !found {
|
|
|
|
|
continue
|
|
|
|
|
}
|
stop relying on nested "go list -toolexec" calls (#422)
We rely on importcfg files to load type info for obfuscated packages.
We use this type information to remember what names we didn't obfuscate.
Unfortunately, indirect dependencies aren't listed in importcfg files,
so we relied on extra "go list -toolexec" calls to locate object files.
This worked fine, but added a significant amount of complexity.
The extra "go list -export -toolexec=garble" invocations weren't slow,
as they avoided rebuilding or re-obfuscating thanks to the build cache.
Still, it was hard to reason about how garble runs during a build
if we might have multiple layers of -toolexec invocations.
Instead, record the export files we encounter in an incremental map,
and persist it in the build cache via the gob file we're already using.
This way, each garble invocation knows where all object files are,
even those for indirect imports.
One wrinkle is that importcfg files can point to temporary object files.
In that case, figure out its final location in the build cache.
This requires hard-coding a bit of knowledge about how GOCACHE works,
but it seems relatively harmless given how it's very little code.
Plus, if GOCACHE ever changes, it will be obvious when our code breaks.
Finally, add a TODO about potentially saving even more work.
3 years ago
|
|
|
|
packagefiles = append(packagefiles, [2]string{importPath, objectPath})
|
|
|
|
|
}
|
|
|
|
|
}
|
reimplement import path obfuscation without goobj2 (#242)
We used to rely on a parallel implementation of an object file parser
and writer to be able to obfuscate import paths. After compiling each
package, we would parse the object file, replace the import paths, and
write the updated object file in-place.
That worked well, in most cases. Unfortunately, it had some flaws:
* Complexity. Even when most of the code is maintained in a separate
module, the import_obfuscation.go file was still close to a thousand
lines of code.
* Go compatibility. The object file format changes between Go releases,
so we were supporting Go 1.15, but not 1.16. Fixing the object file
package to work with 1.16 would probably break 1.15 support.
* Bugs. For example, we recently had to add a workaround for #224, since
import paths containing dots after the domain would end up escaped.
Another example is #190, which seems to be caused by the object file
parser or writer corrupting the compiled code and causing segfaults in
some rare edge cases.
Instead, let's drop that method entirely, and force the compiler and
linker to do the work for us. The steps necessary when compiling a
package to obfuscate are:
1) Replace its "package foo" lines with the obfuscated package path. No
need to separate the package path and name, since the obfuscated path
does not contain slashes.
2) Replace the "-p pkg/foo" flag with the obfuscated path.
3) Replace the "import" spec lines with the obfuscated package paths,
for those dependencies which were obfuscated.
4) Replace the "-importcfg [...]" file with a version that uses the
obfuscated paths instead.
The linker also needs that last step, since it also uses an importcfg
file to find object files.
There are three noteworthy drawbacks to this new method:
1) Since we no longer write object files, we can't use them to store
data to be cached. As such, the -debugdir flag goes back to using the
"-a" build flag to always rebuild all packages. On the plus side,
that caching didn't work very well; see #176.
2) The package name "main" remains in all declarations under it, not
just "func main", since we can only rename entire packages. This
seems fine, as it gives little information to the end user.
3) The -tiny mode no longer sets all lines to 0, since it did that by
modifying object files. As a temporary measure, we instead set all
top-level declarations to be on line 1. A TODO is added to hopefully
improve this again in the near future.
The upside is that we get rid of all the issues mentioned before. Plus,
garble now nearly works with Go 1.16, with the exception of two very
minor bugs that look fixable. A follow-up PR will take care of that and
start testing on 1.16.
Fixes #176.
Fixes #190.
4 years ago
|
|
|
|
|
|
|
|
|
// Produce the modified importcfg file.
|
|
|
|
|
// This is mainly replacing the obfuscated paths.
|
|
|
|
|
// Note that we range over maps, so this is non-deterministic, but that
|
|
|
|
|
// should not matter as the file is treated like a lookup table.
|
|
|
|
|
newCfg, err := os.CreateTemp(sharedTempDir, "importcfg")
|
reimplement import path obfuscation without goobj2 (#242)
We used to rely on a parallel implementation of an object file parser
and writer to be able to obfuscate import paths. After compiling each
package, we would parse the object file, replace the import paths, and
write the updated object file in-place.
That worked well, in most cases. Unfortunately, it had some flaws:
* Complexity. Even when most of the code is maintained in a separate
module, the import_obfuscation.go file was still close to a thousand
lines of code.
* Go compatibility. The object file format changes between Go releases,
so we were supporting Go 1.15, but not 1.16. Fixing the object file
package to work with 1.16 would probably break 1.15 support.
* Bugs. For example, we recently had to add a workaround for #224, since
import paths containing dots after the domain would end up escaped.
Another example is #190, which seems to be caused by the object file
parser or writer corrupting the compiled code and causing segfaults in
some rare edge cases.
Instead, let's drop that method entirely, and force the compiler and
linker to do the work for us. The steps necessary when compiling a
package to obfuscate are:
1) Replace its "package foo" lines with the obfuscated package path. No
need to separate the package path and name, since the obfuscated path
does not contain slashes.
2) Replace the "-p pkg/foo" flag with the obfuscated path.
3) Replace the "import" spec lines with the obfuscated package paths,
for those dependencies which were obfuscated.
4) Replace the "-importcfg [...]" file with a version that uses the
obfuscated paths instead.
The linker also needs that last step, since it also uses an importcfg
file to find object files.
There are three noteworthy drawbacks to this new method:
1) Since we no longer write object files, we can't use them to store
data to be cached. As such, the -debugdir flag goes back to using the
"-a" build flag to always rebuild all packages. On the plus side,
that caching didn't work very well; see #176.
2) The package name "main" remains in all declarations under it, not
just "func main", since we can only rename entire packages. This
seems fine, as it gives little information to the end user.
3) The -tiny mode no longer sets all lines to 0, since it did that by
modifying object files. As a temporary measure, we instead set all
top-level declarations to be on line 1. A TODO is added to hopefully
improve this again in the near future.
The upside is that we get rid of all the issues mentioned before. Plus,
garble now nearly works with Go 1.16, with the exception of two very
minor bugs that look fixable. A follow-up PR will take care of that and
start testing on 1.16.
Fixes #176.
Fixes #190.
4 years ago
|
|
|
|
if err != nil {
|
|
|
|
|
return "", err
|
|
|
|
|
}
|
stop relying on nested "go list -toolexec" calls (#422)
We rely on importcfg files to load type info for obfuscated packages.
We use this type information to remember what names we didn't obfuscate.
Unfortunately, indirect dependencies aren't listed in importcfg files,
so we relied on extra "go list -toolexec" calls to locate object files.
This worked fine, but added a significant amount of complexity.
The extra "go list -export -toolexec=garble" invocations weren't slow,
as they avoided rebuilding or re-obfuscating thanks to the build cache.
Still, it was hard to reason about how garble runs during a build
if we might have multiple layers of -toolexec invocations.
Instead, record the export files we encounter in an incremental map,
and persist it in the build cache via the gob file we're already using.
This way, each garble invocation knows where all object files are,
even those for indirect imports.
One wrinkle is that importcfg files can point to temporary object files.
In that case, figure out its final location in the build cache.
This requires hard-coding a bit of knowledge about how GOCACHE works,
but it seems relatively harmless given how it's very little code.
Plus, if GOCACHE ever changes, it will be obvious when our code breaks.
Finally, add a TODO about potentially saving even more work.
3 years ago
|
|
|
|
for _, pair := range importmaps {
|
|
|
|
|
beforePath, afterPath := pair[0], pair[1]
|
concentrate and simplify "to obfuscate" logic
Back in the day, we used to call toObfuscate anytime we needed to know
whether a package should be obfuscated.
More recently, we started computing via the ToObfuscate field,
which then gets shared with all sub-processes via sharedCache.
We still had two places that directly called toObfuscate.
Replace those with ToObfuscate, and inline toObfuscate into shared.go.
obfuscatedImportPath is also a potential footgun for main packages.
Some use cases always want the original "main" package name,
such as for use in the compiler's "-p main" flag,
while other cases want the obfuscated package import path,
such as the entries in importcfg files.
Since each of these call sites handles the edge case well,
obfuscatedImportPath now panics on main packages to avoid any misuse.
Finally, test that we never leak main package paths via ldflags.txt.
We never did, but it's good to make sure.
Overall, this avoids confusion and trims the size of main.go a bit.
3 years ago
|
|
|
|
lpkg, err := listPackage(beforePath)
|
|
|
|
|
if err != nil {
|
|
|
|
|
panic(err) // shouldn't happen
|
|
|
|
|
}
|
|
|
|
|
if lpkg.ToObfuscate {
|
|
|
|
|
// Note that beforePath is not the canonical path.
|
|
|
|
|
// For beforePath="vendor/foo", afterPath and
|
|
|
|
|
// lpkg.ImportPath can be just "foo".
|
|
|
|
|
// Don't use obfuscatedImportPath here.
|
|
|
|
|
beforePath = hashWithPackage(lpkg, beforePath)
|
|
|
|
|
|
|
|
|
|
afterPath = lpkg.obfuscatedImportPath()
|
reimplement import path obfuscation without goobj2 (#242)
We used to rely on a parallel implementation of an object file parser
and writer to be able to obfuscate import paths. After compiling each
package, we would parse the object file, replace the import paths, and
write the updated object file in-place.
That worked well, in most cases. Unfortunately, it had some flaws:
* Complexity. Even when most of the code is maintained in a separate
module, the import_obfuscation.go file was still close to a thousand
lines of code.
* Go compatibility. The object file format changes between Go releases,
so we were supporting Go 1.15, but not 1.16. Fixing the object file
package to work with 1.16 would probably break 1.15 support.
* Bugs. For example, we recently had to add a workaround for #224, since
import paths containing dots after the domain would end up escaped.
Another example is #190, which seems to be caused by the object file
parser or writer corrupting the compiled code and causing segfaults in
some rare edge cases.
Instead, let's drop that method entirely, and force the compiler and
linker to do the work for us. The steps necessary when compiling a
package to obfuscate are:
1) Replace its "package foo" lines with the obfuscated package path. No
need to separate the package path and name, since the obfuscated path
does not contain slashes.
2) Replace the "-p pkg/foo" flag with the obfuscated path.
3) Replace the "import" spec lines with the obfuscated package paths,
for those dependencies which were obfuscated.
4) Replace the "-importcfg [...]" file with a version that uses the
obfuscated paths instead.
The linker also needs that last step, since it also uses an importcfg
file to find object files.
There are three noteworthy drawbacks to this new method:
1) Since we no longer write object files, we can't use them to store
data to be cached. As such, the -debugdir flag goes back to using the
"-a" build flag to always rebuild all packages. On the plus side,
that caching didn't work very well; see #176.
2) The package name "main" remains in all declarations under it, not
just "func main", since we can only rename entire packages. This
seems fine, as it gives little information to the end user.
3) The -tiny mode no longer sets all lines to 0, since it did that by
modifying object files. As a temporary measure, we instead set all
top-level declarations to be on line 1. A TODO is added to hopefully
improve this again in the near future.
The upside is that we get rid of all the issues mentioned before. Plus,
garble now nearly works with Go 1.16, with the exception of two very
minor bugs that look fixable. A follow-up PR will take care of that and
start testing on 1.16.
Fixes #176.
Fixes #190.
4 years ago
|
|
|
|
}
|
|
|
|
|
fmt.Fprintf(newCfg, "importmap %s=%s\n", beforePath, afterPath)
|
|
|
|
|
}
|
stop relying on nested "go list -toolexec" calls (#422)
We rely on importcfg files to load type info for obfuscated packages.
We use this type information to remember what names we didn't obfuscate.
Unfortunately, indirect dependencies aren't listed in importcfg files,
so we relied on extra "go list -toolexec" calls to locate object files.
This worked fine, but added a significant amount of complexity.
The extra "go list -export -toolexec=garble" invocations weren't slow,
as they avoided rebuilding or re-obfuscating thanks to the build cache.
Still, it was hard to reason about how garble runs during a build
if we might have multiple layers of -toolexec invocations.
Instead, record the export files we encounter in an incremental map,
and persist it in the build cache via the gob file we're already using.
This way, each garble invocation knows where all object files are,
even those for indirect imports.
One wrinkle is that importcfg files can point to temporary object files.
In that case, figure out its final location in the build cache.
This requires hard-coding a bit of knowledge about how GOCACHE works,
but it seems relatively harmless given how it's very little code.
Plus, if GOCACHE ever changes, it will be obvious when our code breaks.
Finally, add a TODO about potentially saving even more work.
3 years ago
|
|
|
|
for _, pair := range packagefiles {
|
|
|
|
|
impPath, pkgfile := pair[0], pair[1]
|
concentrate and simplify "to obfuscate" logic
Back in the day, we used to call toObfuscate anytime we needed to know
whether a package should be obfuscated.
More recently, we started computing via the ToObfuscate field,
which then gets shared with all sub-processes via sharedCache.
We still had two places that directly called toObfuscate.
Replace those with ToObfuscate, and inline toObfuscate into shared.go.
obfuscatedImportPath is also a potential footgun for main packages.
Some use cases always want the original "main" package name,
such as for use in the compiler's "-p main" flag,
while other cases want the obfuscated package import path,
such as the entries in importcfg files.
Since each of these call sites handles the edge case well,
obfuscatedImportPath now panics on main packages to avoid any misuse.
Finally, test that we never leak main package paths via ldflags.txt.
We never did, but it's good to make sure.
Overall, this avoids confusion and trims the size of main.go a bit.
3 years ago
|
|
|
|
lpkg, err := listPackage(impPath)
|
|
|
|
|
if err != nil {
|
|
|
|
|
// TODO: it's unclear why an importcfg can include an import path
|
|
|
|
|
// that's not a dependency in an edge case with "go test ./...".
|
|
|
|
|
// See exporttest/*.go in testdata/scripts/test.txt.
|
|
|
|
|
// For now, spot the pattern and avoid the unnecessary error;
|
|
|
|
|
// the dependency is unused, so the packagefile line is redundant.
|
|
|
|
|
// This still triggers as of go1.19beta1.
|
|
|
|
|
if strings.HasSuffix(curPkg.ImportPath, ".test]") && strings.HasPrefix(curPkg.ImportPath, impPath) {
|
|
|
|
|
continue
|
|
|
|
|
}
|
concentrate and simplify "to obfuscate" logic
Back in the day, we used to call toObfuscate anytime we needed to know
whether a package should be obfuscated.
More recently, we started computing via the ToObfuscate field,
which then gets shared with all sub-processes via sharedCache.
We still had two places that directly called toObfuscate.
Replace those with ToObfuscate, and inline toObfuscate into shared.go.
obfuscatedImportPath is also a potential footgun for main packages.
Some use cases always want the original "main" package name,
such as for use in the compiler's "-p main" flag,
while other cases want the obfuscated package import path,
such as the entries in importcfg files.
Since each of these call sites handles the edge case well,
obfuscatedImportPath now panics on main packages to avoid any misuse.
Finally, test that we never leak main package paths via ldflags.txt.
We never did, but it's good to make sure.
Overall, this avoids confusion and trims the size of main.go a bit.
3 years ago
|
|
|
|
panic(err) // shouldn't happen
|
|
|
|
|
}
|
|
|
|
|
if lpkg.Name != "main" {
|
|
|
|
|
impPath = lpkg.obfuscatedImportPath()
|
reimplement import path obfuscation without goobj2 (#242)
We used to rely on a parallel implementation of an object file parser
and writer to be able to obfuscate import paths. After compiling each
package, we would parse the object file, replace the import paths, and
write the updated object file in-place.
That worked well, in most cases. Unfortunately, it had some flaws:
* Complexity. Even when most of the code is maintained in a separate
module, the import_obfuscation.go file was still close to a thousand
lines of code.
* Go compatibility. The object file format changes between Go releases,
so we were supporting Go 1.15, but not 1.16. Fixing the object file
package to work with 1.16 would probably break 1.15 support.
* Bugs. For example, we recently had to add a workaround for #224, since
import paths containing dots after the domain would end up escaped.
Another example is #190, which seems to be caused by the object file
parser or writer corrupting the compiled code and causing segfaults in
some rare edge cases.
Instead, let's drop that method entirely, and force the compiler and
linker to do the work for us. The steps necessary when compiling a
package to obfuscate are:
1) Replace its "package foo" lines with the obfuscated package path. No
need to separate the package path and name, since the obfuscated path
does not contain slashes.
2) Replace the "-p pkg/foo" flag with the obfuscated path.
3) Replace the "import" spec lines with the obfuscated package paths,
for those dependencies which were obfuscated.
4) Replace the "-importcfg [...]" file with a version that uses the
obfuscated paths instead.
The linker also needs that last step, since it also uses an importcfg
file to find object files.
There are three noteworthy drawbacks to this new method:
1) Since we no longer write object files, we can't use them to store
data to be cached. As such, the -debugdir flag goes back to using the
"-a" build flag to always rebuild all packages. On the plus side,
that caching didn't work very well; see #176.
2) The package name "main" remains in all declarations under it, not
just "func main", since we can only rename entire packages. This
seems fine, as it gives little information to the end user.
3) The -tiny mode no longer sets all lines to 0, since it did that by
modifying object files. As a temporary measure, we instead set all
top-level declarations to be on line 1. A TODO is added to hopefully
improve this again in the near future.
The upside is that we get rid of all the issues mentioned before. Plus,
garble now nearly works with Go 1.16, with the exception of two very
minor bugs that look fixable. A follow-up PR will take care of that and
start testing on 1.16.
Fixes #176.
Fixes #190.
4 years ago
|
|
|
|
}
|
stop relying on nested "go list -toolexec" calls (#422)
We rely on importcfg files to load type info for obfuscated packages.
We use this type information to remember what names we didn't obfuscate.
Unfortunately, indirect dependencies aren't listed in importcfg files,
so we relied on extra "go list -toolexec" calls to locate object files.
This worked fine, but added a significant amount of complexity.
The extra "go list -export -toolexec=garble" invocations weren't slow,
as they avoided rebuilding or re-obfuscating thanks to the build cache.
Still, it was hard to reason about how garble runs during a build
if we might have multiple layers of -toolexec invocations.
Instead, record the export files we encounter in an incremental map,
and persist it in the build cache via the gob file we're already using.
This way, each garble invocation knows where all object files are,
even those for indirect imports.
One wrinkle is that importcfg files can point to temporary object files.
In that case, figure out its final location in the build cache.
This requires hard-coding a bit of knowledge about how GOCACHE works,
but it seems relatively harmless given how it's very little code.
Plus, if GOCACHE ever changes, it will be obvious when our code breaks.
Finally, add a TODO about potentially saving even more work.
3 years ago
|
|
|
|
fmt.Fprintf(newCfg, "packagefile %s=%s\n", impPath, pkgfile)
|
reimplement import path obfuscation without goobj2 (#242)
We used to rely on a parallel implementation of an object file parser
and writer to be able to obfuscate import paths. After compiling each
package, we would parse the object file, replace the import paths, and
write the updated object file in-place.
That worked well, in most cases. Unfortunately, it had some flaws:
* Complexity. Even when most of the code is maintained in a separate
module, the import_obfuscation.go file was still close to a thousand
lines of code.
* Go compatibility. The object file format changes between Go releases,
so we were supporting Go 1.15, but not 1.16. Fixing the object file
package to work with 1.16 would probably break 1.15 support.
* Bugs. For example, we recently had to add a workaround for #224, since
import paths containing dots after the domain would end up escaped.
Another example is #190, which seems to be caused by the object file
parser or writer corrupting the compiled code and causing segfaults in
some rare edge cases.
Instead, let's drop that method entirely, and force the compiler and
linker to do the work for us. The steps necessary when compiling a
package to obfuscate are:
1) Replace its "package foo" lines with the obfuscated package path. No
need to separate the package path and name, since the obfuscated path
does not contain slashes.
2) Replace the "-p pkg/foo" flag with the obfuscated path.
3) Replace the "import" spec lines with the obfuscated package paths,
for those dependencies which were obfuscated.
4) Replace the "-importcfg [...]" file with a version that uses the
obfuscated paths instead.
The linker also needs that last step, since it also uses an importcfg
file to find object files.
There are three noteworthy drawbacks to this new method:
1) Since we no longer write object files, we can't use them to store
data to be cached. As such, the -debugdir flag goes back to using the
"-a" build flag to always rebuild all packages. On the plus side,
that caching didn't work very well; see #176.
2) The package name "main" remains in all declarations under it, not
just "func main", since we can only rename entire packages. This
seems fine, as it gives little information to the end user.
3) The -tiny mode no longer sets all lines to 0, since it did that by
modifying object files. As a temporary measure, we instead set all
top-level declarations to be on line 1. A TODO is added to hopefully
improve this again in the near future.
The upside is that we get rid of all the issues mentioned before. Plus,
garble now nearly works with Go 1.16, with the exception of two very
minor bugs that look fixable. A follow-up PR will take care of that and
start testing on 1.16.
Fixes #176.
Fixes #190.
4 years ago
|
|
|
|
}
|
fix and re-enable "garble test" (#268)
With the many refactors building up to v0.1.0, we broke "garble test" as
we no longer dealt with test packages well.
Luckily, now that we can depend on TOOLEXEC_IMPORTPATH, we can support
the test command again, as we can always figure out what package we're
currently compiling, without having to track a "main" package.
Note that one major pitfall there is test packages, where
TOOLEXEC_IMPORTPATH does not agree with ImportPath from "go list -json".
However, we can still work around that with a bit of glue code, which is
also copiously documented.
The second change necessary is to consider test packages private
depending on whether their non-test package is private or not. This can
be done via the ForTest field in "go list -json".
The third change is to obfuscate "_testmain.go" files, which are the
code-generated main functions which actually run tests. We used to not
need to obfuscate them, since test function names are never obfuscated
and we used to not obfuscate import paths at compilation time. Now we do
rewrite import paths, so we must do that for "_testmain.go" too.
The fourth change is to re-enable test.txt, and expand it with more
sanity checks and edge cases.
Finally, document "garble test" again.
Fixes #241.
4 years ago
|
|
|
|
|
|
|
|
|
// Uncomment to debug the transformed importcfg. Do not delete.
|
|
|
|
|
// newCfg.Seek(0, 0)
|
|
|
|
|
// io.Copy(os.Stderr, newCfg)
|
|
|
|
|
|
reimplement import path obfuscation without goobj2 (#242)
We used to rely on a parallel implementation of an object file parser
and writer to be able to obfuscate import paths. After compiling each
package, we would parse the object file, replace the import paths, and
write the updated object file in-place.
That worked well, in most cases. Unfortunately, it had some flaws:
* Complexity. Even when most of the code is maintained in a separate
module, the import_obfuscation.go file was still close to a thousand
lines of code.
* Go compatibility. The object file format changes between Go releases,
so we were supporting Go 1.15, but not 1.16. Fixing the object file
package to work with 1.16 would probably break 1.15 support.
* Bugs. For example, we recently had to add a workaround for #224, since
import paths containing dots after the domain would end up escaped.
Another example is #190, which seems to be caused by the object file
parser or writer corrupting the compiled code and causing segfaults in
some rare edge cases.
Instead, let's drop that method entirely, and force the compiler and
linker to do the work for us. The steps necessary when compiling a
package to obfuscate are:
1) Replace its "package foo" lines with the obfuscated package path. No
need to separate the package path and name, since the obfuscated path
does not contain slashes.
2) Replace the "-p pkg/foo" flag with the obfuscated path.
3) Replace the "import" spec lines with the obfuscated package paths,
for those dependencies which were obfuscated.
4) Replace the "-importcfg [...]" file with a version that uses the
obfuscated paths instead.
The linker also needs that last step, since it also uses an importcfg
file to find object files.
There are three noteworthy drawbacks to this new method:
1) Since we no longer write object files, we can't use them to store
data to be cached. As such, the -debugdir flag goes back to using the
"-a" build flag to always rebuild all packages. On the plus side,
that caching didn't work very well; see #176.
2) The package name "main" remains in all declarations under it, not
just "func main", since we can only rename entire packages. This
seems fine, as it gives little information to the end user.
3) The -tiny mode no longer sets all lines to 0, since it did that by
modifying object files. As a temporary measure, we instead set all
top-level declarations to be on line 1. A TODO is added to hopefully
improve this again in the near future.
The upside is that we get rid of all the issues mentioned before. Plus,
garble now nearly works with Go 1.16, with the exception of two very
minor bugs that look fixable. A follow-up PR will take care of that and
start testing on 1.16.
Fixes #176.
Fixes #190.
4 years ago
|
|
|
|
if err := newCfg.Close(); err != nil {
|
|
|
|
|
return "", err
|
|
|
|
|
}
|
|
|
|
|
return newCfg.Name(), nil
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
type (
|
properly record when type aliases are embedded as fields
There are two scenarios when it comes to embedding fields.
The first is easy, and we always handled it well:
type Named struct { Foo int }
type T struct { Named }
In this scenario, T ends up with an embedded field named "Named",
and a promoted field named "Foo".
Then there's the form with a type alias:
type Named struct { Foo int }
type Alias = Named
type T struct { Alias }
This case is different: T ends up with an embedded field named "Alias",
and a promoted field named "Foo".
Note how the field gets its name from the referenced type,
even if said type is just an alias to another type.
This poses two problems.
First, we must obfuscate the field T.Alias as the name "Alias",
and not as the name "Named" that the alias points to.
Second, we must be careful of cases where Named and Alias are declared
in different packages, as they will obfuscate the same name differently.
Both of those problems compounded in the reported issue.
The actual reason is that quic-go has a type alias in the form of:
type ConnectionState = qtls.ConnectionState
In other words, the entire problem boils down to a type alias which
points to a named type in a different package, where both types share
the same name. For example:
package parent
import "parent/p1"
type T struct { p1.SameName }
[...]
package p1
import "parent/p2"
type SameName = p2.SameName
[...]
package p2
type SameName struct { Foo int }
This broke garble because we had a heuristic to detect when an embedded
field was a type alias:
// Instead, detect such a "foreign alias embed".
// If we embed a final named type,
// but the field name does not match its name,
// then it must have been done via an alias.
// We dig out the alias's TypeName via locateForeignAlias.
if named.Obj().Name() != node.Name {
As the reader can deduce, this heuristic would incorrectly assume that
the snippet above does not embed a type alias, when in fact it does.
When obfuscating the field T.SameName, which uses a type alias,
we would correctly obfuscate the name "SameName",
but we would incorrectly obfuscate it with the package p2, not p1.
This would then result in build errors.
To fix this problem for good, we need to get rid of the heuristic.
Instead, we now mimic what was done for KnownCannotObfuscate,
but for embedded fields which use type aliases.
KnownEmbeddedAliasFields is now filled for each package
and stored in the cache as part of cachedOutput.
We can then detect the "embedded alias" case reliably,
even when the field is declared in an imported package.
On the plus side, we get to remove locateForeignAlias.
We also add a couple of TODOs to record further improvements.
Finally, add a test.
Fixes #466.
3 years ago
|
|
|
|
funcFullName = string // as per go/types.Func.FullName
|
|
|
|
|
objectString = string // as per recordedObjectString
|
|
|
|
|
|
|
|
|
|
reflectParameter struct {
|
|
|
|
|
Position int // 0-indexed
|
|
|
|
|
Variadic bool // ...int
|
|
|
|
|
}
|
properly record when type aliases are embedded as fields
There are two scenarios when it comes to embedding fields.
The first is easy, and we always handled it well:
type Named struct { Foo int }
type T struct { Named }
In this scenario, T ends up with an embedded field named "Named",
and a promoted field named "Foo".
Then there's the form with a type alias:
type Named struct { Foo int }
type Alias = Named
type T struct { Alias }
This case is different: T ends up with an embedded field named "Alias",
and a promoted field named "Foo".
Note how the field gets its name from the referenced type,
even if said type is just an alias to another type.
This poses two problems.
First, we must obfuscate the field T.Alias as the name "Alias",
and not as the name "Named" that the alias points to.
Second, we must be careful of cases where Named and Alias are declared
in different packages, as they will obfuscate the same name differently.
Both of those problems compounded in the reported issue.
The actual reason is that quic-go has a type alias in the form of:
type ConnectionState = qtls.ConnectionState
In other words, the entire problem boils down to a type alias which
points to a named type in a different package, where both types share
the same name. For example:
package parent
import "parent/p1"
type T struct { p1.SameName }
[...]
package p1
import "parent/p2"
type SameName = p2.SameName
[...]
package p2
type SameName struct { Foo int }
This broke garble because we had a heuristic to detect when an embedded
field was a type alias:
// Instead, detect such a "foreign alias embed".
// If we embed a final named type,
// but the field name does not match its name,
// then it must have been done via an alias.
// We dig out the alias's TypeName via locateForeignAlias.
if named.Obj().Name() != node.Name {
As the reader can deduce, this heuristic would incorrectly assume that
the snippet above does not embed a type alias, when in fact it does.
When obfuscating the field T.SameName, which uses a type alias,
we would correctly obfuscate the name "SameName",
but we would incorrectly obfuscate it with the package p2, not p1.
This would then result in build errors.
To fix this problem for good, we need to get rid of the heuristic.
Instead, we now mimic what was done for KnownCannotObfuscate,
but for embedded fields which use type aliases.
KnownEmbeddedAliasFields is now filled for each package
and stored in the cache as part of cachedOutput.
We can then detect the "embedded alias" case reliably,
even when the field is declared in an imported package.
On the plus side, we get to remove locateForeignAlias.
We also add a couple of TODOs to record further improvements.
Finally, add a test.
Fixes #466.
3 years ago
|
|
|
|
|
|
|
|
|
typeName struct {
|
|
|
|
|
PkgPath, Name string
|
|
|
|
|
}
|
|
|
|
|
)
|
detect more std API calls which use reflection
Before, we would just notice direct calls to reflect's TypeOf and
ValueOf. Any other uses of reflection, such as encoding/json or
google.golang.org/protobuf, would require hints as documented by the
README.
Issue #162 outlines some ways we could fix this issue in a general way,
automatically detecting what functions use reflection on their parameters,
even for third party API funcs.
However, that goal is pretty significant in terms of code and effort.
As a temporary improvement, we can expand the list of "known" reflection
APIs via a static table.
Since this table is keyed by "func full name" strings, we could
potentially include third party APIs, such as:
google.golang.org/protobuf/proto.Marshal
However, for now simply include all the std APIs we know about.
If we fail to do the proper fix for automatic detection in the future,
we can then fall back to expanding this global table for third parties.
Update the README's docs, to clarify that the hint is not always
necessary anymore.
Also update the reflect.txt test to stop using the hint for encoding/json,
and to also start testing text/template with a method call.
While at it, I noticed that we weren't testing the println outputs,
as they'd go to stderr - fix that too.
Updates #162.
4 years ago
|
|
|
|
|
clarify how each "cannot obfuscate" map works
We used to record all objects in cannotObfuscateNames,
and then we'd add the exported ones to KnownCannotObfuscate.
Instead, teach recordAsNotObfuscated to store each object in either
knownCannotObfuscateUnexported or KnownCannotObfuscate, but not both.
The former isn't cached so it uses in-memory pointers as keys,
and the latter uses the cross-process objectStrings like before.
Functionally, this is all the same, but with the difference that the map
indexed by types.Object will not contain objects already recorded in
KnownCannotObfuscate, reducing the amount of duplicate memory use.
While here, give recordIgnore a less ambiguous name,
and remove the second parameter as it was always tf.pkg.Path().
This also means we can compare *types.Package pointers directly.
Finally, add more TODOs for further improvement ideas.
It does mean that we end up with more TODOs than before,
even though I'm fixing one, but I reckon that's a good thing.
Recording these ideas can give first-time contributors ways to help,
and it ensures I don't forget about ideas just in my head.
3 years ago
|
|
|
|
// TODO: read-write globals like these should probably be inside transformer
|
|
|
|
|
|
|
|
|
|
// knownCannotObfuscateUnexported is like KnownCannotObfuscate but for
|
|
|
|
|
// unexported names. We don't need to store this in the build cache,
|
|
|
|
|
// because these names cannot be referenced by downstream packages.
|
|
|
|
|
var knownCannotObfuscateUnexported = map[types.Object]bool{}
|
|
|
|
|
|
stop relying on nested "go list -toolexec" calls (#422)
We rely on importcfg files to load type info for obfuscated packages.
We use this type information to remember what names we didn't obfuscate.
Unfortunately, indirect dependencies aren't listed in importcfg files,
so we relied on extra "go list -toolexec" calls to locate object files.
This worked fine, but added a significant amount of complexity.
The extra "go list -export -toolexec=garble" invocations weren't slow,
as they avoided rebuilding or re-obfuscating thanks to the build cache.
Still, it was hard to reason about how garble runs during a build
if we might have multiple layers of -toolexec invocations.
Instead, record the export files we encounter in an incremental map,
and persist it in the build cache via the gob file we're already using.
This way, each garble invocation knows where all object files are,
even those for indirect imports.
One wrinkle is that importcfg files can point to temporary object files.
In that case, figure out its final location in the build cache.
This requires hard-coding a bit of knowledge about how GOCACHE works,
but it seems relatively harmless given how it's very little code.
Plus, if GOCACHE ever changes, it will be obvious when our code breaks.
Finally, add a TODO about potentially saving even more work.
3 years ago
|
|
|
|
// cachedOutput contains information that will be stored as per garbleExportFile.
|
|
|
|
|
// Note that cachedOutput gets loaded from all direct package dependencies,
|
|
|
|
|
// and gets filled while obfuscating the current package, so it ends up
|
|
|
|
|
// containing entries for the current package and its transitive dependencies.
|
stop relying on nested "go list -toolexec" calls (#422)
We rely on importcfg files to load type info for obfuscated packages.
We use this type information to remember what names we didn't obfuscate.
Unfortunately, indirect dependencies aren't listed in importcfg files,
so we relied on extra "go list -toolexec" calls to locate object files.
This worked fine, but added a significant amount of complexity.
The extra "go list -export -toolexec=garble" invocations weren't slow,
as they avoided rebuilding or re-obfuscating thanks to the build cache.
Still, it was hard to reason about how garble runs during a build
if we might have multiple layers of -toolexec invocations.
Instead, record the export files we encounter in an incremental map,
and persist it in the build cache via the gob file we're already using.
This way, each garble invocation knows where all object files are,
even those for indirect imports.
One wrinkle is that importcfg files can point to temporary object files.
In that case, figure out its final location in the build cache.
This requires hard-coding a bit of knowledge about how GOCACHE works,
but it seems relatively harmless given how it's very little code.
Plus, if GOCACHE ever changes, it will be obvious when our code breaks.
Finally, add a TODO about potentially saving even more work.
3 years ago
|
|
|
|
var cachedOutput = struct {
|
|
|
|
|
// KnownReflectAPIs is a static record of what std APIs use reflection on their
|
|
|
|
|
// parameters, so we can avoid obfuscating types used with them.
|
|
|
|
|
//
|
|
|
|
|
// TODO: we're not including fmt.Printf, as it would have many false positives,
|
|
|
|
|
// unless we were smart enough to detect which arguments get used as %#v or %T.
|
|
|
|
|
KnownReflectAPIs map[funcFullName][]reflectParameter
|
stop loading obfuscated type information from deps
If package P1 imports package P2, P1 needs to know which names from P2
weren't obfuscated. For instance, if P2 declares T2 and does
"reflect.TypeOf(T2{...})", then P2 won't obfuscate the name T2, and
neither should P1.
This information should flow from P2 to P1, as P2 builds before
P1. We do this via obfuscatedTypesPackage; P1 loads the type information
of the obfuscated version of P2, and does a lookup for T2. If T2 exists,
then it wasn't obfuscated.
This mechanism has served us well, but it has downsides:
1) It wastes CPU; we load the type information for the entire package.
2) It's complex; for instance, we need KnownObjectFiles as an extra.
3) It makes our code harder to understand, as we load both the original
and obfuscated type informaiton.
Instead, we now have each package record what names were not obfuscated
as part of its cachedOuput file. Much like KnownObjectFiles, the map
records incrementally through the import graph, to avoid having to load
cachedOutput files for indirect dependencies.
We shouldn't need to worry about those maps getting large;
we only skip obfuscating declared names in a few uncommon scenarios,
such as the use of reflection or cgo's "//export".
Since go/types is relatively allocation-heavy, and the export files
contain a lot of data, we get a nice speed-up:
name old time/op new time/op delta
Build-16 11.5s ± 2% 11.1s ± 3% -3.77% (p=0.008 n=5+5)
name old bin-B new bin-B delta
Build-16 5.15M ± 0% 5.15M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 375ms ± 3% 341ms ± 6% -8.96% (p=0.008 n=5+5)
name old sys-time/op new sys-time/op delta
Build-16 283ms ±17% 289ms ±13% ~ (p=0.841 n=5+5)
name old user-time/op new user-time/op delta
Build-16 687ms ± 6% 664ms ± 7% ~ (p=0.548 n=5+5)
Fixes #456.
Updates #475.
3 years ago
|
|
|
|
|
|
|
|
|
// KnownCannotObfuscate is filled with the fully qualified names from each
|
|
|
|
|
// package that we cannot obfuscate.
|
stop loading obfuscated type information from deps
If package P1 imports package P2, P1 needs to know which names from P2
weren't obfuscated. For instance, if P2 declares T2 and does
"reflect.TypeOf(T2{...})", then P2 won't obfuscate the name T2, and
neither should P1.
This information should flow from P2 to P1, as P2 builds before
P1. We do this via obfuscatedTypesPackage; P1 loads the type information
of the obfuscated version of P2, and does a lookup for T2. If T2 exists,
then it wasn't obfuscated.
This mechanism has served us well, but it has downsides:
1) It wastes CPU; we load the type information for the entire package.
2) It's complex; for instance, we need KnownObjectFiles as an extra.
3) It makes our code harder to understand, as we load both the original
and obfuscated type informaiton.
Instead, we now have each package record what names were not obfuscated
as part of its cachedOuput file. Much like KnownObjectFiles, the map
records incrementally through the import graph, to avoid having to load
cachedOutput files for indirect dependencies.
We shouldn't need to worry about those maps getting large;
we only skip obfuscating declared names in a few uncommon scenarios,
such as the use of reflection or cgo's "//export".
Since go/types is relatively allocation-heavy, and the export files
contain a lot of data, we get a nice speed-up:
name old time/op new time/op delta
Build-16 11.5s ± 2% 11.1s ± 3% -3.77% (p=0.008 n=5+5)
name old bin-B new bin-B delta
Build-16 5.15M ± 0% 5.15M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 375ms ± 3% 341ms ± 6% -8.96% (p=0.008 n=5+5)
name old sys-time/op new sys-time/op delta
Build-16 283ms ±17% 289ms ±13% ~ (p=0.841 n=5+5)
name old user-time/op new user-time/op delta
Build-16 687ms ± 6% 664ms ± 7% ~ (p=0.548 n=5+5)
Fixes #456.
Updates #475.
3 years ago
|
|
|
|
// This record is necessary for knowing what names from imported packages
|
|
|
|
|
// weren't obfuscated, so we can obfuscate their local uses accordingly.
|
properly record when type aliases are embedded as fields
There are two scenarios when it comes to embedding fields.
The first is easy, and we always handled it well:
type Named struct { Foo int }
type T struct { Named }
In this scenario, T ends up with an embedded field named "Named",
and a promoted field named "Foo".
Then there's the form with a type alias:
type Named struct { Foo int }
type Alias = Named
type T struct { Alias }
This case is different: T ends up with an embedded field named "Alias",
and a promoted field named "Foo".
Note how the field gets its name from the referenced type,
even if said type is just an alias to another type.
This poses two problems.
First, we must obfuscate the field T.Alias as the name "Alias",
and not as the name "Named" that the alias points to.
Second, we must be careful of cases where Named and Alias are declared
in different packages, as they will obfuscate the same name differently.
Both of those problems compounded in the reported issue.
The actual reason is that quic-go has a type alias in the form of:
type ConnectionState = qtls.ConnectionState
In other words, the entire problem boils down to a type alias which
points to a named type in a different package, where both types share
the same name. For example:
package parent
import "parent/p1"
type T struct { p1.SameName }
[...]
package p1
import "parent/p2"
type SameName = p2.SameName
[...]
package p2
type SameName struct { Foo int }
This broke garble because we had a heuristic to detect when an embedded
field was a type alias:
// Instead, detect such a "foreign alias embed".
// If we embed a final named type,
// but the field name does not match its name,
// then it must have been done via an alias.
// We dig out the alias's TypeName via locateForeignAlias.
if named.Obj().Name() != node.Name {
As the reader can deduce, this heuristic would incorrectly assume that
the snippet above does not embed a type alias, when in fact it does.
When obfuscating the field T.SameName, which uses a type alias,
we would correctly obfuscate the name "SameName",
but we would incorrectly obfuscate it with the package p2, not p1.
This would then result in build errors.
To fix this problem for good, we need to get rid of the heuristic.
Instead, we now mimic what was done for KnownCannotObfuscate,
but for embedded fields which use type aliases.
KnownEmbeddedAliasFields is now filled for each package
and stored in the cache as part of cachedOutput.
We can then detect the "embedded alias" case reliably,
even when the field is declared in an imported package.
On the plus side, we get to remove locateForeignAlias.
We also add a couple of TODOs to record further improvements.
Finally, add a test.
Fixes #466.
3 years ago
|
|
|
|
KnownCannotObfuscate map[objectString]struct{}
|
|
|
|
|
|
|
|
|
|
// KnownEmbeddedAliasFields records which embedded fields use a type alias.
|
|
|
|
|
// They are the only instance where a type alias matters for obfuscation,
|
|
|
|
|
// because the embedded field name is derived from the type alias itself,
|
|
|
|
|
// and not the type that the alias points to.
|
|
|
|
|
// In that way, the type alias is obfuscated as a form of named type,
|
|
|
|
|
// bearing in mind that it may be owned by a different package.
|
|
|
|
|
KnownEmbeddedAliasFields map[objectString]typeName
|
stop relying on nested "go list -toolexec" calls (#422)
We rely on importcfg files to load type info for obfuscated packages.
We use this type information to remember what names we didn't obfuscate.
Unfortunately, indirect dependencies aren't listed in importcfg files,
so we relied on extra "go list -toolexec" calls to locate object files.
This worked fine, but added a significant amount of complexity.
The extra "go list -export -toolexec=garble" invocations weren't slow,
as they avoided rebuilding or re-obfuscating thanks to the build cache.
Still, it was hard to reason about how garble runs during a build
if we might have multiple layers of -toolexec invocations.
Instead, record the export files we encounter in an incremental map,
and persist it in the build cache via the gob file we're already using.
This way, each garble invocation knows where all object files are,
even those for indirect imports.
One wrinkle is that importcfg files can point to temporary object files.
In that case, figure out its final location in the build cache.
This requires hard-coding a bit of knowledge about how GOCACHE works,
but it seems relatively harmless given how it's very little code.
Plus, if GOCACHE ever changes, it will be obvious when our code breaks.
Finally, add a TODO about potentially saving even more work.
3 years ago
|
|
|
|
}{
|
|
|
|
|
KnownReflectAPIs: map[funcFullName][]reflectParameter{
|
|
|
|
|
"reflect.TypeOf": {{Position: 0, Variadic: false}},
|
|
|
|
|
"reflect.ValueOf": {{Position: 0, Variadic: false}},
|
stop relying on nested "go list -toolexec" calls (#422)
We rely on importcfg files to load type info for obfuscated packages.
We use this type information to remember what names we didn't obfuscate.
Unfortunately, indirect dependencies aren't listed in importcfg files,
so we relied on extra "go list -toolexec" calls to locate object files.
This worked fine, but added a significant amount of complexity.
The extra "go list -export -toolexec=garble" invocations weren't slow,
as they avoided rebuilding or re-obfuscating thanks to the build cache.
Still, it was hard to reason about how garble runs during a build
if we might have multiple layers of -toolexec invocations.
Instead, record the export files we encounter in an incremental map,
and persist it in the build cache via the gob file we're already using.
This way, each garble invocation knows where all object files are,
even those for indirect imports.
One wrinkle is that importcfg files can point to temporary object files.
In that case, figure out its final location in the build cache.
This requires hard-coding a bit of knowledge about how GOCACHE works,
but it seems relatively harmless given how it's very little code.
Plus, if GOCACHE ever changes, it will be obvious when our code breaks.
Finally, add a TODO about potentially saving even more work.
3 years ago
|
|
|
|
},
|
|
|
|
|
KnownCannotObfuscate: map[objectString]struct{}{},
|
|
|
|
|
KnownEmbeddedAliasFields: map[objectString]typeName{},
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// garbleExportFile returns an absolute path to a build cache entry
|
|
|
|
|
// which belongs to garble and corresponds to the given Go package.
|
|
|
|
|
//
|
|
|
|
|
// Unlike pkg.Export, it is only read and written by garble itself.
|
|
|
|
|
// Also unlike pkg.Export, it includes GarbleActionID,
|
|
|
|
|
// so its path will change if the obfuscated build changes.
|
|
|
|
|
//
|
|
|
|
|
// The purpose of such a file is to store garble-specific information
|
|
|
|
|
// in the build cache, to be reused at a later time.
|
|
|
|
|
// The file should have the same lifetime as pkg.Export,
|
|
|
|
|
// as it lives under the same cache directory that gets trimmed automatically.
|
|
|
|
|
func garbleExportFile(pkg *listedPackage) string {
|
|
|
|
|
trimmed := strings.TrimSuffix(pkg.Export, "-d")
|
|
|
|
|
if trimmed == pkg.Export {
|
|
|
|
|
panic(fmt.Sprintf("unexpected export path of %s: %q", pkg.ImportPath, pkg.Export))
|
|
|
|
|
}
|
|
|
|
|
return trimmed + "-garble-" + hashToString(pkg.GarbleActionID) + "-d"
|
|
|
|
|
}
|
|
|
|
|
|
stop relying on nested "go list -toolexec" calls (#422)
We rely on importcfg files to load type info for obfuscated packages.
We use this type information to remember what names we didn't obfuscate.
Unfortunately, indirect dependencies aren't listed in importcfg files,
so we relied on extra "go list -toolexec" calls to locate object files.
This worked fine, but added a significant amount of complexity.
The extra "go list -export -toolexec=garble" invocations weren't slow,
as they avoided rebuilding or re-obfuscating thanks to the build cache.
Still, it was hard to reason about how garble runs during a build
if we might have multiple layers of -toolexec invocations.
Instead, record the export files we encounter in an incremental map,
and persist it in the build cache via the gob file we're already using.
This way, each garble invocation knows where all object files are,
even those for indirect imports.
One wrinkle is that importcfg files can point to temporary object files.
In that case, figure out its final location in the build cache.
This requires hard-coding a bit of knowledge about how GOCACHE works,
but it seems relatively harmless given how it's very little code.
Plus, if GOCACHE ever changes, it will be obvious when our code breaks.
Finally, add a TODO about potentially saving even more work.
3 years ago
|
|
|
|
func loadCachedOutputs() error {
|
|
|
|
|
startTime := time.Now()
|
|
|
|
|
loaded := 0
|
|
|
|
|
for _, path := range curPkg.Deps {
|
|
|
|
|
pkg, err := listPackage(path)
|
|
|
|
|
if err != nil {
|
stop relying on nested "go list -toolexec" calls (#422)
We rely on importcfg files to load type info for obfuscated packages.
We use this type information to remember what names we didn't obfuscate.
Unfortunately, indirect dependencies aren't listed in importcfg files,
so we relied on extra "go list -toolexec" calls to locate object files.
This worked fine, but added a significant amount of complexity.
The extra "go list -export -toolexec=garble" invocations weren't slow,
as they avoided rebuilding or re-obfuscating thanks to the build cache.
Still, it was hard to reason about how garble runs during a build
if we might have multiple layers of -toolexec invocations.
Instead, record the export files we encounter in an incremental map,
and persist it in the build cache via the gob file we're already using.
This way, each garble invocation knows where all object files are,
even those for indirect imports.
One wrinkle is that importcfg files can point to temporary object files.
In that case, figure out its final location in the build cache.
This requires hard-coding a bit of knowledge about how GOCACHE works,
but it seems relatively harmless given how it's very little code.
Plus, if GOCACHE ever changes, it will be obvious when our code breaks.
Finally, add a TODO about potentially saving even more work.
3 years ago
|
|
|
|
panic(err) // shouldn't happen
|
|
|
|
|
}
|
|
|
|
|
if pkg.Export == "" {
|
|
|
|
|
continue // nothing to load
|
|
|
|
|
}
|
|
|
|
|
// this function literal is used for the deferred close
|
|
|
|
|
if err := func() error {
|
|
|
|
|
filename := garbleExportFile(pkg)
|
|
|
|
|
f, err := os.Open(filename)
|
|
|
|
|
if err != nil {
|
|
|
|
|
return err
|
|
|
|
|
}
|
|
|
|
|
defer f.Close()
|
|
|
|
|
|
stop relying on nested "go list -toolexec" calls (#422)
We rely on importcfg files to load type info for obfuscated packages.
We use this type information to remember what names we didn't obfuscate.
Unfortunately, indirect dependencies aren't listed in importcfg files,
so we relied on extra "go list -toolexec" calls to locate object files.
This worked fine, but added a significant amount of complexity.
The extra "go list -export -toolexec=garble" invocations weren't slow,
as they avoided rebuilding or re-obfuscating thanks to the build cache.
Still, it was hard to reason about how garble runs during a build
if we might have multiple layers of -toolexec invocations.
Instead, record the export files we encounter in an incremental map,
and persist it in the build cache via the gob file we're already using.
This way, each garble invocation knows where all object files are,
even those for indirect imports.
One wrinkle is that importcfg files can point to temporary object files.
In that case, figure out its final location in the build cache.
This requires hard-coding a bit of knowledge about how GOCACHE works,
but it seems relatively harmless given how it's very little code.
Plus, if GOCACHE ever changes, it will be obvious when our code breaks.
Finally, add a TODO about potentially saving even more work.
3 years ago
|
|
|
|
// Decode appends new entries to the existing maps
|
|
|
|
|
if err := gob.NewDecoder(f).Decode(&cachedOutput); err != nil {
|
|
|
|
|
return fmt.Errorf("gob decode: %w", err)
|
|
|
|
|
}
|
|
|
|
|
return nil
|
|
|
|
|
}(); err != nil {
|
|
|
|
|
return fmt.Errorf("cannot load garble export file for %s: %w", path, err)
|
|
|
|
|
}
|
|
|
|
|
loaded++
|
|
|
|
|
}
|
|
|
|
|
log.Printf("%d cached output files loaded in %s", loaded, debugSince(startTime))
|
|
|
|
|
return nil
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
func (tf *transformer) findReflectFunctions(files []*ast.File) {
|
slight simplifications and alloc reductions
Reuse a buffer and a map across loop iterations, because we can.
Make recordTypeDone only track named types, as that is enough to detect
type cycles. Without named types, there can be no cycles.
These two reduce allocs by a fraction of a percent:
name old time/op new time/op delta
Build-16 10.4s ± 2% 10.4s ± 1% ~ (p=0.739 n=10+10)
name old bin-B new bin-B delta
Build-16 5.51M ± 0% 5.51M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 391ms ± 9% 407ms ± 7% ~ (p=0.095 n=10+9)
name old mallocs/op new mallocs/op delta
Build-16 34.5M ± 0% 34.4M ± 0% -0.12% (p=0.000 n=10+10)
name old sys-time/op new sys-time/op delta
Build-16 5.87s ± 5% 5.82s ± 5% ~ (p=0.182 n=10+9)
It doesn't seem like much, but remember that these stats are for the
entire set of processes, where garble only accounts for about 10% of the
total wall time when compared to the compiler or linker. So a ~0.1%
decrease globally is still significant.
linkerVariableStrings is also indexed by *types.Var rather than types.Object,
since -ldflags=-X only supports setting the string value of variables.
This shouldn't make a significant difference in terms of allocs,
but at least the map is less prone to confusion with other object types.
To ensure the new code doesn't trip up on non-variables, we add test cases.
Finally, for the sake of clarity, index into the types.Info maps like
Defs and Uses rather than calling ObjectOf if we know whether the
identifier we have is a definition of a name or the use of a defined name.
This isn't better in terms of performance, as ObjectOf is a tiny method,
but just like with linkerVariableStrings before, the new code is clearer.
3 years ago
|
|
|
|
seenReflectParams := make(map[*types.Var]bool)
|
|
|
|
|
visitFuncDecl := func(funcDecl *ast.FuncDecl) {
|
slight simplifications and alloc reductions
Reuse a buffer and a map across loop iterations, because we can.
Make recordTypeDone only track named types, as that is enough to detect
type cycles. Without named types, there can be no cycles.
These two reduce allocs by a fraction of a percent:
name old time/op new time/op delta
Build-16 10.4s ± 2% 10.4s ± 1% ~ (p=0.739 n=10+10)
name old bin-B new bin-B delta
Build-16 5.51M ± 0% 5.51M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 391ms ± 9% 407ms ± 7% ~ (p=0.095 n=10+9)
name old mallocs/op new mallocs/op delta
Build-16 34.5M ± 0% 34.4M ± 0% -0.12% (p=0.000 n=10+10)
name old sys-time/op new sys-time/op delta
Build-16 5.87s ± 5% 5.82s ± 5% ~ (p=0.182 n=10+9)
It doesn't seem like much, but remember that these stats are for the
entire set of processes, where garble only accounts for about 10% of the
total wall time when compared to the compiler or linker. So a ~0.1%
decrease globally is still significant.
linkerVariableStrings is also indexed by *types.Var rather than types.Object,
since -ldflags=-X only supports setting the string value of variables.
This shouldn't make a significant difference in terms of allocs,
but at least the map is less prone to confusion with other object types.
To ensure the new code doesn't trip up on non-variables, we add test cases.
Finally, for the sake of clarity, index into the types.Info maps like
Defs and Uses rather than calling ObjectOf if we know whether the
identifier we have is a definition of a name or the use of a defined name.
This isn't better in terms of performance, as ObjectOf is a tiny method,
but just like with linkerVariableStrings before, the new code is clearer.
3 years ago
|
|
|
|
funcObj := tf.info.Defs[funcDecl.Name].(*types.Func)
|
|
|
|
|
funcType := funcObj.Type().(*types.Signature)
|
|
|
|
|
funcParams := funcType.Params()
|
|
|
|
|
|
slight simplifications and alloc reductions
Reuse a buffer and a map across loop iterations, because we can.
Make recordTypeDone only track named types, as that is enough to detect
type cycles. Without named types, there can be no cycles.
These two reduce allocs by a fraction of a percent:
name old time/op new time/op delta
Build-16 10.4s ± 2% 10.4s ± 1% ~ (p=0.739 n=10+10)
name old bin-B new bin-B delta
Build-16 5.51M ± 0% 5.51M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 391ms ± 9% 407ms ± 7% ~ (p=0.095 n=10+9)
name old mallocs/op new mallocs/op delta
Build-16 34.5M ± 0% 34.4M ± 0% -0.12% (p=0.000 n=10+10)
name old sys-time/op new sys-time/op delta
Build-16 5.87s ± 5% 5.82s ± 5% ~ (p=0.182 n=10+9)
It doesn't seem like much, but remember that these stats are for the
entire set of processes, where garble only accounts for about 10% of the
total wall time when compared to the compiler or linker. So a ~0.1%
decrease globally is still significant.
linkerVariableStrings is also indexed by *types.Var rather than types.Object,
since -ldflags=-X only supports setting the string value of variables.
This shouldn't make a significant difference in terms of allocs,
but at least the map is less prone to confusion with other object types.
To ensure the new code doesn't trip up on non-variables, we add test cases.
Finally, for the sake of clarity, index into the types.Info maps like
Defs and Uses rather than calling ObjectOf if we know whether the
identifier we have is a definition of a name or the use of a defined name.
This isn't better in terms of performance, as ObjectOf is a tiny method,
but just like with linkerVariableStrings before, the new code is clearer.
3 years ago
|
|
|
|
maps.Clear(seenReflectParams)
|
|
|
|
|
for i := 0; i < funcParams.Len(); i++ {
|
|
|
|
|
seenReflectParams[funcParams.At(i)] = false
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
ast.Inspect(funcDecl, func(node ast.Node) bool {
|
|
|
|
|
call, ok := node.(*ast.CallExpr)
|
|
|
|
|
if !ok {
|
|
|
|
|
return true
|
|
|
|
|
}
|
|
|
|
|
sel, ok := call.Fun.(*ast.SelectorExpr)
|
|
|
|
|
if !ok {
|
|
|
|
|
return true
|
|
|
|
|
}
|
slight simplifications and alloc reductions
Reuse a buffer and a map across loop iterations, because we can.
Make recordTypeDone only track named types, as that is enough to detect
type cycles. Without named types, there can be no cycles.
These two reduce allocs by a fraction of a percent:
name old time/op new time/op delta
Build-16 10.4s ± 2% 10.4s ± 1% ~ (p=0.739 n=10+10)
name old bin-B new bin-B delta
Build-16 5.51M ± 0% 5.51M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 391ms ± 9% 407ms ± 7% ~ (p=0.095 n=10+9)
name old mallocs/op new mallocs/op delta
Build-16 34.5M ± 0% 34.4M ± 0% -0.12% (p=0.000 n=10+10)
name old sys-time/op new sys-time/op delta
Build-16 5.87s ± 5% 5.82s ± 5% ~ (p=0.182 n=10+9)
It doesn't seem like much, but remember that these stats are for the
entire set of processes, where garble only accounts for about 10% of the
total wall time when compared to the compiler or linker. So a ~0.1%
decrease globally is still significant.
linkerVariableStrings is also indexed by *types.Var rather than types.Object,
since -ldflags=-X only supports setting the string value of variables.
This shouldn't make a significant difference in terms of allocs,
but at least the map is less prone to confusion with other object types.
To ensure the new code doesn't trip up on non-variables, we add test cases.
Finally, for the sake of clarity, index into the types.Info maps like
Defs and Uses rather than calling ObjectOf if we know whether the
identifier we have is a definition of a name or the use of a defined name.
This isn't better in terms of performance, as ObjectOf is a tiny method,
but just like with linkerVariableStrings before, the new code is clearer.
3 years ago
|
|
|
|
calledFunc, _ := tf.info.Uses[sel.Sel].(*types.Func)
|
|
|
|
|
if calledFunc == nil || calledFunc.Pkg() == nil {
|
|
|
|
|
return true
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
fullName := calledFunc.FullName()
|
|
|
|
|
for _, reflectParam := range cachedOutput.KnownReflectAPIs[fullName] {
|
|
|
|
|
// We need a range to handle any number of variadic arguments,
|
|
|
|
|
// which could be 0 or multiple.
|
|
|
|
|
// The non-variadic case is always one argument,
|
|
|
|
|
// but we still use the range to deduplicate code.
|
|
|
|
|
argStart := reflectParam.Position
|
|
|
|
|
argEnd := argStart + 1
|
|
|
|
|
if reflectParam.Variadic {
|
|
|
|
|
argEnd = len(call.Args)
|
|
|
|
|
}
|
|
|
|
|
for _, arg := range call.Args[argStart:argEnd] {
|
|
|
|
|
ident, ok := arg.(*ast.Ident)
|
|
|
|
|
if !ok {
|
|
|
|
|
continue
|
|
|
|
|
}
|
slight simplifications and alloc reductions
Reuse a buffer and a map across loop iterations, because we can.
Make recordTypeDone only track named types, as that is enough to detect
type cycles. Without named types, there can be no cycles.
These two reduce allocs by a fraction of a percent:
name old time/op new time/op delta
Build-16 10.4s ± 2% 10.4s ± 1% ~ (p=0.739 n=10+10)
name old bin-B new bin-B delta
Build-16 5.51M ± 0% 5.51M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 391ms ± 9% 407ms ± 7% ~ (p=0.095 n=10+9)
name old mallocs/op new mallocs/op delta
Build-16 34.5M ± 0% 34.4M ± 0% -0.12% (p=0.000 n=10+10)
name old sys-time/op new sys-time/op delta
Build-16 5.87s ± 5% 5.82s ± 5% ~ (p=0.182 n=10+9)
It doesn't seem like much, but remember that these stats are for the
entire set of processes, where garble only accounts for about 10% of the
total wall time when compared to the compiler or linker. So a ~0.1%
decrease globally is still significant.
linkerVariableStrings is also indexed by *types.Var rather than types.Object,
since -ldflags=-X only supports setting the string value of variables.
This shouldn't make a significant difference in terms of allocs,
but at least the map is less prone to confusion with other object types.
To ensure the new code doesn't trip up on non-variables, we add test cases.
Finally, for the sake of clarity, index into the types.Info maps like
Defs and Uses rather than calling ObjectOf if we know whether the
identifier we have is a definition of a name or the use of a defined name.
This isn't better in terms of performance, as ObjectOf is a tiny method,
but just like with linkerVariableStrings before, the new code is clearer.
3 years ago
|
|
|
|
obj, _ := tf.info.Uses[ident].(*types.Var)
|
|
|
|
|
if obj == nil {
|
|
|
|
|
continue
|
|
|
|
|
}
|
|
|
|
|
if _, ok := seenReflectParams[obj]; ok {
|
|
|
|
|
seenReflectParams[obj] = true
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
var reflectParams []reflectParameter
|
|
|
|
|
for i := 0; i < funcParams.Len(); i++ {
|
|
|
|
|
if seenReflectParams[funcParams.At(i)] {
|
|
|
|
|
reflectParams = append(reflectParams, reflectParameter{
|
|
|
|
|
Position: i,
|
|
|
|
|
Variadic: funcType.Variadic() && i == funcParams.Len()-1,
|
|
|
|
|
})
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
if len(reflectParams) > 0 {
|
|
|
|
|
cachedOutput.KnownReflectAPIs[funcObj.FullName()] = reflectParams
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
return true
|
|
|
|
|
})
|
|
|
|
|
}
|
|
|
|
|
|
stop relying on nested "go list -toolexec" calls (#422)
We rely on importcfg files to load type info for obfuscated packages.
We use this type information to remember what names we didn't obfuscate.
Unfortunately, indirect dependencies aren't listed in importcfg files,
so we relied on extra "go list -toolexec" calls to locate object files.
This worked fine, but added a significant amount of complexity.
The extra "go list -export -toolexec=garble" invocations weren't slow,
as they avoided rebuilding or re-obfuscating thanks to the build cache.
Still, it was hard to reason about how garble runs during a build
if we might have multiple layers of -toolexec invocations.
Instead, record the export files we encounter in an incremental map,
and persist it in the build cache via the gob file we're already using.
This way, each garble invocation knows where all object files are,
even those for indirect imports.
One wrinkle is that importcfg files can point to temporary object files.
In that case, figure out its final location in the build cache.
This requires hard-coding a bit of knowledge about how GOCACHE works,
but it seems relatively harmless given how it's very little code.
Plus, if GOCACHE ever changes, it will be obvious when our code breaks.
Finally, add a TODO about potentially saving even more work.
3 years ago
|
|
|
|
lenPrevKnownReflectAPIs := len(cachedOutput.KnownReflectAPIs)
|
|
|
|
|
for _, file := range files {
|
|
|
|
|
for _, decl := range file.Decls {
|
|
|
|
|
if decl, ok := decl.(*ast.FuncDecl); ok {
|
|
|
|
|
visitFuncDecl(decl)
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// if a new reflectAPI is found we need to Re-evaluate all functions which might be using that API
|
stop relying on nested "go list -toolexec" calls (#422)
We rely on importcfg files to load type info for obfuscated packages.
We use this type information to remember what names we didn't obfuscate.
Unfortunately, indirect dependencies aren't listed in importcfg files,
so we relied on extra "go list -toolexec" calls to locate object files.
This worked fine, but added a significant amount of complexity.
The extra "go list -export -toolexec=garble" invocations weren't slow,
as they avoided rebuilding or re-obfuscating thanks to the build cache.
Still, it was hard to reason about how garble runs during a build
if we might have multiple layers of -toolexec invocations.
Instead, record the export files we encounter in an incremental map,
and persist it in the build cache via the gob file we're already using.
This way, each garble invocation knows where all object files are,
even those for indirect imports.
One wrinkle is that importcfg files can point to temporary object files.
In that case, figure out its final location in the build cache.
This requires hard-coding a bit of knowledge about how GOCACHE works,
but it seems relatively harmless given how it's very little code.
Plus, if GOCACHE ever changes, it will be obvious when our code breaks.
Finally, add a TODO about potentially saving even more work.
3 years ago
|
|
|
|
if len(cachedOutput.KnownReflectAPIs) > lenPrevKnownReflectAPIs {
|
|
|
|
|
tf.findReflectFunctions(files)
|
|
|
|
|
}
|
detect more std API calls which use reflection
Before, we would just notice direct calls to reflect's TypeOf and
ValueOf. Any other uses of reflection, such as encoding/json or
google.golang.org/protobuf, would require hints as documented by the
README.
Issue #162 outlines some ways we could fix this issue in a general way,
automatically detecting what functions use reflection on their parameters,
even for third party API funcs.
However, that goal is pretty significant in terms of code and effort.
As a temporary improvement, we can expand the list of "known" reflection
APIs via a static table.
Since this table is keyed by "func full name" strings, we could
potentially include third party APIs, such as:
google.golang.org/protobuf/proto.Marshal
However, for now simply include all the std APIs we know about.
If we fail to do the proper fix for automatic detection in the future,
we can then fall back to expanding this global table for third parties.
Update the README's docs, to clarify that the hint is not always
necessary anymore.
Also update the reflect.txt test to stop using the hint for encoding/json,
and to also start testing text/template with a method call.
While at it, I noticed that we weren't testing the println outputs,
as they'd go to stderr - fix that too.
Updates #162.
4 years ago
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// cmd/bundle will include a go:generate directive in its output by default.
|
|
|
|
|
// Ours specifies a version and doesn't assume bundle is in $PATH, so drop it.
|
|
|
|
|
|
|
|
|
|
//go:generate go run golang.org/x/tools/cmd/bundle@v0.1.9 -o cmdgo_quoted.go -prefix cmdgoQuoted cmd/internal/quoted
|
|
|
|
|
//go:generate sed -i /go:generate/d cmdgo_quoted.go
|
|
|
|
|
|
avoid obfuscating literals set via -ldflags=-X
The -X linker flag sets a string variable to a given value,
which is often used to inject strings such as versions.
The way garble's literal obfuscation works,
we replace string literals with anonymous functions which,
when evaluated, result in the original string.
Both of these features work fine separately,
but when intersecting, they break. For example, given:
var myVar = "original"
[...]
-ldflags=-X=main.myVar=replaced
The -X flag effectively replaces the initial value,
and -literals adds code to be run at init time:
var myVar = "replaced"
func init() { myVar = func() string { ... } }
Since the init func runs later, -literals breaks -X.
To avoid that problem,
don't obfuscate literals whose variables are set via -ldflags=-X.
We also leave TODOs about obfuscating those in the future,
but we're also leaving regression tests to ensure we get it right.
Fixes #323.
3 years ago
|
|
|
|
// prefillObjectMaps collects objects which should not be obfuscated,
|
|
|
|
|
// such as those used as arguments to reflect.TypeOf or reflect.ValueOf.
|
|
|
|
|
// Since we obfuscate one package at a time, we only detect those if the type
|
|
|
|
|
// definition and the reflect usage are both in the same package.
|
|
|
|
|
func (tf *transformer) prefillObjectMaps(files []*ast.File) error {
|
slight simplifications and alloc reductions
Reuse a buffer and a map across loop iterations, because we can.
Make recordTypeDone only track named types, as that is enough to detect
type cycles. Without named types, there can be no cycles.
These two reduce allocs by a fraction of a percent:
name old time/op new time/op delta
Build-16 10.4s ± 2% 10.4s ± 1% ~ (p=0.739 n=10+10)
name old bin-B new bin-B delta
Build-16 5.51M ± 0% 5.51M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 391ms ± 9% 407ms ± 7% ~ (p=0.095 n=10+9)
name old mallocs/op new mallocs/op delta
Build-16 34.5M ± 0% 34.4M ± 0% -0.12% (p=0.000 n=10+10)
name old sys-time/op new sys-time/op delta
Build-16 5.87s ± 5% 5.82s ± 5% ~ (p=0.182 n=10+9)
It doesn't seem like much, but remember that these stats are for the
entire set of processes, where garble only accounts for about 10% of the
total wall time when compared to the compiler or linker. So a ~0.1%
decrease globally is still significant.
linkerVariableStrings is also indexed by *types.Var rather than types.Object,
since -ldflags=-X only supports setting the string value of variables.
This shouldn't make a significant difference in terms of allocs,
but at least the map is less prone to confusion with other object types.
To ensure the new code doesn't trip up on non-variables, we add test cases.
Finally, for the sake of clarity, index into the types.Info maps like
Defs and Uses rather than calling ObjectOf if we know whether the
identifier we have is a definition of a name or the use of a defined name.
This isn't better in terms of performance, as ObjectOf is a tiny method,
but just like with linkerVariableStrings before, the new code is clearer.
3 years ago
|
|
|
|
tf.linkerVariableStrings = make(map[*types.Var]string)
|
avoid obfuscating literals set via -ldflags=-X
The -X linker flag sets a string variable to a given value,
which is often used to inject strings such as versions.
The way garble's literal obfuscation works,
we replace string literals with anonymous functions which,
when evaluated, result in the original string.
Both of these features work fine separately,
but when intersecting, they break. For example, given:
var myVar = "original"
[...]
-ldflags=-X=main.myVar=replaced
The -X flag effectively replaces the initial value,
and -literals adds code to be run at init time:
var myVar = "replaced"
func init() { myVar = func() string { ... } }
Since the init func runs later, -literals breaks -X.
To avoid that problem,
don't obfuscate literals whose variables are set via -ldflags=-X.
We also leave TODOs about obfuscating those in the future,
but we're also leaving regression tests to ensure we get it right.
Fixes #323.
3 years ago
|
|
|
|
|
|
|
|
|
// TODO: this is a linker flag that affects how we obfuscate a package at
|
|
|
|
|
// compile time. Note that, if the user changes ldflags, then Go may only
|
|
|
|
|
// re-link the final binary, without re-compiling any packages at all.
|
|
|
|
|
// It's possible that this could result in:
|
|
|
|
|
//
|
|
|
|
|
// garble -literals build -ldflags=-X=pkg.name=before # name="before"
|
|
|
|
|
// garble -literals build -ldflags=-X=pkg.name=after # name="before" as cached
|
|
|
|
|
//
|
|
|
|
|
// We haven't been able to reproduce this problem for now,
|
|
|
|
|
// but it's worth noting it and keeping an eye out for it in the future.
|
|
|
|
|
// If we do confirm this theoretical bug,
|
|
|
|
|
// the solution will be to either find a different solution for -literals,
|
|
|
|
|
// or to force including -ldflags into the build cache key.
|
|
|
|
|
ldflags, err := cmdgoQuotedSplit(flagValue(cache.ForwardBuildFlags, "-ldflags"))
|
|
|
|
|
if err != nil {
|
|
|
|
|
return err
|
|
|
|
|
}
|
|
|
|
|
flagValueIter(ldflags, "-X", func(val string) {
|
|
|
|
|
// val is in the form of "foo.com/bar.name=value".
|
|
|
|
|
fullName, stringValue, found := strings.Cut(val, "=")
|
|
|
|
|
if !found {
|
avoid obfuscating literals set via -ldflags=-X
The -X linker flag sets a string variable to a given value,
which is often used to inject strings such as versions.
The way garble's literal obfuscation works,
we replace string literals with anonymous functions which,
when evaluated, result in the original string.
Both of these features work fine separately,
but when intersecting, they break. For example, given:
var myVar = "original"
[...]
-ldflags=-X=main.myVar=replaced
The -X flag effectively replaces the initial value,
and -literals adds code to be run at init time:
var myVar = "replaced"
func init() { myVar = func() string { ... } }
Since the init func runs later, -literals breaks -X.
To avoid that problem,
don't obfuscate literals whose variables are set via -ldflags=-X.
We also leave TODOs about obfuscating those in the future,
but we're also leaving regression tests to ensure we get it right.
Fixes #323.
3 years ago
|
|
|
|
return // invalid
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// fullName is "foo.com/bar.name"
|
|
|
|
|
i := strings.LastIndexByte(fullName, '.')
|
|
|
|
|
path, name := fullName[:i], fullName[i+1:]
|
avoid obfuscating literals set via -ldflags=-X
The -X linker flag sets a string variable to a given value,
which is often used to inject strings such as versions.
The way garble's literal obfuscation works,
we replace string literals with anonymous functions which,
when evaluated, result in the original string.
Both of these features work fine separately,
but when intersecting, they break. For example, given:
var myVar = "original"
[...]
-ldflags=-X=main.myVar=replaced
The -X flag effectively replaces the initial value,
and -literals adds code to be run at init time:
var myVar = "replaced"
func init() { myVar = func() string { ... } }
Since the init func runs later, -literals breaks -X.
To avoid that problem,
don't obfuscate literals whose variables are set via -ldflags=-X.
We also leave TODOs about obfuscating those in the future,
but we're also leaving regression tests to ensure we get it right.
Fixes #323.
3 years ago
|
|
|
|
|
|
|
|
|
// -X represents the main package as "main", not its import path.
|
|
|
|
|
if path != curPkg.ImportPath && !(path == "main" && curPkg.Name == "main") {
|
|
|
|
|
return // not the current package
|
|
|
|
|
}
|
|
|
|
|
|
slight simplifications and alloc reductions
Reuse a buffer and a map across loop iterations, because we can.
Make recordTypeDone only track named types, as that is enough to detect
type cycles. Without named types, there can be no cycles.
These two reduce allocs by a fraction of a percent:
name old time/op new time/op delta
Build-16 10.4s ± 2% 10.4s ± 1% ~ (p=0.739 n=10+10)
name old bin-B new bin-B delta
Build-16 5.51M ± 0% 5.51M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 391ms ± 9% 407ms ± 7% ~ (p=0.095 n=10+9)
name old mallocs/op new mallocs/op delta
Build-16 34.5M ± 0% 34.4M ± 0% -0.12% (p=0.000 n=10+10)
name old sys-time/op new sys-time/op delta
Build-16 5.87s ± 5% 5.82s ± 5% ~ (p=0.182 n=10+9)
It doesn't seem like much, but remember that these stats are for the
entire set of processes, where garble only accounts for about 10% of the
total wall time when compared to the compiler or linker. So a ~0.1%
decrease globally is still significant.
linkerVariableStrings is also indexed by *types.Var rather than types.Object,
since -ldflags=-X only supports setting the string value of variables.
This shouldn't make a significant difference in terms of allocs,
but at least the map is less prone to confusion with other object types.
To ensure the new code doesn't trip up on non-variables, we add test cases.
Finally, for the sake of clarity, index into the types.Info maps like
Defs and Uses rather than calling ObjectOf if we know whether the
identifier we have is a definition of a name or the use of a defined name.
This isn't better in terms of performance, as ObjectOf is a tiny method,
but just like with linkerVariableStrings before, the new code is clearer.
3 years ago
|
|
|
|
obj, _ := tf.pkg.Scope().Lookup(name).(*types.Var)
|
avoid obfuscating literals set via -ldflags=-X
The -X linker flag sets a string variable to a given value,
which is often used to inject strings such as versions.
The way garble's literal obfuscation works,
we replace string literals with anonymous functions which,
when evaluated, result in the original string.
Both of these features work fine separately,
but when intersecting, they break. For example, given:
var myVar = "original"
[...]
-ldflags=-X=main.myVar=replaced
The -X flag effectively replaces the initial value,
and -literals adds code to be run at init time:
var myVar = "replaced"
func init() { myVar = func() string { ... } }
Since the init func runs later, -literals breaks -X.
To avoid that problem,
don't obfuscate literals whose variables are set via -ldflags=-X.
We also leave TODOs about obfuscating those in the future,
but we're also leaving regression tests to ensure we get it right.
Fixes #323.
3 years ago
|
|
|
|
if obj == nil {
|
slight simplifications and alloc reductions
Reuse a buffer and a map across loop iterations, because we can.
Make recordTypeDone only track named types, as that is enough to detect
type cycles. Without named types, there can be no cycles.
These two reduce allocs by a fraction of a percent:
name old time/op new time/op delta
Build-16 10.4s ± 2% 10.4s ± 1% ~ (p=0.739 n=10+10)
name old bin-B new bin-B delta
Build-16 5.51M ± 0% 5.51M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 391ms ± 9% 407ms ± 7% ~ (p=0.095 n=10+9)
name old mallocs/op new mallocs/op delta
Build-16 34.5M ± 0% 34.4M ± 0% -0.12% (p=0.000 n=10+10)
name old sys-time/op new sys-time/op delta
Build-16 5.87s ± 5% 5.82s ± 5% ~ (p=0.182 n=10+9)
It doesn't seem like much, but remember that these stats are for the
entire set of processes, where garble only accounts for about 10% of the
total wall time when compared to the compiler or linker. So a ~0.1%
decrease globally is still significant.
linkerVariableStrings is also indexed by *types.Var rather than types.Object,
since -ldflags=-X only supports setting the string value of variables.
This shouldn't make a significant difference in terms of allocs,
but at least the map is less prone to confusion with other object types.
To ensure the new code doesn't trip up on non-variables, we add test cases.
Finally, for the sake of clarity, index into the types.Info maps like
Defs and Uses rather than calling ObjectOf if we know whether the
identifier we have is a definition of a name or the use of a defined name.
This isn't better in terms of performance, as ObjectOf is a tiny method,
but just like with linkerVariableStrings before, the new code is clearer.
3 years ago
|
|
|
|
return // no such variable; skip
|
avoid obfuscating literals set via -ldflags=-X
The -X linker flag sets a string variable to a given value,
which is often used to inject strings such as versions.
The way garble's literal obfuscation works,
we replace string literals with anonymous functions which,
when evaluated, result in the original string.
Both of these features work fine separately,
but when intersecting, they break. For example, given:
var myVar = "original"
[...]
-ldflags=-X=main.myVar=replaced
The -X flag effectively replaces the initial value,
and -literals adds code to be run at init time:
var myVar = "replaced"
func init() { myVar = func() string { ... } }
Since the init func runs later, -literals breaks -X.
To avoid that problem,
don't obfuscate literals whose variables are set via -ldflags=-X.
We also leave TODOs about obfuscating those in the future,
but we're also leaving regression tests to ensure we get it right.
Fixes #323.
3 years ago
|
|
|
|
}
|
|
|
|
|
tf.linkerVariableStrings[obj] = stringValue
|
|
|
|
|
})
|
|
|
|
|
|
|
|
|
|
visit := func(node ast.Node) bool {
|
|
|
|
|
call, ok := node.(*ast.CallExpr)
|
|
|
|
|
if !ok {
|
|
|
|
|
return true
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
ident, ok := call.Fun.(*ast.Ident)
|
|
|
|
|
if !ok {
|
|
|
|
|
sel, ok := call.Fun.(*ast.SelectorExpr)
|
|
|
|
|
if !ok {
|
|
|
|
|
return true
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
ident = sel.Sel
|
|
|
|
|
}
|
|
|
|
|
|
slight simplifications and alloc reductions
Reuse a buffer and a map across loop iterations, because we can.
Make recordTypeDone only track named types, as that is enough to detect
type cycles. Without named types, there can be no cycles.
These two reduce allocs by a fraction of a percent:
name old time/op new time/op delta
Build-16 10.4s ± 2% 10.4s ± 1% ~ (p=0.739 n=10+10)
name old bin-B new bin-B delta
Build-16 5.51M ± 0% 5.51M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 391ms ± 9% 407ms ± 7% ~ (p=0.095 n=10+9)
name old mallocs/op new mallocs/op delta
Build-16 34.5M ± 0% 34.4M ± 0% -0.12% (p=0.000 n=10+10)
name old sys-time/op new sys-time/op delta
Build-16 5.87s ± 5% 5.82s ± 5% ~ (p=0.182 n=10+9)
It doesn't seem like much, but remember that these stats are for the
entire set of processes, where garble only accounts for about 10% of the
total wall time when compared to the compiler or linker. So a ~0.1%
decrease globally is still significant.
linkerVariableStrings is also indexed by *types.Var rather than types.Object,
since -ldflags=-X only supports setting the string value of variables.
This shouldn't make a significant difference in terms of allocs,
but at least the map is less prone to confusion with other object types.
To ensure the new code doesn't trip up on non-variables, we add test cases.
Finally, for the sake of clarity, index into the types.Info maps like
Defs and Uses rather than calling ObjectOf if we know whether the
identifier we have is a definition of a name or the use of a defined name.
This isn't better in terms of performance, as ObjectOf is a tiny method,
but just like with linkerVariableStrings before, the new code is clearer.
3 years ago
|
|
|
|
fnType, _ := tf.info.Uses[ident].(*types.Func)
|
detect more std API calls which use reflection
Before, we would just notice direct calls to reflect's TypeOf and
ValueOf. Any other uses of reflection, such as encoding/json or
google.golang.org/protobuf, would require hints as documented by the
README.
Issue #162 outlines some ways we could fix this issue in a general way,
automatically detecting what functions use reflection on their parameters,
even for third party API funcs.
However, that goal is pretty significant in terms of code and effort.
As a temporary improvement, we can expand the list of "known" reflection
APIs via a static table.
Since this table is keyed by "func full name" strings, we could
potentially include third party APIs, such as:
google.golang.org/protobuf/proto.Marshal
However, for now simply include all the std APIs we know about.
If we fail to do the proper fix for automatic detection in the future,
we can then fall back to expanding this global table for third parties.
Update the README's docs, to clarify that the hint is not always
necessary anymore.
Also update the reflect.txt test to stop using the hint for encoding/json,
and to also start testing text/template with a method call.
While at it, I noticed that we weren't testing the println outputs,
as they'd go to stderr - fix that too.
Updates #162.
4 years ago
|
|
|
|
if fnType == nil || fnType.Pkg() == nil {
|
|
|
|
|
return true
|
|
|
|
|
}
|
|
|
|
|
|
detect more std API calls which use reflection
Before, we would just notice direct calls to reflect's TypeOf and
ValueOf. Any other uses of reflection, such as encoding/json or
google.golang.org/protobuf, would require hints as documented by the
README.
Issue #162 outlines some ways we could fix this issue in a general way,
automatically detecting what functions use reflection on their parameters,
even for third party API funcs.
However, that goal is pretty significant in terms of code and effort.
As a temporary improvement, we can expand the list of "known" reflection
APIs via a static table.
Since this table is keyed by "func full name" strings, we could
potentially include third party APIs, such as:
google.golang.org/protobuf/proto.Marshal
However, for now simply include all the std APIs we know about.
If we fail to do the proper fix for automatic detection in the future,
we can then fall back to expanding this global table for third parties.
Update the README's docs, to clarify that the hint is not always
necessary anymore.
Also update the reflect.txt test to stop using the hint for encoding/json,
and to also start testing text/template with a method call.
While at it, I noticed that we weren't testing the println outputs,
as they'd go to stderr - fix that too.
Updates #162.
4 years ago
|
|
|
|
fullName := fnType.FullName()
|
|
|
|
|
for _, reflectParam := range cachedOutput.KnownReflectAPIs[fullName] {
|
|
|
|
|
argStart := reflectParam.Position
|
|
|
|
|
argEnd := argStart + 1
|
|
|
|
|
if reflectParam.Variadic {
|
|
|
|
|
argEnd = len(call.Args)
|
|
|
|
|
}
|
|
|
|
|
for _, arg := range call.Args[argStart:argEnd] {
|
|
|
|
|
argType := tf.info.TypeOf(arg)
|
|
|
|
|
tf.recursivelyRecordAsNotObfuscated(argType)
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
return true
|
|
|
|
|
}
|
|
|
|
|
for _, file := range files {
|
|
|
|
|
ast.Inspect(file, visit)
|
|
|
|
|
}
|
|
|
|
|
return nil
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// transformer holds all the information and state necessary to obfuscate a
|
|
|
|
|
// single Go package.
|
|
|
|
|
type transformer struct {
|
|
|
|
|
// The type-checking results; the package itself, and the Info struct.
|
|
|
|
|
pkg *types.Package
|
|
|
|
|
info *types.Info
|
|
|
|
|
|
avoid obfuscating literals set via -ldflags=-X
The -X linker flag sets a string variable to a given value,
which is often used to inject strings such as versions.
The way garble's literal obfuscation works,
we replace string literals with anonymous functions which,
when evaluated, result in the original string.
Both of these features work fine separately,
but when intersecting, they break. For example, given:
var myVar = "original"
[...]
-ldflags=-X=main.myVar=replaced
The -X flag effectively replaces the initial value,
and -literals adds code to be run at init time:
var myVar = "replaced"
func init() { myVar = func() string { ... } }
Since the init func runs later, -literals breaks -X.
To avoid that problem,
don't obfuscate literals whose variables are set via -ldflags=-X.
We also leave TODOs about obfuscating those in the future,
but we're also leaving regression tests to ensure we get it right.
Fixes #323.
3 years ago
|
|
|
|
// linkerVariableStrings is also initialized by prefillObjectMaps.
|
|
|
|
|
// It records objects for variables used in -ldflags=-X flags,
|
|
|
|
|
// as well as the strings the user wants to inject them with.
|
slight simplifications and alloc reductions
Reuse a buffer and a map across loop iterations, because we can.
Make recordTypeDone only track named types, as that is enough to detect
type cycles. Without named types, there can be no cycles.
These two reduce allocs by a fraction of a percent:
name old time/op new time/op delta
Build-16 10.4s ± 2% 10.4s ± 1% ~ (p=0.739 n=10+10)
name old bin-B new bin-B delta
Build-16 5.51M ± 0% 5.51M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 391ms ± 9% 407ms ± 7% ~ (p=0.095 n=10+9)
name old mallocs/op new mallocs/op delta
Build-16 34.5M ± 0% 34.4M ± 0% -0.12% (p=0.000 n=10+10)
name old sys-time/op new sys-time/op delta
Build-16 5.87s ± 5% 5.82s ± 5% ~ (p=0.182 n=10+9)
It doesn't seem like much, but remember that these stats are for the
entire set of processes, where garble only accounts for about 10% of the
total wall time when compared to the compiler or linker. So a ~0.1%
decrease globally is still significant.
linkerVariableStrings is also indexed by *types.Var rather than types.Object,
since -ldflags=-X only supports setting the string value of variables.
This shouldn't make a significant difference in terms of allocs,
but at least the map is less prone to confusion with other object types.
To ensure the new code doesn't trip up on non-variables, we add test cases.
Finally, for the sake of clarity, index into the types.Info maps like
Defs and Uses rather than calling ObjectOf if we know whether the
identifier we have is a definition of a name or the use of a defined name.
This isn't better in terms of performance, as ObjectOf is a tiny method,
but just like with linkerVariableStrings before, the new code is clearer.
3 years ago
|
|
|
|
linkerVariableStrings map[*types.Var]string
|
hash field names equally in all packages
Packages P1 and P2 can define identical struct types T1 and T2, and one
can convert from type T1 to T2 or vice versa.
The spec defines two identical struct types as:
Two struct types are identical if they have the same sequence of
fields, and if corresponding fields have the same names, and
identical types, and identical tags. Non-exported field names
from different packages are always different.
Unfortunately, garble broke this: since we obfuscated field names
differently depending on the package, cross-package conversions like the
case above would result in typechecking errors.
To fix this, implement Joe Tsai's idea: hash struct field names with the
string representation of the entire struct. This way, identical struct
types will have their field names obfuscated in the same way in all
packages across a build.
Note that we had to refactor "reverse" a bit to start using transformer,
since now it needs to keep track of struct types as well.
This failure was affecting the build of google.golang.org/protobuf,
since it makes regular use of cross-package struct conversions.
Note that the protobuf module still fails to build, but for other
reasons. The package that used to fail now succeeds, so the build gets a
bit further than before. #240 tracks adding relevant third-party Go
modules to CI, so we'll track the other remaining failures there.
Fixes #310.
4 years ago
|
|
|
|
|
slight simplifications and alloc reductions
Reuse a buffer and a map across loop iterations, because we can.
Make recordTypeDone only track named types, as that is enough to detect
type cycles. Without named types, there can be no cycles.
These two reduce allocs by a fraction of a percent:
name old time/op new time/op delta
Build-16 10.4s ± 2% 10.4s ± 1% ~ (p=0.739 n=10+10)
name old bin-B new bin-B delta
Build-16 5.51M ± 0% 5.51M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 391ms ± 9% 407ms ± 7% ~ (p=0.095 n=10+9)
name old mallocs/op new mallocs/op delta
Build-16 34.5M ± 0% 34.4M ± 0% -0.12% (p=0.000 n=10+10)
name old sys-time/op new sys-time/op delta
Build-16 5.87s ± 5% 5.82s ± 5% ~ (p=0.182 n=10+9)
It doesn't seem like much, but remember that these stats are for the
entire set of processes, where garble only accounts for about 10% of the
total wall time when compared to the compiler or linker. So a ~0.1%
decrease globally is still significant.
linkerVariableStrings is also indexed by *types.Var rather than types.Object,
since -ldflags=-X only supports setting the string value of variables.
This shouldn't make a significant difference in terms of allocs,
but at least the map is less prone to confusion with other object types.
To ensure the new code doesn't trip up on non-variables, we add test cases.
Finally, for the sake of clarity, index into the types.Info maps like
Defs and Uses rather than calling ObjectOf if we know whether the
identifier we have is a definition of a name or the use of a defined name.
This isn't better in terms of performance, as ObjectOf is a tiny method,
but just like with linkerVariableStrings before, the new code is clearer.
3 years ago
|
|
|
|
// recordTypeDone helps avoid type cycles in recordType.
|
|
|
|
|
// We only need to track named types, as all cycles must use them.
|
|
|
|
|
recordTypeDone map[*types.Named]bool
|
handle aliases to foreign named types properly
When such an alias name was used to define an embedded field, we handled
that case gracefully via the code using:
tf.info.Uses[node].(*types.TypeName)
Unfortunately, when the same field name was used elsewhere, such as a
composite literal, tf.Info.Uses gave us a *types.Var, not a
*types.TypeName, meaning we could no longer tell if this was an alias,
or what it pointed to.
Thus, we failed to obfuscate the name properly in the added test case:
> garble build
[stderr]
# test/main/sub
xxWZf66u.go:36: unknown field 'foreignAlias' in struct literal of type smhWelwn
It doesn't seem like any of the go/types APIs allows us to obtain the
*types.TypeName directly in this scenario. Thus, use a trick that we
used before: after typechecking, but before obfuscating, record all
embedded struct field *types.Var which are aliases via a map, where the
value holds the *types.TypeName for the alias.
Updates #349.
4 years ago
|
|
|
|
|
|
|
|
|
// fieldToStruct helps locate struct types from any of their field
|
|
|
|
|
// objects. Useful when obfuscating field names.
|
|
|
|
|
fieldToStruct map[*types.Var]*types.Struct
|
hash field names equally in all packages
Packages P1 and P2 can define identical struct types T1 and T2, and one
can convert from type T1 to T2 or vice versa.
The spec defines two identical struct types as:
Two struct types are identical if they have the same sequence of
fields, and if corresponding fields have the same names, and
identical types, and identical tags. Non-exported field names
from different packages are always different.
Unfortunately, garble broke this: since we obfuscated field names
differently depending on the package, cross-package conversions like the
case above would result in typechecking errors.
To fix this, implement Joe Tsai's idea: hash struct field names with the
string representation of the entire struct. This way, identical struct
types will have their field names obfuscated in the same way in all
packages across a build.
Note that we had to refactor "reverse" a bit to start using transformer,
since now it needs to keep track of struct types as well.
This failure was affecting the build of google.golang.org/protobuf,
since it makes regular use of cross-package struct conversions.
Note that the protobuf module still fails to build, but for other
reasons. The package that used to fail now succeeds, so the build gets a
bit further than before. #240 tracks adding relevant third-party Go
modules to CI, so we'll track the other remaining failures there.
Fixes #310.
4 years ago
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// newTransformer helps initialize some maps.
|
|
|
|
|
func newTransformer() *transformer {
|
|
|
|
|
return &transformer{
|
|
|
|
|
info: &types.Info{
|
|
|
|
|
Types: make(map[ast.Expr]types.TypeAndValue),
|
|
|
|
|
Defs: make(map[*ast.Ident]types.Object),
|
|
|
|
|
Uses: make(map[*ast.Ident]types.Object),
|
|
|
|
|
},
|
slight simplifications and alloc reductions
Reuse a buffer and a map across loop iterations, because we can.
Make recordTypeDone only track named types, as that is enough to detect
type cycles. Without named types, there can be no cycles.
These two reduce allocs by a fraction of a percent:
name old time/op new time/op delta
Build-16 10.4s ± 2% 10.4s ± 1% ~ (p=0.739 n=10+10)
name old bin-B new bin-B delta
Build-16 5.51M ± 0% 5.51M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 391ms ± 9% 407ms ± 7% ~ (p=0.095 n=10+9)
name old mallocs/op new mallocs/op delta
Build-16 34.5M ± 0% 34.4M ± 0% -0.12% (p=0.000 n=10+10)
name old sys-time/op new sys-time/op delta
Build-16 5.87s ± 5% 5.82s ± 5% ~ (p=0.182 n=10+9)
It doesn't seem like much, but remember that these stats are for the
entire set of processes, where garble only accounts for about 10% of the
total wall time when compared to the compiler or linker. So a ~0.1%
decrease globally is still significant.
linkerVariableStrings is also indexed by *types.Var rather than types.Object,
since -ldflags=-X only supports setting the string value of variables.
This shouldn't make a significant difference in terms of allocs,
but at least the map is less prone to confusion with other object types.
To ensure the new code doesn't trip up on non-variables, we add test cases.
Finally, for the sake of clarity, index into the types.Info maps like
Defs and Uses rather than calling ObjectOf if we know whether the
identifier we have is a definition of a name or the use of a defined name.
This isn't better in terms of performance, as ObjectOf is a tiny method,
but just like with linkerVariableStrings before, the new code is clearer.
3 years ago
|
|
|
|
recordTypeDone: make(map[*types.Named]bool),
|
hash field names equally in all packages
Packages P1 and P2 can define identical struct types T1 and T2, and one
can convert from type T1 to T2 or vice versa.
The spec defines two identical struct types as:
Two struct types are identical if they have the same sequence of
fields, and if corresponding fields have the same names, and
identical types, and identical tags. Non-exported field names
from different packages are always different.
Unfortunately, garble broke this: since we obfuscated field names
differently depending on the package, cross-package conversions like the
case above would result in typechecking errors.
To fix this, implement Joe Tsai's idea: hash struct field names with the
string representation of the entire struct. This way, identical struct
types will have their field names obfuscated in the same way in all
packages across a build.
Note that we had to refactor "reverse" a bit to start using transformer,
since now it needs to keep track of struct types as well.
This failure was affecting the build of google.golang.org/protobuf,
since it makes regular use of cross-package struct conversions.
Note that the protobuf module still fails to build, but for other
reasons. The package that used to fail now succeeds, so the build gets a
bit further than before. #240 tracks adding relevant third-party Go
modules to CI, so we'll track the other remaining failures there.
Fixes #310.
4 years ago
|
|
|
|
fieldToStruct: make(map[*types.Var]*types.Struct),
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
func (tf *transformer) typecheck(files []*ast.File) error {
|
|
|
|
|
origTypesConfig := types.Config{Importer: origImporter}
|
|
|
|
|
pkg, err := origTypesConfig.Check(curPkg.ImportPath, fset, files, tf.info)
|
|
|
|
|
if err != nil {
|
|
|
|
|
return fmt.Errorf("typecheck error: %v", err)
|
|
|
|
|
}
|
|
|
|
|
tf.pkg = pkg
|
|
|
|
|
|
|
|
|
|
// Run recordType on all types reachable via types.Info.
|
|
|
|
|
// A bit hacky, but I could not find an easier way to do this.
|
|
|
|
|
for _, obj := range tf.info.Defs {
|
|
|
|
|
if obj != nil {
|
|
|
|
|
tf.recordType(obj.Type(), nil)
|
hash field names equally in all packages
Packages P1 and P2 can define identical struct types T1 and T2, and one
can convert from type T1 to T2 or vice versa.
The spec defines two identical struct types as:
Two struct types are identical if they have the same sequence of
fields, and if corresponding fields have the same names, and
identical types, and identical tags. Non-exported field names
from different packages are always different.
Unfortunately, garble broke this: since we obfuscated field names
differently depending on the package, cross-package conversions like the
case above would result in typechecking errors.
To fix this, implement Joe Tsai's idea: hash struct field names with the
string representation of the entire struct. This way, identical struct
types will have their field names obfuscated in the same way in all
packages across a build.
Note that we had to refactor "reverse" a bit to start using transformer,
since now it needs to keep track of struct types as well.
This failure was affecting the build of google.golang.org/protobuf,
since it makes regular use of cross-package struct conversions.
Note that the protobuf module still fails to build, but for other
reasons. The package that used to fail now succeeds, so the build gets a
bit further than before. #240 tracks adding relevant third-party Go
modules to CI, so we'll track the other remaining failures there.
Fixes #310.
4 years ago
|
|
|
|
}
|
|
|
|
|
}
|
handle aliases to foreign named types properly
When such an alias name was used to define an embedded field, we handled
that case gracefully via the code using:
tf.info.Uses[node].(*types.TypeName)
Unfortunately, when the same field name was used elsewhere, such as a
composite literal, tf.Info.Uses gave us a *types.Var, not a
*types.TypeName, meaning we could no longer tell if this was an alias,
or what it pointed to.
Thus, we failed to obfuscate the name properly in the added test case:
> garble build
[stderr]
# test/main/sub
xxWZf66u.go:36: unknown field 'foreignAlias' in struct literal of type smhWelwn
It doesn't seem like any of the go/types APIs allows us to obtain the
*types.TypeName directly in this scenario. Thus, use a trick that we
used before: after typechecking, but before obfuscating, record all
embedded struct field *types.Var which are aliases via a map, where the
value holds the *types.TypeName for the alias.
Updates #349.
4 years ago
|
|
|
|
for name, obj := range tf.info.Uses {
|
properly record when type aliases are embedded as fields
There are two scenarios when it comes to embedding fields.
The first is easy, and we always handled it well:
type Named struct { Foo int }
type T struct { Named }
In this scenario, T ends up with an embedded field named "Named",
and a promoted field named "Foo".
Then there's the form with a type alias:
type Named struct { Foo int }
type Alias = Named
type T struct { Alias }
This case is different: T ends up with an embedded field named "Alias",
and a promoted field named "Foo".
Note how the field gets its name from the referenced type,
even if said type is just an alias to another type.
This poses two problems.
First, we must obfuscate the field T.Alias as the name "Alias",
and not as the name "Named" that the alias points to.
Second, we must be careful of cases where Named and Alias are declared
in different packages, as they will obfuscate the same name differently.
Both of those problems compounded in the reported issue.
The actual reason is that quic-go has a type alias in the form of:
type ConnectionState = qtls.ConnectionState
In other words, the entire problem boils down to a type alias which
points to a named type in a different package, where both types share
the same name. For example:
package parent
import "parent/p1"
type T struct { p1.SameName }
[...]
package p1
import "parent/p2"
type SameName = p2.SameName
[...]
package p2
type SameName struct { Foo int }
This broke garble because we had a heuristic to detect when an embedded
field was a type alias:
// Instead, detect such a "foreign alias embed".
// If we embed a final named type,
// but the field name does not match its name,
// then it must have been done via an alias.
// We dig out the alias's TypeName via locateForeignAlias.
if named.Obj().Name() != node.Name {
As the reader can deduce, this heuristic would incorrectly assume that
the snippet above does not embed a type alias, when in fact it does.
When obfuscating the field T.SameName, which uses a type alias,
we would correctly obfuscate the name "SameName",
but we would incorrectly obfuscate it with the package p2, not p1.
This would then result in build errors.
To fix this problem for good, we need to get rid of the heuristic.
Instead, we now mimic what was done for KnownCannotObfuscate,
but for embedded fields which use type aliases.
KnownEmbeddedAliasFields is now filled for each package
and stored in the cache as part of cachedOutput.
We can then detect the "embedded alias" case reliably,
even when the field is declared in an imported package.
On the plus side, we get to remove locateForeignAlias.
We also add a couple of TODOs to record further improvements.
Finally, add a test.
Fixes #466.
3 years ago
|
|
|
|
if obj == nil {
|
|
|
|
|
continue
|
hash field names equally in all packages
Packages P1 and P2 can define identical struct types T1 and T2, and one
can convert from type T1 to T2 or vice versa.
The spec defines two identical struct types as:
Two struct types are identical if they have the same sequence of
fields, and if corresponding fields have the same names, and
identical types, and identical tags. Non-exported field names
from different packages are always different.
Unfortunately, garble broke this: since we obfuscated field names
differently depending on the package, cross-package conversions like the
case above would result in typechecking errors.
To fix this, implement Joe Tsai's idea: hash struct field names with the
string representation of the entire struct. This way, identical struct
types will have their field names obfuscated in the same way in all
packages across a build.
Note that we had to refactor "reverse" a bit to start using transformer,
since now it needs to keep track of struct types as well.
This failure was affecting the build of google.golang.org/protobuf,
since it makes regular use of cross-package struct conversions.
Note that the protobuf module still fails to build, but for other
reasons. The package that used to fail now succeeds, so the build gets a
bit further than before. #240 tracks adding relevant third-party Go
modules to CI, so we'll track the other remaining failures there.
Fixes #310.
4 years ago
|
|
|
|
}
|
|
|
|
|
tf.recordType(obj.Type(), nil)
|
properly record when type aliases are embedded as fields
There are two scenarios when it comes to embedding fields.
The first is easy, and we always handled it well:
type Named struct { Foo int }
type T struct { Named }
In this scenario, T ends up with an embedded field named "Named",
and a promoted field named "Foo".
Then there's the form with a type alias:
type Named struct { Foo int }
type Alias = Named
type T struct { Alias }
This case is different: T ends up with an embedded field named "Alias",
and a promoted field named "Foo".
Note how the field gets its name from the referenced type,
even if said type is just an alias to another type.
This poses two problems.
First, we must obfuscate the field T.Alias as the name "Alias",
and not as the name "Named" that the alias points to.
Second, we must be careful of cases where Named and Alias are declared
in different packages, as they will obfuscate the same name differently.
Both of those problems compounded in the reported issue.
The actual reason is that quic-go has a type alias in the form of:
type ConnectionState = qtls.ConnectionState
In other words, the entire problem boils down to a type alias which
points to a named type in a different package, where both types share
the same name. For example:
package parent
import "parent/p1"
type T struct { p1.SameName }
[...]
package p1
import "parent/p2"
type SameName = p2.SameName
[...]
package p2
type SameName struct { Foo int }
This broke garble because we had a heuristic to detect when an embedded
field was a type alias:
// Instead, detect such a "foreign alias embed".
// If we embed a final named type,
// but the field name does not match its name,
// then it must have been done via an alias.
// We dig out the alias's TypeName via locateForeignAlias.
if named.Obj().Name() != node.Name {
As the reader can deduce, this heuristic would incorrectly assume that
the snippet above does not embed a type alias, when in fact it does.
When obfuscating the field T.SameName, which uses a type alias,
we would correctly obfuscate the name "SameName",
but we would incorrectly obfuscate it with the package p2, not p1.
This would then result in build errors.
To fix this problem for good, we need to get rid of the heuristic.
Instead, we now mimic what was done for KnownCannotObfuscate,
but for embedded fields which use type aliases.
KnownEmbeddedAliasFields is now filled for each package
and stored in the cache as part of cachedOutput.
We can then detect the "embedded alias" case reliably,
even when the field is declared in an imported package.
On the plus side, we get to remove locateForeignAlias.
We also add a couple of TODOs to record further improvements.
Finally, add a test.
Fixes #466.
3 years ago
|
|
|
|
|
|
|
|
|
// Record into KnownEmbeddedAliasFields.
|
|
|
|
|
obj, ok := obj.(*types.TypeName)
|
|
|
|
|
if !ok || !obj.IsAlias() {
|
|
|
|
|
continue
|
|
|
|
|
}
|
|
|
|
|
vr, _ := tf.info.Defs[name].(*types.Var)
|
|
|
|
|
if vr == nil || !vr.Embedded() {
|
|
|
|
|
continue
|
|
|
|
|
}
|
|
|
|
|
vrStr := recordedObjectString(vr)
|
|
|
|
|
if vrStr == "" {
|
|
|
|
|
continue
|
|
|
|
|
}
|
|
|
|
|
aliasTypeName := typeName{
|
|
|
|
|
PkgPath: obj.Pkg().Path(),
|
|
|
|
|
Name: obj.Name(),
|
|
|
|
|
}
|
|
|
|
|
cachedOutput.KnownEmbeddedAliasFields[vrStr] = aliasTypeName
|
hash field names equally in all packages
Packages P1 and P2 can define identical struct types T1 and T2, and one
can convert from type T1 to T2 or vice versa.
The spec defines two identical struct types as:
Two struct types are identical if they have the same sequence of
fields, and if corresponding fields have the same names, and
identical types, and identical tags. Non-exported field names
from different packages are always different.
Unfortunately, garble broke this: since we obfuscated field names
differently depending on the package, cross-package conversions like the
case above would result in typechecking errors.
To fix this, implement Joe Tsai's idea: hash struct field names with the
string representation of the entire struct. This way, identical struct
types will have their field names obfuscated in the same way in all
packages across a build.
Note that we had to refactor "reverse" a bit to start using transformer,
since now it needs to keep track of struct types as well.
This failure was affecting the build of google.golang.org/protobuf,
since it makes regular use of cross-package struct conversions.
Note that the protobuf module still fails to build, but for other
reasons. The package that used to fail now succeeds, so the build gets a
bit further than before. #240 tracks adding relevant third-party Go
modules to CI, so we'll track the other remaining failures there.
Fixes #310.
4 years ago
|
|
|
|
}
|
|
|
|
|
for _, tv := range tf.info.Types {
|
|
|
|
|
tf.recordType(tv.Type, nil)
|
hash field names equally in all packages
Packages P1 and P2 can define identical struct types T1 and T2, and one
can convert from type T1 to T2 or vice versa.
The spec defines two identical struct types as:
Two struct types are identical if they have the same sequence of
fields, and if corresponding fields have the same names, and
identical types, and identical tags. Non-exported field names
from different packages are always different.
Unfortunately, garble broke this: since we obfuscated field names
differently depending on the package, cross-package conversions like the
case above would result in typechecking errors.
To fix this, implement Joe Tsai's idea: hash struct field names with the
string representation of the entire struct. This way, identical struct
types will have their field names obfuscated in the same way in all
packages across a build.
Note that we had to refactor "reverse" a bit to start using transformer,
since now it needs to keep track of struct types as well.
This failure was affecting the build of google.golang.org/protobuf,
since it makes regular use of cross-package struct conversions.
Note that the protobuf module still fails to build, but for other
reasons. The package that used to fail now succeeds, so the build gets a
bit further than before. #240 tracks adding relevant third-party Go
modules to CI, so we'll track the other remaining failures there.
Fixes #310.
4 years ago
|
|
|
|
}
|
|
|
|
|
return nil
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// recordType visits every reachable type after typechecking a package.
|
|
|
|
|
// Right now, all it does is fill the fieldToStruct field.
|
|
|
|
|
// Since types can be recursive, we need a map to avoid cycles.
|
|
|
|
|
func (tf *transformer) recordType(used, origin types.Type) {
|
|
|
|
|
if origin == nil {
|
|
|
|
|
origin = used
|
hash field names equally in all packages
Packages P1 and P2 can define identical struct types T1 and T2, and one
can convert from type T1 to T2 or vice versa.
The spec defines two identical struct types as:
Two struct types are identical if they have the same sequence of
fields, and if corresponding fields have the same names, and
identical types, and identical tags. Non-exported field names
from different packages are always different.
Unfortunately, garble broke this: since we obfuscated field names
differently depending on the package, cross-package conversions like the
case above would result in typechecking errors.
To fix this, implement Joe Tsai's idea: hash struct field names with the
string representation of the entire struct. This way, identical struct
types will have their field names obfuscated in the same way in all
packages across a build.
Note that we had to refactor "reverse" a bit to start using transformer,
since now it needs to keep track of struct types as well.
This failure was affecting the build of google.golang.org/protobuf,
since it makes regular use of cross-package struct conversions.
Note that the protobuf module still fails to build, but for other
reasons. The package that used to fail now succeeds, so the build gets a
bit further than before. #240 tracks adding relevant third-party Go
modules to CI, so we'll track the other remaining failures there.
Fixes #310.
4 years ago
|
|
|
|
}
|
|
|
|
|
type Container interface{ Elem() types.Type }
|
|
|
|
|
switch used := used.(type) {
|
|
|
|
|
case Container:
|
|
|
|
|
// origin may be a *types.TypeParam, which is not a Container.
|
|
|
|
|
// For now, we haven't found a need to recurse in that case.
|
|
|
|
|
// We can edit this code in the future if we find an example,
|
|
|
|
|
// because we panic if a field is not in fieldToStruct.
|
|
|
|
|
if origin, ok := origin.(Container); ok {
|
|
|
|
|
tf.recordType(used.Elem(), origin.Elem())
|
|
|
|
|
}
|
|
|
|
|
case *types.Named:
|
slight simplifications and alloc reductions
Reuse a buffer and a map across loop iterations, because we can.
Make recordTypeDone only track named types, as that is enough to detect
type cycles. Without named types, there can be no cycles.
These two reduce allocs by a fraction of a percent:
name old time/op new time/op delta
Build-16 10.4s ± 2% 10.4s ± 1% ~ (p=0.739 n=10+10)
name old bin-B new bin-B delta
Build-16 5.51M ± 0% 5.51M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 391ms ± 9% 407ms ± 7% ~ (p=0.095 n=10+9)
name old mallocs/op new mallocs/op delta
Build-16 34.5M ± 0% 34.4M ± 0% -0.12% (p=0.000 n=10+10)
name old sys-time/op new sys-time/op delta
Build-16 5.87s ± 5% 5.82s ± 5% ~ (p=0.182 n=10+9)
It doesn't seem like much, but remember that these stats are for the
entire set of processes, where garble only accounts for about 10% of the
total wall time when compared to the compiler or linker. So a ~0.1%
decrease globally is still significant.
linkerVariableStrings is also indexed by *types.Var rather than types.Object,
since -ldflags=-X only supports setting the string value of variables.
This shouldn't make a significant difference in terms of allocs,
but at least the map is less prone to confusion with other object types.
To ensure the new code doesn't trip up on non-variables, we add test cases.
Finally, for the sake of clarity, index into the types.Info maps like
Defs and Uses rather than calling ObjectOf if we know whether the
identifier we have is a definition of a name or the use of a defined name.
This isn't better in terms of performance, as ObjectOf is a tiny method,
but just like with linkerVariableStrings before, the new code is clearer.
3 years ago
|
|
|
|
if tf.recordTypeDone[used] {
|
|
|
|
|
return
|
|
|
|
|
}
|
|
|
|
|
tf.recordTypeDone[used] = true
|
|
|
|
|
// If we have a generic struct like
|
|
|
|
|
//
|
|
|
|
|
// type Foo[T any] struct { Bar T }
|
|
|
|
|
//
|
|
|
|
|
// then we want the hashing to use the original "Bar T",
|
|
|
|
|
// because otherwise different instances like "Bar int" and "Bar bool"
|
|
|
|
|
// will result in different hashes and the field names will break.
|
|
|
|
|
// Ensure we record the original generic struct, if there is one.
|
|
|
|
|
tf.recordType(used.Underlying(), used.Origin().Underlying())
|
|
|
|
|
case *types.Struct:
|
|
|
|
|
origin := origin.(*types.Struct)
|
|
|
|
|
for i := 0; i < used.NumFields(); i++ {
|
|
|
|
|
field := used.Field(i)
|
|
|
|
|
tf.fieldToStruct[field] = origin
|
|
|
|
|
|
|
|
|
|
if field.Embedded() {
|
|
|
|
|
tf.recordType(field.Type(), origin.Field(i).Type())
|
|
|
|
|
}
|
hash field names equally in all packages
Packages P1 and P2 can define identical struct types T1 and T2, and one
can convert from type T1 to T2 or vice versa.
The spec defines two identical struct types as:
Two struct types are identical if they have the same sequence of
fields, and if corresponding fields have the same names, and
identical types, and identical tags. Non-exported field names
from different packages are always different.
Unfortunately, garble broke this: since we obfuscated field names
differently depending on the package, cross-package conversions like the
case above would result in typechecking errors.
To fix this, implement Joe Tsai's idea: hash struct field names with the
string representation of the entire struct. This way, identical struct
types will have their field names obfuscated in the same way in all
packages across a build.
Note that we had to refactor "reverse" a bit to start using transformer,
since now it needs to keep track of struct types as well.
This failure was affecting the build of google.golang.org/protobuf,
since it makes regular use of cross-package struct conversions.
Note that the protobuf module still fails to build, but for other
reasons. The package that used to fail now succeeds, so the build gets a
bit further than before. #240 tracks adding relevant third-party Go
modules to CI, so we'll track the other remaining failures there.
Fixes #310.
4 years ago
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
clarify how each "cannot obfuscate" map works
We used to record all objects in cannotObfuscateNames,
and then we'd add the exported ones to KnownCannotObfuscate.
Instead, teach recordAsNotObfuscated to store each object in either
knownCannotObfuscateUnexported or KnownCannotObfuscate, but not both.
The former isn't cached so it uses in-memory pointers as keys,
and the latter uses the cross-process objectStrings like before.
Functionally, this is all the same, but with the difference that the map
indexed by types.Object will not contain objects already recorded in
KnownCannotObfuscate, reducing the amount of duplicate memory use.
While here, give recordIgnore a less ambiguous name,
and remove the second parameter as it was always tf.pkg.Path().
This also means we can compare *types.Package pointers directly.
Finally, add more TODOs for further improvement ideas.
It does mean that we end up with more TODOs than before,
even though I'm fixing one, but I reckon that's a good thing.
Recording these ideas can give first-time contributors ways to help,
and it ensures I don't forget about ideas just in my head.
3 years ago
|
|
|
|
// TODO: consider caching recordedObjectString via a map,
|
|
|
|
|
// if that shows an improvement in our benchmark
|
|
|
|
|
|
properly record when type aliases are embedded as fields
There are two scenarios when it comes to embedding fields.
The first is easy, and we always handled it well:
type Named struct { Foo int }
type T struct { Named }
In this scenario, T ends up with an embedded field named "Named",
and a promoted field named "Foo".
Then there's the form with a type alias:
type Named struct { Foo int }
type Alias = Named
type T struct { Alias }
This case is different: T ends up with an embedded field named "Alias",
and a promoted field named "Foo".
Note how the field gets its name from the referenced type,
even if said type is just an alias to another type.
This poses two problems.
First, we must obfuscate the field T.Alias as the name "Alias",
and not as the name "Named" that the alias points to.
Second, we must be careful of cases where Named and Alias are declared
in different packages, as they will obfuscate the same name differently.
Both of those problems compounded in the reported issue.
The actual reason is that quic-go has a type alias in the form of:
type ConnectionState = qtls.ConnectionState
In other words, the entire problem boils down to a type alias which
points to a named type in a different package, where both types share
the same name. For example:
package parent
import "parent/p1"
type T struct { p1.SameName }
[...]
package p1
import "parent/p2"
type SameName = p2.SameName
[...]
package p2
type SameName struct { Foo int }
This broke garble because we had a heuristic to detect when an embedded
field was a type alias:
// Instead, detect such a "foreign alias embed".
// If we embed a final named type,
// but the field name does not match its name,
// then it must have been done via an alias.
// We dig out the alias's TypeName via locateForeignAlias.
if named.Obj().Name() != node.Name {
As the reader can deduce, this heuristic would incorrectly assume that
the snippet above does not embed a type alias, when in fact it does.
When obfuscating the field T.SameName, which uses a type alias,
we would correctly obfuscate the name "SameName",
but we would incorrectly obfuscate it with the package p2, not p1.
This would then result in build errors.
To fix this problem for good, we need to get rid of the heuristic.
Instead, we now mimic what was done for KnownCannotObfuscate,
but for embedded fields which use type aliases.
KnownEmbeddedAliasFields is now filled for each package
and stored in the cache as part of cachedOutput.
We can then detect the "embedded alias" case reliably,
even when the field is declared in an imported package.
On the plus side, we get to remove locateForeignAlias.
We also add a couple of TODOs to record further improvements.
Finally, add a test.
Fixes #466.
3 years ago
|
|
|
|
func recordedObjectString(obj types.Object) objectString {
|
stop loading obfuscated type information from deps
If package P1 imports package P2, P1 needs to know which names from P2
weren't obfuscated. For instance, if P2 declares T2 and does
"reflect.TypeOf(T2{...})", then P2 won't obfuscate the name T2, and
neither should P1.
This information should flow from P2 to P1, as P2 builds before
P1. We do this via obfuscatedTypesPackage; P1 loads the type information
of the obfuscated version of P2, and does a lookup for T2. If T2 exists,
then it wasn't obfuscated.
This mechanism has served us well, but it has downsides:
1) It wastes CPU; we load the type information for the entire package.
2) It's complex; for instance, we need KnownObjectFiles as an extra.
3) It makes our code harder to understand, as we load both the original
and obfuscated type informaiton.
Instead, we now have each package record what names were not obfuscated
as part of its cachedOuput file. Much like KnownObjectFiles, the map
records incrementally through the import graph, to avoid having to load
cachedOutput files for indirect dependencies.
We shouldn't need to worry about those maps getting large;
we only skip obfuscating declared names in a few uncommon scenarios,
such as the use of reflection or cgo's "//export".
Since go/types is relatively allocation-heavy, and the export files
contain a lot of data, we get a nice speed-up:
name old time/op new time/op delta
Build-16 11.5s ± 2% 11.1s ± 3% -3.77% (p=0.008 n=5+5)
name old bin-B new bin-B delta
Build-16 5.15M ± 0% 5.15M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 375ms ± 3% 341ms ± 6% -8.96% (p=0.008 n=5+5)
name old sys-time/op new sys-time/op delta
Build-16 283ms ±17% 289ms ±13% ~ (p=0.841 n=5+5)
name old user-time/op new user-time/op delta
Build-16 687ms ± 6% 664ms ± 7% ~ (p=0.548 n=5+5)
Fixes #456.
Updates #475.
3 years ago
|
|
|
|
if obj, ok := obj.(*types.Var); ok && obj.IsField() {
|
|
|
|
|
// For exported fields, "pkgpath.Field" is not unique,
|
|
|
|
|
// because two exported top-level types could share "Field".
|
|
|
|
|
//
|
|
|
|
|
// Moreover, note that not all fields belong to named struct types;
|
|
|
|
|
// an API could be exposing:
|
|
|
|
|
//
|
|
|
|
|
// var usedInReflection = struct{Field string}
|
|
|
|
|
//
|
|
|
|
|
// For now, a hack: assume that packages don't declare the same field
|
|
|
|
|
// more than once in the same line. This works in practice, but one
|
|
|
|
|
// could craft Go code to break this assumption.
|
|
|
|
|
// Also note that the compiler's object files include filenames and line
|
|
|
|
|
// numbers, but not column numbers nor byte offsets.
|
|
|
|
|
// TODO(mvdan): give this another think, and add tests involving anon types.
|
|
|
|
|
pos := fset.Position(obj.Pos())
|
|
|
|
|
return fmt.Sprintf("%s.%s - %s:%d", obj.Pkg().Path(), obj.Name(),
|
|
|
|
|
filepath.Base(pos.Filename), pos.Line)
|
|
|
|
|
}
|
|
|
|
|
// Names which are not at the top level cannot be imported,
|
|
|
|
|
// so we don't need to record them either.
|
|
|
|
|
// Note that this doesn't apply to fields, which are never top-level.
|
|
|
|
|
if obj.Pkg().Scope().Lookup(obj.Name()) != obj {
|
|
|
|
|
return ""
|
|
|
|
|
}
|
|
|
|
|
// For top-level exported names, "pkgpath.Name" is unique.
|
|
|
|
|
return fmt.Sprintf("%s.%s", obj.Pkg().Path(), obj.Name())
|
|
|
|
|
}
|
|
|
|
|
|
clarify how each "cannot obfuscate" map works
We used to record all objects in cannotObfuscateNames,
and then we'd add the exported ones to KnownCannotObfuscate.
Instead, teach recordAsNotObfuscated to store each object in either
knownCannotObfuscateUnexported or KnownCannotObfuscate, but not both.
The former isn't cached so it uses in-memory pointers as keys,
and the latter uses the cross-process objectStrings like before.
Functionally, this is all the same, but with the difference that the map
indexed by types.Object will not contain objects already recorded in
KnownCannotObfuscate, reducing the amount of duplicate memory use.
While here, give recordIgnore a less ambiguous name,
and remove the second parameter as it was always tf.pkg.Path().
This also means we can compare *types.Package pointers directly.
Finally, add more TODOs for further improvement ideas.
It does mean that we end up with more TODOs than before,
even though I'm fixing one, but I reckon that's a good thing.
Recording these ideas can give first-time contributors ways to help,
and it ensures I don't forget about ideas just in my head.
3 years ago
|
|
|
|
// recordAsNotObfuscated records all the objects whose names we cannot obfuscate.
|
|
|
|
|
// An object is any named entity, such as a declared variable or type.
|
|
|
|
|
//
|
|
|
|
|
// As of June 2022, this only records types which are used in reflection.
|
|
|
|
|
// TODO(mvdan): If this is still the case in a year's time,
|
|
|
|
|
// we should probably rename "not obfuscated" and "cannot obfuscate" to be
|
|
|
|
|
// directly about reflection, e.g. "used in reflection".
|
clarify how each "cannot obfuscate" map works
We used to record all objects in cannotObfuscateNames,
and then we'd add the exported ones to KnownCannotObfuscate.
Instead, teach recordAsNotObfuscated to store each object in either
knownCannotObfuscateUnexported or KnownCannotObfuscate, but not both.
The former isn't cached so it uses in-memory pointers as keys,
and the latter uses the cross-process objectStrings like before.
Functionally, this is all the same, but with the difference that the map
indexed by types.Object will not contain objects already recorded in
KnownCannotObfuscate, reducing the amount of duplicate memory use.
While here, give recordIgnore a less ambiguous name,
and remove the second parameter as it was always tf.pkg.Path().
This also means we can compare *types.Package pointers directly.
Finally, add more TODOs for further improvement ideas.
It does mean that we end up with more TODOs than before,
even though I'm fixing one, but I reckon that's a good thing.
Recording these ideas can give first-time contributors ways to help,
and it ensures I don't forget about ideas just in my head.
3 years ago
|
|
|
|
func recordAsNotObfuscated(obj types.Object) {
|
stop loading obfuscated type information from deps
If package P1 imports package P2, P1 needs to know which names from P2
weren't obfuscated. For instance, if P2 declares T2 and does
"reflect.TypeOf(T2{...})", then P2 won't obfuscate the name T2, and
neither should P1.
This information should flow from P2 to P1, as P2 builds before
P1. We do this via obfuscatedTypesPackage; P1 loads the type information
of the obfuscated version of P2, and does a lookup for T2. If T2 exists,
then it wasn't obfuscated.
This mechanism has served us well, but it has downsides:
1) It wastes CPU; we load the type information for the entire package.
2) It's complex; for instance, we need KnownObjectFiles as an extra.
3) It makes our code harder to understand, as we load both the original
and obfuscated type informaiton.
Instead, we now have each package record what names were not obfuscated
as part of its cachedOuput file. Much like KnownObjectFiles, the map
records incrementally through the import graph, to avoid having to load
cachedOutput files for indirect dependencies.
We shouldn't need to worry about those maps getting large;
we only skip obfuscating declared names in a few uncommon scenarios,
such as the use of reflection or cgo's "//export".
Since go/types is relatively allocation-heavy, and the export files
contain a lot of data, we get a nice speed-up:
name old time/op new time/op delta
Build-16 11.5s ± 2% 11.1s ± 3% -3.77% (p=0.008 n=5+5)
name old bin-B new bin-B delta
Build-16 5.15M ± 0% 5.15M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 375ms ± 3% 341ms ± 6% -8.96% (p=0.008 n=5+5)
name old sys-time/op new sys-time/op delta
Build-16 283ms ±17% 289ms ±13% ~ (p=0.841 n=5+5)
name old user-time/op new user-time/op delta
Build-16 687ms ± 6% 664ms ± 7% ~ (p=0.548 n=5+5)
Fixes #456.
Updates #475.
3 years ago
|
|
|
|
if obj.Pkg().Path() != curPkg.ImportPath {
|
|
|
|
|
panic("called recordedAsNotObfuscated with a foreign object")
|
|
|
|
|
}
|
|
|
|
|
if !obj.Exported() {
|
|
|
|
|
// Unexported names will never be used by other packages,
|
clarify how each "cannot obfuscate" map works
We used to record all objects in cannotObfuscateNames,
and then we'd add the exported ones to KnownCannotObfuscate.
Instead, teach recordAsNotObfuscated to store each object in either
knownCannotObfuscateUnexported or KnownCannotObfuscate, but not both.
The former isn't cached so it uses in-memory pointers as keys,
and the latter uses the cross-process objectStrings like before.
Functionally, this is all the same, but with the difference that the map
indexed by types.Object will not contain objects already recorded in
KnownCannotObfuscate, reducing the amount of duplicate memory use.
While here, give recordIgnore a less ambiguous name,
and remove the second parameter as it was always tf.pkg.Path().
This also means we can compare *types.Package pointers directly.
Finally, add more TODOs for further improvement ideas.
It does mean that we end up with more TODOs than before,
even though I'm fixing one, but I reckon that's a good thing.
Recording these ideas can give first-time contributors ways to help,
and it ensures I don't forget about ideas just in my head.
3 years ago
|
|
|
|
// so we don't need to bother recording them in cachedOutput.
|
|
|
|
|
knownCannotObfuscateUnexported[obj] = true
|
|
|
|
|
return
|
stop loading obfuscated type information from deps
If package P1 imports package P2, P1 needs to know which names from P2
weren't obfuscated. For instance, if P2 declares T2 and does
"reflect.TypeOf(T2{...})", then P2 won't obfuscate the name T2, and
neither should P1.
This information should flow from P2 to P1, as P2 builds before
P1. We do this via obfuscatedTypesPackage; P1 loads the type information
of the obfuscated version of P2, and does a lookup for T2. If T2 exists,
then it wasn't obfuscated.
This mechanism has served us well, but it has downsides:
1) It wastes CPU; we load the type information for the entire package.
2) It's complex; for instance, we need KnownObjectFiles as an extra.
3) It makes our code harder to understand, as we load both the original
and obfuscated type informaiton.
Instead, we now have each package record what names were not obfuscated
as part of its cachedOuput file. Much like KnownObjectFiles, the map
records incrementally through the import graph, to avoid having to load
cachedOutput files for indirect dependencies.
We shouldn't need to worry about those maps getting large;
we only skip obfuscating declared names in a few uncommon scenarios,
such as the use of reflection or cgo's "//export".
Since go/types is relatively allocation-heavy, and the export files
contain a lot of data, we get a nice speed-up:
name old time/op new time/op delta
Build-16 11.5s ± 2% 11.1s ± 3% -3.77% (p=0.008 n=5+5)
name old bin-B new bin-B delta
Build-16 5.15M ± 0% 5.15M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 375ms ± 3% 341ms ± 6% -8.96% (p=0.008 n=5+5)
name old sys-time/op new sys-time/op delta
Build-16 283ms ±17% 289ms ±13% ~ (p=0.841 n=5+5)
name old user-time/op new user-time/op delta
Build-16 687ms ± 6% 664ms ± 7% ~ (p=0.548 n=5+5)
Fixes #456.
Updates #475.
3 years ago
|
|
|
|
}
|
|
|
|
|
|
clarify how each "cannot obfuscate" map works
We used to record all objects in cannotObfuscateNames,
and then we'd add the exported ones to KnownCannotObfuscate.
Instead, teach recordAsNotObfuscated to store each object in either
knownCannotObfuscateUnexported or KnownCannotObfuscate, but not both.
The former isn't cached so it uses in-memory pointers as keys,
and the latter uses the cross-process objectStrings like before.
Functionally, this is all the same, but with the difference that the map
indexed by types.Object will not contain objects already recorded in
KnownCannotObfuscate, reducing the amount of duplicate memory use.
While here, give recordIgnore a less ambiguous name,
and remove the second parameter as it was always tf.pkg.Path().
This also means we can compare *types.Package pointers directly.
Finally, add more TODOs for further improvement ideas.
It does mean that we end up with more TODOs than before,
even though I'm fixing one, but I reckon that's a good thing.
Recording these ideas can give first-time contributors ways to help,
and it ensures I don't forget about ideas just in my head.
3 years ago
|
|
|
|
objStr := recordedObjectString(obj)
|
|
|
|
|
if objStr == "" {
|
|
|
|
|
// If the object can't be described via a qualified string,
|
|
|
|
|
// then other packages can't use it.
|
|
|
|
|
// TODO: should we still record it in knownCannotObfuscateUnexported?
|
|
|
|
|
return
|
stop loading obfuscated type information from deps
If package P1 imports package P2, P1 needs to know which names from P2
weren't obfuscated. For instance, if P2 declares T2 and does
"reflect.TypeOf(T2{...})", then P2 won't obfuscate the name T2, and
neither should P1.
This information should flow from P2 to P1, as P2 builds before
P1. We do this via obfuscatedTypesPackage; P1 loads the type information
of the obfuscated version of P2, and does a lookup for T2. If T2 exists,
then it wasn't obfuscated.
This mechanism has served us well, but it has downsides:
1) It wastes CPU; we load the type information for the entire package.
2) It's complex; for instance, we need KnownObjectFiles as an extra.
3) It makes our code harder to understand, as we load both the original
and obfuscated type informaiton.
Instead, we now have each package record what names were not obfuscated
as part of its cachedOuput file. Much like KnownObjectFiles, the map
records incrementally through the import graph, to avoid having to load
cachedOutput files for indirect dependencies.
We shouldn't need to worry about those maps getting large;
we only skip obfuscating declared names in a few uncommon scenarios,
such as the use of reflection or cgo's "//export".
Since go/types is relatively allocation-heavy, and the export files
contain a lot of data, we get a nice speed-up:
name old time/op new time/op delta
Build-16 11.5s ± 2% 11.1s ± 3% -3.77% (p=0.008 n=5+5)
name old bin-B new bin-B delta
Build-16 5.15M ± 0% 5.15M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 375ms ± 3% 341ms ± 6% -8.96% (p=0.008 n=5+5)
name old sys-time/op new sys-time/op delta
Build-16 283ms ±17% 289ms ±13% ~ (p=0.841 n=5+5)
name old user-time/op new user-time/op delta
Build-16 687ms ± 6% 664ms ± 7% ~ (p=0.548 n=5+5)
Fixes #456.
Updates #475.
3 years ago
|
|
|
|
}
|
clarify how each "cannot obfuscate" map works
We used to record all objects in cannotObfuscateNames,
and then we'd add the exported ones to KnownCannotObfuscate.
Instead, teach recordAsNotObfuscated to store each object in either
knownCannotObfuscateUnexported or KnownCannotObfuscate, but not both.
The former isn't cached so it uses in-memory pointers as keys,
and the latter uses the cross-process objectStrings like before.
Functionally, this is all the same, but with the difference that the map
indexed by types.Object will not contain objects already recorded in
KnownCannotObfuscate, reducing the amount of duplicate memory use.
While here, give recordIgnore a less ambiguous name,
and remove the second parameter as it was always tf.pkg.Path().
This also means we can compare *types.Package pointers directly.
Finally, add more TODOs for further improvement ideas.
It does mean that we end up with more TODOs than before,
even though I'm fixing one, but I reckon that's a good thing.
Recording these ideas can give first-time contributors ways to help,
and it ensures I don't forget about ideas just in my head.
3 years ago
|
|
|
|
cachedOutput.KnownCannotObfuscate[objStr] = struct{}{}
|
stop loading obfuscated type information from deps
If package P1 imports package P2, P1 needs to know which names from P2
weren't obfuscated. For instance, if P2 declares T2 and does
"reflect.TypeOf(T2{...})", then P2 won't obfuscate the name T2, and
neither should P1.
This information should flow from P2 to P1, as P2 builds before
P1. We do this via obfuscatedTypesPackage; P1 loads the type information
of the obfuscated version of P2, and does a lookup for T2. If T2 exists,
then it wasn't obfuscated.
This mechanism has served us well, but it has downsides:
1) It wastes CPU; we load the type information for the entire package.
2) It's complex; for instance, we need KnownObjectFiles as an extra.
3) It makes our code harder to understand, as we load both the original
and obfuscated type informaiton.
Instead, we now have each package record what names were not obfuscated
as part of its cachedOuput file. Much like KnownObjectFiles, the map
records incrementally through the import graph, to avoid having to load
cachedOutput files for indirect dependencies.
We shouldn't need to worry about those maps getting large;
we only skip obfuscating declared names in a few uncommon scenarios,
such as the use of reflection or cgo's "//export".
Since go/types is relatively allocation-heavy, and the export files
contain a lot of data, we get a nice speed-up:
name old time/op new time/op delta
Build-16 11.5s ± 2% 11.1s ± 3% -3.77% (p=0.008 n=5+5)
name old bin-B new bin-B delta
Build-16 5.15M ± 0% 5.15M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 375ms ± 3% 341ms ± 6% -8.96% (p=0.008 n=5+5)
name old sys-time/op new sys-time/op delta
Build-16 283ms ±17% 289ms ±13% ~ (p=0.841 n=5+5)
name old user-time/op new user-time/op delta
Build-16 687ms ± 6% 664ms ± 7% ~ (p=0.548 n=5+5)
Fixes #456.
Updates #475.
3 years ago
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
func recordedAsNotObfuscated(obj types.Object) bool {
|
clarify how each "cannot obfuscate" map works
We used to record all objects in cannotObfuscateNames,
and then we'd add the exported ones to KnownCannotObfuscate.
Instead, teach recordAsNotObfuscated to store each object in either
knownCannotObfuscateUnexported or KnownCannotObfuscate, but not both.
The former isn't cached so it uses in-memory pointers as keys,
and the latter uses the cross-process objectStrings like before.
Functionally, this is all the same, but with the difference that the map
indexed by types.Object will not contain objects already recorded in
KnownCannotObfuscate, reducing the amount of duplicate memory use.
While here, give recordIgnore a less ambiguous name,
and remove the second parameter as it was always tf.pkg.Path().
This also means we can compare *types.Package pointers directly.
Finally, add more TODOs for further improvement ideas.
It does mean that we end up with more TODOs than before,
even though I'm fixing one, but I reckon that's a good thing.
Recording these ideas can give first-time contributors ways to help,
and it ensures I don't forget about ideas just in my head.
3 years ago
|
|
|
|
if knownCannotObfuscateUnexported[obj] {
|
|
|
|
|
return true
|
stop loading obfuscated type information from deps
If package P1 imports package P2, P1 needs to know which names from P2
weren't obfuscated. For instance, if P2 declares T2 and does
"reflect.TypeOf(T2{...})", then P2 won't obfuscate the name T2, and
neither should P1.
This information should flow from P2 to P1, as P2 builds before
P1. We do this via obfuscatedTypesPackage; P1 loads the type information
of the obfuscated version of P2, and does a lookup for T2. If T2 exists,
then it wasn't obfuscated.
This mechanism has served us well, but it has downsides:
1) It wastes CPU; we load the type information for the entire package.
2) It's complex; for instance, we need KnownObjectFiles as an extra.
3) It makes our code harder to understand, as we load both the original
and obfuscated type informaiton.
Instead, we now have each package record what names were not obfuscated
as part of its cachedOuput file. Much like KnownObjectFiles, the map
records incrementally through the import graph, to avoid having to load
cachedOutput files for indirect dependencies.
We shouldn't need to worry about those maps getting large;
we only skip obfuscating declared names in a few uncommon scenarios,
such as the use of reflection or cgo's "//export".
Since go/types is relatively allocation-heavy, and the export files
contain a lot of data, we get a nice speed-up:
name old time/op new time/op delta
Build-16 11.5s ± 2% 11.1s ± 3% -3.77% (p=0.008 n=5+5)
name old bin-B new bin-B delta
Build-16 5.15M ± 0% 5.15M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 375ms ± 3% 341ms ± 6% -8.96% (p=0.008 n=5+5)
name old sys-time/op new sys-time/op delta
Build-16 283ms ±17% 289ms ±13% ~ (p=0.841 n=5+5)
name old user-time/op new user-time/op delta
Build-16 687ms ± 6% 664ms ± 7% ~ (p=0.548 n=5+5)
Fixes #456.
Updates #475.
3 years ago
|
|
|
|
}
|
|
|
|
|
objStr := recordedObjectString(obj)
|
|
|
|
|
if objStr == "" {
|
|
|
|
|
return false
|
|
|
|
|
}
|
|
|
|
|
_, ok := cachedOutput.KnownCannotObfuscate[objStr]
|
|
|
|
|
return ok
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
func (tf *transformer) removeUnnecessaryImports(file *ast.File) {
|
|
|
|
|
usedImports := make(map[string]bool)
|
|
|
|
|
ast.Inspect(file, func(n ast.Node) bool {
|
|
|
|
|
node, ok := n.(*ast.Ident)
|
|
|
|
|
if !ok {
|
|
|
|
|
return true
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
uses, ok := tf.info.Uses[node].(*types.PkgName)
|
|
|
|
|
if !ok {
|
|
|
|
|
return true
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
usedImports[uses.Imported().Path()] = true
|
|
|
|
|
|
|
|
|
|
return true
|
|
|
|
|
})
|
|
|
|
|
|
|
|
|
|
for _, imp := range file.Imports {
|
|
|
|
|
if imp.Name != nil && (imp.Name.Name == "_" || imp.Name.Name == ".") {
|
|
|
|
|
continue
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
path, err := strconv.Unquote(imp.Path.Value)
|
|
|
|
|
if err != nil {
|
|
|
|
|
panic(err)
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// The import path can't be used directly here, because the actual
|
|
|
|
|
// path resolved via go/types might be different from the naive path.
|
|
|
|
|
lpkg, err := listPackage(path)
|
|
|
|
|
if err != nil {
|
|
|
|
|
panic(err)
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if usedImports[lpkg.ImportPath] {
|
|
|
|
|
continue
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
imp.Name = ast.NewIdent("_")
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// transformGo obfuscates the provided Go syntax file.
|
|
|
|
|
func (tf *transformer) transformGo(file *ast.File) *ast.File {
|
|
|
|
|
// Only obfuscate the literals here if the flag is on
|
|
|
|
|
// and if the package in question is to be obfuscated.
|
support GOGARBLE=* with -literals again
We recently made an important change when obfuscating the runtime,
so that if it's missing any linkname packages in ListedPackages,
it does an extra "go list" call to obtain their information.
This works very well, but we missed an edge case.
In main.go, we disable flagLiterals for the runtime package,
but not for other packages like sync/atomic.
And, since the runtime's extra "go list" has to compute GarbleActionIDs,
it uses the list of garble flags via appendFlags.
Unfortunately, it thinks "-literals" isn't set, when it is,
and the other packages see it as being set.
This discrepancy results in link time errors,
as each end of the linkname obfuscates with a different hash:
> garble -literals build
[stderr]
# test/main
jccGkbFG.(*yijmzGHo).String: relocation target jccGkbFG.e_77sflf not defined
jQg9GEkg.(*NLxfRPAP).pB5p2ZP0: relocation target jQg9GEkg.ce66Fmzl not defined
jQg9GEkg.(*NLxfRPAP).pB5p2ZP0: relocation target jQg9GEkg.e5kPa1qY not defined
jQg9GEkg.(*NLxfRPAP).pB5p2ZP0: relocation target jQg9GEkg.aQ_3sL3Q not defined
jQg9GEkg.(*NLxfRPAP).pB5p2ZP0: relocation target jQg9GEkg.zls3wmws not defined
jQg9GEkg.(*NLxfRPAP).pB5p2ZP0: relocation target jQg9GEkg.g69WgKIS not defined
To fix the problem, treat flagLiterals as read-only after flag.Parse,
just like we already do with the other flags except flagDebugDir.
The code that turned flagLiterals to false is no longer needed,
as literals.Obfuscate is only called when ToObfuscate is true,
and ToObfuscate is false for runtimeAndDeps already.
3 years ago
|
|
|
|
//
|
|
|
|
|
// We can't obfuscate literals in the runtime and its dependencies,
|
|
|
|
|
// because obfuscated literals sometimes escape to heap,
|
|
|
|
|
// and that's not allowed in the runtime itself.
|
unify the definition and storage of flag values
The parent garble process parses the original flags,
as provided by the user via the command line.
Previously, those got stored in the shared cache file,
so that child processes spawned by toolexec could see them.
Unfortunately, this made the code relatively easy to misuse.
A child process would always see flagLiterals as zero value,
given that it should never see such a flag argument directly.
Similarly, one would have to be careful with cached options,
as they could only be consumed after the cache file is loaded.
Simplify the situation by deduplicating the storage of flags.
Now, the parent passes all flags onto children via toolexec.
One exception is GarbleDir, which now becomes an env var.
This seems in line with other top-level dirs like GARBLE_SHARED.
Finally, we turn -seed into a flag.Value,
which lets us implement its "set" behavior as part of flag.Parse.
Overall, we barely reduce the amount of code involved,
but we certainly remove a couple of footguns.
As part of the cleanup, we also introduce appendFlags.
3 years ago
|
|
|
|
if flagLiterals && curPkg.ToObfuscate {
|
avoid obfuscating literals set via -ldflags=-X
The -X linker flag sets a string variable to a given value,
which is often used to inject strings such as versions.
The way garble's literal obfuscation works,
we replace string literals with anonymous functions which,
when evaluated, result in the original string.
Both of these features work fine separately,
but when intersecting, they break. For example, given:
var myVar = "original"
[...]
-ldflags=-X=main.myVar=replaced
The -X flag effectively replaces the initial value,
and -literals adds code to be run at init time:
var myVar = "replaced"
func init() { myVar = func() string { ... } }
Since the init func runs later, -literals breaks -X.
To avoid that problem,
don't obfuscate literals whose variables are set via -ldflags=-X.
We also leave TODOs about obfuscating those in the future,
but we're also leaving regression tests to ensure we get it right.
Fixes #323.
3 years ago
|
|
|
|
file = literals.Obfuscate(file, tf.info, fset, tf.linkerVariableStrings)
|
|
|
|
|
|
|
|
|
|
// some imported constants might not be needed anymore, remove unnecessary imports
|
|
|
|
|
tf.removeUnnecessaryImports(file)
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
pre := func(cursor *astutil.Cursor) bool {
|
|
|
|
|
node, ok := cursor.Node().(*ast.Ident)
|
|
|
|
|
if !ok {
|
|
|
|
|
return true
|
|
|
|
|
}
|
|
|
|
|
name := node.Name
|
|
|
|
|
if name == "_" {
|
|
|
|
|
return true // unnamed remains unnamed
|
|
|
|
|
}
|
|
|
|
|
obj := tf.info.ObjectOf(node)
|
|
|
|
|
if obj == nil {
|
obfuscate all variable names, even local ones (#420)
In the added test case, "garble -literals build" would fail:
--- FAIL: TestScripts/literals (8.29s)
testscript.go:397:
> env GOPRIVATE=test/main
> garble -literals build
[stderr]
# test/main
Usz1FmFm.go:1: cannot call non-function string (type int), declared at Usz1FmFm.go:1
Usz1FmFm.go:1: string is not a type
Usz1FmFm.go:1: cannot call non-function append (type int), declared at Usz1FmFm.go:1
That is, for input code such as:
var append int
println("foo")
_ = append
We'd end up with obfuscated code like:
var append int
println(func() string {
// obfuscation...
x = append(x, ...)
// obfuscation...
return string(x)
})
_ = append
Which would then break, as the code is shadowing the "append" builtin.
To work around this, always obfuscate variable names, so we end up with:
var mwu1xuNz int
println(func() string {
// obfuscation...
x = append(x, ...)
// obfuscation...
return string(x)
})
_ = mwu1xuNz
This change shouldn't make the quality of our obfuscation stronger,
as local variable names do not currently end up in Go binaries.
However, this does make garble more consistent in treating identifiers,
and it completely avoids any issues related to shadowing builtins.
Moreover, this also paves the way for publishing obfuscated source code,
such as #369.
Fixes #417.
3 years ago
|
|
|
|
_, isImplicit := tf.info.Defs[node]
|
|
|
|
|
_, parentIsFile := cursor.Parent().(*ast.File)
|
|
|
|
|
if !isImplicit || parentIsFile {
|
|
|
|
|
// We only care about nil objects in the switch scenario below.
|
obfuscate all variable names, even local ones (#420)
In the added test case, "garble -literals build" would fail:
--- FAIL: TestScripts/literals (8.29s)
testscript.go:397:
> env GOPRIVATE=test/main
> garble -literals build
[stderr]
# test/main
Usz1FmFm.go:1: cannot call non-function string (type int), declared at Usz1FmFm.go:1
Usz1FmFm.go:1: string is not a type
Usz1FmFm.go:1: cannot call non-function append (type int), declared at Usz1FmFm.go:1
That is, for input code such as:
var append int
println("foo")
_ = append
We'd end up with obfuscated code like:
var append int
println(func() string {
// obfuscation...
x = append(x, ...)
// obfuscation...
return string(x)
})
_ = append
Which would then break, as the code is shadowing the "append" builtin.
To work around this, always obfuscate variable names, so we end up with:
var mwu1xuNz int
println(func() string {
// obfuscation...
x = append(x, ...)
// obfuscation...
return string(x)
})
_ = mwu1xuNz
This change shouldn't make the quality of our obfuscation stronger,
as local variable names do not currently end up in Go binaries.
However, this does make garble more consistent in treating identifiers,
and it completely avoids any issues related to shadowing builtins.
Moreover, this also paves the way for publishing obfuscated source code,
such as #369.
Fixes #417.
3 years ago
|
|
|
|
return true
|
|
|
|
|
}
|
|
|
|
|
// In a type switch like "switch foo := bar.(type) {",
|
|
|
|
|
// "foo" is being declared as a symbolic variable,
|
|
|
|
|
// as it is only actually declared in each "case SomeType:".
|
|
|
|
|
//
|
|
|
|
|
// As such, the symbolic "foo" in the syntax tree has no object,
|
|
|
|
|
// but it is still recorded under Defs with a nil value.
|
|
|
|
|
// We still want to obfuscate that syntax tree identifier,
|
|
|
|
|
// so if we detect the case, create a dummy types.Var for it.
|
|
|
|
|
//
|
|
|
|
|
// Note that "package mypkg" also denotes a nil object in Defs,
|
|
|
|
|
// and we don't want to treat that "mypkg" as a variable,
|
|
|
|
|
// so avoid that case by checking the type of cursor.Parent.
|
|
|
|
|
obj = types.NewVar(node.Pos(), tf.pkg, name, nil)
|
|
|
|
|
}
|
|
|
|
|
pkg := obj.Pkg()
|
|
|
|
|
if vr, ok := obj.(*types.Var); ok && vr.Embedded() {
|
obfuscate alias names like any other objects
Before this change, we would try to never obfuscate alias names. That
was far from ideal, as they can end up in field names via anonymous
fields.
Even then, we would sometimes still fail to build, because we would
inconsistently obfuscate alias names. For example, in the added test
case:
--- FAIL: TestScripts/syntax (0.23s)
testscript.go:397:
> env GOPRIVATE='test/main,private.source'
> garble build
[stderr]
# test/main/sub
Lv_a8gRD.go:15: undefined: KCvSpxmQ
To fix this problem, we set obj to be the TypeName corresponding to the
alias when it is used as an embedded field. We can then make the right
choice when obfuscating the name.
Right now, all aliases will be obfuscated. A TODO exists about not
obfuscating alias names when they're used as embedded fields in a struct
type in the same package, and that package is used for reflection -
since then, the alias name ends up as the field name.
With these changes, the protobuf module now builds.
4 years ago
|
|
|
|
// The docs for ObjectOf say:
|
|
|
|
|
//
|
|
|
|
|
// If id is an embedded struct field, ObjectOf returns the
|
|
|
|
|
// field (*Var) it defines, not the type (*TypeName) it uses.
|
|
|
|
|
//
|
|
|
|
|
// If this embedded field is a type alias, we want to
|
|
|
|
|
// handle the alias's TypeName instead of treating it as
|
|
|
|
|
// the type the alias points to.
|
obfuscate alias names like any other objects
Before this change, we would try to never obfuscate alias names. That
was far from ideal, as they can end up in field names via anonymous
fields.
Even then, we would sometimes still fail to build, because we would
inconsistently obfuscate alias names. For example, in the added test
case:
--- FAIL: TestScripts/syntax (0.23s)
testscript.go:397:
> env GOPRIVATE='test/main,private.source'
> garble build
[stderr]
# test/main/sub
Lv_a8gRD.go:15: undefined: KCvSpxmQ
To fix this problem, we set obj to be the TypeName corresponding to the
alias when it is used as an embedded field. We can then make the right
choice when obfuscating the name.
Right now, all aliases will be obfuscated. A TODO exists about not
obfuscating alias names when they're used as embedded fields in a struct
type in the same package, and that package is used for reflection -
since then, the alias name ends up as the field name.
With these changes, the protobuf module now builds.
4 years ago
|
|
|
|
//
|
properly record when type aliases are embedded as fields
There are two scenarios when it comes to embedding fields.
The first is easy, and we always handled it well:
type Named struct { Foo int }
type T struct { Named }
In this scenario, T ends up with an embedded field named "Named",
and a promoted field named "Foo".
Then there's the form with a type alias:
type Named struct { Foo int }
type Alias = Named
type T struct { Alias }
This case is different: T ends up with an embedded field named "Alias",
and a promoted field named "Foo".
Note how the field gets its name from the referenced type,
even if said type is just an alias to another type.
This poses two problems.
First, we must obfuscate the field T.Alias as the name "Alias",
and not as the name "Named" that the alias points to.
Second, we must be careful of cases where Named and Alias are declared
in different packages, as they will obfuscate the same name differently.
Both of those problems compounded in the reported issue.
The actual reason is that quic-go has a type alias in the form of:
type ConnectionState = qtls.ConnectionState
In other words, the entire problem boils down to a type alias which
points to a named type in a different package, where both types share
the same name. For example:
package parent
import "parent/p1"
type T struct { p1.SameName }
[...]
package p1
import "parent/p2"
type SameName = p2.SameName
[...]
package p2
type SameName struct { Foo int }
This broke garble because we had a heuristic to detect when an embedded
field was a type alias:
// Instead, detect such a "foreign alias embed".
// If we embed a final named type,
// but the field name does not match its name,
// then it must have been done via an alias.
// We dig out the alias's TypeName via locateForeignAlias.
if named.Obj().Name() != node.Name {
As the reader can deduce, this heuristic would incorrectly assume that
the snippet above does not embed a type alias, when in fact it does.
When obfuscating the field T.SameName, which uses a type alias,
we would correctly obfuscate the name "SameName",
but we would incorrectly obfuscate it with the package p2, not p1.
This would then result in build errors.
To fix this problem for good, we need to get rid of the heuristic.
Instead, we now mimic what was done for KnownCannotObfuscate,
but for embedded fields which use type aliases.
KnownEmbeddedAliasFields is now filled for each package
and stored in the cache as part of cachedOutput.
We can then detect the "embedded alias" case reliably,
even when the field is declared in an imported package.
On the plus side, we get to remove locateForeignAlias.
We also add a couple of TODOs to record further improvements.
Finally, add a test.
Fixes #466.
3 years ago
|
|
|
|
// Alternatively, if we don't have an alias, we still want to
|
obfuscate alias names like any other objects
Before this change, we would try to never obfuscate alias names. That
was far from ideal, as they can end up in field names via anonymous
fields.
Even then, we would sometimes still fail to build, because we would
inconsistently obfuscate alias names. For example, in the added test
case:
--- FAIL: TestScripts/syntax (0.23s)
testscript.go:397:
> env GOPRIVATE='test/main,private.source'
> garble build
[stderr]
# test/main/sub
Lv_a8gRD.go:15: undefined: KCvSpxmQ
To fix this problem, we set obj to be the TypeName corresponding to the
alias when it is used as an embedded field. We can then make the right
choice when obfuscating the name.
Right now, all aliases will be obfuscated. A TODO exists about not
obfuscating alias names when they're used as embedded fields in a struct
type in the same package, and that package is used for reflection -
since then, the alias name ends up as the field name.
With these changes, the protobuf module now builds.
4 years ago
|
|
|
|
// use the embedded type, not the field.
|
properly record when type aliases are embedded as fields
There are two scenarios when it comes to embedding fields.
The first is easy, and we always handled it well:
type Named struct { Foo int }
type T struct { Named }
In this scenario, T ends up with an embedded field named "Named",
and a promoted field named "Foo".
Then there's the form with a type alias:
type Named struct { Foo int }
type Alias = Named
type T struct { Alias }
This case is different: T ends up with an embedded field named "Alias",
and a promoted field named "Foo".
Note how the field gets its name from the referenced type,
even if said type is just an alias to another type.
This poses two problems.
First, we must obfuscate the field T.Alias as the name "Alias",
and not as the name "Named" that the alias points to.
Second, we must be careful of cases where Named and Alias are declared
in different packages, as they will obfuscate the same name differently.
Both of those problems compounded in the reported issue.
The actual reason is that quic-go has a type alias in the form of:
type ConnectionState = qtls.ConnectionState
In other words, the entire problem boils down to a type alias which
points to a named type in a different package, where both types share
the same name. For example:
package parent
import "parent/p1"
type T struct { p1.SameName }
[...]
package p1
import "parent/p2"
type SameName = p2.SameName
[...]
package p2
type SameName struct { Foo int }
This broke garble because we had a heuristic to detect when an embedded
field was a type alias:
// Instead, detect such a "foreign alias embed".
// If we embed a final named type,
// but the field name does not match its name,
// then it must have been done via an alias.
// We dig out the alias's TypeName via locateForeignAlias.
if named.Obj().Name() != node.Name {
As the reader can deduce, this heuristic would incorrectly assume that
the snippet above does not embed a type alias, when in fact it does.
When obfuscating the field T.SameName, which uses a type alias,
we would correctly obfuscate the name "SameName",
but we would incorrectly obfuscate it with the package p2, not p1.
This would then result in build errors.
To fix this problem for good, we need to get rid of the heuristic.
Instead, we now mimic what was done for KnownCannotObfuscate,
but for embedded fields which use type aliases.
KnownEmbeddedAliasFields is now filled for each package
and stored in the cache as part of cachedOutput.
We can then detect the "embedded alias" case reliably,
even when the field is declared in an imported package.
On the plus side, we get to remove locateForeignAlias.
We also add a couple of TODOs to record further improvements.
Finally, add a test.
Fixes #466.
3 years ago
|
|
|
|
vrStr := recordedObjectString(vr)
|
|
|
|
|
aliasTypeName, ok := cachedOutput.KnownEmbeddedAliasFields[vrStr]
|
|
|
|
|
if ok {
|
|
|
|
|
pkg2 := tf.pkg
|
|
|
|
|
if path := aliasTypeName.PkgPath; pkg2.Path() != path {
|
|
|
|
|
// If the package is a dependency, import it.
|
|
|
|
|
// We can't grab the package via tf.pkg.Imports,
|
|
|
|
|
// because some of the packages under there are incomplete.
|
|
|
|
|
// ImportFrom will cache complete imports, anyway.
|
properly record when type aliases are embedded as fields
There are two scenarios when it comes to embedding fields.
The first is easy, and we always handled it well:
type Named struct { Foo int }
type T struct { Named }
In this scenario, T ends up with an embedded field named "Named",
and a promoted field named "Foo".
Then there's the form with a type alias:
type Named struct { Foo int }
type Alias = Named
type T struct { Alias }
This case is different: T ends up with an embedded field named "Alias",
and a promoted field named "Foo".
Note how the field gets its name from the referenced type,
even if said type is just an alias to another type.
This poses two problems.
First, we must obfuscate the field T.Alias as the name "Alias",
and not as the name "Named" that the alias points to.
Second, we must be careful of cases where Named and Alias are declared
in different packages, as they will obfuscate the same name differently.
Both of those problems compounded in the reported issue.
The actual reason is that quic-go has a type alias in the form of:
type ConnectionState = qtls.ConnectionState
In other words, the entire problem boils down to a type alias which
points to a named type in a different package, where both types share
the same name. For example:
package parent
import "parent/p1"
type T struct { p1.SameName }
[...]
package p1
import "parent/p2"
type SameName = p2.SameName
[...]
package p2
type SameName struct { Foo int }
This broke garble because we had a heuristic to detect when an embedded
field was a type alias:
// Instead, detect such a "foreign alias embed".
// If we embed a final named type,
// but the field name does not match its name,
// then it must have been done via an alias.
// We dig out the alias's TypeName via locateForeignAlias.
if named.Obj().Name() != node.Name {
As the reader can deduce, this heuristic would incorrectly assume that
the snippet above does not embed a type alias, when in fact it does.
When obfuscating the field T.SameName, which uses a type alias,
we would correctly obfuscate the name "SameName",
but we would incorrectly obfuscate it with the package p2, not p1.
This would then result in build errors.
To fix this problem for good, we need to get rid of the heuristic.
Instead, we now mimic what was done for KnownCannotObfuscate,
but for embedded fields which use type aliases.
KnownEmbeddedAliasFields is now filled for each package
and stored in the cache as part of cachedOutput.
We can then detect the "embedded alias" case reliably,
even when the field is declared in an imported package.
On the plus side, we get to remove locateForeignAlias.
We also add a couple of TODOs to record further improvements.
Finally, add a test.
Fixes #466.
3 years ago
|
|
|
|
var err error
|
|
|
|
|
pkg2, err = origImporter.ImportFrom(path, parentWorkDir, 0)
|
properly record when type aliases are embedded as fields
There are two scenarios when it comes to embedding fields.
The first is easy, and we always handled it well:
type Named struct { Foo int }
type T struct { Named }
In this scenario, T ends up with an embedded field named "Named",
and a promoted field named "Foo".
Then there's the form with a type alias:
type Named struct { Foo int }
type Alias = Named
type T struct { Alias }
This case is different: T ends up with an embedded field named "Alias",
and a promoted field named "Foo".
Note how the field gets its name from the referenced type,
even if said type is just an alias to another type.
This poses two problems.
First, we must obfuscate the field T.Alias as the name "Alias",
and not as the name "Named" that the alias points to.
Second, we must be careful of cases where Named and Alias are declared
in different packages, as they will obfuscate the same name differently.
Both of those problems compounded in the reported issue.
The actual reason is that quic-go has a type alias in the form of:
type ConnectionState = qtls.ConnectionState
In other words, the entire problem boils down to a type alias which
points to a named type in a different package, where both types share
the same name. For example:
package parent
import "parent/p1"
type T struct { p1.SameName }
[...]
package p1
import "parent/p2"
type SameName = p2.SameName
[...]
package p2
type SameName struct { Foo int }
This broke garble because we had a heuristic to detect when an embedded
field was a type alias:
// Instead, detect such a "foreign alias embed".
// If we embed a final named type,
// but the field name does not match its name,
// then it must have been done via an alias.
// We dig out the alias's TypeName via locateForeignAlias.
if named.Obj().Name() != node.Name {
As the reader can deduce, this heuristic would incorrectly assume that
the snippet above does not embed a type alias, when in fact it does.
When obfuscating the field T.SameName, which uses a type alias,
we would correctly obfuscate the name "SameName",
but we would incorrectly obfuscate it with the package p2, not p1.
This would then result in build errors.
To fix this problem for good, we need to get rid of the heuristic.
Instead, we now mimic what was done for KnownCannotObfuscate,
but for embedded fields which use type aliases.
KnownEmbeddedAliasFields is now filled for each package
and stored in the cache as part of cachedOutput.
We can then detect the "embedded alias" case reliably,
even when the field is declared in an imported package.
On the plus side, we get to remove locateForeignAlias.
We also add a couple of TODOs to record further improvements.
Finally, add a test.
Fixes #466.
3 years ago
|
|
|
|
if err != nil {
|
|
|
|
|
panic(err)
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
tname, ok := pkg2.Scope().Lookup(aliasTypeName.Name).(*types.TypeName)
|
|
|
|
|
if !ok || !tname.IsAlias() {
|
|
|
|
|
if !ok {
|
|
|
|
|
panic(fmt.Sprintf("KnownEmbeddedAliasFields pointed %q to a missing type %q", vrStr, aliasTypeName))
|
|
|
|
|
}
|
|
|
|
|
panic(fmt.Sprintf("KnownEmbeddedAliasFields pointed %q to a non-alias type %q", vrStr, aliasTypeName))
|
handle aliases to foreign named types properly
When such an alias name was used to define an embedded field, we handled
that case gracefully via the code using:
tf.info.Uses[node].(*types.TypeName)
Unfortunately, when the same field name was used elsewhere, such as a
composite literal, tf.Info.Uses gave us a *types.Var, not a
*types.TypeName, meaning we could no longer tell if this was an alias,
or what it pointed to.
Thus, we failed to obfuscate the name properly in the added test case:
> garble build
[stderr]
# test/main/sub
xxWZf66u.go:36: unknown field 'foreignAlias' in struct literal of type smhWelwn
It doesn't seem like any of the go/types APIs allows us to obtain the
*types.TypeName directly in this scenario. Thus, use a trick that we
used before: after typechecking, but before obfuscating, record all
embedded struct field *types.Var which are aliases via a map, where the
value holds the *types.TypeName for the alias.
Updates #349.
4 years ago
|
|
|
|
}
|
obfuscate alias names like any other objects
Before this change, we would try to never obfuscate alias names. That
was far from ideal, as they can end up in field names via anonymous
fields.
Even then, we would sometimes still fail to build, because we would
inconsistently obfuscate alias names. For example, in the added test
case:
--- FAIL: TestScripts/syntax (0.23s)
testscript.go:397:
> env GOPRIVATE='test/main,private.source'
> garble build
[stderr]
# test/main/sub
Lv_a8gRD.go:15: undefined: KCvSpxmQ
To fix this problem, we set obj to be the TypeName corresponding to the
alias when it is used as an embedded field. We can then make the right
choice when obfuscating the name.
Right now, all aliases will be obfuscated. A TODO exists about not
obfuscating alias names when they're used as embedded fields in a struct
type in the same package, and that package is used for reflection -
since then, the alias name ends up as the field name.
With these changes, the protobuf module now builds.
4 years ago
|
|
|
|
obj = tname
|
|
|
|
|
} else {
|
|
|
|
|
named := namedType(obj.Type())
|
|
|
|
|
if named == nil {
|
|
|
|
|
return true // unnamed type (probably a basic type, e.g. int)
|
|
|
|
|
}
|
properly record when type aliases are embedded as fields
There are two scenarios when it comes to embedding fields.
The first is easy, and we always handled it well:
type Named struct { Foo int }
type T struct { Named }
In this scenario, T ends up with an embedded field named "Named",
and a promoted field named "Foo".
Then there's the form with a type alias:
type Named struct { Foo int }
type Alias = Named
type T struct { Alias }
This case is different: T ends up with an embedded field named "Alias",
and a promoted field named "Foo".
Note how the field gets its name from the referenced type,
even if said type is just an alias to another type.
This poses two problems.
First, we must obfuscate the field T.Alias as the name "Alias",
and not as the name "Named" that the alias points to.
Second, we must be careful of cases where Named and Alias are declared
in different packages, as they will obfuscate the same name differently.
Both of those problems compounded in the reported issue.
The actual reason is that quic-go has a type alias in the form of:
type ConnectionState = qtls.ConnectionState
In other words, the entire problem boils down to a type alias which
points to a named type in a different package, where both types share
the same name. For example:
package parent
import "parent/p1"
type T struct { p1.SameName }
[...]
package p1
import "parent/p2"
type SameName = p2.SameName
[...]
package p2
type SameName struct { Foo int }
This broke garble because we had a heuristic to detect when an embedded
field was a type alias:
// Instead, detect such a "foreign alias embed".
// If we embed a final named type,
// but the field name does not match its name,
// then it must have been done via an alias.
// We dig out the alias's TypeName via locateForeignAlias.
if named.Obj().Name() != node.Name {
As the reader can deduce, this heuristic would incorrectly assume that
the snippet above does not embed a type alias, when in fact it does.
When obfuscating the field T.SameName, which uses a type alias,
we would correctly obfuscate the name "SameName",
but we would incorrectly obfuscate it with the package p2, not p1.
This would then result in build errors.
To fix this problem for good, we need to get rid of the heuristic.
Instead, we now mimic what was done for KnownCannotObfuscate,
but for embedded fields which use type aliases.
KnownEmbeddedAliasFields is now filled for each package
and stored in the cache as part of cachedOutput.
We can then detect the "embedded alias" case reliably,
even when the field is declared in an imported package.
On the plus side, we get to remove locateForeignAlias.
We also add a couple of TODOs to record further improvements.
Finally, add a test.
Fixes #466.
3 years ago
|
|
|
|
obj = named.Obj()
|
|
|
|
|
}
|
|
|
|
|
pkg = obj.Pkg()
|
|
|
|
|
}
|
|
|
|
|
if pkg == nil {
|
|
|
|
|
return true // universe scope
|
|
|
|
|
}
|
|
|
|
|
|
avoid reflect method call panics with GOGARBLE=*
We were obfuscating reflect's package path and its declared names,
but the toolchain wants to detect the presence of method reflection
to turn down the aggressiveness of dead code elimination.
Given that the obfuscation broke the detection,
we could easily end up in crashes when making reflect calls:
fatal error: unreachable method called. linker bug?
goroutine 1 [running]:
runtime.throw({0x50c9b3?, 0x2?})
runtime/panic.go:1047 +0x5d fp=0xc000063660 sp=0xc000063630 pc=0x43245d
runtime.unreachableMethod()
runtime/iface.go:532 +0x25 fp=0xc000063680 sp=0xc000063660 pc=0x40a845
runtime.call16(0xc00010a360, 0xc00000e0a8, 0x0, 0x0, 0x0, 0x8, 0xc000063bb0)
runtime/wcS9OpRFL:728 +0x49 fp=0xc0000636a0 sp=0xc000063680 pc=0x45eae9
runtime.reflectcall(0xc00001c120?, 0x1?, 0x1?, 0x18110?, 0xc0?, 0x1?, 0x1?)
<autogenerated>:1 +0x3c fp=0xc0000636e0 sp=0xc0000636a0 pc=0x462e9c
Avoid obfuscating the three names which cause problems: "reflect",
"Method", and "MethodByName".
While here, we also teach obfuscatedImportPath to skip "runtime",
as I also saw that the toolchain detects it for many reasons.
That wasn't a problem yet, as we do not obfuscate the runtime,
but it was likely going to become a problem in the future.
2 years ago
|
|
|
|
// The Go toolchain needs to detect symbols from these packages,
|
|
|
|
|
// so we are not obfuscating their package paths or declared names.
|
|
|
|
|
switch pkg.Path() {
|
|
|
|
|
case "embed":
|
|
|
|
|
// FS is detected by the compiler for //go:embed.
|
|
|
|
|
return name == "FS"
|
|
|
|
|
case "reflect":
|
|
|
|
|
// Per the linker's deadcode.go docs,
|
|
|
|
|
// the Method and MethodByName methods are what drive the logic.
|
|
|
|
|
switch name {
|
|
|
|
|
case "Method", "MethodByName":
|
|
|
|
|
return true
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
clarify how each "cannot obfuscate" map works
We used to record all objects in cannotObfuscateNames,
and then we'd add the exported ones to KnownCannotObfuscate.
Instead, teach recordAsNotObfuscated to store each object in either
knownCannotObfuscateUnexported or KnownCannotObfuscate, but not both.
The former isn't cached so it uses in-memory pointers as keys,
and the latter uses the cross-process objectStrings like before.
Functionally, this is all the same, but with the difference that the map
indexed by types.Object will not contain objects already recorded in
KnownCannotObfuscate, reducing the amount of duplicate memory use.
While here, give recordIgnore a less ambiguous name,
and remove the second parameter as it was always tf.pkg.Path().
This also means we can compare *types.Package pointers directly.
Finally, add more TODOs for further improvement ideas.
It does mean that we end up with more TODOs than before,
even though I'm fixing one, but I reckon that's a good thing.
Recording these ideas can give first-time contributors ways to help,
and it ensures I don't forget about ideas just in my head.
3 years ago
|
|
|
|
// The package that declared this object did not obfuscate it.
|
stop loading obfuscated type information from deps
If package P1 imports package P2, P1 needs to know which names from P2
weren't obfuscated. For instance, if P2 declares T2 and does
"reflect.TypeOf(T2{...})", then P2 won't obfuscate the name T2, and
neither should P1.
This information should flow from P2 to P1, as P2 builds before
P1. We do this via obfuscatedTypesPackage; P1 loads the type information
of the obfuscated version of P2, and does a lookup for T2. If T2 exists,
then it wasn't obfuscated.
This mechanism has served us well, but it has downsides:
1) It wastes CPU; we load the type information for the entire package.
2) It's complex; for instance, we need KnownObjectFiles as an extra.
3) It makes our code harder to understand, as we load both the original
and obfuscated type informaiton.
Instead, we now have each package record what names were not obfuscated
as part of its cachedOuput file. Much like KnownObjectFiles, the map
records incrementally through the import graph, to avoid having to load
cachedOutput files for indirect dependencies.
We shouldn't need to worry about those maps getting large;
we only skip obfuscating declared names in a few uncommon scenarios,
such as the use of reflection or cgo's "//export".
Since go/types is relatively allocation-heavy, and the export files
contain a lot of data, we get a nice speed-up:
name old time/op new time/op delta
Build-16 11.5s ± 2% 11.1s ± 3% -3.77% (p=0.008 n=5+5)
name old bin-B new bin-B delta
Build-16 5.15M ± 0% 5.15M ± 0% ~ (all equal)
name old cached-time/op new cached-time/op delta
Build-16 375ms ± 3% 341ms ± 6% -8.96% (p=0.008 n=5+5)
name old sys-time/op new sys-time/op delta
Build-16 283ms ±17% 289ms ±13% ~ (p=0.841 n=5+5)
name old user-time/op new user-time/op delta
Build-16 687ms ± 6% 664ms ± 7% ~ (p=0.548 n=5+5)
Fixes #456.
Updates #475.
3 years ago
|
|
|
|
if recordedAsNotObfuscated(obj) {
|
|
|
|
|
return true
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// TODO(mvdan): investigate obfuscating these too.
|
|
|
|
|
filename := fset.Position(obj.Pos()).Filename
|
|
|
|
|
if strings.HasPrefix(filename, "_cgo_") || strings.Contains(filename, ".cgo1.") {
|
|
|
|
|
return true
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
path := pkg.Path()
|
refactor "current package" with TOOLEXEC_IMPORTPATH (#266)
Now that we've dropped support for Go 1.15.x, we can finally rely on
this environment variable for toolexec calls, present in Go 1.16.
Before, we had hacky ways of trying to figure out the current package's
import path, mostly from the -p flag. The biggest rough edge there was
that, for main packages, that was simply the package name, and not its
full import path.
To work around that, we had a restriction on a single main package, so
we could work around that issue. That restriction is now gone.
The new code is simpler, especially because we can set curPkg in a
single place for all toolexec transform funcs.
Since we can always rely on curPkg not being nil now, we can also start
reusing listedPackage.Private and avoid the majority of repeated calls
to isPrivate. The function is cheap, but still not free.
isPrivate itself can also get simpler. We no longer have to worry about
the "main" edge case. Plus, the sanity check for invalid package paths
is now unnecessary; we only got malformed paths from goobj2, and we now
require exact matches with the ImportPath field from "go list -json".
Another effect of clearing up the "main" edge case is that -debugdir now
uses the right directory for main packages. We also start using
consistent debugdir paths in the tests, for the sake of being easier to
read and maintain.
Finally, note that commandReverse did not need the extra call to "go
list -toolexec", as the "shared" call stored in the cache is enough. We
still call toolexecCmd to get said cache, which should probably be
simplified in a future PR.
While at it, replace the use of the "-std" compiler flag with the
Standard field from "go list -json".
4 years ago
|
|
|
|
lpkg, err := listPackage(path)
|
|
|
|
|
if err != nil {
|
|
|
|
|
panic(err) // shouldn't happen
|
|
|
|
|
}
|
deprecate using GOPRIVATE in favor of GOGARBLE (#427)
Piggybacking off of GOPRIVATE is great for a number of reasons:
* People tend to obfuscate private code, whose package paths will
generally be in GOPRIVATE already
* Its meaning and syntax are well understood
* It allows all the flexibility we need without adding our own env var
or config option
However, using GOPRIVATE directly has one main drawback.
It's fairly common to also want to obfuscate public dependencies,
to make the code in private packages even harder to follow.
However, using "GOPRIVATE=*" will result in two main downsides:
* GONOPROXY defaults to GOPRIVATE, so the proxy would be entirely disabled.
Downloading modules, such as when adding or updating dependencies,
or when the local cache is cold, can be less reliable.
* GONOSUMDB defaults to GOPRIVATE, so the sumdb would be entirely disabled.
Adding entries to go.sum, such as when adding or updating dependencies,
can be less secure.
We will continue to consume GOPRIVATE as a fallback,
but we now expect users to set GOGARBLE instead.
The new logic is documented in the README.
While here, rewrite some uses of "private" with "to obfuscate",
to make the code easier to follow and harder to misunderstand.
Fixes #276.
3 years ago
|
|
|
|
if !lpkg.ToObfuscate {
|
|
|
|
|
return true // we're not obfuscating this package
|
|
|
|
|
}
|
hash field names equally in all packages
Packages P1 and P2 can define identical struct types T1 and T2, and one
can convert from type T1 to T2 or vice versa.
The spec defines two identical struct types as:
Two struct types are identical if they have the same sequence of
fields, and if corresponding fields have the same names, and
identical types, and identical tags. Non-exported field names
from different packages are always different.
Unfortunately, garble broke this: since we obfuscated field names
differently depending on the package, cross-package conversions like the
case above would result in typechecking errors.
To fix this, implement Joe Tsai's idea: hash struct field names with the
string representation of the entire struct. This way, identical struct
types will have their field names obfuscated in the same way in all
packages across a build.
Note that we had to refactor "reverse" a bit to start using transformer,
since now it needs to keep track of struct types as well.
This failure was affecting the build of google.golang.org/protobuf,
since it makes regular use of cross-package struct conversions.
Note that the protobuf module still fails to build, but for other
reasons. The package that used to fail now succeeds, so the build gets a
bit further than before. #240 tracks adding relevant third-party Go
modules to CI, so we'll track the other remaining failures there.
Fixes #310.
4 years ago
|
|
|
|
hashToUse := lpkg.GarbleActionID
|
|
|
|
|
debugName := "variable"
|
|
|
|
|
|
|
|
|
|
// log.Printf("%s: %#v %T", fset.Position(node.Pos()), node, obj)
|
|
|
|
|
switch obj := obj.(type) {
|
|
|
|
|
case *types.Var:
|
fix a number of issues involving types from indirect imports
obfuscatedTypesPackage is used to figure out if a name in a dependency
package was obfuscated or not. For example, if that package used
reflection on a named type, it wasn't obfuscated, so we must have the
same information to not obfuscate the same name downstream.
obfuscatedTypesPackage could return nil if the package was indirectly
imported, though. This can happen if a direct import has a function that
returns an indirect type, or if a direct import exposes a name that's a
type alias to an indirect type.
We sort of dealt with this in two pieces of code by checking for
obfPkg!=nil, but a third one did not have this check and caused a panic
in the added test case:
--- FAIL: TestScripts/reflect (0.81s)
testscript.go:397:
> env GOPRIVATE=test/main
> garble build
[stderr]
# test/main
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x8a5e39]
More importantly though, the nil check only avoids panics. It doesn't
fix the root cause of the problem: that importcfg does not contain
indirectly imported packages. The added test case would still fail, as
we would obfuscate a type in the main package, but not in the indirectly
imported package where the type is defined.
To fix this, resurrect a bit of code from earlier garble versions, which
uses "go list -toolexec=garble" to fetch a package's export file. This
lets us fill the indirect import gaps in importcfg, working around the
problem entirely.
This solution is still not particularly great, so we add a TODO about
possibly rethinking this in the future. It does add some overhead and
complexity, though thankfully indirect imports should be uncommon.
This fixes a few panics while building the protobuf module.
4 years ago
|
|
|
|
if !obj.IsField() {
|
obfuscate all variable names, even local ones (#420)
In the added test case, "garble -literals build" would fail:
--- FAIL: TestScripts/literals (8.29s)
testscript.go:397:
> env GOPRIVATE=test/main
> garble -literals build
[stderr]
# test/main
Usz1FmFm.go:1: cannot call non-function string (type int), declared at Usz1FmFm.go:1
Usz1FmFm.go:1: string is not a type
Usz1FmFm.go:1: cannot call non-function append (type int), declared at Usz1FmFm.go:1
That is, for input code such as:
var append int
println("foo")
_ = append
We'd end up with obfuscated code like:
var append int
println(func() string {
// obfuscation...
x = append(x, ...)
// obfuscation...
return string(x)
})
_ = append
Which would then break, as the code is shadowing the "append" builtin.
To work around this, always obfuscate variable names, so we end up with:
var mwu1xuNz int
println(func() string {
// obfuscation...
x = append(x, ...)
// obfuscation...
return string(x)
})
_ = mwu1xuNz
This change shouldn't make the quality of our obfuscation stronger,
as local variable names do not currently end up in Go binaries.
However, this does make garble more consistent in treating identifiers,
and it completely avoids any issues related to shadowing builtins.
Moreover, this also paves the way for publishing obfuscated source code,
such as #369.
Fixes #417.
3 years ago
|
|
|
|
// Identifiers denoting variables are always obfuscated.
|
fix a number of issues involving types from indirect imports
obfuscatedTypesPackage is used to figure out if a name in a dependency
package was obfuscated or not. For example, if that package used
reflection on a named type, it wasn't obfuscated, so we must have the
same information to not obfuscate the same name downstream.
obfuscatedTypesPackage could return nil if the package was indirectly
imported, though. This can happen if a direct import has a function that
returns an indirect type, or if a direct import exposes a name that's a
type alias to an indirect type.
We sort of dealt with this in two pieces of code by checking for
obfPkg!=nil, but a third one did not have this check and caused a panic
in the added test case:
--- FAIL: TestScripts/reflect (0.81s)
testscript.go:397:
> env GOPRIVATE=test/main
> garble build
[stderr]
# test/main
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x8a5e39]
More importantly though, the nil check only avoids panics. It doesn't
fix the root cause of the problem: that importcfg does not contain
indirectly imported packages. The added test case would still fail, as
we would obfuscate a type in the main package, but not in the indirectly
imported package where the type is defined.
To fix this, resurrect a bit of code from earlier garble versions, which
uses "go list -toolexec=garble" to fetch a package's export file. This
lets us fill the indirect import gaps in importcfg, working around the
problem entirely.
This solution is still not particularly great, so we add a TODO about
possibly rethinking this in the future. It does add some overhead and
complexity, though thankfully indirect imports should be uncommon.
This fixes a few panics while building the protobuf module.
4 years ago
|
|
|
|
break
|
|
|
|
|
}
|
|
|
|
|
debugName = "field"
|
fix a number of issues involving types from indirect imports
obfuscatedTypesPackage is used to figure out if a name in a dependency
package was obfuscated or not. For example, if that package used
reflection on a named type, it wasn't obfuscated, so we must have the
same information to not obfuscate the same name downstream.
obfuscatedTypesPackage could return nil if the package was indirectly
imported, though. This can happen if a direct import has a function that
returns an indirect type, or if a direct import exposes a name that's a
type alias to an indirect type.
We sort of dealt with this in two pieces of code by checking for
obfPkg!=nil, but a third one did not have this check and caused a panic
in the added test case:
--- FAIL: TestScripts/reflect (0.81s)
testscript.go:397:
> env GOPRIVATE=test/main
> garble build
[stderr]
# test/main
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x8a5e39]
More importantly though, the nil check only avoids panics. It doesn't
fix the root cause of the problem: that importcfg does not contain
indirectly imported packages. The added test case would still fail, as
we would obfuscate a type in the main package, but not in the indirectly
imported package where the type is defined.
To fix this, resurrect a bit of code from earlier garble versions, which
uses "go list -toolexec=garble" to fetch a package's export file. This
lets us fill the indirect import gaps in importcfg, working around the
problem entirely.
This solution is still not particularly great, so we add a TODO about
possibly rethinking this in the future. It does add some overhead and
complexity, though thankfully indirect imports should be uncommon.
This fixes a few panics while building the protobuf module.
4 years ago
|
|
|
|
// From this point on, we deal with struct fields.
|
|
|
|
|
|
hash field names equally in all packages
Packages P1 and P2 can define identical struct types T1 and T2, and one
can convert from type T1 to T2 or vice versa.
The spec defines two identical struct types as:
Two struct types are identical if they have the same sequence of
fields, and if corresponding fields have the same names, and
identical types, and identical tags. Non-exported field names
from different packages are always different.
Unfortunately, garble broke this: since we obfuscated field names
differently depending on the package, cross-package conversions like the
case above would result in typechecking errors.
To fix this, implement Joe Tsai's idea: hash struct field names with the
string representation of the entire struct. This way, identical struct
types will have their field names obfuscated in the same way in all
packages across a build.
Note that we had to refactor "reverse" a bit to start using transformer,
since now it needs to keep track of struct types as well.
This failure was affecting the build of google.golang.org/protobuf,
since it makes regular use of cross-package struct conversions.
Note that the protobuf module still fails to build, but for other
reasons. The package that used to fail now succeeds, so the build gets a
bit further than before. #240 tracks adding relevant third-party Go
modules to CI, so we'll track the other remaining failures there.
Fixes #310.
4 years ago
|
|
|
|
// Fields don't get hashed with the package's action ID.
|
|
|
|
|
// They get hashed with the type of their parent struct.
|
|
|
|
|
// This is because one struct can be converted to another,
|
|
|
|
|
// as long as the underlying types are identical,
|
|
|
|
|
// even if the structs are defined in different packages.
|
|
|
|
|
//
|
|
|
|
|
// TODO: Consider only doing this for structs where all
|
|
|
|
|
// fields are exported. We only need this special case
|
|
|
|
|
// for cross-package conversions, which can't work if
|
|
|
|
|
// any field is unexported. If that is done, add a test
|
|
|
|
|
// that ensures unexported fields from different
|
|
|
|
|
// packages result in different obfuscated names.
|
fix a number of issues involving types from indirect imports
obfuscatedTypesPackage is used to figure out if a name in a dependency
package was obfuscated or not. For example, if that package used
reflection on a named type, it wasn't obfuscated, so we must have the
same information to not obfuscate the same name downstream.
obfuscatedTypesPackage could return nil if the package was indirectly
imported, though. This can happen if a direct import has a function that
returns an indirect type, or if a direct import exposes a name that's a
type alias to an indirect type.
We sort of dealt with this in two pieces of code by checking for
obfPkg!=nil, but a third one did not have this check and caused a panic
in the added test case:
--- FAIL: TestScripts/reflect (0.81s)
testscript.go:397:
> env GOPRIVATE=test/main
> garble build
[stderr]
# test/main
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x8a5e39]
More importantly though, the nil check only avoids panics. It doesn't
fix the root cause of the problem: that importcfg does not contain
indirectly imported packages. The added test case would still fail, as
we would obfuscate a type in the main package, but not in the indirectly
imported package where the type is defined.
To fix this, resurrect a bit of code from earlier garble versions, which
uses "go list -toolexec=garble" to fetch a package's export file. This
lets us fill the indirect import gaps in importcfg, working around the
problem entirely.
This solution is still not particularly great, so we add a TODO about
possibly rethinking this in the future. It does add some overhead and
complexity, though thankfully indirect imports should be uncommon.
This fixes a few panics while building the protobuf module.
4 years ago
|
|
|
|
strct := tf.fieldToStruct[obj]
|
|
|
|
|
if strct == nil {
|
|
|
|
|
panic("could not find for " + name)
|
hash field names equally in all packages
Packages P1 and P2 can define identical struct types T1 and T2, and one
can convert from type T1 to T2 or vice versa.
The spec defines two identical struct types as:
Two struct types are identical if they have the same sequence of
fields, and if corresponding fields have the same names, and
identical types, and identical tags. Non-exported field names
from different packages are always different.
Unfortunately, garble broke this: since we obfuscated field names
differently depending on the package, cross-package conversions like the
case above would result in typechecking errors.
To fix this, implement Joe Tsai's idea: hash struct field names with the
string representation of the entire struct. This way, identical struct
types will have their field names obfuscated in the same way in all
packages across a build.
Note that we had to refactor "reverse" a bit to start using transformer,
since now it needs to keep track of struct types as well.
This failure was affecting the build of google.golang.org/protobuf,
since it makes regular use of cross-package struct conversions.
Note that the protobuf module still fails to build, but for other
reasons. The package that used to fail now succeeds, so the build gets a
bit further than before. #240 tracks adding relevant third-party Go
modules to CI, so we'll track the other remaining failures there.
Fixes #310.
4 years ago
|
|
|
|
}
|
|
|
|
|
node.Name = hashWithStruct(strct, name)
|
|
|
|
|
if flagDebug { // TODO(mvdan): remove once https://go.dev/issue/53465 if fixed
|
|
|
|
|
log.Printf("%s %q hashed with struct fields to %q", debugName, name, node.Name)
|
|
|
|
|
}
|
|
|
|
|
return true
|
hash field names equally in all packages
Packages P1 and P2 can define identical struct types T1 and T2, and one
can convert from type T1 to T2 or vice versa.
The spec defines two identical struct types as:
Two struct types are identical if they have the same sequence of
fields, and if corresponding fields have the same names, and
identical types, and identical tags. Non-exported field names
from different packages are always different.
Unfortunately, garble broke this: since we obfuscated field names
differently depending on the package, cross-package conversions like the
case above would result in typechecking errors.
To fix this, implement Joe Tsai's idea: hash struct field names with the
string representation of the entire struct. This way, identical struct
types will have their field names obfuscated in the same way in all
packages across a build.
Note that we had to refactor "reverse" a bit to start using transformer,
since now it needs to keep track of struct types as well.
This failure was affecting the build of google.golang.org/protobuf,
since it makes regular use of cross-package struct conversions.
Note that the protobuf module still fails to build, but for other
reasons. The package that used to fail now succeeds, so the build gets a
bit further than before. #240 tracks adding relevant third-party Go
modules to CI, so we'll track the other remaining failures there.
Fixes #310.
4 years ago
|
|
|
|
|
|
|
|
|
case *types.TypeName:
|
|
|
|
|
debugName = "type"
|
|
|
|
|
case *types.Func:
|
|
|
|
|
sign := obj.Type().(*types.Signature)
|
|
|
|
|
if sign.Recv() == nil {
|
|
|
|
|
debugName = "func"
|
|
|
|
|
} else {
|
|
|
|
|
debugName = "method"
|
|
|
|
|
}
|
|
|
|
|
if obj.Exported() && sign.Recv() != nil {
|
|
|
|
|
return true // might implement an interface
|
|
|
|
|
}
|
|
|
|
|
switch name {
|
|
|
|
|
case "main", "init", "TestMain":
|
|
|
|
|
return true // don't break them
|
|
|
|
|
}
|
|
|
|
|
if strings.HasPrefix(name, "Test") && isTestSignature(sign) {
|
|
|
|
|
return true // don't break tests
|
|
|
|
|
}
|
|
|
|
|
default:
|
|
|
|
|
return true // we only want to rename the above
|
|
|
|
|
}
|
fix garbling names belonging to indirect imports (#203)
main.go includes a lengthy comment that documents this edge case, why it
happened, and how we are fixing it. To summarize, we should no longer
error with a build error in those cases. Read the comment for details.
A few other minor changes were done to allow writing this patch.
First, the actionID and contentID funcs were renamed, since they started
to collide with variable names.
Second, the logging has been improved a bit, which allowed me to debug
the issue.
Third, the "cache" global shared by all garble sub-processes now
includes the necessary parameters to run "go list -toolexec", including
the path to garble and the build flags being used.
Thanks to lu4p for writing a test case, which also applied gofmt to that
testdata Go file.
Fixes #180.
Closes #181, since it includes its test case.
4 years ago
|
|
|
|
|
|
|
|
|
node.Name = hashWithPackage(lpkg, name)
|
|
|
|
|
// TODO: probably move the debugf lines inside the hash funcs
|
|
|
|
|
if flagDebug { // TODO(mvdan): remove once https://go.dev/issue/53465 if fixed
|
|
|
|
|
log.Printf("%s %q hashed with %x… to %q", debugName, name, hashToUse[:4], node.Name)
|
|
|
|
|
}
|
|
|
|
|
return true
|
|
|
|
|
}
|
|
|
|
|
post := func(cursor *astutil.Cursor) bool {
|
|
|
|
|
imp, ok := cursor.Node().(*ast.ImportSpec)
|
|
|
|
|
if !ok {
|
|
|
|
|
return true
|
|
|
|
|
}
|
|
|
|
|
path, err := strconv.Unquote(imp.Path.Value)
|
|
|
|
|
if err != nil {
|
|
|
|
|
panic(err) // should never happen
|
|
|
|
|
}
|
|
|
|
|
// We're importing an obfuscated package.
|
|
|
|
|
// Replace the import path with its obfuscated version.
|
|
|
|
|
// If the import was unnamed, give it the name of the
|
|
|
|
|
// original package name, to keep references working.
|
|
|
|
|
lpkg, err := listPackage(path)
|
|
|
|
|
if err != nil {
|
|
|
|
|
panic(err) // should never happen
|
|
|
|
|
}
|
deprecate using GOPRIVATE in favor of GOGARBLE (#427)
Piggybacking off of GOPRIVATE is great for a number of reasons:
* People tend to obfuscate private code, whose package paths will
generally be in GOPRIVATE already
* Its meaning and syntax are well understood
* It allows all the flexibility we need without adding our own env var
or config option
However, using GOPRIVATE directly has one main drawback.
It's fairly common to also want to obfuscate public dependencies,
to make the code in private packages even harder to follow.
However, using "GOPRIVATE=*" will result in two main downsides:
* GONOPROXY defaults to GOPRIVATE, so the proxy would be entirely disabled.
Downloading modules, such as when adding or updating dependencies,
or when the local cache is cold, can be less reliable.
* GONOSUMDB defaults to GOPRIVATE, so the sumdb would be entirely disabled.
Adding entries to go.sum, such as when adding or updating dependencies,
can be less secure.
We will continue to consume GOPRIVATE as a fallback,
but we now expect users to set GOGARBLE instead.
The new logic is documented in the README.
While here, rewrite some uses of "private" with "to obfuscate",
to make the code easier to follow and harder to misunderstand.
Fixes #276.
3 years ago
|
|
|
|
if !lpkg.ToObfuscate {
|
|
|
|
|
return true
|
|
|
|
|
}
|
|
|
|
|
if lpkg.Name != "main" {
|
|
|
|
|
newPath := lpkg.obfuscatedImportPath()
|
|
|
|
|
imp.Path.Value = strconv.Quote(newPath)
|
|
|
|
|
}
|
|
|
|
|
if imp.Name == nil {
|
|
|
|
|
imp.Name = &ast.Ident{
|
|
|
|
|
NamePos: imp.Path.ValuePos, // ensure it ends up on the same line
|
|
|
|
|
Name: lpkg.Name,
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
return true
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
return astutil.Apply(file, pre, post).(*ast.File)
|
|
|
|
|
}
|
|
|
|
|
|
clarify how each "cannot obfuscate" map works
We used to record all objects in cannotObfuscateNames,
and then we'd add the exported ones to KnownCannotObfuscate.
Instead, teach recordAsNotObfuscated to store each object in either
knownCannotObfuscateUnexported or KnownCannotObfuscate, but not both.
The former isn't cached so it uses in-memory pointers as keys,
and the latter uses the cross-process objectStrings like before.
Functionally, this is all the same, but with the difference that the map
indexed by types.Object will not contain objects already recorded in
KnownCannotObfuscate, reducing the amount of duplicate memory use.
While here, give recordIgnore a less ambiguous name,
and remove the second parameter as it was always tf.pkg.Path().
This also means we can compare *types.Package pointers directly.
Finally, add more TODOs for further improvement ideas.
It does mean that we end up with more TODOs than before,
even though I'm fixing one, but I reckon that's a good thing.
Recording these ideas can give first-time contributors ways to help,
and it ensures I don't forget about ideas just in my head.
3 years ago
|
|
|
|
// recursivelyRecordAsNotObfuscated calls recordAsNotObfuscated on any named
|
|
|
|
|
// types and fields under typ.
|
record types into ignoreObjects more reliably
Our previous logic only took care of fairly simple types, such as a
simple struct or a pointer to a struct. If we had a struct embedding
another struct, we'd fail to record the objects for the fields in the
inner struct, and that would lead to miscompilation:
> garble build
[stderr]
# test/main
LZmt64Nm.go:7: outer.InnerField undefined (type *CcUt1wkQ.EmbeddingOuter has no field or method InnerField)
To fix this issue, make the function that records all objects under a
types.Type smarter. Since it now does more than just dealing with
structs, it's also renamed.
Since the function now walks types properly, we get to remove the extra
ast.Inspect in recordReflectArgs, which is nice.
We also make it a method, to avoid the map parameter. A boolean
parameter is also added, since we need this feature to only look at the
current package when looking at reflect calls.
Finally, we add a test case, a simplified version of the original bug
report.
Fixes #315.
4 years ago
|
|
|
|
//
|
clarify how each "cannot obfuscate" map works
We used to record all objects in cannotObfuscateNames,
and then we'd add the exported ones to KnownCannotObfuscate.
Instead, teach recordAsNotObfuscated to store each object in either
knownCannotObfuscateUnexported or KnownCannotObfuscate, but not both.
The former isn't cached so it uses in-memory pointers as keys,
and the latter uses the cross-process objectStrings like before.
Functionally, this is all the same, but with the difference that the map
indexed by types.Object will not contain objects already recorded in
KnownCannotObfuscate, reducing the amount of duplicate memory use.
While here, give recordIgnore a less ambiguous name,
and remove the second parameter as it was always tf.pkg.Path().
This also means we can compare *types.Package pointers directly.
Finally, add more TODOs for further improvement ideas.
It does mean that we end up with more TODOs than before,
even though I'm fixing one, but I reckon that's a good thing.
Recording these ideas can give first-time contributors ways to help,
and it ensures I don't forget about ideas just in my head.
3 years ago
|
|
|
|
// Only the names declared in the current package are recorded. This is to ensure
|
|
|
|
|
// that reflection detection only happens within the package declaring a type.
|
|
|
|
|
// Detecting it in downstream packages could result in inconsistencies.
|
clarify how each "cannot obfuscate" map works
We used to record all objects in cannotObfuscateNames,
and then we'd add the exported ones to KnownCannotObfuscate.
Instead, teach recordAsNotObfuscated to store each object in either
knownCannotObfuscateUnexported or KnownCannotObfuscate, but not both.
The former isn't cached so it uses in-memory pointers as keys,
and the latter uses the cross-process objectStrings like before.
Functionally, this is all the same, but with the difference that the map
indexed by types.Object will not contain objects already recorded in
KnownCannotObfuscate, reducing the amount of duplicate memory use.
While here, give recordIgnore a less ambiguous name,
and remove the second parameter as it was always tf.pkg.Path().
This also means we can compare *types.Package pointers directly.
Finally, add more TODOs for further improvement ideas.
It does mean that we end up with more TODOs than before,
even though I'm fixing one, but I reckon that's a good thing.
Recording these ideas can give first-time contributors ways to help,
and it ensures I don't forget about ideas just in my head.
3 years ago
|
|
|
|
func (tf *transformer) recursivelyRecordAsNotObfuscated(t types.Type) {
|
record types into ignoreObjects more reliably
Our previous logic only took care of fairly simple types, such as a
simple struct or a pointer to a struct. If we had a struct embedding
another struct, we'd fail to record the objects for the fields in the
inner struct, and that would lead to miscompilation:
> garble build
[stderr]
# test/main
LZmt64Nm.go:7: outer.InnerField undefined (type *CcUt1wkQ.EmbeddingOuter has no field or method InnerField)
To fix this issue, make the function that records all objects under a
types.Type smarter. Since it now does more than just dealing with
structs, it's also renamed.
Since the function now walks types properly, we get to remove the extra
ast.Inspect in recordReflectArgs, which is nice.
We also make it a method, to avoid the map parameter. A boolean
parameter is also added, since we need this feature to only look at the
current package when looking at reflect calls.
Finally, we add a test case, a simplified version of the original bug
report.
Fixes #315.
4 years ago
|
|
|
|
switch t := t.(type) {
|
|
|
|
|
case *types.Named:
|
|
|
|
|
obj := t.Obj()
|
clarify how each "cannot obfuscate" map works
We used to record all objects in cannotObfuscateNames,
and then we'd add the exported ones to KnownCannotObfuscate.
Instead, teach recordAsNotObfuscated to store each object in either
knownCannotObfuscateUnexported or KnownCannotObfuscate, but not both.
The former isn't cached so it uses in-memory pointers as keys,
and the latter uses the cross-process objectStrings like before.
Functionally, this is all the same, but with the difference that the map
indexed by types.Object will not contain objects already recorded in
KnownCannotObfuscate, reducing the amount of duplicate memory use.
While here, give recordIgnore a less ambiguous name,
and remove the second parameter as it was always tf.pkg.Path().
This also means we can compare *types.Package pointers directly.
Finally, add more TODOs for further improvement ideas.
It does mean that we end up with more TODOs than before,
even though I'm fixing one, but I reckon that's a good thing.
Recording these ideas can give first-time contributors ways to help,
and it ensures I don't forget about ideas just in my head.
3 years ago
|
|
|
|
if obj.Pkg() == nil || obj.Pkg() != tf.pkg {
|
|
|
|
|
return // not from the specified package
|
record types into ignoreObjects more reliably
Our previous logic only took care of fairly simple types, such as a
simple struct or a pointer to a struct. If we had a struct embedding
another struct, we'd fail to record the objects for the fields in the
inner struct, and that would lead to miscompilation:
> garble build
[stderr]
# test/main
LZmt64Nm.go:7: outer.InnerField undefined (type *CcUt1wkQ.EmbeddingOuter has no field or method InnerField)
To fix this issue, make the function that records all objects under a
types.Type smarter. Since it now does more than just dealing with
structs, it's also renamed.
Since the function now walks types properly, we get to remove the extra
ast.Inspect in recordReflectArgs, which is nice.
We also make it a method, to avoid the map parameter. A boolean
parameter is also added, since we need this feature to only look at the
current package when looking at reflect calls.
Finally, we add a test case, a simplified version of the original bug
report.
Fixes #315.
4 years ago
|
|
|
|
}
|
clarify how each "cannot obfuscate" map works
We used to record all objects in cannotObfuscateNames,
and then we'd add the exported ones to KnownCannotObfuscate.
Instead, teach recordAsNotObfuscated to store each object in either
knownCannotObfuscateUnexported or KnownCannotObfuscate, but not both.
The former isn't cached so it uses in-memory pointers as keys,
and the latter uses the cross-process objectStrings like before.
Functionally, this is all the same, but with the difference that the map
indexed by types.Object will not contain objects already recorded in
KnownCannotObfuscate, reducing the amount of duplicate memory use.
While here, give recordIgnore a less ambiguous name,
and remove the second parameter as it was always tf.pkg.Path().
This also means we can compare *types.Package pointers directly.
Finally, add more TODOs for further improvement ideas.
It does mean that we end up with more TODOs than before,
even though I'm fixing one, but I reckon that's a good thing.
Recording these ideas can give first-time contributors ways to help,
and it ensures I don't forget about ideas just in my head.
3 years ago
|
|
|
|
if recordedAsNotObfuscated(obj) {
|
record types into ignoreObjects more reliably
Our previous logic only took care of fairly simple types, such as a
simple struct or a pointer to a struct. If we had a struct embedding
another struct, we'd fail to record the objects for the fields in the
inner struct, and that would lead to miscompilation:
> garble build
[stderr]
# test/main
LZmt64Nm.go:7: outer.InnerField undefined (type *CcUt1wkQ.EmbeddingOuter has no field or method InnerField)
To fix this issue, make the function that records all objects under a
types.Type smarter. Since it now does more than just dealing with
structs, it's also renamed.
Since the function now walks types properly, we get to remove the extra
ast.Inspect in recordReflectArgs, which is nice.
We also make it a method, to avoid the map parameter. A boolean
parameter is also added, since we need this feature to only look at the
current package when looking at reflect calls.
Finally, we add a test case, a simplified version of the original bug
report.
Fixes #315.
4 years ago
|
|
|
|
return // prevent endless recursion
|
|
|
|
|
}
|
clarify how each "cannot obfuscate" map works
We used to record all objects in cannotObfuscateNames,
and then we'd add the exported ones to KnownCannotObfuscate.
Instead, teach recordAsNotObfuscated to store each object in either
knownCannotObfuscateUnexported or KnownCannotObfuscate, but not both.
The former isn't cached so it uses in-memory pointers as keys,
and the latter uses the cross-process objectStrings like before.
Functionally, this is all the same, but with the difference that the map
indexed by types.Object will not contain objects already recorded in
KnownCannotObfuscate, reducing the amount of duplicate memory use.
While here, give recordIgnore a less ambiguous name,
and remove the second parameter as it was always tf.pkg.Path().
This also means we can compare *types.Package pointers directly.
Finally, add more TODOs for further improvement ideas.
It does mean that we end up with more TODOs than before,
even though I'm fixing one, but I reckon that's a good thing.
Recording these ideas can give first-time contributors ways to help,
and it ensures I don't forget about ideas just in my head.
3 years ago
|
|
|
|
recordAsNotObfuscated(obj)
|
record types into ignoreObjects more reliably
Our previous logic only took care of fairly simple types, such as a
simple struct or a pointer to a struct. If we had a struct embedding
another struct, we'd fail to record the objects for the fields in the
inner struct, and that would lead to miscompilation:
> garble build
[stderr]
# test/main
LZmt64Nm.go:7: outer.InnerField undefined (type *CcUt1wkQ.EmbeddingOuter has no field or method InnerField)
To fix this issue, make the function that records all objects under a
types.Type smarter. Since it now does more than just dealing with
structs, it's also renamed.
Since the function now walks types properly, we get to remove the extra
ast.Inspect in recordReflectArgs, which is nice.
We also make it a method, to avoid the map parameter. A boolean
parameter is also added, since we need this feature to only look at the
current package when looking at reflect calls.
Finally, we add a test case, a simplified version of the original bug
report.
Fixes #315.
4 years ago
|
|
|
|
|
|
|
|
|
// Record the underlying type, too.
|
clarify how each "cannot obfuscate" map works
We used to record all objects in cannotObfuscateNames,
and then we'd add the exported ones to KnownCannotObfuscate.
Instead, teach recordAsNotObfuscated to store each object in either
knownCannotObfuscateUnexported or KnownCannotObfuscate, but not both.
The former isn't cached so it uses in-memory pointers as keys,
and the latter uses the cross-process objectStrings like before.
Functionally, this is all the same, but with the difference that the map
indexed by types.Object will not contain objects already recorded in
KnownCannotObfuscate, reducing the amount of duplicate memory use.
While here, give recordIgnore a less ambiguous name,
and remove the second parameter as it was always tf.pkg.Path().
This also means we can compare *types.Package pointers directly.
Finally, add more TODOs for further improvement ideas.
It does mean that we end up with more TODOs than before,
even though I'm fixing one, but I reckon that's a good thing.
Recording these ideas can give first-time contributors ways to help,
and it ensures I don't forget about ideas just in my head.
3 years ago
|
|
|
|
tf.recursivelyRecordAsNotObfuscated(t.Underlying())
|
record types into ignoreObjects more reliably
Our previous logic only took care of fairly simple types, such as a
simple struct or a pointer to a struct. If we had a struct embedding
another struct, we'd fail to record the objects for the fields in the
inner struct, and that would lead to miscompilation:
> garble build
[stderr]
# test/main
LZmt64Nm.go:7: outer.InnerField undefined (type *CcUt1wkQ.EmbeddingOuter has no field or method InnerField)
To fix this issue, make the function that records all objects under a
types.Type smarter. Since it now does more than just dealing with
structs, it's also renamed.
Since the function now walks types properly, we get to remove the extra
ast.Inspect in recordReflectArgs, which is nice.
We also make it a method, to avoid the map parameter. A boolean
parameter is also added, since we need this feature to only look at the
current package when looking at reflect calls.
Finally, we add a test case, a simplified version of the original bug
report.
Fixes #315.
4 years ago
|
|
|
|
|
|
|
|
|
case *types.Struct:
|
|
|
|
|
for i := 0; i < t.NumFields(); i++ {
|
|
|
|
|
field := t.Field(i)
|
|
|
|
|
|
|
|
|
|
// This check is similar to the one in *types.Named.
|
|
|
|
|
// It's necessary for unnamed struct types,
|
|
|
|
|
// as they aren't named but still have named fields.
|
clarify how each "cannot obfuscate" map works
We used to record all objects in cannotObfuscateNames,
and then we'd add the exported ones to KnownCannotObfuscate.
Instead, teach recordAsNotObfuscated to store each object in either
knownCannotObfuscateUnexported or KnownCannotObfuscate, but not both.
The former isn't cached so it uses in-memory pointers as keys,
and the latter uses the cross-process objectStrings like before.
Functionally, this is all the same, but with the difference that the map
indexed by types.Object will not contain objects already recorded in
KnownCannotObfuscate, reducing the amount of duplicate memory use.
While here, give recordIgnore a less ambiguous name,
and remove the second parameter as it was always tf.pkg.Path().
This also means we can compare *types.Package pointers directly.
Finally, add more TODOs for further improvement ideas.
It does mean that we end up with more TODOs than before,
even though I'm fixing one, but I reckon that's a good thing.
Recording these ideas can give first-time contributors ways to help,
and it ensures I don't forget about ideas just in my head.
3 years ago
|
|
|
|
if field.Pkg() == nil || field.Pkg() != tf.pkg {
|
|
|
|
|
return // not from the specified package
|
|
|
|
|
}
|
|
|
|
|
|
record types into ignoreObjects more reliably
Our previous logic only took care of fairly simple types, such as a
simple struct or a pointer to a struct. If we had a struct embedding
another struct, we'd fail to record the objects for the fields in the
inner struct, and that would lead to miscompilation:
> garble build
[stderr]
# test/main
LZmt64Nm.go:7: outer.InnerField undefined (type *CcUt1wkQ.EmbeddingOuter has no field or method InnerField)
To fix this issue, make the function that records all objects under a
types.Type smarter. Since it now does more than just dealing with
structs, it's also renamed.
Since the function now walks types properly, we get to remove the extra
ast.Inspect in recordReflectArgs, which is nice.
We also make it a method, to avoid the map parameter. A boolean
parameter is also added, since we need this feature to only look at the
current package when looking at reflect calls.
Finally, we add a test case, a simplified version of the original bug
report.
Fixes #315.
4 years ago
|
|
|
|
// Record the field itself, too.
|
clarify how each "cannot obfuscate" map works
We used to record all objects in cannotObfuscateNames,
and then we'd add the exported ones to KnownCannotObfuscate.
Instead, teach recordAsNotObfuscated to store each object in either
knownCannotObfuscateUnexported or KnownCannotObfuscate, but not both.
The former isn't cached so it uses in-memory pointers as keys,
and the latter uses the cross-process objectStrings like before.
Functionally, this is all the same, but with the difference that the map
indexed by types.Object will not contain objects already recorded in
KnownCannotObfuscate, reducing the amount of duplicate memory use.
While here, give recordIgnore a less ambiguous name,
and remove the second parameter as it was always tf.pkg.Path().
This also means we can compare *types.Package pointers directly.
Finally, add more TODOs for further improvement ideas.
It does mean that we end up with more TODOs than before,
even though I'm fixing one, but I reckon that's a good thing.
Recording these ideas can give first-time contributors ways to help,
and it ensures I don't forget about ideas just in my head.
3 years ago
|
|
|
|
recordAsNotObfuscated(field)
|
record types into ignoreObjects more reliably
Our previous logic only took care of fairly simple types, such as a
simple struct or a pointer to a struct. If we had a struct embedding
another struct, we'd fail to record the objects for the fields in the
inner struct, and that would lead to miscompilation:
> garble build
[stderr]
# test/main
LZmt64Nm.go:7: outer.InnerField undefined (type *CcUt1wkQ.EmbeddingOuter has no field or method InnerField)
To fix this issue, make the function that records all objects under a
types.Type smarter. Since it now does more than just dealing with
structs, it's also renamed.
Since the function now walks types properly, we get to remove the extra
ast.Inspect in recordReflectArgs, which is nice.
We also make it a method, to avoid the map parameter. A boolean
parameter is also added, since we need this feature to only look at the
current package when looking at reflect calls.
Finally, we add a test case, a simplified version of the original bug
report.
Fixes #315.
4 years ago
|
|
|
|
|
clarify how each "cannot obfuscate" map works
We used to record all objects in cannotObfuscateNames,
and then we'd add the exported ones to KnownCannotObfuscate.
Instead, teach recordAsNotObfuscated to store each object in either
knownCannotObfuscateUnexported or KnownCannotObfuscate, but not both.
The former isn't cached so it uses in-memory pointers as keys,
and the latter uses the cross-process objectStrings like before.
Functionally, this is all the same, but with the difference that the map
indexed by types.Object will not contain objects already recorded in
KnownCannotObfuscate, reducing the amount of duplicate memory use.
While here, give recordIgnore a less ambiguous name,
and remove the second parameter as it was always tf.pkg.Path().
This also means we can compare *types.Package pointers directly.
Finally, add more TODOs for further improvement ideas.
It does mean that we end up with more TODOs than before,
even though I'm fixing one, but I reckon that's a good thing.
Recording these ideas can give first-time contributors ways to help,
and it ensures I don't forget about ideas just in my head.
3 years ago
|
|
|
|
tf.recursivelyRecordAsNotObfuscated(field.Type())
|
record types into ignoreObjects more reliably
Our previous logic only took care of fairly simple types, such as a
simple struct or a pointer to a struct. If we had a struct embedding
another struct, we'd fail to record the objects for the fields in the
inner struct, and that would lead to miscompilation:
> garble build
[stderr]
# test/main
LZmt64Nm.go:7: outer.InnerField undefined (type *CcUt1wkQ.EmbeddingOuter has no field or method InnerField)
To fix this issue, make the function that records all objects under a
types.Type smarter. Since it now does more than just dealing with
structs, it's also renamed.
Since the function now walks types properly, we get to remove the extra
ast.Inspect in recordReflectArgs, which is nice.
We also make it a method, to avoid the map parameter. A boolean
parameter is also added, since we need this feature to only look at the
current package when looking at reflect calls.
Finally, we add a test case, a simplified version of the original bug
report.
Fixes #315.
4 years ago
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
case interface{ Elem() types.Type }:
|
|
|
|
|
// Get past pointers, slices, etc.
|
clarify how each "cannot obfuscate" map works
We used to record all objects in cannotObfuscateNames,
and then we'd add the exported ones to KnownCannotObfuscate.
Instead, teach recordAsNotObfuscated to store each object in either
knownCannotObfuscateUnexported or KnownCannotObfuscate, but not both.
The former isn't cached so it uses in-memory pointers as keys,
and the latter uses the cross-process objectStrings like before.
Functionally, this is all the same, but with the difference that the map
indexed by types.Object will not contain objects already recorded in
KnownCannotObfuscate, reducing the amount of duplicate memory use.
While here, give recordIgnore a less ambiguous name,
and remove the second parameter as it was always tf.pkg.Path().
This also means we can compare *types.Package pointers directly.
Finally, add more TODOs for further improvement ideas.
It does mean that we end up with more TODOs than before,
even though I'm fixing one, but I reckon that's a good thing.
Recording these ideas can give first-time contributors ways to help,
and it ensures I don't forget about ideas just in my head.
3 years ago
|
|
|
|
tf.recursivelyRecordAsNotObfuscated(t.Elem())
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// named tries to obtain the *types.Named behind a type, if there is one.
|
|
|
|
|
// This is useful to obtain "testing.T" from "*testing.T", or to obtain the type
|
|
|
|
|
// declaration object from an embedded field.
|
|
|
|
|
func namedType(t types.Type) *types.Named {
|
|
|
|
|
switch t := t.(type) {
|
|
|
|
|
case *types.Named:
|
|
|
|
|
return t
|
|
|
|
|
case interface{ Elem() types.Type }:
|
|
|
|
|
return namedType(t.Elem())
|
|
|
|
|
default:
|
|
|
|
|
return nil
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// isTestSignature returns true if the signature matches "func _(*testing.T)".
|
|
|
|
|
func isTestSignature(sign *types.Signature) bool {
|
|
|
|
|
if sign.Recv() != nil {
|
|
|
|
|
return false // test funcs don't have receivers
|
|
|
|
|
}
|
|
|
|
|
params := sign.Params()
|
|
|
|
|
if params.Len() != 1 {
|
|
|
|
|
return false // too many parameters for a test func
|
|
|
|
|
}
|
|
|
|
|
named := namedType(params.At(0).Type())
|
|
|
|
|
if named == nil {
|
|
|
|
|
return false // the only parameter isn't named, like "string"
|
|
|
|
|
}
|
|
|
|
|
obj := named.Obj()
|
|
|
|
|
return obj != nil && obj.Pkg().Path() == "testing" && obj.Name() == "T"
|
|
|
|
|
}
|
|
|
|
|
|
reimplement import path obfuscation without goobj2 (#242)
We used to rely on a parallel implementation of an object file parser
and writer to be able to obfuscate import paths. After compiling each
package, we would parse the object file, replace the import paths, and
write the updated object file in-place.
That worked well, in most cases. Unfortunately, it had some flaws:
* Complexity. Even when most of the code is maintained in a separate
module, the import_obfuscation.go file was still close to a thousand
lines of code.
* Go compatibility. The object file format changes between Go releases,
so we were supporting Go 1.15, but not 1.16. Fixing the object file
package to work with 1.16 would probably break 1.15 support.
* Bugs. For example, we recently had to add a workaround for #224, since
import paths containing dots after the domain would end up escaped.
Another example is #190, which seems to be caused by the object file
parser or writer corrupting the compiled code and causing segfaults in
some rare edge cases.
Instead, let's drop that method entirely, and force the compiler and
linker to do the work for us. The steps necessary when compiling a
package to obfuscate are:
1) Replace its "package foo" lines with the obfuscated package path. No
need to separate the package path and name, since the obfuscated path
does not contain slashes.
2) Replace the "-p pkg/foo" flag with the obfuscated path.
3) Replace the "import" spec lines with the obfuscated package paths,
for those dependencies which were obfuscated.
4) Replace the "-importcfg [...]" file with a version that uses the
obfuscated paths instead.
The linker also needs that last step, since it also uses an importcfg
file to find object files.
There are three noteworthy drawbacks to this new method:
1) Since we no longer write object files, we can't use them to store
data to be cached. As such, the -debugdir flag goes back to using the
"-a" build flag to always rebuild all packages. On the plus side,
that caching didn't work very well; see #176.
2) The package name "main" remains in all declarations under it, not
just "func main", since we can only rename entire packages. This
seems fine, as it gives little information to the end user.
3) The -tiny mode no longer sets all lines to 0, since it did that by
modifying object files. As a temporary measure, we instead set all
top-level declarations to be on line 1. A TODO is added to hopefully
improve this again in the near future.
The upside is that we get rid of all the issues mentioned before. Plus,
garble now nearly works with Go 1.16, with the exception of two very
minor bugs that look fixable. A follow-up PR will take care of that and
start testing on 1.16.
Fixes #176.
Fixes #190.
4 years ago
|
|
|
|
func transformLink(args []string) ([]string, error) {
|
initial support for build caching (#142)
As per the discussion in https://github.com/golang/go/issues/41145, it
turns out that we don't need special support for build caching in
-toolexec. We can simply modify the behavior of "[...]/compile -V=full"
and "[...]/link -V=full" so that they include garble's own version and
options in the printed build ID.
The part of the build ID that matters is the last, since it's the
"content ID" which is used to work out whether there is a need to redo
the action (build) or not. Since cmd/go parses the last word in the
output as "buildID=...", we simply add "+garble buildID=_/_/_/${hash}".
The slashes let us imitate a full binary build ID, but we assume that
the other components such as the action ID are not necessary, since the
only reader here is cmd/go and it only consumes the content ID.
The reported content ID includes the tool's original content ID,
garble's own content ID from the built binary, and the garble options
which modify how we obfuscate code. If any of the three changes, we
should use a different build cache key. GOPRIVATE also affects caching,
since a different GOPRIVATE value means that we might have to garble a
different set of packages.
Include tests, which mainly check that 'garble build -v' prints package
lines when we expect to always need to rebuild packages, and that it
prints nothing when we should be reusing the build cache even when the
built binary is missing.
After this change, 'go test' on Go 1.15.2 stabilizes at about 8s on my
machine, whereas it used to be at around 25s before.
5 years ago
|
|
|
|
// We can't split by the ".a" extension, because cached object files
|
|
|
|
|
// lack any extension.
|
reimplement import path obfuscation without goobj2 (#242)
We used to rely on a parallel implementation of an object file parser
and writer to be able to obfuscate import paths. After compiling each
package, we would parse the object file, replace the import paths, and
write the updated object file in-place.
That worked well, in most cases. Unfortunately, it had some flaws:
* Complexity. Even when most of the code is maintained in a separate
module, the import_obfuscation.go file was still close to a thousand
lines of code.
* Go compatibility. The object file format changes between Go releases,
so we were supporting Go 1.15, but not 1.16. Fixing the object file
package to work with 1.16 would probably break 1.15 support.
* Bugs. For example, we recently had to add a workaround for #224, since
import paths containing dots after the domain would end up escaped.
Another example is #190, which seems to be caused by the object file
parser or writer corrupting the compiled code and causing segfaults in
some rare edge cases.
Instead, let's drop that method entirely, and force the compiler and
linker to do the work for us. The steps necessary when compiling a
package to obfuscate are:
1) Replace its "package foo" lines with the obfuscated package path. No
need to separate the package path and name, since the obfuscated path
does not contain slashes.
2) Replace the "-p pkg/foo" flag with the obfuscated path.
3) Replace the "import" spec lines with the obfuscated package paths,
for those dependencies which were obfuscated.
4) Replace the "-importcfg [...]" file with a version that uses the
obfuscated paths instead.
The linker also needs that last step, since it also uses an importcfg
file to find object files.
There are three noteworthy drawbacks to this new method:
1) Since we no longer write object files, we can't use them to store
data to be cached. As such, the -debugdir flag goes back to using the
"-a" build flag to always rebuild all packages. On the plus side,
that caching didn't work very well; see #176.
2) The package name "main" remains in all declarations under it, not
just "func main", since we can only rename entire packages. This
seems fine, as it gives little information to the end user.
3) The -tiny mode no longer sets all lines to 0, since it did that by
modifying object files. As a temporary measure, we instead set all
top-level declarations to be on line 1. A TODO is added to hopefully
improve this again in the near future.
The upside is that we get rid of all the issues mentioned before. Plus,
garble now nearly works with Go 1.16, with the exception of two very
minor bugs that look fixable. A follow-up PR will take care of that and
start testing on 1.16.
Fixes #176.
Fixes #190.
4 years ago
|
|
|
|
flags, args := splitFlagsFromArgs(args)
|
|
|
|
|
|
avoid one more call to 'go tool buildid' (#253)
We use it to get the content ID of garble's binary, which is used for
both the garble action IDs, as well as 'go tool compile -V=full'.
Since those two happen in separate processes, both used to call 'go tool
buildid' separately. Store it in the gob cache the first time, and reuse
it the second time.
Since each call to cmd/go costs about 10ms (new process, running its
many init funcs, etc), this results in a nice speed-up for our small
benchmark. Most builds will take many seconds though, so note that a
~15ms speedup there will likely not be noticeable.
While at it, simplify the buildInfo global, as now it just contains a
map representation of the -importcfg contents. It now has better names,
docs, and a simpler representation.
We also stop using the term "garbled import", as it was a bit confusing.
"obfuscated types.Package" is a much better description.
name old time/op new time/op delta
Build-8 106ms ± 1% 92ms ± 0% -14.07% (p=0.010 n=6+4)
name old bin-B new bin-B delta
Build-8 6.60M ± 0% 6.60M ± 0% -0.01% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build-8 208ms ± 5% 149ms ± 3% -28.27% (p=0.004 n=6+5)
name old user-time/op new user-time/op delta
Build-8 433ms ± 3% 384ms ± 3% -11.35% (p=0.002 n=6+6)
4 years ago
|
|
|
|
newImportCfg, err := processImportCfg(flags)
|
|
|
|
|
if err != nil {
|
reimplement import path obfuscation without goobj2 (#242)
We used to rely on a parallel implementation of an object file parser
and writer to be able to obfuscate import paths. After compiling each
package, we would parse the object file, replace the import paths, and
write the updated object file in-place.
That worked well, in most cases. Unfortunately, it had some flaws:
* Complexity. Even when most of the code is maintained in a separate
module, the import_obfuscation.go file was still close to a thousand
lines of code.
* Go compatibility. The object file format changes between Go releases,
so we were supporting Go 1.15, but not 1.16. Fixing the object file
package to work with 1.16 would probably break 1.15 support.
* Bugs. For example, we recently had to add a workaround for #224, since
import paths containing dots after the domain would end up escaped.
Another example is #190, which seems to be caused by the object file
parser or writer corrupting the compiled code and causing segfaults in
some rare edge cases.
Instead, let's drop that method entirely, and force the compiler and
linker to do the work for us. The steps necessary when compiling a
package to obfuscate are:
1) Replace its "package foo" lines with the obfuscated package path. No
need to separate the package path and name, since the obfuscated path
does not contain slashes.
2) Replace the "-p pkg/foo" flag with the obfuscated path.
3) Replace the "import" spec lines with the obfuscated package paths,
for those dependencies which were obfuscated.
4) Replace the "-importcfg [...]" file with a version that uses the
obfuscated paths instead.
The linker also needs that last step, since it also uses an importcfg
file to find object files.
There are three noteworthy drawbacks to this new method:
1) Since we no longer write object files, we can't use them to store
data to be cached. As such, the -debugdir flag goes back to using the
"-a" build flag to always rebuild all packages. On the plus side,
that caching didn't work very well; see #176.
2) The package name "main" remains in all declarations under it, not
just "func main", since we can only rename entire packages. This
seems fine, as it gives little information to the end user.
3) The -tiny mode no longer sets all lines to 0, since it did that by
modifying object files. As a temporary measure, we instead set all
top-level declarations to be on line 1. A TODO is added to hopefully
improve this again in the near future.
The upside is that we get rid of all the issues mentioned before. Plus,
garble now nearly works with Go 1.16, with the exception of two very
minor bugs that look fixable. A follow-up PR will take care of that and
start testing on 1.16.
Fixes #176.
Fixes #190.
4 years ago
|
|
|
|
return nil, err
|
|
|
|
|
}
|
|
|
|
|
|
avoid obfuscating literals set via -ldflags=-X
The -X linker flag sets a string variable to a given value,
which is often used to inject strings such as versions.
The way garble's literal obfuscation works,
we replace string literals with anonymous functions which,
when evaluated, result in the original string.
Both of these features work fine separately,
but when intersecting, they break. For example, given:
var myVar = "original"
[...]
-ldflags=-X=main.myVar=replaced
The -X flag effectively replaces the initial value,
and -literals adds code to be run at init time:
var myVar = "replaced"
func init() { myVar = func() string { ... } }
Since the init func runs later, -literals breaks -X.
To avoid that problem,
don't obfuscate literals whose variables are set via -ldflags=-X.
We also leave TODOs about obfuscating those in the future,
but we're also leaving regression tests to ensure we get it right.
Fixes #323.
3 years ago
|
|
|
|
// TODO: unify this logic with the -X handling when using -literals.
|
|
|
|
|
// We should be able to handle both cases via the syntax tree.
|
|
|
|
|
//
|
|
|
|
|
// Make sure -X works with obfuscated identifiers.
|
|
|
|
|
// To cover both obfuscated and non-obfuscated names,
|
|
|
|
|
// duplicate each flag with a obfuscated version.
|
|
|
|
|
flagValueIter(flags, "-X", func(val string) {
|
|
|
|
|
// val is in the form of "foo.com/bar.name=value".
|
|
|
|
|
fullName, stringValue, found := strings.Cut(val, "=")
|
|
|
|
|
if !found {
|
|
|
|
|
return // invalid
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// fullName is "foo.com/bar.name"
|
|
|
|
|
i := strings.LastIndexByte(fullName, '.')
|
|
|
|
|
path, name := fullName[:i], fullName[i+1:]
|
|
|
|
|
|
refactor "current package" with TOOLEXEC_IMPORTPATH (#266)
Now that we've dropped support for Go 1.15.x, we can finally rely on
this environment variable for toolexec calls, present in Go 1.16.
Before, we had hacky ways of trying to figure out the current package's
import path, mostly from the -p flag. The biggest rough edge there was
that, for main packages, that was simply the package name, and not its
full import path.
To work around that, we had a restriction on a single main package, so
we could work around that issue. That restriction is now gone.
The new code is simpler, especially because we can set curPkg in a
single place for all toolexec transform funcs.
Since we can always rely on curPkg not being nil now, we can also start
reusing listedPackage.Private and avoid the majority of repeated calls
to isPrivate. The function is cheap, but still not free.
isPrivate itself can also get simpler. We no longer have to worry about
the "main" edge case. Plus, the sanity check for invalid package paths
is now unnecessary; we only got malformed paths from goobj2, and we now
require exact matches with the ImportPath field from "go list -json".
Another effect of clearing up the "main" edge case is that -debugdir now
uses the right directory for main packages. We also start using
consistent debugdir paths in the tests, for the sake of being easier to
read and maintain.
Finally, note that commandReverse did not need the extra call to "go
list -toolexec", as the "shared" call stored in the cache is enough. We
still call toolexecCmd to get said cache, which should probably be
simplified in a future PR.
While at it, replace the use of the "-std" compiler flag with the
Standard field from "go list -json".
4 years ago
|
|
|
|
// If the package path is "main", it's the current top-level
|
|
|
|
|
// package we are linking.
|
|
|
|
|
// Otherwise, find it in the cache.
|
|
|
|
|
lpkg := curPkg
|
|
|
|
|
if path != "main" {
|
|
|
|
|
lpkg = cache.ListedPackages[path]
|
|
|
|
|
}
|
ignore -ldflags=-X flags mentioning unknown packages
That would panic, since the *listedPackage would be nil for a package
path we aren't aware of:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x88 pc=0x126b57d]
goroutine 1 [running]:
main.transformLink.func1(0x7ffeefbff28b, 0x5d)
mvdan.cc/garble@v0.0.0-20210302140807-b03cd08c0946/main.go:1260 +0x17d
main.flagValueIter(0xc0000a8e20, 0x2f, 0x2f, 0x12e278e, 0x2, 0xc000129e28)
mvdan.cc/garble@v0.0.0-20210302140807-b03cd08c0946/main.go:1410 +0x1e9
main.transformLink(0xc0000a8e20, 0x30, 0x36, 0x4, 0xc000114648, 0x23, 0x12dfd60, 0x0)
mvdan.cc/garble@v0.0.0-20210302140807-b03cd08c0946/main.go:1241 +0x1b9
main.mainErr(0xc0000a8e10, 0x31, 0x37, 0x37, 0x0)
mvdan.cc/garble@v0.0.0-20210302140807-b03cd08c0946/main.go:287 +0x389
main.main1(0xc000096058)
mvdan.cc/garble@v0.0.0-20210302140807-b03cd08c0946/main.go:150 +0xe7
main.main()
mvdan.cc/garble@v0.0.0-20210302140807-b03cd08c0946/main.go:83 +0x25
The linker ignores such unknown references, so we should too.
Fixes #259.
4 years ago
|
|
|
|
if lpkg == nil {
|
|
|
|
|
// We couldn't find the package.
|
|
|
|
|
// Perhaps a typo, perhaps not part of the build.
|
|
|
|
|
// cmd/link ignores those, so we should too.
|
|
|
|
|
return
|
|
|
|
|
}
|
refactor "current package" with TOOLEXEC_IMPORTPATH (#266)
Now that we've dropped support for Go 1.15.x, we can finally rely on
this environment variable for toolexec calls, present in Go 1.16.
Before, we had hacky ways of trying to figure out the current package's
import path, mostly from the -p flag. The biggest rough edge there was
that, for main packages, that was simply the package name, and not its
full import path.
To work around that, we had a restriction on a single main package, so
we could work around that issue. That restriction is now gone.
The new code is simpler, especially because we can set curPkg in a
single place for all toolexec transform funcs.
Since we can always rely on curPkg not being nil now, we can also start
reusing listedPackage.Private and avoid the majority of repeated calls
to isPrivate. The function is cheap, but still not free.
isPrivate itself can also get simpler. We no longer have to worry about
the "main" edge case. Plus, the sanity check for invalid package paths
is now unnecessary; we only got malformed paths from goobj2, and we now
require exact matches with the ImportPath field from "go list -json".
Another effect of clearing up the "main" edge case is that -debugdir now
uses the right directory for main packages. We also start using
consistent debugdir paths in the tests, for the sake of being easier to
read and maintain.
Finally, note that commandReverse did not need the extra call to "go
list -toolexec", as the "shared" call stored in the cache is enough. We
still call toolexecCmd to get said cache, which should probably be
simplified in a future PR.
While at it, replace the use of the "-std" compiler flag with the
Standard field from "go list -json".
4 years ago
|
|
|
|
// As before, the main package must remain as "main".
|
|
|
|
|
newPath := path
|
|
|
|
|
if path != "main" {
|
|
|
|
|
newPath = lpkg.obfuscatedImportPath()
|
reimplement import path obfuscation without goobj2 (#242)
We used to rely on a parallel implementation of an object file parser
and writer to be able to obfuscate import paths. After compiling each
package, we would parse the object file, replace the import paths, and
write the updated object file in-place.
That worked well, in most cases. Unfortunately, it had some flaws:
* Complexity. Even when most of the code is maintained in a separate
module, the import_obfuscation.go file was still close to a thousand
lines of code.
* Go compatibility. The object file format changes between Go releases,
so we were supporting Go 1.15, but not 1.16. Fixing the object file
package to work with 1.16 would probably break 1.15 support.
* Bugs. For example, we recently had to add a workaround for #224, since
import paths containing dots after the domain would end up escaped.
Another example is #190, which seems to be caused by the object file
parser or writer corrupting the compiled code and causing segfaults in
some rare edge cases.
Instead, let's drop that method entirely, and force the compiler and
linker to do the work for us. The steps necessary when compiling a
package to obfuscate are:
1) Replace its "package foo" lines with the obfuscated package path. No
need to separate the package path and name, since the obfuscated path
does not contain slashes.
2) Replace the "-p pkg/foo" flag with the obfuscated path.
3) Replace the "import" spec lines with the obfuscated package paths,
for those dependencies which were obfuscated.
4) Replace the "-importcfg [...]" file with a version that uses the
obfuscated paths instead.
The linker also needs that last step, since it also uses an importcfg
file to find object files.
There are three noteworthy drawbacks to this new method:
1) Since we no longer write object files, we can't use them to store
data to be cached. As such, the -debugdir flag goes back to using the
"-a" build flag to always rebuild all packages. On the plus side,
that caching didn't work very well; see #176.
2) The package name "main" remains in all declarations under it, not
just "func main", since we can only rename entire packages. This
seems fine, as it gives little information to the end user.
3) The -tiny mode no longer sets all lines to 0, since it did that by
modifying object files. As a temporary measure, we instead set all
top-level declarations to be on line 1. A TODO is added to hopefully
improve this again in the near future.
The upside is that we get rid of all the issues mentioned before. Plus,
garble now nearly works with Go 1.16, with the exception of two very
minor bugs that look fixable. A follow-up PR will take care of that and
start testing on 1.16.
Fixes #176.
Fixes #190.
4 years ago
|
|
|
|
}
|
|
|
|
|
newName := hashWithPackage(lpkg, name)
|
|
|
|
|
flags = append(flags, fmt.Sprintf("-X=%s.%s=%s", newPath, newName, stringValue))
|
|
|
|
|
})
|
|
|
|
|
|
update support for Go 1.17 in time for beta1
Back in early April we added initial support for Go 1.17,
working on a commit from master at that time. For that to work, we just
needed to add a couple of packages to runtimeRelated and tweak printFile
a bit to not break the new "//go:build" directives.
A significant amount of changes have landed since, though, and the tests
broke in multiple ways.
Most notably, the new register ABI is enabled by default for GOOS=amd64.
That affected garble indirectly in two ways: there's a new internal
package to add to runtimeRelated, and we must make reverse.txt more
clever in making its output constant across ABIs.
Another noticeable change is that Go 1.17 changes how its own version is
injected into the runtime package. It used to be via a constant in
runtime/internal/sys, such as:
const TheVersion = `devel ...`
Since we couldn't override such constants via the linker's -X flag,
we had to directly alter the declaration while compiling.
Thankfully, Go 1.17 simply uses a "var buildVersion string" in the
runtime package, and its value is injected by the linker.
This means we can now override it with the linker's -X flag.
We make the code to alter TheVersion for Go 1.16 a bit more clever,
to not break the package when building with Go 1.17.
Finally, our hack to work around ambiguous TOOLEXEC_IMPORTPATH values
now only kicks in for non-test packages, since Go 1.17 includes our
upstream fix. Otherwise, some tests would end up with the ".test"
variant suffix added a second time:
test/bar [test/bar.test] [test/bar [test/bar.test].test]
All the code to keep compatibility with Go 1.16.x remains in place.
We're still leaving TODOs to remind ourselves to remove it or simplify
it once we remove support for 1.16.x.
The 1.17 development freeze has already been in place for a month,
and beta1 is due to come this week, so it's unlikely that Go will change
in any considerable way at this point. Hence, we can say that support
for 1.17 is done.
Fixes #347.
4 years ago
|
|
|
|
// Starting in Go 1.17, Go's version is implicitly injected by the linker.
|
|
|
|
|
// It's the same method as -X, so we can override it with an extra flag.
|
|
|
|
|
flags = append(flags, "-X=runtime.buildVersion=unknown")
|
|
|
|
|
|
|
|
|
|
// Ensure we strip the -buildid flag, to not leak any build IDs for the
|
|
|
|
|
// link operation or the main package's compilation.
|
|
|
|
|
flags = flagSetValue(flags, "-buildid", "")
|
|
|
|
|
|
|
|
|
|
// Strip debug information and symbol tables.
|
|
|
|
|
flags = append(flags, "-w", "-s")
|
reimplement import path obfuscation without goobj2 (#242)
We used to rely on a parallel implementation of an object file parser
and writer to be able to obfuscate import paths. After compiling each
package, we would parse the object file, replace the import paths, and
write the updated object file in-place.
That worked well, in most cases. Unfortunately, it had some flaws:
* Complexity. Even when most of the code is maintained in a separate
module, the import_obfuscation.go file was still close to a thousand
lines of code.
* Go compatibility. The object file format changes between Go releases,
so we were supporting Go 1.15, but not 1.16. Fixing the object file
package to work with 1.16 would probably break 1.15 support.
* Bugs. For example, we recently had to add a workaround for #224, since
import paths containing dots after the domain would end up escaped.
Another example is #190, which seems to be caused by the object file
parser or writer corrupting the compiled code and causing segfaults in
some rare edge cases.
Instead, let's drop that method entirely, and force the compiler and
linker to do the work for us. The steps necessary when compiling a
package to obfuscate are:
1) Replace its "package foo" lines with the obfuscated package path. No
need to separate the package path and name, since the obfuscated path
does not contain slashes.
2) Replace the "-p pkg/foo" flag with the obfuscated path.
3) Replace the "import" spec lines with the obfuscated package paths,
for those dependencies which were obfuscated.
4) Replace the "-importcfg [...]" file with a version that uses the
obfuscated paths instead.
The linker also needs that last step, since it also uses an importcfg
file to find object files.
There are three noteworthy drawbacks to this new method:
1) Since we no longer write object files, we can't use them to store
data to be cached. As such, the -debugdir flag goes back to using the
"-a" build flag to always rebuild all packages. On the plus side,
that caching didn't work very well; see #176.
2) The package name "main" remains in all declarations under it, not
just "func main", since we can only rename entire packages. This
seems fine, as it gives little information to the end user.
3) The -tiny mode no longer sets all lines to 0, since it did that by
modifying object files. As a temporary measure, we instead set all
top-level declarations to be on line 1. A TODO is added to hopefully
improve this again in the near future.
The upside is that we get rid of all the issues mentioned before. Plus,
garble now nearly works with Go 1.16, with the exception of two very
minor bugs that look fixable. A follow-up PR will take care of that and
start testing on 1.16.
Fixes #176.
Fixes #190.
4 years ago
|
|
|
|
|
|
|
|
|
flags = flagSetValue(flags, "-importcfg", newImportCfg)
|
|
|
|
|
return append(flags, args...), nil
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
func splitFlagsFromArgs(all []string) (flags, args []string) {
|
|
|
|
|
for i := 0; i < len(all); i++ {
|
|
|
|
|
arg := all[i]
|
|
|
|
|
if !strings.HasPrefix(arg, "-") {
|
always use the compiler's -dwarf=false flag (#96)
First, our original append line was completely ineffective; we never
used that "flags" slice again. Second, we only attempted to use the flag
when we obfuscated a package.
In fact, we never care about debugging information here, so for any
package we compile, we can add "-dwarf=false". At the moment, we compile
all packages, even if they aren't to be obfuscated, due to the lack of
access to the build cache.
As such, we save a significant amount of work. The numbers below were
obtained on a quiet machine with "go test -bench=. -benchtime=10x", six
times before and after the change.
name old time/op new time/op delta
Build-8 2.06s ± 4% 1.87s ± 2% -9.21% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build-8 1.51s ± 2% 1.46s ± 1% -3.12% (p=0.004 n=6+5)
name old user-time/op new user-time/op delta
Build-8 11.9s ± 2% 10.8s ± 1% -8.71% (p=0.002 n=6+6)
While at it, only do CI builds on pushes and PRs to the master branch,
so that my PRs created from the same repo don't trigger duplicate
builds.
5 years ago
|
|
|
|
return all[:i:i], all[i:]
|
|
|
|
|
}
|
|
|
|
|
if booleanFlags[arg] || strings.Contains(arg, "=") {
|
|
|
|
|
// Either "-bool" or "-name=value".
|
|
|
|
|
continue
|
|
|
|
|
}
|
|
|
|
|
// "-name value", so the next arg is part of this flag.
|
|
|
|
|
i++
|
|
|
|
|
}
|
|
|
|
|
return all, nil
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
func alterTrimpath(flags []string) []string {
|
avoid reproducibility issues with full rebuilds
We were using temporary filenames for modified Go and assembly files.
For example, an obfuscated "encoding/json/encode.go" would end up as:
/tmp/garble-shared123/encode.go.456.go
where "123" and "456" are random numbers, usually longer.
This was usually fine for two reasons:
1) We would add "/tmp/garble-shared123/" to -trimpath, so the temporary
directory and its random number would be invisible.
2) We would add "//line" directives to the source files, replacing
the filename with obfuscated versions excluding any random number.
Unfortunately, this broke in multiple ways. Most notably, assembly files
do not have any line directives, and it's not clear that there's any
support for them. So the random number in their basename could end up in
the binary, breaking reproducibility.
Another issue is that the -trimpath addition described above was only
done for cmd/compile, not cmd/asm, so assembly filenames included the
randomized temporary directory.
To fix the issues above, the same "encoding/json/encode.go" would now
end up as:
/tmp/garble-shared123/encoding/json/encode.go
Such a path is still unique even though the "456" random number is gone,
as import paths are unique within a single build.
This fixes issues with the base name of each file, so we no longer rely
on line directives as the only way to remove the second original random
number.
We still rely on -trimpath to get rid of the temporary directory in
filenames. To fix its problem with assembly files, also amend the
-trimpath flag when running the assembler tool.
Finally, add a test that reproducible builds still work when a full
rebuild is done. We choose goprivate.txt for such a test as its
stdimporter package imports a number of std packages, including uses of
assembly and cgo.
For the time being, we don't use such a "full rebuild" reproducibility
test in other test scripts, as this step is expensive, rebuilding many
packages from scratch.
This issue went unnoticed for over a year because such random numbers
"123" and "456" were created when a package was obfuscated, and that
only happened once per package version as long as the build cache was
kept intact.
When clearing the build cache, or forcing a rebuild with -a, one gets
new random numbers, and thus a different binary resulting from the same
build input. That's not something that most users would do regularly,
and our tests did not cover that edge case either, until now.
Fixes #328.
4 years ago
|
|
|
|
trimpath := flagValue(flags, "-trimpath")
|
|
|
|
|
|
|
|
|
|
// Add our temporary dir to the beginning of -trimpath, so that we don't
|
|
|
|
|
// leak temporary dirs. Needs to be at the beginning, since there may be
|
|
|
|
|
// shorter prefixes later in the list, such as $PWD if TMPDIR=$PWD/tmp.
|
|
|
|
|
return flagSetValue(flags, "-trimpath", sharedTempDir+"=>;"+trimpath)
|
avoid reproducibility issues with full rebuilds
We were using temporary filenames for modified Go and assembly files.
For example, an obfuscated "encoding/json/encode.go" would end up as:
/tmp/garble-shared123/encode.go.456.go
where "123" and "456" are random numbers, usually longer.
This was usually fine for two reasons:
1) We would add "/tmp/garble-shared123/" to -trimpath, so the temporary
directory and its random number would be invisible.
2) We would add "//line" directives to the source files, replacing
the filename with obfuscated versions excluding any random number.
Unfortunately, this broke in multiple ways. Most notably, assembly files
do not have any line directives, and it's not clear that there's any
support for them. So the random number in their basename could end up in
the binary, breaking reproducibility.
Another issue is that the -trimpath addition described above was only
done for cmd/compile, not cmd/asm, so assembly filenames included the
randomized temporary directory.
To fix the issues above, the same "encoding/json/encode.go" would now
end up as:
/tmp/garble-shared123/encoding/json/encode.go
Such a path is still unique even though the "456" random number is gone,
as import paths are unique within a single build.
This fixes issues with the base name of each file, so we no longer rely
on line directives as the only way to remove the second original random
number.
We still rely on -trimpath to get rid of the temporary directory in
filenames. To fix its problem with assembly files, also amend the
-trimpath flag when running the assembler tool.
Finally, add a test that reproducible builds still work when a full
rebuild is done. We choose goprivate.txt for such a test as its
stdimporter package imports a number of std packages, including uses of
assembly and cgo.
For the time being, we don't use such a "full rebuild" reproducibility
test in other test scripts, as this step is expensive, rebuilding many
packages from scratch.
This issue went unnoticed for over a year because such random numbers
"123" and "456" were created when a package was obfuscated, and that
only happened once per package version as long as the build cache was
kept intact.
When clearing the build cache, or forcing a rebuild with -a, one gets
new random numbers, and thus a different binary resulting from the same
build input. That's not something that most users would do regularly,
and our tests did not cover that edge case either, until now.
Fixes #328.
4 years ago
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// forwardBuildFlags is obtained from 'go help build' as of Go 1.18beta1.
|
fail if we are unexpectedly overwriting files (#418)
While investigating a bug report,
I noticed that garble was writing to the same temp file twice.
At best, writing to the same path on disk twice is wasteful,
as the design is careful to be deterministic and use unique paths.
At worst, the two writes could cause races at the filesystem level.
To prevent either of those situations,
we now create files with os.OpenFile and os.O_EXCL,
meaning that we will error if the file already exists.
That change uncovered a number of such unintended cases.
First, transformAsm would write obfuscated Go files twice.
This is because the Go toolchain actually runs:
[...]/asm -gensymabis [...] foo.s bar.s
[...]/asm [...] foo.s bar.s
That is, the first run is only meant to generate symbol ABIs,
which are then used by the compiler.
We need to obfuscate at that first stage,
because the symbol ABI descriptions need to use obfuscated names.
However, having already obfuscated the assembly on the first stage,
there is no need to do so again on the second stage.
If we detect gensymabis is missing, we simply reuse the previous files.
This first situation doesn't seem racy,
but obfuscating the Go assembly files twice is certainly unnecessary.
Second, saveKnownReflectAPIs wrote a gob file to the build cache.
Since the build cache can be kept between builds,
and since the build cache uses reproducible paths for each build,
running the same "garble build" twice could overwrite those files.
This could actually cause races at the filesystem level;
if two concurrent builds write to the same gob file on disk,
one of them could end up using a partially-written file.
Note that this is the only of the three cases not using temporary files.
As such, it is expected that the file may already exist.
In such a case, we simply avoid overwriting it rather than failing.
Third, when "garble build -a" was used,
and when we needed an export file not listed in importcfg,
we would end up calling roughly:
go list -export -toolexec=garble -a <dependency>
This meant we would re-build and re-obfuscate those packages.
Which is unfortunate, because the parent process already did via:
go build -toolexec=garble -a <main>
The repeated dependency builds tripped the new os.O_EXCL check,
as we would try to overwrite the same obfuscated Go files.
Beyond being wasteful, this could again cause subtle filesystem races.
To fix the problem, avoid passing flags like "-a" to nested go commands.
Overall, we should likely be using safer ways to write to disk,
be it via either atomic writes or locked files.
However, for now, catching duplicate writes is a big step.
I have left a self-assigned TODO for further improvements.
CI on the pull request found a failure on test-gotip.
The failure reproduces on master, so it seems to be related to gotip,
and not a regression introduced by this change.
For now, disable test-gotip until we can investigate.
3 years ago
|
|
|
|
var forwardBuildFlags = map[string]bool{
|
|
|
|
|
// These shouldn't be used in nested cmd/go calls.
|
|
|
|
|
"-a": false,
|
|
|
|
|
"-n": false,
|
|
|
|
|
"-x": false,
|
|
|
|
|
"-v": false,
|
|
|
|
|
|
|
|
|
|
// These are always set by garble.
|
|
|
|
|
"-trimpath": false,
|
|
|
|
|
"-toolexec": false,
|
|
|
|
|
"-buildvcs": false,
|
fail if we are unexpectedly overwriting files (#418)
While investigating a bug report,
I noticed that garble was writing to the same temp file twice.
At best, writing to the same path on disk twice is wasteful,
as the design is careful to be deterministic and use unique paths.
At worst, the two writes could cause races at the filesystem level.
To prevent either of those situations,
we now create files with os.OpenFile and os.O_EXCL,
meaning that we will error if the file already exists.
That change uncovered a number of such unintended cases.
First, transformAsm would write obfuscated Go files twice.
This is because the Go toolchain actually runs:
[...]/asm -gensymabis [...] foo.s bar.s
[...]/asm [...] foo.s bar.s
That is, the first run is only meant to generate symbol ABIs,
which are then used by the compiler.
We need to obfuscate at that first stage,
because the symbol ABI descriptions need to use obfuscated names.
However, having already obfuscated the assembly on the first stage,
there is no need to do so again on the second stage.
If we detect gensymabis is missing, we simply reuse the previous files.
This first situation doesn't seem racy,
but obfuscating the Go assembly files twice is certainly unnecessary.
Second, saveKnownReflectAPIs wrote a gob file to the build cache.
Since the build cache can be kept between builds,
and since the build cache uses reproducible paths for each build,
running the same "garble build" twice could overwrite those files.
This could actually cause races at the filesystem level;
if two concurrent builds write to the same gob file on disk,
one of them could end up using a partially-written file.
Note that this is the only of the three cases not using temporary files.
As such, it is expected that the file may already exist.
In such a case, we simply avoid overwriting it rather than failing.
Third, when "garble build -a" was used,
and when we needed an export file not listed in importcfg,
we would end up calling roughly:
go list -export -toolexec=garble -a <dependency>
This meant we would re-build and re-obfuscate those packages.
Which is unfortunate, because the parent process already did via:
go build -toolexec=garble -a <main>
The repeated dependency builds tripped the new os.O_EXCL check,
as we would try to overwrite the same obfuscated Go files.
Beyond being wasteful, this could again cause subtle filesystem races.
To fix the problem, avoid passing flags like "-a" to nested go commands.
Overall, we should likely be using safer ways to write to disk,
be it via either atomic writes or locked files.
However, for now, catching duplicate writes is a big step.
I have left a self-assigned TODO for further improvements.
CI on the pull request found a failure on test-gotip.
The failure reproduces on master, so it seems to be related to gotip,
and not a regression introduced by this change.
For now, disable test-gotip until we can investigate.
3 years ago
|
|
|
|
|
|
|
|
|
"-p": true,
|
|
|
|
|
"-race": true,
|
|
|
|
|
"-msan": true,
|
|
|
|
|
"-asan": true,
|
|
|
|
|
"-work": true,
|
|
|
|
|
"-asmflags": true,
|
|
|
|
|
"-buildmode": true,
|
|
|
|
|
"-compiler": true,
|
|
|
|
|
"-gccgoflags": true,
|
|
|
|
|
"-gcflags": true,
|
|
|
|
|
"-installsuffix": true,
|
|
|
|
|
"-ldflags": true,
|
|
|
|
|
"-linkshared": true,
|
|
|
|
|
"-mod": true,
|
|
|
|
|
"-modcacherw": true,
|
|
|
|
|
"-modfile": true,
|
|
|
|
|
"-pkgdir": true,
|
|
|
|
|
"-tags": true,
|
|
|
|
|
"-workfile": true,
|
|
|
|
|
"-overlay": true,
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// booleanFlags is obtained from 'go help build' and 'go help testflag' as of Go 1.19beta1.
|
|
|
|
|
var booleanFlags = map[string]bool{
|
|
|
|
|
// Shared build flags.
|
|
|
|
|
"-a": true,
|
|
|
|
|
"-i": true,
|
|
|
|
|
"-n": true,
|
|
|
|
|
"-v": true,
|
|
|
|
|
"-work": true,
|
|
|
|
|
"-x": true,
|
|
|
|
|
"-race": true,
|
|
|
|
|
"-msan": true,
|
|
|
|
|
"-asan": true,
|
|
|
|
|
"-linkshared": true,
|
|
|
|
|
"-modcacherw": true,
|
|
|
|
|
"-trimpath": true,
|
|
|
|
|
"-buildvcs": true,
|
|
|
|
|
|
|
|
|
|
// Test flags (TODO: support its special -args flag)
|
|
|
|
|
"-c": true,
|
|
|
|
|
"-json": true,
|
|
|
|
|
"-cover": true,
|
|
|
|
|
"-failfast": true,
|
|
|
|
|
"-short": true,
|
|
|
|
|
"-benchmem": true,
|
|
|
|
|
}
|
|
|
|
|
|
fail if we are unexpectedly overwriting files (#418)
While investigating a bug report,
I noticed that garble was writing to the same temp file twice.
At best, writing to the same path on disk twice is wasteful,
as the design is careful to be deterministic and use unique paths.
At worst, the two writes could cause races at the filesystem level.
To prevent either of those situations,
we now create files with os.OpenFile and os.O_EXCL,
meaning that we will error if the file already exists.
That change uncovered a number of such unintended cases.
First, transformAsm would write obfuscated Go files twice.
This is because the Go toolchain actually runs:
[...]/asm -gensymabis [...] foo.s bar.s
[...]/asm [...] foo.s bar.s
That is, the first run is only meant to generate symbol ABIs,
which are then used by the compiler.
We need to obfuscate at that first stage,
because the symbol ABI descriptions need to use obfuscated names.
However, having already obfuscated the assembly on the first stage,
there is no need to do so again on the second stage.
If we detect gensymabis is missing, we simply reuse the previous files.
This first situation doesn't seem racy,
but obfuscating the Go assembly files twice is certainly unnecessary.
Second, saveKnownReflectAPIs wrote a gob file to the build cache.
Since the build cache can be kept between builds,
and since the build cache uses reproducible paths for each build,
running the same "garble build" twice could overwrite those files.
This could actually cause races at the filesystem level;
if two concurrent builds write to the same gob file on disk,
one of them could end up using a partially-written file.
Note that this is the only of the three cases not using temporary files.
As such, it is expected that the file may already exist.
In such a case, we simply avoid overwriting it rather than failing.
Third, when "garble build -a" was used,
and when we needed an export file not listed in importcfg,
we would end up calling roughly:
go list -export -toolexec=garble -a <dependency>
This meant we would re-build and re-obfuscate those packages.
Which is unfortunate, because the parent process already did via:
go build -toolexec=garble -a <main>
The repeated dependency builds tripped the new os.O_EXCL check,
as we would try to overwrite the same obfuscated Go files.
Beyond being wasteful, this could again cause subtle filesystem races.
To fix the problem, avoid passing flags like "-a" to nested go commands.
Overall, we should likely be using safer ways to write to disk,
be it via either atomic writes or locked files.
However, for now, catching duplicate writes is a big step.
I have left a self-assigned TODO for further improvements.
CI on the pull request found a failure on test-gotip.
The failure reproduces on master, so it seems to be related to gotip,
and not a regression introduced by this change.
For now, disable test-gotip until we can investigate.
3 years ago
|
|
|
|
func filterForwardBuildFlags(flags []string) (filtered []string, firstUnknown string) {
|
|
|
|
|
for i := 0; i < len(flags); i++ {
|
|
|
|
|
arg := flags[i]
|
avoid obfuscating literals set via -ldflags=-X
The -X linker flag sets a string variable to a given value,
which is often used to inject strings such as versions.
The way garble's literal obfuscation works,
we replace string literals with anonymous functions which,
when evaluated, result in the original string.
Both of these features work fine separately,
but when intersecting, they break. For example, given:
var myVar = "original"
[...]
-ldflags=-X=main.myVar=replaced
The -X flag effectively replaces the initial value,
and -literals adds code to be run at init time:
var myVar = "replaced"
func init() { myVar = func() string { ... } }
Since the init func runs later, -literals breaks -X.
To avoid that problem,
don't obfuscate literals whose variables are set via -ldflags=-X.
We also leave TODOs about obfuscating those in the future,
but we're also leaving regression tests to ensure we get it right.
Fixes #323.
3 years ago
|
|
|
|
if strings.HasPrefix(arg, "--") {
|
|
|
|
|
arg = arg[1:] // "--name" to "-name"; keep the short form
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
name, _, _ := strings.Cut(arg, "=") // "-name=value" to "-name"
|
|
|
|
|
|
fail if we are unexpectedly overwriting files (#418)
While investigating a bug report,
I noticed that garble was writing to the same temp file twice.
At best, writing to the same path on disk twice is wasteful,
as the design is careful to be deterministic and use unique paths.
At worst, the two writes could cause races at the filesystem level.
To prevent either of those situations,
we now create files with os.OpenFile and os.O_EXCL,
meaning that we will error if the file already exists.
That change uncovered a number of such unintended cases.
First, transformAsm would write obfuscated Go files twice.
This is because the Go toolchain actually runs:
[...]/asm -gensymabis [...] foo.s bar.s
[...]/asm [...] foo.s bar.s
That is, the first run is only meant to generate symbol ABIs,
which are then used by the compiler.
We need to obfuscate at that first stage,
because the symbol ABI descriptions need to use obfuscated names.
However, having already obfuscated the assembly on the first stage,
there is no need to do so again on the second stage.
If we detect gensymabis is missing, we simply reuse the previous files.
This first situation doesn't seem racy,
but obfuscating the Go assembly files twice is certainly unnecessary.
Second, saveKnownReflectAPIs wrote a gob file to the build cache.
Since the build cache can be kept between builds,
and since the build cache uses reproducible paths for each build,
running the same "garble build" twice could overwrite those files.
This could actually cause races at the filesystem level;
if two concurrent builds write to the same gob file on disk,
one of them could end up using a partially-written file.
Note that this is the only of the three cases not using temporary files.
As such, it is expected that the file may already exist.
In such a case, we simply avoid overwriting it rather than failing.
Third, when "garble build -a" was used,
and when we needed an export file not listed in importcfg,
we would end up calling roughly:
go list -export -toolexec=garble -a <dependency>
This meant we would re-build and re-obfuscate those packages.
Which is unfortunate, because the parent process already did via:
go build -toolexec=garble -a <main>
The repeated dependency builds tripped the new os.O_EXCL check,
as we would try to overwrite the same obfuscated Go files.
Beyond being wasteful, this could again cause subtle filesystem races.
To fix the problem, avoid passing flags like "-a" to nested go commands.
Overall, we should likely be using safer ways to write to disk,
be it via either atomic writes or locked files.
However, for now, catching duplicate writes is a big step.
I have left a self-assigned TODO for further improvements.
CI on the pull request found a failure on test-gotip.
The failure reproduces on master, so it seems to be related to gotip,
and not a regression introduced by this change.
For now, disable test-gotip until we can investigate.
3 years ago
|
|
|
|
buildFlag := forwardBuildFlags[name]
|
|
|
|
|
if buildFlag {
|
|
|
|
|
filtered = append(filtered, arg)
|
|
|
|
|
} else {
|
|
|
|
|
firstUnknown = name
|
|
|
|
|
}
|
|
|
|
|
if booleanFlags[arg] || strings.Contains(arg, "=") {
|
|
|
|
|
// Either "-bool" or "-name=value".
|
|
|
|
|
continue
|
|
|
|
|
}
|
|
|
|
|
// "-name value", so the next arg is part of this flag.
|
|
|
|
|
if i++; buildFlag && i < len(flags) {
|
|
|
|
|
filtered = append(filtered, flags[i])
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
return filtered, firstUnknown
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// splitFlagsFromFiles splits args into a list of flag and file arguments. Since
|
|
|
|
|
// we can't rely on "--" being present, and we don't parse all flags upfront, we
|
|
|
|
|
// rely on finding the first argument that doesn't begin with "-" and that has
|
|
|
|
|
// the extension we expect for the list of paths.
|
|
|
|
|
//
|
|
|
|
|
// This function only makes sense for lower-level tool commands, such as
|
|
|
|
|
// "compile" or "link", since their arguments are predictable.
|
|
|
|
|
//
|
|
|
|
|
// We iterate from the end rather than from the start, to better protect
|
|
|
|
|
// oursrelves from flag arguments that may look like paths, such as:
|
|
|
|
|
//
|
|
|
|
|
// compile [flags...] -p pkg/path.go [more flags...] file1.go file2.go
|
|
|
|
|
//
|
|
|
|
|
// For now, since those confusing flags are always followed by more flags,
|
|
|
|
|
// iterating in reverse order works around them entirely.
|
|
|
|
|
func splitFlagsFromFiles(all []string, ext string) (flags, paths []string) {
|
|
|
|
|
for i := len(all) - 1; i >= 0; i-- {
|
|
|
|
|
arg := all[i]
|
|
|
|
|
if strings.HasPrefix(arg, "-") || !strings.HasSuffix(arg, ext) {
|
|
|
|
|
cutoff := i + 1 // arg is a flag, not a path
|
|
|
|
|
return all[:cutoff:cutoff], all[cutoff:]
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
return nil, all
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// flagValue retrieves the value of a flag such as "-foo", from strings in the
|
|
|
|
|
// list of arguments like "-foo=bar" or "-foo" "bar". If the flag is repeated,
|
|
|
|
|
// the last value is returned.
|
|
|
|
|
func flagValue(flags []string, name string) string {
|
|
|
|
|
lastVal := ""
|
|
|
|
|
flagValueIter(flags, name, func(val string) {
|
|
|
|
|
lastVal = val
|
|
|
|
|
})
|
|
|
|
|
return lastVal
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// flagValueIter retrieves all the values for a flag such as "-foo", like
|
|
|
|
|
// flagValue. The difference is that it allows handling complex flags, such as
|
|
|
|
|
// those whose values compose a list.
|
|
|
|
|
func flagValueIter(flags []string, name string, fn func(string)) {
|
|
|
|
|
for i, arg := range flags {
|
|
|
|
|
if val := strings.TrimPrefix(arg, name+"="); val != arg {
|
|
|
|
|
// -name=value
|
|
|
|
|
fn(val)
|
|
|
|
|
}
|
|
|
|
|
if arg == name { // -name ...
|
|
|
|
|
if i+1 < len(flags) {
|
|
|
|
|
// -name value
|
|
|
|
|
fn(flags[i+1])
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
func flagSetValue(flags []string, name, value string) []string {
|
|
|
|
|
for i, arg := range flags {
|
|
|
|
|
if strings.HasPrefix(arg, name+"=") {
|
|
|
|
|
// -name=value
|
|
|
|
|
flags[i] = name + "=" + value
|
|
|
|
|
return flags
|
|
|
|
|
}
|
|
|
|
|
if arg == name { // -name ...
|
|
|
|
|
if i+1 < len(flags) {
|
|
|
|
|
// -name value
|
|
|
|
|
flags[i+1] = value
|
|
|
|
|
return flags
|
|
|
|
|
}
|
|
|
|
|
return flags
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
return append(flags, name+"="+value)
|
|
|
|
|
}
|
|
|
|
|
|
use "go env -json" to collect env info all at once
In the worst case scenario, when GOPRIVATE isn't set at all, we would
run these three commands:
* "go env GOPRIVATE", to fetch GOPRIVATE itself
* "go list -m", for GOPRIVATE's fallback
* "go version", to check the version of Go being used
Now that we support Go 1.16 and later, all these three can be obtained
via "go env -json":
$ go env -json GOPRIVATE GOMOD GOVERSION
{
"GOMOD": "/home/mvdan/src/garble/go.mod",
"GOPRIVATE": "",
"GOVERSION": "go1.16.3"
}
Note that we don't get the module path directly, but we can use the
x/mod/modfile Go API to parse it from the GOMOD file cheaply.
Notably, this also simplifies our Go version checking logic, as now we
get just the version string without the "go version" prefix and
"GOOS/GOARCH" suffix we don't care about.
This makes our code a bit more maintainable and robust. When running a
short incremental build, we can also see a small speed-up, as saving two
"go" invocations can save a few milliseconds:
name old time/op new time/op delta
Build/Cache-8 168ms ± 0% 166ms ± 1% -1.26% (p=0.009 n=6+6)
name old bin-B new bin-B delta
Build/Cache-8 6.36M ± 0% 6.36M ± 0% +0.12% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build/Cache-8 222ms ± 2% 219ms ± 3% ~ (p=0.589 n=6+6)
name old user-time/op new user-time/op delta
Build/Cache-8 857ms ± 1% 846ms ± 1% -1.31% (p=0.041 n=6+6)
4 years ago
|
|
|
|
func fetchGoEnv() error {
|
|
|
|
|
out, err := exec.Command("go", "env", "-json",
|
default to GOGARBLE=*, stop using GOPRIVATE
We can drop the code that kicked in when GOGARBLE was empty.
We can also add the value in addGarbleToHash unconditionally,
as we never allow it to be empty.
In the tests, remove all GOGARBLE lines where it just meant "obfuscate
everything" or "obfuscate the entire main module".
cgo.txtar had "obfuscate everything" as a separate step,
so remove it entirely.
linkname.txtar started failing because the imported package did not
import strings, so listPackage errored out. This wasn't a problem when
strings itself wasn't obfuscated, as transformLinkname silently left
strings.IndexByte untouched. It is a problem when IndexByte does get
obfuscated. Make that kind of listPackage error visible, and fix it.
reflect.txtar started failing with "unreachable method" runtime throws.
It's not clear to me why; it appears that GOGARBLE=* makes the linker
think that ExportedMethodName is suddenly unreachable.
Work around the problem by making the method explicitly reachable,
and leave a TODO as a reminder to investigate.
Finally, gogarble.txtar no longer needs to test for GOPRIVATE.
The rest of the test is left the same, as we still want the various
values for GOGARBLE to continue to work just like before.
Fixes #594.
2 years ago
|
|
|
|
// Keep in sync with sharedCache.GoEnv.
|
|
|
|
|
"GOOS", "GOMOD", "GOVERSION",
|
use "go env -json" to collect env info all at once
In the worst case scenario, when GOPRIVATE isn't set at all, we would
run these three commands:
* "go env GOPRIVATE", to fetch GOPRIVATE itself
* "go list -m", for GOPRIVATE's fallback
* "go version", to check the version of Go being used
Now that we support Go 1.16 and later, all these three can be obtained
via "go env -json":
$ go env -json GOPRIVATE GOMOD GOVERSION
{
"GOMOD": "/home/mvdan/src/garble/go.mod",
"GOPRIVATE": "",
"GOVERSION": "go1.16.3"
}
Note that we don't get the module path directly, but we can use the
x/mod/modfile Go API to parse it from the GOMOD file cheaply.
Notably, this also simplifies our Go version checking logic, as now we
get just the version string without the "go version" prefix and
"GOOS/GOARCH" suffix we don't care about.
This makes our code a bit more maintainable and robust. When running a
short incremental build, we can also see a small speed-up, as saving two
"go" invocations can save a few milliseconds:
name old time/op new time/op delta
Build/Cache-8 168ms ± 0% 166ms ± 1% -1.26% (p=0.009 n=6+6)
name old bin-B new bin-B delta
Build/Cache-8 6.36M ± 0% 6.36M ± 0% +0.12% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build/Cache-8 222ms ± 2% 219ms ± 3% ~ (p=0.589 n=6+6)
name old user-time/op new user-time/op delta
Build/Cache-8 857ms ± 1% 846ms ± 1% -1.31% (p=0.041 n=6+6)
4 years ago
|
|
|
|
).CombinedOutput()
|
|
|
|
|
if err != nil {
|
|
|
|
|
// TODO: cover this in the tests.
|
|
|
|
|
fmt.Fprintf(os.Stderr, `Can't find the Go toolchain: %v
|
use "go env -json" to collect env info all at once
In the worst case scenario, when GOPRIVATE isn't set at all, we would
run these three commands:
* "go env GOPRIVATE", to fetch GOPRIVATE itself
* "go list -m", for GOPRIVATE's fallback
* "go version", to check the version of Go being used
Now that we support Go 1.16 and later, all these three can be obtained
via "go env -json":
$ go env -json GOPRIVATE GOMOD GOVERSION
{
"GOMOD": "/home/mvdan/src/garble/go.mod",
"GOPRIVATE": "",
"GOVERSION": "go1.16.3"
}
Note that we don't get the module path directly, but we can use the
x/mod/modfile Go API to parse it from the GOMOD file cheaply.
Notably, this also simplifies our Go version checking logic, as now we
get just the version string without the "go version" prefix and
"GOOS/GOARCH" suffix we don't care about.
This makes our code a bit more maintainable and robust. When running a
short incremental build, we can also see a small speed-up, as saving two
"go" invocations can save a few milliseconds:
name old time/op new time/op delta
Build/Cache-8 168ms ± 0% 166ms ± 1% -1.26% (p=0.009 n=6+6)
name old bin-B new bin-B delta
Build/Cache-8 6.36M ± 0% 6.36M ± 0% +0.12% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build/Cache-8 222ms ± 2% 219ms ± 3% ~ (p=0.589 n=6+6)
name old user-time/op new user-time/op delta
Build/Cache-8 857ms ± 1% 846ms ± 1% -1.31% (p=0.041 n=6+6)
4 years ago
|
|
|
|
|
|
|
|
|
This is likely due to Go not being installed/setup correctly.
|
use "go env -json" to collect env info all at once
In the worst case scenario, when GOPRIVATE isn't set at all, we would
run these three commands:
* "go env GOPRIVATE", to fetch GOPRIVATE itself
* "go list -m", for GOPRIVATE's fallback
* "go version", to check the version of Go being used
Now that we support Go 1.16 and later, all these three can be obtained
via "go env -json":
$ go env -json GOPRIVATE GOMOD GOVERSION
{
"GOMOD": "/home/mvdan/src/garble/go.mod",
"GOPRIVATE": "",
"GOVERSION": "go1.16.3"
}
Note that we don't get the module path directly, but we can use the
x/mod/modfile Go API to parse it from the GOMOD file cheaply.
Notably, this also simplifies our Go version checking logic, as now we
get just the version string without the "go version" prefix and
"GOOS/GOARCH" suffix we don't care about.
This makes our code a bit more maintainable and robust. When running a
short incremental build, we can also see a small speed-up, as saving two
"go" invocations can save a few milliseconds:
name old time/op new time/op delta
Build/Cache-8 168ms ± 0% 166ms ± 1% -1.26% (p=0.009 n=6+6)
name old bin-B new bin-B delta
Build/Cache-8 6.36M ± 0% 6.36M ± 0% +0.12% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build/Cache-8 222ms ± 2% 219ms ± 3% ~ (p=0.589 n=6+6)
name old user-time/op new user-time/op delta
Build/Cache-8 857ms ± 1% 846ms ± 1% -1.31% (p=0.041 n=6+6)
4 years ago
|
|
|
|
|
|
|
|
|
To install Go, see: https://go.dev/doc/install
|
use "go env -json" to collect env info all at once
In the worst case scenario, when GOPRIVATE isn't set at all, we would
run these three commands:
* "go env GOPRIVATE", to fetch GOPRIVATE itself
* "go list -m", for GOPRIVATE's fallback
* "go version", to check the version of Go being used
Now that we support Go 1.16 and later, all these three can be obtained
via "go env -json":
$ go env -json GOPRIVATE GOMOD GOVERSION
{
"GOMOD": "/home/mvdan/src/garble/go.mod",
"GOPRIVATE": "",
"GOVERSION": "go1.16.3"
}
Note that we don't get the module path directly, but we can use the
x/mod/modfile Go API to parse it from the GOMOD file cheaply.
Notably, this also simplifies our Go version checking logic, as now we
get just the version string without the "go version" prefix and
"GOOS/GOARCH" suffix we don't care about.
This makes our code a bit more maintainable and robust. When running a
short incremental build, we can also see a small speed-up, as saving two
"go" invocations can save a few milliseconds:
name old time/op new time/op delta
Build/Cache-8 168ms ± 0% 166ms ± 1% -1.26% (p=0.009 n=6+6)
name old bin-B new bin-B delta
Build/Cache-8 6.36M ± 0% 6.36M ± 0% +0.12% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build/Cache-8 222ms ± 2% 219ms ± 3% ~ (p=0.589 n=6+6)
name old user-time/op new user-time/op delta
Build/Cache-8 857ms ± 1% 846ms ± 1% -1.31% (p=0.041 n=6+6)
4 years ago
|
|
|
|
`, err)
|
|
|
|
|
return errJustExit(1)
|
use "go env -json" to collect env info all at once
In the worst case scenario, when GOPRIVATE isn't set at all, we would
run these three commands:
* "go env GOPRIVATE", to fetch GOPRIVATE itself
* "go list -m", for GOPRIVATE's fallback
* "go version", to check the version of Go being used
Now that we support Go 1.16 and later, all these three can be obtained
via "go env -json":
$ go env -json GOPRIVATE GOMOD GOVERSION
{
"GOMOD": "/home/mvdan/src/garble/go.mod",
"GOPRIVATE": "",
"GOVERSION": "go1.16.3"
}
Note that we don't get the module path directly, but we can use the
x/mod/modfile Go API to parse it from the GOMOD file cheaply.
Notably, this also simplifies our Go version checking logic, as now we
get just the version string without the "go version" prefix and
"GOOS/GOARCH" suffix we don't care about.
This makes our code a bit more maintainable and robust. When running a
short incremental build, we can also see a small speed-up, as saving two
"go" invocations can save a few milliseconds:
name old time/op new time/op delta
Build/Cache-8 168ms ± 0% 166ms ± 1% -1.26% (p=0.009 n=6+6)
name old bin-B new bin-B delta
Build/Cache-8 6.36M ± 0% 6.36M ± 0% +0.12% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build/Cache-8 222ms ± 2% 219ms ± 3% ~ (p=0.589 n=6+6)
name old user-time/op new user-time/op delta
Build/Cache-8 857ms ± 1% 846ms ± 1% -1.31% (p=0.041 n=6+6)
4 years ago
|
|
|
|
}
|
|
|
|
|
if err := json.Unmarshal(out, &cache.GoEnv); err != nil {
|
|
|
|
|
return fmt.Errorf(`cannot unmarshal from "go env -json": %w`, err)
|
|
|
|
|
}
|
deprecate using GOPRIVATE in favor of GOGARBLE (#427)
Piggybacking off of GOPRIVATE is great for a number of reasons:
* People tend to obfuscate private code, whose package paths will
generally be in GOPRIVATE already
* Its meaning and syntax are well understood
* It allows all the flexibility we need without adding our own env var
or config option
However, using GOPRIVATE directly has one main drawback.
It's fairly common to also want to obfuscate public dependencies,
to make the code in private packages even harder to follow.
However, using "GOPRIVATE=*" will result in two main downsides:
* GONOPROXY defaults to GOPRIVATE, so the proxy would be entirely disabled.
Downloading modules, such as when adding or updating dependencies,
or when the local cache is cold, can be less reliable.
* GONOSUMDB defaults to GOPRIVATE, so the sumdb would be entirely disabled.
Adding entries to go.sum, such as when adding or updating dependencies,
can be less secure.
We will continue to consume GOPRIVATE as a fallback,
but we now expect users to set GOGARBLE instead.
The new logic is documented in the README.
While here, rewrite some uses of "private" with "to obfuscate",
to make the code easier to follow and harder to misunderstand.
Fixes #276.
3 years ago
|
|
|
|
cache.GOGARBLE = os.Getenv("GOGARBLE")
|
default to GOGARBLE=*, stop using GOPRIVATE
We can drop the code that kicked in when GOGARBLE was empty.
We can also add the value in addGarbleToHash unconditionally,
as we never allow it to be empty.
In the tests, remove all GOGARBLE lines where it just meant "obfuscate
everything" or "obfuscate the entire main module".
cgo.txtar had "obfuscate everything" as a separate step,
so remove it entirely.
linkname.txtar started failing because the imported package did not
import strings, so listPackage errored out. This wasn't a problem when
strings itself wasn't obfuscated, as transformLinkname silently left
strings.IndexByte untouched. It is a problem when IndexByte does get
obfuscated. Make that kind of listPackage error visible, and fix it.
reflect.txtar started failing with "unreachable method" runtime throws.
It's not clear to me why; it appears that GOGARBLE=* makes the linker
think that ExportedMethodName is suddenly unreachable.
Work around the problem by making the method explicitly reachable,
and leave a TODO as a reminder to investigate.
Finally, gogarble.txtar no longer needs to test for GOPRIVATE.
The rest of the test is left the same, as we still want the various
values for GOGARBLE to continue to work just like before.
Fixes #594.
2 years ago
|
|
|
|
if cache.GOGARBLE == "" {
|
|
|
|
|
cache.GOGARBLE = "*" // we default to obfuscating everything
|
|
|
|
|
}
|
|
|
|
|
return nil
|
|
|
|
|
}
|