You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
garble/shared.go

393 lines
11 KiB
Go

// Copyright (c) 2020, The Garble Authors.
// See LICENSE for licensing information.
package main
import (
"bytes"
"encoding/gob"
"encoding/json"
"fmt"
"os"
"os/exec"
"path/filepath"
"strings"
"time"
ensure the runtime is built in a reproducible way We went to great lengths to ensure garble builds are reproducible. This includes how the tool itself works, as its behavior should be the same given the same inputs. However, we made one crucial mistake with the runtime package. It has go:linkname directives pointing at other packages, and some of those pointed packages aren't its dependencies. Imagine two scenarios where garble builds the runtime package: 1) We run "garble build runtime". The way we handle linkname directives calls listPackage on the target package, to obfuscate the target's import path and object name. However, since we only obtained build info of runtime and its deps, calls for some linknames such as listPackage("sync/atomic") will fail. The linkname directive will leave its target untouched. 2) We run "garble build std". Unlike the first scenario, all listPackage calls issued by runtime's linkname directives will succeed, so its linkname directive targets will be obfuscated. At best, this can result in inconsistent builds, depending on how the runtime package was built. At worst, the mismatching object names can result in errors at link time, if the target packages are actually used. The modified test reproduces the worst case scenario reliably, when the fix is reverted: > env GOCACHE=${WORK}/gocache-empty > garble build -a runtime > garble build -o=out_rebuild ./stdimporter [stderr] # test/main/stdimporter JZzQivnl.NtQJu0H3: relocation target JZzQivnl.iioHinYT not defined JZzQivnl.NtQJu0H3.func9: relocation target JZzQivnl.yz5z0NaH not defined JZzQivnl.(*ypvqhKiQ).String: relocation target JZzQivnl.eVciBQeI not defined JZzQivnl.(*ypvqhKiQ).PkgPath: relocation target JZzQivnl.eVciBQeI not defined [...] The fix consists of two steps. First, if we're building the runtime and listPackage fails on a package, that means we ran into scenario 1 above. To avoid the inconsistency, we fill ListedPackages with "go list [...] std". This means we'll always build runtime as described in scenario 2 above. Second, when building packages other than the runtime, we only allow listPackage to succeed if we're listing a dependency of the current package. This ensures we won't run into similar reproducibility bugs in the future. Finally, re-enable test-gotip on CI since this was the last test flake.
3 years ago
"golang.org/x/mod/module"
)
// sharedCache is shared as a read-only cache between the many garble toolexec
// sub-processes.
start using original action IDs (#251) When we obfuscate a name, what we do is hash the name with the action ID of the package that contains the name. To ensure that the hash changes if the garble tool changes, we used the action ID of the obfuscated build, which is different than the original action ID, as we include garble's own content ID in "go tool compile -V=full" via -toolexec. Let's call that the "obfuscated action ID". Remember that a content ID is roughly the hash of a binary or object file, and an action ID contains the hash of a package's source code plus the content IDs of its dependencies. This had the advantage that it did what we wanted. However, it had one massive drawback: when we compile a package, we only have the obfuscated action IDs of its dependencies. This is because one can't have the content ID of dependent packages before they are built. Usually, this is not a problem, because hashing a foreign name means it comes from a dependency, where we already have the obfuscated action ID. However, that's not always the case. First, go:linkname directives can point to any symbol that ends up in the binary, even if the package is not a dependency. So garble could only support linkname targets belonging to dependencies. This is at the root of why we could not obfuscate the runtime; it contains linkname directives targeting the net package, for example, which depends on runtime. Second, some other places did not have an easy access to obfuscated action IDs, like transformAsm, which had to recover it from a temporary file stored by transformCompile. Plus, this was all pretty expensive, as each toolexec sub-process had to make repeated calls to buildidOf with the object files of dependencies. We even had to use extra calls to "go list" in the case of indirect dependencies, as their export files do not appear in importcfg files. All in all, the old method was complex and expensive. A better mechanism is to use the original action IDs directly, as listed by "go list" without garble in the picture. This would mean that the hashing does not change if garble changes, meaning weaker obfuscation. To regain that property, we define the "garble action ID", which is just the original action ID hashed together with garble's own content ID. This is practically the same as the obfuscated build ID we used before, but since it doesn't go through "go tool compile -V=full" and the obfuscated build itself, we can work out *all* the garble action IDs upfront, before the obfuscated build even starts. This fixes all of our problems. Now we know all garble build IDs upfront, so a bunch of hacks can be entirely removed. Plus, since we know them upfront, we can also cache them and avoid repeated calls to "go tool buildid". While at it, make use of the new BuildID field in Go 1.16's "list -json -export". This avoids the vast majority of "go tool buildid" calls, as the only ones that remain are 2 on the garble binary itself. The numbers for Go 1.16 look very good: name old time/op new time/op delta Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6) name old bin-B new bin-B delta Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6) name old sys-time/op new sys-time/op delta Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6) name old user-time/op new user-time/op delta Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
3 years ago
//
// Note that we fill this cache once from the root process in saveListedPackages,
// store it into a temporary file via gob encoding, and then reuse that file
// in each of the garble toolexec sub-processes.
type sharedCache struct {
fail if we are unexpectedly overwriting files (#418) While investigating a bug report, I noticed that garble was writing to the same temp file twice. At best, writing to the same path on disk twice is wasteful, as the design is careful to be deterministic and use unique paths. At worst, the two writes could cause races at the filesystem level. To prevent either of those situations, we now create files with os.OpenFile and os.O_EXCL, meaning that we will error if the file already exists. That change uncovered a number of such unintended cases. First, transformAsm would write obfuscated Go files twice. This is because the Go toolchain actually runs: [...]/asm -gensymabis [...] foo.s bar.s [...]/asm [...] foo.s bar.s That is, the first run is only meant to generate symbol ABIs, which are then used by the compiler. We need to obfuscate at that first stage, because the symbol ABI descriptions need to use obfuscated names. However, having already obfuscated the assembly on the first stage, there is no need to do so again on the second stage. If we detect gensymabis is missing, we simply reuse the previous files. This first situation doesn't seem racy, but obfuscating the Go assembly files twice is certainly unnecessary. Second, saveKnownReflectAPIs wrote a gob file to the build cache. Since the build cache can be kept between builds, and since the build cache uses reproducible paths for each build, running the same "garble build" twice could overwrite those files. This could actually cause races at the filesystem level; if two concurrent builds write to the same gob file on disk, one of them could end up using a partially-written file. Note that this is the only of the three cases not using temporary files. As such, it is expected that the file may already exist. In such a case, we simply avoid overwriting it rather than failing. Third, when "garble build -a" was used, and when we needed an export file not listed in importcfg, we would end up calling roughly: go list -export -toolexec=garble -a <dependency> This meant we would re-build and re-obfuscate those packages. Which is unfortunate, because the parent process already did via: go build -toolexec=garble -a <main> The repeated dependency builds tripped the new os.O_EXCL check, as we would try to overwrite the same obfuscated Go files. Beyond being wasteful, this could again cause subtle filesystem races. To fix the problem, avoid passing flags like "-a" to nested go commands. Overall, we should likely be using safer ways to write to disk, be it via either atomic writes or locked files. However, for now, catching duplicate writes is a big step. I have left a self-assigned TODO for further improvements. CI on the pull request found a failure on test-gotip. The failure reproduces on master, so it seems to be related to gotip, and not a regression introduced by this change. For now, disable test-gotip until we can investigate.
3 years ago
ExecPath string // absolute path to the garble binary being used
ForwardBuildFlags []string // build flags fed to the original "garble ..." command
// ListedPackages contains data obtained via 'go list -json -export -deps'.
// This allows us to obtain the non-obfuscated export data of all dependencies,
// useful for type checking of the packages as we obfuscate them.
start using original action IDs (#251) When we obfuscate a name, what we do is hash the name with the action ID of the package that contains the name. To ensure that the hash changes if the garble tool changes, we used the action ID of the obfuscated build, which is different than the original action ID, as we include garble's own content ID in "go tool compile -V=full" via -toolexec. Let's call that the "obfuscated action ID". Remember that a content ID is roughly the hash of a binary or object file, and an action ID contains the hash of a package's source code plus the content IDs of its dependencies. This had the advantage that it did what we wanted. However, it had one massive drawback: when we compile a package, we only have the obfuscated action IDs of its dependencies. This is because one can't have the content ID of dependent packages before they are built. Usually, this is not a problem, because hashing a foreign name means it comes from a dependency, where we already have the obfuscated action ID. However, that's not always the case. First, go:linkname directives can point to any symbol that ends up in the binary, even if the package is not a dependency. So garble could only support linkname targets belonging to dependencies. This is at the root of why we could not obfuscate the runtime; it contains linkname directives targeting the net package, for example, which depends on runtime. Second, some other places did not have an easy access to obfuscated action IDs, like transformAsm, which had to recover it from a temporary file stored by transformCompile. Plus, this was all pretty expensive, as each toolexec sub-process had to make repeated calls to buildidOf with the object files of dependencies. We even had to use extra calls to "go list" in the case of indirect dependencies, as their export files do not appear in importcfg files. All in all, the old method was complex and expensive. A better mechanism is to use the original action IDs directly, as listed by "go list" without garble in the picture. This would mean that the hashing does not change if garble changes, meaning weaker obfuscation. To regain that property, we define the "garble action ID", which is just the original action ID hashed together with garble's own content ID. This is practically the same as the obfuscated build ID we used before, but since it doesn't go through "go tool compile -V=full" and the obfuscated build itself, we can work out *all* the garble action IDs upfront, before the obfuscated build even starts. This fixes all of our problems. Now we know all garble build IDs upfront, so a bunch of hacks can be entirely removed. Plus, since we know them upfront, we can also cache them and avoid repeated calls to "go tool buildid". While at it, make use of the new BuildID field in Go 1.16's "list -json -export". This avoids the vast majority of "go tool buildid" calls, as the only ones that remain are 2 on the garble binary itself. The numbers for Go 1.16 look very good: name old time/op new time/op delta Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6) name old bin-B new bin-B delta Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6) name old sys-time/op new sys-time/op delta Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6) name old user-time/op new user-time/op delta Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
3 years ago
ListedPackages map[string]*listedPackage
// We can't rely on the module version to exist,
// because it's missing in local builds without 'go install'.
// For now, use 'go tool buildid' on the garble binary.
// Just like Go's own cache, we use hex-encoded sha256 sums.
// Once https://github.com/golang/go/issues/37475 is fixed,
// we can likely just use that.
BinaryContentID []byte
GOGARBLE string
only list missing packages when obfuscating the runtime We were listing all of std, which certainly worked, but was quite slow at over 200 packages. In practice, we can only be missing up to 20-30 packages. It was a good change as it fixed a severe bug, but it also introduced a fairly noticeable slow-down. The numbers are clear; this change shaves off multiple seconds when obfuscating the runtime with a cold cache: name old time/op new time/op delta Build/NoCache-16 5.06s ± 1% 1.94s ± 1% -61.64% (p=0.008 n=5+5) name old bin-B new bin-B delta Build/NoCache-16 6.70M ± 0% 6.71M ± 0% +0.05% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Build/NoCache-16 13.4s ± 2% 5.0s ± 2% -62.45% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Build/NoCache-16 60.6s ± 1% 19.8s ± 1% -67.34% (p=0.008 n=5+5) Since we only want to call "go list" one extra time, instead of once for every package we find out we're missing, we want to know what packages we could be missing in advance. Resurrect a smarter version of the runtime-related script. Finally, remove the runtime-related.txt test script, as it has now been superseeded by the sanity checks in listPackage. That is, obfuscating the runtime package will now panic if we are missing any necessary package information. To double check that we get the runtime's linkname edge case right, make gogarble.txt use runtime/debug.WriteHeapDump, which is implemented via a direct runtime linkname. This ensures we don't lose test coverage from runtime-related.txt.
3 years ago
// Filled directly from "go env".
// Remember to update the exec call when adding or removing names.
GoEnv struct {
only list missing packages when obfuscating the runtime We were listing all of std, which certainly worked, but was quite slow at over 200 packages. In practice, we can only be missing up to 20-30 packages. It was a good change as it fixed a severe bug, but it also introduced a fairly noticeable slow-down. The numbers are clear; this change shaves off multiple seconds when obfuscating the runtime with a cold cache: name old time/op new time/op delta Build/NoCache-16 5.06s ± 1% 1.94s ± 1% -61.64% (p=0.008 n=5+5) name old bin-B new bin-B delta Build/NoCache-16 6.70M ± 0% 6.71M ± 0% +0.05% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Build/NoCache-16 13.4s ± 2% 5.0s ± 2% -62.45% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Build/NoCache-16 60.6s ± 1% 19.8s ± 1% -67.34% (p=0.008 n=5+5) Since we only want to call "go list" one extra time, instead of once for every package we find out we're missing, we want to know what packages we could be missing in advance. Resurrect a smarter version of the runtime-related script. Finally, remove the runtime-related.txt test script, as it has now been superseeded by the sanity checks in listPackage. That is, obfuscating the runtime package will now panic if we are missing any necessary package information. To double check that we get the runtime's linkname edge case right, make gogarble.txt use runtime/debug.WriteHeapDump, which is implemented via a direct runtime linkname. This ensures we don't lose test coverage from runtime-related.txt.
3 years ago
GOOS string // i.e. the GOOS build target
GOPRIVATE string
GOMOD string
GOVERSION string
GOCACHE string
}
}
start using original action IDs (#251) When we obfuscate a name, what we do is hash the name with the action ID of the package that contains the name. To ensure that the hash changes if the garble tool changes, we used the action ID of the obfuscated build, which is different than the original action ID, as we include garble's own content ID in "go tool compile -V=full" via -toolexec. Let's call that the "obfuscated action ID". Remember that a content ID is roughly the hash of a binary or object file, and an action ID contains the hash of a package's source code plus the content IDs of its dependencies. This had the advantage that it did what we wanted. However, it had one massive drawback: when we compile a package, we only have the obfuscated action IDs of its dependencies. This is because one can't have the content ID of dependent packages before they are built. Usually, this is not a problem, because hashing a foreign name means it comes from a dependency, where we already have the obfuscated action ID. However, that's not always the case. First, go:linkname directives can point to any symbol that ends up in the binary, even if the package is not a dependency. So garble could only support linkname targets belonging to dependencies. This is at the root of why we could not obfuscate the runtime; it contains linkname directives targeting the net package, for example, which depends on runtime. Second, some other places did not have an easy access to obfuscated action IDs, like transformAsm, which had to recover it from a temporary file stored by transformCompile. Plus, this was all pretty expensive, as each toolexec sub-process had to make repeated calls to buildidOf with the object files of dependencies. We even had to use extra calls to "go list" in the case of indirect dependencies, as their export files do not appear in importcfg files. All in all, the old method was complex and expensive. A better mechanism is to use the original action IDs directly, as listed by "go list" without garble in the picture. This would mean that the hashing does not change if garble changes, meaning weaker obfuscation. To regain that property, we define the "garble action ID", which is just the original action ID hashed together with garble's own content ID. This is practically the same as the obfuscated build ID we used before, but since it doesn't go through "go tool compile -V=full" and the obfuscated build itself, we can work out *all* the garble action IDs upfront, before the obfuscated build even starts. This fixes all of our problems. Now we know all garble build IDs upfront, so a bunch of hacks can be entirely removed. Plus, since we know them upfront, we can also cache them and avoid repeated calls to "go tool buildid". While at it, make use of the new BuildID field in Go 1.16's "list -json -export". This avoids the vast majority of "go tool buildid" calls, as the only ones that remain are 2 on the garble binary itself. The numbers for Go 1.16 look very good: name old time/op new time/op delta Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6) name old bin-B new bin-B delta Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6) name old sys-time/op new sys-time/op delta Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6) name old user-time/op new user-time/op delta Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
3 years ago
var cache *sharedCache
start using original action IDs (#251) When we obfuscate a name, what we do is hash the name with the action ID of the package that contains the name. To ensure that the hash changes if the garble tool changes, we used the action ID of the obfuscated build, which is different than the original action ID, as we include garble's own content ID in "go tool compile -V=full" via -toolexec. Let's call that the "obfuscated action ID". Remember that a content ID is roughly the hash of a binary or object file, and an action ID contains the hash of a package's source code plus the content IDs of its dependencies. This had the advantage that it did what we wanted. However, it had one massive drawback: when we compile a package, we only have the obfuscated action IDs of its dependencies. This is because one can't have the content ID of dependent packages before they are built. Usually, this is not a problem, because hashing a foreign name means it comes from a dependency, where we already have the obfuscated action ID. However, that's not always the case. First, go:linkname directives can point to any symbol that ends up in the binary, even if the package is not a dependency. So garble could only support linkname targets belonging to dependencies. This is at the root of why we could not obfuscate the runtime; it contains linkname directives targeting the net package, for example, which depends on runtime. Second, some other places did not have an easy access to obfuscated action IDs, like transformAsm, which had to recover it from a temporary file stored by transformCompile. Plus, this was all pretty expensive, as each toolexec sub-process had to make repeated calls to buildidOf with the object files of dependencies. We even had to use extra calls to "go list" in the case of indirect dependencies, as their export files do not appear in importcfg files. All in all, the old method was complex and expensive. A better mechanism is to use the original action IDs directly, as listed by "go list" without garble in the picture. This would mean that the hashing does not change if garble changes, meaning weaker obfuscation. To regain that property, we define the "garble action ID", which is just the original action ID hashed together with garble's own content ID. This is practically the same as the obfuscated build ID we used before, but since it doesn't go through "go tool compile -V=full" and the obfuscated build itself, we can work out *all* the garble action IDs upfront, before the obfuscated build even starts. This fixes all of our problems. Now we know all garble build IDs upfront, so a bunch of hacks can be entirely removed. Plus, since we know them upfront, we can also cache them and avoid repeated calls to "go tool buildid". While at it, make use of the new BuildID field in Go 1.16's "list -json -export". This avoids the vast majority of "go tool buildid" calls, as the only ones that remain are 2 on the garble binary itself. The numbers for Go 1.16 look very good: name old time/op new time/op delta Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6) name old bin-B new bin-B delta Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6) name old sys-time/op new sys-time/op delta Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6) name old user-time/op new user-time/op delta Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
3 years ago
// loadSharedCache the shared data passed from the entry garble process
func loadSharedCache() error {
if cache != nil {
panic("shared cache loaded twice?")
}
startTime := time.Now()
start using original action IDs (#251) When we obfuscate a name, what we do is hash the name with the action ID of the package that contains the name. To ensure that the hash changes if the garble tool changes, we used the action ID of the obfuscated build, which is different than the original action ID, as we include garble's own content ID in "go tool compile -V=full" via -toolexec. Let's call that the "obfuscated action ID". Remember that a content ID is roughly the hash of a binary or object file, and an action ID contains the hash of a package's source code plus the content IDs of its dependencies. This had the advantage that it did what we wanted. However, it had one massive drawback: when we compile a package, we only have the obfuscated action IDs of its dependencies. This is because one can't have the content ID of dependent packages before they are built. Usually, this is not a problem, because hashing a foreign name means it comes from a dependency, where we already have the obfuscated action ID. However, that's not always the case. First, go:linkname directives can point to any symbol that ends up in the binary, even if the package is not a dependency. So garble could only support linkname targets belonging to dependencies. This is at the root of why we could not obfuscate the runtime; it contains linkname directives targeting the net package, for example, which depends on runtime. Second, some other places did not have an easy access to obfuscated action IDs, like transformAsm, which had to recover it from a temporary file stored by transformCompile. Plus, this was all pretty expensive, as each toolexec sub-process had to make repeated calls to buildidOf with the object files of dependencies. We even had to use extra calls to "go list" in the case of indirect dependencies, as their export files do not appear in importcfg files. All in all, the old method was complex and expensive. A better mechanism is to use the original action IDs directly, as listed by "go list" without garble in the picture. This would mean that the hashing does not change if garble changes, meaning weaker obfuscation. To regain that property, we define the "garble action ID", which is just the original action ID hashed together with garble's own content ID. This is practically the same as the obfuscated build ID we used before, but since it doesn't go through "go tool compile -V=full" and the obfuscated build itself, we can work out *all* the garble action IDs upfront, before the obfuscated build even starts. This fixes all of our problems. Now we know all garble build IDs upfront, so a bunch of hacks can be entirely removed. Plus, since we know them upfront, we can also cache them and avoid repeated calls to "go tool buildid". While at it, make use of the new BuildID field in Go 1.16's "list -json -export". This avoids the vast majority of "go tool buildid" calls, as the only ones that remain are 2 on the garble binary itself. The numbers for Go 1.16 look very good: name old time/op new time/op delta Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6) name old bin-B new bin-B delta Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6) name old sys-time/op new sys-time/op delta Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6) name old user-time/op new user-time/op delta Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
3 years ago
f, err := os.Open(filepath.Join(sharedTempDir, "main-cache.gob"))
if err != nil {
return fmt.Errorf(`cannot open shared file: %v; did you run "garble [command]"?`, err)
start using original action IDs (#251) When we obfuscate a name, what we do is hash the name with the action ID of the package that contains the name. To ensure that the hash changes if the garble tool changes, we used the action ID of the obfuscated build, which is different than the original action ID, as we include garble's own content ID in "go tool compile -V=full" via -toolexec. Let's call that the "obfuscated action ID". Remember that a content ID is roughly the hash of a binary or object file, and an action ID contains the hash of a package's source code plus the content IDs of its dependencies. This had the advantage that it did what we wanted. However, it had one massive drawback: when we compile a package, we only have the obfuscated action IDs of its dependencies. This is because one can't have the content ID of dependent packages before they are built. Usually, this is not a problem, because hashing a foreign name means it comes from a dependency, where we already have the obfuscated action ID. However, that's not always the case. First, go:linkname directives can point to any symbol that ends up in the binary, even if the package is not a dependency. So garble could only support linkname targets belonging to dependencies. This is at the root of why we could not obfuscate the runtime; it contains linkname directives targeting the net package, for example, which depends on runtime. Second, some other places did not have an easy access to obfuscated action IDs, like transformAsm, which had to recover it from a temporary file stored by transformCompile. Plus, this was all pretty expensive, as each toolexec sub-process had to make repeated calls to buildidOf with the object files of dependencies. We even had to use extra calls to "go list" in the case of indirect dependencies, as their export files do not appear in importcfg files. All in all, the old method was complex and expensive. A better mechanism is to use the original action IDs directly, as listed by "go list" without garble in the picture. This would mean that the hashing does not change if garble changes, meaning weaker obfuscation. To regain that property, we define the "garble action ID", which is just the original action ID hashed together with garble's own content ID. This is practically the same as the obfuscated build ID we used before, but since it doesn't go through "go tool compile -V=full" and the obfuscated build itself, we can work out *all* the garble action IDs upfront, before the obfuscated build even starts. This fixes all of our problems. Now we know all garble build IDs upfront, so a bunch of hacks can be entirely removed. Plus, since we know them upfront, we can also cache them and avoid repeated calls to "go tool buildid". While at it, make use of the new BuildID field in Go 1.16's "list -json -export". This avoids the vast majority of "go tool buildid" calls, as the only ones that remain are 2 on the garble binary itself. The numbers for Go 1.16 look very good: name old time/op new time/op delta Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6) name old bin-B new bin-B delta Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6) name old sys-time/op new sys-time/op delta Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6) name old user-time/op new user-time/op delta Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
3 years ago
}
defer func() {
debugf("shared cache loaded in %s from %s", debugSince(startTime), f.Name())
}()
start using original action IDs (#251) When we obfuscate a name, what we do is hash the name with the action ID of the package that contains the name. To ensure that the hash changes if the garble tool changes, we used the action ID of the obfuscated build, which is different than the original action ID, as we include garble's own content ID in "go tool compile -V=full" via -toolexec. Let's call that the "obfuscated action ID". Remember that a content ID is roughly the hash of a binary or object file, and an action ID contains the hash of a package's source code plus the content IDs of its dependencies. This had the advantage that it did what we wanted. However, it had one massive drawback: when we compile a package, we only have the obfuscated action IDs of its dependencies. This is because one can't have the content ID of dependent packages before they are built. Usually, this is not a problem, because hashing a foreign name means it comes from a dependency, where we already have the obfuscated action ID. However, that's not always the case. First, go:linkname directives can point to any symbol that ends up in the binary, even if the package is not a dependency. So garble could only support linkname targets belonging to dependencies. This is at the root of why we could not obfuscate the runtime; it contains linkname directives targeting the net package, for example, which depends on runtime. Second, some other places did not have an easy access to obfuscated action IDs, like transformAsm, which had to recover it from a temporary file stored by transformCompile. Plus, this was all pretty expensive, as each toolexec sub-process had to make repeated calls to buildidOf with the object files of dependencies. We even had to use extra calls to "go list" in the case of indirect dependencies, as their export files do not appear in importcfg files. All in all, the old method was complex and expensive. A better mechanism is to use the original action IDs directly, as listed by "go list" without garble in the picture. This would mean that the hashing does not change if garble changes, meaning weaker obfuscation. To regain that property, we define the "garble action ID", which is just the original action ID hashed together with garble's own content ID. This is practically the same as the obfuscated build ID we used before, but since it doesn't go through "go tool compile -V=full" and the obfuscated build itself, we can work out *all* the garble action IDs upfront, before the obfuscated build even starts. This fixes all of our problems. Now we know all garble build IDs upfront, so a bunch of hacks can be entirely removed. Plus, since we know them upfront, we can also cache them and avoid repeated calls to "go tool buildid". While at it, make use of the new BuildID field in Go 1.16's "list -json -export". This avoids the vast majority of "go tool buildid" calls, as the only ones that remain are 2 on the garble binary itself. The numbers for Go 1.16 look very good: name old time/op new time/op delta Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6) name old bin-B new bin-B delta Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6) name old sys-time/op new sys-time/op delta Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6) name old user-time/op new user-time/op delta Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
3 years ago
defer f.Close()
if err := gob.NewDecoder(f).Decode(&cache); err != nil {
return fmt.Errorf("cannot decode shared file: %v", err)
}
return nil
}
start using original action IDs (#251) When we obfuscate a name, what we do is hash the name with the action ID of the package that contains the name. To ensure that the hash changes if the garble tool changes, we used the action ID of the obfuscated build, which is different than the original action ID, as we include garble's own content ID in "go tool compile -V=full" via -toolexec. Let's call that the "obfuscated action ID". Remember that a content ID is roughly the hash of a binary or object file, and an action ID contains the hash of a package's source code plus the content IDs of its dependencies. This had the advantage that it did what we wanted. However, it had one massive drawback: when we compile a package, we only have the obfuscated action IDs of its dependencies. This is because one can't have the content ID of dependent packages before they are built. Usually, this is not a problem, because hashing a foreign name means it comes from a dependency, where we already have the obfuscated action ID. However, that's not always the case. First, go:linkname directives can point to any symbol that ends up in the binary, even if the package is not a dependency. So garble could only support linkname targets belonging to dependencies. This is at the root of why we could not obfuscate the runtime; it contains linkname directives targeting the net package, for example, which depends on runtime. Second, some other places did not have an easy access to obfuscated action IDs, like transformAsm, which had to recover it from a temporary file stored by transformCompile. Plus, this was all pretty expensive, as each toolexec sub-process had to make repeated calls to buildidOf with the object files of dependencies. We even had to use extra calls to "go list" in the case of indirect dependencies, as their export files do not appear in importcfg files. All in all, the old method was complex and expensive. A better mechanism is to use the original action IDs directly, as listed by "go list" without garble in the picture. This would mean that the hashing does not change if garble changes, meaning weaker obfuscation. To regain that property, we define the "garble action ID", which is just the original action ID hashed together with garble's own content ID. This is practically the same as the obfuscated build ID we used before, but since it doesn't go through "go tool compile -V=full" and the obfuscated build itself, we can work out *all* the garble action IDs upfront, before the obfuscated build even starts. This fixes all of our problems. Now we know all garble build IDs upfront, so a bunch of hacks can be entirely removed. Plus, since we know them upfront, we can also cache them and avoid repeated calls to "go tool buildid". While at it, make use of the new BuildID field in Go 1.16's "list -json -export". This avoids the vast majority of "go tool buildid" calls, as the only ones that remain are 2 on the garble binary itself. The numbers for Go 1.16 look very good: name old time/op new time/op delta Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6) name old bin-B new bin-B delta Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6) name old sys-time/op new sys-time/op delta Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6) name old user-time/op new user-time/op delta Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
3 years ago
// saveSharedCache creates a temporary directory to share between garble processes.
// This directory also includes the gob-encoded cache global.
start using original action IDs (#251) When we obfuscate a name, what we do is hash the name with the action ID of the package that contains the name. To ensure that the hash changes if the garble tool changes, we used the action ID of the obfuscated build, which is different than the original action ID, as we include garble's own content ID in "go tool compile -V=full" via -toolexec. Let's call that the "obfuscated action ID". Remember that a content ID is roughly the hash of a binary or object file, and an action ID contains the hash of a package's source code plus the content IDs of its dependencies. This had the advantage that it did what we wanted. However, it had one massive drawback: when we compile a package, we only have the obfuscated action IDs of its dependencies. This is because one can't have the content ID of dependent packages before they are built. Usually, this is not a problem, because hashing a foreign name means it comes from a dependency, where we already have the obfuscated action ID. However, that's not always the case. First, go:linkname directives can point to any symbol that ends up in the binary, even if the package is not a dependency. So garble could only support linkname targets belonging to dependencies. This is at the root of why we could not obfuscate the runtime; it contains linkname directives targeting the net package, for example, which depends on runtime. Second, some other places did not have an easy access to obfuscated action IDs, like transformAsm, which had to recover it from a temporary file stored by transformCompile. Plus, this was all pretty expensive, as each toolexec sub-process had to make repeated calls to buildidOf with the object files of dependencies. We even had to use extra calls to "go list" in the case of indirect dependencies, as their export files do not appear in importcfg files. All in all, the old method was complex and expensive. A better mechanism is to use the original action IDs directly, as listed by "go list" without garble in the picture. This would mean that the hashing does not change if garble changes, meaning weaker obfuscation. To regain that property, we define the "garble action ID", which is just the original action ID hashed together with garble's own content ID. This is practically the same as the obfuscated build ID we used before, but since it doesn't go through "go tool compile -V=full" and the obfuscated build itself, we can work out *all* the garble action IDs upfront, before the obfuscated build even starts. This fixes all of our problems. Now we know all garble build IDs upfront, so a bunch of hacks can be entirely removed. Plus, since we know them upfront, we can also cache them and avoid repeated calls to "go tool buildid". While at it, make use of the new BuildID field in Go 1.16's "list -json -export". This avoids the vast majority of "go tool buildid" calls, as the only ones that remain are 2 on the garble binary itself. The numbers for Go 1.16 look very good: name old time/op new time/op delta Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6) name old bin-B new bin-B delta Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6) name old sys-time/op new sys-time/op delta Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6) name old user-time/op new user-time/op delta Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
3 years ago
func saveSharedCache() (string, error) {
if cache == nil {
panic("saving a missing cache?")
}
dir, err := os.MkdirTemp("", "garble-shared")
if err != nil {
return "", err
}
sharedCache := filepath.Join(dir, "main-cache.gob")
fail if we are unexpectedly overwriting files (#418) While investigating a bug report, I noticed that garble was writing to the same temp file twice. At best, writing to the same path on disk twice is wasteful, as the design is careful to be deterministic and use unique paths. At worst, the two writes could cause races at the filesystem level. To prevent either of those situations, we now create files with os.OpenFile and os.O_EXCL, meaning that we will error if the file already exists. That change uncovered a number of such unintended cases. First, transformAsm would write obfuscated Go files twice. This is because the Go toolchain actually runs: [...]/asm -gensymabis [...] foo.s bar.s [...]/asm [...] foo.s bar.s That is, the first run is only meant to generate symbol ABIs, which are then used by the compiler. We need to obfuscate at that first stage, because the symbol ABI descriptions need to use obfuscated names. However, having already obfuscated the assembly on the first stage, there is no need to do so again on the second stage. If we detect gensymabis is missing, we simply reuse the previous files. This first situation doesn't seem racy, but obfuscating the Go assembly files twice is certainly unnecessary. Second, saveKnownReflectAPIs wrote a gob file to the build cache. Since the build cache can be kept between builds, and since the build cache uses reproducible paths for each build, running the same "garble build" twice could overwrite those files. This could actually cause races at the filesystem level; if two concurrent builds write to the same gob file on disk, one of them could end up using a partially-written file. Note that this is the only of the three cases not using temporary files. As such, it is expected that the file may already exist. In such a case, we simply avoid overwriting it rather than failing. Third, when "garble build -a" was used, and when we needed an export file not listed in importcfg, we would end up calling roughly: go list -export -toolexec=garble -a <dependency> This meant we would re-build and re-obfuscate those packages. Which is unfortunate, because the parent process already did via: go build -toolexec=garble -a <main> The repeated dependency builds tripped the new os.O_EXCL check, as we would try to overwrite the same obfuscated Go files. Beyond being wasteful, this could again cause subtle filesystem races. To fix the problem, avoid passing flags like "-a" to nested go commands. Overall, we should likely be using safer ways to write to disk, be it via either atomic writes or locked files. However, for now, catching duplicate writes is a big step. I have left a self-assigned TODO for further improvements. CI on the pull request found a failure on test-gotip. The failure reproduces on master, so it seems to be related to gotip, and not a regression introduced by this change. For now, disable test-gotip until we can investigate.
3 years ago
if err := writeGobExclusive(sharedCache, &cache); err != nil {
return "", err
}
fail if we are unexpectedly overwriting files (#418) While investigating a bug report, I noticed that garble was writing to the same temp file twice. At best, writing to the same path on disk twice is wasteful, as the design is careful to be deterministic and use unique paths. At worst, the two writes could cause races at the filesystem level. To prevent either of those situations, we now create files with os.OpenFile and os.O_EXCL, meaning that we will error if the file already exists. That change uncovered a number of such unintended cases. First, transformAsm would write obfuscated Go files twice. This is because the Go toolchain actually runs: [...]/asm -gensymabis [...] foo.s bar.s [...]/asm [...] foo.s bar.s That is, the first run is only meant to generate symbol ABIs, which are then used by the compiler. We need to obfuscate at that first stage, because the symbol ABI descriptions need to use obfuscated names. However, having already obfuscated the assembly on the first stage, there is no need to do so again on the second stage. If we detect gensymabis is missing, we simply reuse the previous files. This first situation doesn't seem racy, but obfuscating the Go assembly files twice is certainly unnecessary. Second, saveKnownReflectAPIs wrote a gob file to the build cache. Since the build cache can be kept between builds, and since the build cache uses reproducible paths for each build, running the same "garble build" twice could overwrite those files. This could actually cause races at the filesystem level; if two concurrent builds write to the same gob file on disk, one of them could end up using a partially-written file. Note that this is the only of the three cases not using temporary files. As such, it is expected that the file may already exist. In such a case, we simply avoid overwriting it rather than failing. Third, when "garble build -a" was used, and when we needed an export file not listed in importcfg, we would end up calling roughly: go list -export -toolexec=garble -a <dependency> This meant we would re-build and re-obfuscate those packages. Which is unfortunate, because the parent process already did via: go build -toolexec=garble -a <main> The repeated dependency builds tripped the new os.O_EXCL check, as we would try to overwrite the same obfuscated Go files. Beyond being wasteful, this could again cause subtle filesystem races. To fix the problem, avoid passing flags like "-a" to nested go commands. Overall, we should likely be using safer ways to write to disk, be it via either atomic writes or locked files. However, for now, catching duplicate writes is a big step. I have left a self-assigned TODO for further improvements. CI on the pull request found a failure on test-gotip. The failure reproduces on master, so it seems to be related to gotip, and not a regression introduced by this change. For now, disable test-gotip until we can investigate.
3 years ago
return dir, nil
}
fail if we are unexpectedly overwriting files (#418) While investigating a bug report, I noticed that garble was writing to the same temp file twice. At best, writing to the same path on disk twice is wasteful, as the design is careful to be deterministic and use unique paths. At worst, the two writes could cause races at the filesystem level. To prevent either of those situations, we now create files with os.OpenFile and os.O_EXCL, meaning that we will error if the file already exists. That change uncovered a number of such unintended cases. First, transformAsm would write obfuscated Go files twice. This is because the Go toolchain actually runs: [...]/asm -gensymabis [...] foo.s bar.s [...]/asm [...] foo.s bar.s That is, the first run is only meant to generate symbol ABIs, which are then used by the compiler. We need to obfuscate at that first stage, because the symbol ABI descriptions need to use obfuscated names. However, having already obfuscated the assembly on the first stage, there is no need to do so again on the second stage. If we detect gensymabis is missing, we simply reuse the previous files. This first situation doesn't seem racy, but obfuscating the Go assembly files twice is certainly unnecessary. Second, saveKnownReflectAPIs wrote a gob file to the build cache. Since the build cache can be kept between builds, and since the build cache uses reproducible paths for each build, running the same "garble build" twice could overwrite those files. This could actually cause races at the filesystem level; if two concurrent builds write to the same gob file on disk, one of them could end up using a partially-written file. Note that this is the only of the three cases not using temporary files. As such, it is expected that the file may already exist. In such a case, we simply avoid overwriting it rather than failing. Third, when "garble build -a" was used, and when we needed an export file not listed in importcfg, we would end up calling roughly: go list -export -toolexec=garble -a <dependency> This meant we would re-build and re-obfuscate those packages. Which is unfortunate, because the parent process already did via: go build -toolexec=garble -a <main> The repeated dependency builds tripped the new os.O_EXCL check, as we would try to overwrite the same obfuscated Go files. Beyond being wasteful, this could again cause subtle filesystem races. To fix the problem, avoid passing flags like "-a" to nested go commands. Overall, we should likely be using safer ways to write to disk, be it via either atomic writes or locked files. However, for now, catching duplicate writes is a big step. I have left a self-assigned TODO for further improvements. CI on the pull request found a failure on test-gotip. The failure reproduces on master, so it seems to be related to gotip, and not a regression introduced by this change. For now, disable test-gotip until we can investigate.
3 years ago
func createExclusive(name string) (*os.File, error) {
return os.OpenFile(name, os.O_RDWR|os.O_CREATE|os.O_EXCL, 0o666)
}
// TODO(mvdan): consider using proper atomic file writes.
// Or possibly even "lockedfile", mimicking cmd/go.
func writeFileExclusive(name string, data []byte) error {
f, err := createExclusive(name)
if err != nil {
return err
}
fail if we are unexpectedly overwriting files (#418) While investigating a bug report, I noticed that garble was writing to the same temp file twice. At best, writing to the same path on disk twice is wasteful, as the design is careful to be deterministic and use unique paths. At worst, the two writes could cause races at the filesystem level. To prevent either of those situations, we now create files with os.OpenFile and os.O_EXCL, meaning that we will error if the file already exists. That change uncovered a number of such unintended cases. First, transformAsm would write obfuscated Go files twice. This is because the Go toolchain actually runs: [...]/asm -gensymabis [...] foo.s bar.s [...]/asm [...] foo.s bar.s That is, the first run is only meant to generate symbol ABIs, which are then used by the compiler. We need to obfuscate at that first stage, because the symbol ABI descriptions need to use obfuscated names. However, having already obfuscated the assembly on the first stage, there is no need to do so again on the second stage. If we detect gensymabis is missing, we simply reuse the previous files. This first situation doesn't seem racy, but obfuscating the Go assembly files twice is certainly unnecessary. Second, saveKnownReflectAPIs wrote a gob file to the build cache. Since the build cache can be kept between builds, and since the build cache uses reproducible paths for each build, running the same "garble build" twice could overwrite those files. This could actually cause races at the filesystem level; if two concurrent builds write to the same gob file on disk, one of them could end up using a partially-written file. Note that this is the only of the three cases not using temporary files. As such, it is expected that the file may already exist. In such a case, we simply avoid overwriting it rather than failing. Third, when "garble build -a" was used, and when we needed an export file not listed in importcfg, we would end up calling roughly: go list -export -toolexec=garble -a <dependency> This meant we would re-build and re-obfuscate those packages. Which is unfortunate, because the parent process already did via: go build -toolexec=garble -a <main> The repeated dependency builds tripped the new os.O_EXCL check, as we would try to overwrite the same obfuscated Go files. Beyond being wasteful, this could again cause subtle filesystem races. To fix the problem, avoid passing flags like "-a" to nested go commands. Overall, we should likely be using safer ways to write to disk, be it via either atomic writes or locked files. However, for now, catching duplicate writes is a big step. I have left a self-assigned TODO for further improvements. CI on the pull request found a failure on test-gotip. The failure reproduces on master, so it seems to be related to gotip, and not a regression introduced by this change. For now, disable test-gotip until we can investigate.
3 years ago
_, err = f.Write(data)
if err2 := f.Close(); err == nil {
err = err2
}
return err
}
func writeGobExclusive(name string, val any) error {
fail if we are unexpectedly overwriting files (#418) While investigating a bug report, I noticed that garble was writing to the same temp file twice. At best, writing to the same path on disk twice is wasteful, as the design is careful to be deterministic and use unique paths. At worst, the two writes could cause races at the filesystem level. To prevent either of those situations, we now create files with os.OpenFile and os.O_EXCL, meaning that we will error if the file already exists. That change uncovered a number of such unintended cases. First, transformAsm would write obfuscated Go files twice. This is because the Go toolchain actually runs: [...]/asm -gensymabis [...] foo.s bar.s [...]/asm [...] foo.s bar.s That is, the first run is only meant to generate symbol ABIs, which are then used by the compiler. We need to obfuscate at that first stage, because the symbol ABI descriptions need to use obfuscated names. However, having already obfuscated the assembly on the first stage, there is no need to do so again on the second stage. If we detect gensymabis is missing, we simply reuse the previous files. This first situation doesn't seem racy, but obfuscating the Go assembly files twice is certainly unnecessary. Second, saveKnownReflectAPIs wrote a gob file to the build cache. Since the build cache can be kept between builds, and since the build cache uses reproducible paths for each build, running the same "garble build" twice could overwrite those files. This could actually cause races at the filesystem level; if two concurrent builds write to the same gob file on disk, one of them could end up using a partially-written file. Note that this is the only of the three cases not using temporary files. As such, it is expected that the file may already exist. In such a case, we simply avoid overwriting it rather than failing. Third, when "garble build -a" was used, and when we needed an export file not listed in importcfg, we would end up calling roughly: go list -export -toolexec=garble -a <dependency> This meant we would re-build and re-obfuscate those packages. Which is unfortunate, because the parent process already did via: go build -toolexec=garble -a <main> The repeated dependency builds tripped the new os.O_EXCL check, as we would try to overwrite the same obfuscated Go files. Beyond being wasteful, this could again cause subtle filesystem races. To fix the problem, avoid passing flags like "-a" to nested go commands. Overall, we should likely be using safer ways to write to disk, be it via either atomic writes or locked files. However, for now, catching duplicate writes is a big step. I have left a self-assigned TODO for further improvements. CI on the pull request found a failure on test-gotip. The failure reproduces on master, so it seems to be related to gotip, and not a regression introduced by this change. For now, disable test-gotip until we can investigate.
3 years ago
f, err := createExclusive(name)
if err != nil {
return err
}
if err := gob.NewEncoder(f).Encode(val); err != nil {
return err
}
if err2 := f.Close(); err == nil {
err = err2
}
return err
}
start using original action IDs (#251) When we obfuscate a name, what we do is hash the name with the action ID of the package that contains the name. To ensure that the hash changes if the garble tool changes, we used the action ID of the obfuscated build, which is different than the original action ID, as we include garble's own content ID in "go tool compile -V=full" via -toolexec. Let's call that the "obfuscated action ID". Remember that a content ID is roughly the hash of a binary or object file, and an action ID contains the hash of a package's source code plus the content IDs of its dependencies. This had the advantage that it did what we wanted. However, it had one massive drawback: when we compile a package, we only have the obfuscated action IDs of its dependencies. This is because one can't have the content ID of dependent packages before they are built. Usually, this is not a problem, because hashing a foreign name means it comes from a dependency, where we already have the obfuscated action ID. However, that's not always the case. First, go:linkname directives can point to any symbol that ends up in the binary, even if the package is not a dependency. So garble could only support linkname targets belonging to dependencies. This is at the root of why we could not obfuscate the runtime; it contains linkname directives targeting the net package, for example, which depends on runtime. Second, some other places did not have an easy access to obfuscated action IDs, like transformAsm, which had to recover it from a temporary file stored by transformCompile. Plus, this was all pretty expensive, as each toolexec sub-process had to make repeated calls to buildidOf with the object files of dependencies. We even had to use extra calls to "go list" in the case of indirect dependencies, as their export files do not appear in importcfg files. All in all, the old method was complex and expensive. A better mechanism is to use the original action IDs directly, as listed by "go list" without garble in the picture. This would mean that the hashing does not change if garble changes, meaning weaker obfuscation. To regain that property, we define the "garble action ID", which is just the original action ID hashed together with garble's own content ID. This is practically the same as the obfuscated build ID we used before, but since it doesn't go through "go tool compile -V=full" and the obfuscated build itself, we can work out *all* the garble action IDs upfront, before the obfuscated build even starts. This fixes all of our problems. Now we know all garble build IDs upfront, so a bunch of hacks can be entirely removed. Plus, since we know them upfront, we can also cache them and avoid repeated calls to "go tool buildid". While at it, make use of the new BuildID field in Go 1.16's "list -json -export". This avoids the vast majority of "go tool buildid" calls, as the only ones that remain are 2 on the garble binary itself. The numbers for Go 1.16 look very good: name old time/op new time/op delta Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6) name old bin-B new bin-B delta Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6) name old sys-time/op new sys-time/op delta Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6) name old user-time/op new user-time/op delta Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
3 years ago
// listedPackage contains the 'go list -json -export' fields obtained by the
// root process, shared with all garble sub-processes via a file.
type listedPackage struct {
Name string
ImportPath string
ForTest string
Export string
start using original action IDs (#251) When we obfuscate a name, what we do is hash the name with the action ID of the package that contains the name. To ensure that the hash changes if the garble tool changes, we used the action ID of the obfuscated build, which is different than the original action ID, as we include garble's own content ID in "go tool compile -V=full" via -toolexec. Let's call that the "obfuscated action ID". Remember that a content ID is roughly the hash of a binary or object file, and an action ID contains the hash of a package's source code plus the content IDs of its dependencies. This had the advantage that it did what we wanted. However, it had one massive drawback: when we compile a package, we only have the obfuscated action IDs of its dependencies. This is because one can't have the content ID of dependent packages before they are built. Usually, this is not a problem, because hashing a foreign name means it comes from a dependency, where we already have the obfuscated action ID. However, that's not always the case. First, go:linkname directives can point to any symbol that ends up in the binary, even if the package is not a dependency. So garble could only support linkname targets belonging to dependencies. This is at the root of why we could not obfuscate the runtime; it contains linkname directives targeting the net package, for example, which depends on runtime. Second, some other places did not have an easy access to obfuscated action IDs, like transformAsm, which had to recover it from a temporary file stored by transformCompile. Plus, this was all pretty expensive, as each toolexec sub-process had to make repeated calls to buildidOf with the object files of dependencies. We even had to use extra calls to "go list" in the case of indirect dependencies, as their export files do not appear in importcfg files. All in all, the old method was complex and expensive. A better mechanism is to use the original action IDs directly, as listed by "go list" without garble in the picture. This would mean that the hashing does not change if garble changes, meaning weaker obfuscation. To regain that property, we define the "garble action ID", which is just the original action ID hashed together with garble's own content ID. This is practically the same as the obfuscated build ID we used before, but since it doesn't go through "go tool compile -V=full" and the obfuscated build itself, we can work out *all* the garble action IDs upfront, before the obfuscated build even starts. This fixes all of our problems. Now we know all garble build IDs upfront, so a bunch of hacks can be entirely removed. Plus, since we know them upfront, we can also cache them and avoid repeated calls to "go tool buildid". While at it, make use of the new BuildID field in Go 1.16's "list -json -export". This avoids the vast majority of "go tool buildid" calls, as the only ones that remain are 2 on the garble binary itself. The numbers for Go 1.16 look very good: name old time/op new time/op delta Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6) name old bin-B new bin-B delta Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6) name old sys-time/op new sys-time/op delta Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6) name old user-time/op new user-time/op delta Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
3 years ago
BuildID string
Deps []string
ImportMap map[string]string
refactor "current package" with TOOLEXEC_IMPORTPATH (#266) Now that we've dropped support for Go 1.15.x, we can finally rely on this environment variable for toolexec calls, present in Go 1.16. Before, we had hacky ways of trying to figure out the current package's import path, mostly from the -p flag. The biggest rough edge there was that, for main packages, that was simply the package name, and not its full import path. To work around that, we had a restriction on a single main package, so we could work around that issue. That restriction is now gone. The new code is simpler, especially because we can set curPkg in a single place for all toolexec transform funcs. Since we can always rely on curPkg not being nil now, we can also start reusing listedPackage.Private and avoid the majority of repeated calls to isPrivate. The function is cheap, but still not free. isPrivate itself can also get simpler. We no longer have to worry about the "main" edge case. Plus, the sanity check for invalid package paths is now unnecessary; we only got malformed paths from goobj2, and we now require exact matches with the ImportPath field from "go list -json". Another effect of clearing up the "main" edge case is that -debugdir now uses the right directory for main packages. We also start using consistent debugdir paths in the tests, for the sake of being easier to read and maintain. Finally, note that commandReverse did not need the extra call to "go list -toolexec", as the "shared" call stored in the cache is enough. We still call toolexecCmd to get said cache, which should probably be simplified in a future PR. While at it, replace the use of the "-std" compiler flag with the Standard field from "go list -json".
3 years ago
Standard bool
reverse: support unexported names and package paths (#233) Unexported names are a bit tricky, since they are not listed in the export data file. Perhaps unsurprisingly, it's only meant to expose exported objects. One option would be to go back to adding an extra header to the export data file, containing the unexported methods in a map[string]T or []string. However, we have an easier route: just parse the Go files and look up the names directly. This does mean that we parse the Go files every time "reverse" runs, even if the build cache is warm, but that should not be an issue. Parsing Go files without any typechecking is very cheap compared to everything else we do. Plus, we save having to load go/types information from the build cache, or having to load extra headers from export files. It should be noted that the obfuscation process does need type information, mainly to be careful about which names can be obfuscated and how they should be obfuscated. Neither is a worry here; all names belong to a single package, and it doesn't matter if some aren't actually obfuscated, since the string replacements would simply never trigger in practice. The test includes an unexported func, to test the new feature. We also start reversing the obfuscation of import paths. Now, the test's reverse output is as follows: goroutine 1 [running]: runtime/debug.Stack(0x??, 0x??, 0x??) runtime/debug/stack.go:24 +0x?? test/main/lib.ExportedLibFunc(0x??, 0x??, 0x??, 0x??) p.go:6 +0x?? main.unexportedMainFunc(...) C.go:2 main.main() z.go:3 +0x?? The only major missing feature is positions and filenames. A follow-up PR will take care of those. Updates #5.
3 years ago
Dir string
GoFiles []string
Imports []string
reverse: support unexported names and package paths (#233) Unexported names are a bit tricky, since they are not listed in the export data file. Perhaps unsurprisingly, it's only meant to expose exported objects. One option would be to go back to adding an extra header to the export data file, containing the unexported methods in a map[string]T or []string. However, we have an easier route: just parse the Go files and look up the names directly. This does mean that we parse the Go files every time "reverse" runs, even if the build cache is warm, but that should not be an issue. Parsing Go files without any typechecking is very cheap compared to everything else we do. Plus, we save having to load go/types information from the build cache, or having to load extra headers from export files. It should be noted that the obfuscation process does need type information, mainly to be careful about which names can be obfuscated and how they should be obfuscated. Neither is a worry here; all names belong to a single package, and it doesn't matter if some aren't actually obfuscated, since the string replacements would simply never trigger in practice. The test includes an unexported func, to test the new feature. We also start reversing the obfuscation of import paths. Now, the test's reverse output is as follows: goroutine 1 [running]: runtime/debug.Stack(0x??, 0x??, 0x??) runtime/debug/stack.go:24 +0x?? test/main/lib.ExportedLibFunc(0x??, 0x??, 0x??, 0x??) p.go:6 +0x?? main.unexportedMainFunc(...) C.go:2 main.main() z.go:3 +0x?? The only major missing feature is positions and filenames. A follow-up PR will take care of those. Updates #5.
3 years ago
start using original action IDs (#251) When we obfuscate a name, what we do is hash the name with the action ID of the package that contains the name. To ensure that the hash changes if the garble tool changes, we used the action ID of the obfuscated build, which is different than the original action ID, as we include garble's own content ID in "go tool compile -V=full" via -toolexec. Let's call that the "obfuscated action ID". Remember that a content ID is roughly the hash of a binary or object file, and an action ID contains the hash of a package's source code plus the content IDs of its dependencies. This had the advantage that it did what we wanted. However, it had one massive drawback: when we compile a package, we only have the obfuscated action IDs of its dependencies. This is because one can't have the content ID of dependent packages before they are built. Usually, this is not a problem, because hashing a foreign name means it comes from a dependency, where we already have the obfuscated action ID. However, that's not always the case. First, go:linkname directives can point to any symbol that ends up in the binary, even if the package is not a dependency. So garble could only support linkname targets belonging to dependencies. This is at the root of why we could not obfuscate the runtime; it contains linkname directives targeting the net package, for example, which depends on runtime. Second, some other places did not have an easy access to obfuscated action IDs, like transformAsm, which had to recover it from a temporary file stored by transformCompile. Plus, this was all pretty expensive, as each toolexec sub-process had to make repeated calls to buildidOf with the object files of dependencies. We even had to use extra calls to "go list" in the case of indirect dependencies, as their export files do not appear in importcfg files. All in all, the old method was complex and expensive. A better mechanism is to use the original action IDs directly, as listed by "go list" without garble in the picture. This would mean that the hashing does not change if garble changes, meaning weaker obfuscation. To regain that property, we define the "garble action ID", which is just the original action ID hashed together with garble's own content ID. This is practically the same as the obfuscated build ID we used before, but since it doesn't go through "go tool compile -V=full" and the obfuscated build itself, we can work out *all* the garble action IDs upfront, before the obfuscated build even starts. This fixes all of our problems. Now we know all garble build IDs upfront, so a bunch of hacks can be entirely removed. Plus, since we know them upfront, we can also cache them and avoid repeated calls to "go tool buildid". While at it, make use of the new BuildID field in Go 1.16's "list -json -export". This avoids the vast majority of "go tool buildid" calls, as the only ones that remain are 2 on the garble binary itself. The numbers for Go 1.16 look very good: name old time/op new time/op delta Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6) name old bin-B new bin-B delta Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6) name old sys-time/op new sys-time/op delta Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6) name old user-time/op new user-time/op delta Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
3 years ago
// The fields below are not part of 'go list', but are still reused
// between garble processes. Use "Garble" as a prefix to ensure no
// collisions with the JSON fields from 'go list'.
properly record when type aliases are embedded as fields There are two scenarios when it comes to embedding fields. The first is easy, and we always handled it well: type Named struct { Foo int } type T struct { Named } In this scenario, T ends up with an embedded field named "Named", and a promoted field named "Foo". Then there's the form with a type alias: type Named struct { Foo int } type Alias = Named type T struct { Alias } This case is different: T ends up with an embedded field named "Alias", and a promoted field named "Foo". Note how the field gets its name from the referenced type, even if said type is just an alias to another type. This poses two problems. First, we must obfuscate the field T.Alias as the name "Alias", and not as the name "Named" that the alias points to. Second, we must be careful of cases where Named and Alias are declared in different packages, as they will obfuscate the same name differently. Both of those problems compounded in the reported issue. The actual reason is that quic-go has a type alias in the form of: type ConnectionState = qtls.ConnectionState In other words, the entire problem boils down to a type alias which points to a named type in a different package, where both types share the same name. For example: package parent import "parent/p1" type T struct { p1.SameName } [...] package p1 import "parent/p2" type SameName = p2.SameName [...] package p2 type SameName struct { Foo int } This broke garble because we had a heuristic to detect when an embedded field was a type alias: // Instead, detect such a "foreign alias embed". // If we embed a final named type, // but the field name does not match its name, // then it must have been done via an alias. // We dig out the alias's TypeName via locateForeignAlias. if named.Obj().Name() != node.Name { As the reader can deduce, this heuristic would incorrectly assume that the snippet above does not embed a type alias, when in fact it does. When obfuscating the field T.SameName, which uses a type alias, we would correctly obfuscate the name "SameName", but we would incorrectly obfuscate it with the package p2, not p1. This would then result in build errors. To fix this problem for good, we need to get rid of the heuristic. Instead, we now mimic what was done for KnownCannotObfuscate, but for embedded fields which use type aliases. KnownEmbeddedAliasFields is now filled for each package and stored in the cache as part of cachedOutput. We can then detect the "embedded alias" case reliably, even when the field is declared in an imported package. On the plus side, we get to remove locateForeignAlias. We also add a couple of TODOs to record further improvements. Finally, add a test. Fixes #466.
2 years ago
// TODO(mvdan): consider filling this iff ToObfuscate==true,
// which will help ensure we don't obfuscate any of their names otherwise.
start using original action IDs (#251) When we obfuscate a name, what we do is hash the name with the action ID of the package that contains the name. To ensure that the hash changes if the garble tool changes, we used the action ID of the obfuscated build, which is different than the original action ID, as we include garble's own content ID in "go tool compile -V=full" via -toolexec. Let's call that the "obfuscated action ID". Remember that a content ID is roughly the hash of a binary or object file, and an action ID contains the hash of a package's source code plus the content IDs of its dependencies. This had the advantage that it did what we wanted. However, it had one massive drawback: when we compile a package, we only have the obfuscated action IDs of its dependencies. This is because one can't have the content ID of dependent packages before they are built. Usually, this is not a problem, because hashing a foreign name means it comes from a dependency, where we already have the obfuscated action ID. However, that's not always the case. First, go:linkname directives can point to any symbol that ends up in the binary, even if the package is not a dependency. So garble could only support linkname targets belonging to dependencies. This is at the root of why we could not obfuscate the runtime; it contains linkname directives targeting the net package, for example, which depends on runtime. Second, some other places did not have an easy access to obfuscated action IDs, like transformAsm, which had to recover it from a temporary file stored by transformCompile. Plus, this was all pretty expensive, as each toolexec sub-process had to make repeated calls to buildidOf with the object files of dependencies. We even had to use extra calls to "go list" in the case of indirect dependencies, as their export files do not appear in importcfg files. All in all, the old method was complex and expensive. A better mechanism is to use the original action IDs directly, as listed by "go list" without garble in the picture. This would mean that the hashing does not change if garble changes, meaning weaker obfuscation. To regain that property, we define the "garble action ID", which is just the original action ID hashed together with garble's own content ID. This is practically the same as the obfuscated build ID we used before, but since it doesn't go through "go tool compile -V=full" and the obfuscated build itself, we can work out *all* the garble action IDs upfront, before the obfuscated build even starts. This fixes all of our problems. Now we know all garble build IDs upfront, so a bunch of hacks can be entirely removed. Plus, since we know them upfront, we can also cache them and avoid repeated calls to "go tool buildid". While at it, make use of the new BuildID field in Go 1.16's "list -json -export". This avoids the vast majority of "go tool buildid" calls, as the only ones that remain are 2 on the garble binary itself. The numbers for Go 1.16 look very good: name old time/op new time/op delta Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6) name old bin-B new bin-B delta Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6) name old sys-time/op new sys-time/op delta Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6) name old user-time/op new user-time/op delta Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
3 years ago
GarbleActionID []byte
// ToObfuscate records whether the package should be obfuscated.
ToObfuscate bool
}
func (p *listedPackage) obfuscatedImportPath() string {
if p.Name == "main" {
panic("main packages should never need to obfuscate their import paths")
}
// We can't obfuscate the embed package's import path,
// as the toolchain expects to recognize the package by it.
if p.ImportPath == "embed" || !p.ToObfuscate {
return p.ImportPath
}
newPath := hashWithPackage(p, p.ImportPath)
debugf("import path %q hashed with %x to %q", p.ImportPath, p.GarbleActionID, newPath)
return newPath
}
ensure the runtime is built in a reproducible way We went to great lengths to ensure garble builds are reproducible. This includes how the tool itself works, as its behavior should be the same given the same inputs. However, we made one crucial mistake with the runtime package. It has go:linkname directives pointing at other packages, and some of those pointed packages aren't its dependencies. Imagine two scenarios where garble builds the runtime package: 1) We run "garble build runtime". The way we handle linkname directives calls listPackage on the target package, to obfuscate the target's import path and object name. However, since we only obtained build info of runtime and its deps, calls for some linknames such as listPackage("sync/atomic") will fail. The linkname directive will leave its target untouched. 2) We run "garble build std". Unlike the first scenario, all listPackage calls issued by runtime's linkname directives will succeed, so its linkname directive targets will be obfuscated. At best, this can result in inconsistent builds, depending on how the runtime package was built. At worst, the mismatching object names can result in errors at link time, if the target packages are actually used. The modified test reproduces the worst case scenario reliably, when the fix is reverted: > env GOCACHE=${WORK}/gocache-empty > garble build -a runtime > garble build -o=out_rebuild ./stdimporter [stderr] # test/main/stdimporter JZzQivnl.NtQJu0H3: relocation target JZzQivnl.iioHinYT not defined JZzQivnl.NtQJu0H3.func9: relocation target JZzQivnl.yz5z0NaH not defined JZzQivnl.(*ypvqhKiQ).String: relocation target JZzQivnl.eVciBQeI not defined JZzQivnl.(*ypvqhKiQ).PkgPath: relocation target JZzQivnl.eVciBQeI not defined [...] The fix consists of two steps. First, if we're building the runtime and listPackage fails on a package, that means we ran into scenario 1 above. To avoid the inconsistency, we fill ListedPackages with "go list [...] std". This means we'll always build runtime as described in scenario 2 above. Second, when building packages other than the runtime, we only allow listPackage to succeed if we're listing a dependency of the current package. This ensures we won't run into similar reproducibility bugs in the future. Finally, re-enable test-gotip on CI since this was the last test flake.
3 years ago
// appendListedPackages gets information about the current package
// and all of its dependencies
func appendListedPackages(packages []string, withDeps bool) error {
startTime := time.Now()
// TODO: perhaps include all top-level build flags set by garble,
// including -buildvcs=false.
// They shouldn't affect "go list" here, but might as well be consistent.
args := []string{"list", "-json", "-export", "-trimpath"}
if withDeps {
args = append(args, "-deps")
}
fail if we are unexpectedly overwriting files (#418) While investigating a bug report, I noticed that garble was writing to the same temp file twice. At best, writing to the same path on disk twice is wasteful, as the design is careful to be deterministic and use unique paths. At worst, the two writes could cause races at the filesystem level. To prevent either of those situations, we now create files with os.OpenFile and os.O_EXCL, meaning that we will error if the file already exists. That change uncovered a number of such unintended cases. First, transformAsm would write obfuscated Go files twice. This is because the Go toolchain actually runs: [...]/asm -gensymabis [...] foo.s bar.s [...]/asm [...] foo.s bar.s That is, the first run is only meant to generate symbol ABIs, which are then used by the compiler. We need to obfuscate at that first stage, because the symbol ABI descriptions need to use obfuscated names. However, having already obfuscated the assembly on the first stage, there is no need to do so again on the second stage. If we detect gensymabis is missing, we simply reuse the previous files. This first situation doesn't seem racy, but obfuscating the Go assembly files twice is certainly unnecessary. Second, saveKnownReflectAPIs wrote a gob file to the build cache. Since the build cache can be kept between builds, and since the build cache uses reproducible paths for each build, running the same "garble build" twice could overwrite those files. This could actually cause races at the filesystem level; if two concurrent builds write to the same gob file on disk, one of them could end up using a partially-written file. Note that this is the only of the three cases not using temporary files. As such, it is expected that the file may already exist. In such a case, we simply avoid overwriting it rather than failing. Third, when "garble build -a" was used, and when we needed an export file not listed in importcfg, we would end up calling roughly: go list -export -toolexec=garble -a <dependency> This meant we would re-build and re-obfuscate those packages. Which is unfortunate, because the parent process already did via: go build -toolexec=garble -a <main> The repeated dependency builds tripped the new os.O_EXCL check, as we would try to overwrite the same obfuscated Go files. Beyond being wasteful, this could again cause subtle filesystem races. To fix the problem, avoid passing flags like "-a" to nested go commands. Overall, we should likely be using safer ways to write to disk, be it via either atomic writes or locked files. However, for now, catching duplicate writes is a big step. I have left a self-assigned TODO for further improvements. CI on the pull request found a failure on test-gotip. The failure reproduces on master, so it seems to be related to gotip, and not a regression introduced by this change. For now, disable test-gotip until we can investigate.
3 years ago
args = append(args, cache.ForwardBuildFlags...)
args = append(args, packages...)
cmd := exec.Command("go", args...)
defer func() {
ensure the runtime is built in a reproducible way We went to great lengths to ensure garble builds are reproducible. This includes how the tool itself works, as its behavior should be the same given the same inputs. However, we made one crucial mistake with the runtime package. It has go:linkname directives pointing at other packages, and some of those pointed packages aren't its dependencies. Imagine two scenarios where garble builds the runtime package: 1) We run "garble build runtime". The way we handle linkname directives calls listPackage on the target package, to obfuscate the target's import path and object name. However, since we only obtained build info of runtime and its deps, calls for some linknames such as listPackage("sync/atomic") will fail. The linkname directive will leave its target untouched. 2) We run "garble build std". Unlike the first scenario, all listPackage calls issued by runtime's linkname directives will succeed, so its linkname directive targets will be obfuscated. At best, this can result in inconsistent builds, depending on how the runtime package was built. At worst, the mismatching object names can result in errors at link time, if the target packages are actually used. The modified test reproduces the worst case scenario reliably, when the fix is reverted: > env GOCACHE=${WORK}/gocache-empty > garble build -a runtime > garble build -o=out_rebuild ./stdimporter [stderr] # test/main/stdimporter JZzQivnl.NtQJu0H3: relocation target JZzQivnl.iioHinYT not defined JZzQivnl.NtQJu0H3.func9: relocation target JZzQivnl.yz5z0NaH not defined JZzQivnl.(*ypvqhKiQ).String: relocation target JZzQivnl.eVciBQeI not defined JZzQivnl.(*ypvqhKiQ).PkgPath: relocation target JZzQivnl.eVciBQeI not defined [...] The fix consists of two steps. First, if we're building the runtime and listPackage fails on a package, that means we ran into scenario 1 above. To avoid the inconsistency, we fill ListedPackages with "go list [...] std". This means we'll always build runtime as described in scenario 2 above. Second, when building packages other than the runtime, we only allow listPackage to succeed if we're listing a dependency of the current package. This ensures we won't run into similar reproducibility bugs in the future. Finally, re-enable test-gotip on CI since this was the last test flake.
3 years ago
debugf("original build info obtained in %s via: go %s", debugSince(startTime), strings.Join(args, " "))
}()
stdout, err := cmd.StdoutPipe()
if err != nil {
return err
}
var stderr bytes.Buffer
cmd.Stderr = &stderr
if err := cmd.Start(); err != nil {
return fmt.Errorf("go list error: %v", err)
}
start using original action IDs (#251) When we obfuscate a name, what we do is hash the name with the action ID of the package that contains the name. To ensure that the hash changes if the garble tool changes, we used the action ID of the obfuscated build, which is different than the original action ID, as we include garble's own content ID in "go tool compile -V=full" via -toolexec. Let's call that the "obfuscated action ID". Remember that a content ID is roughly the hash of a binary or object file, and an action ID contains the hash of a package's source code plus the content IDs of its dependencies. This had the advantage that it did what we wanted. However, it had one massive drawback: when we compile a package, we only have the obfuscated action IDs of its dependencies. This is because one can't have the content ID of dependent packages before they are built. Usually, this is not a problem, because hashing a foreign name means it comes from a dependency, where we already have the obfuscated action ID. However, that's not always the case. First, go:linkname directives can point to any symbol that ends up in the binary, even if the package is not a dependency. So garble could only support linkname targets belonging to dependencies. This is at the root of why we could not obfuscate the runtime; it contains linkname directives targeting the net package, for example, which depends on runtime. Second, some other places did not have an easy access to obfuscated action IDs, like transformAsm, which had to recover it from a temporary file stored by transformCompile. Plus, this was all pretty expensive, as each toolexec sub-process had to make repeated calls to buildidOf with the object files of dependencies. We even had to use extra calls to "go list" in the case of indirect dependencies, as their export files do not appear in importcfg files. All in all, the old method was complex and expensive. A better mechanism is to use the original action IDs directly, as listed by "go list" without garble in the picture. This would mean that the hashing does not change if garble changes, meaning weaker obfuscation. To regain that property, we define the "garble action ID", which is just the original action ID hashed together with garble's own content ID. This is practically the same as the obfuscated build ID we used before, but since it doesn't go through "go tool compile -V=full" and the obfuscated build itself, we can work out *all* the garble action IDs upfront, before the obfuscated build even starts. This fixes all of our problems. Now we know all garble build IDs upfront, so a bunch of hacks can be entirely removed. Plus, since we know them upfront, we can also cache them and avoid repeated calls to "go tool buildid". While at it, make use of the new BuildID field in Go 1.16's "list -json -export". This avoids the vast majority of "go tool buildid" calls, as the only ones that remain are 2 on the garble binary itself. The numbers for Go 1.16 look very good: name old time/op new time/op delta Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6) name old bin-B new bin-B delta Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6) name old sys-time/op new sys-time/op delta Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6) name old user-time/op new user-time/op delta Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
3 years ago
dec := json.NewDecoder(stdout)
ensure the runtime is built in a reproducible way We went to great lengths to ensure garble builds are reproducible. This includes how the tool itself works, as its behavior should be the same given the same inputs. However, we made one crucial mistake with the runtime package. It has go:linkname directives pointing at other packages, and some of those pointed packages aren't its dependencies. Imagine two scenarios where garble builds the runtime package: 1) We run "garble build runtime". The way we handle linkname directives calls listPackage on the target package, to obfuscate the target's import path and object name. However, since we only obtained build info of runtime and its deps, calls for some linknames such as listPackage("sync/atomic") will fail. The linkname directive will leave its target untouched. 2) We run "garble build std". Unlike the first scenario, all listPackage calls issued by runtime's linkname directives will succeed, so its linkname directive targets will be obfuscated. At best, this can result in inconsistent builds, depending on how the runtime package was built. At worst, the mismatching object names can result in errors at link time, if the target packages are actually used. The modified test reproduces the worst case scenario reliably, when the fix is reverted: > env GOCACHE=${WORK}/gocache-empty > garble build -a runtime > garble build -o=out_rebuild ./stdimporter [stderr] # test/main/stdimporter JZzQivnl.NtQJu0H3: relocation target JZzQivnl.iioHinYT not defined JZzQivnl.NtQJu0H3.func9: relocation target JZzQivnl.yz5z0NaH not defined JZzQivnl.(*ypvqhKiQ).String: relocation target JZzQivnl.eVciBQeI not defined JZzQivnl.(*ypvqhKiQ).PkgPath: relocation target JZzQivnl.eVciBQeI not defined [...] The fix consists of two steps. First, if we're building the runtime and listPackage fails on a package, that means we ran into scenario 1 above. To avoid the inconsistency, we fill ListedPackages with "go list [...] std". This means we'll always build runtime as described in scenario 2 above. Second, when building packages other than the runtime, we only allow listPackage to succeed if we're listing a dependency of the current package. This ensures we won't run into similar reproducibility bugs in the future. Finally, re-enable test-gotip on CI since this was the last test flake.
3 years ago
if cache.ListedPackages == nil {
cache.ListedPackages = make(map[string]*listedPackage)
}
for dec.More() {
var pkg listedPackage
if err := dec.Decode(&pkg); err != nil {
return err
}
if cache.ListedPackages[pkg.ImportPath] != nil {
return fmt.Errorf("duplicate package: %q", pkg.ImportPath)
}
if pkg.BuildID != "" {
actionID := decodeHash(splitActionID(pkg.BuildID))
pkg.GarbleActionID = addGarbleToHash(actionID)
start using original action IDs (#251) When we obfuscate a name, what we do is hash the name with the action ID of the package that contains the name. To ensure that the hash changes if the garble tool changes, we used the action ID of the obfuscated build, which is different than the original action ID, as we include garble's own content ID in "go tool compile -V=full" via -toolexec. Let's call that the "obfuscated action ID". Remember that a content ID is roughly the hash of a binary or object file, and an action ID contains the hash of a package's source code plus the content IDs of its dependencies. This had the advantage that it did what we wanted. However, it had one massive drawback: when we compile a package, we only have the obfuscated action IDs of its dependencies. This is because one can't have the content ID of dependent packages before they are built. Usually, this is not a problem, because hashing a foreign name means it comes from a dependency, where we already have the obfuscated action ID. However, that's not always the case. First, go:linkname directives can point to any symbol that ends up in the binary, even if the package is not a dependency. So garble could only support linkname targets belonging to dependencies. This is at the root of why we could not obfuscate the runtime; it contains linkname directives targeting the net package, for example, which depends on runtime. Second, some other places did not have an easy access to obfuscated action IDs, like transformAsm, which had to recover it from a temporary file stored by transformCompile. Plus, this was all pretty expensive, as each toolexec sub-process had to make repeated calls to buildidOf with the object files of dependencies. We even had to use extra calls to "go list" in the case of indirect dependencies, as their export files do not appear in importcfg files. All in all, the old method was complex and expensive. A better mechanism is to use the original action IDs directly, as listed by "go list" without garble in the picture. This would mean that the hashing does not change if garble changes, meaning weaker obfuscation. To regain that property, we define the "garble action ID", which is just the original action ID hashed together with garble's own content ID. This is practically the same as the obfuscated build ID we used before, but since it doesn't go through "go tool compile -V=full" and the obfuscated build itself, we can work out *all* the garble action IDs upfront, before the obfuscated build even starts. This fixes all of our problems. Now we know all garble build IDs upfront, so a bunch of hacks can be entirely removed. Plus, since we know them upfront, we can also cache them and avoid repeated calls to "go tool buildid". While at it, make use of the new BuildID field in Go 1.16's "list -json -export". This avoids the vast majority of "go tool buildid" calls, as the only ones that remain are 2 on the garble binary itself. The numbers for Go 1.16 look very good: name old time/op new time/op delta Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6) name old bin-B new bin-B delta Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6) name old sys-time/op new sys-time/op delta Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6) name old user-time/op new user-time/op delta Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
3 years ago
}
cache.ListedPackages[pkg.ImportPath] = &pkg
}
if err := cmd.Wait(); err != nil {
return fmt.Errorf("go list error: %v: %s", err, stderr.Bytes())
}
anyToObfuscate := false
for path, pkg := range cache.ListedPackages {
// If "GOGARBLE=foo/bar", "foo/bar_test" should also match.
if pkg.ForTest != "" {
path = pkg.ForTest
}
// Test main packages like "foo/bar.test" are always obfuscated,
// just like main packages.
switch {
case cannotObfuscate[path], runtimeAndDeps[path]:
// We don't support obfuscating these yet.
case pkg.Name == "main" && strings.HasSuffix(path, ".test"),
path == "command-line-arguments",
strings.HasPrefix(path, "plugin/unnamed"),
module.MatchPrefixPatterns(cache.GOGARBLE, path):
pkg.ToObfuscate = true
anyToObfuscate = true
}
}
ensure the runtime is built in a reproducible way We went to great lengths to ensure garble builds are reproducible. This includes how the tool itself works, as its behavior should be the same given the same inputs. However, we made one crucial mistake with the runtime package. It has go:linkname directives pointing at other packages, and some of those pointed packages aren't its dependencies. Imagine two scenarios where garble builds the runtime package: 1) We run "garble build runtime". The way we handle linkname directives calls listPackage on the target package, to obfuscate the target's import path and object name. However, since we only obtained build info of runtime and its deps, calls for some linknames such as listPackage("sync/atomic") will fail. The linkname directive will leave its target untouched. 2) We run "garble build std". Unlike the first scenario, all listPackage calls issued by runtime's linkname directives will succeed, so its linkname directive targets will be obfuscated. At best, this can result in inconsistent builds, depending on how the runtime package was built. At worst, the mismatching object names can result in errors at link time, if the target packages are actually used. The modified test reproduces the worst case scenario reliably, when the fix is reverted: > env GOCACHE=${WORK}/gocache-empty > garble build -a runtime > garble build -o=out_rebuild ./stdimporter [stderr] # test/main/stdimporter JZzQivnl.NtQJu0H3: relocation target JZzQivnl.iioHinYT not defined JZzQivnl.NtQJu0H3.func9: relocation target JZzQivnl.yz5z0NaH not defined JZzQivnl.(*ypvqhKiQ).String: relocation target JZzQivnl.eVciBQeI not defined JZzQivnl.(*ypvqhKiQ).PkgPath: relocation target JZzQivnl.eVciBQeI not defined [...] The fix consists of two steps. First, if we're building the runtime and listPackage fails on a package, that means we ran into scenario 1 above. To avoid the inconsistency, we fill ListedPackages with "go list [...] std". This means we'll always build runtime as described in scenario 2 above. Second, when building packages other than the runtime, we only allow listPackage to succeed if we're listing a dependency of the current package. This ensures we won't run into similar reproducibility bugs in the future. Finally, re-enable test-gotip on CI since this was the last test flake.
3 years ago
// Don't error if the user ran: GOGARBLE='*' garble build runtime
if !anyToObfuscate && !module.MatchPrefixPatterns(cache.GOGARBLE, "runtime") {
return fmt.Errorf("GOGARBLE=%q does not match any packages to be built", cache.GOGARBLE)
}
return nil
}
// cannotObfuscate is a list of some packages the runtime depends on, or
// packages which the runtime points to via go:linkname.
//
// Once we support go:linkname well and once we can obfuscate the runtime
// package, this entire map can likely go away.
//
// TODO: investigate and resolve each one of these
var cannotObfuscate = map[string]bool{
// not a "real" package
"unsafe": true,
// some linkname failure
"time": true,
"runtime/pprof": true,
// all kinds of stuff breaks when obfuscating the runtime
"syscall": true,
"internal/abi": true,
// rebuilds don't work
"os/signal": true,
// cgo breaks otherwise
"runtime/cgo": true,
// garble reverse breaks otherwise
"runtime/debug": true,
// cgo heavy net doesn't like to be obfuscated
"net": true,
// some linkname failure
"crypto/x509/internal/macos": true,
}
// Obtained from "go list -deps runtime" on Go 1.18beta1.
// Note that the same command on Go 1.17 results in a subset of this list.
var runtimeAndDeps = map[string]bool{
"internal/goarch": true,
"unsafe": true,
"internal/abi": true,
"internal/cpu": true,
"internal/bytealg": true,
"internal/goexperiment": true,
"internal/goos": true,
"runtime/internal/atomic": true,
"runtime/internal/math": true,
"runtime/internal/sys": true,
"runtime": true,
}
only list missing packages when obfuscating the runtime We were listing all of std, which certainly worked, but was quite slow at over 200 packages. In practice, we can only be missing up to 20-30 packages. It was a good change as it fixed a severe bug, but it also introduced a fairly noticeable slow-down. The numbers are clear; this change shaves off multiple seconds when obfuscating the runtime with a cold cache: name old time/op new time/op delta Build/NoCache-16 5.06s ± 1% 1.94s ± 1% -61.64% (p=0.008 n=5+5) name old bin-B new bin-B delta Build/NoCache-16 6.70M ± 0% 6.71M ± 0% +0.05% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Build/NoCache-16 13.4s ± 2% 5.0s ± 2% -62.45% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Build/NoCache-16 60.6s ± 1% 19.8s ± 1% -67.34% (p=0.008 n=5+5) Since we only want to call "go list" one extra time, instead of once for every package we find out we're missing, we want to know what packages we could be missing in advance. Resurrect a smarter version of the runtime-related script. Finally, remove the runtime-related.txt test script, as it has now been superseeded by the sanity checks in listPackage. That is, obfuscating the runtime package will now panic if we are missing any necessary package information. To double check that we get the runtime's linkname edge case right, make gogarble.txt use runtime/debug.WriteHeapDump, which is implemented via a direct runtime linkname. This ensures we don't lose test coverage from runtime-related.txt.
3 years ago
var listedRuntimeLinknamed = false
// listPackage gets the listedPackage information for a certain package
func listPackage(path string) (*listedPackage, error) {
ensure the runtime is built in a reproducible way We went to great lengths to ensure garble builds are reproducible. This includes how the tool itself works, as its behavior should be the same given the same inputs. However, we made one crucial mistake with the runtime package. It has go:linkname directives pointing at other packages, and some of those pointed packages aren't its dependencies. Imagine two scenarios where garble builds the runtime package: 1) We run "garble build runtime". The way we handle linkname directives calls listPackage on the target package, to obfuscate the target's import path and object name. However, since we only obtained build info of runtime and its deps, calls for some linknames such as listPackage("sync/atomic") will fail. The linkname directive will leave its target untouched. 2) We run "garble build std". Unlike the first scenario, all listPackage calls issued by runtime's linkname directives will succeed, so its linkname directive targets will be obfuscated. At best, this can result in inconsistent builds, depending on how the runtime package was built. At worst, the mismatching object names can result in errors at link time, if the target packages are actually used. The modified test reproduces the worst case scenario reliably, when the fix is reverted: > env GOCACHE=${WORK}/gocache-empty > garble build -a runtime > garble build -o=out_rebuild ./stdimporter [stderr] # test/main/stdimporter JZzQivnl.NtQJu0H3: relocation target JZzQivnl.iioHinYT not defined JZzQivnl.NtQJu0H3.func9: relocation target JZzQivnl.yz5z0NaH not defined JZzQivnl.(*ypvqhKiQ).String: relocation target JZzQivnl.eVciBQeI not defined JZzQivnl.(*ypvqhKiQ).PkgPath: relocation target JZzQivnl.eVciBQeI not defined [...] The fix consists of two steps. First, if we're building the runtime and listPackage fails on a package, that means we ran into scenario 1 above. To avoid the inconsistency, we fill ListedPackages with "go list [...] std". This means we'll always build runtime as described in scenario 2 above. Second, when building packages other than the runtime, we only allow listPackage to succeed if we're listing a dependency of the current package. This ensures we won't run into similar reproducibility bugs in the future. Finally, re-enable test-gotip on CI since this was the last test flake.
3 years ago
if path == curPkg.ImportPath {
return curPkg, nil
}
// If the path is listed in the top-level ImportMap, use its mapping instead.
// This is a common scenario when dealing with vendored packages in GOROOT.
// The map is flat, so we don't need to recurse.
start using original action IDs (#251) When we obfuscate a name, what we do is hash the name with the action ID of the package that contains the name. To ensure that the hash changes if the garble tool changes, we used the action ID of the obfuscated build, which is different than the original action ID, as we include garble's own content ID in "go tool compile -V=full" via -toolexec. Let's call that the "obfuscated action ID". Remember that a content ID is roughly the hash of a binary or object file, and an action ID contains the hash of a package's source code plus the content IDs of its dependencies. This had the advantage that it did what we wanted. However, it had one massive drawback: when we compile a package, we only have the obfuscated action IDs of its dependencies. This is because one can't have the content ID of dependent packages before they are built. Usually, this is not a problem, because hashing a foreign name means it comes from a dependency, where we already have the obfuscated action ID. However, that's not always the case. First, go:linkname directives can point to any symbol that ends up in the binary, even if the package is not a dependency. So garble could only support linkname targets belonging to dependencies. This is at the root of why we could not obfuscate the runtime; it contains linkname directives targeting the net package, for example, which depends on runtime. Second, some other places did not have an easy access to obfuscated action IDs, like transformAsm, which had to recover it from a temporary file stored by transformCompile. Plus, this was all pretty expensive, as each toolexec sub-process had to make repeated calls to buildidOf with the object files of dependencies. We even had to use extra calls to "go list" in the case of indirect dependencies, as their export files do not appear in importcfg files. All in all, the old method was complex and expensive. A better mechanism is to use the original action IDs directly, as listed by "go list" without garble in the picture. This would mean that the hashing does not change if garble changes, meaning weaker obfuscation. To regain that property, we define the "garble action ID", which is just the original action ID hashed together with garble's own content ID. This is practically the same as the obfuscated build ID we used before, but since it doesn't go through "go tool compile -V=full" and the obfuscated build itself, we can work out *all* the garble action IDs upfront, before the obfuscated build even starts. This fixes all of our problems. Now we know all garble build IDs upfront, so a bunch of hacks can be entirely removed. Plus, since we know them upfront, we can also cache them and avoid repeated calls to "go tool buildid". While at it, make use of the new BuildID field in Go 1.16's "list -json -export". This avoids the vast majority of "go tool buildid" calls, as the only ones that remain are 2 on the garble binary itself. The numbers for Go 1.16 look very good: name old time/op new time/op delta Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6) name old bin-B new bin-B delta Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6) name old sys-time/op new sys-time/op delta Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6) name old user-time/op new user-time/op delta Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
3 years ago
if path2 := curPkg.ImportMap[path]; path2 != "" {
path = path2
}
pkg, ok := cache.ListedPackages[path]
ensure the runtime is built in a reproducible way We went to great lengths to ensure garble builds are reproducible. This includes how the tool itself works, as its behavior should be the same given the same inputs. However, we made one crucial mistake with the runtime package. It has go:linkname directives pointing at other packages, and some of those pointed packages aren't its dependencies. Imagine two scenarios where garble builds the runtime package: 1) We run "garble build runtime". The way we handle linkname directives calls listPackage on the target package, to obfuscate the target's import path and object name. However, since we only obtained build info of runtime and its deps, calls for some linknames such as listPackage("sync/atomic") will fail. The linkname directive will leave its target untouched. 2) We run "garble build std". Unlike the first scenario, all listPackage calls issued by runtime's linkname directives will succeed, so its linkname directive targets will be obfuscated. At best, this can result in inconsistent builds, depending on how the runtime package was built. At worst, the mismatching object names can result in errors at link time, if the target packages are actually used. The modified test reproduces the worst case scenario reliably, when the fix is reverted: > env GOCACHE=${WORK}/gocache-empty > garble build -a runtime > garble build -o=out_rebuild ./stdimporter [stderr] # test/main/stdimporter JZzQivnl.NtQJu0H3: relocation target JZzQivnl.iioHinYT not defined JZzQivnl.NtQJu0H3.func9: relocation target JZzQivnl.yz5z0NaH not defined JZzQivnl.(*ypvqhKiQ).String: relocation target JZzQivnl.eVciBQeI not defined JZzQivnl.(*ypvqhKiQ).PkgPath: relocation target JZzQivnl.eVciBQeI not defined [...] The fix consists of two steps. First, if we're building the runtime and listPackage fails on a package, that means we ran into scenario 1 above. To avoid the inconsistency, we fill ListedPackages with "go list [...] std". This means we'll always build runtime as described in scenario 2 above. Second, when building packages other than the runtime, we only allow listPackage to succeed if we're listing a dependency of the current package. This ensures we won't run into similar reproducibility bugs in the future. Finally, re-enable test-gotip on CI since this was the last test flake.
3 years ago
// The runtime may list any package in std, even those it doesn't depend on.
// This is due to how it linkname-implements std packages,
// such as sync/atomic or reflect, without importing them in any way.
// If ListedPackages lacks such a package we fill it with "std".
if curPkg.ImportPath == "runtime" {
if ok {
return pkg, nil
}
only list missing packages when obfuscating the runtime We were listing all of std, which certainly worked, but was quite slow at over 200 packages. In practice, we can only be missing up to 20-30 packages. It was a good change as it fixed a severe bug, but it also introduced a fairly noticeable slow-down. The numbers are clear; this change shaves off multiple seconds when obfuscating the runtime with a cold cache: name old time/op new time/op delta Build/NoCache-16 5.06s ± 1% 1.94s ± 1% -61.64% (p=0.008 n=5+5) name old bin-B new bin-B delta Build/NoCache-16 6.70M ± 0% 6.71M ± 0% +0.05% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Build/NoCache-16 13.4s ± 2% 5.0s ± 2% -62.45% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Build/NoCache-16 60.6s ± 1% 19.8s ± 1% -67.34% (p=0.008 n=5+5) Since we only want to call "go list" one extra time, instead of once for every package we find out we're missing, we want to know what packages we could be missing in advance. Resurrect a smarter version of the runtime-related script. Finally, remove the runtime-related.txt test script, as it has now been superseeded by the sanity checks in listPackage. That is, obfuscating the runtime package will now panic if we are missing any necessary package information. To double check that we get the runtime's linkname edge case right, make gogarble.txt use runtime/debug.WriteHeapDump, which is implemented via a direct runtime linkname. This ensures we don't lose test coverage from runtime-related.txt.
3 years ago
if listedRuntimeLinknamed {
panic(fmt.Sprintf("package %q still missing after go list call", path))
}
startTime := time.Now()
// Obtained via scripts/runtime-linknamed-nodeps.sh as of Go 1.18beta1.
runtimeLinknamed := []string{
"crypto/x509/internal/macos",
"internal/poll",
"internal/reflectlite",
only list missing packages when obfuscating the runtime We were listing all of std, which certainly worked, but was quite slow at over 200 packages. In practice, we can only be missing up to 20-30 packages. It was a good change as it fixed a severe bug, but it also introduced a fairly noticeable slow-down. The numbers are clear; this change shaves off multiple seconds when obfuscating the runtime with a cold cache: name old time/op new time/op delta Build/NoCache-16 5.06s ± 1% 1.94s ± 1% -61.64% (p=0.008 n=5+5) name old bin-B new bin-B delta Build/NoCache-16 6.70M ± 0% 6.71M ± 0% +0.05% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Build/NoCache-16 13.4s ± 2% 5.0s ± 2% -62.45% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Build/NoCache-16 60.6s ± 1% 19.8s ± 1% -67.34% (p=0.008 n=5+5) Since we only want to call "go list" one extra time, instead of once for every package we find out we're missing, we want to know what packages we could be missing in advance. Resurrect a smarter version of the runtime-related script. Finally, remove the runtime-related.txt test script, as it has now been superseeded by the sanity checks in listPackage. That is, obfuscating the runtime package will now panic if we are missing any necessary package information. To double check that we get the runtime's linkname edge case right, make gogarble.txt use runtime/debug.WriteHeapDump, which is implemented via a direct runtime linkname. This ensures we don't lose test coverage from runtime-related.txt.
3 years ago
"net",
"os",
only list missing packages when obfuscating the runtime We were listing all of std, which certainly worked, but was quite slow at over 200 packages. In practice, we can only be missing up to 20-30 packages. It was a good change as it fixed a severe bug, but it also introduced a fairly noticeable slow-down. The numbers are clear; this change shaves off multiple seconds when obfuscating the runtime with a cold cache: name old time/op new time/op delta Build/NoCache-16 5.06s ± 1% 1.94s ± 1% -61.64% (p=0.008 n=5+5) name old bin-B new bin-B delta Build/NoCache-16 6.70M ± 0% 6.71M ± 0% +0.05% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Build/NoCache-16 13.4s ± 2% 5.0s ± 2% -62.45% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Build/NoCache-16 60.6s ± 1% 19.8s ± 1% -67.34% (p=0.008 n=5+5) Since we only want to call "go list" one extra time, instead of once for every package we find out we're missing, we want to know what packages we could be missing in advance. Resurrect a smarter version of the runtime-related script. Finally, remove the runtime-related.txt test script, as it has now been superseeded by the sanity checks in listPackage. That is, obfuscating the runtime package will now panic if we are missing any necessary package information. To double check that we get the runtime's linkname edge case right, make gogarble.txt use runtime/debug.WriteHeapDump, which is implemented via a direct runtime linkname. This ensures we don't lose test coverage from runtime-related.txt.
3 years ago
"os/signal",
"plugin",
"reflect",
only list missing packages when obfuscating the runtime We were listing all of std, which certainly worked, but was quite slow at over 200 packages. In practice, we can only be missing up to 20-30 packages. It was a good change as it fixed a severe bug, but it also introduced a fairly noticeable slow-down. The numbers are clear; this change shaves off multiple seconds when obfuscating the runtime with a cold cache: name old time/op new time/op delta Build/NoCache-16 5.06s ± 1% 1.94s ± 1% -61.64% (p=0.008 n=5+5) name old bin-B new bin-B delta Build/NoCache-16 6.70M ± 0% 6.71M ± 0% +0.05% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Build/NoCache-16 13.4s ± 2% 5.0s ± 2% -62.45% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Build/NoCache-16 60.6s ± 1% 19.8s ± 1% -67.34% (p=0.008 n=5+5) Since we only want to call "go list" one extra time, instead of once for every package we find out we're missing, we want to know what packages we could be missing in advance. Resurrect a smarter version of the runtime-related script. Finally, remove the runtime-related.txt test script, as it has now been superseeded by the sanity checks in listPackage. That is, obfuscating the runtime package will now panic if we are missing any necessary package information. To double check that we get the runtime's linkname edge case right, make gogarble.txt use runtime/debug.WriteHeapDump, which is implemented via a direct runtime linkname. This ensures we don't lose test coverage from runtime-related.txt.
3 years ago
"runtime/debug",
"runtime/metrics",
"runtime/pprof",
"runtime/trace",
"sync",
"sync/atomic",
"syscall",
only list missing packages when obfuscating the runtime We were listing all of std, which certainly worked, but was quite slow at over 200 packages. In practice, we can only be missing up to 20-30 packages. It was a good change as it fixed a severe bug, but it also introduced a fairly noticeable slow-down. The numbers are clear; this change shaves off multiple seconds when obfuscating the runtime with a cold cache: name old time/op new time/op delta Build/NoCache-16 5.06s ± 1% 1.94s ± 1% -61.64% (p=0.008 n=5+5) name old bin-B new bin-B delta Build/NoCache-16 6.70M ± 0% 6.71M ± 0% +0.05% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Build/NoCache-16 13.4s ± 2% 5.0s ± 2% -62.45% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Build/NoCache-16 60.6s ± 1% 19.8s ± 1% -67.34% (p=0.008 n=5+5) Since we only want to call "go list" one extra time, instead of once for every package we find out we're missing, we want to know what packages we could be missing in advance. Resurrect a smarter version of the runtime-related script. Finally, remove the runtime-related.txt test script, as it has now been superseeded by the sanity checks in listPackage. That is, obfuscating the runtime package will now panic if we are missing any necessary package information. To double check that we get the runtime's linkname edge case right, make gogarble.txt use runtime/debug.WriteHeapDump, which is implemented via a direct runtime linkname. This ensures we don't lose test coverage from runtime-related.txt.
3 years ago
"syscall/js",
"time",
only list missing packages when obfuscating the runtime We were listing all of std, which certainly worked, but was quite slow at over 200 packages. In practice, we can only be missing up to 20-30 packages. It was a good change as it fixed a severe bug, but it also introduced a fairly noticeable slow-down. The numbers are clear; this change shaves off multiple seconds when obfuscating the runtime with a cold cache: name old time/op new time/op delta Build/NoCache-16 5.06s ± 1% 1.94s ± 1% -61.64% (p=0.008 n=5+5) name old bin-B new bin-B delta Build/NoCache-16 6.70M ± 0% 6.71M ± 0% +0.05% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Build/NoCache-16 13.4s ± 2% 5.0s ± 2% -62.45% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Build/NoCache-16 60.6s ± 1% 19.8s ± 1% -67.34% (p=0.008 n=5+5) Since we only want to call "go list" one extra time, instead of once for every package we find out we're missing, we want to know what packages we could be missing in advance. Resurrect a smarter version of the runtime-related script. Finally, remove the runtime-related.txt test script, as it has now been superseeded by the sanity checks in listPackage. That is, obfuscating the runtime package will now panic if we are missing any necessary package information. To double check that we get the runtime's linkname edge case right, make gogarble.txt use runtime/debug.WriteHeapDump, which is implemented via a direct runtime linkname. This ensures we don't lose test coverage from runtime-related.txt.
3 years ago
}
missing := make([]string, 0, len(runtimeLinknamed))
only list missing packages when obfuscating the runtime We were listing all of std, which certainly worked, but was quite slow at over 200 packages. In practice, we can only be missing up to 20-30 packages. It was a good change as it fixed a severe bug, but it also introduced a fairly noticeable slow-down. The numbers are clear; this change shaves off multiple seconds when obfuscating the runtime with a cold cache: name old time/op new time/op delta Build/NoCache-16 5.06s ± 1% 1.94s ± 1% -61.64% (p=0.008 n=5+5) name old bin-B new bin-B delta Build/NoCache-16 6.70M ± 0% 6.71M ± 0% +0.05% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Build/NoCache-16 13.4s ± 2% 5.0s ± 2% -62.45% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Build/NoCache-16 60.6s ± 1% 19.8s ± 1% -67.34% (p=0.008 n=5+5) Since we only want to call "go list" one extra time, instead of once for every package we find out we're missing, we want to know what packages we could be missing in advance. Resurrect a smarter version of the runtime-related script. Finally, remove the runtime-related.txt test script, as it has now been superseeded by the sanity checks in listPackage. That is, obfuscating the runtime package will now panic if we are missing any necessary package information. To double check that we get the runtime's linkname edge case right, make gogarble.txt use runtime/debug.WriteHeapDump, which is implemented via a direct runtime linkname. This ensures we don't lose test coverage from runtime-related.txt.
3 years ago
for _, linknamed := range runtimeLinknamed {
switch {
case cache.ListedPackages[linknamed] != nil:
// We already have it; skip.
case cache.GoEnv.GOOS != "js" && linknamed == "syscall/js":
// GOOS-specific package.
case cache.GoEnv.GOOS != "darwin" && linknamed == "crypto/x509/internal/macos":
// GOOS-specific package.
default:
missing = append(missing, linknamed)
}
}
// We don't need any information about their dependencies, in this case.
if err := appendListedPackages(missing, false); err != nil {
ensure the runtime is built in a reproducible way We went to great lengths to ensure garble builds are reproducible. This includes how the tool itself works, as its behavior should be the same given the same inputs. However, we made one crucial mistake with the runtime package. It has go:linkname directives pointing at other packages, and some of those pointed packages aren't its dependencies. Imagine two scenarios where garble builds the runtime package: 1) We run "garble build runtime". The way we handle linkname directives calls listPackage on the target package, to obfuscate the target's import path and object name. However, since we only obtained build info of runtime and its deps, calls for some linknames such as listPackage("sync/atomic") will fail. The linkname directive will leave its target untouched. 2) We run "garble build std". Unlike the first scenario, all listPackage calls issued by runtime's linkname directives will succeed, so its linkname directive targets will be obfuscated. At best, this can result in inconsistent builds, depending on how the runtime package was built. At worst, the mismatching object names can result in errors at link time, if the target packages are actually used. The modified test reproduces the worst case scenario reliably, when the fix is reverted: > env GOCACHE=${WORK}/gocache-empty > garble build -a runtime > garble build -o=out_rebuild ./stdimporter [stderr] # test/main/stdimporter JZzQivnl.NtQJu0H3: relocation target JZzQivnl.iioHinYT not defined JZzQivnl.NtQJu0H3.func9: relocation target JZzQivnl.yz5z0NaH not defined JZzQivnl.(*ypvqhKiQ).String: relocation target JZzQivnl.eVciBQeI not defined JZzQivnl.(*ypvqhKiQ).PkgPath: relocation target JZzQivnl.eVciBQeI not defined [...] The fix consists of two steps. First, if we're building the runtime and listPackage fails on a package, that means we ran into scenario 1 above. To avoid the inconsistency, we fill ListedPackages with "go list [...] std". This means we'll always build runtime as described in scenario 2 above. Second, when building packages other than the runtime, we only allow listPackage to succeed if we're listing a dependency of the current package. This ensures we won't run into similar reproducibility bugs in the future. Finally, re-enable test-gotip on CI since this was the last test flake.
3 years ago
panic(err) // should never happen
}
pkg, ok := cache.ListedPackages[path]
if !ok {
panic(fmt.Sprintf("runtime listed a std package we can't find: %s", path))
}
only list missing packages when obfuscating the runtime We were listing all of std, which certainly worked, but was quite slow at over 200 packages. In practice, we can only be missing up to 20-30 packages. It was a good change as it fixed a severe bug, but it also introduced a fairly noticeable slow-down. The numbers are clear; this change shaves off multiple seconds when obfuscating the runtime with a cold cache: name old time/op new time/op delta Build/NoCache-16 5.06s ± 1% 1.94s ± 1% -61.64% (p=0.008 n=5+5) name old bin-B new bin-B delta Build/NoCache-16 6.70M ± 0% 6.71M ± 0% +0.05% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Build/NoCache-16 13.4s ± 2% 5.0s ± 2% -62.45% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Build/NoCache-16 60.6s ± 1% 19.8s ± 1% -67.34% (p=0.008 n=5+5) Since we only want to call "go list" one extra time, instead of once for every package we find out we're missing, we want to know what packages we could be missing in advance. Resurrect a smarter version of the runtime-related script. Finally, remove the runtime-related.txt test script, as it has now been superseeded by the sanity checks in listPackage. That is, obfuscating the runtime package will now panic if we are missing any necessary package information. To double check that we get the runtime's linkname edge case right, make gogarble.txt use runtime/debug.WriteHeapDump, which is implemented via a direct runtime linkname. This ensures we don't lose test coverage from runtime-related.txt.
3 years ago
listedRuntimeLinknamed = true
debugf("listed %d missing runtime-linknamed packages in %s", len(missing), debugSince(startTime))
ensure the runtime is built in a reproducible way We went to great lengths to ensure garble builds are reproducible. This includes how the tool itself works, as its behavior should be the same given the same inputs. However, we made one crucial mistake with the runtime package. It has go:linkname directives pointing at other packages, and some of those pointed packages aren't its dependencies. Imagine two scenarios where garble builds the runtime package: 1) We run "garble build runtime". The way we handle linkname directives calls listPackage on the target package, to obfuscate the target's import path and object name. However, since we only obtained build info of runtime and its deps, calls for some linknames such as listPackage("sync/atomic") will fail. The linkname directive will leave its target untouched. 2) We run "garble build std". Unlike the first scenario, all listPackage calls issued by runtime's linkname directives will succeed, so its linkname directive targets will be obfuscated. At best, this can result in inconsistent builds, depending on how the runtime package was built. At worst, the mismatching object names can result in errors at link time, if the target packages are actually used. The modified test reproduces the worst case scenario reliably, when the fix is reverted: > env GOCACHE=${WORK}/gocache-empty > garble build -a runtime > garble build -o=out_rebuild ./stdimporter [stderr] # test/main/stdimporter JZzQivnl.NtQJu0H3: relocation target JZzQivnl.iioHinYT not defined JZzQivnl.NtQJu0H3.func9: relocation target JZzQivnl.yz5z0NaH not defined JZzQivnl.(*ypvqhKiQ).String: relocation target JZzQivnl.eVciBQeI not defined JZzQivnl.(*ypvqhKiQ).PkgPath: relocation target JZzQivnl.eVciBQeI not defined [...] The fix consists of two steps. First, if we're building the runtime and listPackage fails on a package, that means we ran into scenario 1 above. To avoid the inconsistency, we fill ListedPackages with "go list [...] std". This means we'll always build runtime as described in scenario 2 above. Second, when building packages other than the runtime, we only allow listPackage to succeed if we're listing a dependency of the current package. This ensures we won't run into similar reproducibility bugs in the future. Finally, re-enable test-gotip on CI since this was the last test flake.
3 years ago
return pkg, nil
}
// Packages other than runtime can list any package,
// as long as they depend on it directly or indirectly.
if !ok {
return nil, fmt.Errorf("path not found in listed packages: %s", path)
}
ensure the runtime is built in a reproducible way We went to great lengths to ensure garble builds are reproducible. This includes how the tool itself works, as its behavior should be the same given the same inputs. However, we made one crucial mistake with the runtime package. It has go:linkname directives pointing at other packages, and some of those pointed packages aren't its dependencies. Imagine two scenarios where garble builds the runtime package: 1) We run "garble build runtime". The way we handle linkname directives calls listPackage on the target package, to obfuscate the target's import path and object name. However, since we only obtained build info of runtime and its deps, calls for some linknames such as listPackage("sync/atomic") will fail. The linkname directive will leave its target untouched. 2) We run "garble build std". Unlike the first scenario, all listPackage calls issued by runtime's linkname directives will succeed, so its linkname directive targets will be obfuscated. At best, this can result in inconsistent builds, depending on how the runtime package was built. At worst, the mismatching object names can result in errors at link time, if the target packages are actually used. The modified test reproduces the worst case scenario reliably, when the fix is reverted: > env GOCACHE=${WORK}/gocache-empty > garble build -a runtime > garble build -o=out_rebuild ./stdimporter [stderr] # test/main/stdimporter JZzQivnl.NtQJu0H3: relocation target JZzQivnl.iioHinYT not defined JZzQivnl.NtQJu0H3.func9: relocation target JZzQivnl.yz5z0NaH not defined JZzQivnl.(*ypvqhKiQ).String: relocation target JZzQivnl.eVciBQeI not defined JZzQivnl.(*ypvqhKiQ).PkgPath: relocation target JZzQivnl.eVciBQeI not defined [...] The fix consists of two steps. First, if we're building the runtime and listPackage fails on a package, that means we ran into scenario 1 above. To avoid the inconsistency, we fill ListedPackages with "go list [...] std". This means we'll always build runtime as described in scenario 2 above. Second, when building packages other than the runtime, we only allow listPackage to succeed if we're listing a dependency of the current package. This ensures we won't run into similar reproducibility bugs in the future. Finally, re-enable test-gotip on CI since this was the last test flake.
3 years ago
for _, dep := range curPkg.Deps {
if dep == pkg.ImportPath {
return pkg, nil
}
}
return nil, fmt.Errorf("refusing to list non-dependency package: %s", path)
}