simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
// Copyright (c) 2019, The Garble Authors.
|
|
|
|
// See LICENSE for licensing information.
|
|
|
|
|
|
|
|
package main
|
|
|
|
|
|
|
|
import (
|
|
|
|
"bytes"
|
|
|
|
"crypto/sha256"
|
|
|
|
"encoding/base64"
|
patch and rebuild cmd/link to modify the magic value in pclntab
This value is hard-coded in the linker and written in a header.
We could rewrite the final binary, like we used to do with import paths,
but that would require once again maintaining libraries to do so.
Instead, we're now modifying the linker to do what we want.
It's not particularly hard, as every Go install has its source code,
and rebuilding a slightly modified linker only takes a few seconds at most.
Thanks to `go build -overlay`, we only need to copy the files we modify,
and right now we're just modifying one file in the toolchain.
We use a git patch, as the change is fairly static and small,
and the patch is easier to understand and maintain.
The other side of this change is in the runtime,
as it also hard-codes the magic value when loading information.
We modify the code via syntax trees in that case, like `-tiny` does,
because the change is tiny (one literal) and the affected lines of code
are modified regularly between major Go releases.
Since rebuilding a slightly modified linker can take a few seconds,
and Go's build cache does not cache linked binaries,
we keep our own cached version of the rebuilt binary in `os.UserCacheDir`.
The feature isn't perfect, and will be improved in the future.
See the TODOs about the added dependency on `git`,
or how we are currently only able to cache one linker binary at once.
Fixes #622.
2 years ago
|
|
|
"encoding/binary"
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
"fmt"
|
|
|
|
"go/token"
|
|
|
|
"go/types"
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
"io"
|
|
|
|
"os/exec"
|
|
|
|
"strings"
|
|
|
|
|
|
|
|
"mvdan.cc/garble/internal/literals"
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
)
|
|
|
|
|
|
|
|
const buildIDSeparator = "/"
|
|
|
|
|
|
|
|
// splitActionID returns the action ID half of a build ID, the first hash.
|
fix garbling names belonging to indirect imports (#203)
main.go includes a lengthy comment that documents this edge case, why it
happened, and how we are fixing it. To summarize, we should no longer
error with a build error in those cases. Read the comment for details.
A few other minor changes were done to allow writing this patch.
First, the actionID and contentID funcs were renamed, since they started
to collide with variable names.
Second, the logging has been improved a bit, which allowed me to debug
the issue.
Third, the "cache" global shared by all garble sub-processes now
includes the necessary parameters to run "go list -toolexec", including
the path to garble and the build flags being used.
Thanks to lu4p for writing a test case, which also applied gofmt to that
testdata Go file.
Fixes #180.
Closes #181, since it includes its test case.
4 years ago
|
|
|
func splitActionID(buildID string) string {
|
avoid one more call to 'go tool buildid' (#253)
We use it to get the content ID of garble's binary, which is used for
both the garble action IDs, as well as 'go tool compile -V=full'.
Since those two happen in separate processes, both used to call 'go tool
buildid' separately. Store it in the gob cache the first time, and reuse
it the second time.
Since each call to cmd/go costs about 10ms (new process, running its
many init funcs, etc), this results in a nice speed-up for our small
benchmark. Most builds will take many seconds though, so note that a
~15ms speedup there will likely not be noticeable.
While at it, simplify the buildInfo global, as now it just contains a
map representation of the -importcfg contents. It now has better names,
docs, and a simpler representation.
We also stop using the term "garbled import", as it was a bit confusing.
"obfuscated types.Package" is a much better description.
name old time/op new time/op delta
Build-8 106ms ± 1% 92ms ± 0% -14.07% (p=0.010 n=6+4)
name old bin-B new bin-B delta
Build-8 6.60M ± 0% 6.60M ± 0% -0.01% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build-8 208ms ± 5% 149ms ± 3% -28.27% (p=0.004 n=6+5)
name old user-time/op new user-time/op delta
Build-8 433ms ± 3% 384ms ± 3% -11.35% (p=0.002 n=6+6)
4 years ago
|
|
|
return buildID[:strings.Index(buildID, buildIDSeparator)]
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
}
|
|
|
|
|
|
|
|
// splitContentID returns the content ID half of a build ID, the last hash.
|
fix garbling names belonging to indirect imports (#203)
main.go includes a lengthy comment that documents this edge case, why it
happened, and how we are fixing it. To summarize, we should no longer
error with a build error in those cases. Read the comment for details.
A few other minor changes were done to allow writing this patch.
First, the actionID and contentID funcs were renamed, since they started
to collide with variable names.
Second, the logging has been improved a bit, which allowed me to debug
the issue.
Third, the "cache" global shared by all garble sub-processes now
includes the necessary parameters to run "go list -toolexec", including
the path to garble and the build flags being used.
Thanks to lu4p for writing a test case, which also applied gofmt to that
testdata Go file.
Fixes #180.
Closes #181, since it includes its test case.
4 years ago
|
|
|
func splitContentID(buildID string) string {
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
return buildID[strings.LastIndex(buildID, buildIDSeparator)+1:]
|
|
|
|
}
|
|
|
|
|
|
|
|
// buildIDHashLength is the number of bytes each build ID hash takes,
|
|
|
|
// such as an action ID or a content ID.
|
|
|
|
const buildIDHashLength = 15
|
|
|
|
|
|
|
|
// decodeBuildIDHash decodes a build ID hash in base64, just like cmd/go does.
|
|
|
|
func decodeBuildIDHash(str string) []byte {
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
h, err := base64.RawURLEncoding.DecodeString(str)
|
|
|
|
if err != nil {
|
|
|
|
panic(fmt.Sprintf("invalid hash %q: %v", str, err))
|
|
|
|
}
|
|
|
|
if len(h) != buildIDHashLength {
|
|
|
|
panic(fmt.Sprintf("decodeBuildIDHash expects to result in a hash of length %d, got %d", buildIDHashLength, len(h)))
|
|
|
|
}
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
return h
|
|
|
|
}
|
|
|
|
|
|
|
|
// encodeBuildIDHash encodes a build ID hash in base64, just like cmd/go does.
|
|
|
|
func encodeBuildIDHash(h [sha256.Size]byte) string {
|
|
|
|
return base64.RawURLEncoding.EncodeToString(h[:buildIDHashLength])
|
|
|
|
}
|
|
|
|
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
func alterToolVersion(tool string, args []string) error {
|
|
|
|
cmd := exec.Command(args[0], args[1:]...)
|
|
|
|
out, err := cmd.Output()
|
|
|
|
if err != nil {
|
|
|
|
if err, _ := err.(*exec.ExitError); err != nil {
|
|
|
|
return fmt.Errorf("%v: %s", err, err.Stderr)
|
|
|
|
}
|
|
|
|
return err
|
|
|
|
}
|
|
|
|
line := string(bytes.TrimSpace(out)) // no trailing newline
|
|
|
|
f := strings.Fields(line)
|
|
|
|
if len(f) < 3 || f[0] != tool || f[1] != "version" || f[2] == "devel" && !strings.HasPrefix(f[len(f)-1], "buildID=") {
|
|
|
|
return fmt.Errorf("%s -V=full: unexpected output:\n\t%s", args[0], line)
|
|
|
|
}
|
|
|
|
var toolID []byte
|
|
|
|
if f[2] == "devel" {
|
|
|
|
// On the development branch, use the content ID part of the build ID.
|
|
|
|
toolID = decodeBuildIDHash(splitContentID(f[len(f)-1]))
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
} else {
|
|
|
|
// For a release, the output is like: "compile version go1.9.1 X:framepointer".
|
start using original action IDs (#251)
When we obfuscate a name, what we do is hash the name with the action ID
of the package that contains the name. To ensure that the hash changes
if the garble tool changes, we used the action ID of the obfuscated
build, which is different than the original action ID, as we include
garble's own content ID in "go tool compile -V=full" via -toolexec.
Let's call that the "obfuscated action ID". Remember that a content ID
is roughly the hash of a binary or object file, and an action ID
contains the hash of a package's source code plus the content IDs of its
dependencies.
This had the advantage that it did what we wanted. However, it had one
massive drawback: when we compile a package, we only have the obfuscated
action IDs of its dependencies. This is because one can't have the
content ID of dependent packages before they are built.
Usually, this is not a problem, because hashing a foreign name means it
comes from a dependency, where we already have the obfuscated action ID.
However, that's not always the case.
First, go:linkname directives can point to any symbol that ends up in
the binary, even if the package is not a dependency. So garble could
only support linkname targets belonging to dependencies. This is at the
root of why we could not obfuscate the runtime; it contains linkname
directives targeting the net package, for example, which depends on runtime.
Second, some other places did not have an easy access to obfuscated
action IDs, like transformAsm, which had to recover it from a temporary
file stored by transformCompile.
Plus, this was all pretty expensive, as each toolexec sub-process had to
make repeated calls to buildidOf with the object files of dependencies.
We even had to use extra calls to "go list" in the case of indirect
dependencies, as their export files do not appear in importcfg files.
All in all, the old method was complex and expensive. A better mechanism
is to use the original action IDs directly, as listed by "go list"
without garble in the picture.
This would mean that the hashing does not change if garble changes,
meaning weaker obfuscation. To regain that property, we define the
"garble action ID", which is just the original action ID hashed together
with garble's own content ID.
This is practically the same as the obfuscated build ID we used before,
but since it doesn't go through "go tool compile -V=full" and the
obfuscated build itself, we can work out *all* the garble action IDs
upfront, before the obfuscated build even starts.
This fixes all of our problems. Now we know all garble build IDs
upfront, so a bunch of hacks can be entirely removed. Plus, since we
know them upfront, we can also cache them and avoid repeated calls to
"go tool buildid".
While at it, make use of the new BuildID field in Go 1.16's "list -json
-export". This avoids the vast majority of "go tool buildid" calls, as
the only ones that remain are 2 on the garble binary itself.
The numbers for Go 1.16 look very good:
name old time/op new time/op delta
Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6)
name old bin-B new bin-B delta
Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6)
name old user-time/op new user-time/op delta
Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
4 years ago
|
|
|
// Use the whole line, as we can assume it's unique.
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
toolID = []byte(line)
|
|
|
|
}
|
|
|
|
|
hash field names equally in all packages
Packages P1 and P2 can define identical struct types T1 and T2, and one
can convert from type T1 to T2 or vice versa.
The spec defines two identical struct types as:
Two struct types are identical if they have the same sequence of
fields, and if corresponding fields have the same names, and
identical types, and identical tags. Non-exported field names
from different packages are always different.
Unfortunately, garble broke this: since we obfuscated field names
differently depending on the package, cross-package conversions like the
case above would result in typechecking errors.
To fix this, implement Joe Tsai's idea: hash struct field names with the
string representation of the entire struct. This way, identical struct
types will have their field names obfuscated in the same way in all
packages across a build.
Note that we had to refactor "reverse" a bit to start using transformer,
since now it needs to keep track of struct types as well.
This failure was affecting the build of google.golang.org/protobuf,
since it makes regular use of cross-package struct conversions.
Note that the protobuf module still fails to build, but for other
reasons. The package that used to fail now succeeds, so the build gets a
bit further than before. #240 tracks adding relevant third-party Go
modules to CI, so we'll track the other remaining failures there.
Fixes #310.
4 years ago
|
|
|
contentID := addGarbleToHash(toolID)
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
// The part of the build ID that matters is the last, since it's the
|
|
|
|
// "content ID" which is used to work out whether there is a need to redo
|
|
|
|
// the action (build) or not. Since cmd/go parses the last word in the
|
|
|
|
// output as "buildID=...", we simply add "+garble buildID=_/_/_/${hash}".
|
|
|
|
// The slashes let us imitate a full binary build ID, but we assume that
|
|
|
|
// the other hashes such as the action ID are not necessary, since the
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
// only reader here is cmd/go and it only consumes the content ID.
|
|
|
|
fmt.Printf("%s +garble buildID=_/_/_/%s\n", line, encodeBuildIDHash(contentID))
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
return nil
|
|
|
|
}
|
|
|
|
|
|
|
|
var (
|
immprove how hashWithCustomSalt comes up with its random lengths
The last change made it so that hashWithCustomSalt does not always end
up with 8 base64 characters, which is a good change for the sake of
avoiding easy patterns in obfuscated code.
However, the docs weren't updated accordingly, and it wasn't
particularly clear that the byte giving us randomness wasn't part of the
resulting base64-encoded name.
First, refactor the code to only feed as many checksum bytes to the
base64 encoder as necessary, which is 12.
This shrinks b64NameBuffer and saves us some base64 encoding work.
Second, use the first checksum byte that we don't use, the 13th,
as the source of the randomness.
Note how before we used a base64-encoded byte for the randomness,
which isn't great as that byte was only one of 63 characters,
whereas a checksum byte is one of 256.
Third, update the docs so that the code is as clear as possible.
This is particularly important given that we have no tests.
With debug prints in the gogarble.txt test, we can see that the
randomness in hash lengths is working as intended:
# test/main/stdimporter
hashLength = 13
hashLength = 8
hashLength = 12
hashLength = 15
hashLength = 10
hashLength = 15
hashLength = 9
hashLength = 8
hashLength = 15
hashLength = 15
hashLength = 12
hashLength = 10
hashLength = 13
hashLength = 13
hashLength = 8
hashLength = 15
hashLength = 11
Finally, add a regression test that will complain if we end up with
hashed names that reuse the same length too often.
Out of eight hashed names, the test will fail if six end up with the
same length, as that is incredibly unlikely given that each should pick
one of eight lengths with a fair distribution.
2 years ago
|
|
|
hasher = sha256.New()
|
|
|
|
sumBuffer [sha256.Size]byte
|
|
|
|
)
|
|
|
|
|
hash field names equally in all packages
Packages P1 and P2 can define identical struct types T1 and T2, and one
can convert from type T1 to T2 or vice versa.
The spec defines two identical struct types as:
Two struct types are identical if they have the same sequence of
fields, and if corresponding fields have the same names, and
identical types, and identical tags. Non-exported field names
from different packages are always different.
Unfortunately, garble broke this: since we obfuscated field names
differently depending on the package, cross-package conversions like the
case above would result in typechecking errors.
To fix this, implement Joe Tsai's idea: hash struct field names with the
string representation of the entire struct. This way, identical struct
types will have their field names obfuscated in the same way in all
packages across a build.
Note that we had to refactor "reverse" a bit to start using transformer,
since now it needs to keep track of struct types as well.
This failure was affecting the build of google.golang.org/protobuf,
since it makes regular use of cross-package struct conversions.
Note that the protobuf module still fails to build, but for other
reasons. The package that used to fail now succeeds, so the build gets a
bit further than before. #240 tracks adding relevant third-party Go
modules to CI, so we'll track the other remaining failures there.
Fixes #310.
4 years ago
|
|
|
// addGarbleToHash takes some arbitrary input bytes,
|
|
|
|
// typically a hash such as an action ID or a content ID,
|
|
|
|
// and returns a new hash which also contains garble's own deterministic inputs.
|
make flags like -literals and GOPRIVATE affect hashing (#288)
In 6898d61637, we switched from using action IDs from "go list
-toolexec=garble" to those from the original "go list". We still wanted
the obfuscation and hashing to change if the version of garble changes,
so we hashed that "original action ID" with garble's own content ID, and
called the new hash "garble action ID".
While working on a different patch, I noticed something weird: with the
new mechanism, adding or removing flags like -literals did not alter
those hashes, unlike the old method. This is because the old method used
ownContentID, which includes such bits of information, but the new
method does not.
Change that, and add a test that locks in the behavior we want. In
seed.txt, we check that a single function name gets hashed in particular
ways in different scenarios.
Note that we use a mix of "cmp" and "! bincmp", since the former has no
negated form.
While at it, the seed.txt test is revamped a bit. Now, we only run with
-literals once, as this test is mainly about -seed. We also declare seed
strings once, as environment variables, which makes it easier to track
what each step is doing.
4 years ago
|
|
|
//
|
|
|
|
// This includes garble's own version, obtained via its own binary's content ID,
|
deprecate using GOPRIVATE in favor of GOGARBLE (#427)
Piggybacking off of GOPRIVATE is great for a number of reasons:
* People tend to obfuscate private code, whose package paths will
generally be in GOPRIVATE already
* Its meaning and syntax are well understood
* It allows all the flexibility we need without adding our own env var
or config option
However, using GOPRIVATE directly has one main drawback.
It's fairly common to also want to obfuscate public dependencies,
to make the code in private packages even harder to follow.
However, using "GOPRIVATE=*" will result in two main downsides:
* GONOPROXY defaults to GOPRIVATE, so the proxy would be entirely disabled.
Downloading modules, such as when adding or updating dependencies,
or when the local cache is cold, can be less reliable.
* GONOSUMDB defaults to GOPRIVATE, so the sumdb would be entirely disabled.
Adding entries to go.sum, such as when adding or updating dependencies,
can be less secure.
We will continue to consume GOPRIVATE as a fallback,
but we now expect users to set GOGARBLE instead.
The new logic is documented in the README.
While here, rewrite some uses of "private" with "to obfuscate",
to make the code easier to follow and harder to misunderstand.
Fixes #276.
3 years ago
|
|
|
// as well as any other options which affect a build, such as GOGARBLE and -tiny.
|
|
|
|
func addGarbleToHash(inputHash []byte) [sha256.Size]byte {
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
// Join the two content IDs together into a single base64-encoded sha256
|
|
|
|
// sum. This includes the original tool's content ID, and garble's own
|
|
|
|
// content ID.
|
|
|
|
hasher.Reset()
|
|
|
|
hasher.Write(inputHash)
|
|
|
|
if len(sharedCache.BinaryContentID) == 0 {
|
avoid one more call to 'go tool buildid' (#253)
We use it to get the content ID of garble's binary, which is used for
both the garble action IDs, as well as 'go tool compile -V=full'.
Since those two happen in separate processes, both used to call 'go tool
buildid' separately. Store it in the gob cache the first time, and reuse
it the second time.
Since each call to cmd/go costs about 10ms (new process, running its
many init funcs, etc), this results in a nice speed-up for our small
benchmark. Most builds will take many seconds though, so note that a
~15ms speedup there will likely not be noticeable.
While at it, simplify the buildInfo global, as now it just contains a
map representation of the -importcfg contents. It now has better names,
docs, and a simpler representation.
We also stop using the term "garbled import", as it was a bit confusing.
"obfuscated types.Package" is a much better description.
name old time/op new time/op delta
Build-8 106ms ± 1% 92ms ± 0% -14.07% (p=0.010 n=6+4)
name old bin-B new bin-B delta
Build-8 6.60M ± 0% 6.60M ± 0% -0.01% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build-8 208ms ± 5% 149ms ± 3% -28.27% (p=0.004 n=6+5)
name old user-time/op new user-time/op delta
Build-8 433ms ± 3% 384ms ± 3% -11.35% (p=0.002 n=6+6)
4 years ago
|
|
|
panic("missing binary content ID")
|
|
|
|
}
|
|
|
|
hasher.Write(sharedCache.BinaryContentID)
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
|
|
|
|
// We also need to add the selected options to the full version string,
|
|
|
|
// because all of them result in different output. We use spaces to
|
|
|
|
// separate the env vars and flags, to reduce the chances of collisions.
|
|
|
|
fmt.Fprintf(hasher, " GOGARBLE=%s", sharedCache.GOGARBLE)
|
|
|
|
appendFlags(hasher, true)
|
|
|
|
// addGarbleToHash returns the sum buffer, so we need a new copy.
|
|
|
|
// Otherwise the next use of the global sumBuffer would conflict.
|
|
|
|
var sumBuffer [sha256.Size]byte
|
|
|
|
hasher.Sum(sumBuffer[:0])
|
|
|
|
return sumBuffer
|
unify the definition and storage of flag values
The parent garble process parses the original flags,
as provided by the user via the command line.
Previously, those got stored in the shared cache file,
so that child processes spawned by toolexec could see them.
Unfortunately, this made the code relatively easy to misuse.
A child process would always see flagLiterals as zero value,
given that it should never see such a flag argument directly.
Similarly, one would have to be careful with cached options,
as they could only be consumed after the cache file is loaded.
Simplify the situation by deduplicating the storage of flags.
Now, the parent passes all flags onto children via toolexec.
One exception is GarbleDir, which now becomes an env var.
This seems in line with other top-level dirs like GARBLE_SHARED.
Finally, we turn -seed into a flag.Value,
which lets us implement its "set" behavior as part of flag.Parse.
Overall, we barely reduce the amount of code involved,
but we certainly remove a couple of footguns.
As part of the cleanup, we also introduce appendFlags.
3 years ago
|
|
|
}
|
|
|
|
|
|
|
|
// appendFlags writes garble's own flags to w in string form.
|
|
|
|
// Errors are ignored, as w is always a buffer or hasher.
|
|
|
|
// If forBuildHash is set, only the flags affecting a build are written.
|
|
|
|
func appendFlags(w io.Writer, forBuildHash bool) {
|
unify the definition and storage of flag values
The parent garble process parses the original flags,
as provided by the user via the command line.
Previously, those got stored in the shared cache file,
so that child processes spawned by toolexec could see them.
Unfortunately, this made the code relatively easy to misuse.
A child process would always see flagLiterals as zero value,
given that it should never see such a flag argument directly.
Similarly, one would have to be careful with cached options,
as they could only be consumed after the cache file is loaded.
Simplify the situation by deduplicating the storage of flags.
Now, the parent passes all flags onto children via toolexec.
One exception is GarbleDir, which now becomes an env var.
This seems in line with other top-level dirs like GARBLE_SHARED.
Finally, we turn -seed into a flag.Value,
which lets us implement its "set" behavior as part of flag.Parse.
Overall, we barely reduce the amount of code involved,
but we certainly remove a couple of footguns.
As part of the cleanup, we also introduce appendFlags.
3 years ago
|
|
|
if flagLiterals {
|
|
|
|
io.WriteString(w, " -literals")
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
}
|
unify the definition and storage of flag values
The parent garble process parses the original flags,
as provided by the user via the command line.
Previously, those got stored in the shared cache file,
so that child processes spawned by toolexec could see them.
Unfortunately, this made the code relatively easy to misuse.
A child process would always see flagLiterals as zero value,
given that it should never see such a flag argument directly.
Similarly, one would have to be careful with cached options,
as they could only be consumed after the cache file is loaded.
Simplify the situation by deduplicating the storage of flags.
Now, the parent passes all flags onto children via toolexec.
One exception is GarbleDir, which now becomes an env var.
This seems in line with other top-level dirs like GARBLE_SHARED.
Finally, we turn -seed into a flag.Value,
which lets us implement its "set" behavior as part of flag.Parse.
Overall, we barely reduce the amount of code involved,
but we certainly remove a couple of footguns.
As part of the cleanup, we also introduce appendFlags.
3 years ago
|
|
|
if flagTiny {
|
|
|
|
io.WriteString(w, " -tiny")
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
}
|
|
|
|
if flagDebug && !forBuildHash {
|
|
|
|
// -debug doesn't affect the build result at all,
|
|
|
|
// so don't give it separate entries in the build cache.
|
|
|
|
// If the user really wants to see debug info for already built deps,
|
|
|
|
// they can use "go clean cache" or the "-a" build flag to rebuild.
|
|
|
|
io.WriteString(w, " -debug")
|
|
|
|
}
|
|
|
|
if flagDebugDir != "" && !forBuildHash {
|
|
|
|
// -debugdir is a bit special.
|
|
|
|
//
|
|
|
|
// When passing down flags via -toolexec,
|
|
|
|
// we do want the actual flag value to be kept.
|
|
|
|
//
|
|
|
|
// For build hashes, we can skip the flag entirely,
|
|
|
|
// as it doesn't affect obfuscation at all.
|
|
|
|
//
|
|
|
|
// TODO: in the future, we could avoid using the -a build flag
|
|
|
|
// by using "-debugdir=yes" here, and caching the obfuscated source.
|
|
|
|
// Incremental builds would recover the cached source
|
|
|
|
// to repopulate the output directory if it was removed.
|
unify the definition and storage of flag values
The parent garble process parses the original flags,
as provided by the user via the command line.
Previously, those got stored in the shared cache file,
so that child processes spawned by toolexec could see them.
Unfortunately, this made the code relatively easy to misuse.
A child process would always see flagLiterals as zero value,
given that it should never see such a flag argument directly.
Similarly, one would have to be careful with cached options,
as they could only be consumed after the cache file is loaded.
Simplify the situation by deduplicating the storage of flags.
Now, the parent passes all flags onto children via toolexec.
One exception is GarbleDir, which now becomes an env var.
This seems in line with other top-level dirs like GARBLE_SHARED.
Finally, we turn -seed into a flag.Value,
which lets us implement its "set" behavior as part of flag.Parse.
Overall, we barely reduce the amount of code involved,
but we certainly remove a couple of footguns.
As part of the cleanup, we also introduce appendFlags.
3 years ago
|
|
|
io.WriteString(w, " -debugdir=")
|
|
|
|
io.WriteString(w, flagDebugDir)
|
|
|
|
}
|
|
|
|
if flagSeed.present() {
|
unify the definition and storage of flag values
The parent garble process parses the original flags,
as provided by the user via the command line.
Previously, those got stored in the shared cache file,
so that child processes spawned by toolexec could see them.
Unfortunately, this made the code relatively easy to misuse.
A child process would always see flagLiterals as zero value,
given that it should never see such a flag argument directly.
Similarly, one would have to be careful with cached options,
as they could only be consumed after the cache file is loaded.
Simplify the situation by deduplicating the storage of flags.
Now, the parent passes all flags onto children via toolexec.
One exception is GarbleDir, which now becomes an env var.
This seems in line with other top-level dirs like GARBLE_SHARED.
Finally, we turn -seed into a flag.Value,
which lets us implement its "set" behavior as part of flag.Parse.
Overall, we barely reduce the amount of code involved,
but we certainly remove a couple of footguns.
As part of the cleanup, we also introduce appendFlags.
3 years ago
|
|
|
io.WriteString(w, " -seed=")
|
|
|
|
io.WriteString(w, flagSeed.String())
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
}
|
|
|
|
if flagControlFlow && forBuildHash {
|
|
|
|
io.WriteString(w, " -ctrlflow")
|
|
|
|
}
|
|
|
|
if literals.TestObfuscator != "" && forBuildHash {
|
|
|
|
io.WriteString(w, literals.TestObfuscator)
|
|
|
|
}
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
}
|
|
|
|
|
|
|
|
func buildidOf(path string) (string, error) {
|
|
|
|
cmd := exec.Command("go", "tool", "buildid", path)
|
|
|
|
out, err := cmd.Output()
|
|
|
|
if err != nil {
|
|
|
|
if err, _ := err.(*exec.ExitError); err != nil {
|
|
|
|
return "", fmt.Errorf("%v: %s", err, err.Stderr)
|
|
|
|
}
|
|
|
|
return "", err
|
|
|
|
}
|
|
|
|
return string(out), nil
|
|
|
|
}
|
|
|
|
|
|
|
|
var (
|
|
|
|
// Hashed names are base64-encoded.
|
|
|
|
// Go names can only be letters, numbers, and underscores.
|
|
|
|
// This means we can use base64's URL encoding, minus '-',
|
|
|
|
// which is later replaced with a duplicate 'a'.
|
|
|
|
// Such a lossy encoding is fine, since we never decode hashes.
|
immprove how hashWithCustomSalt comes up with its random lengths
The last change made it so that hashWithCustomSalt does not always end
up with 8 base64 characters, which is a good change for the sake of
avoiding easy patterns in obfuscated code.
However, the docs weren't updated accordingly, and it wasn't
particularly clear that the byte giving us randomness wasn't part of the
resulting base64-encoded name.
First, refactor the code to only feed as many checksum bytes to the
base64 encoder as necessary, which is 12.
This shrinks b64NameBuffer and saves us some base64 encoding work.
Second, use the first checksum byte that we don't use, the 13th,
as the source of the randomness.
Note how before we used a base64-encoded byte for the randomness,
which isn't great as that byte was only one of 63 characters,
whereas a checksum byte is one of 256.
Third, update the docs so that the code is as clear as possible.
This is particularly important given that we have no tests.
With debug prints in the gogarble.txt test, we can see that the
randomness in hash lengths is working as intended:
# test/main/stdimporter
hashLength = 13
hashLength = 8
hashLength = 12
hashLength = 15
hashLength = 10
hashLength = 15
hashLength = 9
hashLength = 8
hashLength = 15
hashLength = 15
hashLength = 12
hashLength = 10
hashLength = 13
hashLength = 13
hashLength = 8
hashLength = 15
hashLength = 11
Finally, add a regression test that will complain if we end up with
hashed names that reuse the same length too often.
Out of eight hashed names, the test will fail if six end up with the
same length, as that is incredibly unlikely given that each should pick
one of eight lengths with a fair distribution.
2 years ago
|
|
|
// We don't need padding either, as we take a short prefix anyway.
|
|
|
|
nameBase64 = base64.URLEncoding.WithPadding(base64.NoPadding)
|
immprove how hashWithCustomSalt comes up with its random lengths
The last change made it so that hashWithCustomSalt does not always end
up with 8 base64 characters, which is a good change for the sake of
avoiding easy patterns in obfuscated code.
However, the docs weren't updated accordingly, and it wasn't
particularly clear that the byte giving us randomness wasn't part of the
resulting base64-encoded name.
First, refactor the code to only feed as many checksum bytes to the
base64 encoder as necessary, which is 12.
This shrinks b64NameBuffer and saves us some base64 encoding work.
Second, use the first checksum byte that we don't use, the 13th,
as the source of the randomness.
Note how before we used a base64-encoded byte for the randomness,
which isn't great as that byte was only one of 63 characters,
whereas a checksum byte is one of 256.
Third, update the docs so that the code is as clear as possible.
This is particularly important given that we have no tests.
With debug prints in the gogarble.txt test, we can see that the
randomness in hash lengths is working as intended:
# test/main/stdimporter
hashLength = 13
hashLength = 8
hashLength = 12
hashLength = 15
hashLength = 10
hashLength = 15
hashLength = 9
hashLength = 8
hashLength = 15
hashLength = 15
hashLength = 12
hashLength = 10
hashLength = 13
hashLength = 13
hashLength = 8
hashLength = 15
hashLength = 11
Finally, add a regression test that will complain if we end up with
hashed names that reuse the same length too often.
Out of eight hashed names, the test will fail if six end up with the
same length, as that is incredibly unlikely given that each should pick
one of eight lengths with a fair distribution.
2 years ago
|
|
|
|
|
|
|
b64NameBuffer [12]byte // nameBase64.EncodedLen(neededSumBytes) = 12
|
|
|
|
)
|
|
|
|
|
|
|
|
// These funcs mimic the unicode package API, but byte-based since we know
|
|
|
|
// base64 is all ASCII.
|
|
|
|
|
|
|
|
func isDigit(b byte) bool { return '0' <= b && b <= '9' }
|
|
|
|
func isLower(b byte) bool { return 'a' <= b && b <= 'z' }
|
|
|
|
func isUpper(b byte) bool { return 'A' <= b && b <= 'Z' }
|
|
|
|
func toLower(b byte) byte { return b + ('a' - 'A') }
|
|
|
|
func toUpper(b byte) byte { return b - ('a' - 'A') }
|
|
|
|
|
|
|
|
func runtimeHashWithCustomSalt(salt []byte) uint32 {
|
patch and rebuild cmd/link to modify the magic value in pclntab
This value is hard-coded in the linker and written in a header.
We could rewrite the final binary, like we used to do with import paths,
but that would require once again maintaining libraries to do so.
Instead, we're now modifying the linker to do what we want.
It's not particularly hard, as every Go install has its source code,
and rebuilding a slightly modified linker only takes a few seconds at most.
Thanks to `go build -overlay`, we only need to copy the files we modify,
and right now we're just modifying one file in the toolchain.
We use a git patch, as the change is fairly static and small,
and the patch is easier to understand and maintain.
The other side of this change is in the runtime,
as it also hard-codes the magic value when loading information.
We modify the code via syntax trees in that case, like `-tiny` does,
because the change is tiny (one literal) and the affected lines of code
are modified regularly between major Go releases.
Since rebuilding a slightly modified linker can take a few seconds,
and Go's build cache does not cache linked binaries,
we keep our own cached version of the rebuilt binary in `os.UserCacheDir`.
The feature isn't perfect, and will be improved in the future.
See the TODOs about the added dependency on `git`,
or how we are currently only able to cache one linker binary at once.
Fixes #622.
2 years ago
|
|
|
hasher.Reset()
|
|
|
|
if !flagSeed.present() {
|
|
|
|
hasher.Write(sharedCache.ListedPackages["runtime"].GarbleActionID[:])
|
patch and rebuild cmd/link to modify the magic value in pclntab
This value is hard-coded in the linker and written in a header.
We could rewrite the final binary, like we used to do with import paths,
but that would require once again maintaining libraries to do so.
Instead, we're now modifying the linker to do what we want.
It's not particularly hard, as every Go install has its source code,
and rebuilding a slightly modified linker only takes a few seconds at most.
Thanks to `go build -overlay`, we only need to copy the files we modify,
and right now we're just modifying one file in the toolchain.
We use a git patch, as the change is fairly static and small,
and the patch is easier to understand and maintain.
The other side of this change is in the runtime,
as it also hard-codes the magic value when loading information.
We modify the code via syntax trees in that case, like `-tiny` does,
because the change is tiny (one literal) and the affected lines of code
are modified regularly between major Go releases.
Since rebuilding a slightly modified linker can take a few seconds,
and Go's build cache does not cache linked binaries,
we keep our own cached version of the rebuilt binary in `os.UserCacheDir`.
The feature isn't perfect, and will be improved in the future.
See the TODOs about the added dependency on `git`,
or how we are currently only able to cache one linker binary at once.
Fixes #622.
2 years ago
|
|
|
} else {
|
|
|
|
hasher.Write(flagSeed.bytes)
|
|
|
|
}
|
|
|
|
hasher.Write(salt)
|
patch and rebuild cmd/link to modify the magic value in pclntab
This value is hard-coded in the linker and written in a header.
We could rewrite the final binary, like we used to do with import paths,
but that would require once again maintaining libraries to do so.
Instead, we're now modifying the linker to do what we want.
It's not particularly hard, as every Go install has its source code,
and rebuilding a slightly modified linker only takes a few seconds at most.
Thanks to `go build -overlay`, we only need to copy the files we modify,
and right now we're just modifying one file in the toolchain.
We use a git patch, as the change is fairly static and small,
and the patch is easier to understand and maintain.
The other side of this change is in the runtime,
as it also hard-codes the magic value when loading information.
We modify the code via syntax trees in that case, like `-tiny` does,
because the change is tiny (one literal) and the affected lines of code
are modified regularly between major Go releases.
Since rebuilding a slightly modified linker can take a few seconds,
and Go's build cache does not cache linked binaries,
we keep our own cached version of the rebuilt binary in `os.UserCacheDir`.
The feature isn't perfect, and will be improved in the future.
See the TODOs about the added dependency on `git`,
or how we are currently only able to cache one linker binary at once.
Fixes #622.
2 years ago
|
|
|
sum := hasher.Sum(sumBuffer[:0])
|
|
|
|
return binary.LittleEndian.Uint32(sum)
|
|
|
|
}
|
|
|
|
|
|
|
|
// magicValue returns random magic value based
|
|
|
|
// on user specified seed or the runtime package's GarbleActionID.
|
|
|
|
func magicValue() uint32 {
|
|
|
|
return runtimeHashWithCustomSalt([]byte("magic"))
|
|
|
|
}
|
|
|
|
|
|
|
|
// entryOffKey returns random entry offset key
|
|
|
|
// on user specified seed or the runtime package's GarbleActionID.
|
|
|
|
func entryOffKey() uint32 {
|
|
|
|
return runtimeHashWithCustomSalt([]byte("entryOffKey"))
|
|
|
|
}
|
|
|
|
|
obfuscate all names used in reflection
Go code can retrieve and use field and method names via the `reflect` package.
For that reason, historically we did not obfuscate names of fields and methods
underneath types that we detected as used for reflection, via e.g. `reflect.TypeOf`.
However, that caused a number of issues. Since we obfuscate and build one package
at a time, we could only detect when types were used for reflection in their own package
or in upstream packages. Use of reflection in downstream packages would be detected
too late, causing one package to obfuscate the names and the other not to, leading to a build failure.
A different approach is implemented here. All names are obfuscated now, but we collect
those types used for reflection, and at the end of a build in `package main`,
we inject a function into the runtime's `internal/abi` package to reverse the obfuscation
for those names which can be used for reflection.
This does mean that the obfuscation for these names is very weak, as the binary
contains a one-to-one mapping to their original names, but they cannot be obfuscated
without breaking too many Go packages out in the wild. There is also some amount
of overhead in `internal/abi` due to this, but we aim to make the overhead insignificant.
Fixes #884, #799, #817, #881, #858, #843, #842
Closes #406
4 months ago
|
|
|
func hashWithPackage(pkg *listedPackage, name string) string {
|
|
|
|
// If the user provided us with an obfuscation seed,
|
|
|
|
// we use that with the package import path directly..
|
|
|
|
// Otherwise, we use GarbleActionID as a fallback salt.
|
|
|
|
if !flagSeed.present() {
|
|
|
|
return hashWithCustomSalt(pkg.GarbleActionID[:], name)
|
|
|
|
}
|
|
|
|
// Use a separator at the end of ImportPath as a salt,
|
|
|
|
// to ensure that "pkgfoo.bar" and "pkg.foobar" don't both hash
|
|
|
|
// as the same string "pkgfoobar".
|
|
|
|
return hashWithCustomSalt([]byte(pkg.ImportPath+"|"), name)
|
|
|
|
}
|
|
|
|
|
|
|
|
// stripStructTags takes the bytes produced by [types.WriteType]
|
|
|
|
// and removes any struct tags in-place, such as rewriting
|
|
|
|
//
|
|
|
|
// struct{Foo int; Bar string "json:\"bar\""}
|
|
|
|
//
|
|
|
|
// into
|
|
|
|
//
|
|
|
|
// struct{Foo int; Bar string}
|
|
|
|
//
|
|
|
|
// Note that, unlike most Go source, WriteType uses double quotes for tags.
|
|
|
|
//
|
|
|
|
// Reusing WriteType does require a second pass over its output here,
|
|
|
|
// which we could save by implementing our own modified version of WriteType.
|
|
|
|
// However, that would be a significant amount of code to maintain.
|
|
|
|
func stripStructTags(p []byte) []byte {
|
|
|
|
i := 0
|
|
|
|
for i < len(p) {
|
|
|
|
b := p[i]
|
|
|
|
start := i - 1 // a struct tag is preceded by a space
|
|
|
|
i++
|
|
|
|
if b != '"' {
|
|
|
|
continue
|
|
|
|
}
|
|
|
|
// Find the closing double quote, skipping over escaped characters.
|
|
|
|
// Note that we should probably iterate over runes and not bytes,
|
|
|
|
// but this byte implementation is probably good enough in practice.
|
|
|
|
for {
|
|
|
|
b = p[i]
|
|
|
|
i++
|
|
|
|
if b == '\\' {
|
|
|
|
i++
|
|
|
|
} else if b == '"' {
|
|
|
|
break
|
|
|
|
}
|
|
|
|
}
|
|
|
|
end := i
|
|
|
|
// Remove the bytes between start and end,
|
|
|
|
// and reset i to start, since we just shortened p.
|
|
|
|
p = append(p[:start], p[end:]...)
|
|
|
|
i = start
|
|
|
|
}
|
|
|
|
return p
|
|
|
|
}
|
|
|
|
|
|
|
|
var typeIdentityBuf bytes.Buffer
|
|
|
|
|
|
|
|
// hashWithStruct is separate from hashWithPackage since Go
|
|
|
|
// allows converting between struct types across packages.
|
|
|
|
// Hashing struct field names differently between packages would break that.
|
|
|
|
//
|
|
|
|
// We hash field names with the identity struct type as a salt
|
|
|
|
// so that the same field name used in different struct types is obfuscated differently.
|
|
|
|
// Note that "identity" means omitting struct tags since conversions ignore them.
|
|
|
|
func hashWithStruct(strct *types.Struct, field *types.Var) string {
|
|
|
|
typeIdentityBuf.Reset()
|
|
|
|
types.WriteType(&typeIdentityBuf, strct, nil)
|
|
|
|
salt := stripStructTags(typeIdentityBuf.Bytes())
|
|
|
|
|
|
|
|
// If the user provided us with an obfuscation seed,
|
|
|
|
// we only use the identity struct type as a salt.
|
|
|
|
// Otherwise, we add garble's own inputs to the salt as a fallback.
|
|
|
|
if !flagSeed.present() {
|
|
|
|
withGarbleHash := addGarbleToHash(salt)
|
|
|
|
salt = withGarbleHash[:]
|
|
|
|
}
|
|
|
|
return hashWithCustomSalt(salt, field.Name())
|
|
|
|
}
|
|
|
|
|
|
|
|
// minHashLength and maxHashLength define the range for the number of base64
|
|
|
|
// characters to use for the final hashed name.
|
|
|
|
//
|
|
|
|
// minHashLength needs to be long enough to realistically avoid hash collisions,
|
|
|
|
// but maxHashLength should be short enough to not bloat binary sizes.
|
|
|
|
// The namespace for collisions is generally a single package, since
|
|
|
|
// that's where most hashed names are namespaced to.
|
|
|
|
//
|
|
|
|
// Using a "hash collision" formula, and taking a generous estimate of a
|
|
|
|
// package having 10k names, we get the following probabilities.
|
|
|
|
// Most packages will have far fewer names, but some packages are huge,
|
|
|
|
// especially generated ones.
|
|
|
|
//
|
|
|
|
// We also have slightly fewer bits in practice, since the base64
|
|
|
|
// charset has 'z' twice, and the first base64 char is coerced into a
|
|
|
|
// valid Go identifier. So we must be conservative.
|
|
|
|
// Remember that base64 stores 6 bits per encoded byte.
|
|
|
|
// The probability numbers are approximated.
|
|
|
|
//
|
|
|
|
// length (base64) | length (bits) | collision probability
|
|
|
|
// -------------------------------------------------------
|
|
|
|
// 4 24 ~95%
|
|
|
|
// 5 30 ~4%
|
|
|
|
// 6 36 ~0.07%
|
|
|
|
// 7 42 ~0.001%
|
|
|
|
// 8 48 ~0.00001%
|
|
|
|
//
|
|
|
|
// We want collisions to be practically impossible, so the hashed names end up
|
|
|
|
// with lengths evenly distributed between 6 and 12. Naively, this results in an
|
|
|
|
// average length of 9, which has a chance well below 1 in a million even when a
|
|
|
|
// package has thousands of obfuscated names.
|
|
|
|
//
|
|
|
|
// These numbers are also chosen to keep obfuscated binary sizes reasonable.
|
|
|
|
// For example, increasing the average length of 9 by 1 results in roughly a 1%
|
|
|
|
// increase in binary sizes.
|
|
|
|
const (
|
|
|
|
minHashLength = 6
|
|
|
|
maxHashLength = 12
|
|
|
|
|
|
|
|
// At most we'll need maxHashLength base64 characters,
|
|
|
|
// so 9 checksum bytes are enough for that purpose,
|
|
|
|
// which is nameBase64.DecodedLen(12) being rounded up.
|
|
|
|
neededSumBytes = 9
|
|
|
|
)
|
|
|
|
|
|
|
|
// hashWithCustomSalt returns a hashed version of name,
|
|
|
|
// including the provided salt as well as opts.Seed into the hash input.
|
|
|
|
//
|
|
|
|
// The result is always four bytes long. If the input was a valid identifier,
|
|
|
|
// the output remains equally exported or unexported. Note that this process is
|
|
|
|
// reproducible, but not reversible.
|
|
|
|
func hashWithCustomSalt(salt []byte, name string) string {
|
obfuscate unexported names like exported ones (#227)
In 90fa325da7, the obfuscation logic was changed to use hashes for
exported names, but incremental names starting at just one letter for
unexported names. Presumably, this was done for the sake of binary size.
I argue that this is not a good idea for the default mode for a number
of reasons:
1) It makes reversing of stack traces nearly impossible for unexported
names, since replacing an obfuscated name "c" with "originalName"
would trigger too many false positives by matching single characters.
2) Exported and unexported names aren't different. We need to know how
names were obfuscated at a later time in both cases, thanks to use
cases like -ldflags=-X. Using short names for one but not the other
doesn't make a lot of sense, and makes the logic inconsistent.
3) Shaving off three bytes for unexported names doesn't seem like a huge
deal for the default mode, when we already have -tiny to optimize for
size.
This saves us a bit of work, but most importantly, simplifies the
obfuscation state as we no longer need to carry privateNameMap between
the compile and link stages.
name old time/op new time/op delta
Build-8 153ms ± 2% 150ms ± 2% ~ (p=0.065 n=6+6)
name old bin-B new bin-B delta
Build-8 7.09M ± 0% 7.08M ± 0% -0.24% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build-8 296ms ± 5% 277ms ± 6% -6.50% (p=0.026 n=6+6)
name old user-time/op new user-time/op delta
Build-8 562ms ± 1% 558ms ± 3% ~ (p=0.329 n=5+6)
Note that I do not oppose using short names for both exported and
unexported names in the future for -tiny, since reversing of stack
traces will by design not work there. The code can be resurrected from
the git history if we want to improve -tiny that way in the future, as
we'd need to store state in header files again.
Another major cleanup we can do here is to no longer use the
garbledImports map. From a look at obfuscateImports, we hash a package's
import path with its action ID, much like exported names, so we can
simply re-do that hashing for the linker's -X flag.
garbledImports does have some logic to handle duplicate package names,
but it's worth noting that should not affect package paths, as they are
always unique. That area of code could probably do with some
simplification in the future, too.
While at it, make hashWith panic if either parameter is empty.
obfuscateImports was hashing the main package path without a salt due to
a bug, so we want to catch those in the future.
Finally, make some tiny spacing and typo tweaks to the README.
4 years ago
|
|
|
if len(salt) == 0 {
|
|
|
|
panic("hashWithCustomSalt: empty salt")
|
obfuscate unexported names like exported ones (#227)
In 90fa325da7, the obfuscation logic was changed to use hashes for
exported names, but incremental names starting at just one letter for
unexported names. Presumably, this was done for the sake of binary size.
I argue that this is not a good idea for the default mode for a number
of reasons:
1) It makes reversing of stack traces nearly impossible for unexported
names, since replacing an obfuscated name "c" with "originalName"
would trigger too many false positives by matching single characters.
2) Exported and unexported names aren't different. We need to know how
names were obfuscated at a later time in both cases, thanks to use
cases like -ldflags=-X. Using short names for one but not the other
doesn't make a lot of sense, and makes the logic inconsistent.
3) Shaving off three bytes for unexported names doesn't seem like a huge
deal for the default mode, when we already have -tiny to optimize for
size.
This saves us a bit of work, but most importantly, simplifies the
obfuscation state as we no longer need to carry privateNameMap between
the compile and link stages.
name old time/op new time/op delta
Build-8 153ms ± 2% 150ms ± 2% ~ (p=0.065 n=6+6)
name old bin-B new bin-B delta
Build-8 7.09M ± 0% 7.08M ± 0% -0.24% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build-8 296ms ± 5% 277ms ± 6% -6.50% (p=0.026 n=6+6)
name old user-time/op new user-time/op delta
Build-8 562ms ± 1% 558ms ± 3% ~ (p=0.329 n=5+6)
Note that I do not oppose using short names for both exported and
unexported names in the future for -tiny, since reversing of stack
traces will by design not work there. The code can be resurrected from
the git history if we want to improve -tiny that way in the future, as
we'd need to store state in header files again.
Another major cleanup we can do here is to no longer use the
garbledImports map. From a look at obfuscateImports, we hash a package's
import path with its action ID, much like exported names, so we can
simply re-do that hashing for the linker's -X flag.
garbledImports does have some logic to handle duplicate package names,
but it's worth noting that should not affect package paths, as they are
always unique. That area of code could probably do with some
simplification in the future, too.
While at it, make hashWith panic if either parameter is empty.
obfuscateImports was hashing the main package path without a salt due to
a bug, so we want to catch those in the future.
Finally, make some tiny spacing and typo tweaks to the README.
4 years ago
|
|
|
}
|
|
|
|
if name == "" {
|
|
|
|
panic("hashWithCustomSalt: empty name")
|
obfuscate unexported names like exported ones (#227)
In 90fa325da7, the obfuscation logic was changed to use hashes for
exported names, but incremental names starting at just one letter for
unexported names. Presumably, this was done for the sake of binary size.
I argue that this is not a good idea for the default mode for a number
of reasons:
1) It makes reversing of stack traces nearly impossible for unexported
names, since replacing an obfuscated name "c" with "originalName"
would trigger too many false positives by matching single characters.
2) Exported and unexported names aren't different. We need to know how
names were obfuscated at a later time in both cases, thanks to use
cases like -ldflags=-X. Using short names for one but not the other
doesn't make a lot of sense, and makes the logic inconsistent.
3) Shaving off three bytes for unexported names doesn't seem like a huge
deal for the default mode, when we already have -tiny to optimize for
size.
This saves us a bit of work, but most importantly, simplifies the
obfuscation state as we no longer need to carry privateNameMap between
the compile and link stages.
name old time/op new time/op delta
Build-8 153ms ± 2% 150ms ± 2% ~ (p=0.065 n=6+6)
name old bin-B new bin-B delta
Build-8 7.09M ± 0% 7.08M ± 0% -0.24% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build-8 296ms ± 5% 277ms ± 6% -6.50% (p=0.026 n=6+6)
name old user-time/op new user-time/op delta
Build-8 562ms ± 1% 558ms ± 3% ~ (p=0.329 n=5+6)
Note that I do not oppose using short names for both exported and
unexported names in the future for -tiny, since reversing of stack
traces will by design not work there. The code can be resurrected from
the git history if we want to improve -tiny that way in the future, as
we'd need to store state in header files again.
Another major cleanup we can do here is to no longer use the
garbledImports map. From a look at obfuscateImports, we hash a package's
import path with its action ID, much like exported names, so we can
simply re-do that hashing for the linker's -X flag.
garbledImports does have some logic to handle duplicate package names,
but it's worth noting that should not affect package paths, as they are
always unique. That area of code could probably do with some
simplification in the future, too.
While at it, make hashWith panic if either parameter is empty.
obfuscateImports was hashing the main package path without a salt due to
a bug, so we want to catch those in the future.
Finally, make some tiny spacing and typo tweaks to the README.
4 years ago
|
|
|
}
|
immprove how hashWithCustomSalt comes up with its random lengths
The last change made it so that hashWithCustomSalt does not always end
up with 8 base64 characters, which is a good change for the sake of
avoiding easy patterns in obfuscated code.
However, the docs weren't updated accordingly, and it wasn't
particularly clear that the byte giving us randomness wasn't part of the
resulting base64-encoded name.
First, refactor the code to only feed as many checksum bytes to the
base64 encoder as necessary, which is 12.
This shrinks b64NameBuffer and saves us some base64 encoding work.
Second, use the first checksum byte that we don't use, the 13th,
as the source of the randomness.
Note how before we used a base64-encoded byte for the randomness,
which isn't great as that byte was only one of 63 characters,
whereas a checksum byte is one of 256.
Third, update the docs so that the code is as clear as possible.
This is particularly important given that we have no tests.
With debug prints in the gogarble.txt test, we can see that the
randomness in hash lengths is working as intended:
# test/main/stdimporter
hashLength = 13
hashLength = 8
hashLength = 12
hashLength = 15
hashLength = 10
hashLength = 15
hashLength = 9
hashLength = 8
hashLength = 15
hashLength = 15
hashLength = 12
hashLength = 10
hashLength = 13
hashLength = 13
hashLength = 8
hashLength = 15
hashLength = 11
Finally, add a regression test that will complain if we end up with
hashed names that reuse the same length too often.
Out of eight hashed names, the test will fail if six end up with the
same length, as that is incredibly unlikely given that each should pick
one of eight lengths with a fair distribution.
2 years ago
|
|
|
|
|
|
|
hasher.Reset()
|
|
|
|
hasher.Write(salt)
|
|
|
|
hasher.Write(flagSeed.bytes)
|
|
|
|
io.WriteString(hasher, name)
|
immprove how hashWithCustomSalt comes up with its random lengths
The last change made it so that hashWithCustomSalt does not always end
up with 8 base64 characters, which is a good change for the sake of
avoiding easy patterns in obfuscated code.
However, the docs weren't updated accordingly, and it wasn't
particularly clear that the byte giving us randomness wasn't part of the
resulting base64-encoded name.
First, refactor the code to only feed as many checksum bytes to the
base64 encoder as necessary, which is 12.
This shrinks b64NameBuffer and saves us some base64 encoding work.
Second, use the first checksum byte that we don't use, the 13th,
as the source of the randomness.
Note how before we used a base64-encoded byte for the randomness,
which isn't great as that byte was only one of 63 characters,
whereas a checksum byte is one of 256.
Third, update the docs so that the code is as clear as possible.
This is particularly important given that we have no tests.
With debug prints in the gogarble.txt test, we can see that the
randomness in hash lengths is working as intended:
# test/main/stdimporter
hashLength = 13
hashLength = 8
hashLength = 12
hashLength = 15
hashLength = 10
hashLength = 15
hashLength = 9
hashLength = 8
hashLength = 15
hashLength = 15
hashLength = 12
hashLength = 10
hashLength = 13
hashLength = 13
hashLength = 8
hashLength = 15
hashLength = 11
Finally, add a regression test that will complain if we end up with
hashed names that reuse the same length too often.
Out of eight hashed names, the test will fail if six end up with the
same length, as that is incredibly unlikely given that each should pick
one of eight lengths with a fair distribution.
2 years ago
|
|
|
sum := hasher.Sum(sumBuffer[:0])
|
|
|
|
|
immprove how hashWithCustomSalt comes up with its random lengths
The last change made it so that hashWithCustomSalt does not always end
up with 8 base64 characters, which is a good change for the sake of
avoiding easy patterns in obfuscated code.
However, the docs weren't updated accordingly, and it wasn't
particularly clear that the byte giving us randomness wasn't part of the
resulting base64-encoded name.
First, refactor the code to only feed as many checksum bytes to the
base64 encoder as necessary, which is 12.
This shrinks b64NameBuffer and saves us some base64 encoding work.
Second, use the first checksum byte that we don't use, the 13th,
as the source of the randomness.
Note how before we used a base64-encoded byte for the randomness,
which isn't great as that byte was only one of 63 characters,
whereas a checksum byte is one of 256.
Third, update the docs so that the code is as clear as possible.
This is particularly important given that we have no tests.
With debug prints in the gogarble.txt test, we can see that the
randomness in hash lengths is working as intended:
# test/main/stdimporter
hashLength = 13
hashLength = 8
hashLength = 12
hashLength = 15
hashLength = 10
hashLength = 15
hashLength = 9
hashLength = 8
hashLength = 15
hashLength = 15
hashLength = 12
hashLength = 10
hashLength = 13
hashLength = 13
hashLength = 8
hashLength = 15
hashLength = 11
Finally, add a regression test that will complain if we end up with
hashed names that reuse the same length too often.
Out of eight hashed names, the test will fail if six end up with the
same length, as that is incredibly unlikely given that each should pick
one of eight lengths with a fair distribution.
2 years ago
|
|
|
// The byte after neededSumBytes is never used as part of the name,
|
|
|
|
// but it is still deterministic and hard to predict,
|
|
|
|
// so it provides us with useful randomness between 0 and 255.
|
|
|
|
// We want the number to be between 0 and hashLenthRange-1 as well,
|
|
|
|
// so we use a remainder operation.
|
|
|
|
hashLengthRandomness := sum[neededSumBytes] % ((maxHashLength - minHashLength) + 1)
|
|
|
|
hashLength := minHashLength + hashLengthRandomness
|
immprove how hashWithCustomSalt comes up with its random lengths
The last change made it so that hashWithCustomSalt does not always end
up with 8 base64 characters, which is a good change for the sake of
avoiding easy patterns in obfuscated code.
However, the docs weren't updated accordingly, and it wasn't
particularly clear that the byte giving us randomness wasn't part of the
resulting base64-encoded name.
First, refactor the code to only feed as many checksum bytes to the
base64 encoder as necessary, which is 12.
This shrinks b64NameBuffer and saves us some base64 encoding work.
Second, use the first checksum byte that we don't use, the 13th,
as the source of the randomness.
Note how before we used a base64-encoded byte for the randomness,
which isn't great as that byte was only one of 63 characters,
whereas a checksum byte is one of 256.
Third, update the docs so that the code is as clear as possible.
This is particularly important given that we have no tests.
With debug prints in the gogarble.txt test, we can see that the
randomness in hash lengths is working as intended:
# test/main/stdimporter
hashLength = 13
hashLength = 8
hashLength = 12
hashLength = 15
hashLength = 10
hashLength = 15
hashLength = 9
hashLength = 8
hashLength = 15
hashLength = 15
hashLength = 12
hashLength = 10
hashLength = 13
hashLength = 13
hashLength = 8
hashLength = 15
hashLength = 11
Finally, add a regression test that will complain if we end up with
hashed names that reuse the same length too often.
Out of eight hashed names, the test will fail if six end up with the
same length, as that is incredibly unlikely given that each should pick
one of eight lengths with a fair distribution.
2 years ago
|
|
|
|
|
|
|
nameBase64.Encode(b64NameBuffer[:], sum[:neededSumBytes])
|
|
|
|
b64Name := b64NameBuffer[:hashLength]
|
|
|
|
|
|
|
|
// Even if we are hashing a package path, which is not an identifier,
|
|
|
|
// we still want the result to be a valid identifier,
|
|
|
|
// since we'll use it as the package name too.
|
|
|
|
if isDigit(b64Name[0]) {
|
|
|
|
// Turn "3foo" into "Dfoo".
|
|
|
|
// Similar to toLower, since uppercase letters go after digits
|
|
|
|
// in the ASCII table.
|
|
|
|
b64Name[0] += 'A' - '0'
|
|
|
|
}
|
|
|
|
for i, b := range b64Name {
|
|
|
|
if b == '-' { // URL encoding uses dashes, which aren't valid
|
|
|
|
b64Name[i] = 'a'
|
|
|
|
}
|
|
|
|
}
|
|
|
|
// Valid identifiers should stay exported or unexported.
|
|
|
|
if token.IsIdentifier(name) {
|
|
|
|
if token.IsExported(name) {
|
|
|
|
if b64Name[0] == '_' {
|
|
|
|
// Turn "_foo" into "Zfoo".
|
|
|
|
b64Name[0] = 'Z'
|
|
|
|
} else if isLower(b64Name[0]) {
|
|
|
|
// Turn "afoo" into "Afoo".
|
|
|
|
b64Name[0] = toUpper(b64Name[0])
|
|
|
|
}
|
obfuscate all names used in reflection
Go code can retrieve and use field and method names via the `reflect` package.
For that reason, historically we did not obfuscate names of fields and methods
underneath types that we detected as used for reflection, via e.g. `reflect.TypeOf`.
However, that caused a number of issues. Since we obfuscate and build one package
at a time, we could only detect when types were used for reflection in their own package
or in upstream packages. Use of reflection in downstream packages would be detected
too late, causing one package to obfuscate the names and the other not to, leading to a build failure.
A different approach is implemented here. All names are obfuscated now, but we collect
those types used for reflection, and at the end of a build in `package main`,
we inject a function into the runtime's `internal/abi` package to reverse the obfuscation
for those names which can be used for reflection.
This does mean that the obfuscation for these names is very weak, as the binary
contains a one-to-one mapping to their original names, but they cannot be obfuscated
without breaking too many Go packages out in the wild. There is also some amount
of overhead in `internal/abi` due to this, but we aim to make the overhead insignificant.
Fixes #884, #799, #817, #881, #858, #843, #842
Closes #406
4 months ago
|
|
|
} else if isUpper(b64Name[0]) {
|
|
|
|
// Turn "Afoo" into "afoo".
|
|
|
|
b64Name[0] = toLower(b64Name[0])
|
|
|
|
}
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
}
|
|
|
|
return string(b64Name)
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
}
|