simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
// Copyright (c) 2019, The Garble Authors.
|
|
|
|
// See LICENSE for licensing information.
|
|
|
|
|
|
|
|
package main
|
|
|
|
|
|
|
|
import (
|
|
|
|
"bytes"
|
|
|
|
"crypto/sha256"
|
|
|
|
"encoding/base64"
|
|
|
|
"fmt"
|
|
|
|
"go/token"
|
|
|
|
"io"
|
|
|
|
"os/exec"
|
|
|
|
"strings"
|
|
|
|
)
|
|
|
|
|
|
|
|
const buildIDSeparator = "/"
|
|
|
|
|
start using original action IDs (#251)
When we obfuscate a name, what we do is hash the name with the action ID
of the package that contains the name. To ensure that the hash changes
if the garble tool changes, we used the action ID of the obfuscated
build, which is different than the original action ID, as we include
garble's own content ID in "go tool compile -V=full" via -toolexec.
Let's call that the "obfuscated action ID". Remember that a content ID
is roughly the hash of a binary or object file, and an action ID
contains the hash of a package's source code plus the content IDs of its
dependencies.
This had the advantage that it did what we wanted. However, it had one
massive drawback: when we compile a package, we only have the obfuscated
action IDs of its dependencies. This is because one can't have the
content ID of dependent packages before they are built.
Usually, this is not a problem, because hashing a foreign name means it
comes from a dependency, where we already have the obfuscated action ID.
However, that's not always the case.
First, go:linkname directives can point to any symbol that ends up in
the binary, even if the package is not a dependency. So garble could
only support linkname targets belonging to dependencies. This is at the
root of why we could not obfuscate the runtime; it contains linkname
directives targeting the net package, for example, which depends on runtime.
Second, some other places did not have an easy access to obfuscated
action IDs, like transformAsm, which had to recover it from a temporary
file stored by transformCompile.
Plus, this was all pretty expensive, as each toolexec sub-process had to
make repeated calls to buildidOf with the object files of dependencies.
We even had to use extra calls to "go list" in the case of indirect
dependencies, as their export files do not appear in importcfg files.
All in all, the old method was complex and expensive. A better mechanism
is to use the original action IDs directly, as listed by "go list"
without garble in the picture.
This would mean that the hashing does not change if garble changes,
meaning weaker obfuscation. To regain that property, we define the
"garble action ID", which is just the original action ID hashed together
with garble's own content ID.
This is practically the same as the obfuscated build ID we used before,
but since it doesn't go through "go tool compile -V=full" and the
obfuscated build itself, we can work out *all* the garble action IDs
upfront, before the obfuscated build even starts.
This fixes all of our problems. Now we know all garble build IDs
upfront, so a bunch of hacks can be entirely removed. Plus, since we
know them upfront, we can also cache them and avoid repeated calls to
"go tool buildid".
While at it, make use of the new BuildID field in Go 1.16's "list -json
-export". This avoids the vast majority of "go tool buildid" calls, as
the only ones that remain are 2 on the garble binary itself.
The numbers for Go 1.16 look very good:
name old time/op new time/op delta
Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6)
name old bin-B new bin-B delta
Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6)
name old user-time/op new user-time/op delta
Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
4 years ago
|
|
|
// splitActionID returns the action ID half of a build ID, the first component.
|
fix garbling names belonging to indirect imports (#203)
main.go includes a lengthy comment that documents this edge case, why it
happened, and how we are fixing it. To summarize, we should no longer
error with a build error in those cases. Read the comment for details.
A few other minor changes were done to allow writing this patch.
First, the actionID and contentID funcs were renamed, since they started
to collide with variable names.
Second, the logging has been improved a bit, which allowed me to debug
the issue.
Third, the "cache" global shared by all garble sub-processes now
includes the necessary parameters to run "go list -toolexec", including
the path to garble and the build flags being used.
Thanks to lu4p for writing a test case, which also applied gofmt to that
testdata Go file.
Fixes #180.
Closes #181, since it includes its test case.
4 years ago
|
|
|
func splitActionID(buildID string) string {
|
avoid one more call to 'go tool buildid' (#253)
We use it to get the content ID of garble's binary, which is used for
both the garble action IDs, as well as 'go tool compile -V=full'.
Since those two happen in separate processes, both used to call 'go tool
buildid' separately. Store it in the gob cache the first time, and reuse
it the second time.
Since each call to cmd/go costs about 10ms (new process, running its
many init funcs, etc), this results in a nice speed-up for our small
benchmark. Most builds will take many seconds though, so note that a
~15ms speedup there will likely not be noticeable.
While at it, simplify the buildInfo global, as now it just contains a
map representation of the -importcfg contents. It now has better names,
docs, and a simpler representation.
We also stop using the term "garbled import", as it was a bit confusing.
"obfuscated types.Package" is a much better description.
name old time/op new time/op delta
Build-8 106ms ± 1% 92ms ± 0% -14.07% (p=0.010 n=6+4)
name old bin-B new bin-B delta
Build-8 6.60M ± 0% 6.60M ± 0% -0.01% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build-8 208ms ± 5% 149ms ± 3% -28.27% (p=0.004 n=6+5)
name old user-time/op new user-time/op delta
Build-8 433ms ± 3% 384ms ± 3% -11.35% (p=0.002 n=6+6)
4 years ago
|
|
|
return buildID[:strings.Index(buildID, buildIDSeparator)]
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
}
|
|
|
|
|
start using original action IDs (#251)
When we obfuscate a name, what we do is hash the name with the action ID
of the package that contains the name. To ensure that the hash changes
if the garble tool changes, we used the action ID of the obfuscated
build, which is different than the original action ID, as we include
garble's own content ID in "go tool compile -V=full" via -toolexec.
Let's call that the "obfuscated action ID". Remember that a content ID
is roughly the hash of a binary or object file, and an action ID
contains the hash of a package's source code plus the content IDs of its
dependencies.
This had the advantage that it did what we wanted. However, it had one
massive drawback: when we compile a package, we only have the obfuscated
action IDs of its dependencies. This is because one can't have the
content ID of dependent packages before they are built.
Usually, this is not a problem, because hashing a foreign name means it
comes from a dependency, where we already have the obfuscated action ID.
However, that's not always the case.
First, go:linkname directives can point to any symbol that ends up in
the binary, even if the package is not a dependency. So garble could
only support linkname targets belonging to dependencies. This is at the
root of why we could not obfuscate the runtime; it contains linkname
directives targeting the net package, for example, which depends on runtime.
Second, some other places did not have an easy access to obfuscated
action IDs, like transformAsm, which had to recover it from a temporary
file stored by transformCompile.
Plus, this was all pretty expensive, as each toolexec sub-process had to
make repeated calls to buildidOf with the object files of dependencies.
We even had to use extra calls to "go list" in the case of indirect
dependencies, as their export files do not appear in importcfg files.
All in all, the old method was complex and expensive. A better mechanism
is to use the original action IDs directly, as listed by "go list"
without garble in the picture.
This would mean that the hashing does not change if garble changes,
meaning weaker obfuscation. To regain that property, we define the
"garble action ID", which is just the original action ID hashed together
with garble's own content ID.
This is practically the same as the obfuscated build ID we used before,
but since it doesn't go through "go tool compile -V=full" and the
obfuscated build itself, we can work out *all* the garble action IDs
upfront, before the obfuscated build even starts.
This fixes all of our problems. Now we know all garble build IDs
upfront, so a bunch of hacks can be entirely removed. Plus, since we
know them upfront, we can also cache them and avoid repeated calls to
"go tool buildid".
While at it, make use of the new BuildID field in Go 1.16's "list -json
-export". This avoids the vast majority of "go tool buildid" calls, as
the only ones that remain are 2 on the garble binary itself.
The numbers for Go 1.16 look very good:
name old time/op new time/op delta
Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6)
name old bin-B new bin-B delta
Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6)
name old user-time/op new user-time/op delta
Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
4 years ago
|
|
|
// splitContentID returns the content ID half of a build ID, the last component.
|
fix garbling names belonging to indirect imports (#203)
main.go includes a lengthy comment that documents this edge case, why it
happened, and how we are fixing it. To summarize, we should no longer
error with a build error in those cases. Read the comment for details.
A few other minor changes were done to allow writing this patch.
First, the actionID and contentID funcs were renamed, since they started
to collide with variable names.
Second, the logging has been improved a bit, which allowed me to debug
the issue.
Third, the "cache" global shared by all garble sub-processes now
includes the necessary parameters to run "go list -toolexec", including
the path to garble and the build flags being used.
Thanks to lu4p for writing a test case, which also applied gofmt to that
testdata Go file.
Fixes #180.
Closes #181, since it includes its test case.
4 years ago
|
|
|
func splitContentID(buildID string) string {
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
return buildID[strings.LastIndex(buildID, buildIDSeparator)+1:]
|
|
|
|
}
|
|
|
|
|
start using original action IDs (#251)
When we obfuscate a name, what we do is hash the name with the action ID
of the package that contains the name. To ensure that the hash changes
if the garble tool changes, we used the action ID of the obfuscated
build, which is different than the original action ID, as we include
garble's own content ID in "go tool compile -V=full" via -toolexec.
Let's call that the "obfuscated action ID". Remember that a content ID
is roughly the hash of a binary or object file, and an action ID
contains the hash of a package's source code plus the content IDs of its
dependencies.
This had the advantage that it did what we wanted. However, it had one
massive drawback: when we compile a package, we only have the obfuscated
action IDs of its dependencies. This is because one can't have the
content ID of dependent packages before they are built.
Usually, this is not a problem, because hashing a foreign name means it
comes from a dependency, where we already have the obfuscated action ID.
However, that's not always the case.
First, go:linkname directives can point to any symbol that ends up in
the binary, even if the package is not a dependency. So garble could
only support linkname targets belonging to dependencies. This is at the
root of why we could not obfuscate the runtime; it contains linkname
directives targeting the net package, for example, which depends on runtime.
Second, some other places did not have an easy access to obfuscated
action IDs, like transformAsm, which had to recover it from a temporary
file stored by transformCompile.
Plus, this was all pretty expensive, as each toolexec sub-process had to
make repeated calls to buildidOf with the object files of dependencies.
We even had to use extra calls to "go list" in the case of indirect
dependencies, as their export files do not appear in importcfg files.
All in all, the old method was complex and expensive. A better mechanism
is to use the original action IDs directly, as listed by "go list"
without garble in the picture.
This would mean that the hashing does not change if garble changes,
meaning weaker obfuscation. To regain that property, we define the
"garble action ID", which is just the original action ID hashed together
with garble's own content ID.
This is practically the same as the obfuscated build ID we used before,
but since it doesn't go through "go tool compile -V=full" and the
obfuscated build itself, we can work out *all* the garble action IDs
upfront, before the obfuscated build even starts.
This fixes all of our problems. Now we know all garble build IDs
upfront, so a bunch of hacks can be entirely removed. Plus, since we
know them upfront, we can also cache them and avoid repeated calls to
"go tool buildid".
While at it, make use of the new BuildID field in Go 1.16's "list -json
-export". This avoids the vast majority of "go tool buildid" calls, as
the only ones that remain are 2 on the garble binary itself.
The numbers for Go 1.16 look very good:
name old time/op new time/op delta
Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6)
name old bin-B new bin-B delta
Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6)
name old user-time/op new user-time/op delta
Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
4 years ago
|
|
|
// decodeHash is the opposite of hashToString, with a panic for error handling
|
|
|
|
// since it should never happen.
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
func decodeHash(str string) []byte {
|
|
|
|
h, err := base64.RawURLEncoding.DecodeString(str)
|
|
|
|
if err != nil {
|
|
|
|
panic(fmt.Sprintf("invalid hash %q: %v", str, err))
|
|
|
|
}
|
|
|
|
return h
|
|
|
|
}
|
|
|
|
|
|
|
|
func alterToolVersion(tool string, args []string) error {
|
|
|
|
cmd := exec.Command(args[0], args[1:]...)
|
|
|
|
out, err := cmd.Output()
|
|
|
|
if err != nil {
|
|
|
|
if err, _ := err.(*exec.ExitError); err != nil {
|
|
|
|
return fmt.Errorf("%v: %s", err, err.Stderr)
|
|
|
|
}
|
|
|
|
return err
|
|
|
|
}
|
|
|
|
line := string(bytes.TrimSpace(out)) // no trailing newline
|
|
|
|
f := strings.Fields(line)
|
|
|
|
if len(f) < 3 || f[0] != tool || f[1] != "version" || f[2] == "devel" && !strings.HasPrefix(f[len(f)-1], "buildID=") {
|
|
|
|
return fmt.Errorf("%s -V=full: unexpected output:\n\t%s", args[0], line)
|
|
|
|
}
|
|
|
|
var toolID []byte
|
|
|
|
if f[2] == "devel" {
|
|
|
|
// On the development branch, use the content ID part of the build ID.
|
fix garbling names belonging to indirect imports (#203)
main.go includes a lengthy comment that documents this edge case, why it
happened, and how we are fixing it. To summarize, we should no longer
error with a build error in those cases. Read the comment for details.
A few other minor changes were done to allow writing this patch.
First, the actionID and contentID funcs were renamed, since they started
to collide with variable names.
Second, the logging has been improved a bit, which allowed me to debug
the issue.
Third, the "cache" global shared by all garble sub-processes now
includes the necessary parameters to run "go list -toolexec", including
the path to garble and the build flags being used.
Thanks to lu4p for writing a test case, which also applied gofmt to that
testdata Go file.
Fixes #180.
Closes #181, since it includes its test case.
4 years ago
|
|
|
toolID = decodeHash(splitContentID(f[len(f)-1]))
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
} else {
|
|
|
|
// For a release, the output is like: "compile version go1.9.1 X:framepointer".
|
start using original action IDs (#251)
When we obfuscate a name, what we do is hash the name with the action ID
of the package that contains the name. To ensure that the hash changes
if the garble tool changes, we used the action ID of the obfuscated
build, which is different than the original action ID, as we include
garble's own content ID in "go tool compile -V=full" via -toolexec.
Let's call that the "obfuscated action ID". Remember that a content ID
is roughly the hash of a binary or object file, and an action ID
contains the hash of a package's source code plus the content IDs of its
dependencies.
This had the advantage that it did what we wanted. However, it had one
massive drawback: when we compile a package, we only have the obfuscated
action IDs of its dependencies. This is because one can't have the
content ID of dependent packages before they are built.
Usually, this is not a problem, because hashing a foreign name means it
comes from a dependency, where we already have the obfuscated action ID.
However, that's not always the case.
First, go:linkname directives can point to any symbol that ends up in
the binary, even if the package is not a dependency. So garble could
only support linkname targets belonging to dependencies. This is at the
root of why we could not obfuscate the runtime; it contains linkname
directives targeting the net package, for example, which depends on runtime.
Second, some other places did not have an easy access to obfuscated
action IDs, like transformAsm, which had to recover it from a temporary
file stored by transformCompile.
Plus, this was all pretty expensive, as each toolexec sub-process had to
make repeated calls to buildidOf with the object files of dependencies.
We even had to use extra calls to "go list" in the case of indirect
dependencies, as their export files do not appear in importcfg files.
All in all, the old method was complex and expensive. A better mechanism
is to use the original action IDs directly, as listed by "go list"
without garble in the picture.
This would mean that the hashing does not change if garble changes,
meaning weaker obfuscation. To regain that property, we define the
"garble action ID", which is just the original action ID hashed together
with garble's own content ID.
This is practically the same as the obfuscated build ID we used before,
but since it doesn't go through "go tool compile -V=full" and the
obfuscated build itself, we can work out *all* the garble action IDs
upfront, before the obfuscated build even starts.
This fixes all of our problems. Now we know all garble build IDs
upfront, so a bunch of hacks can be entirely removed. Plus, since we
know them upfront, we can also cache them and avoid repeated calls to
"go tool buildid".
While at it, make use of the new BuildID field in Go 1.16's "list -json
-export". This avoids the vast majority of "go tool buildid" calls, as
the only ones that remain are 2 on the garble binary itself.
The numbers for Go 1.16 look very good:
name old time/op new time/op delta
Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6)
name old bin-B new bin-B delta
Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6)
name old user-time/op new user-time/op delta
Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
4 years ago
|
|
|
// Use the whole line, as we can assume it's unique.
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
toolID = []byte(line)
|
|
|
|
}
|
|
|
|
|
|
|
|
contentID, err := ownContentID(toolID)
|
|
|
|
if err != nil {
|
|
|
|
return fmt.Errorf("cannot obtain garble's own version: %v", err)
|
|
|
|
}
|
|
|
|
// The part of the build ID that matters is the last, since it's the
|
|
|
|
// "content ID" which is used to work out whether there is a need to redo
|
|
|
|
// the action (build) or not. Since cmd/go parses the last word in the
|
|
|
|
// output as "buildID=...", we simply add "+garble buildID=_/_/_/${hash}".
|
|
|
|
// The slashes let us imitate a full binary build ID, but we assume that
|
|
|
|
// the other components such as the action ID are not necessary, since the
|
|
|
|
// only reader here is cmd/go and it only consumes the content ID.
|
start using original action IDs (#251)
When we obfuscate a name, what we do is hash the name with the action ID
of the package that contains the name. To ensure that the hash changes
if the garble tool changes, we used the action ID of the obfuscated
build, which is different than the original action ID, as we include
garble's own content ID in "go tool compile -V=full" via -toolexec.
Let's call that the "obfuscated action ID". Remember that a content ID
is roughly the hash of a binary or object file, and an action ID
contains the hash of a package's source code plus the content IDs of its
dependencies.
This had the advantage that it did what we wanted. However, it had one
massive drawback: when we compile a package, we only have the obfuscated
action IDs of its dependencies. This is because one can't have the
content ID of dependent packages before they are built.
Usually, this is not a problem, because hashing a foreign name means it
comes from a dependency, where we already have the obfuscated action ID.
However, that's not always the case.
First, go:linkname directives can point to any symbol that ends up in
the binary, even if the package is not a dependency. So garble could
only support linkname targets belonging to dependencies. This is at the
root of why we could not obfuscate the runtime; it contains linkname
directives targeting the net package, for example, which depends on runtime.
Second, some other places did not have an easy access to obfuscated
action IDs, like transformAsm, which had to recover it from a temporary
file stored by transformCompile.
Plus, this was all pretty expensive, as each toolexec sub-process had to
make repeated calls to buildidOf with the object files of dependencies.
We even had to use extra calls to "go list" in the case of indirect
dependencies, as their export files do not appear in importcfg files.
All in all, the old method was complex and expensive. A better mechanism
is to use the original action IDs directly, as listed by "go list"
without garble in the picture.
This would mean that the hashing does not change if garble changes,
meaning weaker obfuscation. To regain that property, we define the
"garble action ID", which is just the original action ID hashed together
with garble's own content ID.
This is practically the same as the obfuscated build ID we used before,
but since it doesn't go through "go tool compile -V=full" and the
obfuscated build itself, we can work out *all* the garble action IDs
upfront, before the obfuscated build even starts.
This fixes all of our problems. Now we know all garble build IDs
upfront, so a bunch of hacks can be entirely removed. Plus, since we
know them upfront, we can also cache them and avoid repeated calls to
"go tool buildid".
While at it, make use of the new BuildID field in Go 1.16's "list -json
-export". This avoids the vast majority of "go tool buildid" calls, as
the only ones that remain are 2 on the garble binary itself.
The numbers for Go 1.16 look very good:
name old time/op new time/op delta
Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6)
name old bin-B new bin-B delta
Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6)
name old user-time/op new user-time/op delta
Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
4 years ago
|
|
|
fmt.Printf("%s +garble buildID=_/_/_/%s\n", line, hashToString(contentID))
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
return nil
|
|
|
|
}
|
|
|
|
|
start using original action IDs (#251)
When we obfuscate a name, what we do is hash the name with the action ID
of the package that contains the name. To ensure that the hash changes
if the garble tool changes, we used the action ID of the obfuscated
build, which is different than the original action ID, as we include
garble's own content ID in "go tool compile -V=full" via -toolexec.
Let's call that the "obfuscated action ID". Remember that a content ID
is roughly the hash of a binary or object file, and an action ID
contains the hash of a package's source code plus the content IDs of its
dependencies.
This had the advantage that it did what we wanted. However, it had one
massive drawback: when we compile a package, we only have the obfuscated
action IDs of its dependencies. This is because one can't have the
content ID of dependent packages before they are built.
Usually, this is not a problem, because hashing a foreign name means it
comes from a dependency, where we already have the obfuscated action ID.
However, that's not always the case.
First, go:linkname directives can point to any symbol that ends up in
the binary, even if the package is not a dependency. So garble could
only support linkname targets belonging to dependencies. This is at the
root of why we could not obfuscate the runtime; it contains linkname
directives targeting the net package, for example, which depends on runtime.
Second, some other places did not have an easy access to obfuscated
action IDs, like transformAsm, which had to recover it from a temporary
file stored by transformCompile.
Plus, this was all pretty expensive, as each toolexec sub-process had to
make repeated calls to buildidOf with the object files of dependencies.
We even had to use extra calls to "go list" in the case of indirect
dependencies, as their export files do not appear in importcfg files.
All in all, the old method was complex and expensive. A better mechanism
is to use the original action IDs directly, as listed by "go list"
without garble in the picture.
This would mean that the hashing does not change if garble changes,
meaning weaker obfuscation. To regain that property, we define the
"garble action ID", which is just the original action ID hashed together
with garble's own content ID.
This is practically the same as the obfuscated build ID we used before,
but since it doesn't go through "go tool compile -V=full" and the
obfuscated build itself, we can work out *all* the garble action IDs
upfront, before the obfuscated build even starts.
This fixes all of our problems. Now we know all garble build IDs
upfront, so a bunch of hacks can be entirely removed. Plus, since we
know them upfront, we can also cache them and avoid repeated calls to
"go tool buildid".
While at it, make use of the new BuildID field in Go 1.16's "list -json
-export". This avoids the vast majority of "go tool buildid" calls, as
the only ones that remain are 2 on the garble binary itself.
The numbers for Go 1.16 look very good:
name old time/op new time/op delta
Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6)
name old bin-B new bin-B delta
Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6)
name old user-time/op new user-time/op delta
Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
4 years ago
|
|
|
func ownContentID(toolID []byte) ([]byte, error) {
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
// Join the two content IDs together into a single base64-encoded sha256
|
|
|
|
// sum. This includes the original tool's content ID, and garble's own
|
|
|
|
// content ID.
|
|
|
|
h := sha256.New()
|
|
|
|
h.Write(toolID)
|
avoid one more call to 'go tool buildid' (#253)
We use it to get the content ID of garble's binary, which is used for
both the garble action IDs, as well as 'go tool compile -V=full'.
Since those two happen in separate processes, both used to call 'go tool
buildid' separately. Store it in the gob cache the first time, and reuse
it the second time.
Since each call to cmd/go costs about 10ms (new process, running its
many init funcs, etc), this results in a nice speed-up for our small
benchmark. Most builds will take many seconds though, so note that a
~15ms speedup there will likely not be noticeable.
While at it, simplify the buildInfo global, as now it just contains a
map representation of the -importcfg contents. It now has better names,
docs, and a simpler representation.
We also stop using the term "garbled import", as it was a bit confusing.
"obfuscated types.Package" is a much better description.
name old time/op new time/op delta
Build-8 106ms ± 1% 92ms ± 0% -14.07% (p=0.010 n=6+4)
name old bin-B new bin-B delta
Build-8 6.60M ± 0% 6.60M ± 0% -0.01% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build-8 208ms ± 5% 149ms ± 3% -28.27% (p=0.004 n=6+5)
name old user-time/op new user-time/op delta
Build-8 433ms ± 3% 384ms ± 3% -11.35% (p=0.002 n=6+6)
4 years ago
|
|
|
if len(cache.BinaryContentID) == 0 {
|
|
|
|
panic("missing binary content ID")
|
|
|
|
}
|
|
|
|
h.Write(cache.BinaryContentID)
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
|
|
|
|
// We also need to add the selected options to the full version string,
|
|
|
|
// because all of them result in different output. We use spaces to
|
|
|
|
// separate the env vars and flags, to reduce the chances of collisions.
|
|
|
|
if envGoPrivate != "" {
|
|
|
|
fmt.Fprintf(h, " GOPRIVATE=%s", envGoPrivate)
|
|
|
|
}
|
|
|
|
if opts.GarbleLiterals {
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
fmt.Fprintf(h, " -literals")
|
|
|
|
}
|
|
|
|
if opts.Tiny {
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
fmt.Fprintf(h, " -tiny")
|
|
|
|
}
|
|
|
|
if len(opts.Seed) > 0 {
|
|
|
|
fmt.Fprintf(h, " -seed=%x", opts.Seed)
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
}
|
|
|
|
|
start using original action IDs (#251)
When we obfuscate a name, what we do is hash the name with the action ID
of the package that contains the name. To ensure that the hash changes
if the garble tool changes, we used the action ID of the obfuscated
build, which is different than the original action ID, as we include
garble's own content ID in "go tool compile -V=full" via -toolexec.
Let's call that the "obfuscated action ID". Remember that a content ID
is roughly the hash of a binary or object file, and an action ID
contains the hash of a package's source code plus the content IDs of its
dependencies.
This had the advantage that it did what we wanted. However, it had one
massive drawback: when we compile a package, we only have the obfuscated
action IDs of its dependencies. This is because one can't have the
content ID of dependent packages before they are built.
Usually, this is not a problem, because hashing a foreign name means it
comes from a dependency, where we already have the obfuscated action ID.
However, that's not always the case.
First, go:linkname directives can point to any symbol that ends up in
the binary, even if the package is not a dependency. So garble could
only support linkname targets belonging to dependencies. This is at the
root of why we could not obfuscate the runtime; it contains linkname
directives targeting the net package, for example, which depends on runtime.
Second, some other places did not have an easy access to obfuscated
action IDs, like transformAsm, which had to recover it from a temporary
file stored by transformCompile.
Plus, this was all pretty expensive, as each toolexec sub-process had to
make repeated calls to buildidOf with the object files of dependencies.
We even had to use extra calls to "go list" in the case of indirect
dependencies, as their export files do not appear in importcfg files.
All in all, the old method was complex and expensive. A better mechanism
is to use the original action IDs directly, as listed by "go list"
without garble in the picture.
This would mean that the hashing does not change if garble changes,
meaning weaker obfuscation. To regain that property, we define the
"garble action ID", which is just the original action ID hashed together
with garble's own content ID.
This is practically the same as the obfuscated build ID we used before,
but since it doesn't go through "go tool compile -V=full" and the
obfuscated build itself, we can work out *all* the garble action IDs
upfront, before the obfuscated build even starts.
This fixes all of our problems. Now we know all garble build IDs
upfront, so a bunch of hacks can be entirely removed. Plus, since we
know them upfront, we can also cache them and avoid repeated calls to
"go tool buildid".
While at it, make use of the new BuildID field in Go 1.16's "list -json
-export". This avoids the vast majority of "go tool buildid" calls, as
the only ones that remain are 2 on the garble binary itself.
The numbers for Go 1.16 look very good:
name old time/op new time/op delta
Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6)
name old bin-B new bin-B delta
Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6)
name old user-time/op new user-time/op delta
Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
4 years ago
|
|
|
return h.Sum(nil)[:buildIDComponentLength], nil
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
}
|
|
|
|
|
start using original action IDs (#251)
When we obfuscate a name, what we do is hash the name with the action ID
of the package that contains the name. To ensure that the hash changes
if the garble tool changes, we used the action ID of the obfuscated
build, which is different than the original action ID, as we include
garble's own content ID in "go tool compile -V=full" via -toolexec.
Let's call that the "obfuscated action ID". Remember that a content ID
is roughly the hash of a binary or object file, and an action ID
contains the hash of a package's source code plus the content IDs of its
dependencies.
This had the advantage that it did what we wanted. However, it had one
massive drawback: when we compile a package, we only have the obfuscated
action IDs of its dependencies. This is because one can't have the
content ID of dependent packages before they are built.
Usually, this is not a problem, because hashing a foreign name means it
comes from a dependency, where we already have the obfuscated action ID.
However, that's not always the case.
First, go:linkname directives can point to any symbol that ends up in
the binary, even if the package is not a dependency. So garble could
only support linkname targets belonging to dependencies. This is at the
root of why we could not obfuscate the runtime; it contains linkname
directives targeting the net package, for example, which depends on runtime.
Second, some other places did not have an easy access to obfuscated
action IDs, like transformAsm, which had to recover it from a temporary
file stored by transformCompile.
Plus, this was all pretty expensive, as each toolexec sub-process had to
make repeated calls to buildidOf with the object files of dependencies.
We even had to use extra calls to "go list" in the case of indirect
dependencies, as their export files do not appear in importcfg files.
All in all, the old method was complex and expensive. A better mechanism
is to use the original action IDs directly, as listed by "go list"
without garble in the picture.
This would mean that the hashing does not change if garble changes,
meaning weaker obfuscation. To regain that property, we define the
"garble action ID", which is just the original action ID hashed together
with garble's own content ID.
This is practically the same as the obfuscated build ID we used before,
but since it doesn't go through "go tool compile -V=full" and the
obfuscated build itself, we can work out *all* the garble action IDs
upfront, before the obfuscated build even starts.
This fixes all of our problems. Now we know all garble build IDs
upfront, so a bunch of hacks can be entirely removed. Plus, since we
know them upfront, we can also cache them and avoid repeated calls to
"go tool buildid".
While at it, make use of the new BuildID field in Go 1.16's "list -json
-export". This avoids the vast majority of "go tool buildid" calls, as
the only ones that remain are 2 on the garble binary itself.
The numbers for Go 1.16 look very good:
name old time/op new time/op delta
Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6)
name old bin-B new bin-B delta
Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6)
name old user-time/op new user-time/op delta
Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
4 years ago
|
|
|
// buildIDComponentLength is the number of bytes each build ID component takes,
|
|
|
|
// such as an action ID or a content ID.
|
|
|
|
const buildIDComponentLength = 15
|
|
|
|
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
// hashToString encodes the first 120 bits of a sha256 sum in base64, the same
|
start using original action IDs (#251)
When we obfuscate a name, what we do is hash the name with the action ID
of the package that contains the name. To ensure that the hash changes
if the garble tool changes, we used the action ID of the obfuscated
build, which is different than the original action ID, as we include
garble's own content ID in "go tool compile -V=full" via -toolexec.
Let's call that the "obfuscated action ID". Remember that a content ID
is roughly the hash of a binary or object file, and an action ID
contains the hash of a package's source code plus the content IDs of its
dependencies.
This had the advantage that it did what we wanted. However, it had one
massive drawback: when we compile a package, we only have the obfuscated
action IDs of its dependencies. This is because one can't have the
content ID of dependent packages before they are built.
Usually, this is not a problem, because hashing a foreign name means it
comes from a dependency, where we already have the obfuscated action ID.
However, that's not always the case.
First, go:linkname directives can point to any symbol that ends up in
the binary, even if the package is not a dependency. So garble could
only support linkname targets belonging to dependencies. This is at the
root of why we could not obfuscate the runtime; it contains linkname
directives targeting the net package, for example, which depends on runtime.
Second, some other places did not have an easy access to obfuscated
action IDs, like transformAsm, which had to recover it from a temporary
file stored by transformCompile.
Plus, this was all pretty expensive, as each toolexec sub-process had to
make repeated calls to buildidOf with the object files of dependencies.
We even had to use extra calls to "go list" in the case of indirect
dependencies, as their export files do not appear in importcfg files.
All in all, the old method was complex and expensive. A better mechanism
is to use the original action IDs directly, as listed by "go list"
without garble in the picture.
This would mean that the hashing does not change if garble changes,
meaning weaker obfuscation. To regain that property, we define the
"garble action ID", which is just the original action ID hashed together
with garble's own content ID.
This is practically the same as the obfuscated build ID we used before,
but since it doesn't go through "go tool compile -V=full" and the
obfuscated build itself, we can work out *all* the garble action IDs
upfront, before the obfuscated build even starts.
This fixes all of our problems. Now we know all garble build IDs
upfront, so a bunch of hacks can be entirely removed. Plus, since we
know them upfront, we can also cache them and avoid repeated calls to
"go tool buildid".
While at it, make use of the new BuildID field in Go 1.16's "list -json
-export". This avoids the vast majority of "go tool buildid" calls, as
the only ones that remain are 2 on the garble binary itself.
The numbers for Go 1.16 look very good:
name old time/op new time/op delta
Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6)
name old bin-B new bin-B delta
Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6)
name old user-time/op new user-time/op delta
Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
4 years ago
|
|
|
// format used for components in a build ID.
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
func hashToString(h []byte) string {
|
start using original action IDs (#251)
When we obfuscate a name, what we do is hash the name with the action ID
of the package that contains the name. To ensure that the hash changes
if the garble tool changes, we used the action ID of the obfuscated
build, which is different than the original action ID, as we include
garble's own content ID in "go tool compile -V=full" via -toolexec.
Let's call that the "obfuscated action ID". Remember that a content ID
is roughly the hash of a binary or object file, and an action ID
contains the hash of a package's source code plus the content IDs of its
dependencies.
This had the advantage that it did what we wanted. However, it had one
massive drawback: when we compile a package, we only have the obfuscated
action IDs of its dependencies. This is because one can't have the
content ID of dependent packages before they are built.
Usually, this is not a problem, because hashing a foreign name means it
comes from a dependency, where we already have the obfuscated action ID.
However, that's not always the case.
First, go:linkname directives can point to any symbol that ends up in
the binary, even if the package is not a dependency. So garble could
only support linkname targets belonging to dependencies. This is at the
root of why we could not obfuscate the runtime; it contains linkname
directives targeting the net package, for example, which depends on runtime.
Second, some other places did not have an easy access to obfuscated
action IDs, like transformAsm, which had to recover it from a temporary
file stored by transformCompile.
Plus, this was all pretty expensive, as each toolexec sub-process had to
make repeated calls to buildidOf with the object files of dependencies.
We even had to use extra calls to "go list" in the case of indirect
dependencies, as their export files do not appear in importcfg files.
All in all, the old method was complex and expensive. A better mechanism
is to use the original action IDs directly, as listed by "go list"
without garble in the picture.
This would mean that the hashing does not change if garble changes,
meaning weaker obfuscation. To regain that property, we define the
"garble action ID", which is just the original action ID hashed together
with garble's own content ID.
This is practically the same as the obfuscated build ID we used before,
but since it doesn't go through "go tool compile -V=full" and the
obfuscated build itself, we can work out *all* the garble action IDs
upfront, before the obfuscated build even starts.
This fixes all of our problems. Now we know all garble build IDs
upfront, so a bunch of hacks can be entirely removed. Plus, since we
know them upfront, we can also cache them and avoid repeated calls to
"go tool buildid".
While at it, make use of the new BuildID field in Go 1.16's "list -json
-export". This avoids the vast majority of "go tool buildid" calls, as
the only ones that remain are 2 on the garble binary itself.
The numbers for Go 1.16 look very good:
name old time/op new time/op delta
Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6)
name old bin-B new bin-B delta
Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6)
name old user-time/op new user-time/op delta
Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
4 years ago
|
|
|
return base64.RawURLEncoding.EncodeToString(h[:buildIDComponentLength])
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
}
|
|
|
|
|
|
|
|
func buildidOf(path string) (string, error) {
|
|
|
|
cmd := exec.Command("go", "tool", "buildid", path)
|
|
|
|
out, err := cmd.Output()
|
|
|
|
if err != nil {
|
|
|
|
if err, _ := err.(*exec.ExitError); err != nil {
|
|
|
|
return "", fmt.Errorf("%v: %s", err, err.Stderr)
|
|
|
|
}
|
|
|
|
return "", err
|
|
|
|
}
|
|
|
|
return string(out), nil
|
|
|
|
}
|
|
|
|
|
|
|
|
var (
|
|
|
|
// Hashed names are base64-encoded.
|
|
|
|
// Go names can only be letters, numbers, and underscores.
|
|
|
|
// This means we can use base64's URL encoding, minus '-'.
|
|
|
|
// Use the URL encoding, replacing '-' with a duplicate 'z'.
|
|
|
|
// Such a lossy encoding is fine, since we never decode hashes.
|
|
|
|
nameCharset = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_z"
|
|
|
|
nameBase64 = base64.NewEncoding(nameCharset)
|
|
|
|
)
|
|
|
|
|
|
|
|
// These funcs mimic the unicode package API, but byte-based since we know
|
|
|
|
// base64 is all ASCII.
|
|
|
|
|
|
|
|
func isDigit(b byte) bool { return '0' <= b && b <= '9' }
|
|
|
|
func isLower(b byte) bool { return 'a' <= b && b <= 'z' }
|
|
|
|
func isUpper(b byte) bool { return 'A' <= b && b <= 'Z' }
|
|
|
|
func toLower(b byte) byte { return b + ('a' - 'A') }
|
|
|
|
func toUpper(b byte) byte { return b - ('a' - 'A') }
|
|
|
|
|
|
|
|
// hashWith returns a hashed version of name, including the provided salt as well as
|
|
|
|
// opts.Seed into the hash input.
|
|
|
|
//
|
|
|
|
// The result is always four bytes long. If the input was a valid identifier,
|
|
|
|
// the output remains equally exported or unexported. Note that this process is
|
|
|
|
// reproducible, but not reversible.
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
func hashWith(salt []byte, name string) string {
|
obfuscate unexported names like exported ones (#227)
In 90fa325da7, the obfuscation logic was changed to use hashes for
exported names, but incremental names starting at just one letter for
unexported names. Presumably, this was done for the sake of binary size.
I argue that this is not a good idea for the default mode for a number
of reasons:
1) It makes reversing of stack traces nearly impossible for unexported
names, since replacing an obfuscated name "c" with "originalName"
would trigger too many false positives by matching single characters.
2) Exported and unexported names aren't different. We need to know how
names were obfuscated at a later time in both cases, thanks to use
cases like -ldflags=-X. Using short names for one but not the other
doesn't make a lot of sense, and makes the logic inconsistent.
3) Shaving off three bytes for unexported names doesn't seem like a huge
deal for the default mode, when we already have -tiny to optimize for
size.
This saves us a bit of work, but most importantly, simplifies the
obfuscation state as we no longer need to carry privateNameMap between
the compile and link stages.
name old time/op new time/op delta
Build-8 153ms ± 2% 150ms ± 2% ~ (p=0.065 n=6+6)
name old bin-B new bin-B delta
Build-8 7.09M ± 0% 7.08M ± 0% -0.24% (p=0.002 n=6+6)
name old sys-time/op new sys-time/op delta
Build-8 296ms ± 5% 277ms ± 6% -6.50% (p=0.026 n=6+6)
name old user-time/op new user-time/op delta
Build-8 562ms ± 1% 558ms ± 3% ~ (p=0.329 n=5+6)
Note that I do not oppose using short names for both exported and
unexported names in the future for -tiny, since reversing of stack
traces will by design not work there. The code can be resurrected from
the git history if we want to improve -tiny that way in the future, as
we'd need to store state in header files again.
Another major cleanup we can do here is to no longer use the
garbledImports map. From a look at obfuscateImports, we hash a package's
import path with its action ID, much like exported names, so we can
simply re-do that hashing for the linker's -X flag.
garbledImports does have some logic to handle duplicate package names,
but it's worth noting that should not affect package paths, as they are
always unique. That area of code could probably do with some
simplification in the future, too.
While at it, make hashWith panic if either parameter is empty.
obfuscateImports was hashing the main package path without a salt due to
a bug, so we want to catch those in the future.
Finally, make some tiny spacing and typo tweaks to the README.
4 years ago
|
|
|
if len(salt) == 0 {
|
|
|
|
panic("hashWith: empty salt")
|
|
|
|
}
|
|
|
|
if name == "" {
|
|
|
|
panic("hashWith: empty name")
|
|
|
|
}
|
use more bits for the obfuscated name hashes (#248)
We've been using four base64 characters for obfuscated names for a
while. And that has mostly worked, since most packages only have up to a
few hundred exported or unexported names at a time.
However, we have already encountered two collisions in the wild, which
can be reproduced with one seed but not another:
[...] PsaN.hQyW is a field, not a method
[...] byte is not a type
In both of those cases, we happened to run into a collision by chance.
And that's not terribly unlikely to begin with; even with just 100
names, the probability of a collision was about 0.03%. It dramatically
goes up if there are more names; with 500, we're already around 0.75%.
It's clear that four base64 chars is not enough to properly avoid
collisions in the vast majority of cases. But how many characters are
enough? The target should be that, even with a very large package and
lots of names, we should still practically never have a collision.
I did some basic estimation with "lots of names" being ten thousand,
with "practically never" being a one in a million chance. We need to go
all the way up to eight characters to reach that probability.
It's entirely possible that 7 or even 6 characters would be enough for
most users. However, collisions result in confusing errors which are
also hard to reproduce for us unless we can use exactly the same seed
and source code for a build.
So, play it safe, and use 8 characters. The constant now also has
documentation explaining how we arrived at that figure.
4 years ago
|
|
|
// hashLength is the number of base64 characters to use for the final
|
|
|
|
// hashed name.
|
|
|
|
// This needs to be long enough to realistically avoid hash collisions,
|
|
|
|
// but short enough to not bloat binary sizes.
|
|
|
|
// The namespace for collisions is generally a single package, since
|
|
|
|
// that's where most hashed names are namespaced to.
|
|
|
|
// Using a "hash collision" formula, and taking a generous estimate of a
|
|
|
|
// package having 10k names, we get the following probabilities.
|
|
|
|
// Most packages will have far fewer names, but some packages are huge,
|
|
|
|
// especially generated ones.
|
|
|
|
// We also have slightly fewer bits in practice, since the base64
|
|
|
|
// charset has 'z' twice, and the first base64 char is coerced into a
|
|
|
|
// valid Go identifier. So we must be conservative.
|
|
|
|
// Remember that base64 stores 6 bits per encoded byte.
|
|
|
|
// The probability numbers are approximated.
|
|
|
|
//
|
|
|
|
// length (base64) | length (bits) | collision probability
|
|
|
|
// -------------------------------------------------------
|
|
|
|
// 4 24 ~95%
|
|
|
|
// 5 30 ~4%
|
|
|
|
// 6 36 ~0.07%
|
|
|
|
// 7 42 ~0.001%
|
|
|
|
// 8 48 ~0.00001%
|
|
|
|
//
|
|
|
|
// We want collisions to be practically impossible, so we choose 8 to
|
|
|
|
// end up with a chance of about 1 in a million even when a package has
|
|
|
|
// thousands of obfuscated names.
|
|
|
|
const hashLength = 8
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
|
|
|
|
d := sha256.New()
|
|
|
|
d.Write(salt)
|
|
|
|
d.Write(opts.Seed)
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
io.WriteString(d, name)
|
|
|
|
sum := make([]byte, nameBase64.EncodedLen(d.Size()))
|
|
|
|
nameBase64.Encode(sum, d.Sum(nil))
|
use more bits for the obfuscated name hashes (#248)
We've been using four base64 characters for obfuscated names for a
while. And that has mostly worked, since most packages only have up to a
few hundred exported or unexported names at a time.
However, we have already encountered two collisions in the wild, which
can be reproduced with one seed but not another:
[...] PsaN.hQyW is a field, not a method
[...] byte is not a type
In both of those cases, we happened to run into a collision by chance.
And that's not terribly unlikely to begin with; even with just 100
names, the probability of a collision was about 0.03%. It dramatically
goes up if there are more names; with 500, we're already around 0.75%.
It's clear that four base64 chars is not enough to properly avoid
collisions in the vast majority of cases. But how many characters are
enough? The target should be that, even with a very large package and
lots of names, we should still practically never have a collision.
I did some basic estimation with "lots of names" being ten thousand,
with "practically never" being a one in a million chance. We need to go
all the way up to eight characters to reach that probability.
It's entirely possible that 7 or even 6 characters would be enough for
most users. However, collisions result in confusing errors which are
also hard to reproduce for us unless we can use exactly the same seed
and source code for a build.
So, play it safe, and use 8 characters. The constant now also has
documentation explaining how we arrived at that figure.
4 years ago
|
|
|
sum = sum[:hashLength]
|
|
|
|
|
|
|
|
// Even if we are hashing a package path, we still want the result to be
|
|
|
|
// a valid identifier, since we'll use it as the package name too.
|
|
|
|
if isDigit(sum[0]) {
|
|
|
|
// Turn "3foo" into "Dfoo".
|
|
|
|
// Similar to toLower, since uppercase letters go after digits
|
|
|
|
// in the ASCII table.
|
|
|
|
sum[0] += 'A' - '0'
|
|
|
|
}
|
|
|
|
// Keep the result equally exported or not, if it was an identifier.
|
|
|
|
if !token.IsIdentifier(name) {
|
|
|
|
return string(sum)
|
|
|
|
}
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
if token.IsExported(name) {
|
|
|
|
if sum[0] == '_' {
|
|
|
|
// Turn "_foo" into "Zfoo".
|
|
|
|
sum[0] = 'Z'
|
|
|
|
} else if isLower(sum[0]) {
|
|
|
|
// Turn "afoo" into "Afoo".
|
|
|
|
sum[0] = toUpper(sum[0])
|
|
|
|
}
|
|
|
|
} else {
|
|
|
|
if isUpper(sum[0]) {
|
|
|
|
// Turn "Afoo" into "afoo".
|
|
|
|
sum[0] = toLower(sum[0])
|
|
|
|
}
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
}
|
|
|
|
return string(sum)
|
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.
Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago
|
|
|
}
|