As spotted by the protobuf package via check-third-party.sh,
the two structs below are identical:
type alias1 = int64
type Struct1 struct { Alias alias1 }
type alias2 = int64
type Struct2 struct { Alias alias2 }
Our previous approach with stripStructTags dealt with struct tags,
but it did not deal with aliases, which are now present in go/types
thanks to the new alias tracking.
The new approach properly ignores struct tags and unaliases any
type aliases, resulting in correct hashes of any type.
See the two "NOTE" comments below. In summary:
* struct field tags are not hashed
* named types are hashed by name rather than by pointer
We are keeping an eye on https://go.dev/issue/69420 as well.
Allows us to remove EmbeddedAliasFields, recordedObjectString,
and all the logic around using them.
Resolves an issue where a user was running into a panic in
our logic to record embedded aliases.
Note that this means we require Go 1.23.5 or later now,
which also meant some changes to goversion.txtar to keep it green.
Fixes#827.
obfuscatedImportPath already handled ToObfuscate, so the callers
don't have to do that as well. Handle main packages too, whose logic
was sprinkled and repeated throughout the project.
Reflection can show package names alone via reflect.Type.String,
and I'm sure they are available in other ways.
We were obfuscating "package p" exactly like the import path "foo.com/p"
which worked OK for the most part, but not for reflection.
This is also a slight improvement to the quality of the obfuscation,
given that package names and import paths will no longer be aligned
when obfuscated.
There's no need to reach for sharedCache.ListedPackages
when the caller is computePkgCache, which already has the lpkg value.
While here, compare *go/types.Package pointers directly
rather than via their import path strings.
This function returns obfuscated names, so use that as its name.
Moreover, some of the callers still called the result "objStr",
which misled me into thinking the string was a unique object path.
Leave a TODO behind about using go/types/objectpath too.
Our own recordedObjectString is sort of similar, but not as principled.
The list flags are entirely unused, so they can be omitted.
The only argument that matters is the package argument to load.
While here, update the TODO, as it no longer applies.
No changes to the values. However, to avoid noise with every bugfix
release changing all the lines, only use go1.X rather than go1.X.Y
to annotate each of the entries in maps and lists.
Now that we only use the list to create a replacer at init time,
we no longer need to spend extra effort sorting by length first.
The benchmark shows no measurable difference in performance.
This way, rather than using a double loop quadratic algorithm
to search for each name to replace in a string,
we can make use of the reasonably efficient generic replacer
which makes use of tries.
Copying some code from the strings package is not ideal,
but it beats having to re-implement such an algorithm ourselves.
Not only is the algorithm much faster, as we are no longer quadratic,
but the replacer also appends into a buffer to avoid repeated string
copies, which means we allocate fewer bytes per operation.
│ old │ new │
│ sec/op │ sec/op vs base │
AbiOriginalNames-8 135708.0n ± 0% 391.1n ± 5% -99.71% (p=0.001 n=7)
│ old │ new │
│ B/s │ B/s vs base │
AbiOriginalNames-8 2.565Mi ± 0% 890.112Mi ± 4% +34597.03% (p=0.001 n=7)
│ old │ new │
│ B/op │ B/op vs base │
AbiOriginalNames-8 5464.0 ± 0% 848.0 ± 0% -84.48% (p=0.001 n=7)
│ old │ new │
│ allocs/op │ allocs/op vs base │
AbiOriginalNames-8 18.00 ± 0% 16.00 ± 0% -11.11% (p=0.001 n=7)
As Paul Scheduikat points out, the loop already does not start
if the length of name is less than minHashLength.
│ old │ new │
│ sec/op │ sec/op vs base │
AbiOriginalNames-8 135.6µ ± 1% 135.5µ ± 0% ~ (p=1.000 n=7)
Use the term "original name" rather than "real name" for the code
as it is clearer that we mean the unobfuscated name.
Update the comments at the top of the file to be clearer with the
explanation of what kinds of inputs we can expect.
While here, use fmt.Appendf to simplify the generation code a bit.
This lets us only check names up to the remaining input string length.
│ old │ new │
│ sec/op │ sec/op vs base │
AbiRealName-8 196.7µ ± 1% 172.3µ ± 1% -12.41% (p=0.001 n=7)
│ old │ new │
│ B/s │ B/s vs base │
AbiRealName-8 1.774Mi ± 1% 2.022Mi ± 0% +13.98% (p=0.001 n=7)
│ old │ new │
│ B/op │ B/op vs base │
AbiRealName-8 5.359Ki ± 0% 5.336Ki ± 0% -0.44% (p=0.001 n=7)
│ old │ new │
│ allocs/op │ allocs/op vs base │
AbiRealName-8 19.00 ± 0% 18.00 ± 0% -5.26% (p=0.001 n=7)
Iterating over a map is much more expensive than iterating over a slice,
given how it needs to work out which keys are present in each bucket
and then randomize the order in which to navigate the keys.
None of this work needs to happen when iterating over a slice.
A map would be nice if we were to actually do map lookups, but we don't.
│ old │ new │
│ sec/op │ sec/op vs base │
AbiRealName-8 707.1µ ± 1% 196.7µ ± 1% -72.17% (p=0.001 n=7)
│ old │ new │
│ B/s │ B/s vs base │
AbiRealName-8 517.6Ki ± 2% 1816.4Ki ± 1% +250.94% (p=0.001 n=7)
│ old │ new │
│ B/op │ B/op vs base │
AbiRealName-8 5.362Ki ± 0% 5.359Ki ± 0% -0.05% (p=0.001 n=7)
│ old │ new │
│ allocs/op │ allocs/op vs base │
AbiRealName-8 19.00 ± 0% 19.00 ± 0% ~ (p=1.000 n=7) ¹
Go code can retrieve and use field and method names via the `reflect` package.
For that reason, historically we did not obfuscate names of fields and methods
underneath types that we detected as used for reflection, via e.g. `reflect.TypeOf`.
However, that caused a number of issues. Since we obfuscate and build one package
at a time, we could only detect when types were used for reflection in their own package
or in upstream packages. Use of reflection in downstream packages would be detected
too late, causing one package to obfuscate the names and the other not to, leading to a build failure.
A different approach is implemented here. All names are obfuscated now, but we collect
those types used for reflection, and at the end of a build in `package main`,
we inject a function into the runtime's `internal/abi` package to reverse the obfuscation
for those names which can be used for reflection.
This does mean that the obfuscation for these names is very weak, as the binary
contains a one-to-one mapping to their original names, but they cannot be obfuscated
without breaking too many Go packages out in the wild. There is also some amount
of overhead in `internal/abi` due to this, but we aim to make the overhead insignificant.
Fixes#884, #799, #817, #881, #858, #843, #842Closes#406
We used it to detect GOOS-specific packages and ignore their load errors
without having to do a substring search.
However, it turns out that repeatedly loading the string slice
from gob files in the cache is rather slow, particularly since many
Go packages have dozens of GOOS-specific files which can be ignored.
│ old │ new │
│ cached-sec/op │ cached-sec/op vs base │
Build-8 340.3m ± 1% 335.8m ± 2% -1.32% (p=0.002 n=10)
│ old │ new │
│ mallocs/op │ mallocs/op vs base │
Build-8 35.73M ± 0% 35.09M ± 0% -1.79% (p=0.000 n=10)
In looking at the cpu and memory profiles, it surfaced that we spent
a lot of time in garbage collection, and a significant amount of the
garbage was produced by gob decoding string slices.
listedPackage.Deps is a list of a package's transitive dependencies,
so as a Go build gets larger, the list also gets larger and larger.
Given that Imports is the list of direct dependencies,
we can reconstruct it ourselves as needed, which is not always.
Moreover, since we want to do lookups, we can build a map directly.
This doesn't directly result in a wall time speed-up,
but it does result in a significant reduction in allocations.
The gob files we store in the disk cache should also be a bit smaller.
│ old │ new │
│ cached-sec/op │ cached-sec/op vs base │
Build-8 339.5m ± 2% 340.3m ± 1% ~ (p=0.218 n=10)
│ old │ new │
│ mallocs/op │ mallocs/op vs base │
Build-8 38.08M ± 0% 35.73M ± 0% -6.18% (p=0.000 n=10)
This allows me to collect a full CPU profile, showing us that we clearly
spend too much CPU time in garbage collection.
When collecting a full memory profile, we can see where the allocations
come from now:
Showing nodes accounting for 5636770, 31.07% of 18141579 total
Dropped 630 nodes (cum <= 90707)
Showing top 10 nodes out of 278
flat flat% sum% cum cum%
1692229 9.33% 9.33% 1910679 10.53% encoding/gob.decStringSlice
1005596 5.54% 14.87% 1005596 5.54% golang.org/x/tools/go/ssa.buildReferrers
458753 2.53% 17.40% 458753 2.53% go/scanner.(*Scanner).scanIdentifier
458752 2.53% 19.93% 458752 2.53% reflect.(*structType).Field
425984 2.35% 22.28% 448136 2.47% go/parser.(*parser).parseIdent
390049 2.15% 24.43% 390049 2.15% golang.org/x/tools/go/ssa.(*BasicBlock).emit
327683 1.81% 26.23% 371373 2.05% golang.org/x/tools/go/ssa.NewConst
311296 1.72% 27.95% 1024551 5.65% mvdan.cc/garble.(*transformer).transformGoFile.func1
287891 1.59% 29.54% 287891 1.59% encoding/gob.decString
278537 1.54% 31.07% 657043 3.62% golang.org/x/tools/go/ssa.(*builder).compLit
This method can work for any invocation of garble,
but for now we only directly wire it up for `go test -bench`.
It can still be used for regular invocations of `garble build`.
x/exp/rand was being used for no apparent reason; use math/rand.
x/exp/maps and x/exp/slices can be replaced with maps and slices
respectively now that we require Go 1.23 or later.
Note that the APIs are slightly different due to iterators.
This lets us start taking advantage of featurs from Go 1.23,
particularly tracking aliases in go/types and iterators.
Note that we need to add code to properly handle or skip over the new
*types.Alias type which go/types produces for Go type aliases.
Also note that we actually turn this mode off entirely for now,
due to the bug reported at https://go.dev/issue/70394.
We don't yet remove our own alias tracking code yet due to the above.
We hope to be able to remove it very soon.
Otherwise we miscalculate int sizes, type sizes, alignments, and so on.
Caught by the GOARCH=386 go test on CI, since the os package imports
internal/syscall/unix, which uses arch-dependent padding.
The different padding between our incorrect use of go/types
and the correct typechecking done by the compiler caused different
obfuscation of fields, as the struct types stringified differently,
and they are used as a hash salt for field name obfuscation.
Recently, a patch changed the argument `-mod=` to `-mod=readonly`
as the former is not really a valid flag value, and broke with go.work.
However, the latter seems to break our tests on Go 1.22.6
when listing all of runtimeLinknamed:
panic: failed to load missing runtime-linknamed packages: golang.org/x/crypto@v0.16.1-0.20231129163542-152cdb1503eb:
reading http://127.0.0.1:43357/mod/golang.org/x/crypto/@v/v0.16.1-0.20231129163542-152cdb1503eb.mod: 404 Not Found
It seems like, somehow, listing std packages was trying to download
x/crypto from GOPROXY - which is a local server with testdata/mod,
and so it does not contain x/crypto. However, this is entirely wrong,
as std vendors dependencies, including this very version of x/crypto.
Reverting the change to `-mod=readonly` resolves this issue,
which explains why we hadn't encountered this surprising GOPROXY error,
but the revert would also break users of go.work files.
Luckily, we have a better alternative: rather than trying to override
the value of the flags by adding more arguments, delete them entirely.
And update some actions and staticcheck while here.
Drop the testing of Go master as well, as I haven't used or maintained
such a setup for a while now. We can simply add Go 1.24 RC versions
to the go-version matrix once they come out.
Fixes#859.