garble

Commit Graph

Author	SHA1	Message	Date
Daniel Martí	6898d61637	start using original action IDs (#251)
When we obfuscate a name, what we do is hash the name with the action ID of the package that contains the name. To ensure that the hash changes if the garble tool changes, we used the action ID of the obfuscated build, which is different than the original action ID, as we include garble's own content ID in "go tool compile -V=full" via -toolexec. Let's call that the "obfuscated action ID". Remember that a content ID is roughly the hash of a binary or object file, and an action ID contains the hash of a package's source code plus the content IDs of its dependencies.
This had the advantage that it did what we wanted. However, it had one massive drawback: when we compile a package, we only have the obfuscated action IDs of its dependencies. This is because one can't have the content ID of dependent packages before they are built. Usually, this is not a problem, because hashing a foreign name means it comes from a dependency, where we already have the obfuscated action ID. However, that's not always the case.
First, go:linkname directives can point to any symbol that ends up in the binary, even if the package is not a dependency. So garble could only support linkname targets belonging to dependencies. This is at the root of why we could not obfuscate the runtime; it contains linkname directives targeting the net package, for example, which depends on runtime.
Second, some other places did not have an easy access to obfuscated action IDs, like transformAsm, which had to recover it from a temporary file stored by transformCompile.
Plus, this was all pretty expensive, as each toolexec sub-process had to make repeated calls to buildidOf with the object files of dependencies. We even had to use extra calls to "go list" in the case of indirect dependencies, as their export files do not appear in importcfg files. All in all, the old method was complex and expensive.
A better mechanism is to use the original action IDs directly, as listed by "go list" without garble in the picture. This would mean that the hashing does not change if garble changes, meaning weaker obfuscation. To regain that property, we define the "garble action ID", which is just the original action ID hashed together with garble's own content ID. This is practically the same as the obfuscated build ID we used before, but since it doesn't go through "go tool compile -V=full" and the obfuscated build itself, we can work out all the garble action IDs upfront, before the obfuscated build even starts.
This fixes all of our problems. Now we know all garble build IDs upfront, so a bunch of hacks can be entirely removed. Plus, since we know them upfront, we can also cache them and avoid repeated calls to "go tool buildid".
While at it, make use of the new BuildID field in Go 1.16's "list -json -export". This avoids the vast majority of "go tool buildid" calls, as the only ones that remain are 2 on the garble binary itself.
The numbers for Go 1.16 look very good:
    name     old time/op       new time/op       delta
    Build-8  146ms ± 4%        101ms ± 1%        -31.01%  (p=0.002 n=6+6)
    name     old bin-B         new bin-B         delta
    Build-8  6.61M ± 0%        6.60M ± 0%        -0.09%  (p=0.002 n=6+6)
    name     old sys-time/op   new sys-time/op   delta
    Build-8  321ms ± 7%        202ms ± 6%        -37.11%  (p=0.002 n=6+6)
    name     old user-time/op  new user-time/op  delta
    Build-8  538ms ± 4%        414ms ± 4%        -23.12%  (p=0.002 n=6+6)	3 years ago
Daniel Martí	a223147093	use more bits for the obfuscated name hashes (#248)
We've been using four base64 characters for obfuscated names for a while. And that has mostly worked, since most packages only have up to a few hundred exported or unexported names at a time. However, we have already encountered two collisions in the wild, which can be reproduced with one seed but not another:
    [...] PsaN.hQyW is a field, not a method
    [...] byte is not a type
In both of those cases, we happened to run into a collision by chance. And that's not terribly unlikely to begin with; even with just 100 names, the probability of a collision was about 0.03%. It dramatically goes up if there are more names; with 500, we're already around 0.75%.
It's clear that four base64 chars is not enough to properly avoid collisions in the vast majority of cases. But how many characters are enough? The target should be that, even with a very large package and lots of names, we should still practically never have a collision. I did some basic estimation with "lots of names" being ten thousand, with "practically never" being a one in a million chance. We need to go all the way up to eight characters to reach that probability.
It's entirely possible that 7 or even 6 characters would be enough for most users. However, collisions result in confusing errors which are also hard to reproduce for us unless we can use exactly the same seed and source code for a build. So, play it safe, and use 8 characters. The constant now also has documentation explaining how we arrived at that figure.	3 years ago
Daniel Martí	840cf9b68d	make hashWith a bit smarter (#238) It used to return five-byte strings, and now it returns four bytes with nearly the same number of bits of entropy. It also avoids the exported vs unexported dance if the name isn't an identifier, which is now common with import paths. See the added docs for more details.	3 years ago
Daniel Martí	e64fccd367	better document and position the hash base64 encoding (#234) We now document why we use a custom base64 charset. The old "b64" name was also too generic, so it might have been misused for other purposes.	3 years ago
Daniel Martí	79c775e218	obfuscate unexported names like exported ones (#227)
In `90fa325da7`, the obfuscation logic was changed to use hashes for exported names, but incremental names starting at just one letter for unexported names. Presumably, this was done for the sake of binary size.
I argue that this is not a good idea for the default mode for a number of reasons:
1) It makes reversing of stack traces nearly impossible for unexported names, since replacing an obfuscated name "c" with "originalName" would trigger too many false positives by matching single characters.
2) Exported and unexported names aren't different. We need to know how names were obfuscated at a later time in both cases, thanks to use cases like -ldflags=-X. Using short names for one but not the other doesn't make a lot of sense, and makes the logic inconsistent.
3) Shaving off three bytes for unexported names doesn't seem like a huge deal for the default mode, when we already have -tiny to optimize for size.
This saves us a bit of work, but most importantly, simplifies the obfuscation state as we no longer need to carry privateNameMap between the compile and link stages.
    name     old time/op       new time/op       delta
    Build-8  153ms ± 2%        150ms ± 2%        ~  (p=0.065 n=6+6)
    name     old bin-B         new bin-B         delta
    Build-8  7.09M ± 0%        7.08M ± 0%        -0.24%  (p=0.002 n=6+6)
    name     old sys-time/op   new sys-time/op   delta
    Build-8  296ms ± 5%        277ms ± 6%        -6.50%  (p=0.026 n=6+6)
    name     old user-time/op  new user-time/op  delta
    Build-8  562ms ± 1%        558ms ± 3%        ~  (p=0.329 n=5+6)
Note that I do not oppose using short names for both exported and unexported names in the future for -tiny, since reversing of stack traces will by design not work there. The code can be resurrected from the git history if we want to improve -tiny that way in the future, as we'd need to store state in header files again.
Another major cleanup we can do here is to no longer use the garbledImports map. From a look at obfuscateImports, we hash a package's import path with its action ID, much like exported names, so we can simply re-do that hashing for the linker's -X flag. garbledImports does have some logic to handle duplicate package names, but it's worth noting that should not affect package paths, as they are always unique. That area of code could probably do with some simplification in the future, too.
While at it, make hashWith panic if either parameter is empty. obfuscateImports was hashing the main package path without a salt due to a bug, so we want to catch those in the future.
Finally, make some tiny spacing and typo tweaks to the README.	3 years ago
Daniel Martí	249501b5e9	fix garbling names belonging to indirect imports (#203) main.go includes a lengthy comment that documents this edge case, why it happened, and how we are fixing it. To summarize, we should no longer error with a build error in those cases. Read the comment for details. A few other minor changes were done to allow writing this patch. First, the actionID and contentID funcs were renamed, since they started to collide with variable names. Second, the logging has been improved a bit, which allowed me to debug the issue. Third, the "cache" global shared by all garble sub-processes now includes the necessary parameters to run "go list -toolexec", including the path to garble and the build flags being used. Thanks to lu4p for writing a test case, which also applied gofmt to that testdata Go file. Fixes #180. Closes #181, since it includes its test case.	4 years ago
lu4p	cf290b8e6d	Share data between processes via a shared file. (#192) Previously garble heavily used env vars to share data between processes. This also makes it easy to share complex data between processes. The complexity of main.go is considerably reduced.	4 years ago
Daniel Martí	dfa622fe50	simplify globals, split hash.go (#191) The previous globals worked, but were unnecessarily complex. For example, we passed the fromPath variable around, but it's really a static global, since we only compile or link a single package in each Go process. Use such global variables instead of passing them around, which currently include the package's import path, its build ID, and its import config path. Also split all the hashing and build ID code into hash.go, since that's a relatively well contained 200 lines of code that doesn't need to make main.go any bigger. We also split the code to alter Go's own version to a separate function, so that it can be moved out of main.go as well.	4 years ago
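
The collision estimate in a223147093 can be roughly reproduced with the birthday bound. The sketch below is not taken from garble: the function name collisionProbability is hypothetical, and it assumes hashes are uniformly random with a full 6 bits per base64 character, which the real encoding may not quite reach.

```go
package main

import (
	"fmt"
	"math"
)

// collisionProbability approximates the chance that at least two of n names
// receive the same hash when each hash is `chars` base64 characters long.
// Assumption: hashes are uniform and every character carries 6 bits.
// Birthday bound: p ~= 1 - exp(-n(n-1) / (2 * 2^bits)).
func collisionProbability(n, chars int) float64 {
	space := math.Pow(2, float64(6*chars))
	return -math.Expm1(-float64(n) * float64(n-1) / (2 * space))
}

func main() {
	// The two figures quoted for four characters.
	fmt.Printf("4 chars,   100 names: %.4f%%\n", 100*collisionProbability(100, 4))
	fmt.Printf("4 chars,   500 names: %.4f%%\n", 100*collisionProbability(500, 4))

	// Smallest length that stays under one in a million for 10,000 names.
	for chars := 4; chars <= 10; chars++ {
		if p := collisionProbability(10000, chars); p < 1e-6 {
			fmt.Printf("%d chars, 10000 names: %.2e\n", chars, p)
			break
		}
	}
}
```

Under these assumptions the four-character figures come out near 0.03% and 0.74%, and eight characters is the first length below the one-in-a-million target for ten thousand names, in line with the commit message.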

8 Commits (c0c5a75454ea50120154e7f6da4fb20285ab9389)
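
The "garble action ID" described in 6898d61637 and the empty-parameter check added to hashWith in 79c775e218 can be sketched as follows. This is only an illustration under assumptions, not garble's code: garbleActionID is a hypothetical helper name, SHA-256 is an assumed hash, and the standard URL-safe base64 alphabet stands in for the custom charset the log mentions.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// garbleActionID (hypothetical name) mixes a package's original action ID
// with the content ID of the garble binary, so that name hashes change
// whenever the tool itself changes. Both inputs are known before the
// obfuscated build starts, so the result can be computed upfront.
func garbleActionID(originalActionID, garbleContentID []byte) []byte {
	h := sha256.New() // assumed hash function
	h.Write(originalActionID)
	h.Write(garbleContentID)
	return h.Sum(nil)
}

// hashWith hashes name with salt and keeps eight base64 characters.
// It panics on empty input, so a missing salt is caught early.
// Assumption: the real tool uses a custom charset; RawURLEncoding is a stand-in.
func hashWith(salt []byte, name string) string {
	if len(salt) == 0 {
		panic("hashWith: empty salt")
	}
	if name == "" {
		panic("hashWith: empty name")
	}
	h := sha256.New()
	h.Write(salt)
	h.Write([]byte(name))
	return base64.RawURLEncoding.EncodeToString(h.Sum(nil))[:8]
}

func main() {
	// Placeholder IDs; real ones would come from "go list" and "go tool buildid".
	salt := garbleActionID([]byte("original-action-id"), []byte("garble-content-id"))
	fmt.Println(hashWith(salt, "originalName"))
}
```

Because the salt depends only on inputs available before compilation, the same hash can be recomputed later, for example when handling the linker's -X flag, without carrying extra state between build stages.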