rework the position obfuscator (#282)

First, rename line_obfuscator.go to position.go. We obfuscate filenames,
not just line numbers, and "obfuscator" is a bit redundant.

Second, use "/*line :x*/" comments rather than the "//line :x" form, as
the former allows us to insert them in any position without adding
unnecessary newlines. This will be important for changing the position
of call sites, which in turn matters for "garble reverse".
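
To illustrate why the block form is more flexible, here is a small sketch (not garble's code) that tokenizes a snippet with go/scanner and reports the line each identifier is attributed to. It assumes go/scanner's documented handling of line directives: a `/*line*/` comment is honored anywhere, while a `//line` comment only counts when it starts at the beginning of a line. The helper name `identLines` is hypothetical.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// identLines tokenizes src and maps each identifier to the line that
// go/scanner reports for it, after applying any line directives.
// It assumes every identifier in src is unique.
func identLines(src string) map[string]int {
	fset := token.NewFileSet()
	file := fset.AddFile("example.go", fset.Base(), len(src))

	var s scanner.Scanner
	s.Init(file, []byte(src), nil, scanner.ScanComments)

	lines := make(map[string]int)
	for {
		pos, tok, lit := s.Scan()
		if tok == token.EOF {
			return lines
		}
		if tok == token.IDENT {
			// FileSet.Position returns the position adjusted by line directives.
			lines[lit] = fset.Position(pos).Line
		}
	}
}

func main() {
	src := "package p\n\n" +
		"var x = /*line :42*/ y\n" + // block form: honored mid-line
		"var z = w //line :7\n" // line form: ignored, as it does not start its line
	lines := identLines(src)
	fmt.Println(lines["x"], lines["y"], lines["z"], lines["w"]) // 3 42 43 43
}
```

The `//line :7` comment has no effect because it does not begin its line; making it work would require inserting a newline, which is exactly what the `/*line :x*/` form avoids. Note also that a directive keeps applying to the following lines, which is why `z` and `w` are reported on line 43.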

Third, do not rely on go/ast to remove and add comments. Since comments are
free-floating, we can very easily end up with misplaced comments,
especially as the literal obfuscator heavily modifies the AST.

The new method prints and re-parses the file to ensure that all node
positions are consistent with a buffer, buf1. Then, we copy the contents
into a new buffer, buf2, while inserting the comments that we need.

The new method also modifies line numbers at the very end of obfuscating
a Go file, instead of at the very beginning. That's going to be more
robust long-term, as we will also obfuscate line numbers for any
additions or modifications to the AST.

Fourth, detachedDirectives is unnecessary, as we can accomplish the same
with two simple prefix matches.

Finally, this means we can stop using detachedComments entirely, as
printFile already inserts the comments we need.

For #5.

replace go/parser with go/scanner in printFile

printFile is one of the functions to blame for most of the CPU cost and
allocations for garble itself, as reported by `perf record` for a clean build.
One contributor is how we print each file and then parse it again,
which we did for the sake of inserting line directives correctly.
With a bit of care, we can do this by tokenizing after printing,
as opposed to parsing into a full go/ast again.

This is moderately cheaper, but more than anything, allocates far less.
That is to be expected given how go/ast is a tree of pointers,
whereas go/scanner simply gives us a stream of tokens.

    name      old time/op         new time/op         delta
    Build-16  10.4s ± 2%          10.3s ± 1%          ~       (p=0.393 n=10+10)

    name      old bin-B           new bin-B           delta
    Build-16  5.51M ± 0%          5.51M ± 0%          ~       (all equal)

    name      old cached-time/op  new cached-time/op  delta
    Build-16  398ms ±12%          391ms ±10%          ~       (p=0.529 n=10+10)

    name      old mallocs/op      new mallocs/op      delta
    Build-16  34.4M ± 0%          31.8M ± 0%          -7.65%  (p=0.000 n=10+10)

    name      old sys-time/op     new sys-time/op     delta
    Build-16  5.80s ± 6%          5.86s ± 4%          ~       (p=0.218 n=10+10)

The new code is shorter, but perhaps a bit trickier,
so I also added more comments to explain what's going on.
Note how the time/op change is practically noise,
but mallocs/op goes down significantly, which is always a good sign.

work around another go/printer bug to fix andybalholm/brotli

When obfuscating the following piece of code:

    func issue_573(s struct{ f int }) {
        var _ *int = &s.f
        /*x*/
    }

the function body would roughly end up printed as:

    var _ *int = &dZ4xYx3N
    /*x*/.rbg1IM3V

Note that the /*x*/ comment got moved earlier in the source code.
This happens because the new identifiers are longer, so the printer
thinks that the selector now ends past the comment.
That would be fine - we don't really mind where comments end up,
because these non-directive comments end up being removed anyway.
However, the resulting syntax is wrong, as the period for the selector
must be on the first line rather than the second.

This is a go/printer bug that we should fix upstream,
but until then, we must work around it in Go 1.18.x and 1.19.x.

The fix is somewhat obvious in hindsight. To reduce the chances that
go/printer will trip over comments and produce invalid syntax,
get rid of most comments before we use the printer.
We still keep the removal of comments after printing,
since go/printer consumes some comments in ast.Node Doc fields.

Add the minimized unit test case above, and add the upstream project
that found this bug to check-third-party.
andybalholm/brotli helps cover a compression algorithm and ccgo code
generation from C to Go, and it's also a fairly popular module,
particularly with HTTP implementations which want pure-Go brotli.

While here, fix the check-third-party script: it was setting GOFLAGS
a bit too late, so it may run `go get` on the wrong mod file.

Fixes #573.

// Copyright (c) 2020, The Garble Authors.
// See LICENSE for licensing information.

package main

import (
	"bytes"
	"fmt"
	"go/ast"
	"go/printer"
	"go/scanner"
	"go/token"
	"path/filepath"
	"strings"
)

var printBuf1, printBuf2 bytes.Buffer

// printFile prints a Go file to a buffer, while also removing non-directive
// comments and adding extra compiler directives to obfuscate position information.
func printFile(lpkg *listedPackage, file *ast.File) ([]byte, error) {
	if lpkg.ToObfuscate {
		// Omit comments from the final Go code.
		// Keep directives, as they affect the build.
		// We do this before printing to print fewer bytes below.
		var newComments []*ast.CommentGroup
		for _, group := range file.Comments {
			var newGroup ast.CommentGroup
			for _, comment := range group.List {
				if strings.HasPrefix(comment.Text, "//go:") {
					newGroup.List = append(newGroup.List, comment)
				}
			}
			if len(newGroup.List) > 0 {
				newComments = append(newComments, &newGroup)
			}
		}
		file.Comments = newComments
	}

	printBuf1.Reset()
	printConfig := printer.Config{Mode: printer.RawFormat}
	if err := printConfig.Fprint(&printBuf1, fset, file); err != nil {
		return nil, err
	}
	src := printBuf1.Bytes()

	if !lpkg.ToObfuscate {
		// We lightly transform packages which shouldn't be obfuscated,
		// such as when rewriting go:linkname directives to obfuscated packages.
		// We still need to print the files, but without obfuscating positions.
		return src, nil
	}

	fsetFile := fset.File(file.Pos())
	filename := filepath.Base(fsetFile.Name())
	newPrefix := ""
	if strings.HasPrefix(filename, "_cgo_") {
		newPrefix = "_cgo_"
	}

	// Many parts of garble, notably the literal obfuscator, modify the AST.
	// Unfortunately, comments are free-floating in File.Comments,
	// and those are the only source of truth that go/printer uses.
	// So the positions of the comments in the given file are wrong.
	// The only way we can get the final ones is to tokenize again.
	// Using go/scanner is slightly awkward, but cheaper than parsing again.

	// We want to use the original positions for the hashed positions.
	// Since later we'll iterate on tokens rather than walking an AST,
	// we use a list of offsets indexed by identifiers in source order.
	var origCallOffsets []int
	nextOffset := -1
	ast.Inspect(file, func(node ast.Node) bool {
		switch node := node.(type) {
		case *ast.CallExpr:
			nextOffset = fsetFile.Position(node.Pos()).Offset
		case *ast.Ident:
			origCallOffsets = append(origCallOffsets, nextOffset)
			nextOffset = -1
		}
		return true
	})

	copied := 0
	printBuf2.Reset()

	// Make sure the entire file gets a zero filename by default,
	// in case we miss any positions below.
	// We use a //-style comment, because there might be build tags.
	fmt.Fprintf(&printBuf2, "//line %s:1\n", newPrefix)

	// We use an empty filename when tokenizing below.
	// We use a nil go/scanner.ErrorHandler because src comes from go/printer.
	// Syntax errors should be rare, and when they do happen,
	// we don't want to point to the original source file on disk.
	// That would be confusing, as we've changed the source in memory.
	var s scanner.Scanner
	fsetFile = fset.AddFile("", fset.Base(), len(src))
	s.Init(fsetFile, src, nil, scanner.ScanComments)

	identIndex := 0
|
|
|
|
for {
|
|
|
|
pos, tok, lit := s.Scan()
|
|
|
|
switch tok {
|
|
|
|
case token.EOF:
|
|
|
|
// Copy the rest and return.
|
|
|
|
printBuf2.Write(src[copied:])
|
|
|
|
return printBuf2.Bytes(), nil
|
|
|
|
case token.COMMENT:
|
work around another go/printer bug to fix andybalholm/brotli
When obfuscating the following piece of code:
func issue_573(s struct{ f int }) {
var _ *int = &s.f
/*x*/
}
the function body would roughly end up printed as:
we would roughly end up with:
var _ *int = &dZ4xYx3N
/*x*/.rbg1IM3V
Note that the /*x*/ comment got moved earlier in the source code.
This happens because the new identifiers are longer, so the printer
thinks that the selector now ends past the comment.
That would be fine - we don't really mind where comments end up,
because these non-directive comments end up being removed anyway.
However, the resulting syntax is wrong, as the period for the selector
must be on the first line rather than the second.
This is a go/printer bug that we should fix upstream,
but until then, we must work around it in Go 1.18.x and 1.19.x.
The fix is somewhat obvious in hindsight. To reduce the chances that
go/printer will trip over comments and produce invalid syntax,
get rid of most comments before we use the printer.
We still keep the removal of comments after printing,
since go/printer consumes some comments in ast.Node Doc fields.
Add the minimized unit test case above, and add the upstream project
that found this bug to check-third-party.
andybalholm/brotli helps cover a compression algorithm and ccgo code
generation from C to Go, and it's also a fairly popular module,
particular with HTTP implementations which want pure-Go brotli.
While here, fix the check-third-party script: it was setting GOFLAGS
a bit too late, so it may run `go get` on the wrong mod file.
Fixes #573.
3 years ago
|
|
|
			// Omit comments from the final Go code, again.
			// We already removed most comments from file.Comments before printing,
			// but go/printer also picks up comments from some Doc ast.Node fields.
			// TODO: is there an easy way to filter all comments at once?
			if strings.HasPrefix(lit, "//go:") {
				continue // directives are kept
			}
			offset := fsetFile.Position(pos).Offset
			printBuf2.Write(src[copied:offset])
			copied = offset + len(lit)
		case token.IDENT:
			origOffset := origCallOffsets[identIndex]
			identIndex++
			if origOffset == -1 {
				continue // identifiers which don't start func calls are left untouched
			}
			newName := ""
			if !flagTiny {
				origPos := fmt.Sprintf("%s:%d", filename, origOffset)
				newName = hashWithPackage(lpkg, origPos) + ".go"
				// log.Printf("%q hashed with %x to %q", origPos, curPkg.GarbleActionID, newName)
			}
			offset := fsetFile.Position(pos).Offset
			printBuf2.Write(src[copied:offset])
			copied = offset

			// We use the "/*text*/" form, since we can use multiple of them
			// on a single line, and they don't require extra newlines.
			// Make sure there is whitespace on either side of a comment.
			// Otherwise, we could change the syntax of the program.
			// Inserting "/*text*/" in "a/b" must yield "a/ /*text*/ b",
			// as "a//*text*/b" is tokenized as a "//" comment.
			fmt.Fprintf(&printBuf2, " /*line %s%s:1*/ ", newPrefix, newName)
		}
	}
}