Extracting Go Embeds

Backdrop

I was recently working on a Go-based web app that used a Go feature called embeds.
So what exactly is go:embed, and what does it do? Imagine you had to distribute an application
that consists of an executable and multiple auxiliary files (say images, language support files, etc.). How would you distribute such an application?

	Typical App Folder :
			app-executable		--> Run by user
			image0.png			|
			image1.png			|-> Never Directly accessed by the user, but must be shipped 
			audio.mp3			|	with the app

The most obvious solution is to zip/tar the executable along with all the necessary files
and hand the archive to users, but extracting the files is extra effort on the user's end (especially when the user is never going to access those files directly). Go's embed feature saves us from this fate by providing a pseudo file system that holds the necessary files and allows programmatic access to them, and the entire pseudo file system resides within the binary.

	App Folder while using Go Embed :
			app-executable		--> One binary which has all necessary files embedded
									within it.
	

		ELF-structure of app-executable
	|===================================|
	|.data    .....						|
	|===================================|
	|.rodata							|
	| <	Pseudo  Filesystem >			|
	|===================================|
	|.text    .....						|
	|===================================|

Note that files embedded using Go embed are read only.

Now you must be wondering what the big deal is: just run strings on the binary and view the content. Sure, you can do that if you want to spend the rest of your day reading through clobbered output, but our aim in this paper is to explore how we can extract embedded files from any arbitrary Go executable while preserving the original directory structure.

But why?

All right, I’ll throw you a carrot:

  1. People think embeds are a way to “secure their application files”. BooHoo! I have to prove them wrong.
  2. Anyone with some mal-dev experience would immediately see the potential of using embeds
    as a packer, and yes, it is being used that way in the wild.
  3. Golang is becoming popular, so I guess this is part of the community’s effort to better understand it.

Understanding how embeds work

Let's start with an example. Our app's folder structure looks something like this:

AppFolder:
	misc:
		another_dir:
			- sample.txt
		- text1.txt
		- text2.txt
		- text3.txt
	main.go

We’ll now write a simple Go program that embeds all files within the misc directory.

Let's start with main.go:

package main

import (
	"embed"
	"fmt"
	"log"
)

//go:embed misc
var embedFiles embed.FS  // 1

func main() {
	content, err := embedFiles.ReadFile("misc/text1.txt") // 2
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(content))
}
  1. We declare a variable of type embed.FS; this is essentially our container of files.
    The embed.FS structure looks something like this:
    type FS struct {
    	files *[]file
    }
    
    type file struct {
    	name string
    	data string
    	hash [16]byte // truncated SHA-256 hash
    }
    
    FS holds a pointer to a slice of file structures. Each file holds the name of a file, its content (data) and a SHA-256 hash of that content truncated to 16 bytes. name holds the full slash-separated path of the entry (e.g. "misc/another_dir/sample.txt"); for a directory, data is empty and hash is all zeros.
    A directory entry would look like this:
    file{
    	name: "misc/another_dir/", // note the '/' suffix
    	data: "",                  // no content
    	hash: [16]byte{},          // all zeros
    }
    
    A file entry would look like this:
    file{
    	name: "misc/another_dir/sample.txt",
    	data: "...",             // the file's bytes
    	hash: [16]byte{1, 2, 3}, // truncated hash of data (illustrative values)
    }
    
  2. ReadFile and ReadDir are methods of embed.FS that operate on the embedded files.
  3. Now you must have noticed the peculiar //go:embed comment. It's not just a comment: it's a special Go directive (yes, Go parses certain comments) that asks the Go compiler to bind all files within the misc folder to the embedFiles variable during compilation. The Go compiler's source has a function called WriteEmbed in src/cmd/compile/internal/staticdata/embed.go that performs this magic.

Let's quickly compile the program we wrote above by running go build main.go and run it with ./main; this should print the contents of the misc/text1.txt file.

There you go: we have now successfully embedded files within a single executable.

The Embed Structure In A Compiled Binary

Now that we have compiled our app, let's load it into a decompiler and look at what the embed structure looks like once compiled.

We know that our main function references the embed structure (see main.go above), so it should give us a clue about where the structure lies within the binary.

We notice the ReadFile call with embedFiles passed to it as an argument; that's our embed structure, so let's take a look at it.

We see a chunk of bytes, but it's not immediately clear what it is. We do know that the Go compiler's WriteEmbed function has something to do with it, so let's take a look at WriteEmbed (don’t panic, I'll explain it line by line).

  • Line 8 writes a pointer; the comments indicate that it is the data pointer of a []file slice, pointing just past the slice header, i.e. at the first file entry in the embed structure (a pointer to an array and a pointer to its first element hold the same address).
  • Lines 9 and 10 write the slice's length and capacity, both equal to the number of files.
  • Line 19 iterates over the files to be embedded.
  • Line 20 writes a pointer to the file name (the full path of the file).
  • Line 21 writes the length of the file name.
  • Line 22 checks whether the entry is a directory (its name ends in '/'); if it is, the file content pointer and length are written as zeros and the hash is left zeroed, otherwise the else block on Line 27 fills them in.

Let's take a look at that chunk of data again.

Now it starts to make sense:

  • The Red block is the pointer to the first file entry (i.e. the green block). Since we are dealing with a little-endian architecture, we have to reverse the red block's bytes to obtain the actual pointer, which is 0x004ca138.
  • The two Blue blocks hold the length and capacity of the file slice, both equal to the number of files in the Go embed structure.
  • The Green block is a file entry:
    • Its first 8 bytes are a pointer to the file name (underlined in blue)
    • The next 8 bytes hold the length of the file name (underlined in blue)
    • The next 8 bytes are a pointer to the file contents (underlined in red)
    • The following 8 bytes hold the length of the file contents (underlined in red)
    • The last 16 bytes are the truncated SHA-256 hash of the file contents (underlined in yellow)

If you were paying close attention, you'd have noticed that the file contents pointer and hash in the above case were zeroed out; that's because the entry is a directory. Entries for regular files have appropriate pointer values, as in the following example:

  • File content pointer is highlighted in yellow
  • File hash is highlighted in blue

Note: from the above we can deduce that each file entry is 48 bytes long in a 64-bit binary (four 8-byte words plus a 16-byte hash) and 32 bytes long in a 32-bit binary (four 4-byte words plus the 16-byte hash). This will be useful for navigating the embed structure and building automated extraction tools.
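The size rule generalizes to "four pointer-sized words plus 16 bytes", which also gives the address of any entry relative to the first-entry pointer. A minimal sketch (entrySize and entryAddr are hypothetical helper names), using the 0x004ca138 pointer from above and the third entry (index 2) that the next section examines:

```go
package main

import "fmt"

// entrySize returns the size of one embed file entry for a pointer size:
// name ptr, name len, data ptr, data len, plus a 16-byte hash.
func entrySize(ptrSize uint64) uint64 {
	return 4*ptrSize + 16
}

// entryAddr returns the address of the i-th entry (0-based), given the
// address of the first entry (the slice's data pointer).
func entryAddr(first, i, ptrSize uint64) uint64 {
	return first + i*entrySize(ptrSize)
}

func main() {
	fmt.Println(entrySize(8), entrySize(4))          // 48 32
	fmt.Printf("%#x\n", entryAddr(0x004ca138, 2, 8)) // 0x4ca198
}
```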

Extracting file names and content

Let's put together what we have learned so far and try extracting a file. (For now, let's do this manually to understand the process; a Python-based solution is made available at the end of this paper.)

We know the layout of the embed structure and we have the filename and file content pointers. These pointers are virtual addresses rather than offsets into the binary, so we cannot use them as-is: we must subtract the ELF base address from them to obtain the file offsets where the filename and file content are located.

Note: the ELF base address depends on the architecture's word size: for 64-bit binaries it's typically 0x400000, for 32-bit binaries 0x08048000.

For this demonstration, I'll pick the third entry in the embed structure.

Reading the filename:

  • The first 8 bytes are the pointer to the file name (underlined in red); they read as 0x004a9f09 (it's little endian, remember).

  • The next 8 bytes hold the length of the file name (underlined in blue); they read as 0x0e, which is 14 in decimal.

  • Extracting the name:

  • To calculate the actual offset in the file, we subtract the ELF base address from the pointer (0x004a9f09 - 0x400000 = 0xa9f09). Once we have the offset, we use dd to read 14 bytes (the length of the filename) starting from it.

Let's do the same thing for the file contents:

  • The pointer to the file contents is 0x004a89ef (underlined in pink), i.e. file offset 0xa89ef.

  • The length of the file contents is 0x06 (underlined in yellow).

  • Extracting the contents:

  • And now we have the file contents as well.

For obvious reasons this is a tedious process to perform manually, so I wrote a tool called Gembe that automates the extraction; you only have to supply the address of the embed structure.

Closing Thoughts

We examined the layout of the Go embed structure, how to identify the offsets of the embedded files, and how to extract them. It's also worth noting that the embed structure is defined by Go rather than by the platform, so apart from pointer size and byte order it doesn't change across platforms, and any extraction tooling we build can easily be adapted to many of them. The extraction utility I mentioned above (Gembe) still requires some manual effort (i.e. finding the address of the embed structure with a decompiler). It should be possible to automate this as well; my best guess is that something like Capstone could be used to achieve this.

References

  1. embed package documentation, Go Packages
  2. https://github.com/golang/go/blob/master/src/cmd/compile/internal/staticdata/embed.go#L107

Resources

  1. GitHub - messede-degod/Gembe: Tool to extract Go Embeds

Great post on my favorite language!


hello. I have a question regarding a previous post. Is there a way I can send you a DM? Please.
