The Next-Generation Universal Hlang compiler

1453 words, 6 minutes to read

In a world where simple tasks have hundreds of dependencies and most of them are not documented, everything falls to chaos. The monolithigarchy dictates that your build times must be slow so that They (the dependocracy) can win over your hearts and minds with video games that you play during your compile times. One person gets mad about their string padding library being used by corporations without paying and then the entire internet explodes for a few days. This is unsustainable.

hlang is the sledgehammer that will break down this complexity and deliver you a truly uncompromised development experience.

Numa is delet
<Numa>

You can't spell sledgehammer without h!

If none of this is making any sense, please read the rest of the series. This will hopefully help something make sense.

Numa is delet
<Numa>

If you need even more context, check this page for more information.

There was one major flaw with hlang in the past, though. It was a hollow shell of itself and had rotted under the slings and arrows of time. The playground stopped working, so people could not understand the sheer might of hlang by playing with it.

Lo, behold, a new compiler was born. In this article, I will describe the nguh compiler and how it revolutionizes the ways that you use hlang for both professional and personal uses.

Mara is wat
<Mara>

Wait, what, there were professional users of hlang???

Numa is delet
<Numa>

Having 2 years of hlang on your resume will let you get hired by Google!

The Old Compiler

The old compiler was a HACK. The main way it worked was by feeding the program source code as a string to this Go template:

(module
 (import "h" "h" (func $h (param i32)))
 (func $h_main
       (local i32 i32 i32)
       (local.set 0 (i32.const 10))
       (local.set 1 (i32.const 104))
       (local.set 2 (i32.const 39))
       {{ range . -}}
       {{ if eq . 32 -}}
       (call $h (get_local 0))
       {{ end -}}
       {{ if eq . 104 -}}
       (call $h (get_local 1))
       {{ end -}}
       {{ if eq . 39 -}}
       (call $h (get_local 2))
       {{ end -}}
       {{ end -}}
       (call $h (get_local 0))
 )
 (export "h" (func $h_main))
)

This template worked by taking the program input as a string and looping over each character to decide what to do. If it was a space, it would print a newline. If it was an h, it would print h. If it was a ', it would print a '. Anything else was ignored.
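As a rough sketch, the behavior the template encodes can be written as a few lines of Go. The interpret function below is a made-up name for illustration and was never part of the old compiler; it only mirrors the rules described above, including the final newline that the template always emits.

```go
package main

import (
	"fmt"
	"strings"
)

// interpret returns what the program generated by the old template would
// print for a given hlang source. Hypothetical helper, for illustration only.
func interpret(src string) string {
	var out strings.Builder
	for _, ch := range src {
		switch ch {
		case ' ':
			out.WriteByte('\n') // a space prints a newline
		case 'h':
			out.WriteByte('h')
		case '\'':
			out.WriteByte('\'')
		}
		// every other character falls through and is ignored
	}
	out.WriteByte('\n') // the template ends with one more newline
	return out.String()
}

func main() {
	fmt.Printf("%q\n", interpret("h h '")) // prints "h\nh\n'\n"
}
```

Note that nothing in this loop checks whether the input is a valid hlang program, which is the sense in which the parser was mostly ignored.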

However, this means that the parser was mostly ignored. And the parser spec compiles to 117 bytes when gzipped, which means that it can fit on a t-shirt.

Numa is delet
<Numa>

That's a savings of 0.8475%!

Additionally, this would then use the command wat2wasm to compile it to a WebAssembly file instead of doing it directly. This combined with the fact that the get_local instruction was renamed to local.get in the text format some time in the last 2 years means that not only was my compiler hacky, it didn't work anymore.

Mara is hacker
<Mara>

Apparently that was renamed before WASM hit 1.0 and the legacy name was an alias they planned to remove. Guess who didn't get the memo!

Needless to say, this could be fixed by doing a simple s/get_local/local\.get/g on the source file, but that's not fun. You know what's really fun? Reverse-engineering a binary file on stream and reassembling an identical replica in code. That's fun.

The nguh compiler

On December 31st, 2022, I wrote the nguh compiler on stream. The nguh (nguh gives u hlang or Next-Generation Universal Hlang compiler, whichever you prefer) compiler outputs WebAssembly bytecode directly instead of using wat2wasm as a middleman.

Mara is happy
<Mara>

This means that hlang has even fewer dependencies!

nguh is supposed to be pronounced with the final sound of -ing and uh smashed together. It is not phonetically valid in English. It will take some practice to say it correctly. I'm not sorry. If you can read IPA, it's pronounced /ŋə/. The name comes from the YouTuber Agma Schwa's show about conlangs named /ŋə/.

To help you understand the architecture of nguh, it will be helpful to get some context about how WebAssembly files work.

How WebAssembly files work

What is WebAssembly?

WebAssembly is a standard that specifies a way to run programs on arbitrary hardware in a sandboxed way. It is used mainly in web browsers, where it powers things like YouTube's player component and Twitch stream viewing, and where developers reach for it any time they need to put a block of code into a website without having to rewrite it in JavaScript.

I'm part of a slowly growing group of developers that want to run WebAssembly code on the server so that you can take the same .wasm file and run it on any hardware without having to have the source code and a working compiler setup.

hlang is compiled to WebAssembly for no reason in particular.

At a high level, a WebAssembly module has a bunch of sections in it. Each section contains information for things like what functions the module exports, the types of imported functions, how much memory the module needs, what should be in memory by default, and the function bodies for your code. Here's an annotated disassembly of a hlang binary:

0x00, 0x61, 0x73, 0x6d, // \0asm wasm magic number
0x01, 0x00, 0x00, 0x00, // version 1

0x01, // type section
0x08, // 8 bytes long
0x02, // 2 entries
0x60, 0x01, 0x7f, 0x00, // function type 0, 1 i32 param, 0 return
0x60, 0x00, 0x00, // function type 1, 0 param, 0 return

0x02, // import section
0x07, // 7 bytes long
0x01, // 1 entry
0x01, 0x68, // module h
0x01, 0x68, // name h
0x00, // import kind: function
0x00, // type index 0

0x03, // func section
0x02, // 2 bytes long
0x01, // 1 entry
0x01, // function 1 uses type 1

0x07, // export section
0x05, // 5 bytes long
0x01, // 1 entry
0x01, 0x68, // "h"
0x00, 0x01, // function 1

0x0a, // code section
0x1b, // 27 bytes long
0x01, // 1 entry
0x19, // 25 bytes long
0x01, // 1 local declaration
0x03, 0x7f, // 3 i32 values - (local i32 i32 i32)
0x41, 0x0a, // i32.const 10 (newline)
0x21, 0x00, // local.set 0
0x41, 0xe8, 0x00, // i32.const 104 (h)
0x21, 0x01, // local.set 1
0x41, 0x27, // i32.const 39 (')
0x21, 0x02, // local.set 2
0x20, 0x01, // local.get 1 push h
0x10, 0x00, // call 0 (putchar)
0x20, 0x00, // local.get 0 push newline
0x10, 0x00, // call 0 (putchar)
0x0b // end of function
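One detail in that listing is easy to trip over: i32.const 104 takes two bytes (0xe8, 0x00) while i32.const 10 and i32.const 39 take one. That is because the operand is a signed LEB128 integer, and 104 has bit 6 set, so a single byte would be read back as a negative number. Here is a minimal sketch of that encoding; appendSLEB128 is a name invented for this example, not something taken from nguh.

```go
package main

import "fmt"

// appendSLEB128 appends v to dst as a signed LEB128 integer, the encoding
// WebAssembly uses for the operand of i32.const. Hypothetical helper name.
func appendSLEB128(dst []byte, v int32) []byte {
	for {
		b := byte(v & 0x7f) // low 7 bits of the value
		v >>= 7             // arithmetic shift, so the sign is preserved
		// Stop once the remaining bits are pure sign extension and the
		// sign bit of this byte (0x40) agrees with them.
		if (v == 0 && b&0x40 == 0) || (v == -1 && b&0x40 != 0) {
			return append(dst, b)
		}
		dst = append(dst, b|0x80) // high bit set: more bytes follow
	}
}

func main() {
	for _, v := range []int32{10, 39, 104} {
		fmt.Printf("%d -> % x\n", v, appendSLEB128(nil, v))
	}
	// prints:
	// 10 -> 0a
	// 39 -> 27
	// 104 -> e8 00
}
```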

At a high level, nguh just takes all the needed sections and puts them in the target binary. Most of the sections are copied verbatim from that disassembly I pasted above because they don't need any modification for the binary to work.
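To make the "put the sections in the binary" step concrete, here is a sketch of how that could look in Go. It is not nguh's actual source: writeSection and assemble are hypothetical names, and the sketch assumes every section payload is shorter than 128 bytes so that each length fits in a single LEB128 byte. The section payloads themselves are copied from the disassembly above.

```go
package main

import (
	"bytes"
	"fmt"
)

// writeSection appends one section: an id byte, the payload length, and the
// payload. Lengths are unsigned LEB128 in WebAssembly; this sketch only
// accepts payloads under 128 bytes, where that encoding is a single byte.
func writeSection(buf *bytes.Buffer, id byte, payload []byte) error {
	if len(payload) > 0x7f {
		return fmt.Errorf("section %#x: %d-byte payload needs multi-byte LEB128", id, len(payload))
	}
	buf.WriteByte(id)
	buf.WriteByte(byte(len(payload)))
	buf.Write(payload)
	return nil
}

// assemble wraps one function body in the fixed sections from the disassembly.
func assemble(body []byte) ([]byte, error) {
	var buf bytes.Buffer
	buf.Write([]byte{0x00, 0x61, 0x73, 0x6d}) // \0asm magic number
	buf.Write([]byte{0x01, 0x00, 0x00, 0x00}) // version 1

	// Code section payload: 1 entry, the body size, then the body itself.
	code := append([]byte{0x01, byte(len(body))}, body...)

	sections := []struct {
		id      byte
		payload []byte
	}{
		{0x01, []byte{0x02, 0x60, 0x01, 0x7f, 0x00, 0x60, 0x00, 0x00}}, // types
		{0x02, []byte{0x01, 0x01, 0x68, 0x01, 0x68, 0x00, 0x00}},       // import h.h
		{0x03, []byte{0x01, 0x01}},                                     // function 1 uses type 1
		{0x07, []byte{0x01, 0x01, 0x68, 0x00, 0x01}},                   // export "h"
		{0x0a, code},                                                   // function bodies
	}
	for _, s := range sections {
		if err := writeSection(&buf, s.id, s.payload); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

func main() {
	// Function body copied from the disassembly: locals, setup, print h, newline, end.
	body := []byte{
		0x01, 0x03, 0x7f,
		0x41, 0x0a, 0x21, 0x00,
		0x41, 0xe8, 0x00, 0x21, 0x01,
		0x41, 0x27, 0x21, 0x02,
		0x20, 0x01, 0x10, 0x00,
		0x20, 0x00, 0x10, 0x00,
		0x0b,
	}
	mod, err := assemble(body)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%d bytes: % x\n", len(mod), mod)
}
```

Only the code section changes from program to program, which is why the other four payloads can live in the source as fixed byte slices.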

The exciting part happens when the individual nodes in the hlang syntax tree get compiled to WebAssembly bytecode. Each node in the tree optionally has a character to print and optionally has a list of child nodes. A syntax tree for hlang could look like this if it has one character in the program:

input: h
H("h")

Or it could look like this if there are multiple characters in the program:

input: h h h
H{
	"h",
	"h",
	"h",
}

This means I need something like this:

// compile AST to wasm
if len(tree.Kids) == 0 {
    if err := compileOneNode(funcBuf, tree); err != nil {
        return nil, err
    }
} else {
    for _, node := range tree.Kids {
        if err := compileOneNode(funcBuf, node); err != nil {
            return nil, err
        }
    }
}

This will either read from the root of the tree or all of the tree's children in order to compile the entire program. The compileOneNode function will turn the text associated with the node into the corresponding WASM bytecode (pushing the relevant character to the stack and then calling the h.h (putchar) function).
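For a sense of what compileOneNode might look like, here is a guess that runs on its own. The Node type, the compileTree wrapper, and the mapping from text to local index are assumptions made for this example, based on the locals set up in the disassembly (local 1 holds h, local 2 holds '). The real function may well handle more cases, such as emitting newlines between nodes.

```go
package main

import (
	"bytes"
	"fmt"
)

// Node is a hypothetical stand-in for the parser's syntax tree node.
type Node struct {
	Text string
	Kids []*Node
}

// compileOneNode emits "local.get N" followed by "call 0" for one node.
// Assumed mapping: local 1 holds 'h' and local 2 holds '\''.
func compileOneNode(w *bytes.Buffer, n *Node) error {
	var local byte
	switch n.Text {
	case "h":
		local = 1
	case "'":
		local = 2
	default:
		return fmt.Errorf("don't know how to compile %q", n.Text)
	}
	w.Write([]byte{
		0x20, local, // local.get N pushes the character
		0x10, 0x00, // call 0 (the imported h.h, i.e. putchar)
	})
	return nil
}

// compileTree wraps the loop from the article in a function of its own.
func compileTree(tree *Node) ([]byte, error) {
	funcBuf := &bytes.Buffer{}
	if len(tree.Kids) == 0 {
		if err := compileOneNode(funcBuf, tree); err != nil {
			return nil, err
		}
	} else {
		for _, node := range tree.Kids {
			if err := compileOneNode(funcBuf, node); err != nil {
				return nil, err
			}
		}
	}
	return funcBuf.Bytes(), nil
}

func main() {
	tree := &Node{Kids: []*Node{{Text: "h"}, {Text: "h"}, {Text: "h"}}}
	code, err := compileTree(tree)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("% x\n", code) // prints 20 01 10 00 20 01 10 00 20 01 10 00
}
```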

Finally, it will generate the end of the function, including a trailing newline, and end the .wasm file.

Mara is hacker
<Mara>

Fun fact: the generated binary for a hlang program that only prints h is 69 bytes.

Numa is delet
<Numa>

NICE!

Here is a base-64 encoded hlang binary in case you find this interesting:

AGFzbQEAAAABCAJgAX8AYAAAAgcBAWgBaAAAAwIB
AQcFAQFoAAEKHQEbAQN/QQohAEHoACEBQSchAiAB
EAAgABAAAQEL
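If you would rather poke at those bytes than squint at base64, a few lines of Go will decode them. This is just a convenience sketch: decodeModule is a name made up for the example, and the only checks it performs are the \0asm magic number and a minimum length.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"fmt"
	"strings"
)

// encoded is the base-64 text from the article, line breaks included.
const encoded = `AGFzbQEAAAABCAJgAX8AYAAAAgcBAWgBaAAAAwIB
AQcFAQFoAAEKHQEbAQN/QQohAEHoACEBQSchAiAB
EAAgABAAAQEL`

// decodeModule strips whitespace, decodes the base64, and checks that the
// result starts with the WebAssembly magic number. Hypothetical helper.
func decodeModule(s string) ([]byte, error) {
	compact := strings.Join(strings.Fields(s), "")
	bin, err := base64.StdEncoding.DecodeString(compact)
	if err != nil {
		return nil, fmt.Errorf("decoding base64: %w", err)
	}
	if len(bin) < 8 || !bytes.HasPrefix(bin, []byte("\x00asm")) {
		return nil, fmt.Errorf("not a wasm module: got %d bytes", len(bin))
	}
	return bin, nil
}

func main() {
	bin, err := decodeModule(encoded)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%d bytes, version %d\n", len(bin), bin[4]) // prints 69 bytes, version 1
}
```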

If you want to play with hlang, head to its new home at h.within.lgbt. If you want to witness things such as this being created live, follow me on Twitch or on my VTuber business account at @xe@vt.social.

Cadey is enby
<Cadey>

Happy new year to those that celebrate!


Facts and circumstances may have changed since publication. Please contact me before jumping to conclusions if something seems wrong or unclear.

Tags: hlang, wasm