The Next-Generation Universal Hlang compiler

Read time in minutes: 11

In a world where simple tasks have hundreds of dependencies and most of them are not documented, everything falls to chaos. The monolithigarchy dictates that your build times must be slow so that They (the dependocracy) can win over your hearts and minds with video games that you play during your compile times. One person gets mad about their string padding library being used by corporations without paying and then the entire internet explodes for a few days. This is unsustainable.

hlang is the sledgehammer that will break down this complexity and deliver you a truly uncompromised development experience.

Numa is delet
<Numa> You can't spell sledgehammer without h!

If none of this is making any sense, please read the rest of the series. This will hopefully help something make sense.

Numa is delet
<Numa> If you need even more context, check this page for more information.

There was one major flaw with hlang in the past though. It was a hollow shell of itself and had rotted under the slings and arrows of time. The playground stopped working, so people could not understand the sheer might of hlang by playing with it.

Lo, behold, a new compiler was born. In this article, I will describe the nguh compiler and how it revolutionizes the ways that you use hlang for both professional and personal uses.

Mara is wat
<Mara> Wait, what, there were professional users of hlang???

Numa is delet
<Numa> Having 2 years of hlang on your resume will let you get hired by Google!

The Old Compiler

The old compiler was a HACK. The main way it worked was by feeding the program source code as a string to this Go template:

(module
 (import "h" "h" (func $h (param i32)))
 (func $h_main
       (local i32 i32 i32)
       (local.set 0 (i32.const 10))
       (local.set 1 (i32.const 104))
       (local.set 2 (i32.const 39))
       {{ range . -}}
       {{ if eq . 32 -}}
       (call $h (get_local 0))
       {{ end -}}
       {{ if eq . 104 -}}
       (call $h (get_local 1))
       {{ end -}}
       {{ if eq . 39 -}}
       (call $h (get_local 2))
       {{ end -}}
       {{ end -}}
       (call $h (get_local 0))
 )
 (export "h" (func $h_main))
)

This template worked by taking the program input as a string and looping over each character to decide what to do. If it was a space, it would print a newline. If it was an h, it would print h. If it was a ', it would print a '. Anything else was ignored.
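To make that per-character mapping concrete, here is a minimal Go sketch of what a program produced by the template ends up printing. It is not the old compiler itself: the `interpret` function name is made up for illustration, and it simulates the output directly instead of generating WebAssembly text.

```go
package main

import (
	"fmt"
	"strings"
)

// interpret is a hypothetical helper that mimics the output of a program
// generated by the old template: space prints a newline, h prints h,
// ' prints ', and every other character is ignored.
func interpret(src string) string {
	var out strings.Builder
	for _, ch := range src {
		switch ch {
		case ' ':
			out.WriteByte('\n')
		case 'h':
			out.WriteByte('h')
		case '\'':
			out.WriteByte('\'')
		}
	}
	// The template always ends with one more call that prints a newline.
	out.WriteByte('\n')
	return out.String()
}

func main() {
	fmt.Print(interpret("h h h"))
}
```

Running it on `h h h` prints three lines that each contain a single h.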

However, this means that the parser was mostly ignored. And the parser spec compiles to 117 bytes when gzipped, which means that it can fit on a t-shirt.

Numa is delet
<Numa> That's a savings of 0.8475%!

Additionally, this would then use the command wat2wasm to compile it to a WebAssembly file instead of doing it directly. This combined with the fact that the get_local instruction was renamed to local.get in the text format some time in the last 2 years means that not only was my compiler hacky, it didn't work anymore.

Mara is hacker
<Mara> Apparently that was renamed before WASM hit 1.0 and the legacy name was an alias they planned to remove. Guess who didn't get the memo!

Needless to say, this could be fixed by doing a simple s/get_local/local\.get/g on the source file, but that's not fun. You know what's really fun? Reverse-engineering a binary file on stream and reassembling an identical replica in code. That's fun.

The nguh compiler

On December 31st, 2022, I wrote the nguh compiler on stream. The nguh (nguh gives u hlang or Next-Generation Universal Hlang compiler, whichever you prefer) compiler outputs WebAssembly bytecode directly instead of using wat2wasm as a middleman.

Mara is happy
<Mara> This means that hlang has even fewer dependencies!

nguh is supposed to be pronounced with the final sound of -ing and uh smashed together. It is not phonetically valid in English. It will take some practice to say it correctly. I'm not sorry. If you can read IPA, it's pronounced /ŋə/. The name comes from the youtuber Agma Schwa's show about conlangs named /ŋə/.

To help you understand the architecture of nguh, it will be helpful to get some context about how WebAssembly files work.

How WebAssembly files work

What is WebAssembly?

WebAssembly is a standard that specifies a way to run programs on arbitrary hardware in a sandboxed way. It is mainly used in web browsers, where it powers things like YouTube's player component and Twitch stream viewing, and where developers reach for it any time they need to put a block of code into a website without having to rewrite it in JavaScript.

I'm part of a slowly growing group of developers that want to run WebAssembly code on the server so that you can take the same .wasm file and run it on any hardware without having to have the source code and a working compiler setup.

hlang is compiled to WebAssembly for no reason in particular.

At a high level, a WebAssembly module has a bunch of sections in it. Each section contains information for things like what functions the module exports, the types of imported functions, how much memory the module needs, what should be in memory by default, and the function bodies for your code. Here's an annotated disassembly of a hlang binary:

0x00, 0x61, 0x73, 0x6d, // \0asm wasm magic number
0x01, 0x00, 0x00, 0x00, // version 1

0x01, // type section
0x08, // 8 bytes long
0x02, // 2 entries
0x60, 0x01, 0x7f, 0x00, // function type 0, 1 i32 param, 0 return
0x60, 0x00, 0x00, // function type 1, 0 param, 0 return

0x02, // import section
0x07, // 7 bytes long
0x01, // 1 entry
0x01, 0x68, // module h
0x01, 0x68, // name h
0x00, // import kind 0 (function)
0x00, // type index 0

0x03, // func section
0x02, // 2 bytes long
0x01, // 1 entry (function 1, since the import is function 0)
0x01, // type 1

0x07, // export section
0x05, // 5 bytes long
0x01, // 1 entry
0x01, 0x68, // "h"
0x00, 0x01, // function 1

0x0a, // code section
0x1b, // 27 bytes long
0x01, // 1 entry
0x19, // 25 bytes long
0x01, // 1 local declaration
0x03, 0x7f, // 3 i32 values - (local i32 i32 i32)
0x41, 0x0a, // i32.const 10 (newline)
0x21, 0x00, // local.set 0
0x41, 0xe8, 0x00, // i32.const 104 (h)
0x21, 0x01, // local.set 1
0x41, 0x27, // i32.const 39 (')
0x21, 0x02, // local.set 2
0x20, 0x01, // local.get 1 push h
0x10, 0x00, // call 0 (putchar)
0x20, 0x00, // local.get 0 push newline
0x10, 0x00, // call 0 (putchar)
0x0b // end of function
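
One detail in the disassembly that can look odd is that `i32.const 104` takes two bytes (`0xe8, 0x00`) while `i32.const 10` and `i32.const 39` take one. The operand of `i32.const` is a signed LEB128 integer, and 104 has bit 6 set, so a second byte is needed to keep it from being read as negative. The following is a small Go sketch of that encoding; the `sleb128` name is my own and is not taken from nguh.

```go
package main

import "fmt"

// sleb128 encodes v as a signed LEB128 integer, the encoding used for
// the operand of i32.const. The function name is illustrative only.
func sleb128(v int32) []byte {
	var out []byte
	for {
		b := byte(v & 0x7f)
		v >>= 7 // arithmetic shift, so negative values stay negative
		// Stop once the rest of the value is pure sign extension and the
		// sign bit of this 7-bit group (0x40) agrees with it.
		if (v == 0 && b&0x40 == 0) || (v == -1 && b&0x40 != 0) {
			return append(out, b)
		}
		out = append(out, b|0x80) // set the continuation bit
	}
}

func main() {
	fmt.Printf("% x\n", sleb128(10))  // 0a
	fmt.Printf("% x\n", sleb128(104)) // e8 00
	fmt.Printf("% x\n", sleb128(39))  // 27
}
```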

At a high level, nguh just takes all the needed sections and puts them in the target binary. Most of the sections are copied verbatim from that disassembly I pasted above because they don't need any modification for the binary to work.
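
As a rough illustration of what "putting a section in the binary" involves, every section in the disassembly is an ID byte, then the payload length as an unsigned LEB128 integer, then the payload. The sketch below assumes hypothetical helper names (`uleb128`, `section`); it is not how nguh is actually organized.

```go
package main

import "fmt"

// uleb128 encodes v as an unsigned LEB128 integer, which is how section
// lengths are stored. The name is illustrative only.
func uleb128(v uint32) []byte {
	var out []byte
	for {
		b := byte(v & 0x7f)
		v >>= 7
		if v == 0 {
			return append(out, b)
		}
		out = append(out, b|0x80) // more bytes follow
	}
}

// section wraps a payload with its section ID and length prefix.
func section(id byte, payload []byte) []byte {
	out := []byte{id}
	out = append(out, uleb128(uint32(len(payload)))...)
	return append(out, payload...)
}

func main() {
	// The func section from the disassembly: 1 entry that uses type 1.
	fmt.Printf("% x\n", section(0x03, []byte{0x01, 0x01})) // 03 02 01 01
}
```

The sections that never change can be stored as constant byte slices, which leaves the code section as the only one whose length has to be computed per program.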

The exciting part happens when the individual nodes in the hlang syntax tree get compiled to WebAssembly bytecode. Each node in the tree optionally has a character to print and optionally has a list of child nodes. A syntax tree for hlang could look like this if it has one character in the program:

input: h
H("h")

Or it could look like this if there are multiple characters in the program:

input: h h h
H{
	"h",
	"h",
	"h",
}

This means I need something like this:

// compile AST to wasm
if len(tree.Kids) == 0 {
    if err := compileOneNode(funcBuf, tree); err != nil {
        return nil, err
    }
} else {
    for _, node := range tree.Kids {
        if err := compileOneNode(funcBuf, node); err != nil {
            return nil, err
        }
    }
}

This will either read from the root of the tree or all of the tree's children in order to compile the entire program. The compileOneNode function will turn the text associated with the node into the corresponding WASM bytecode (pushing the relevant character to the stack and then calling the h.h (putchar) function).

Finally it will generate the end of the function including a trailing newline and end the .wasm file.
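
Here is a minimal sketch of how those two steps could fit together, assuming the characters live in locals 0 to 2 as in the disassembly above. The `Node` type, the `compileBody` helper, the opcode constants, and the use of `*bytes.Buffer` for funcBuf are assumptions made for illustration, not the real nguh source.

```go
package main

import (
	"bytes"
	"fmt"
)

// Node is a hypothetical stand-in for a node of the hlang syntax tree.
type Node struct {
	Text string
	Kids []*Node
}

const (
	opLocalGet = 0x20
	opCall     = 0x10
	opEnd      = 0x0b
)

// compileOneNode emits "local.get N; call 0" for the node's character.
// Local 1 holds h and local 2 holds ', matching the disassembly.
func compileOneNode(buf *bytes.Buffer, n *Node) error {
	var local byte
	switch n.Text {
	case "h":
		local = 1
	case "'":
		local = 2
	default:
		return fmt.Errorf("unknown node text %q", n.Text)
	}
	buf.Write([]byte{opLocalGet, local, opCall, 0x00})
	return nil
}

// compileBody compiles the root or its children, then appends the
// trailing newline (local 0) and the end-of-function opcode.
func compileBody(tree *Node) ([]byte, error) {
	buf := &bytes.Buffer{}
	nodes := tree.Kids
	if len(nodes) == 0 {
		nodes = []*Node{tree}
	}
	for _, n := range nodes {
		if err := compileOneNode(buf, n); err != nil {
			return nil, err
		}
	}
	buf.Write([]byte{opLocalGet, 0x00, opCall, 0x00, opEnd})
	return buf.Bytes(), nil
}

func main() {
	body, err := compileBody(&Node{Text: "h"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", body) // 20 01 10 00 20 00 10 00 0b
}
```

For the single-character program h, this produces the same instruction bytes as the tail of the annotated disassembly.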

Mara is hacker
<Mara> Fun fact: the generated binary for a hlang program that only prints h is 69 bytes.

Numa is delet
<Numa> NICE!

Here is a base-64 encoded hlang binary in case you find this interesting:

AGFzbQEAAAABCAJgAX8AYAAAAgcBAWgBaAAAAwIB
AQcFAQFoAAEKHQEbAQN/QQohAEHoACEBQSchAiAB
EAAgABAAAQEL
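
If you would like to inspect it yourself, the following Go sketch decodes the string with the standard library and prints the size along with the first four bytes, which should be the \0asm magic number. The `decodeModule` helper name is just for illustration.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encoded is the base-64 text from this article with line breaks removed.
const encoded = "AGFzbQEAAAABCAJgAX8AYAAAAgcBAWgBaAAAAwIB" +
	"AQcFAQFoAAEKHQEbAQN/QQohAEHoACEBQSchAiAB" +
	"EAAgABAAAQEL"

// decodeModule returns the raw bytes of the hlang binary.
func decodeModule() ([]byte, error) {
	return base64.StdEncoding.DecodeString(encoded)
}

func main() {
	wasm, err := decodeModule()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d bytes, magic % x\n", len(wasm), wasm[:4])
}
```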

If you want to play with hlang, head to its new home at h.within.lgbt. If you want to witness things such as this being created live, follow me on Twitch or on my VTuber business account at @xe@vt.social.

Cadey is enby
<Cadey> Happy new year to those that celebrate!


This post was written live on Twitch. You can check out the stream recording on Twitch and on YouTube. If you are reading this in the first day or so of this post being published, you will need to watch it on Twitch.

This article was posted on December 31, 2022. Facts and circumstances may have changed since publication. Please contact me before jumping to conclusions if something seems wrong or unclear.

Series: h

Tags: hlang wasm

The art for Mara was drawn by Selicre.

The art for Cadey was drawn by ArtZora Studios.

Some of the art for Aoi was drawn by @Sandra_Thomas01.