Morse encoder

The morse code encodes the latin alphabet comprised of 26 letters to a binary base. Traditionally, the destination set used dots and dashes, making the encoded letters easy to transmit as signals both over radio and wire. But any binary set of symbols can be used instead. For other languages with more than 26 characters, or for special characters, there is an extended set of morse binary combinations something we won’t cover here.

The encoding for the 26 latin letters is the following:
A.- B-… C-.-. D-.. E. F..-. G–. H…. I.. J.— K-.- L.-.. M– N-. O— P.–. Q–.- R.-. S… T- U..- V…- W.– X-..- Y-.– Z–..

It’s easy to spot that some sequences are repeated, and even more interesting, some combinations are parts of others. For instance D is part of B. This property allows us to represent the entire morse set of codes as a binary tree:

                       *
                   /       \
               E               T
             /   \           /   \
           I       A       N       M
          / \     / \     / \     / \ 
         S   U   R   W   D   K   G   O
        / \ / \ / \ / \ / \ / \ / \ / \ 
        H V F * L * P J B X C Y Z Q * *

To encode any given character to morse, all we need to do is parse this binary tree. We start from the top, one step to the left is a dot, while a step to the right is a dash. Star characters are invalid positions and are ignore. As an example, N is one step to the right (a dash) and one step to the left (a dot), that gives us -. which is correct.

A short code Morse encoder

We can get away with tables where we address one encoding to every character, but that’s just too redundant. Instead I suggest a different approach, one that is given on the binary tree observation above. First, we parse the tree in preorder: **ETIANMSURWDKGOHVF*L*PJBXCYZQ** . Index starts with 0. Left branches have even indexes, while right branches have the odd ones. For any given node, descendants are the next power of 2, and are also split in equal sizes corresponding to left branch and right branch. If you’re a programmer or you just love math, you’ll quick notice this is very similar to base 2 binary encoding. And indeed it is!
E has index 2, being on the second level after the root , which is a star character that we ignore. T has index 3, being the next odd number after 2. Putting this in binary we get: 2 = 10 while 3 = 11.
To continue this example, I is the first on the next level (next power of 2), index 4 which is 100 in binary. A has index 5, which is 101, N is on T’s branch but the same power of 2, and same level, index 6 = 110 and finally M has index 7 which translates to 111.
We need to discard the first digit, as the root is invalid and we don’t interpret the base level (and the first power of 2). Then the symbols become:
E=Index 2=0
T=Index 3=1
I=Index 4=00
A=Index 5=01 and so on.
Do I need to say that 0 is the dot and 1 is the dash in this new binary 0 and 1 morse encoding, or was it too obvious? 🙂

The algorithm

Start with the parsed binary tree sequence:

char *latin = "**ETIANMSURWDKGOHVF*L*PJBXCYZQ**"; //32

Write a simple function to find the index of any given letter. As a bonus implement case handling routine. The input letter is the letter we want to convert to Morse:

if (c >= 'a' && c <= 'z') c-=32; // convert to uppercase
	if (c < 'A' || c > 'Z') return;
	uint8_t i = 0;
	while (latin[++i] != c);

Convert the index to binary, while ignoring the first digit. BTW, if you go for linear div and mod routine in the well known conversion algorithm, at the end you’ll need to reverse your digits to get the correct result. Programmatically this means storing them. Alternatively, we’ll go for a recursive function to achieve that using the stack instead:

void conv(uint8_t decimal) {
	if (decimal) {
		conv(decimal/2);
		if (decimal!=1) decimal%2 ? printf("-"): printf(".");
	} 
}

That’s about it. If you’re using sounds as the Morse output, keep in mind that the dash has a duration three times longer than the dot, and after each letter, there is an end letter pause. To be more precise, the pseudocode is:
dot() { sound = 1; delay(T); sound = 0; delay(T); }
dash() {sound = 1; delay(3*T); sound = 0; delay(T); }
endletter() { delay(3*T); }

The code

Here’s the entire sequence for your convenience:

// Morse encoder , (C)2016 Radu Motisan , www.pocketmagic.net
#include 
#include 

const char *latin = "**ETIANMSURWDKGOHVF*L*PJBXCYZQ**"; //32

void dash() { printf("-"); }
void dot() { printf("."); }

void conv(uint8_t decimal) {
	if (decimal) {
		conv(decimal/2);
		if (decimal != 1) decimal%2 ? dash() : dot();	
	} 
}

void morse(char c) {
	if (c >= 'a' && c <= 'z') c -= 32; // convert to uppercase
	if (c < 'A' || c > 'Z') return;
	uint8_t i = 0;
	while (latin[++i] != c);
	conv(i);
}

int main() {
	morse('S');
	morse('O');
	morse('S');
	
	return 0;
}

And output for SOS is: …—… (keep that in mind, it might prove useful someday)
morse_encoder_pocketmagic

A quick application

You probably saw that I tried to keep the code size as low as possible, that is to make it better suited for microcontrollers. And it has some interesting applications, one being the ability to send messages from a device that has no screen. Many of us still do quick debug checks using LEDs or Speakers as a way to signal program breakpoints. But what about Morse?

A simple Android app that I randomly picked from the Store was able to decode the signals on the fly. It also supports light signals, so if you prefer the LED that can be done as well. The code to send the message is so simple:

	char *tmp = "it is so easy to debug via a simple speaker using morse";
	while (*tmp) {
		morse.encode(*tmp);
		tmp++;
	}

The relevant Morse routines are already included above. We’re getting very close to the way a modem works. Any of you remember the good-old dialup days? 🙂

More resources here.

Leave a Reply