Introduction

After wanting to work on an Open Source project for a long time, an opportunity finally presented itself. While browsing the available Google Summer of Code projects I landed on one with an interesting proposal: Sound Open Firmware was trying to implement a dynamic module loader for the DSPs found in modern Intel and AMD processors.

First of all, I want to thank my mentor Daniel Baluta for his amazing support and knowledge.

The issue with the original implementation was that the audio decoding routines for all supported codecs in the topology had to be statically linked, drastically increasing the DSP’s firmware binary size. We tried to solve this problem by dynamically loading the decoding libraries after the codec was initialised, just before it was required to start playing.

Setting up the hardware

When I joined the project I did not have the necessary hardware to test the changes, so I was sent an NXP i.MX8. Since this was the first time I had worked with embedded devices, setting up and booting this board was an adventure in and of itself :).

NOTE: This short guide for setting up the board is written for the Arch-based distribution Manjaro. Yea, I kinda use Arch btw :).

Physical setup

The main way of communicating with the board was the serial console exposed on one of its USB ports. After a bit of troubleshooting (one of the laptops I had did not properly recognize the USB device), I finally ended up at the U-Boot console. This is where a cheat sheet made by one of the previous GSoC students came in handy, and it ended up being the primary motivation for writing this setup guide :).
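
If you have never attached to a serial console before, here is a minimal sketch of how it can be done on Linux; the device name and baud rate are assumptions and may differ on your setup:

# Find the serial device the board registered as (often /dev/ttyUSB0)
dmesg | grep tty

# Attach to the console (screen is used here; picocom or minicom work just as well)
sudo screen /dev/ttyUSB0 115200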

TFTP server

The board uses TFTP to get the kernel image and the dtb file. Installing the server was as easy as:

sudo pacman -S tftp-hpa

Now we need to point the root of the TFTP share to the folder where the image and the dtb file will live, so let's create that folder first:

mkdir ~/<tftp-folder-name>

Now that we have the folder, we need to add it to the tftpd configuration file. The default configuration on Manjaro, at the time of writing, should look something like this:

TFTP_USERNAME="tftp"
TFTP_ADDRESS=":69"
TFTPD_ARGS="--secure /home/<your-user>/<tftp-folder-name>"

The last thing we need to do is to actually start the tftp daemon:

sudo systemctl start tftpd

We can check if the server started correctly using the following command:

sudo systemctl status tftpd

If we do not see any red text we are most likely fine :).

Now that our cool TFTP server is running we should check if it actually works as intended. Put any file in the directory you set in the config and then try to download it from your local machine using the tftp utility:

# Connect to the TFTP server
tftp localhost 69 # nice

# Download the file
tftp> get <filename>

If this works, congratulations! We now have a working TFTP server, and our board will be able to boot the kernel. It still won’t boot successfully, though, because we are kind of, maybe, missing the entire filesystem :^). After the obligatory self-congratulatory head pat we shall move on to fixing that tiny missing filesystem problem.

NFS server & rootfs

First of all we need to get a rootfs. NXP offers a rootfs and kernel images for our board, but they are locked behind a login, so getting the actual files is left as an exercise to the reader. The file for the i.MX8 is named imx-image-multimedia-imx8qxpc0mek.tar.bz2, quite catchy. With our rootfs in hand we proceed to extract it from the archive:

mkdir <rootfs-dir>
cd <rootfs-dir>
tar xf <rootfs-archive>

We now need to install the NFS server. On Manjaro we can achieve this by using the following command:

sudo pacman -S nfs-utils

After that, we need to add our rootfs folder to the /etc/exports file so that it can be mounted over the network.

# /etc/exports - exports(5) - directories exported to NFS clients
#
# Example for NFSv2 and NFSv3:
#  /srv/home        hostname1(rw,sync) hostname2(ro,sync)
# Example for NFSv4:
#  /srv/nfs4	    hostname1(rw,sync,fsid=0)
#  /srv/nfs4/home   hostname1(rw,sync,nohide)
# Using Kerberos and integrity checking:
#  /srv/nfs4        *(rw,sync,sec=krb5i,fsid=0)
#  /srv/nfs4/home   *(rw,sync,sec=krb5i,nohide)
#
# Use `exportfs -arv` to reload.
<rootfs-dir> *(rw,sync,no_root_squash,no_subtree_check)

Using * after our directory means that the exported directory will be available to ANY device that can reach our computer. If you do not want this, configure the board with a static IP later on and replace the * with that address.
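
For example, if the board ends up with the (hypothetical) address 192.168.1.50, the export line would be restricted like this:

<rootfs-dir> 192.168.1.50(rw,sync,no_root_squash,no_subtree_check)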

In order to make the changes take effect we need to use the command given to us in that helpful comment:

sudo exportfs -arv

And finally we can check if our NFS server works correctly:

sudo mount.nfs localhost:<rootfs-dir> <mount-point>

If you cd into <mount-point> you should see the contents of the <rootfs-dir> directory.

After another self-congratulatory head pat we can finally go over to the board and tell it where to find our juicy files.

U-Boot configuration

There were two ways of testing the changes we made to the firmware and kernel on the board:

  • Write the kernel, firmware and rootfs on the included SD card each time we made a change.

  • Boot the board over the network, with all of the necessary files served from the development computer, as explained above.

After a very brief discussion with my mentor we agreed that booting the board over the network was the fastest and least cumbersome way. Like most stuff until now, this requires some configuration :^).

We need to set some environment variables to tell the board what it needs to know in order to boot correctly and mount our remote filesystem, but first of all we need to connect the board to the network. This can be done in two ways:

  • Connect the board to your router

  • Connect the board directly to the PC

Both of these require a spare ethernet port on the router/PC. I have tested both methods and, at least for me, there does not seem to be a difference in transfer speeds, so from now on we will use the first method.

After the board is connected to the router we should make sure our PC has an IP address that does not change very often. This can be achieved from the router’s graphical interface. Figuring out how to do this is left as an exercise to the viewer :^). You should now be able to ping your PC from the board using the ping <ip> command.
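
As a quick sanity check from the U-Boot console, you can record the PC’s address and ping it; serverip is the same variable used later by netargs, and both addresses here are placeholders:

setenv serverip <your-pc-ip>   # the PC running the TFTP and NFS servers
setenv ipaddr <board-ip>       # the board's own address, if it does not already have one
ping <your-pc-ip>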

Time to set up the environment!

setenv nfsroot <rootfs-dir> # Absolute path to the rootfs directory
setenv image <kernel-image> # Relative path from the TFTP directory to the kernel image
setenv fdt_file <dtb-file>  # Relative path from the TFTP directory to the dtb file
setenv netargs 'setenv bootargs console=${console},${baudrate} ${smp} root=/dev/nfs ip=dhcp nfsroot=${serverip}:${nfsroot},v3,tcp'

We need to save these changes using saveenv. After that we can boot the board using run netboot, or set it to automatically boot over the network using setenv bootcmd run netboot. FINALLY, the board is booting; now on to the software side.
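
Put together, the last few commands at the U-Boot console look something like this:

saveenv                      # persist the environment variables
run netboot                  # boot over the network once
setenv bootcmd run netboot   # or make network boot the default...
saveenv                      # ...and persist that too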

Modifying the software

Dynamic loading

One of the biggest challenges we faced was trying to make the firmware work without having the required functions statically linked into it.

The first thing we tried was looking into ways the compiler might help us, because it can already load libraries and resolve symbols at runtime. Unfortunately, we hit a giant wall: the firmware runs on the DSP, which has no clue what a filesystem or a file is. This meant that we had to somehow interface with the Linux kernel running on the ARM processor.

Once we had found a way to achieve what we wanted, we opened a Pull Request with a mock-up implementation in order to receive some feedback.

Actual implementation

We noticed that the symbol for the decoding function is resolved during the initialisation of the codec, which is done at boot time. This was a big no-no for our cause, so we went on a journey to find out whether we could initialise the codec later. A suitable place was the cadence_codec_prepare function, which is called just before the actual decoding starts.

Because the decoding function is not used between cadence_codec_init and cadence_codec_prepare, we concluded that the API resolution could be moved into the latter.

The first step was to add a new topology codec id that would signal to the firmware that this topology wants the decoding functions to be resolved at runtime, rather than having them statically linked.

 enum cadence_api_id {
+	CADENCE_CODEC_UNRESOLVED	= 0x00,
  	CADENCE_CODEC_WRAPPER_ID	= 0x01,
 	CADENCE_CODEC_AAC_DEC_ID	= 0x02,
 	CADENCE_CODEC_BSAC_DEC_ID	= 0x03,
 	CADENCE_CODEC_DAB_DEC_ID	= 0x04,
 	CADENCE_CODEC_DRM_DEC_ID	= 0x05,
 	CADENCE_CODEC_MP3_DEC_ID	= 0x06,
 	CADENCE_CODEC_SBC_DEC_ID	= 0x07,
 };

And so, CADENCE_CODEC_UNRESOLVED was born.

The next step was to properly handle topologies that use the UNRESOLVED codec. In order to keep the code clean and avoid duplication, the stuff that is not strictly required during initialisation was moved from cadence_codec_init into a separate function, cadence_codec_post_init:

static int cadence_codec_post_init(struct comp_dev *dev, uint32_t api_id)
{
	int ret;
	struct codec_data *codec = comp_get_codec(dev);
	struct cadence_codec_data *cd = codec->private;
	uint32_t obj_size;

	/* Resolve codec API for api_id */
	ret = cadence_codec_api_resolve(codec, api_id);
	if (ret != LIB_NO_ERROR) {
		comp_err(dev, "cadence_codec_post_init(): failed to resolve api for api id %d",
			 api_id);
		goto out;
	}

	/* Obtain codec name */
	API_CALL(cd, XA_API_CMD_GET_LIB_ID_STRINGS,
		 XA_CMD_TYPE_LIB_NAME, cd->name, ret);
	if (ret != LIB_NO_ERROR) {
		comp_err(dev, "cadence_codec_post_init() error %x: failed to get lib name",
			 ret);
		codec_free_memory(dev, cd);
		goto out;
	}
	/* Get codec object size */
	API_CALL(cd, XA_API_CMD_GET_API_SIZE, 0, &obj_size, ret);
	if (ret != LIB_NO_ERROR) {
		comp_err(dev, "cadence_codec_post_init() error %x: failed to get lib object size",
			 ret);
		codec_free_memory(dev, cd);
		goto out;
	}
	/* Allocate space for codec object */
	cd->self = codec_allocate_memory(dev, obj_size, 0);
	if (!cd->self) {
		comp_err(dev, "cadence_codec_post_init(): failed to allocate space for lib object");
		codec_free_memory(dev, cd);
		goto out;
	} else {
		comp_dbg(dev, "cadence_codec_post_init(): allocated %d bytes for lib object",
			 obj_size);
	}
	/* Set all params to their default values */
	API_CALL(cd, XA_API_CMD_INIT, XA_CMD_TYPE_INIT_API_PRE_CONFIG_PARAMS,
		 NULL, ret);
	if (ret != LIB_NO_ERROR) {
		comp_err(dev, "cadence_codec_post_init(): error %x: failed to set default config",
			 ret);
		goto out;
	}

	comp_dbg(dev, "cadence_codec_post_init() done");
out:
	return ret;
}

The cadence_codec_init function now checks whether the topology’s codec id is CADENCE_CODEC_UNRESOLVED and, if so, delays the API resolution and codec initialisation until cadence_codec_prepare.

int cadence_codec_init(struct comp_dev *dev)
{
	int ret;
	struct codec_data *codec = comp_get_codec(dev);
	struct cadence_codec_data *cd = NULL;
	uint32_t api_id = CODEC_GET_API_ID(codec->id);

	comp_dbg(dev, "cadence_codec_init(): start, api_id = %d", api_id);

	cd = codec_allocate_memory(dev, sizeof(struct cadence_codec_data), 0);
	if (!cd) {
		comp_err(dev, "cadence_codec_init(): failed to allocate memory for cadence codec data");
		return -ENOMEM;
	}

	codec->private = cd;
	cd->self = NULL;
	cd->mem_tabs = NULL;
	cd->api = NULL;

	if (api_id == CADENCE_CODEC_UNRESOLVED) {
		comp_dbg(dev, "cadence_codec_init(): codec unresolved, delaying api assignment until cadence_codec_prepare");
		return LIB_NO_ERROR;
	}

	ret = cadence_codec_post_init(dev, api_id);
	if (ret != LIB_NO_ERROR)
		goto out;

	comp_dbg(dev, "cadence_codec_init(): done");
out:
	return ret;
}
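
For illustration, here is a minimal sketch of how the prepare side could pick up the delayed initialisation; this is not the code from the pull request, and the hard-coded MP3 fallback is just the POC behaviour described in the next paragraph:

static int cadence_codec_prepare(struct comp_dev *dev)
{
	struct codec_data *codec = comp_get_codec(dev);
	struct cadence_codec_data *cd = codec->private;
	int ret;

	/* cd->api is still NULL if cadence_codec_init() skipped the
	 * resolution for an unresolved codec, so finish it here.
	 */
	if (!cd->api) {
		/* POC fallback: assume the MP3 decoder until the kernel
		 * can tell us which library was actually loaded.
		 */
		ret = cadence_codec_post_init(dev, CADENCE_CODEC_MP3_DEC_ID);
		if (ret != LIB_NO_ERROR)
			return ret;
	}

	/* ... the rest of the original prepare sequence goes here ... */
	return 0;
}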

Because this was just a preliminary implementation, and some of the features we required from the kernel side were not upstream yet, we decided that, for now, the firmware would assume that an unresolved API is always the MP3 API, as a POC. This was submitted as a pull request. The use of a default codec was not really agreed upon, and was maybe a bit of a mishap on our side, but we kept our heads up and decided to start working on the missing kernel features. But first… a message from our sponsor, Raid Shadow Shoddy M4 Scripts.

“Fixing” a variable leak in topology generation

Before we figured out exactly what to do to get the kernel patches upstream, my mentor pointed me to some simple issues that needed fixing. One of them was that the definition of the macro CHANNELS_MIN in one of the topology generation m4 scripts was leaking, because it was never undefined. By some miracle this was not causing any issues at the time, but it still required immediate fixing.

This is how the patch mostly looked:

- ifdef(`CHANNELS_MIN',`',
- `define(CHANNELS_MIN, `PIPELINE_CHANNELS')')
+ ifdef(`CHANNELS_MIN',`define(`LOCAL_CHANNELS_MIN', `CHANNELS_MIN')',
+ `define(`LOCAL_CHANNELS_MIN' `PIPELINE_CHANNELS')')

What could go wrong when editing two lines of code? Forgetting a comma :^). Not even a day later I received some e-mail notifications about being mentioned on GitHub. Another great move by the great Potochi. That was a nice mishap; let’s move on.

Making Intel IPC stream operations more generic

While we were waiting on feedback for the dynamic loading pull request, we decided to touch up some of Daniel’s pull requests and get them merged into the SOF Linux kernel fork.

One of them aimed to make the operations for reading from and writing to the mailboxes shared between the DSP and the main CPU more generic, creating a common set of operations that every platform can customise.

Here are the function pointers that were added:

void (*mailbox_read)(struct snd_sof_dev *sof_dev, u32 offset, void *dest, size_t size);
void (*mailbox_write)(struct snd_sof_dev *sof_dev, u32 offset, void *src, size_t size);

Because these callbacks are not mandatory, another layer of abstraction was put in place in the form of two wrapper functions that check whether the callback is actually set and, if so, call it. If it is not set, they simply return.

static inline void snd_sof_dsp_mailbox_read(struct snd_sof_dev *sdev,
	u32 offset, void *dest, size_t bytes)
{
	if (sof_ops(sdev)->mailbox_read)
		sof_ops(sdev)->mailbox_read(sdev, offset, dest, bytes);
}

static inline void snd_sof_dsp_mailbox_write(struct snd_sof_dev *sdev,
	u32 offset, void *src, size_t bytes)
{
	if (sof_ops(sdev)->mailbox_write)
		sof_ops(sdev)->mailbox_write(sdev, offset, src, bytes);
}

These functions are now shared between Intel and i.MX platforms. We hope that AMD will join us soon! :^)

Introducing fragment elapsed notification API

Another step towards dynamically loading decompression modules was being able to play compressed audio in the first place. First, we needed a way to signal when the DSP is ready to accept new fragments from the application. The helpers below hook the compressed stream into the same deferred-work mechanism already used for PCM:

#include <sound/soc.h>
#include <sound/sof.h>
#include <sound/compress_driver.h>
#include "sof-audio.h"
#include "sof-priv.h"

static void snd_sof_compr_fragment_elapsed_work(struct work_struct *work)
{
	struct snd_sof_pcm_stream *sps =
		container_of(work, struct snd_sof_pcm_stream,
			     period_elapsed_work);

	snd_compr_fragment_elapsed(sps->cstream);
}

void snd_sof_compr_init_elapsed_work(struct work_struct *work)
{
	INIT_WORK(work, snd_sof_compr_fragment_elapsed_work);
}

/*
 * sof compr fragment elapse, this could be called in irq thread context
 */
void snd_sof_compr_fragment_elapsed(struct snd_compr_stream *cstream)
{
	struct snd_soc_component *component;
	struct snd_soc_pcm_runtime *rtd;
	struct snd_sof_pcm *spcm;

	if (!cstream)
		return;

	rtd = cstream->private_data;
	component = snd_soc_rtdcom_lookup(rtd, SOF_AUDIO_PCM_DRV_NAME);

	spcm = snd_sof_find_spcm_dai(component, rtd);
	if (!spcm) {
		dev_err(component->dev,
			"fragment elapsed called for unknown stream!\n");
		return;
	}

	/* use the same workqueue-based solution as for PCM, cf. snd_sof_pcm_elapsed */
	schedule_work(&spcm->stream[cstream->direction].period_elapsed_work);
}

Final conclusions

I thoroughly enjoyed my time working on this project, and I feel that it has massively improved my abilities as a developer. I finally learned how to properly use git, after a few years of thinking I did not need it because most of my projects were developed only by myself, and working with embedded Linux for the first time has been a blast, configuring everything from the bootloader to the kernel itself.

I want to thank my awesome mentor, Daniel Baluta, again for always being there when I needed him and for being patient with the little dum dum that I was :).

Even though we did not meet the primary goal of the proposal, I feel that we set up some of the foundations needed for it, and I cannot wait to finish it after GSoC is over.

Here is a list of all the pull requests:

SOF Linux Kernel

SOF Project