Concept

This module focuses on the fundamental principles of sound synthesis algorithms in C++, covering paradigms like subtractive synthesis, additive synthesis, physical modeling, distortion methods and processed recording. The theory and background of these approaches are covered in the Sound Synthesis Introduction.

The concept is based on Linux audio systems as development and runtime systems (von Coler & Runge, 2017). Using Raspberry Pis, classes can be supplied with an ultra-low-cost computer pool, resolving the compatibility issues of individual systems. In addition, the single-board computers can be integrated into embedded projects for actual hardware instruments. Participants can also install Linux systems on their own hardware for increased performance.

Only a few software libraries are part of the system used in this class, taking care of audio input and output, communication (OSC, MIDI), configuration, and audio file processing. This minimal framework allows focusing on the actual implementation of the algorithms on a sample-by-sample level, without relying on extensive higher-level abstractions.


Although the concept of this class has its advantages, there are alternatives with their own benefits. A variety of frameworks can be considered for implementing sound synthesis paradigms and building digital musical instruments in C/C++. The JUCE framework allows the compilation of 'desktop and mobile applications, including VST, VST3, AU, AUv3, RTAS and AAX audio plug-ins'. It comes with many helpful features and can be used to create DAW-ready software components. Environments like Puredata or SuperCollider come with APIs for programming user extensions. The resulting software components can easily be integrated into existing projects.


References

2017

  • Henrik von Coler and David Runge. Teaching Sound Synthesis in C/C++ on the Raspberry Pi. In Proceedings of the Linux Audio Conference. 2017.

The JACK API

All examples in this class are implemented as JACK clients. Audio input and output are thus based on the JACK audio API. The JACK framework handles most of the device and session management and offers a quick entry point for programmers. Professional Linux audio systems are usually based on JACK servers, allowing the flexible connection of different software components. Read more in the JACK section of the Computer Music Basics.
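
The following is a condensed sketch of such a pass-through client, loosely following the structure of JACK's simple_client example; the client name and port names are chosen freely here, and all error handling is omitted:

#include <jack/jack.h>
#include <unistd.h>

jack_port_t *in_port;
jack_port_t *out_port;

// Process callback: called by the JACK server for every block of audio.
int process(jack_nframes_t nframes, void *arg)
{
    auto *in  = (jack_default_audio_sample_t *) jack_port_get_buffer(in_port, nframes);
    auto *out = (jack_default_audio_sample_t *) jack_port_get_buffer(out_port, nframes);

    // Copy input to output, sample by sample.
    for (jack_nframes_t i = 0; i < nframes; i++)
        out[i] = in[i];

    return 0;
}

int main()
{
    jack_client_t *client = jack_client_open("through", JackNullOption, nullptr);

    jack_set_process_callback(client, process, nullptr);

    in_port  = jack_port_register(client, "in", JACK_DEFAULT_AUDIO_TYPE, JackPortIsInput, 0);
    out_port = jack_port_register(client, "out", JACK_DEFAULT_AUDIO_TYPE, JackPortIsOutput, 0);

    jack_activate(client);

    // Keep the client running until the process is terminated.
    while (true)
        sleep(1);

    return 0;
}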


The ThroughExample

The ThroughExample is a slightly adapted version of the Simple Client. It wraps the same functionality into a C++ class, adding multi-channel capabilities.


Main

The file main.cpp creates an instance of the ThroughExample class. Command line arguments are not evaluated, and the object is created without constructor arguments:

ThroughExample *t = new ThroughExample();

Member Variables

jack_client_t   *client;

The pointer to a JACK client is needed for connecting this piece of software to the JACK server.
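
Since the JACK API expects a plain function as process callback, C++ classes like the ThroughExample typically register a static trampoline function and pass this as the user argument. The following sketch shows this common pattern; it is an illustration of the idiom, not necessarily the literal class code:

static int callback_process(jack_nframes_t nframes, void *arg)
{
    // Forward the callback to the member function of the actual object.
    return static_cast<ThroughExample *>(arg)->process(nframes);
}

// Inside the constructor, the trampoline is registered with 'this' as argument:
// jack_set_process_callback(client, callback_process, this);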

The MIDI Protocol

The MIDI protocol was released in 1982 as a means for connecting electronic musical instruments. The first synthesizers to feature the new technology were the Prophet-600 and the Jupiter-6. Although limited in resolution from today's point of view, it is still a standard for conventional applications, yet to be replaced by the recently released MIDI 2.0. Apart from rare mismatches and some limitations, MIDI devices can be connected without complications. Physically, MIDI was introduced with the still widespread 5-pin DIN connector, shown below. On recent devices, MIDI is usually transmitted via USB.

MIDI jack (5-pin DIN).



Standard MIDI Messages

MIDI transmits binary coded messages at a rate of $31250\ \mathrm{bit/s}$. Timing and latency are thus not a problem when working with MIDI. However, the resolution of control values can be a limiting factor. Standard MIDI messages consist of three bytes, namely one status byte (first bit green) and two data bytes (first bit red). The first bit declares the byte either a status byte (1) or a data byte (0).

/images/basics/midi-message.png

Standard MIDI message with three bytes.


Some of the most common messages are listed in the table below. Since one bit is used as the status/data identifier, 7 bits are left for encoding. This results in the typical MIDI resolution of \(2^7 = 128\) values for pitch, velocity or control changes.

Voice Message           Status Byte      Data Byte1          Data Byte2
-------------           -----------   -----------------   -----------------
Note off                      8x      Key number          Note Off velocity
Note on                       9x      Key number          Note on velocity
Polyphonic Key Pressure       Ax      Key number          Amount of pressure
Control Change                Bx      Controller number   Controller value
Program Change                Cx      Program number      None
Channel Pressure              Dx      Pressure value      None
Pitch Bend                    Ex      LSB                 MSB
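
As an illustration, the following hypothetical helper decodes a raw three-byte voice message according to the table above, splitting the status byte into message type and channel (running status is not handled):

#include <cstdint>
#include <cstdio>

void decode_midi(uint8_t status, uint8_t data1, uint8_t data2)
{
    if ((status & 0x80) == 0)
        return;                          // first byte is not a status byte

    uint8_t type    = status >> 4;       // upper nibble: message type (0x8 ... 0xE)
    uint8_t channel = status & 0x0F;     // lower nibble: MIDI channel (0 ... 15)

    switch (type)
    {
        case 0x8: printf("Note off: ch=%d key=%d vel=%d\n", channel, data1, data2); break;
        case 0x9: printf("Note on:  ch=%d key=%d vel=%d\n", channel, data1, data2); break;
        case 0xB: printf("Control change: ch=%d ctrl=%d val=%d\n", channel, data1, data2); break;
        default:  break;                 // remaining message types omitted for brevity
    }
}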

Pitch Bend

If you are stuck with MIDI for some reason but need a higher resolution, the Pitch Bend parameter can help. Each MIDI channel has one Pitch Bend, whose two data bytes are combined, resulting in a resolution of \(128^2 = 16384\) steps.
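
A minimal sketch of combining the two 7-bit data bytes into one 14-bit value (the first data byte carries the LSB, the second the MSB, as in the table above):

#include <cstdint>

int pitch_bend_value(uint8_t lsb, uint8_t msb)
{
    // Result ranges from 0 to 16383; 8192 means no bend.
    return ((msb & 0x7F) << 7) | (lsb & 0x7F);
}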


System Exclusive

SysEx messages can be freely defined by manufacturers. They are often used for dumping or loading settings and presets, but can also be used for arbitrary control purposes. SysEx messages can have any length and their content is not standardized.
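
Only the framing is fixed by the MIDI specification: a SysEx message starts with the status byte 0xF0 and is terminated by 0xF7. A minimal sketch for collecting the payload from a raw byte stream:

#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<uint8_t> read_sysex(const std::vector<uint8_t> &stream, std::size_t start)
{
    std::vector<uint8_t> payload;

    if (stream[start] != 0xF0)
        return payload;                  // not the start of a SysEx message

    for (std::size_t i = start + 1; i < stream.size(); i++)
    {
        if (stream[i] == 0xF7)
            break;                       // end-of-exclusive byte
        payload.push_back(stream[i]);
    }

    return payload;
}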


MIDI Note to Hertz

When working with MIDI, a conversion from MIDI pitch to Hertz is often necessary. There are two simple formulas for the conversion in both directions. They both refer to the MIDI pitch of 69, which corresponds to a frequency of 440 Hz:

\begin{equation*} f[\mathrm{Hz}] = 440 \cdot 2^{\frac{\mathrm{MIDI}-69}{12}} \end{equation*}
\begin{equation*} \mathrm{MIDI} = 69 + 12 \log_2 \left( \frac{f}{440\ \mathrm{Hz}} \right) \end{equation*}
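
The formulas translate directly into two small helper functions:

#include <cmath>

double midi_to_hz(double midi)
{
    return 440.0 * std::pow(2.0, (midi - 69.0) / 12.0);
}

double hz_to_midi(double f)
{
    return 69.0 + 12.0 * std::log2(f / 440.0);
}

For example, midi_to_hz(60) yields approximately 261.63 Hz, the middle C.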

Getting Started with SuperCollider

SuperCollider (SC) is a server-client-based tool for sound synthesis and composition. SC was started by James McCartney in 1996 and has been free software since 2002. It can be used on Mac, Linux and Windows systems and comes with a large collection of community-developed extensions. The client-server principle aims at live coding and makes it a powerful tool for distributed and embedded systems, allowing full remote control of synthesis processes.

There are many ways of approaching SuperCollider, depending on the intended use case. Some tutorials focus on sequencing, others on live coding or sound design. This introduction aims at programming remotely controlled synthesis and processing servers, which involves signal routing and OSC capabilities.


Getting SC

Binaries, source code and build or installation instructions can be found at the SC GitHub site. If possible, it is recommended to build the latest version from the repository:

https://supercollider.github.io/download

SuperCollider comes with a large bundle of help files and code examples, but first steps are usually not easy. There are a lot of very helpful additional resources, providing step-by-step introductions.

Code snippets in this example are taken from the accompanying repository: SC Example. You can simply copy and paste them into your editor.


SC Trinity

SuperCollider is based on a client-server paradigm. The server runs the actual audio processing, whereas clients are used to control the server processes via OSC messages. Multiple clients can connect to a running server. The dedicated ScIDE offers convenient features for live coding and project management:

/images/basics/supercollider-components.png

Server, client and ScIDE.


sclang

sclang is the SuperCollider language. It represents the client side when working with SC. It can, for example, be started in a terminal by running:

$ sclang

Just as with other interpreted languages, such as Python, the terminal will then change into sclang mode. At this point, the class library is compiled, making all SC classes executable. Afterwards, SC commands can be entered:

sc3>  postln("Hello World!")

ScIDE

Working with SC in the terminal is rather inconvenient. The SuperCollider IDE (ScIDE) is the environment for live coding in sclang, allowing the control of the SuperCollider language:

/images/basics/scide.thumbnail.png

ScIDE


When booting the ScIDE, it automatically launches sclang and is then ready to interpret. Files opened in the IDE can be executed as a whole. Moreover, single blocks or even single lines can be evaluated, which is especially handy for live coding, when exploring possibilities or prototyping. In addition, the IDE features tools for monitoring various server properties.


Some Language Details

Variable Names

Global variables are either single letters - s is reserved for the default server - or start with a tilde (~varname). Local variables, used in functions, need to be defined explicitly:

var foo;

Evaluating Selections

Some of the examples in the SC section of this class are in the repository, whereas others only exist as snippets on these pages. In general, all these examples can be explored by copy-pasting the code blocks from the pages into the ScIDE. They can then be evaluated in blocks or line-wise, but cannot be executed as complete files. This is caused by the problem of synchronous vs asynchronous processes, which is explained later: Synchronous vs Asynchronous

The following features help to run code in the ScIDE step by step:

  • Individual sections of code can be evaluated by selecting them and pressing Control + Enter.

  • Single lines of code can be evaluated by placing the cursor in the line and pressing Shift + Enter.


Parentheses

Parentheses can help to structure SC code for live programming. Placing the cursor inside a region between parentheses and pressing Control + Enter evaluates the enclosed code.

(
      post('Hello ');
      postln('World!');
)

NIME 2020: Spatialization

Virtual Source Model

Spectral spatialization in this system is based on a virtual sound source with a position in space and a spatial extent, as shown in [Fig.1]. The source center is defined by two angles (Azimuth, Elevation) and the Distance. The Spread defines the diameter of the virtual source. This model is compliant with many theoretical frameworks from the fields of electroacoustic music and virtual acoustics.
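
As a minimal sketch, the model can be captured in a small parameter set; the names are chosen here for illustration and are not the actual interface of the system:

struct VirtualSource
{
    float azimuth;    // horizontal angle of the source center
    float elevation;  // vertical angle of the source center
    float distance;   // distance of the source center from the listener
    float spread;     // diameter of the virtual source
};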

/images/NIME_2020/source_in_space.png
Fig.1

Virtual sound source with position in space and spatial extent.


Point Cloud Realization

The virtual source from [Fig.1] is realized as a cloud of point sources in an Ambisonics system using the IRCAM software Panoramix. Twenty-four point sources can be controlled jointly. The following figures show the viewer of Panoramix, with the left half representing the top view and the right half the rear view.


[Fig.2] shows a dense point cloud of a confined virtual sound source without elevation:

/images/NIME_2020/panoramix_confined.png
Fig.2

Confined virtual sound source.


The virtual sound source in [Fig.3] has a wider spread and is elevated:

/images/NIME_2020/panoramix_spread.png
Fig.3

Spread virtual sound source with elevation.


For small distances and large spreads, the source envelops the listener, as shown in [Fig.4]:

/images/NIME_2020/panoramix_enveloping.png
Fig.4

Enveloping virtual sound source.


Dispersion

In a nutshell, the synthesizer distributes the spectral components of a violin sound to 24 individual outputs. Different ways of assigning spectral content to the outputs are possible, shown as Partial to Source Mapping in [Fig.5]. In these experiments, each output represents a Bark scale frequency band. For the point cloud shown above, the distribution of spectral content is thus neither homogeneous nor stationary.
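
One way such a band assignment could be realized is sketched below, using Traunmüller's approximation of the Bark scale; this is an assumed illustration, not the actual code of the system:

#include <algorithm>
#include <cmath>

int bark_band(double frequency)
{
    // Traunmueller's approximation of the Bark scale, roughly 0 ... 24 over the audible range.
    double z = 26.81 * frequency / (1960.0 + frequency) - 0.53;

    // Truncate to a band index and map it to one of the 24 outputs (0 ... 23).
    return std::clamp(static_cast<int>(z), 0, 23);
}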

/images/NIME_2020/dispersion.png
Fig.5

Dispersion - routing partials to point sources.



NIME 2020: Mapping

Extended DMI Model

The typical DMI model connects the musical interface with the sound generation through a mapping stage. [Fig.1] shows the extended DMI model for spatial sound synthesis. The joint control of spatial and timbral characteristics offers new possibilities, yet makes the mapping and the resulting control more complex.

/images/NIME_2020/mapping_dmi.png
Fig.1

Mapping in the extended DMI model.


Mapping in Puredata

We chose Puredata as a graphical interface for mapping controller parameters to sound synthesis and spatialization. Especially in the early stages of development, this solution offers maximum flexibility. [Fig.2] shows the mapping GUI as it was used by the participants in the mapping study:

/images/NIME_2020/patching.png
Fig.2

Puredata patch for user-defined mappings.



NIME 2020: User Study

User-defined Mappings

In the first stage of the user study, participants had 30 minutes to create their own mapping, following this basic instruction:

The objective of this part is to create an enjoyable mapping, which offers the most expressive control over all synthesis and spatialization parameters.

A set of rules allowed one-to-many mappings and excluded many-to-one mappings (see the sketch after this list):

  • Every rendering parameter of synthesis and spatialization must be influenced through the mapping.

  • Control parameters may remain unconnected.

  • A single control parameter may be mapped to multiple synthesizer or spatialization parameters.

  • A synthesis or spatialization parameter must not have more than one control parameter connected to its input.
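
The rules amount to a simple fan-in constraint on the rendering parameters. The following sketch checks it for a mapping given as a list of connections; all names are assumptions for illustration, not the study's actual software:

#include <map>
#include <string>
#include <utility>
#include <vector>

// A mapping is a list of connections {control parameter, rendering parameter}.
using Mapping = std::vector<std::pair<std::string, std::string>>;

bool is_valid(const Mapping &mapping, const std::vector<std::string> &rendering_params)
{
    // Count the incoming connections of every rendering parameter.
    std::map<std::string, int> fan_in;
    for (const auto &connection : mapping)
        fan_in[connection.second]++;

    // Every rendering parameter needs exactly one control input:
    // zero violates the first rule, more than one violates the last rule.
    for (const auto &param : rendering_params)
        if (fan_in[param] != 1)
            return false;

    return true;
}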


Mapping Frequencies

Considering the final mappings of all participants, the mapping matrix shows that certain control parameters are preferred for specific tasks:

/images/NIME_2020/matrix.png
Fig.1

Mapping matrix: how often was a control parameter mapped to a specific rendering parameter?



Concatenative: Crowd Noise Synthesis

Two master's theses in collaboration between the Audiocommunication Group and IRCAM aimed at a parametric synthesis of crowd noises, more precisely of many people speaking simultaneously (Grimaldi, 2016; Knörzer, 2017). Using a concatenative approach, the resulting synthesis system can be used to dynamically change the affective state of the virtual crowd. The resulting algorithm was applied in user studies in virtual acoustic environments.

Recordings

The corpus of speech was gathered in two group sessions, each with five persons, in the anechoic chamber at TU Berlin. For each speaker, the recording was annotated into regions of different valence and arousal and then automatically segmented into syllables.

Features

/images/Sound_Synthesis/concatenative/valence_arousal_1.png

Synthesis

The following example synthesizes a crowd with a valence of -90 and an arousal of 80, which can be categorized as frustrated, annoyed or upset. No virtual acoustic environment is used, and the result is rather direct.


References

2017

  • Vincent Grimaldi, Christoph Böhm, Stefan Weinzierl, and Henrik von Coler. Parametric Synthesis of Crowd Noises in Virtual Acoustic Environments. In Proceedings of the 142nd Audio Engineering Society Convention. Audio Engineering Society, 2017.
  • Christian Knörzer. Concatenative crowd noise synthesis. Master's thesis, TU Berlin, 2017.

2006

  • Diemo Schwarz. Concatenative sound synthesis: The early years. Journal of New Music Research, 35(1):3–22, 2006.
  • Diemo Schwarz, Grégory Beller, Bruno Verbrugghe, and Sam Britton. Real-Time Corpus-Based Concatenative Synthesis with CataRT. In Proceedings of the International Conference on Digital Audio Effects (DAFx). 2006.

2000

  • Diemo Schwarz. A System for Data-Driven Concatenative Sound Synthesis. In Proceedings of the COST-G6 Conference on Digital Audio Effects (DAFx-00). Verona, Italy, 2000.

1989

  • C. Hamon, E. Mouline, and F. Charpentier. A diphone synthesis system based on time-domain prosodic modifications of speech. In International Conference on Acoustics, Speech, and Signal Processing, 238–241 vol. 1. May 1989. doi:10.1109/ICASSP.1989.266409.

1986

  • F. Charpentier and M. Stella. Diphone synthesis using an overlap-add technique for speech waveforms concatenation. In ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 11, 2015–2018. April 1986. doi:10.1109/ICASSP.1986.1168657.

Concatenative: Introduction

Concatenative synthesis is an evolution of granular synthesis, first introduced in the context of speech synthesis and processing (Charpentier, 1986; Hamon, 1989).

Concatenative synthesis for musical applications was introduced by Diemo Schwarz. Corpus-based concatenative synthesis (Schwarz, 2000; Schwarz, 2006) splices audio recordings into units and calculates audio features for each unit. During synthesis, unit selection is performed by navigating the multidimensional feature space, and the selected units are concatenated.
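
As a rough illustration of the selection step, the following sketch picks the unit whose feature vector is closest to a target point in the feature space. Systems like CataRT use more elaborate distance measures and concatenation costs; the names and structure here are assumptions:

#include <cmath>
#include <cstddef>
#include <vector>

// A unit: a slice of the source recording plus its feature vector
// (e.g. pitch, loudness, spectral centroid).
struct Unit
{
    std::size_t start;
    std::size_t length;
    std::vector<double> features;
};

// Return the index of the unit with the smallest Euclidean distance
// to the target feature vector (all feature vectors have equal length).
std::size_t select_unit(const std::vector<Unit> &corpus, const std::vector<double> &target)
{
    std::size_t best = 0;
    double best_distance = INFINITY;

    for (std::size_t i = 0; i < corpus.size(); i++)
    {
        double distance = 0.0;
        for (std::size_t k = 0; k < target.size(); k++)
        {
            double diff = corpus[i].features[k] - target[k];
            distance += diff * diff;
        }
        if (distance < best_distance)
        {
            best_distance = distance;
            best = i;
        }
    }

    return best;
}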

/images/Sound_Synthesis/concatenative/concatenative-flow-1.png
Fig.1

(Schwarz, 2006)


/images/Sound_Synthesis/concatenative/concatenative-flow-2.png
Fig.2

(Schwarz, 2006)


References

2006

  • Diemo Schwarz. Concatenative sound synthesis: The early years. Journal of New Music Research, 35(1):3–22, 2006.
  • Diemo Schwarz, Grégory Beller, Bruno Verbrugghe, and Sam Britton. Real-Time Corpus-Based Concatenative Synthesis with CataRT. In Proceedings of the International Conference on Digital Audio Effects (DAFx). 2006.

2000

  • Diemo Schwarz. A System for Data-Driven Concatenative Sound Synthesis. In Proceedings of the COST-G6 Conference on Digital Audio Effects (DAFx-00). Verona, Italy, 2000.

1989

  • C. Hamon, E. Mouline, and F. Charpentier. A diphone synthesis system based on time-domain prosodic modifications of speech. In International Conference on Acoustics, Speech, and Signal Processing, 238–241 vol. 1. May 1989. doi:10.1109/ICASSP.1989.266409.

1986

  • F. Charpentier and M. Stella. Diphone synthesis using an overlap-add technique for speech waveforms concatenation. In ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 11, 2015–2018. April 1986. doi:10.1109/ICASSP.1986.1168657.