Concatenative: Crowd Noise Synthesis

Two master's theses, carried out in collaboration between the Audio Communication Group and IRCAM, aimed at a parametric synthesis of crowd noises, more precisely of many people speaking simultaneously (Grimaldi, 2016; Knörzer, 2017). Using a concatenative approach, the resulting synthesis system can dynamically change the affective state of the virtual crowd. The algorithm was applied in user studies in virtual acoustic environments.

Recordings

The speech corpus was gathered in two group sessions, each with five persons, in the anechoic chamber at TU Berlin. For each speaker, the recording was annotated into regions of different valence and arousal and then automatically segmented into syllables.

Features

/images/Sound_Synthesis/concatenative/valence_arousal_1.png

Synthesis

The following example synthesizes a crowd with a valence of -90 and an arousal of 80, which can be categorized as frustrated, annoyed or upset. No virtual acoustic environment is used, and the result is rather direct:


References

  • Vincent Grimaldi, Christoph Böhm, Stefan Weinzierl, and Henrik von Coler. Parametric Synthesis of Crowd Noises in Virtual Acoustic Environments. In Proceedings of the 142nd Audio Engineering Society Convention. Audio Engineering Society, 2017.
  • Christian Knörzer. Concatenative crowd noise synthesis. Master's thesis, TU Berlin, 2017.
  • Vincent Grimaldi. Parametric crowd synthesis for virtual acoustic environments. Master's thesis, IRCAM, 2016.
  • Diemo Schwarz. Concatenative sound synthesis: The early years. Journal of New Music Research, 35(1):3–22, 2006.
  • Diemo Schwarz, Grégory Beller, Bruno Verbrugghe, and Sam Britton. Real-Time Corpus-Based Concatenative Synthesis with CataRT. In DAFx. 2006.
  • Diemo Schwarz. A System for Data-Driven Concatenative Sound Synthesis. In Proceedings of the COST-G6 Conference on Digital Audio Effects (DAFx-00). Verona, Italy, 2000.
  • C. Hamon, E. Mouline, and F. Charpentier. A diphone synthesis system based on time-domain prosodic modifications of speech. In International Conference on Acoustics, Speech, and Signal Processing, 238–241 vol.1. May 1989. doi:10.1109/ICASSP.1989.266409.
  • F. Charpentier and M. Stella. Diphone synthesis using an overlap-add technique for speech waveforms concatenation. In ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 11, 2015–2018. April 1986. doi:10.1109/ICASSP.1986.1168657.
  • Faust: Conditional Logic

    The select2 primitive can be used as a switch with two cases, as shown in switch_example.dsp:

    // switch_example.dsp
    //
    //
    // Henrik von Coler
    // 2020-05-28
    
    import("all.lib");
    
    // outputs 1 if x is greater than or equal to 0
    // and 0 if x is below 0
    // 'l' is accepted as first argument but not used
    sel(l,x) = select2((x>=0), 0, 1);
    
    process = -0.1 : sel(2);
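
    The same primitive can also switch between signals instead of constants. The following is a minimal sketch, not taken from the repository: rectify is a hypothetical helper name, and the 440 Hz test tone is an assumption. select2(s, a, b) passes a when s is 0 and b when s is 1; both inputs are computed at every sample, only one is passed on.

    ```faust
    // select2_rectifier.dsp (hypothetical file name)
    import("stdfaust.lib");

    // half-wave rectifier: negative samples are replaced by 0,
    // samples greater than or equal to 0 pass unchanged
    rectify(x) = select2(x >= 0, 0, x);

    // a sine test tone, rectified and sent to both channels
    process = os.osc(440) : rectify <: _,_;
    ```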
    

    Concatenative: Introduction

    Concatenative synthesis is an evolution of granular synthesis, first introduced in the context of speech synthesis and processing (Charpentier, 1986; Hamon, 1989).

    Concatenative synthesis for musical applications was introduced by Diemo Schwarz. Corpus-based concatenative synthesis (Schwarz, 2000; Schwarz, 2006) segments audio recordings into units and calculates audio features for each unit. During synthesis, units are selected by navigating the multidimensional feature space, and the selected units are concatenated.

    /images/Sound_Synthesis/concatenative/concatenative-flow-1.png
    [Fig.1] (Schwarz, 2006)

    /images/Sound_Synthesis/concatenative/concatenative-flow-2.png
    [Fig.2] (Schwarz, 2006)

  • Granular: Faust Example

    The grain_player.dsp example in the repository uses four parallel grain processes, as shown in [Fig.1].

    /images/Sound_Synthesis/granular/grain_player.png
    [Fig.1] Four parallel grain players

    The code below does not handle all problem cases. Depending on the sound material, changing the grain position may result in audible clicks. For high densities, grains are retriggered before their amplitude decays to 0, which also results in clicks.

    // grain_player.dsp
    //
    // Play a wave file in grains.
    //
    // - four grains
    // - glitches when changing grain position
    //
    // Henrik von Coler
    // 2020-05-28
    
    import("stdfaust.lib");
    
    // read a set of wav files
    s = soundfile("label[url:{'../WAV/chips.wav';   '../WAV/my_model.wav'; '../WAV/sine.wav'}]", 1);
    
    // a slider for selecting a sound file:
    file_idx = hslider("file_idx",0,0,2,1);
    
    // a slider for controlling the playback speed of the grains:
    speed = hslider("speed",1,-10,10,0.01);
    
    // start point for grain playback
    start = hslider("start",0,0,1,0.01);
    
    // a slider for the grain length:
    length = hslider("length",1000,1000,40000,1): si.smoo;
    
    // control the grain density (the rate of the trigger clock)
    density = hslider("density", 0.1,0.01,20,0.01);
    
    // the ramp is used for scrolling through the indices
    ramp(f, t) = delta : (+ : select2(t,_,delta<0) : max(0)) ~ _ : raz
    with {
    
    // keep below 1:
    raz(x) = select2 (x > 1, x, 0);
    delta = sh(f,t)/ma.SR;
    
    // sample and hold
    sh(x,t) = ba.sAndH(t,x);
    };
    
    
    // 4 impulse trains with 1/4 period phase shifts
    quad_clock(d) = os.lf_imptrain(d) <:  _ , ( _ : @(0.25*(1/d) * ma.SR)) , ( _ : @(0.5*(1/d) * ma.SR)), ( _ : @(0.75*(1/d) * ma.SR)) ;
    
    // function for a single grain
    grain(s, part, start, l,tt) = (part, pos) : outs(s) : _* win_gain
    with {
    
    // ramp from 0 to 1
    r = ramp(speed,tt);
    
    // the playback position derived from the ramp
    pos = r*l + (start*length(s));
    
    // a simple sine window
    win_gain = sin(r*3.14159);
    
    // get recent file's properties
    length(s) = part,0 : s : _,si.block(outputs(s)-1);
    srate(s)  = part,0 : s : !,_,si.block(outputs(s)-2);
    // play sample
    outs(s) = s : si.block(2), si.bus(outputs(s)-2);
    
    };
    
    
    // four parallel grain players triggered by the quad-clock
    process =  quad_clock(density) : par(i,4,grain(s, file_idx, start, length)) :> _,_;// :> _ <: _,_;
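
    The window shape is one ingredient of smooth grain edges. The following sketch is not part of the repository; it assumes a sine tone in place of the sound file and a hypothetical rate slider. It replaces the sine window with a Hann window, whose value and slope are both zero at the grain boundaries. It does not address the retriggering problem at high densities.

    ```faust
    // hann_grain.dsp (hypothetical file name)
    import("stdfaust.lib");

    // grain rate in Hz (hypothetical parameter)
    rate = hslider("rate", 5, 0.1, 50, 0.1);

    // ramp from 0 to 1, restarting once per grain
    r = os.lf_sawpos(rate);

    // Hann window over a ramp in [0,1]
    hann(x) = 0.5 * (1 - cos(2 * ma.PI * x));

    // a sine tone stands in for the sound file content
    process = os.osc(440) * hann(r) <: _,_;
    ```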
    

    Granular: Introduction

    Granular synthesis is a special form of sample-based synthesis that makes use of micro sections of audio material, called grains, sometimes also particles or atoms. This principle can be used to manipulate sounds by time-stretching and pitch-shifting or to generate sound textures (Roads, 2004).

    Early Analog

    John Cage's Williams Mix, realized in 1952-53, shows some of the earliest granular approaches.


    Iannis Xenakis was the first to refer to Dennis Gabor's quantum theory and the elementary signal (Gabor, 1946) for musical applications.

    Early Digital

    The possibilities of granular synthesis grew rapidly with the advent of digital sampling, and new composers made use of the technique.


    Barry Truax, who visited the TU Studio as guest professor in 2015-16, is known as one of the pioneers of digital granular composition (Truax, 1987). His soundscape-influenced works use the technique for generating rich textures, as in Riverrun:


    Horacio Vaggione made use of granular processing for his mixed music pieces. The original Scir, for bass flute and tape (the tape being granular-processed bass flute), was produced at the TU Studio in 1988:


    In 2018, the TU Studio performed the piece with flutist Erik Drescher and made a binaural recording:


    References

  • Curtis Roads. Microsound. The MIT Press, 2004. ISBN 0262681544.
  • Barry Truax. Real-time granulation of sampled sound with the DMX-1000. In ICMC. 1987.
  • D. Gabor. Theory of communication. Part 1: The analysis of information. Journal of the Institution of Electrical Engineers - Part III: Radio and Communication Engineering, 93(26):429–441, November 1946. doi:10.1049/ji-3-2.1946.0074.
  • NIME 2020: Setup

    The experiment took place at the Small Studio at Technical University Berlin. The room features three loudspeaker systems, including a dome of 21 Genelec 8020 loudspeakers with two subwoofers. This system is used for Ambisonics rendering in the experiments of this project. For the purpose of the study, furniture was removed from the studio, making it suitable for free movement in the sweet spot of the loudspeaker dome.


    /images/NIME_2020/setup_1.JPG
    [Fig.1] Studio setup for user study.

    [Fig.1] shows the studio as it was equipped for the user study. An area of about \(1 \ \mathrm{m}^2\) is marked with tape on the floor. This area is intended as the sweet area, where participants should operate the synthesis system. A table with chair, display, mouse and keyboard is placed close to the sweet area, allowing the users to change the mapping. A second table for paperwork is placed at the edge of the loudspeaker system.



    Sampling: Using Audio Files in Faust

    Using 'soundfile'

    Under the hood, using sound files in Faust is based on libsndfile. This part of Faust is less documented and lacks full integration. The soundfile primitive, which is the basis for reading and playing audio files, is not yet managed in the Faust Web IDE and cannot be used with all targets.

    When using wav files in Faust, their content is combined with the generated binary when compiling. Files can thus not be read dynamically. Compiling with support for managing audio files is enabled with the -soundfile flag:

    $ faust2jaqt -soundfile sample_trigger.dsp
    

    Samples With a Trigger

    The soundfiles.lib library includes convenient functions for handling sound files and playing them:

    https://github.com/grame-cncm/faustlibraries/blob/master/soundfiles.lib

    With the provided methods, basic use of audio files requires little code. The example sample_trigger.dsp makes use of the play method for sound files. A set of audio files is read, and selected files can be triggered with buttons.

    // sample_trigger.dsp
    //
    // Read files and make them playable with a trigger.
    //
    // - makes use of the
    //
    // Henrik von Coler
    // 2020-05-28
    
    import("stdfaust.lib");
    
    // read a set of wav files
    s = soundfile("label[url:{'../WAV/kick.wav'; '../WAV/cowbell.wav'; '../WAV/my_model.wav'}]", 1);
    
    // a slider for controlling the level of all samples:
    level = hslider("level",1,0,2,0.01);
    
    // sample objects
    kick = so.sound(s, 0);
    bell = so.sound(s, 1);
    
    process = kick.play( level, button("kick") ),  bell.play( level, button("bell")) :>  _   <: _,_ ;
    

    Looping a Sample

    sample_looper.dsp defines a looping function which can play a chosen sample at fractional playback rates, including negative rates for reverse looping.

    // sample_looper.dsp
    //
    // Read a set of samples from wav files
    //
    // - loop sample with slider for speed
    // - select active sample
    //
    // Henrik von Coler
    // 2020-05-28
    
    import("stdfaust.lib");
    
    // read a set of wav files
    s = soundfile("label[url:{'../WAV/kick.wav'; '../WAV/cowbell.wav'; '../WAV/my_model.wav'}]", 1);
    
    // a slider for selecting a sound file:
    file_idx = hslider("file_idx",0,0,2,1);
    
    // a slider for controlling the playback speed:
    speed = hslider("speed",1,-100,100,0.01);
    
    // a logic for reverse loops (wrap to positive indices)
    wrap(l,x) = select2((x>=0),l-abs(x),x);
    
    
    // the loop function
    loop(s, idx) = (idx, reader(s)) : outs(s)
    with {
    
    // get recent file's properties
    length(s) = idx,0 : s : _,si.block(outputs(s)-1);
    srate(s)  = idx,0 : s : !,_,si.block(outputs(s)-2);
    
    // the playback position (a recursive counter)
    reader(s) = (speed * float(srate(s)))/ma.SR : (+,length(s):fmod)~  _ : wrap(length(s)) : int;
    
    // read from sample
    outs(s)   = s : si.block(2), si.bus(outputs(s)-2);
    
    };
    
    process = loop(s,file_idx) <: _,_ ;
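
    The position logic of the looper can be studied in isolation. The following sketch is not part of the repository; it assumes a fixed, hypothetical loop length N and a step slider in place of the sound file properties. Because fmod keeps the sign of the running sum, negative steps produce negative positions, which wrap() then maps to the end of the loop.

    ```faust
    // loop_counter.dsp (hypothetical file name)
    import("stdfaust.lib");

    // loop length in samples (hypothetical value)
    N = 8;

    // increment per sample, negative values run backwards
    step = hslider("step", 1, -4, 4, 1);

    // map negative positions to positive indices counted from the loop end
    wrap(l,x) = select2((x>=0), l-abs(x), x);

    // recursive counter: fmod keeps the sum between -N and N,
    // wrap() then moves negative values to the end of the loop
    counter = step : (+, N : fmod) ~ _ : wrap(N) : int;

    // the output is the read position, not an audio signal
    process = counter;
    ```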
    

    Sampling: Introduction

    Pierre Schaeffer & Musique Concrète

    The use of recorded material for musical compositions dates back to Pierre Schaeffer, who started experiments with turntables after World War II. He recorded environmental sounds and musical instruments, arranged them, altered the playback speed and used loops in what then became musique concrète. These techniques are well-known nowadays, but were a completely novel experience in the 1940s.

    Although an engineer by profession, Pierre Schaeffer did not only explore the technical means for composing with recorded sound. With the theory of the objet sonore he also laid the foundation for a theoretical and aesthetic discourse on acousmatic music (Schaeffer, 2012).


    The Cinq Études de bruits (1948), the first published works of musique concrète, use various sources and techniques.


    After the first experiments, Schaeffer started to involve musicians to take the concept to the next level. With Pierre Henry he realized the Symphonie pour un homme seul in 1950. This acousmatic composition made use of various additional techniques, including spatial aspects.




    Digital Sampling

    Early devices capable of digital sampling are the Fairlight CMI (1979) and the Synclavier II (1980). These expensive, bulky devices were already used in various productions.

    Linn Drum

    The Linn Drum (1982) represents a breakthrough for digital sampling. Using 8-bit technology, it offers a set of drum sounds, which can be found in many 1980s pop productions.


    References

  • P. Schaeffer. In Search of a Concrete Music. Volume 15 of California Studies in 20th-Century Music. University of California Press, 2012. ISBN 9780520265745. Translated by C. North and J. Dack. URL: http://books.google.de/books?id=6nTruQAACAAJ.
  • Henrik Brumm. Biomusic and popular culture: The use of animal sounds in the music of the Beatles. Journal of Popular Music Studies, 24:25–38, 03 2012. doi:10.1111/j.1533-1598.2012.01314.x.
  • Subtractive: Faust Examples

    VCO-VCA-VCF

    The first example for subtractive synthesis implements a virtual chain of VCO, VCF and VCA, as shown in the Faust diagram in [Fig.1].


    /images/Sound_Synthesis/subtractive/process_subtractive_1.svg
    [Fig.1] Faust diagram for the VCO-VCA-VCF example.

    The three modules are defined as individual functions, with parameters controlled by horizontal sliders. In the processing function, they are chained using the : operator.

    A resonant low pass from the filters.lib - the Faust Filters library - is used.


    // sawtooth-filter.dsp
    //
    // First steps with a VCO-VCA-VCF setup.
    // The three modules are connected in series.
    //
    // No anti-aliasing!
    //
    // - steady sound
    // - control over f0, cutoff, resonance, gain
    //
    // Henrik von Coler
    // 2020-05-17
    
    import("stdfaust.lib");
    
    //////////////////////////////////////////////////////////////////////////
    // Control Parameters
    //////////////////////////////////////////////////////////////////////////
    
    cutoff      = hslider("Cutoff", 100, 5, 6000, 0.001):si.smoo;
    f0          = hslider("Pitch", 100, 5, 16000, 0.001):si.smoo;
    q           = hslider("Q", 1, 0.1, 5, 0.01):si.smoo;
    gain        = hslider("Gain", 1, 0, 1, 0.01):si.smoo;
    
    //////////////////////////////////////////////////////////////////////////
    // Define three 'module' functions
    //////////////////////////////////////////////////////////////////////////
    
    vco        = os.sawtooth(f0);
    vcf         = fi.resonlp(cutoff,q,1) ;
    vca(x)    = gain * x;
    
    //////////////////////////////////////////////////////////////////////////
    // Chain the three modules into a voice
    //////////////////////////////////////////////////////////////////////////
    
    voice =  vco  : vcf : vca;
    
    process = voice  <: _,_ ;
    

    Triggered

    The example subtractive_triggered.dsp from the repository extends the previous sawtooth example with temporal envelopes for VCF and VCA and implements four voices with individual control. The block diagram is shown in [Fig.2].


    /images/Sound_Synthesis/subtractive/process_subtractive_2.svg
    [Fig.2] Faust diagram for the triggered subtractive example.

    • The example makes use of the Moog filter from the vaeffects.lib library of virtual analog filter effects.
    • Individual control over the voices is realized through the %index substitution in the slider labels within the voice() function.
    // subtractive_triggered.dsp
    //
    // A four voice subtractive synth.
    //
    // - trigger
    // - control over f0, cutoff, resonance, gain
    //
    // Henrik von Coler
    // 2020-05-17
    
    import("stdfaust.lib");
    
    trigger0 =  button("trigger0 [midi:key 33]");
    trigger1=  button("trigger1 [midi:key 34]");
    trigger2=  button("trigger2 [midi:key 35]");
    trigger3=  button("trigger3 [midi:key 36]");
    
    //////////////////////////////////////////////////////////////////////////
    // Define three 'module' functions
    //////////////////////////////////////////////////////////////////////////
    
    vco(f0)          = os.sawtooth(f0);
    vcf(c,r)          = ve.moog_vcf(r,c);
    vca(x,gain)    = gain * x;
    
    
    //////////////////////////////////////////////////////////////////////////
    // A function with envelopes
    //////////////////////////////////////////////////////////////////////////
    
    voice(index,trig) =  vco(f0) : vcf(fc,res) : vca(env1) * 0.5
    with
    {
    // use an individual hslider for every voice
    f0                = hslider("Pitch %index", 100, 5, 1000, 0.001):si.smoo;
    
    //trig = button("trigger%index");
    
    rel1 = hslider("rel_vca%index", 0.5, 0.01, 3, 0.01):si.smoo;
    rel2 = hslider("rel_vcf%index", 0.25, 0.01, 3, 0.01):si.smoo;
    
    env1 = en.arfe(0.02, rel1, 0,trig); // en.adsre(0.001,0.3,1,1,trig);
    env2 = en.arfe(0.01, rel2, 0,trig); //en.adsre(0.001,0.3,1,1,trig);
    
    cutoff = hslider("cutoff%index", 100, 5, 6000, 0.001):si.smoo;
    res     = hslider("res%index", 0.1, 0, 1, 0.01):si.smoo;
    
    fc         = 10+env2* cutoff;
    
    };
    
    process = voice(0,trigger0),voice(1,trigger1),voice(2,trigger2),voice(3,trigger3) :> _,_ ;
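
    The four explicit voice() calls can also be generated with the par iterator. The following is a reduced sketch under assumptions, not the repository version: the MIDI key bindings are left out, the buttons are created inside the voice, the release time is fixed at 0.5 s, and a single envelope drives both amplitude and cutoff.

    ```faust
    // subtractive_par.dsp (hypothetical file name)
    import("stdfaust.lib");

    // one voice; the index i is inserted into every label through %i
    voice(i) = os.sawtooth(f0) : ve.moog_vcf(res, fc) : *(env) : *(0.5)
    with {
        trig   = button("trigger%i");
        f0     = hslider("Pitch %i", 100, 5, 1000, 0.001) : si.smoo;
        cutoff = hslider("cutoff%i", 1000, 5, 6000, 0.001) : si.smoo;
        res    = hslider("res%i", 0.1, 0, 1, 0.01) : si.smoo;

        // attack-release envelope with final level 0
        env    = en.arfe(0.02, 0.5, 0, trig);

        // envelope-controlled cutoff frequency
        fc     = 10 + env * cutoff;
    };

    // four voices in parallel, merged to two output channels
    process = par(i, 4, voice(i)) :> _,_;
    ```

    With :> _,_ the four voice outputs are merged alternately, so voices 0 and 2 go to the first channel and voices 1 and 3 to the second.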
    


    Contents © Henrik von Coler 2020 - Contact