While the introduction of time warping in continuous-time signals is in principle not problematic, time warping of discrete-time signals is not self-evident. On the one hand, if the signal samples were obtained by sampling a bandlimited signal, the warped signal is not necessarily bandlimited: it has a sampling theorem of its own, based on irregular sampling, unless the map is linear.
On the other hand, most signals are regularly sampled, so the samples at non-integer multiples of the sampling interval are not known. While the use of interpolation can partly solve this problem, it usually introduces artifacts. Moreover, in many sound applications the computation already involves a phase vocoder. In this paper we introduce new methods and algorithms for time warping based on warped time-frequency representations.
These lead to alternative warping algorithms for use in sound processing tools and digital audio effects, and shed new light on the interaction of time warping with phase vocoders. We also outline applications of time warping in digital audio effects.

Abstract
In the context of efficient synthesis of wind instrument sound, we introduce a technique for joint modeling of input impedance and sound pressure radiation as digital filters in parallel form, with the filter coefficients derived from experimental data.
In a series of laboratory measurements taken on an alto saxophone, the input impedance and sound pressure radiation responses were obtained for each fingering. In a first analysis step, we iteratively minimize the error between the frequency response of an input impedance measurement and that of a digital impedance model constructed from a parallel filter structure akin to the discretization of a modal expansion. With the modal coefficients in hand, we propose a digital model for sound pressure radiation that relies on the same parallel structure and is thus suitable for coefficient estimation via frequency-domain least squares.
For modeling the transition between fingering positions, we propose a simple model based on linear interpolation of the input impedance and sound pressure radiation models. For efficient sound synthesis, the common impedance-radiation model is used to construct a joint reflectance-radiation digital filter realized as a digital waveguide termination that is interfaced to a reed model based on nonlinear scattering. Audio samples and code for real-time operation are also supplied.

Abstract
We present a new approach to audio bandwidth extension for music signals using convolutional neural networks (CNNs).
Inspired by the concept of inpainting from the field of image processing, we seek to reconstruct the missing high-frequency region of a time-frequency representation. We then invert this reconstructed time-frequency representation, using the phase information from the band-limited input, to provide an enhanced musical output. Through our evaluation, we demonstrate that the constant-Q transform (CQT), with its logarithmic frequency spacing, provides better reconstruction performance as measured by the signal-to-distortion ratio.
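The reconstruction step described above, combining a predicted magnitude spectrum with the phase of the band-limited input, can be sketched on a single frame; a mock "prediction" stands in for the CNN output, and all names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# A single analysis frame of a band-limited input: the upper half of the
# spectrum is missing, as in the bandwidth-extension setting above.
n_fft = 1024
x = rng.standard_normal(n_fft)
X = np.fft.rfft(x)
cutoff = len(X) // 2
X_band = X.copy()
X_band[cutoff:] = 0.0

# Stand-in for the CNN output: a full-band magnitude spectrum.  Here we
# simply reuse the true magnitudes; a real system would predict them.
mag_pred = np.abs(X)

# Reconstruction: predicted magnitudes combined with the phase of the
# band-limited input, then inverted back to a time-domain frame.
X_rec = mag_pred * np.exp(1j * np.angle(X_band))
x_rec = np.fft.irfft(X_rec, n=n_fft)
```

Below the cutoff, where the input phase is valid, this reconstruction is exact; above it, the quality rests entirely on the predicted magnitudes and the substituted phase.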
Abstract
This paper addresses a phase-related feature that is time-shift invariant and that expresses the relative phases of all harmonics with respect to that of the fundamental frequency. We identify the feature as Normalized Relative Delay (NRD) and show that it is particularly useful for describing the holistic phase properties of voiced sounds produced by a human speaker, notably vowel sounds.
We illustrate the NRD feature with real data obtained from five sustained vowels uttered by 20 female and 17 male speakers. It is shown not only that NRD coefficients carry idiosyncratic information, but also that their estimation is quite stable and robust for all harmonics encompassing, for most vowels, at least the first four formant frequencies. The average NRD model estimated using data from all speakers in our database is compared to those of the idealized Liljencrants-Fant (L-F) and Rosenberg glottal models.
We also present results on the phase effects of linear-phase FIR and IIR vocal tract filter models when a plausible source excitation is used that corresponds to the derivative of the L-F glottal flow model. These results suggest that the shape of NRD feature vectors is mainly determined by the glottal pulse and only marginally affected by either the group delay of the vocal tract filter model or the acoustic coupling between glottis and vocal tract structures.
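A rough numerical sketch of a relative-phase feature of this kind, assuming a coefficient of the form (phi_k − k·phi_1)/(2π) wrapped to half a period; the paper's exact normalization may differ:

```python
import numpy as np

fs, f0, n = 8000, 100, 8000            # 1 s of signal, integer number of periods
t = np.arange(n) / fs
K = 4                                   # number of harmonics to analyze

# Zero-relative-phase test case: every harmonic is a pure cosine.
x = sum(np.cos(2 * np.pi * k * f0 * t) for k in range(1, K + 1))

X = np.fft.rfft(x)
bins = [k * f0 * n // fs for k in range(1, K + 1)]   # exact harmonic bins
phi = np.angle(X[bins])                              # phase of harmonic k

# Hypothetical NRD-style coefficients: phase of harmonic k relative to
# k times the fundamental phase, wrapped to [-0.5, 0.5) periods.
k = np.arange(1, K + 1)
nrd = ((phi - k * phi[0]) / (2 * np.pi) + 0.5) % 1.0 - 0.5
```

For this aligned-cosine signal all coefficients are zero; a glottal-pulse-shaped excitation would instead yield the characteristic nonzero pattern the abstract describes.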
Abstract
This paper is concerned with perceptual control strategies for physical modeling synthesis of vibrating resonant objects colliding nonlinearly with rigid obstacles. For this purpose, we investigate sound morphologies from samples synthesized using physical modeling of nonlinear interactions. As a starting point, we study the effect of linear and nonlinear springs and collisions on a single-degree-of-freedom system and on a stiff string.
We then synthesize realistic sounds of a stiff string colliding with a rigid obstacle. Numerical simulations allowed the definition of specific signal patterns characterizing the nonlinear behavior of the interaction according to the attributes of the obstacle. Finally, a global description of the sound morphology associated with this type of interaction is proposed. This study constitutes a first step towards further perceptual investigations geared towards the development of intuitive synthesis controls.

Abstract
An algorithm for artistic spectral audio processing and synthesis using allpass filters is presented.
These filters express group delay trajectories, allowing fine control of their frequency-dependent arrival times. We present methods for designing the group delay trajectories to yield a novel class of filters for sound synthesis and audio effects processing.
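A minimal sketch of the underlying principle using a first-order allpass section: the magnitude response is identically one, while the coefficient shapes a frequency-dependent group delay. The paper's trajectory design methods are not reproduced here:

```python
import numpy as np

def allpass_response(a, w):
    """Frequency response of the first-order allpass H(z) = (a + z^-1) / (1 + a z^-1)."""
    z = np.exp(1j * w)
    return (a + 1 / z) / (1 + a / z)

w = np.linspace(0.01, np.pi - 0.01, 512)
for a in (-0.5, 0.0, 0.5):
    # An allpass filter passes every frequency at unit magnitude...
    assert np.allclose(np.abs(allpass_response(a, w)), 1.0)

# ...but delays frequencies differently: estimate the group delay (in
# samples) as the negative derivative of the unwrapped phase.
H = allpass_response(0.5, w)
gd = -np.gradient(np.unwrap(np.angle(H)), w)
```

Cascading many such sections, each contributing its own delay-versus-frequency curve, is one way a designed group delay trajectory can be realized.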
A number of categories of group delay trajectory design are discussed, including stair-stepped, modulated, and probabilistic. Synthesis and processing examples are provided.

Abstract
Through this research, we develop a study aiming to explore how adaptive music can help guide players across virtual environments.
A video game consisting of a virtual 3D labyrinth was built, and two groups of subjects played through it, having the goal of retrieving a series of objects in as short a time as possible. Each group played a different version of the prototype in terms of audio: one had the ability to state their preferences by choosing several musical attributes, which would influence the actual spatialised music they listened to during gameplay; the other group played a version of the prototype with a default, non-adaptive, but also spatialised soundtrack.
Time elapsed while completing the task was measured as a way to test user performance. Results show a statistically significant correlation between player performance and the inclusion of a soundtrack adapted to each user. We conclude that there is an absence of firm musical criteria for making sounds prominent and easy to track for users, and that an adaptive system like the one we propose proves useful and effective when dealing with a complex user base.

Abstract
Sound production by means of a physical model for falling objects, intended for audio synthesis of immersive content, is described here.
Our approach is a mathematical model to synthesize sound and audio for animation with rigid-body simulation. To consider various conditions, a collision model of an object was introduced for vibration and propagation simulation. The generated sound was evaluated by comparing the model output with real sound using numerical criteria and psychoacoustic analysis. The usefulness of the physical model for audio synthesis in virtual reality was demonstrated in terms of breadth and quality of sound.
Abstract
There is a range of different methods for comparing or measuring the similarity between environmental sound effects. These methods can be used as objective evaluation techniques, to evaluate the effectiveness of a sound synthesis method by assessing the similarity between synthesised sounds and recorded samples. We propose to evaluate a number of different synthesis objective evaluation metrics by using the different distance metrics as fitness functions within a resynthesis algorithm.
A recorded sample is used as a target sound, and the resynthesis is intended to produce a set of synthesis parameters that will synthesise a sound as close to the recorded sample as possible, within the restrictions of the synthesis model. The recorded samples are excerpts of selections from a sound effects library, and the results are evaluated through a subjective listening test. Results show that one of the objective functions performs significantly worse than several others.
Only one method had a significant and strong correlation between the user perceptual distance and the objective distance. A recommendation of an objective evaluation function for measuring similarity between synthesised environmental sounds is made.

Abstract
In music recording and virtual reality applications, it is often desirable to control the perceived size of a synthesized acoustic space. Here, we demonstrate a physically informed method for enlarging and shrinking room size.
Abstract
An extensive piano sample library consisting of binaural sounds and keyboard vibration signals is made available through an open-access data repository. Samples were acquired with high-quality audio and vibration measurement equipment on two Yamaha Disklavier pianos (one grand and one upright model) by means of computer-controlled playback of each key at ten different MIDI velocity values. The nominal specifications of the equipment used in the acquisition chain are reported in a companion document, allowing researchers to calculate physical quantities. Also, project files are provided for straightforward playback in a free software sampler available for Windows and Mac OS systems.
The library is especially suited for acoustic and vibration research on the piano, as well as for research on multimodal interaction with musical instruments.

Abstract
This paper presents a position-based attenuation and amplification method suitable for source separation and enhancement. Our novel sigmoidal time-frequency mask allows us to directly control the level within a target azimuth range and to exploit a trade-off between the production of musical noise artifacts and separation quality.
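A sketch of a sigmoidal azimuth mask of this general shape; the parameterization and names below are illustrative, not the paper's exact formulation:

```python
import numpy as np

def sigmoidal_mask(theta, lo, hi, slope=20.0, gain=1.0, floor=0.1):
    """Level control over a target azimuth range via two opposed sigmoids.

    Bins whose estimated azimuth lies inside [lo, hi] are scaled towards
    `gain`; bins outside fall smoothly towards `floor`.  Steeper `slope`
    separates more sharply but risks more musical-noise artifacts.
    """
    inside = 1.0 / (1.0 + np.exp(-slope * (theta - lo)))
    inside *= 1.0 / (1.0 + np.exp(slope * (theta - hi)))
    return floor + (gain - floor) * inside

theta = np.linspace(-1.0, 1.0, 201)     # normalized panning azimuth per bin
mask = sigmoidal_mask(theta, -0.2, 0.2)
```

Multiplying a mixture spectrogram by such a mask (computed from per-bin azimuth estimates) attenuates everything outside the target range while keeping the transition smooth.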
The algorithm is fully describable in a closed and compact analytical form. The method was evaluated on a multitrack dataset and compared to another position-based source separation algorithm. The results show that although the sigmoidal mask leads to a lower source-to-interference ratio, the overall sound quality measured by the source-to-distortion ratio and the source-to-artifacts ratio is improved.

Abstract
In this paper, we propose to reduce the relatively high dimension of pitch-based features for fear emotion recognition from speech.
Many techniques of dimensionality reduction are explored. First, optimal features ensuring better emotion classification are determined. Next, several families of dimensionality reduction, namely PCA, LDA and LPP, are tested in order to reveal the suitable dimension range guaranteeing the highest overall and fear recognition rates. Results show that the optimal features group permits

Abstract
An audio-guide prototype was developed which makes it possible to associate virtual sound sources with tourist route focal points.
An augmented reality effect is created, as the virtual audio content presented through headphones seems to originate from the specified real points. A route management application allows specification of source positions (GPS coordinates), audio content (monophonic files), and route points where playback should be triggered. The binaural spatialisation effects depend on user pose relative to the focal points: position is detected by a GPS receiver; for head tracking, an IMU is attached to the headphone strap.
HRTF filters are selected according to the azimuth and elevation of the path from the virtual source, continuously updated based on user pose. Preliminary tests carried out with ten subjects confirmed the ability to provide the desired audio spatialisation effects and identified position detection accuracy as the main aspect to be improved in the future.

Abstract
This article is concerned with the power-balanced simulation of analog audio circuits governed by nonlinear differential-algebraic equations (DAEs).
The proposed approach is to combine principles from the port-Hamiltonian and Brayton-Moser formalisms to yield a skew-symmetric gradient system. The practical interest is to provide a solver, using an average discrete gradient, that handles differential and algebraic relations in a unified way and avoids having to pre-solve the algebraic part. This leads to a structure-preserving method that conserves the power balance and total energy.
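For a quadratic Hamiltonian the average discrete gradient reduces to the implicit midpoint rule, so the energy-conservation property can be illustrated on a toy linear oscillator; the paper's contribution concerns nonlinear DAEs, which this sketch does not cover:

```python
import numpy as np

# Skew-symmetric structure matrix J and quadratic energy H(x) = 0.5 * x.x,
# i.e. a lossless harmonic oscillator written as dx/dt = J grad H(x).
J = np.array([[0.0, 1.0], [-1.0, 0.0]])
h = 0.1                                  # step size

# Implicit midpoint step: (I - h/2 J) x_new = (I + h/2 J) x_old.
# For a quadratic H this coincides with the average-discrete-gradient update.
I = np.eye(2)
step = np.linalg.solve(I - 0.5 * h * J, I + 0.5 * h * J)

x = np.array([1.0, 0.0])
energies = []
for _ in range(1000):
    x = step @ x
    energies.append(0.5 * x @ x)

drift = max(abs(e - 0.5) for e in energies)   # deviation from H(x0) = 0.5
```

The step matrix is a Cayley transform of a skew-symmetric matrix, hence orthogonal, so the discrete trajectory conserves the energy exactly up to rounding; an explicit scheme like forward Euler would instead grow it every step.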
The proposed formulation is then applied to typical nonlinear audio circuits to study the effectiveness of the method.

Abstract
Wave Digital Filters (WDFs) were developed to discretize linear time-invariant (LTI) lumped systems, particularly electronic circuits. The time-invariant assumption is baked into the underlying theory and becomes problematic when simulating audio circuits that are by nature time-varying.
We present extensions to WDF theory that incorporate proper numerical schemes, allowing for the accurate simulation of time-varying systems. We present generalized continuous-time models of reactive components that encapsulate the time-varying lossless models presented by Fettweis, the circuit-theoretic time-varying models, as well as traditional LTI models as special cases. Models of time-varying reactive components are valuable tools to have when modeling circuits containing variable capacitors or inductors, or electrical devices such as condenser microphones.
A power metric is derived, and the model is discretized using the alpha-transform numerical scheme and a parametric wave definition. Case studies of circuits containing time-varying resistance and capacitance are presented and help to validate the proposed generalized continuous-time model and discretization.

Abstract
In this paper, we focus on studying the nonlinear behavior of the pickup of an electric guitar and on its modeling. The approach is purely experimental, based on physical assumptions, and attempts to find a nonlinear model that, with few parameters, would be able to predict the nonlinear behavior of the pickup.
In our experimental setup, a piece of string is attached to a shaker and vibrates perpendicularly to the pickup in a frequency range between 60 Hz and Hz. The oscillations are controlled by a linearization feedback to create a purely sinusoidal steady-state movement of the string. In a first step, the harmonic distortions of three different magnetic pickups (a single-coil, a humbucker, and a rail pickup) are compared to check whether they produce different distortions. In a last step, the pickup nonlinearities are compared and an empirical model that fits all three pickups well is proposed.
It contains three identical waveshaping circuits that can be used to convert sawtooth waveforms into sine waves. However, its sonic capabilities extend well beyond this particular application. These devices, unlike traditional operational amplifiers, operate on a current-differencing principle and are featured in a handful of iconic musical circuits. This work provides an overview of Norton amplifiers within the context of virtual analog modeling and presents a digital model of the Serge TWS based on an analysis of the original circuit.

Musikverb: a harmonically adaptive audio reverberation.
A virtual tube delay effect.

From an idea for an algorithm to a final commercial plugin, there are many steps a developer needs to know and understand in order to make the best of an idea. The goal of this keynote is to help future or already established plugin developers be prepared and aware of what should not be forgotten during development. He continues to play drums, guitar and piano in several music groups.

Abstract
This work aims to implement a novel deep learning architecture to perform audio processing in the context of matched equalization.
Most existing methods for automatic and matched equalization show effective performance, and their goal is to find a respective transfer function given a frequency response. Nevertheless, these procedures require prior knowledge of the type of filters to be modeled. In addition, fixed filter bank architectures are required in automatic mixing contexts. Based on end-to-end convolutional neural networks, we introduce a general-purpose architecture for equalization matching. Thus, by using an end-to-end learning approach, the model approximates the equalization target as a content-based transformation without directly finding the transfer function.
The network learns how to process the audio directly in order to match the equalized target audio. We train the network through unsupervised and supervised learning procedures. We analyze what the model is actually learning and how the given task is accomplished.

Abstract
This paper proposes a method to filter the output of instrument contact sensors to approximate the response of a well-placed microphone. A modal approach is proposed in which mode frequencies and damping ratios are fit to the frequency response of the contact sensor, and the mode gains are then determined for both the contact sensor and the microphone.
The mode frequencies and damping ratios are presumed to be associated with the resonances of the instrument. Accordingly, the corresponding contact sensor and microphone mode gains will account for the instrument radiation.
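The parallel modal structure this approach relies on can be sketched as a bank of second-order resonators; the parameterization below (frequency, decay, gain per mode) is a generic modal form, not the coefficients fitted in the paper:

```python
import numpy as np

def modal_bank(x, modes, fs):
    """Parallel sum of second-order resonators.

    Each mode is the recursion y[n] = 2 r cos(w) y[n-1] - r^2 y[n-2] + g x[n],
    with `modes` a list of (freq_hz, decay_rate, gain) tuples.
    """
    y = np.zeros_like(x, dtype=float)
    for f, d, g in modes:
        r = np.exp(-d / fs)              # pole radius from the decay rate
        w = 2 * np.pi * f / fs           # pole angle from the mode frequency
        y1 = y2 = 0.0
        for i, xn in enumerate(x):
            yn = 2 * r * np.cos(w) * y1 - r * r * y2 + g * xn
            y[i] += yn
            y2, y1 = y1, yn
    return y

fs = 8000
impulse = np.zeros(256)
impulse[0] = 1.0
ir = modal_bank(impulse, [(440.0, 30.0, 1.0)], fs)   # one illustrative mode
```

Each section's impulse response is an exponentially decaying sinusoid, r^n sin((n+1)w)/sin(w), so per-mode gains directly scale the contribution of each instrument resonance.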
The ratios between the contact sensor and microphone gains are then used to create a parallel bank of second-order biquad filters that filter the contact sensor signal to estimate the microphone signal.

The library features single sounds which cover the entire frequency range of the instrument in four dynamic levels, two-note sequences for the study of note transitions and vibrato, and solo pieces for performance analysis.
Recordings took place in an anechoic chamber with a professional violinist and a recording engineer, using two microphone positions. This document describes the content and the recording setup in detail, alongside basic statistical properties of the data.

Abstract
In this paper, a method for separating stereophonic mixtures into their harmonic constituents is proposed.
The method is based on a harmonic signal model. An observed mixture is decomposed by first estimating the panning parameters of the sources, and then estimating the fundamental frequencies and the amplitudes of the harmonic components. The number of sources and their panning parameters are estimated using an approach based on clustering of narrowband interaural level and time differences.
The panning parameter distribution is modelled as a Gaussian mixture, and the generalized variance is used for selecting the number of sources. The fundamental frequencies of the sources are estimated using an iterative approach. To enforce spectral smoothness when estimating the fundamental frequencies, a codebook of magnitude amplitudes is used to limit the amount of energy assigned to each harmonic. The source models are used to form Wiener filters which reconstruct the sources.
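The final Wiener-filter reconstruction step can be sketched as follows; mock magnitude estimates stand in for the fitted harmonic source models:

```python
import numpy as np

rng = np.random.default_rng(1)

# Mock per-source magnitude estimates over a grid of time-frequency bins
# (in the paper these come from the estimated harmonic source models).
n_src, n_bins, n_frames = 2, 129, 20
mag = rng.random((n_src, n_bins, n_frames)) + 1e-12

# Mixture spectrogram (any complex STFT would do here).
X = rng.standard_normal((n_bins, n_frames)) + 1j * rng.standard_normal((n_bins, n_frames))

# Wiener masks: each source receives its share of the mixture power.
power = mag ** 2
masks = power / power.sum(axis=0, keepdims=True)
sources = masks * X                      # per-source complex spectrograms
```

Because the masks sum to one in every bin, the separated spectrograms sum back to the mixture exactly; the separation quality therefore depends entirely on how well the source models capture each source's energy.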
The proposed method can be used for source re-panning (demonstration given), remixing, and multi-channel upmixing.

Abstract
This paper proposes a new partial tracking method, based on linear programming, that can run in real time, is simple to implement, and performs well in difficult tracking situations by considering spurious peaks, crossing partials, and a non-stationary short-term sinusoidal model. Complex constant parameters of a generalized short-term signal model are explicitly estimated to inform peak matching decisions. Peak matching is formulated as a variation of the linear assignment problem.
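The linear assignment step can be sketched with SciPy's Hungarian-algorithm solver; the simple frequency-difference cost below is illustrative, whereas the paper's cost is informed by the estimated model parameters:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Peak frequencies (Hz) detected in two consecutive analysis frames.
prev = np.array([440.0, 880.0, 1320.0])
curr = np.array([1315.0, 443.0, 884.0])

# Cost of matching peak i in the previous frame to peak j in the current
# frame; here simply the absolute frequency difference.
cost = np.abs(prev[:, None] - curr[None, :])

# Combinatorially optimal one-to-one assignment in polynomial time.
row, col = linear_sum_assignment(cost)
matches = list(zip(row, col))
```

Real trackers also add dummy rows and columns so that peaks may be born or die rather than being forced into a match; that extension is omitted here.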
Combinatorially optimal peak-to-peak assignments are found in polynomial time using the Hungarian algorithm. Results show that the proposed method creates high-quality representations of monophonic and polyphonic sounds.

Abstract
This paper describes a modification of the ESPRIT algorithm which can be used to determine the parameters (frequency, decay time, initial magnitude and initial phase) of a modal reverberator that best match a provided room impulse response.
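A textbook ESPRIT estimator, the baseline that the paper modifies, can be sketched as follows:

```python
import numpy as np

def esprit_modes(x, order, rows=60):
    """Estimate pole frequencies/decays of a sum of damped sinusoids.

    Standard subspace ESPRIT on a Hankel data matrix; the paper's
    perceptually motivated modifications are not reproduced here.
    """
    # Hankel matrix of the signal: its shift structure encodes the poles.
    H = np.array([x[i:i + len(x) - rows + 1] for i in range(rows)])
    U, _, _ = np.linalg.svd(H, full_matrices=False)
    Us = U[:, :order]                        # signal subspace
    # Shift invariance: Us[1:] ~ Us[:-1] @ Psi, eigenvalues of Psi = poles.
    Psi, *_ = np.linalg.lstsq(Us[:-1], Us[1:], rcond=None)
    poles = np.linalg.eigvals(Psi)
    freqs = np.angle(poles) / (2 * np.pi)    # cycles per sample
    decays = -np.log(np.abs(poles))          # nepers per sample
    return freqs, decays

n = np.arange(400)
x = np.exp(-0.005 * n) * np.cos(2 * np.pi * 0.11 * n)   # one damped mode
freqs, decays = esprit_modes(x, order=2)
```

With the frequencies and decays in hand, the remaining magnitude and phase parameters follow from a linear least-squares fit of the modes to the impulse response.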
By applying perceptual criteria, we are able to match room impulse responses using a variable number of modes, with an emphasis on high quality for lower mode counts; this allows the synthesis algorithm to scale to different computational environments.

Abstract
To date, the most successful onset detectors are those based on a frequency representation of the signal. However, for such methods the time between the physical onset and the reported one is unpredictable and may vary largely according to the type of sound being analyzed.
Such variability and unpredictability of spectrum-based onset detectors may not be convenient in some real-time applications. This paper proposes a real-time method to improve the temporal accuracy of state-of-the-art onset detectors. The method is grounded in the theory of hard real-time operating systems, where the result of a task must be reported by a certain deadline. It consists of the combination of a time-based technique (which has a high degree of accuracy in detecting the physical onset time but is more prone to false positives and false negatives) with a spectrum-based technique (which has a high detection accuracy but a low temporal accuracy).
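One way to combine the two detector families can be sketched as follows; this is an illustration of the idea, not the paper's algorithm: spectral flux supplies a reliable but frame-coarse detection, and a time-domain search within the detected frame refines the onset instant.

```python
import numpy as np

def detect_onset(x, frame=1024, hop=512):
    # Spectrum-based stage: spectral flux gives reliable detection,
    # but only frame-level timing.
    starts = range(0, len(x) - frame, hop)
    mags = [np.abs(np.fft.rfft(x[s:s + frame])) for s in starts]
    flux = [np.maximum(m1 - m0, 0).sum() for m0, m1 in zip(mags, mags[1:])]
    coarse = (int(np.argmax(flux)) + 1) * hop

    # Time-based stage: largest sample-to-sample jump inside the frame
    # refines the onset to sample accuracy.
    seg = x[coarse:coarse + frame]
    fine = coarse + int(np.argmax(np.abs(np.diff(seg)))) + 1
    return coarse, fine

# Synthetic test signal: silence, then a decaying tone starting at sample 5000.
x = np.zeros(16384)
m = np.arange(16384 - 5000)
x[5000:] = np.exp(-m / 500.0) * np.sin(0.3 * m)
coarse, fine = detect_onset(x)
```

In a hard-deadline setting, the coarse detection bounds when an answer must be reported, while the refinement shrinks the error from hundreds of samples to a few.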
The book maintains a unique approach to DAFX, with a lecture-style introduction to the basics of effect processing. Each effect description begins with a presentation of the physical and acoustical phenomena and an explanation of the signal processing techniques used to achieve the effect, followed by a discussion of musical applications and the control of effect parameters.
Topics covered include: filters and delays, modulators and demodulators, nonlinear processing, spatial effects, time-segment processing, time-frequency processing, source-filter processing, spectral processing, and time and frequency warping of musical signals. Discussing DAFX at both introductory and advanced levels, the book systematically introduces the reader to digital signal processing concepts, how they can be applied to sound, and their use in musical effects.
This makes the book suitable for a range of professionals, including those working in audio engineering, as well as researchers and engineers involved in the area of digital signal processing, along with students on multimedia-related courses. Subject: computer sound processing.