TAPESTREA : Analysis mechanisms

version: 0.1.x.x (tap tap)

home: http://taps.cs.princeton.edu


Taps lets you extract several types of components from a sound recording:

  • Deterministic events : Sinusoidal or pitched foreground events, like birds or voices
  • Transient events : Brief, noisy foreground events, like claps or crashes (!)
  • Stochastic background : Background noise, din or texture, like wind or street noise
  • Raw templates : Selected time-frequency regions, without additional event or background extraction

Each type of component is analyzed and stored differently using specialized algorithms.


Deterministic Events

Deterministic events are found using sinusoidal analysis. The most prominent frequency components are found in each frame and matched across frames to form continuous sinusoidal tracks (Serra). Tracks can optionally be grouped into events based on harmonic relationships, common frequency and amplitude modulation, and common onset or offset times.
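
As a rough illustration of the peak picking and frame-to-frame matching described above, here is a minimal Python sketch. It is not the Taps implementation: the function names, the greedy nearest-frequency matching rule, and the max_jump_hz threshold are all assumptions made for this example.

```python
def pick_peaks(magnitudes, bin_hz, max_peaks=3):
    """Return up to max_peaks (freq_hz, magnitude) local maxima, strongest first."""
    peaks = [
        (k * bin_hz, magnitudes[k])
        for k in range(1, len(magnitudes) - 1)
        if magnitudes[k] > magnitudes[k - 1] and magnitudes[k] >= magnitudes[k + 1]
    ]
    peaks.sort(key=lambda p: p[1], reverse=True)
    return peaks[:max_peaks]


def match_tracks(frames_of_peaks, max_jump_hz=50.0):
    """Greedily link peaks in consecutive frames into tracks.

    Each track is a list of (frame_index, freq_hz, magnitude).
    max_jump_hz is a hypothetical parameter, not a Taps setting.
    """
    finished, active = [], []
    for i, peaks in enumerate(frames_of_peaks):
        unused = list(peaks)
        still_active = []
        for track in active:
            last_freq = track[-1][1]
            # Closest unclaimed peak in frequency, if any remain.
            best = min(unused, key=lambda p: abs(p[0] - last_freq), default=None)
            if best is not None and abs(best[0] - last_freq) <= max_jump_hz:
                track.append((i, best[0], best[1]))
                unused.remove(best)
                still_active.append(track)
            else:
                finished.append(track)  # the track dies in this frame
        # Peaks that no track claimed start new tracks.
        for freq, mag in unused:
            still_active.append([(i, freq, mag)])
        active = still_active
    return finished + active


# Three toy magnitude spectra (10 Hz per bin): a steady 40 Hz partial
# and a 100 Hz partial that disappears in the last frame.
spectra = [
    [0, 0, 0, 0, 1.0, 0, 0, 0, 0, 0, 0.5, 0],
    [0, 0, 0, 0, 0.9, 0, 0, 0, 0, 0, 0.4, 0],
    [0, 0, 0, 0, 0.8, 0, 0, 0, 0, 0, 0.0, 0],
]
frames = [pick_peaks(s, bin_hz=10.0) for s in spectra]
tracks = match_tracks(frames)
```

In this toy input the 40 Hz partial yields one track spanning all three frames, and the 100 Hz partial yields a shorter track that ends after the second frame. Grouping tracks into events would be a further step on top of this.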

Recordings can also be preprocessed. When run in preprocess mode, Taps writes out the FFT frames for the given sound file, along with all the sinusoidal peaks found in each frame. These preprocessed data files can then be loaded for analysis in regular mode.

The strength of the Taps approach to sinusoidal (and other) analysis lies in the flexibility of the analysis parameters, and in the user interface for manipulating them, which together provide enough control to extract specific events.


Transient Events

Transient events are found by looking for time-domain segments where the signal energy rises suddenly. Two methods are available.

  • Envelope follower filter : The better-tested implementation. It uses a non-linear one-pole filter to obtain an envelope for the given sound. The envelope is then checked for points of rapid increase, which mark possible transient onsets.
  • Energy ratio : This implementation is less supported, but can sometimes find transients missed by the other (and vice versa). "Energy ratio" refers to the ratio of the signal energy in a brief window to the energy in a longer surrounding window. The brief windows with the highest ratios are the most likely to belong to transient events; a single transient can span several brief windows (Verma & Meng).
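
Both detection methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Taps code: the attack and release coefficients, the rise threshold, the window lengths, and all function names are hypothetical choices for this example.

```python
def envelope(samples, attack=0.5, release=0.99):
    """Non-linear one-pole follower.

    The filter is non-linear because it switches coefficient depending on
    whether the rectified input is above or below the current envelope.
    """
    env, out = 0.0, []
    for x in samples:
        level = abs(x)
        coeff = attack if level > env else release
        env = coeff * env + (1.0 - coeff) * level
        out.append(env)
    return out


def onsets_from_envelope(env, rise=0.2):
    """Indices where the envelope grows by more than `rise` in one sample."""
    return [n for n in range(1, len(env)) if env[n] - env[n - 1] > rise]


def energy_ratios(samples, short=4, long=32):
    """Mean energy of each brief window divided by the mean energy
    of a longer window centred on it."""
    ratios = []
    for start in range(0, len(samples) - short + 1, short):
        centre = start + short // 2
        lo = max(0, centre - long // 2)
        hi = min(len(samples), centre + long // 2)
        e_short = sum(x * x for x in samples[start:start + short]) / short
        e_long = sum(x * x for x in samples[lo:hi]) / (hi - lo)
        ratios.append(e_short / e_long if e_long > 0 else 0.0)
    return ratios


# Quiet background with a 4-sample click starting at sample 64.
samples = [0.01] * 64 + [1.0] * 4 + [0.01] * 60

onsets = onsets_from_envelope(envelope(samples))
ratios = energy_ratios(samples)
loudest_window = ratios.index(max(ratios))  # brief window holding the click
```

On this input the first envelope onset falls at sample 64, and the brief window with the highest energy ratio is the one starting at sample 64 (window index 16). On real recordings the two methods can disagree, which is why having both is useful.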

Stochastic Background

The stochastic background is found by removing the detected deterministic and transient events from the original sound.

  • Deterministic events are removed in the frequency domain by smoothing down the magnitudes in and around bins corresponding to peak frequencies.
  • Transient events are replaced by background din that is statistically similar to neighboring transient-free segments, synthesized by a wavelet tree learning algorithm (Dubnov).
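
The frequency-domain removal in the first item can be pictured with the following Python sketch. It assumes one particular way of "smoothing down" a peak: linear interpolation between the magnitudes just outside a small neighbourhood of the peak bin. The actual smoothing used by Taps may differ; suppress_peaks and width are hypothetical names. The wavelet tree step for transients is not sketched here.

```python
def suppress_peaks(magnitudes, peak_bins, width=2):
    """Flatten each peak by interpolating across bins k-width .. k+width.

    Only magnitudes are changed; phases would be left as they are.
    """
    out = list(magnitudes)
    last = len(out) - 1
    for k in peak_bins:
        lo, hi = k - width - 1, k + width + 1
        # Reference magnitudes just outside the neighbourhood,
        # clamped to the ends of the spectrum.
        left = magnitudes[max(lo, 0)]
        right = magnitudes[min(hi, last)]
        for b in range(max(lo + 1, 0), min(hi, last + 1)):
            t = (b - lo) / (hi - lo)
            out[b] = (1.0 - t) * left + t * right
    return out


# A flat noise floor of 1.0 with one sinusoidal peak centred on bin 4.
mags = [1.0, 1.0, 1.0, 2.0, 8.0, 2.0, 1.0, 1.0, 1.0]
residual = suppress_peaks(mags, peak_bins=[4], width=1)
```

Here the peak and its two neighbours are pulled down to the surrounding floor, leaving a flat residual spectrum that stands in for the background.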

Raw Templates

A raw template is the selected frequency range within the selected time region, extracted by bandpass filtering on the FFT.
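
A minimal Python sketch of this kind of extraction follows. It is an illustration only: it uses a naive DFT so that it needs nothing beyond the standard library (a real implementation would use an FFT), it zeroes bins outside the band abruptly, and extract_band and its parameters are hypothetical names rather than anything from Taps.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spec):
    """Inverse transform, returning the real part of each sample."""
    n = len(spec)
    return [
        sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def extract_band(samples, sample_rate, low_hz, high_hz):
    """Keep only the bins whose frequency lies in [low_hz, high_hz]."""
    n = len(samples)
    spec = dft(samples)
    for k in range(n):
        # Bin k and its mirror n-k both carry frequency min(k, n-k) * sr / n.
        freq = min(k, n - k) * sample_rate / n
        if not (low_hz <= freq <= high_hz):
            spec[k] = 0
    return idft(spec)


# One second at 64 Hz: a 4 Hz tone plus a 20 Hz tone.
sr = 64
low_tone = [math.sin(2 * math.pi * 4 * t / sr) for t in range(sr)]
mix = [low_tone[t] + math.sin(2 * math.pi * 20 * t / sr) for t in range(sr)]

# Selecting the time region is just a slice; here the whole second is used.
template = extract_band(mix[0:sr], sr, low_hz=2, high_hz=8)
```

With this input the extracted template matches the 4 Hz tone to within rounding error, since the 20 Hz component falls outside the selected band.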


