Audio signal interpolation using optimal transportation of spectrograms

David Valdivia¹, Marien Renaud², Elsa Cazelles¹˒³, Cédric Févotte¹˒³

¹IRIT, Université de Toulouse     ²IMB, Université de Bordeaux     ³CNRS

In our paper, we present two approaches based on optimal transport (OT) to interpolate between two audio signals, a source and a target. The first method applies exact OT to the normalized spectrograms of the input signals. The second method relies on a cost matrix designed to forbid remote displacement of mass along the temporal domain (horizontal axis), which is made possible by relaxing the marginal constraints with unbalanced optimal transport (UOT). We present here results obtained on musical tones and environmental sounds.
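As a rough illustration of the first approach, the sketch below interpolates between two normalized magnitude spectra with exact OT. It is a simplified one-dimensional version (a single frequency axis rather than a full time-frequency spectrogram), for which the exact plan under a convex cost is the monotone coupling, so no solver is needed. The function names `monotone_coupling` and `displacement_interpolation` are hypothetical, and this is not the implementation used in the paper.

```python
import numpy as np


def monotone_coupling(a, b):
    """Exact 1D OT plan between two histograms on the same ordered grid.

    Assumption: a and b are non-negative and have the same total mass.
    For a convex cost of the bin distance, the optimal plan in 1D is the
    monotone (north-west corner) coupling built here.
    """
    a = np.asarray(a, dtype=float).copy()
    b = np.asarray(b, dtype=float).copy()
    plan = np.zeros((a.size, b.size))
    i = j = 0
    while i < a.size and j < b.size:
        moved = min(a[i], b[j])
        plan[i, j] += moved
        a[i] -= moved
        b[j] -= moved
        # At least one of the two bins is now empty, so the loop always advances.
        source_empty = a[i] <= 0.0
        target_empty = b[j] <= 0.0
        if source_empty:
            i += 1
        if target_empty:
            j += 1
    return plan


def displacement_interpolation(a, b, t):
    """Move each piece of mass a fraction t of the way from source to target bin."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.size != b.size:
        raise ValueError("this sketch assumes both spectra share the same grid")
    plan = monotone_coupling(a, b)
    src, dst = np.nonzero(plan)
    mass = plan[src, dst]
    position = (1.0 - t) * src + t * dst
    # Split the mass linearly between the two nearest bins.
    low = np.floor(position).astype(int)
    frac = position - low
    high = np.minimum(low + 1, a.size - 1)
    out = np.zeros(a.size)
    np.add.at(out, low, mass * (1.0 - frac))
    np.add.at(out, high, mass * frac)
    return out


# Toy spectra: one peak at bin 2 (source) and one at bin 6 (target).
source = np.zeros(9)
source[2] = 1.0
target = np.zeros(9)
target[6] = 1.0
halfway = displacement_interpolation(source, target, 0.5)  # peak moves to bin 4
```

Unlike a linear cross-fade, which would keep two peaks of half height at bins 2 and 6, the transport-based interpolation slides the single peak along the frequency axis.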

The code is available on our GitHub page.

How to use: Results for the different experiments are given below. Select an experiment by clicking on it in the table of contents. The Go Top button brings you back here.

Each experiment is presented as a 2x2 grid with spectrograms and an audio player. The top row of each grid shows the source and target signals. The bottom row shows the results for exact OT (left) and the results obtained with the structured cost matrix and UOT (right). You can vary the parameters with the sliders.

Results:

C3-G3: Piano/Guitar interpolation

A simple case with two musical notes. Exact OT fails to preserve the rhythm, whereas using the structured cost matrix with UOT avoids this problem.
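To make the idea of a structured cost matrix concrete, here is a minimal sketch of one possible construction: each spectrogram bin is indexed by a (time frame, frequency bin) pair, and any move that shifts mass by more than `max_shift` frames is given a prohibitive cost. The function `structured_cost`, its parameters, and the squared frequency distance used inside the allowed band are assumptions made for illustration. The exact cost used in the paper is not stated on this page.

```python
import numpy as np


def structured_cost(n_frames, n_freqs, max_shift=0, forbidden=np.inf):
    """Cost between all pairs of time-frequency bins of an n_frames x n_freqs grid.

    Hypothetical construction: squared frequency distance when the two bins
    are at most max_shift frames apart, and `forbidden` otherwise.
    Bins are flattened in row-major order: index = frame * n_freqs + freq.
    """
    frames, freqs = np.meshgrid(
        np.arange(n_frames), np.arange(n_freqs), indexing="ij"
    )
    frames = frames.ravel()
    freqs = freqs.ravel()
    frame_gap = np.abs(frames[:, None] - frames[None, :])
    freq_gap = freqs[:, None] - freqs[None, :]
    cost = freq_gap.astype(float) ** 2
    # Forbid any displacement that is too remote along the time axis.
    cost[frame_gap > max_shift] = forbidden
    return cost


# 3 frames, 4 frequency bins: mass may only move within its own frame.
cost = structured_cost(n_frames=3, n_freqs=4, max_shift=0)
```

With `max_shift=0`, a plan of finite cost that satisfies both marginals exactly exists only if every frame carries the same mass in the source and in the target, which is one way to see why the marginal constraints need to be relaxed. Solvers that cannot handle infinite entries would need a large finite value passed as `forbidden` instead.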

[Figure: 2x2 grid of spectrograms with audio player. Top row: Source, Target. Bottom row: Exact OT, Structured cost matrix + UOT.]

Cicada chirp / Water flow

Interpolation between two more textured sounds. The result sounds in between a cicada chirp and the flow of water.

[Figure: 2x2 grid of spectrograms with audio player. Top row: Source, Target. Bottom row: Exact OT, Structured cost matrix + UOT.]

Clean vs Overdriven guitar

Interpolation between a guitar note played without and with a distortion effect. Intuitively, varying the interpolation parameter should produce more or less distortion, but the results do not support this idea.

[Figure: 2x2 grid of spectrograms with audio player. Top row: Source, Target. Bottom row: Exact OT, Structured cost matrix + UOT.]

Piano: from A3 to C3

Interpolation between two notes, A3 and C3, played on a piano. Intuitively, varying the interpolation parameter should generate notes between A3 and C3, but the results do not support this idea.

[Figure: 2x2 grid of spectrograms with audio player. Top row: Source, Target. Bottom row: Exact OT, Structured cost matrix + UOT.]

Piano / Cicada chirp

An example of interpolation between a musical note and a textured sound.

[Figure: 2x2 grid of spectrograms with audio player. Top row: Source, Target. Bottom row: Exact OT, Structured cost matrix + UOT.]

Piano / Water flow

Another example of interpolation between a musical note and a textured sound.

[Figure: 2x2 grid of spectrograms with audio player. Top row: Source, Target. Bottom row: Exact OT, Structured cost matrix + UOT.]

Piano / forte

The input signals are the same piano note played with more or less intensity. Intuitively, one would expect the intensity of the interpolated sounds to vary accordingly. However, the results do not support this.

[Figure: 2x2 grid of spectrograms with audio player. Top row: Source, Target. Bottom row: Exact OT, Structured cost matrix + UOT.]

Piano / Water drop

We interpolate here between a piano note and the sound of a water drop.

[Figure: 2x2 grid of spectrograms with audio player. Top row: Source, Target. Bottom row: Exact OT, Structured cost matrix + UOT.]