Audio signal interpolation using optimal transportation of spectrograms

David Valdivia¹ Marien Renaud² Elsa Cazelles¹,³ Cédric Févotte¹,³

¹IRIT, Université de Toulouse ²IMB, Université de Bordeaux ³CNRS

In our paper, we present two approaches based on optimal transport (OT) to interpolate between two audio signals (a source and a target). The first method applies exact OT to the normalized spectrograms of the input signals. The second method relies on a cost matrix designed to forbid remote displacement of mass along the temporal axis (the horizontal axis of the spectrogram). This is made possible by relaxing the marginal constraints with unbalanced optimal transport (UOT). This page presents results obtained on musical tones and environmental sounds.
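As an illustration of the second approach, the sketch below builds a cost matrix over time-frequency bins that assigns an infinite cost to any move spanning more than a given number of time frames. The squared-distance ground cost, the helper name `structured_cost`, and the `max_shift` parameter are assumptions made for this example; the paper defines the cost actually used.

```python
import math


def structured_cost(n_freq, n_time, max_shift):
    """Cost between flattened time-frequency bins (hypothetical helper).

    Bin (f, t) is stored at index f * n_time + t. Moves spanning more than
    `max_shift` time frames keep an infinite cost, so a transport plan with
    finite total cost cannot use them.
    """
    n_bins = n_freq * n_time
    cost = [[math.inf] * n_bins for _ in range(n_bins)]
    for f1 in range(n_freq):
        for t1 in range(n_time):
            for f2 in range(n_freq):
                for t2 in range(n_time):
                    if abs(t1 - t2) <= max_shift:
                        # Assumed ground cost: squared distance on the grid.
                        cost[f1 * n_time + t1][f2 * n_time + t2] = float(
                            (f1 - f2) ** 2 + (t1 - t2) ** 2
                        )
    return cost


C = structured_cost(n_freq=3, n_time=4, max_shift=1)
print(C[0][1])  # same frequency, one frame apart -> 1.0
print(C[0][3])  # same frequency, three frames apart -> inf
```

With balanced (exact) OT, such a cost can make the problem infeasible when source and target energy sit in distant frames, which is one way to see why the marginal constraints need to be relaxed.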

The code is available on our GitHub page.

How to use: Below you can find results for different experiments. Select an experiment by clicking on the table of contents below. There's a Go Top button below to come back here.

Each experiment consists of a 2x2 grid with spectrograms and an audio player. The top row of each grid shows the source and target signals. The bottom left shows the results for exact OT, and the bottom right shows the results obtained with the structured cost matrix and UOT. You can vary the parameters with the following sliders:

Interpolation parameter: slides between the source and the target. Values close to 0% are close to the source and values close to 100% are close to the target.
UOT hyperparameter (only for UOT): marginal constraint relaxation parameter.
Time-limiting parameter (only for UOT): sets how many time frames away mass is allowed to move.
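To give a feel for what the interpolation parameter does, here is a minimal one-dimensional sketch, applied to a single toy spectrum rather than to a full spectrogram. It uses the fact that, on a line with a convex cost, an exact OT plan is given by the monotone coupling, and then moves each piece of mass a fraction `t` of the way to its destination (`t = 0` for 0%, `t = 1` for 100%). The function names and the toy spectra are hypothetical, and this is not the authors' implementation.

```python
import math


def normalize(hist):
    """Scale a non-negative histogram so that it sums to one."""
    total = sum(hist)
    return [h / total for h in hist]


def ot_plan_1d(a, b, tol=1e-12):
    """Monotone coupling between two 1D histograms of equal total mass.

    Returns a list of (source_bin, target_bin, mass) triples.
    """
    a, b = list(a), list(b)
    plan, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        m = min(a[i], b[j])
        if m > 0:
            plan.append((i, j, m))
        a[i] -= m
        b[j] -= m
        if a[i] <= tol:
            i += 1  # source bin exhausted
        else:
            j += 1  # target bin filled
    return plan


def interpolate(plan, n_bins, t):
    """Displacement interpolation: move each mass a fraction t of its way."""
    out = [0.0] * n_bins
    for i, j, m in plan:
        pos = (1 - t) * i + t * j
        lo = int(math.floor(pos))
        frac = pos - lo
        out[lo] += m * (1 - frac)
        if frac > 0:
            out[lo + 1] += m * frac  # split mass between neighbouring bins
    return out


src = normalize([0, 3, 1, 0, 0, 0, 0])  # toy spectrum with a low peak
tgt = normalize([0, 0, 0, 0, 0, 3, 1])  # same shape, shifted up by 4 bins
plan = ot_plan_1d(src, tgt)
mid = interpolate(plan, n_bins=7, t=0.5)
print(mid)  # [0.0, 0.0, 0.0, 0.75, 0.25, 0.0, 0.0]
```

At 50% the peak sits halfway between its source and target positions, whereas a plain linear mix of the two spectra would show two half-height peaks.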

Results:

C3-G3: Piano/Guitar interpolation

Simple case with two musical notes. While exact OT fails to preserve the rhythm, using the structured cost matrix and UOT circumvents this.
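The effect of the two UOT sliders can be sketched on a toy temporal envelope. The example below uses entropic unbalanced OT with KL marginal penalties, solved by Sinkhorn-like scaling iterations; this is a stand-in chosen for brevity, and the solver used in the paper may differ. Here `rho` plays the role of the UOT hyperparameter and `max_shift` the role of the time-limiting parameter; both names, as well as `eps` and the toy envelopes, are assumptions.

```python
import math


def banded_time_cost(n_frames, max_shift):
    """Squared time distance, infinite beyond `max_shift` frames (assumed)."""
    return [
        [float((i - j) ** 2) if abs(i - j) <= max_shift else math.inf
         for j in range(n_frames)]
        for i in range(n_frames)
    ]


def uot_sinkhorn(a, b, cost, eps, rho, n_iter=300):
    """Entropic unbalanced OT with KL marginal penalties (scaling iterations)."""
    n, m = len(a), len(b)
    # exp(-inf) evaluates to 0.0, so forbidden moves get a zero kernel entry.
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    power = rho / (rho + eps)
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        for i in range(n):
            kv = sum(kernel[i][j] * v[j] for j in range(m))
            u[i] = (a[i] / kv) ** power if a[i] > 0 and kv > 0 else 0.0
        for j in range(m):
            ktu = sum(kernel[i][j] * u[i] for i in range(n))
            v[j] = (b[j] / ktu) ** power if b[j] > 0 and ktu > 0 else 0.0
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def total_mass(plan):
    return sum(sum(row) for row in plan)


# Toy envelopes: two onsets each, shifted by one frame between the signals.
source = [0.6, 0.0, 0.0, 0.0, 0.0, 0.4]
target = [0.0, 0.6, 0.0, 0.0, 0.4, 0.0]
cost = banded_time_cost(6, max_shift=1)

tight = uot_sinkhorn(source, target, cost, eps=0.5, rho=10.0)
loose = uot_sinkhorn(source, target, cost, eps=0.5, rho=0.5)

print(tight[0][1] > 0.0)   # neighbouring frames can exchange mass
print(tight[0][4] == 0.0)  # distant frames cannot
print(total_mass(loose) < total_mass(tight) < 1.0)
```

In this toy setting each onset only exchanges mass with the neighbouring onset, and a smaller `rho` lets the plan transport less of the input mass. The values of `eps`, `rho`, and `n_iter` are arbitrary and would need tuning on real spectrograms.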

Source

Target

Exact OT

Interpolation:

Structured cost matrix + UOT

Interpolation:

UOT hyperparameter:

Time-limiting:

Cicada chirp / Water flow

Interpolation between more textured sounds. The interpolation yields a sound in between a cicada and the flow of water.

Source

Target

Exact OT

Interpolation:

Structured cost matrix + UOT

Interpolation:

UOT hyperparameter:

Time-limiting:

Clean vs Overdriven guitar

Interpolation between a guitar note with and without a distortion effect. Intuitively, one would expect varying the interpolation parameter to produce more or less distortion, but the results do not support this.

Source

Target

Exact OT

Interpolation:

Structured cost matrix + UOT

Interpolation:

UOT hyperparameter:

Time-limiting:

Piano: from A3 to C3

Interpolation between two notes, A3 and C3, played on a piano. Intuitively, one would expect the interpolation parameter to generate notes between A3 and C3, but the results do not support this.
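To make this intuition concrete, the short sketch below computes where the fundamental would land if it moved linearly in frequency between the two notes. It assumes twelve-tone equal temperament with A4 = 440 Hz and a linear frequency axis in Hz; both are assumptions made for illustration, and the helper names are hypothetical.

```python
import math

A4_HZ = 440.0  # assumed tuning reference
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_to_hz(name, octave):
    """Equal-temperament frequency of a note (MIDI convention: A4 = 69)."""
    midi = 12 * (octave + 1) + NOTE_NAMES.index(name)
    return A4_HZ * 2 ** ((midi - 69) / 12)


def nearest_note(freq_hz):
    """Name of the equal-temperament note closest to a frequency."""
    midi = round(69 + 12 * math.log2(freq_hz / A4_HZ))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


a3 = note_to_hz("A", 3)  # 220 Hz
c3 = note_to_hz("C", 3)  # about 130.81 Hz
for t in (0.0, 0.5, 1.0):
    f0 = (1 - t) * a3 + t * c3  # fundamental moved linearly in Hz
    print(f"{int(t * 100):3d}%  {f0:7.2f} Hz  nearest note: {nearest_note(f0)}")
```

Under these assumptions, the 50% point would fall near F3, which is the kind of intermediate note the intuition predicts.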

Source

Target

Exact OT

Interpolation:

Structured cost matrix + UOT

Interpolation:

UOT hyperparameter:

Time-limiting:

Piano / Cicada chirp

An example of interpolation between a musical note and a textured sound.

Source

Target

Exact OT

Interpolation:

Structured cost matrix + UOT

Interpolation:

UOT hyperparameter:

Time-limiting:

Piano / Water flow

Another example of interpolation between a musical note and a textured sound.

Source

Target

Exact OT

Interpolation:

Structured cost matrix + UOT

Interpolation:

UOT hyperparameter:

Time-limiting:

Piano / forte

The input signals are the same piano note played with more or less intensity. Intuitively, one would like to have more or less intensity in the resulting interpolations. However, the results don't support this.

Source

Target

Exact OT

Interpolation:

Structured cost matrix + UOT

Interpolation:

UOT hyperparameter:

Time-limiting:

Piano / Water drop

We interpolate here between a piano note and the sound of a water drop.

Source

Target

Exact OT

Interpolation:

Structured cost matrix + UOT

Interpolation:

UOT hyperparameter:

Time-limiting:
