In our paper, we presented two approaches based on optimal transport (OT) to interpolate between two audio signals (source and target). The first method applies exact OT to the normalized spectrograms of the input signals. The second method relies on a cost matrix designed to forbid remote displacement of mass in the temporal domain (horizontal axis). This is made possible by relaxing the marginal constraints with unbalanced optimal transport (UOT). We present here results obtained on musical tones and environmental sounds.
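To make the second method more concrete, here is a minimal sketch of a cost matrix that forbids transport between distant time frames, paired with a generic entropic unbalanced Sinkhorn iteration (KL-relaxed marginals). It is an illustration under stated assumptions, not the implementation from the paper: the squared frequency-bin cost, the `max_shift` threshold, the (time, frequency) spectrogram layout, the entropic solver, and the names `structured_cost`, `unbalanced_sinkhorn`, `eps` and `rho` are all hypothetical choices made for this example.

```python
import numpy as np

def structured_cost(n_time, n_freq, max_shift=1):
    """Cost between flattened time-frequency bins (assumed form, not the paper's).

    Assumes a spectrogram of shape (n_time, n_freq) flattened row-major,
    so that flat index = t * n_freq + f.
    """
    t = np.repeat(np.arange(n_time), n_freq)  # time frame of each flat bin
    f = np.tile(np.arange(n_freq), n_time)    # frequency bin of each flat bin
    # Moving mass along the frequency axis costs the squared bin distance.
    cost = (f[:, None] - f[None, :]).astype(float) ** 2
    # Moving mass further than max_shift frames in time is forbidden.
    too_far = np.abs(t[:, None] - t[None, :]) > max_shift
    cost[too_far] = np.inf
    return cost

def unbalanced_sinkhorn(a, b, cost, eps=1.0, rho=1.0, n_iter=200):
    """Entropic UOT with KL-relaxed marginals (generic scaling iterations)."""
    K = np.exp(-cost / eps)        # forbidden entries become exactly 0
    power = rho / (rho + eps)      # exponent that softens the marginal constraints
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        # The floor avoids division by zero when a column receives no mass.
        u = (a / np.maximum(K @ v, 1e-300)) ** power
        v = (b / np.maximum(K.T @ u, 1e-300)) ** power
    return u[:, None] * K * v[None, :]  # transport plan

# Toy usage on random "spectrograms" normalized to unit mass.
rng = np.random.default_rng(0)
src = rng.random((4, 3))
src /= src.sum()
tgt = rng.random((4, 3))
tgt /= tgt.sum()
C = structured_cost(4, 3, max_shift=1)
plan = unbalanced_sinkhorn(src.ravel(), tgt.ravel(), C)
```

Because the kernel is zero wherever the cost is infinite, the resulting plan never moves mass across more than `max_shift` frames. With balanced OT such a constraint can make the problem infeasible when the two signals distribute their energy differently over time, which is why the marginal constraints are relaxed.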
The code is available on our GitHub page.
How to use: Below you can find results for different experiments. Select an experiment by clicking on its entry in the table of contents below. The Go Top button below brings you back here.
Each experiment is composed of a 2x2 grid with spectrograms and an audio player. At the top of each grid, you will find the source and target signals. On the bottom left, you will find results for exact OT, and on the bottom right, results obtained with the structured cost matrix and UOT. You can vary the parameters with the following sliders:
Results:
Simple case with two musical notes. While exact OT fails to preserve the rhythm, the structured cost matrix combined with UOT avoids this problem.
Interpolation between more textured sounds. The interpolation yields a sound in between a cicada and the flow of water.
Interpolation between a guitar note with and without a distortion effect. Intuitively, varying the interpolation parameter should produce more or less distortion, but the results do not support this idea.
Interpolation between two notes, A3 and C3, played on a piano. Intuitively, varying the interpolation parameter should generate notes between A3 and C3, but the results do not support this idea.
An example of interpolation between a musical note and a textured sound.
Another example of interpolation between a musical note and a textured sound.
The input signals are the same piano note played with more or less intensity. Intuitively, one would expect the interpolations to vary in intensity accordingly. However, the results do not support this.
We interpolate here between a piano note and the sound of a water drop.