"Mixing in latent space"

Implementation using two classes of 2-D "dots" as proxies for audio stems. The sum of the stems x_i appears in the bottom left in green as the "mix". In the middle column, we apply some nonlinear twisting and leveling to the "dots" in the left column. In the bottom right, the sums of the embeddings (purple shapes) lie right on top of the embeddings of the mixes (green shapes). Finally, the yellow dots in the bottom middle, which cover the green dots, confirm that we have learned an invertible mapping.
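
The property checked in the bottom-right panel, that the embedding of the mix coincides with the sum of the stem embeddings, can be measured numerically. The sketch below is illustrative only and is not the model from the paper: `linear_embed` and `twist_embed` are hypothetical stand-in encoders, and `mix_error` is a made-up helper name. It shows that a linear map satisfies the property automatically, while an arbitrary nonlinear twist does not, so a nonlinear encoder would presumably need to be trained toward it.

```python
import math

def linear_embed(p):
    """Hypothetical linear encoder: a fixed 2x2 matrix applied to a 2-D point."""
    x, y = p
    return (0.8 * x - 0.3 * y, 0.5 * x + 1.1 * y)

def twist_embed(p):
    """Hypothetical nonlinear encoder: rotate by an angle that grows with radius."""
    x, y = p
    angle = 0.5 * math.hypot(x, y)
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

def add(a, b):
    """Element-wise sum of two 2-D points (the 'mix' of two stems)."""
    return (a[0] + b[0], a[1] + b[1])

def mix_error(embed, stem_a, stem_b):
    """Distance between the embedding of the mix and the sum of the stem embeddings."""
    mix_embedding = embed(add(stem_a, stem_b))
    sum_of_embeddings = add(embed(stem_a), embed(stem_b))
    return math.dist(mix_embedding, sum_of_embeddings)

# Two arbitrary 2-D "dots" standing in for audio stems.
stem_a, stem_b = (1.0, 0.5), (-0.4, 2.0)
linear_error = mix_error(linear_embed, stem_a, stem_b)
twist_error = mix_error(twist_embed, stem_a, stem_b)
print(f"linear map error: {linear_error:.2e}")
print(f"twist map error:  {twist_error:.2e}")
```

An error near zero corresponds to the purple shapes lying on top of the green shapes in the figure; a large error would show them visibly apart.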

Abstract

We investigate the construction of latent spaces through self-supervised learning to support semantically meaningful operations. Analogous to operational amplifiers, these "operational latent spaces" (OpLaS) not only demonstrate semantic structure such as clustering but also support common transformational operations with inherent semantic meaning. Some operational latent spaces are found to have arisen "unintentionally" in the progress toward some (other) self-supervised learning objective, in which unintended but still useful properties are discovered among the relationships of points in the space. Other spaces may be constructed "intentionally" by developers stipulating certain kinds of clustering or transformations intended to produce the desired structure. We focus on the intentional creation of operational latent spaces via self-supervised learning, including the introduction of rotation operators via a novel "FiLMR" layer, which can be used to enable ring-like symmetries found in some musical constructions.

BibTeX


        @article{HawleyTackett2024_OpLaS,
          Author = {Scott H. Hawley and Austin R. Tackett},
          Journal = {Journal of Audio Engineering Society},
          Title = {Operational Latent Spaces},
          Month = {June},
          Year = {2024}
        }