The Spectral Parrot

The Spectral Parrot is a self-actuating (feedback) instrument with 8 strings, each of which is controlled by a motorized tuner. The motors are controlled by a deep reinforcement learning algorithm that learns to tune the instrument's strings to approximate a target audio spectrum.

Photo by Victoria Alexandrova

Many deep learning networks are pre-trained before being released into the wild. This training often (at least for the large language models and image generators many are familiar with) happens in a purely digital space and is supported by thousands of powerful GPUs, so training is massively accelerated. In contrast, because the Parrot is bound by physical reality, with motors taking time to turn and acoustic sound taking time to manifest, learning is very slow. An ambitious estimate is that the Parrot would need 20-40 hours to learn to approximate even a single static sound. What it would take to learn to model a dynamically shifting spectrum is an open question.

Importantly, the Parrot is not necessarily less efficient than accelerated training infrastructures; it is just slower.

The following video is from the instrument's debut performance at Radialsystem, Berlin, on November 15th, 2025:

Deep Reinforcement Learning

Deep reinforcement learning (DRL) is a subset of the field of deep learning, which relies on neural networks with one or more layers between their input and output layers (it is in this sense that they are considered deep). In traditional reinforcement learning, an agent becomes increasingly proficient at interacting with an environment by learning a policy, that is, a strategy for how to engage with tasks in that environment. When combined with a deep neural network, reinforcement learning becomes a very powerful approach, especially when the agent cannot know in advance what it will encounter in its environment, as is the case when it plays alongside an improvising musician. The Spectral Parrot uses a proximal policy optimization (PPO) variant of DRL. Compared to earlier DRL algorithms, PPO has a safeguard against overly drastic policy updates, which can destabilize learning. Additionally, the network is composed of two components: an actor that takes actions (tunes the strings) and a critic that evaluates the actor's actions, helping to optimize and stabilize training.
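To make the PPO safeguard and the actor/critic split more concrete, here is a minimal sketch in Python. It is not taken from the Spectral Parrot code. The function names, the clipping range epsilon = 0.2, and the discount factor gamma = 0.99 are assumptions chosen for illustration (both values are common defaults, not the Parrot's settings). The first function shows how PPO limits how much a single update can reward a change in the policy. The second shows one simple way a critic's value predictions can be turned into a judgment of the actor's action.

```python
def clipped_surrogate(ratio: float, advantage: float, epsilon: float = 0.2) -> float:
    """PPO clipped objective for a single action (illustrative sketch).

    ratio:     probability of the action under the new policy divided by
               its probability under the old policy.
    advantage: how much better the action was than the critic expected.
    epsilon:   assumed clipping range; 0.2 is a common default.
    """
    # Keep the ratio inside [1 - epsilon, 1 + epsilon].
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Taking the smaller of the two terms removes any incentive to move
    # the policy further than the clipping range allows.
    return min(ratio * advantage, clipped_ratio * advantage)


def one_step_advantage(reward: float, value_now: float, value_next: float,
                       gamma: float = 0.99) -> float:
    """One-step advantage estimate built from the critic's value predictions.

    gamma is an assumed discount factor for future reward.
    """
    return reward + gamma * value_next - value_now


if __name__ == "__main__":
    # A good action (positive advantage) that the new policy makes much more likely.
    print(clipped_surrogate(ratio=1.5, advantage=1.0))
    # A small policy change passes through unclipped.
    print(clipped_surrogate(ratio=1.1, advantage=1.0))
    # The critic expected 0.5 but the action earned a reward of 1.0.
    print(one_step_advantage(reward=1.0, value_now=0.5, value_next=0.0))
```

In the first call, the new policy is 1.5 times as likely to take the action, but the objective is capped at about 1.2 times the advantage. Pushing the policy further in that direction earns nothing extra within the same update, which is what keeps the updates from becoming too drastic.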

Van magazine has kindly allowed me to publish the original English text of an interview, first published in German, about the work I did with the Parrot during the STIP-4 residency awarded by Musikfonds in 2024-2025.

All code and build instructions are available at https://github.com/Adampultz/spectral-parrot

Thank you to Sukandar Kartadinata for consultations around motors, building a resonator, and general encouragement. Thank you to David Pirró and Gerhard Eckel at the Institute of Electronic Music and Acoustics (IEM) at Kunstuniversität Graz for hosting me on a month-long residency in the spring. Thank you to Koda Kultur, Dansk Komponistforening, and Culture Moves Europe for additional infrastructure support.