And henceforth, sound!
Scheme for my 21M.380 Final Project: Musical Pictures
Tools to generate a 3:30 minute composition given a photograph
The composition and the image will be divided into n sections to encourage diversity throughout the piece. To create a MusicalPicture, just construct a MusicalPicture object with two parameters:
MusicalPicture(".png/.jpg/.bmp", number_of_sections)
These slices will be vertical (they will divide the width of the image), and are intended to be performed from left to right, although they can be performed in any order.
Each section of the piece will pull from its own pool of pitches, note durations, and velocities, and will have its own synthesizer instrument.
The information for each section can be written to a .txt file that is formatted as an odot bundle in order to easily be copied into Max/MSP.
A one-dimensional row of information, or a 1D signal, can be generated in many ways from an image. For example, any cross section of the image can be viewed as a 1D signal:
Examples of waveforms taken from the 50th and 1000th rows of an image, respectively (notice how different they can be)
For the composition, I plan to average the pixels across the columns of a section (converted to grayscale) in order to create a 1D signal. There is a tool included, MusicalPicture.generate_single_wav, that will create one period of a signal from the section and save it as a .wav file. This file can be imported into Max/MSP and used as a custom waveform in the cycle~ object.
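As a rough illustration of what that tool does, here is a minimal sketch in Python, assuming Pillow, NumPy, and SciPy; the function name and parameters are illustrative, not the actual MusicalPicture API:

import numpy as np
from PIL import Image
from scipy.io import wavfile

def section_to_wave(image_path, section_index, num_sections, out_path, sample_rate=44100):
    # Load the image as grayscale so each pixel is a single value.
    gray = np.asarray(Image.open(image_path).convert("L"), dtype=float)
    height, width = gray.shape
    # Slice out one vertical strip of the image.
    start = section_index * width // num_sections
    stop = (section_index + 1) * width // num_sections
    section = gray[:, start:stop]
    # Average across the columns so each row collapses to one value (a 1D signal).
    profile = section.mean(axis=1)
    # Center and normalize to [-1, 1] so it can serve as one period of a waveform.
    profile = profile - profile.mean()
    peak = np.abs(profile).max()
    if peak > 0:
        profile = profile / peak
    wavfile.write(out_path, sample_rate, profile.astype(np.float32))

section_to_wave("photo.jpg", 0, 5, "section0_wave.wav")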
Each section will pull from a default of 7 pitches representing the 7 most dominant hues in the section of the image. I found the most dominant colors in the HSV colorspace by using K-Means Clustering.
Example of the most dominant colors from each section of an image
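For reference, here is one way that clustering step could look in Python. I am assuming scikit-learn and Pillow here; the actual tool may use a different k-means implementation or HSV conversion:

import numpy as np
from sklearn.cluster import KMeans

def dominant_hsv_colors(section_image, k=7):
    # section_image is assumed to be an RGB PIL Image of one vertical strip.
    # Convert to HSV so the clusters group by hue, saturation, and value.
    hsv = np.asarray(section_image.convert("HSV"), dtype=float)
    pixels = hsv.reshape(-1, 3)
    km = KMeans(n_clusters=k, n_init=10).fit(pixels)
    # Count how many pixels fall into each cluster to measure dominance.
    counts = np.bincount(km.labels_, minlength=k)
    order = np.argsort(counts)[::-1]
    return km.cluster_centers_[order], counts[order] / counts.sum()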
Now that the colors are extracted, they need to be converted into pitches (in Hz). Drawing inspiration from this article, the tool finds the closest western chromatic pitch associated with the hue of the color, and shifts it up an octave depending on how saturated the color is.
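The exact mapping lives inside the tool, but the general idea can be sketched like this; the hue-to-light-frequency mapping and the saturation threshold below are my own assumptions, not necessarily the numbers the article or the tool uses:

import math

def hue_to_pitch_hz(hue_deg, saturation, a4=440.0):
    # Treat the hue as a frequency of visible light (roughly 400-750 THz)...
    light_hz = 400e12 + (hue_deg / 360.0) * 350e12
    # ...and drop it about 40 octaves so it lands in the audible range.
    audio_hz = light_hz / 2 ** 40
    # Snap to the nearest pitch on the 12-tone equal-tempered chromatic scale.
    semitones = round(12 * math.log2(audio_hz / a4))
    pitch = a4 * 2 ** (semitones / 12)
    # More saturated colors get bumped up an octave (threshold is an assumption).
    if saturation > 0.5:
        pitch *= 2
    return pitch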
The tool will also record how frequently each color appears; these weights are fed into o.random.weighted so that pitches from more dominant colors are picked more often.
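On the Max side the weighted choice happens in o.random.weighted; for testing outside Max, the same idea can be sketched in Python with random.choices (the pitch and weight values below are made up for illustration):

import random

pitches = [370.0, 415.3, 466.2, 523.3, 554.4, 622.3, 740.0]   # Hz, example pool
weights = [0.31, 0.22, 0.15, 0.12, 0.09, 0.07, 0.04]          # how dominant each color is

# Pitches tied to more dominant colors are drawn more often.
next_pitch = random.choices(pitches, weights=weights, k=1)[0]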
For my final project, I decided to explore the pure sonification of a photograph. While sound is a one-dimensional signal, pictures can be represented in two or three dimensions, depending on the type of color data available. Working with a colored image, I had many degrees of freedom in how I chose to sonify each component. The further I went, the more I realized that a "pure" sonification does not really exist when manipulating data in this way. However, I did try to limit my creative decisions to ones focused on the image data.
The general scheme for the project changed over time, but eventually settled on a few aspects of the image. First, the image is divided into 5 vertical strips (this can be any number, but I decided on 5). Each section of the image represents a section of the piece, read from left to right. To compose a 3.5 minute piece, I specified each section to last 42 seconds. There are two main areas of data I extracted from each section:
The main synthesizer I used was built around a custom waveform in the cycle~ object. This custom waveform was generated directly from the shape of the image data. Using Python, I converted the image sections to grayscale, meaning each pixel has only one value. I then averaged the grayscale values along the rows, so the result was a 1D array of numbers. These numbers represented a wave shape that I could save as a .wav file and import into Max. The differences in the timbre of the synthesizers from section to section are subtle, but can definitely be heard, which was exciting.
Once the instruments were designed, I decided to structure the melody of the piece by choosing from a random pool of pitches and durations. The pool was chosen from the image segments by looking at the color information in each one. I started by finding the 7 most frequently appearing colors in the segment, using k-means clustering in Python. Instead of leaving the colors encoded as RGB, I transformed them into the Hue Saturation Value (HSV) colorspace (displayed below), which I thought would be more useful. I wanted the hue to inspire the pitches that were used, so I converted each hue to the note on the western chromatic scale that holds the same frequency as the color, but roughly 40 octaves down! It was really interesting researching various ways to associate colors and sound, but this way seemed the most straightforward.
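As a quick sanity check on that figure: visible light sits roughly between 4 x 10^14 and 7.5 x 10^14 Hz, and dividing by 2^40 (about 1.1 x 10^12) brings that band down to roughly 360-680 Hz, comfortably inside the audible range, so a drop of about 40 octaves really is what it takes to turn a color's frequency into a pitch.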
I used the "value" information (or basically, how dark the colors were) to generate the durations. This meant that brighter sections would have more quickly changing notes, which you can notice around the middle of the piece, where there is the bright sun in the center. The notes move faster, and I also increased the volume when the overall brightness of the section was higher.
The "saturation" information (or how grey a color is) was only used to increase pitches by an octave if the colors were more saturated. This was simply to add a bit more range to the pool of pitches. All of the color data I exported to a text file I could copy and paste into an odot bundle, and I did most of the conversions using odot scripting.
The second major topic from the class I decided to implement was randomness. Once the pool of notes was generated, I used the frequency with which the colors appear in a weighted random selector to choose which note would play and for how long. For example, if there is a lot of red in one segment of the image, you hear the pitch F more frequently than the other notes in the pool; if it is a darker red, the note durations are often much longer.
In order to add more texture, I cycled through the pool of pitches in each segment, dropped them down by 2 octaves, and fed them into a droning synthesizer that played underneath the melody. This made the piece much easier to listen to, and I ended up liking the result enough to keep it in the piece.
When I started this project, I had no idea what the final result would sound like, and I let the image shape the result for the most part. I did not expect the result to resemble an "aesthetic" representation of the image. In fact, the piece sounds almost sinister and creepy, whereas the picture looks quite happy and hopeful! What I would try to demonstrate with this piece is that data does not always represent the beauty in something. However, there are infinitely many other ways I could have sonified the image that might have produced a less dissonant result; this is just how my manipulation of the parameters worked out.
Had I had more time, I would have tried to find a way to make the sections of the image less discrete and more continuous. Then the piece would progress without the listener really noticing the changes happening until perhaps the end, when the tone becomes quite different. I would also have liked to tie the note envelopes to the photo information, so that there is more variety in the note shapes. Finally, I would love to run my code on lots of other images, especially ones with a wider variety of colors. It would be really interesting to see what sorts of musical moods could be created from different types of images, and what patterns emerge.