This past December I attended the Machine Learning for Creativity and Design workshop at NeurIPS 2018 in Montreal. NeurIPS (formerly NIPS) is the Conference on Neural Information Processing Systems, a field that today is often simply referred to as "AI". It's one of the most popular conferences in the space, attracting thousands of submissions and attendees each year.
Applying AI to creativity and design might seem counter-intuitive at first. Wasn't creativity supposed to be the last bastion of humanity - the one skill that humans would always be better at than machines?
The work presented at NeurIPS made me think that this may no longer be the case. However, this isn't necessarily a bad thing. There might be a future where AI can assist designers and help them be more productive rather than automate the entire creative industry into oblivion.
A number of topics stood out at NeurIPS that will no doubt play a huge role in 2019. These included the automatic generation of images, music and videos. There were also quite a few works on text generation, thanks to recent advances in natural language processing (NLP).
These generative models are usually trained on a large corpus of visual or auditory data. A successfully trained generative model has learned the patterns of variation in the data and is then able to produce new examples, such as photorealistic images of imaginary celebrities, music in the style of famous composers or novel movie scripts.
A lot of work in this area is based on Generative Adversarial Networks (GANs). GANs have emerged as one of the most popular models for a variety of generation tasks and operate on a simple idea. Instead of training one neural network, two competing networks are trained. One network, the generator, is trained to produce new data that imitates the input data. The other network, the critic (often called the discriminator), is trained to distinguish genuine data from data made up by the generator. Both networks are thus locked into a competition where the generator tries to fool the critic and the critic tries not to fall for the generator's "forgeries". It's easy to see how every improvement in one network will incentivise improvements in the other, thus facilitating better and better results.
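To make the competition concrete, here is a minimal sketch of the two opposing objectives. It assumes the common binary cross-entropy formulation of GAN training; the workshop papers may well use other losses, and the function names `critic_loss` and `generator_loss` are made up for this illustration. No networks are trained here. The sketch only shows how the same critic output is good news for one side and bad news for the other.

```python
import math


def critic_loss(p_real, p_fake):
    """Loss of the critic (lower is better for the critic).

    p_real: probability the critic assigns to a genuine sample being genuine.
    p_fake: probability the critic assigns to a generated sample being genuine.
    Both values must lie strictly between 0 and 1.
    """
    return -(math.log(p_real) + math.log(1.0 - p_fake))


def generator_loss(p_fake):
    """Loss of the generator (lower is better for the generator).

    The generator wants the critic to rate its forgeries as genuine,
    so its loss shrinks as p_fake approaches 1.
    """
    return -math.log(p_fake)


# As the critic is fooled more and more (p_fake rises),
# the generator's loss falls while the critic's loss grows.
for p_fake in (0.1, 0.5, 0.9):
    print(
        f"p_fake={p_fake}: "
        f"critic loss={critic_loss(0.9, p_fake):.3f}, "
        f"generator loss={generator_loss(p_fake):.3f}"
    )
```

In a real GAN, each network would take gradient steps to lower its own loss in turn, which is what drives the back-and-forth described above.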
Generative models can also produce data for training another machine learning model. Think of a scenario where data is highly imbalanced, like training a neural network to detect a rare disease. By definition there are fewer disease examples than healthy cases. Here a generative model can be used to balance out the data by producing additional examples of the rare class. Another example is generating additional data to train a self-driving car: if winter driving conditions are underrepresented in the recorded data, the GAN approach can be used to generate more winter-time data.
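The balancing idea can be sketched in a few lines. This is only an illustration under stated assumptions: `balance_with_synthetic` is a hypothetical helper, and `fake_generator` is a stand-in for a trained generative model that returns a placeholder string instead of a real image or scan.

```python
from collections import Counter


def balance_with_synthetic(samples, labels, generate):
    """Top up every minority class with generated samples.

    generate(label) is assumed to return one new synthetic sample
    for the given class, e.g. by sampling a trained generative model.
    """
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest class
    balanced_samples, balanced_labels = list(samples), list(labels)
    for label, count in counts.items():
        for _ in range(target - count):
            balanced_samples.append(generate(label))
            balanced_labels.append(label)
    return balanced_samples, balanced_labels


def fake_generator(label):
    # Stand-in for a real generative model (assumption for this sketch).
    return f"synthetic {label} scan"


labels = ["healthy"] * 8 + ["disease"] * 2
samples = [f"real {label} scan" for label in labels]

new_samples, new_labels = balance_with_synthetic(samples, labels, fake_generator)
print(Counter(new_labels))
```

Whether synthetic examples actually help the downstream model depends on how faithful the generator is, so a setup like this would still need to be validated on real held-out data.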
The kicker is that the generative process can be controlled or conditioned. Rather than blindly sampling images of, say, faces, the generative model can be conditioned to generate faces of a certain age, hairstyle or facial expression - even with sunglasses. There are obvious use cases in the beauty and fashion industries where this model could be used to virtually apply makeup or try on clothes.
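One common way to condition a generator, and only one of several, is to append a vector describing the desired attributes to the random noise the generator starts from. The sketch below assumes that approach; the attribute list and the `conditioned_input` helper are invented for illustration and do not come from any specific model.

```python
import random

# Hypothetical attribute vocabulary for a face generator.
ATTRIBUTES = ["sunglasses", "smiling", "beard"]


def conditioned_input(wanted, noise_dim=4, rng=None):
    """Build a generator input: random noise followed by attribute flags.

    The noise provides variety between samples; the flags tell the
    generator which attributes the output should have.
    """
    if rng is None:
        rng = random.Random()
    unknown = set(wanted) - set(ATTRIBUTES)
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    condition = [1.0 if name in wanted else 0.0 for name in ATTRIBUTES]
    return noise + condition


# Ask for a smiling face with sunglasses; the seed only makes the noise repeatable.
vector = conditioned_input({"sunglasses", "smiling"}, rng=random.Random(0))
print(vector)
```

During training the generator would see such flags alongside matching real images, so that at generation time flipping a flag changes the corresponding attribute.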
Another topic that stood out at NeurIPS was the research around latent space, which refers to the "memory" or internal representation that a neural network has of the world. This internal representation is more compact than the input data and has a certain structure that groups similar things together. For example, if the input consisted of words, then words like 'apple', 'pear', and 'orange' might be represented close together in latent space because they are all categorized as fruit.
One can think of all of these words as having a specific coordinate in the latent space. It's therefore possible to measure the distance between words and the direction one has to follow to get from one word to another. As another example, we could try to find a word that lies between two others: the word between "bored" and "angry" could be "annoyed".
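Here is a small sketch of what "distance" and "between" mean in practice. The two-dimensional coordinates are invented purely for this example; a real model learns coordinates with hundreds of dimensions, and the helper names are hypothetical.

```python
import math

# Toy latent coordinates, made up for illustration only.
EMOTIONS = {
    "bored": (-0.3, -0.8),
    "angry": (-0.8, 0.8),
    "annoyed": (-0.5, 0.1),
    "happy": (0.8, 0.5),
    "calm": (0.6, -0.6),
}


def midpoint(a, b):
    """Point halfway between two latent coordinates."""
    return tuple((x + y) / 2 for x, y in zip(a, b))


def nearest_word(point, vocabulary):
    """Word whose coordinate is closest (Euclidean distance) to the point."""
    return min(vocabulary, key=lambda word: math.dist(point, vocabulary[word]))


# Distance between two words.
print(math.dist(EMOTIONS["bored"], EMOTIONS["annoyed"]))

# The word closest to the point halfway between "bored" and "angry".
between = midpoint(EMOTIONS["bored"], EMOTIONS["angry"])
print(nearest_word(between, EMOTIONS))
```

With these toy coordinates the halfway point lands next to "annoyed", which mirrors the example in the text.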
Understanding this representation also gives a lot of control over the neural network's output. One of the most famous examples is "King - Man + Woman = Queen". This illustrates that by subtracting the latent space coordinates of "Man" from "King" one gets a concept akin to "royalty" (although the latent space has no name for it). Adding "royalty" to "Woman" then points to "Queen", which intuitively makes sense.
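The arithmetic can be tried out on a toy vocabulary. The coordinates below are hand-picked so that one axis roughly means "royalty" and the other "gender"; they are an assumption for this sketch, not values from a trained model, and `analogy` is a hypothetical helper.

```python
import math

# Toy latent coordinates: (royalty, masculinity). Invented for illustration.
WORDS = {
    "king": (0.9, 0.9),
    "queen": (0.9, 0.1),
    "man": (0.1, 0.9),
    "woman": (0.1, 0.1),
    "apple": (0.0, 0.5),
}


def analogy(a, b, c, vocabulary):
    """Return the word closest to a - b + c, ignoring the three input words."""
    target = tuple(
        x - y + z
        for x, y, z in zip(vocabulary[a], vocabulary[b], vocabulary[c])
    )
    candidates = [word for word in vocabulary if word not in (a, b, c)]
    return min(candidates, key=lambda word: math.dist(target, vocabulary[word]))


print(analogy("king", "man", "woman", WORDS))
```

Excluding the input words from the candidates is a common convention for this kind of lookup, since the result of the arithmetic often stays close to one of the words it started from.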
Interestingly, all of this seems to hold regardless of the kind of data that is represented in the latent space. It is possible - and sometimes useful - to do the same kind of algebra on pictures of faces, videos, or even music. Understanding the latent space gives control and allows entirely new applications, such as the smooth transition from one song to another by finding the shortest path between two songs in latent space. Spacesheet brilliantly allows latent space exploration using a spreadsheet interface.
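A smooth transition of this kind can be sketched as linear interpolation: walking in equal steps along the straight line between two latent coordinates. The sketch assumes that a straight line is an acceptable path, which is the simplest choice but not the only one, and the two-dimensional "song" coordinates are placeholders. Turning each intermediate point back into audio would require the model's decoder, which is not shown.

```python
def interpolate(start, end, steps):
    """Return `steps` evenly spaced points on the line from start to end."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # goes from 0.0 (start) to 1.0 (end)
        path.append([(1 - t) * s + t * e for s, e in zip(start, end)])
    return path


# Placeholder latent coordinates for two songs (invented for this sketch).
song_a = [0.0, 2.0]
song_b = [4.0, 0.0]

path = interpolate(song_a, song_b, steps=5)
for point in path:
    print(point)
```

Each point on the path is a little less like the first song and a little more like the second, which is what produces the gradual morph once the points are decoded.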
While a lot of what was discussed above is still experimental and there may be gaps in putting it into production, we fully expect the landscape of AI-enabled Creativity and Design to expand further in 2019.