Have you ever wondered why artificial intelligence (AI) is biased? For
instance, why do AI-generated images of "productive people" feature only white middle-aged men wearing ties? Or why do AI-generated images of "social service workers" show only women from different ethnic groups? The
answer is Data Bugs.
Personally, and at Dotdotdot as a whole, we believe that AI is a tool that can be used for good or evil, just as a pen can be used to write poems or declarations of war. But how does an AI become racist or misogynist? We found the answer in the training phase, where biased data leads to biased results.
Currently, 90% of the data used to train AI models is produced by just 10% of the population. The result is a dataset that represents a narrow point of view: our own. As a Western society, we bring biased concepts into these datasets, and the AI model then learns and reinforces them. The more often a concept appears in the dataset, the more likely the algorithm is to treat it as the best answer. Ultimately, the real problem takes place in what is called latent space: a multidimensional space that encodes a reduced, meaningful representation of externally observed events. In the latent space, samples that are similar in the external world are placed next to each other.
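To make that idea a bit more concrete, here is a minimal sketch of a latent space in Python. It uses PCA as a stand-in for a learned encoder and synthetic feature vectors in place of real images, so the numbers are purely illustrative; the point is only that similar samples end up close together in the reduced space.

```python
# Minimal sketch of a latent space: high-dimensional samples are reduced to a
# few meaningful dimensions, and similar samples end up close to each other.
# The data here is synthetic and purely illustrative.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Two "concepts" (say, two kinds of insects), each a cluster of noisy
# high-dimensional feature vectors standing in for images.
concept_a = rng.normal(scale=0.1, size=(50, 512)) + rng.normal(size=512)
concept_b = rng.normal(scale=0.1, size=(50, 512)) + rng.normal(size=512)
samples = np.vstack([concept_a, concept_b])

# Encode into a 2-dimensional latent space (PCA as a stand-in for a learned encoder).
latent = PCA(n_components=2).fit_transform(samples)

# Samples of the same concept sit close together; different concepts sit apart.
intra = np.linalg.norm(latent[:50] - latent[:50].mean(axis=0), axis=1).mean()
inter = np.linalg.norm(latent[:50].mean(axis=0) - latent[50:].mean(axis=0))
print(f"average spread within a concept: {intra:.2f}")
print(f"distance between the two concepts: {inter:.2f}")
```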
The main problem with this approach is representation. As Bernice King said, "If you don't think representation matters, you are probably well represented." As Western people, we are aware that we are part of the
problem. As designers at Dotdotdot, we know that the best way to
communicate complex concepts is to abstract them. So we decided to
show the bug in the data by using bugs as data.
Bugs are an excellent way to explain a problem that is both technological and social: just as in our society, the vast variety of insects is often stereotyped and simplified into standard categories.
We chose eight representative bugs and created two different datasets.
In one dataset, each bug was represented only through its stereotypical Western image, while the other covered a wide range of diversity and meanings by drawing on images from an entomology archive. Using bugs as data allowed us to show the impact that biased data has on AI models.
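A toy sketch helps show why the composition of the two datasets matters. The labels and proportions below are invented for illustration, but they capture the mechanism: a frequency-based model trained on the stereotyped dataset can only ever return the stereotype, while the archive-based one keeps other possibilities alive.

```python
# Minimal sketch of how dataset composition drives what a model treats as the
# "best answer". Labels and proportions are invented for illustration only.
from collections import Counter

# Western-stereotype dataset: every butterfly looks the same.
biased_dataset = [("butterfly", "blue morpho")] * 100

# Archive-based dataset: the same concept covers many variants.
diverse_dataset = (
    [("butterfly", "blue morpho")] * 30
    + [("butterfly", "cabbage white")] * 25
    + [("butterfly", "monarch")] * 25
    + [("butterfly", "glasswing")] * 20
)

def most_likely_variant(dataset, concept):
    """Return the variant a frequency-based model would favour, with its probability."""
    counts = Counter(variant for c, variant in dataset if c == concept)
    variant, n = counts.most_common(1)[0]
    return variant, n / sum(counts.values())

print(most_likely_variant(biased_dataset, "butterfly"))   # ('blue morpho', 1.0)
print(most_likely_variant(diverse_dataset, "butterfly"))  # ('blue morpho', 0.3)
```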
Fed up with the simplification of AI as a prompting tool, we decided
to spatialize the latent space, giving it a physical dimension. Users
were thus able to move within it, understanding the role it plays in
image generation. As in latent space itself, each of the eight insects had a conceptual representation in the room, and the closer a user stood to an insect, the more weight it carried in the generated output. Throughout the experience, the two models constantly generated new images side by side, confronting each user with a visual translation of the key role that diversity in datasets plays.
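The distance-to-weight idea can be sketched in a few lines. This is not the installation's actual code: the anchor positions, latent dimensions and inverse-distance blending rule are all assumptions made for illustration, but they convey how a visitor's position can steer what gets generated.

```python
# Simplified sketch of how a visitor's position could weight the eight insect
# concepts: the closer you stand to an insect, the more it shapes the output.
# Positions, dimensions and the blending rule are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(1)

N_INSECTS, LATENT_DIM = 8, 64
# Where each insect "lives" on the floor (x, y in metres) and its concept vector.
anchor_positions = rng.uniform(0, 10, size=(N_INSECTS, 2))
concept_vectors = rng.normal(size=(N_INSECTS, LATENT_DIM))

def blended_latent(user_xy, eps=1e-6):
    """Inverse-distance weighting: nearby insects dominate the blended latent."""
    distances = np.linalg.norm(anchor_positions - np.asarray(user_xy), axis=1)
    weights = 1.0 / (distances + eps)
    weights /= weights.sum()
    return weights @ concept_vectors  # weighted mix fed to the image generator

# Standing right next to insect 0 makes its concept dominate the generation.
z = blended_latent(anchor_positions[0])
print(z.shape)  # (64,)
```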
The biased model, trained on a single concept per insect, was unable to produce diverse outcomes and kept generating the same butterflies and stick insects. On the other hand, when there was a
greater diversity of colors and shapes, the model generated a wider
range of insects that had varying meanings and interpretations. This
demonstrates that each insect can have infinite representations, and
more importantly, that greater diversity can lead to unique and
meaningful results.
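One simple way to put a number on that difference is the mean pairwise distance between output embeddings. The embeddings below are random stand-ins (with real generations they would come from an image encoder), so only the contrast between the two values is meaningful.

```python
# Quantifying output diversity as mean pairwise distance between embeddings.
# Embeddings here are random stand-ins; purely illustrative numbers.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)

def mean_pairwise_distance(embeddings):
    return np.mean([np.linalg.norm(a - b) for a, b in combinations(embeddings, 2)])

# Collapsed model: outputs cluster around one prototype (the same butterfly again and again).
collapsed = rng.normal(size=64) + rng.normal(scale=0.05, size=(30, 64))
# Diverse model: outputs spread across the whole embedding space.
diverse = rng.normal(size=(30, 64))

print(f"biased model diversity:  {mean_pairwise_distance(collapsed):.2f}")
print(f"diverse model diversity: {mean_pairwise_distance(diverse):.2f}")
```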
In conclusion, more than 55 thousand pairs of images (110,462 images in total) were generated during the 7 days of public opening. The two image generation models ran for a total of 38 hours, across about 1,200 unique sessions.
The experience was a huge success with the public, the press, and an incredible number of industry insiders, and a wonderful opportunity to meet new people. And, as the icing on the cake, we won the Interaction Design mention at the Fuorisalone Award :))