Artificial Aesthetics and Aesthetic Machine Attention

Jaana Okulov


The aesthetics of artificial intelligence is often viewed in relation to the qualities of its generated expressions. However, aesthetics could have a broader role in developing machine perception. One of the main areas of expertise in aesthetics is the understanding of feature-based information: how the aesthetics of sensory features can cause affective changes in the perceiver, and the other way around – how affective states can give rise to certain kinds of aesthetic features. This two-way link between aesthetic features and affects is not yet well established in the interdisciplinary discussion; however, according to perceptual psychology, it fundamentally constructs the human experience.

Machine attention is an emerging technique in machine learning that is most often used in tasks like object detection, visual question answering, and language translation. Current applications most often build object-based attention from linguistic categories, although the models could also be utilized for nonverbal attention. This paper proposes the following perceptual conditions for aesthetic machine attention: 1) acknowledging that something appears (aesthetic detection); 2) suspension of judgment (aesthetic recognition); and 3) making the incident explicit with expression (aesthetic identification and amplification). These aspects are developed through an interdisciplinary reflection on literature from the fields of aesthetics, perceptual psychology, and machine learning. The paper does not aim to give a general account of aesthetic perception but to expand the interdisciplinary theory of aesthetics and specify the role of aesthetics among other disciplines at the heart of the technological development of the human future.
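The machine attention mechanism the abstract refers to can be illustrated with a minimal sketch of scaled dot-product attention, the operation underlying most modern attention models. This is an illustrative example only, not code from the paper; the function name and toy data are assumptions.

```python
import numpy as np

def scaled_dot_product_attention(queries, keys, values):
    """Return attention-weighted values and the weight matrix.

    Each query distributes a probability ('attention') over the keys,
    and the output is the corresponding weighted mix of the values.
    """
    d_k = queries.shape[-1]
    # Similarity of each query to each key, scaled for numerical stability
    scores = queries @ keys.T / np.sqrt(d_k)
    # Softmax over keys: each query's weights are positive and sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values, weights

# Toy example: 2 queries attending over 3 feature vectors
rng = np.random.default_rng(0)
q = rng.normal(size=(2, 4))
k = rng.normal(size=(3, 4))
v = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(q, k, v)
```

In object-based uses, the queries typically encode linguistic category information; a nonverbal variant of the kind the paper envisions would instead derive queries from sensory features themselves.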


Article received: May 10, 2022; Article accepted: July 15, 2022; Published online: October 31, 2022; Original scholarly paper

How to cite this article: Okulov, Jaana. "Artificial Aesthetics and Aesthetic Machine Attention." AM Journal of Art and Media Studies 29 (October 2022): 13–28. doi: 10.25038/am.v0i29.534


Keywords: attention; aesthetics; machine attention; feature-based knowledge; interdisciplinary theories.





Copyright (c) 2022 AM Journal of Art and Media Studies

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.


AM Journal of Art and Media Studies, ISSN 2217-9666 (printed), ISSN 2406-1654 (online), UDK 7.01:316.774


Publisher: Faculty of Media and Communications, Singidunum University, Belgrade, Serbia

Indexed in: ERIH PLUS, EBSCO, DOAJ, and in The List of Scientific Journals Categorization of the Ministry of Education, Science and Technological Development of the Republic of Serbia (M24-2021). Beginning with No. 12 (2017), AM is indexed, abstracted and covered in the Clarivate Analytics service ESCI.