Interplay between deep learning and cognitive science

Coordinator: Laurent Bonnasse-Gahot (CAMS, EHESS, lbg@ehess.fr)

Registrations: Laurent Bonnasse-Gahot (CAMS, EHESS, lbg@ehess.fr) & Jean-Pierre Nadal (LPENS, ENS & CAMS, EHESS, jean-pierre.nadal@phys.ens.fr)

Recommended Level: Biology, Cognitive Science: M2; Mathematics, Physics, Computer Science: M1 or higher. The course is also of interest for PhD students or postdocs in Cognitive Science or Computational Neuroscience.
Semestre: S3, ECTS: 4
Number of hours: 26h
Prerequisites: Elementary mathematics (calculus, linear algebra); Python basics for the TP
Course taught in: English
Code: COGSCI 317

Schedule & Room: Wednesdays, 4:00 pm to 6:00 pm, salle Langevin, 29 rue d'Ulm
First course: September 18, 2024

1. Course description

1a. Course description (English)

The past decade has seen an impressive development of artificial intelligence (AI) techniques based on neural networks and machine learning. The revolution began in the field of vision, where these neural networks were able to recognize objects with performance far superior to those of other algorithms used at that time. Many other fields are now also being transformed by neural techniques, notably natural language processing, with applications such as subtitling or automatic translation (modulo some minor modifications, this course presentation in English is what you obtain by automatic translation of the French presentation below making use of AI tools, deepL or Google Translate). AI tools are ubiquitous in our lives today, and are revolutionizing many areas of science where they are used as data analysis tools.

In cognitive science, the relationship with machine learning has been very strong since the beginning, beyond the use as a mere tool, but as computational cognitive models around themes such as learning, language processing, categorization, etc. The recent major successes of machine learning are at least a conceptual proof that a distributed system based on elementary basic units can lead by learning (by modifying the connections between these units) to very high level behaviors. But can we speak of models of the brain?

The objective of this course is to present and discuss these artificial neural methods and their applications in the study of cognition. The course aims at showing the latest advances in machine learning and its use to understand our cognition, while also highlighting the limits posed by current techniques if considered as models of the brain.

In a first part, the fundamental bases, the history and the development of these techniques will be presented. After a presentation of the perceptron, the gradient descent and backpropagation algorithms, the course will present the main contemporary neural methods and architectures, in particular convolutional networks, recurrent networks (such as LSTM), and Transformers, which are modern neural networks based on a concept of attention that are used in large language models such as ChatGPT.
In a second part, guest lecturers will present the use made of these models in their research in cognitive science. Particular focus will be on the comparison between patterns of activations in artificial and biological neural systems.
One session will be devoted to the implementation of a learning algorithm.
The last part will be devoted to students' presentations, considered as part of the course.

Notice: This course is NOT a substitute for an advanced course on deep learning or more broadly machine learning. This course does NOT address issues in statistical inference.

1b. Description du cours (français)

La dernière décennie a vu le développement impressionnant des techniques d'intelligence artificielle (IA) basées sur les réseaux neuronaux artificiels et l'apprentissage automatique. La révolution a commencé dans le domaine de la vision, où ces réseaux neuronaux ont été capables de reconnaître des objets avec des performances bien supérieures à celles des autres algorithmes utilisés à l'époque. Aujourd'hui, de nombreux autres domaines sont également transformés par les techniques neuronales, notamment le traitement du langage naturel, avec des applications telles que le sous-titrage ou la traduction automatique (modulo quelques modifications mineures, le présent texte de présentation du cours en français s'obtient par traduction automatique du texte en anglais ci-dessus avec des outils IA, deepL ou Google Traduction). Les outils IA sont omniprésents dans nos vies aujourd'hui, et révolutionnent de nombreux domaines scientifiques où ils sont utilisés comme outils d'analyse de données.

En sciences cognitives, la relation avec l'apprentissage automatique est très forte depuis le début, au-delà de l'utilisation comme simple outil, comme modèles cognitifs computationnels autour de thèmes tels que l'apprentissage, le traitement du langage, la catégorisation, etc. Les grands succès récents de l'apprentissage automatique sont au moins une preuve conceptuelle qu'un système distribué basé sur des unités élémentaires peut conduire par apprentissage (en modifiant les connexions entre ces unités) à des comportements de très haut niveau. Mais peut-on parler de modèles du cerveau ?

L'objectif de ce cours est de présenter et discuter ces méthodes de neurones artificiels et leurs applications dans l'étude de la cognition. Le cours vise à montrer les dernières avancées de l'apprentissage automatique et son utilisation pour comprendre notre cognition, tout en soulignant les limites posées par les techniques actuelles si elles sont considérées comme des modèles du cerveau.

Dans une première partie, les bases fondamentales, l'histoire et le développement de ces techniques seront présentés. Après une présentation du perceptron, des algorithmes de descente de gradient et de rétropropagation, le cours présentera les principales méthodes et architectures neuronales contemporaines, en particulier les réseaux convolutifs, les réseaux récurrents (tels que les LSTM), et les Transformers, qui sont des réseaux de neurones modernes basés sur un concept d'attention utilisés dans de grands modèles de langage tels que ChatGPT.
Dans une seconde partie, des conférenciers invités présenteront l'utilisation faite de ces modèles dans leurs recherches en sciences cognitives. Un accent particulier sera mis sur la comparaison entre les modèles d'activations dans les systèmes neuronaux artificiels et biologiques.
Une session sera consacrée à l'implémentation d'un algorithme d'apprentissage.
La dernière partie sera consacrée aux présentations des étudiants, considérées comme faisant partie du cours.

Remarque : Ce cours ne remplace pas un cours avancé sur l'apprentissage profond ou plus largement l'apprentissage automatique. Ce cours n'aborde pas les questions d'inférence statistique.

1c. Prerequisites

Some familiarity (knowledge and practice) with elementary mathematics -- calculus, linear algebra -- is strongly recommended. Prior knowledge in machine learning will help but is not mandatory.
Basic programming knowledge, preferably in Python, is necessary for the hands-on (TP) session.

2. Learning outcomes

Students will learn or review the basics of modern artificial neural networks, with a focus on how these techniques emerged. In this way, they will discover the principles underlying the main and most recent neural architectures.
They will learn how the latest advances in machine learning are used in the understanding of cognition, together with the current limitations that these techniques face within this framework.
The students will also learn to implement from scratch a basic backpropagation algorithm, and how this can be done in a few lines making use of modern open source tools.

3. Pedagogy, Validation, Readings

3a. Pedagogy, course organization and homework

The course will be given in English.
Organization:

Four theoretical courses of two to three hours. An important part will be done on the (black/white)board for step by step presentations of basic ideas underlying the artificial network architectures.
Two hands-on session devoted to implementing learning algorithms in Python.
Four courses of two hours given by invited lecturers on topics at the interface AI/cognition.
Two sessions of two hours dedicated to the students' presentations: students taking the course for credits will contribute to the course through their presentation of a course-related topic (see Assessment).

Students are encouraged to participate in lectures by asking questions, included simple questions, asking for clarification. Questions can be asked in English or French.

3b. Assessment

Regular attendance is mandatory for all students registered for credit. Students in pairs will have to choose a theme (either from a list of proposals, or proposed directly by the student) of interest for the course, at the border between machine learning and cognitive science. They will give an oral presentation in front of the other students, presenting a summary of the chosen topic, using possibly several sources or articles as references.

3c. Textbook and Readings

This course does not have a textbook since it will cover a broad field and will be largely based on recent literature. Students will be provided with lecture slides, a list of general references and of references specific to each lecture. Every paper will be (made) accessible online.
A list of topics and articles for the final validation will be provided.

Selection of general references:

— Introductions to deep learning:

LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature, 521(7553):436–444.
Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural networks, 61, 85-117.
Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press.

— On the interface between neuroscience and deep learning:

Hassabis, D., Kumaran, D., Summerfield, C., and Botvinick, M. (2017). Neuroscience-inspired artificial intelligence. Neuron, 95(2):245–258
Cichy, R. M. and Kaiser, D. (2019). Deep neural networks as scientific models. Trends in cognitive sciences, 23(4):305–317.
Richards, B. A., Lillicrap, T. P., Beaudoin, P., Bengio, Y., Bogacz, R., Christensen, A., Clopath, C., Costa, R. P., de Berker, A., Ganguli, S., et al. (2019). A deep learning framework for neuroscience. Nature neuroscience, 2(11):1761–1770
Yang, G. R. and Wang, X.-J. (2020). Artificial neural networks for neuroscientists: A primer. Neuron, 107(6):1048–1070.
Saxe, A., Nelli, S., and Summerfield, C. (2021). If deep learning is the answer, what is the question? Nature Reviews Neuroscience, 22(1):55–67.

— Pedagogical resources: Mathematical tools

Mathematical Tools for Neuroscientists, by Ella Batty
Mathematical Tools for Neural and Cognitive Science, by Mike Landy & Eero Simoncelli https://www.cns.nyu.edu/~eero/math-tools

4. Course content

4a. Theoretical lectures (two to four hours each)

Introduction to artificial neural networks
This first course will first give a brief general introduction to artificial neural networks and deep learning. It will then offer the roots of the discipline by presenting the artificial neuron, the perceptron and its limits, the multi-layer perceptron, and finally gradient descent and the backpropagation algorithm.

Convolutional Neural Networks (CNN)
This course will present the convolutional neural network, from its origins to the more recent variant called residual neural network (ResNet). While the next theoretical courses on sequential data will be completed by followed-up presentations by invited lecturers on the cognitive applications of these methods, this course will directly discuss the use of CNN in the application of deep learning for the study of vision, with various comparisons with the human (or other animals) brain.

Recurrent Neural Networks (RNN)
This lecture and the following will be devoted to the learning of representations of sequential data through artificial neural processing. First, we will introduce the notion of (fixed) embeddings, following by the presentation of the vanilla recurrent neural networks. The issues of vanishing and exploding gradients will be presented, along with different solutions, notably the invention of the Long Short Term Memory network. The course will conclude by a presentation of the seq2seq networks, which were introduced to handle variable sized sequence to sequence conversion.

Attention, Transformers, & Large Language Models
This lecture is devoted to the presentation of the concept of attention, as first introduced as a mean to better handle long sequences in seq2seq networks. We will present the model based on self-attention called Transformers. This model family initiated an important revolution in natural language processing just a few years ago, as exemplified by large language models (LLMs), and is now widespread in many domains, notably for image analysis.

4b. Hands-on session (`travaux pratiques')

The goal of this lab session is to implement a feedforward neural network along with gradient descent learning with the backpropagation algorithm. This implementation will be done from scratch in python/numpy, using the theoretical knowledge developed during the first course. This will help the students to well understand the fundamentals behind deep learning. The course will conclude with a brief presentation and comparison with a modern implementation that makes use of a deep learning framework such as TensorFlow/Keras or PyTorch.

4c. Invited lectures (two hours each)

These lectures aim at showing how the latest advances in machine learning are used in the understanding of cognition, while also emphasizing the current limitations that these techniques face within this framework. The choice of the topics may change from one year to the other depending on the recent literature on the interface AI/cognition.

Use of recurrent neural networks to understand executive and motor functions
Lectured by Adrian Valente (PhD student - LNC2, INSERM-ENS).
After a brief reminder on RNNs, this course will introduce and explain the notions of state space, dimensionality reduction, and dynamic system. It will present a framework to reverse engineering a recurrent neural network to gain insights on their working. The course will provide a comparison with neuronal data and various examples of applications: decision-making, working memory, spatial navigation (head direction and grid cells), motor cortex.

Emergence of linguistic knowledge in neural networks
Lectured by Nicolas Guerin (PhD student - LSPC, EHESS-ENS-CNRS-PSL).
This course will present a series of studies on the emergence of linguistic knowledge in neural networks that are trained in a self-supervised fashion, in particular with a simple next-word prediction task, also called language modeling task. Despite the simplicity of such a task, the neural networks trained in such a way develop many interesting semantic and syntactic capabilities that will be explored during this course.

Use of artificial neural networks for the study of the acquisition of language by children
Lectured by Emmanuel Dupoux (EHESS, LSCP, EHESS-ENS-CNRS-PSL).
Pretrained Neural Language Models have claimed to be good models of adult language processing both behaviorally and neurally. But are they also good models of language acquisition? We briefly review key findings and theories about language acquisition in the human infant and present recent research aimed at learning language models directly from the raw audio signal using self-supervised objectives. We explore how the trained models can be probed at various linguistic levels using tasks inspired by psychophysics and psycholinguistics and how the results compare with landmarks in early language acquisition. We discuss additional problems that infants are faced with on the sensory side, such as noise, speech overlap, signal distortion and present some recent work that address these problems by training models in naturalistic audio captured in daylong recordings. We raise the issues of insufficient inductive biases and data efficiency in current language models and present briefly the line of research into the emergence of communication in neural agents as a possible way to tackle these issues.

Comparison between artificial neural networks and the human brain during language processing
Lectured by Christophe Pallier (CNRS - Cognitive Neuroimaging Unit, INSERM-CEA).
Capitalizing on the impressive performance achieved on various linguistic tasks by Neural Language Models, some neuroscientists have started to use them to study neural activity in the human brain during language processing. We will present the state of art in this emerging field, explaining how the activations from these artificial models are fitted to imaging data (fMRI or MEG) to computed brains scores that evaluate their capacity to predict neural time-courses.
We will then discuss the influence of factors such as test loss, training corpus and model architecture on the ability of a neural language model to capture brain activity.
We will also show how neural language models can be used to try and answer neurolinguistics questions such as which brain areas are sensitive to syntactic or semantic information, what is the size of the context of information integration of various regions, etc.

4d. Student presentations (two sessions of two hours)

These course validation sessions will be considered as part of the course (attendance will be mandatory as for the other sessions).

5. Course Policies

Laptop/phone policy
Laptops are necessary for the hands-on session. Phones are forbidden.

Attendance
Attendance is mandatory.

Participation
The students are encouraged to work in a collaborative manner and ask questions during the sessions. They can also contact the course coordinator by email.

Homework
Assignments described above.

Academic honesty policy
Cheating will not be tolerated and may cost you your grade as well as have deeper repercussions in your academic career. The following is a non-exhaustive list of examples of what counts as cheating in this course: signing on the attendance sheet without attending the class (e.g. signing and leaving, or signing for someone else); using the same homework to validate two courses.