Major innovations in deep learning are currently arriving in quick succession: DeepMind, Google's AI research division, has now presented Gato, following the 540-billion-parameter model PaLM and, shortly before that, the multimodal AI Flamingo. Like Flamingo and Chinchilla, the new model takes its name from the animal kingdom (gato is Spanish for cat).
According to the research team, the new AI agent can multitask across media boundaries, processing text, images and other modalities. It is aided by multiple forms of embodiment, an ability that plays a role in virtual reality research and in robotics, where spatial sensors simulate physical presence and the associated sensory perceptions to some extent. Described by its authors as a generalist, the AI agent builds on recent advances in language modeling with large Transformer models.
Multitasking with a spatial range of action
According to the accompanying research report, the jack-of-all-trades AI cat is based on a single transformer model with a single set of weights. The underlying neural network can therefore do more than solve text problems: it can also caption images (image captioning), stack physical blocks with a robotic arm, and play Atari games. Depending on the context, it apparently decides on its own which tokens to output: text, torques for robot joints (joint torques), keystrokes, or another variant from its comparatively wide range of options.
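To make this idea more concrete, the following Python sketch shows how one flat token vocabulary could serve several modalities and how a sampled output token might be interpreted depending on the task context. The vocabulary split (32,000 text tokens plus 1,024 ids for discretized continuous values) follows the Gato paper; the routing function itself is purely illustrative and not DeepMind's code.

```python
# Illustrative sketch (not DeepMind's implementation) of one flat token
# vocabulary covering several modalities. Ranges follow the Gato paper:
# 32,000 SentencePiece text tokens, then 1,024 ids for binned continuous values.
TEXT_RANGE = range(0, 32_000)
CONTINUOUS_RANGE = range(32_000, 33_024)  # e.g. mu-law-binned joint torques

def decode_token(token_id: int, context: str):
    """Interpret a sampled token; its meaning depends on the task context."""
    if context == "dialogue" and token_id in TEXT_RANGE:
        return ("text", token_id)                    # detokenize to a word piece
    if context == "robot_arm" and token_id in CONTINUOUS_RANGE:
        return ("joint_torque", token_id - 32_000)   # bin index -> torque value
    if context == "atari":
        return ("button_press", token_id)            # discrete actions reuse low ids
    raise ValueError(f"token {token_id} not valid in context {context!r}")
```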
Agent Gato: Training approaches for greater flexibility
During the training phase, the DeepMind team serialized data from different tasks and modalities (such as text and images, but presumably also sensory input) into a flat sequence of tokens, batched them, and processed them with a transformer neural network, much as is familiar from large AI language models. In the process, the machine learning engineers masked the loss function so that the model was trained to predict only action and text targets. The well-known transformer models of today are all built on prediction: they compute the most probable answers from their data set and the links they have learned, and output them as a response when prompted.
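As a rough illustration of that masked objective, the sketch below (in PyTorch-style Python, an assumption since DeepMind has not released Gato's training code) computes cross-entropy only at positions in the flattened sequence that hold text or action targets:

```python
import torch
import torch.nn.functional as F

def masked_lm_loss(logits, targets, is_target):
    """
    logits:    (batch, seq_len, vocab) model predictions
    targets:   (batch, seq_len)        next-token ids
    is_target: (batch, seq_len) bool   True at text/action positions,
                                       False elsewhere (e.g. image patches)
    """
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    ).reshape(targets.shape)
    # Zero out the loss everywhere except the masked-in targets.
    return (per_token * is_target).sum() / is_target.sum().clamp(min=1)
```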
Agent Gato: Overview of the possible forms of input and output
According to the twenty-strong research team, Agent Gato has a wider scope of action than GPT-3 or DALL-E-style models: it not only processes text and images but can also, depending on input and context, output spatial action impulses for a robotic arm in the form of tokens. The generalist agent is meant to perceive and act through a variety of different embodiments. During training, Gato had to solve 604 specific tasks involving varying modalities, observations, and actions.
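The research report also describes how continuous values such as joint torques become tokens: they are squashed with mu-law encoding and then binned uniformly. The sketch below reconstructs this with the parameters given in the paper (mu = 100, M = 256, 1,024 bins); it is an approximation for illustration, not DeepMind's implementation.

```python
import numpy as np

def mu_law(x, mu=100, m=256):
    # Squash continuous values toward [-1, 1], as described in the Gato paper.
    return np.sign(x) * np.log(np.abs(x) * mu + 1.0) / np.log(m * mu + 1.0)

def tokenize_continuous(values, bins=1024, offset=32_000):
    # Clip, discretize into uniform bins, and shift past the text vocabulary.
    squashed = np.clip(mu_law(np.asarray(values, dtype=float)), -1.0, 1.0)
    ids = ((squashed + 1.0) / 2.0 * (bins - 1)).astype(int)
    return ids + offset

print(tokenize_continuous([-0.5, 0.0, 2.3]))  # three torque values -> token ids
```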
Gato's training phase: data from different modalities and tasks were serialized into flat token sequences, which a single transformer model then processed.
The research report by the DeepMind team behind Gato provides details on the exact course of the training. The researchers first shared the news on Twitter, where Gato has already attracted great interest from the machine learning community and sparked discussion about the possible implications of deep learning for artificial general intelligence (AGI).
A full PDF version of the paper is available on Google's Storage website.