#NewsBytesExplainer: Tracing Google's progress in AI and ML so far
The world is going gaga over ChatGPT and its potential. Many see it as the end of Google's dominance in search. Adding to the company's consternation, Microsoft recently announced its plan to bring ChatGPT to Bing search and its Azure cloud service. To allay investors' fears, Google has released a blog post explaining its research and development in AI and machine learning (ML).
The company is working on LaMDA
Most discussions about AI are now centered around language models. The ability of large language models to generate "coherent, contextual, and natural-sounding responses" and perform wide-ranging tasks such as creating content, writing code, and answering complicated questions has caught everyone by surprise. Google is working on its allegedly 'sentient' machine-learning language model LaMDA, which is trained on dialogue.
Google is focusing on making LaMDA's responses safe and grounded
With LaMDA, the company is exploring how language models can be used for safe and high-quality multi-turn conversations. ChatGPT has showcased its ability to weave multi-turn conversations with ease. However, it tends to wander into dangerous territory with some of its answers. In this regard, Google's emphasis on safe and grounded responses might help the company in the AI race.
PaLM is a 540-billion parameter language model
Another language model Google is working on is PaLM (Pathways Language Model), a 540-billion parameter language model built on the company's Pathways software infrastructure. Per Google, the work on PaLM has demonstrated how large language models trained on "large amounts of multi-lingual data and source code" can perform a variety of tasks despite not being specifically trained for them.
Building systems that can perform multi-step reasoning is a challenge
Multi-step reasoning is one of the biggest challenges in AI. Getting AI systems to break down complex problems into smaller tasks and then combine their solutions to address the larger problem is not as easy as it sounds. Google is working on 'Chain of Thought prompting,' wherein the language model is encouraged to spell out the steps it takes to reach a solution.
'Chain of Thought prompting' helps find answers to complex problems
According to Google, 'Chain of Thought prompting' helps language models generate "more structured, organized and accurate responses." The company believes that with this approach, models are more likely to find the correct answer to complex problems that require multiple steps of reasoning. This will be particularly beneficial in solving complex mathematical and scientific problems.
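Google's blog post describes the technique in prose only; a minimal sketch of the idea is that the prompt sent to the model includes a worked example whose answer spells out each reasoning step. The example question and any surrounding API are illustrative assumptions, not Google's actual setup.

```python
# A minimal sketch of chain-of-thought prompting. Only the prompt text
# is constructed here; the model and API it would be sent to are omitted.

def standard_prompt(question: str) -> str:
    """Ask for the answer directly, with one plain worked example."""
    return (
        "Q: A pack has 3 pens and costs $6. What does one pen cost?\n"
        "A: $2\n"
        f"Q: {question}\n"
        "A:"
    )

def chain_of_thought_prompt(question: str) -> str:
    """Same example, but its answer shows each reasoning step,
    encouraging the model to show its own steps in turn."""
    return (
        "Q: A pack has 3 pens and costs $6. What does one pen cost?\n"
        "A: The pack costs $6 and contains 3 pens. "
        "6 / 3 = 2, so one pen costs $2.\n"
        f"Q: {question}\n"
        "A:"
    )

question = "A box holds 4 cakes and costs $12. What does one cake cost?"
print(chain_of_thought_prompt(question))
```

The only difference between the two prompts is the worked-out reasoning in the example answer; that small change is what nudges the model toward multi-step answers.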
Google applied transformer architecture in computer vision
Computer vision is a fast-evolving sphere of AI. It focuses on replicating the complexities of the human vision system and enabling computers to identify and process objects in a similar way. Google's major contribution to this field so far has been setting the trend of applying transformer architecture instead of convolutional neural networks to computer vision.
Google's MaxViT model combines both local and non-local information
Google has been working on several computer vision models. MaxViT (Multi-Axis Vision Transformer) combines both local and non-local information in a vision model. This approach has outperformed other models on ImageNet-1k (the primary dataset for pretraining models for computer vision tasks) classification as well as object detection tasks, at much lower computational cost.
In Pix2Seq, object detection is a language modeling task
The tech giant attempts to tackle object detection from a different perspective in Pix2Seq. Instead of the usual task-specific approach, Google approaches object detection as a language modeling task conditioned on the observed pixel inputs. The model is trained to read out locations and other attributes of objects of interest in the image. Per Google, this system has achieved competitive results.
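One way to picture "detection as language modeling" is that each bounding box is turned into a short sequence of discrete tokens (quantized coordinates plus a class token) that the model learns to predict. The bin count, token ordering, and class vocabulary below are illustrative assumptions, not Pix2Seq's exact settings.

```python
# A hedged sketch of the Pix2Seq idea: a bounding box becomes a short
# token sequence, so object detection can be posed as predicting the
# next token, just like a language model does.

NUM_BINS = 1000  # coordinates are quantized into discrete bins (assumed value)

def box_to_tokens(box, class_id, image_size):
    """Turn (ymin, xmin, ymax, xmax) pixel coordinates into 5 integer
    tokens: four quantized coordinates followed by a class token."""
    h, w = image_size
    ymin, xmin, ymax, xmax = box
    scales = [h, w, h, w]
    # integer math keeps the quantization exact
    tokens = [min(v * NUM_BINS // s, NUM_BINS - 1)
              for v, s in zip((ymin, xmin, ymax, xmax), scales)]
    # class tokens live in a separate range after the coordinate bins
    tokens.append(NUM_BINS + class_id)
    return tokens

# Example: a box in a 500x500 image, class 3 in some assumed vocabulary
print(box_to_tokens((100, 150, 300, 450), 3, (500, 500)))
# -> [200, 300, 600, 900, 1003]
```

Because every object reduces to the same kind of token sequence, one generic sequence model can "read out" all objects in an image instead of needing detection-specific heads.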
LOLNeRF can identify 3D structure from a single 2D image
A big challenge in computer vision is understanding the 3D structure of real-world objects from one or a few 2D images. Google took a huge leap in tackling this challenge with LOLNeRF, which learns the 3D structure of an object from a single 2D image. This was achieved by training the model on many examples of a particular category of objects.
Google is working on multi-modal ML models
ML models usually focus on a single modality of data. Google has been going a step further by exploring multi-modal models, or models that can handle multiple modalities. Per the company, bringing different modalities together after a few steps of modality-specific processing, and then mixing the features from the different modalities through a bottleneck layer, is an effective design for such models.
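The blog post describes this design in prose; a toy, pure-Python sketch can show the information flow: each modality is processed on its own, and the streams then exchange information only through a small shared "bottleneck." The averaging used here is a stand-in for the learned attention layers real models use, and the modality names are assumptions.

```python
# Toy sketch of bottleneck fusion: modality-specific processing first,
# then only a compressed summary crosses between the two streams.

BOTTLENECK_SIZE = 2  # assumed, deliberately small

def modality_specific(features):
    """Stand-in for a few modality-specific layers (here: normalize)."""
    total = sum(features) or 1.0
    return [f / total for f in features]

def to_bottleneck(features):
    """Compress a feature list into BOTTLENECK_SIZE shared slots by
    chunked averaging."""
    chunk = max(1, len(features) // BOTTLENECK_SIZE)
    chunks = [features[i:i + chunk] for i in range(0, len(features), chunk)]
    return [sum(c) / len(c) for c in chunks][:BOTTLENECK_SIZE]

def fuse(audio_feats, video_feats):
    a = modality_specific(audio_feats)
    v = modality_specific(video_feats)
    # Both modalities write into, and read from, the same tiny
    # bottleneck, so only a compressed summary crosses streams.
    bottleneck = [(x + y) / 2
                  for x, y in zip(to_bottleneck(a), to_bottleneck(v))]
    return a + bottleneck, v + bottleneck

a, v = fuse([1.0, 3.0, 2.0, 2.0], [4.0, 4.0, 1.0, 1.0])
print(len(a), len(v))  # each stream keeps its own features plus the shared summary
```

The design point the sketch illustrates: forcing cross-modal exchange through a few shared slots is cheaper, and often more effective, than letting every feature of one modality attend to every feature of the other.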
Imagen and Parti are Google's text-to-image generators
After language processing models, generative models are the most popular AI models, and text-to-image models are their stars. When we think of text-to-image models, DALL-E and Stable Diffusion come to mind. Google has its own image generation models, including Imagen and Parti. The former is based on diffusion, while the latter uses an autoregressive transformer network.
Text-to-video is tough due to the added dimension of time
Developing generative models for videos is a tough ask, mainly because of the added dimension of time. Google is working on two such models named Imagen Video and Phenaki. Imagen Video uses cascaded diffusion models to generate high-resolution videos; the company is still working on extending the length of videos generated this way. Phenaki, on the other hand, is a transformer-based model.
Google advocates for responsible AI
Google concludes its blog post by advocating for responsible AI. "Leaders in ML and AI must lead not only in state-of-the-art technologies, but also in state-of-the-art approaches to responsibility and implementation," the company said. It is doubtful, however, whether the argument of responsible AI will help the company address questions about OpenAI becoming the leader in the field of ML and AI.