Research Focus: Week of February 20, 2023

Microsoft Research Focus 10 edition, week of February 20, 2023

Welcome to Research Focus, a new series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

NEW RESEARCH

Self-supervised Multi-task pretrAining with contRol Transformers (SMART)

Many real-world applications require sequential decision making, where an agent interacts with a stochastic environment to perform a task. For example, a navigating robot is expected to control itself and move to a target using sensory information it receives along the way. Learning the proper control policy can be complicated by environmental uncertainty and high-dimensional perceptual information, such as raw-pixel spaces. More importantly, the learned strategy is specific to the task (e.g., which target to reach) and the agent (e.g., a two-legged or a four-legged robot). That means that a good strategy for one task does not necessarily apply to a new task or a different agent.

Pre-training a foundation model can help improve overall efficiency when facing a large variety of control tasks and agents. However, although foundation models have achieved incredible success in language domains, different control tasks and agents can have large discrepancies, making it challenging to find a universal foundation. It becomes even more challenging in real-world scenarios that lack supervision or high-quality behavior data.

In a new paper, SMART: Self-supervised Multi-task pretrAining with contRol Transformers, Microsoft researchers tackle these challenges and propose a generic pre-training framework for control problems. Their research demonstrates that a single pre-trained SMART model can be fine-tuned for various visual-control tasks and agents, either seen or unseen, with significantly improved performance and learning efficiency. SMART is also resilient to low-quality datasets and works well even when random behaviors comprise the pre-training data.


Spotlight: On-Demand EVENT

Microsoft Research Summit 2022

On-Demand
Watch now to learn about some of the most pressing questions facing our research community and listen in on conversations with 120+ researchers around how to ensure new technologies have the broadest possible benefit for humanity.

NEW RESEARCH

A Ranking Game for Imitation Learning

Reinforcement learning relies on environmental reward feedback to learn meaningful behaviors. Since reward specification is a hard problem, imitation learning (IL) may be used to bypass reward specification and learn from expert data, often via Inverse Reinforcement Learning (IRL) techniques.  In IL, while near-optimal expert data is very informative, it can be difficult to obtain. Even with infinite data, expert data cannot imply a total ordering over trajectories as preferences can. On the other hand, learning from preferences alone is challenging, as a large number of preferences are required to infer a high-dimensional reward function, though preference data is typically much easier to collect than expert demonstrations. The classical IRL formulation learns from expert demonstrations but provides no mechanism to incorporate learning from offline preferences.

In a new paper, A Ranking Game for Imitation Learning, accepted at TMLR 2023, researchers from UT Austin, Microsoft Research, and UMass Amherst create a unified algorithmic framework for IRL that incorporates both expert and suboptimal information for imitation learning. They propose a new framework for imitation learning called “rank-game,” which treats imitation as a two-player ranking-based game between a policy and a reward. In this game, the reward agent learns to satisfy pairwise performance rankings between behaviors, while the policy agent learns to maximize this reward. A novel ranking loss function is proposed, giving an algorithm that can simultaneously learn from expert demonstrations and preferences, gaining the advantages of both modalities. Experimental results in the paper show that the proposed method achieves state-of-the-art sample efficiency and can solve previously unsolvable tasks in the Learning from Observation (LfO) setting. Project video and code can be found on GitHub.
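To make "satisfying pairwise rankings" concrete, here is a minimal sketch that scores a candidate reward function against a set of (worse, better) trajectory pairs. It is an illustration only and does not reproduce the ranking loss proposed in the paper; the function names, the toy states, and both reward functions are hypothetical.

```python
def trajectory_return(reward_fn, trajectory):
    """Sum the reward over every state in a trajectory."""
    return sum(reward_fn(state) for state in trajectory)


def fraction_satisfied(reward_fn, rankings):
    """Fraction of (worse, better) trajectory pairs that the reward orders correctly."""
    satisfied = sum(
        trajectory_return(reward_fn, worse) < trajectory_return(reward_fn, better)
        for worse, better in rankings
    )
    return satisfied / len(rankings)


# Hypothetical toy data: states are plain numbers and the goal state is 10.
expert = [8, 9, 10]
mediocre = [4, 5, 6]
random_walk = [0, 1, 0]
rankings = [(random_walk, mediocre), (mediocre, expert)]


def good_reward(state):
    """Hypothetical reward that prefers states near the goal."""
    return -abs(state - 10)


def bad_reward(state):
    """Hypothetical reward that prefers states near the start."""
    return -abs(state)


print(fraction_satisfied(good_reward, rankings))  # every ranking is respected
print(fraction_satisfied(bad_reward, rankings))   # no ranking is respected
```

In the game described above, a reward agent would adjust its reward to respect more of the rankings, while the policy agent would optimize behavior against the current reward.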

Figure 1: rank-game: The Policy agent maximizes the reward function by interacting with the environment. The Reward agent satisfies a set of behavior rankings obtained from various sources: generated by the policy agent (vanilla), automatically generated (auto), or offline annotated rankings obtained from a human or offline dataset (pref). Treating this game in the Stackelberg framework leads to either Policy being a leader and Reward being a follower, or vice versa.

NEWS

Microsoft helps GoodLeaf Farms drive agricultural innovation with data

Vertical indoor farming uses extensive technology to manage production and optimize growing conditions. This includes movement of grow benches, lighting, irrigation, and air and temperature controls. Data and analytics can help vertical farms produce the highest possible yields and quality.

Canadian vertical farm pioneer GoodLeaf Farms has announced a partnership with Microsoft and data and analytics firm Adastra to optimize crop production and quality. GoodLeaf has deployed Microsoft Azure Synapse Analytics and Microsoft Power Platform to utilize the vast amounts of data it collects.

GoodLeaf is also collaborating with Microsoft Research through Project FarmVibes, using GoodLeaf’s data to support research into controlled environment agriculture.

GoodLeaf’s farm in Guelph, Ontario, and two currently under construction in Calgary and Montreal, use a connected system of cameras and sensors to manage plant seeding, growing mediums, germination, temperature, humidity, nutrients, lighting, and air flow. Data science and analytics help the company grow microgreens and baby greens in Canada year-round, no matter the weather, using a hydroponics system and specialized LED lights.


OPPORTUNITY

Reinforcement Learning Open Source Fest

Proposals are now being accepted for Reinforcement Learning (RL) Open Source Fest 2023, a global online program that introduces students to open-source RL programs and software development. Our goal is to bring together a diverse group of students from around the world to help solve open-source RL problems and advance state-of-the-art research and development. The program produces open-source code written and released to benefit all.

Accepted students will join a four-month research project from May to August 2023, working virtually alongside researchers, data scientists, and engineers on the Microsoft Research New York City Real World Reinforcement Learning team. Students will also receive a $10,000 USD stipend. At the end of the program, students will present each of their projects to the Microsoft Research Real World Reinforcement Learning team online.

The proposal deadline is Monday, April 3, 2023, at 11:59 PM ET. Learn more and submit your proposal today.

The post Research Focus: Week of February 20, 2023 appeared first on Microsoft Research.

Suppressing quantum errors by scaling a surface code logical qubit

Many years from today, scientists will be able to use fault-tolerant quantum computers for large-scale computations with applications across science and industry. These quantum computers will be much bigger than today’s, consisting of millions of coherent quantum bits, or qubits. But there’s a catch: these basic building blocks must be good enough or the systems will be overrun with errors.

Currently, the error rates of the qubits on our 3rd generation Sycamore processor are typically between 1 in 10,000 and 1 in 100. Through our work and that of others, we understand that developing large-scale quantum computers will require far lower error rates. We will need rates in the range of 1 in 10⁹ to 1 in 10⁶ to run quantum circuits that can solve industrially relevant problems.

So how do we get there, knowing that squeezing three to six orders of magnitude of better performance from our current physical qubits is unlikely? Our team has created a roadmap that has directed our research for the last several years, improving the performance of our quantum computers in gradual steps toward a fault-tolerant quantum computer.

Roadmap for building a useful error-corrected quantum computer with key milestones. We are currently building one logical qubit that we will scale in the future.

Today, in “Suppressing Quantum Errors by Scaling a Surface Code Logical Qubit”, published in Nature, we are announcing that we have reached the second milestone on our roadmap. Our experimental results demonstrate a prototype of the basic unit of an error-corrected quantum computer known as a logical qubit, with performance nearing the regime that enables scalable fault-tolerant quantum computing.

A paradigm shift: from physical qubits to logical qubits

Quantum error correction (QEC) represents a paradigm shift from today’s quantum computing, where each physical qubit on the processor acts as a unit of computation. It provides the recipe to reach low errors by trading many good qubits for an excellent one: information is encoded across several physical qubits to construct a single logical qubit that is more resilient and capable of running large-scale quantum algorithms. Under the right conditions, the more physical qubits used to build a logical qubit, the better that logical qubit becomes.

However, this will not work if the added errors from each additional physical qubit outweigh the benefits of QEC. Until now, the high physical error rates have always won out.

To that end, we use a particular error-correcting code called a surface code and show for the first time that increasing the size of the code decreases the error rate of the logical qubit. A first-ever for any quantum computing platform, this was achieved by painstakingly mitigating many error sources as we scaled from 17 to 49 physical qubits. This work is evidence that with enough care, we can produce the logical qubits necessary for a large-scale error-corrected quantum computer.

Quantum error correction with surface codes

How does an error-correcting code protect information? Take a simple example from classical communication: Bob wants to send Alice a single bit that reads “1” across a noisy communication channel. Recognizing that the message is lost if the bit flips to “0”, Bob instead sends three bits: “111”. If one erroneously flips, Alice could take a majority vote (a simple error-correcting code) of all the received bits and still understand the intended message. Repeating the information more than three times — increasing the “size” of the code — would enable the code to tolerate more individual errors.
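The Bob-and-Alice example can be sketched in a few lines of Python. This is a minimal illustration of a classical repetition code with majority-vote decoding; the function names are ours, and the channel noise is modeled simply by flipping chosen positions.

```python
def encode(bit, n=3):
    """Repeat a single bit n times (n should be odd so a majority always exists)."""
    return [bit] * n


def flip(bits, positions):
    """Return a copy of bits with the given positions flipped, modeling channel noise."""
    return [b ^ 1 if i in positions else b for i, b in enumerate(bits)]


def decode(bits):
    """Majority vote: return 1 if more than half of the received bits are 1."""
    return 1 if sum(bits) > len(bits) / 2 else 0


sent = encode(1)                      # Bob sends [1, 1, 1]
print(decode(flip(sent, {1})))        # one flip: Alice still recovers 1
print(decode(flip(sent, {0, 1})))     # two flips: the vote goes wrong and gives 0
print(decode(flip(encode(1, 5), {0, 1})))  # a size-5 code tolerates two flips
```

The last line shows the effect of increasing the code size: five repetitions survive the two flips that defeat three repetitions.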

Many physical qubits on a quantum processor acting as one logical qubit in an error-correcting code called a surface code.

A surface code takes this principle and imagines a practical quantum implementation. It has to satisfy two additional constraints. First, the surface code must be able to correct not just bit flips, taking a qubit from |0⟩ to |1⟩, but also phase flips. This error is unique to quantum states and transforms a qubit in a superposition state, for example from “|0⟩ + |1⟩” to “|0⟩ – |1⟩”. Second, checking the qubits’ states would destroy their superpositions, so one needs a way of detecting errors without measuring the states directly.

To address these constraints, we arrange two types of qubits on a checkerboard. “Data” qubits on the vertices make up the logical qubit, while “measure” qubits at the center of each square are used for so-called “stabilizer measurements.” These measurements tell us whether the qubits are all the same, as desired, or different, signaling that an error occurred, without actually revealing the value of the individual data qubits.

We tile two types of stabilizer measurements in a checkerboard pattern to protect the logical data from bit- and phase-flips. If some of the stabilizer measurements register an error, then correlations in the stabilizer measurements are used to identify which error(s) occurred and where.

Surface-code QEC. Data qubits (yellow) are at the vertices of a checkerboard. Measure qubits at the center of each square are used for stabilizer measurements (blue squares). Dark blue squares check for bit-flip errors, while light blue squares check for phase-flip errors. Left: A phase-flip error. The two nearest light blue stabilizer measurements register the error (light red). Right: A bit-flip error. The two nearest dark blue stabilizer measurements register the error (dark red).
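A classical analogue may help with the idea of detecting errors without reading the data. The sketch below is not a quantum simulation and handles only bit flips on a one-dimensional repetition code; it shows how parity checks between neighbors (the role played by stabilizer measurements) report whether bits agree, and where a single flip occurred, without revealing the encoded value. The function names are hypothetical.

```python
def syndrome(bits):
    """Parity of each neighboring pair: 0 means 'same', 1 means 'different'."""
    return [bits[i] ^ bits[i + 1] for i in range(len(bits) - 1)]


def locate_single_flip(synd):
    """Infer which bit flipped from the checks that fired, assuming at most one flip.

    Returns None when no check fired. Assumes at least three bits (two checks).
    """
    fired = [i for i, s in enumerate(synd) if s]
    if not fired:
        return None
    if len(fired) == 1:
        # An edge bit touches only one check.
        if fired[0] == 0:
            return 0
        if fired[0] == len(synd) - 1:
            return len(synd)
    if len(fired) == 2 and fired[1] == fired[0] + 1:
        # An interior bit sits between two adjacent checks.
        return fired[1]
    raise ValueError("syndrome is not consistent with a single flip")


# The same syndrome appears whether the encoded value was 0 or 1,
# so the checks reveal the error but not the data.
print(syndrome([0, 1, 0]))  # [1, 1]
print(syndrome([1, 0, 1]))  # [1, 1]
print(locate_single_flip([1, 1]))  # 1, the middle bit
```

As in the figure, an error is registered by the checks nearest to it, and the pattern of fired checks is what identifies its location.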

Just as Bob’s message to Alice in the example above became more robust against errors with increasing code size, a larger surface code better protects the logical information it contains. The surface code can withstand any number of bit-flip errors, and likewise any number of phase-flip errors, that is less than half the distance, where the distance is the number of data qubits that span the surface code in either dimension.
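The "less than half the distance" rule can be written as a one-line function. This is a small sketch; the function name is ours, and the distances listed are the ones that appear in this post.

```python
def max_correctable_errors(distance):
    """Largest whole number of errors that is strictly less than half the distance."""
    return (distance - 1) // 2


for d in (3, 5, 17, 25):
    print(f"distance {d}: up to {max_correctable_errors(d)} error(s) of each type")
```

So a distance-3 code tolerates one bit flip and one phase flip, while a distance-5 code tolerates two of each.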

But here’s the problem: every individual physical qubit is prone to errors, so the more qubits in a code, the more opportunity for errors. We want the higher protection offered by QEC to outweigh the increased opportunities for errors as we increase the number of qubits. For this to happen, the physical qubits must have errors below the so-called “fault-tolerant threshold.” For the surface code, this threshold is quite low. So low that it hasn’t been experimentally feasible until recently. We are now on the precipice of reaching this coveted regime.

Making and controlling high-quality physical qubits

Entering the regime where QEC improves with scale required improving every aspect of our quantum computers, from nanofabrication of the physical qubits to the optimized control of the full quantum system. These experiments ran on a state-of-the-art 3rd generation Sycamore processor architecture optimized for QEC using the surface code with improvements across the board:

  • Increased qubit relaxation and dephasing lifetimes through an improved fabrication process and environmental noise reduction near the quantum processor.
  • Lowered cross-talk between all physical qubits during parallel operation by optimizing quantum processor circuit design and nanofabrication.
  • Reduced drift and improved qubit control fidelity through upgraded custom electronics.
  • Implemented faster and higher-fidelity readout and reset operations compared with previous generations of the Sycamore processor.
  • Reduced calibration errors by extensively modeling the full quantum system and employing better system-optimization algorithms.
  • Developed context-aware and fully parallel calibrations to minimize drift and optimize control parameters for QEC circuits.
  • Enhanced dynamical decoupling protocols to protect physical qubits from noise and cross-talk during idling operations.

Running surface code circuits

With these upgrades in place, we ran experiments to measure the ratio 𝚲3,5 of the logical error rate of a distance-3 surface code (ε3) with 17 qubits to that of a distance-5 surface code (ε5) with 49 qubits: 𝚲3,5 = ε3 / ε5.

Comparison of logical fidelity (defined as 1-ε) between distance-3 (d=3) and distance-5 (d=5) surface codes. The distance-5 code contains four possible distance-3 arrangements, with one example shown in the red outline (left). As improvements were made, the d=5 fidelity increased faster than that of the d=3, eventually overtaking the distance-3 code, as shown in the top-right data points (right), whose average lies slightly to the left of the ε3 = ε5 line.

The results of these experiments are shown above on the right. Continued improvements over several months allowed us to reduce the logical errors of both grids, leading to the distance-5 grid (ε5 = 2.914%) outperforming the distance-3 grids (ε3 = 3.028%) by 4% (𝚲3,5 = 1.04) with 5𝛔 confidence. While this might seem like a small improvement, it’s important to emphasize that the result represents a first for the field since Peter Shor’s 1995 QEC proposal. A larger code outperforming a smaller one is a key signature of QEC, and all quantum computing architectures will need to pass this hurdle to realize a path to the low errors that are necessary for quantum applications.
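The reported figures can be checked with a short calculation. This sketch uses only the two logical error rates quoted above; the variable names are ours.

```python
eps3 = 0.03028  # logical error rate of the distance-3 codes (3.028%)
eps5 = 0.02914  # logical error rate of the distance-5 code (2.914%)

lam = eps3 / eps5                   # error suppression ratio between d=3 and d=5
relative_reduction = 1 - eps5 / eps3
fidelity3 = 1 - eps3                # logical fidelity is defined as 1 - epsilon
fidelity5 = 1 - eps5

print(f"Lambda_3,5 = {lam:.2f}")                                # 1.04
print(f"relative error reduction = {relative_reduction:.1%}")   # 3.8%, i.e. about 4%
print(f"fidelity d=3: {fidelity3:.5f}, d=5: {fidelity5:.5f}")
```

A ratio above 1 is the point of interest here: it means the larger code has the lower logical error rate.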

The path forward

These results indicate that we are entering a new era of practical QEC. The Google Quantum AI team has spent the last few years thinking about how we define success in this new era, and how we measure progress along the way.

The ultimate goal is to demonstrate a pathway to achieving the low errors needed for using quantum computers in meaningful applications. To this end, our target remains achieving logical error rates of 1 in 10⁶ or lower per cycle of QEC. In the figure below on the left, we outline the path that we anticipate to reach this target. As we continue improving our physical qubits (and hence the performance of our logical qubits), we expect to gradually increase 𝚲 from close to 1 in this work to larger numbers. The figure below shows that a value of 𝚲 = 4 and a code distance of 17 (577 physical qubits with good enough quality) will yield a logical error rate below our target of 1 in 10⁶.
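The qubit counts quoted in this post (17, 49, and 577) follow a simple pattern. The sketch below assumes the common surface-code layout with d × d data qubits and d² − 1 measure qubits; that formula is not stated in the post, but it reproduces all three figures. The function name is ours.

```python
def surface_code_qubits(distance):
    """Return (data, measure, total) qubit counts, assuming d*d data and d*d - 1 measure qubits."""
    data = distance ** 2
    measure = distance ** 2 - 1
    return data, measure, data + measure


for d in (3, 5, 17):
    data, measure, total = surface_code_qubits(d)
    print(f"distance {d}: {data} data + {measure} measure = {total} physical qubits")
```

Because the count grows with the square of the distance, each step up in distance adds many more physical qubits, which is why the quality of every individual qubit matters so much.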

While this result is still a few years out, we have an experimental technique to probe error rates this low with today’s hardware, albeit in limited circumstances. While two-dimensional surface codes allow us to correct both bit- and phase-flip errors, we can also construct one-dimensional repetition codes that are only able to correct one type of error, with relaxed requirements. On the right below, we show that a distance-25 repetition code can reach error rates per cycle close to 1 in 10⁶. At such low errors, we see new kinds of error mechanisms that are not yet observable with our surface codes. By controlling for these error mechanisms, we can improve repetition codes to error rates near 1 in 10⁷.

Left: Expected progression as we improve performance (quantified by 𝚲) and scale (quantified by code distance) for surface codes. Right: Experimentally measured logical error rates per cycle versus the distance of one-dimensional repetition codes and two-dimensional surface codes.

Reaching this milestone reflects three years of focused work by the entire Google Quantum AI team following our demonstration of a quantum computer outperforming a classical computer. In our march toward building fault-tolerant quantum computers, we will continue to use the target error rates in the figure above to measure our progress. With further improvements toward our next milestone, we anticipate entering the fault-tolerant regime, where we can exponentially suppress logical errors and unlock the first useful error-corrected quantum applications. In the meantime, we continue to explore various ways of solving problems using quantum computers in topics ranging from condensed matter physics to chemistry, machine learning, and materials science.

New NVIDIA Studio Laptops Powered by GeForce RTX 4070, 4060, 4050 Laptop GPUs Boost On-the-Go Content Creation

Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks, and demonstrates how NVIDIA Studio technology improves creative workflows. We’re also deep diving on new GeForce RTX 40 Series GPU features, technologies and resources, and how they dramatically accelerate content creation.

Laptops equipped with NVIDIA GeForce RTX 4070, 4060 and 4050 GPUs are now available. The new lineup — including NVIDIA Studio-validated laptops from ASUS, GIGABYTE and Samsung — gives creators more options to create from anywhere with lighter, thinner devices that dramatically exceed the performance of the last generation.

These new GeForce RTX Laptop GPUs bring increased efficiency, thanks to the NVIDIA Ada Lovelace GPU architecture and fifth-generation Max-Q technology.

The laptops are fueled by powerful NVIDIA Studio technologies, including hardware acceleration for 3D, video and AI workflows; optimizations for RTX hardware in over 110 popular creative apps; and exclusive NVIDIA Studio apps like Omniverse, Canvas and Broadcast. And when the creating ends to let the gaming begin, DLSS 3 technology doubles frame rates.

Plus, the making of 3D artist Shangyu Wang’s short film, called Most Precious Gift, is highlighted In the NVIDIA Studio this week. The film was staged in NVIDIA Omniverse, a platform for creating and operating metaverse applications.

And don’t forget to sign up for creator and Omniverse sessions, tutorials and more at NVIDIA GTC, a free, global conference for the era of AI and the metaverse running online March 20-23.

A GPU Class of Their Own 

The new Studio laptops, equipped with powerful GeForce RTX 4070, 4060 and 4050 Laptop GPUs and fifth-generation Max-Q technology, revolutionize content creation on the go.

These advancements enable extreme efficiencies that allow creators to get the best of both worlds: small size and high performance. The thinner, lighter, quieter laptops retain extraordinary performance — letting users complete complex creative tasks in a fraction of the time needed before.

GeForce RTX 4070 GPUs unlock advanced video editing and 3D rendering capabilities. Work in 6K RAW high-dynamic range video files with lightning-fast decoding, export in AV1 with the new eighth-generation encoder, and gain a nearly 40% performance boost over the previous generation with GPU-accelerated effects in Blackmagic Design’s DaVinci Resolve. Advanced 3D artists can tackle large projects with ease across essential 3D apps using new third-generation RT Cores.

The GeForce RTX 4060 GPU-class laptops equipped with 8GB of video memory are great for video editing and artists looking to get started in 3D modeling and animation. In the popular open-source 3D app Blender, render times are a whopping 38% faster than the last generation.

Get started with GPU acceleration for photography, graphic design and video editing workflows using GeForce RTX 4050 GPUs, which provide a massive upgrade from integrated graphics. Access accelerated AI features, including 54% faster performance in Topaz Video for upscaling and deinterlacing footage. And turn home offices into professional-grade studios with NVIDIA’s encoder and the AI-powered NVIDIA Broadcast app for livestreaming.

Freelancers, hobbyists, aspiring artists and others can find a GeForce RTX GPU to fit their needs, now available in the new lineup of NVIDIA Studio laptops.

Potent, Portable, Primed for Creating

Samsung’s Galaxy Book3 Ultra comes with a choice of the GeForce RTX 4070 or 4050 GPU, alongside a vibrant 16-inch, 3K, AMOLED display.

Pick one up at Best Buy or on Samsung.com.

The Samsung Galaxy Book3 Ultra houses the GeForce RTX 4070 or 4050 GPU.

GIGABYTE upgraded its Aero 16 Studio laptop with up to a GeForce RTX 4070 GPU and a 16-inch, thin-bezel, 60Hz, OLED display. The Aero 14 features a GeForce RTX 4050 GPU with a 14-inch, thin-bezel, 90Hz, OLED display.

Purchase the Aero 14 from Amazon, and find both laptops on GIGABYTE.com.

GIGABYTE’s Aero 16 and 14 models with up to a GeForce RTX 4070 GPU are content-creation beasts.

The ASUS ROG FLOW Z13 comes with up to a GeForce RTX 4060 GPU, QHD, 165Hz, 13.4-inch Nebula display, as well as a 170-degree kickstand and detachable full-sized keyboard for portable creating, plus a stylus with NVIDIA Canvas support to turn simple brushstrokes into realistic images powered by AI.

Get one from ASUS.com.

The ASUS ROG FLOW Z13 is equipped with up to a GeForce RTX 4060 GPU.

MSI’s Stealth 17 Studio and Razer’s 16 and 18 models, with up to GeForce RTX 4090 Laptop GPUs, are also available to pick up today.

All Aboard the Creative Ship

Studio laptops power the imaginations of the world’s most creative minds, including this week’s In the NVIDIA Studio artist, Shangyu Wang.

From the moment his movie’s opening credits roll, viewers can expect to be captivated by a spellbinding journey in space and an intricately designed world, complemented by engaging music and voice-overs.

The film, Most Precious Gift, centers on humanity attempting to make peace with another intelligent lifeform holding the key to survival. It’s an extension of Wang’s interests in alien civilizations and their potential conflicts with humankind.

Wang usually jumps directly into 3D modeling, bypassing the concept stage that most artists go through. He sculpts and shapes the models in Autodesk Maya and Autodesk Fusion 360.

Ultra-fine details modeled in Autodesk Maya.

By selecting the default Autodesk Arnold renderer, using his GeForce RTX 3080 Ti-powered Studio laptop, Wang was able to use RTX-accelerated ray tracing and AI denoising, which let him tinker with and add details to highly interactive, photorealistic visuals. This was a boon for his efficiency.

Clothing segments combined and applied to the 3D model in Autodesk Maya.

Wang built textures in Adobe Substance 3D Painter and placed extra care on the fine details, noting the app was the “best option for the most realistic, original materials.” RTX-accelerated light and ambient occlusion guaranteed fully baked assets in mere seconds.

Realistic textures applied to 3D models in Adobe Substance 3D Painter.

For final renders, Wang said it was a no-brainer to assemble, simulate and stage his 3D scenes in Omniverse Create. “Because of the powerful path-tracing rendering, I can modify scene lights and materials in real time,” he said.

And when it came to final exports, Wang could use his preferred renderer within the Omniverse Create viewport, which has support for Pixar HD Storm, Chaos V-Ray, Maxon’s Redshift, OTOY OctaneRender, Blender Cycles and more.

Realistic lighting and shadows, manipulated and tinkered with in Omniverse Create.

Wang wrapped up compositing in NUKE software, where he adjusted colors and added depth-of-field visuals to the lens. The artist finally moved to DaVinci Resolve to add sound effects, music and subtitles.

3D artist Shangyu Wang.

Check out more of Wang’s work on ArtStation.

Follow NVIDIA Studio on Instagram, Twitter and Facebook. Access tutorials on the Studio YouTube channel and get updates directly in your inbox by subscribing to the Studio newsletter. Learn more about Omniverse on Instagram, Medium, Twitter and YouTube for additional resources and inspiration. Check out the Omniverse forums, and join our Discord server and Twitch channel to chat with the community.

Google Research, 2022 & beyond: Natural sciences

(This is Part 7 in our series of posts covering different topical areas of research at Google. You can find other posts in the series here.)

It’s an incredibly exciting time to be a scientist. With the amazing advances in machine learning (ML) and quantum computing, we now have powerful new tools that enable us to act on our curiosity, collaborate in new ways, and radically accelerate progress toward breakthrough scientific discoveries.

Since joining Google Research eight years ago, I’ve had the privilege of being part of a community of talented researchers fascinated by applying cutting-edge computing to push the boundaries of what is possible in applied science. Our teams are exploring topics across the physical and natural sciences. So, for this year’s blog post I want to focus on high-impact advances we’ve made recently in the fields of biology and physics, from helping to organize the world’s protein and genomics information to benefit people’s lives to improving our understanding of the nature of the universe with quantum computers. We are inspired by the great potential of this work.

Using machine learning to unlock mysteries in biology

Many of our researchers are fascinated by the extraordinary complexity of biology, from the mysteries of the brain, to the potential of proteins, and to the genome, which encodes the very language of life. We’ve been working alongside scientists from other leading organizations around the world to tackle important challenges in the fields of connectomics, protein function prediction, and genomics, and to make our innovations accessible and useful to the greater scientific community.

Neurobiology

One exciting application of our Google-developed ML methods was to explore how information travels through the neuronal pathways in the brains of zebrafish, which provides insight into how the fish engage in social behavior like swarming. In collaboration with researchers from the Max Planck Institute for Biological Intelligence, we were able to computationally reconstruct a portion of zebrafish brains imaged with 3D electron microscopy — an exciting advance in the use of imaging and computational pipelines to map out the neuronal circuitry in small brains, and another step forward in our long-standing contributions to the field of connectomics.

Reconstruction of the neural circuitry of a larval zebrafish brain, courtesy of the Max Planck Institute for Biological Intelligence.

The technical advances necessary for this work will have applications even beyond neuroscience. For example, to address the difficulty of working with such large connectomics datasets, we developed and released TensorStore, an open-source C++ and Python software library designed for storage and manipulation of n-dimensional data. We look forward to seeing the ways it is used in other fields for the storage of large datasets.

We’re also using ML to shed light on how human brains perform remarkable feats like language by comparing human language processing and autoregressive deep language models (DLMs). For this study, a collaboration with colleagues at Princeton University and New York University Grossman School of Medicine, participants listened to a 30-minute podcast while their brain activity was recorded using electrocorticography. The recordings suggested that the human brain and DLMs share computational principles for processing language, including continuous next-word prediction, reliance on contextual embeddings, and calculation of post-onset surprise based on word match (we can measure how surprised the human brain is by the word, and correlate that surprise signal with how well the word is predicted by the DLM). These results provide new insights into language processing in the human brain, and suggest that DLMs can be used to reveal valuable insights about the neural basis of language.

Biochemistry

ML has also allowed us to make significant advances in understanding biological sequences. In 2022, we leveraged recent advances in deep learning to accurately predict protein function from raw amino acid sequences. We also worked in close collaboration with the European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI) to carefully assess model performance and add hundreds of millions of functional annotations to the public protein databases UniProt, Pfam/InterPro, and MGnify. Human annotation of protein databases can be a laborious and slow process and our ML methods enabled a giant leap forward — for example, increasing the number of Pfam annotations by a larger number than all other efforts during the past decade combined. The millions of scientists worldwide who access these databases each year can now use our annotations for their research.

Google Research contributions to Pfam exceed in size all expansion efforts made to the database over the last decade.

Although the first draft of the human genome was released in 2003, it was incomplete and had many gaps due to technical limitations in the sequencing technologies. In 2022 we celebrated the remarkable achievements of the Telomere-2-Telomere (T2T) Consortium in resolving these previously unavailable regions — including five full chromosome arms and nearly 200 million base pairs of novel DNA sequences — which are interesting and important for questions of human biology, evolution, and disease. Our open source genomics variant caller, DeepVariant, was one of the tools used by the T2T Consortium to prepare their release of a complete 3.055 billion base pair sequence of a human genome. The T2T Consortium is also using our newer open source method DeepConsensus, which provides on-device error correction for Pacific Biosciences long-read sequencing instruments, in their latest research toward comprehensive pan-genome resources that can represent the breadth of human genetic diversity.

Using quantum computing for new physics discoveries

When it comes to making scientific discoveries, quantum computing is still in its infancy, but has a lot of potential. We’re exploring ways of advancing the capabilities of quantum computing so that it can become a tool for scientific discovery and breakthroughs. In collaboration with physicists from around the world, we are also starting to use our existing quantum computers to create interesting new experiments in physics.

As an example of such experiments, consider the problem where a sensor measures something, and a computer then processes the data from the sensor. Traditionally, this means the sensor’s data is processed as classical information on our computers. Instead, one idea in quantum computing is to directly process quantum data from sensors. Feeding data from quantum sensors directly to quantum algorithms without going through classical measurements may provide a large advantage. In a recent Science paper written in collaboration with researchers from multiple universities, we show that quantum computing can extract information from exponentially fewer experiments than classical computing, as long as the quantum computer is coupled directly to the quantum sensors and is running a learning algorithm. This “quantum machine learning” can yield an exponential advantage in dataset size, even with today’s noisy intermediate-scale quantum computers. Because experimental data is often the limiting factor in scientific discovery, quantum ML has the potential to unlock the vast power of quantum computers for scientists. Even better, the insights from this work are also applicable to learning on the output of quantum computations, such as the output of quantum simulations that may otherwise be difficult to extract.

Even without quantum ML, a powerful application of quantum computers is to experimentally explore quantum systems that would be otherwise impossible to observe or simulate. In 2022, the Quantum AI team used this approach to observe the first experimental evidence of multiple microwave photons in a bound state using superconducting qubits. Photons typically do not interact with one another, and require an additional element of non-linearity to cause them to interact. The results of our quantum computer simulations of these interactions surprised us — we thought the existence of these bound states relied on fragile conditions, but instead we found that they were robust even to relatively strong perturbations that we applied.

Occupation probability versus discrete time step for n-photon bound states. We observe that the majority of the photons (darker colors) remain bound together.

Given the initial successes we have had in applying quantum computing to make physics breakthroughs, we are hopeful about the possibility of this technology to enable future groundbreaking discoveries that could have as significant a societal impact as the creation of transistors or GPS. The future of quantum computing as a scientific tool is exciting!

Acknowledgements

I would like to thank everyone who worked hard on the advances described in this post, including the Google Applied Sciences, Quantum AI, Genomics and Brain teams and their collaborators across Google Research and externally. Finally, I would like to thank the many Googlers who provided feedback in the writing of this post, including Lizzie Dorfman, Erica Brand, Elise Kleeman, Abe Asfaw, Viren Jain, Lucy Colwell, Andrew Carroll, Ariel Goldstein and Charina Chou.


Google Research, 2022 & beyond

This was the seventh blog post in the “Google Research, 2022 & Beyond” series. Other posts in this series are listed in the table below:


MLOps deployment best practices for real-time inference model serving endpoints with Amazon SageMaker


After you build, train, and evaluate your machine learning (ML) model to ensure it’s solving the intended business problem proposed, you want to deploy that model to enable decision-making in business operations. Models that support business-critical functions are deployed to a production environment where a model release strategy is put in place. Given the nature of ML models, where the data is continuously changing, you also want to ensure that a deployed model is still relevant to new data and that the model is updated when this is not the case. This includes choosing a deployment strategy that minimizes risks and downtime. This optimal deployment strategy should maintain high availability of the model, consider the business cost of deploying an inferior model to what is already in production, and contain functionality to easily roll back to a previous model version. Many of these recommended considerations and deployment patterns are also covered within the AWS Well Architected Framework – Machine Learning Lens.

In addition to choosing the right deployment strategy, that strategy should be implemented using a reliable mechanism that includes MLOps practices. MLOps includes practices that integrate ML workloads into release management, CI/CD, and operations, accounting for the unique aspects of ML projects, including considerations for deploying and monitoring models. Amazon SageMaker for MLOps provides purpose-built tools to automate and standardize steps across the ML lifecycle, including capabilities to deploy and manage new models using advanced deployment patterns.

In this post, we discuss how to deploy ML models with Amazon SageMaker in a repeatable and automated way, integrating the production variants and deployment guardrails capabilities of SageMaker with MLOps solutions. We give you an introduction of how to integrate the MLOps tools of SageMaker with SageMaker model deployment patterns, focusing on real-time single-model endpoints.

Solution overview

We explore the following model testing and guardrail patterns and their integration with SageMaker MLOps tools:

  • Model testing – We compare different model versions in production before replacing the current model version. This post compares the following model testing capabilities:
    • A/B testing – With A/B testing, you compare different versions of your model in production by distributing the endpoint traffic between your model variants. A/B testing is used in scenarios where closed loop feedback can directly tie model outputs to downstream business metrics. This feedback is then used to determine the statistical significance of changing from one model to another, helping you select the best model through live production testing.
    • Shadow tests – With shadow tests, you test a new version of your model in production by sending requests to the production model and the new model in parallel. The prediction response data from the production model is served to the application, while the new model version predictions are stored for testing but not served to the production application. Shadow testing is used in situations where there is no closed loop feedback mapping a business metric back to a model’s predictions. In this scenario, you use model quality and operational metrics to compare multiple models instead of any impact on downstream business metrics.
  • Shifting traffic – After you have tested the new version of the model and are satisfied with its performance, the next step is to shift traffic from the current model to the new one. The blue/green deployment guardrails in SageMaker allow you to easily switch from the current model in production (blue fleet) to a new one (green fleet) in a controlled way. Blue/green deployments avoid downtime during the updates of your model, like what you would have in an in-place deployment scenario. To maximize model availability, as of this writing, blue/green deployments are the default option for model updates in SageMaker. We discuss the following traffic shifting methods in this post:
    • All at once traffic shifting – 100% of your endpoint traffic is shifted from your blue fleet to your green fleet after the green fleet becomes available. We use alarms in Amazon CloudWatch that monitor your green fleet for a set amount of time (the baking period) and if no alarm is triggered, the blue fleet is then deleted by SageMaker after the baking period.
    • Canary traffic shifting – Your green fleet is first exposed to a smaller proportion of your traffic (a canary) and validated for any issues using CloudWatch alarms for a baking period while the blue fleet keeps receiving most of the endpoint traffic. After the green fleet is validated, all traffic is shifted to the new fleet and the blue fleet is then deleted by SageMaker.
    • Linear traffic shifting – You gradually shift traffic from your blue fleet to your green fleet in steps. Your model is then monitored with CloudWatch alarms for a baking period in each step before the blue fleet is completely replaced.
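
As a rough illustration of the A/B testing point above, the following sketch checks whether the difference between two variants is statistically significant using a two-proportion z-test. The choice of test, the function name, and the sample numbers are assumptions made for illustration; the post does not prescribe a specific statistical method.

```python
import math


def two_proportion_z_test(successes_a, total_a, successes_b, total_b):
    """Return (z, two_sided_p_value) for H0: both variants share one success rate."""
    rate_a = successes_a / total_a
    rate_b = successes_b / total_b
    # Pooled rate under the null hypothesis that the variants are equivalent.
    pooled = (successes_a + successes_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        # All outcomes identical: no evidence of a difference.
        return 0.0, 1.0
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical business metric: conversions attributed to each variant.
z_score, p_value = two_proportion_z_test(200, 4000, 260, 4000)
promote_candidate = z_score > 0 and p_value < 0.05
```

A threshold of 0.05 is only a common convention; the acceptable risk level depends on the business cost of promoting an inferior model.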

This post focuses on describing architectures that utilize SageMaker MLOps features to perform controlled deployments of models via the deployment guardrails and model testing strategies we’ve listed. For general information on these patterns, refer to Take advantage of advanced deployment strategies using Amazon SageMaker deployment guardrails and Deployment guardrails.

Deploy a model with SageMaker

SageMaker offers a broad range of deployment options that vary from low latency and high throughput to long-running inference jobs. These options include considerations for batch, real-time, or near real-time inference. Each option offers different advanced features, such as the ability to run multiple models on a single endpoint. However, as previously mentioned, for this post, we only cover MLOps deployment patterns using single-model endpoints. To dive further into more advanced SageMaker deployment features for real-time inference, refer to Model hosting patterns in Amazon SageMaker, Part 2: Getting started with deploying real time models on SageMaker.

To understand the implementation of advanced deployment patterns using continuous delivery (CD) pipelines, let’s first discuss a key concept within SageMaker called model variants.

SageMaker model variants

Model variants allow you to deploy multiple versions of your model to the same endpoint to test your model. Model variants are deployed to separate instances, so there is no impact on other variants when one is updated. In SageMaker, model variants are implemented as production and shadow variants.

Production variants allow you to A/B test multiple versions of your model to compare their performance. In this scenario, all versions of your model return responses to the model requests. Your endpoint traffic is distributed between the existing variants either by traffic distribution, where you assign a weight for each variant, or by target variant, where a certain parameter (for instance Region or market) decides which model should be invoked.
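
To make the weighting concrete, here is a minimal sketch that computes the share of traffic each variant would receive from its weight. The dictionaries follow the field names of the ProductionVariants list accepted by the SageMaker CreateEndpointConfig API; the variant names, model names, and instance type are hypothetical placeholders, and the helper function is not part of any SDK.

```python
def traffic_shares(production_variants):
    """Map each variant name to its fraction of endpoint traffic.

    A variant's share is its weight divided by the sum of all weights.
    """
    total = sum(v["InitialVariantWeight"] for v in production_variants)
    if total <= 0:
        raise ValueError("At least one variant needs a positive weight")
    return {
        v["VariantName"]: v["InitialVariantWeight"] / total
        for v in production_variants
    }


# Hypothetical A/B setup: 90% of traffic to the current model, 10% to the candidate.
production_variants = [
    {
        "VariantName": "current-model",
        "ModelName": "churn-model-v1",
        "InstanceType": "ml.m5.xlarge",
        "InitialInstanceCount": 1,
        "InitialVariantWeight": 9.0,
    },
    {
        "VariantName": "candidate-model",
        "ModelName": "churn-model-v2",
        "InstanceType": "ml.m5.xlarge",
        "InitialInstanceCount": 1,
        "InitialVariantWeight": 1.0,
    },
]

shares = traffic_shares(production_variants)
```

With boto3, a list like this would be passed as ProductionVariants to create_endpoint_config, and a specific variant can be requested per call through the TargetVariant parameter of invoke_endpoint.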

Shadow variants allow you to shadow test a new version of your model. In this scenario, your model has a production variant and a shadow variant deployed in parallel to the same endpoint. The shadow variant receives the full (or sampled) data traffic from your endpoint. However, only the predictions of the production variants are sent back to the users of your application, and the predictions from the shadow variants are logged for analysis. Because shadow variants are launched on separate instances from the production variant, there is no performance impact to your production variant in this test. With this option, you are testing the new model and minimizing the risks of a low-performing model, and you can compare both models’ performance with the same data.
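
The request flow described above can be simulated in a few lines. This is a conceptual sketch only: in SageMaker the mirroring happens inside the managed endpoint, and the function name, the sampling parameter, and the in-memory log used here are assumptions for illustration.

```python
import random


def handle_request(payload, production_model, shadow_model, shadow_log,
                   sample_rate=1.0, rng=random):
    """Serve the production prediction and record the shadow prediction."""
    response = production_model(payload)
    # Mirror all requests (sample_rate=1.0) or only a sampled fraction.
    if rng.random() < sample_rate:
        record = {"payload": payload, "production": response}
        try:
            record["shadow"] = shadow_model(payload)
        except Exception as exc:
            # A failing shadow model must never affect the caller.
            record["shadow_error"] = repr(exc)
        shadow_log.append(record)
    # Only the production response reaches the application.
    return response


# Hypothetical models: simple callables standing in for deployed variants.
def current_model(x):
    return x * 2


def new_model(x):
    return x * 2 + 1


log = []
served = [handle_request(x, current_model, new_model, log) for x in (1, 2, 3)]
```

Afterwards, the log holds paired predictions for the same inputs, which is what makes a like-for-like comparison of the two models possible.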

SageMaker deployment guardrails

Guardrails are an essential part of software development. They protect your application and minimize the risk of deployment of a new version of your application. Similarly, SageMaker deployment guardrails allow you to switch from one model version to another in a controlled way. As of December 2022, SageMaker guardrails provide implementation for blue/green, canary, and linear traffic shifting deployment options. When combined with model variants, deployment guardrails can be applied both to production and shadow variants of your model, ensuring no downtime during the update of a new variant, with the traffic shifting being controlled according to the option selected.

MLOps foundations for model deployment

In the broader context of an ML model building and deploying workflow, we want to employ CI/CD practices purpose built for the ML workflow. Similar to traditional CI/CD systems, we want to automate software tests, integration testing, and production deployments. However, we also need to include specific operations around the ML lifecycle that aren’t present in the traditional software development lifecycle such as model training, model experimentation, model testing, and model monitoring.

To achieve those ML-specific capabilities, MLOps foundations such as automated model testing, deployment guardrails, multi-account deployments, and automated model rollback are added to the model deployment process. This ensures that the already described capabilities allow for model testing and avoid downtime during the process of a model update. It also provides the reliability and traceability necessary for the continuous improvement of a production-ready model. Additionally, capabilities like the ability to package existing solutions into reusable templates and deploy models in a multi-account setup ensure the scalability of the model deployment patterns discussed in the post to several models across an organization.

The following figure demonstrates a common pattern for the connection of SageMaker capabilities to create an end-to-end model building and deployment pipeline. In this example, a model is developed in SageMaker using SageMaker Processing jobs to run data processing code that is used to prepare data for an ML algorithm. SageMaker Training jobs are then used to train an ML model on the data produced by the processing job. The model artifacts and associated metadata are stored in the SageMaker Model Registry as the last step of the training process. This is orchestrated by SageMaker Pipelines, which is a purpose-built CI/CD service for ML that helps automate and manage ML workflows at scale.

After the model is approved, it is tested in production with either an A/B testing or a shadow deployment. After the model is validated in production, we use the model registry to approve the model for production rollout to a SageMaker endpoint using one of the deployment guardrails options.

When the model update process is complete, SageMaker Model Monitor continually monitors the model for drift in model and data quality. This process is automated for multiple use cases using SageMaker Project templates, which map the infrastructure deployment to a multi-account setup in order to ensure complete resource isolation and easier cost control.

Single-model endpoint deployment patterns

When deploying models to a production environment for the first time, you don’t have a model running to compare with, and the deployed model will be the one used by your business application. After the model is deployed and monitored in a production environment, you might want to update the model, either on a regular basis or on demand, when new data is available or when your model has a performance gap detected. When updating an existing model, you want to ensure that the new model performs better than the current one and can handle the prediction request traffic from your business applications. During this validation period, you want the current model to still be available for a possible rollback to minimize the risk of downtime to your applications.

In a broader model development picture, models are typically trained in a data science development account. This includes experimentation workflows often used in the development of models as well as retraining workflows used in production-ready pipelines. All of the metadata for these experiments can be tracked using Amazon SageMaker Experiments during development. After the workflow is incorporated into a pipeline for production use, the metadata is automatically tracked through SageMaker Pipelines. To keep track of viable production models in one place, after experimentation has brought a model’s performance metrics (precision, recall, and so on) to an acceptable level for production, a condition step in the SageMaker pipeline allows the model to be registered into the model registry.

The model registry allows you to trigger the deployment of this model with a manual or automated approval process. This deployment takes place in an ML test account where operational tests such as integration tests, unit tests, model latency, and any additional model validation can be performed against the new model version. Note that A/B testing and shadow testing are not performed in the ML test account, but rather in the ML production account.

After the model passes all validations in the test account, it’s ready to be deployed to a production environment. A new approval process triggers this deployment, and SageMaker deployment guardrails allow for a controlled release and transparent model update process according to the traffic shifting mode selected.

The following diagram illustrates this solution architecture.

All at once traffic shifting

The all at once traffic shifting mode allows you to update a new model version (green fleet) by completely shifting 100% of the traffic from your current model (blue fleet) to your new model. With this option, you can configure a baking period during which both versions of your model are still running, and you can quickly and automatically roll back to the current version if your new model doesn’t perform as expected. The downside of this option is that all your data traffic is affected at once, so if there is an issue with your model deployment, all users using the application during the deployment process are affected. The following architecture shows how the all at once traffic shifting option handles model updates.

All at once traffic shifting can be incorporated into your MLOps tooling by defining an endpoint deployment configuration with the BlueGreenUpdatePolicy traffic routing type set to ALL_AT_ONCE. In your MLOps pipeline, after a new model is approved for deployment to the ML production account, SageMaker checks if your model endpoint already exists. If so, the ALL_AT_ONCE configuration triggers an endpoint update that follows the architecture. Your endpoint rollback is controlled by the CloudWatch alarms defined in your endpoint AutoRollbackConfiguration, which, when triggered, automatically start the rollback to your current model version.
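
A deployment configuration of this kind might look like the following sketch. The nesting reflects our understanding of the DeploymentConfig structure accepted by the SageMaker UpdateEndpoint API and should be checked against the current API reference; the helper function, the alarm name, and the default durations are hypothetical.

```python
def all_at_once_deployment_config(alarm_names, baking_seconds=600,
                                  timeout_seconds=1800):
    """Build a DeploymentConfig dict for an all at once blue/green update."""
    if timeout_seconds <= baking_seconds:
        raise ValueError("The overall timeout should exceed the baking period")
    return {
        "BlueGreenUpdatePolicy": {
            # Shift 100% of traffic to the green fleet in one step.
            "TrafficRoutingConfiguration": {"Type": "ALL_AT_ONCE"},
            # Baking period: how long the blue fleet is kept before deletion.
            "TerminationWaitInSeconds": baking_seconds,
            "MaximumExecutionTimeoutInSeconds": timeout_seconds,
        },
        "AutoRollbackConfiguration": {
            # CloudWatch alarms that trigger a rollback to the blue fleet.
            "Alarms": [{"AlarmName": name} for name in alarm_names],
        },
    }


# Hypothetical alarm name; the alarm itself must already exist in CloudWatch.
deployment_config = all_at_once_deployment_config(["my-endpoint-5xx-errors"])

# With boto3, this dict would be passed to the SageMaker client, for example:
# client.update_endpoint(EndpointName=..., EndpointConfigName=...,
#                        DeploymentConfig=deployment_config)
```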

Canary traffic shifting

The canary traffic shifting mode allows you to test your new model (green fleet) with a small portion of the data traffic before either updating the running model (blue fleet) to the new version or rolling back the new version, depending on the outcome of the canary testing. The portion of the traffic used to test the new model is called the canary, and with this option the risk of a problematic new model is limited to the canary traffic while the update time is still minimized.

Canary deployments allow you to minimize the risk of implementing a new model version by exposing the new model version to a smaller group of users to monitor effectiveness over a period of time. The downside is managing multiple versions for a period of time that allows for gathering performance metrics that are meaningful enough to determine performance impact. The benefit is the ability to isolate risk to a smaller group of users.

Canary traffic shifting can be incorporated into your MLOps tooling by defining an endpoint deployment configuration with the BlueGreenUpdatePolicy traffic routing type set to CANARY and defining the CanarySize to determine how much of your endpoint traffic should be redirected to a new model version. Similar to the all at once option, in your MLOps pipeline, after a new model is approved for deployment to the ML production account, SageMaker checks if your model endpoint already exists. If so, the CANARY configuration triggers an endpoint update that follows the architecture outlined in the following diagram. Your endpoint rollback is controlled by the CloudWatch alarms defined in your endpoint AutoRollbackConfiguration, which, when triggered, automatically start the rollback to your current model version. Useful alarm types to deploy here are 500 status codes and model latency; however, these alarm settings should be customized to your specific business use case and ML technology.
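
As a sketch of how the two alarm types mentioned above and the canary routing could be expressed, the following code builds keyword arguments for the CloudWatch put_metric_alarm call and a canary TrafficRoutingConfiguration. The metric names Invocation5XXErrors and ModelLatency come from the AWS/SageMaker CloudWatch namespace; the helper functions, endpoint name, variant name, thresholds, and the 50% upper bound on canary size are assumptions to adapt and verify.

```python
def endpoint_alarm(endpoint_name, variant_name, metric_name, threshold, statistic):
    """Build keyword arguments for CloudWatch put_metric_alarm."""
    return {
        "AlarmName": f"{endpoint_name}-{variant_name}-{metric_name}",
        "Namespace": "AWS/SageMaker",
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "Statistic": statistic,
        "Period": 60,              # evaluate the metric every minute
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
    }


def canary_routing(canary_percent, wait_seconds):
    """Build a canary TrafficRoutingConfiguration dict."""
    if not 1 <= canary_percent <= 50:
        raise ValueError("This sketch assumes a canary of 1-50% of capacity")
    return {
        "Type": "CANARY",
        "CanarySize": {"Type": "CAPACITY_PERCENT", "Value": canary_percent},
        # Baking period for the canary before the remaining traffic shifts.
        "WaitIntervalInSeconds": wait_seconds,
    }


# Hypothetical endpoint and thresholds; ModelLatency is reported in microseconds.
alarms = [
    endpoint_alarm("my-endpoint", "AllTraffic", "Invocation5XXErrors", 1, "Sum"),
    endpoint_alarm("my-endpoint", "AllTraffic", "ModelLatency", 500_000, "Average"),
]
routing = canary_routing(canary_percent=10, wait_seconds=600)
rollback = {"Alarms": [{"AlarmName": a["AlarmName"]} for a in alarms]}
```

Each alarm dictionary would be passed as keyword arguments to the CloudWatch client's put_metric_alarm, and the alarm names are then referenced from AutoRollbackConfiguration.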

Linear traffic shifting

In the linear traffic shifting mode, you gradually change the traffic from your current model (blue fleet) to your new model version (green fleet) by increasing the data traffic sent to the new model in steps. This way, the proportion of traffic used to test your new model version gradually increases with each step, and a baking time for each step ensures that your model is still operational with the new traffic. With this option, you minimize the risk of deploying a low-performing model and gradually expose the new model to more data traffic. The downside of this approach is that your update time is longer and the cost of running both models in parallel increases.

Linear traffic shifting can be incorporated into your MLOps tooling by defining an endpoint deployment configuration with the BlueGreenUpdatePolicy traffic routing type set to LINEAR and defining the LinearStepSize to determine how much of your traffic should be redirected to a new model in each step. Similar to the all at once option, in your MLOps pipeline, after a new model is approved for deployment to the ML production account, SageMaker checks if your model endpoint already exists. If so, the LINEAR configuration triggers an endpoint update that follows the architecture indicated in the following diagram. Your endpoint rollback is controlled by the CloudWatch alarms defined in your endpoint AutoRollbackConfiguration, which, when triggered, automatically start the rollback to your current model version.
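
To see the trade-off between step size and update time, the following sketch computes when each traffic step would start for a given step size and baking time. It is a simplified model that ignores instance provisioning time; the function name and the example values are assumptions.

```python
def linear_schedule(step_percent, wait_seconds):
    """Return [(elapsed_seconds, green_traffic_percent), ...] for each step."""
    if step_percent <= 0:
        raise ValueError("step_percent must be positive")
    schedule = []
    shifted = 0
    elapsed = 0
    while shifted < 100:
        # Each step moves another slice of traffic, capped at 100%.
        shifted = min(100, shifted + step_percent)
        schedule.append((elapsed, shifted))
        # Baking period before the next step begins.
        elapsed += wait_seconds
    return schedule


# Hypothetical settings: 25% of capacity per step, 5 minutes of baking per step.
schedule = linear_schedule(step_percent=25, wait_seconds=300)

# The matching routing block, using the field names discussed above.
routing = {
    "Type": "LINEAR",
    "LinearStepSize": {"Type": "CAPACITY_PERCENT", "Value": 25},
    "WaitIntervalInSeconds": 300,
}
```

In this example the green fleet receives all traffic only at the fourth step, 15 minutes after the first one, which illustrates why smaller steps lengthen the period during which both fleets are running.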

Deployment patterns with model production variants

Regardless of the deployment pattern that you choose for your application, you can also use production variants to validate your model performance before updating your endpoint, or to implement additional deployment patterns such as shadow deployments. In this case, you want to add a manual or automated process to select the best model to be deployed before updating your endpoint. The following architecture shows how your endpoint traffic and response behave in a shadow deployment scenario. In this scenario, each prediction request is submitted to both the new and deployed models; however, only the currently deployed model serves the prediction response to the business application, while the prediction from the new model is retained only for analyzing its performance against the currently deployed model. After model performance is evaluated, the new model version can be deployed to serve prediction response traffic to business applications.

Rollback

Regardless of the deployment strategy that you choose for your model deployment, you want to be able to roll back to the previous model version if your new model performance is lower than your current model performance. To do so while minimizing the downtime of your application, you need to keep your current model running in parallel to the new one until you are confident that your new model performs better than the current one.

SageMaker deployment guardrails allow you to set alarms and automatically roll back to previous model versions during the model validation period. After the validation period is over, you might still need to roll back to a previous model version to solve a new problem that is discovered after the model update is complete. To do so, you can take advantage of the SageMaker model registry to reject and approve models and trigger a rollback process.
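
One way such a registry-driven rollback could be planned is sketched below. The dictionaries mirror fields returned by the SageMaker list_model_packages call, and the rejection request uses the parameters of update_model_package; the helper function, the ARNs, and the selection rule (latest other approved version) are assumptions rather than the post's prescribed process.

```python
def plan_registry_rollback(packages, failing_arn):
    """Return (reject_request, rollback_target) for a failing model version."""
    candidates = [
        p for p in packages
        if p["ModelApprovalStatus"] == "Approved"
        and p["ModelPackageArn"] != failing_arn
    ]
    if not candidates:
        raise LookupError("No other approved model version to roll back to")
    # Roll back to the most recent approved version other than the failing one.
    target = max(candidates, key=lambda p: p["ModelPackageVersion"])
    reject_request = {
        "ModelPackageArn": failing_arn,
        "ModelApprovalStatus": "Rejected",
        "ApprovalDescription": "Rejected after post-deployment issue",
    }
    return reject_request, target


# Hypothetical registry contents.
packages = [
    {"ModelPackageArn": "arn:example:model-package/churn/1",
     "ModelPackageVersion": 1, "ModelApprovalStatus": "Approved"},
    {"ModelPackageArn": "arn:example:model-package/churn/2",
     "ModelPackageVersion": 2, "ModelApprovalStatus": "Approved"},
    {"ModelPackageArn": "arn:example:model-package/churn/3",
     "ModelPackageVersion": 3, "ModelApprovalStatus": "Approved"},
]

reject_request, target = plan_registry_rollback(
    packages, failing_arn="arn:example:model-package/churn/3"
)
# With boto3: client.update_model_package(**reject_request); the status change
# can then trigger the deployment pipeline to redeploy the target version.
```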

Conclusion

In this post, you learned how to combine SageMaker endpoint model variants and deployment guardrails with MLOps capabilities in order to create end-to-end patterns for model development. We provided an example implementation for canary and linear shifting deployment guardrails connected with SageMaker pipelines and the model registry via a SageMaker custom project. As a next step, try adapting the following template to implement the deployment strategy of your organization.


About the authors

Maira Ladeira Tanke is an ML Specialist Solutions Architect at AWS. With a background in data science, she has 9 years of experience architecting and building ML applications with customers across industries. As a technical lead, she helps customers accelerate their achievement of business value through emerging technologies and innovative solutions. In her free time, Maira enjoys traveling and spending time with her family someplace warm.

Clay Elmore is an AI/ML Specialist Solutions Architect at AWS. After spending many hours in a materials research lab, his background in chemical engineering was quickly left behind to pursue his interest in machine learning. He has worked on ML applications in many different industries ranging from energy trading to hospitality marketing. Clay has a special interest in bringing software development practices to ML and guiding customers towards repeatable, scalable solutions by using these principles. In his spare time, Clay enjoys skiing, solving Rubik’s cubes, reading, and cooking.

Shelbee Eigenbrode is a Principal AI and Machine Learning Specialist Solutions Architect at AWS. She has been in technology for 24 years spanning multiple industries, technologies, and roles. She is currently focusing on combining her DevOps and ML background into the domain of MLOps to help customers deliver and manage ML workloads at scale. With over 35 patents granted across various technology domains, she has a passion for continuous innovation and using data to drive business outcomes. Shelbee is a co-creator and instructor of the Practical Data Science specialization on Coursera. She is also the Co-Director of Women In Big Data (WiBD), Denver chapter. In her spare time, she likes to spend time with her family, friends, and overactive dogs.

Qiyun Zhao is a Senior Software Development Engineer with the Amazon SageMaker Inference Platform team. He is the lead developer of the deployment guardrails and shadow deployments, and he focuses on helping customers to manage ML workloads and deployments at scale with high availability. He also works on platform architecture evolutions for fast and secure ML jobs deployment and running ML online experiments at ease. In his spare time, he enjoys reading, gaming, and traveling.


AWS and Hugging Face collaborate to make generative AI more accessible and cost efficient

We’re thrilled to announce an expanded collaboration between AWS and Hugging Face to accelerate the training, fine-tuning, and deployment of large language and vision models used to create generative AI applications. Generative AI applications can perform a variety of tasks, including text summarization, answering questions, code generation, image creation, and writing essays and articles.

AWS has a deep history of innovation in generative AI. For example, Amazon uses AI to deliver a conversational experience with Alexa that customers are interacting with billions of times each week, and is increasingly using generative AI as part of new experiences like Create with Alexa. In addition, M5, a group within Amazon Search that helps teams across Amazon bring large models to their applications, trained large models to improve search results on Amazon.com. AWS is constantly innovating across all areas of ML including infrastructure, tools on Amazon SageMaker, and AI services, such as Amazon CodeWhisperer, a service that improves developer productivity by generating code recommendations based on the code and comments in an IDE. AWS also created purpose-built ML accelerators for the training (AWS Trainium) and inference (AWS Inferentia) of large language and vision models on AWS.

Hugging Face selected AWS because it offers flexibility across state-of-the-art tools to train, fine-tune, and deploy Hugging Face models including Amazon SageMaker, AWS Trainium, and AWS Inferentia. Developers using Hugging Face can now easily optimize performance and lower cost to bring generative AI applications to production faster.

High-performance and cost-efficient generative AI

Building, training, and deploying large language and vision models is an expensive and time-consuming process that requires deep expertise in machine learning (ML). Since the models are very complex and can contain hundreds of billions of parameters, generative AI is largely out of reach for many developers.

To close this gap, Hugging Face is now collaborating with AWS to make it easier for developers to access AWS services and deploy Hugging Face models specifically for generative AI applications. The benefits are faster training, and low-latency, high-throughput inference at scale. For example, the Amazon EC2 Trn1 instances powered by AWS Trainium deliver faster time to train while offering up to 50% cost-to-train savings over comparable GPU-based instances. Amazon EC2’s new Inf2 instances, powered by the latest generation of AWS Inferentia, are purpose-built to deploy the latest generation of large language and vision models and raise the performance of Inf1 by delivering up to 4x higher throughput and up to 10x lower latency. Developers can use AWS Trainium and AWS Inferentia through managed services such as Amazon SageMaker, a service with tools and workflows for ML. Or they can self-manage on Amazon EC2.

Get started today

Customers can start using Hugging Face models on AWS in three ways: through SageMaker JumpStart, the Hugging Face AWS Deep Learning Containers (DLCs), or the tutorials to deploy your models to AWS Trainium or AWS Inferentia. The Hugging Face DLC is packed with optimized transformers, datasets, and tokenizers libraries to enable you to fine-tune and deploy generative AI applications at scale in hours instead of weeks – with minimal code changes. SageMaker JumpStart and the Hugging Face DLCs are available in all regions where Amazon SageMaker is available and come at no additional cost. Read documentation and discussion forums to learn more or try the sample notebooks today.

Survey Reveals How Telcos Plan to Ring in Change Using AI

The telecommunications industry has for decades helped advance revolutionary change – enabling everything from telephones and television to online streaming and self-driving cars. Yet the industry has long been considered an evolutionary mover in its own business.

A recent survey of more than 400 telecommunications industry professionals from around the world found that same cautious tone in how they plan to define and execute on their AI strategies.

To fill in a more complete picture of how the telecommunications industry is using AI, and where it’s headed, NVIDIA’s first “State of AI in Telecommunications” survey consisted of questions covering a range of AI topics, infrastructure spending, top use cases, biggest challenges and deployment models.

Survey respondents included C-suite leaders, managers, developers and IT architects from mobile telecoms, fixed and cable companies. The survey was conducted over eight weeks between mid-November 2022 and mid-January 2023.

Dial AI for Motivation

The survey results revealed two consistent themes: industry players (73%) see AI as a tool to grow revenue, improve operations and sustainability, or boost customer retention. Amid skepticism about the money-making potential of 5G, telecoms see efficiencies driven by AI as the most likely path for returns on investment.

Yet, 93% of those responding to questions about undertaking AI projects at their own companies appear to be substantially underinvesting in AI as a percentage of annual capital spending.

Some 50% of respondents reported spending less than $1 million last year on AI projects; a year earlier, 60% of respondents said they spent less than $1 million on AI. Just 3% of respondents spent over $50 million on AI in 2022.

The reasons cited for such cautious spending? Some 44% of respondents reported an inability to adequately quantify return on investment, which illustrates a mismatch between aspirations and the reality in introducing AI-driven solutions.

Technical challenges, whether from a shortage of skilled personnel or poor infrastructure, are also obstructing AI adoption. Some 34% of respondents cited an insufficient number of data scientists, making it the second-biggest challenge. Given that data scientists are sought after across industries, the response suggests that the telecoms industry needs to push harder to woo them.

With 33% of respondents also citing a lack of budget for AI projects, the results suggest that AI advocates need to work harder with decision-makers to develop a convincing case for AI adoption.

Likewise, for a technology solution that relies on data, concerns about the availability, handling, privacy and security of data were all critical issues to be addressed, especially in light of data privacy and data residency laws around the globe, such as GDPR.

AI Engagement

Some 95% of telecommunications industry respondents said they were engaged with AI. But only 34% of respondents reported using AI for more than six months, while 23% said they’re still learning about the different options for AI. Eighteen percent reported being in a trial or pilot phase of an AI project.

For respondents at the trial or implementation stage, a clear majority acknowledged that there had been a positive impact on both revenue and cost. About 73% of respondents reported that implementation of AI had led to increased revenue in the last year, with 17% noting revenue gains of more than 10% in specific parts of the business.

Likewise, 80% of respondents reported that their implementation of AI led to reduced annual costs in the last year, with 15% noting that this cost reduction is above 10% — again, in specific parts of their business.

AI, AI Everywhere

The telecommunications industry has a deep and multilayered view on where best to allocate resources to AI: cost reduction, revenue increase, customer experience enhancement and creating operational efficiencies were all cited as key priorities.

In terms of deployment, however, AI focused on improving operational efficiency was a clear winner. This is somewhat expected, as the operational complexity of new telecommunications networks like 5G lends itself to new solutions like AI. The industry is responsible for critical national infrastructure in every country, supports over 5 billion customer end points, and is expected to constantly deliver above 99% reliability. Telcos have also discussed AI-enabled solutions for network operations, cell-site planning, truck-routing optimization and machine learning data analytics. To improve the customer experience, some are adopting recommendation engines, virtual assistants and digital avatars.
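To put the "above 99% reliability" expectation in perspective, the short Python sketch below converts an availability percentage into the downtime it still permits per year. Two assumptions are baked in: "reliability" is read here as service availability (uptime), and a year is taken as 365 days. The function name max_downtime_hours and the 99.9% and 99.99% comparison tiers are illustrative and do not come from the survey.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored (assumption)


def max_downtime_hours(availability_pct):
    """Hours of downtime per year permitted at the given availability (%)."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


# 99% is the floor mentioned in the text; the higher tiers are shown only
# for comparison and are not figures from the survey.
for pct in (99.0, 99.9, 99.99):
    print(f"{pct}% availability -> {max_downtime_hours(pct):.2f} hours/year")
```

Even exactly 99% availability leaves room for roughly 87.6 hours of outage per year, which helps explain why operators look to automation to push well beyond that floor.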

In the near term, the focus appears to be on building more effective telecom infrastructure and unlocking new revenue-generating opportunities, especially together with partners.

The trick will be moving from early testing to widespread adoption.

Download the “State of AI in Telecommunications: 2023 Trends” report for in-depth results and insights.

Learn more about how telcos are leveraging AI to optimize operations and improve customer experiences.
