LaMDA: our breakthrough conversation technology

We’ve always had a soft spot for language at Google. Early on, we set out to translate the web. More recently, we’ve invented machine learning techniques that help us better grasp the intent of Search queries. Over time, our advances in these and other areas have made it easier and easier to organize and access the heaps of information conveyed by the written and spoken word.

But there’s always room for improvement. Language is remarkably nuanced and adaptable. It can be literal or figurative, flowery or plain, inventive or informational. That versatility makes language one of humanity’s greatest tools — and one of computer science’s most difficult puzzles. 

LaMDA, our latest research breakthrough, adds pieces to one of the most tantalizing sections of that puzzle: conversation.

Animation of a multi-turn conversation between a person and LaMDA. The transcript reads:

USER: Hi, I’d like to have a conversation demonstration for a blog post.

LaMDA: Hi. I’m a friendly and knowledgeable demonstration for your blog post. I can engage in interesting conversation about virtually anything. What would you like to chat about?

USER: if my washing machine and tv are both malfunctioning at the same time what could the reason be?

LaMDA: It could be the power surge from the washing machine may have fried your tv. If it’s a power surge problem it may be time to get a new washing machine. Or a new TV.

USER: yeah maybe. btw how many legs does a horse have, and could it fix my washing machine?

LaMDA: Horses have four legs. How could they help?

USER: you tell me

LaMDA: Oh, well they could certainly help you. By moving you to a new home where you could install a new washing machine.

While conversations tend to revolve around specific topics, their open-ended nature means they can start in one place and end up somewhere completely different. A chat with a friend about a TV show could evolve into a discussion about the country where the show was filmed before settling on a debate about that country’s best regional cuisine. 

That meandering quality can quickly stump modern conversational agents (commonly known as chatbots), which tend to follow narrow, pre-defined paths. But LaMDA — short for “Language Model for Dialogue Applications” — can engage in a free-flowing way about a seemingly endless number of topics, an ability we think could unlock more natural ways of interacting with technology and entirely new categories of helpful applications.

The long road to LaMDA

LaMDA’s conversational skills have been years in the making. Like many recent language models, including BERT and GPT-3, it’s built on Transformer, a neural network architecture that Google Research invented and open-sourced in 2017. That architecture produces a model that can be trained to read many words (a sentence or paragraph, for example), pay attention to how those words relate to one another and then predict what words it thinks will come next. 

But unlike most other language models, LaMDA was trained on dialogue. During its training, it picked up on several of the nuances that distinguish open-ended conversation from other forms of language. One of those nuances is sensibleness. Basically: Does the response to a given conversational context make sense? For instance, if someone says:

“I just started taking guitar lessons.”

You might expect another person to respond with something like: 

“How exciting! My mom has a vintage Martin that she loves to play.”

That response makes sense, given the initial statement. But sensibleness isn’t the only thing that makes a good response. After all, the phrase “that’s nice” is a sensible response to nearly any statement, much in the way “I don’t know” is a sensible response to most questions. Satisfying responses also tend to be specific, by relating clearly to the context of the conversation. In the example above, the response is sensible and specific.

LaMDA builds on earlier Google research, published in 2020, that showed Transformer-based language models trained on dialogue could learn to talk about virtually anything. Since then, we’ve also found that, once trained, LaMDA can be fine-tuned to significantly improve the sensibleness and specificity of its responses. 

Responsibility first

These early results are encouraging, and we look forward to sharing more soon, but sensibleness and specificity aren’t the only qualities we’re looking for in models like LaMDA. We’re also exploring dimensions like “interestingness,” by assessing whether responses are insightful, unexpected or witty. Being Google, we also care a lot about factuality (that is, whether LaMDA sticks to facts, something language models often struggle with), and are investigating ways to ensure LaMDA’s responses aren’t just compelling but correct.

But the most important question we ask ourselves when it comes to our technologies is whether they adhere to our AI Principles. Language might be one of humanity’s greatest tools, but like all tools it can be misused. Models trained on language can propagate that misuse — for instance, by internalizing biases, mirroring hateful speech, or replicating misleading information. And even when the language it’s trained on is carefully vetted, the model itself can still be put to ill use. 

Our highest priority, when creating technologies like LaMDA, is working to ensure we minimize such risks. We’re deeply familiar with issues involved with machine learning models, such as unfair bias, as we’ve been researching and developing these technologies for many years. That’s why we build and open-source resources that researchers can use to analyze models and the data on which they’re trained; why we’ve scrutinized LaMDA at every step of its development; and why we’ll continue to do so as we work to incorporate conversational abilities into more of our products.

Read More

Speed up your sites with web-page prefetching using Machine Learning

Posted by Minko Gechev, David Zats, Na Li, Ping Yu, Anusha Ramesh, and Sandeep Gupta

Page load time is one of the most important determinants of user experience on a web site. Research shows that faster page load time directly leads to increased page views, conversion, and customer satisfaction. Retail superstore Newegg has seen a 50% increase in conversions after implementing web-page prefetching to optimize page load experience.

Using TensorFlow tooling, it is now possible to use machine learning to implement a powerful solution for your website to improve page load times. In this blog post, we show an end-to-end workflow for using your site’s navigation data from Google Analytics and training a custom machine learning model that can predict the user’s next actions. You can use these predictions in an Angular app to pre-fetch candidate pages and dramatically improve user experience on your web site. Fig. 1 illustrates this side by side: the default page load experience with no optimization, compared on the right to the greatly improved page load times achieved with machine learning based predictive prefetching. Both examples are running on an emulated slow 3G network.

Fig: Comparison of un-optimized and machine learning based page loading time in a sample web application

A high-level schematic of our solution is as follows:

Fig: Solution overview

We use Google Cloud services (BigQuery and Dataflow) to store and preprocess the site’s Google Analytics data, then use TensorFlow Extended (TFX) to run our model training pipeline and produce a site-specific model, which we convert into a web-deployable TensorFlow.js format. This client-side model will be loaded in a sample Angular web app for an e-store to demonstrate how to deploy the model in a web application. Let’s take a look at these components in more detail.

Data Preparation & Ingestion

Google Analytics stores each page visit as an event, providing key aspects such as the page name, visit time, and load time. This data contains everything we need to train our model. We need to:

  1. Convert this data to training examples containing features and labels
  2. Make it available to TFX for training.

We accomplish the former by leveraging existing support for exporting Google Analytics data to a large-scale cloud data store called BigQuery. We accomplish the latter by creating an Apache Beam pipeline that:

  1. Reads the data from BigQuery
  2. Sorts and filters the events in a session
  3. Walks through each session, creating examples that take properties of the current event as features and the page visit in the next event as the label
  4. Stores these generated examples in Google Cloud Storage so that they can be used by TFX for training.

We run our Beam pipeline in Dataflow.
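
As a rough illustration of how such a pipeline can be wired together (the query, the session_id and visit_time field names, and the output path are assumptions about the exported data rather than code from the sample), the Beam pipeline might look like this, reusing the ga_session_to_tensorflow_examples function shown later in this post:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def run_prefetch_pipeline(query, output_path, pipeline_args):
  with beam.Pipeline(options=PipelineOptions(pipeline_args)) as p:
    (p
     # 1. Read the exported Google Analytics events from BigQuery.
     | 'ReadFromBigQuery' >> beam.io.ReadFromBigQuery(query=query, use_standard_sql=True)
     # 2. Group the events by session and sort them by visit time.
     | 'KeyBySession' >> beam.Map(lambda event: (event['session_id'], event))
     | 'GroupIntoSessions' >> beam.GroupByKey()
     | 'SortByVisitTime' >> beam.Map(lambda kv: sorted(kv[1], key=lambda e: e['visit_time']))
     # 3. Walk through each session, emitting one example per page transition.
     | 'CreateExamples' >> beam.FlatMap(ga_session_to_tensorflow_examples)
     | 'Serialize' >> beam.Map(lambda example: example.SerializeToString())
     # 4. Store the examples in Google Cloud Storage so TFX can read them.
     | 'WriteToGCS' >> beam.io.WriteToTFRecord(output_path, file_name_suffix='.tfrecord.gz'))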

In the following table, each row represents a training example:

cur_page | session_index | label
page2    | 0             | page3
page3    | 8             | page1

While our training examples contain only two features (cur_page and session_index), additional features from Google Analytics can easily be added to create a richer dataset and train a more powerful model. To do so, extend the following code:

def ga_session_to_tensorflow_examples(session):
  """Converts one Google Analytics session into a list of training examples."""
  examples = []
  # Each event's page is a feature; the page of the next event is the label.
  for i in range(len(session) - 1):
    features = {'cur_page': [session[i]['page']['pagePath']],
                'label': [session[i + 1]['page']['pagePath']],
                'session_index': [i],
                # Add additional features here.
               }
    examples.append(create_tensorflow_example(features))
  return examples

Model Training

TensorFlow Extended (TFX) is an end-to-end, production-scale ML platform. We use it to automate data validation, training at scale (using accelerators), and evaluation and validation of the generated model.

To create a model within TFX, you must provide the preprocessing function and the run function. The preprocessing function defines the operations that should be performed on the data before it is passed to the main model. These include operations that involve a full pass over the data, such as vocab creation. The run function defines the main model and how it is to be trained.

Our example shows how to implement the preprocessing_fn and the run_fn to define and train a model for predicting the next page. And the TFX example pipelines demonstrate how to implement these functions for many other key use cases.
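
As a hedged sketch of what such a module file can look like (not the code from the linked example; the _input_fn and _build_keras_model helpers are hypothetical placeholders), a Keras-based TFX trainer for the next-page model might define the two functions roughly as follows:

import tensorflow as tf
import tensorflow_transform as tft

def preprocessing_fn(inputs):
  # Full-pass operations: build vocabularies over page paths and use them to
  # index the current page and the label.
  return {
      'cur_page_xf': tft.compute_and_apply_vocabulary(inputs['cur_page']),
      'session_index_xf': tf.cast(inputs['session_index'], tf.float32),
      'label_xf': tft.compute_and_apply_vocabulary(inputs['label']),
  }

def run_fn(fn_args):
  # Build tf.data datasets from the transformed examples (helper not shown).
  train_dataset = _input_fn(fn_args.train_files, fn_args.data_accessor)
  eval_dataset = _input_fn(fn_args.eval_files, fn_args.data_accessor)

  # Define the next-page model, e.g. an embedding over the page vocabulary
  # followed by a softmax over candidate pages (helper not shown).
  model = _build_keras_model()
  model.fit(
      train_dataset,
      steps_per_epoch=fn_args.train_steps,
      validation_data=eval_dataset,
      validation_steps=fn_args.eval_steps)

  # Export the trained SavedModel for the rest of the pipeline.
  model.save(fn_args.serving_model_dir, save_format='tf')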

Creating a Web Deployable Model

After training our custom model, we want to deploy it in our web application so it can be used to make live predictions when users visit our website. For this, we use TensorFlow.js, TensorFlow’s framework for running machine learning models directly in the browser, client-side. By running this code in the browser, we can reduce the latency associated with server round trips, reduce server-side costs, and keep users’ data private by not sending any session data to the server.

TFX employs the Model Rewriting Library to automate conversion between trained TensorFlow models and the TensorFlow.js format. As part of this library, we have implemented a TensorFlow.js rewriter. We simply invoke this rewriter within the run_fn to perform the desired conversion. Please see the example for more details.
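
As an illustrative sketch only (treat the module paths and function signatures as assumptions, since they can differ between TFX versions), the conversion step invoked from run_fn looks roughly like this:

from tfx.components.trainer.rewriting import converters
from tfx.components.trainer.rewriting import rewriter
from tfx.components.trainer.rewriting import rewriter_factory

def _convert_to_tfjs(saved_model_dir, tfjs_model_dir):
  # Create the TensorFlow.js rewriter and rewrite the trained SavedModel
  # into a web-deployable TensorFlow.js model.
  tfjs_rewriter = rewriter_factory.create_rewriter(
      rewriter_factory.TFJS_REWRITER, name='tfjs_rewriter')
  converters.rewrite_saved_model(
      saved_model_dir, tfjs_model_dir, tfjs_rewriter, rewriter.ModelType.TFJS_MODEL)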

Angular Application

Once we have the model we can use it within an Angular application. On each navigation, we will query the model and prefetch the resources associated with the pages that are likely to be visited in the future.

An alternative solution would be to prefetch the resources associated with all possible future navigation paths, but this would consume much more bandwidth. Using machine learning, we can predict only the pages that are likely to be visited next, reducing the number of false positives.

Depending on the specifics of the application we may want to prefetch different types of assets, for example: JavaScript, images, or data. For the purposes of this demonstration we’ll be prefetching images of products.

One challenge is implementing the mechanism in a performant way, without impacting the application’s load time or runtime performance. Two techniques we can use to mitigate the risk of performance regressions are:

  • Load the model and TensorFlow.js lazily without blocking the initial page load time
  • Query the model off the main thread so we don’t drop frames on the main thread and can maintain a 60fps rendering experience

A web platform API that satisfies both of these constraints is the service worker. A service worker is a script that your browser runs in the background in a new thread, separate from a web page. It also allows you to plug into a request cycle and provides you with cache control.

When the user navigates across the application, we’ll post messages to the service worker with the pages they have visited. Based on the navigation history, the service worker will make predictions for future navigation and prefetch relevant product assets.

Example of future navigation

Let us look at a high-level overview of the individual moving parts.

From within the main file of our Angular application, we can load the service worker:

// main.ts

if ('serviceWorker' in navigator) {
  navigator.serviceWorker.register('/prefetch.worker.js', { scope: '/' });
}

This snippet will download the prefetch.worker.js script and run it in the background. As the next step, we want to forward navigation events to it:

// app.component.ts

this.route.params.subscribe((routeParams) => {
  if (this._serviceWorker) {
    this._serviceWorker.postMessage({ page: routeParams.category });
  }
});

In the snippet above, we watch for changes to the URL parameters. On change, we forward the page’s category to the service worker.

In the implementation of the service worker we need to handle messages from the main thread, make predictions based on them, and prefetch the relevant information. At a high level, this looks as follows:

// prefetch.worker.js

addEventListener('message', ({ data }) => prefetch(data.page));

const prefetch = async (path) => {
  const predictions = await predict(path);
  // ImageCache is a cache-name constant defined elsewhere in the worker.
  const cache = await caches.open(ImageCache);

  predictions.forEach(async ([probability, category]) => {
    const products = (await getProductList(category)).map(getUrl);
    [...new Set(products)].forEach(url => {
      const request = new Request(url, {
        mode: 'no-cors',
      });
      fetch(request).then(response => cache.put(request, response));
    });
  });
};

Within the service worker we listen for messages from the main thread. When we receive a message we trigger the logic responsible for making predictions and prefetching data.

In the prefetch function, we first predict which pages the user could visit next. After that, we iterate over all the predictions and fetch the corresponding resources to improve the user experience on subsequent navigations.

For details you can follow the sample app in the TensorFlow.js examples repository.

Try it yourself

Check out the model training code sample, which shows the TFX pipeline for training a page-prefetching model as well as the Apache Beam pipeline that converts Google Analytics data to training examples, and the deployment sample, which shows how to deploy the TensorFlow.js model in a sample Angular app for client-side predictions.

Acknowledgements

This project wouldn’t have been possible without the incredible effort and support of Becky Chan, Deepak Aujla, Fei Dong, and Jason Mayes.

Read More

A Further Step to Getting GeForce Cards into the Hands of Gamers

GeForce products are made for gamers — and packed with innovations. Our RTX 30 Series is built on our second-generation RTX architecture, with dedicated RT Cores and Tensor Cores, delivering amazing visuals and performance to gamers and creators.

Because NVIDIA GPUs are programmable, users regularly discover new applications for them, from weather simulation and gene sequencing to deep learning and robotics. Mining cryptocurrency is one of them.

Halving Hash Rate

To help get GeForce GPUs in the hands of gamers, we announced in February that all GeForce RTX 3060 graphics cards shipped with a reduced Ethereum hash rate.

Today, we’re taking additional measures by applying a reduced ETH hash rate to newly manufactured GeForce RTX 3080, RTX 3070 and RTX 3060 Ti graphics cards. These cards will start shipping in late May.

Clear Communication to Gamers

Because these GPUs originally launched with a full hash rate, we want to ensure that customers know exactly what they’re getting when they buy GeForce products. To help with this, our GeForce partners are labeling the GeForce RTX 3080, RTX 3070 and RTX 3060 Ti cards with a “Lite Hash Rate,” or “LHR,” identifier. The identifier will be in retail product listings and on the box.

This reduced hash rate only applies to newly manufactured cards with the LHR identifier and not to cards already purchased.

GeForce Is Made for Gaming

GeForce RTX GPUs have introduced a range of cutting-edge technologies — RTX real-time ray tracing, AI-powered DLSS frame rate booster, NVIDIA Reflex super-fast response rendering for best system latency, and many more — created to meet the needs of gamers and those who create digital experiences.

We believe this additional step will get more GeForce cards at better prices into the hands of gamers everywhere.

 


Read More

How Facebook’s Project SEISMIC helps bring greener telecom infrastructure

All deployment site photos from Peru were taken by our partners at Mayu Telecomunicaciones and are used here with permission. To request permission to use the photos, contact servicios@mayutel.com.

Facebook Connectivity’s mission is to enable better, broader global connectivity to bring more people online to a faster internet. This mission has become more important, with ever-increasing data consumption and need for coverage. We collaborate with others in the industry — including telecom operators, community leaders, technology developers, and researchers — in order to find solutions that are scalable and sustainable. One of our most recent collaborations is Project SEISMIC: Smart Energy Infrastructure for Mobile Internet Connectivity. In this project, we are developing a solution to smartly manage the power and functionality of telecom sites. For example, we can reduce the capacity and transmission power of the site during less busy periods. By doing so, we want to better design and operate off-grid sites in order to reduce cost and improve their sustainability.

Many parts of the world still lack coverage and capacity, especially in rural areas. To help close this gap, telecommunications providers need to build new telecom sites and links. However, many rural areas lack access to an electrical grid. This presents a major challenge, as telecom sites and networks consume a significant amount of power, and this consumption is expected to rise even further.

In places where there is no reliable electricity grid, we have to rely on solar power, diesel power, or hydropower. Each has its own set of requirements: Solar-powered sites require solar panels and batteries to be brought on-site, diesel-powered sites need periodic resupply of diesel, and hydropowered sites require the construction of hydro generators. All this leads to significant challenges in cost, logistics, and transportation, presenting a barrier to providing connectivity in remote areas. To help remove these barriers and help make rural connectivity more accessible, Facebook is exploring innovations like Project SEISMIC that enable us to build and operate telecom sites more efficiently.

Bringing more sustainable connectivity

Project SEISMIC offers smart power management of telecom sites, using dynamic power management to better design and operate off-grid sites.

A major challenge that we are addressing in Project SEISMIC is how to provide high reliability and availability over time, as the supply of power varies. For example, the output power from solar panels depends on the amount of sunlight, which changes depending on many factors including weather conditions, the time of the day, and the day of the year. This means that more solar panels and batteries have to be provisioned in order to meet availability requirements in areas that are subjected to longer rainy and cloudy periods, as well as in areas that receive less sunlight. Similarly, the output of a hydropower generator depends on its water supply.

Conventional telecom power system sizing for a solar-powered site is based on (1) the worst-case historical irradiance at the installation site, which can be much lower than the average irradiance, and (2) the average power consumption of the telecom system, which typically remains static over time, regardless of the weather and even when most people are asleep and traffic is close to zero.

Finally, many off-grid sites also lack accessible transportation, and in many cases, the equipment must be brought on-site by pack animals, boat, or even on foot. Inclement weather, floods, and inaccessible tracks all present incredible logistical challenges. All this leads to high telecom site costs and can make connectivity unfeasible.


Mayutel engineers and local workers load a solar panel onto a boat to take it to our test sites in rural Peru. This photo was taken by our partners at Mayu Telecomunicaciones and is used here with permission. To request permission to use the photos, contact servicios@mayutel.com.

However, just as power supply performance varies over time, so do usage patterns. Taking inspiration from this, we considered the potential of smart, dynamic power management of a telecom site. What if we could adjust performance parameters, such as transmit power, bandwidth, number of channels, and bit rates, to better match power supply variations while maintaining the right level of connectivity performance at the right time? This is the inspiration behind Project SEISMIC.

Conventional telecom sites are designed and operated with few to no adjustments made during operation. This means that availability requirements are derived from peak power consumption. With SEISMIC, we believe that we can design telecom sites that are greener, requiring fewer solar panels, batteries, and other power system elements, to drive down cost and improve sustainability.

SEISMIC uses predictive analytics, smart telecom site management, smart telecom site elements, and cloud services to improve the power efficiency of a telecom site while maintaining the right level of performance to meet availability requirements.

Proving more sustainable connectivity

In order to prove our research concept, we developed partnerships with several key players. Mayu Telecomunicaciones, the first rural mobile infrastructure operator in Peru, agreed to become an operator partner and collaborate with us to deploy SEISMIC test sites. They work with the local communities in rural Peru to build the telecom sites, deploy 4G radio systems, and provide broadband connectivity for the first time to many in the community.

Clear Blue Technologies, a smart power management solutions and services company, provided a power management module, software, and cloud service to enable dynamic power management. Aviat Networks provided wireless microwave technology to enable backhaul connectivity to the test sites. BaiCells provided radio access network units.


Mayutel engineers ride a boat to our test sites in rural Peru. These photos were taken by our partners at Mayu Telecomunicaciones and are used here with permission. To request permission to use the photos, contact servicios@mayutel.com.

To obtain the best set of data, we commissioned two active telecom sites in Peru. One is the baseline site that uses conventional telecom sizing and operational methodology. The other is a smartly designed site that uses fewer solar panels and batteries. By commissioning these two sites side by side, we can compare their performance over time and track relevant telecom performance indicators, such as number of connections, bandwidth, and reliability. We believe that significant savings in power costs — on the order of 40 percent to 60 percent — are possible while maintaining relevant telecom performance.


Our smart power test sites under construction during dry and rainy periods. These photos were taken by our partners at Mayu Telecomunicaciones and are used here with permission. To request permission to use the photos, contact servicios@mayutel.com.

Now that the two sites we commissioned are live in Mayutel’s network, we have started collecting data as we test the functionality of both sites. As we collect data over time, we will improve our analysis and give an update on the performance, availability, and power reliability of our telecom sites.

A call to action

Power management is key to connectivity and networking infrastructure, bringing performance, economic, and sustainability benefits. To learn about our smart power management solution in the Telecom Infra Project, please join the Network as a Service Solutions project group. For more about the Telecom Infra Project, visit their website. You can also learn about other initiatives on the Facebook Connectivity website.

In addition to the current focus on rural and deep rural applications, we believe that this idea can be applicable to a wide range of other telecom deployment use cases, including urban small cell sites and edge computing nodes. In the case of urban small cells, for example, being able to efficiently power small cells without the need for grid connection may provide a significant cost benefit, provided that the solar battery system sizing can be made suitably small. We welcome interested parties to explore these and other use cases with us.

Further, beyond the current project focus on solar battery powering solutions, the concept can readily be applied to other powering architectures, such as diesel battery and wind battery systems, and to other deployment scenarios.

Thanks to our partners

This project would not be possible without the indefatigable commitment and help from our partners. We thank Mayu Telecomunicaciones for agreeing to become our operator partner, providing their expertise, access to their sites, engineering, and support.

We also thank our technology partners for their engineering support: Clear Blue Technologies, Aviat Networks, Parallel Wireless, and BaiCells. We are grateful for the excellent collaboration and teamwork that has resulted in successful deployment of this demonstration.


Read More

NVIDIA BlueField DPUs Fuel Unprecedented Data Center Transformation

Cloud computing and AI are pushing the boundaries of scale and performance for data centers.

Anticipating this shift, industry leaders such as Baidu, Palo Alto Networks, Red Hat and VMware are using NVIDIA BlueField DPUs to transform their data centers into higher-performing, more secure, and more agile platforms, and to bring differentiated products and services to market.

NVIDIA BlueField DPUs are designed to meet the infrastructure requirements that modern data centers must offer for today’s cloud computing and AI workloads:

  • Upgraded Efficiency: BlueField DPUs enable all available CPU resources to run business applications, freeing up CPUs that would otherwise have been used to support software-defined networking.
  • Increased Ability to Scale: Cloud-native applications are highly distributed and create intensive “east-west” traffic within data centers. BlueField DPUs provide a high-throughput, low-latency network environment for scale-out applications.
  • Leading-Edge Security: Multi-tenancy and infrastructure elasticity in cloud data centers pose privacy and confidentiality risks that are addressed by BlueField DPUs.
  • Enhanced Performance: BlueField DPUs provide robust and powerful networking to handle the growing prevalence of GPU-accelerated computing in the cloud, enterprise and edge.

It can seem overwhelming for organizations to deliver on these new requirements quickly and efficiently, while also protecting each individual workload. Below are examples of how NVIDIA’s customers and partners are leveraging BlueField DPUs to offload, accelerate and isolate infrastructure software in ways that dramatically improve data center performance, scalability and security.

Baidu Delivers on Promise of Bare-Metal Clouds

Baidu has deployed NVIDIA BlueField DPUs for its scale-out bare-metal cloud infrastructure. One of the key advantages of bare-metal cloud computing lies in its ability to deliver predictable and consistent performance by allowing direct access to the hardware.

Traditional bare-metal infrastructure lacks the operational flexibility and agility that the virtualized cloud provides. BlueField empowers Baidu to easily provision bare-metal compute instances to millions of companies in China with high-speed networking and tenant isolation.

An ideal use case for DPUs, NVIDIA BlueField has helped Baidu transform bare-metal servers from hardware-defined infrastructure into software-defined, hardware-accelerated cloud platforms.

Red Hat Brings Composable Compute to the Edge

Red Hat closely collaborates with NVIDIA to enable GPU-accelerated AI computing and drive innovation across the entire stack.

The open-source software titan has developed a DPU-powered composable compute infrastructure to address the stringent requirements for performance, latency and security in edge data centers. When deployed in a fleet of 50,000 nodes, this state-of-the-art infrastructure would save up to $70 million through BlueField’s efficient software-defined, hardware acceleration engines.

At a basic level, Red Hat’s composable compute is optimized for the edge as it uses bare-metal as a service together with BlueField’s domain-specific hardware acceleration to boost the performance of enterprise workloads running on the flagship Red Hat Enterprise Linux operating system or the full-scale OpenShift Kubernetes platform.

As a leader in application containerization, Red Hat’s vision extends the software microservices concept to hardware, enabling zero-trust security and operational efficiency for enterprise IT and devops alike.

Palo Alto Networks Accelerates 5G-Native Security

5G wireless networks are set to reshape digital business and open up new conversations about modern security concerns. At the forefront of cybersecurity, Palo Alto Networks has taken a unique approach to bringing security forward to 5G networks.

Building on its expertise in network, cloud and device security, the company has created a 5G-native security solution that consists of 5G context-driven security, automation and service protection. To address the rigid data processing requirements of 5G, Palo Alto Networks has integrated its industry-leading next-generation firewall with NVIDIA BlueField DPUs.

The result is a software-defined, hardware-accelerated firewall architecture that will provide line-rate security processing at 100Gb/s. This innovative architecture is available today for service providers and enterprises to test.

VMware Redefines the Hybrid Cloud Architecture

At GTC 2020, VMware and NVIDIA announced a broad partnership to deliver both an AI-Ready Enterprise Platform, and a new architecture for data center, cloud and edge that uses BlueField DPUs to support existing and next-generation applications.

A critical component of the AI-Ready Enterprise Platform, the NVIDIA AI Enterprise software suite runs on VMware vSphere and is optimized, certified and supported by NVIDIA to help thousands of VMware customers in the world’s largest industries to unlock the power of AI.

Additionally, VMware is collaborating with NVIDIA to define a next-generation cloud architecture based on VMware Cloud Foundation and BlueField, and will offer increased performance, a zero-trust distributed security model and simplified operations. Known as VMware Project Monterey, this will help customers take advantage of both CPUs and DPUs while extending VMware infrastructure value to bare-metal applications for the first time.

Together, VMware and NVIDIA are building a high-performance, more secure, efficient and AI-ready cloud platform for the 300,000-plus organizations using VMware’s virtualization platform today.

Next Up: NVIDIA BlueField-3 

Unveiled by NVIDIA CEO Jensen Huang during his GTC21 keynote address, BlueField-3 is the product of the company’s multiyear innovation roadmap, which will see a new BlueField generation every 18 months.

NVIDIA BlueField-3 redefines the art of the possible by delivering the industry’s first DPU to offer 400Gb/s Ethernet and InfiniBand networking with up to 10x the processing power of its predecessor.

Huang also announced NVIDIA DOCA 1.0, a powerful application framework that enables developers to rapidly create applications and services on top of NVIDIA BlueField DPUs. Every new generation of BlueField will support DOCA, which preserves the development investment as each DPU generation evolves.

For customers, this means that today’s applications and data center infrastructure will run even faster when BlueField-3 arrives.

Transforming the Data Center

NVIDIA and its broad partner ecosystem are on a mission to transform the data center by leveraging the NVIDIA BlueField family of data processing units to power the next wave of accelerated cloud and AI computing applications.

Learn more about NVIDIA BlueField data processing units and apply for early access to NVIDIA DOCA SDK.

Be sure to watch the sessions at GTC21 presented in partnership with Red Hat, Palo Alto Networks and VMware.


Read More

DiDi Chooses NVIDIA DRIVE for New Fleet of Self-Driving Robotaxis

Robotaxis are one major step closer to becoming reality.

DiDi Autonomous Driving, the self-driving technology arm of mobility technology leader Didi Chuxing, announced last month a strategic partnership with Volvo Cars on autonomous vehicles for DiDi’s self-driving test fleet.

Volvo’s autonomous drive-ready XC90 cars will be the first to integrate DiDi Gemini, a new self-driving hardware platform equipped with NVIDIA DRIVE AGX Pegasus. These vehicles will eventually be deployed in robotaxi services.

These self-driving test vehicles mark significant progress toward commercial robotaxi services.

Robotaxis are autonomous vehicles that can operate on their own in geofenced areas, such as cities or residential communities. With a set of high-resolution sensors and a supercomputing platform in place of a driver, they can safely operate 24 hours a day, seven days a week.

And as a safer alternative to current modes of transit, robotaxis are expected to draw quick adoption once deployed at scale, making up more than 5 percent of vehicle miles traveled worldwide by 2030.

With the high-performance, energy-efficient compute of NVIDIA DRIVE at their core, these vehicles developed by DiDi are poised to help accelerate this landmark transition.

Doubling Up on Redundancy

The key to DiDi’s robotaxi ambitions is its new self-driving hardware platform, DiDi Gemini.

Achieving fully autonomous vehicles requires centralized, high-performance compute. The amount of sensor data a robotaxi needs to process is 100x greater than that of today’s most advanced vehicles.

The complexity in software also increases exponentially, with an array of redundant and diverse deep neural networks running simultaneously as part of an integrated software stack.

Built on NVIDIA DRIVE AGX Pegasus, DiDi Gemini achieves 700 trillion operations per second (TOPS) of performance, and includes up to 50 high-resolution sensors and an ASIL-D rated fallback system. It is architected with multi-layered redundant protections to enhance the overall safety of the autonomous driving experience.

The Gemini platform was designed using Didi Chuxing’s massive database of ride-hailing data as well as real-world autonomous driving test data to deliver the optimal self-driving hardware experience.

A New Generation of Collaboration

DiDi’s test fleet also marks a new era in technology collaboration.

DiDi and Volvo Cars plan to build a long-term partnership, expanding the autonomous test fleets across China and the U.S. and scaling up commercial robotaxi operations. The NVIDIA DRIVE platform enables continuous improvement over the air, facilitating these future plans of development and expansion.

This collaboration combines long-held legacies in vehicle safety, ride-hailing expertise and AI computing to push the bounds of transportation technology for safer, more efficient everyday mobility.


Read More

Next-Generation Pose Detection with MoveNet and TensorFlow.js

Posted by Ronny Votel and Na Li, Google Research

Today we’re excited to launch our latest pose detection model, MoveNet, with our new pose-detection API in TensorFlow.js. MoveNet is an ultra fast and accurate model that detects 17 keypoints of a body. The model is offered on TF Hub with two variants, known as Lightning and Thunder. Lightning is intended for latency-critical applications, while Thunder is intended for applications that require high accuracy. Both models run faster than real time (30+ FPS) on most modern desktops, laptops, and phones, which proves crucial for live fitness, sports, and health applications. This is achieved by running the model completely client-side, in the browser using TensorFlow.js with no server calls needed after the initial page load and no dependencies to install.

Try out the live demo!

MoveNet can track keypoints through fast motions and atypical poses.

Human pose estimation has come a long way in the last five years, but surprisingly hasn’t surfaced in many applications just yet. This is because more focus has been placed on making pose models larger and more accurate, rather than doing the engineering work to make them fast and deployable everywhere. With MoveNet, our mission was to design and optimize a model that leverages the best aspects of state-of-the-art architectures, while keeping inference times as low as possible. The result is a model that can deliver accurate keypoints across a wide variety of poses, environments, and hardware setups.

Unlocking Live Health Applications with MoveNet

We teamed up with IncludeHealth, a digital health and performance company, to understand whether MoveNet can help unlock remote care for patients. IncludeHealth has developed an interactive web application that guides a patient through a variety of routines (using a phone, tablet, or laptop) from the comfort of their own home. The routines are digitally built and prescribed by physical therapists to test balance, strength, and range of motion.

The service requires web-based and locally run pose models for privacy that can deliver precise keypoints at high frame rates, which are then used to quantify and qualify human poses and movements. While a typical off-the-shelf detector is sufficient for easy movements such as shoulder abductions or full body squats, more complicated poses such as seated knee extensions or supine positions (laying down) cause grief for even state-of-the-art detectors trained on the wrong data.

Comparison of a traditional detector (top) vs MoveNet (bottom) on difficult poses.

We provided an early release of MoveNet to IncludeHealth, accessible through the new pose-detection API. This model is trained on fitness, dance, and yoga poses (see more details about the training dataset below). IncludeHealth integrated the model into their application and benchmarked MoveNet relative to other available pose detectors:

“The MoveNet model has infused a powerful combination of speed and accuracy needed to deliver prescriptive care. While other models trade one for the other, this unique balance has unlocked the next generation of care delivery. The Google team has been a fantastic collaborator in this pursuit.” – Ryan Eder, Founder & CEO at IncludeHealth.

As a next step, IncludeHealth is partnering with hospital systems, insurance plans, and the military to enable the extension of traditional care and training beyond brick and mortar.

IncludeHealth demo application running in browser that quantifies balance and motion using keypoint estimation powered by MoveNet and TensorFlow.js

Installation

There are two ways to use MoveNet with the new pose-detection API:

  1. Through NPM:
    import * as poseDetection from '@tensorflow-models/pose-detection';
  2. Through script tag:
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-core"></script>
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-converter"></script>
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-backend-webgl"></script>
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/pose-detection"></script>

Try it yourself!

Once the package is installed, you only need to follow the few steps below to start using it:

// Create a detector.
const detector = await poseDetection.createDetector(poseDetection.SupportedModels.MoveNet);

The detector defaults to the Lightning version; to choose Thunder, create the detector as shown below:

// Create a detector.
const detector = await poseDetection.createDetector(poseDetection.SupportedModels.MoveNet, {modelType: poseDetection.movenet.modelType.SINGLEPOSE_THUNDER});
// Pass in a video stream to the model to detect poses.
const video = document.getElementById('video');
const poses = await detector.estimatePoses(video);

Each pose contains 17 keypoints, with absolute x, y coordinates, confidence score and name:

console.log(poses[0].keypoints);
// Outputs:
// [
// {x: 230, y: 220, score: 0.9, name: "nose"},
// {x: 212, y: 190, score: 0.8, name: "left_eye"},
// ...
// ]

Refer to our README for more details about the API.

As you begin to play and develop with MoveNet, we would appreciate your feedback and contributions. If you make something using this model, tag it with #MadeWithTFJS on social so we can find your work, as we would love to see what you create.

MoveNet Deep Dive

MoveNet Architecture

MoveNet is a bottom-up estimation model, using heatmaps to accurately localize human keypoints. The architecture consists of two components: a feature extractor and a set of prediction heads. The prediction scheme loosely follows CenterNet, with notable changes that improve both speed and accuracy. All models are trained using the TensorFlow Object Detection API.

The feature extractor in MoveNet is MobileNetV2 with an attached feature pyramid network (FPN), which allows for a high resolution (output stride 4), semantically rich feature map output. There are four prediction heads attached to the feature extractor, responsible for densely predicting a:

  • Person center heatmap: predicts the geometric center of person instances
  • Keypoint regression field: predicts the full set of keypoints for a person, used for grouping keypoints into instances
  • Person keypoint heatmap: predicts the location of all keypoints, independent of person instances
  • 2D per-keypoint offset field: predicts local offsets from each output feature map pixel to the precise sub-pixel location of each keypoint
MoveNet architecture

Although these predictions are computed in parallel, one can gain insight into the model’s operation by considering the following sequence of operations:

Step 1: The person center heatmap is used to identify the centers of all individuals in the frame, defined as the arithmetic mean of all keypoints belonging to a person. The location with the highest score (weighted by the inverse-distance from the frame center) is selected.

Step 2: An initial set of keypoints for the person is produced by slicing the keypoint regression output from the pixel corresponding to the object center. Since this is a center-out prediction, which must operate over different scales, the regressed keypoints will not be very accurate.

Step 3: Each pixel in the keypoint heatmap is multiplied by a weight which is inversely proportional to the distance from the corresponding regressed keypoint. This ensures that we do not accept keypoints from background people, since they typically will not be in the proximity of regressed keypoints, and hence will have low resulting scores.

Step 4: The final set of keypoint predictions is selected by retrieving the coordinates of the maximum heatmap value in each keypoint channel. The local 2D offset predictions are then added to these coordinates to give refined estimates. See the figure below, which illustrates these four steps.

MoveNet post-processing steps.
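
To make these steps concrete, here is a rough NumPy sketch of the single-person decoding logic; the array names, shapes, and the exact inverse-distance weighting are illustrative assumptions, not MoveNet’s actual implementation:

import numpy as np

def decode_single_pose(center_heatmap, regressed_kpts, kpt_heatmaps, offsets):
  """Illustrative decoding of one pose from the four prediction head outputs.

  center_heatmap: [H, W], regressed_kpts: [H, W, 17, 2],
  kpt_heatmaps: [H, W, 17], offsets: [H, W, 17, 2]; coordinates are (y, x).
  """
  h, w = center_heatmap.shape
  ys, xs = np.mgrid[0:h, 0:w]

  # Step 1: pick the person center, weighting scores by the inverse distance
  # to the frame center.
  dist_to_frame_center = np.hypot(ys - h / 2.0, xs - w / 2.0) + 1.0
  cy, cx = np.unravel_index(np.argmax(center_heatmap / dist_to_frame_center), (h, w))

  # Step 2: slice the regression field at that center to get a coarse
  # initial estimate of every keypoint.
  initial_kpts = regressed_kpts[cy, cx]  # [17, 2]

  refined = []
  for k in range(kpt_heatmaps.shape[-1]):
    # Step 3: re-weight the keypoint heatmap by the inverse distance to the
    # regressed keypoint, suppressing peaks that belong to background people.
    dist = np.hypot(ys - initial_kpts[k, 0], xs - initial_kpts[k, 1]) + 1.0
    ky, kx = np.unravel_index(np.argmax(kpt_heatmaps[..., k] / dist), (h, w))

    # Step 4: add the local 2D offset for a sub-pixel refined estimate.
    refined.append((ky + offsets[ky, kx, k, 0], kx + offsets[ky, kx, k, 1]))
  return np.array(refined)  # [17, 2] refined (y, x) keypoints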

Training Datasets

MoveNet was trained on two datasets: COCO and an internal Google dataset called Active. While COCO is the standard benchmark dataset for detection – due to its scene and scale diversity – it is not suitable for fitness and dance applications, which exhibit challenging poses and significant motion blur. Active was produced by labeling keypoints (adopting COCO’s standard 17 body keypoints) on yoga, fitness, and dance videos from YouTube. No more than three frames are selected from each video for training, to promote diversity of scenes and individuals.

Evaluations on the Active validation dataset show a significant performance boost relative to identical architectures trained using only COCO. This isn’t surprising since COCO infrequently exhibits individuals with extreme poses (e.g. yoga, pushups, headstands, and more).

To learn more about the dataset and how MoveNet performs across different categories, please see the model card.

Images from Active keypoint dataset.

Optimization

While a lot of effort went into architecture design, post-processing logic, and data selection to make MoveNet a high-quality detector, an equal focus was given to inference speed. First, bottleneck layers from MobileNetV2 were selected for lateral connections in the FPN. Likewise, the number of convolution filters in each prediction head was slimmed down significantly to speed up execution on the output feature maps. Depthwise separable convolutions are used throughout the network, except in the first MobileNetV2 layer.

MoveNet was repeatedly profiled, uncovering and removing particularly slow ops. For example, we replaced tf.math.top_k with tf.math.argmax, since it executes significantly faster and is adequate for the single-person setting.
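
As a small illustration of that substitution (shapes are arbitrary dummy values), the per-channel peak for a single person can be found with a plain argmax over the flattened spatial dimensions:

import tensorflow as tf

heatmaps = tf.random.uniform([48, 48, 17])   # [height, width, keypoints], dummy data
flat = tf.reshape(heatmaps, [-1, 17])        # flatten the spatial dimensions
peak_indices = tf.math.argmax(flat, axis=0)  # one peak index per keypoint channel;
                                             # same result as tf.math.top_k(tf.transpose(flat), k=1)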

To ensure fast execution with TensorFlow.js, all model outputs were packed into a single output tensor, so that there is only one download from GPU to CPU.

Perhaps the most significant speedup is the use of 192×192 inputs to the model (256×256 for Thunder). To counteract the lower resolution, we apply intelligent cropping based on detections from the previous frame. This allows the model to devote its attention and resources to the main subject, and not the background.

Temporal Filtering

Operating on a high FPS camera stream provides the luxury of applying smoothing to keypoint estimates. Both Lightning and Thunder apply a robust, non-linear filter to the incoming stream of keypoint predictions. This filter is tuned to simultaneously suppress high-frequency noise (i.e. jitter) and outliers from the model, while also maintaining high-bandwidth throughput during quick motions. This leads to smooth keypoint visualizations with minimal lag in all circumstances.
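
The post does not publish the exact filter, but a One Euro-style adaptive low-pass filter, whose cutoff frequency grows with the speed of the signal, is one common way to get this behavior: heavy smoothing when a keypoint is still, light smoothing when it moves fast. A minimal per-coordinate sketch of that idea, with illustrative parameters rather than MoveNet’s actual settings:

import math

class OneEuroFilter:
  """Adaptive low-pass filter: heavy smoothing when still, light when moving."""

  def __init__(self, min_cutoff=1.0, beta=0.05, d_cutoff=1.0):
    self.min_cutoff = min_cutoff  # base cutoff frequency in Hz
    self.beta = beta              # how much the cutoff grows with speed
    self.d_cutoff = d_cutoff      # cutoff used when smoothing the derivative
    self.x_prev = None
    self.dx_prev = 0.0

  @staticmethod
  def _alpha(cutoff, dt):
    tau = 1.0 / (2.0 * math.pi * cutoff)
    return 1.0 / (1.0 + tau / dt)

  def __call__(self, x, dt):
    if self.x_prev is None:
      self.x_prev = x
      return x
    # Estimate and smooth the signal's speed, then raise the cutoff with it.
    dx = (x - self.x_prev) / dt
    dx_hat = self.dx_prev + self._alpha(self.d_cutoff, dt) * (dx - self.dx_prev)
    cutoff = self.min_cutoff + self.beta * abs(dx_hat)
    # Low-pass the coordinate itself with the adaptive cutoff.
    x_hat = self.x_prev + self._alpha(cutoff, dt) * (x - self.x_prev)
    self.x_prev, self.dx_prev = x_hat, dx_hat
    return x_hat

# One filter instance per keypoint coordinate, called once per frame, e.g.:
# filter_x = OneEuroFilter()
# smooth_x = filter_x(raw_x, dt=1.0 / 30.0)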

MoveNet Browser Performance

To quantify the inference speed of MoveNet, the model was benchmarked across multiple devices. The inference speed (expressed in FPS) was measured on GPU with WebGL, as well as with WebAssembly (WASM), which is the typical backend for devices with lower-end or no GPUs.

Device                                                        WebGL (FPS)   WASM with SIMD + multithread (FPS)
MacBook Pro 15” 2019 (Intel Core i9, AMD Radeon Pro Vega 20)  104 | 77      42 | 21
iPhone 12                                                     51 | 43       N/A
Pixel 5                                                       34 | 12       N/A
Desktop (Intel i9-10900K, Nvidia GTX 1070)                    87 | 82       71 | 30

Inference speed of MoveNet across different devices and TF.js backends. The first number in each cell is for Lightning, and the second number is for Thunder.

TF.js continuously optimizes its backends to accelerate model execution across all supported devices. We applied several techniques here to help the models achieve this performance, such as implementing a packed WebGL kernel for the depthwise separable convolutions and improving GL scheduling for mobile Chrome.

To see the model’s FPS on your device, try our demo. You can switch the model type and backends live in the demo UI to see what works best for your device.

Looking Ahead

The next step is to extend Lightning and Thunder models to the multi-person domain, so that developers can support applications with multiple people in the camera field-of-view.

We also have plans to speed up the TensorFlow.js backends to make model execution even faster. This is achieved through repeated benchmarking and backend optimization.

Acknowledgements

We would like to acknowledge the other contributors to MoveNet: Yu-Hui Chen, Ard Oerlemans, Francois Belletti, Andrew Bunner, and Vijay Sundaram, along with those involved with the TensorFlow.js pose-detection API: Ping Yu, Sandeep Gupta, Jason Mayes, and Masoud Charkhabi.

Read More