Stanford Papers and Workshops at RSS 2020

Robotics: Science and Systems (RSS) 2020 is being hosted virtually from July 12th – July 16th. We’re excited to share all the work from SAIL and Stanford Robotics that’s being presented! Below you’ll find links to each paper, as well as the authors’ five-minute presentation of their research. Feel free to reach out to the contact authors directly to learn more about what’s happening at Stanford.

In addition to these papers, SAIL students and faculty are involved in organizing several workshops at RSS. We invite you to attend these workshops, where you can hear from amazing speakers, interact, and ask questions! Workshop attendance is completely virtual: we provide information and links to these workshops at the bottom of this post.

List of Accepted Papers

Shared Autonomy with Learned Latent Actions

Authors: Hong Jun Jeon, Dylan P. Losey, Dorsa Sadigh


Links: Paper | Experiments

Keywords: human-robot interaction, learning, control

Reinforcement Learning based Control of Imitative Policies for Near-Accident Driving

Authors: Zhangjie Cao*, Erdem Bıyık*, Woodrow Z. Wang, Allan Raventos, Adrien Gaidon, Guy Rosman, Dorsa Sadigh


Links: Paper | Experiments

Keywords: autonomous driving, near-accident scenarios, hierarchical reinforcement learning, hierarchical control, imitation learning

Active Preference-based Gaussian Process Regression for Reward Learning

Authors: Erdem Bıyık*, Nicolas Huynh*, Mykel J. Kochenderfer, Dorsa Sadigh


Links: Paper | Experiments

Keywords: active learning, preference-based learning, gaussian processes, reward learning, inverse reinforcement learning

Dynamic Multi-Robot Task Allocation under Uncertainty and Temporal Constraints

Authors: Shushman Choudhury, Jayesh Gupta, Mykel J. Kochenderfer, Dorsa Sadigh, Jeannette Bohg


Links: Paper

Keywords: multi-robot systems, task allocation

GTI: Learning to Generalize across Long-Horizon Tasks from Human Demonstrations

Authors: Ajay Mandlekar*, Danfei Xu*, Roberto Martin-Martin, Silvio Savarese, Li Fei-Fei


Links: Paper

Keywords: imitation learning, manipulation, robotics

Concept2Robot: Learning Manipulation Concepts from Instructions and Human Demonstrations

Authors: Lin Shao, Toki Migimatsu, Qiang Zhang, Karen Yang, Jeannette Bohg


Links: Paper

Keywords: concept learning, manipulation, learning from demonstration, natural language grounding

ALGAMES: A Fast Solver for Constrained Dynamic Games

Authors: Simon Le Cleac’h, Mac Schwager, Zachary Manchester


Links: Paper | Blog

Keywords: game theory, optimization, MPC, autonomous driving

List of Organized Workshops

Emergent Behaviors in Human-Robot Systems

Link: Homepage | Zoom (contact for password)

Prerecorded Talks by: Brenna Argall, Anca Dragan, Judith Fan, Jakob Foerster, Robert D. Hawkins, Maja Matarić, Igor Mordatch, Harold Soh, Mac Schwager

Live Events on July 12th (PDT): Panel from 09:30 AM – 10:30 AM | Spotlight talks from 10:30 AM – 11:00 AM

Organizers: Erdem Bıyık, Minae Kwon, Dylan Losey, Noah Goodman, Stefanos Nikolaidis, Dorsa Sadigh

Robotics Retrospectives

Link: Homepage

Prerecorded Talks by: Stefan Schaal, Karen Liu, Sangbae Kim, Andy Zeng, David Hsu, Leslie Kaelbling, Nicolas Heess, Angela Schoellig

Live Events on July 12th (PDT): Panels from 09:00 AM – 10:00 AM and 05:00 PM – 06:00 PM | Retrospective sessions from 10:00 AM – 11:00 AM and 04:00 PM – 05:00 PM

Organizers: Franziska Meier, Akshara Rai, Arunkumar Byravan, Jeannette Bohg

Advances & Challenges in Imitation Learning for Robotics

Link: Homepage

Prerecorded Talks by: Anca Dragan, Maya Cakmak, Yuke Zhu, Byron Boots, Abhinav Gupta, Oliver Kroemer

Live Events on July 12th (PDT): Paper discussions from 9:00 AM – 10:30 AM | Virtual coffee break from 10:30 AM – 11:00 AM | Panel from 11:00 AM – 1:30 PM

Organizers: Scott Niekum, Akanksha Saran, Yuchen Cui, Nick Walker, Andreea Bobu, Ajay Mandlekar, Danfei Xu

Action Representations for Learning in Continuous Control

Link: Homepage

Live Talks by: Daniel Braun, Dagmar Sternad, Marc Toussaint, Franziska Meier, Michiel van de Panne

Live Events on July 13th (PDT): Talks from 8:00 AM – 11:50 AM (including spotlights from 9:50 AM – 10:50 AM) | Panel at 11:50 AM

Organizers: Tamim Asfour, Miroslav Bogdanovic, Jeannette Bohg, Animesh Garg, Roberto Martin-Martin, Ludovic Righetti

We look forward to seeing you at RSS!

Giving your content a voice with the Newscaster speaking style from Amazon Polly

Audio content consumption has grown exponentially in the past few years. Statista reports that podcast ad revenue will exceed a billion dollars in 2021. For the publishing industry and content providers, providing audio as an alternative option to reading could improve engagement with users and be an incremental revenue stream. Given the shift in customer trends to audio consumption, Amazon Polly launched a new speaking style focusing on the publishing industry: the Newscaster speaking style. This post discusses how the Newscaster voice was built and how you can use the Newscaster voice with your content in a few simple steps.

Building the Newscaster style voice

Until recently, Amazon Polly voices were built such that the speaking style of the voice remained the same, no matter the use case. In the real world, however, speakers change their speaking style based on the situation at hand, from using a conversational style around friends to using upbeat and engaging speech when telling stories. To make voices as lifelike as possible, Amazon Polly has built two speaking style voices: Conversational and Newscaster. Newscaster style, available in US English for Matthew and Joanna, and US Spanish for Lupe, gives content a voice with the persona of a news anchor. Have a listen to the following samples:

Listen now

Listen now

With the successful implementation of Neural Text-to-Speech (NTTS), text synthesis no longer relies on a concatenative approach, which mainly consisted of finding the best chunks of recordings to generate synthesized speech. The concatenative approach played audio that was an exact copy of the recordings stored for that voice. NTTS, on the other hand, relies on two end-to-end models that predict waveforms, which results in smoother speech with no joins. NTTS outputs waveforms by learning from training data, which enables seamless transitions between all the sounds and allows us to focus on the rhythm and intonation of the voice to match the existing voice timbre and quality for Newscaster speaking style.

Remixd, a leading audio technology partner for premium publishers, helps publishers and media owners give their editorial content a voice using Amazon Polly. Christopher Rooke, CEO of Remixd, says, “Consumer demand for audio has exploded, and content owners recognize that the delivery of journalism must adapt to meet this moment. Using Amazon Polly’s Newscaster voice, Remixd is helping news providers innovate and keep up with demand to serve the growing customer appetite for audio. Remixd and Amazon Polly make it easy for publishers to remain relevant as content consumption preferences shift.”

Remixd uses Amazon Polly to provide audio content production efficiencies at scale, which makes it easy for publishers to instantly enable audio for new and existing editorial content in real time without needing to invest in costly human voice talent, narration, and pre- and post-production overhead. Rooke adds, “When working with news content, where information is time-sensitive and perishable, the voice quality, and the ability to process large volumes of content and publish the audio version in just a few seconds, is critical to service our customer base.” The following screenshot shows Remixd’s audio player live on the website of one of their customers, the Daily Caller.

“At the Daily Caller, it’s a priority that our content is accessible and convenient for visitors to consume in whichever format they prefer,” says Chad Brady, Director of Operations of the Daily Caller. “This includes audio, which can be time-consuming and costly to produce. Using Remixd, coupled with Amazon Polly’s high-quality newscaster voice, Daily Caller editorial articles are made instantly listenable, enabling us to scale production and distribution, and delight our audience with a best-in-class audio experience both on and off-site.”

The new NTTS technology enables newscaster voices to be more expressive. However, although the expressiveness vastly increases how natural the voice sounds, it also makes the model more susceptible to discrepancies. NTTS technology learns to model intonation patterns for a given punctuation mark based on data it was provided. Because the intonation patterns are much more extreme for style voices, good annotation of the training data is essential. The Amazon Polly team trained the model with an initial small set of newscaster recordings in addition to the existing recordings from the speakers. Having more data leads to more robust models, but to build a model in a cost- and time-efficient manner, the Amazon Polly team worked on concepts such as multi-speaker models, which allow you to use existing resources instead of needing more recordings from the same speaker.

Evaluations have shown that our newscaster voice is preferred over the neutral speaking style for voicing news content. The following histogram shows results for the Joanna Newscaster voice when compared to other voices for the news use case.

Using Newscaster style to voice your audio content

To use the Newscaster style with Python, complete the following steps (this solution requires Python 3):

  1. Set up and activate your virtual environment with the following code:
    $ python3 -m virtualenv ./venv
    $ . ./venv/bin/activate

  2. Install the requirements with the following code:
    $ pip install boto3 click

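  3. In your preferred text editor, create a file with the following code:
    import sys
    import boto3
    import click
    polly_c = boto3.client('polly')
    @click.command()
    @click.argument('voice')
    @click.argument('text')
    def main(voice, text):
        # Only these voices support the Newscaster style.
        if voice not in ['Joanna', 'Matthew', 'Lupe']:
            print('Only Joanna, Matthew and Lupe support the newscaster style')
            sys.exit(1)
        # The Newscaster style needs the neural engine and SSML input.
        response = polly_c.synthesize_speech(
            VoiceId=voice, Engine='neural', OutputFormat='mp3', TextType='ssml',
            Text=f'<speak><amazon:domain name="news">{text}</amazon:domain></speak>')
        # Write the returned audio stream to disk.
        with open('newscaster.mp3', 'wb') as f:
            f.write(response['AudioStream'].read())
    if __name__ == '__main__':
        main()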

  4. Run the script passing the name and text you want to say:
    $ python ./ Joanna "Synthesizing the newsperson style is innovative and unprecedented. And it brings great excitement in the media world and beyond."

This generates newscaster.mp3, which you can play in your favorite media player.
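
If the text you pass contains characters such as & or <, it can break the SSML markup around it. The following is a minimal sketch of one way to guard against that, using only the Python standard library; the helper name newscaster_ssml is hypothetical and is not part of Amazon Polly or boto3:

```python
from xml.sax.saxutils import escape


def newscaster_ssml(text: str) -> str:
    """Wrap plain text in the Newscaster SSML tags, escaping &, < and >."""
    # escape() replaces the three XML special characters so the text
    # cannot be mistaken for markup inside the <speak> element.
    return f'<speak><amazon:domain name="news">{escape(text)}</amazon:domain></speak>'


ssml = newscaster_ssml("Profits & losses rose <sharply> today.")
print(ssml)
```

The resulting string can be passed as the Text argument of synthesize_speech when TextType is set to ssml.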


This post walked you through the Newscaster style and how to use it in Amazon Polly. The Matthew, Joanna, and Lupe Newscaster voices are used by customers such as The Globe and Mail, Gannett’s USA Today, the Daily Caller, and many others.

To learn more about using the Newscaster style in Amazon Polly, see Using the Newscaster Style. For the full list of voices that Amazon Polly offers, see Voices in Amazon Polly.

About the Authors

Joppe Pelzer is a Language Engineer working on text-to-speech for English and building style voices. With bachelor’s degrees in linguistics and Scandinavian languages, she graduated from Edinburgh University with an MSc in Speech and Language Processing in 2018. During her master’s, she focused on the text-to-speech front end, building and expanding upon multilingual G2P models, and gained experience with NLP, speech recognition, and deep learning. Outside of work, she likes to draw, play games, and spend time in nature.



Ariadna Sanchez is a Research Scientist investigating the application of DL/ML technologies in the area of text-to-speech. After completing a bachelor’s in Audiovisual Systems Engineering, she received her MSc in Speech and Language Processing from University of Edinburgh in 2018. She has previously worked as an intern in NLP and TTS. During her time at University, she focused on TTS and signal processing, especially in the dysarthria field. She has experience in Signal Processing, Deep Learning, NLP, Speech and Image Processing. In her free time, Ariadna likes playing the violin, reading books and playing games.





Duality — A New Approach to Reinforcement Learning

Posted by Ofir Nachum and Bo Dai, Research Scientists, Google Research

Reinforcement learning (RL) is an approach commonly used to train agents to make sequences of decisions that will be successful in complex environments. Examples include robotic navigation, where an agent controls the joint motors of a robot to seek a path to a target location, and game-playing, where the goal might be to solve a game level in minimal time. Many modern successful RL algorithms, such as Q-learning and actor-critic, propose to reduce the RL problem to a constraint-satisfaction problem, where a constraint exists for every possible “state” of the environment. For example, in vision-based robotic navigation, the “states” of the environment correspond to every possible camera input.

Despite how ubiquitous the constraint-satisfaction approach is in practice, this strategy is often difficult to reconcile with the complexity of real-world settings. In practical scenarios (like the robotic navigation example) the space of states is large, sometimes even uncountable, so how can one learn to satisfy the tremendous number of constraints associated with arbitrary input? Implementations of Q-learning and actor-critic often ignore these mathematical issues or obscure them through a series of rough approximations, which results in a stark divide between the practical implementations of these algorithms and their mathematical foundations.

In “Reinforcement Learning via Fenchel-Rockafellar Duality” we have developed a new approach to RL that enables algorithms that are both useful in practice and mathematically principled — that is to say, the proposed algorithms avoid the use of exceedingly rough approximations to translate their mathematical foundations to practical implementation. This approach is based on convex duality, which is a well-studied mathematical tool used to transform problems expressed in one form into equivalent problems in distinct forms that may be more computationally friendly. In our case, we develop specific ways to apply duality in RL to transform the traditional constraint-satisfaction mathematical form to an unconstrained, and thus more practical, mathematical problem.

A Duality-Based Solution
The duality-based approach begins by formulating the reinforcement learning problem as a mathematical objective along with a number of constraints, potentially infinite in number. Applying duality to this mathematical problem yields a different formulation of the same problem. Still, this dual formulation has the same format as the original problem — a single objective with a large number of constraints — although the specific objective and constraints are changed.

The next step is key to the duality-based solution. We augment the dual objective with a convex regularizer, a method often used in optimization as a way to smooth a problem and make it easier to solve. The choice of the regularizer is crucial to the final step, in which we apply duality once again to yield another formulation of an equivalent problem. In our case, we use the f-divergence regularizer, which results in a final formulation that is now unconstrained. Although there exist other choices of convex regularizers, regularization via the f-divergence is uniquely desirable for yielding an unconstrained problem that is especially amenable to optimization in practical and real-world settings which require off-policy or offline learning.

Notably, in many cases, the applications of duality and regularization prescribed by the duality-based approach do not change the optimality of the original solution. In other words, although the form of the problem has changed, the solution has not. This way, the result obtained with the new formulation is the same result as for the original problem, albeit achieved in a much easier way.
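
To make the last two steps concrete, here is a sketch of what the regularized problem and its unconstrained counterpart can look like when evaluating a fixed policy. The notation is an assumption introduced for illustration and does not appear in this post: d is the state-action occupancy of the policy π, d^D is the distribution of the off-policy data, μ_0 is the initial state distribution, γ is the discount factor, α > 0 is the regularization weight, f is a convex function with f(1) = 0, and f^* is its convex conjugate. Strong duality is assumed.

```latex
% Regularized, constrained formulation over occupancies d
\max_{d} \;\; \mathbb{E}_{(s,a)\sim d}\big[r(s,a)\big]
  \;-\; \alpha\, D_f\big(d \,\|\, d^{\mathcal{D}}\big)
\quad \text{s.t.} \quad
d(s,a) = (1-\gamma)\,\mu_0(s)\,\pi(a \mid s)
  + \gamma\,\pi(a \mid s) \sum_{s',a'} P(s \mid s',a')\, d(s',a')
\quad \forall (s,a),

% with the f-divergence regularizer
D_f\big(d \,\|\, d^{\mathcal{D}}\big)
  = \mathbb{E}_{(s,a)\sim d^{\mathcal{D}}}\!\left[ f\!\left(\frac{d(s,a)}{d^{\mathcal{D}}(s,a)}\right) \right].

% Applying duality once more gives an unconstrained problem over Q
\min_{Q} \;\; (1-\gamma)\,\mathbb{E}_{s_0\sim\mu_0,\; a_0\sim\pi}\big[Q(s_0,a_0)\big]
  \;+\; \alpha\,\mathbb{E}_{(s,a)\sim d^{\mathcal{D}}}\!\left[
    f^{*}\!\left(
      \frac{r(s,a) + \gamma\,\mathbb{E}_{s'\sim P(\cdot \mid s,a),\; a'\sim\pi}\big[Q(s',a')\big] - Q(s,a)}{\alpha}
    \right)\right].
```

The second problem has no constraints, and its expectations are taken over the initial state distribution and the off-policy data, which is why this form suits off-policy and offline learning.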

Experimental Evaluation
As a test of our new approach, we implemented duality-based training on a navigational agent. The agent starts at one corner of a multi-room map and must navigate to the opposite corner. We compare our algorithm to an actor-critic approach. Although both of these algorithms are based on the same underlying mathematical problem, actor-critic uses a number of approximations due to the infeasibility of satisfying the large number of constraints. In contrast, our algorithm is more amenable to practical implementation as can be seen by comparing the performance of the two algorithms. In the figure below, we plot the average reward achieved by the learned agent against the number of iterations of training for each algorithm. The duality-based implementation achieves significantly higher reward compared to actor-critic.

A plot of the average reward achieved by an agent using the duality-based approach (blue) compared to an agent using standard actor-critic (orange). In addition to being more mathematically principled, our approach also yields better practical results.

In summary, we’ve shown that if one formulates the RL problem as a mathematical objective with constraints, then repeated applications of convex duality in conjunction with a cleverly chosen convex regularizer yield an equivalent problem without constraints. The resulting unconstrained problem is easy to implement in practice and applicable in a wide range of settings. We’ve already applied our general framework to agent behavior policy optimization, policy evaluation, and imitation learning. We’ve found that our algorithms are not only more mathematically principled than existing RL methods, but they also often yield better practical performance, showing the value of unifying mathematical principles with practical implementation.

Accelerating innovation: How serverless machine learning on AWS powers F1 Insights

FORMULA 1 (F1) turns 70 years old in 2020 and is one of the few sports that combines real-time skill with engineering and technical prowess. Technology has always played a central role in F1, where the evolution of the rules and tools is built into the sport’s DNA. This keeps fans engaged and keeps drivers and teams pushing, as races are won and lost in tenths of a second.

Pit stops have gone from well over a minute to under 2 seconds, cars corner and brake at 5g and reach speeds of up to 375 KPH, and races are held in 22 countries; no sport has been as dynamic in its evolution and embrace of new technology. FORMULA 1 seeks to innovate continuously, and some of its latest innovations aim to enhance the experience of its growing base of over half a billion fans. Through the power of data and analytics, they give viewers an improved understanding of what happens on and off the track by bringing the split-second decisions made by drivers and teams to the screen.

With 300 sensors on each race car generating 1.1M data points per second transmitted from the car to the pit, the fan experience has downshifted from reactive to real time, which accelerates the action on the track. F1 can pinpoint how a driver is performing and whether they are pushing the car over the limit by using cloud-native technologies, such as machine learning (ML) models created in Amazon SageMaker and hosted on AWS Lambda. As a result, they can predict the outcome of an overtake or pit stop battle. They can share these insights immediately with fans all over the world through broadcast partners and digital platforms.

This post takes a deep dive into how the Amazon ML Solutions Lab and Professional Services Teams worked with F1 to build a real-time race strategy prediction application using AWS technology that brings “Pit-Wall” decisions to the viewer and resulted in the Pit Strategy Battle graphic. The post discusses race strategies and how to translate them into application logic, all while working backwards from a concept with multiple teams in parallel. You can also learn how a serverless architecture can provide ML predictions with minimal latency across the globe, and how to get started on your own ML journey.

To pit or not to pit

To a fan, 20 drivers and 10 teams on the race track can feel like chaos. But drivers and engineers employ different strategies to get more out of their race cars and get an edge over their competitors. While some are well-calculated risks and others are wild gambles, all are critical to a race outcome, sometimes coming down to split seconds, and all contribute to the spectacular adrenaline rush that keeps fans coming back for more. F1 wants to pull back the curtain for their fans to provide a glimpse into how they make these decisions and their impact on battles as they unfold.

Tire condition is a critical factor that affects the performance of a race car. It is not possible for a driver to stay competitive and finish a race on a single set of tires. Teams choose between varying tire compounds that balance performance and resilience. Softer compounds provide superior grip and handling in exchange for faster degradation, and harder compounds provide superior durability but limit cornering speed and traction. Drivers and teams decide when and how often to pit, but the rules require that drivers make a pit stop at least once per grand prix.

A fresh set of tires can significantly boost a vehicle’s performance, thus increasing the driver’s chance of overtaking another car. However, this comes at a cost—around 20 seconds on average to make a pit stop. Careful planning and execution of when to pit relative to your opponents may give the advantage that delivers victory.

Imagine a battle between two drivers: driver 1 and driver 2. Driver 1 leads and is trying to defend his position, with driver 2 gaining ground to attempt an overtake, which already proves challenging despite his faster pace. Considering that both drivers need to change tires at least once, driver 2 might choose to pit first to get a performance advantage. By pitting early, driver 2 now has the upper hand to close the gap between the cars because driver 1’s tire degradation limits his performance. If driver 2 catches back up to driver 1 after pitting, he can overtake when driver 1 is finally forced to pit. This strategy is called an undercut.

While the undercut may seem like the obvious choice, the opposite strategy, an overcut, is sometimes used as well. Driver 2 may decide to push his car as far as he can, hoping that driver 1 pits first, possibly gambling that driver 1’s tires are wearing faster. The calculation here is that having no traffic ahead might be the advantage that driver 2 needs to get ahead. When executed well, the chaser overtakes the leader after an eventual pit stop. With more than two drivers on the track, this gets complex fairly quickly. A given driver is a chaser to some and a leader to others, and such battles may take multiple laps to unfold. For spectators during the chaos of the race, it is nearly impossible to track which drivers have the advantage and which strategies teams employ. Even the most die-hard F1 fan benefits from data analytics to make the complex simple.
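
The arithmetic behind an undercut can be illustrated with a toy lap-by-lap simulation. This is only a sketch under simplified assumptions and is not the logic used in the F1 application: both cars share the same base lap time, worn tires cost a fixed penalty per lap, and traffic and tire warm-up are ignored. The 20-second pit loss comes from the figure quoted above; all other numbers and names are made up for illustration.

```python
PIT_LOSS_S = 20.0  # average time lost in a pit stop, as quoted in the post


def gap_after_stops(initial_gap_s, base_lap_s, worn_penalty_s,
                    chaser_pit_lap, leader_pit_lap, total_laps):
    """Return the chaser's gap to the leader in seconds (negative: chaser ahead)."""
    leader_t, chaser_t = 0.0, initial_gap_s
    for lap in range(1, total_laps + 1):
        # A car runs on fresh tires on every lap after the lap it pitted on.
        leader_fresh = lap > leader_pit_lap
        chaser_fresh = lap > chaser_pit_lap
        leader_t += base_lap_s + (0.0 if leader_fresh else worn_penalty_s)
        chaser_t += base_lap_s + (0.0 if chaser_fresh else worn_penalty_s)
        # The pit stop time is added on the lap the car pits.
        if lap == leader_pit_lap:
            leader_t += PIT_LOSS_S
        if lap == chaser_pit_lap:
            chaser_t += PIT_LOSS_S
    return chaser_t - leader_t


# Hypothetical scenario: the chaser starts 1.5 s behind and pits two laps earlier.
undercut_gap = gap_after_stops(1.5, 90.0, 1.0, chaser_pit_lap=13,
                               leader_pit_lap=15, total_laps=16)
same_lap_gap = gap_after_stops(1.5, 90.0, 1.0, chaser_pit_lap=13,
                               leader_pit_lap=13, total_laps=16)
print(undercut_gap, same_lap_gap)
```

In this toy model the two pit losses cancel, so the outcome depends only on how much time the chaser gains on fresh tires before the leader stops compared with the initial gap.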

F1 partnered with AWS to build new F1 insights, working backwards to build ML models to track pit battles and improve the viewing experience.

Working backwards

AWS starts with the customer and works backwards, which forces us to validate ideas against your values. A Working Backwards document includes three parts: a press release using customer-centric language to describe an idea at a high level, frequently asked questions that customers and internal stakeholders may ask, and visuals to help communicate the idea. When weighing the merits of an idea, it is important to sketch out all possible experience outcomes. It might be a whiteboard sketch, a workflow diagram, or a wire-frame. The following was the initial view for the Pit Strategy Battle use case:

This conceptual illustration allows stakeholders to align on a diverse set of outcomes and goals—graphics applications, application development, ML models, and more—and you can test it with a small user group to verify the desired outcomes. Also, it allows teams to break up the work in chunks to handle in parallel, such as the development of different graphic wire-frames (graphics), collection of data (operations), translation of race logic into application logic (development team), and building the ML models (ML team).

The Working Backwards model provided a clear vision from the outset. We aligned with F1’s broadcast partners on the types of messages and formats used, and illustrators created a video as a proof of concept for the on-screen graphics team.

We used Amazon SageMaker notebooks to do exploratory analysis and visualize large quantities of timing, tire, and weather data uploaded to Amazon S3 to understand how the race looks from an algorithm’s point of view. We determined what strategies were used during past races and what factors determined outcomes, and endlessly replayed races to see what historical features we could extract for our ML models and how to extract those features during a live race.

Having extracted and cleaned the relevant data from various sources, we started on ML tasks. When you start an ML project, you are rarely certain of the best possible outcome that you can achieve. To experiment and iterate quickly, we set two key performance indicators (KPIs):

  • Business KPIs – These are designed to communicate the progress to all relevant stakeholders, such as the percentage of predictions within a certain boundary.
  • Technical KPIs – These are used to optimize the model, such as root mean square error.

You can use these KPIs, technical requirements, and a set output format in validation code that allows for quick experimentation with feature engineering and various algorithms to optimize for prediction error.
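
As a minimal sketch of such validation code, the following computes one KPI of each kind for a list of predictions. The function names, the tolerance, and the sample numbers are hypothetical and chosen only for illustration:

```python
import math


def rmse(predicted, actual):
    """Technical KPI: root mean square error of the predictions."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def share_within(predicted, actual, tolerance):
    """Business KPI: fraction of predictions within +/- tolerance of the truth."""
    hits = sum(abs(p - a) <= tolerance for p, a in zip(predicted, actual))
    return hits / len(actual)


# Made-up predicted and observed time gaps in seconds.
predicted_gaps = [1.0, 2.0, 3.5, 0.0]
actual_gaps = [1.0, 2.5, 2.5, 1.0]

technical_kpi = rmse(predicted_gaps, actual_gaps)
business_kpi = share_within(predicted_gaps, actual_gaps, tolerance=0.5)
print(technical_kpi, business_kpi)
```

Keeping both functions behind a fixed interface makes it possible to swap features or algorithms and compare runs on the same two numbers.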

Implementing the architecture

When we were designing how the application architecture would look, we faced many requirements, some of which seemed contradictory at first glance. We achieved our goals by using cloud-native AWS services while focusing on what mattered and spending little overhead on maintenance. And the pay-as-you-go model allowed us to keep costs relatively low.

Architecture overview

The following diagram shows the architecture in detail:

When a signal is captured at the race track, it begins its journey, first passing via F1 infrastructure, then as an HTTP call to the AWS Cloud. Amazon API Gateway acts as the entry point to the application, which is hosted as a function in Lambda, which implements the race logic. When the function receives the incoming message, it updates the race state stored in Amazon DynamoDB (for example, a change in driver position). After the update is finished, the function evaluates whether this is a trigger for prediction. If so, it uses the model trained in Amazon SageMaker to make the prediction. The prediction is sent back as a response to the call and ingested back to the F1 infrastructure. It returns to the broadcasting center and is ready for the race director to use. We needed the whole process to complete in less than 500 milliseconds.
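
The flow described above (update state, check for a trigger, predict, respond) can be sketched as a Lambda-style handler. This is an illustrative outline and not the F1 application: an in-memory dictionary stands in for the DynamoDB table, the trigger rule and the predict function are placeholders, and the message fields are hypothetical. Only the event and response shapes (a string body, a statusCode) follow the API Gateway proxy convention.

```python
import json

RACE_STATE = {}  # stand-in for the race state table; the schema is hypothetical


def should_predict(update):
    """Placeholder trigger rule: predict when a driver enters the pit."""
    return bool(update.get("in_pit"))


def predict(driver_state):
    """Placeholder for the trained model: returns a made-up score."""
    return round(0.5 * driver_state.get("gap_to_car_ahead_s", 0.0), 3)


def handler(event, context=None):
    message = json.loads(event["body"])
    # 1. Update the stored race state with the incoming message.
    driver_state = RACE_STATE.setdefault(message["driver"], {})
    driver_state.update(message["update"])
    # 2. Decide whether this message should trigger a prediction.
    prediction = None
    if should_predict(message["update"]):
        # 3. Predict from the accumulated state, not from the single message.
        prediction = predict(driver_state)
    # 4. Return the result as the HTTP response.
    return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}


first = handler({"body": json.dumps(
    {"driver": "driver_2", "update": {"gap_to_car_ahead_s": 1.2}})})
second = handler({"body": json.dumps(
    {"driver": "driver_2", "update": {"in_pit": True}})})
print(first["body"], second["body"])
```

The first message only updates the state; the second one triggers a prediction that uses the gap stored by the first.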

Picking the right tools

The first challenge was that we didn’t know in advance what approaches would work, especially given the tight deadlines. We had to pick a set of tools that would enable us to prototype fast, validate, and experiment, and enable us to move quickly from a proof of concept to a production-ready application. We used serverless products offered by AWS, such as Lambda, API Gateway, DynamoDB, Amazon CloudWatch, and S3. For example, we hosted a prototype on Lambda with zero operational overhead, and when we were satisfied with the results, we could move the application into production with a simple script. We didn’t have to worry about provisioning infrastructure because Lambda automatically scales up your resources when the rate of requests increases. When the race finished, the resources were released without the need for manual actions. Because the predictions are made live, it is critical to have an infrastructure with high availability. Traditionally, building such an infrastructure would require a dedicated skilled team of system engineers. Lambda readily offers highly available infrastructure for any applications.

When the application received a message from the track, the content of a single message was never enough to trigger a prediction. For example, a position change of one driver doesn’t tell much about the whole situation on the track. Because predictions take comprehensive inputs that include past and present situations on the track, we had to employ a database to store and manage the state of the race. DynamoDB was a crucial tool for storing the race state, timing data, the strategies that we were monitoring, and features for ML models. DynamoDB provides single-digit millisecond performance regardless of the number of rows in a table, with no operational overhead. We didn’t have to spend time spinning up and managing database clusters or worrying about uptime.

To automate our iterations from prototype to production, we used CI/CD tools, including AWS CodePipeline and AWS CodeBuild, to segregate environments and move the code to production when it was ready. We used AWS CloudFormation to implement an approach called infrastructure as code (IaC) to provision environments and have predictable deployments.

We used most of these resources only during live races or tests, so we wanted to pay for only the consumed resources. With traditionally provisioned infrastructure, we would have needed to manually start and stop components to avoid paying for over-provisioning. The services that we used offer a pay-as-you-go model; the bill included only the exact amount of storage that we used, and the number of calls determined the charges for computational resources. This was possible because we hosted the model on Lambda, which is an alternative to hosting models on SageMaker endpoints. For more detailed information about hosting models on Lambda, you can take a look at this blog post.

Accuracy and performance

When it came to ML models, we based our requirements on accuracy and runtime performance. To achieve accuracy, we needed tools that would enable us to test approaches fast, experiment, and iterate. To train the models, we used Amazon SageMaker; its built-in algorithm XGBoost is a popular and efficient open-source implementation of the gradient boosted trees algorithm. We carefully analyzed racing data and model predictions to extract features that are available in the race data. After we settled on the design of the model and input features, we trained the models on historical race data using training jobs in Amazon SageMaker. The benefit of training jobs is that they fully handle provisioning and de-provisioning of the resources, so the data scientists can focus on the optimization of the model. In addition, SageMaker allows you to control the instance types and number of instances that you use for training. This is particularly useful when training on large data sets.

Although training the algorithms was fairly straightforward, inference had to happen in real time. F1 serves a live stream to hundreds of millions of viewers around the world; for a sport that is being decided in milliseconds, data that is even a few seconds old is obsolete. To meet the required response time, we loaded the model trained in Amazon SageMaker into the application hosted on Lambda and implemented the inference in the function code. Because the model stayed loaded in memory right next to the running code, we could cut the invocation overhead to a bare minimum. We trained the model with the built-in open-source XGBoost algorithm and then converted it into a smaller, higher-performing format using an additional open-source package, which boosted inference speed and reduced deployment size. Because we hosted the application and models in Lambda, we could scale the infrastructure elastically and easily keep up with the varying prediction rates during the race without operational interventions.
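
Keeping the model in memory between invocations is a general pattern for function-based hosting. The sketch below shows one way to do it in Python, under the assumption that the runtime reuses the same process across calls; the loader is a stand-in that returns a trivial function, not a real XGBoost model, and all names are hypothetical.

```python
import functools

LOAD_COUNT = 0  # only here to show how often the loader actually runs


@functools.lru_cache(maxsize=1)
def get_model():
    """Load the model once; later calls return the cached object."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    # Stand-in for deserializing a trained model artifact that ships
    # with the deployment package.
    return lambda features: sum(features) / len(features)


def handler(event, context=None):
    # The first call pays the load cost; warm calls reuse the model.
    model = get_model()
    return {"prediction": model(event["features"])}


result_1 = handler({"features": [1.0, 2.0, 3.0]})
result_2 = handler({"features": [4.0, 6.0]})
print(result_1, result_2, LOAD_COUNT)
```

Only the first request in a fresh process pays the loading cost, which is what keeps per-request latency low once the function is warm.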

The choice of tools and services is fundamental to a project’s success. Thanks to the breadth and depth of services offered by AWS, we could pick the best-suited tools for our requirements and operational model. And serverless technologies freed up time spent on infrastructure upkeep so we could focus on what mattered most.


The Pit Strategy Battle insight was released on March 17, 2019, at the Australian Grand Prix at the official start of the 2019 F1 season. To show the Pit Strategy Battle graphic used to its fullest potential, we traveled to Bahrain on March 31 for the Bahrain Grand Prix. The Grand Prix was one of the most exhilarating races in the 2019 season, and it was also the stage for a top-class display of Mercedes performing the undercut strategy. The following short clip shows Hamilton chasing down Vettel on fresh new tires from his pit stop one lap earlier, attempting to overtake Vettel while he is making his pit stop on lap 14.

The video shows how Hamilton pulled off a successful undercut. The graphic was used to build the suspense and help the viewer understand what was happening on the track. The application provided live predictions of both the time gap and the overtake probability, using ML models trained on historical data, all within 500 milliseconds.


Despite F1’s history of technical innovation, we’re just getting started with the volume of data we can now collect: over 300 sensors in each race car produce over 1.1M data points per second. This post showed how the AWS Professional Services team worked with F1 to take this data and apply ML and analytics to help fans get insights and better understand the race. Multiple teams created a shared understanding and had clarity on the end goal by working backwards, which allowed us to work in parallel. This can greatly accelerate a project and remove bottlenecks.

Much like other businesses, F1 is trying to make sense of chaos. You can apply the higher-level services and underlying principles we used to any industry. The use of Lambda for application hosting, DynamoDB for storage, and Amazon SageMaker for model training allows developers and data scientists to focus on what matters. Rather than spending time building and maintaining infrastructure or worrying about uptime and costs, you can focus on translating business knowledge to application logic, experimenting, and iterating quickly.

Whether you are a company building websites that wants to offer personalized products, a factory that wants to run more efficiently, or a farm that wants to increase yield, you can benefit from using data in your business to develop faster and scale quicker. AWS Professional Services is ready to supplement your team with specialized skills and experience to help you achieve your business outcomes. For more information, see AWS Professional Services, or reach out through your account manager to get in touch.

About the authors

Luuk Figdor is a data scientist in the AWS Professional Services team. He works with clients across industries to help them tell stories with data using machine learning. In his spare time he likes to learn all about the mind and the intersection between psychology, economics and AI.




Andrey Syschikov is a full-stack technologist in the AWS Professional Services team. He helps customers turn their ideas into innovative cloud-based applications. In the rare moments when Andrey is not next to a computer, he enjoys audiobooks, playing piano, and hiking.






Screening for COVID-19: Japanese Startup Uses AI for Drug Discovery


Researchers are racing to discover the right drug molecule to treat COVID-19, but the number of potential drug-like molecules out there is estimated to be an inconceivable 10^60.

“Even if you hypothetically checked one molecule per second, it would take longer than the age of the universe to explore the entire chemical space,” said Shinya Yuki, co-founder and CEO of Tokyo-based startup Elix, Inc. “AI can efficiently explore huge search spaces to solve difficult problems, whether in drug discovery, materials development or a game like Go.”
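A quick back-of-the-envelope check makes the scale of that comparison concrete. The sketch below takes the 10^60 estimate and the one-molecule-per-second rate from the text; the age of the universe (about 13.8 billion years) is a commonly cited figure that is assumed here and does not come from the post.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 13.8e9  # commonly cited estimate, assumed for this sketch

molecules = 10**60          # estimated number of drug-like molecules
seconds_needed = molecules  # one molecule checked per second

years_needed = seconds_needed / SECONDS_PER_YEAR
universe_lifetimes = years_needed / AGE_OF_UNIVERSE_YEARS

print(f"{years_needed:.2e} years, about {universe_lifetimes:.1e} ages of the universe")
```

Under these assumptions an exhaustive search would take on the order of 10^42 times the age of the universe, which is why brute-force enumeration is not an option.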

Yuki’s company is using deep learning to accelerate drug discovery, building neural networks that predict the properties of molecules much faster than computer simulations can. To support COVID-19 research, the team is using AI to find drugs that are FDA-approved or in clinical trials that could be repurposed to treat the coronavirus.

“Developing a new drug from scratch is a years-long process, which is unwanted especially in this pandemic situation,” Yuki said. “Speed is critical, and drug-repurposing can help identify candidates with an existing clinical safety record, significantly reducing the time and cost of drug development.”

Elix recently published a paper on approved and clinical trial-stage drugs that its AI model flagged as potential COVID-19 treatments. Among the candidates selected by Elix’s AI tool was remdesivir, an antiviral drug that recently received emergency use authorization from the FDA for coronavirus cases.

A member of NVIDIA Inception, a program that helps startups get to market faster, Elix uses the NVIDIA DGX Station for training and inference of its deep learning algorithms. Yuki spoke about the company’s work in AI for drug discovery in the Inception Startup Showcase at GTC Digital, NVIDIA’s digital conference for developers and AI researchers.

Elix’s AI Fix for Drug Discovery

At the molecular level, a successful drug must have the perfect combination of shape, flexibility and interaction energies to bind to a target protein — like the spike proteins that cover the viral envelope of SARS-CoV-2, the virus that causes COVID-19.

SARS-CoV-2, the virus that causes COVID-19, has a surface covered in protein spikes. Image credit: CDC/ Alissa Eckert, MSMI; Dan Higgins, MAMS. Licensed under public domain.

A person gets infected with COVID-19 when these spike proteins attach to cells in the body, bringing the virus into the cells. An effective antiviral drug might interfere with this attachment process. For example, a promising drug molecule would bind with receptors on the spike proteins, preventing the virus from attaching to human cells.

To help researchers find the best drug for the job, Elix uses a variety of neural networks to rapidly narrow down the field of potential molecules. This allows researchers to reserve physical tests in the lab for a smaller subset of molecules that have a higher likelihood of solving the problem.

With predictive AI models, Yuki’s team can analyze a database of drug candidates to infer which have the right physical and chemical properties to treat a given disease. They also use generative models, which start from scratch to come up with promising molecular structures — some of which may not be found in nature.

That’s where a third neural network comes in, a retrosynthesis model that helps researchers figure out if the generated molecules can be synthesized in the lab.
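The narrowing-down workflow described above can be sketched in a few lines. This is an illustrative simplification and not Elix's pipeline: the function `screen`, the molecule names, the scores, and the synthesizability check are all hypothetical placeholders for what would in practice be trained neural networks, and the sketch applies the synthesis check to every candidate rather than only to generated ones.

```python
from typing import Callable, List, Tuple


def screen(
    candidates: List[str],
    predict_score: Callable[[str], float],
    is_synthesizable: Callable[[str], bool],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Keep candidates that pass the synthesis check, ranked by predicted score."""
    scored = [(m, predict_score(m)) for m in candidates if is_synthesizable(m)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy stand-ins: a real system would call a property-prediction model and a
# retrosynthesis model here instead of looking values up in a dictionary.
toy_scores = {"mol_a": 0.91, "mol_b": 0.15, "mol_c": 0.78, "mol_d": 0.66}
toy_unsynthesizable = {"mol_c"}

shortlist = screen(
    candidates=list(toy_scores),
    predict_score=toy_scores.__getitem__,
    is_synthesizable=lambda m: m not in toy_unsynthesizable,
    top_k=2,
)
print(shortlist)
```

Only the shortlist would go on to physical testing in the lab, which is where the time savings come from.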

Elix uses multiple NVIDIA DGX Station systems — GPU-powered AI workstations for data science development teams — to accelerate training and inference of these neural networks, achieving up to a 6x speedup using a single GPU for training versus a CPU.

Yuki says the acceleration is essential for the generative models, which would otherwise take a week or more to train until convergence, when the neural network reaches the lowest error rate possible. Each DGX Station has four NVIDIA V100 Tensor Core GPUs, enabling the Elix team to tackle bigger AI models and run multiple experiments at once.

“DGX Stations are basically supercomputers. We usually have several users working on the same machine at the same time,” he said. “We can not only train models faster, we can also run up to 15 experiments in parallel.”

The startup’s customers include pharmaceutical companies, research institutes and universities. Since molecular data is sensitive intellectual property for the pharma industry, most choose to run the AI models on their own on-prem servers.

Beyond drug discovery, Elix also uses AI for molecular design in materials informatics, working with companies like tire and rubber manufacturer Bridgestone and with RIKEN, Japan’s largest research institution. The company also develops computer vision models for autonomous driving and AI at the edge.

In one project, Yuki’s team worked with global chemical company Nippon Shokubai to generate a molecule that can be used as a blending material for ink, while posing a low risk of skin irritation.

Learn more about Elix in Yuki’s GTC Digital lightning talk. Visit our COVID page to explore how other startups are using AI and accelerated computing to fight the pandemic.

Main image by Chaos, licensed from Wikimedia Commons under CC BY-SA 3.0

The post Screening for COVID-19: Japanese Startup Uses AI for Drug Discovery appeared first on The Official NVIDIA Blog.


NVIDIA Ampere GPUs Come to Google Cloud at Speed of Light


The NVIDIA A100 Tensor Core GPU has landed on Google Cloud.

Available in alpha on Google Compute Engine just over a month after its introduction, A100 has come to the cloud faster than any NVIDIA GPU in history.

Today’s introduction of the Accelerator-Optimized VM (A2) instance family featuring A100 makes Google the first cloud service provider to offer the new NVIDIA GPU.

A100, which is built on the newly introduced NVIDIA Ampere architecture, delivers NVIDIA’s greatest generational leap ever. It boosts training and inference computing performance by 20x over its predecessors, providing tremendous speedups for workloads to power the AI revolution.

“Google Cloud customers often look to us to provide the latest hardware and software services to help them drive innovation on AI and scientific computing workloads,” said Manish Sainani, director of Product Management at Google Cloud. “With our new A2 VM family, we are proud to be the first major cloud provider to market NVIDIA A100 GPUs, just as we were with NVIDIA T4 GPUs. We are excited to see what our customers will do with these new capabilities.”

In cloud data centers, A100 can power a broad range of compute-intensive applications, including AI training and inference, data analytics, scientific computing, genomics, edge video analytics, 5G services, and more.

Fast-growing, critical industries will be able to accelerate their discoveries with the breakthrough performance of A100 on Google Compute Engine. From scaling up AI training and scientific computing, to scaling out inference applications, to enabling real-time conversational AI, A100 accelerates complex and unpredictable workloads of all sizes running in the cloud. 

NVIDIA CUDA 11, coming to general availability soon, makes accessible to developers the new capabilities of NVIDIA A100 GPUs, including Tensor Cores, mixed-precision modes, multi-instance GPU, advanced memory management and standard C++/Fortran parallel language constructs.

Breakthrough A100 Performance in the Cloud for Every Size Workload

The new A2 VM instances can deliver different levels of performance to efficiently accelerate workloads across CUDA-enabled machine learning training and inference, data analytics, as well as high performance computing.

For large, demanding workloads, Google Compute Engine offers customers the a2-megagpu-16g instance, which comes with 16 A100 GPUs, offering a total of 640GB of GPU memory and 1.3TB of system memory — all connected through NVSwitch with up to 9.6TB/s of aggregate bandwidth.
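The per-GPU figures implied by those totals are easy to derive. The short calculation below uses only the numbers quoted above and assumes that memory and NVSwitch bandwidth are split evenly across the 16 GPUs and that the units are decimal (1 TB = 1000 GB); the variable names are illustrative.

```python
NUM_GPUS = 16
TOTAL_GPU_MEMORY_GB = 640
AGGREGATE_NVSWITCH_GB_PER_S = 9600  # 9.6 TB/s expressed in GB/s (decimal units)

# Assumes an even split across all GPUs in the instance.
memory_per_gpu_gb = TOTAL_GPU_MEMORY_GB // NUM_GPUS
bandwidth_per_gpu_gb_per_s = AGGREGATE_NVSWITCH_GB_PER_S // NUM_GPUS

print(memory_per_gpu_gb, "GB of memory per GPU")
print(bandwidth_per_gpu_gb_per_s, "GB/s of NVSwitch bandwidth per GPU")
```

That works out to 40 GB of memory and 600 GB/s of interconnect bandwidth per GPU.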

For those with smaller workloads, Google Compute Engine is also offering A2 VMs in smaller configurations to match specific applications’ needs.

Google Cloud announced that additional NVIDIA A100 support is coming soon to Google Kubernetes Engine, Cloud AI Platform and other Google Cloud services. For more information, including technical details on the new A2 VM family and how to sign up for access, visit the Google Cloud blog.

The post NVIDIA Ampere GPUs Come to Google Cloud at Speed of Light appeared first on The Official NVIDIA Blog.
