Q&A with Sameer Hinduja and Justin Patchin of the Cyberbullying Research Center

In this monthly interview series, we turn the spotlight on members of the academic community and the important research they do — as partners, collaborators, consultants, and independent contributors.

For April, we nominated an academic duo: Sameer Hinduja (Florida Atlantic University) and Justin Patchin (University of Wisconsin–Eau Claire) of the Cyberbullying Research Center, which they created as a one-stop-shop for the most recent research on cyberbullying among adolescents. Patchin and Hinduja are top industry consultants, and they provide valuable insights that better inform our content policies. In this Q&A, they share more about their background, the formation of the Cyberbullying Research Center, their contributions to Facebook and Instagram, and their current academic research.

Q: Tell us about your backgrounds in academia.

A: We both earned master’s and doctoral degrees in criminal justice from Michigan State University. Upon entering graduate school, Sameer was interested in emerging crime issues related to technology and Justin was interested in school violence and juvenile delinquency. When we observed adolescent behaviors online, we noticed bullying occurring among this population. We started systematically studying cyberbullying and quickly learned that it was affecting young people. Using data from thousands of youths over the last two decades, we’ve been able to contribute evidence-based insights about the experiences of youths online. Thankfully, it isn’t all bad! But we use results from our research to advise teens, parents, and others about safe online practices.

Q: How did the Cyberbullying Research Center form?

A: The Cyberbullying Research Center formed from our interest in studying cyberbullying behaviors, but also in more quickly disseminating information from our research to those who could benefit from it (parents, educators, youths). We wanted a platform where we could post timely results from our studies, in the form of blog posts, research briefs, and fact sheets. We still write academic journal articles and books, but we also want to produce resources that are more easily accessible to everyone. We wanted to create a one-stop-shop people could turn to for reliable information on youth cyberbullying and other online problems.

Q: How have you contributed your expertise to Facebook and Instagram?

A: Part of our mission as action researchers is to help people prevent and more adequately respond to cyberbullying and other problematic online behaviors among adolescents. This includes working with industry partners, like Facebook, to keep them up-to-date on the latest research and help inform their policies and practices concerning inappropriate behaviors. We are also trusted partners for Facebook and Instagram, so we are able to help flag abusive content on these platforms more quickly. We also routinely walk people who use these platforms through how to deal with problematic content on the apps so that they can have positive experiences. Sometimes it can be challenging navigating all the settings and reporting features, and we know these pretty well.

Q: What have you been working on lately?

A: We recently completed a study of tween cyberbullying for Cartoon Network and are currently planning to collect more data very soon on teen cyberbullying in the United States to see whether behaviors have changed as a result of the COVID-19 pandemic. We continue to write academic articles and are in the early stages of our next book. Finally, we continue to discuss various current events and issues at the intersection of youth and social media on our blog, and we regularly create new resources for youths and youth-serving adults to use.

Q: Where can people learn more about your work?

A: You can read about our research on our website or follow us on Facebook and Instagram @cyberbullyingresearch.

The post Q&A with Sameer Hinduja and Justin Patchin of the Cyberbullying Research Center appeared first on Facebook Research.

NVIDIA RTX Lights Up the Night in Stunning Demos at GTC

NVIDIA is putting complex night scenes in a good light.

A demo at GTC21 this week showcased how NVIDIA RTX Direct Illumination (RTXDI) technology is paving the way for realistic lighting in graphics. The clip shows thousands of dynamic lights as they move, turn on and off, change color, show reflections and cast shadows.

People can also experience the latest technologies in graphics with the new RTX Technology Showcase, a playable demo that allows developers to explore an attic scene and interact with elements while seeing the visual impact of real-time ray tracing.

Hero Lighting Gets a Boost with RTXDI

Running on an NVIDIA GeForce RTX 3090 GPU, the RTXDI demo shows how dynamic, animated lights can be rendered in real time.

Creating realistic night scenes in computer graphics requires many lights to be simulated at once. RTXDI makes this possible, allowing developers and artists to create cinematic visuals with realistic lighting, incredible reflections and accurate shadows through real-time ray tracing.

Traditionally, creating realistic lighting required complex baking solutions and was limited to a small number of “hero” lights. RTXDI removes such barriers by combining ray tracing and a sampling algorithm called spatio-temporal importance resampling (ReSTIR) to create realistic dynamic lighting.

Developers and artists can now easily integrate animated and color-changing lights into their scenes, without baking or relying on just a handful of hero lights.

Based on NVIDIA research, RTXDI enables direct lighting from millions of moving light sources, without requiring any complex data structures to be built. From fireworks in the sky to billboards in New York’s Times Square, all of that complex lighting can now be captured in real time with RTXDI.

And RTXDI works even better when combined with additional NVIDIA technologies.

Learn more and check out RTXDI, which is now available.

Hit the Light Spots in RTX Technology Showcase

The RTX Technology Showcase features discrete ray-tracing capabilities, so users can choose to turn on specific technologies and immediately view their effects within the attic scene.

Watch the RTX Technology Showcase in action:

Developers can download the demo to discover the latest and greatest in ray-tracing innovations with RTX Technology Showcase.

Check out other GTC demos that highlight the latest technologies in graphics, and a full track for game developers here. Watch a replay of the GTC keynote address by NVIDIA CEO Jensen Huang to catch up on the latest graphics announcements.

The post NVIDIA RTX Lights Up the Night in Stunning Demos at GTC appeared first on The Official NVIDIA Blog.

Healthcare Headliners Put AI Under the Microscope at GTC

Two revolutions are meeting in the field of life sciences — the explosion of digital data and the rise of AI computing to help healthcare professionals make sense of it all, said Daphne Koller and Kimberly Powell at this week’s GPU Technology Conference.

Powell, NVIDIA’s vice president of healthcare, presented an overview of AI innovation in medicine that highlighted advances in drug discovery, medical imaging, genomics and intelligent medical instruments.

“There’s a digital biology revolution underway, and it’s generating enormous data, far too complex for human understanding,” she said. “With algorithms and computations at the ready, we now have the third ingredient — data — to truly enter the AI healthcare era.”

And Koller, a Stanford adjunct professor and CEO of the AI drug discovery company Insitro, focused on AI solutions in her talk outlining the challenges of drug development and the ways in which predictive machine learning models can enable a better understanding of disease-related biological data.

Digital biology “allows us to measure biological systems in entirely new ways, interpret what we’re measuring using data science and machine learning, and then bring that back to engineer biology to do things that we’d never otherwise be able to do,” she said.

Watch replays of these talks — part of a packed lineup of more than 100 healthcare sessions among 1,600 on-demand sessions — by registering free for GTC through April 23. Registration isn’t required to watch a replay of the keynote address by NVIDIA CEO Jensen Huang.

Data-Driven Insights into Disease

Recent advancements in biotechnology — including CRISPR, induced pluripotent stem cells and more widespread availability of DNA sequencing — have allowed scientists to gather “mountains of data,” Koller said in her talk, “leaving us with a problem of how to interpret those data.”

“Fortunately, this is where the other revolution comes in, which is that using machine learning to interpret and identify patterns in very large amounts of data has transformed virtually every sector of our existence,” she said.

The data-intensive process of drug discovery requires researchers to understand the biological structure of a disease, and then vet potential compounds that could be used to bind with a critical protein along the disease pathway. Finding a promising therapeutic is a complex optimization problem, and despite the exponential rise in the amount of digital data available in the last decade or two, the process has been getting slower and more expensive.

Daphne Koller, CEO of Insitro

Known as Eroom’s law, this observation finds that the research and development cost for bringing a new drug to market has trended upward since the 1980s, taking pharmaceutical companies more time and money. Koller says that’s because of all the potential drug candidates that fail to get approved for use.

“What we aim to do at Insitro is to understand those failures, and try and see whether machine learning — combined with the right kind of data generation — can get us to make better decisions along the path and avoid a lot of those failures,” she said. “Machine learning is able to see things that people just cannot see.”

Bringing AI to vast datasets can help scientists determine how physical characteristics like height and weight, known as phenotypes, relate to genetic variants, known as genotypes. In many cases, “these associations give us a hint about the causal drivers of disease,” said Koller.

She gave the example of NASH, or nonalcoholic steatohepatitis, a common liver condition related to obesity and diabetes. To study underlying causes and potential treatments for NASH, Insitro worked with biopharmaceutical company Gilead to apply machine learning to liver biopsy and RNA sequencing data from clinical trials representing hundreds of patients.

The team created a machine learning model to analyze biopsy images to capture a quantitative representation of a patient’s disease state, and found even with just a weak level of supervision, the AI’s predictions aligned with the scores assigned by clinical pathologists. The models could even differentiate between images with and without NASH, which is difficult to determine with the naked eye.

Accelerating the AI Healthcare Era

It’s not enough to just have abundant data to create an effective deep learning model for medicine, however. Powell’s GTC talk focused on domain-specific computational platforms — like the NVIDIA Clara application framework for healthcare — that are tailored to the needs and quirks of medical datasets.

The NVIDIA Clara Discovery suite of AI libraries harnesses transformer models, popular in natural language processing, to parse biomedical data. Using the NVIDIA Megatron framework for training transformers helps researchers build models with billions of parameters — like MegaMolBART, an NLP generative drug discovery model in development by NVIDIA and AstraZeneca for use in reaction prediction, molecular optimization and de novo molecular generation.

Kimberly Powell, VP of healthcare at NVIDIA

University of Florida Health has also used the NVIDIA Megatron framework and NVIDIA BioMegatron pre-trained model to develop GatorTron, the largest clinical language model to date, which was trained on more than 2 million patient records with more than 50 million interactions.

“With biomedical data at scale of petabytes, and learning at the scale of billions and soon trillions of parameters, transformers are helping us do and find the unexpected,” Powell said.

Clinical decisions, too, can be supported by AI insights that parse data from health records, medical imaging instruments, lab tests, patient monitors and surgical procedures.

“No one hospital’s the same, and no healthcare practice is the same,” Powell said. “So we need an entire ecosystem approach to developing algorithms that can predict the future, see the unseen, and help healthcare providers make complex decisions.”

The NVIDIA Clara framework has more than 40 domain-specific pretrained models available in the NGC catalog — as well as NVIDIA Federated Learning, which allows different institutions to collaborate on AI model development without sharing patient data with each other, overcoming challenges of data governance and privacy.

And to power the next generation of intelligent medical instruments, the newly available NVIDIA Clara AGX developer kit helps hospitals develop and deploy AI across smart sensors such as endoscopes, ultrasound devices and microscopes.

“As sensor technology continues to innovate, so must the computing platforms that process them,” Powell said. “With AI, instruments can become smaller, cheaper and guide an inexperienced user through the acquisition process.”

These AI-driven devices could help reach areas of the world that lack access to many medical diagnostics today, she said. “The instruments that measure biology, see inside our bodies and perform surgeries are becoming intelligent sensors with AI and computing.”

GTC registration is open through April 23. Attendees will have access to on-demand content through May 11. For more, subscribe to NVIDIA healthcare news, and follow NVIDIA Healthcare on Twitter.

The post Healthcare Headliners Put AI Under the Microscope at GTC appeared first on The Official NVIDIA Blog.

Auto-placement of ad campaigns using multi-armed bandits

What the research is:

We look at the problem of allocating the budget of an advertiser across multiple surfaces optimally when both the demand and the value are unknown. Consider an advertiser who uses the Facebook platform to advertise a product. They have a daily budget that they would like to spend on our platform. Advertisers want to reach users where they spend time, so they spread their budget over multiple platforms, like Facebook, Instagram, and others. They want an algorithm to help bid on their behalf on the different platforms and are increasingly relying on automation products to help them achieve it.

In this research, we model the problem of placement optimization as a stochastic bandit problem. In this problem, the algorithm is participating in k different auctions, one for each platform, and needs to decide the correct bid for each of the auctions. The algorithm is given a total budget B (e.g., the daily budget) and a time horizon T over which this budget should be spent. At each time-step, the algorithm should decide the bid it will associate with each of the k platforms, which will be input into the auctions for the next set of requests on each of the platforms. At the end of a round (i.e., a sequence of requests), the algorithm sees the total reward it obtained (e.g., number of clicks) and the total budget that was consumed in the process, on each of the different platforms. Based on just this history, the algorithm should decide the next set of bid multipliers it needs to place. The goal of the algorithm is to maximize the total advertiser value with the given budget across the k platforms.
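
To make the interaction protocol concrete, here is a minimal simulation sketch of the per-round feedback loop described above. The environment function and the bidder's policy are hypothetical placeholders for illustration, not the algorithm proposed in the paper:

import numpy as np

class MultiPlatformBidder:
    """Illustrative bandit loop over k platforms with budget B and horizon T."""
    def __init__(self, k, budget, horizon):
        self.k = k
        self.remaining_budget = budget
        self.horizon = horizon

    def choose_bids(self, t):
        # Placeholder policy: the paper derives bid multipliers from a per-round
        # optimization; here we simply use a constant bid on every platform.
        return np.full(self.k, 1.0)

    def update(self, rewards, spends):
        # Record the observed value (e.g., clicks) and spend per platform,
        # then deduct the spend from the remaining budget.
        self.remaining_budget -= spends.sum()

def run_round(bids):
    # Hypothetical environment: returns per-platform reward and spend observed
    # after a sequence of auction requests on each platform.
    rewards = np.random.poisson(bids)
    spends = np.random.uniform(0.0, bids)
    return rewards, spends

bidder = MultiPlatformBidder(k=3, budget=100.0, horizon=1000)
for t in range(bidder.horizon):
    if bidder.remaining_budget <= 0:
        break
    bids = bidder.choose_bids(t)
    rewards, spends = run_round(bids)
    bidder.update(rewards, spends)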

How it works:

This problem can be tackled using a model of bandits called bandits with budgets. In this paper, we propose a modified algorithm that works optimally in the regime where the number of platforms k is large and the total possible value is small relative to the total number of plays. Online advertising data has this particular behavior: the budget spent and the total value the advertiser receives are much smaller than the total number of auctions they participate in, because of the scale of the competition pool. Thus, our algorithm is a significant improvement over prior works, which tend to focus on the regime where the total optimal value possible is comparable to the number of time-steps.

The key idea of the algorithm is to modify a primal-dual based approach from prior work [1] so that it can handle multiple platforms. In particular, we derive a new optimization program at each time-step whose optimal solution gives us the bid multiplier to place at that time-step. Prior work [2] that solves an optimization program usually relies on also performing a rounding step. However, this rounding step works well only when the optimal value possible is at least √T, and hence the assumption of the optimal value being comparable to the number of time-steps is unavoidable. In this work, by contrast, we rely on a property of this linear program [3] and show that for the special case of multiplatform bid optimization, the optimal solution is already integral, and thus we do not need a rounding step. This is the key idea that leads to the optimal regret guarantee.

We use logged data to show that this algorithm works well in practice with desirable properties such as uniform budget consumption and small total regret. We compare it with the prior works and also other commonly used heuristics in the industry. We show that the proposed algorithm is indeed superior to all these algorithms.

Why it matters:

On the business side, this research has potential benefits for advertisers, users, and platforms. Automated products that perform much of the targeting, placement, and creative optimization on advertisers’ behalf are rapidly growing in both number and adoption across the industry. The key challenges with these automated products are scalability and budget management: the number of possible combinations explodes exponentially while the total budget provided by the advertiser remains roughly the same. This research provides simple, scalable algorithms that can help us create such automated solutions by adjusting bids, in real time, within the auction mechanism. The bid is one of the primary levers advertisers use to steer the delivery of their ads toward the desired behavior. They usually do so in a black-box fashion, however, because they lack the data required to make optimal bidding choices. The advantage of the proposed algorithm is that its bidding is near-optimal, so advertisers get the most value for their spend. This has benefits for both the individual advertiser and the overall ecosystem.

On the research side, “Bandits with Budgets” has primarily been studied in the theoretical computer science and operations research communities purely as a mathematical problem. This research bridges the gap between the theory and practice of these algorithms by applying them to a large-scale, important problem. En route to this application, we also create a new, simpler algorithm that is optimal in the parameter ranges desired in the application.

Going forward, we hope that our paper opens the door for newer applications, both within online advertising and outside of it, for this extremely general and versatile model. We believe that this line of work has huge research potential for creating new algorithms as well as for affecting core business problems.

Read the full paper:

Stochastic bandits for multiplatform budget optimization in online advertising

References:

[1] – Badanidiyuru, Ashwinkumar, Robert Kleinberg, and Aleksandrs Slivkins. “Bandits with knapsacks.” 2013 IEEE 54th Annual Symposium on Foundations of Computer Science. IEEE, 2013.
[2] – Sankararaman, Karthik Abinav, and Aleksandrs Slivkins. “Combinatorial semi-bandits with knapsacks.” International Conference on Artificial Intelligence and Statistics. PMLR, 2018.
[3] – Avadhanula, Vashist, et al. “On the tightness of an LP relaxation for rational optimization and its applications.” Operations Research Letters 44.5 (2016): 612-617.

The post Auto-placement of ad campaigns using multi-armed bandits appeared first on Facebook Research.

AWS launches free digital training courses to empower business leaders with ML knowledge

Today, we’re pleased to launch Machine Learning Essentials for Business and Technical Decision Makers, a series of three free, on-demand, digital-training courses from AWS Training and Certification. These courses are intended to empower business leaders and technical decision makers with the foundational knowledge needed to begin shaping a machine learning (ML) strategy for their organization, even if they have no prior ML experience. Each 30-minute course includes real-world examples from Amazon’s 20+ years of experience scaling ML within its own operations as well as lessons learned through countless successful customer implementations. These new courses are based on content delivered through the AWS Machine Learning Embark program, an exclusive, hands-on ML accelerator that brings together executives and technologists at an organization to solve business problems with ML via a holistic learning experience. After completing the three courses, business leaders and technical decision makers will be better able to assess their organization’s readiness, identify areas of the business where ML will be the most impactful, and identify concrete next steps.

Last year, Amazon announced that we’re committed to helping 29 million individuals around the world grow their tech skills with free cloud computing skills training by 2025. The new Machine Learning Essentials for Business and Technical Decision Makers series presents one more step in this direction, with three courses:

  • Machine Learning: The Art of the Possible is the first course in the series. Using clear language and specific examples, this course helps you understand the fundamentals of ML, common use cases, and even potential challenges.
  • Planning a Machine Learning Project – the second course – breaks down how you can help your organization plan for an ML project. Starting with the process of assessing whether ML is the right fit for your goals and progressing through the key questions you need to ask during deployment, this course helps you understand important issues, such as data readiness, project timelines, and deployment.
  • Building a Machine Learning Ready Organization – the final course – offers insights into how to prepare your organization to successfully implement ML, from data-strategy evaluation, to culture, to starting an ML pilot, and more.

Democratizing access to free ML training

ML has the potential to transform nearly every industry, but most organizations struggle to adopt and implement ML at scale. Recent Gartner research shows that only 53% of ML projects make it from prototype to production. The most common barriers we see today are business and culture related. For instance, organizations often struggle to identify the right use cases to start their ML journey; this is often exacerbated by a shortage of skilled talent to execute on an organization’s ML ambitions. In fact, as an additional Gartner study shows, “skills of staff” is the number one challenge or barrier to the adoption of artificial intelligence (AI) and ML. Business leaders play a critical role in addressing these challenges by driving a culture of continuous learning and innovation; however, many lack the resources to develop their own knowledge of ML and its use cases.

With the new Machine Learning Essentials for Business and Technical Decision Makers series, we’re making a portion of the AWS Machine Learning Embark curriculum available globally as free, self-paced, digital-training courses.

The AWS Machine Learning Embark program has already helped many organizations harness the power of ML at scale. For example, the Met Office (the UK’s national weather service) is a great example of how organizations can accelerate their team’s ML knowledge using the program. As a research- and science-based organization, the Met Office develops custom weather-forecasting and climate-projection models that rely on very large observational data sets that are constantly being updated. As one of its many data-driven challenges, the Met Office was looking to develop an approach using ML to investigate how the Earth’s biosphere could alter in response to climate change. The Met Office partnered with the Amazon ML Solutions Lab through the AWS Machine Learning Embark program to explore novel approaches to solving this. “We were excited to work with colleagues from the AWS ML Solutions Lab as part of the Embark program,” said Professor Albert Klein-Tank, head of the Met Office’s Hadley Centre for Climate Science and Services. “They provided technical skills and experience that enabled us to explore a complex categorization problem that offers improved insight into how Earth’s biosphere could be affected by climate change. Our climate models generate huge volumes of data, and the ability to extract added value from it is essential for the provision of advice to our government and commercial stakeholders. This demonstration of the application of machine learning techniques to research projects has supported the further development of these skills across the Met Office.”

In addition to giving access to ML Embark content through the Machine Learning Essentials for Business and Technical Decision Makers, we’re also expanding the availability of the full ML Embark program through key strategic AWS Partners, including Slalom Consulting. We’re excited to jointly offer this exclusive program to all enterprise customers looking to jump-start their ML journey.

We invite you to expand your ML knowledge and help lead your organization to innovate with ML. Learn more and get started today.


About the Author

Michelle K. Lee is vice president of the Machine Learning Solutions Lab at AWS.

You Put a Spell on Me: GFN Thursdays Are Rewarding, 15 New Games Added This Week

This GFN Thursday — when GeForce NOW members can learn what new games and updates are streaming from the cloud — we’re adding 15 games to the service, with new content, including NVIDIA RTX and DLSS in a number of games.

Plus, we have a GeForce NOW Reward for Spellbreak from our friends at Proletariat.

Rewards Are Rewarding

One of the benefits of being a GeForce NOW member is gaining access to exclusive rewards. These can include free games, in-game content, discounts and more.

This week, we’re offering the Noble Oasis outfit, a rare outfit from the game Spellbreak that’s exclusive to GeForce NOW members.

Play Spellbreak on GeForce NOW
Unleash your inner battlemage in Spellbreak, streaming on GeForce NOW.

Spellbreak Chapter 2: The Fracture is streaming on GeForce NOW. This massive update, released just last week, includes Dominion, the new 5 vs 5 team capture-point game mode. It also introduced Leagues, the new competitive ranking mode where players work their way up through Bronze, Silver and all the way to Legend. There were new map updates and gameplay changes as well, making it their biggest update yet.

Founders members will have first crack at the reward, starting today. It’s another benefit to thank you for gaming with us. Priority members are next in line and can look for their opportunity to redeem starting on Friday, April 16. Free members gain access on Tuesday, April 20.

It’s first come, first served, so be sure to redeem your reward as soon as you have access!

The Spellbreak in-game reward is the latest benefit for GeForce NOW members; others have included rewards for Discord, ARK: Survival Evolved, Hyperscape, Warface, Warframe and more.

Signing up for GeForce NOW Rewards is simple. Log in to your NVIDIA GeForce NOW account, click “Update Rewards Settings” and check the box.

Updates to Your Catalog

GeForce NOW members are getting updates to a few games this week in the form of new expansions or RTX support.

Path of Exile, the popular free-to-play online action RPG, is getting an expansion in Path of Exile: Ultimatum. It contains the Ultimatum challenge league, eight new Skill and Support Gems, improvements to Vaal Skills, an overhaul to past league reward systems, dozens of new items, and much more.

Meanwhile, three games are adding RTX support with real-time, ray-traced graphics and/or NVIDIA DLSS. Mortal Shell gets the full complement of RTX support, while Medieval Dynasty and Observer System Redux get DLSS support to improve image quality while maintaining framerate.

Let’s Play Today

Nigate Tale on GeForce NOW
Nigate Tale is one of 15 games joining the GeForce NOW library.

Of course, GFN Thursday has even more games in store for members. This week we welcomed Nigate Tale, day-and-date with its Steam release on Tuesday. It’s currently available for 15 percent off through April 18. Members can also look for 14 additional games to join our library. Complete list below:

Excited for the reward? Looking forward to streaming one of this week’s new releases or new content? Let us know on Twitter or in the comments below.

The post You Put a Spell on Me: GFN Thursdays Are Rewarding, 15 New Games Added This Week appeared first on The Official NVIDIA Blog.

Estimating 3D pose for athlete tracking using 2D videos and Amazon SageMaker Studio

In preparation for the upcoming Olympic Games, Intel®, an American multinational corporation and one of the world’s largest technology companies, developed a concept around 3D Athlete Tracking (3DAT). 3DAT is a machine learning (ML) solution to create real-time digital models of athletes in competition in order to increase fan engagement during broadcasts. Intel was looking to leverage this technology for the purpose of coaching and training elite athletes.

Classical computer vision methods for 3D pose reconstruction have proven to be cumbersome for most scientists, given that these methods mostly rely on additional sensors embedded on the athlete and suffer from a lack of 3D labels and models. Although we can put seamless data collection mechanisms in place using regular mobile phones, developing 3D models using 2D video data is a challenge, given the lack of depth information in 2D videos. Intel’s 3DAT team partnered with the Amazon ML Solutions Lab (MLSL) to develop 3D human pose estimation techniques on 2D videos in order to create a lightweight solution for coaches to extract biomechanics and other metrics of their athletes’ performance.

This unique collaboration brought together Intel’s rich history in innovation and Amazon ML Solution Lab’s computer vision expertise to develop a 3D multi-person pose estimation pipeline using 2D videos from standard mobile phones as inputs, with Amazon SageMaker Studio notebooks (SM Studio) as the development environment.

Jonathan Lee, Director of Intel Sports Performance, Olympic Technology Group, says, “The MLSL team did an amazing job listening to our requirements and proposing a solution that would meet our customers’ needs. The team surpassed our expectations, developing a 3D pose estimation pipeline using 2D videos captured with mobile phones in just two weeks. By standardizing our ML workload on Amazon SageMaker, we achieved a remarkable 97% average accuracy on our models.”

This post discusses how we employed 3D pose estimation models and generated 3D outputs on 2D video data collected from Ashton Eaton, a decathlete and two-time Olympic gold medalist from the United States, using different angles. It also presents two computer vision techniques to align the videos captured from different angles, thereby allowing coaches to use a unique set of 3D coordinates across the run.

Challenges

Human pose estimation techniques use computer vision to provide a graphical skeleton of a person detected in a scene. They include coordinates of predefined key points corresponding to human joints, such as the arms, neck, and hips. These coordinates are used to capture the body’s orientation for further analysis, such as pose tracking, posture analysis, and subsequent evaluation. Recent advances in computer vision and deep learning have enabled scientists to explore pose estimation in a 3D space, where the Z-axis provides additional insights compared to 2D pose estimation. These additional insights could be used for more comprehensive visualization and analysis. However, building a 3D pose estimation model from scratch is challenging because it requires imaging data along with 3D labels. Therefore, many researchers employ pretrained 3D pose estimation models.

Data processing pipeline

We designed an end-to-end 3D pose estimation pipeline illustrated in the following diagram using SM Studio, which encompassed several components:

  • Amazon Simple Storage Service (Amazon S3) bucket to host video data
  • Frame extraction module to convert video data to static images
  • Object detection modules to detect bounding boxes of persons in each frame
  • 2D pose estimation for future evaluation purposes
  • 3D pose estimation module to generate 3D coordinates for each person in each frame
  • Evaluation and visualization modules

SM Studio offers a broad range of features facilitating the development process, including easy access to data in Amazon S3, availability of compute capability, software and library availability, and an integrated development environment (IDE) for ML applications.

First, we read the video data from the S3 bucket and extracted the 2D frames in a portable network graphics (PNG) format for frame-level development. We used YOLOv3 object detection to generate a bounding box of each person detected in a frame. For more information, see Benchmarking Training Time for CNN-based Detectors with Apache MXNet.

Next, we passed the frames and corresponding bounding box information to the 3D pose estimation model to generate the key points for evaluation and visualization. We applied a 2D pose estimation technique to the frames, and we generated the key points per frame for development and evaluation. The following sections discuss the details of each module in the 3D pipeline.

Data preprocessing

The first step was to extract frames from a given video utilizing OpenCV as shown in the following figure. We used two counters to keep track of time and frame count respectively, because videos were captured at different frames per second (FPS) rates. We then stored the sequence of images as video_name + second_count + frame_count in PNG format.
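
The following is a minimal OpenCV sketch of this extraction step; the output directory, helper name, and the exact separator used in the file names are assumptions:

import os
import cv2

def extract_frames(video_path, output_dir):
    """Extract frames and name them from video_name, second_count, and frame_count."""
    os.makedirs(output_dir, exist_ok=True)
    video_name = os.path.splitext(os.path.basename(video_path))[0]
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)      # videos may be captured at different FPS rates
    second_count, frame_count = 0, 0     # the two counters described above
    while True:
        success, frame = cap.read()
        if not success:
            break
        out_name = f"{video_name}_{second_count}_{frame_count}.png"
        cv2.imwrite(os.path.join(output_dir, out_name), frame)
        frame_count += 1
        if frame_count >= fps:           # roll over to the next second
            frame_count = 0
            second_count += 1
    cap.release()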

Object (person) detection

We employed YOLOv3 pretrained models based on the Pascal VOC dataset to detect persons in frames. For more information, see Deploying custom models built with Gluon and Apache MXNet on Amazon SageMaker. The YOLOv3 algorithm produced the bounding boxes shown in the following animations (the original images are resized to 910×512 pixels).

We stored the bounding box coordinates in a CSV file, in which the rows indicated the frame index, bounding box information as a list, and their confidence scores.
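
As a rough sketch, the detection step could be run with the pretrained GluonCV YOLOv3 model and the per-frame results written to a CSV file as follows; the frame_paths list (from the extraction step), the confidence threshold, and the exact CSV layout are assumptions:

import csv
from gluoncv import model_zoo, data

detector = model_zoo.get_model('yolo3_darknet53_voc', pretrained=True)
detector.reset_class(["person"], reuse_weights=['person'])   # keep only the person class

with open('bounding_boxes.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['frame_index', 'bounding_box', 'score'])
    for frame_index, frame_path in enumerate(sorted(frame_paths)):
        x, img = data.transforms.presets.yolo.load_test(frame_path, short=512)
        class_ids, scores, bboxes = detector(x)
        for class_id, score, bbox in zip(class_ids[0], scores[0], bboxes[0]):
            conf = float(score.asscalar())
            if conf < 0.5:               # skip low-confidence and padded detections
                continue
            writer.writerow([frame_index, bbox.asnumpy().tolist(), conf])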

2D pose estimation

We selected ResNet-18 V1b as the pretrained pose estimation model, which considers a top-down strategy to estimate human poses within bounding boxes output by the object detection model. We further reset the detector classes to include only humans so that the non-maximum suppression (NMS) process could be performed faster. The Simple Pose network was applied to predict heatmaps for key points (as in the following animation), and the highest values in the heatmaps were mapped to the coordinates on the original images.
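
A short sketch of this top-down 2D step with GluonCV’s pretrained Simple Pose model is shown below; it assumes the image and detector outputs from the previous step are available in memory:

from gluoncv import model_zoo
from gluoncv.data.transforms.pose import detector_to_simple_pose, heatmap_to_coord

pose_net = model_zoo.get_model('simple_pose_resnet18_v1b', pretrained=True)

# img, class_ids, scores, and bboxes come from the YOLOv3 step above
pose_input, upscale_bbox = detector_to_simple_pose(img, class_ids, scores, bboxes)
predicted_heatmap = pose_net(pose_input)                                      # per-key-point heatmaps
pred_coords, confidence = heatmap_to_coord(predicted_heatmap, upscale_bbox)   # map heatmap peaks back to image coordinates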

3D pose estimation

We employed a state-of-the-art 3D pose estimation algorithm, a camera distance-aware top-down method for multi-person 3D pose estimation from a single RGB frame, referred to as 3DMPPE (Moon et al.). This algorithm consisted of two major phases:

  • RootNet – Estimates the camera-centered coordinates of a person’s root in a cropped frame
  • PoseNet – Uses a top-down approach to predict the relative 3D pose coordinates in the cropped image

Next, we used the bounding box information to project the 3D coordinates back to the original space. 3DMPPE offered two pretrained models trained using Human36 and MuCo3D datasets (for more information, see the GitHub repo), which included 17 and 21 key points, respectively, as illustrated in the following animations. We used the 3D pose coordinates predicted by the two pretrained models for visualization and evaluation purposes.
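
Conceptually, the final step combines the two outputs: the camera-centered root position from RootNet is added to the root-relative joint coordinates from PoseNet to obtain absolute 3D key points. A minimal sketch, with assumed array shapes:

import numpy as np

def to_absolute_3d(root_xyz, relative_joints_xyz):
    """root_xyz: (3,) camera-centered root position from RootNet.
    relative_joints_xyz: (num_joints, 3) root-relative coordinates from PoseNet.
    Returns camera-centered 3D coordinates for every joint."""
    return relative_joints_xyz + root_xyz[np.newaxis, :]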

Evaluation

To evaluate the 2D and 3D pose estimation models’ performance, we used the 2D pose (x,y) and 3D pose (x,y,z) coordinates for each joint generated for every frame in a given video. The number of key points varied based on the datasets; for instance, the Leeds Sports Pose Dataset (LSP) includes 14, whereas the MPII Human Pose dataset, a state-of-the-art benchmark for evaluating articulated human pose estimation, includes 16 and Human3.6M includes 17. We used two metrics common to both 2D and 3D pose estimation, described in the following sections. In our implementation, our default key points dictionary followed the COCO detection dataset, which has 17 key points (see the following image), and the order is defined as follows:

KEY_POINTS = {
    0: "nose",
    1: "left_eye",
    2: "right_eye",
    3: "left_ear",
    4: "right_ear",
    5: "left_shoulder",
    6: "right_shoulder",
    7: "left_elbow",
    8: "right_elbow",
    9: "left_wrist",
    10: "right_wrist",
    11: "left_hip",
    12: "right_hip",
    13: "left_knee",
    14: "right_knee",
    15: "left_ankle",
    16: "right_ankle"
}

Mean per joint position error

Mean per joint position error (MPJPE) is the Euclidean distance between ground truth and a joint prediction. As MPJPE measures the error or loss distance, lower values indicate greater precision.

We use the following pseudo code:

  • Let G denote ground_truth_joint and preprocess G by:
    • Replacing the null entries in G with [0,0] (2D) or [0,0,0] (3D)
    • Using a Boolean matrix B to store the location of null entries
  • Let P denote predicted_joint matrix, and align G and P by frame index by inserting a zero vector if any frame doesn’t have results or is unlabeled
  • Compute element-wise Euclidean distance between G and P, and let D denote distance matrix
  • Replace Di,j with 0 if Bi,j = 1 (i.e., the ground-truth entry is null)
  • Mean per joint position error is the mean value of each column of D, computed over the entries where Di,j ≠ 0
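
A NumPy sketch of this computation, assuming G, P, and the Boolean null mask B are already aligned arrays, follows:

import numpy as np

def mpjpe(G, P, B):
    """Mean per joint position error.
    G, P: (num_frames, num_joints, dims) ground-truth and predicted coordinates.
    B: (num_frames, num_joints) Boolean mask, True where the ground truth is null."""
    D = np.linalg.norm(G - P, axis=-1)        # element-wise Euclidean distance
    D[B] = 0.0                                # zero out null entries
    valid_counts = np.maximum((~B).sum(axis=0), 1)
    return D.sum(axis=0) / valid_counts       # mean error per joint over non-null frames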

The following figure visualizes an example of a video’s per joint error, a matrix with dimension m*n, where m denotes the number of frames in a video and n denotes the number of joints (key points). The matrix shows an example of a heatmap of per joint position error on the left and the mean per joint position error on the right.

Percentage of correct key points

The percentage of correct key points (PCK) represents a pose evaluation metric where a detected joint is considered correct if the distance between the predicted and actual joint is within a certain threshold; this threshold may vary, which leads to a few different variations of metrics. Three variations are commonly used:

  • PCKh@0.5, which is when the threshold is defined as 0.5 * head bone link
  • PCK@0.2, which is when the distance between the predicted and actual joint is < 0.2 * torso diameter
  • 150mm as a hard threshold

In our solution, we used PCKh@0.5 because our ground truth XML data contains the head bounding box, which we can use to compute the head-bone link. To the best of our knowledge, no existing package contains an easy-to-use implementation for this metric; therefore, we implemented the metric in-house.

Pseudo code

We used the following pseudo code:

  • Let G denote ground-truth joint and preprocess G by:
    • Replacing the null entries in G with [0,0] (2D) or [0,0,0] (3D)
    • Using a Boolean matrix B to store the location of null entries
  • For each frame Fi, use its bbox Bi = (xmin, ymin, xmax, ymax) to compute each frame’s corresponding head-bone link Hi, where Hi = √((xmax - xmin)² + (ymax - ymin)²)
  • Let P denote predicted joint matrix and align G and P by frame index; insert a zero tensor if any frame is missing
  • Compute the element-wise 2-norm error between G and P; let E denote error matrix, where Ei,j=||Gi,j-Pi,j||
  • Compute a scaled matrix S=H*I, where I represents an identity matrix with the same dimension as E
  • To avoid division by 0, replace Si,j with 0.000001 if Bi,j=1
  • Compute the scaled error matrix SE, where SEi,j = Ei,j / Si,j
  • Filter out SE with threshold = 0.5, and let C denote the counter matrix, where Ci,j=1 if SEi,j<0.5 and Ci,j=0 elsewise
  • Count how many 1’s in C*,j as c⃗ and count how many 0’s in B*,j as b⃗
  • PCKh@0.5=mean(c⃗/b⃗)
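
The following NumPy sketch mirrors that pseudo code under the same assumptions (aligned G and P, null mask B, and per-frame head-bone links H):

import numpy as np

def pckh(G, P, B, H, threshold=0.5):
    """Percentage of correct key points at threshold * head-bone link.
    G, P: (num_frames, num_joints, dims); B: Boolean null mask (num_frames, num_joints);
    H: (num_frames,) head-bone link length per frame."""
    E = np.linalg.norm(G - P, axis=-1)                 # per-joint Euclidean error
    S = np.repeat(H[:, np.newaxis], G.shape[1], axis=1)
    S[B] = 1e-6                                        # the trap: null entries fail the threshold
    scaled_error = E / S
    correct = (scaled_error < threshold) & (~B)        # count only labeled joints
    labeled = np.maximum((~B).sum(axis=0), 1)
    return (correct.sum(axis=0) / labeled).mean()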

In the sixth step (replace Si,j with 0.000001 if Bi,j = 1), we set a trap for the scaled error matrix by replacing 0 entries with 0.000001. Dividing any number by such a tiny number produces a hugely amplified value. Because we later used 0.5 as the threshold to filter out incorrect predictions, the null entries were excluded from the correct predictions because their scaled errors were far too large. We subsequently counted only the non-null entries in the Boolean matrix. In this way, we also excluded the null entries from the whole dataset. We used this engineering trick in the implementation to filter out null entries arising either from unlabeled key points in the ground truth or from frames with no person detected.

Video alignment

We considered two different camera configurations to capture video data from athletes, namely the line and box setups. The line setup consists of four cameras placed along a line while the box setup consists of four cameras placed in each corner of a rectangle. The cameras were synchronized in the line configuration and then lined up at a predefined distance from each other, utilizing slightly overlapping camera angles. The objective of the video alignment in the line configuration was to identify the timestamps connecting consecutive cameras to remove repeated and empty frames. We implemented two approaches based on object detection and cross-correlation of optical flows.

Object detection algorithm

We used the object detection results in this approach, including persons’ bounding boxes from the previous steps. The object detection techniques produced a probability (score) per person in each frame. Therefore, plotting the scores in a video enabled us to find the frame where the first person appeared or disappeared. The reference frame from the box configuration was extracted from each video, and all cameras were then synchronized based on the first frame’s references. In the line configuration, both the start and end timestamps were extracted, and a rule-based algorithm was implemented to connect and align consecutive videos, as illustrated in the following images.

The top videos in the following figure show the original videos in the line configuration. Underneath that are person detection scores. The next rows show a threshold of 0.75 applied to the scores, and appropriate start and end timestamps are extracted. The bottom row shows aligned videos for further analysis.
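
A simple sketch of that rule-based step, assuming a per-frame person-detection score array is available for each camera, could look like this:

import numpy as np

def clip_boundaries(person_scores, threshold=0.75):
    """Return the first and last frame indices where a person is detected above the threshold."""
    above = np.where(np.asarray(person_scores) >= threshold)[0]
    if above.size == 0:
        return None, None     # the athlete never appears in this camera's view
    return int(above[0]), int(above[-1])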

Moment of snap

We introduced the moment of snap (MOS) – a well-known alignment approach – which indicates when an event or play begins. We wanted to determine the frame number when an athlete enters or leaves the scene. Typically, relatively little movement occurs on the running field before the start and after the end of the snap, whereas relatively substantial movement occurs when the athlete is running. Therefore, intuitively, we could find the MOS frame by finding the video frames with relatively large differences in the video’s movement before and after the frame. To this end, we utilized density optical flow, a standard measure of movement in the video, to estimate the MOS. First, given a video, we computed optical flow for every two consecutive frames. The following videos present a visualization of dense optical flow on the horizontal axis.

We then measured cross-correlation between two consecutive frames’ optical flows, because cross-correlation measures the difference between them. For each angle’s camera-captured video, we repeated the algorithm to find its MOS. Finally, we used the MOS frame as the key frame for aligning videos from different angles. The following video details these steps.
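
A condensed sketch of the MOS estimate using OpenCV’s dense (Farneback) optical flow appears below; for simplicity it uses the change in mean flow magnitude between consecutive frames as the movement-difference measure, which is an assumption about the exact correlation used:

import cv2
import numpy as np

def estimate_mos(video_path):
    """Return the frame index with the largest jump in motion, used as the MOS key frame."""
    cap = cv2.VideoCapture(video_path)
    prev_gray, magnitudes = None, []
    while True:
        success, frame = cap.read()
        if not success:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev_gray is not None:
            flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                                0.5, 3, 15, 3, 5, 1.2, 0)
            magnitudes.append(np.linalg.norm(flow, axis=-1).mean())
        prev_gray = gray
    cap.release()
    jumps = np.abs(np.diff(magnitudes))       # change in movement between consecutive frames
    return int(np.argmax(jumps)) + 1          # +1 because diff shortens the series by one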

Conclusion

The technical objective of the work demonstrated in this post was to develop a deep-learning based solution producing 3D pose estimation coordinates using 2D videos. We employed a camera distance-aware technique with a top-down approach to achieve 3D multi-person pose estimation. Further, using object detection, cross-correlation, and optical flow algorithms, we aligned the videos captured from different angles.

This work has enabled coaches to analyze 3D pose estimation of athletes over time to measure biomechanics metrics, such as velocity, and monitor the athletes’ performance using quantitative and qualitative methods.

This post demonstrated a simplified process for extracting 3D poses in real-world scenarios, which can be scaled to coaching in other sports such as swimming or team sports.

If you would like help with accelerating the use of ML in your products and services, please contact the Amazon ML Solutions Lab program.

References

Moon, Gyeongsik, Ju Yong Chang, and Kyoung Mu Lee. “Camera distance-aware top-down approach for 3d multi-person pose estimation from a single RGB image.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 10133-10142. 2019.


About the Authors

Saman Sarraf is a Data Scientist at the Amazon ML Solutions Lab. His background is in applied machine learning including deep learning, computer vision, and time series data prediction.

Amery Cong is an Algorithms Engineer at Intel, where he develops machine learning and computer vision technologies to drive biomechanical analyses at the Olympic Games. He is interested in quantifying human physiology with AI, especially in a sports performance context.

Ashton Eaton is a Product Development Engineer at Intel, where he helps design and test technologies aimed at advancing sport performance. He works with customers and the engineering team to identify and develop products that serve customer needs. He is interested in applying science and technology to human performance.

Jonathan Lee is the Director of Sports Performance Technology, Olympic Technology Group at Intel. He studied the application of machine learning to health as an undergrad at UCLA and during his graduate work at University of Oxford. His career has focused on algorithm and sensor development for health and human performance. He now leads the 3D Athlete Tracking project at Intel.

Nelson Leung is the Platform Architect in the Sports Performance CoE at Intel, where he defines end-to-end architecture for cutting-edge products that enhance athlete performance. He also leads the implementation, deployment and productization of these machine learning solutions at scale to different Intel partners.

Suchitra Sathyanarayana is a manager at the Amazon ML Solutions Lab, where she helps AWS customers across different industry verticals accelerate their AI and cloud adoption. She holds a PhD in Computer Vision from Nanyang Technological University, Singapore.

Wenzhen Zhu is a data scientist with the Amazon ML Solution Lab team at Amazon Web Services. She leverages Machine Learning and Deep Learning to solve diverse problems across industries for AWS customers.

Implement checkpointing with TensorFlow for Amazon SageMaker Managed Spot Training

Customers often ask us how they can lower their costs when conducting deep learning training on AWS. Training deep learning models with libraries such as TensorFlow, PyTorch, and Apache MXNet usually requires access to GPU instances, which are AWS instance types that provide access to NVIDIA GPUs with thousands of compute cores. GPU instance types can be more expensive than other Amazon Elastic Compute Cloud (Amazon EC2) instance types, so optimizing usage of these types of instances is a priority for customers as well as an overall best practice for well-architected workloads.

Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to prepare, build, train, and deploy machine learning (ML) models quickly. SageMaker removes the heavy lifting from each step of the ML process to make it easier to develop high-quality models. SageMaker provides all the components used for ML in a single toolset so models get to production faster with less effort and at lower cost.

Amazon EC2 Spot Instances offer spare compute capacity available in the AWS Cloud at steep discounts compared to On-Demand prices. Amazon EC2 can interrupt Spot Instances with 2 minutes of notification when the service needs the capacity back. You can use Spot Instances for various fault-tolerant and flexible applications. Some examples are analytics, containerized workloads, stateless web servers, CI/CD, training and inference of ML models, and other test and development workloads. Spot Instance pricing makes high-performance GPUs much more affordable for deep learning researchers and developers who run training jobs.

One of the key benefits of SageMaker is that it frees you of any infrastructure management, no matter the scale you’re working at. For example, instead of having to set up and manage complex training clusters, you simply tell SageMaker which EC2 instance type to use and how many you need. The appropriate instances are then created on-demand, configured, and stopped automatically when the training job is complete. As SageMaker customers have quickly understood, this means that they pay only for what they use. Building, training, and deploying ML models are billed by the second, with no minimum fees, and no upfront commitments. SageMaker can also use EC2 Spot Instances for training jobs, which optimize the cost of the compute used for training deep-learning models.

In this post, we walk through the process of training a TensorFlow model with Managed Spot Training in SageMaker. We walk through the steps required to set up and run a training job that saves training progress in Amazon Simple Storage Service (Amazon S3) and restarts the training job from the last checkpoint if an EC2 instance is interrupted. This allows our training jobs to continue from the same point before the interruption occurred. Finally, we see the savings that we achieved by running our training job on Spot Instances using Managed Spot Training in SageMaker.

Managed Spot Training in SageMaker

SageMaker makes it easy to train ML models using managed EC2 Spot Instances. Managed Spot Training can reduce the cost of training models by up to 90% compared with On-Demand Instances. With only a few lines of code, SageMaker can manage Spot interruptions on your behalf.

Managed Spot Training uses EC2 Spot Instances to run training jobs instead of On-Demand Instances. You can specify which training jobs use Spot Instances and a stopping condition that specifies how long SageMaker waits for a training job to complete using EC2 Spot Instances. Metrics and logs generated during training runs are available in Amazon CloudWatch.

Managed Spot Training is available in all training configurations:

  • All instance types supported by SageMaker
  • All models: built-in algorithms, built-in frameworks, and custom models
  • All configurations: single instance training and distributed training

Interruptions and checkpointing

There’s an important difference when working with Managed Spot Training. Unlike On-Demand Instances that are expected to be available until a training job is complete, Spot Instances may be reclaimed any time Amazon EC2 needs the capacity back.

SageMaker, as a fully managed service, handles the lifecycle of Spot Instances automatically. It interrupts the training job, attempts to obtain Spot Instances again, and either restarts or resumes the training job.

To avoid restarting a training job from scratch if it’s interrupted, we strongly recommend that you implement checkpointing, a technique that saves the model in training at periodic intervals. When you use checkpointing, you can resume a training job from a well-defined point in time, continuing from the most recent partially trained model, and avoiding starting from the beginning and wasting compute time and money.

To implement checkpointing, we have to make a distinction on the type of algorithm you use:

  • Built-in frameworks and custom models – You have full control over the training code. Just make sure that you use the appropriate APIs to save model checkpoints to Amazon S3 regularly, using the location you defined in the CheckpointConfig parameter and passed to the SageMaker Estimator. TensorFlow uses checkpoints by default. For other frameworks, see our sample notebooks and Use Machine Learning Frameworks, Python, and R with Amazon SageMaker.
  • Built-in algorithms – Computer vision algorithms support checkpointing (object detection, semantic segmentation, and image classification). Because they tend to train on large datasets and run for longer than other algorithms, they have a higher likelihood of being interrupted. The XGBoost built-in algorithm also supports checkpointing.

TensorFlow image classification model with Managed Spot Training

To demonstrate Managed Spot Training and checkpointing, I guide you through the steps needed to train a TensorFlow image classification model. To make sure that your training scripts can take advantage of SageMaker Managed Spot Training, we need to implement the following:

  • Frequent saving of checkpoints, in this case saving a checkpoint at the end of each epoch
  • The ability to resume training from checkpoints if checkpoints exist

Save checkpoints

SageMaker automatically backs up and syncs checkpoint files generated by your training script to Amazon S3. Therefore, you need to make sure that your training script saves checkpoints to a local checkpoint directory on the Docker container that’s running the training. The default location to save the checkpoint files is /opt/ml/checkpoints, and SageMaker syncs these files to the specified S3 bucket. Both local and S3 checkpoint locations are customizable.

Saving checkpoints using Keras is very easy. You need to create an instance of the ModelCheckpoint callback class and register it with the model by passing it to the fit() function.

You can find the full implementation code on the GitHub repo.

The following is the relevant code:

# Imports needed by this snippet (logging and the Keras ModelCheckpoint callback)
import logging
from tensorflow.keras.callbacks import ModelCheckpoint

callbacks = []
callbacks.append(ModelCheckpoint(args.checkpoint_path + '/checkpoint-{epoch}.h5'))

logging.info("Starting training from epoch: {}".format(initial_epoch_number+1))

model.fit(x=train_dataset[0],
          y=train_dataset[1],
          # dividing by batch size completes the truncated expressions (assumed; see the full script on the GitHub repo)
          steps_per_epoch=(num_examples_per_epoch('train') // args.batch_size),
          epochs=args.epochs,
          initial_epoch=initial_epoch_number,
          validation_data=validation_dataset,
          validation_steps=(num_examples_per_epoch('validation') // args.batch_size),
          callbacks=callbacks)

The names of the checkpoint files saved are as follows: checkpoint-1.h5, checkpoint-2.h5, checkpoint-3.h5, and so on.

For this post, I’m passing initial_epoch, which you normally don’t set. This lets us resume training from a certain epoch number and comes in handy when you already have checkpoint files.

The checkpoint path is configurable because we get it from args.checkpoint_path in the main function:

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    
    ...
    parser.add_argument("--checkpoint-path",type=str,default="/opt/ml/checkpoints",help="Path where checkpoints will be saved.")
    ...
    
    args = parser.parse_args()

Resume training from checkpoint files

When Spot capacity becomes available again after Spot interruption, SageMaker launches a new Spot Instance, instantiates a Docker container with your training script, copies your dataset and checkpoint files from Amazon S3 to the container, and runs your training scripts.

Your script needs to implement resuming from checkpoint files; otherwise, it restarts training from scratch. You can implement a load_model_from_checkpoints function as shown in the following code. It takes the local checkpoint path (/opt/ml/checkpoints by default) and returns a model loaded from the latest checkpoint, along with the associated epoch number.

You can find the full implementation code on the GitHub repo.

The following is the relevant code:

def load_model_from_checkpoints(checkpoint_path):
    checkpoint_files = [file for file in os.listdir(checkpoint_path) if file.endswith('.h5')]
    logging.info('--------------------------------------------')
    logging.info("Available checkpoint files: {}".format(checkpoint_files))
    # Extract the epoch number from file names such as checkpoint-3.h5
    epoch_numbers = [int(re.search(r'(\d+)(?=\.h5)', file).group()) for file in checkpoint_files]

    max_epoch_number = max(epoch_numbers)
    max_epoch_index = epoch_numbers.index(max_epoch_number)
    max_epoch_filename = checkpoint_files[max_epoch_index]

    logging.info('Latest epoch checkpoint file name: {}'.format(max_epoch_filename))
    logging.info('Resuming training from epoch: {}'.format(max_epoch_number + 1))
    logging.info('---------------------------------------------')

    resumed_model_from_checkpoints = load_model(f'{checkpoint_path}/{max_epoch_filename}')
    return resumed_model_from_checkpoints, max_epoch_number
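
In the training script’s main flow, this function is only useful when checkpoint files are actually present. A minimal sketch of that glue logic (build_model is a hypothetical helper that defines and compiles the Keras model) could look like this:

import os

if os.path.isdir(args.checkpoint_path) and any(
        f.endswith('.h5') for f in os.listdir(args.checkpoint_path)):
    # Checkpoints were copied in from S3: resume from the latest one.
    model, initial_epoch_number = load_model_from_checkpoints(args.checkpoint_path)
else:
    # No checkpoints yet: start training from scratch.
    model = build_model()  # hypothetical helper that builds and compiles the model
    initial_epoch_number = 0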

Managed Spot Training with a TensorFlow estimator

You can launch SageMaker training jobs from your laptop, desktop, EC2 instance, or SageMaker notebook instances. Make sure you have the SageMaker Python SDK installed and the right user permissions to run SageMaker training jobs.

To run a Managed Spot Training job, you need to specify a few additional options in your standard SageMaker Estimator function call:

  • use_spot_instances – Specifies whether to use SageMaker Managed Spot Training. If enabled, you should also set the max_wait argument.
  • max_wait – Timeout in seconds waiting for Spot training instances (default: None). After this amount of time, SageMaker stops waiting for Spot Instances to become available or for the training job to finish. I’m willing to wait double the time an On-Demand training run takes, so I set it to 1,200 seconds (20 minutes).
  • max_run – Timeout in seconds for training (default: 24 * 60 * 60). After this amount of time, SageMaker stops the job regardless of its current status. From previous runs, I know the training job finishes in about 4 minutes, so I set it to 600 seconds.
  • checkpoint_s3_uri – The S3 URI where checkpoints generated during training (if any) are persisted.

You can find the full implementation code on the GitHub repo.

The following is the relevant code:

use_spot_instances = True
max_run = 600
max_wait = 1200

checkpoint_suffix = str(uuid.uuid4())[:8]
checkpoint_s3_uri = 's3://{}/checkpoint-{}'.format(bucket, checkpoint_suffix)
hyperparameters = {'epochs': 5, 'batch-size': 256}

spot_estimator = TensorFlow(entry_point='cifar10_keras_main.py',
                       source_dir='source_dir',
                       metric_definitions=metric_definitions,
                       hyperparameters=hyperparameters,
                       role=role,
                       framework_version='1.15.2',
                       py_version='py3',
                       instance_count=1,
                       instance_type='ml.p3.2xlarge',
                       base_job_name='cifar10-tf-spot-1st-run',
                       tags=tags,
                       checkpoint_s3_uri=checkpoint_s3_uri,
                       use_spot_instances=use_spot_instances,
                       max_run=max_run,
                       max_wait=max_wait)

Those are all the changes you need to make to significantly lower your cost of ML training.

To monitor your training job and view savings, you can look at the logs in your Jupyter notebook.

Towards the end of the job, you should see two lines of output:

  • Training seconds: X – The actual compute time your training job spent
  • Billable seconds: Y – The time you are billed for after Spot discounting is applied

If you enabled use_spot_instances, you should see a notable difference between X and Y, signifying the cost savings you get for using Managed Spot Training. This is reflected in an additional line:

  • Managed Spot Training savings – Calculated as (1-Y/X)*100 %

The following screenshot shows the output logs for our Jupyter notebook:

When the training is complete, you can also navigate to the Training jobs page on the SageMaker console and choose your training job to see how much you saved.

For this example, the TensorFlow training job ran for 144 seconds, but I was billed for only 43 seconds, so for 5 epochs of training on an ml.p3.2xlarge GPU instance, I saved 70% on training costs!
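
As a quick sanity check, the savings figure follows directly from the formula above when you plug in the numbers from this run:

# Savings = (1 - billable / actual) * 100, using the numbers reported above.
training_seconds = 144
billable_seconds = 43
savings = (1 - billable_seconds / training_seconds) * 100
print('Managed Spot Training savings: {:.0f}%'.format(savings))  # prints 70%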

Confirm that checkpointing and recovery work when your training job is interrupted

How can you test if your training job will resume properly if a Spot Interruption occurs?

If you’re familiar with running EC2 Spot Instances, you know that you can simulate your application behavior during a Spot Interruption by following the recommended best practices. However, because SageMaker is a managed service, and manages the lifecycle of EC2 instances on your behalf, you can’t stop a SageMaker training instance manually. Your only option is to stop the entire training job.
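
If you do want to stop an entire training job programmatically rather than from the console, the boto3 SageMaker client provides that. The following is a minimal sketch; the job name is a placeholder:

import boto3

sm_client = boto3.client('sagemaker')
# Stops the entire training job. Checkpoints that were already synced to
# Amazon S3 remain available for a later job to resume from.
sm_client.stop_training_job(TrainingJobName='cifar10-tf-spot-1st-run-2021-04-01-12-00-00-000')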

You can still test your code’s behavior when resuming an incomplete training job by running a shorter training job and then using the checkpoints it produced as inputs to a longer training job. To do this, first run a SageMaker Managed Spot Training job for a specified number of epochs, as described in the previous section. Let’s say you run training for five epochs. SageMaker backs up your checkpoint files to the specified S3 location for those five epochs.

You can navigate to the training job details page on the SageMaker console to see the checkpoint configuration S3 output path.

Choose the S3 output path link to navigate to the checkpointing S3 bucket, and verify that five checkpoint files are available there.
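
If you prefer to verify from code instead of the console, a quick sketch with boto3 (reusing the bucket and checkpoint_suffix variables defined in the notebook) might look like the following:

import boto3

s3 = boto3.client('s3')
# List the checkpoint files the first training job synced to S3.
response = s3.list_objects_v2(Bucket=bucket, Prefix='checkpoint-{}/'.format(checkpoint_suffix))
for obj in response.get('Contents', []):
    print(obj['Key'])  # expect five keys ending in checkpoint-1.h5 ... checkpoint-5.h5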

Now run a second training job with 10 epochs, providing the first job’s checkpoint location in checkpoint_s3_uri so that the second job can use those checkpoints as inputs.

You can find the full implementation code in the GitHub repo.

The following is the relevant code:

hyperparameters = {'epochs': 10, 'batch-size': 256}

spot_estimator = TensorFlow(entry_point='cifar10_keras_main.py',
                       source_dir='source_dir',
                       metric_definitions=metric_definitions,
                       hyperparameters=hyperparameters,
                       role=role,
                       framework_version='1.15.2',
                       py_version='py3',
                       instance_count=1,
                       instance_type='ml.p3.2xlarge',
                       base_job_name='cifar10-tf-spot-2nd-run',
                       tags=tags,
                       checkpoint_s3_uri=checkpoint_s3_uri,
                       use_spot_instances=use_spot_instances,
                       max_run=max_run,
                       max_wait=max_wait)

By providing checkpoint_s3_uri with your previous job’s checkpoints, you’re telling SageMaker to copy those checkpoints to your new job’s container. Your training script then loads the latest checkpoint and resumes training. The following screenshot shows that the training resumes from the sixth epoch.

To confirm that all checkpoint files were created, navigate to the same S3 bucket. This time you can see that 10 checkpoint files are available.

The key difference between simulating an interruption this way and how SageMaker manages interruptions is that you’re creating a new training job to test your code. In the case of Spot Interruptions, SageMaker simply resumes the existing interrupted job.

Implement checkpointing with PyTorch, MXNet, and XGBoost built-in and script mode

The steps shown in the TensorFlow example are essentially the same for PyTorch and MXNet; only the code for saving checkpoints and loading them to resume training differs.
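
As one illustration, a minimal PyTorch sketch of the save and resume logic (model and optimizer are assumed to be defined elsewhere; the file names follow the same checkpoint-<epoch> pattern used above) could look like this:

import os
import torch

CHECKPOINT_DIR = '/opt/ml/checkpoints'  # same local path that SageMaker syncs to S3

def save_checkpoint(model, optimizer, epoch):
    # Persist enough state to resume training exactly where it left off.
    os.makedirs(CHECKPOINT_DIR, exist_ok=True)
    torch.save({'epoch': epoch,
                'model_state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict()},
               os.path.join(CHECKPOINT_DIR, 'checkpoint-{}.pt'.format(epoch)))

def load_latest_checkpoint(model, optimizer):
    # Return the epoch to resume from (0 if no checkpoint files exist yet).
    if not os.path.isdir(CHECKPOINT_DIR):
        return 0
    files = [f for f in os.listdir(CHECKPOINT_DIR) if f.endswith('.pt')]
    if not files:
        return 0
    latest = max(files, key=lambda f: int(f.split('-')[1].split('.')[0]))
    state = torch.load(os.path.join(CHECKPOINT_DIR, latest))
    model.load_state_dict(state['model_state_dict'])
    optimizer.load_state_dict(state['optimizer_state_dict'])
    return state['epoch'] + 1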

You can see full examples for TensorFlow 1.x/2.x, PyTorch, MXNet, and XGBoost built-in and script mode in the GitHub repo.

Conclusions and next steps

In this post, we trained a TensorFlow image classification model using SageMaker Managed Spot Training. We saved checkpoints locally in the container and loaded them to resume training when they existed. SageMaker takes care of synchronizing the checkpoints between Amazon S3 and the training container. We simulated a Spot interruption by running Managed Spot Training with 5 epochs, and then ran a second Managed Spot Training job with 10 epochs, configured to use the previous job’s checkpoint location in S3. As a result, the second job loaded the checkpoints stored in Amazon S3 and resumed from the sixth epoch.

It’s easy to save on training costs with SageMaker Managed Spot Training. With minimal code changes, you too can save over 70% when training your deep-learning models.

As a next step, try to modify your own TensorFlow, PyTorch, or MXNet script to implement checkpointing, and then run a Managed Spot Training in SageMaker to see that the checkpoint files are created in the S3 bucket you specified. Let us know how you do in the comments!


About the Author

Eitan Sela is a Solutions Architect with Amazon Web Services. He works with AWS customers to provide guidance and technical assistance, helping them improve the value of their solutions when using AWS. Eitan also helps customers build and operate machine learning solutions on AWS. In his spare time, Eitan enjoys jogging and reading the latest machine learning articles.

Read More

Presenting the iGibson Challenge on Interactive and Social Navigation

Posted by Anthony Francis, Software Engineer and Alexander Toshev, Staff Research Scientist, Google Research

Computer vision has significantly advanced over the past decade thanks to large-scale benchmarks, such as ImageNet for image classification or COCO for object detection, which provide vast datasets and criteria for evaluating models. However, these traditional benchmarks evaluate passive tasks in which the emphasis is on perception alone, whereas more recent computer vision research has tackled active tasks, which require both perception and action (often called “embodied AI”).

The First Embodied AI Workshop, co-organized by Google at CVPR 2020, hosted several benchmark challenges for active tasks, including the Stanford and Google organized Sim2Real Challenge with iGibson, which provided a real-world setup to test navigation policies trained in photo-realistic simulation environments. An open-source setup in the challenge enabled the community to train policies in simulation, which could then be run in repeatable real world navigation experiments, enabling the evaluation of the “sim-to-real gap” — the difference between simulation and the real world. Many research teams submitted solutions during the pandemic, which were run safely by challenge organizers on real robots, with winners presenting their results virtually at the workshop.

This year, Stanford and Google are proud to announce a new version of the iGibson Challenge on Interactive and Social Navigation, one of the 10 active visual challenges affiliated with the Second Embodied AI Workshop at CVPR 2021. This year’s Embodied AI Workshop is co-organized by Google and nine other research organizations, and explores issues such as simulation, sim-to-real transfer, visual navigation, semantic mapping and change detection, object rearrangement and restoration, auditory navigation, and following instructions for navigation and interaction tasks. In addition, this year’s interactive and social iGibson challenge explores interactive navigation and social navigation — how robots can learn to interact with people and objects in their environments — by combining the iGibson simulator, the Google Scanned Objects Dataset, and simulated pedestrians within realistic human environments.

New Challenges in Navigation
Active perception tasks are challenging, as they require both perception and actions in response. For example, point navigation involves navigating through mapped space, such as driving robots over kilometers in human-friendly buildings, while recognizing and avoiding obstacles. Similarly, object navigation involves looking for objects in buildings, requiring domain invariant representations and object search behaviors. Additionally, visual language instruction navigation involves navigating through buildings based on visual images and commands in natural language. These problems become even harder in a real-world environment, where robots must be able to handle a variety of physical and social interactions that are much more dynamic and challenging to solve. In this year’s iGibson Challenge, we focus on two of those settings:

  • Interactive Navigation: In a cluttered environment, an agent navigating to a goal must physically interact with objects to succeed. For example, an agent should recognize that a shoe can be pushed aside, but that an end table should not be moved and a sofa cannot be moved.
  • Social Navigation: In a crowded environment in which people are also moving about, an agent navigating to a goal must move politely around the people present with as little disruption as possible.

New Features of the iGibson 2021 Dataset
To facilitate research into techniques that address these problems, the iGibson Challenge 2021 dataset provides simulated interactive scenes for training. The dataset includes eight fully interactive scenes derived from real-world apartments, and another seven scenes held back for testing and evaluation.

iGibson provides eight fully interactive scenes derived from real-world apartments.

To enable interactive navigation, these scenes are populated with small objects drawn from the Google Scanned Objects Dataset, a dataset of common household objects scanned in 3D for use in robot simulation and computer vision research, licensed under a Creative Commons license to give researchers the freedom to use them in their research.

The Google Scanned Objects Dataset contains 3D models of many common objects.

The challenge is implemented in Stanford’s open-source iGibson simulation platform, a fast, interactive, photorealistic robotic simulator with physics based on Bullet. For this year’s challenge, iGibson has been expanded with fully interactive environments and pedestrian behaviors based on the ORCA crowd simulation algorithm.

iGibson environments include ORCA crowd simulations and movable objects.

Participating in the Challenge
The iGibson Challenge has launched and its leaderboard is open in the Dev phase, in which participants are encouraged to submit robotic control policies to the development leaderboard, where they will be tested on the Interactive and Social Navigation challenges using our holdout dataset. The Test phase opens for teams to submit final solutions on May 16th and closes on May 31st, with the winner demo scheduled for June 20th, 2021. For more details on participating, please check out the iGibson Challenge Page.

Acknowledgements
We’d like to thank our colleagues at the Stanford Vision and Learning Lab (SVL) for working with us to advance the state of interactive and social robot navigation, including Chengshu Li, Claudia Pérez D’Arpino, Fei Xia, Jaewoo Jang, Roberto Martin-Martin and Silvio Savarese. At Google, we would like to thank Aleksandra Faust, Anelia Angelova, Carolina Parada, Edward Lee, Jie Tan, Krista Reyman and the rest of our collaborators on mobile robotics. We would also like to thank our co-organizers on the Embodied AI Workshop, including AI2, Facebook, Georgia Tech, Intel, MIT, SFU, Stanford, UC Berkeley, and University of Washington.

Read More