Securing Amazon SageMaker Studio internet traffic using AWS Network Firewall

Amazon SageMaker Studio is a web-based fully integrated development environment (IDE) where you can perform end-to-end machine learning (ML) development to prepare data and build, train, and deploy models.

Like other AWS services, Studio supports a rich set of security-related features that allow you to build highly secure and compliant environments.

One of these fundamental security features allows you to launch Studio in your own Amazon Virtual Private Cloud (Amazon VPC). This allows you to control, monitor, and inspect network traffic within and outside your VPC using standard AWS networking and security capabilities. For more information, see Securing Amazon SageMaker Studio connectivity using a private VPC.

Customers in regulated industries, such as financial services, often don’t allow any internet access in ML environments. They often use only VPC endpoints for AWS services, and connect only to private source code repositories in which all libraries have been vetted both in terms of security and licensing. Other customers want to provide internet access, but with controls such as domain name or URL filtering, access limited to specific public repositories and websites, packet inspection, or other network traffic-related security controls. For these cases, a deployment based on AWS Network Firewall and a NAT gateway can provide a suitable solution.

In this post, we show how you can use Network Firewall to build a secure and compliant environment by restricting and monitoring internet access, inspecting traffic, and using stateless and stateful firewall engine rules to control the network flow between Studio notebooks and the internet.

Depending on your security, compliance, and governance rules, you may not need to or cannot completely block internet access from Studio and your AI and ML workloads. You may have requirements beyond the scope of network security controls implemented by security groups and network access control lists (ACLs), such as application protocol protection, deep packet inspection, domain name filtering, and intrusion prevention system (IPS). Your network traffic controls may also require many more rules compared to what is currently supported in security groups and network ACLs. In these scenarios, you can use Network Firewall—a managed network firewall and IPS for your VPC.

Solution overview

When you deploy Studio in your VPC, you control how Studio accesses the internet with the parameter AppNetworkAccessType (via the Amazon SageMaker API) or by selecting your preference on the console when you create a Studio domain.

If you select Public internet only (PublicInternetOnly), all the ingress and egress internet traffic from Amazon SageMaker notebooks flows through an AWS managed internet gateway attached to a VPC in the SageMaker platform account, not your own account. The following diagram shows this network configuration.

Studio provides public internet egress through a platform-managed VPC for data scientists to download notebooks, packages, and datasets. Traffic to the attached Amazon Elastic File System (Amazon EFS) volume always goes through the customer VPC and never through the public internet egress.

To use your own control flow for the internet traffic, like a NAT or internet gateway, you must set the AppNetworkAccessType parameter to VpcOnly (or select VPC Only on the console). When you launch your app, this creates an elastic network interface in the specified subnets in your VPC. You can apply all available layers of security control—security groups, network ACLs, VPC endpoints, AWS PrivateLink, or Network Firewall endpoints—to the internal network and internet traffic to exercise fine-grained control of network access in Studio. The following diagram shows the VpcOnly network configuration.

In this mode, the direct internet access to or from notebooks is completely disabled, and all traffic is routed through an elastic network interface in your private VPC. This also includes traffic from Studio UI widgets and interfaces, such as Experiments, Autopilot, and Model Monitor, to their respective backend SageMaker APIs.

For more information about network access parameters when creating a domain, see CreateDomain.
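If you create the domain programmatically, a minimal boto3 sketch of the VpcOnly configuration might look like the following. The role ARN, VPC, subnet, and security group IDs are placeholders for illustration, not values from this solution.

import boto3

sagemaker = boto3.client("sagemaker")

# Placeholders only; supply the role, VPC, subnets, and security groups created for your environment
sagemaker.create_domain(
    DomainName="sagemaker-anfw-domain",
    AuthMode="IAM",
    DefaultUserSettings={
        "ExecutionRole": "arn:aws:iam::111122223333:role/StudioExecutionRole",
        "SecurityGroups": ["sg-0123456789abcdef0"],
    },
    VpcId="vpc-0123456789abcdef0",
    SubnetIds=["subnet-0123456789abcdef0"],  # your private SageMaker subnet
    AppNetworkAccessType="VpcOnly",
)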

The solution in this post uses the VpcOnly option and deploys the Studio domain into a VPC with three subnets:

  • SageMaker subnet – Hosts all Studio workloads. All ingress and egress network flow is controlled by a security group.
  • NAT subnet – Contains a NAT gateway. We use the NAT gateway to access the internet without exposing any private IP addresses from our private network.
  • Network Firewall subnet – Contains a Network Firewall endpoint. The route tables are configured so that all inbound and outbound external network traffic is routed via Network Firewall. You can configure stateful and stateless Network Firewall policies to inspect, monitor, and control the traffic.

The following diagram shows the overview of the solution architecture and the deployed components.

VPC resources

The solution deploys the following resources in your account:

  • A VPC with a specified Classless Inter-Domain Routing (CIDR) block
  • Three private subnets with specified CIDRs
  • Internet gateway, NAT gateway, Network Firewall, and a Network Firewall endpoint in the Network Firewall subnet
  • A Network Firewall policy and stateful domain list group with an allow domain list
  • Elastic IP allocated to the NAT gateway
  • Two security groups for SageMaker workloads and VPC endpoints, respectively
  • Four route tables with configured routes
  • An Amazon S3 VPC endpoint (type Gateway)
  • AWS service access VPC endpoints (type Interface) for various AWS services that need to be accessed from Studio

The solution also creates an AWS Identity and Access Management (IAM) execution role for SageMaker notebooks and Studio with preconfigured IAM policies.

Network routing for targets outside the VPC is configured in such a way that all ingress and egress internet traffic goes via the Network Firewall and NAT gateway. For details and reference network architectures with Network Firewall and NAT gateway, see Architecture with an internet gateway and a NAT gateway, Deployment models for AWS Network Firewall, and Enforce your AWS Network Firewall protections at scale with AWS Firewall Manager. The AWS re:Invent 2020 video Which inspection architecture is right for you? discusses which inspection architecture is right for your use case.

SageMaker resources

The solution creates a SageMaker domain and user profile.

The solution uses only one Availability Zone and is not highly available. A best practice is to use a Multi-AZ configuration for any production deployment. You can implement the highly available solution by duplicating the Single-AZ setup—subnets, NAT gateway, and Network Firewall endpoints—to additional Availability Zones.

You use Network Firewall and its policies to control entry and exit of the internet traffic in your VPC. You create an allow domain list rule to allow internet access to the specified network domains only and block traffic to any domain not on the allow list.
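To illustrate what an allow domain list rule group looks like programmatically, the following boto3 sketch creates a stateful rule group of the domain list type. The rule group name and capacity are assumptions for illustration; the solution's CloudFormation template creates its own rule group.

import boto3

network_firewall = boto3.client("network-firewall")

# Stateful rule group that allows traffic only to the listed domains
network_firewall.create_rule_group(
    RuleGroupName="domain-allow-sagemaker",  # illustrative name
    Type="STATEFUL",
    Capacity=100,                            # illustrative capacity
    RuleGroup={
        "RulesSource": {
            "RulesSourceList": {
                "Targets": [".kaggle.com"],
                "TargetTypes": ["HTTP_HOST", "TLS_SNI"],
                "GeneratedRulesType": "ALLOWLIST",
            }
        }
    },
)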

AWS CloudFormation resources

The source code and AWS CloudFormation template for solution deployment are provided in the GitHub repository, which you clone and deploy to your account in the following steps.

Network Firewall is a Regional service; for more information on Region availability, see the AWS Region Table.

Your CloudFormation stack doesn’t have any required parameters. You may want to change the DomainName or *CIDR parameters to avoid naming conflicts with the existing resources and your VPC CIDR allocations. Otherwise, use the following default values:

  • ProjectName – sagemaker-studio-vpc-firewall
  • DomainName – sagemaker-anfw-domain
  • UserProfileName – anfw-user-profile
  • VPCCIDR – 10.2.0.0/16
  • FirewallSubnetCIDR – 10.2.1.0/24
  • NATGatewaySubnetCIDR – 10.2.2.0/24
  • SageMakerStudioSubnetCIDR – 10.2.3.0/24

Deploy the CloudFormation template

To start experimenting with the Network Firewall and stateful rules, you need first to deploy the provided CloudFormation template to your AWS account.

  1. Clone the GitHub repository:
git clone https://github.com/aws-samples/amazon-sagemaker-studio-vpc-networkfirewall.git
cd amazon-sagemaker-studio-vpc-networkfirewall 
  2. Create an S3 bucket in the Region where you deploy the solution:
aws s3 mb s3://<your s3 bucket name>

You can skip this step if you already have an S3 bucket.

  3. Deploy the CloudFormation stack:
make deploy CFN_ARTEFACT_S3_BUCKET=<your s3 bucket name>

The deployment procedure packages the CloudFormation template and copies it to the S3 bucket you provided. The CloudFormation template is then deployed from the S3 bucket to your AWS account.

The stack deploys all the needed resources like VPC, network devices, route tables, security groups, S3 buckets, IAM policies and roles, and VPC endpoints, and also creates a new Studio domain and user profile.

When the deployment is complete, you can see the full list of stack output values by running the following command in terminal:

aws cloudformation describe-stacks \
    --stack-name sagemaker-studio-demo \
    --output table \
    --query "Stacks[0].Outputs[*].[OutputKey, OutputValue]"
  4. Launch Studio via the SageMaker console.

Experiment with Network Firewall

Now you can learn how to control the internet inbound and outbound access with Network Firewall. In this section, we discuss the initial setup, accessing resources not on the allow list, adding domains to the allow list, configuring logging, and additional firewall rules.

Initial setup

The solution deploys a Network Firewall policy with a stateful rule group with an allow domain list. This policy is attached to the Network Firewall. All inbound and outbound internet traffic is blocked now, except for the .kaggle.com domain, which is on the allow list.

Let’s try to access https://kaggle.com by opening a new notebook in Studio and attempting to download the front page from kaggle.com:

!wget https://kaggle.com

The following screenshot shows that the request succeeds because the domain is allowed by the firewall policy. Users can connect to this and only to this domain from any Studio notebook.

 

Access resources not on the allowed domain list

In the Studio notebook, try to clone any public GitHub repository, such as the following:

!git clone https://github.com/aws-samples/amazon-sagemaker-studio-vpc-networkfirewall.git

This operation times out after 5 minutes because any internet traffic except to and from the .kaggle.com domain isn’t allowed and is dropped by Network Firewall.

Add a domain to the allowed domain list

To be able to run the git clone command, you must allow internet traffic to the .github.com domain.

  1. On the Amazon VPC console, choose Firewall policies.
  2. Choose the policy network-firewall-policy-<ProjectName>.

  3. In the Stateful rule groups section, select the rule group domain-allow-sagemaker-<ProjectName>.

You can see the domain .kaggle.com on the allow list.

  4. Choose Add domain.

  5. Enter .github.com.
  6. Choose Save.

You now have two names on the allow domain list.

The firewall policy is propagated to Network Firewall in real time, and your changes take effect immediately. Any inbound or outbound traffic to or from these domains is now allowed by the firewall, and all other traffic is dropped.
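If you prefer to script this change instead of using the console, a sketch along these lines updates the same stateful rule group with boto3. The rule group name here is an assumption based on the solution's naming convention.

import boto3

network_firewall = boto3.client("network-firewall")

rule_group_name = "domain-allow-sagemaker-sagemaker-studio-vpc-firewall"  # assumed name

# Fetch the current rule group and its update token
current = network_firewall.describe_rule_group(RuleGroupName=rule_group_name, Type="STATEFUL")
rule_group = current["RuleGroup"]
rule_group["RulesSource"]["RulesSourceList"]["Targets"] = [".kaggle.com", ".github.com"]

# Write the updated allow list back; the token guards against concurrent edits
network_firewall.update_rule_group(
    UpdateToken=current["UpdateToken"],
    RuleGroupName=rule_group_name,
    Type="STATEFUL",
    RuleGroup=rule_group,
)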

To validate the new configuration, go to your Studio notebook and try to clone the same GitHub repository again:

!git clone https://github.com/aws-samples/amazon-sagemaker-studio-vpc-networkfirewall.git

The operation succeeds this time—Network Firewall allows access to the .github.com domain.

Network Firewall logging

In this section, you configure Network Firewall logging for your firewall’s stateful engine. Logging gives you detailed information about network traffic, including the time that the stateful engine received a packet, detailed information about the packet, and any stateful rule action taken against the packet. The logs are published to the log destination that you configured, where you can retrieve and view them.
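The following console steps configure logging. If you want to automate the same configuration, a minimal boto3 sketch might look like the following; the firewall name and log group are placeholders.

import boto3

network_firewall = boto3.client("network-firewall")

# Send alert logs from the stateful engine to a CloudWatch Logs group
network_firewall.update_logging_configuration(
    FirewallName="network-firewall-sagemaker-studio-vpc-firewall",  # placeholder name
    LoggingConfiguration={
        "LogDestinationConfigs": [
            {
                "LogType": "ALERT",
                "LogDestinationType": "CloudWatchLogs",
                "LogDestination": {"logGroup": "/network-firewall/alerts"},  # placeholder log group
            }
        ]
    },
)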

  1. On the Amazon VPC console, choose Firewalls.
  2. Choose your firewall.

  3. Choose the Firewall details tab.

  4. In the Logging section, choose Edit.

  5. Configure your firewall logging by selecting what log types you want to capture and providing the log destination.

For this post, select Alert log type, set Log destination for alerts to CloudWatch Log group, and provide an existing or a new log group where the firewall logs are delivered.

  6. Choose Save.

To check your settings, go back to Studio and try to access pypi.org to install a Python package:

!pip install -U scikit-learn

This command fails with ReadTimeoutError because Network Firewall drops any traffic to any domain not on the allow list (which contains only two domains: .github.com and .kaggle.com).

On the Amazon CloudWatch console, navigate to the log group and browse through the recent log streams.

The pypi.org domain shows the blocked action. The log event also provides additional details such as timestamps, protocol, port and IP details, event type, Availability Zone, and the firewall name.

You can continue experimenting with Network Firewall by adding .pypi.org and .pythonhosted.org domains to the allowed domain list.

Then validate your access to them via your Studio notebook.

Additional firewall rules

You can create any other stateless or stateful firewall rules and implement traffic filtering based on a standard stateful 5-tuple rule for network traffic inspection (protocol, source IP, source port, destination IP, destination port). Network Firewall also supports industry-standard, Suricata-compatible stateful IPS rule groups. You can implement protocol-based rules to detect and block any non-standard or promiscuous usage or activity. For more information about creating and managing Network Firewall rule groups, see Rule groups in AWS Network Firewall.
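For example, the following sketch registers a single illustrative Suricata rule as a stateful rule group with boto3. The rule itself (dropping outbound Telnet traffic) is only an example and is not part of this solution.

import boto3

network_firewall = boto3.client("network-firewall")

# Illustrative Suricata-compatible rule: drop outbound Telnet traffic
suricata_rules = (
    'drop tcp $HOME_NET any -> $EXTERNAL_NET 23 '
    '(msg:"Block outbound Telnet"; sid:1000001; rev:1;)'
)

network_firewall.create_rule_group(
    RuleGroupName="custom-suricata-rules",  # illustrative name
    Type="STATEFUL",
    Capacity=50,
    RuleGroup={"RulesSource": {"RulesString": suricata_rules}},
)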

Additional security controls with Network Firewall

In the previous section, we looked at one feature of Network Firewall: filtering network traffic based on the domain name. In addition to stateless and stateful firewall rules, Network Firewall provides several other tools and features for further security controls and monitoring.

Build secure ML environments

A robust security design normally includes multi-layer security controls for the system. For SageMaker environments and workloads, you can use the following AWS security services and concepts to secure, control, and monitor your environment:

  • VPC and private subnets to perform secure API calls to other AWS services and restrict internet access for downloading packages.
  • S3 bucket policies that restrict access to specific VPC endpoints.
  • Encryption of ML model artifacts and other system artifacts that are either in transit or at rest. Requests to the SageMaker API and console are made over a Secure Sockets Layer (SSL) connection.
  • Restricted IAM roles and policies for SageMaker runs and notebook access based on resource tags and project ID.
  • Restricted access to public AWS services, such as Amazon Elastic Container Registry (Amazon ECR), to VPC endpoints only.

For a reference deployment architecture and ready-to-use deployable constructs for your environment, see Amazon SageMaker with Guardrails on AWS.

Conclusion

In this post, we showed how you can secure, log, and monitor internet ingress and egress traffic in Studio notebooks for your sensitive ML workloads using managed Network Firewall. You can use the provided CloudFormation templates to automate SageMaker deployment as part of your Infrastructure as Code (IaC) strategy.

For more information about other possibilities to secure your SageMaker deployments and ML workloads, see Building secure machine learning environments with Amazon SageMaker.


About the Author


Yevgeniy Ilyin is a Solutions Architect at AWS. He has over 20 years of experience working at all levels of software development and solutions architecture and has used programming languages from COBOL and Assembler to .NET, Java, and Python. He develops and codes cloud native solutions with a focus on big data, analytics, and data engineering.

Read More

Perform medical transcription analysis in real-time with AWS AI services and Twilio Media Streams

Medical providers often need to analyze and dictate patient phone conversations, doctors’ notes, clinical trial reports, and patient health records. By automating transcription, providers can quickly and accurately capture details such as medical conditions, medications, dosage, strength, and frequency.

Generic artificial intelligence-based transcription models can be used to transcribe voice to text. However, medical voice data often uses complex medical terms and abbreviations. Transcribing such data needs medical/healthcare-specific machine learning (ML) models. To address this issue, AWS launched Amazon Transcribe Medical, an automatic speech recognition (ASR) service that makes it easy for you to add medical speech-to-text capabilities to your voice-enabled applications.

Additionally, Amazon Comprehend Medical is a HIPAA-eligible service that helps providers extract information from unstructured medical text accurately and quickly. To transcribe voice in real time, providers need access to the raw audio while the call is in progress. Twilio, an AWS partner, offers real-time telephone voice integration.

In this post, we show you how to integrate Twilio Media Streams with Amazon Transcribe Medical and Amazon Comprehend Medical to transcribe and analyze data from phone calls. For non-healthcare industries, you can use this same solution with Amazon Transcribe and Amazon Comprehend.

Twilio Media Streams works in the context of a traditional Twilio voice application, like an Interactive Voice Response (IVR), that serves customers directly, as well as a contact center, like Twilio Flex, where agents are serving consumers. You have discrete control over your voice data within your contact center to build the experience your customers prefer.

Amazon Transcribe Medical is an ML service that makes it easy to quickly create accurate transcriptions between patients and physicians. Amazon Comprehend Medical is a natural language processing (NLP) service that makes it easy to use ML to extract relevant medical information from unstructured text. You can quickly and accurately gather information (such as medical condition, medication, dosage, strength, and frequency), from a variety of sources (like doctors’ notes, clinical trial reports, and patient health records). Amazon Comprehend Medical can also link the detected information to medical ontologies such as ICD-10-CM or RxNorm so downstream healthcare applications can use it easily.
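To give a feel for the API, the following boto3 sketch runs entity detection and ICD-10-CM linking on a short snippet of text. The sample sentence is made up for illustration.

import boto3

comprehend_medical = boto3.client("comprehendmedical")

text = "Patient reports chest pain. Prescribed aspirin 81 mg once daily."

# Detect entities such as MEDICAL_CONDITION and MEDICATION, with traits like dosage and frequency
entities = comprehend_medical.detect_entities_v2(Text=text)["Entities"]
for entity in entities:
    print(entity["Category"], entity["Type"], entity["Text"], round(entity["Score"], 2))

# Optionally link detected conditions to ICD-10-CM codes
icd10_entities = comprehend_medical.infer_icd10_cm(Text=text)["Entities"]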

The following diagram illustrates how Amazon Comprehend Medical supports medical named entity and relationship extractions.

Amazon Transcribe Medical, Amazon Comprehend Medical, and Twilio Media Streams are all managed platforms. This means that data scientists and healthcare IT teams don’t need to build services from the ground up. Voice integration is provided by Twilio, and the ML capabilities are exposed through AWS service APIs, so building the end-to-end workflow only requires a simple plug-and-play of AWS and Twilio services.

Solution overview

Our solution uses Twilio Media Streams to provide telephony service to the customer. This service provides a telephone number and backend to media services to integrate it with REST API-based web applications. In this solution, we build a Node.js web app and deploy it with AWS Amplify. Amplify helps front-end web and mobile developers build secure, scalable, full stack applications. The web app interfaces with Twilio Media Streams to receive phone calls in voice format, and uses Amazon Transcribe Medical to convert voice to text. Upon receiving the transcription, the application interfaces with Amazon Comprehend Medical to extract medical terms and insights from the transcription. The insights are displayed on the web app and stored in an Amazon DynamoDB table for further analysis. The solution also uses Amazon Simple Storage Service (Amazon S3) and an AWS Cloud9 environment.

The following diagram illustrates the solution architecture.

To implement the solution, we complete the following high-level steps:

  1. Create a trial Twilio account.
  2. Create an AWS Identity and Access Management (IAM) user.
  3. Create an AWS Cloud9 integrated development environment (IDE).
  4. Clone the GitHub repo.
  5. Create a secured HTTPS tunnel using ngrok and set up Twilio phone number’s voice configuration.
  6. Run the application.

Create a trial Twilio account

Before getting started, make sure to sign up for a trial Twilio account (https://www.twilio.com/try-twilio), if you don’t already have one.

Create an IAM user

To create an IAM user, complete the following steps:

  1. On the IAM console, under Access management, choose Users.
  2. Choose Add user.
  3. On the Set user details page, for User name, enter a name.
  4. For Access type, select Programmatic access.
  5. Choose Next: Permissions.

  6. On the Set permissions page, choose Attach existing policies directly.
  7. Select the following AWS managed policies: AmazonTranscribeFullAccess, ComprehendMedicalFullAccess, AmazonDynamoDBFullAccess, and AmazonS3FullAccess.
  8. Choose Next: Tags.
  9. Skip adding tags and choose Next: Review.
  10. Review the IAM user details and attached policies and choose Create user.

  11. On the next page, copy the access key ID and secret access key to your clipboard or download the CSV file.

We use these credentials for testing the Node.js application.

Create an S3 Bucket

To create your Amazon S3 bucket, complete the following steps:

  1. On the Amazon S3 console, choose Create bucket.
  2. For Bucket name, enter a name for the Amazon S3 bucket.
  3. For Block Public Access settings for this bucket, select Block all public access.
  4. Review the settings and choose Create bucket.

Create an Amazon DynamoDB Table

To create your Amazon DynamoDB table, complete the following steps:

  1. On the Amazon DynamoDB console, choose Create table.
  2. For Table name, enter a name for the Amazon DynamoDB Table.
  3. For Primary key, enter ROWID.

  4. Review the Amazon DynamoDB table settings and choose Create.

Create an AWS Cloud9 environment

To create your AWS Cloud9 environment, complete the following steps:

  1. On the AWS Cloud9 console, choose Environments.
  2. Choose Create environment.
  3. For Name, enter a name for the environment.
  4. For Description, enter an optional description.
  5. Choose Next step.

  6. On the Configure Settings page, select Ubuntu Server 18.04 LTS for Platform and leave the other settings as default.

  7. Review the settings and choose Create environment.

The AWS Cloud9 IDE tab opens on your browser; you may have to wait a few minutes for the environment creation process to complete.

Clone the GitHub repo

In the AWS Cloud9 environment, close the Welcome and AWS Toolkit – QuickStart tabs. To clone the GitHub repository, on the bash terminal, enter the following code:

git clone https://github.com/aws-samples/amazon-transcribe-comprehend-medical-twilio

cd twilio-medical-transcribe && npm install --silent

Edit the config.json file under the project directory. Replace the values with your Amazon S3 Bucket and Amazon DynamoDB table.

Set up ngrok and the Twilio phone number

Before we start the Node.js application, we need to start a secured HTTPS tunnel using ngrok and set up the Twilio phone number’s voice configuration.

  1. On the terminal tab bar, choose the + (plus) icon.
  2. Choose New Terminal.

  3. On the terminal, install ngrok:
    sudo snap install ngrok

  4. After ngrok is installed, run the following code to expose the local Express Node.js server to the internet:
    ngrok http 8080

  5. Copy the public HTTPS URL.

You use this URL for the Twilio phone number’s voice configuration.

  6. Sign in to your Twilio account.
  7. On the dashboard, choose the icon to open the Settings menu.

  8. Choose Phone Numbers.

  9. On the Phone Numbers page, choose your Twilio phone number.

  10. In the Voice section, for A Call Comes In, choose Webhook.
  11. Enter the ngrok tunnel URL followed by /twiml.
  12. Save the configuration.

Run the application

Let’s now run the Twilio Media Streams, Amazon Transcribe Medical, and Amazon Comprehend Medical services by entering the following code:

npm start

We can preview the application in AWS Cloud9. In the environment, on the Preview menu, choose Preview Running Application.

You can copy the public URL to view the application in another browser tab.

Enter the IAM user access ID and secret key credentials, and your Twilio account SID, auth token, and phone number.

Demonstration

In this section, we use two sample recordings to demonstrate real-time audio transcription with Twilio Media Streams.

After you enter your IAM and Twilio credentials, choose Submit Credentials.

The following screenshot shows the transcription for our first audio file, sample-1.mp4.

The following screenshot shows the transcription for our second file, sample-3.mp4.

This application uses Amazon Transcribe Medical to transcribe media content in real time, and stores the output in Amazon S3 for further analysis. The application then uses Amazon Comprehend Medical to detect the following entities:

  • ANATOMY – Detects references to the parts of the body or body systems and the locations of those parts or systems
  • MEDICAL_CONDITION – Detects the signs, symptoms, and diagnosis of medical conditions
  • MEDICATION – Detects medication and dosage information for the patient
  • PROTECTED_HEALTH_INFORMATION – Detects the patient’s personal information
  • TEST_TREATMENT_PROCEDURE – Detects the procedures that are used to determine a medical condition
  • TIME_EXPRESSION – Detects entities related to time when they are associated with a detected entity

These entities are stored in the DynamoDB table. Healthcare providers can use this data to create patient diagnoses and treatment plans.
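For illustration, persisting one detected entity could look like the following boto3 sketch. Apart from the ROWID primary key created earlier, the attribute names are assumptions rather than the application's actual item layout.

import uuid
import boto3

table = boto3.resource("dynamodb").Table("YOUR_DYNAMODB_TABLE")  # the table created earlier

# Attribute names other than ROWID are illustrative assumptions
table.put_item(
    Item={
        "ROWID": str(uuid.uuid4()),
        "Category": "MEDICATION",
        "Text": "aspirin",
        "Score": "0.98",  # stored as a string to avoid float serialization issues
    }
)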

You can further analyze this data through services such as Amazon Elasticsearch Service (Amazon ES) and Amazon Kendra.

Clean up your resources

The AWS services used in this solution are part of the AWS Free Tier. If you’re not using the Free Tier, clean up the following resources to avoid incurring additional charges:

  • AWS Cloud9 environment
  • Amazon S3 Bucket
  • Amazon DynamoDB Table
  • IAM user

Conclusion

In this post, we showed how to integrate Twilio Media Streams with Amazon Transcribe Medical and Amazon Comprehend Medical to transcribe and analyze medical data from audio files. You can also use this solution in non-healthcare industries to transcribe information from audio.

We invite you to check out the code in the GitHub repo and try out the solution, and even expand on the data analysis with Amazon ES or Amazon Kendra.


About the Author

Mahendra Bairagi is a Principal Machine Learning Prototyping Architect at Amazon Web Services. He helps customers build machine learning solutions on AWS. He has extensive experience with ML, robotics, IoT, and analytics services. Prior to joining Amazon Web Services, he had a long tenure as an entrepreneur, enterprise architect, and software developer.

 

 

Jay Park is a Prototyping Solutions Architect for AWS. Jay is focused on helping AWS customers speed their adoption of cloud-native workloads through rapid prototyping.

Read More

Announcing the recipients of the 2021 Facebook Fellowship awards

The Facebook Fellowship program provides awards to PhD candidates conducting research on important topics across computer science and engineering, such as computer vision, programming languages, computational social science, and more. Recipients of the award receive tuition and fees paid for up to two academic years and a stipend of $42,000, which includes conference travel support.

The Fellows are also invited to Facebook HQ in Menlo Park to attend the annual Fellowship Summit. This summit serves as an opportunity for Fellows to network with the rest of their cohort, share their research, and learn more about what researchers at Facebook are working on. As in 2020, we will host the summit virtually this year.

The program is now in its 10th year and has supported more than 144 PhD candidates from a broad range of universities. This year, we received 2,163 applications from over 100 universities worldwide, and we selected 26 outstanding Fellows from 19 universities.

Congratulations to this year’s winners, and thank you to everyone who took the time to submit an application.

2021 Facebook Fellows

Applied statistics


Hsiang Hsu
Harvard University

Finalists: Ayush Jain, University of California San Diego; Hanyu Song, Duke University

AR/VR photonics and optics


Prachi Tureja
California Institute of Technology

Finalists: Nathan Tessema Ersumo, University of California, Berkeley; Geun Ho Ahn, Stanford University; Christina Maria Spaegele, Harvard University

AR/VR future technologies


Logan Clark
University of Virginia


Caitlin Morris
Massachusetts Institute of Technology (MIT)

Finalists: Dishita Turakhia, MIT; Adam Williams, Colorado State University; Feiyu Lu, Virginia Polytechnic Institute and State University (Virginia Tech)

Blockchain and cryptoeconomics


Yan Ji
Cornell University

Finalists: Vibhaalakshmi Sivaraman, MIT; Itay Tsabary, Technion — Israel Institute of Technology

Computational social science


Manoel Horta Ribeiro
École Polytechnique Fédérale de Lausanne

Finalists: Kelsey Gonzalez, University of Arizona; Marianne Aubin Le Quere, Cornell University

AR/VR computer graphics


Cheng Zhang
University of California, Irvine


Liang Shi
MIT

Finalist: Joey Litalien, McGill University

Computer vision


Shuang Li
MIT


Xingyi Zhou
University of Texas at Austin

Finalists: Xinshuo Weng, Carnegie Mellon University; Yunzhu Li, MIT; Jiayuan Mao, MIT; Yinpeng Dong, Tsinghua University

Distributed systems


Yunhao Zhang
Cornell University

Finalists: Vikram Narayanan, University of California, Irvine; Ahmed Alquraan, University of Waterloo

Economics and computation


Andrés Ignacio Cristi Espinosa
Universidad de Chile

Finalist: Hanrui Zhang, Duke University

Networking


Jiaxin Lin
University of Wisconsin–Madison

Finalists: Siva Kesava Reddy Kakarla, University of California, Los Angeles; Junzhi Gong, Harvard University; Fabian Ruffy Varga, New York University (NYU)

Programming languages


Yuanbo Li
Georgia Institute of Technology

Finalists: Jenna Wise, Carnegie Mellon University; Victor A. Ying, MIT

Security and privacy


Jiaheng Zhang
University of California, Berkeley


Marina Minkin
University of Michigan

Finalists: Lillian Yow Tsai, MIT; Praneeth Vepakomma, MIT; Alexander Bienstock, NYU; Amrita Roy Chowdhury, University of Wisconsin–Madison; Trishita Tiwari, Cornell University; Harjasleen Malvai, Cornell University; Jiameng Pu, Virginia Tech

Database systems


Leonhard Spiegelberg
Brown University


Jialin Ding
MIT

Finalists: Tobias Ziegler, Technical University of Darmstadt; Ian Neal, University of Michigan; Pedro Thiago Timbó Holanda, Centrum Wiskunde & Informatica; Avinash Kumar, University of California, Irvine

Systems for machine learning


Weizhe Hua
Cornell University

Finalist: Qinyi Luo, University of Southern California

Instagram/Facebook app well-being and safety


Yasaman Sadat Sefidgar
University of Washington

Finalists: Nicholas Santer, University of California, Santa Cruz; Brian Ward Bauer, University of Southern Mississippi; Morgan Klaus Scheuerman, University of Colorado Boulder

Privacy and data use


Reza Ghaiumy Anaraky
Clemson University

Finalist: Yixi Zou, University of Michigan

Machine learning


Mikhail Khodak
Carnegie Mellon University


Yuval Dagan
MIT

Natural language processing


Tiago Pimentel Martins da Silva
University of Cambridge


Kawin Ethayarajh
Stanford University

Finalists: Haoyue “Freda” Shi, Toyota Technological Institute at Chicago; Tom McCoy, Johns Hopkins University

Spoken language processing and audio classification


Paul Pu Liang
Carnegie Mellon University

Finalists: Jonah Casebeer, University of Illinois at Urbana-Champaign; Efthymios Tzinis, University of Illinois at Urbana-Champaign; Karan Ahuja, Carnegie Mellon University

To learn more about application requirements and program details, visit the Facebook Fellowship Program page.

The post Announcing the recipients of the 2021 Facebook Fellowship awards appeared first on Facebook Research.

Read More

Amazon Forecast now provides estimated run time for forecast creation jobs, enabling you to manage your time efficiently

Amazon Forecast now displays the estimated time it takes to complete an in-progress workflow for importing your data, training the predictor, and generating the forecast. You can now manage your time more efficiently and better plan for your next workflow around the estimated time remaining for your in-progress workflow. Forecast uses machine learning (ML) to generate more accurate demand forecasts, without requiring any prior ML experience. Forecast brings the same technology used at Amazon.com to developers as a fully managed service, removing the need to manage resources or rebuild your systems.

Previously, you had no clear insight into how long a workflow would take to complete, which forced you to proactively monitor each stage, whether it was importing your data, training the predictor, or generating the forecast. This made it difficult to plan for subsequent steps, which can be especially frustrating because the time required to import data, train a predictor, and create forecasts can vary widely depending on the size and characteristics of your data.

Now, you have visibility into the time that a workflow may take, which can be especially useful for manually running your forecast workloads and during the process of experimentation. Knowing how long each workflow will take allows you to focus on other tasks and come back to the forecast journey later. Additionally, the displayed estimated time to complete a workflow refreshes automatically, which provides better expectations and removes further frustration.

In this post, we walk through the Forecast console experience of reading the estimated time to workflow completion. To check the estimated time through the APIs, refer to DescribeDatasetImportJob, DescribePredictor, and DescribeForecast.
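For example, a boto3 call to DescribePredictor returns the estimate while the predictor is still being created. The ARN below is a placeholder; see the API references above for the full response shape.

import boto3

forecast = boto3.client("forecast")

response = forecast.describe_predictor(
    PredictorArn="arn:aws:forecast:us-east-1:111122223333:predictor/my_predictor"  # placeholder
)

print(response["Status"])
# While the job is in progress, the response includes the estimated time remaining in minutes
print(response.get("EstimatedTimeRemainingInMinutes"))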

If you want to build automated workflows for Forecast, we recommend following the steps outlined in Create forecasting systems faster with automated workflows and notifications in Amazon Forecast, which walks through integrating Forecast with Amazon EventBridge to build event-driven Forecast workflows. EventBridge removes the need to manually check the estimated time for a workflow to complete, because it starts your desired next workflow automatically.

Check the estimated time to completion of your dataset import workflow

After you create a new dataset import job, you can see the Create pending status for the newly created job. When the status changes to Create in progress, you can see the estimated time remaining in the Status column of the Datasets imports section. This estimated time refreshes automatically until the status changes to Active.

On the details page of the newly created dataset import job, when the status is Create in progress, the Estimated time remaining field shows the remaining time for the import job to complete and Actual import time shows -. This section refreshes automatically with the estimated time to completion. After the import job is complete and the status becomes Active, the Actual import time shows the total time of the import.

Check the estimated time to completion of your predictor training workflow

After you create a new predictor, you first see the Create pending status for the newly created job. When the status changes to Create in progress, you see the estimated time remaining in the Status column in the Predictors section. This estimated time refreshes automatically until the status changes to Active.

On the details page of the newly created predictor job, when the status is Create in progress, the Estimated time remaining field shows the remaining time for the predictor job to complete and Actual import time shows -. This section refreshes automatically with the estimated time to completion. After the import job is complete and the status becomes Active, the Actual import time shows the total time for the predictor creation.

Check the estimated time to completion of your forecast creation workflow

After you create a new forecast, you first see the Create pending status for the newly created job. When the status changes to Create in progress, you see the estimated time remaining in the Status column. This estimated time refreshes automatically until the status changes to Active.

On the details page of the newly created forecast job, when the status is Create in progress, the Estimated time remaining field shows the remaining time for the forecast job to complete and Actual import time shows -. This section refreshes automatically with the estimated time to completion. After the import job is complete and the status changes to Active, the Actual import time shows the total time for the forecast creation to complete.

Conclusion

You can now see how long a workflow will take when you initiate it in Forecast, which helps you manage your time more efficiently. The new field is part of the response to the Describe* calls and shows up automatically, without requiring any setup.

To learn more about this capability, see DescribeDatasetImportJob, DescribePredictor, and DescribeForecast. You can use this capability in all Regions where Forecast is publicly available. For more information about Region availability, see AWS Regional Services.


About the Authors

Alex Kim is a Sr. Product Manager for Amazon Forecast. His mission is to deliver AI/ML solutions to all customers who can benefit from it. In his free time, he enjoys all types of sports and discovering new places to eat.


Ranjith Kumar Bodla is an SDE in the Amazon Forecast team. He works as a backend developer within a distributed environment with a focus on AI/ML and leadership. During his spare time, he enjoys playing table tennis, traveling, and reading.


Gautam Puri is a Software Development Engineer on the Amazon Forecast team. His focus area is on building distributed systems that solve machine learning problems. In his free time, he enjoys hiking and basketball.


Shannon Killingsworth is a UX Designer for Amazon Forecast and Amazon Personalize. His current work is creating console experiences that are usable by anyone, and integrating new features into the console experience. In his spare time, he is a fitness and automobile enthusiast.

 

Read More

Build an event-based tracking solution using Amazon Lookout for Vision

Amazon Lookout for Vision is a machine learning (ML) service that spots defects and anomalies in visual representations using computer vision (CV). With Amazon Lookout for Vision, manufacturing companies can increase quality and reduce operational costs by quickly identifying differences in images of objects at scale.

Many enterprise customers want to identify missing components in products, damage to vehicles or structures, irregularities in production lines, minuscule defects in silicon wafers, and other similar problems. Amazon Lookout for Vision uses ML to see and understand images from any camera as a person would, but with an even higher degree of accuracy and at a much larger scale. Amazon Lookout for Vision eliminates the need for costly and inconsistent manual inspection, while improving quality control, defect and damage assessment, and compliance. In minutes, you can begin using Amazon Lookout for Vision to automate inspection of images and objects—with no ML expertise required.

In this post, we look at how we can automate detecting anomalies in silicon wafers and notifying operators in real time.

Solution overview

Keeping track of the quality of products in a manufacturing line is a challenging task. Some process steps take images of the product that humans then review in order to assure good quality. Thanks to artificial intelligence, you can automate these anomaly detection tasks, but human intervention may be necessary after anomalies are detected. A standard approach is sending emails when problematic products are detected. These emails might be overlooked, which could cause a loss in quality in a manufacturing plant.

In this post, we automate the process of detecting anomalies in silicon wafers and notifying operators in real time using automated phone calls. The following diagram illustrates our architecture. We deploy a static website using AWS Amplify, which serves as the entry point for our application. Whenever a new image is uploaded via the UI (1), an AWS Lambda function invokes the Amazon Lookout for Vision model (2) and predicts whether this wafer is anomalous or not. The function stores each uploaded image to Amazon Simple Storage Service (Amazon S3) (3). If the wafer is anomalous, the function sends the confidence of the prediction to Amazon Connect and calls an operator (4), who can take further action (5).
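Inside the Lambda function, the prediction itself comes down to a single DetectAnomalies call. The following boto3 sketch shows the idea; the project name, model version, and image file are placeholders rather than the function's actual code.

import boto3

lookoutvision = boto3.client("lookoutvision")

# Placeholders; the solution passes the uploaded image bytes and its own project settings
with open("wafer.png", "rb") as image:
    response = lookoutvision.detect_anomalies(
        ProjectName="YOUR_PROJECT_NAME",
        ModelVersion="1",
        Body=image.read(),
        ContentType="image/png",
    )

result = response["DetectAnomalyResult"]
print(result["IsAnomalous"], result["Confidence"])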

Set up Amazon Connect and the associated contact flow

To configure Amazon Connect and the contact flow, you complete the following high-level steps:

  1. Create an Amazon Connect instance.
  2. Set up the contact flow.
  3. Claim your phone number.

Create an Amazon Connect instance

The first step is to create an Amazon Connect instance. For the rest of the setup, we use the default values, but don’t forget to create an administrator login.

Instance creation can take a few minutes, after which we can log in to the Amazon Connect instance using the admin account we created.

Set up the contact flow

In this post, we have a predefined contact flow that we can import. For more information about importing an existing contact flow, see Import/export contact flows.

  1. Choose the file contact-flow/wafer-anomaly-detection from the GitHub repo.
  2. Choose Import.

The imported contact flow looks similar to the following screenshot.

  3. On the flow details page, expand Show additional flow information.

Here you can find the ARN of the contact flow.

  4. Record the contact flow ID and contact center ID, which you need later.

Claim your phone number

Claiming a number is easy and takes just a few clicks. Make sure to choose the previously imported contact flow while claiming the number.

If no numbers are available in the country of your choice, raise a support ticket.

Contact flow overview

The following screenshot shows our contact flow.

The contact flow performs the following functions:

  • Enable logging
  • Set the output Amazon Polly voice (for this post, we use the Kendra voice)
  • Get customer input using DTMF (only keys 1 and 2 are valid).
  • Based on the user’s input, the flow does one of the following:
    • Prompt a goodbye message stating no action will be taken and exit
    • Prompt a goodbye message stating an action will be taken and exit
    • Fail and deliver a fallback block stating that the machine will shut down and exit

Optionally, you can enhance your system with an Amazon Lex bot.
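When the model flags a wafer as anomalous, the Lambda function places the outbound call through this contact flow (step 4 in the architecture). A minimal boto3 sketch of such a call follows; the IDs, phone numbers, and attribute names are placeholders rather than the solution's exact values.

import boto3

connect = boto3.client("connect")

connect.start_outbound_voice_contact(
    DestinationPhoneNumber="+15550123456",  # the operator's number (placeholder)
    ContactFlowId="YOUR_FLOW_ID",
    InstanceId="YOUR_INSTANCE_ID",
    SourcePhoneNumber="+15550100000",       # the number claimed in Amazon Connect (placeholder)
    Attributes={"confidence": "0.87"},      # contact attributes available to the flow prompts
)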

Deploy the solution

Now that you have set up Amazon Connect, deployed your contact flow, and noted the information you need for the rest of the deployment, we can deploy the remaining components. In the cloned GitHub repository, edit the build.sh script and run it from the command line:

#Global variables
ApplicationRegion="YOUR_REGION"
S3SourceBucket="YOUR_S3_BUCKET-sagemaker"
LookoutProjectName="YOUR_PROJECT_NAME"
FlowID="YOUR_FLOW_ID"
InstanceID="YOUR_INSTANCE_ID"
SourceNumber="YOUR_CLAIMED_NUMBER"
DestNumber="YOUR_MOBILE_PHONE_NUMBER"
CloudFormationStack="YOUR_CLOUD_FORMATION_STACK_NAME"

Provide the following information:

  • Your Region
  • The S3 bucket name you want to use (make sure the name includes the word sagemaker).
  • The name of the Amazon Lookout for Vision project you want to use
  • The ID of your contact flow
  • Your Amazon Connect instance ID
  • The number you’ve claimed in Amazon Connect in E.164 format (for example, +132398765)
  • A name for the AWS CloudFormation stack you create by running this script

This script then performs the following actions:

  • Creates an S3 bucket for you
  • Builds the .zip files for your Lambda function
  • Uploads the CloudFormation template and the Lambda function to your new S3 bucket
  • Creates the CloudFormation stack

After the stack is deployed, you can find the following resources created on the AWS CloudFormation console.

You can see that an Amazon SageMaker notebook called amazon-lookout-vision-create-project is also created.

Build, train, and deploy the Amazon Lookout for Vision model

In this section, we see how to build, train, and deploy the Amazon Lookout for Vision model using the open-source Python SDK. For more information about the Amazon Lookout for Vision Python SDK, see this blog post.

You can build the model via the AWS Management Console. For programmatic deployment, complete the following steps:

  1. On the SageMaker console, on the Notebook instances page, access the SageMaker notebook instance that was created earlier by choosing Open Jupyter.

In the instance, you can find the GitHub repository of the Amazon Lookout for Vision Python SDK automatically cloned.

  2. Navigate into the amazon-lookout-for-vision-python-sdk/example folder.

The folder contains an example notebook that walks you through building, training, and deploying a model. Before you get started, you need to upload the images to use to train the model into your notebook instance.

  3. In the example folder, create two new folders named good and bad.
  4. Navigate into both folders and upload your images accordingly.

Example images are in the downloaded GitHub repository.

  5. After you upload the images, open the lookout_for_vision_example.ipynb notebook.

The notebook walks you through the process of creating your model. One important step you should do first is provide the following information:

# Training & Inference
input_bucket = "YOUR_S3_BUCKET_FOR_TRAINING"
project_name = "YOUR_PROJECT_NAME"
model_version = "1" # leave this as one if you start right at the beginning

# Inference
output_bucket = "YOUR_S3_BUCKET_FOR_INFERENCE" # can be same as input_bucket
input_prefix = "YOUR_KEY_TO_FILES_TO_PREDICT/" # used in batch_predict
output_prefix = "YOUR_KEY_TO_SAVE_FILES_AFTER_PREDICTION/" # used in batch_predict

You can ignore the inference section, but feel free to also play around with this part of the notebook. Because you’re just getting started, you can leave model_version set to “1”.

For input_bucket and project_name, use the S3 bucket and Amazon Lookout for Vision project name that are provided as part of the build.sh script. You can then run each cell in the notebook, which successfully deploys the model.

You can view the training metrics using the SDK, but you can also find them on the console. To do so, open your project, navigate to the models, and choose the model you’ve trained. The metrics are available on the Performance metrics tab.

You’re now ready to deploy a static website that can call your model on demand.

Deploy the static website

Your first step is to add the endpoint of your Amazon API Gateway to your static website’s source code.

  1. On the API Gateway console, find the REST API called LookoutVisionAPI.
  2. Open the API and choose Stages.
  3. On the stage’s drop-down menu (for this post, dev), choose the POST method.
  4. Copy the value for Invoke URL.

We add the URL to the HTML source code.

  5. Open the file html/index.html.

At the end of the file, you can find a section that uses jQuery to trigger an AJAX request. One key is called url, which has an empty string as its value.

  6. Enter the URL you copied as your new url value and save the file.

The code should look similar to the following:

$.ajax({
    type: 'POST',
    url: 'https://<API_Gateway_ID>.execute-api.<AWS_REGION>.amazonaws.com/dev/amazon-lookout-vision-api',
    data: JSON.stringify({coordinates: coordinates, image: reader.result}),
    cache: false,
    contentType: false,
    processData: false,
    success:function(data) {
        var anomaly = data["IsAnomalous"]
        var confidence = data["Confidence"]
        text = "Anomaly:" + anomaly + "<br>" + "Confidence:" + confidence + "<br>";
        $("#json").html(text);
    },
    error: function(data){
        console.log("error");
        console.log(data);
}});
  7. Convert the index.html file to a .zip file.
  8. On the AWS Amplify console, choose the app ObjectTracking.

The front-end environment page of your app opens automatically.

  9. Select Deploy without Git provider.

You can enhance this piece to connect AWS Amplify to Git and automate your whole deployment.

  10. Choose Connect branch.

  11. For Environment name, enter a name (for this post, we enter dev).
  12. For Method, select Drag and drop.
  13. Choose Choose files to upload the index.html.zip file you created.
  14. Choose Save and deploy.

After the deployment is successful, you can use your web application by choosing the domain displayed in AWS Amplify.

Detect anomalies

Congratulations! You just built a solution to automate the detection of anomalies in silicon wafers and alert an operator to take appropriate action. The data we use for Amazon Lookout for Vision is a wafer map taken from Wikipedia. A few “bad” spots have been added to mimic real-world scenarios in semiconductor manufacturing.

After deploying the solution, you can run a test to see how it works. When you open the AWS Amplify domain, you see a website that lets you upload an image. For this post, we present the result of detecting a bad wafer with a so-called donut pattern. After you upload the image, it’s displayed on your website.

If the image is detected as an anomaly, Amazon Connect calls your phone number and you can interact with the service.

Conclusion

In this post, we used Amazon Lookout for Vision to automate the detection of anomalies in silicon wafers and alert an operator in real time using Amazon Connect so they can take action as needed.

This solution isn’t bound to just wafers. You can extend it to object tracking in transportation, products in manufacturing, and other endless possibilities.


About the Authors

Tolla Cherwenka is an AWS Global Solutions Architect who is certified in data and analytics. She uses an art-of-the-possible approach to work backwards from business goals and develop transformative event-driven data architectures that enable data-driven decisions. Moreover, she is passionate about creating prescriptive solutions for refactoring mission-critical monolithic workloads to microservices, and for supply chain and connected factories that leverage IoT, machine learning, big data, and analytics services.

 

Michael Wallner is a Global Data Scientist with AWS Professional Services and is passionate about enabling customers on their AI/ML journey in the cloud to become AWSome. Besides having a deep interest in Amazon Connect, he likes sports and enjoys cooking.

 

 

Krithivasan Balasubramaniyan is a Principal Consultant at Amazon Web Services. He enables global enterprise customers in their digital transformation journey and helps architect cloud native solutions.

 

Read More

Evolving Reinforcement Learning Algorithms

Posted by John D. Co-Reyes, Research Intern and Yingjie Miao, Senior Software Engineer, Google Research

A long-term, overarching goal of research into reinforcement learning (RL) is to design a single general purpose learning algorithm that can solve a wide array of problems. However, because the RL algorithm taxonomy is quite large, and designing new RL algorithms requires extensive tuning and validation, this goal is a daunting one. A possible solution would be to devise a meta-learning method that could design new RL algorithms that generalize to a wide variety of tasks automatically.

In recent years, AutoML has shown great success in automating the design of machine learning components, such as neural network architectures and model update rules. One example is Neural Architecture Search (NAS), which has been used to develop better neural network architectures for image classification and efficient architectures for running on phones and hardware accelerators. In addition to NAS, AutoML-Zero shows that it’s even possible to learn the entire algorithm from scratch using basic mathematical operations. One common theme in these approaches is that the neural network architecture or the entire algorithm is represented by a graph, and a separate algorithm is used to optimize the graph for certain objectives.

These earlier approaches were designed for supervised learning, in which the overall algorithm is more straightforward. But in RL, there are more components of the algorithm that could be potential targets for design automation (e.g., neural network architectures for agent networks, strategies for sampling from the replay buffer, overall formulation of the loss function), and it is not always clear what the best model update procedure would be to integrate these components. Prior efforts to automate RL algorithm discovery have focused primarily on model update rules. These approaches learn the optimizer or RL update procedure itself and commonly represent the update rule with a neural network such as an RNN or CNN, which can be efficiently optimized with gradient-based methods. However, these learned rules are not interpretable or generalizable, because the learned weights are opaque and domain specific.

In our paper “Evolving Reinforcement Learning Algorithms”, accepted at ICLR 2021, we show that it’s possible to learn new, analytically interpretable and generalizable RL algorithms by using a graph representation and applying optimization techniques from the AutoML community. In particular, we represent the loss function, which is used to optimize an agent’s parameters over its experience, as a computational graph, and use Regularized Evolution to evolve a population of the computational graphs over a set of simple training environments. This results in increasingly better RL algorithms, and the discovered algorithms generalize to more complex environments, even those with visual observations like Atari games.

RL Algorithm as a Computational Graph
Inspired by ideas from NAS, which searches over the space of graphs representing neural network architectures, we meta-learn RL algorithms by representing the loss function of an RL algorithm as a computational graph. In this case, we use a directed acyclic graph for the loss function, with nodes representing inputs, operators, parameters and outputs. For example, in the computational graph for DQN, input nodes include data from the replay buffer, operator nodes include neural network operators and basic math operators, and the output node represents the loss, which will be minimized with gradient descent.

There are a few benefits of such a representation. This representation is expressive enough to define existing algorithms but also new, undiscovered algorithms. It is also interpretable. This graph representation can be analyzed in the same way as human designed RL algorithms, making it more interpretable than approaches that use black box function approximators for the entire RL update procedure. If researchers can understand why a learned algorithm is better, then they can both modify the internal components of the algorithm to improve it and transfer the beneficial components to other problems. Finally, the representation supports general algorithms that can solve a wide variety of problems.

Example computation graph for DQN which computes the squared Bellman error.
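As a plain-Python illustration of the quantity this graph computes, the following sketch evaluates the squared Bellman error for a batch of transitions. The array names and shapes are assumptions for illustration only.

import numpy as np

def dqn_loss(q_values, target_q_values, actions, rewards, discounts):
    # Squared Bellman error: the output node of the DQN loss graph
    q_sa = q_values[np.arange(len(actions)), actions]            # Q(s_t, a_t)
    targets = rewards + discounts * target_q_values.max(axis=1)  # r_t + gamma * max_a Q_target(s_t+1, a)
    return np.mean((q_sa - targets) ** 2)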

We implemented this representation using the PyGlove library, which conveniently turns the graph into a search space that can be optimized with regularized evolution.

Evolving RL Algorithms
We use an evolutionary based approach to optimize the RL algorithms of interest. First, we initialize a population of training agents with randomized graphs. This population of agents is trained in parallel over a set of training environments. The agents first train on a hurdle environment — an easy environment, such as CartPole, intended to quickly weed out poorly performing programs.

If an agent cannot solve the hurdle environment, the training is stopped early with a score of zero. Otherwise the training proceeds to more difficult environments (e.g., Lunar Lander, simple MiniGrid environments, etc.). The algorithm performance is evaluated and used to update the population, where more promising algorithms are further mutated. To reduce the search space, we use a functional equivalence checker which will skip over newly proposed algorithms if they are functionally the same as previously examined algorithms. This loop continues as new mutated candidate algorithms are trained and evaluated. At the end of training, we select the best algorithm and evaluate its performance over a set of unseen test environments.

The population size in the experiments was around 300 agents, and we observed the evolution of good candidate loss functions after 20-50 thousand mutations, requiring about three days of training. We were able to train on CPUs because the training environments were simple, controlling for the computational and energy cost of training. To further control the cost of training, we seeded the initial population with human-designed RL algorithms such as DQN.

Overview of meta-learning method. Newly proposed algorithms must first perform well on a hurdle environment before being trained on a set of harder environments. Algorithm performance is used to update a population where better performing algorithms are further mutated into new algorithms. At the end of training, the best performing algorithm is evaluated on test environments.

Learned Algorithms
We highlight two discovered algorithms that exhibit good generalization performance. The first is DQNReg, which builds on DQN by adding a weighted penalty on the Q-values to the normal squared Bellman error. The second learned loss function, DQNClipped, is more complex, although its dominating term has a simple form — the max of the Q-value and the squared Bellman error (modulo a constant). Both algorithms can be viewed as a way to regularize the Q-values. While DQNReg adds a soft constraint, DQNClipped can be interpreted as a kind of constrained optimization that will minimize the Q-values if they become too large. We show that this learned constraint kicks in during the early stage of training when overestimating the Q-values is a potential issue. Once this constraint is satisfied, the loss will instead minimize the original squared Bellman error.
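
As a rough illustration of the first loss, the sketch below writes DQNReg as the usual squared Bellman error plus a weighted penalty on Q(s, a); the penalty weight k and the exact target computation are illustrative assumptions rather than the paper's precise settings:

import numpy as np

def dqnreg_loss(q_sa, q_next_max, reward, done, gamma=0.99, k=0.1):
    target = reward + gamma * (1.0 - done) * q_next_max   # bootstrapped target
    bellman_err = (q_sa - target) ** 2                     # standard DQN term
    return np.mean(k * q_sa + bellman_err)                 # DQNReg: soft penalty on the Q-values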

A closer analysis shows that while baselines like DQN commonly overestimate Q-values, our learned algorithms address this issue in different ways. DQNReg underestimates the Q-values, while DQNClipped has similar behavior to Double DQN in that it slowly approaches the ground truth without overestimating it.

It’s worth pointing out that these two algorithms consistently emerge when the evolution is seeded with DQN. Learning from scratch, the method rediscovers the TD algorithm. For completeness, we release a dataset of top 1000 performing algorithms discovered during evolution. Curious readers could further investigate the properties of these learned loss functions.

Overestimated values are generally a problem in value-based RL. Our method learns algorithms that have found a way to regularize the Q-values and thus reduce overestimation.

Learned Algorithms Generalization Performance
Normally in RL, generalization refers to a trained policy generalizing across tasks. However, in this work we’re interested in algorithmic generalization performance, which means how well an algorithm works over a set of environments. On a set of classical control environments, the learned algorithms can match baselines on the dense reward tasks (CartPole, Acrobot, LunarLander) and outperform DQN on the sparser reward task, MountainCar.

Performance of learned algorithms versus baselines on classical control environments.

On a set of sparse reward MiniGrid environments, which test a variety of different tasks, we see that DQNReg greatly outperforms baselines on both the training and test environments, in terms of sample efficiency and final performance. In fact, the effect is even more pronounced on the test environments, which vary in size, configuration, and existence of new obstacles, such as lava.

Training environment performance versus training steps as measured by episode return over 10 training seeds. DQNReg can match or outperform baselines in sample efficiency and final performance.
DQNReg can greatly outperform baselines on unseen test environments.

We visualize the performance of normal DDQN vs. the learned algorithm DQNReg on a few MiniGrid environments. The starting location, wall configuration, and object configuration of these environments are randomized at each reset, which requires the agent to generalize instead of simply memorizing the environment. While DDQN often struggles to learn any meaningful behavior, DQNReg can learn the optimal behavior efficiently.

Animation comparison: DDQN vs. DQNReg (learned).

We observe improved performance even on image-based Atari environments, despite training only on non-image-based environments. This suggests that meta-training on a set of cheap but diverse training environments with a generalizable algorithm representation could enable radical algorithmic generalization.

Env          DQN        DDQN       PPO        DQNReg
Asteroid     1364.5     734.7      2097.5     2390.4
Bowling      50.4       68.1       40.1       80.5
Boxing       88.0       91.6       94.6       100.0
RoadRunner   39544.0    44127.0    35466.0    65516.0
Performance of learned algorithm, DQNReg, against baselines on several Atari games. Performance is evaluated over 200 test episodes every 1 million steps.

Conclusion
In this post, we’ve discussed learning new interpretable RL algorithms by representing their loss functions as computational graphs and evolving a population of agents over this representation. The computational graph formulation allows researchers to both build upon human-designed algorithms and study the learned algorithms using the same mathematical toolset as the existing algorithms. We analyzed a few of the learned algorithms and can interpret them as a form of entropy regularization to prevent value overestimation. These learned algorithms can outperform baselines and generalize to unseen environments. The top performing algorithms are available for further analytical study.

We hope that future work will extend to more varied RL settings such as actor critic algorithms or offline RL. Furthermore we hope that this work can lead to machine assisted algorithm development where computational meta-learning can help researchers find new directions to pursue and incorporate learned algorithms into their own work.

Acknowledgements
We thank our co-authors Daiyi Peng, Esteban Real, Sergey Levine, Quoc V. Le, Honglak Lee, and Aleksandra Faust. We also thank Luke Metz for helpful early discussions and feedback on the paper, Hanjun Dai for early discussions on related research ideas, Xingyou Song, Krzysztof Choromanski, and Kevin Wu for helping with infrastructure, and Jongwook Choi for helping with environment selection. Finally we thank Tom Small for designing animations for this post.

Read More

Making Movie Magic, NVIDIA Powers 13 Years of Oscar-Winning Visual Effects

For the 13th year running, NVIDIA professional GPUs have powered the dazzling visuals and cinematics behind every Academy Award nominee for Best Visual Effects.

The 93rd annual Academy Awards will take place on Sunday, April 25, with five VFX nominees in the running:

  • The Midnight Sky
  • Tenet
  • Mulan
  • The One and Only Ivan
  • Love and Monsters

NVIDIA professional GPUs have been behind award-winning graphics in films for over a decade. During that time, the most stunning visual effects shots have formed the backdrop for the Best Visual Effects Oscar.

Although some traditional nominees, namely tentpole summer blockbusters, weren’t released in 2020 because of the pandemic, this year’s lineup still brought innovative tools, new techniques and impressive visuals to the big screen.

For the visuals in The Midnight Sky, Framestore delivered the breathtaking VFX and deft keyframe animation for which they are renowned. Add in cutting-edge film tech like ILM Stagecraft and Anyma, and George Clooney supervising previsualization and face replacement sequences, and it’s no wonder that Framestore swept the Visual Effects Society Awards this year.

Christopher Nolan’s latest film, Tenet, is made up of 300 VFX shots that create a sense of time inversion. During action sequences, DNEG used new temporal techniques to show time moving forward and in reverse.

In Paramount’s Love and Monsters, a sci-fi comedy about giant creatures, Toronto-based visual effects company Mr. X delivers top-notch graphics that earned them their first Oscars nomination. From colossal snails to complex crustaceans, the film featured 13 unique, mutated creatures. The VFX and animation teams crafted the creatures’ movements based on how each would interact in a post-apocalyptic world.

And to create the impressive set extensions, scenic landscapes and massive crowds in Disney’s most recent live-action film, Mulan, Weta Digital tapped NVIDIA GPU-accelerated technology to immerse the audience in a world of epic scale.

While only one visual effects team will accept an award at Sunday’s ceremony, millions of artists are creating stunning visuals and cinematics with NVIDIA RTX. Whether it’s powering virtual production sets or accelerating AI tools, RTX technology is shaping the future of storytelling.

Learn more about NVIDIA technology in media and entertainment.

Featured image courtesy of Framestore. © NETFLIX


Read More

Quality Assessment for SageMaker Ground Truth Video Object Tracking Annotations using Statistical Analysis

Data quality is an important topic for virtually all teams and systems deriving insights from data, especially teams and systems using machine learning (ML) models. Supervised ML is the task of learning a function that maps an input to an output based on examples of input-output pairs. For a supervised ML algorithm to effectively learn this mapping, the input-output pairs must be accurately labeled, which makes data labeling a crucial step in any supervised ML task.

Supervised ML is commonly used in the computer vision space. You can train an algorithm to perform a variety of tasks, including image classification, bounding box detection, and semantic segmentation, among many others. Computer vision annotation tools, like those available in Amazon SageMaker Ground Truth (Ground Truth), simplify the process of creating labels for computer vision algorithms and encourage best practices, resulting in high-quality labels.

To ensure quality, humans must be involved at some stage to either annotate or verify the assets. However, human labelers are often expensive, so it’s important to use them cost-effectively. There is no industry-wide standard for automatically monitoring the quality of annotations during the labeling process of images (or videos or point clouds), so human verification is the most common solution.

The process for human verification of labels involves expert annotators (verifiers) verifying a sample of the data labeled by a primary annotator where the experts correct (overturn) any errors in the labels. You can often find candidate samples that require label verification by using ML methods. In some scenarios, you need the same images, videos, or point clouds to be labeled and processed by multiple labelers to determine ground truth when there is ambiguity. Ground Truth accomplishes this through annotation consolidation to get agreement on what the ground truth is based on multiple responses.

In computer vision, we often deal with tasks that contain a temporal dimension, such as video and LiDAR sensors capturing sequential frames. Labeling this kind of sequential data is complex and time consuming. The goal of this blog post is to reduce the total number of frames that need human review by performing automated quality checks in multi-object tracking (MOT) time series data like video object tracking annotations while maintaining data quality at scale. The quality initiative in this blog post proposes science-driven methods that take advantage of the sequential nature of these inputs to automatically identify potential outlier labels. These methods enable you to a) objectively track data labeling quality for Ground Truth video, b) use control mechanisms to achieve and maintain quality targets, and c) optimize costs to obtain high-quality data.

We will walk through an example situation in which a large video dataset has been labeled by primary human annotators for an ML system and demonstrate how to perform automatic quality assurance (QA) to identify samples that may not be labeled properly. How can this be done without overwhelming a team’s limited resources? We’ll show you how, using Ground Truth and Amazon SageMaker.

Background

Data annotation is, typically, a manual process in which the annotator follows a set of guidelines and operates in a “best-guess” manner. Discrepancies in labeling criteria between annotators can have an effect on label quality, which may impact algorithm inference performance downstream.

For sequential inputs like video at a high frame rate, it can be assumed that a frame at time t will be very similar to a frame at time t+1. This extends to the labeled objects in the frames and allows large deviations between labels across frames to be treated as outliers, which can be identified with statistical metrics. Auditors can be directed to pay special attention to these outlier frames in the verification process.

A common theme in feedback from customers is the desire to create a standard methodology and framework to monitor annotations from Ground Truth and identify frames with low-quality annotations for auditing purposes. We propose this framework to allow you to measure the quality on a certain set of metrics and take action — for example, by sending those specific frames for relabeling using Ground Truth or Amazon Augmented AI (Amazon A2I).

The following glossary defines terms frequently used in this post.

Annotation: The process whereby a human manually captures metadata related to a task. An example would be drawing the outline of the products in a still image.
SageMaker Ground Truth: Ground Truth handles the scheduling of various annotation tasks and collecting the results. It also supports defining labor pools and labor requirements for performing the annotation tasks.
IoU: The intersection over union (IoU) ratio measures the overlap between two regions of interest in an image. It indicates how well an object detector's prediction matches the ground truth (the real object boundary).
Detection rate: The number of detected boxes divided by the number of ground truth boxes.
Annotation pipeline: The complete end-to-end process of capturing a dataset for annotation, submitting the dataset for annotation, performing the annotation, performing quality checks, and adjusting incorrect annotations.
Source data: The MOT17 dataset.
Target data: The unified ground truth dataset.

Evaluation metrics

This is an exciting open area of research for quality validation of annotations using statistical approaches, and the following quality metrics are often used to perform statistical validation.

Intersection over union (IoU)

IoU measures the overlap between two bounding boxes, here the ground truth and the predicted box in each frame, as the ratio of their intersection area to their union area. A high IoU combined with a low Hausdorff distance indicates that a source bounding box corresponds well with a target bounding box in geometric space. These parameters may also indicate a skew in imagery. A low IoU may indicate quality conflicts between bounding boxes.

IoU = area(bp ∩ bgt) / area(bp ∪ bgt)

In the preceding equation, bp is the predicted bounding box and bgt is the ground truth bounding box.
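
For reference, a plain-Python IoU helper along these lines (similar in spirit to the bb_int_over_union function used in the notebook code later in this post, though the notebook's actual implementation may differ) could look like the following:

def bb_int_over_union(boxA, boxB):
    # boxes are [x1, y1, x2, y2]; compute the intersection rectangle first
    xA, yA = max(boxA[0], boxB[0]), max(boxA[1], boxB[1])
    xB, yB = min(boxA[2], boxB[2]), min(boxA[3], boxB[3])
    inter = max(0.0, xB - xA) * max(0.0, yB - yA)
    areaA = (boxA[2] - boxA[0]) * (boxA[3] - boxA[1])
    areaB = (boxB[2] - boxB[0]) * (boxB[3] - boxB[1])
    union = areaA + areaB - inter
    return inter / union if union > 0 else 0.0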

Center Loss

Center loss is the distance between bounding box centers:

d = sqrt((xp - xgt)² + (yp - ygt)²)

In the preceding equation, (xp, yp) is the center of the predicted bounding box and (xgt, ygt) is the center of the ground truth bounding box.
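
A minimal helper for this metric, assuming centers are given as (x, y) tuples, might look like the following:

import math

def center_loss(center_p, center_gt):
    (xp, yp), (xgt, ygt) = center_p, center_gt
    return math.sqrt((xp - xgt) ** 2 + (yp - ygt) ** 2)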

IoU distribution

If the mean, median, and mode of an object’s IoU are drastically different from those of other objects, we may want to flag the object in question for manual auditing. We can use visualizations like heat maps for a quick understanding of object-level IoU variance.

MOT17 Dataset

The Multi Object Tracking Benchmark is a commonly used benchmark for multiple target tracking evaluation. It offers a variety of datasets for training and evaluating multi-object tracking models. For this post, we use the MOT17 dataset as our source data, which is based around detecting and tracking a large number of vehicles.

Solution

To run and customize the code used in this blog post, use the notebook Ground_Truth_Video_Quality_Metrics.ipynb in the Amazon SageMaker Examples tab of a notebook instance, under Ground Truth Labeling Jobs. You can also find the notebook on GitHub.

Download MOT17 dataset

Our first step is to download the data, which takes a few minutes, unzip it, and send it to Amazon Simple Storage Service (Amazon S3) so we can launch audit jobs. See the following code:

# Grab our data this will take ~5 minutes
!wget https://motchallenge.net/data/MOT17.zip -O /tmp/MOT17.zip
    
# unzip our data
!unzip -q /tmp/MOT17.zip -d MOT17
!rm /tmp/MOT17.zip

View MOT17 annotations

Now let’s look at what the existing MOT17 annotations look like.

In the following image, we have a scene with a large number of cars and pedestrians on a street. The labels include both bounding box coordinates as well as unique IDs for each object, or in this case cars, being tracked.

Evaluate our labels

For demonstration purposes, we’ve labeled three vehicles in one of the videos and inserted a few labeling anomalies into the annotations. Although human labelers tend to be accurate, they’re subject to conditions like distraction and fatigue, which can affect label quality. If we use automated methods to identify annotator mistakes and send directed recommendations for frames and objects to fix, we can make the label auditing process more accurate and efficient. If a labeler only has to focus on a few frames instead of a deep review of the entire scene, they can drastically improve speed and reduce cost.

Analyze our tracking data

Let’s put our tracking data into a form that’s easier to analyze.

We use a function to take the output JSON from Ground Truth and turn our tracking output into a dataframe. We can use this to plot values and metrics that will help us understand how the object labels move through our frames. See the following code:

# generate dataframes
lab_frame_real = create_annot_frame(tlabels['tracking-annotations'])
lab_frame_real.head()
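
If you're not running the notebook, a simplified sketch of what a helper like create_annot_frame does is shown below; the exact keys of the Ground Truth tracking output are assumptions and may differ from your job's schema:

import pandas as pd

def create_annot_frame(tracking_annotations):
    # flatten per-frame annotations into one row per (frame, object)
    rows = []
    for frame_idx, frame in enumerate(tracking_annotations):
        for annot in frame['annotations']:
            rows.append({
                'frameid': frame_idx,
                'obj': annot['object-name'],
                'left': annot['left'],
                'top': annot['top'],
                'width': annot['width'],
                'height': annot['height'],
            })
    return pd.DataFrame(rows)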

Plot progression

Let’s start with some simple plots. The following plots illustrate how the coordinates of a given object progress through the frames of your video. Each bounding box has a left and top coordinate, representing the top-left point of the bounding box. We also have height and width values that let us determine the other three points of the box.

In the following plots, the blue lines represent the progression of our four values (top coordinate, left coordinate, width, and height) through the video frames and the orange lines represent a rolling average of the values from the previous five frames. Because a video is a sequence of frames, if we have a video that has five frames per second or more, the objects within the video (and the bounding boxes drawn around them) should have some amount of overlap between frames. In our video, we have vehicles driving at a normal pace so our plots should show a relatively smooth progression.

We can also plot the deviation between the rolling average and the actual values of bounding box coordinates. We’ll likely want to look at frames where the actual value deviates substantially from the rolling average.
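
The following sketch shows one way to produce such plots with a five-frame rolling average, assuming the dataframe columns (frameid, obj, left, top, width, height) built earlier; it is illustrative rather than the notebook's exact plotting code:

import matplotlib.pyplot as plt

def plot_progression(lab_frame, obj='Vehicle:1', window=5):
    sub = lab_frame[lab_frame.obj == obj].sort_values('frameid')
    fig, axes = plt.subplots(nrows=1, ncols=4, figsize=(20, 4))
    for ax, col in zip(axes, ['top', 'left', 'width', 'height']):
        rolling = sub[col].rolling(window).mean()              # rolling average over previous frames
        ax.plot(sub['frameid'], sub[col], label='actual')
        ax.plot(sub['frameid'], rolling, label=f'{window}-frame rolling avg')
        ax.plot(sub['frameid'], (sub[col] - rolling).abs(), label='deviation')
        ax.set_title(f'{obj} {col}')
        ax.legend()
    plt.show()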

Plot box sizes

Let’s combine the width and height values to look at how the size of the bounding box for a given object progresses through the scene. For Vehicle 1, we intentionally reduced the size of the bounding box on frame 139 and restored it on frame 141. We also removed a bounding box on frame 217. We can see both of these flaws reflected in our size progression plots.

Box size differential

Let’s now look at how the size of the box changes from frame to frame by plotting the actual size differential. This allows us to get a better idea of the magnitude of these changes. We can also normalize the magnitude of the size changes by dividing the size differentials by the sizes of the boxes. This lets us express the differential as a percentage change from the original size of the box. This makes it easier to set thresholds beyond which we can classify this frame as potentially problematic for this object bounding box. The following plots visualize both the absolute size differential and the size differential as a percentage. We can also add lines representing where the bounding box changed by more than 20% in size from one frame to the next.
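
A compact sketch of this computation, assuming the same dataframe columns as before, is shown below; the 20% threshold is configurable:

import numpy as np

def size_differential(lab_frame, obj='Vehicle:1', pct_thresh=0.20):
    sub = lab_frame[lab_frame.obj == obj].sort_values('frameid')
    size = (sub['width'] * sub['height']).to_numpy()
    diff = size[1:] - size[:-1]                                # absolute size change per frame
    pct = np.divide(diff, size[:-1],
                    out=np.zeros_like(diff, dtype=float),
                    where=size[:-1] != 0)                      # percentage change from previous size
    flagged = sub['frameid'].to_numpy()[1:][np.abs(pct) > pct_thresh]
    return diff, pct, flagged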

View the frames with the largest size differential

Now that we have the indexes for the frames with the largest size differential, we can view them in sequence. If we look at the following frames, we can see for Vehicle 1 we were able to identify frames where our labeler made a mistake. Frame 217 was flagged because there was a large difference between frame 216 and the subsequent frame, frame 217.

Rolling IoU

IoU is a commonly used evaluation metric for object detection. We calculate it by dividing the area of overlap between two bounding boxes by the area of union for two bounding boxes. Although it’s typically used to evaluate the accuracy of a predicted box against a ground truth box, we can use it to evaluate how much overlap a given bounding box has from one frame of a video to the next.

Because our frames differ, we don’t expect a given bounding box for a single object to have 100% overlap with the corresponding bounding box from the next frame. However, depending on the frames per second for the video, there often is only a small amount of change from one frame to the next because the time elapsed between frames is only a fraction of a second. For higher FPS video, we can expect a substantial amount of overlap between frames. The MOT17 videos are all shot at 25 FPS, so these videos qualify. Operating with this assumption, we can use IoU to identify outlier frames where we see substantial differences between a bounding box in one frame and the next. See the following code:

# calculate rolling intersection over union
import numpy as np
import matplotlib.pyplot as plt

def calc_frame_int_over_union(annot_frame, obj, i):
    """IoU between the bounding boxes of `obj` in frame i and frame i+1.
    Uses the bb_int_over_union IoU helper defined in the notebook."""
    lframe_len = max(annot_frame['frameid'])
    annot_frame = annot_frame[annot_frame.obj == obj]
    annot_frame.index = list(np.arange(len(annot_frame)))
    # one row of [left, top, width, height] per frame id
    coord_vec = np.zeros((lframe_len + 1, 4))
    coord_vec[annot_frame['frameid'].values, 0] = annot_frame['left']
    coord_vec[annot_frame['frameid'].values, 1] = annot_frame['top']
    coord_vec[annot_frame['frameid'].values, 2] = annot_frame['width']
    coord_vec[annot_frame['frameid'].values, 3] = annot_frame['height']
    # convert to [x1, y1, x2, y2] corner format for frames i and i+1
    boxA = [coord_vec[i, 0], coord_vec[i, 1], coord_vec[i, 0] + coord_vec[i, 2], coord_vec[i, 1] + coord_vec[i, 3]]
    boxB = [coord_vec[i+1, 0], coord_vec[i+1, 1], coord_vec[i+1, 0] + coord_vec[i+1, 2], coord_vec[i+1, 1] + coord_vec[i+1, 3]]
    return bb_int_over_union(boxA, boxB)

# create list of objects
objs = list(np.unique(label_frame.obj))

# iterate through our objects to get rolling IoU values for each
iou_dict = {}
for obj in objs:
    iou_vec = np.ones(len(np.unique(label_frame.frameid)))
    for i in label_frame[label_frame.obj == obj].frameid[:-1]:
        iou_vec[i] = calc_frame_int_over_union(label_frame, obj, i)
    iou_dict[obj] = iou_vec

# plot the rolling IoU for each object
fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(24, 8), sharey=True)
for k, obj in enumerate(objs[:3]):
    ax[k].set_title(f'Rolling IoU {obj}')
    ax[k].set_xlabel('frames')
    ax[k].set_ylabel('IoU')
    ax[k].plot(iou_dict[obj])

The following plots show our results:

Identify and visualize low overlap frames

Now that we have calculated our intersection over union for our objects, we can identify objects below an IoU threshold we set. Let’s say we want to identify frames where the bounding box for a given object has less than 50% overlap. We can use the following code:

## ID problem indices
iou_thresh = 0.5
vehicle = 1 # because index starts at 0, 0 -> vehicle:1, 1 -> vehicle:2, etc.
# use np.where to identify frames below our threshold.
inds = np.where(np.array(iou_dict[objs[vehicle]]) < iou_thresh)[0]
worst_ind = np.argmin(np.array(iou_dict[objs[vehicle]]))
print(objs[vehicle],'worst frame:', worst_ind)

Visualize low overlap frames

Now that we have identified our low overlap frames, let’s view them. We can see for Vehicle:2, there is an issue on frame 102, compared to frame 101.

The annotator made a mistake and the bounding box for Vehicle:2 does not go low enough and clearly needs to be extended.

Thankfully our IoU metric was able to identify this!

Embedding comparison

The two preceding methods work because they’re simple and are based on the reasonable assumption that objects in high FPS video don’t move too much from frame to frame. They can be considered more classical methods of comparison. Can we improve upon them? Let’s try something more experimental.

One deep learning method we can use to identify outliers is to generate embeddings for our bounding box crops with an image classification model like ResNet and compare these across frames. Convolutional neural network image classification models have a final fully connected layer using a softmax or scaling activation function that outputs probabilities. If we remove the final layer of our network, our predictions will instead be the image embedding, which is essentially the neural network’s internal representation of the image. If we isolate our objects by cropping our images, we can compare the representations of these objects across frames to see if we can identify any outliers.

We can use a ResNet18 model from Torch Hub that was trained on ImageNet. Because ImageNet is a very large and generic dataset, the network has learned information about images that allows it to classify them into different categories. While a neural network fine-tuned on vehicles would likely perform better, a network trained on a large dataset like ImageNet should have learned enough to give us some indication of whether images are similar.
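
A minimal sketch of this embedding setup is shown below; it mirrors the model loading used later in this post, although the preprocessing details are assumptions:

import torch
import torch.nn as nn

# load an ImageNet-pretrained ResNet18 and strip its final classification layer
model = torch.hub.load('pytorch/vision:v0.6.0', 'resnet18', pretrained=True)
model.eval()
embedder = nn.Sequential(*list(model.children())[:-1])

def embed_crop(crop_tensor):
    # crop_tensor: float tensor of shape (1, 3, 224, 224)
    with torch.no_grad():
        return embedder(crop_tensor).squeeze()  # 512-dimensional embedding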

The following code shows our crops:

def plot_crops(obj = 'Vehicle:1', start=0):
    fig, ax = plt.subplots(nrows=1, ncols=5, figsize=(20,12))
    for i,a in enumerate(ax):
        a.imshow(img_crops[i+start][obj])
        a.set_title(f'Frame {i+start}')
plot_crops(start=1)

The following image compares the crops in each frame:

Let’s compute the distance between our sequential embeddings for a given object:

import numpy as np
from scipy.spatial import distance

def compute_dist(img_embeds, dist_func=distance.euclidean, obj='Vehicle:1'):
    dists = []
    inds = []
    for i in img_embeds:
        # only compare frames where the object appears in both frame i-1 and frame i
        if (i > 0) & (obj in list(img_embeds[i].keys())):
            if obj in list(img_embeds[i-1].keys()):
                dist = dist_func(img_embeds[i-1][obj], img_embeds[i][obj])  # distance between consecutive embeddings
                dists.append(dist)
                inds.append(i)
    return dists, inds

obj = 'Vehicle:2'
dists, inds = compute_dist(img_embeds, obj=obj)

# look for distances that are 2+ standard deviations greater than the mean distance
dists = np.array(dists)
prob_frames = np.where(dists > (np.mean(dists) + np.std(dists) * 2))[0]
prob_inds = np.array(inds)[prob_frames]
print(prob_inds)
print('The frame with the greatest distance is frame:', inds[np.argmax(dists)])

Let’s look at the crops for our problematic frames. We can see we were able to catch the issue on frame 102 where the bounding box was off-center.

Combine the metrics

Now that we have explored several methods for identifying anomalous and potentially problematic frames, let’s combine them and identify all of those outlier frames (see the following code). Although we might have a few false positives, these tend to be areas with a lot of action that we might want our annotators to review regardless.

def get_problem_frames(lab_frame, flawed_labels, size_thresh=.25, iou_thresh=.4, embed=False, imgs=None, verbose=False, embed_std=2):
    """
    Function for identifying potentially problematic frames using bounding box size, rolling IoU, and optionally embedding comparison.
    """
    if embed:
        model = torch.hub.load('pytorch/vision:v0.6.0', 'resnet18', pretrained=True)
        model.eval()
        modules=list(model.children())[:-1]
        model=nn.Sequential(*modules)
        
    frame_res = {}
    for obj in list(np.unique(lab_frame.obj)):
        frame_res[obj] = {}
        lframe_len = max(lab_frame['frameid'])
        ann_subframe = lab_frame[lab_frame.obj==obj]
        size_vec = np.zeros(lframe_len+1)
        size_vec[ann_subframe['frameid'].values] = ann_subframe['height']*ann_subframe['width']
        size_diff = np.array(size_vec[:-1])- np.array(size_vec[1:])
        norm_size_diff = size_diff/np.array(size_vec[:-1])
        norm_size_diff[np.where(np.isnan(norm_size_diff))[0]] = 0
        norm_size_diff[np.where(np.isinf(norm_size_diff))[0]] = 0
        frame_res[obj]['size_diff'] = [int(x) for x in size_diff]
        # keep fractional values; casting to int would zero out most normalized diffs
        frame_res[obj]['norm_size_diff'] = [float(x) for x in norm_size_diff]
        try:
            problem_frames = [int(x) for x in np.where(np.abs(norm_size_diff)>size_thresh)[0]]
            if verbose:
                worst_frame = np.argmax(np.abs(norm_size_diff))
                print('Worst frame for', obj, 'is:', worst_frame)
        except:
            problem_frames = []
        frame_res[obj]['size_problem_frames'] = problem_frames
        iou_vec = np.ones(len(np.unique(lab_frame.frameid)))
        for i in lab_frame[lab_frame.obj==obj].frameid[:-1]:
            iou = calc_frame_int_over_union(lab_frame, obj, i)
            iou_vec[i] = iou
            
        frame_res[obj]['iou'] = iou_vec.tolist()
        inds = [int(x) for x in np.where(iou_vec<iou_thresh)[0]]
        frame_res[obj]['iou_problem_frames'] = inds
        
        if embed:
            img_crops = {}
            img_embeds = {}
            for j,img in tqdm(enumerate(imgs)):
                img_arr = np.array(img)
                img_embeds[j] = {}
                img_crops[j] = {}
                for i,annot in enumerate(flawed_labels['tracking-annotations'][j]['annotations']):
                    try:
                        crop = img_arr[annot['top']:(annot['top']+annot['height']),annot['left']:(annot['left']+annot['width']),:]                    
                        new_crop = np.array(Image.fromarray(crop).resize((224,224)))
                        img_crops[j][annot['object-name']] = new_crop
                        # convert HWC to NCHW by transposing channels (reshape alone would scramble pixel data)
                        new_crop = np.transpose(new_crop, (2, 0, 1))[np.newaxis, :]
                        torch_arr = torch.tensor(new_crop, dtype=torch.float)
                        with torch.no_grad():
                            emb = model(torch_arr)
                        img_embeds[j][annot['object-name']] = emb.squeeze()
                    except:
                        pass
                    
            dists, dist_inds = compute_dist(img_embeds, obj=obj)
            # look for distances that are 2+ standard deviations greater than the mean distance
            dists = np.array(dists)
            prob_frames = np.where(dists > (np.mean(dists) + np.std(dists) * embed_std))[0]
            frame_res[obj]['embed_prob_frames'] = prob_frames.tolist()
        
    return frame_res
    
# if you want to add in embedding comparison, set embed=True
num_images_to_validate = 300
embed = False
frame_res = get_problem_frames(label_frame, flawed_labels, size_thresh=.25, iou_thresh=.5, embed=embed, imgs=imgs[:num_images_to_validate])
        
prob_frame_dict = {}
all_prob_frames = []
for obj in frame_res:
    prob_frames = list(frame_res[obj]['size_problem_frames'])
    prob_frames.extend(list(frame_res[obj]['iou_problem_frames']))
    if embed:
        prob_frames.extend(list(frame_res[obj]['embed_prob_frames']))
    all_prob_frames.extend(prob_frames)
    
prob_frame_dict = [int(x) for x in np.unique(all_prob_frames)]
prob_frame_dict

Launch a directed audit job

Now that we’ve identified our problematic annotations, we can launch a new audit labeling job to review identified outlier frames. We can do this via the SageMaker console, but when we want to launch jobs in a more automated fashion, using the boto3 API is very helpful.

Generate manifests

SageMaker Ground Truth operates using manifests. When using a modality like image classification, a single image corresponds to a single entry in a manifest, and a given manifest contains paths for all of the images to be labeled. For videos, because we have multiple frames per video and can have multiple videos in a single manifest, the manifest is instead organized using a JSON sequence file for each video that contains all the paths for its frames. This allows a single manifest to contain multiple videos for a single job, as in the following code:

# create manifest
man_dict = {}
for vid in all_vids:
    source_ref = f"s3://{bucket}/tracking_manifests/{vid.split('/')[-1]}_seq.json"
    annot_labels = f"s3://{bucket}/tracking_manifests/SeqLabel.json"
    manifest = {
        "source-ref": source_ref,
        'Person':annot_labels, 
        "Person-metadata":{"class-map": {"1": "Pedestrian"},
                         "human-annotated": "yes",
                         "creation-date": "2020-05-25T12:53:54+0000",
                         "type": "groundtruth/video-object-tracking"}
    }
    man_dict[vid] = manifest
    
# save videos as individual jobs
for vid in all_vids:
    with open(f"tracking_manifests/{vid.split('/')[-1]}.manifest", 'w') as f:
        json.dump(man_dict[vid],f)
        
# put multiple videos in a single manifest, with each job as a line
# with open(f"/home/ec2-user/SageMaker/tracking_manifests/MOT17.manifest", 'w') as f:
#     for vid in all_vids:    
#         f.write(json.dumps(man_dict[vid]))
#         f.write('\n')
        
print('Example manifest: ', manifest)

The following is our manifest file:

Example manifest:  {'source-ref': 's3://smgt-qa-metrics-input-322552456788-us-west-2/tracking_manifests/MOT17-13-SDP_seq.json', 'Person': 's3://smgt-qa-metrics-input-322552456788-us-west-2/tracking_manifests/SeqLabel.json', 'Person-metadata': {'class-map': {'1': 'Vehicle'}, 'human-annotated': 'yes', 'creation-date': '2020-05-25T12:53:54+0000', 'type': 'groundtruth/video-object-tracking'}}

Launch jobs

We can use this template for launching labeling jobs (see the following code). For the purposes of this post, we already have labeled data, so this isn’t necessary, but if you want to label the data yourself, you can do so using a private workteam.

# generate jobs
import time

job_names = []
outputs = []
# for vid in all_vids:
LABELING_JOB_NAME = f"mot17-tracking-adjust-{int(time.time())}"
task = 'AdjustmentVideoObjectTracking'
job_names.append(LABELING_JOB_NAME)
INPUT_MANIFEST_S3_URI = f's3://{bucket}/tracking_manifests/MOT20-01.manifest'
createLabelingJob_request = {
  "LabelingJobName": LABELING_JOB_NAME,
  "HumanTaskConfig": {
    "AnnotationConsolidationConfig": {
      "AnnotationConsolidationLambdaArn": f"arn:aws:lambda:us-east-1:432418664414:function:ACS-{task}"
    }, # changed us-west-2 to us-east-1
    "MaxConcurrentTaskCount": 200,
    "NumberOfHumanWorkersPerDataObject": 1,
    "PreHumanTaskLambdaArn": f"arn:aws:lambda:us-east-1:432418664414:function:PRE-{task}",
    "TaskAvailabilityLifetimeInSeconds": 864000,
    "TaskDescription": f"Please draw boxes around vehicles, with a specific focus on the following frames {prob_frame_dict}",
    "TaskKeywords": [
      "Image Classification",
      "Labeling"
    ],
    "TaskTimeLimitInSeconds": 7200,
    "TaskTitle": LABELING_JOB_NAME,
    "UiConfig": {
      "HumanTaskUiArn": f'arn:aws:sagemaker:us-east-1:394669845002:human-task-ui/VideoObjectTracking'
    },
    "WorkteamArn": WORKTEAM_ARN
  },
  "InputConfig": {
    "DataAttributes": {
      "ContentClassifiers": [
        "FreeOfPersonallyIdentifiableInformation",
        "FreeOfAdultContent"
      ]
    },
    "DataSource": {
      "S3DataSource": {
        "ManifestS3Uri": INPUT_MANIFEST_S3_URI
      }
    }
  },
  "LabelAttributeName": "Person-ref",
  "LabelCategoryConfigS3Uri": LABEL_CATEGORIES_S3_URI,
  "OutputConfig": {
    "S3OutputPath": f"s3://{bucket}/gt_job_results"
  },
  "RoleArn": role,
  "StoppingConditions": {
    "MaxPercentageOfInputDatasetLabeled": 100
  }
}
print(createLabelingJob_request)
out = sagemaker_cl.create_labeling_job(**createLabelingJob_request)
outputs.append(out)
print(out)

Conclusion

In this post, we introduced how to measure the quality of sequential annotations, namely video multi-frame object tracking annotations, using statistical analysis and various quality metrics (IoU, rolling IoU, and embedding comparisons). In addition, we walked through how to flag frames that aren’t labeled properly using these quality metrics and send those frames for verification or audit jobs using SageMaker Ground Truth to generate a new version of the dataset with more accurate annotations. We can perform quality checks on annotations for video data using this approach, or similar approaches such as 3D IoU for 3D point cloud data, in an automated manner at scale, while reducing the number of frames that require human audit.

Try out the notebook and add your own quality metrics for different task types supported by SageMaker Ground Truth. With this process in place, you can generate high-quality datasets for a wide range of business use cases in a cost-effective manner without compromising the quality of annotations.

For more information about labeling with Ground Truth, see Easily perform bulk label quality assurance using Amazon SageMaker Ground Truth.

References

  1. https://en.wikipedia.org/wiki/Hausdorff_distance
  2. https://aws.amazon.com/blogs/machine-learning/easily-perform-bulk-label-quality-assurance-using-amazon-sagemaker-ground-truth/

About the Authors

Vidya Sagar Ravipati is a Deep Learning Architect at the Amazon ML Solutions Lab, where he leverages his vast experience in large-scale distributed systems and his passion for machine learning to help AWS customers across different industry verticals accelerate their AI and cloud adoption. Previously, he was a Machine Learning Engineer in Connectivity Services at Amazon who helped to build personalization and predictive maintenance platforms.

Isaac Privitera is a Machine Learning Specialist Solutions Architect and helps customers design and build enterprise-grade computer vision solutions on AWS. Isaac has a background in using machine learning and accelerated computing for computer vision and signals analysis. Isaac also enjoys cooking, hiking, and keeping up with the latest advancements in machine learning in his spare time.

Read More

Reconstructing thousands of particles in one go at the CERN LHC with TensorFlow

A guest post by Jan Kieseler from CERN, EP/CMG

Introduction

At large colliders such as the CERN LHC (Large Hadron Collider), highly energetic particle beams collide and thereby create massive, possibly yet unknown particles from the collision energy, following the well-known equation E = mc². Most of these newly created particles are not stable and decay to more stable particles almost immediately. Detecting these decay products and measuring their properties precisely is the key to understanding what happened during the high energy collision, and will possibly shed light on big questions such as the origin of dark matter.

Detecting and measuring particles

For this purpose, the collision interaction points are surrounded by large detectors covering as much as possible in all possible directions and energies of the decay products. These detectors are further split into sub-detectors, each collecting complementary information. The innermost detector, closest to the interaction point, is the tracker consisting of multiple layers. Similar to a camera, each layer can detect the spatial position at which a charged particle passed through it, providing access to its trajectory. Combined with a strong magnetic field, this trajectory gives access to the particle charge and the particle momentum.

While the tracker is aimed at measuring the trajectories only, minimising any further interaction with and scattering of the particles, the next sub-detector layer is aimed at stopping them entirely. By stopping the particles completely, these calorimeters can extract the initial particle energy, and can also detect neutral particles. The only particles that pass through these calorimeters are muons, which are identified by additional muon chambers that constitute the outermost detector shell and use the same detection principles as the tracker.

Layout of the CMS detector, showing different particle species interacting with the sub-detectors (Image credit: CERN).

Combining the information from all these sub-detectors to reconstruct the final particles is a challenging task, not only because we want to achieve the best physics performance, but also in terms of computing resources and person-power available to develop and tune the reconstruction algorithms. In particular for the High-Luminosity LHC, the extension of the CERN LHC, aiming to collect unprecedented amounts of data, these algorithms need to perform well given a collision rate of 40MHz and up to 200 simultaneous interactions in each collision, which result in up to about a million signals from all sub-detectors.

Even with triggers (a fast, step-wise filtering of interesting events) in place, the total data collected to disk still comprises several petabytes, making efficient algorithms a must at all stages.

Classic reconstruction algorithms in high energy physics heavily rely on factorisation of individual steps and a lot of domain (physics) knowledge. While they perform reasonably well, the idealistic assumptions that are needed to develop these algorithms limit the performance, such that machine-learning approaches are often used to refine the classically reconstructed particles, and make their way into more and more reconstruction chains. The machine learning approaches benefit from a very precise simulation of all detector components and physics processes, valid over several orders of magnitude, such that large sets of labelled data can be produced very easily in a short amount of time. This led to a rise of neural network based identification and regression algorithms, and to the inclusion of TensorFlow as the standard inference engine in the software framework of the Compact Muon Solenoid (CMS) experiment.

Machine-learning reconstruction approaches also come with the advantage that by construction they have to be automatically optimizable and need a loss function to train that quantifies the final reconstruction target. In contrast, classic approaches are often optimised without the need to define such an inclusive quantitative metric, and parameters are tuned by hand, involving many experts, and taking a lot of person-power that could be spent on developing new algorithms instead of tuning existing ones. Therefore, moving to differentiable machine-learning algorithms such as deep neural networks in general can also help use the human resources more efficiently.

However, extending machine-learning based algorithms to the first step of reconstructing the particles from hits – instead of just refining already reconstructed particles – comes with two main challenges: the data structure and phrasing reconstruction as a minimisation problem.

The detector data is highly irregular, due to the inclusion of multiple sub-detectors, each with its own geometry. But even within a sub-detector, such as the tracker, the geometry is designed based on physics aspects with a fine resolution close to the interaction point and more coarse further away. Furthermore, the individual tracker layers are not densely packed, but have a considerable amount of space between them, and in each event only a small fraction of sensors are actually active, changing the number of inputs from event to event. Therefore, neural networks that require a regular grid, such as convolutional neural network architectures, are – despite their good performance and highly optimised implementations – not applicable.

Graph neural networks can bridge this gap and, in principle, allow abstracting from the detector geometry. Recently, several graph neural network proposals from computer science have been studied in the context of refining already reconstructed particles in high energy physics. However, given the high input dimensionality of the data many of these proposals cannot be employed for reconstructing particles directly from hits, and custom solutions are needed. One example is GravNet that – by construction – reduces the resource requirements significantly while maintaining good physics performance by using sparse dynamic adjacency matrices and performing most operations without memory overhead.

This in particular becomes possible through TensorFlow which makes it easy to implement and load custom kernels into the graph and integrate custom analytic gradients for fused operations. Only the combination of these custom kernels and the network structure allows loading a full physics event into the GPU memory, training the network on it, and performing the inference.

GravNet layer architecture (from left to right): point features are projected into a feature space FLR, and a low dimensional coordinate space S; k nearest neighbours are determined in S; mean and maximum of distance weighted neighbour features are accumulated; accumulated features are combined with original features.
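
To make the data flow concrete, the following is a heavily simplified, GravNet-style layer sketched in TensorFlow/Keras; the dimensions, distance weighting, and other details are illustrative assumptions, not the published implementation (which relies on custom kernels for memory efficiency):

import tensorflow as tf

class GravNetLike(tf.keras.layers.Layer):
    def __init__(self, n_coords=4, n_features=22, k=10, n_out=48):
        super().__init__()
        self.k = k
        self.coord_proj = tf.keras.layers.Dense(n_coords)   # learned coordinate space S
        self.feat_proj = tf.keras.layers.Dense(n_features)  # learned feature space FLR
        self.out_proj = tf.keras.layers.Dense(n_out, activation='tanh')

    def call(self, x):                       # x: (batch, n_points, n_in)
        s = self.coord_proj(x)               # (B, N, n_coords)
        f = self.feat_proj(x)                # (B, N, n_features)
        # pairwise squared distances in S
        d2 = tf.reduce_sum((s[:, :, None, :] - s[:, None, :, :]) ** 2, axis=-1)
        # k nearest neighbours in S (each point trivially includes itself)
        neg_d2, idx = tf.math.top_k(-d2, k=self.k)           # (B, N, k)
        w = tf.exp(neg_d2)[..., None]                         # distance-decaying weights
        neigh = tf.gather(f, idx, batch_dims=1)               # (B, N, k, n_features)
        weighted = neigh * w
        # accumulate mean and max of distance-weighted neighbour features
        agg = tf.concat([tf.reduce_mean(weighted, axis=2),
                         tf.reduce_max(weighted, axis=2)], axis=-1)
        # combine accumulated features with the original input features
        return self.out_proj(tf.concat([x, agg], axis=-1))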

Since many of the reconstruction tasks, even the refinement of already reconstructed particles, rely on an unknown number of inputs, the recent addition and support of ragged data structures in TensorFlow opens up a lot of new possibilities. While the integration is not yet sufficient to build full neural network architectures, future full support of ragged data structures would be a significant step forward for integrating TensorFlow even deeper into the reconstruction algorithms and would make some custom kernels obsolete.

The second challenge when reconstructing particles directly from hits using deep neural networks is to train the network to predict an unknown number of particles from an unknown number of inputs. There is a plethora of algorithms and training methods for detecting dense objects in dense data, such as images, but while the requirement of the dense data can be loosened in some cases, most of these algorithms still rely on the objects being dense or having a clear boundary, making it possible to exploit anchor boxes or central points of the object. Particles in the detector, however, often overlap to a large degree, and their sparsity does not allow defining central points nor clear boundaries. A solution to this problem is Object Condensation, where object properties are condensed in at least one representative condensation point per object that can be chosen freely by the network through a high confidence score.

Illustration of points being clustered around their objects' condensation points by the potential functions.

To resolve ambiguities, the other points are clustered around the object they belong to using potential functions (illustrated above). However, these potentials scale with the confidence score in a tunable manner, such that the amount of segmentation the network should perform is adjustable up to the point where all points except the condensation points can be left free floating in case we are only interested in the final object properties.
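
As a rough, non-authoritative illustration of these potentials, the sketch below computes an attractive term for points belonging to an object and a repulsive term for all other points, both scaled by a charge derived from the condensation point's confidence score; the exact functional forms here are assumptions inspired by the Object Condensation paper:

import tensorflow as tf

def condensation_potentials(coords, beta, alpha_idx, is_same_object):
    # coords: (N, d) clustering-space coordinates, beta: (N,) confidence scores,
    # alpha_idx: index of the condensation point, is_same_object: (N,) boolean mask
    q = tf.math.atanh(beta) ** 2 + 0.1                        # charge grows with confidence
    d = tf.norm(coords - coords[alpha_idx], axis=-1)
    attractive = tf.where(is_same_object, d ** 2 * q[alpha_idx], 0.0)   # pull own points in
    repulsive = tf.where(is_same_object, 0.0,
                         tf.nn.relu(1.0 - d) * q[alpha_idx])            # push other points out
    return tf.reduce_mean(attractive), tf.reduce_mean(repulsive)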

Some parts of this algorithm are very similar to the method proposed in a previous paper, but the goal is entirely different. While the previous approach constitutes a very powerful segmentation algorithm moving pixels to cluster objects in an image towards a central point, the condensation points here directly carry object properties, and through the choice of the potential functions the clustering space can be completely detached from the input space. The latter has big implications for the applicability to the sparse detector data with overlapping particles, but also does not distinguish conceptually between “stuff” and “things”, providing a new perspective on one-shot panoptic segmentation.

But coming back to particle reconstruction, as shown in the corresponding paper, Object Condensation can outperform classic particle reconstruction algorithms even on simplified problems that are quite close to the idealistic assumptions of the classic algorithm. Therefore, it provides an alternative to classic reconstruction approaches directly from hits.

Based on this promising study, there is work ongoing to extend the approach to simulated events in the High Granularity Calorimeter, a planned new sub-detector of the CMS experiment at CERN, with about 2 million sensors, covering the particularly challenging forward region close to the incident beams, where most particles are produced. Compared to the published proof-of-concept, this realistic environment is much more challenging and requires optimising the network structures and adding even more custom TensorFlow operations, which can be found in the development repository on GitHub, using DeepJetCore as an interface to data formats commonly used in high-energy physics. Right now, there is a particular focus on implementing fast k-nearest-neighbour algorithms that can handle the large input dimensionality, a crucial building block for GravNet; ragged implementations of other operations, as well as implementations of the Object Condensation loss, can also be found there.

Conclusion

To conclude, the application of deep neural networks to reconstruction tasks is exhibiting a shift from refining classically reconstructed particles to reconstructing the particles and their properties directly, in an optimizable and highly parallelizable way to meet the person-power and computing challenges in the future. This development will give rise to more custom implementations, and will be using more of the bleeding edge features in TensorFlow and tf.keras such as ragged data structures, so that a closer contact between high-energy physics reconstruction developers and TensorFlow developers is foreseeable.

Acknowledgements

I would like to acknowledge the support of Thiru Palanisamy and Josh Gordon at Google for their help with the blog post collaboration and with providing active feedback.

Read More