Building a custom Angular application for labeling jobs with Amazon SageMaker Ground Truth

As a data scientist attempting to solve a problem using supervised learning, you usually need a high-quality labeled dataset before you start building your model. Amazon SageMaker Ground Truth makes building datasets for a wide range of tasks, like text classification and object detection, easier and more accessible to everyone.

Ground Truth also helps you build datasets for custom user-defined tasks that let you annotate anything. This capability is powered by the following:

  • Custom AWS Lambda functions that can be triggered between labeling steps. This allows you to have custom logic pre-labeling like filtering examples or augmenting them with metadata using other services like Amazon Translate or Amazon Rekognition, and post-labeling logic for label consolidation or quality control.
  • Custom web templates that let you build unique user interfaces using HTML and JavaScript that integrate with Ground Truth workflows. These templates are easy to build with Crowd HTML Elements, which are a set of common UI elements used for text, video, and audio labeling jobs that you can arrange like blocks in your custom template.
  • Availability of a large set of skilled and specialized workforces in the AWS Marketplace and in Amazon Mechanical Turk if you need to augment your private teams of subject matter experts. Vetted partners in the AWS Marketplace cover numerous languages as well as specific skills in video and image annotations that fit different industry needs (like medical labeling).

For complex labeling tasks, such as complex taxonomy classification, extreme multi-class classifications, or autonomous driving labeling tasks, you may need to build a more complex front-end application for your labeling workforce. Front-end frameworks like Angular are helpful in these cases because they bring useful design patterns like model-view-controller (MVC), which makes your codebase more robust and maintainable for a larger team composed of UX/UI designers and software developers.

This post walks you through using Angular and Angular Elements to create fully customizable solutions that work nicely with Ground Truth. This walkthrough assumes that you’re familiar with running a custom labeling job with Ground Truth and Crowd HTML Elements. For more information, see Build a custom data labeling workflow with Amazon SageMaker Ground Truth.

The approach described in this post also works with Amazon Augmented AI (Amazon A2I), which makes it easy to build the workflows required for human review of machine learning predictions. This is possible because Amazon A2I uses Crowd HTML Elements to create custom worker templates. For more information, see Create Custom Worker Templates.

Building a custom UI for complex taxonomy classification

If you manage a large supply chain and work with many types of suppliers, as global restaurant chains and automotive manufacturers do, you likely receive invoices in different formats and languages. To keep track of your operations and drive financial efficiencies, you need teams behind the scenes to map invoices and receipts to large categories of products and organize them in hierarchical taxonomies.

The following diagram illustrates a hierarchical taxonomy of computer components.

The following diagram illustrates a hierarchical taxonomy of types of food.

Hierarchical taxonomies can have thousands of categories at their leaf level. Such examples can include web directories (the Yahoo! Directory or the Open Directory Project), library classification schemes (Dewey Decimal or Library of Congress), or the classification schemes used in natural science, legal, or medical applications.
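As a rough illustration of what such a taxonomy looks like as data, the following TypeScript sketch models it as a tree and lists every leaf category with its full path. The TaxonomyNode shape, the leafPaths helper, and the sample codes and labels are all assumptions made for this example; they are not taken from Ground Truth or from the application described later.

```typescript
// Hypothetical node shape; the code and label values below are made up.
interface TaxonomyNode {
  code: string;
  label: string;
  children?: TaxonomyNode[];
}

interface LeafPath {
  code: string;
  path: string[];
}

// Depth-first walk that returns every leaf category with its full label path.
function leafPaths(node: TaxonomyNode, prefix: string[] = []): LeafPath[] {
  const path = prefix.concat(node.label);
  if (!node.children || node.children.length === 0) {
    return [{ code: node.code, path: path }];
  }
  let leaves: LeafPath[] = [];
  for (const child of node.children) {
    leaves = leaves.concat(leafPaths(child, path));
  }
  return leaves;
}

// Tiny sample taxonomy (illustrative only).
const food: TaxonomyNode = {
  code: "FOOD",
  label: "Food",
  children: [
    {
      code: "MEAT",
      label: "Meats",
      children: [
        { code: "MEAT-BEEF", label: "Beef" },
        { code: "MEAT-POULTRY", label: "Poultry" },
      ],
    },
    { code: "DAIRY", label: "Dairy" },
  ],
};

for (const leaf of leafPaths(food)) {
  console.log(leaf.code + ": " + leaf.path.join(" > "));
}
```

With thousands of leaves, a flat list like this is what a labeling UI typically has to search or group, which is why navigation aids matter.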

What if a natural language processing (NLP) model could help you automatically tag every invoice to the proper category? What if text labeling tools could extract categories from invoices?

Accurate classification over large sets of closely related classes is inherently difficult, but it all starts with constructing a high-quality dataset in the most cost-efficient manner.

Taxonomy labeling with Angular Elements

For the following use case, you are one of the biggest fast food chains operating and sourcing materials across the world. To build a dataset for your NLP model, you came up with a single-page web app based on UX research that helps your workforce read an invoice description and select the corresponding category in the taxonomy. See the following screenshot.

This implementation makes use of Angular Material tabs and a filter box that makes navigating the categories easy. It also displays an English translation of your invoice description so that the workers can label invoices from across the world. Moreover, because it’s built on a framework like Angular, you can improve it down the line with more elements, such as drop-downs for the higher levels of the taxonomy or dynamic content like images or videos based on third-party APIs.
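To give a sense of what a filter box does behind the scenes, here is a minimal, framework-free sketch of case-insensitive filtering over category rows. The CategoryRow shape is an assumption: CODE mirrors the field read from the submit event later in this post, while NAME and the sample rows are invented for illustration. The actual application may filter differently.

```typescript
// Hypothetical row shape: CODE matches the field used by the submit event
// later in this post; NAME is an assumed display field.
interface CategoryRow {
  CODE: string;
  NAME: string;
}

// Keep the rows whose name or code contains the query, ignoring case and
// surrounding whitespace. An empty query keeps every row.
function filterCategories(rows: CategoryRow[], query: string): CategoryRow[] {
  const needle = query.trim().toLowerCase();
  if (needle === "") {
    return rows;
  }
  return rows.filter(function (row) {
    return (
      row.NAME.toLowerCase().indexOf(needle) !== -1 ||
      row.CODE.toLowerCase().indexOf(needle) !== -1
    );
  });
}

// Made-up sample rows for one tab.
const meats: CategoryRow[] = [
  { CODE: "MEAT-BEEF", NAME: "Beef steak" },
  { CODE: "MEAT-POULTRY", NAME: "Chicken breast" },
  { CODE: "MEAT-PORK", NAME: "Pork ribs" },
];

console.log(filterCategories(meats, "  BEEF ").map((row) => row.CODE));
```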

For more information about this application, see the GitHub repo.

The application is built using Angular Elements, which creates Angular components packaged as custom elements (also called web components), a web standard for defining new HTML elements in a framework-agnostic way. This enables you to integrate smoothly with Crowd HTML Elements later on.

Angular Elements inputs and outputs

In this use case, your Angular component expects two inputs: an invoice description and an invoice translation. These are passed to it as attributes on the <ng-home> tag (the custom element that designates the root element of the application). The values are then captured by the @Input() decorators defined in the Angular component class in src/app/home.ts. See the following code:

<ng-home source='10牛ステーキ-20パッケージ-ブランドX' translation='10 beef steak - 20 packages - brand X' id="home">loading</ng-home> 

export class Home implements OnInit {

  @Input('source') invoice = '';  // aliased so the source attribute on <ng-home> feeds this property
  @Input() translation = '';
  
  ...

The values are rendered using interpolation in the placeholders {{ invoice }} and {{ translation }} in the Angular view in src/app/home.html. See the following code:

<!-- Invoice Description -->
<div class="card" >
    <div class="card-header">
        <h3>Invoice Description</h3>
    </div>
    <div>
        <p id="step1">
        <span>Invoice Description: <br />
        <b>{{ invoice }}</b></span>
        </p>
        <p style='font-size: small; color: gray;' id="step2">
        <span>English Translation: <br /> {{ translation }}</span>
        </p>
    </div>
</div>

The following screenshot shows the Meats tab on the Food Categories page.

When you choose a category and choose Submit, the Angular component should also broadcast a JavaScript event containing the category ID to its parent DOM element. This is achieved using the @Output() decorator in the Angular component class in src/app/home.ts. See the following code:

<button mat-button color="primary" (click)="onSubmit()" id="submitButton">Submit</button>

<table>
    ...
    <tr mat-row *matRowDef="let row; columns: displayedColumns;"
        (click)="selectRow(row)" [ngClass]="{ 'highlight': row === selectedRow }">
    </tr>
</table>

@Output('rowselected') rowselected = new EventEmitter<any>();

// Called when the user clicks a row in the table ("selecting" a category)
selectRow(row) {
  this.selectedRow = row;
}

// Called when the user clicks the Submit button
onSubmit() {
  this.rowselected.emit(this.selectedRow);
}
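One thing the snippet above leaves open is what happens if a worker chooses Submit before selecting a row: selectedRow may still be unset, and a listener that reads the category code from the event would then fail. The following framework-free sketch shows one possible guard. SelectionModel and its emit callback are hypothetical stand-ins for the component and its EventEmitter, not the application's actual code.

```typescript
// Hypothetical row shape; CODE is the field the template script reads later.
interface CategoryRow {
  CODE: string;
}

// Stand-in for the component's selection logic. The emit callback plays the
// role of the EventEmitter's emit() in the real component.
class SelectionModel {
  private selectedRow: CategoryRow | null = null;
  private emit: (row: CategoryRow) => void;

  constructor(emit: (row: CategoryRow) => void) {
    this.emit = emit;
  }

  // Remember the row the worker clicked.
  selectRow(row: CategoryRow): void {
    this.selectedRow = row;
  }

  // Emit only when a row has been selected; report whether anything was sent.
  onSubmit(): boolean {
    if (this.selectedRow === null) {
      return false;
    }
    this.emit(this.selectedRow);
    return true;
  }
}

const emitted: CategoryRow[] = [];
const model = new SelectionModel(function (row) {
  emitted.push(row);
});

console.log(model.onSubmit()); // nothing selected yet, so nothing is emitted
model.selectRow({ CODE: "MEAT-BEEF" });
console.log(model.onSubmit(), emitted);
```

In the component itself, the same check could sit at the top of onSubmit(), or the Submit button could be disabled until a row is selected.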

Angular integration with Crowd HTML Elements

Communication between Angular Elements and Crowd HTML Elements happens through the mechanism described in the preceding section.

Following the steps described in Build a custom data labeling workflow with Amazon SageMaker Ground Truth, you can adapt how to pass the text to annotate and how to catch the event broadcast from Angular Elements to create your custom template.
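The text and its translation reach the template through the pre-annotation Lambda function, whose response carries a taskInput object that Liquid exposes as task.input. The following sketch shows the core of such a function as a plain TypeScript function. It assumes the event provides the text under dataObject.source and that the response uses the taskInput and isHumanAnnotationRequired fields; the translate parameter and the fakeTranslate stub are hypothetical placeholders for a call to a service such as Amazon Translate, and the source and translatedDesc names are chosen to match the template in this post.

```typescript
// Only the part of the pre-annotation event that this sketch reads.
interface PreAnnotationEvent {
  dataObject: { source?: string };
}

interface PreAnnotationResponse {
  taskInput: { source: string; translatedDesc: string };
  isHumanAnnotationRequired: string;
}

// Hypothetical translator signature; a real one would likely be asynchronous.
type Translate = (text: string) => string;

// Build the object that the template reads as task.input. Items with no text
// are flagged so that they are not sent to workers.
function preAnnotation(event: PreAnnotationEvent, translate: Translate): PreAnnotationResponse {
  const source = (event.dataObject.source || "").trim();
  return {
    taskInput: {
      source: source,
      translatedDesc: source === "" ? "" : translate(source),
    },
    isHumanAnnotationRequired: source === "" ? "false" : "true",
  };
}

// Stub translator used only for this example.
const fakeTranslate: Translate = function (text) {
  return "[en] " + text;
};

const response = preAnnotation(
  { dataObject: { source: "10牛ステーキ-20パッケージ-ブランドX" } },
  fakeTranslate
);
console.log(JSON.stringify(response));
```

A real handler would wrap this in the Lambda entry point and await the translation call; the synchronous form here only keeps the sketch short.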

The following code shows the full Liquid HTML template to use when you create your job. This file should also be the index.html root file of the Angular app under the src/ folder. (Make sure to use the index.html file under the dist folder, which has the minified .js files injected into it, with the right Amazon Simple Storage Service (Amazon S3) path to host your app.)

<!doctype html>
<html lang="en">
  <head>
    <script src="https://assets.crowd.aws/crowd-html-elements.js"></script>
  </head>
  <body>

    <crowd-form style="display: none;">
        <input name="annotations" id="annotations" type="hidden">
        <input name="timeElapsed" id="timeElapsed" type="hidden">
         <!-- Prevent crowd-form from creating its own button -->
        <crowd-button form-action="submit" style="display: none;"></crowd-button>
    </crowd-form>

    <div class="mat-app-background basic-container">
      <!-- Dev Mode to test the Angular Element -->
      <!-- <ng-home source='10牛ステーキ-20パッケージ-ブランドX' translation='10 beef steak - 20 packages - brand X' id="home">loading</ng-home> -->
      <ng-home source='{{ task.input.source }}' translation='{{ task.input.translatedDesc }}'>loading</ng-home>
    </div>

    <script src="<your-s3-bucket-angular-app>/runtime-es2015.js" type="module"></script>
    <script src="<your-s3-bucket-angular-app>/runtime-es5.js" nomodule defer></script>
    <script src="<your-s3-bucket-angular-app>/polyfills-es5.js" nomodule defer></script>
    <script src="<your-s3-bucket-angular-app>/polyfills-es2015.js" type="module"></script>
    <script src="<your-s3-bucket-angular-app>/styles-es2015.js" type="module"></script>
    <script src="<your-s3-bucket-angular-app>/styles-es5.js" nomodule defer></script>
    <script src="<your-s3-bucket-angular-app>/vendor-es2015.js" type="module"></script>
    <script src="<your-s3-bucket-angular-app>/vendor-es5.js" nomodule defer></script>
    <script src="<your-s3-bucket-angular-app>/main-es2015.js" type="module"></script>
    <script src="<your-s3-bucket-angular-app>/main-es5.js" nomodule defer></script>
</body>
</html>

<script>

  document.addEventListener("DOMContentLoaded", function(event) {
    // Counter
    var enterDate = new Date();
    function secondsSinceEnter()
    {
      return (new Date() - enterDate) / 1000;
    }

    // GT Form Submitting
    document.querySelector('ng-home').addEventListener('rowselected', (event) => {
      // alert(event.detail.CODE);
      document.getElementById('annotations').value = event.detail.CODE;
      document.getElementById('timeElapsed').value = secondsSinceEnter();
      document.querySelector('crowd-form').submit();
    });

  });

</script>
<style>
  .body {
    background-color: #fafafa;
  }

  .header {
    background: #673ab7;
      color: #fff;
      padding: 0 16px;
      margin: 20px 20px 0px 20px;
      padding: 20px;
  }

  .cards {
    display: grid;
    grid-template-columns: 30% auto;
    grid-auto-rows: auto;
    grid-gap: 1rem;
    margin: 20px 20px 0px 20px;
  }

  .card {
    box-shadow: 0 2px 1px -1px rgba(0,0,0,.2), 0 1px 1px 0 rgba(0,0,0,.14), 0 1px 3px 0 rgba(0,0,0,.12);
    transition: box-shadow 280ms cubic-bezier(.4,0,.2,1);
    display: block;
    position: relative;
    padding: 16px;
    border-radius: 4px;
    /* margin: 20px 0px 0px 20px; */
    border: 2px solid #e7e7e7;
    border-radius: 4px;
  }

  .highlight-step {
    background-color: #2515424a;
    margin: 0px -15px 0px -15px;
    padding: 15px;
  }
</style>

Creating the template

To create the preceding template, complete the following steps:

  1. Add the crowd-html-elements.js script at the top of the template so you can use Crowd HTML Elements:
    <script src="https://assets.crowd.aws/crowd-html-elements.js"></script>

  2. Inject the text to annotate and the associated metadata coming from the pre-processing Lambda function to the user interface using the Liquid templating language directly in root element <ng-home>:
    <ng-home source='{{ task.input.source }}' translation='{{ task.input.translatedDesc }}' id="home">loading</ng-home>

  3. Use the <crowd-form /> element, which submits the annotations to Ground Truth. The element is hidden because the submission happens in the background. See the following code:
    <crowd-form style="display: none;">
            <input name="annotations" id="annotations" type="hidden">
            <input name="timeElapsed" id="timeElapsed" type="hidden">
             <!-- Prevent crowd-form from creating its own button -->
            <crowd-button form-action="submit" style="display: none;"></crowd-button>
    </crowd-form>
    

  4. Instead of using Crowd HTML Elements to submit the annotation, include a small script to integrate the Angular Element with <crowd-form />:
    document.addEventListener("DOMContentLoaded", function(event) {

        // Record when the worker opened the task
        var enterDate = new Date();
        function secondsSinceEnter()
        {
          return (new Date() - enterDate) / 1000;
        }

        // Copy the selected category into the hidden form and submit it
        document.querySelector('ng-home').addEventListener('rowselected', (event) => {
          document.getElementById('annotations').value = event.detail.CODE;
          document.getElementById('timeElapsed').value = secondsSinceEnter();
          document.querySelector('crowd-form').submit();
        });

    });
    

For this use case, I’m also keeping a counter to monitor the time it takes a worker to complete the annotation.
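The mapping from the event payload to the two hidden form fields can also be written as a small pure function, which makes it easy to check without a browser. This is only a sketch: buildFormValues and its parameter names are hypothetical, and the timestamps are passed in explicitly (in milliseconds) so that the result is deterministic.

```typescript
// Payload carried in event.detail by the rowselected event.
interface RowSelectedDetail {
  CODE: string;
}

// Values written to the hidden inputs of the crowd-form.
interface FormValues {
  annotations: string;
  timeElapsed: string;
}

// Map the selected category and the two timestamps (in milliseconds) to the
// strings stored in the hidden inputs.
function buildFormValues(detail: RowSelectedDetail, enterMs: number, nowMs: number): FormValues {
  const seconds = (nowMs - enterMs) / 1000;
  return { annotations: detail.CODE, timeElapsed: String(seconds) };
}

// Example: the worker opened the task at t = 1,000 ms and submitted at t = 13,500 ms.
const values = buildFormValues({ CODE: "MEAT-BEEF" }, 1000, 13500);
console.log(values);
```

In the template script, the two timestamps would come from the Date captured when the page loaded and a new Date at submission time.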

The following diagram illustrates the data flow between each element.

Conclusion

This post showed how to build a custom labeling UI with Angular and Ground Truth. The solution handles communication between the different scopes in the custom template that you provide when you create the labeling job. The ability to use a custom front-end framework like Angular enables you to create modern web applications that serve your exact needs when tapping into public, private, or vendor labeling workforces.

For more information about hierarchical taxonomies in Ground Truth, see Creating hierarchical label taxonomies using Amazon SageMaker Ground Truth.

If you have any comments or questions about this post, please use the comments section. Happy labeling!


About the Authors

Yassine Landa is a Data Scientist at AWS. He holds an undergraduate degree in Math and Physics, and master’s degrees from French universities in Computer Science and Data Science, Web Intelligence, and Environment Engineering. He is passionate about building machine learning and artificial intelligence products for customers, and has won multiple awards for machine learning products he has built with tech startups and as a startup founder.


2019 Q4 recipients of AWS Machine Learning Research Awards

The AWS Machine Learning Research Awards (MLRA) aims to advance machine learning (ML) by funding innovative research and open-source projects, training students, and providing researchers with access to the latest technology. Since 2017, MLRA has supported over 180 research projects from 73 schools and research institutes in 13 countries, with topics such as ML algorithms, computer vision, natural language processing, medical research, neuroscience, social science, physics, and robotics.

On February 18, 2020, we announced the winners of MLRA’s 2019 Q2/Q3 call-for-proposal cycles. We’re now pleased to announce 28 new recipients of MLRA’s 2019 Q4 call-for-proposal cycle. The MLRA recipients represent 26 universities in six countries. The funded projects aim to develop open-source tools and research that benefit the ML community at large, or create impactful research using AWS ML solutions, such as Amazon SageMaker, AWS AI Services, and Apache MXNet on AWS. The following are the 2019 Q4 award recipients:

Recipient | University | Research Title
Anasse Bari | New York University | Predicting the 2020 Elections Using Big Data, Analyzing What the World Wants Using Twitter and Teaching Next Generation of Thinkers How to Apply AI for Social Good
Andrew Gordon Wilson | New York University | Scalable Numerical Methods and Probabilistic Deep Learning with Applications to AutoML
Bo Li | University of Illinois at Urbana-Champaign | Trustworthy Machine Learning as Services via Robust AutoML and Knowledge Enhanced Logic Inference
Dawn Song | University of California, Berkeley | Protecting the Public Against AI-Generated Fakes
Dimosthenis Karatzas | Universitat Autónoma de Barcelona | Document Visual Question Answer (DocVQA) for Large-Scale Document Collections
Dit-Yan Yeung | Hong Kong University of Science and Technology | Temporally Misaligned Spatiotemporal Sequence Modeling
Lantao Liu | Indiana University Bloomington | Environment-Adaptive Sensing and Modeling using Autonomous Robots
Leonidas Guibas | Stanford University | Learning Canonical Spaces for Object-Centric 3D Perception
Maryam Rahnemoonfar | University of Maryland, Baltimore | Combining Model-Based and Data Driven Approaches to Study Climate Change via Amazon SageMaker
Mi Zhang | Michigan State University | DA-NAS: An AutoML Framework for Joint Data Augmentation and Neural Architecture Search
Michael P. Kelly | Washington University | Web-Based Machine Learning for Surgeon Benchmarking in Pediatric Spine Surgery
Ming Zhao | Arizona State University | Enabling Deep Learning across Edge Devices and Cloud Resources
Nianwen Xue | Brandeis University | AMR2KB: Construct a High-Quality Knowledge by Parsing Meaning Representations
Nicholas Chia | Mayo Clinic | Massively-Scaled Inverse Reinforcement Learning Approach for Reconstructing the Mutational History of Colorectal Cancer
Oswald Lanz | Fondazione Bruno Kessler | Structured Representation Learning for Video Recognition and Question Answering
Pierre Gentine | Columbia University | Learning Fires
Pratik Chaudhari | University of Pennsylvania | Offline and Off-Policy Reinforcement Learning
Pulkit Agrawal | Massachusetts Institute of Technology | Curiosity Baselines for the Reinforcement Learning Community
Quanquan Gu | University of California, Los Angeles | Towards Provably Efficient Deep Reinforcement Learning
Shayok Chakraborty | Florida State University | Active Learning with Imperfect Oracles
Soheil Feizi | University of Maryland, College Park | Explainable Deep Learning: Accuracy, Robustness and Fairness
Spyros Makradakis | University of Nicosia | Clustered Ensemble of Specialist Amazon GluonTS Models for Time Series Forecasting
Xin Jin | Johns Hopkins University | Making Sense of Network Performance for Distributed Machine Learning
Xuan (Sharon) Di | Columbia University | Multi-Autonomous Vehicle Driving Policy Learning for Efficient and Safe Traffic
Yi Yang | University of Technology Sydney | Efficient Video Analysis with Limited Supervision
Yun Raymond Fu | Northeastern University | Generative Feature Transformation for Multi-Viewed Domain Adaptation
Zhangyang (Atlas) Wang | Texas A&M University | Mobile-Captured Wound Image Analysis and Dynamic Modeling for Post-Discharge Monitoring of Surgical Site Infection
Zhi-Li Zhang | University of Minnesota | Universal Graph Embedding Neural Networks for Learning Graph-Structured Data

Congratulations to all MLRA recipients! We look forward to supporting your research.

For more information about MLRA, see AWS Machine Learning Research Awards or send an email to aws-ml-research-awards@amazon.com.


About the Author

Seo Yeon Shin is a program manager for the AWS AI Academic Programs.
