Store output in custom Amazon S3 bucket and encrypt using AWS KMS for multi-page document processing with Amazon Textract

Amazon Textract is a fully managed machine learning (ML) service that makes it easy to process documents at scale by automatically extracting printed text, handwriting, and other data from virtually any type of document. Amazon Textract goes beyond simple optical character recognition (OCR) to also identify the contents of fields in forms and information stored in tables. This enables businesses across many industries, including financial, medical, legal, and real estate, to easily process large numbers of documents for different business operations. Healthcare providers, for example, can use Amazon Textract to extract patient information from an insurance claim or values from a table in a scanned medical chart without requiring customization or human intervention. The blog post Automatically extract text and structured data from documents with Amazon Textract shows how to use Amazon Textract to automatically extract text and data from scanned documents without any ML experience.

Amazon Textract provides both synchronous and asynchronous API actions to extract document text and analyze the document text data. You can use synchronous APIs for single-page documents and low latency use cases such as mobile capture. Asynchronous APIs can process single-page or multi-page documents such as PDF documents with thousands of pages.

In this post, we show how to control the output location and the AWS Key Management Service (AWS KMS) key used to encrypt the output data when you use the Amazon Textract asynchronous API.

Amazon Textract asynchronous API

Amazon Textract provides asynchronous APIs to extract text and structured data from single-page documents (JPEG, PNG, or PDF) or multi-page PDF documents. Processing documents asynchronously allows your application to complete other tasks while it waits for the process to complete. You can use StartDocumentTextDetection and GetDocumentTextDetection to detect lines and words in a document, or use StartDocumentAnalysis and GetDocumentAnalysis to detect lines, words, forms, and table data from a document.

The following diagram shows the workflow of an asynchronous API action. We use AWS Lambda as an example of the compute environment calling Amazon Textract, but the general concept applies to other compute environments as well.

  1. You start by calling the StartDocumentTextDetection or StartDocumentAnalysis API with an Amazon Simple Storage Service (Amazon S3) object location that you want to process, and a few additional parameters.
  2. Amazon Textract gets the document from the S3 bucket and starts a job to process the document.
  3. As the document is processed, Amazon Textract internally saves and encrypts the inference results and notifies you using an Amazon Simple Notification Service (Amazon SNS) topic.
  4. You can then call the corresponding GetDocumentTextDetection or GetDocumentAnalysis API to get the results in JSON format, as illustrated in the sketch after this list.
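
To make steps 1 and 4 concrete, the following minimal sketch starts a text detection job and polls for completion. The bucket and document names are placeholders, and in production you would typically subscribe to the SNS topic instead of polling:

import time
import boto3

textract = boto3.client('textract')

# Step 1: start the asynchronous job (bucket and key are placeholders)
job = textract.start_document_text_detection(
    DocumentLocation={'S3Object': {'Bucket': 'textract-input', 'Name': 'documents/document.pdf'}}
)

# Step 4: poll until the job finishes, then read the first page of results
while True:
    result = textract.get_document_text_detection(JobId=job['JobId'])
    if result['JobStatus'] in ('SUCCEEDED', 'FAILED'):
        break
    time.sleep(5)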

Store and encrypt output of asynchronous API in custom S3 bucket

When you start an Amazon Textract job by calling StartDocumentTextDetection or StartDocumentAnalysis, you can provide an optional parameter called OutputConfig, which specifies the S3 bucket for storing the output. Another optional input parameter, KMSKeyId, specifies the AWS KMS customer master key (CMK) to use to encrypt the output. The user calling the Start operation must have permission to use the specified CMK.

The following diagram shows the overall workflow when you use the output preference parameter with the Amazon Textract asynchronous API.

  1. You start by calling the StartDocumentTextDetection or StartDocumentAnalysis API with an S3 object location, an output S3 bucket name, an output S3 prefix, a KMS key ID, and a few additional parameters.
  2. Amazon Textract gets the document from the S3 bucket and starts a job to process the document.
  3. As the document is processed, Amazon Textract stores the JSON output at the path in the output bucket and encrypts it using the KMS CMK that was specified in the start call.
  4. You get a job completion notification via Amazon SNS.
  5. You can then call the corresponding GetDocumentTextDetection or GetDocumentAnalysis API to get the JSON result. You can also get the JSON result directly from the output S3 bucket at the path with the following format: s3://{S3Bucket}/{S3Prefix}/{TextractJobId}/*.

 

Starting the asynchronous job with OutputConfig

The following code shows how you can start the asynchronous API job to analyze a document and store encrypted inference output in a custom S3 bucket:

import boto3

client = boto3.client('textract')

# Bucket names, document key, prefix, and KMS key ID below are placeholders.
response = client.start_document_analysis(
    DocumentLocation={
        'S3Object': {
            'Bucket': 'textract-input',
            'Name': 'documents/document.pdf'
        }
    },
    FeatureTypes=['TABLES', 'FORMS'],
    OutputConfig={
        'S3Bucket': 'textract-output',
        'S3Prefix': 'output'
    },
    KMSKeyId='REPLACE_WITH_KMS_KEY_ID'
)

The following code shows how you can get the results of the analysis job:

job_id = response['JobId']
result = client.get_document_analysis(JobId=job_id, MaxResults=1000)
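
Because results for multi-page documents are returned in pages, a common pattern is to follow NextToken until every page has been retrieved. The following minimal sketch continues the preceding snippet and assumes the job has already completed:

pages = []
next_token = None
while True:
    kwargs = {'JobId': job_id, 'MaxResults': 1000}
    if next_token:
        kwargs['NextToken'] = next_token
    page = client.get_document_analysis(**kwargs)
    pages.append(page)
    next_token = page.get('NextToken')
    if not next_token:
        break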

You can also use the AWS SDK to download the output directly from your custom S3 bucket.
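
For example, the following sketch lists and downloads the JSON output objects that Amazon Textract wrote for the job. The bucket name and prefix are the placeholders used in the earlier start call:

import boto3

s3 = boto3.client('s3')

# Output objects are written under {S3Prefix}/{TextractJobId}/
prefix = f'output/{job_id}/'
listing = s3.list_objects_v2(Bucket='textract-output', Prefix=prefix)
for obj in listing.get('Contents', []):
    s3.download_file('textract-output', obj['Key'], obj['Key'].split('/')[-1])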

The Amazon Textract output is stored and encrypted as follows, depending on which of the OutputConfig and KMSKeyId input parameters you provide:

  • Neither OutputConfig nor KMSKeyId – Output is stored internally by Amazon Textract and encrypted using an AWS owned CMK.
  • OutputConfig only – Output is stored in the customer’s S3 bucket and encrypted using SSE-S3.
  • KMSKeyId only (customer managed CMK) – Output is stored internally by Amazon Textract and encrypted using the customer managed CMK.
  • OutputConfig and KMSKeyId (customer managed CMK) – Output is stored in the customer’s S3 bucket and encrypted using the customer managed CMK.

IAM permissions

When you use the Amazon Textract APIs to start an analysis or detection job, you must have access to the S3 object specified in your call. To take advantage of output preferences to write the output to an encrypted object in Amazon S3, you must have the necessary permissions for both the target S3 bucket and the CMK specified when you call the analysis or detection APIs.

The following example AWS Identity and Access Management (IAM) identity policy allows you to get objects from the textract-input S3 bucket with a prefix:

{
  "Sid": "AllowTextractUserToReadInputData",
  "Action": ["s3:GetObject"],
  "Effect": "Allow",
  "Resource": ["arn:aws:s3:::textract-input/documents/*"]
}

The following IAM identity policy allows you to write output to the textract-output S3 bucket with a prefix:

{
  "Sid": "AllowTextractUserToWriteOutputData",
  "Action": ["s3:PutObject"],
  "Effect": "Allow",
  "Resource": ["arn:aws:s3:::textract-output/output/*"]
}

When placing objects into Amazon S3 using SSE-KMS, you need specific permissions on the CMK. The following CMK policy language allows a user (textract-start) to use the CMK to protect the output files from an Amazon Textract analysis or detection job:

{
  "Sid": "Allow use of the key to write Textract output to S3",
  "Effect": "Allow",
  "Principal": {"AWS":"arn:aws:iam::111122223333:user/textract-start"},
  "Action": ["kms:DescribeKey","kms:GenerateDataKey", "kms:ReEncrypt", "kms:Decrypt"],
  "Resource": "*"
}

The following KMS key policy allows a user (textract-get) to get the output file that’s backed by SSE-KMS:

{
  "Sid": "Allow use of the key to read S3 objects for output",
  "Effect": "Allow",
  "Principal": {"AWS": "arn:aws:iam::111122223333:user/textract-get"},
  "Action": ["kms:Decrypt","kms:DescribeKey"],
  "Resource": "*"
}

The key policy must still include separate statements that allow administration of the key.

For some workloads, you may need to provide a record of actions taken by a user, role, or an AWS service in Amazon Textract. Amazon Textract is integrated with AWS CloudTrail, which captures all API calls for Amazon Textract as events. For more information, see Logging Amazon Textract API Calls with AWS CloudTrail.

AWS KMS and Amazon S3 provide similar integration with CloudTrail. For more information, see Logging AWS KMS API calls with AWS CloudTrail and Logging Amazon S3 API calls using AWS CloudTrail, respectively. To get log visibility into Amazon S3 GET and PUT requests, you can enable CloudTrail data events for Amazon S3. This gives you end-to-end visibility into your document-processing lifecycle.
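
For example, the following sketch uses the CloudTrail LookupEvents API to list recent Amazon Textract API activity from the CloudTrail event history:

import boto3

cloudtrail = boto3.client('cloudtrail')
events = cloudtrail.lookup_events(
    LookupAttributes=[{'AttributeKey': 'EventSource', 'AttributeValue': 'textract.amazonaws.com'}],
    MaxResults=50
)
for event in events['Events']:
    print(event['EventTime'], event['EventName'], event.get('Username', ''))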

Conclusion

In this post, we showed you how to use the Amazon Textract asynchronous API and your S3 bucket and AWS KMS CMK to store and encrypt the results of Amazon Textract output. We also highlighted how you can use CloudTrail integration to get visibility into your overall document processing lifecycle.

For more information about different security controls in Amazon Textract, see Security in Amazon Textract.

 


About the Authors

Kashif Imran is a Principal Solutions Architect at Amazon Web Services. He works with some of the largest AWS customers who are taking advantage of AI/ML to solve complex business problems. He provides technical guidance and design advice to implement computer vision applications at scale. His expertise spans application architecture, serverless, containers, NoSQL and machine learning.

 

 

 

Peter M. O’Donnell is an AWS Principal Solutions Architect, specializing in security, risk, and compliance with the Strategic Accounts team. Formerly dedicated to a major US commercial bank customer, Peter now supports some of AWS’s largest and most complex strategic customers in security and security-related topics, including data protection, cryptography, incident response, and CISO engagement.

Read More

Using Unity to Help Solve Intelligence

We present our use of Unity, a widely recognised and comprehensive game engine, to create more diverse, complex, virtual simulations. We describe the concepts and components developed to simplify the authoring of these environments, intended for use predominantly in the field of reinforcement learning.

Read More

Incorporating your enterprise knowledge graph into Amazon Kendra

For many organizations, consolidating information assets and making them available to employees when needed remains a challenge. Commonly used technology like spreadsheets, relational databases, and NoSQL databases exacerbate this issue by creating more and more unconnected, unstructured data.

Knowledge graphs can make this data easier to access and understand by organizing it and capturing dataset semantics, properties, and relationships. While some organizations use graph databases like Amazon Neptune to add structure to their data, they still lack a targeted search engine that users can leverage to search this information.

Amazon Kendra is an intelligent search service powered by machine learning. Kendra reimagines enterprise search for your websites and applications so your employees and customers can easily find the content they are looking for, even when it’s scattered across multiple locations and content repositories within your organization.

This solution illustrates how to create an intelligent search engine on AWS using Amazon Kendra to search a knowledge graph stored in Amazon Neptune. We illustrate how you can provision a new Amazon Kendra index in just a few clicks – no prior machine learning (ML) experience required! We then show how an existing knowledge graph stored in Amazon Neptune can be connected to the Amazon Kendra pipeline as metadata and surfaced in knowledge panels to build a targeted search engine. Knowledge panels are information boxes that appear on search engines when you search for entities (people, places, organizations, things) that are contained in the knowledge graph. Finally, this solution provides you with a ranked list of the top notable entities that match certain criteria and finds more relevant search results extracted from the knowledge graph.

The following are several common use cases for integrating an enterprise search engine with a knowledge graph:

  • Add a knowledge graph as metadata to Amazon Kendra to return more relevant results
  • Derive a ranked list of the top notable entities that match certain criteria
  • Predictively complete entities in a search box
  • Annotate or organize content using the knowledge graph entities by querying in Neptune

Required Services

To complete this solution, you need the following AWS services, all of which are used later in this post: Amazon Kendra, Amazon Neptune, Amazon S3, Amazon SageMaker, AWS Lambda, Amazon API Gateway, Amazon Cognito, AWS Cloud9, and AWS Amplify.

Sample dataset

For this solution, we use a subset of the Arbitration Awards Online database, which is publicly available. Arbitration is an alternative to litigation or mediation when resolving a dispute. Arbitration panels are composed of one or three arbitrators who are selected by the parties. They read the pleadings filed by the parties, listen to the arguments, study the documentary and testimonial evidence, and render a decision.

The panel’s decision, called an award, is final and binding on all the parties. All parties must abide by the award, unless it’s successfully challenged in court within the statutory time period. Arbitration is generally confidential, and documents submitted in arbitration are not publicly available, unlike court-related filings.

However, if an award is issued at the conclusion of the case, the Financial Industry Regulatory Authority (FINRA) posts it in its Arbitration Awards Online database, which is publicly available. We use a subset of this dataset for our use case under FINRA licensing (©2020 FINRA. All rights reserved. FINRA is a registered trademark of the Financial Industry Regulatory Authority, Inc. Reprinted with permission from FINRA) to create a knowledge graph for awards.

Configuring your document repository

Before you can create an index in Amazon Kendra, you need to load documents into an S3 bucket. This section contains instructions to create an S3 bucket, get the files, and load them into the bucket. After completing all the steps in this section, you have a data source that Amazon Kendra can use.

  1. On the AWS Management Console, in the Region list, choose US East (N. Virginia) or any Region of your choice that Amazon Kendra is available in.
  2. Choose Services.
  3. Under Storage, choose S3.
  4. On the Amazon S3 console, choose Create bucket.
  5. Under General configuration, provide the following information:
    1. Bucket name – enterprise-search-poc-ds-UNIQUE-SUFFIX
    2. Region – Choose the same Region that you use to deploy your Amazon Kendra index (this post uses US East (N. Virginia) us-east-1)
  6. Under Bucket settings for Block Public Access, leave everything with the default values.
  7. Under Advanced settings, leave everything with the default values.
  8. Choose Create bucket.
  9. Download kendra-graph-blog-data and unzip the files.
  10. Upload the index_data and graph_data folders from the unzipped files.

Inside your bucket, you should now see two folders: index_data (with 20 objects) and graph_data (with two objects).

The following screenshot shows the contents of enterprise-search-poc-ds-UNIQUE-SUFFIX.

The following screenshot shows the contents of index_data.

The index_data folder contains two kinds of files: arbitration PDF files and arbitration metadata files.

The following code is an example for arbitration metadata. DocumentId is the arbitration case number, and we use this identifier to create a correlation between the Amazon Kendra index and a graph dataset that we load into Neptune.

{
  "DocumentId": "17-00486",
  "ContentType": "PDF",
  "Title": "17-00486",
  "Attributes": {
    "_source_uri": "https://www.finra.org/sites/default/files/aao_documents/17-00486.pdf"
  }
}

The following screenshot shows the contents of graph_data.

Setting up an Amazon Kendra index

In this section, we set up an Amazon Kendra index and configure an S3 bucket as the data source.

  1. Sign in to the console and confirm that you have set the Region to us-east-1.
  2. Navigate to the Amazon Kendra service and choose Launch Amazon Kendra.
  3. For Index name, enter enterprise-search-poc.
  4. For IAM role, choose Create a new role.
  5. For Role name, enter poc-role.
  6. Leave Use an AWS KMS managed encryption key at its default setting.
  7. Choose Create.

The index creation may take some time. For more information about AWS Identity and Access Management (IAM) access roles, see IAM access roles for Amazon Kendra.

  1. When index creation is complete, on the Amazon Kendra console, choose your new index.
  2. In the Index settings section, locate the index ID.

You use the index ID in a later step.

Adding a data source

To add your data source, complete the following steps:

  1. On the Amazon Kendra console, choose your index.
  2. Choose Add data source.
  3. For Select connector type for your data source, choose Add connector under Amazon S3.
  4. For Data source name, enter a name for your data source (for example, ent-search-poc-ds-001).
  5. For Enter the data source location, choose Browse S3 and choose the bucket you created earlier.
  6. For IAM role, choose Create new role.
  7. For Role name, enter poc-ds-role.
  8. In the Additional configuration section, on the Include pattern tab, add index_data.
  9. In the Set sync run schedule section, choose Run on demand.
  10. Choose Next.

  1. Review your details and choose Create.
  2. In the details page of your data source, choose Sync now.

Amazon Kendra starts crawling and indexing the data source from Amazon S3 and prepares the index.

You can also monitor the process on Amazon CloudWatch.
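
You can also start the sync programmatically with the AWS SDK. The following minimal sketch assumes you substitute the index ID noted earlier and the ID of the data source you just created; the placeholder names are illustrative:

    import boto3

    kendra = boto3.client('kendra')

    # Placeholders: the index ID from the Index settings section and the ID of ent-search-poc-ds-001
    kendra.start_data_source_sync_job(
        Id='REPLACE_WITH_DATA_SOURCE_ID',
        IndexId='REPLACE_WITH_KENDRA-INDEX-ID'
    )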

Searching the ingested documents

To test the index, complete the following steps:

  1. On the Amazon Kendra console, navigate to your index.
  2. Choose Search console.

  1. Enter a question (for example, how much was initial claim fees for case 17-00486?).

The following screenshot shows your search results.
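
The same question can also be issued programmatically with the Amazon Kendra Query API. The following is a minimal sketch, assuming you substitute your own index ID:

    import boto3

    kendra = boto3.client('kendra')
    response = kendra.query(
        IndexId='REPLACE_WITH_KENDRA-INDEX-ID',
        QueryText='how much was initial claim fees for case 17-00486?'
    )
    for item in response['ResultItems']:
        # Print the result type, matched document ID, and a snippet of the excerpt
        print(item['Type'], item.get('DocumentId', ''), item.get('DocumentExcerpt', {}).get('Text', ''))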

Knowledge graph

This section describes a knowledge graph of the entities and relationships that participate in arbitration panels. We use Apache TinkerPop Gremlin format to load the data to Neptune. For more information, see Gremlin Load Data Format.

To load Apache TinkerPop Gremlin data using the CSV format, you must specify the vertices and the edges in separate files. The loader can load from multiple vertex files and multiple edge files in a single load job.

The following diagram shows the graph ontology. Each award has properties, such as Problem, Customer, Representative, Firm, and Subtype. The is_related edge shows relationships between awards.


You can access both the vertex and edge CSV files in the enterprise-search-poc.zip file. The following screenshot shows a tabular view of the vertex file.

The following screenshot shows a tabular view of the edge file.
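
The graph_data files follow the Gremlin load data format: vertex files require ~id and ~label columns, edge files require ~id, ~from, ~to, and ~label columns, and additional property columns are declared as name:type. The rows below are illustrative only; the actual column names and values in the provided files may differ.

An illustrative vertex file:

    ~id,~label,Problem:String,Subtype:String,Firm:String
    14-02936,award,Breach of Contract,Churning,23131
    17-00486,award,Negligence,Suitability,29604

An illustrative edge file:

    ~id,~from,~to,~label
    e1,14-02936,17-00486,is_related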

Launching the Neptune-SageMaker stack

You can launch the Neptune-SageMaker stack in the US East (N. Virginia) Region from the AWS CloudFormation console by choosing Launch Stack.

Acknowledge that AWS CloudFormation will create IAM resources, and choose Create.

The Neptune and Amazon SageMaker resources described here incur costs. With Amazon SageMaker hosted notebooks, you pay simply for the Amazon Elastic Compute Cloud (Amazon EC2) instance that hosts the notebook. For this post, we use an ml.t2.medium instance, which is eligible for the AWS free tier.

The solution creates five stacks, as shown in the following screenshot.

Browsing and running the content

After the stacks are created, you can browse your notebook instance and run the content.

  1. On the Amazon SageMaker console, choose Notebook instances on the navigation pane.
  2. Select your instance and from the Actions menu, choose Open Jupyter.

  1. In the Jupyter window, in the Neptune directory, open the Getting-Started directory.

The Getting-Started directory contains three notebooks:

  • 01-Introduction.ipynb
  • 02-Labelled-Property-Graph.ipynb
  • 03-Graph-Recommendations.ipynb

The first two introduce Neptune and the property graph data model. The third contains a runnable example of an arbitration knowledge graph recommendation engine. When you run the content, the notebook populates Neptune with a sample award dataset and issues several queries to generate related-case recommendations.

  1. To see this in action, open 03-Graph-Recommendations.ipynb.
  2. Change the bulkLoad Amazon S3 location to your created Amazon S3 location (enterprise-search-poc-ds-UNIQUE-SUFFIX).

  1. Run each cell in turn, or choose Run All from the Cell drop-down menu.

You should see the results of each query printed below each query cell (as in the following screenshot).

Architecture overview

DocumentId is the key that links the Amazon Kendra index to the knowledge graph data. The DocumentId in the Amazon Kendra index should be the same as the ~id of the corresponding graph node; this creates the association between the Amazon Kendra results and graph nodes. The following diagram shows the architecture for integrating Amazon Kendra and Neptune.

The architecture workflow includes the following steps:

  1. The search user interface (UI) sends the query to Amazon Kendra.
  2. Amazon Kendra returns results based on its best match.
  3. The UI component calls Neptune via Amazon API Gateway and AWS Lambda with the docId as the request parameter (see the sketch after this list).
  4. Neptune runs the query and returns all related cases with the requested docId.
  5. The UI component renders the knowledge panel with the graph responses.
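
Steps 3 and 4 can be implemented with a small Lambda function that queries Neptune over Gremlin and returns the cases connected to the requested docId through is_related edges. The following is a minimal sketch only, not the exact function deployed by this solution; the endpoint environment variable and the traversal are illustrative assumptions, and it assumes the gremlinpython package is bundled with the function:

    import json
    import os

    from gremlin_python.driver import client

    # Hypothetical environment variable holding the Neptune cluster endpoint
    NEPTUNE_ENDPOINT = os.environ['NEPTUNE_ENDPOINT']

    def lambda_handler(event, context):
        doc_id = event['docId']
        gremlin = client.Client(f'wss://{NEPTUNE_ENDPOINT}:8182/gremlin', 'g')
        try:
            # Find all awards connected to the requested case through is_related edges
            results = gremlin.submit(
                "g.V(docId).both('is_related').valueMap(true)",
                {'docId': doc_id}
            ).all().result()
        finally:
            gremlin.close()
        return {'statusCode': 200, 'body': json.dumps(results, default=str)}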

Testing Neptune via API Gateway

To test Neptune via API Gateway, complete the following steps:

  1. On the API Gateway console, choose APIs.
  2. Choose KendraGraphAPI to open the API page.
  3. Select the POST method and choose Test.
  4. Enter the following sample event data:
    {
      "docId": "14-02936",
      "repCrd": "5048331",
      "firm": ["23131","29604"]
    }

  5. Choose Test.

This sends an HTTP POST request to the endpoint, using the sample event data in the request body. In the following screenshot, the response shows the related cases for award 14-02936.

  1. On the navigation pane, choose Stages.
  2. Choose dev.
  3. Copy the Invoke URL value and save it to use in the next step.

  1. To test the HTTP endpoint, enter the following curl command. Replace the endpoint with your API Gateway invoke URL.
    curl --location --request POST 'REPLACE_WITH_API_GATEWAY_ENDPOINT' \
    --header 'Content-Type: application/json' \
    --data-raw '{
    "docId": "14-02936",
    "repCrd": "5048331",
    "firm": ["23131","29604"]
    }'
    

Testing Neptune via Lambda

Another way to test Neptune is with a Lambda function. This section outlines how to invoke the Lambda function using the sample event data provided.

  1. On the Lambda console, choose Kendra-Neptune-Graph-AddLamb-NeptuneLambdaFunction.
  2. Choose Test.
  3. In the Configure test event page, choose Create new test event.
  4. For Event template, choose the default Hello World template.
  5. For Event name, enter a name and note the following sample event template:
    {
      "docId": "14-02936",
      "repCrd": "5048331",
      "firm": ["23131","29604"]
    }

  1. Choose Create.

  1. Choose Test.

Each user can create up to 10 test events per function. Those test events aren’t available to other users.

Lambda runs your function on your behalf. The handler in your Lambda function receives and processes the sample event.

  1. After the function runs successfully, view the results on the Lambda console.

The results have the following sections:

  • Execution result – Shows the run status as succeeded and also shows the function run results, returned by the return statement.
  • Summary – Shows the key information reported in the Log output section (the REPORT line in the run log).
  • Log output – Shows the logs that the Lambda function writes to CloudWatch for each run. The Lambda console shows these logs for your convenience, and the Click here link opens them on the CloudWatch console, in the log group that corresponds to the Lambda function.

Developing the web app

In this section, we develop a web app with a search interface to search the documents. We use AWS Cloud9 as our integrated development environment (IDE) and Amplify to build and deploy the web app.

AWS Cloud9 is a cloud-based IDE that lets you write, run, and debug your code with just a browser. It includes a code editor, debugger, and a terminal. AWS Cloud9 comes prepackaged with essential tools for popular programming languages, including JavaScript, Python, PHP, and more, so you don’t need to install files or configure your development machine to start new projects.

The AWS Cloud9 workspace should be built by an IAM user with administrator privileges, not the root account user. Please ensure you’re logged in as an IAM user, not the root account user.

Ad blockers, JavaScript disablers, and tracking blockers should be disabled for the AWS Cloud9 domain, otherwise connecting to the workspace might be impacted.

Creating a new environment

To create your environment, complete the following steps:

  1. On the AWS Cloud9 console, make sure you’re using one of the following Regions:
    • US East (N. Virginia)
    • US West (Oregon)
    • Asia Pacific (Singapore)
    • Europe (Ireland)
  2. Choose Create environment.
  3. Name the environment kendrapoc.
  4. Choose Next step.
  5. Choose Create a new instance for environment (EC2) and choose small.
  6. Leave all the environment settings as their defaults and choose Next step.
  7. Choose Create environment.

Preparing the environment

To prepare your environment, make sure you’re in the default directory in an AWS Cloud9 terminal window (~/environment) before completing the following steps:

  1. Enter the following code to download the code for the Amazon Kendra sample app and extract it in a temporary directory:
    mkdir tmp
    cd tmp
    aws s3 cp s3://aws-ml-blog/artifacts/Incorporating-your-enterprise-knowledge-graph-into-Amazon-Kendra/kendra-graph-blog-data/kendrasgraphsampleui.zip .
    unzip kendrasgraphsampleui.zip
    rm kendrasgraphsampleui.zip
    cd ..

  1. We build our app in ReactJS with the following code:
    echo fs.inotify.max_user_watches=524288 | sudo tee -a /etc/sysctl.conf && sudo sysctl -p 
    npx create-react-app kendra-poc

  1. Change the working directory to kendra-poc and ensure that you’re in the /home/ec2-user/environment/kendra-poc directory:

    cd kendra-poc

  1. Install a few prerequisites with the following code:
    npm install --save node-sass typescript bootstrap react-bootstrap @types/lodash aws-sdk
    npm install --save semantic-ui-react
    npm install aws-amplify @aws-amplify/ui-react
    npm install -g @aws-amplify/cli

  2. Copy the source code to the src directory:
    cp -r ../tmp/kendrasgraphsampleui/* src/

Initializing Amplify

To initialize Amplify, complete the following steps:

  1. On the command line, in the kendra-poc directory, enter the following code:
    amplify init

  2. Choose Enter.
  3. Accept the default project name kendrapoc.
  4. Enter dev for the environment name.
  5. Choose None for the default editor (we use AWS Cloud9).
  6. Choose JavaScript and React when prompted.
  7. Accept the default values for paths and build commands.
  8. Choose the default profile when prompted.

Your run should look like the following screenshot.

Adding authentication

To add authentication to the app, complete the following steps:

  1. Enter the following code:
    amplify add auth

  1. Choose Default Configuration when asked if you want to use the default authentication and security configuration.
  2. Choose Username when asked how you want users to sign in.
  3. Choose No, I am done. when asked about advanced settings.

This session should look like the following screenshot.

  1. To create these changes in the cloud, enter:
    amplify push

  1. Confirm you want Amplify to make changes in the cloud for you.

Provisioning takes a few minutes to complete. The Amplify CLI takes care of provisioning the appropriate cloud resources and updates src/aws-exports.js with all the configuration data we need to use the cloud resources in our app.

Amazon Cognito lets you add user sign-up, sign-in, and access control to your web and mobile apps quickly and easily. We made a user pool, which is a secure user directory that lets our users sign in with the user name and password pair they create during registration. Amazon Cognito (and the Amplify CLI) also supports configuring sign-in with social identity providers, such as Facebook, Google, and Amazon, and enterprise identity providers via SAML 2.0. For more information, see the Amazon Cognito Developer Guide and the Amplify Authentication documentation.

Configuring an IAM role for authenticated users

To configure your IAM role for authenticated users, complete the following steps:

  1. In the AWS Cloud9 IDE, in the left panel, browse to the file kendra-poc/amplify/team-provider-info.json and open it (double-click).
  2. Note the value of AuthRoleName.
  1. On the IAM console, choose Roles.
  2. Search for the AuthRole using the value from the previous step and open that role.
  1. Choose Add inline policy.
  2. Choose JSON and replace the contents with the following policy. Replace enterprise-search-poc-ds-UNIQUE-SUFFIX with the name of the S3 bucket that is configured as the data source (leave the * after the bucket name, which allows the policy to access any object in the bucket). Replace ACCOUNT-NUMBER with the AWS account number and KENDRA-INDEX-ID with the index ID of your index.
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": "kendra:ListIndices",
                "Resource": "*"
            },
            {
                "Sid": "VisualEditor1",
                "Effect": "Allow",
                "Action": [
                    "kendra:Query",
                    "s3:ListBucket",
                    "s3:GetObject",
                    "kendra:ListFaqs",
                    "kendra:ListDataSources",
                    "kendra:DescribeIndex",
                    "kendra:DescribeFaq",
                    "kendra:DescribeDataSource"
                ],
                "Resource": [
                    "arn:aws:s3:::enterprise-search-poc-ds-UNIQUE-SUFFIX*",
                    "arn:aws:kendra:us-east-1:ACCOUNT-NUMBER:index/KENDRA-INDEX-ID"
                ]
            }
        ]
    }

  3. Choose Review policy.
  4. Enter a policy name, such as my-kendra-poc-policy.
  5. Choose Create policy.
  6. Browse back to the role and confirm that my-kendra-poc-policy is present.
  1. Create a user (poctester, in the customer group) and add it to the corresponding group by entering the following at the command prompt:
USER_POOL_ID=`grep user_pools_id src/aws-exports.js | awk 'BEGIN {FS="\""} {print $4}'`
aws cognito-idp create-group --group-name customer --user-pool-id $USER_POOL_ID
aws cognito-idp admin-create-user --user-pool-id $USER_POOL_ID --username poctester --temporary-password AmazonKendra
aws cognito-idp admin-add-user-to-group --user-pool-id $USER_POOL_ID --username poctester --group-name customer

Configuring the application

You’re now ready to configure the application.

  1. In the AWS Cloud9 environment, browse to the file kendra-poc/src/search/Search.tsx and open it for editing.

A new window opens.

  1. Replace REPLACE_WITH_KENDRA-INDEX-ID with your index ID.
  1. In the AWS Cloud9 environment, browse to the file kendra-poc/src/properties.js and open it for editing.
  2. In the new window that opens, replace REPLACE_WITH_API_GATEWAY_ENDPOINT with the API Gateway invoke URL value from earlier.
  1. Start the application in the AWS Cloud9 environment by entering the following code in the command window in the ~/environment/kendra-poc directory:
    npm start

Compiling the code and starting takes a few minutes.

  1. Preview the running application by choosing Preview on the AWS Cloud9 menu bar.
  2. From the drop-down menu, choose Preview Running Application.

A new browser window opens.

  1. Log in with any of the users we configured earlier (poctester) with the temporary password AmazonKendra.

Amazon Cognito forces a password reset upon first login.

Using the application

Now we can try out the app we developed by making a few search queries, such as “how much was initial claim fees for case 17-00486?”

The following screenshot shows the Amazon Kendra results and the knowledge panel, which shows the related cases for each result. The knowledge panel on the left is populated from the graph database and shows all the related cases for each search item.

Conclusion

This post demonstrated how to build a targeted and flexible cognitive search engine with a knowledge graph stored in Neptune and integrated with Amazon Kendra. You can enable rapid search for your documents and graph data using natural language, without any previous AI or ML experience. Finally, you can create an ensemble of other content types, including any combination of structured and unstructured documents, to make your archives indexable and searchable for harvesting knowledge and gaining insight. For more information about Amazon Kendra, see AWS re:Invent 2019 – Keynote with Andy Jassy on YouTube, Amazon Kendra FAQs, and What is Amazon Kendra?

 


About the Authors

Dr. Yazdan Shirvany is a Senior Solutions Architect at AWS who holds all 12 AWS certifications, with deep experience in AI/ML, IoT, and big data technologies including NLP, knowledge graphs, application reengineering, and optimizing software to leverage the cloud. Dr. Shirvany has 20+ scientific publications and several issued patents in the AI/ML field. Dr. Shirvany holds an M.S. and Ph.D. in Computer Science from Chalmers University of Technology.

 

Dipto Chakravarty is a leader in Amazon’s Alexa engineering group and heads up the Personal Mobility team in HQ2 utilizing AI, ML and IoT to solve local search analytics challenges. He has 12 patents issued to date and has authored two best-selling books on computer architecture and operating systems published by McGraw-Hill and Wiley. Dipto holds a B.S and M.S in Computer Science and Electrical Engineering from U. of Maryland, an EMBA from Wharton School, U. Penn, and a GMP from Harvard Business School.

 

Mohit Mehta is a leader in the AWS Professional Services Organization with expertise in AI/ML and big data technologies. Mohit holds an M.S. in Computer Science, all 12 AWS certifications, an MBA from the College of William and Mary, and a GMP from the Michigan Ross School of Business.

Read More

Anatomical Adventures in VR: University’s Science Visualizations Tap NVIDIA CloudXR and 5G

5G networks are poised to transform the healthcare industry, starting with how medical students learn.

The Grid Factory, a U.K.-based provider of NVIDIA GPU-accelerated services, is partnering with telecommunications company Vodafone to showcase the potential of 5G technology with a network built at Coventry University.

Operating NVIDIA CloudXR on the private 5G network, student nurses and healthcare professionals can experience lessons and simulations in virtual reality environments.

With NVIDIA CloudXR, users don’t need to be physically tethered to a high-performance computer that drives rich, immersive environments. Instead, it runs on NVIDIA servers located in the cloud or on premises, which deliver the advanced graphics performance needed for wireless virtual, augmented or mixed reality environments — which collectively are known as XR.

Streaming high-resolution graphics over 5G promises higher-quality, mobile-immersive VR for more engaging experiences in remote learning. Using CloudXR enables lecturers to teach in VR while students can access the interactive environment through smartphones, tablets, laptops, VR headsets and AR glasses.

All images courtesy of Gamoola/The Original Content Company.

A member of the NVIDIA CloudXR early access program, The Grid Factory is helping organizations realize new opportunities to deliver high-quality graphics over 5G.

“CloudXR makes the experience so natural that lecturers can easily forget about the technology, and instead focus on learning points and content they’re showing,” said Ben Jones, CTO at The Grid Factory.

With Coventry University’s advanced VR technology, users can now take virtual tours through the human body. Medical students can enter the immersive environment and visualize detailed parts of the body, from the bones, muscles and the brain, to the heart, veins, vessels and blood cells.

Previously, lecturers would have to use pre-recorded materials, but this only allowed them to view the body in a linear, 2D format. Working with Vodafone, The Grid Factory installed NVIDIA CloudXR at the university, enabling lecturers to guide their students on interactive explorations of the human body in 3D models.

“With 5G, we can put the VR headset on and stream high-resolution images and videos remotely anywhere in the world,” said Natasha Taylor, associate professor in the School of Nursing, Midwifery, and Health at Coventry University. “This experience allows us to take tours of the human body in a way we’ve never been able to before.”

The lessons have turned flat asynchronous learning into cinematic on-demand learning experiences. Students can tune in virtually to study high-resolution, 3D visualizations of the body at any time.

The immersive environments can also show detailed simulations of viral attacks, providing more engaging content that allows students to visualize and retain information faster, according to the faculty staff at Coventry.

And while the lecturers provide the virtual lessons, students can ask questions throughout the presentation.

With 5G, CloudXR can provide lower-latency immersive experiences, and VR environments can become more natural for users. It has allowed lecturers to demonstrate more easily, and for medical students to better visualize parts of the human body.

“The lower the latency, the closer you are to real-life experience,” said Andrea Dona, head of Networks at Vodafone UK. “NVIDIA CloudXR is a really exciting new software platform that allows us to stream high-quality virtual environments directly to the headset, and is now being deployed in a 5G network for the first time commercially.”

More faculty members have expressed interest in the 5G-enabled NVIDIA CloudXR experiences, especially for engineering and automotive use cases, which involve graphics-intensive workloads.

Learn more about NVIDIA CloudXR.

The post Anatomical Adventures in VR: University’s Science Visualizations Tap NVIDIA CloudXR and 5G appeared first on The Official NVIDIA Blog.

Read More

Using GANs to Create Fantastical Creatures

Posted by Andeep Singh Toor, Stadia Software Engineer, and Fred Bertsch, Software Engineer, Google Research, Brain Team

Creating art for digital video games takes a high degree of artistic creativity and technical knowledge, while also requiring game artists to quickly iterate on ideas and produce a high volume of assets, often in the face of tight deadlines. What if artists had a paintbrush that acted less like a tool and more like an assistant? A machine learning model acting as such a paintbrush could reduce the amount of time necessary to create high-quality art without sacrificing artistic choices, perhaps even enhancing creativity.

Today, we present Chimera Painter, a trained machine learning (ML) model that automatically creates a fully fleshed out rendering from a user-supplied creature outline. Employed as a demo application, Chimera Painter adds features and textures to a creature outline segmented with body part labels, such as “wings” or “claws”, when the user clicks the “transform” button. Below is an example using the demo with one of the preset creature outlines.

Using an image imported to Chimera Painter or generated with the tools provided, an artist can iteratively construct or modify a creature outline and use the ML model to generate realistic looking surface textures. In this example, an artist (Lee Dotson) customizes one of the creature designs that comes pre-loaded in the Chimera Painter demo.

In this post, we describe some of the challenges in creating the ML model behind Chimera Painter and demonstrate how one might use the tool for the creation of video game-ready assets.

Prototyping for a New Type of Model
In developing an ML model to produce video-game ready creature images, we created a digital card game prototype around the concept of combining creatures into new hybrids that can then battle each other. In this game, a player would begin with cards of real-world animals (e.g., an axolotl or a whale) and could make them more powerful by combining them (making the dreaded Axolotl-Whale chimera). This provided a creative environment for demonstrating an image-generating model, as the number of possible chimeras necessitated a method for quickly designing large volumes of artistic assets that could be combined naturally, while still retaining identifiable visual characteristics of the original creatures.

Since our goal was to create high-quality creature card images guided by artist input, we experimented with generative adversarial networks (GANs), informed by artist feedback, to create creature images that would be appropriate for our fantasy card game prototype. GANs pair two convolutional neural networks against each other: a generator network to create new images and a discriminator network to determine if these images are samples from the training dataset (in this case, artist-created images) or not. We used a variant called a conditional GAN, where the generator takes a separate input to guide the image generation process. Interestingly, our approach was a strict departure from other GAN efforts, which typically focus on photorealism.

To train the GANs, we created a dataset of full color images with single-species creature outlines adapted from 3D creature models. The creature outlines characterized the shape and size of each creature, and provided a segmentation map that identified individual body parts. After model training, the model was tasked with generating multi-species chimeras, based on outlines provided by artists. The best performing model was then incorporated into Chimera Painter. Below we show some sample assets generated using the model, including single-species creatures, as well as the more complex multi-species chimeras.

Generated card art integrated into the card game prototype showing basic creatures (bottom row) and chimeras from multiple creatures, including an Antlion-Porcupine, Axolotl-Whale, and a Crab-Antlion-Moth (top row). More info about the game itself is detailed in this Stadia Research presentation.

Learning to Generate Creatures with Structure
An issue with using GANs for generating creatures was the potential for loss of anatomical and spatial coherence when rendering subtle or low-contrast parts of images, despite these being of high perceptual importance to humans. Examples of this can include eyes, fingers, or even distinguishing between overlapping body parts with similar textures (see the affectionately named BoggleDog below).

GAN-generated image showing mismatched body parts.

Generating chimeras required a new non-photographic fantasy-styled dataset with unique characteristics, such as dramatic perspective, composition, and lighting. Existing repositories of illustrations were not appropriate to use as datasets for training an ML model, because they may be subject to licensing restrictions, have conflicting styles, or simply lack the variety needed for this task.

To solve this, we developed a new artist-led, semi-automated approach for creating an ML training dataset from 3D creature models, which allowed us to work at scale and rapidly iterate as needed. In this process, artists would create or obtain a set of 3D creature models, one for each creature type needed (such as hyenas or lions). Artists then produced two sets of textures that were overlaid on the 3D model using the Unreal Engine — one with the full color texture (left image, below) and the other with flat colors for each body part (e.g., head, ears, neck, etc), called a “segmentation map” (right image, below). This second set of body part segments was given to the model at training to ensure that the GAN learned about body part-specific structure, shapes, textures, and proportions for a variety of creatures.

Example dataset training image and its paired segmentation map.

The 3D creature models were all placed in a simple 3D scene, again using the Unreal Engine. A set of automated scripts would then take this 3D scene and interpolate between different poses, viewpoints, and zoom levels for each of the 3D creature models, creating the full color images and segmentation maps that formed the training dataset for the GAN. Using this approach, we generated 10,000+ image + segmentation map pairs per 3D creature model, saving the artists millions of hours of time compared to creating such data manually (at approximately 20 minutes per image).

Fine Tuning
The GAN had many different hyper-parameters that could be adjusted, leading to different qualities in the output images. In order to better understand which versions of the model were better than others, artists were provided samples for different creature types generated by these models and asked to cull them down to a few best examples. We gathered feedback about desired characteristics present in these examples, such as a feeling of depth, style with regard to creature textures, and realism of faces and eyes. This information was used both to train new versions of the model and, after the model had generated hundreds of thousands of creature images, to select the best image from each creature category (e.g., gazelle, lynx, gorilla, etc).

We tuned the GAN for this task by focusing on the perceptual loss. This loss function component (also used in Stadia’s Style Transfer ML) computes a difference between two images using extracted features from a separate convolutional neural network (CNN) that was previously trained on millions of photographs from the ImageNet dataset. The features are extracted from different layers of the CNN and a weight is applied to each, which affects their contribution to the final loss value. We discovered that these weights were critically important in determining what a final generated image would look like. Below are some examples from the GAN trained with different perceptual loss weights.

Dino-Bat Chimeras generated using varying perceptual loss weights.

Some of the variation in the images above is due to the fact that the dataset includes multiple textures for each creature (for example, a reddish or grayish version of the bat). However, ignoring the coloration, many differences are directly tied to changes in perceptual loss values. In particular, we found that certain values brought out sharper facial features (e.g., bottom right vs. top right) or “smooth” versus “patterned” (top right vs. bottom left) that made generated creatures feel more real.
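
For readers who want to see the mechanics, the following is a rough sketch of such a weighted, multi-layer perceptual loss. It assumes a VGG16 feature extractor with arbitrary example layers and weights, which are not necessarily what was used for Chimera Painter:

import tensorflow as tf

# Feature extractor: an ImageNet-trained CNN (VGG16 is assumed here for illustration)
_layer_names = ['block2_conv2', 'block3_conv3', 'block4_conv3']
_vgg = tf.keras.applications.VGG16(include_top=False, weights='imagenet')
_features = tf.keras.Model(_vgg.input, [_vgg.get_layer(n).output for n in _layer_names])

def perceptual_loss(generated, target, layer_weights=(1.0, 1.0, 1.0)):
    """Weighted sum of per-layer feature differences; inputs are assumed to be in [0, 1]."""
    gen = _features(tf.keras.applications.vgg16.preprocess_input(generated * 255.0))
    tgt = _features(tf.keras.applications.vgg16.preprocess_input(target * 255.0))
    return tf.add_n([w * tf.reduce_mean(tf.square(g - t))
                     for w, g, t in zip(layer_weights, gen, tgt)])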

Here are some creatures generated from the GAN trained with different perceptual loss weights, showing off a small sample of the outputs and poses that the model can handle.

Creatures generated using different models.
A generated chimera (Dino-Bat-Hyena, to be exact) created using the conditional GAN. Output from the GAN (left) and the post-processed / composited card (right).

Chimera Painter
The trained GAN is now available in the Chimera Painter demo, allowing artists to work iteratively with the model, rather than drawing dozens of similar creatures from scratch. An artist can select a starting point and then adjust the shape, type, or placement of creature parts, enabling rapid exploration and the creation of a large volume of images. The demo also allows for uploading a creature outline created in an external program, like Photoshop. Simply download one of the preset creature outlines to get the colors needed for each creature part and use this as a template for drawing one outside of Chimera Painter, and then use the “Load” button on the demo to use this outline to flesh out your creation.

It is our hope that these GAN models and the Chimera Painter demonstration tool might inspire others to think differently about their art pipeline. What can one create when using machine learning as a paintbrush?

Acknowledgments
This project is conducted in collaboration with many people. Thanks to Ryan Poplin, Lee Dotson, Trung Le, Monica Dinculescu, Marc Destefano, Aaron Cammarata, Maggie Oh, Erin Hoffman-John, and Colin Boswell. Thanks to everyone who pitched in to give hours of art direction, technical feedback, and drawings of fantastic creatures.

Read More

Take the A100 Train: HPC Centers Worldwide Jump Aboard NVIDIA AI Supercomputing Fast Track

Supercomputing centers worldwide are onboarding NVIDIA Ampere GPU architecture to serve the growing demands of heftier AI models for everything from drug discovery to energy research.

Joining this movement, Fujitsu has announced a new exascale system for Japan-based AI Bridging Cloud Infrastructure (ABCI), offering 600 petaflops of performance at the National Institute of Advanced Industrial Science and Technology.

The debut comes as model complexity has surged 30,000x in the past five years, with booming use of AI in research. With scientific applications, these hulking datasets can be held in memory, helping to minimize batch processing as well as to achieve higher throughput.

To fuel this next research ride, NVIDIA Monday introduced the NVIDIA A100 80GB GPU with HBM2e technology. It doubles the A100 40GB GPU’s high-bandwidth memory to 80GB and delivers over 2 terabytes per second of memory bandwidth.

New NVIDIA A100 80GB GPUs let larger models and datasets run in-memory at faster memory bandwidth, enabling higher compute and faster results on workloads. Reducing internode communication can boost AI training performance by 1.4x with half the GPUs.

NVIDIA also introduced new NVIDIA Mellanox 400G InfiniBand architecture, doubling data throughput and offering new in-network computing engines for added acceleration.

Europe Takes Supercomputing Ride

Europe is leaping in. Italian inter-university consortium CINECA announced the Leonardo system, the world’s fastest AI supercomputer. It taps 14,000 NVIDIA Ampere architecture GPUs and NVIDIA Mellanox InfiniBand networking for 10 exaflops of AI. France’s Atos is set to build it.

Leonardo joins a growing pack of European systems on NVIDIA AI platforms supported by the EuroHPC initiative. Its German neighbor, the Jülich Supercomputing Center, recently launched the first NVIDIA GPU-powered AI exascale system to come online in Europe, delivering the region’s most powerful AI platform. The new Atos-designed Jülich system, dubbed JUWELS, is a 2.5 exaflops AI supercomputer that captured No. 7 on the latest TOP500 list.

Those also getting on board include Luxembourg’s MeluXina supercomputer; IT4Innovations National Supercomputing Center, the most powerful supercomputer in the Czech Republic; and the Vega supercomputer at the Institute of Information Science in Maribor, Slovenia.

Linköping University is planning to build Sweden’s fastest AI supercomputer, dubbed BerzeLiUs, based on the NVIDIA DGX SuperPOD infrastructure. It’s expected to provide 300 petaflops of AI performance for cutting-edge research.

NVIDIA is building Cambridge-1, an 80-node DGX SuperPOD with 400 petaflops of AI performance. It will be the fastest AI supercomputer in the U.K. It’s planned to be used in collaborative research within the country’s AI and healthcare community across academia, industry and startups.

Full Steam Ahead in North America

North America is taking the exascale AI supercomputing ride. NERSC (the U.S. National Energy Research Scientific Computing Center) is adopting NVIDIA AI for projects on Perlmutter, its system packing 6,200 A100 GPUs. NERSC now lays claim to 3.9 exaflops of AI performance.

NVIDIA Selene, a cluster based on the DGX SuperPOD, provides a public reference architecture for large-scale GPU clusters that can be deployed in weeks. The NVIDIA DGX SuperPOD system landed the top spot on the Green500 list of most efficient supercomputers, achieving a new world record in power efficiency of 26.2 gigaflops per watt, and it has set eight new performance milestones for MLPerf inference.

The University of Florida and NVIDIA are building the world’s fastest AI supercomputer in academia, aiming to deliver 700 petaflops of AI performance. The partnership puts UF among leading U.S. AI universities, advances academic research and helps address some of Florida’s most complex challenges.

At Argonne National Laboratory, researchers will use a cluster of 24 NVIDIA DGX A100 systems to scan billions of drugs in the search for treatments for COVID-19.

Los Alamos National Laboratory, Hewlett Packard Enterprise and NVIDIA are teaming up to deliver next-generation technologies to accelerate scientific computing.

All Aboard in APAC

Supercomputers in APAC will also be fueled by NVIDIA Ampere architecture. Korean search engine NAVER and Japanese messaging service LINE are using a DGX SuperPOD built with 140 DGX A100 systems with 700 petaflops of peak AI performance to scale out research and development of natural language processing models and conversational AI services.

The Japan Agency for Marine-Earth Science and Technology, or JAMSTEC, is upgrading its Earth Simulator with NVIDIA A100 GPUs and NVIDIA InfiniBand. The supercomputer is expected to have 624 petaflops of peak AI performance with a maximum theoretical performance of 19.5 petaflops of HPC performance, which today would rank high among the TOP500 supercomputers.

India’s Centre for Development of Advanced Computing, or C-DAC, is commissioning the country’s fastest and largest AI supercomputer, called PARAM Siddhi – AI. Built with 42 DGX A100 systems, it delivers 210 petaflops of AI performance and will address challenges in healthcare, education, energy, cybersecurity, space, automotive and agriculture.

Buckle up. Scientific research worldwide has never enjoyed such a ride.

The post Take the A100 Train: HPC Centers Worldwide Jump Aboard NVIDIA AI Supercomputing Fast Track appeared first on The Official NVIDIA Blog.

Read More

NVIDIA, Ampere Computing Raise Arm 26x in Supercomputing

In the past 18 months, researchers have witnessed a whopping 25.5x performance boost for Arm-based platforms in high performance computing, thanks to the combined efforts of the Arm and NVIDIA ecosystems.

Many engineers deserve a round of applause for the gains.

  • The Arm Neoverse N1 core gave systems-on-a-chip like Ampere Computing’s Altra an estimated 2.3x improvement over last year’s designs.
  • NVIDIA’s A100 Tensor Core GPUs delivered its largest ever gains in a single generation.
  • The latest platforms upshifted to more and faster cores, input/output lanes and memory.
  • And application developers tuned their software with many new optimizations.

As a result, NVIDIA’s Arm-based reference design for HPC, with two Ampere Altra SoCs and two A100 GPUs, just delivered 25.5x the muscle of the dual-SoC servers researchers were using in June 2019. Our GPU-accelerated, Arm-based reference platform alone saw a 2.5x performance gain in 12 months.

The results span applications — including GROMACS, LAMMPS, MILC, NAMD and Quantum Espresso — that are key to work like drug discovery, a top priority during the pandemic. These and many other applications ready to run on Arm-based systems are available in containers on NGC, our hub for GPU-accelerated software.

Companies and researchers pushing the limits in areas such as molecular dynamics and quantum chemistry can harness these apps to drive advances not only in basic science but in fields such as healthcare.

Under the Hood with Arm and HPC

The latest reference architecture marries the energy-efficient throughput of Ampere Computing’s Mt. Jade, a 2U-sized server platform, with NVIDIA’s HGX A100 that’s already accelerating several supercomputers around the world. It’s the successor to a design that debuted last year based on the Marvell ThunderX2 and NVIDIA V100 GPUs.

Mt. Jade consists of two Ampere Altra SoCs packing 80 cores each based on the Arm Neoverse N1 core, all running at up to 3 GHz. They provide a whopping 192 PCI Express Gen4 lanes and up to 8TB of memory to feed two A100 GPUs.

Ampere Computing Mt. Jade reference design
The Mt. Jade server platform supports 192 PCIe Gen4 lanes.

The combination creates a compelling node for next-generation supercomputers. Ampere Computing has already attracted support from nine original equipment and design manufacturers and systems integrators, including Gigabyte, Lenovo and Wiwynn.

A Rising Arm HPC Ecosystem

In another sign of an expanding ecosystem, the Arm HPC User Group hosted a virtual event ahead of SC20 with more than three dozen talks from organizations including AWS, Hewlett Packard Enterprise, the Juelich Supercomputing Center, RIKEN in Japan, and Oak Ridge and Sandia National Labs in the U.S. Most of the talks are available on its YouTube channel.

In June, Arm made its biggest splash in supercomputing to date. That’s when the Fugaku system in Japan debuted at No. 1 on the TOP500 list of the world’s fastest supercomputers with a stunning 415.5 petaflops using the Arm-based A64FX CPU from Fujitsu.

At the time it was one of four Arm-powered supercomputers on the list, and the first using Arm’s Scalable Vector Extensions, technology embedded in Arm’s next-generation Neoverse designs that NVIDIA will support in its software.

Meanwhile, AWS is already running HPC jobs such as genomics, financial risk modeling and computational fluid dynamics in the cloud on its Arm-based Graviton2 processors.

NVIDIA Accelerates Arm in HPC

Arm’s growing HPC presence is part of a broad ecosystem of 13 million developers in areas that span smartphones to supercomputers. It’s a community NVIDIA aims to expand with our deal to acquire Arm to create the world’s premier company for the age of AI.

We’re extending the ecosystem with Arm support built into our NVIDIA AI, HPC, networking and graphics software. At last year’s supercomputing event, NVIDIA CEO Jensen Huang announced our work accelerating Arm in HPC in addition to our ongoing support for IBM POWER and x86 architectures.

NVIDIA support for the Arm ecosystem
NVIDIA has expanded its support for the Arm ecosystem.

Since then, we’ve announced our BlueField-2 DPUs that use Arm IP to accelerate and secure networking and storage jobs for cloud, embedded and enterprise applications. And for more than a decade, we’ve been an avid user of Arm designs inside products such as our Jetson Nano modules for robotics and other embedded systems.

We’re excited to be part of dramatic performance gains for Arm in HPC. It’s the latest page in the story of an open, thriving Arm ecosystem that keeps getting better.

Learn more in the NVIDIA SC20 Special Address.

The post NVIDIA, Ampere Computing Raise Arm 26x in Supercomputing appeared first on The Official NVIDIA Blog.

Read More

Changing Times: How the World’s TOP500 Supercomputers Don’t Just Have to Be Fast, But Smart

Changing Times: How the World’s TOP500 Supercomputers Don’t Just Have to Be Fast, But Smart

The world’s fastest supercomputers aren’t just faster than ever. They’re smarter and support a greater variety of workloads, too.

Nearly 70 percent of the machines on the latest TOP500 list of the world’s fastest supercomputers, released today at SC20, including eight of the top 10, are powered by NVIDIA technology.

In addition, four of the nominations for the Gordon Bell Prize, supercomputing’s most prestigious award — to be named this week at SC20 — use AI to drive their discoveries.

The common thread: our end-to-end HGX AI supercomputing platform, which accelerates scientific computing, data analytics and AI workloads. It’s a story that begins with a great chip and extremely fast, smart networking, but ultimately is all about NVIDIA’s globally adopted data-center-scale platform for doing great science.

The shift to incorporating AI into HPC, and a platform that extends beyond traditional supercomputing centers, represents a significant change in a field that, since Seymour Cray’s CDC 6600 was launched in 1964, has focused on harnessing ever larger, more powerful machines for compute-intensive simulation and modeling.

The latest TOP500 list is about more than high-performance Linpack results:

  • Speed records: Measured by the traditional benchmark of supercomputing performance — the speed it takes to do operations in a double-precision floating-point format called FP64 — NVIDIA technologies accelerate the world’s fastest clusters, powering eight of the top 10 machines. This includes the No. 5 system — NVIDIA’s own Selene supercomputer, the world’s most powerful commercial system — as well as new additions like JUWELS (Forschungszentrum Jülich) at No. 7 and Dammam-7 (Saudi Aramco) at No. 10.
  • “Smarts” records: When measured by HPL-AI, the mixed-precision standard that’s the benchmark for AI performance, NVIDIA-powered machines captured top spots on the list with Oak Ridge National Lab’s Summit supercomputer at 0.55 exaflops and NVIDIA Selene at 0.25 exaflops.
  • Green records: The NVIDIA DGX SuperPOD system captured the top spot on the Green500 list of most efficient supercomputers, achieving a new world record in power efficiency of 26.2 gigaflops per watt. Overall, NVIDIA-powered machines captured 25 of the top 30 spots on the list.

The Era of AI Supercomputing Is in High Gear

Maybe the most impressive achievement: We’ve surpassed exascale computing well ahead of schedule.

In October, Italy’s CINECA supercomputing center unveiled plans to build Leonardo, the world’s most powerful AI supercomputer, with an expected 10 exaflops of AI performance. It’s joined by a wave of new EuroHPC AI systems in the Czech Republic, Luxembourg and Slovenia. More are coming not only in Europe but also across Asia and North America.

That’s because modern AI harnesses the incredible parallel processing power of NVIDIA GPUs, NVIDIA CUDA-X libraries and NVIDIA Mellanox InfiniBand — the world’s only smart, fully accelerated in-network computing platform — to pour vast quantities of data into advanced neural networks, creating sophisticated models of the world around us. This lets scientists tackle much more ambitious projects than would otherwise be possible.

Take the example of the team from Lawrence Berkeley National Laboratory’s Computational Research Division, one of this year’s Gordon Bell Prize nominees. Thanks to AI, the team was able to increase the scale of their molecular dynamics simulation by at least 100x compared to the largest system simulated by previous nominees.

It’s About Advancing Science

Of course, it’s not just how fast your system is, but what you do with it in the real world that counts.

That’s why you’ll find the new breed of AI-powered supercomputers being thrown into the front line of the fight against COVID-19.

Three of the four nominees for a special Gordon Bell award focused on tackling the COVID-19 pandemic rely on NVIDIA AI.

On Lawrence Livermore National Laboratory’s Sierra supercomputer — No. 3 on the TOP500 list — a team trained an AI on 1.6 billion compounds in just 23 minutes to identify new drug candidates.

On Oak Ridge’s Summit supercomputer — No. 2 on the TOP500 list — another team harnessed 27,612 NVIDIA GPUs to test 19,028 potential drug compounds on two key SARS-CoV-2 protein structures every second.

Another team used Summit to create an AI-driven workflow to model how the SARS-CoV-2 spike protein, the main viral infection machinery, attacks the human ACE2 receptor.

Thanks to the growing ubiquity of the scalable NVIDIA HGX AI supercomputing platform — which includes everything from processors to networking and software — scientists can run their workloads in the hyperscale data centers of cloud computing companies, as well as in supercomputers.

It’s a unified platform, enabling the fusion of high-performance computing, data analytics and AI workloads. With 2.3 million developers, support for over 1,800 accelerated apps, all major AI frameworks and popular data analytics frameworks including DASK and Spark, the platform enables scientists and researchers to be instantly productive on GPU-powered x86, Arm and Power systems.

In addition, the NVIDIA NGC catalog offers performance-optimized containers for the latest versions of HPC and AI applications. So scientists and researchers can deploy quickly and stay focused on advancing their science.

Learn more in the live NVIDIA SC20 Special Address at 3 p.m. PT today.

The post Changing Times: How the World’s TOP500 Supercomputers Don’t Just Have to Be Fast, But Smart appeared first on The Official NVIDIA Blog.

Read More

Mitigating Unfair Bias in ML Models with the MinDiff Framework

Mitigating Unfair Bias in ML Models with the MinDiff Framework

Posted by Flavien Prost, Senior Software Engineer, and Alex Beutel, Staff Research Scientist, Google Research

The responsible research and development of machine learning (ML) can play a pivotal role in helping to solve a wide variety of societal challenges. At Google, our research reflects our AI Principles, from helping to protect patients from medication errors and improving flood forecasting models, to presenting methods that tackle unfair bias in products, such as Google Translate, and providing resources for other researchers to do the same.

One broad category for applying ML responsibly is the task of classification — systems that sort data into labeled categories. At Google, such models are used throughout our products to enforce policies, ranging from the detection of hate speech to age-appropriate content filtering. While these classifiers serve vital functions, it is also essential that they are built in ways that minimize unfair biases for users.

Today, we are announcing the release of MinDiff, a new regularization technique available in the TF Model Remediation library for effectively and efficiently mitigating unfair biases when training ML models. In this post, we discuss the research behind this technique and explain how it addresses the practical constraints and requirements we’ve observed when incorporating it in Google’s products.

Unfair Biases in Classifiers
To illustrate how MinDiff can be used, consider an example of a product policy classifier that is tasked with identifying and removing text comments that could be considered toxic. One challenge is to make sure that the classifier is not unfairly biased against submissions from a particular group of users, which could result in incorrect removal of content from these groups.

The academic community has laid a solid theoretical foundation for ML fairness, offering a breadth of perspectives on what unfair bias means and on the tensions between different frameworks for evaluating fairness. One of the most common metrics is equality of opportunity, which, in our example, means measuring and seeking to minimize the difference in false positive rate (FPR) across groups. In the example above, this means that the classifier should not be more likely to incorrectly remove safe comments from one group than another. Similarly, the classifier’s false negative rate should be equal between groups. That is, the classifier should not miss toxic comments against one group more than it does for another.
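
To make the metric concrete, here is a minimal sketch (not from the original post) that computes the per-group FPR and the gap that equality of opportunity seeks to minimize; the labels, predictions, and group ids below are hypothetical.

import numpy as np

def false_positive_rate(y_true, y_pred):
    # FPR = incorrectly flagged examples / all ground-truth negative examples.
    negatives = (y_true == 0)
    return np.mean(y_pred[negatives] == 1) if negatives.any() else 0.0

# Hypothetical labels (1 = toxic), thresholded predictions, and group ids.
y_true = np.array([0, 0, 1, 0, 1, 0, 0, 1])
y_pred = np.array([1, 0, 1, 0, 1, 1, 0, 0])
groups = np.array(['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'])

fpr_by_group = {
    g: false_positive_rate(y_true[groups == g], y_pred[groups == g])
    for g in np.unique(groups)
}
fpr_gap = max(fpr_by_group.values()) - min(fpr_by_group.values())
print(fpr_by_group, fpr_gap)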

When the end goal is to improve products, it’s important to be able to scale unfair bias mitigation to many models. However, this poses a number of challenges:

  • Sparse demographic data: The original work on equality of opportunity proposed a post-processing approach to the problem, which consisted of assigning each user group a different classifier threshold at serving time to offset biases of the model. However, in practice this is often not possible for many reasons, such as privacy policies. For example, demographics are often collected by users self-identifying and opting in, but while some users will choose to do this, others may choose to opt-out or delete data. Even for in-process solutions (i.e., methods that change how a model is trained) one needs to assume that most data will not have associated demographics, and thus needs to make efficient use of the few examples for which demographics are known.
  • Ease of Use: In order for any technique to be adopted broadly, it should be easy to incorporate into existing model architectures, and not be highly sensitive to hyperparameters. While an early approach to incorporating ML fairness principles into applications utilized adversarial learning, we found that it too frequently caused models to degenerate during training, which made it difficult for product teams to iterate and made new product teams wary.
  • Quality: The method for removing unfair biases should also reduce the overall classification performance (e.g., accuracy) as little as possible. Because any decrease in accuracy caused by the mitigation approach could result in the moderation model allowing more toxic comments, striking the right balance is crucial.

MinDiff Framework
We iteratively developed the MinDiff framework over the past few years to meet these design requirements. Because demographic information is so rarely known, we utilize in-process approaches in which the model’s training objective is augmented with an objective specifically focused on removing biases. This new objective is then optimized over the small sample of data with known demographic information. To improve ease of use, we switched from adversarial training to a regularization framework, which penalizes statistical dependency between the model’s predictions and demographic information for non-harmful examples. This encourages the model to equalize error rates across groups, e.g., the rate at which non-harmful examples are incorrectly classified as toxic.

There are several ways to encode this dependency between predictions and demographic information. Our initial MinDiff implementation minimized the correlation between the predictions and the demographic group, which essentially optimized for the average and variance of predictions to be equal across groups, even if the distributions still differ afterward. We have since improved MinDiff further by considering the maximum mean discrepancy (MMD) loss, which is closer to optimizing for the distribution of predictions to be independent of demographics. We have found that this approach is better able to both remove biases and maintain model accuracy.

MinDiff with MMD better closes the FPR gap with less decrease in accuracy
(on an academic benchmark dataset).
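
As an illustration only, not the library’s implementation (the production loss is the MMDLoss in the TF Model Remediation library), such a penalty can be sketched as a kernel-based MMD between the model’s scores on two groups, added to the primary training loss with a weight; the Gaussian kernel and its width here are assumptions.

import tensorflow as tf

def gaussian_kernel(x, y, width=0.1):
    # Pairwise Gaussian kernel between two 1-D score vectors.
    diff = tf.expand_dims(x, 1) - tf.expand_dims(y, 0)
    return tf.exp(-tf.square(diff) / (2.0 * width ** 2))

def mmd_penalty(scores_a, scores_b, width=0.1):
    # Squared maximum mean discrepancy between the two score distributions.
    return (tf.reduce_mean(gaussian_kernel(scores_a, scores_a, width))
            + tf.reduce_mean(gaussian_kernel(scores_b, scores_b, width))
            - 2.0 * tf.reduce_mean(gaussian_kernel(scores_a, scores_b, width)))

# Conceptually: total_loss = primary_loss + weight * mmd_penalty(scores_group_a, scores_group_b)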

To date we have launched modeling improvements across several classifiers at Google that moderate content quality. We went through multiple iterations to develop a robust, responsible, and scalable approach, solving research challenges and enabling broad adoption.

Gaps in classifiers’ error rates are an important set of unfair biases to address, but not the only one that arises in ML applications. For ML researchers and practitioners, we hope this work can further advance research toward addressing even broader classes of unfair biases and the development of approaches that can be used in practical applications. In addition, we hope that the release of the MinDiff library and the associated demos and documentation, along with the tools and experience shared here, can help practitioners improve their models and products.

Acknowledgements
This research effort on ML Fairness in classification was jointly led with Jilin Chen, Shuo Chen, Ed H. Chi, Tulsee Doshi, and Hai Qian. Further, this work was pursued in collaboration with Jonathan Bischof, Qiuwen Chen, Pierre Kreitmann, and Christine Luu. The MinDiff infrastructure was also developed in collaboration with Nick Blumm, James Chen, Thomas Greenspan, Christina Greer, Lichan Hong, Manasi Joshi, Maciej Kula, Summer Misherghi, Dan Nanas, Sean O’Keefe, Mahesh Sathiamoorthy, Catherina Xu, and Zhe Zhao. (All names are listed in alphabetical order of last names.)

Read More

Applying MinDiff to Improve Model Fairness

Applying MinDiff to Improve Model Fairness

Posted by Summer Misherghi and Thomas Greenspan, Software Engineers, Google Research

Last December, we open-sourced Fairness Indicators, a platform that enables sliced evaluation of machine learning model performance. This type of responsible evaluation is a crucial first step toward avoiding bias as it allows us to determine how our models are working for a wide variety of users. When we do identify that our model underperforms on certain slices of our data, we need a strategy to mitigate this to avoid creating or reinforcing unfair bias, in line with Google’s AI Principles.

Today, we’re announcing MinDiff, a technique for addressing unfair bias in machine learning models. Given two slices of data, MinDiff works by penalizing your model for differences in the distributions of scores between the two sets. As the model trains, it will try to minimize the penalty by bringing the distributions closer together. MinDiff is the first in what will ultimately be a larger Model Remediation Library of techniques, each suitable for different use cases. To learn about the research and theory behind MinDiff, please see our post on the Google AI Blog.

MinDiff Walkthrough

You can follow along and run the code yourself in this MinDiff notebook. In this walkthrough, we’ll emphasize important points in the notebook, while providing context on fairness evaluation and remediation.

In this example, we are training a text classifier to identify written content that could be considered “toxic.” For this task, our baseline model will be a simple Keras sequential model pre-trained on the Civil Comments dataset. Since this text classifier could be used to automatically moderate forums on the internet (for example, to flag potentially toxic comments), we want to ensure that it works well for everyone. You can read more about how fairness problems can arise in automated content moderation in this blog post.
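
This post doesn’t reproduce the baseline architecture; as a rough, hypothetical stand-in (assuming the comments have already been tokenized into integer ids), a baseline of this shape might look like the following. The notebook’s actual model and preprocessing may differ.

import tensorflow as tf

# Hypothetical stand-in for the baseline classifier.
baseline_model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10000, output_dim=64),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),  # toxicity score in [0, 1]
])
baseline_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=['accuracy'])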

To attempt to mitigate potential fairness concerns, we will:

  1. Evaluate our baseline model’s performance on text containing references to sensitive groups.
  2. Improve performance on any underperforming groups by training with MinDiff.
  3. Evaluate the new model’s performance on our chosen metric.

Our purpose is to demonstrate usage of the MinDiff technique for you with a minimal workflow, not to lay out a complete approach to fairness in machine learning. Our evaluation will only focus on one sensitive category and a single metric. We also don’t address potential shortcomings in the dataset, nor tune our configurations.

In a production setting, you would want to approach each of these with more rigor. For example:

  • Consider the application space and the potential societal impact of your model; what are the implications of different types of model errors?
  • Consider additional categories for which underperformance might have fairness implications. Do you have sufficient examples for groups in each category?
  • Consider any privacy implications to storing the sensitive categories.
  • Consider any metric for which poor performance could translate into harmful outcomes.
  • Conduct thorough evaluation for all relevant metrics on multiple sensitive categories.
  • Experiment with the configuration of MinDiff by tuning hyperparameters to get optimal performance.

For the purpose of this blog post, we’ll skip building and training our baseline model, and jump right to evaluating its performance. We’ve used some utility functions to compute our metrics and we’re ready to visualize evaluation results (See “Render Evaluation Results” in the notebook):

 widget_view.render_fairness_indicator(eval_result)  

Let’s look at the evaluation results. Try selecting the metric false positive rate (FPR) with threshold 0.450. We can see that the model does not perform as well for some religious groups as for others, displaying a much higher FPR. Note the wide confidence intervals on some groups because they have too few examples. This makes it difficult to say with certainty that there is a significant difference in performance for these slices. We may want to collect more examples to address this issue. We can, however, attempt to apply MinDiff for the two groups that we are confident are underperforming.

We’ve chosen to focus on FPR, because a higher FPR means that comments referencing these identity groups are more likely to be incorrectly flagged as toxic. This could lead to inequitable outcomes for users engaging in dialogue about religion, but note that disparities in other metrics can lead to other types of harm.

Now, we’ll try to improve the FPR for religious groups for which our model underperforms. We’ll attempt to do so using MinDiff, a remediation technique that seeks to balance error rates across slices of your data by penalizing disparities in performance during training. When we apply MinDiff, model performance may degrade slightly on other slices. As such, our goals with MinDiff will be to improve performance for underperforming groups, while sustaining strong performance for other groups and overall.

To use MinDiff, we create two additional data splits:

  • A split for non-toxic examples referencing minority groups: In our case, this will include comments with references to our underperforming identity terms. We don’t include some of the groups because there are too few examples, leading to higher uncertainty with wide confidence interval ranges.
  • A split for non-toxic examples referencing the majority group.

It’s important to have sufficient examples belonging to the underperforming classes. Based on your model architecture, data distribution, and MinDiff configuration, the amount of data needed can vary significantly. In past applications, we have seen MinDiff work well with at least 5,000 examples in each data split.

In our case, the groups in the minority splits have example quantities of 9,688 and 3,906. Note the class imbalances in the dataset; in practice, this could be cause for concern, but we won’t seek to address them in this notebook since our intention is just to demonstrate MinDiff.

We select only negative examples for these groups, so that MinDiff can optimize on getting these examples right. It may seem counterintuitive to carve out sets of ground truth negative examples if we’re primarily concerned with disparities in false positive rate, but remember that a false positive prediction is a ground truth negative example that’s incorrectly classified as positive, which is the issue we’re trying to address.

To prepare our data splits, we create masks for the sensitive & non-sensitive groups:

minority_mask = data_train.religion.apply(
    lambda x: any(religion in x for religion in ('jewish', 'muslim')))
majority_mask = data_train.religion.apply(
    lambda x: x == "['christian']")

Next, we select negative examples, so MinDiff will be able to reduce FPR for sensitive groups:

import copy

true_negative_mask = data_train['toxicity'] == 0

data_train_main = copy.copy(data_train)
data_train_sensitive = (
    data_train[minority_mask & true_negative_mask])
data_train_nonsensitive = (
    data_train[majority_mask & true_negative_mask])

To start training with MinDiff, we need to convert our data to TensorFlow Datasets (not shown here — see “Create MinDiff Datasets” in the notebook for details). Don’t forget to batch your data for training. In our case, we set the batch sizes to the same value as the original dataset but this is not a requirement and in practice should be tuned.

dataset_train_sensitive = dataset_train_sensitive.batch(BATCH_SIZE)
dataset_train_nonsensitive = (
    dataset_train_nonsensitive.batch(BATCH_SIZE))

Once we have prepared our three datasets, we merge them into one MinDiff dataset using a util function provided in the library.

min_diff_dataset = md.keras.utils.pack_min_diff_data(
    dataset_train_main,
    dataset_train_sensitive,
    dataset_train_nonsensitive)

To train with MinDiff, simply take the original model and wrap it in a MinDiffModel with a corresponding `loss` and `loss_weight`. We are using 1.5 as the default `loss_weight`, but this is a parameter that needs to be tuned for your use case, since it depends on your model and product requirements. You should experiment with changing the value to see how it impacts the model, noting that increasing it pushes the performance of the minority and majority groups closer together but may come with more pronounced tradeoffs.

As specified above, we create the original model, and wrap it in a MinDiffModel. We pass in one of the MinDiff losses and use a moderately high weight of 1.5.

original_model = ...  # Same structure as used for the baseline model.

min_diff_loss = md.losses.MMDLoss()
min_diff_weight = 1.5
min_diff_model = md.keras.MinDiffModel(
    original_model, min_diff_loss, min_diff_weight)

After wrapping the original model, we compile the model as usual. This means using the same loss as for the baseline model:

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
loss = tf.keras.losses.BinaryCrossentropy()
min_diff_model.compile(
    optimizer=optimizer, loss=loss, metrics=['accuracy'])

We fit the model to train on the MinDiff dataset, and save the original model to evaluate (see API documentation for details on why we don’t save the MinDiff model).

min_diff_model.fit(min_diff_dataset, epochs=20)

min_diff_model.save_original_model(
    min_diff_model_location, save_format='tf')

Finally, we evaluate the new results.

min_diff_eval_subdir = 'eval_results_min_diff'
min_diff_eval_result = util.get_eval_results(
    min_diff_model_location, base_dir, min_diff_eval_subdir,
    validate_tfrecord_file, slice_selection='religion')

To ensure we evaluate a new model correctly, we need to select a threshold the same way that we would for the baseline model. In a production setting, this would mean ensuring that evaluation metrics meet launch standards. In our case, we will pick the threshold that results in a similar overall FPR to the baseline model. This threshold may be different from the one you selected for the baseline model. Try selecting false positive rate with threshold 0.400. (Note that subgroups with very few examples have very wide confidence intervals and don’t have predictable results.)
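
As a rough sketch of that selection step (the y_val, baseline_scores, and min_diff_scores arrays below are hypothetical and not part of the notebook), you could sweep candidate thresholds and keep the one whose overall FPR is closest to the baseline’s:

import numpy as np

def overall_fpr(y_true, scores, threshold):
    # Fraction of ground-truth negative examples scored at or above the threshold.
    negatives = (y_true == 0)
    return np.mean(scores[negatives] >= threshold)

# Hypothetical validation labels and model scores.
baseline_fpr = overall_fpr(y_val, baseline_scores, 0.450)
candidates = np.linspace(0.0, 1.0, 201)
gaps = [abs(overall_fpr(y_val, min_diff_scores, t) - baseline_fpr) for t in candidates]
matched_threshold = candidates[int(np.argmin(gaps))]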

 widget_view.render_fairness_indicator(min_diff_eval_result) 

Note: The scale of the y-axis has changed from .04 in the graph for the baseline model to .02 for our MinDiff model.

Reviewing these results, you may notice that the FPRs for our target groups have improved. The gap between our lowest performing group and the majority group has improved from .024 to .006. Given the improvements we’ve observed and the continued strong performance for the majority group, we’ve satisfied both of our goals. Depending on the product, further improvements may be necessary, but this approach has gotten our model one step closer to performing equitably for all users.

MinDiff Chart

To get a better sense of scale, we superimposed the MinDiff model on top of the base model.

You can get started with MinDiff by visiting the MinDiff page on tensorflow.org. More information about the research behind MinDiff is available in our post on the Google AI Blog. You can also learn more about evaluating for fairness in this guide.

Acknowledgements

The MinDiff framework was developed in collaboration with Thomas Greenspan, Summer Misherghi, Sean O’Keefe, Christina Greer, Catherina Xu, Manasi Joshi, Dan Nanas, Nick Blumm, Jilin Chen, Zhe Zhao, James Chen, Maciej Kula, Lichan Hong, and Mahesh Sathiamoorthy. This research effort on ML Fairness in classification was jointly led by (in alphabetical order) Alex Beutel, Ed H. Chi, Flavien Prost, Hai Qian, Jilin Chen, Shuo Chen, and Tulsee Doshi. Further, this work was pursued in collaboration with Christine Luu, Jonathan Bischof, Pierre Kreitmann, and Qiuwen Chen.

Read More