GeForce NOW Streams High-Res, 120-FPS PC Gaming to World’s First Cloud Gaming Chromebooks

High-end PC gaming arrives on more devices this GFN Thursday.

GeForce NOW RTX 3080 members can now stream their favorite PC games at up to 1600p and 120 frames per second in a Chrome browser. No downloads, no installs, just victory.

Even better, NVIDIA has worked with Google to support the newest Chromebooks, the first laptops custom-built for cloud gaming, featuring gorgeous 1600p, 120Hz+ displays. They come with a free three-month GeForce NOW RTX 3080 membership, the highest-performance tier.

On top of these new ways to play, this GFN Thursday brings hordes of fun with 11 new titles streaming from the cloud — including the Warhammer 40,000: Darktide closed beta, available Oct. 14-16.

High-Performance PC Gaming, Now on Chromebooks

Google’s newest Chromebooks are the first built for cloud gaming, and include GeForce NOW right out of the box.

These new cloud gaming Chromebooks — the Acer Chromebook 516 GE, the ASUS Chromebook Vibe CX55 Flip and the Lenovo IdeaPad Gaming Chromebook — all include high-refresh-rate, high-resolution displays, gaming keyboards, fast Wi-Fi 6 connectivity and immersive audio. And with a GeForce NOW RTX 3080 membership, gamers can instantly stream 1,400+ PC games from the GeForce NOW library at up to 1600p and 120 FPS.

 

That means Chromebook gamers can jump right into over 100 free-to-play titles, including major franchises like Fortnite, Genshin Impact and League of Legends. RTX 3080 members can explore the worlds of Cyberpunk 2077, Control and more with RTX ON, only through GeForce NOW. Compete online with ultra-low latency and other features perfect for playing.

The GeForce NOW app comes preinstalled on these cloud gaming Chromebooks, so users can jump straight into gaming — just tap, search, launch and play. Plus, pin games from GeForce NOW right to the app shelf to get back into them with just a click.

For new and existing members, every cloud gaming Chromebook includes a free three-month RTX 3080 membership through the Chromebook Perks program.

Stop! Warhammer Time

Fatshark leaps thousands of years into the future to bring gamers Warhammer 40,000: Darktide on Tuesday, Nov. 30.

Gamers who’ve preordered on Steam can get an early taste of the game with a closed beta period, running Oct. 14-16.

Warhammer 40K Darktide Closed Beta
Take back the city of Tertium from hordes of bloodthirsty foes in this intense and brutal action shooter.

Head to the industrial city of Tertium to combat the forces of Chaos, using Vermintide 2’s lauded melee system and a range of deadly Warhammer 40,000 weapons. Personalize your play style with a character-creation system and delve deep into the city to put a stop to the horrors that lurk.

The fun doesn’t stop there. Members can look for these new titles streaming this week:

  • Asterigos: Curse of the Stars (New release on Steam)
  • Kamiwaza: Way of the Thief (New release on Steam)
  • LEGO Bricktales (New release on Steam and Epic Games)
  • Ozymandias: Bronze Age Empire Sim (New release on Steam)
  • PC Building Simulator 2 (New release on Epic Games)
  • The Last Oricru (New release on Steam, Oct. 13)
  • Rabbids: Party of Legends (New release on Ubisoft, Oct. 13)
  • The Darkest Tales (New release on Steam, Oct. 13)
  • Scorn (New Release on Steam and Epic Games, Oct. 14)
  • Warhammer 40,000: Darktide Closed Beta (New release on Steam, available from Oct. 14 at 7 a.m. PT to Oct. 17 at 1 a.m. PT)
  • Dual Universe (Steam)

Finally, we have a question for you – we promise not to tattle. Let us know your answer on Twitter or in the comments below.

The post GeForce NOW Streams High-Res, 120-FPS PC Gaming to World’s First Cloud Gaming Chromebooks appeared first on NVIDIA Blog.

Read More

Scaling PyTorch models on Cloud TPUs with FSDP

Introduction

The research community has witnessed a lot of successes with large models across NLP, computer vision, and other domains in recent years. Many of these successes were enabled by Cloud TPUs – which are powerful hardware for distributed training. To support TPUs in PyTorch, the PyTorch/XLA library provides a backend for XLA devices (most notably TPUs) and lays the groundwork for scaling large PyTorch models on TPUs.

However, most existing model scaling tools in the PyTorch ecosystem assume GPU (or CPU) devices, often depend on specific features in CUDA, and do not work directly on TPUs. The lack of scaling tools makes it challenging to build large models that cannot fit into the memory of a single TPU chip.

To support model scaling on TPUs, we implemented the widely-adopted Fully Sharded Data Parallel (FSDP) algorithm for XLA devices as part of the PyTorch/XLA 1.12 release. We provide an FSDP interface with a similar high-level design to the CUDA-based PyTorch FSDP class while also handling several restrictions in XLA (see Design Notes below for more details). This FSDP interface allowed us to easily build models with e.g. 10B+ parameters on TPUs and has enabled many research explorations.

Using Fully Sharded Data Parallel (FSDP) in PyTorch/XLA

We provide a wrapper class XlaFullyShardedDataParallel over a given PyTorch model to shard its parameters across data-parallel workers. An example usage is as follows:

import torch
import torch_xla.core.xla_model as xm
from torch_xla.distributed.fsdp import XlaFullyShardedDataParallel as FSDP

# my_module is a standard nn.Module already placed on the XLA device
# (e.g., via my_module.to(xm.xla_device())); x and y are its input tensors.
model = FSDP(my_module)
optim = torch.optim.Adam(model.parameters(), lr=0.0001)
output = model(x, y)
loss = output.sum()
loss.backward()
optim.step()  # gradients are already reduced and sharded by FSDP

Wrapping an nn.Module instance with XlaFullyShardedDataParallel enables the ZeRO-2 algorithm on it, where its gradients and the optimizer states are sharded for the entire training process. During its forward and backward passes, the full parameters of the wrapped module are first reconstructed from their corresponding shards for computation.

Nested FSDP wrapping can be used to further save memory. For nested FSDP, one should first wrap the individual submodules with an inner FSDP before wrapping the base model with an outer FSDP. This allows the model to store only the full parameters of one individual layer at any given time, while the outer wrapper ensures that any leftover parameters are handled, corresponding to the ZeRO-3 algorithm. Nested FSDP wrapping can be applied at any depth of submodules, and there can be more than two layers of nesting.
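
As a minimal sketch of this pattern (the toy nn.Sequential and layer sizes below are purely illustrative, assuming a TPU VM with torch_xla installed):

import torch.nn as nn
from torch_xla.distributed.fsdp import XlaFullyShardedDataParallel as FSDP

# Wrap each inner block with FSDP first, then wrap the whole model with an
# outer FSDP so that any leftover (unwrapped) parameters are also handled.
layers = [nn.Linear(1024, 1024) for _ in range(8)]
inner_wrapped = nn.Sequential(*[FSDP(layer) for layer in layers])
model = FSDP(inner_wrapped)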

Checkpoint saving and loading for models and optimizers can be done as before by saving and loading their .state_dict(). However, each training process should save its own checkpoint file of the sharded model parameters and optimizer states, and load the checkpoint file for the corresponding rank when resuming (regardless of ZeRO-2 or ZeRO-3, i.e., nested wrapping or not). A command-line tool and a Python interface are provided to consolidate the sharded model checkpoint files into a full, unsharded model checkpoint file.

Gradient checkpointing (also referred to as “activation checkpointing” or “rematerialization”) is another common technique for model scaling and can be used in conjunction with FSDP. We provide checkpoint_module, a wrapper function over a given nn.Module instance for gradient checkpointing (based on torch_xla.utils.checkpoint.checkpoint).
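
For instance, a hedged sketch of combining the two (the exact import path of checkpoint_module and the toy blocks are assumptions; check the PyTorch/XLA repo for the canonical usage):

import torch.nn as nn
from torch_xla.distributed.fsdp import XlaFullyShardedDataParallel as FSDP, checkpoint_module

# Apply gradient checkpointing to each block so its activations are
# rematerialized during the backward pass, then shard it with FSDP.
blocks = [nn.Linear(1024, 1024) for _ in range(8)]
model = FSDP(nn.Sequential(*[FSDP(checkpoint_module(b)) for b in blocks]))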

The MNIST and ImageNet examples below provide illustrative usages of (plain or nested) FSDP, saving and consolidation of model checkpoints, as well as gradient checkpointing.

Starting examples of FSDP in PyTorch/XLA

Training MNIST and ImageNet with FSDP

MNIST and ImageNet classification are often used as starting points for building more complicated deep learning models. We provide FSDP examples on these two datasets in the PyTorch/XLA repo: test_train_mp_mnist_fsdp_with_ckpt.py for MNIST and test_train_mp_imagenet_fsdp.py for ImageNet (both under test/).

A comparison of them with the vanilla data-parallel examples of MNIST and ImageNet illustrates how to adapt a training script to use FSDP. A major distinction to keep in mind is that when stepping the optimizer on an FSDP-wrapped model, one should directly call optimizer.step() instead of xm.optimizer_step(optimizer). The latter reduces the gradients across ranks, which is not what we need in FSDP, where the gradients are already reduced and sharded (from a reduce-scatter op in its backward pass).
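
The following fragment illustrates that distinction (a sketch reusing the variable names from the example above):

import torch_xla.core.xla_model as xm

loss = model(x, y).sum()   # model is FSDP-wrapped
loss.backward()            # FSDP reduce-scatters gradients here

# Vanilla data-parallel scripts would call:
#   xm.optimizer_step(optim)   # all-reduces gradients across ranks
# With FSDP, step the optimizer directly instead:
optim.step()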

Installation

FSDP is available in PyTorch/XLA 1.12 and newer nightly releases. Please refer to https://github.com/pytorch/xla#-available-images-and-wheels for a guide on installation as well as Cloud TPU allocation. Then clone the PyTorch/XLA repo on a TPU VM as follows:

mkdir -p ~/pytorch && cd ~/pytorch
git clone --recursive https://github.com/pytorch/xla.git
cd ~/

Train MNIST on v3-8 TPU

Training reaches around 98.9% accuracy after 2 epochs:

python3 ~/pytorch/xla/test/test_train_mp_mnist_fsdp_with_ckpt.py \
  --batch_size 16 --drop_last --num_epochs 2 \
  --use_nested_fsdp

The script above automatically tests consolidation of the sharded model checkpoints at the end. You can also manually consolidate the sharded checkpoint files via

python3 -m torch_xla.distributed.fsdp.consolidate_sharded_ckpts \
  --ckpt_prefix /tmp/mnist-fsdp/final_ckpt \
  --ckpt_suffix "_rank-*-of-*.pth"

Train ImageNet with ResNet-50 on v3-8 TPU

Training reaches around 75.9% accuracy after 100 epochs, the same as what one would get without FSDP. First download and preprocess the ImageNet-1k dataset to /datasets/imagenet-1k, then run:

python3 ~/pytorch/xla/test/test_train_mp_imagenet_fsdp.py \
  --datadir /datasets/imagenet-1k --drop_last \
  --model resnet50 --test_set_batch_size 64 --eval_interval 10 \
  --lr 0.4 --batch_size 128 --num_warmup_epochs 5 \
  --lr_scheduler_divide_every_n_epochs 30 --lr_scheduler_divisor 10 \
  --num_epochs 100 \
  --use_nested_fsdp

You can also explore other options in these two examples, such as --use_gradient_checkpointing to apply gradient checkpointing (i.e. activation checkpointing) on the ResNet blocks, or --compute_dtype bfloat16 to perform forward and backward passes in bfloat16 precision.

Examples on large-scale models

When building large models on TPUs, we often need to be aware of the memory constraints (e.g., 16 GB per core on TPU v3 and 32 GB per chip on TPU v4). For large models that cannot fit into a single TPU device's memory or the host CPU memory, one should use nested FSDP to implement the ZeRO-3 algorithm and interleave submodule construction with inner FSDP wrapping, so that the full model never needs to be stored in memory during construction.
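
A minimal sketch of that interleaving, with a hypothetical stack of blocks (layer count and sizes are placeholders):

import torch.nn as nn
from torch_xla.distributed.fsdp import XlaFullyShardedDataParallel as FSDP

def build_block(hidden):
    return nn.Sequential(nn.Linear(hidden, hidden), nn.GELU())

# Construct and wrap one block at a time so the full, unsharded model never
# needs to exist in memory at once; the outer FSDP handles leftover parameters.
blocks = [FSDP(build_block(8192)) for _ in range(48)]
model = FSDP(nn.Sequential(*blocks))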

We illustrate these cases in https://github.com/ronghanghu/ptxla_scaling_examples, which provides examples of training a Vision Transformer (ViT) model with 10B+ parameters on a TPU v3 pod (with 128 cores) as well as other cases.

Design Notes

One might wonder why we need to develop a separate FSDP class in PyTorch/XLA instead of directly reusing PyTorch’s FSDP class or extending it to the XLA backend. The main motivation behind a separate FSDP class in PyTorch/XLA is that the native PyTorch FSDP class heavily relies on CUDA features that are not supported by XLA devices, while XLA also has several unique characteristics that need special handling. These distinctions require a different implementation of FSDP, which is much easier to build in a separate class.

Changes in API calls

One prominent distinction is that the native PyTorch FSDP is built upon separate CUDA streams for asynchronous execution in eager mode, while PyTorch/XLA runs in lazy mode and also does not support streams. In addition, TPU requires that all devices homogeneously run the same program. As a result, in the PyTorch/XLA FSDP implementation, CUDA calls and per-process heterogeneity need to be replaced by XLA APIs and alternative homogeneous implementations.

Tensor Storage Handling

Another prominent distinction is how to free a tensor’s storage, which is much harder in XLA than in CUDA. To implement ZeRO-3, one needs to free the storage of full parameters after a module’s forward pass, so that the next module can reuse this memory buffer for subsequent computation. PyTorch’s FSDP accomplishes this on CUDA by freeing the actual storage of a parameter p via p.data.storage().resize_(0). However, XLA tensors do not have this .storage() handle given that the XLA HLO IRs are completely functional and do not provide any ops to deallocate a tensor or resize its storage. Below the PyTorch interface, only the XLA compiler can decide when to free the TPU device memory corresponding to an XLA tensor, and a prerequisite is that the memory can only be released when the tensor object gets deallocated in Python – which cannot happen in FSDP because these parameter tensors are referenced as module attributes and also saved by PyTorch autograd for the backward pass.

Our solution to this issue is to split a tensor’s value properties from its autograd Variable properties, and to free an nn.Parameter tensor by setting its .data attribute to a dummy scalar of size 1. This way the actual data tensor for the full parameter gets dereferenced in Python so that XLA can recycle its memory for other computation, while autograd can still trace the base nn.Parameter as a weak reference to the parameter data. To get this to work, one also needs to handle views over the parameters, as views in PyTorch also hold references to their actual data (this required fixing a shape-related issue with views in PyTorch/XLA).
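
A heavily simplified illustration of this idea (not the actual FSDP internals):

import torch

def free_full_param(p: torch.nn.Parameter) -> None:
    # Point .data at a dummy size-1 tensor so the real (full) parameter buffer
    # is dereferenced in Python and XLA can reclaim its device memory, while
    # the nn.Parameter object itself stays alive for autograd bookkeeping.
    p.data = torch.zeros(1, dtype=p.dtype, device=p.device)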

Working with XLA compiler

The solution above should be enough to free full parameters if the XLA compiler faithfully preserves the operations and their execution order in our PyTorch program. But there is another problem – XLA attempts to optimize the program to speed up its execution by applying common subexpression elimination (CSE) to the HLO IRs. In a naive implementation of FSDP, the XLA compiler typically eliminates the 2nd all-gather in the backward pass to reconstruct the full parameters when it sees that it is a repeated computation from the forward pass, and directly holds and reuses the full parameters we want to free up after the forward pass. To guard against this undesired compiler behavior, we introduced the optimization barrier op into PyTorch/XLA and used it to stop eliminating the 2nd all-gather. This optimization barrier is also applied to a similar case of gradient checkpointing to prevent CSE between forward and backward passes that could eliminate the rematerialization.
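
In spirit, the barrier is applied to the freshly all-gathered full parameters, roughly as sketched below (the exact optimization_barrier_ API may differ by release; see the PyTorch/XLA documentation):

import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
full_params = [torch.randn(1024, 1024, device=device)]  # stand-in for gathered params

# Marking these tensors with an optimization barrier keeps XLA's CSE pass from
# merging the backward-pass all-gather with the identical one from the forward pass.
xm.optimization_barrier_(full_params)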

In the future, if the distinctions between CUDA and XLA become not as prominent as mentioned above, it could be worth considering a merge of the PyTorch/XLA FSDP with the native PyTorch FSDP to have a unified interface.

Acknowledgments

Thanks to Junmin Hao from AWS for reviewing the PyTorch/XLA FSDP pull request. Thanks to Brian Hirsh from the Meta PyTorch team for support on the PyTorch core issues. Thanks to Isaack Karanja, Will Cromar, and Blake Hechtman from Google for support on GCP, XLA, and TPU issues.

Thanks to Piotr Dollar, Wan-Yen Lo, Alex Berg, Ryan Mark, Kaiming He, Xinlei Chen, Saining Xie, Shoubhik Debnath, Min Xu, and Vaibhav Aggarwal from Meta FAIR for various TPU-related discussions.

Read More

Large and Fully Charged: Polestar 3 Sets New Standard for Premium Electric SUVs

The age of electric vehicles has arrived and, with it, an entirely new standard for premium SUVs.

Polestar, the performance EV brand spun out from Volvo Cars, launched its third model today in Copenhagen. With the Polestar 3, the automaker has taken SUV design back to the drawing board, building a vehicle as innovative as the technology it features.

The EV premieres a new aerodynamic profile from the brand, in addition to sustainable materials and advanced active and passive safety systems. The Polestar 3 also maintains some attributes of a traditional SUV, including a powerful and wide stance.

Courtesy of Polestar

It features a 14.5-inch center display for easily accessible infotainment, in addition to 300 miles of battery range to tackle trips of any distance.

The Polestar 3 is the brand’s first SUV, as well as its first model to run on the high-performance, centralized compute of the NVIDIA DRIVE platform. This software-defined architecture lends the Polestar 3 its cutting-edge personality, making it an SUV that tops the list in every category.

Reigning Supreme

The crown jewel of a software-defined vehicle is its core compute — and the Polestar 3 is built with top-of-the-line hardware and software.

The NVIDIA DRIVE high-performance AI compute platform processes data from the SUV’s multiple sensors and cameras to enable advanced driver-assistance safety (ADAS) features and driver monitoring.

Courtesy of Polestar

This ADAS system combines technology from Zenseact, Luminar and Smart Eye that integrates seamlessly thanks to the centralized computing power of NVIDIA DRIVE.

By running on a software-defined architecture, these automated driving features will continue to gain new functionality via over-the-air updates and eventually perform autonomous highway driving.

A Polestar 3 customer’s initial purchase won’t remain the same years or even months later — it will constantly improve, achieving capabilities not yet even dreamed of.

Charging Ahead

The Polestar 3 kicks off a new phase for the automaker, which is accelerating its product and international growth plans.

The SUV will begin deliveries late next year. Starting with the Polestar 3, the automaker expects to launch a new car every year for the next three years and aims to expand its presence to at least 30 global markets by the end of 2023.

The automaker is targeting 10x growth in global sales, to reach 290,000 vehicles sold by the end of 2025 from about 29,000 in 2021.

And with its future-forward SUV, Polestar is adding a dazzling jewel to its already star-studded crown.

The post Large and Fully Charged: Polestar 3 Sets New Standard for Premium Electric SUVs appeared first on NVIDIA Blog.

Read More

Customize business rules for intelligent document processing with human review and BI visualization

A massive number of business documents are processed daily across industries. Many of these documents are paper-based, scanned into your system as images, or in an unstructured format like PDF. Each company may apply unique rules, tied to its business context, while processing these documents. Extracting information accurately and processing it flexibly is a challenge many companies face.

Amazon Intelligent Document Processing (IDP) allows you to take advantage of industry-leading machine learning (ML) technology without previous ML experience. This post introduces a solution included in the Amazon IDP workshop showcasing how to process documents to serve flexible business rules using Amazon AI services. You can use the following step-by-step Jupyter notebook to complete the lab.

Amazon Textract helps you easily extract text from various documents, and Amazon Augmented AI (Amazon A2I) allows you to implement a human review of ML predictions. The default Amazon A2I template allows you to build a human review pipeline based on rules, such as when the extraction confidence score is lower than a pre-defined threshold or required keys are missing. But in a production environment, you need the document processing pipeline to support flexible business rules, such as validating the string format, verifying the data type and range, and validating fields across documents. This post shows how you can use Amazon Textract and Amazon A2I to customize a generic document processing pipeline supporting flexible business rules.

Solution overview

For our sample solution, we use the Tax Form 990, a US IRS (Internal Revenue Service) form that provides the public with financial information about a non-profit organization. For this example, we only cover the extraction logic for some of the fields on the first page of the form. You can find more sample documents on the IRS website.

The following diagram illustrates the IDP pipeline that supports customized business rules with human review.

The architecture is composed of three logical stages:

  • Extraction – Extract data from the 990 Tax Form (we use page 1 as an example).

    • Retrieve a sample image stored in an Amazon Simple Storage Service (Amazon S3) bucket.
    • Call the Amazon Textract analyze_document API using the Queries feature to extract text from the page.
  • Validation – Apply flexible business rules with a human-in-the-loop review.

    • Validate the extracted data against business rules, such as validating the length of an ID field.
    • Send the document to Amazon A2I for a human to review if any business rules fail.
    • Reviewers use the Amazon A2I UI (a customizable website) to verify the extraction result.
  • BI visualization – We use Amazon QuickSight to build a business intelligence (BI) dashboard showing the process insights.

Customize business rules

You can define a generic business rule in the following JSON format. In the sample code, we define three rules:

  • The first rule is for the employer ID field. The rule fails if the Amazon Textract confidence score is lower than 99%. For this post, we set the confidence score threshold high, which will break by design. You could adjust the threshold to a more reasonable value to reduce unnecessary human effort in a real-world environment, such as 90%.
  • The second rule is for the DLN field (the unique identifier of the tax form), which is required for the downstream processing logic. This rule fails if the DLN field is missing or has an empty value.
  • The third rule is also for the DLN field but uses a different condition category, LengthCheck, implemented as a ValueRegex condition. The rule fails if the DLN is not 16 characters long.

The following code shows our business rules in JSON format:

rules = [
    {
        "description": "Employee Id confidence score should greater than 99",
        "field_name": "d.employer_id",
        "field_name_regex": None, # support Regex: "_confidence$",
        "condition_category": "Confidence",
        "condition_type": "ConfidenceThreshold",
        "condition_setting": "99",
    },
    {
        "description": "dln is required",
        "field_name": "dln",
        "condition_category": "Required",
        "condition_type": "Required",
        "condition_setting": None,
    },
    {
        "description": "dln length should be 16",
        "field_name": "dln",
        "condition_category": "LengthCheck",
        "condition_type": "ValueRegex",
        "condition_setting": "^[0-9a-zA-Z]{16}$",
    }
]

You can expand the solution by adding more business rules following the same structure.
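
For example, a hypothetical fourth rule for the omb_no field (the FormatCheck category name is illustrative; the condition type reuses ValueRegex from the third rule above):

rules.append(
    {
        "description": "omb_no should match the NNNN-NNNN pattern",
        "field_name": "omb_no",
        "field_name_regex": None,
        "condition_category": "FormatCheck",  # illustrative category name
        "condition_type": "ValueRegex",
        "condition_setting": "^[0-9]{4}-[0-9]{4}$",
    }
)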

Extract text using an Amazon Textract query

In the sample solution, we call the Amazon Textract analyze_document API query feature to extract fields by asking specific questions. You don’t need to know the structure of the data in the document (table, form, implied field, nested data) or worry about variations across document versions and formats. Queries use a combination of visual, spatial, and language cues to extract the information you seek with high accuracy.

To extract value for the DLN field, you can send a request with questions in natural languages, such as “What is the DLN?” Amazon Textract returns the text, confidence, and other metadata if it finds corresponding information on the image or document. The following is an example of an Amazon Textract query request:

import boto3

textract = boto3.client('textract')

response = textract.analyze_document(
    Document={'S3Object': {'Bucket': data_bucket, 'Name': s3_key}},
    FeatureTypes=["QUERIES"],
    QueriesConfig={
        'Queries': [
            {
                'Text': 'What is the DLN?',
                'Alias': 'The DLN number - unique identifier of the form'
            }
        ]
    }
)
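
The answers come back as QUERY and QUERY_RESULT blocks linked by ANSWER relationships. The following sketch shows one way to collect them from the response above (field access follows the standard Amazon Textract response structure):

def get_query_answers(response):
    """Map each query alias (or text) to its answer text and confidence."""
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    answers = {}
    for block in response["Blocks"]:
        if block["BlockType"] != "QUERY":
            continue
        key = block["Query"].get("Alias") or block["Query"]["Text"]
        for rel in block.get("Relationships", []):
            if rel["Type"] == "ANSWER":
                for result_id in rel["Ids"]:
                    result = blocks_by_id[result_id]
                    answers[key] = {
                        "value": result.get("Text"),
                        "confidence": result.get("Confidence"),
                    }
    return answers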

Define the data model

The sample solution constructs the data in a structured format to serve the generic business rule evaluation. To keep extracted values, you can define a data model for each document page. The following image shows how the text on page 1 maps to the JSON fields.

Each field represents a document’s text, check box, or table/form cell on the page. The JSON object looks like the following code:

{
    "dln": {
        "value": "93493319020929",
        "confidence": 0.9765, 
        "block": {} 
    },
    "omb_no": {
        "value": "1545-0047",
        "confidence": 0.9435,
        "block": {}
    },
    ...
}

You can find the detailed JSON structure definition in the GitHub repo.

Evaluate the data against business rules

The sample solution comes with a Condition class—a generic rules engine that takes the extracted data (as defined in the data model) and the rules (as defined in the customized business rules). It returns two lists with failed and satisfied conditions. We can use the result to decide if we should send the document to Amazon A2I for human review.

The Condition class source code is in the sample GitHub repo. It supports basic validation logic, such as validating a string’s length, value range, and confidence score threshold. You can modify the code to support more condition types and complex validation logic.
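
As a rough illustration of what such an engine does (the real Condition class in the repo is more complete), the following sketch evaluates the three rules above against a page's data model:

import re

def check_rules(data, rules):
    """Return (failed, satisfied) rule lists for a page's extracted data."""
    failed, satisfied = [], []
    for rule in rules:
        field = data.get(rule["field_name"], {})
        value = field.get("value")
        confidence = field.get("confidence", 0)
        ctype = rule["condition_type"]
        setting = rule["condition_setting"]
        if ctype == "ConfidenceThreshold":
            ok = confidence * 100 >= float(setting)
        elif ctype == "Required":
            ok = bool(value)
        elif ctype == "ValueRegex":
            ok = value is not None and re.match(setting, str(value)) is not None
        else:
            ok = True  # unknown condition types pass by default in this sketch
        (satisfied if ok else failed).append(rule)
    return failed, satisfied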

Create a customized Amazon A2I web UI

Amazon A2I allows you to customize the reviewer’s web UI by defining a worker task template. The template is a static webpage in HTML and JavaScript. You can pass data to the customized reviewer page using the Liquid syntax.

In the sample solution, the custom Amazon A2I UI template displays the page on the left and the failure conditions on the right. Reviewers can use it to correct the extraction value and add their comments.

The following screenshot shows our customized Amazon A2I UI. It shows the original image document on the left and the following failed conditions on the right:

  • The DLN numbers should be 16 characters long. The actual DLN has 15 characters.
  • The confidence score of employer_id is lower than 99%. The actual confidence score is around 98%.

The reviewers can manually verify these results and add comments in the CHANGE REASON text boxes.

For more information about integrating Amazon A2I into any custom ML workflow, refer to over 60 pre-built worker templates on the GitHub repo and Use Amazon Augmented AI with Custom Task Types.

Process the Amazon A2I output

After the reviewer using the Amazon A2I customized UI verifies the result and chooses Submit, Amazon A2I stores a JSON file in the S3 bucket folder. The JSON file includes the following information on the root level:

  • The Amazon A2I flow definition ARN and human loop name
  • Human answers (the reviewer’s input collected by the customized Amazon A2I UI)
  • Input content (the original data sent to Amazon A2I when starting the human loop task)

The following is a sample JSON generated by Amazon A2I:

{
  "flowDefinitionArn": "arn:aws:sagemaker:us-east-1:711334203977:flow-definition/a2i-custom-ui-demo-workflow",
  "humanAnswers": [
    {
      "acceptanceTime": "2022-08-23T15:23:53.488Z",
      "answerContent": {
        "Change Reason 1": "Missing X at the end.",
        "True Value 1": "93493319020929X",
        "True Value 2": "04-3018996"
      },
      "submissionTime": "2022-08-23T15:24:47.991Z",
      "timeSpentInSeconds": 54.503,
      "workerId": "94de99f1bc6324b8",
      "workerMetadata": {
        "identityData": {
          "identityProviderType": "Cognito",
          "issuer": "https://cognito-idp.us-east-1.amazonaws.com/us-east-1_URd6f6sie",
          "sub": "cef8d484-c640-44ea-8369-570cdc132d2d"
        }
      }
    }
  ],
  "humanLoopName": "custom-loop-9b4e67ff-2c9f-40f9-aae5-0e26316c905c",
  "inputContent": {...} # the original input send to A2I when starting the human review task
}

You can implement extract, transform, and load (ETL) logic to parse information from the Amazon A2I output JSON and store it in a file or database. The sample solution comes with a CSV file with processed data. You can use it to build a BI dashboard by following the instructions in the next section.
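
A minimal sketch of such ETL logic, using the field names from the sample JSON above (the bucket and key layout is whatever your human review workflow writes to):

import json
import boto3

s3 = boto3.client("s3")

def parse_a2i_output(bucket, key):
    """Flatten one Amazon A2I output JSON into a row suitable for a CSV or BI tool."""
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    output = json.loads(body)
    answer = output["humanAnswers"][0]
    return {
        "human_loop_name": output["humanLoopName"],
        "worker_id": answer["workerId"],
        "time_spent_seconds": answer["timeSpentInSeconds"],
        **answer["answerContent"],  # e.g., corrected values and change reasons
    }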

Create a dashboard in Amazon QuickSight

The sample solution includes a reporting stage with a visualization dashboard served by Amazon QuickSight. The BI dashboard shows key metrics such as the number of documents processed automatically or manually, the most popular fields that required human review, and other insights. This dashboard can help you get an oversight of the document processing pipeline and analyze the common reasons causing human review. You can optimize the workflow by further reducing human input.

The sample dashboard includes basic metrics. You can expand the solution using Amazon QuickSight to show more insights into the data.

Expand the solution to support more documents and business rules

To expand the solution to support more document pages with corresponding business rules, you need to make the following changes:

  • Create a data model for the new page in JSON structure representing all the values you want to extract out of the pages. Refer to the Define the data model section for a detailed format.
  • Use Amazon Textract to extract text out of the document and populate values to the data model.
  • Add business rules corresponding to the page in JSON format. Refer to the Customize business rules section for the detailed format.

The custom Amazon A2I UI in the solution is generic and doesn’t require changes to support new business rules.

Conclusion

Intelligent document processing is in high demand, and companies need customized pipelines to support their unique business logic. Amazon A2I offers a built-in template integrated with Amazon Textract to implement your human review use cases, and it also allows you to customize the reviewer page to serve flexible requirements.

This post guided you through a reference solution using Amazon Textract and Amazon A2I to build an IDP pipeline that supports flexible business rules. You can try it out using the Jupyter notebook in the GitHub IDP workshop repo.


About the authors

Lana Zhang is a Sr. Solutions Architect at the AWS WWSO AI Services team with expertise in AI and ML for intelligent document processing and content moderation. She is passionate about promoting AWS AI services and helping customers transform their business solutions.


Sonali Sahu leads the Intelligent Document Processing AI/ML Solutions Architect team at Amazon Web Services. She is a passionate technophile and enjoys working with customers to solve complex problems using innovation. Her core area of focus is Artificial Intelligence and Machine Learning for Intelligent Document Processing.

Read More

How AI is helping African communities and businesses

Editor’s note: Last week Google hosted the annual Google For Africa event as part of our commitment to make the internet more useful in Africa, and to support the communities and businesses that will power Africa’s economic growth. This commitment includes our investment in research. Since announcing the Google AI Research Center in Accra, Ghana, in 2018, we have made great strides in our mission to use AI for societal impact. In May we made several exciting announcements aimed at expanding these commitments.

Yossi Matias, VP of Engineering and Research, who oversees research in Africa, spoke with Jeff Dean, SVP of Google Research, who championed the opening of the AI Research Center, about the potential of AI in Africa.

Jeff: It’s remarkable how far we’ve come since we opened the center in Accra. I was excited then about the talented pool of researchers in Africa. I believed that by bringing together leading researchers and engineers, and collaborating with universities and the wider research community, we could push the boundaries of AI to solve critical challenges on the continent. It’s great to see progress on many fronts, from healthcare and education to agriculture and the climate crisis.

As part of Google For Africa last week, I spoke with Googlers across the continent about recent research and met several who studied at African universities we partner with. Yossi, from your perspective, how does our Research Center in Accra support the wider research ecosystem and benefit from it?

Yossi: I believe that nurturing local talent and working together with the community are critical to our mission. We’ve signed research agreements with five universities in Africa to conduct joint research, and I was fortunate to participate in the inauguration of the African Master of Machine Intelligence (AMMI) program, of which Google is a founding partner. Many AMMI graduates have continued their studies or taken positions in industry, including at our Accra Research Center where we offer an AI residency program. We’ve had three cohorts of AI residents to date.

Our researchers in Africa, and the partners and organizations we collaborate with, understand the local challenges best and can build and implement solutions that are helpful for their communities.

Jeff: For me, the Open Buildings initiative to map Africa’s built environment is a great example of that kind of collaborative solution. Can you share more about this?

Yossi: Absolutely. The Accra team used satellite imagery and machine learning to detect more than half a billion distinct structures and made the dataset available for public use. UN organizations, governments, non-profits, and startups have used the data for various applications, such as understanding energy needs for urban planning and managing the humanitarian response after a crisis. I’m very proud that we are now scaling this technology to countries outside of Africa as well.

Jeff: That’s a great achievement. It’s important to remember that the solutions we build in Africa can be scalable and useful globally. Africa has the world’s youngest population, so it’s essential that we continue to nurture the next generation of tech talent.

We must also keep working to make information accessible for this growing, diverse population. I’m proud of our efforts to use machine translation breakthroughs to bring more African languages online. Several languages were added to Google Translate this year, including Bambara, Luganda, Oromo and Sepedi, which are spoken by a combined 85 million people. My mom spoke fluent Lugbara from our time living in Uganda when I was five—Lugbara didn’t make the set of languages added in this round, but we’re working on it!

Yossi: That’s just the start. Conversational technologies also have exciting educational applications that could help students and businesses. We recently collaborated with job seekers to build the Interview Warmup Tool, featured at the Google For Africa event, which uses machine learning and large language models to help job seekers prepare for interviews.

Jeff: Yossi, what’s something that your team is focused on now that you believe will have a profound impact on African society going forward?

Yossi: Climate and sustainability is a big focus and technology has a significant role to play. For example, our AI prediction models can accurately forecast floods, one of the deadliest natural disasters. We’re collaborating with several countries and organizations across the continent to scale this technology so that we can alert people in harm’s way.

We’re also working with local partners and startups on sustainability projects including reducing carbon emissions at traffic lights and improving food security by detecting locust outbreaks, which threaten the food supply and livelihoods of millions of people. I look forward to seeing many initiatives scale as more communities and countries get on board.

Jeff: I’m always inspired by the sense of opportunity in Africa. I’d like to thank our teams and partners for their innovation and collaboration. Of course, there’s much more to do, and together we can continue to make a difference.

Read More