Announcing the launch of the model copy feature for Amazon Comprehend custom models

Technology trends and advancements in digital media in the past decade or so have resulted in the proliferation of text-based data. The potential benefits of mining this text to derive insights, both tactical and strategic, are enormous. The techniques used to do this are called natural language processing (NLP). You can use NLP, for example, to analyze your product reviews for customer sentiments, train a custom entity recognizer model to identify product types of interest based on customer comments, or train a custom text classification model to determine the most popular product categories.

Amazon Comprehend is an NLP service with ready-made intelligence to extract insights about the content of documents. It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document. Amazon Comprehend Custom uses automated machine learning (AutoML) to build NLP models on your behalf using your own data. This enables you to detect entities unique to your business or classify text or documents according to your requirements. Additionally, you can automate your entire NLP workflow with easy-to-use APIs.

Today we’re happy to announce the launch of the Amazon Comprehend custom model copy feature, which allows you to automatically copy your Amazon Comprehend custom models from a source account to designated target accounts in the same Region without requiring access to the datasets that the model was trained and evaluated on. Starting today, you can use the AWS Management Console, AWS Command Line Interface (AWS CLI), or the boto3 APIs (Python SDK for AWS) to copy trained custom models from a source account to a designated target account. This new feature is available for both Amazon Comprehend custom classification and custom entity recognition models.

Benefits of the model copy feature

This new feature has the following benefits:

Multi-account MLOps strategy – Train a model one time and ensure predictable deployment in multiple environments in different accounts.
Faster deployment – You can quickly copy a trained model between accounts, avoiding the time taken to retrain in every account.
Protect sensitive datasets – Now you no longer need to share the datasets between different accounts or users. The training data needs to be available only on the account where the training is done. This is very important for certain industries like financial services, where data isolation and sandboxing are essential to meet regulatory requirements.
Easy collaboration – Partners or vendors can now easily train in Amazon Comprehend Custom and share the models with their customers.

How model copy works

With the new model copy feature, you can copy custom models between AWS accounts in the same Region in a two-stage process. First, a user in one AWS account (account A) shares a custom model that's in their account. Then, a user in another AWS account (account B) imports the model into their account.

Share a model

To share a custom model in account A, the user attaches an AWS Identity and Access Management (IAM) resource-based policy to a model version. This policy authorizes an entity in account B, such as an IAM user or role, to import the model version into Amazon Comprehend in their AWS account. You can configure a resource-based policy either through the console or with the Amazon Comprehend PutResourcePolicy API.
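
As a rough illustration of the policy document described above, the following Python sketch assembles it with the standard library, mirroring the policy used later in this walkthrough. The helper name build_model_copy_policy and the account IDs in the usage comments are hypothetical placeholders. The boto3 call appears only as a comment, so the block runs without AWS credentials.

```python
import json
from typing import List


def build_model_copy_policy(model_version_arn: str, principal_arns: List[str]) -> str:
    """Return a resource-based policy (as a JSON string) that lets the given
    principals import one specific model version."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ModelCopy",
                "Effect": "Allow",
                "Action": ["comprehend:ImportModel"],
                # Scope the statement to a single model version.
                "Resource": model_version_arn,
                "Principal": {
                    "AWS": principal_arns,
                    "Service": ["comprehend.amazonaws.com"],
                },
            }
        ],
    }
    return json.dumps(policy)


# Possible usage with boto3 (assumes credentials for the source account;
# the account IDs below are placeholders, not real accounts):
#
#   import boto3
#   comprehend = boto3.client("comprehend", region_name="us-west-2")
#   model_arn = ("arn:aws:comprehend:us-west-2:111122223333:"
#                "entity-recognizer/AWS-Offerings-Entity-Recognizer-Dev/version/1")
#   comprehend.put_resource_policy(
#       ResourceArn=model_arn,
#       ResourcePolicy=build_model_copy_policy(
#           model_arn, ["arn:aws:iam::444455556666:root"]),
#   )
```

Building the document as a dictionary and serializing it with json.dumps avoids the nested-quote problems that come with hand-writing JSON inside a shell command.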

Import a model

To import the model to account B, the user of this account provides Amazon Comprehend with the necessary details, such as the Amazon Resource Name (ARN) of the model. When they import the model, this user creates a new custom model in their AWS account that replicates the model that they imported. This model is fully trained and ready for inference jobs, such as document classification or named entity recognition. If the model is encrypted with an AWS Key Management Service (AWS KMS) key in the source, then the service role specified while importing the model needs to have access to the KMS key in order to decrypt the model during import. The target account can also specify a KMS key to encrypt the model during import. The importing of the shared model is also available both on the console and as an API.

Solution overview

To demonstrate the functionality of the model copy feature, we show you how to train, share, and import an Amazon Comprehend custom entity recognition model using both the Amazon Comprehend console and the AWS CLI. For this demonstration, we use two different accounts. The steps are applicable to Amazon Comprehend custom classification as well. The required steps are as follows:

Train an Amazon Comprehend custom entity recognition model in the source account.
Define the IAM resource policy for the trained model to allow cross-account access.
Copy the trained model from the source account to the target account.
Test the copied model through a batch job.

Train an Amazon Comprehend custom entity recognition model in the source account

The first step is to train an Amazon Comprehend custom entity recognition model in the source account. As an input dataset for the training, we use a CSV entity list and training documents for recognizing AWS service offerings in a given document. Make sure that the entity list and training documents are in an Amazon Simple Storage Service (Amazon S3) bucket in the source account. For instructions, see Adding Documents to Amazon S3.

Create an IAM role for Amazon Comprehend and provide required access to the S3 bucket with the training data. Note the role ARN and S3 bucket paths to use in later steps.

Train a model with the AWS CLI

Create an entity recognizer using the following AWS CLI command. Substitute your parameters for the S3 paths, IAM role, and Region. The response returns the EntityRecognizerArn.

aws comprehend create-entity-recognizer \
     --language-code en \
     --version-name "1" \
     --recognizer-name "AWS-Offerings-Entity-Recognizer-Dev" \
     --data-access-role-arn "arn:aws:iam::<aws-account-nr>:role/service-role/AmazonComprehendServiceRole-ModelCopyTraining" \
     --input-data-config "DataFormat=COMPREHEND_CSV,EntityTypes=[{Type=AWS_OFFERING}],Documents={S3Uri=s3://import-model-blog/aws-offerings-docs.txt},EntityList={S3Uri=s3://import-model-blog/aws-offerings.csv}" \
     --region "us-west-2"

You can monitor the status of the training job by calling the describe-entity-recognizer command and checking the Status field in the response.

aws comprehend describe-entity-recognizer \
--entity-recognizer-arn "arn:aws:comprehend:<region>:<aws-account-nr>:entity-recognizer/AWS-Offerings-Entity-Recognizer-Dev/version/1" \
--region us-west-2
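
If you prefer to wait for training programmatically instead of re-running the command, a polling loop along these lines may help. This is a minimal Python sketch: the function name wait_for_training, the polling interval, and the retry limit are assumptions for illustration. The status lookup is passed in as a callable, so the loop can be exercised without calling AWS.

```python
import time
from typing import Callable

# Statuses after which polling stops. TRAINED and TRAINED_WITH_WARNING
# indicate a usable model; IN_ERROR and STOPPED indicate that training ended
# without one.
TERMINAL_STATUSES = {"TRAINED", "TRAINED_WITH_WARNING", "IN_ERROR", "STOPPED"}


def wait_for_training(get_status: Callable[[], str],
                      poll_seconds: float = 60.0,
                      max_polls: int = 240) -> str:
    """Call get_status() until it returns a terminal status, then return it."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("training did not reach a terminal status in time")


# Possible usage with boto3 (assumes credentials for the source account and
# that recognizer_arn holds the EntityRecognizerArn returned at creation):
#
#   import boto3
#   comprehend = boto3.client("comprehend", region_name="us-west-2")
#   final_status = wait_for_training(
#       lambda: comprehend.describe_entity_recognizer(
#           EntityRecognizerArn=recognizer_arn
#       )["EntityRecognizerProperties"]["Status"]
#   )
```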

Train a model via the console

To train a model via the console, complete the following steps:

On the Amazon Comprehend console, under Customization, create a new custom entity recognizer model.
Provide a model name and version.
For Language, choose English.
For Custom entity type, add AWS_OFFERING.

To train a custom entity recognition model, you can choose one of two ways to provide data to Amazon Comprehend: annotations or entity lists. For simplicity, use the entity list method.

For Data format, select CSV file.
For Training type, select Using entity list and training docs.
Provide the S3 location paths for the entity list CSV and training data.
To grant permissions to Amazon Comprehend to access your S3 bucket, create an IAM service role.

In the Resource-based policy section, you can authorize access for the model version. The accounts you grant access to can import this model into their account. We skip this step for now and add the policy after the model is trained and we’re satisfied with the model performance.

Choose Create.

This submits your custom entity recognizer for training. Amazon Comprehend tries a number of models, tunes the hyperparameters, and performs cross-validation to make sure that your model is robust. These are the same activities that data scientists perform.

Define the IAM resource policy for the trained model to allow cross-account access

When we’re satisfied with the training performance, we can go ahead and share the specific model version by adding a resource policy.

Add a resource-based policy from the AWS CLI

Authorize importing the model from the target account by adding a resource policy on the model, as shown in the following code. The policy can be tightly scoped to a particular model version and target principal. Substitute the ARN of your trained entity recognizer and the target account that you want to grant access to.

aws comprehend put-resource-policy \
--resource-arn "arn:aws:comprehend:<region>:<aws-account-nr>:entity-recognizer/AWS-Offerings-Entity-Recognizer-Dev/version/1" \
--resource-policy '{"Version":"2012-10-17",
    "Statement":[{"Sid":"ModelCopy",
    "Effect":"Allow","Action":["comprehend:ImportModel"],
    "Resource":"arn:aws:comprehend:<region>:<aws-account-nr>:entity-recognizer/AWS-Offerings-Entity-Recognizer-Dev/version/1",
    "Principal":{"AWS":["arn:aws:iam::<aws-account-nr>:<user>"], "Service": [ "comprehend.amazonaws.com"]}}]}'

Add a resource-based policy via the console

When the training is complete, a custom entity recognition model version is generated. We can choose the trained model and version to view the training details, including performance of the trained model.

To update the policy, complete the following steps:

On the Tags, VPC & Policy tab, edit the resource-based policy.
Provide the policy name, Amazon Comprehend service principal (comprehend.amazonaws.com), target account ID, and IAM users in the target account authorized to import the model version.

We specify root as the IAM entity to authorize all users in the target account.

Make a note of the model resource ARN, which we use later during the import process.

Copy the trained model from the source account to the target account

Now the model is trained and shared from the source account. The authorized target account user can import the model and create a copy of the model in their own account.

To import a model, you need to specify the source model ARN and a service role that Amazon Comprehend uses to perform the copy action in your account. You can optionally specify an AWS KMS key ID to encrypt the model in your target account.
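
Because the copy works only within one Region, it can be useful to check the Region segment of the source model ARN before starting an import. The following Python sketch assumes that ARNs follow the arn:aws:comprehend:<region>:<account>:entity-recognizer/<name>/version/<version> shape used in this post. The names ModelArn, parse_model_arn, and can_copy_to are hypothetical helpers, not part of any AWS SDK.

```python
from typing import NamedTuple


class ModelArn(NamedTuple):
    region: str
    account_id: str
    resource: str


def parse_model_arn(arn: str) -> ModelArn:
    """Split an Amazon Comprehend ARN into Region, account ID, and resource."""
    # Expected fields: arn, partition, service, region, account, resource.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "comprehend":
        raise ValueError(f"not an Amazon Comprehend ARN: {arn!r}")
    return ModelArn(region=parts[3], account_id=parts[4], resource=parts[5])


def can_copy_to(source_model_arn: str, target_region: str) -> bool:
    """Return True when the source model lives in the target Region."""
    return parse_model_arn(source_model_arn).region == target_region
```

For example, can_copy_to(source_arn, "us-west-2") returns False for a model trained in another Region, which signals that the import would not be possible.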

Import the model with the AWS CLI

To import your model with the AWS CLI, enter the following code:

aws comprehend import-model \
--source-model-arn "arn:aws:comprehend:<region>:<aws-account-nr>:entity-recognizer/AWS-Offerings-Entity-Recognizer-Dev/version/1" \
--model-name "AWS-Offerings-Entity-Recognizer-Prd" \
--version-name "1" \
--data-access-role-arn "<Optional Comprehend Service Role ARN>"
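
The same import can be scripted with boto3. The following Python sketch only assembles the request parameters, so it runs without AWS access. The helper name build_import_request is hypothetical, and the commented usage assumes credentials for the target account.

```python
from typing import Dict, Optional


def build_import_request(source_model_arn: str,
                         model_name: str,
                         version_name: str,
                         data_access_role_arn: Optional[str] = None,
                         model_kms_key_id: Optional[str] = None) -> Dict[str, str]:
    """Assemble keyword arguments for an import call, omitting unset options."""
    request = {
        "SourceModelArn": source_model_arn,
        "ModelName": model_name,
        "VersionName": version_name,
    }
    if data_access_role_arn:
        # Service role that Amazon Comprehend uses for the copy, for example
        # to decrypt a source model that is encrypted with a KMS key.
        request["DataAccessRoleArn"] = data_access_role_arn
    if model_kms_key_id:
        # Optional key used to encrypt the copy in the target account.
        request["ModelKmsKeyId"] = model_kms_key_id
    return request


# Possible usage with boto3 (run in the target account):
#
#   import boto3
#   comprehend = boto3.client("comprehend", region_name="us-west-2")
#   response = comprehend.import_model(**build_import_request(
#       source_model_arn, "AWS-Offerings-Entity-Recognizer-Prd", "1"))
#   imported_model_arn = response["ModelArn"]
```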

Import the model via the console

To import the model via the console, complete the following steps:

On the Amazon Comprehend console, under Custom entity recognition, choose Import version.
For Model version ARN, enter the ARN for the model trained in the source account.
Enter a model name and version for the target.
Provide a service role and choose Confirm to start the model import process.

After the model status changes to Imported, we can view the model details, including the performance details of the trained model.

Test the copied model through a batch job

We test the copied model in the target account by detecting custom entities with a batch job. To test the model, download the test file and place it in an S3 bucket in your target account. Create an IAM role for Amazon Comprehend and provide the required access to the S3 bucket with the test data. Note the role ARN and S3 bucket paths; you use them when you create the job.

When the job is complete, you can verify the inference data in the specified output S3 bucket.

Test the model with the AWS CLI

To test the model using the AWS CLI, enter the following code:

aws comprehend start-entities-detection-job \
--entity-recognizer-arn "arn:aws:comprehend:<region>:<aws-account-nr>:entity-recognizer/AWS-Offerings-Entity-Recognizer-Prd/version/1" \
--job-name "Test-Entity-Detection-Job" \
--data-access-role-arn "arn:aws:iam::<aws-account-nr>:role/service-role/ComprehendDataAccessRole" \
--language-code "en" \
--input-data-config "S3Uri=s3://import-model-blog/aws-offerings-test.txt" \
--output-data-config "S3Uri=s3://import-model-blog/output/" \
--region "us-west-2"

Test the model via the console

To test the model via the console, complete the following steps:

On the Amazon Comprehend console, choose Analysis jobs and choose Create job.
For Name, enter a name for the job.
For Analysis type, choose Custom entity recognition.
Choose the model name and version of the imported model.
Provide the S3 paths for the test file for the job and the output location where Amazon Comprehend stores the result.
Choose or create an IAM role with permission to access the S3 buckets.
Choose Create job.

When your analysis job is complete, you have JSON files in your output S3 bucket path, which you can download to verify the results of the entity recognition from the imported model.
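
To get a quick summary of the downloaded results, you could tally the detected entity types. The following Python sketch assumes that each line of the extracted results file is a JSON object with an Entities list whose items carry a Type field. Check this against your actual output before relying on it. The function name count_entity_types is hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def count_entity_types(lines: Iterable[str]) -> Counter:
    """Count detected entities per type across JSON Lines results."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for entity in record.get("Entities", []):
            counts[entity["Type"]] += 1
    return counts


# Possible usage, after downloading and extracting the job output
# (the local path is a placeholder):
#
#   with open("path/to/extracted/results") as results_file:
#       print(count_entity_types(results_file))
```

If the imported model works as expected, the AWS_OFFERING type from this walkthrough should appear in the tally.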

Conclusion

In this post, we demonstrated the Amazon Comprehend custom model copy feature. This feature gives you the ability to train an Amazon Comprehend custom entity recognition or classification model in one account and then share the model with another account in the same Region. This simplifies a multi-account strategy: you train the model one time and share it between accounts within the same Region without having to retrain or share the training datasets. This allows for a predictable deployment in every account as part of your MLOps workflow. For more information, see our documentation on Comprehend custom copy, or try out the walkthrough in this post either via the console or using a cloud shell with the AWS CLI.

As of this writing, the model copy feature in Amazon Comprehend is available in the following Regions:

US East (Ohio)
US East (N. Virginia)
US West (Oregon)
Asia Pacific (Mumbai)
Asia Pacific (Seoul)
Asia Pacific (Singapore)
Asia Pacific (Sydney)
Asia Pacific (Tokyo)
EU (Frankfurt)
EU (Ireland)
EU (London)
AWS GovCloud (US-West)

Give the feature a try, and please send us feedback either via the AWS forum for Amazon Comprehend or through your usual AWS support contacts.

About the Authors

Premkumar Rangarajan is an AI/ML specialist solutions architect at Amazon Web Services and has previously authored the book Natural Language Processing with AWS AI services. He has 26 years of experience in the IT industry in a variety of roles, including delivery lead, integration specialist, and enterprise architect. He helps enterprises of all sizes adopt AI and ML to solve for their real-world challenges.

Chethan Krishna is a Senior Partner Solutions Architect in India. He works with Strategic AWS Partners for establishing a robust cloud competency, adopting AWS best practices and solving customer challenges. He is a builder and enjoys experimenting with AI/ML, IoT, and analytics.

Sriharsha M S is an AI/ML specialist solutions architect in the Strategic Specialist team at Amazon Web Services. He works with strategic AWS customers who are taking advantage of AI/ML to solve complex business problems. He provides technical guidance and design advice to implement AI/ML applications at scale. His expertise spans application architecture, big data, analytics, and machine learning.
