Achieve DevOps maturity with BMC AMI zAdviser Enterprise and Amazon Bedrock

Achieve DevOps maturity with BMC AMI zAdviser Enterprise and Amazon Bedrock

In software engineering, there is a direct correlation between team performance and building robust, stable applications. The data community aims to adopt the rigorous engineering principles commonly used in software development into their own practices, which includes systematic approaches to design, development, testing, and maintenance. This requires carefully combining applications and metrics to provide complete awareness, accuracy, and control. It means evaluating all aspects of a team’s performance, with a focus on continuous improvement, and it applies just as much to mainframe as it does to distributed and cloud environments—maybe more.

This is achieved through practices like infrastructure as code (IaC) for deployments, automated testing, application observability, and complete application lifecycle ownership. Through years of research, the DevOps Research and Assessment (DORA) team has identified four key metrics that indicate the performance of a software development team:

  • Deployment frequency – How often an organization successfully releases to production
  • Lead time for changes – The amount of time it takes a commit to get into production
  • Change failure rate – The percentage of deployments causing a failure in production
  • Time to restore service – How long it takes an organization to recover from a failure in production

These metrics provide a quantitative way to measure the effectiveness and efficiency of DevOps practices. Although much of the focus around analysis of DevOps is on distributed and cloud technologies, the mainframe still maintains a unique and powerful position, and it can use the DORA 4 metrics to further its reputation as the engine of commerce.

This blog post discusses how BMC Software added AWS Generative AI capabilities to its product BMC AMI zAdviser Enterprise. The zAdviser uses Amazon Bedrock to provide summarization, analysis, and recommendations for improvement based on the DORA metrics data.

Challenges of tracking DORA 4 metrics

Tracking DORA 4 metrics means putting the numbers together and placing them on a dashboard. However, measuring productivity is essentially measuring the performance of individuals, which can make them feel scrutinized. This situation might necessitate a shift in organizational culture to focus on collective achievements and emphasize that automation tools enhance the developer experience.

It’s also vital to avoid focusing on irrelevant metrics or excessively tracking data. The essence of DORA metrics is to distill information into a core set of key performance indicators (KPIs) for evaluation. Mean time to restore (MTTR) is often the simplest KPI to track—most organizations use tools like BMC Helix ITSM or others that record events and issue tracking.

Capturing lead time for changes and change failure rate can be more challenging, especially on mainframes. Lead time for changes and change failure rate KPIs aggregate data from code commits, log files, and automated test results. Using a Git-based SCM pulls these insight together seamlessly. Mainframe teams using BMC’s Git-based DevOps platform, AMI DevX ,can collect this data as easily as distributed teams can.

Solution overview

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon via a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.

BMC AMI zAdviser Enterprise provides a wide range of DevOps KPIs to optimize mainframe development and enable teams to proactvely identify and resolve issues. Using machine learning, AMI zAdviser monitors mainframe build, test and deploy functions across DevOps tool chains and then offers AI-led recommendations for continuous improvement. In addition to capturing and reporting on development KPIs, zAdviser captures data on how the BMC DevX products are adopted and used. This includes the number of programs that were debugged, the outcome of testing efforts using the DevX testing tools, and many other data points. These additional data points can provide deeper insight into the development KPIs, including the DORA metrics, and may be used in future generative AI efforts with Amazon Bedrock.

The following architecture diagram shows the final implementation of zAdviser Enterprise utilizing generative AI to provide summarization, analysis, and recommendations for improvement based on the DORA metrics KPI data.

Architecture Diagram

The solution workflow includes the following steps:

  1. Create the aggregation query to retrieve the metrics from Elasticsearch.
  2. Extract the stored mainframe metrics data from zAdviser, which is hosted in Amazon Elastic Compute Cloud (Amazon EC2) and deployed in AWS.
  3. Aggregate the data retrieved from Elasticsearch and form the prompt for the generative AI Amazon Bedrock API call.
  4. Pass the generative AI prompt to Amazon Bedrock (using Anthropic’s Claude2 model on Amazon Bedrock).
  5. Store the response from Amazon Bedrock (an HTML-formatted document) in Amazon Simple Storage Service (Amazon S3).
  6. Trigger the KPI email process via AWS Lambda:
    1. The HTML-formatted email is extracted from Amazon S3 and added to the body of the email.
    2. The PDF for customer KPIs is extracted from zAdviser and attached to the email.
    3. The email is sent to subscribers.

The following screenshot shows the LLM summarization of DORA metrics generated using Amazon Bedrock and sent as an email to the customer, with a PDF attachment that contains the DORA metrics KPI dashboard report by zAdviser.

Result Summarization

Key takeaways

In this solution, you don’t need to worry about your data being exposed on the internet when sent to an AI client. The API call to Amazon Bedrock doesn’t contain any personally identifiable information (PII) or any data that could identify a customer. The only data transmitted consists of numerical values in the form of the DORA metric KPIs and instructions for the generative AI’s operations. Importantly, the generative AI client does not retain, learn from, or cache this data.

The zAdviser engineering team was successful in rapidly implementing this feature within a short time span. The rapid progress was facilitated by zAdviser’s substantial investment in AWS services and, importantly, the ease of using Amazon Bedrock via API calls. This underscores the transformative power of generative AI technology embodied in the Amazon Bedrock API. This API, equipped with the industry-specific knowledge repository zAdviser Enterprise and customized with continuously collected organization-specific DevOps metrics, demonstrates the potential of AI in this field.

Generative AI has the potential to lower the barrier to entry to build AI-driven organizations. Large language models (LLMs) in particular can bring tremendous value to enterprises seeking to explore and use unstructured data. Beyond chatbots, LLMs can be used in a variety of tasks, such as classification, editing, and summarization.

Conclusion

This post discussed the transformational impact of generative AI technology in the form of Amazon Bedrock APIs equipped with the industry-specific knowledge that BMC zAdviser possesses, tailored with organization-specific DevOps metrics collected on an ongoing basis.

Check out the BMC website to learn more and set up a demo.


About the Authors

Sunil BemarkarSunil Bemarkar is a Sr. Partner Solutions Architect at Amazon Web Services. He works with various Independent Software Vendors (ISVs) and Strategic customers across industries to accelerate their digital transformation journey and cloud adoption.

Vij BalakrishnaVij Balakrishna is a Senior Partner Development manager at Amazon Web Services. She helps independent software vendors (ISVs) across industries to accelerate their digital transformation journey.

Spencer Hallman is the Lead Product Manager for the BMC AMI zAdviser Enterprise. Previously, he was the Product Manager for BMC AMI Strobe and BMC AMI Ops Automation for Batch Thruput. Prior to Product Management, Spencer was the Subject Matter Expert for Mainframe Performance. His diverse experience over the years has also included programming on multiple platforms and languages as well as working in the Operations Research field. He has a Master of Business Administration with a concentration in Operations Research from Temple University and a Bachelor of Science in Computer Science from the University of Vermont. He lives in Devon, PA and when he’s not attending virtual meetings, enjoys walking his dogs, riding his bike and spending time with his family.

Read More

Fine-tune your Amazon Titan Image Generator G1 model using Amazon Bedrock model customization

Fine-tune your Amazon Titan Image Generator G1 model using Amazon Bedrock model customization

Amazon Titan lmage Generator G1 is a cutting-edge text-to-image model, available via Amazon Bedrock, that is able to understand prompts describing multiple objects in various contexts and captures these relevant details in the images it generates. It is available in US East (N. Virginia) and US West (Oregon) AWS Regions and can perform advanced image editing tasks such as smart cropping, in-painting, and background changes. However, users would like to adapt the model to unique characteristics in custom datasets that the model is not already trained on. Custom datasets can include highly proprietary data that is consistent with your brand guidelines or specific styles such as a previous campaign. To address these use cases and generate fully personalized images, you can fine-tune Amazon Titan Image Generator with your own data using custom models for Amazon Bedrock.

From generating images to editing them, text-to-image models have broad applications across industries. They can enhance employee creativity and provide the ability to imagine new possibilities simply with textual descriptions. For example, it can aid design and floor planning for architects and allow faster innovation by providing the ability to visualize various designs without the manual process of creating them. Similarly, it can aid in design across various industries such as manufacturing, fashion design in retail, and game design by streamlining the generation of graphics and illustrations. Text-to-image models also enhance your customer experience by allowing for personalized advertising as well as interactive and immersive visual chatbots in media and entertainment use cases.

In this post, we guide you through the process of fine-tuning the Amazon Titan Image Generator model to learn two new categories: Ron the dog and Smila the cat, our favorite pets. We discuss how to prepare your data for the model fine-tuning task and how to create a model customization job in Amazon Bedrock. Finally, we show you how to test and deploy your fine-tuned model with Provisioned Throughput.

Ron the dog Smila the cat

Evaluating model capabilities before fine-tuning a job

Foundation models are trained on large amounts of data, so it is possible that your model will work well enough out of the box. That’s why it’s good practice to check if you actually need to fine-tune your model for your use case or if prompt engineering is sufficient. Let’s try to generate some images of Ron the dog and Smila the cat with the base Amazon Titan Image Generator model, as shown in the following screenshots.

As expected, the out-of-the-box model does not know Ron and Smila yet, and the generated outputs show different dogs and cats. With some prompt engineering, we can provide more details to get closer to the look of our favorite pets.

Although the generated images are more similar to Ron and Smila, we see that the model is not able to reproduce the full likeness of them. Let’s now start a fine-tuning job with the photos from Ron and Smila to get consistent, personalized outputs.

Fine-tuning Amazon Titan Image Generator

Amazon Bedrock provides you with a serverless experience for fine-tuning your Amazon Titan Image Generator model. You only need to prepare your data and select your hyperparameters, and AWS will handle the heavy lifting for you.

When you use the Amazon Titan Image Generator model to fine-tune, a copy of this model is created in the AWS model development account, owned and managed by AWS, and a model customization job is created. This job then accesses the fine-tuning data from a VPC and the amazon Titan model has its weights updated. The new model is then saved to an Amazon Simple Storage Service (Amazon S3) located in the same model development account as the pre-trained model. It can now be used for inference only by your account and is not shared with any other AWS account. When running inference, you access this model via a provisioned capacity compute or directly, using batch inference for Amazon Bedrock. Independently from the inference modality chosen, your data remains in your account and is not copied to any AWS owned account or used to improve the Amazon Titan Image Generator model.

The following diagram illustrates this workflow.

Data privacy and network security

Your data used for fine-tuning including prompts, as well as the custom models, remain private in your AWS account. They are not shared or used for model training or service improvements, and aren’t shared with third-party model providers. All the data used for fine-tuning is encrypted in transit and at rest. The data remains in the same Region where the API call is processed. You can also use AWS PrivateLink to create a private connection between the AWS account where your data resides and the VPC.

Data preparation

Before you can create a model customization job, you need to prepare your training dataset. The format of your training dataset depends on the type of customization job you are creating (fine-tuning or continued pre-training) and the modality of your data (text-to-text, text-to-image, or image-to-embedding). For the Amazon Titan Image Generator model, you need to provide the images that you want to use for the fine-tuning and a caption for each image. Amazon Bedrock expects your images to be stored on Amazon S3 and the pairs of images and captions to be provided in a JSONL format with multiple JSON lines.

Each JSON line is a sample containing an image-ref, the S3 URI for an image, and a caption that includes a textual prompt for the image. Your images must be in JPEG or PNG format. The following code shows an example of the format:

{"image-ref": "s3://bucket/path/to/image001.png", "caption": "<prompt text>"}
{"image-ref": "s3://bucket/path/to/image002.png", "caption": "<prompt text>"}
{"image-ref": "s3://bucket/path/to/image003.png", "caption": "<prompt text>"}

Because “Ron” and “Smila” are names that could also be used in other contexts, such as a person’s name, we add the identifiers “Ron the dog” and “Smila the cat” when creating the prompt to fine-tune our model. Although it’s not a requirement for the fine-tuning workflow, this additional information provides more contextual clarity for the model when it is being customized for the new classes and will avoid the confusion of ‘“Ron the dog” with a person called Ron and “Smila the cat” with the city Smila in Ukraine. Using this logic, the following images show a sample of our training dataset.

Ron the dog laying on a white dog bed Ron the dog sitting on a tile floor Ron the dog laying on a car seat
Smila the cat lying on a couch Smila the cat staring at the camera laying on a couch Smila the cat laying in a pet carrier

When transforming our data to the format expected by the customization job, we get the following sample structure:

{"image-ref": "<S3_BUCKET_URL>/ron_01.jpg", "caption": "Ron the dog laying on a white dog bed"}
{"image-ref": "<S3_BUCKET_URL>/ron_02.jpg", "caption": "Ron the dog sitting on a tile floor"}
{"image-ref": "<S3_BUCKET_URL>/ron_03.jpg", "caption": "Ron the dog laying on a car seat"}
{"image-ref": "<S3_BUCKET_URL>/smila_01.jpg", "caption": "Smila the cat lying on a couch"}
{"image-ref": "<S3_BUCKET_URL>/smila_02.jpg", "caption": "Smila the cat sitting next to the window next to a statue cat"}
{"image-ref": "<S3_BUCKET_URL>/smila_03.jpg", "caption": "Smila the cat lying on a pet carrier"}

After we have created our JSONL file, we need to store it on an S3 bucket to start our customization job. Amazon Titan Image Generator G1 fine-tuning jobs will work with 5–10,000 images. For the example discussed in this post, we use 60 images: 30 of Ron the dog and 30 of Smila the cat. In general, providing more varieties of the style or class you are trying to learn will improve the accuracy of your fine-tuned model. However, the more images you use for fine-tuning, the more time will be required for the fine-tuning job to complete. The number of images used also influence the pricing of your fine-tuned job. Refer to Amazon Bedrock Pricing for more information.

Fine-tuning Amazon Titan Image Generator

Now that we have our training data ready, we can begin a new customization job. This process can be done both via the Amazon Bedrock console or APIs. To use the Amazon Bedrock console, complete the following steps:

  1. On the Amazon Bedrock console, choose Custom models in the navigation pane.
  2. On the Customize model menu, choose Create fine-tuning job.
  3. For Fine-tuned model name, enter a name for your new model.
  4. For Job configuration, enter a name for the training job.
  5. For Input data, enter the S3 path of the input data.
  6. In the Hyperparameters section, provide values for the following:
    1. Number of steps – The number of times the model is exposed to each batch.
    2. Batch size – The number of samples processed before updating the model parameters.
    3. Learning rate – The rate at which the model parameters are updated after each batch. The choice of these parameters depends on a given dataset. As a general guideline, we recommend you start by fixing the batch size to 8, the learning rate to 1e-5, and set the number of steps according to the number of images used, as detailed in the following table.
Number of images provided 8 32 64 1,000 10,000
Number of steps recommended 1,000 4,000 8,000 10,000 12,000

If the results of your fine-tuning job are not satisfactory, consider increasing the number of steps if you don’t observe any signs of the style in generated images, and decreasing the number of steps if you observe the style in the generated images but with artifacts or blurriness. If the fine-tuned model fails to learn the unique style in your dataset even after 40,000 steps, consider increasing the batch size or the learning rate.

  1. In the Output data section, enter the S3 output path where the validation outputs, including the periodically recorded validation loss and accuracy metrics, are stored.
  2. In the Service access section, generate a new AWS Identity and Access Management (IAM) role or choose an existing IAM role with the necessary permissions to access your S3 buckets.

This authorization enables Amazon Bedrock to retrieve input and validation datasets from your designated bucket and store validation outputs seamlessly in your S3 bucket.

  1. Choose Fine-tune model.

With the correct configurations set, Amazon Bedrock will now train your custom model.

Deploy the fine-tuned Amazon Titan Image Generator with Provisioned Throughput

After you create custom model, Provisioned Throughput allows you to allocate a predetermined, fixed rate of processing capacity to the custom model. This allocation provides a consistent level of performance and capacity for handling workloads, which results in better performance in production workloads. The second advantage of Provisioned Throughput is cost control, because standard token-based pricing with on-demand inference mode can be difficult to predict at large scales.

When the fine tuning of your model is complete, this model will appear on the Custom models’ page on the Amazon Bedrock console.

To purchase Provisioned Throughput, select the custom model that you just fine-tuned and choose Purchase Provisioned Throughput.

This prepopulates the selected model for which you want to purchase Provisioned Throughput. For testing your fine-tuned model before deployment, set model units to a value of 1 and set the commitment term to No commitment. This quickly lets you start testing your models with your custom prompts and check if the training is adequate. Moreover, when new fine-tuned models and new versions are available, you can update the Provisioned Throughput as long as you update it with other versions of the same model.

Fine-tuning results

For our task of customizing the model on Ron the dog and Smila the cat, experiments showed that the best hyperparameters were 5,000 steps with a batch size of 8 and a learning rate of 1e-5.

The following are some examples of the images generated by the customized model.

Ron the dog wearing a superhero cape Ron the dog on the moon Ron the dog in a swimming pool with sunglasses
Smila the cat on the snow Smila the cat in black and white staring at the camera Smila the cat wearing a Christmas hat

Conclusion

In this post, we discussed when to use fine-tuning instead of engineering your prompts for better-quality image generation. We showed how to fine-tune the Amazon Titan Image Generator model and deploy the custom model on Amazon Bedrock. We also provided general guidelines on how to prepare your data for fine-tuning and set optimal hyperparameters for more accurate model customization.

As a next step, you can adapt the following example to your use case to generate hyper-personalized images using Amazon Titan Image Generator.


About the Authors

Maira Ladeira Tanke is a Senior Generative AI Data Scientist at AWS. With a background in machine learning, she has over 10 years of experience architecting and building AI applications with customers across industries. As a technical lead, she helps customers accelerate their achievement of business value through generative AI solutions on Amazon Bedrock. In her free time, Maira enjoys traveling, playing with her cat Smila, and spending time with her family someplace warm.

Dani Mitchell is an AI/ML Specialist Solutions Architect at Amazon Web Services. He is focused on computer vision use cases and helping customers across EMEA accelerate their ML journey.

Bharathi Srinivasan is a Data Scientist at AWS Professional Services, where she loves to build cool things on Amazon Bedrock. She is passionate about driving business value from machine learning applications, with a focus on responsible AI. Outside of building new AI experiences for customers, Bharathi loves to write science fiction and challenge herself with endurance sports.

Achin Jain is an Applied Scientist with the Amazon Artificial General Intelligence (AGI) team. He has expertise in text-to-image models and is focused on building the Amazon Titan Image Generator.

Read More

Build a receipt and invoice processing pipeline with Amazon Textract

Build a receipt and invoice processing pipeline with Amazon Textract

In today’s business landscape, organizations are constantly seeking ways to optimize their financial processes, enhance efficiency, and drive cost savings. One area that holds significant potential for improvement is accounts payable. On a high level, the accounts payable process includes receiving and scanning invoices, extraction of the relevant data from scanned invoices, validation, approval, and archival. The second step (extraction) can be complex. Each invoice and receipt look different. The labels are imperfect and inconsistent. The most important pieces of information such as price, vendor name, vendor address, and payment terms are often not explicitly labeled and have to be interpreted based on context. The traditional approach of using human reviewers to extract the data is time-consuming, error-prone, and not scalable.

In this post, we show how to automate the accounts payable process using Amazon Textract for data extraction. We also provide a reference architecture to build an invoice automation pipeline that enables extraction, verification, archival, and intelligent search.

Solution overview

The following architecture diagram shows the stages of a receipt and invoice processing workflow. It starts with a document capture stage to securely collect and store scanned invoices and receipts. The next stage is the extraction phase, where you pass the collected invoices and receipts to the Amazon Textract AnalyzeExpense API to extract financially related relationships between text such as vendor name, invoice receipt date, order date, amount due, amount paid, and so on. In the next stage, you use predefined expense rules to determine if you should automatically approve or reject the receipt. Approved and rejected documents go to their respective folders within the Amazon Simple Storage Service (Amazon S3) bucket. For approved documents, you can search all the extracted fields and values using Amazon OpenSearch Service. You can visualize the indexed metadata using OpenSearch Dashboards. Approved documents are also set up to be moved to Amazon S3 Intelligent-Tiering for long-term retention and archival using S3 lifecycle policies.

Solution Architecture

The following sections take you through the process of creating the solution.

Prerequisites

To deploy this solution, you must have the following:

  • An AWS account.
  • An AWS Cloud9 environment. AWS Cloud9 is a cloud-based integrated development environment (IDE) that lets you write, run, and debug your code with just a browser. It includes a code editor, debugger, and terminal.

To create the AWS Cloud9 environment, provide a name and description. Keep everything else as default. Choose the IDE link on the AWS Cloud9 console to navigate to IDE. You’re now ready to use the AWS Cloud9 environment.

Deploy the solution

To set up the solution, you use the AWS Cloud Development Kit (AWS CDK) to deploy an AWS CloudFormation stack.

  1. In your AWS Cloud9 IDE terminal, clone the GitHub repository and install the dependencies. Run the following commands to deploy the InvoiceProcessor stack:
git clone https://github.com/aws-samples/amazon-textract-invoice-processor.git
pip install -r requirements.txt
cdk bootstrap
cdk deploy

The deployment takes around 25 minutes with the default configuration settings from the GitHub repo. Additional output information is also available on the AWS CloudFormation console.

  1. After the AWS CDK deployment is complete, create expense validation rules in an Amazon DynamoDB table. You can use the same AWS Cloud9 terminal to run the following commands:
aws dynamodb execute-statement --statement "INSERT INTO "$(aws cloudformation list-exports --query 'Exports[?Name==`InvoiceProcessorWorkflow-RulesTableName`].Value' --output text)" VALUE {'ruleId': 1, 'type': 'regex', 'field': 'INVOICE_RECEIPT_ID', 'check': '(?i)[0-9]{3}[a-z]{3}[0-9]{3}$', 'errorTxt': 'Receipt number is not valid. It is of the format: 123ABC456'}"
aws dynamodb execute-statement --statement "INSERT INTO "$(aws cloudformation list-exports --query 'Exports[?Name==`InvoiceProcessorWorkflow-RulesTableName`].Value' --output text)" VALUE {'ruleId': 2, 'type': 'regex', 'field': 'PO_NUMBER', 'check': '(?i)[a-z0-9]+$', 'errorTxt': 'PO number is not present'}"
  1. In the S3 bucket that starts with invoiceprocessorworkflow-invoiceprocessorbucketf1-*, create an uploads folder.

In Amazon Cognito, you should already have an existing user pool called OpenSearchResourcesCognitoUserPool*. We use this user pool to create a new user.

  1. On the Amazon Cognito console, navigate to the user pool OpenSearchResourcesCognitoUserPool*.
  2. Create a new Amazon Cognito user.
  3. Provide a user name and password of your choice and note them for later use.
  4. Upload the documents random_invoice1 and random_invoice2 to the S3 uploads folder to start the workflows.

Now let’s dive into each of the document processing steps.

Document Capture

Customers handle invoices and receipts in a multitude of formats from different vendors. These documents are received through channels like hard copies, scanned copies uploaded to file storage, or shared storage devices. In the document capture stage, you store all scanned copies of receipts and invoices in a highly scalable storage such as in an S3 bucket.

Upload sample invoices

Extraction

The next stage is the extraction phase, where you pass the collected invoices and receipts to the Amazon Textract AnalyzeExpense API to extract financially related relationships between text such as Vendor Name, Invoice Receipt Date, Order Date, Amount Due/Paid, etc.

AnalyzeExpense is an API dedicated to processing invoice and receipts documents. It is available both as a synchronous or asynchronous API. The synchronous API allows you to send images in bytes format, and the asynchronous API allows you to send files in JPG, PNG, TIFF, and PDF formats. The AnalyzeExpense API response consists of three distinct sections:

  • Summary fields – This section includes both normalized keys and the explicitly mentioned keys along with their values. AnalyzeExpense normalizes the keys for contact-related information such as vendor name and vendor address, tax ID-related keys such as tax payer ID, payment-related keys such as amount due and discount, and general keys such as invoice ID, delivery date, and account number. Keys that are not normalized still appear in the summary fields as key-value pairs. For a complete list of supported expense fields, refer to Analyzing Invoices and Receipts.
  • Line items – This section includes normalized line item keys such as item description, unit price, quantity, and product code.
  • OCR block – The block contains the raw text extract from the invoice page. The raw text extract can be used for postprocessing and identifying information that is not covered as part of the summary and line item fields.

This post uses the Amazon Textract IDP CDK constructs (AWS CDK components to define infrastructure for intelligent document processing (IDP) workflows), which allows you to build use case-specific, customizable IDP workflows. The constructs and samples are a collection of components to enable definition of IDP processes on AWS and published to GitHub. The main concepts used are the AWS CDK constructs, the actual AWS CDK stacks, and AWS Step Functions.

The following figure shows the Step Functions workflow.

Step function workflow

The extraction workflow includes the following steps:

  • InvoiceProcessor-Decider – An AWS Lambda function that verifies if the input document format is supported by Amazon Textract. For more details about supported formats, refer to Input Documents.
  • DocumentSplitter – A Lambda function that generates 2,500-page (max) chunks from documents and can process large multi-page documents.
  • Map State – A Lambda function that processes each chunk in parallel.
  • TextractAsync – This task calls Amazon Textract using the asynchronous API following best practices with Amazon Simple Notification Service (Amazon SNS) notifications and uses OutputConfig to store the Amazon Textract JSON output to the S3 bucket you created earlier. It consists of two Lambda functions: one to submit the document for processing and one that is triggered on the SNS notification.
  • TextractAsyncToJSON2 – Because the TextractAsync task can produce multiple paginated output files, the TextractAsyncToJSON2 process combines them into one JSON file.

We discuss the details of the next three steps in the following sections.

Verification and approval

For the verification stage, the SetMetaData Lambda function verifies whether the uploaded file is a valid expense as per the rules configured previously in DynamoDB table. For this post, you use the following sample rules:

  • Verification is successful if INVOICE_RECEIPT_ID is present and matches the regex (?i)[0-9]{3}[a-z]{3}[0-9]{3}$ and if PO_NUMBER is present and matches the regex (?i)[a-z0-9]+$
  • Verification is un-successful if either PO_NUMBER or INVOICE_RECEIPT_ID is incorrect or missing in the document.

After the files are processed, the expense verification function moves the input files to either approved or declined folders in the same S3 bucket.

S3 output

For the purposes of this solution, we use DynamoDB to store the expense validation rules. However, you can modify this solution to integrate with your own or commercial expense validation or management solutions.

Intelligent index and search

With the OpenSearchPushInvoke Lambda function, the extracted expense metadata is pushed to an OpenSearch Service index and is available for search.

The final TaskOpenSearchMapping step clears the context, which otherwise could exceed the Step Functions quota of maximum input or output size for a task, state, or workflow run.

After the OpenSearch Service index is created, you can search for keywords from the extracted text via OpenSearch Dashboards.

OpenSearch document search

Archival, audit, and analytics

To manage the lifecycle and archival of invoices and receipts, you can configure S3 lifecycle rules to transition S3 objects from Standard to Intelligent-Tiering storage classes. S3 Intelligent-Tiering monitors access patterns and automatically moves objects to the Infrequent Access tier when they haven’t been accessed for 30 consecutive days. After 90 days of no access, the objects are moved to the Archive Instant Access tier without performance impact or operational overhead.

For auditing and analytics, this solution uses OpenSearch Service for running analytics on invoice requests. OpenSearch Service enables you to effortlessly ingest, secure, search, aggregate, view, and analyze data for a number of use cases, such as log analytics, application search, enterprise search, and more.

Log in to OpenSearch Dashboards and navigate to Stack Management, Saved objects, then choose Import. Choose the invoices.ndjson file from the cloned repository and choose Import. This prepopulates indexes and builds the visualization.

OpenSearch import

Refresh the page and navigate to Home, Dashboard, and open Invoices. You can now select and apply filters and expand the time window to explore past invoices.

OpenSearch dashboard

Clean up

When you’re finished evaluating Amazon Textract for processing receipts and invoices, we recommend cleaning up any resources that you might have created. Complete the following steps:

  1. Delete all content from the S3 bucket invoiceprocessorworkflow-invoiceprocessorbucketf1-*.
  2. In AWS Cloud9, run the following commands to delete Amazon Cognito resources and CloudFormation stacks:
cognito_user_pool=$(aws cloudformation list-exports --query 'Exports[?Name==`InvoiceProcessorWorkflow-CognitoUserPoolId`].Value' --output text)
echo $cognito_user_pool
cdk destroy
aws cognito-idp delete-user-pool --user-pool-id $cognito_user_pool
  1. Delete the AWS Cloud9 environment that you created from the AWS Cloud9 console.

Conclusion

In this post, we provided an overview of how we can build an invoice automation pipeline using Amazon Textract for data extraction and create a workflow for validation, archival, and search. We provided code samples on how to use the AnalyzeExpense API for extraction of critical fields from an invoice.

To get started, sign in to the Amazon Textract console to try this feature. To learn more about Amazon Textract capabilities, refer to the Amazon Textract Developer Guide or Textract Resources. To learn more about IDP, refer to the IDP with AWS AI services Part 1 and Part 2 posts.


About the Authors

Sushant Pradhan is a Sr. Solutions Architect at Amazon Web Services, helping enterprise customers. His interests and experience include containers, serverless technology, and DevOps. In his spare time, Sushant enjoys spending time outdoors with his family.

Shibin Michaelraj is a Sr. Product Manager with the AWS Textract team. He is focused on building AI/ML-based products for AWS customers.

Suprakash Dutta is a Sr. Solutions Architect at Amazon Web Services. He focuses on digital transformation strategy, application modernization and migration, data analytics, and machine learning. He is part of the AI/ML community at AWS and designs intelligent document processing solutions.

Maran Chandrasekaran is a Senior Solutions Architect at Amazon Web Services, working with our enterprise customers. Outside of work, he loves to travel and ride his motorcycle in Texas Hill Country.

Read More

Best practices for building secure applications with Amazon Transcribe

Best practices for building secure applications with Amazon Transcribe

Amazon Transcribe is an AWS service that allows customers to convert speech to text in either batch or streaming mode. It uses machine learning–powered automatic speech recognition (ASR), automatic language identification, and post-processing technologies. Amazon Transcribe can be used for transcription of customer care calls, multiparty conference calls, and voicemail messages, as well as subtitle generation for recorded and live videos, to name just a few examples. In this blog post, you will learn how to power your applications with Amazon Transcribe capabilities in a way that meets your security requirements.

Some customers entrust Amazon Transcribe with data that is confidential and proprietary to their business. In other cases, audio content processed by Amazon Transcribe may contain sensitive data that needs to be protected to comply with local laws and regulations. Examples of such information are personally identifiable information (PII), personal health information (PHI), and payment card industry (PCI) data. In the following sections of the blog, we cover different mechanisms Amazon Transcribe has to protect customer data both in transit and at rest. We share the following seven security best practices to build applications with Amazon Transcribe that meet your security and compliance requirements:

  1. Use data protection with Amazon Transcribe
  2. Communicate over a private network path
  3. Redact sensitive data if needed
  4. Use IAM roles for applications and AWS services that require Amazon Transcribe access
  5. Use tag-based access control
  6. Use AWS monitoring tools
  7. Enable AWS Config

The following best practices are general guidelines and don’t represent a complete security solution. Because these best practices might not be appropriate or sufficient for your environment, use them as helpful considerations rather than prescriptions.

Best practice 1 – Use data protection with Amazon Transcribe

Amazon Transcribe conforms to the AWS shared responsibility model, which differentiates AWS responsibility for security of the cloud from customer responsibility for security in the cloud.

AWS is responsible for protecting the global infrastructure that runs all of the AWS Cloud. As the customer, you are responsible for maintaining control over your content that is hosted on this infrastructure. This content includes the security configuration and management tasks for the AWS services that you use. For more information about data privacy, see the Data Privacy FAQ.

Protecting data in transit

Data encryption is used to make sure that data communication between your application and Amazon Transcribe remains confidential. The use of strong cryptographic algorithms protects data while it is being transmitted.

Amazon Transcribe can operate in one of the two modes:

  • Streaming transcriptions allow media stream transcription in real time
  • Batch transcription jobs allow transcription of audio files using asynchronous jobs.

In streaming transcription mode, client applications open a bidirectional streaming connection over HTTP/2 or WebSockets. An application sends an audio stream to Amazon Transcribe, and the service responds with a stream of text in real time. Both HTTP/2 and WebSockets streaming connections are established over Transport Layer Security (TLS), which is a widely accepted cryptographic protocol. TLS provides authentication and encryption of data in transit using AWS certificates. We recommend using TLS 1.2 or later.

In batch transcription mode, an audio file first needs to be put in an Amazon Simple Storage Service (Amazon S3) bucket. Then a batch transcription job referencing the S3 URI of this file is created in Amazon Transcribe. Both Amazon Transcribe in batch mode and Amazon S3 use HTTP/1.1 over TLS to protect data in transit.

All requests to Amazon Transcribe over HTTP and WebSockets must be authenticated using AWS Signature Version 4. It is recommended to use Signature Version 4 to authenticate HTTP requests to Amazon S3 as well, although authentication with older Signature Version 2 is also possible in some AWS Regions. Applications must have valid credentials to sign API requests to AWS services.

Protecting data at rest

Amazon Transcribe in batch mode uses S3 buckets to store both the input audio file and the output transcription file. Customers use an S3 bucket to store the input audio file, and it is highly recommended to enable encryption on this bucket. Amazon Transcribe supports the following S3 encryption methods:

Both methods encrypt customer data as it is written to disks and decrypt it when you access it using one of the strongest block cyphers available: 256-bit Advanced Encryption Standard (AES-256) GCM.When using SSE-S3, encryption keys are managed and regularly rotated by the Amazon S3 service. For additional security and compliance, SSE-KMS provides customers with control over encryption keys via AWS Key Management Service (AWS KMS). AWS KMS gives additional access controls because you have to have permissions to use the appropriate KMS keys in order to encrypt and decrypt objects in S3 buckets configured with SSE-KMS. Also, SSE-KMS provides customers with an audit trail capability that keeps records of who used your KMS keys and when.

The output transcription can be stored in the same or a different customer-owned S3 bucket. In this case, the same SSE-S3 and SSE-KMS encryption options apply. Another option for Amazon Transcribe output in batch mode is using a service-managed S3 bucket. Then output data is put in a secure S3 bucket managed by Amazon Transcribe service, and you are provided with a temporary URI that can be used to download your transcript.

Amazon Transcribe uses encrypted Amazon Elastic Block Store (Amazon EBS) volumes to temporarily store customer data during media processing. The customer data is cleaned up for both complete and failure cases.

Best practice 2 – Communicate over a private network path

Many customers rely on encryption in transit to securely communicate with Amazon Transcribe over the Internet. However, for some applications, data encryption in transit may not be sufficient to meet security requirements. In some cases, data is required to not traverse public networks such as the internet. Also, there may be a requirement for the application to be deployed in a private environment not connected to the internet. To meet these requirements, use interface VPC endpoints powered by AWS PrivateLink.

The following architectural diagram demonstrates a use case where an application is deployed on Amazon EC2. The EC2 instance that is running the application does not have access to the internet and is communicating with Amazon Transcribe and Amazon S3 via interface VPC endpoints.

An EC2 instance inside a VPC is communicating with Amazon Transcribe and Amazon S3 services in the same region via interface VPC endpoints.

In some scenarios, the application that is communicating with Amazon Transcribe may be deployed in an on-premises data center. There may be additional security or compliance requirements that mandate that data exchanged with Amazon Transcribe must not transit public networks such as the internet. In this case, private connectivity via AWS Direct Connect can be used. The following diagram shows an architecture that allows an on-premises application to communicate with Amazon Transcribe without any connectivity to the internet.

A Corporate data center with an application server is connected to AWS cloud via AWS Direct Connect. The on-premises application server is communicating with Amazon Transcribe and Amazon S3 services via AWS Direct Connect and then interface VPC endpoints.

Best practice 3 – Redact sensitive data if needed

Some use cases and regulatory environments may require the removal of sensitive data from transcripts and audio files. Amazon Transcribe supports identifying and redacting personally identifiable information (PII) such as names, addresses, Social Security numbers, and so on. This capability can be used to enable customers to achieve payment card industry (PCI) compliance by redacting PII such as credit or debit card number, expiration date, and three-digit card verification code (CVV). Transcripts with redacted information will have PII replaced with placeholders in square brackets indicating what type of PII was redacted. Streaming transcriptions support the additional capability to only identify PII and label it without redaction. The types of PII redacted by Amazon Transcribe vary between batch and streaming transcriptions. Refer to Redacting PII in your batch job and Redacting or identifying PII in a real-time stream for more details.

The specialized Amazon Transcribe Call Analytics APIs have a built-in capability to redact PII in both text transcripts and audio files. This API uses specialized speech-to-text and natural language processing (NLP) models trained specifically to understand customer service and sales calls. For other use cases, you can use this solution to redact PII from audio files with Amazon Transcribe.

Additional Amazon Transcribe security best practices

Best practice 4 – Use IAM roles for applications and AWS services that require Amazon Transcribe access. When you use a role, you don’t have to distribute long-term credentials, such as passwords or access keys, to an EC2 instance or AWS service. IAM roles can supply temporary permissions that applications can use when they make requests to AWS resources.

Best Practice 5 – Use tag-based access control. You can use tags to control access within your AWS accounts. In Amazon Transcribe, tags can be added to transcription jobs, custom vocabularies, custom vocabulary filters, and custom language models.

Best Practice 6 – Use AWS monitoring tools. Monitoring is an important part of maintaining the reliability, security, availability, and performance of Amazon Transcribe and your AWS solutions. You can monitor Amazon Transcribe using AWS CloudTrail and Amazon CloudWatch.

Best Practice 7 – Enable AWS Config. AWS Config enables you to assess, audit, and evaluate the configurations of your AWS resources. Using AWS Config, you can review changes in configurations and relationships between AWS resources, investigate detailed resource configuration histories, and determine your overall compliance against the configurations specified in your internal guidelines. This can help you simplify compliance auditing, security analysis, change management, and operational troubleshooting.

Compliance validation for Amazon Transcribe

Applications that you build on AWS may be subject to compliance programs, such as SOC, PCI, FedRAMP, and HIPAA. AWS uses third-party auditors to evaluate its services for compliance with various programs. AWS Artifact allows you to download third-party audit reports.

To find out if an AWS service is within the scope of specific compliance programs, refer to AWS Services in Scope by Compliance Program. For additional information and resources that AWS provides to help customers with compliance, refer to Compliance validation for Amazon Transcribe and AWS compliance resources.

Conclusion

In this post, you have learned about various security mechanisms, best practices, and architectural patterns available for you to build secure applications with Amazon Transcribe. You can protect your sensitive data both in transit and at rest with strong encryption. PII redaction can be used to enable removal of personal information from your transcripts if you do not want to process and store it. VPC endpoints and Direct Connect allow you to establish private connectivity between your application and the Amazon Transcribe service. We also provided references that will help you validate compliance of your application using Amazon Transcribe with programs such as SOC, PCI, FedRAMP, and HIPAA.

As next steps, check out Getting started with Amazon Transcribe to quickly start using the service. Refer to Amazon Transcribe documentation to dive deeper into the service details. And follow Amazon Transcribe on the AWS Machine Learning Blog to keep up to date with new capabilities and use cases for Amazon Transcribe.


About the Author

Portrait image of Alex Bulatkin, a Solutions Architect at AWS

Alex Bulatkin is a Solutions Architect at AWS. He enjoys helping communication service providers build innovative solutions in AWS that are redefining the telecom industry. He is passionate about working with customers on bringing the power of AWS AI services into their applications. Alex is based in the Denver metropolitan area and likes to hike, ski, and snowboard.

Read More

Boost your content editing with Contentful and Amazon Bedrock

Boost your content editing with Contentful and Amazon Bedrock

This post is co-written with Matt Middleton from Contentful.

Today, jointly with Contentful, we are announcing the launch of the AI Content Generator powered by Amazon Bedrock.

The AI Content Generator powered by Amazon Bedrock is an app available on the Contentful Marketplace that allows users to create, rewrite, summarize, and translate content using cutting-edge generative artificial intelligence (AI) models available and accessible through Amazon Bedrock in a simple and secure manner. This app helps content producers reduce their lead time to publishing content while enhancing the quality and consistency of the content produced.

Contentful is an intelligent composable content platform that allows the creation and management of content to build digital experiences. Contentful is an AWS customer and partner.

Amazon Bedrock is a fully managed service that offers quick access to a choice of industry-leading large language models and other foundation models from AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon, along with a broad set of capabilities to build generative AI applications, simplifying development while supporting privacy and security.

With this newly launched app, Contentful customers can take advantage of the range of models provided by Amazon Bedrock directly from within Contentful. They can pick the model that best suits their desired language style, creativity, speed, and budget.

Use case

Contentful customers use the platform to scale content across global experiences, brands, and channels. A frequent task a content editor may have is to rewrite existing content to make it fit for another channel, for example to make it fit a smaller screen by shortening it. This is a task that the AI Content Generator powered by Amazon Bedrock can now do automatically:

  1. First, open an existing content entry. The app is available in the sidebar.AI Content Generator in the side bar
  2. When you choose Rewrite, a dialog asks you to choose the fields for input and output, in this case Body and Body Short, respectively.
  3. You can describe the modifications that should be done to the existing content; in this case, we choose the pre-provided option Shorter.
  4. Choose Generate to invoke Amazon Bedrock to perform the desired modification.
    Content rewrite modal
  5. Finally, you can modify the generated text and then choose Apply to apply the text to the specified output field.Generated text

Getting started

To get started, you need to sign up for Contentful, which is free. Next, install the AI Content Generator powered by Amazon Bedrock app by visiting the Contentful Marketplace and choosing Install now.

The installation dialog asks you for an AWS Region where you want to use Amazon Bedrock, as well as an AWS access key ID and secret access key for authentication. To help you meet your organization’s security rules, we have published an AWS CloudFormation template that creates an AWS Identity and Access Management (IAM) user with the minimum privileges you need for this app.

Using generative AI at scale can become expensive. To help prevent surprises in your AWS bill, we have published another CloudFormation template that deploys a budgeting application into your AWS account. You’re able to define a soft limit, which invokes an email notification, and a hard limit, which disables access to Amazon Bedrock entirely for the IAM user you created.

During app installation, you’re able to provide company-specific information such as branding guidelines, which will always be applied when interacting with the app. At the end of the installation dialog, make sure you assign the app to all content types where you may need it. This will enable the app in the sidebar of your content editing experience.

Conclusion

With the AI Content Generator powered by Amazon Bedrock, content teams can unlock powerful tools to save time, reduce feedback loops, and increase creativity. Contentful customers can use generative AI to generate content on demand, transform content to shift voice and tone, translate and localize content for global markets, and even make sure that content remains on brand. Generative AI also plays a critical role in eliminating the “blank page” problem, where a digital team spends more time figuring out where to start than actually creating great content.

Bringing Amazon Bedrock to Contentful means that digital teams can now use a range of leading large language models to unlock their creativity, create more efficiently, and reach their customers in the most impactful way.


About the Authors

Ulrich Hinze is a Solutions Architect at AWS. He partners with software companies to architect and implement cloud-based solutions on AWS. Before joining AWS, he worked for AWS customers and partners in software engineering, consulting, and architecture roles for 8+ years.

Aris Tsakpinis is a Specialist Solutions Architect for AI & Machine Learning with a special focus on natural language processing (NLP), large language models (LLMs), and generative AI. In his free time he is pursuing a PhD in ML Engineering at University of Regensburg, focussing on applied NLP in the science domain.

Matt Middleton is the Senior Product Partner Ecosystem Manager at Contentful. He runs the strategy and operations of Contentful’s Technology Partner Program. Matt’s background includes eCommerce, enterprise search, personalization, and marketing automation.

Read More

Unlock the potential of generative AI in industrial operations

Unlock the potential of generative AI in industrial operations

In the evolving landscape of manufacturing, the transformative power of AI and machine learning (ML) is evident, driving a digital revolution that streamlines operations and boosts productivity. However, this progress introduces unique challenges for enterprises navigating data-driven solutions. Industrial facilities grapple with vast volumes of unstructured data, sourced from sensors, telemetry systems, and equipment dispersed across production lines. Real-time data is critical for applications like predictive maintenance and anomaly detection, yet developing custom ML models for each industrial use case with such time series data demands considerable time and resources from data scientists, hindering widespread adoption.

Generative AI using large pre-trained foundation models (FMs) such as Claude can rapidly generate a variety of content from conversational text to computer code based on simple text prompts, known as zero-shot prompting. This eliminates the need for data scientists to manually develop specific ML models for each use case, and therefore democratizes AI access, benefitting even small manufacturers. Workers gain productivity through AI-generated insights, engineers can proactively detect anomalies, supply chain managers optimize inventories, and plant leadership makes informed, data-driven decisions.

Nevertheless, standalone FMs face limitations in handling complex industrial data with context size constraints (typically less than 200,000 tokens), which poses challenges. To address this, you can use the FM’s ability to generate code in response to natural language queries (NLQs). Agents like PandasAI come into play, running this code on high-resolution time series data and handling errors using FMs. PandasAI is a Python library that adds generative AI capabilities to pandas, the popular data analysis and manipulation tool.

However, complex NLQs, such as time series data processing, multi-level aggregation, and pivot or joint table operations, may yield inconsistent Python script accuracy with a zero-shot prompt.

To enhance code generation accuracy, we propose dynamically constructing multi-shot prompts for NLQs. Multi-shot prompting provides additional context to the FM by showing it several examples of desired outputs for similar prompts, boosting accuracy and consistency. In this post, multi-shot prompts are retrieved from an embedding containing successful Python code run on a similar data type (for example, high-resolution time series data from Internet of Things devices). The dynamically constructed multi-shot prompt provides the most relevant context to the FM, and boosts the FM’s capability in advanced math calculation, time series data processing, and data acronym understanding. This improved response facilitates enterprise workers and operational teams in engaging with data, deriving insights without requiring extensive data science skills.

Beyond time series data analysis, FMs prove valuable in various industrial applications. Maintenance teams assess asset health, capture images for Amazon Rekognition-based functionality summaries, and anomaly root cause analysis using intelligent searches with Retrieval Augmented Generation (RAG). To simplify these workflows, AWS has introduced Amazon Bedrock, enabling you to build and scale generative AI applications with state-of-the-art pre-trained FMs like Claude v2. With Knowledge Bases for Amazon Bedrock, you can simplify the RAG development process to provide more accurate anomaly root cause analysis for plant workers. Our post showcases an intelligent assistant for industrial use cases powered by Amazon Bedrock, addressing NLQ challenges, generating part summaries from images, and enhancing FM responses for equipment diagnosis through the RAG approach.

Solution overview

The following diagram illustrates the solution architecture.

The workflow includes three distinct use cases:

Use case 1: NLQ with time series data

The workflow for NLQ with time series data consists of the following steps:

  1. We use a condition monitoring system with ML capabilities for anomaly detection, such as Amazon Monitron, to monitor industrial equipment health. Amazon Monitron is able to detect potential equipment failures from the equipment’s vibration and temperature measurements.
  2. We collect time series data by processing Amazon Monitron data through Amazon Kinesis Data Streams and Amazon Data Firehose, converting it into a tabular CSV format and saving it in an Amazon Simple Storage Service (Amazon S3) bucket.
  3. The end-user can start chatting with their time series data in Amazon S3 by sending a natural language query to the Streamlit app.
  4. The Streamlit app forwards user queries to the Amazon Bedrock Titan text embedding model to embed this query, and performs a similarity search within an Amazon OpenSearch Service index, which contains prior NLQs and example codes.
  5. After the similarity search, the top similar examples, including NLQ questions, data schema, and Python codes, are inserted in a custom prompt.
  6. PandasAI sends this custom prompt to the Amazon Bedrock Claude v2 model.
  7. The app uses the PandasAI agent to interact with the Amazon Bedrock Claude v2 model, generating Python code for Amazon Monitron data analysis and NLQ responses.
  8. After the Amazon Bedrock Claude v2 model returns the Python code, PandasAI runs the Python query on the Amazon Monitron data uploaded from the app, collecting code outputs and addressing any necessary retries for failed runs.
  9. The Streamlit app collects the response via PandasAI, and provides the output to users. If the output is satisfactory, the user can mark it as helpful, saving the NLQ and Claude-generated Python code in OpenSearch Service.

Use case 2: Summary generation of malfunctioning parts

Our summary generation use case consists of the following steps:

  1. After the user knows which industrial asset shows anomalous behavior, they can upload images of the malfunctioning part to identify if there is something physically wrong with this part according to its technical specification and operation condition.
  2. The user can use the Amazon Recognition DetectText API to extract text data from these images.
  3. The extracted text data is included in the prompt for the Amazon Bedrock Claude v2 model, enabling the model to generate a 200-word summary of the malfunctioning part. The user can use this information to perform further inspection of the part.

Use case 3: Root cause diagnosis

Our root cause diagnosis use case consists of the following steps:

  1. The user obtains enterprise data in various document formats (PDF, TXT, and so on) related with malfunctioning assets, and uploads them to an S3 bucket.
  2. A knowledge base of these files is generated in Amazon Bedrock with a Titan text embeddings model and a default OpenSearch Service vector store.
  3. The user poses questions related to the root cause diagnosis for malfunctioning equipment. Answers are generated through the Amazon Bedrock knowledge base with a RAG approach.

Prerequisites

To follow along with this post, you should meet the following prerequisites:

Deploy the solution infrastructure

To set up your solution resources, complete the following steps:

  1. Deploy the AWS CloudFormation template opensearchsagemaker.yml, which creates an OpenSearch Service collection and index, Amazon SageMaker notebook instance, and S3 bucket. You can name this AWS CloudFormation stack as: genai-sagemaker.
  2. Open the SageMaker notebook instance in JupyterLab. You will find the following GitHub repo already downloaded on this instance: unlocking-the-potential-of-generative-ai-in-industrial-operations.
  3. Run the notebook from the following directory in this repository: unlocking-the-potential-of-generative-ai-in-industrial-operations/SagemakerNotebook/nlq-vector-rag-embedding.ipynb. This notebook will load the OpenSearch Service index using the SageMaker notebook to store key-value pairs from the existing 23 NLQ examples.
  4. Upload documents from the data folder assetpartdoc in the GitHub repository to the S3 bucket listed in the CloudFormation stack outputs.

Next, you create the knowledge base for the documents in Amazon S3.

  1. On the Amazon Bedrock console, choose Knowledge base in the navigation pane.
  2. Choose Create knowledge base.
  3. For Knowledge base name, enter a name.
  4. For Runtime role, select Create and use a new service role.
  5. For Data source name, enter the name of your data source.
  6. For S3 URI, enter the S3 path of the bucket where you uploaded the root cause documents.
  7. Choose Next.
    The Titan embeddings model is automatically selected.
  8. Select Quick create a new vector store.
  9. Review your settings and create the knowledge base by choosing Create knowledge base.
  10. After the knowledge base is successfully created, choose Sync to sync the S3 bucket with the knowledge base.
  11. After you set up the knowledge base, you can test the RAG approach for root cause diagnosis by asking questions like “My actuator travels slow, what might be the issue?”

The next step is to deploy the app with the required library packages on either your PC or an EC2 instance (Ubuntu Server 22.04 LTS).

  1. Set up your AWS credentials with the AWS CLI on your local PC. For simplicity, you can use the same admin role you used to deploy the CloudFormation stack. If you’re using Amazon EC2, attach a suitable IAM role to the instance.
  2. Clone GitHub repo:
    git clone https://github.com/aws-samples/unlocking-the-potential-of-generative-ai-in-industrial-operations

  3. Change the directory to unlocking-the-potential-of-generative-ai-in-industrial-operations/src and run the setup.sh script in this folder to install the required packages, including LangChain and PandasAI:
    cd unlocking-the-potential-of-generative-ai-in-industrial-operations/src
    chmod +x ./setup.sh
    ./setup.sh   
  4. Run the Streamlit app with the following command:
    source monitron-genai/bin/activate
    python3 -m streamlit run app_bedrock.py <REPLACE WITH YOUR BEDROCK KNOWLEDGEBASE ARN>
    

Provide the OpenSearch Service collection ARN you created in Amazon Bedrock from the previous step.

Chat with your asset health assistant

After you complete the end-to-end deployment, you can access the app via localhost on port 8501, which opens a browser window with the web interface. If you deployed the app on an EC2 instance, allow port 8501 access via the security group inbound rule. You can navigate to different tabs for various use cases.

Explore use case 1

To explore the first use case, choose Data Insight and Chart. Begin by uploading your time series data. If you don’t have an existing time series data file to use, you can upload the following sample CSV file with anonymous Amazon Monitron project data. If you already have an Amazon Monitron project, refer to Generate actionable insights for predictive maintenance management with Amazon Monitron and Amazon Kinesis to stream your Amazon Monitron data to Amazon S3 and use your data with this application.

When the upload is complete, enter a query to initiate a conversation with your data. The left sidebar offers a range of example questions for your convenience. The following screenshots illustrate the response and Python code generated by the FM when inputting a question such as “Tell me the unique number of sensors for each site shown as Warning or Alarm respectively?” (a hard-level question) or “For sensors shown temperature signal as NOT Healthy, can you calculate the time duration in days for each sensor shown abnormal vibration signal?” (a challenge-level question). The app will answer your question, and will also show the Python script of data analysis it performed to generate such results.

If you’re satisfied with the answer, you can mark it as Helpful, saving the NLQ and Claude-generated Python code to an OpenSearch Service index.

Explore use case 2

To explore the second use case, choose the Captured Image Summary tab in the Streamlit app. You can upload an image of your industrial asset, and the application will generate a 200-word summary of its technical specification and operation condition based on the image information. The following screenshot shows the summary generated from an image of a belt motor drive. To test this feature, if you lack a suitable image, you can use the following example image.

Hydraulic elevator motor label” by Clarence Risher is licensed under CC BY-SA 2.0.

Explore use case 3

To explore the third use case, choose the Root cause diagnosis tab. Input a query related to your broken industrial asset, such as, “My actuator travels slow, what might be the issue?” As depicted in the following screenshot, the application delivers a response with the source document excerpt used to generate the answer.

Use case 1: Design details

In this section, we discuss the design details of the application workflow for the first use case.

Custom prompt building

The user’s natural language query comes with different difficult levels: easy, hard, and challenge.

Straightforward questions may include the following requests:

  • Select unique values
  • Count total numbers
  • Sort values

For these questions, PandasAI can directly interact with the FM to generate Python scripts for processing.

Hard questions require basic aggregation operation or time series analysis, such as the following:

  • Select value first and group results hierarchically
  • Perform statistics after initial record selection
  • Timestamp count (for example, min and max)

For hard questions, a prompt template with detailed step-by-step instructions assists FMs in providing accurate responses.

Challenge-level questions need advanced math calculation and time series processing, such as the following:

  • Calculate anomaly duration for each sensor
  • Calculate anomaly sensors for site on a monthly basis
  • Compare sensor readings under normal operation and abnormal conditions

For these questions, you can use multi-shots in a custom prompt to enhance response accuracy. Such multi-shots show examples of advanced time series processing and math calculation, and will provide context for the FM to perform relevant inference on similar analysis. Dynamically inserting the most relevant examples from an NLQ question bank into the prompt can be a challenge. One solution is to construct embeddings from existing NLQ question samples and save these embeddings in a vector store like OpenSearch Service. When a question is sent to the Streamlit app, the question will be vectorized by BedrockEmbeddings. The top N most-relevant embeddings to that question are retrieved using opensearch_vector_search.similarity_search and inserted into the prompt template as a multi-shot prompt.

The following diagram illustrates this workflow.

The embedding layer is constructed using three key tools:

  • Embeddings model – We use Amazon Titan Embeddings available through Amazon Bedrock (amazon.titan-embed-text-v1) to generate numerical representations of textual documents.
  • Vector store – For our vector store, we use OpenSearch Service via the LangChain framework, streamlining the storage of embeddings generated from NLQ examples in this notebook.
  • Index – The OpenSearch Service index plays a pivotal role in comparing input embeddings to document embeddings and facilitating the retrieval of relevant documents. Because the Python example codes were saved as a JSON file, they were indexed in OpenSearch Service as vectors via an OpenSearchVevtorSearch.fromtexts API call.

Continuous collection of human-audited examples via Streamlit

At the outset of app development, we began with only 23 saved examples in the OpenSearch Service index as embeddings. As the app goes live in the field, users start inputting their NLQs via the app. However, due to the limited examples available in the template, some NLQs may not find similar prompts. To continuously enrich these embeddings and offer more relevant user prompts, you can use the Streamlit app for gathering human-audited examples.

Within the app, the following function serves this purpose. When end-users find the output helpful and select Helpful, the application follows these steps:

  1. Use the callback method from PandasAI to collect the Python script.
  2. Reformat the Python script, input question, and CSV metadata into a string.
  3. Check whether this NLQ example already exists in the current OpenSearch Service index using opensearch_vector_search.similarity_search_with_score.
  4. If there’s no similar example, this NLQ is added to the OpenSearch Service index using opensearch_vector_search.add_texts.

In the event that a user selects Not Helpful, no action is taken. This iterative process makes sure that the system continually improves by incorporating user-contributed examples.

def addtext_opensearch(input_question, generated_chat_code, df_column_metadata, opensearch_vector_search,similarity_threshold,kexamples, indexname):
    #######build the input_question and generated code the same format as existing opensearch index##########
    reconstructed_json = {}
    reconstructed_json["question"]=input_question
    reconstructed_json["python_code"]=str(generated_chat_code)
    reconstructed_json["column_info"]=df_column_metadata
    json_str = ''
    for key,value in reconstructed_json.items():
        json_str += key + ':' + value
    reconstructed_raw_text =[]
    reconstructed_raw_text.append(json_str)
    
    results = opensearch_vector_search.similarity_search_with_score(str(reconstructed_raw_text[0]), k=kexamples)  # our search query  # return 3 most relevant docs
    if (dumpd(results[0][1])<similarity_threshold):    ###No similar embedding exist, then add text to embedding
        response = opensearch_vector_search.add_texts(texts=reconstructed_raw_text, engine="faiss", index_name=indexname)
    else:
        response = "A similar embedding is already exist, no action."
    
    return response

By incorporating human auditing, the quantity of examples in OpenSearch Service available for prompt embedding grows as the app gains usage. This expanded embedding dataset results in enhanced search accuracy over time. Specifically, for challenging NLQs, the FM’s response accuracy reaches approximately 90% when dynamically inserting similar examples to construct custom prompts for each NLQ question. This represents a notable 28% increase compared to scenarios without multi-shot prompts.

Use case 2: Design details

On the Streamlit app’s Captured Image Summary tab, you can directly upload an image file. This initiates the Amazon Rekognition API (detect_text API), extracting text from the image label detailing machine specifications. Subsequently, the extracted text data is sent to the Amazon Bedrock Claude model as the context of a prompt, resulting in a 200-word summary.

From a user experience perspective, enabling streaming functionality for a text summarization task is paramount, allowing users to read the FM-generated summary in smaller chunks rather than waiting for the entire output. Amazon Bedrock facilitates streaming via its API (bedrock_runtime.invoke_model_with_response_stream).

Use case 3: Design details

In this scenario, we’ve developed a chatbot application focused on root cause analysis, employing the RAG approach. This chatbot draws from multiple documents related to bearing equipment to facilitate root cause analysis. This RAG-based root cause analysis chatbot uses knowledge bases for generating vector text representations, or embeddings. Knowledge Bases for Amazon Bedrock is a fully managed capability that helps you implement the entire RAG workflow, from ingestion to retrieval and prompt augmentation, without having to build custom integrations to data sources or manage data flows and RAG implementation details.

When you’re satisfied with the knowledge base response from Amazon Bedrock, you can integrate the root cause response from the knowledge base to the Streamlit app.

Clean up

To save costs, delete the resources you created in this post:

  1. Delete the knowledge base from Amazon Bedrock.
  2. Delete the OpenSearch Service index.
  3. Delete the genai-sagemaker CloudFormation stack.
  4. Stop the EC2 instance if you used an EC2 instance to run the Streamlit app.

Conclusion

Generative AI applications have already transformed various business processes, enhancing worker productivity and skill sets. However, the limitations of FMs in handling time series data analysis have hindered their full utilization by industrial clients. This constraint has impeded the application of generative AI to the predominant data type processed daily.

In this post, we introduced a generative AI Application solution designed to alleviate this challenge for industrial users. This application uses an open source agent, PandasAI, to strengthen an FM’s time series analysis capability. Rather than sending time series data directly to FMs, the app employs PandasAI to generate Python code for the analysis of unstructured time series data. To enhance the accuracy of Python code generation, a custom prompt generation workflow with human auditing has been implemented.

Empowered with insights into their asset health, industrial workers can fully harness the potential of generative AI across various use cases, including root cause diagnosis and part replacement planning. With Knowledge Bases for Amazon Bedrock, the RAG solution is straightforward for developers to build and manage.

The trajectory of enterprise data management and operations is unmistakably moving towards deeper integration with generative AI for comprehensive insights into operational health. This shift, spearheaded by Amazon Bedrock, is significantly amplified by the growing robustness and potential of LLMs like Amazon Bedrock Claude 3 to further elevate solutions. To learn more, visit consult the Amazon Bedrock documentation, and get hands-on with the Amazon Bedrock workshop.


About the authors

Julia Hu is a Sr. AI/ML Solutions Architect at Amazon Web Services. She is specialized in Generative AI, Applied Data Science and IoT architecture. Currently she is part of the Amazon Q team, and an active member/mentor in Machine Learning Technical Field Community. She works with customers, ranging from start-ups to enterprises, to develop AWSome generative AI solutions. She is particularly passionate about leveraging Large Language Models for advanced data analytics and exploring practical applications that address real-world challenges.

Sudeesh Sasidharan is a Senior Solutions Architect at AWS, within the Energy team. Sudeesh loves experimenting with new technologies and building innovative solutions that solve complex business challenges. When he is not designing solutions or tinkering with the latest technologies, you can find him on the tennis court working on his backhand.

Neil Desai is a technology executive with over 20 years of experience in artificial intelligence (AI), data science, software engineering, and enterprise architecture. At AWS, he leads a team of Worldwide AI services specialist solutions architects who help customers build innovative Generative AI-powered solutions, share best practices with customers, and drive product roadmap. In his previous roles at Vestas, Honeywell, and Quest Diagnostics, Neil has held leadership roles in developing and launching innovative products and services that have helped companies improve their operations, reduce costs, and increase revenue. He is passionate about using technology to solve real-world problems and is a strategic thinker with a proven track record of success.

Read More

Enhance performance of generative language models with self-consistency prompting on Amazon Bedrock

Enhance performance of generative language models with self-consistency prompting on Amazon Bedrock

Generative language models have proven remarkably skillful at solving logical and analytical natural language processing (NLP) tasks. Furthermore, the use of prompt engineering can notably enhance their performance. For example, chain-of-thought (CoT) is known to improve a model’s capacity for complex multi-step problems. To additionally boost accuracy on tasks that involve reasoning, a self-consistency prompting approach has been suggested, which replaces greedy with stochastic decoding during language generation.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models from leading AI companies and Amazon via a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. With the batch inference API, you can use Amazon Bedrock to run inference with foundation models in batches and get responses more efficiently. This post shows how to implement self-consistency prompting via batch inference on Amazon Bedrock to enhance model performance on arithmetic and multiple-choice reasoning tasks.

Overview of solution

Self-consistency prompting of language models relies on the generation of multiple responses that are aggregated into a final answer. In contrast to single-generation approaches like CoT, the self-consistency sample-and-marginalize procedure creates a range of model completions that lead to a more consistent solution. The generation of different responses for a given prompt is possible due to the use of a stochastic, rather than greedy, decoding strategy.

The following figure shows how self-consistency differs from greedy CoT in that it generates a diverse set of reasoning paths and aggregates them to produce the final answer.

Differences between self-consistency and CoT prompting.

Decoding strategies for text generation

Text generated by decoder-only language models unfolds word by word, with the subsequent token being predicted on the basis of the preceding context. For a given prompt, the model computes a probability distribution indicating the likelihood of each token to appear next in the sequence. Decoding involves translating these probability distributions into actual text. Text generation is mediated by a set of inference parameters that are often hyperparameters of the decoding method itself. One example is the temperature, which modulates the probability distribution of the next token and influences the randomness of the model’s output.

Greedy decoding is a deterministic decoding strategy that at each step selects the token with the highest probability. Although straightforward and efficient, the approach risks falling into repetitive patterns, because it disregards the broader probability space. Setting the temperature parameter to 0 at inference time essentially equates to implementing greedy decoding.

Sampling introduces stochasticity into the decoding process by randomly selecting each subsequent token based on the predicted probability distribution. This randomness results in greater output variability. Stochastic decoding proves more adept at capturing the diversity of potential outputs and often yields more imaginative responses. Higher temperature values introduce more fluctuations and increase the creativity of the model’s response.

Prompting techniques: CoT and self-consistency

The reasoning ability of language models can be augmented via prompt engineering. In particular, CoT has been shown to elicit reasoning in complex NLP tasks. One way to implement a zero-shot CoT is via prompt augmentation with the instruction to “think step by step.” Another is to expose the model to exemplars of intermediate reasoning steps in few-shot prompting fashion. Both scenarios typically use greedy decoding. CoT leads to significant performance gains compared to simple instruction prompting on arithmetic, commonsense, and symbolic reasoning tasks.

Self-consistency prompting is based on the assumption that introducing diversity in the reasoning process can be beneficial to help models converge on the correct answer. The technique uses stochastic decoding to achieve this goal in three steps:

  1. Prompt the language model with CoT exemplars to elicit reasoning.
  2. Replace greedy decoding with a sampling strategy to generate a diverse set of reasoning paths.
  3. Aggregate the results to find the most consistent answer in the response set.

Self-consistency is shown to outperform CoT prompting on popular arithmetic and commonsense reasoning benchmarks. A limitation of the approach is its larger computational cost.

This post shows how self-consistency prompting enhances performance of generative language models on two NLP reasoning tasks: arithmetic problem-solving and multiple-choice domain-specific question answering. We demonstrate the approach using batch inference on Amazon Bedrock:

  • We access the Amazon Bedrock Python SDK in JupyterLab on an Amazon SageMaker notebook instance.
  • For arithmetic reasoning, we prompt Cohere Command on the GSM8K dataset of grade school math problems.
  • For multiple-choice reasoning, we prompt AI21 Labs Jurassic-2 Mid on a small sample of questions from the AWS Certified Solutions Architect – Associate exam.

Prerequisites

This walkthrough assumes the following prerequisites:

Manage model access on Amazon Bedrock

The estimated cost to run the code shown in this post is $100, assuming you run self-consistency prompting one time with 30 reasoning paths using one value for the temperature-based sampling.

Dataset to probe arithmetic reasoning capabilities

GSM8K is a dataset of human-assembled grade school math problems featuring a high linguistic diversity. Each problem takes 2–8 steps to solve and requires performing a sequence of elementary calculations with basic arithmetic operations. This data is commonly used to benchmark the multi-step arithmetic reasoning capabilities of generative language models. The GSM8K train set comprises 7,473 records. The following is an example:

{"question": "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?", "answer": "Natalia sold 48/2 = <<48/2=24>>24 clips in May.nNatalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.n#### 72"}

Set up to run batch inference with Amazon Bedrock

Batch inference allows you to run multiple inference calls to Amazon Bedrock asynchronously and improve the performance of model inference on large datasets. The service is in preview as of this writing and only available through the API. Refer to Run batch inference to access batch inference APIs via custom SDKs.

After you have downloaded and unzipped the Python SDK in a SageMaker notebook instance, you can install it by running the following code in a Jupyter notebook cell:

# Install preview SDK packages
!pip install -q $(ls ./bedrock-python-sdk-reinvent/botocore-*.whl | head -1)
!pip install -q $(ls ./bedrock-python-sdk-reinvent/boto3-*.whl | head -1)

Format and upload input data to Amazon S3

Input data for batch inference needs to be prepared in JSONL format with recordId and modelInput keys. The latter should match the body field of the model to be invoked on Amazon Bedrock. In particular, some supported inference parameters for Cohere Command are temperature for randomness, max_tokens for output length, and num_generations to generate multiple responses, all of which are passed together with the prompt as modelInput:

data = [
    {
        "recordId": "1",
        "modelInput": {
            "prompt": prompt,
            "temperature": temperature,
            "max_tokens": max_tokens,
            "num_generations": n,
        },
    },
    ...,
]

See Inference parameters for foundation models for more details, including other model providers.

Our experiments on arithmetic reasoning are performed in the few-shot setting without customizing or fine-tuning Cohere Command. We use the same set of eight few-shot exemplars from the chain-of-thought (Table 20) and self-consistency (Table 17) papers. Prompts are created by concatenating the exemplars with each question from the GSM8K train set.

We set max_tokens to 512 and num_generations to 5, the maximum allowed by Cohere Command. For greedy decoding, we set temperature to 0 and for self-consistency, we run three experiments at temperatures 0.5, 0.7, and 1. Each setting yields different input data according to the respective temperature values. Data is formatted as JSONL and stored in Amazon S3.

# Set up S3 client
session = boto3.Session()
s3 = session.client("s3")

# Create S3 bucket with unique name to store input/output data
suffix = str(uuid.uuid4())[:8]
bucket = f"bedrock-self-consistency-{suffix}"
s3.create_bucket(
    Bucket=bucket, CreateBucketConfiguration={"LocationConstraint": session.region_name}
)

# Process data and output to new lines as JSONL
input_key = f"gsm8k/T{temperature}/input.jsonl"
s3_data = ""
for row in data:
    s3_data += json.dumps(row) + "n"
s3.put_object(Body=s3_data, Bucket=bucket, Key=input_key)

Create and run batch inference jobs in Amazon Bedrock

Batch inference job creation requires an Amazon Bedrock client. We specify the S3 input and output paths and give each invocation job a unique name:

# Create Bedrock client							    
bedrock = boto3.client("bedrock")

# Input and output config						     
input_config = {"s3InputDataConfig": {"s3Uri": f"s3://{bucket}/{input_key}"}}
output_config = {"s3OutputDataConfig": {"s3Uri": f"s3://{bucket}/{output_key}"}}

# Create a unique job name
suffix = str(uuid.uuid4())[:8] 
job_name = f"command-batch-T{temperature}-{suffix}"

Jobs are created by passing the IAM role, model ID, job name, and input/output configuration as parameters to the Amazon Bedrock API:

response = bedrock.create_model_invocation_job(
    roleArn=f"arn:aws:iam::{account_id}:role/BedrockBatchInferenceRole",
    modelId="cohere.command-text-v14",
    jobName=job_name,
    inputDataConfig=input_config,
    outputDataConfig=output_config,
)
job_arn = response["jobArn"]

Listing, monitoring, and stopping batch inference jobs is supported by their respective API calls. On creation, jobs appear first as Submitted, then as InProgress, and finally as Stopped, Failed, or Completed.

# Get job details
job_details = bedrock.get_model_invocation_job(jobIdentifier=job_arn)

If the jobs are successfully complete, the generated content can be retrieved from Amazon S3 using its unique output location.

# Get the output file key
s3_prefix = f"s3://{bucket}/"
output_path = job_details["outputDataConfig"]["s3OutputDataConfig"]["s3Uri"].replace(
    s3_prefix, ""
)
output_folder = job_details["jobArn"].split("/")[1]
output_file = (
    f'{job_details["inputDataConfig"]["s3InputDataConfig"]["s3Uri"].split("/")[-1]}.out'
)
result_key = f"{output_path}{output_folder}/{output_file}"

# Get output data
obj = s3.get_object(Bucket=bucket, Key=result_key)
content = obj["Body"].read().decode("utf-8").strip().split("n")

# Show answer to the first question
print(json.loads(content[0])["modelOutput"]["generations"][0]["text"])

[Out]: 'Natalia sold 48 * 1/2 = 24 clips less in May. This means she sold 48 + 24 = 72 clips in April and May. The answer is 72.'

Self-consistency enhances model accuracy on arithmetic tasks

Self-consistency prompting of Cohere Command outperforms a greedy CoT baseline in terms of accuracy on the GSM8K dataset. For self-consistency, we sample 30 independent reasoning paths at three different temperatures, with topP and topK set to their default values. Final solutions are aggregated by choosing the most consistent occurrence via majority voting. In case of a tie, we randomly choose one of the majority responses. We compute accuracy and standard deviation values averaged over 100 runs.

The following figure shows the accuracy on the GSM8K dataset from Cohere Command prompted with greedy CoT (blue) and self-consistency at temperature values 0.5 (yellow), 0.7 (green), and 1.0 (orange) as a function of the number of sampled reasoning paths.

Accuracy of Cohere Command using self-consistency vs CoT prompting.

The preceding figure shows that self-consistency enhances arithmetic accuracy over greedy CoT when the number of sampled paths is as low as three. Performance increases consistently with further reasoning paths, confirming the importance of introducing diversity in the thought generation. Cohere Command solves the GSM8K question set with 51.7% accuracy when prompted with CoT vs. 68% with 30 self-consistent reasoning paths at T=1.0. All three surveyed temperature values yield similar results, with lower temperatures being comparatively more performant at less sampled paths.

Practical considerations on efficiency and cost

Self-consistency is limited by the increased response time and cost incurred when generating multiple outputs per prompt. As a practical illustration, batch inference for greedy generation with Cohere Command on 7,473 GSM8K records finished in less than 20 minutes. The job took 5.5 million tokens as input and generated 630,000 output tokens. At current Amazon Bedrock inference prices, the total cost incurred was around $9.50.

For self-consistency with Cohere Command, we use inference parameter num_generations to create multiple completions per prompt. As of this writing, Amazon Bedrock allows a maximum of five generations and three concurrent Submitted batch inference jobs. Jobs proceed to the InProgress status sequentially, therefore sampling more than five paths requires multiple invocations.

The following figure shows the runtimes for Cohere Command on the GSM8K dataset. Total runtime is shown on the x axis and runtime per sampled reasoning path on the y axis. Greedy generation runs in the shortest time but incurs a higher time cost per sampled path.

Runtimes for Cohere Command

Greedy generation completes in less than 20 minutes for the full GSM8K set and samples a unique reasoning path. Self-consistency with five samples requires about 50% longer to complete and costs around $14.50, but produces five paths (over 500%) in that time. Total runtime and cost increase step-wise with every extra five sampled paths. A cost-benefit analysis suggests that 1–2 batch inference jobs with 5–10 sampled paths is the recommended setting for practical implementation of self-consistency. This achieves enhanced model performance while keeping cost and latency at bay.

Self-consistency enhances model performance beyond arithmetic reasoning

A crucial question to prove the suitability of self-consistency prompting is whether the method succeeds across further NLP tasks and language models. As an extension to an Amazon-related use case, we perform a small-sized analysis on sample questions from the AWS Solutions Architect Associate Certification. This is a multiple-choice exam on AWS technology and services that requires domain knowledge and the ability to reason and decide among several options.

We prepare a dataset from SAA-C01 and SAA-C03 sample exam questions. From the 20 available questions, we use the first 4 as few-shot exemplars and prompt the model to answer the remaining 16. This time, we run inference with the AI21 Labs Jurassic-2 Mid model and generate a maximum of 10 reasoning paths at temperature 0.7. Results show that self-consistency enhances performance: although greedy CoT produces 11 correct answers, self-consistency succeeds on 2 more.

The following table shows the accuracy results for 5 and 10 sampled paths averaged over 100 runs.

. Greedy decoding T = 0.7
# sampled paths: 5 68.6 74.1 ± 0.7
# sampled paths: 10 68.6 78.9 ± 0.3

In the following table, we present two exam questions that are incorrectly answered by greedy CoT while self-consistency succeeds, highlighting in each case the correct (green) or incorrect (red) reasoning traces that led the model to produce correct or incorrect responses. Although not every sampled path generated by self-consistency is correct, the majority converges on the true answer as the number of sampled paths increases. We observe that 5–10 paths are typically enough to improve over the greedy results, with diminishing returns in terms of efficiency past those values.

Question

A web application allows customers to upload orders to an S3 bucket. The resulting Amazon S3 events trigger a Lambda function that inserts a message to an SQS queue. A single EC2 instance reads messages from the queue, processes them, and stores them in a DynamoDB table partitioned by unique order ID. Next month traffic is expected to increase by a factor of 10 and a Solutions Architect is reviewing the architecture for possible scaling problems.

Which component is MOST likely to need re-architecting to be able to scale to accommodate the new traffic?

A. Lambda function
B. SQS queue
C. EC2 instance
D. DynamoDB table

An application running on AWS uses an Amazon Aurora Multi-AZ DB cluster deployment for its database. When evaluating performance metrics, a solutions architect discovered that the database reads are causing high I/O and adding latency to the write requests against the database.

What should the solutions architect do to separate the read requests from the write requests?

A. Enable read-through caching on the Aurora database.
B. Update the application to read from the Multi-AZ standby instance.
C. Create an Aurora replica and modify the application to use the appropriate endpoints.
D. Create a second Aurora database and link it to the primary database as a read replica.

Correct answer C – A single EC2 instance will not scale and is a single point of failure in the architecture. A much better solution would be to have EC2 instances in an Auto Scaling group across 2 availability zones read messages from the queue. The other responses are all managed services that can be configured to scale or will scale automatically. C – Aurora Replicas provide a way to offload read traffic. Aurora Replicas share the same underlying storage as the main database, so lag time is generally very low. Aurora Replicas have their own endpoints, so the application will need to be configured to direct read traffic to the new endpoints.
Greedy decoding Scaling the EC2 instance directly may not be possible, as it may not be possible to resize the instance. Scaling the Lambda function may not be sufficient, as Lambda functions have a 6 GB of memory allocation limit. Scaling the SQS queue may not be sufficient, as SQS has a 10,000 message limit. DynamoDB tables can be scaled by provisioning more capacity units. The correct answer is D. Modifying the application to read from the Multi-AZ standby instance (option B) is the best approach. The standby instance is asynchronously replicated from the primary instance, so read operations will not interfere with write operations. Enabling read-through caching (option A) or creating an Aurora replica (option C) will not solve the I/O contention issue. Creating a second Aurora database and linking it as a read replica (option D) will not solve the I/O contention issue either. The correct answer is B.
Sampled path #1 Scaling the EC2 instance to 10 times its current size may not be feasible, either from a cost or performance perspective. It would require a larger instance, which can be expensive. A more cost-effective approach might be to use a swarm of smaller instances (e.g., Auto Scaling Group) to share the load. In this case, the EC2 instance would likely need to be re-architected. The correct answer is C. Modifying the application to use the appropriate endpoints is not a practical solution as it would require changes to the application code. The correct answer is C: Create an Aurora replica and modify the application to use the appropriate endpoints.
Sampled path #2 Scaling a single EC2 instance is more difficult and problematic than scaling the other service components. Amazon SQS, Lambda, and DynamoDB can all scale automatically to cope with the increased load. The correct answer is C.

(C) Create an Aurora replica and modify the application to use the appropriate endpoints.

By configuring an Aurora Replica, you can separate read traffic from write traffic. The Aurora Replicas use different endpoint URLs, allowing you to direct read traffic to the replica instead of the primary database. The replica can process read requests in parallel with write requests to the primary database, reducing I/O and latency.

Clean up

Running batch inference in Amazon Bedrock is subject to charges according to the Amazon Bedrock Pricing. When you complete the walkthrough, delete your SageMaker notebook instance and remove all data from your S3 buckets to avoid incurring future charges.

Considerations

Although the demonstrated solution shows improved performance of language models when prompted with self-consistency, it’s important to note that the walkthrough is not production-ready. Before you deploy to production, you should adapt this proof of concept to your own implementation, keeping in mind the following requirements:

  • Access restriction to APIs and databases to prevent unauthorized usage.
  • Adherence to AWS security best practices regarding IAM role access and security groups.
  • Validation and sanitization of user input to prevent prompt injection attacks.
  • Monitoring and logging of triggered processes to enable testing and auditing.

Conclusion

This post shows that self-consistency prompting enhances performance of generative language models in complex NLP tasks that require arithmetic and multiple-choice logical skills. Self-consistency uses temperature-based stochastic decoding to generate various reasoning paths. This increases the ability of the model to elicit diverse and useful thoughts to arrive at correct answers.

With Amazon Bedrock batch inference, the language model Cohere Command is prompted to generate self-consistent answers to a set of arithmetic problems. Accuracy improves from 51.7% with greedy decoding to 68% with self-consistency sampling 30 reasoning paths at T=1.0. Sampling five paths already enhances accuracy by 7.5 percent points. The approach is transferable to other language models and reasoning tasks, as demonstrated by results of the AI21 Labs Jurassic-2 Mid model on an AWS Certification exam. In a small-sized question set, self-consistency with five sampled paths increases accuracy by 5 percent points over greedy CoT.

We encourage you to implement self-consistency prompting for enhanced performance in your own applications with generative language models. Learn more about Cohere Command and AI21 Labs Jurassic models available on Amazon Bedrock. For more information about batch inference, refer to Run batch inference.

Acknowledgements

The author thanks technical reviewers Amin Tajgardoon and Patrick McSweeney for helpful feedback.


About the Author

Lucía Santamaría is a Sr. Applied Scientist at Amazon’s ML University, where she’s focused on raising the level of ML competency across the company through hands-on education. Lucía has a PhD in astrophysics and is passionate about democratizing access to tech knowledge and tools.

Read More