Unlock the power of data governance and no-code machine learning with Amazon SageMaker Canvas and Amazon DataZone

Amazon DataZone is a data management service that makes it quick and convenient to catalog, discover, share, and govern data stored in AWS, on-premises, and third-party sources. Amazon DataZone allows you to create and manage data zones, which are virtual data lakes that store and process your data, without the need for extensive coding or infrastructure management. Amazon DataZone makes it straightforward for engineers, data scientists, product managers, analysts, and business users to access data throughout an organization so they can discover, use, and collaborate to derive data-driven insights.

Amazon SageMaker Canvas is a no-code machine learning (ML) service that empowers business analysts and domain experts to build, train, and deploy ML models without writing a single line of code. SageMaker Canvas streamlines data ingestion from popular sources like Amazon Simple Storage Service (Amazon S3), Amazon Redshift, Amazon Athena, Snowflake, Salesforce, and Databricks, offering robust data preparation with Amazon SageMaker Data Wrangler, automated model building through Amazon SageMaker Autopilot, and a playground for using pre-built ML models, including foundation models (FMs) from Amazon Bedrock and Amazon SageMaker Jumpstart.

Enterprises can use no-code ML solutions to streamline their operations and optimize their decision-making without extensive administrative overhead. For example, when financial institutions use ML models to perform fraud detection analysis, they can use low-code and no-code solutions to enable rapid iteration of fraud detection models to improve efficiency and accuracy. However, ML governance plays a key role to make sure the data used in these models is accurate, secure, and reliable. With the integration of Amazon DataZone and Amazon SageMaker, users can set up infrastructure with security controls, collaborate on ML projects, and govern access to data and ML assets. You can use SageMaker Canvas as part of this integration to build ML models that are from approved and reliable datasets.

In this post, we show how the Amazon DataZone integration with SageMaker Canvas lets users publish their data assets so that other builders in the same organization can search for and discover the published datasets, subscribe to them, and consume the data. After you’re subscribed to a data asset, you can consume it from SageMaker Canvas, perform feature engineering, build an ML model, and then publish the model back to the Amazon DataZone project. This new governance capability makes it straightforward to govern access to your infrastructure, data, and ML resources for the business problem being addressed.

Solution overview

In this section, we provide an overview of three personas: the data admin, data publisher, and data scientist. The data administrator is responsible for provisioning the necessary Amazon DataZone resources to enable the integration with SageMaker according to the Amazon DataZone concepts. The data admin defines the required security controls for ML infrastructure and deploys the SageMaker environment with Amazon DataZone. The data publisher is responsible for publishing and governing access for the bespoke data in the Amazon DataZone business data catalog. The data scientist discovers and subscribes to data and ML resources, accesses the data from SageMaker Canvas, prepares the data, performs feature engineering, builds an ML model, and exports the model back to the Amazon DataZone catalog. In this post, we use a banking dataset that has data related to direct marketing campaigns for a banking institution. This dataset contains continuous, integer, and categorical variables that are used to predict whether the client will subscribe to a term deposit. The following diagram illustrates the workflow.

Prerequisites

Before you can start using the SageMaker and Amazon DataZone integration, you must have the following:

  • An AWS account with appropriate permissions to create and manage resources in SageMaker and Amazon DataZone.
  • An Amazon DataZone domain and an associated Amazon DataZone project configured in your AWS account.
  • Familiarity with SageMaker and its components, such as Amazon SageMaker Studio, SageMaker Canvas, and SageMaker notebooks.
  • The sample dataset
  • Upload the dataset to Amazon S3 and crawl the data to create an AWS Glue database and tables. For instructions to catalog the data, refer to Populating the AWS Glue Data Catalog.
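If you prefer to script the cataloging prerequisite rather than use the console, the following is a minimal sketch with the AWS SDK for Python (Boto3). The crawler name, IAM role ARN, and S3 path are hypothetical placeholders; the database name bankmarketing matches the database that appears later in this post.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Return keyword arguments for the AWS Glue CreateCrawler API."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(request):
    """Create the crawler and start it (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the request builder also works offline

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


# Hypothetical values, for illustration only
crawler_request = build_crawler_request(
    name="bank-marketing-crawler",
    role_arn="arn:aws:iam::111122223333:role/GlueCrawlerRole",
    database="bankmarketing",
    s3_path="s3://amzn-s3-demo-bucket/bank-marketing/",
)
```

Calling create_and_start_crawler(crawler_request) would create the crawler and run it once; the resulting table can then be used as the AWS Glue source for the Amazon DataZone data asset.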

Data admin steps on Amazon DataZone

As a data administrator, you need to set up the necessary Amazon DataZone resources to enable the integration with SageMaker. Follow the steps outlined in Amazon DataZone quickstart with AWS Glue data or refer to the following video to set up an Amazon DataZone domain, enable SageMaker and data lake blueprints, create Amazon DataZone projects (for publishing data assets and to subscribe data assets from the data catalog), and provision default SageMaker and default data lake environments in the respective projects. The data lake environment is required to configure an AWS Glue database table, which is used to publish an asset in the Amazon DataZone catalog. The following video demonstrates how to configure the data source (from an AWS Glue database) and publish the dataset in the Amazon DataZone catalog.

Before starting the data scientist workflow, make sure the following are in place for the Amazon DataZone project:

  • An Amazon DataZone project named Banking-Consumer-ML, which is used in the data scientist workflow.
  • A SageMaker environment profile with the default SageMaker blueprint.
  • A SageMaker environment based on the SageMaker environment profile, which allows the data scientist to launch SageMaker Studio from the Amazon DataZone project console.
  • A data asset named Bank that contains the customer data from a banking institution that captures the demographic, financial, and marketing campaign data for the bank’s customers. The data asset is already published in the Amazon DataZone data catalog and can be searched from any project created under the Amazon DataZone domain.

Data scientist workflow

In this section, we demonstrate how a data scientist subscribes to an existing data asset from the SageMaker Studio asset catalog, imports the dataset to SageMaker Canvas, builds an ML model, and publishes the model back to the Amazon DataZone data catalog, which can be reused across the projects in the domain. As the data scientist, complete the following steps:

  1. In the Environments section of the Banking-Consumer-ML project, choose SageMaker Studio.

  2. Choose Assets in the navigation pane.
  3. On the Asset catalog tab, search for and choose the data asset Bank.

You can view the metadata and schema of the banking dataset to understand the data attributes and columns.

  4. To raise a request to subscribe to the dataset, choose Subscribe.
  5. Enter a reason for the request and choose Submit.

After the data scientist submits the subscription request, a notification is sent to the asset publishing project for approval.

The data publisher for the asset publishing project views the subscription request by navigating to the data owning project console and choosing Incoming requests under Published data in the navigation pane. The data publisher chooses View request to view the request and, based on the organization’s data access policy, approves the incoming subscription request.

The data publisher can view the subscription status for the asset and is also able to revoke and remove subscription access anytime from the data publishing project console.

The data publisher can also view and approve the request under Manage asset requests on the SageMaker Studio Assets page.
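The same request and approval exchange can also be driven through the Amazon DataZone API. The following is a rough sketch using Boto3, not the method used in this post: the identifiers are hypothetical placeholders, and the parameter names reflect our understanding of the CreateSubscriptionRequest and AcceptSubscriptionRequest operations, so verify them against the current API reference before use.

```python
def build_subscription_request(domain_id, listing_id, project_id, reason):
    """Keyword arguments for the DataZone CreateSubscriptionRequest operation."""
    return {
        "domainIdentifier": domain_id,
        "requestReason": reason,
        "subscribedListings": [{"identifier": listing_id}],
        "subscribedPrincipals": [{"project": {"identifier": project_id}}],
    }


def build_approval(domain_id, request_id, comment):
    """Keyword arguments for the DataZone AcceptSubscriptionRequest operation."""
    return {
        "domainIdentifier": domain_id,
        "identifier": request_id,
        "decisionComment": comment,
    }


def call_datazone(operation, params):
    """Invoke a DataZone operation by name (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builders above work offline

    client = boto3.client("datazone")
    return getattr(client, operation)(**params)


# Hypothetical identifiers, for illustration only
subscription = build_subscription_request(
    domain_id="dzd_example",
    listing_id="bank-listing-id",
    project_id="banking-consumer-ml-project-id",
    reason="Build a term deposit prediction model",
)
approval = build_approval("dzd_example", "example-request-id", "Approved per data access policy")
```

The data scientist would run call_datazone("create_subscription_request", subscription), and the data publisher would run call_datazone("accept_subscription_request", approval) with the request ID returned by the first call.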

On the Assets page, the Bank dataset that the data scientist subscribed to is now visible.

  6. Under Applications in the navigation pane, choose Canvas, then choose Open Canvas to launch SageMaker Canvas from SageMaker Studio.

  7. Choose Data Wrangler in the navigation pane.
  8. On the Import and prepare dropdown menu, choose Tabular.

SageMaker Data Wrangler simplifies the process of data preparation and feature engineering, and enables the completion of each step of the data preparation workflow (including data selection, cleansing, exploration, visualization, and processing at scale) from a single visual interface.

  9. For Select a data source, choose Athena.

Athena is a serverless, interactive analytics service that provides a simplified and flexible way to analyze petabytes of data where it lives. Because the data source for the banking dataset is a database created in the AWS Glue Data Catalog using an AWS Glue crawler, the data is queried using Athena in SageMaker Data Wrangler. With this step, the data scientist can import the data into the Data Wrangler tool to perform feature engineering and prepare the data for ML modeling.
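To spot-check the subscribed table outside of SageMaker Canvas, you could run the same kind of Athena query from code. The following is a minimal sketch using Boto3; the database and table names (bankmarketing and bank) are the ones shown in this walkthrough, and the S3 output location is a hypothetical placeholder.

```python
def build_athena_query(database, table, output_location, limit=10):
    """Keyword arguments for the Athena StartQueryExecution API."""
    sql = f'SELECT * FROM "{database}"."{table}" LIMIT {int(limit)}'
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def start_query(params):
    """Submit the query and return its execution ID (requires boto3 and credentials)."""
    import boto3  # imported lazily so the builder above works offline

    athena = boto3.client("athena")
    return athena.start_query_execution(**params)["QueryExecutionId"]


# The output location is a hypothetical bucket, for illustration only
preview_query = build_athena_query(
    database="bankmarketing",
    table="bank",
    output_location="s3://amzn-s3-demo-bucket/athena-results/",
)
```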

  10. Expand bankmarketing and drag and drop the bank dataset into the canvas.

SageMaker Canvas loads the selected dataset in the Import preview section. The banking dataset contains information about bank clients such as age, job, marital status, education, credit default status, and details about the marketing campaign contacts like communication type, duration, number of contacts, and outcome of the previous campaign.

  11. Choose Import to import the dataset into SageMaker Data Wrangler.

A new data flow is created on the Data Wrangler console.

  12. Choose Get data insights to identify potential data quality issues and get recommendations.

  13. In the Create analysis pane, provide the following information:
    1. For Analysis type, choose Data Quality And Insights Report.
    2. For Analysis name, enter a name.
    3. For Problem type, select Classification.
    4. For Target column, enter y.
    5. For Data size, select Sampled dataset (20k).
    6. Choose Create.

You can review the generated Data Quality and Insights Report to gain a deeper understanding of the data, including statistics, duplicates, anomalies, missing values, outliers, target leakage, data imbalance, and more. If you’re satisfied with the data based on the generated report, you can continue with the data scientist workflow. Refer to Accelerate data preparation for ML in Amazon SageMaker Canvas for a deeper understanding of the process to prepare data for end-to-end model building.
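To make a few of the report's checks concrete, the following is a small plain-Python sketch of three of them: missing values, duplicate rows, and target class balance. It is not how the Data Quality and Insights Report is implemented, and the sample rows are invented for illustration; only the column names (age, job, housing, y) come from the banking dataset described in this post.

```python
from collections import Counter


def quality_summary(rows, target):
    """Summarize missing values, duplicate rows, and target balance."""
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            # Treat None and blank strings as missing
            if value is None or str(value).strip() == "":
                missing[column] += 1

    # Rows are duplicates when every column has the same value
    seen = Counter(tuple(sorted(row.items())) for row in rows)
    duplicates = sum(count - 1 for count in seen.values())

    balance = Counter(row[target] for row in rows)
    return {
        "rows": len(rows),
        "missing": dict(missing),
        "duplicates": duplicates,
        "target_balance": dict(balance),
    }


# Invented sample rows, for illustration only
sample = [
    {"age": 41, "job": "admin.", "housing": "yes", "y": "no"},
    {"age": 35, "job": "", "housing": "no", "y": "no"},
    {"age": 41, "job": "admin.", "housing": "yes", "y": "no"},
    {"age": 52, "job": "retired", "housing": "no", "y": "yes"},
]
summary = quality_summary(sample, target="y")
```

A skewed target_balance like the one in this toy sample is the kind of data imbalance the report flags before you move on to model building.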

  14. On the options menu (three dots), choose Create model to create a dataset.

  15. Enter a name for the dataset (for example, Banking-Customer-DataSet), then choose Export.

After the dataset is exported, a confirmation message is displayed on the console.

  16. Choose Create model to continue.

The exported dataset is also visible on the Datasets page on the SageMaker Canvas console. Here, you can alternatively select the dataset and choose Create a model to continue.

  17. In the Create new model section, provide the following information:
    1. For Model name, enter a name for the model (for example, Banking-Customer-Prediction-Model).
    2. For Problem type, select Predictive analysis.
    3. Choose Create.

The objective of the model is to predict whether a customer is likely to subscribe for the bank’s term deposit (variable y).

  18. On the Build tab, for Target column, choose the column that the model intends to predict.
  19. Choose Preview model.

The Preview model option runs a quick build of the binary classification model for a subset of data for 10–15 minutes to preview the outcome before running the full build, which typically takes around 4 hours or longer. Optionally, you can choose the Configure model option to customize the ML model.

With the Configure model option, you can customize the model type, objective metric, training method, and training/testing data split, and set limits on model creation job runtime.

SageMaker Canvas runs the preview model and displays the outcome that shows the estimated accuracy (%) and a list of dataset features in descending order of importance. You can observe that columns duration, pdays, month, and housing are the dominant features that impact the model’s prediction.

Optionally, you can choose the View all option on the Build tab to get a full list of options to perform feature transformation and data wrangling, such as dropping unimportant columns, dropping duplicate data, replacing missing values, changing data types, and combining columns to create new columns. This allows you to perform feature engineering before building the model.

  20. Choose Standard build to start the model building process.

You can monitor the progress of model creation.

When the model is complete, the model status is shown along with Overview, Scoring, and Advanced metrics options.

You can review the model status and test the model on the Predict tab. With the prediction option, you can perform either a batch or single prediction and test the model.

  21. On the options menu (three dots), choose Add to Model Registry to register the model using Amazon SageMaker Model Registry.

  22. Enter a group name (for this post, canvas-Banking-Customer-Prediction-Model) and choose Add.

Subsequent builds of the ML model are versioned and are stored under the same group name in the SageMaker Studio model registry.

  23. On the SageMaker Studio console, choose Models in the navigation pane to view the model you just added to the model registry.
  24. On the Model Groups tab, select the published model version and on the options menu (three dots), choose Update model status.

  25. For Status, choose Approved, then choose Save and update.

  26. Select the approved model and on the options menu (three dots), choose Publish to asset catalog.
  27. After the status is updated, choose View asset to view the published asset.

Alternatively, choose Assets in the navigation pane and on the Asset catalog tab, view the published model by searching the catalog or filtering by the asset type.

The published ML model is also accessible from the Amazon DataZone data portal. Navigate to the Banking-Consumer-ML project and choose Published data to view the details of the ML model published from SageMaker Canvas.

The published model can also be subscribed to from other projects from the Amazon DataZone domain.
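Teams that automate model governance can perform the approval step from this workflow through the SageMaker API instead of the console. The following is a minimal sketch using Boto3 that approves the newest version in a model package group; the group name is the one used in this post, and the approval description is a hypothetical example. Publishing to the asset catalog is not covered here and is left to the console steps described earlier.

```python
def build_approval_request(model_package_arn, description=""):
    """Keyword arguments for the SageMaker UpdateModelPackage API."""
    request = {
        "ModelPackageArn": model_package_arn,
        "ModelApprovalStatus": "Approved",
    }
    if description:
        request["ApprovalDescription"] = description
    return request


def approve_latest_version(group_name, description=""):
    """Approve the newest model version in a group (requires boto3 and credentials)."""
    import boto3  # imported lazily so the builder above works offline

    sagemaker = boto3.client("sagemaker")
    packages = sagemaker.list_model_packages(
        ModelPackageGroupName=group_name,
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )["ModelPackageSummaryList"]
    if not packages:
        raise ValueError(f"No model versions found in group {group_name!r}")
    request = build_approval_request(packages[0]["ModelPackageArn"], description)
    return sagemaker.update_model_package(**request)


# Hypothetical ARN, for illustration only
example_request = build_approval_request(
    "arn:aws:sagemaker:us-east-1:111122223333:model-package/"
    "canvas-Banking-Customer-Prediction-Model/1",
    description="Reviewed by the data science lead",
)
```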

Clean up

We recommend deleting any potentially unused resources to avoid incurring unexpected costs. For example, you can delete the Amazon DataZone domain and log out of SageMaker Canvas to automatically delete the workspace instance.

Conclusion

In this post, we covered an end-to-end integration of SageMaker Canvas and Amazon DataZone, including infrastructure controls, sharing and consuming data assets, and creating and publishing ML models. This integration provides a powerful solution for data governance, collaboration, and reusability across ML projects. With Amazon DataZone, data administrators can publish and govern access to data assets, and data scientists can discover, subscribe to, and consume those datasets within SageMaker Canvas. This streamlined workflow enables efficient collaboration between data providers and consumers. Moreover, the ability to publish trained ML models back to the Amazon DataZone catalog promotes reusability, allowing models to be discovered and subscribed to by other teams or projects within the organization. This approach reduces duplication of effort and fosters knowledge sharing across the ML lifecycle.

You can extend this solution to generative artificial intelligence (AI) use cases as well. For example, large language models (LLMs) or other FMs trained on curated datasets can be published and shared through Amazon DataZone, enabling different teams to fine-tune or adapt these models for their specific applications while adhering to robust governance policies. This empowers organizations to unlock the full potential of ML and generative AI while maintaining control and oversight over their data assets.

Try out the new Amazon DataZone integration with SageMaker Canvas today to search and discover the published datasets from an Amazon DataZone project, subscribe to and consume data from SageMaker Canvas, perform feature engineering, build an ML model, and then publish the model back to the Amazon DataZone project.


About the authors

Aparajithan Vaidyanathan is a Principal Enterprise Solutions Architect at AWS. He helps enterprise customers migrate and modernize their workloads on the AWS Cloud. He is a Cloud Architect with 24+ years of experience designing and developing enterprise, large-scale, and distributed software systems. He specializes in machine learning and data analytics, with a focus on data and feature engineering. He is an aspiring marathon runner, and his hobbies include hiking, bike riding, and spending time with his wife and two boys.

Ajjay Govindaram is a Senior Solutions Architect at AWS. He works with strategic customers who are using AI/ML to solve complex business problems. His experience lies in providing technical direction as well as design assistance for modest to large-scale AI/ML application deployments. His knowledge ranges from application architecture to big data, analytics, and machine learning. He enjoys listening to music while resting, experiencing the outdoors, and spending time with his loved ones.

Siamak Nariman is a Senior Product Manager at AWS. He is focused on AI/ML technology, ML model management, and ML governance to improve overall organizational efficiency and productivity. He has extensive experience automating processes and deploying various technologies.

Huong Nguyen is a Sr. Product Manager at AWS. She is leading the ML data preparation for SageMaker Canvas and SageMaker Data Wrangler, with 15 years of experience building customer-centric and data-driven products.

Accelerate performance using a custom chunking mechanism with Amazon Bedrock

This post is co-written with Kristina Olesova, Zdenko Esetok, and Selimcan akar from Accenture.

In today’s data-driven world, organizations often face the challenge of extracting structured information from unstructured PDF documents. These PDFs can contain a myriad of elements, such as images, tables, headers, and text formatted in various styles, making it difficult to parse and analyze the data efficiently.

Additionally, the performance of chatbots and other natural language processing (NLP) applications depends heavily on the chunking strategy employed. Improper chunking can lead to loss of context, resulting in hallucinations and inaccurate responses. Also, the performance of language models is further influenced by the chunk size, where smaller chunks provide more granular information but struggle with generalization, whereas larger chunks might miss important details.

This post explores how Accenture used the customization capabilities of Knowledge Bases for Amazon Bedrock to incorporate their data processing workflow and custom logic into a custom chunking mechanism that enhances the performance of Retrieval Augmented Generation (RAG) and unlocks the potential of PDF data.

Solution overview

The Accenture team created a knowledge base with the financial results of Accenture for every quarter from 2020–2024. These documents contained images, tables, text stored in different formats, and other noise elements.

In this use case, we wanted to extract granular information contained in the tables and also preserve the good generalization capabilities of foundation models (FMs) to respond to general questions about financial results.

After testing, we found that the search mechanism wasn’t able to correctly retrieve the information for the years and quarters specified in the prompt. The following screenshot shows an example where the query was for information from the first quarter of 2023, but the search mechanism returned information from the first quarter of 2020.

We couldn’t extract the correct chunk of data using different search strategies or by changing the number of retrieved chunks. After more rigorous testing, we identified problems with parsing the tabular information and retrieving the correct data. Because the issues were related to the inability of the search algorithm to select the correct chunks, we decided to change the chunking strategy and try the new features in Amazon Bedrock.

The architectural flow of the updated solution is as follows:

  1. Begin by creating a data source with all the data stored in Amazon Simple Storage Service (Amazon S3) or another database. This can include custom PDFs with tables, forms, and other complex elements.
  2. Run Amazon Textract on the PDFs stored in your data source. Amazon Textract is a highly accurate service that can extract text, tables, and other data from virtually any document.
  3. Create chunks based on the extractions from paragraphs in the Amazon Textract output. For every chunk, include additional metadata such as chapter titles and document names to preserve context.
  4. Embed the chunked files into vectors using the console for Knowledge Bases for Amazon Bedrock. Select no chunking while creating a vector representation of chunks.
  5. Set up the system prompt, search strategies, number of chunks, and metadata filtering if applicable and ask the user for a question.
  6. Use the vector-search feature of Amazon OpenSearch Service to select the embedded chunks most similar to the user query (prompt).
  7. Call an FM from Amazon Bedrock on the chunks provided by OpenSearch Service and get the answer.

The steps in the workflow are orchestrated using AWS Lambda, as shown in the following diagram.

The chunking mechanism uses Amazon Textract to detect paragraphs, tables, images, chapter titles, and other PDF layout elements to improve the chunking (without splitting the text in the middle of a sentence or paragraph), eliminate noise, and provide more context for metadata generation. We can use this metadata directly during filtration or as a hint in a prompt template to improve the accuracy of the generated response. Using the specified logic for every PDF element, we can take the correct actions depending on the category of the element.

The main PDF elements are as follows:

  • Tables – Tables are the most difficult layout elements in a PDF. The information can be correctly extracted only when headers and column names are correctly identified. This is difficult to achieve with fixed size chunking because there is no way to guarantee that headers will be present in the chunk, together with all the row information. We can use table detection to extract a table and save it in a CSV file, or even directly use it in a database as a data source for agents.
  • Images – If the text contains images connected to user instructions, the images can be detected and tagged during preprocessing. Later, these images can be stored in Amazon S3 and displayed in a chat window using relevant tags.
  • Page numbers, headers, and footers – This text adds no useful information for RAG models and can confuse them. Storing page headers and footers also takes up space in the vector database and incurs cost with negligible benefit.
  • Chapter titles and subtitles – In many documents, chapter titles describe the context of the chapter. This information can help us tag the chunks using metadata, or directly include this information in the filtering process, thereby improving the accuracy and speed of extraction.
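The per-element handling described in this list can be pictured as a lookup from Amazon Textract layout block types to an action. The following is an illustrative sketch only: the layout type names are Textract layout block types, but the action names and the plan_page helper are hypothetical and not part of Accenture's implementation.

```python
# Amazon Textract layout block types mapped to a (hypothetical) handling action
LAYOUT_ACTIONS = {
    "LAYOUT_TABLE": "export_csv",
    "LAYOUT_FIGURE": "tag_image",
    "LAYOUT_PAGE_NUMBER": "drop",
    "LAYOUT_HEADER": "drop",
    "LAYOUT_FOOTER": "drop",
    "LAYOUT_TITLE": "set_main_topic",
    "LAYOUT_SECTION_HEADER": "set_secondary_topic",
    "LAYOUT_TEXT": "append_text",
    "LAYOUT_LIST": "append_text",
}


def action_for(layout_type):
    """Return the action for a layout type; unknown types are kept as text."""
    return LAYOUT_ACTIONS.get(layout_type, "append_text")


def plan_page(layout_types):
    """Pair each layout element with its action, discarding noise elements."""
    return [
        (layout_type, action_for(layout_type))
        for layout_type in layout_types
        if action_for(layout_type) != "drop"
    ]


page_plan = plan_page(
    ["LAYOUT_HEADER", "LAYOUT_TITLE", "LAYOUT_TABLE", "LAYOUT_TEXT", "LAYOUT_PAGE_NUMBER"]
)
```

Keeping the mapping in one table makes it easy to change how a category is treated, for example tagging figures instead of dropping them, without touching the chunking loop.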

Use custom chunking with Knowledge Bases in Amazon Bedrock

In this section, we demonstrate how to use the proposed custom chunking solution.

Note: The content and code provided are for informational purposes only. You should do an independent assessment before running anything in response to the information that follows.

This involves the following steps:

  1. Specify the custom metadata for every financial document that you want to include in the analysis. For this post, we specified the information for quarter, fiscal year, company, and other fields:
    metadata = {
        "metadataAttributes": {
            "document_name": document_name.split(".pdf")[0],
            "fiscal_year": fiscal_year,
            "quarter": quarter,
            "main_topic": "",
            "secondary_topic": " ",
            "format": "Text"
        }
    }
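  2. Split the PDF files into multiple images or single PDF files. It’s important to have high resolution to properly distinguish all the characters within the files.
  3. Invoke Amazon Textract to detect the layout items and table items:
    from PIL import Image
    from textractor.data.constants import TextractFeatures

    def textract_data(self, output):
        # Open the page image produced in the previous step
        image = Image.open(output)

        # self.extractor is assumed to be a textractor.Textractor instance
        document = self.extractor.analyze_document(
            file_source=image,
            features=[TextractFeatures.LAYOUT, TextractFeatures.TABLES],
            save_image=True
        )

        # Save the tables first, then the remaining text
        new_layout = self.save_table(document)
        self.save_text(new_layout)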
  4. Save the table information. In this example, we’re using Anthropic’s Claude models, which are able to correctly parse files in CSV format. Export all the tables detected as a CSV, and save the table names and specified table format as additional metadata:
    import json

    def save_table(self, document):
        table_count = 0
        if document.tables:
            for layout in document.layouts:
                if layout.layout_type == 'LAYOUT_TITLE':
                    self.metadata["metadataAttributes"]["main_topic"] = layout.text
                elif layout.layout_type == 'LAYOUT_SECTION_HEADER':
                    self.metadata["metadataAttributes"]["secondary_topic"] = layout.text
                elif layout.layout_type == 'LAYOUT_TABLE':
                    table = document.tables[table_count]
                    df_table = table.to_pandas()
                    self.metadata["metadataAttributes"]["format"] = "Table"

                    t_file = self.tables_directory + f'/{self.document_name}_table_p{self.page_number}_t{table_count}.csv'

                    # Write the table as CSV and its metadata as a sidecar JSON file
                    with open(t_file, 'w') as csv_file:
                        csv_file.write(df_table.to_csv(index=False, header=False))
                    with open(t_file + ".metadata.json", 'w') as json_file:
                        json.dump(self.metadata, json_file)
                    table_count = table_count + 1
  5. Further processing is required for information other than tables and images. We create metadata tags containing the information about main chapter titles and subtitles. This information can help you boost performance using metadata filtering or during vector search using a system prompt. For every chunk of data, specify within the metadata to which chapter and subchapter it belongs. Ideally, you should always have one chunk of data for every subchapter, but this isn’t always possible. Many subchapters are too long to be parsed with one chunk. In such cases, you can split the text after the paragraph and use the same metadata for another chunk:
    for layout in document:

        if layout.layout_type == 'LAYOUT_TITLE':
            self.metadata["metadataAttributes"]["main_topic"] = layout.text
        elif layout.layout_type == 'LAYOUT_SECTION_HEADER':  # split text at the beginning of every subchapter
            self.create_chunk()  # save previous chunk in chunk_dic
            for chunk in self.chunk_dic:  # save all of the chunks for given chapter
                self.metadata["metadataAttributes"]["format"] = "Text"
                with open(chunk["output_path"], 'w') as text_file:  # create txt file with specified text
                    text_file.write(chunk["text"] + str(chunk['metadata']))
                with open(chunk["output_path"] + ".metadata.json", 'w') as json_file:  # create metadata file for given chunk
                    json.dump(chunk['metadata'], json_file)
            self.subtitle = []
            self.chunk_dic = []

            self.metadata["metadataAttributes"]["secondary_topic"] = layout.text

        elif layout.layout_type in ['LAYOUT_LIST', 'LAYOUT_TEXT']:
            # if the text within chapter is too big, split it at the end of paragraph
            if (len(self.new_chunk + layout.text) > chunk_max) and (len(self.new_chunk) > chunk_min):
                self.create_chunk()
            self.new_chunk = self.new_chunk + layout.text

The benefit of this method is that, even if the text continues on the next page, this mechanism is able to assign it to the correct chunk (if the text is within the limited vector space). This helps prevent splitting the text in the middle of a sentence, which can often lead to hallucinations.
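The paragraph-boundary rule can be isolated into a small standalone function. The following is a simplified sketch that reuses the chunk_min and chunk_max names and the same size condition as the preceding code; the function itself and the sample paragraphs are hypothetical and stand in for the text returned by Amazon Textract.

```python
def chunk_paragraphs(paragraphs, chunk_min, chunk_max):
    """Group whole paragraphs into chunks without splitting any paragraph."""
    chunks = []
    current = ""
    for paragraph in paragraphs:
        # Close the current chunk only at a paragraph boundary, and only
        # when it already holds more than chunk_min characters
        if len(current + paragraph) > chunk_max and len(current) > chunk_min:
            chunks.append(current)
            current = ""
        current += paragraph
    if current:
        chunks.append(current)
    return chunks


# Placeholder paragraphs of 40, 40, 40, and 10 characters
paragraphs = ["a" * 40, "b" * 40, "c" * 40, "d" * 10]
chunks = chunk_paragraphs(paragraphs, chunk_min=30, chunk_max=100)
chunk_lengths = [len(chunk) for chunk in chunks]
```

Because paragraphs are never split, a single paragraph longer than chunk_max still ends up as one oversized chunk, so a separate size check before upload remains useful.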

  6. After the text is split, create two files for every chunk:
    1. A .txt chunk file together with the metadata string.
    2. A metadata.json file that can be used with the knowledge base metadata and filtering.
  7. When the split is complete, upload the files to Amazon S3 and continue with creating the knowledge base using the no chunking option.

When using the custom chunk option, keep in mind the maximum size of possible chunks. If the text chunk is too large, the vectorization of the files will fail, and the file won’t be available for the knowledge base.
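One way to guard against oversized chunks is to check the size when writing each chunk and its metadata file. The following is a minimal sketch under stated assumptions: the 2,000-character limit is a hypothetical placeholder that you should derive from your embedding model's input limit, and the file name, metadata values, and sample text are invented for illustration.

```python
import json
import os
import tempfile

MAX_CHUNK_CHARS = 2000  # hypothetical limit; set it from your embedding model


def write_chunk(directory, name, text, metadata, max_chars=MAX_CHUNK_CHARS):
    """Write <name>.txt and <name>.txt.metadata.json, rejecting oversized chunks."""
    body = text + str(metadata)  # the metadata string is appended, as in the post
    if len(body) > max_chars:
        raise ValueError(f"chunk {name!r} has {len(body)} characters; limit is {max_chars}")
    text_path = os.path.join(directory, name + ".txt")
    with open(text_path, "w", encoding="utf-8") as text_file:
        text_file.write(body)
    with open(text_path + ".metadata.json", "w", encoding="utf-8") as json_file:
        json.dump(metadata, json_file)
    return text_path


# Invented values, for illustration only
metadata = {"metadataAttributes": {"fiscal_year": 2023, "quarter": "Q3", "format": "Text"}}
with tempfile.TemporaryDirectory() as tmp:
    write_chunk(tmp, "acn_q3_c0", "Revenues grew year over year.", metadata)
    written_files = sorted(os.listdir(tmp))
```

Failing early with a clear error is easier to debug than discovering after ingestion that a file is missing from the knowledge base.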

Benefits of custom chunking

Custom chunking offers the following benefits:

  • Context preservation – By chunking text based on chapters or subchapters, you can make sure that the context of each section remains relevant throughout the chunk, resulting in more accurate vector representations and reducing noise.
  • Flexible chunk sizes – Custom chunking allows you to dynamically adjust the chunk sizes, addressing the challenge of selecting the optimal chunk size for different use cases.
  • Improved retrieval performance – With custom chunking and the advanced retrieval capabilities of Amazon Bedrock such as metadata filtering, you can significantly enhance the performance of your retrieval frameworks, enabling faster and more accurate insights.
  • Seamless integration – Amazon Bedrock seamlessly integrates with other AWS services, such as Amazon S3 and Amazon Textract, providing a streamlined solution for data extraction, organization, and analysis.

Metadata filtering compared to system prompts

Metadata filtering is a powerful feature that significantly enhances the search algorithm’s performance. By using metadata filtering to specify fiscal years and quarters, we achieved notable improvements in response accuracy. Currently, the Amazon Bedrock console requires users to have prior knowledge of metadata filter names and their corresponding values. As of this writing, direct specification of these filters through prompts isn’t supported. Consequently, in practical applications, users would benefit from guidance or hints to assist them in selecting appropriate filter values.

The following figure shows an example of enabling metadata filtering for the same model and chunking logic. In the first question, using only the prompts, the search algorithm failed to provide chunks from the correct documents. In the second question, we filtered by fiscal year (2023) and quarter (Q3). The output of the search algorithm was just one chunk, but the correct one.
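Outside the console, the same kind of filter can be passed to the Retrieve API of Knowledge Bases for Amazon Bedrock. The following is a minimal sketch using Boto3; the knowledge base ID and the question are hypothetical, and the value types (a number for fiscal_year, a string for quarter) are assumptions that must match how the values were written to the metadata files.

```python
def build_retrieval_config(fiscal_year, quarter, number_of_results=20):
    """Retrieval configuration with hybrid search and a metadata filter."""
    return {
        "vectorSearchConfiguration": {
            "numberOfResults": number_of_results,
            "overrideSearchType": "HYBRID",
            "filter": {
                "andAll": [
                    {"equals": {"key": "fiscal_year", "value": fiscal_year}},
                    {"equals": {"key": "quarter", "value": quarter}},
                ]
            },
        }
    }


def retrieve_chunks(knowledge_base_id, question, config):
    """Query the knowledge base (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works offline

    client = boto3.client("bedrock-agent-runtime")
    response = client.retrieve(
        knowledgeBaseId=knowledge_base_id,
        retrievalQuery={"text": question},
        retrievalConfiguration=config,
    )
    return response["retrievalResults"]


# Matches the filter used in the second question: fiscal year 2023, quarter Q3
config = build_retrieval_config(fiscal_year=2023, quarter="Q3")
```

An application could collect the fiscal year and quarter from dropdown menus and build this configuration for the user, which addresses the need for hints about valid filter values.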

Performance comparison

We compared fixed chunking, custom chunking, and custom chunking with prompts. For vectorization, we used the Amazon Titan Embeddings Text v1 model for custom chunking, baseline, and metadata filtering. We performed additional knowledge base testing with Cohere. We performed all the testing with the Claude 3 Sonnet model and hybrid search, with a maximum of 20 retrieved results.

We tested the performance of the models on several tasks:

  • Table information – Information only extractable from tables.
  • Long questions – Summarizing chapters using multiple chunks. This is a difficult task for models with a small embedding window.
  • Year-specific questions – The answers are very short and clear, but the correct extraction relies on the capability of vector search to determine the time span from the user question and extract the chunk corresponding to that time span.

We evaluated the performance manually by checking the information generated by the model against the source data. The following screenshots show some example questions and answers generated on two different knowledge bases for the year_sensitive class.
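If you record each manual judgment as a pair of question class and verdict, the per-class accuracy is a simple tally. The following is a small sketch of that bookkeeping; the judgments are invented for illustration and are not results from this evaluation, and the class labels other than year_sensitive are hypothetical.

```python
from collections import defaultdict


def accuracy_by_class(judgments):
    """Compute accuracy per question class from (class, is_correct) pairs."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for question_class, is_correct in judgments:
        totals[question_class] += 1
        correct[question_class] += int(is_correct)
    return {name: correct[name] / totals[name] for name in totals}


# Invented judgments, for illustration only
judgments = [
    ("table", True),
    ("table", False),
    ("long", True),
    ("year_sensitive", True),
    ("year_sensitive", True),
    ("year_sensitive", False),
    ("year_sensitive", True),
]
scores = accuracy_by_class(judgments)
```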

The first example uses custom chunking with an Amazon Titan Embeddings model.

The next example uses Cohere with fixed chunking.

We used the prompt template feature released in April 2024 to focus the model on detailed information regarding the fiscal years and quarters. This information was the same as it was in the metadata JSON file, and it gives the models some guidelines about what information is important for extracting the valid chunks. The following is an example of the system prompt:

User:

You are a question answering agent specilizing in companies financial statements and reviews. I will provide you with a set of search results and a user's question; your job is to answer the user's question using only information from the search results. Before answering the question, think step by step and verify your response based on the metadataAttributes provided in {} brackets. If provided in the user’s question, always check that the fiscal_year and quartal match with the values provided. In case of the user asking specific questions about financial outcome of a specific group (such as revenues or net income) focus on search results that have "Table" specified in the format tag in metadataAttributes. To improve the results, you can verify the values of main and secondary topics. The values should be related to the user’s questions.

Here are the search results in numbered order:
$search_results$

Here is the user's question:
<question>
$query$
</question>
$output_format_instructions$

Assistant:
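One possible way to supply a template like this programmatically is through the generationConfiguration of the RetrieveAndGenerate API. The following is a minimal sketch under stated assumptions: the helper name, the knowledge base ID, the Region in the model ARN, and the shortened template text are all placeholders for illustration, not the configuration used in this post.

```python
def build_rag_request(question, knowledge_base_id, model_arn, prompt_template):
    """Assemble keyword arguments for a RetrieveAndGenerate call with a custom prompt."""
    # Without this placeholder the retrieved chunks would never reach the model.
    if "$search_results$" not in prompt_template:
        raise ValueError("prompt template must contain $search_results$")
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
                "generationConfiguration": {
                    "promptTemplate": {"textPromptTemplate": prompt_template}
                },
            },
        },
    }


# Abbreviated stand-in for the full template shown above.
TEMPLATE = (
    "You are a question answering agent. Answer using only the search results.\n"
    "Here are the search results in numbered order:\n$search_results$\n"
    "Here is the user's question:\n<question>\n$query$\n</question>\n"
    "$output_format_instructions$"
)

request = build_rag_request(
    "What was the net income in Q3 2023?",
    knowledge_base_id="EXAMPLEKBID",  # hypothetical ID
    model_arn=(
        "arn:aws:bedrock:us-west-2::foundation-model/"
        "anthropic.claude-3-sonnet-20240229-v1:0"  # example Region and model
    ),
    prompt_template=TEMPLATE,
)
```

With a boto3 bedrock-agent-runtime client, the dictionary could be expanded into the call as client.retrieve_and_generate(**request).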

The adjusted prompt template improved the accuracy of the results. For the knowledge base created with an Amazon Titan Embeddings model and fixed chunking, the accuracy of extracted results increased to 70 percent. This number served as a baseline for our evaluation.

After switching from fixed chunking to custom chunking with Amazon Titan, the accuracy of retrieved results increased by 17 percent.

Interestingly, Cohere led to response accuracy similar to that of custom chunking, but its summaries (long answers) were slightly less rich in detail.

Summarization means condensing a long piece of text while retaining its essential information and meaning by capturing the main points, key ideas, and important details.

The following screenshots show some sample answers in the long answers category. The first example is the output from Cohere.

The following is the output using custom chunking.

Cohere uses smaller chunks of text for embedding, which makes it more precise, but it struggles to provide a detailed summary. The responses aren’t inaccurate, but they often miss important details and the created answers are slightly ambiguous.

The biggest advantage of custom chunking is that saving the chunks with variable size helped us improve the accuracy of the model (compared to the original Amazon Titan Embeddings model). We also preserved the good summarization capabilities of the models by using bigger chunks when possible. Overall, the best performance was achieved using metadata filtering.

We applied metadata filtering only to the questions where it was applicable (where the user was asking about the specific year or quarter). It didn’t help in cases where the question was asking the model to extract information from multiple years (like the number of employees in every year or the revenue in every quarter). However, it’s still a great tool that can improve results significantly.
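The decision of when a filter is applicable could be approximated with a simple heuristic. The sketch below is an illustration rather than the method used in this post: it applies a filter only when the question names exactly one year, and skips filtering for multi-year or open-ended questions. The function name, the regular expressions, and the returned field names are assumptions.

```python
import re

YEAR_RE = re.compile(r"\b(20\d{2})\b")
QUARTER_RE = re.compile(r"\bQ([1-4])\b", re.IGNORECASE)


def detect_filter_values(question):
    """Return filter values when the question targets a single fiscal year.

    Returns None when no year or several years are mentioned, because a
    single-year filter would then hide chunks the answer needs.
    """
    years = set(YEAR_RE.findall(question))
    quarters = {f"Q{digit}" for digit in QUARTER_RE.findall(question)}
    if len(years) != 1:
        return None
    quarter = next(iter(quarters)) if len(quarters) == 1 else None
    return {"fiscal_year": int(next(iter(years))), "quarter": quarter}


print(detect_filter_values("What was the net income in Q3 2023?"))
print(detect_filter_values("How did revenue change between 2022 and 2023?"))
```

A heuristic like this is deliberately conservative: when in doubt it returns None, so the search falls back to unfiltered retrieval.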

Clean up

As you conclude your journey through setting up and using the knowledge base in this post, it’s essential to clean up the resources you created, so your environment is clean and cost-efficient.

Decommission OpenSearch Service

First, you need to decommission OpenSearch Service. This process involves safely shutting down your OpenSearch instances to prevent any unintended data retention or unnecessary costs:

  1. On the OpenSearch Service console, navigate to your domain.
  2. Delete the domain and confirm the deletion when prompted.

Empty and delete the S3 bucket

Next, delete the S3 bucket that stored your data:

  1. On the Amazon S3 console, navigate to your S3 bucket.
  2. Delete the files to empty the bucket.
  3. Delete the bucket, confirming the deletion when prompted to permanently remove the storage resource.

Delete the Lambda function

Finally, you need to delete the Lambda function created for this project:

  1. On the Lambda console, select your function and choose Delete.
  2. Confirm the deletion to remove the function and free up resources.

By following these steps, you have cleaned up the resources created during this post, maintaining a lean and cost-effective AWS environment. This not only helps in managing your resources better, but also makes sure that you’re only paying for what you use.
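If you prefer to script the cleanup, the console steps above can be sketched with the AWS SDK for Python. This is a minimal sketch that assumes an unversioned bucket and takes already-created clients as arguments; the function name and the resource names you pass in are your own.

```python
def clean_up(opensearch, s3, lambda_client, domain_name, bucket_name, function_name):
    """Delete the OpenSearch Service domain, the S3 bucket, and the Lambda function."""
    opensearch.delete_domain(DomainName=domain_name)

    # A bucket must be empty before it can be deleted.
    # Assumption: versioning is disabled, so deleting current objects is enough.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name):
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            s3.delete_objects(Bucket=bucket_name, Delete={"Objects": keys})
    s3.delete_bucket(Bucket=bucket_name)

    lambda_client.delete_function(FunctionName=function_name)
```

The clients could be created with boto3.client("opensearch"), boto3.client("s3"), and boto3.client("lambda"). Because every call is destructive, double-check the names before running anything like this.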

Conclusion

By combining the power of Knowledge Bases for Amazon Bedrock with custom chunking mechanisms and the advanced data extraction capabilities of Amazon Textract, organizations can unlock the true potential of their PDF data. Furthermore, using a knowledge base with custom chunking across different models lets you evaluate those models quickly and holistically. This solution helps you achieve accurate and contextual responses, improves the performance of retrieval frameworks, and enables efficient data extraction from unstructured PDF documents.

The joint effort between Accenture and AWS discussed in this post builds on the 15-year strategic relationship between the companies and uses the same proven mechanisms and accelerators built by the Accenture AWS Business Group (AABG). Connect with the AABG team at accentureaws@amazon.com to drive business outcomes by transforming to an intelligent data enterprise on AWS.

For more information about generative AI on AWS using Amazon Bedrock or Amazon SageMaker, we recommend the following resources:

You can also sign up for the AWS generative AI newsletter, which includes educational resources, blogs, and service updates.

Thank you for following along, and happy coding!


About the Authors

Kristina Olesova works as a Data Scientist at Accenture. She is focused primarily on computer vision and generative AI. Outside of work, she likes to read books and hike in the mountains.

Zdenko Estok works as a cloud architect and DevOps engineer at Accenture. He works with AABG to develop and implement innovative cloud solutions, and specializes in infrastructure as code and cloud security. Zdenko likes to bike to the office and enjoys pleasant walks in nature.

Selimcan “Can” Sakar is a cloud-first developer and solution architect at Accenture with a focus on artificial intelligence and a passion for watching models converge.

Shikhar Kwatra is a Sr. Partner Solutions Architect at Amazon Web Services, working with leading Global System Integrators. He has earned the title of one of the Youngest Indian Master Inventors with over 500 patents in the AI/ML and IoT domains. Shikhar aids in architecting, building, and maintaining cost-efficient, scalable cloud environments for the organization, and supports the GSI partners in building strategic industry solutions on AWS.

Marcelo Silva is a Principal Product Manager at Amazon Web Services leading strategy and growth for Knowledge Bases for Amazon Bedrock and Amazon Lex. His passion is helping customers harness the power of conversational AI and generative AI solutions to drive business outcomes and growth.


Migrate Amazon SageMaker Data Wrangler flows to Amazon SageMaker Canvas for faster data preparation


Amazon SageMaker Data Wrangler provides a visual interface to streamline and accelerate data preparation for machine learning (ML), which is often the most time-consuming and tedious task in ML projects. Amazon SageMaker Canvas is a low-code no-code visual interface to build and deploy ML models without the need to write code. Based on customers’ feedback, we have combined the advanced ML-specific data preparation capabilities of SageMaker Data Wrangler inside SageMaker Canvas, providing users with an end-to-end, no-code workspace for preparing data, and building and deploying ML models.

By abstracting away much of the complexity of the ML workflow, SageMaker Canvas enables you to prepare data, then build or use a model to generate highly accurate business insights without writing code. Additionally, preparing data in SageMaker Canvas offers many enhancements, such as page loads up to 10 times faster, a natural language interface for data preparation, the ability to view the data size and shape at every step, and improved replace and reorder transforms to iterate on a data flow. Finally, you can one-click create a model in the same interface, or create a SageMaker Canvas dataset to fine-tune foundation models (FMs).

This post demonstrates how you can bring your existing SageMaker Data Wrangler flows—the instructions created when building data transformations—from SageMaker Studio Classic to SageMaker Canvas. We provide an example of moving files from SageMaker Studio Classic to Amazon Simple Storage Service (Amazon S3) as an intermediate step before importing them into SageMaker Canvas.

Solution overview

The high-level steps are as follows:

  1. Open a terminal in SageMaker Studio Classic and copy the flow files to Amazon S3.
  2. Import the flow files into SageMaker Canvas from Amazon S3.

Prerequisites

In this example, we use a folder called data-wrangler-classic-flows as a staging folder for migrating flow files to Amazon S3. It is not necessary to create a migration folder, but in this example, the folder was created using the file system browser portion of SageMaker Studio Classic. After you create the folder, take care to move and consolidate relevant SageMaker Data Wrangler flow files together. In the following screenshot, three flow files necessary for migration have been moved into the folder data-wrangler-classic-flows, as seen in the left pane. One of these files, titanic.flow, is opened and visible in the right pane.

Copy flow files to Amazon S3

To copy the flow files to Amazon S3, complete the following steps:

  1. To open a new terminal in SageMaker Studio Classic, on the File menu, choose Terminal.
  2. With a new terminal open, you can supply the following commands to copy your flow files to the Amazon S3 location of your choosing (replacing NNNNNNNNNNNN with your AWS account number):
    cd data-wrangler-classic-flows
    target="s3://sagemaker-us-west-2-NNNNNNNNNNNN/data-wrangler-classic-flows/"
    aws s3 sync . "$target" --exclude "*" --include "*.flow"

The following screenshot shows an example of what the Amazon S3 sync process should look like. You will get a confirmation after all files are uploaded. You can adjust the preceding code to meet your unique input folder and Amazon S3 location needs. If you don’t want to create a folder, when you enter the terminal, simply skip the change directory (cd) command, and all flow files on your entire SageMaker Studio Classic file system will be copied to Amazon S3, regardless of origin folder.
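If you would rather script the copy step in Python, a rough equivalent might look like the following. This is a sketch, not part of the original walkthrough: the function names are hypothetical, and unlike aws s3 sync it uploads every matching file each time instead of skipping unchanged ones.

```python
from pathlib import Path


def plan_flow_uploads(source_dir, prefix="data-wrangler-classic-flows/"):
    """Map each .flow file under source_dir to the S3 key it would be uploaded to."""
    root = Path(source_dir)
    return {
        str(path): prefix + path.relative_to(root).as_posix()
        for path in sorted(root.rglob("*.flow"))
    }


def upload_flows(s3_client, bucket, source_dir, prefix="data-wrangler-classic-flows/"):
    """Upload the planned files with an S3 client such as boto3.client("s3")."""
    for local_path, key in plan_flow_uploads(source_dir, prefix).items():
        s3_client.upload_file(local_path, bucket, key)
```

Calling plan_flow_uploads on its own first is a cheap way to review which files would be copied before anything is sent to Amazon S3.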

After you upload the files to Amazon S3, you can validate that they have been copied using the Amazon S3 console. In the following screenshot, we see the original three flow files, now in an S3 bucket.

Import Data Wrangler flow files into SageMaker Canvas

To import the flow files into SageMaker Canvas, complete the following steps:

  1. On the SageMaker Studio console, choose Data Wrangler in the navigation pane.
  2. Choose Import data flows.
  3. For Select a data source, choose Amazon S3.
  4. For Input S3 endpoint, enter the Amazon S3 location you used earlier to copy files from SageMaker Studio to Amazon S3, then choose Go. You can also navigate to the Amazon S3 location using the browser below.
  5. Select the flow files to import, then choose Import.

After you import the files, the SageMaker Data Wrangler page will refresh to show the newly imported files, as shown in the following screenshot.

Use SageMaker Canvas for data transformation with SageMaker Data Wrangler

Choose one of the flows (for this example, we choose titanic.flow) to launch the SageMaker Data Wrangler transformation.

Now you can add analyses and transformations to the data flow using a visual interface (Accelerate data preparation for ML in Amazon SageMaker Canvas) or natural language interface (Use natural language to explore and prepare data with a new capability of Amazon SageMaker Canvas).

When you’re happy with the data, choose the plus sign and choose Create model, or choose Export to export the dataset to build and use ML models.

Alternate migration method

This post has provided guidance on using Amazon S3 to migrate SageMaker Data Wrangler flow files from a SageMaker Studio Classic environment. Phase 3: (Optional) Migrate data from Studio Classic to Studio provides a second method that uses your local machine to transfer the flow files. Furthermore, you can download single flow files from the SageMaker Studio tree control to your local machine, then import them manually in SageMaker Canvas. Choose the method that suits your needs and use case.

Clean up

When you’re done, shut down any running SageMaker Data Wrangler applications in SageMaker Studio Classic. To save costs, you can also remove any flow files from the SageMaker Studio Classic file browser, which is an Amazon Elastic File System (Amazon EFS) volume. You can also delete any of the intermediate files in Amazon S3. After the flow files are imported into SageMaker Canvas, the files copied to Amazon S3 are no longer needed.

You can log out of SageMaker Canvas when you’re done, then relaunch it when you’re ready to use it again.

Conclusion

Migrating your existing SageMaker Data Wrangler flows to SageMaker Canvas is a straightforward process that allows you to use the advanced data preparations you’ve already developed while taking advantage of the end-to-end, low-code no-code ML workflow of SageMaker Canvas. By following the steps outlined in this post, you can seamlessly transition your data wrangling artifacts to the SageMaker Canvas environment, streamlining your ML projects and enabling business analysts and non-technical users to build and deploy models more efficiently.

Start exploring SageMaker Canvas today and experience the power of a unified platform for data preparation, model building, and deployment!


About the Authors

Charles Laughlin is a Principal AI Specialist at Amazon Web Services (AWS). Charles holds an MS in Supply Chain Management and a PhD in Data Science. Charles works in the Amazon SageMaker service team where he brings research and voice of the customer to inform the service roadmap. In his work, he collaborates daily with diverse AWS customers to help transform their businesses with cutting-edge AWS technologies and thought leadership.

Dan Sinnreich is a Sr. Product Manager for Amazon SageMaker, focused on expanding no-code / low-code services. He is dedicated to making ML and generative AI more accessible and applying them to solve challenging problems. Outside of work, he can be found playing hockey, scuba diving, and reading science fiction.

Huong Nguyen is a Sr. Product Manager at AWS. She is leading the ML data preparation for SageMaker Canvas and SageMaker Data Wrangler, with 15 years of experience building customer-centric and data-driven products.

Davide Gallitelli is a Specialist Solutions Architect for AI/ML in the EMEA region. He is based in Brussels and works closely with customers throughout Benelux. He has been a developer since very young, starting to code at the age of 7. He started learning AI/ML in his later years of university, and has fallen in love with it since then.


Use IP-restricted presigned URLs to enhance security in Amazon SageMaker Ground Truth

Amazon SageMaker Ground Truth significantly reduces the cost and time required for labeling data by integrating human annotators with machine learning to automate the labeling process. You can use SageMaker Ground Truth to create labeling jobs, which are workflows where data objects (such as images, videos, or documents) need to be annotated by human workers. These labeling jobs are distributed among a workteam—a group of workers assigned to perform the annotations. To access the data objects they need to label, workers are provided with Amazon S3 presigned URLs.

A presigned URL is a temporary URL that grants time-limited access to an Amazon Simple Storage Service (Amazon S3) object. In the context of SageMaker Ground Truth, these presigned URLs are generated using the grant_read_access Liquid filter and embedded into the task templates. Workers can then use these URLs to directly access the necessary files, such as images or documents, in their web browsers for annotation purposes.

While presigned URLs offer a convenient way to grant temporary access to S3 objects, sharing these URLs with people outside of the workteam can lead to unintended access of those objects. To mitigate this risk and enhance the security of SageMaker Ground Truth labeling tasks, we have introduced a new feature that adds an additional layer of security by restricting access to the presigned URLs to the worker’s IP address or virtual private cloud (VPC) endpoint from which they access the labeling task. In this blog post, we show you how to enable this feature, allowing you to enhance your data security as needed, and outline the success criteria for this feature, including the scenarios where it will be most beneficial.

Prerequisites

Before you get started configuring IP-restricted presigned URLs, the following resources can help you understand the background concepts:

  • Amazon S3 presigned URL: This documentation covers the use of Amazon S3 presigned URLs, which provide temporary access to objects. Understanding how presigned URLs work will be beneficial.
  • Use Amazon SageMaker Ground Truth to label data: This guide explains how to use SageMaker Ground Truth for data labeling tasks, including setting up workteams and workforces. Familiarity with these concepts will be helpful when configuring IP restrictions for your workteams.

Introducing IP-restricted presigned URLs

Working closely with our customers, we recognized the need for enhanced security posture and stricter access controls to presigned URLs. So, we introduced a new feature that uses AWS global condition context keys aws:SourceIp and aws:VpcSourceIp to allow customers to restrict presigned URL access to specific IP addresses or VPC endpoints. By incorporating AWS Identity and Access Management (IAM) policy constraints, you can now restrict presigned URLs to only be accessible from an IP address or VPC endpoint of your choice. This IP-based access control effectively locks down the presigned URL to the worker’s location, mitigating the risk of unauthorized access or unintended sharing.
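To make the condition-key mechanism more concrete, the following sketch shows the general shape of an IAM policy statement that uses aws:SourceIp (or aws:VpcSourceIp) with the IpAddress condition operator. It illustrates the mechanism only; this post does not show the exact policy SageMaker Ground Truth applies, and the bucket name, IP address, and helper name are hypothetical.

```python
import ipaddress
import json


def source_ip_statement(bucket, worker_ip, condition_key="aws:SourceIp"):
    """Build an Allow statement that only matches requests from worker_ip."""
    # ip_network validates the input; a single address becomes a /32 network.
    network = ipaddress.ip_network(worker_ip)
    return {
        "Effect": "Allow",
        "Action": "s3:GetObject",
        "Resource": f"arn:aws:s3:::{bucket}/*",
        "Condition": {"IpAddress": {condition_key: str(network)}},
    }


statement = source_ip_statement("example-labeling-bucket", "203.0.113.7")
print(json.dumps(statement, indent=2))
```

A request that arrives from any other address fails the condition, which is why a shared URL stops working outside the worker's network.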

Benefits of the new feature

This update brings several significant security benefits to SageMaker Ground Truth:

  • Enhanced data privacy: IP restrictions limit presigned URL access to customer-approved locations, such as corporate VPNs, workers’ home networks, or designated VPC endpoints. Although the presigned URLs are pre-authenticated, this feature adds an additional layer of security by verifying the access location and locking the URL to that location until the task is completed.
  • Reduced risk of unauthorized access: Enforcing IP-based access controls minimizes the risk of data being accessed from unauthorized locations and mitigates the risk of data sharing outside the worker’s approved access network. This is particularly important when dealing with sensitive or confidential data.
  • Flexible security options: You can apply these restrictions in either VPC or non-VPC settings, allowing you to tailor security measures to your organization’s specific needs.
  • Auditing and compliance: By locking down presigned URLs to specific IP addresses or VPC endpoints, you can more easily track and audit access to your organization’s data, helping achieve compliance with internal policies and external regulations.
  • Seamless integration: This new feature seamlessly integrates with existing SageMaker Ground Truth workflows, providing enhanced security without disrupting established labeling processes or requiring significant changes to existing infrastructure.

By introducing IP-restricted presigned URLs, SageMaker Ground Truth empowers you with greater control over data access, so sensitive information remains accessible only to authorized workers within approved locations.

Configuring IP-restricted presigned URLs for SageMaker Ground Truth

The new IP restriction feature for presigned URLs in SageMaker Ground Truth can be enabled through the SageMaker API or the AWS Command Line Interface (AWS CLI). Before we go into the configuration of this new feature, let’s look at how you can create and update workteams today using the AWS CLI. You can also perform these operations through the SageMaker API using the AWS SDK.

Here’s an example of creating a new workteam using the create-workteam command:

aws sagemaker create-workteam \
    --description "A team for image labeling tasks" \
    --workforce-name "default" \
    --workteam-name "MyWorkteam" \
    --member-definitions '{
        "CognitoMemberDefinition": {
            "ClientId": "exampleclientid",
            "UserGroup": "sagemaker-groundtruth-user-group",
            "UserPool": "us-west-2_examplepool"
        }
    }'

To update an existing workteam, you use the update-workteam command:

aws sagemaker update-workteam \
    --workteam-name "MyWorkteam" \
    --description "Updated description for image labeling tasks"

Note that these examples only show a subset of the available parameters for the create-workteam and update-workteam APIs. You can find detailed documentation and examples in the SageMaker Ground Truth Developer Guide.

Enabling IP restrictions for presigned URLs

With the new IP restriction feature, you can now configure IP-based access constraints specific to each workteam when creating a new workteam or modifying an existing one. Here’s how you can enable these restrictions:

  1. When creating or updating a workteam, you can specify a WorkerAccessConfiguration object, which defines access constraints for the workers in that workteam.
  2. Within the WorkerAccessConfiguration, you can include an S3Presign object, which allows you to set access configurations for the presigned URLs used by the workers. Currently, only IamPolicyConstraints can be added to the S3Presign object. SageMaker Ground Truth provides two Liquid filters that you can use in your custom worker task templates to generate presigned URLs:
    • grant_read_access: This filter generates a presigned URL for the specified S3 object, granting temporary read access. The command will look like:
      <!-- Using grant_read_access filter -->
      <img src="{{ s3://bucket-name/path/to/image.jpg | grant_read_access }}"/>

    • s3_presign: This new filter serves the same purpose as grant_read_access but makes it clear that the generated URL is subject to the S3Presign configuration defined for the workteam. The command will look like:
      <!-- Using s3_presign filter (equivalent) -->
      <img src="{{ s3://bucket-name/path/to/image.jpg | s3_presign }}"/>

  3. The S3Presign object supports IamPolicyConstraints, where you can enable or disable the SourceIp and VpcSourceIp conditions:
    • SourceIp: When enabled, workers can access presigned URLs only from the specified IP addresses or ranges.
    • VpcSourceIp: When enabled, workers can access presigned URLs only from the specified VPC endpoints within your AWS account.

You can call the SageMaker ListWorkteams or DescribeWorkteam APIs to view workteams’ metadata, including the WorkerAccessConfiguration.
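As a sketch of how that check could look with the AWS SDK for Python, the helpers below read the constraints out of a DescribeWorkteam response. The response shown is a trimmed, hand-written example rather than real output, and the helper names are assumptions.

```python
def presign_constraints(describe_response):
    """Extract IamPolicyConstraints from a DescribeWorkteam response, if present."""
    workteam = describe_response.get("Workteam", {})
    access = workteam.get("WorkerAccessConfiguration", {})
    return access.get("S3Presign", {}).get("IamPolicyConstraints", {})


def ip_restriction_enabled(describe_response):
    """Return True when either SourceIp or VpcSourceIp is enabled."""
    constraints = presign_constraints(describe_response)
    return "Enabled" in (constraints.get("SourceIp"), constraints.get("VpcSourceIp"))


# Trimmed, illustrative response shape (not captured from a real account).
sample_response = {
    "Workteam": {
        "WorkteamName": "exampleworkteam",
        "WorkerAccessConfiguration": {
            "S3Presign": {
                "IamPolicyConstraints": {"SourceIp": "Enabled", "VpcSourceIp": "Disabled"}
            }
        },
    }
}

print(ip_restriction_enabled(sample_response))
```

In practice the response would come from boto3.client("sagemaker").describe_workteam(WorkteamName="exampleworkteam"). A workteam without any WorkerAccessConfiguration simply yields an empty dictionary here.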

Let’s say you want to create or update a workteam so that presigned URLs will be restricted to the public IP address of the worker who originally accessed it.

Create workteam:

aws sagemaker create-workteam \
    --description "An example workteam with S3 presigned URLs restricted" \
    --workforce-name "default" \
    --workteam-name "exampleworkteam" \
    --member-definitions '{
        "CognitoMemberDefinition": {
            "ClientId": "exampleclientid",
            "UserGroup": "sagemaker-groundtruth-user-group",
            "UserPool": "us-west-2_examplepool"
        }
    }' \
    --worker-access-configuration '{
        "S3Presign": {
            "IamPolicyConstraints": {
                "SourceIp": "Enabled",
                "VpcSourceIp": "Disabled"
            }
        }
    }'

Update workteam:

aws sagemaker update-workteam \
    --workteam-name "existingworkteam" \
    --worker-access-configuration '{
        "S3Presign": {
            "IamPolicyConstraints": {
                "SourceIp": "Enabled",
                "VpcSourceIp": "Disabled"
            }
        }
    }'
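The same update can be made through the AWS SDK. The following Python sketch mirrors the CLI example above by enabling one constraint and disabling the other; the helper name and its mode argument are assumptions for illustration, and the client is passed in so you can create it however your environment requires.

```python
def restrict_presigned_urls(sagemaker_client, workteam_name, mode="SourceIp"):
    """Enable one IP constraint for a workteam, mirroring the CLI example."""
    if mode not in ("SourceIp", "VpcSourceIp"):
        raise ValueError("mode must be 'SourceIp' or 'VpcSourceIp'")
    constraints = {
        "SourceIp": "Enabled" if mode == "SourceIp" else "Disabled",
        "VpcSourceIp": "Enabled" if mode == "VpcSourceIp" else "Disabled",
    }
    return sagemaker_client.update_workteam(
        WorkteamName=workteam_name,
        WorkerAccessConfiguration={
            "S3Presign": {"IamPolicyConstraints": constraints}
        },
    )
```

For example, restrict_presigned_urls(boto3.client("sagemaker"), "existingworkteam") would correspond to the update-workteam command shown earlier.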

Success criteria

While the IP-restricted presigned URLs feature provides enhanced security, there are scenarios where it might not be suitable. Understanding these limitations can help you make an informed decision about using the feature and verify that it aligns with your organization’s security needs and network configurations.

IP-restricted presigned URLs are effective in scenarios where there’s a consistent IP address used by the worker accessing SageMaker Ground Truth and the S3 object. For example, if a worker accesses labeling tasks from a stable public IP address, such as an office network with a fixed IP address, the IP restriction will provide access with enhanced security. Similarly, when a worker accesses both SageMaker Ground Truth and S3 objects through the same VPC endpoint, the IP restriction will verify that the presigned URL is only accessible from within this VPC. In both scenarios, the consistent IP address enables the IP-based access controls to function correctly, providing an additional layer of security.

Scenarios where IP-restricted presigned URLs aren’t effective

Scenario: Asymmetric VPC endpoints
  • Description: SageMaker Ground Truth is accessed through a public internet connection while Amazon S3 is accessed through a VPC endpoint, or vice versa.
  • Example: A worker accesses SageMaker Ground Truth through the public internet but S3 through a VPC endpoint.
  • Exit criteria: Verify that both SageMaker Ground Truth and S3 are accessed either entirely through the public internet or entirely through the same VPC endpoint.

Scenario: Network Address Translation (NAT) layers
  • Description: NAT layers can alter the source IP address of requests, causing IP mismatches. Issues can arise from dynamically assigned IP addresses or asymmetric configurations.
  • Examples:
    • N-to-M IP translation where multiple internal IP addresses are translated to multiple public IP addresses.
    • A NAT gateway with multiple public IP addresses assigned to it, which can cause requests to appear from different IP addresses.
    • Shared IP addresses where multiple users’ traffic is routed through a single public IP address, making it difficult to enforce IP-based restrictions effectively.
  • Exit criteria: Verify that the NAT gateway is configured to preserve the source IP address. Validate the NAT configuration for consistency when accessing both SageMaker Ground Truth and S3 resources.

Scenario: Use of VPNs
  • Description: VPNs change the outgoing IP address, leading to potential access issues with IP-restricted presigned URLs.
  • Example: A worker uses a split-tunnel VPN that changes the IP address for different requests to Ground Truth or S3, so access might be denied.
  • Exit criteria: Disable the VPN or use a full-tunnel VPN that offers a consistent IP address for all requests.

Interface endpoints aren’t supported by the grant_read_access feature because of their inability to resolve public DNS names. This limitation is orthogonal to the IP restrictions and should be considered when configuring your network setup for accessing S3 objects with presigned URLs. In such cases, use the S3 Gateway endpoint when accessing S3 to verify compatibility with the public DNS names generated by grant_read_access.

Using S3 access logs for debugging

To debug issues related to IP-restricted presigned URLs, S3 access logs can provide valuable insights. By enabling access logging for your S3 bucket, you can track every request made to your S3 objects, including the IP addresses from which the requests originate. This can help you identify:

  • Mismatches between expected and actual IP addresses
  • Dynamic IP addresses or VPNs causing access issues
  • Unauthorized access from unexpected locations

To debug using S3 access logs, follow these steps:

  1. Enable S3 access logging: Configure your bucket to deliver access logs to another bucket or a logging service such as Amazon CloudWatch Logs.
  2. Review log files: Analyze the log files to identify patterns or anomalies in IP addresses, request timestamps, and error codes.
  3. Look for IP address changes: If you observe frequent changes in IP addresses within the logs, it might indicate that the worker’s IP address is dynamic or altered by a VPN or proxy.
  4. Check for NAT layer modifications: See if NAT layers are modifying the source IP address by checking the x-forwarded-for header in the log files.
  5. Verify authorized access: Confirm that requests are coming from approved and consistent IP addresses by checking the Remote IP field in the log files.

By following these steps and analyzing the S3 access logs, you can validate that the presigned URLs are accessed only from approved and consistent IP addresses.
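Part of this log review could be automated. The sketch below assumes the standard space-delimited S3 server access log layout, in which the remote IP is the fourth field and the HTTP status follows the quoted request URI. The sample lines are synthetic and exist only to exercise the parser; the function name is hypothetical.

```python
import re
from collections import defaultdict

# Leading fields of a server access log record, up to the error code.
LOG_RE = re.compile(
    r'^(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] (?P<remote_ip>\S+) '
    r'(?P<requester>\S+) (?P<request_id>\S+) (?P<operation>\S+) (?P<key>\S+) '
    r'"(?P<request_uri>[^"]*)" (?P<status>\S+) (?P<error_code>\S+)'
)


def summarize(log_lines):
    """Return keys requested from several IPs, plus all denied (403) requests."""
    ips_by_key = defaultdict(set)
    denied = []
    for line in log_lines:
        match = LOG_RE.match(line)
        if match is None:
            continue  # skip records that do not follow the expected layout
        ips_by_key[match["key"]].add(match["remote_ip"])
        if match["status"] == "403":
            denied.append((match["key"], match["remote_ip"], match["error_code"]))
    changed = {key: sorted(ips) for key, ips in ips_by_key.items() if len(ips) > 1}
    return changed, denied


# Synthetic records for illustration only.
SAMPLE = [
    'owner1 example-bucket [06/Feb/2024:00:00:38 +0000] 203.0.113.7 - REQ1 '
    'REST.GET.OBJECT images/cat.jpg "GET /images/cat.jpg HTTP/1.1" 200 - '
    '2048 2048 12 11 "-" "Mozilla/5.0" -',
    'owner1 example-bucket [06/Feb/2024:00:05:10 +0000] 198.51.100.24 - REQ2 '
    'REST.GET.OBJECT images/cat.jpg "GET /images/cat.jpg HTTP/1.1" 403 AccessDenied '
    '243 - 5 - "-" "Mozilla/5.0" -',
]

changed, denied = summarize(SAMPLE)
print(changed)
print(denied)
```

An object that appears in the first result was requested from more than one address, which is the pattern to look for when a worker's IP address changes between loading the task and fetching the object.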

Conclusion

The introduction of IP-restricted presigned URLs in Amazon SageMaker Ground Truth significantly enhances the security of data accessed through the service. By allowing you to restrict access to specific IP addresses or VPC endpoints, this feature helps facilitate more fine-tuned control of presigned URLs. It provides organizations with added protection for their sensitive data, offering a valuable option for those with stringent security requirements. We encourage you to explore this new security feature to protect your organization’s data and enhance the overall security of your labeling workflows. To get started with SageMaker Ground Truth, visit Getting Started. To implement IP restrictions on presigned URLs as part of your workteam setup, refer to the CreateWorkteam and UpdateWorkteam API documentation. Follow the guidance provided in this blog to configure these security measures effectively. For more information or assistance, contact your AWS account team or visit the SageMaker community forums.


About the Authors

Sundar Raghavan is an AI/ML Specialist Solutions Architect at AWS, helping customers build scalable and cost-efficient AI/ML pipelines with Human in the Loop services. In his free time, Sundar loves traveling, sports and enjoying outdoor activities with his family.

Michael Borde is a lead software engineer at Amazon AI, where he has been for seven years. He previously studied mathematics and computer science at the University of Chicago. Michael is passionate about cloud computing, distributed systems design, and digital privacy & security. After work, you can often find Michael putzing around the local powerlifting gym in Capitol Hill.

Jacky Shum is a Software Engineer at AWS in the SageMaker Ground Truth team. He works to help AWS customers leverage machine learning applications, including prior work on ML-based fraud detection with Amazon Fraud Detector.

Rohith Kodukula is a Software Development Engineer on the SageMaker Ground Truth team. In his free time he enjoys staying active and reading up on anything that he finds mildly interesting (most things really).

Abhinay Sandeboina is an Engineering Manager at AWS Human In The Loop (HIL). He has been in AWS for over 2 years and his teams are responsible for managing ML platform services. He has a decade of experience in software/ML engineering building infrastructure platforms at scale. Prior to AWS, he worked in various engineering management roles at Zillow and Capital One.

Unlock the power of structured data for enterprises using natural language with Amazon Q Business

One of the most common applications of generative artificial intelligence (AI) and large language models (LLMs) in an enterprise environment is answering questions based on the enterprise’s knowledge corpus. Pre-trained foundation models (FMs) excel at natural language understanding (NLU) tasks, including summarization, text generation, and question answering across a wide range of topics. However, they often struggle to provide accurate answers without hallucinations and fall short when addressing questions about content that wasn’t included in their training data. Furthermore, FMs are trained with a point-in-time snapshot of data and have no inherent ability to access fresh data at inference time; therefore, they might provide responses that are incorrect or inadequate.

We face a fundamental challenge with enterprise data—overcoming the disconnect between natural language and structured data. Natural language is ambiguous and imprecise, whereas data adheres to rigid schemas. For example, SQL queries can be complex and unintuitive for non-technical users. Handling complex queries involving multiple tables, joins, and aggregations makes it difficult to interpret user intent and translate it into correct SQL operations. Domain-specific terminology further complicates the mapping process. Another challenge is accommodating the linguistic variations users employ to express the same requirement. Effectively managing synonyms, paraphrases, and alternative phrasings is important. The inherent ambiguity of natural language can also result in multiple interpretations of a single query, making it difficult to accurately understand the user’s precise intent.

To bridge this gap, you need advanced natural language processing (NLP) to map user queries to database schema, tables, and operations. In this architecture, Amazon Q Business acts as an intermediary, translating natural language into precise SQL queries. You can simply ask questions like “What were the sales for outdoor gear in Q3 2023?” Amazon Q Business analyzes intent, accesses data sources, and generates the SQL query. This simplifies data access for your non-technical users and streamlines workflows for professionals, allowing them to focus on higher-level tasks.

In this post, we discuss an architecture to query structured data using Amazon Q Business, and build out an application to query cost and usage data in Amazon Athena with Amazon Q Business. Amazon Q Business can create SQL queries to your data sources when provided with the database schema, additional metadata describing the columns and tables, and prompting instructions. You can extend this architecture to use additional data sources, query validation, and prompting techniques to cover a wider range of use cases.

Solution overview

The following figure represents the high-level architecture of the proposed solution. Steps 3 and 4 augment the AWS IAM Identity Center integration with Amazon Q Business for an authorization flow. In this architecture, we use Amazon Cognito for user authentication as well as a trusted token issuer to IAM Identity Center. You can also use your own identity provider as a trusted token issuer as long as it supports OpenID Connect (OIDC).

architecture diagram

The workflow includes the following steps:

  1. The user initiates the interaction with the Streamlit application, which is accessible through an Application Load Balancer, acting as the entry point.
  2. The application prompts the user to authenticate using their Amazon Cognito credentials, maintaining secure access.
  3. The application exchanges the token obtained from Amazon Cognito for an IAM Identity Center token, granting the necessary scope to interact with Amazon Q Business.
  4. Using the IAM Identity Center token, the application assumes an AWS Identity and Access Management (IAM) role and retrieves an AWS session from AWS Security Token Service (AWS STS), enabling authorized communication with Amazon Q Business.
  5. Based on the user’s natural language query, the application formulates relevant prompts and metadata, which are then submitted to the chat_sync API of Amazon Q Business. In response, Amazon Q Business provides an appropriate Athena query to run.
  6. The application runs the Athena query received from Amazon Q Business, and the resulting data is displayed on the web application’s UI.
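
Steps 3 and 4 of this workflow can be sketched roughly as follows. This is a minimal illustration, not the sample application's actual code: the function names are hypothetical, the clients are assumed to be boto3 clients for sso-oidc and sts created by the caller, and the sketch assumes the IAM Identity Center token carries the identity context in an sts:identity_context claim.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Return the (unverified) claims from a JWT's payload segment."""
    payload = token.split(".")[1]
    # JWTs use unpadded URL-safe base64; restore the padding before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def get_q_business_credentials(oidc_client, sts_client, cognito_id_token,
                               idc_app_arn, role_arn):
    """Exchange a Cognito token for temporary, identity-aware AWS credentials.

    oidc_client and sts_client are expected to be boto3 clients for
    "sso-oidc" and "sts"; every ARN is supplied by the caller.
    """
    # Step 3: trade the Cognito token for an IAM Identity Center token.
    token_response = oidc_client.create_token_with_iam(
        clientId=idc_app_arn,
        grantType="urn:ietf:params:oauth:grant-type:jwt-bearer",
        assertion=cognito_id_token,
    )
    # Assumption: the identity context is carried in this claim.
    claims = decode_jwt_claims(token_response["idToken"])
    identity_context = claims["sts:identity_context"]

    # Step 4: assume the IAM role, passing the identity context to AWS STS.
    assumed = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName="q-business-session",  # hypothetical session name
        ProvidedContexts=[{
            "ProviderArn": "arn:aws:iam::aws:contextProvider/IdentityCenter",
            "ContextAssertion": identity_context,
        }],
    )
    return assumed["Credentials"]
```

Passing the clients in as arguments keeps the function free of global state and makes it possible to exercise the flow with stand-in objects before pointing it at real AWS endpoints.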

Querying Amazon Q Business LLMs directly

As explained in the response settings for Amazon Q Business, there are different options to generate responses that allow you to either use your enterprise data, use LLMs directly, or fall back on the LLMs if the answer is not found in your enterprise data. Along with the global controls for response settings, you need to specify which chatMode you want to use based on your specific use case. If you want to bypass Retrieval Augmented Generation (RAG) and use plain text in the context window, you should use CREATOR_MODE. Alternatively, RAG is also bypassed when you upload files directly in the context window.

If you just use text in the context window and call Amazon Q Business APIs without switching to CREATOR_MODE, that may break your use case in the future if you add content to the index (RAG). In this use case, because we’re not indexing any data and using schemas as attachments in the API call to Amazon Q Business, RAG is automatically bypassed and the response is generated directly from the LLMs. Another reason to use attachments for this use case is that for the chatSync API, userMessage has a maximum length of 7,000 characters, which can be surpassed depending on how large your text is in the context window.
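
As a rough sketch of how such a request could be assembled, the following helper builds the keyword arguments for the chat_sync call with the schema as an attachment and the chat mode set explicitly. The helper name, the attachment file name, and the choice to raise an error on oversized messages are assumptions for illustration; the sample application may structure this differently.

```python
MAX_USER_MESSAGE_CHARS = 7000  # limit quoted above for userMessage


def build_chat_request(application_id, instructions, schema_text,
                       chat_mode="CREATOR_MODE"):
    """Build keyword arguments for the Amazon Q Business chat_sync call."""
    if len(instructions) > MAX_USER_MESSAGE_CHARS:
        raise ValueError(
            "Instructions exceed the userMessage limit; shorten them "
            "or move more context into attachments."
        )
    return {
        "applicationId": application_id,
        "userMessage": instructions,
        # The schema travels as a file attachment, so it does not count
        # toward the userMessage length. The file name is hypothetical.
        "attachments": [
            {"name": "cur_schema.txt", "data": schema_text.encode("utf-8")}
        ],
        # Setting the mode explicitly keeps RAG bypassed even if content
        # is added to the index later.
        "chatMode": chat_mode,
    }
```

With a boto3 qbusiness client, the result could then be passed along as client.chat_sync(**build_chat_request(...)).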

Data query workflow

Let’s look at the prompts, query generation, and Athena query in detail. We use Athena as the data store in this post. Users enter natural language questions into a web application built with Streamlit. Amazon Q Business converts the natural language questions to valid SQL for Athena using the prompting instructions, the database schema, and data dictionary that are provided as context to the LLM. The generated SQL is sent to Athena to run as a query, and the returned data is displayed to the user in the Streamlit application. The following diagram illustrates this workflow.

query workflow

These are the various components to this data flow, as numbered in the diagram:

  1. User intent
  2. Prompt builder
  3. SQL query generator
  4. Running the query
  5. Query results

In the following sections, we look at each component in more detail.

User intent

The user intent or your inquiry is the starting point of the process. It can be in natural language, such as “What was the total spend for ElasticSearch last year?” The user’s input serves as the basis for the subsequent steps in the workflow.

Prompt builder

The prompt builder component plays a crucial role in bridging the gap between your natural language input and the structured data format required for SQL querying. It augments your question with relevant information from the table schema and data dictionary to provide context for the query generation process. This step involves the following sub-tasks:

  • Natural language processing – NLP techniques are employed to analyze and understand your questions. This includes steps like tokenization and dependency parsing to extract the intent and relevant entities from the natural language input.
  • Entity recognition – Named entity recognition (NER) is used to identify and classify relevant entities mentioned in your question, such as product names, dates, or region. This step helps map your input to the corresponding data elements in the database schema.
  • Intent mapping – The prompt builder maps your intent, extracted from the NLP analysis, to the appropriate data structures and operations required to fulfill the query. This mapping process uses the table schema and data dictionary to establish connections between your natural language questions and the database elements. The output of the prompt builder is a structured representation of your question, augmented with the necessary context from the database schema and data dictionary. This structured representation serves as input for the next step, SQL query generation.

The following is an example prompt for “What was the total spend for ElasticSearch last year?”

You will not respond to gibberish, random character sequences, or prompts that do not make logical sense. 
If the input does not make sense or is outside the scope of the provided context, do not respond with SQL 
but respond with - I do not know about this. Please fix your input.
You are an expert SQL developer. Only return the sql query. Do not include any verbiage. 
You are required to return SQL queries based on the provided schema and the service mappings for common services and 
their synonyms. The table with the provided schema is the only source of data. Do not use joins. Assume product, 
service are synonyms for product_servicecode and price,cost,spend are synonyms for line_item_unblended_cost. Use the 
column names from the provided schema while creating queries. Do not use preceding zeroes for the column month when 
creating the query. Only use predicates when asked. For your reference, current date is June 01, 2024. write a sql 
query for this task - What was the total spend for ElasticSearch last year?
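
A prompt like this one can be assembled with a small helper. The following is a minimal sketch under stated assumptions: the function name and the shortened instruction text are illustrative, and the real prompts live in app/qb_config.py.

```python
from datetime import date

# Shortened, illustrative instruction text; the full prompt appears above.
INSTRUCTIONS = (
    "You are an expert SQL developer. Only return the sql query. "
    "Do not include any verbiage. The table with the provided schema is "
    "the only source of data. Do not use joins."
)


def build_prompt(question: str, today: date) -> str:
    """Combine fixed instructions, the current date, and the user's question."""
    # Format the date the same way as the example prompt, e.g. June 01, 2024.
    current_date = f"{today:%B %d, %Y}"
    return (
        f"{INSTRUCTIONS} For your reference, current date is {current_date}. "
        f"write a sql query for this task - {question.strip()}"
    )
```

Including the current date in the prompt is what lets relative phrases such as "last year" resolve to a concrete year in the generated SQL.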

SQL query generation

Based on the prompt generated from the prompt builder and your original question, Amazon Q Business generates the corresponding SQL query. The SQL query is tailored to retrieve the relevant data and perform the desired analysis or calculations to accurately answer the user’s question. This step may involve techniques such as:

  • Mapping your intent and entities to SQL clauses (SELECT, FROM, WHERE, JOIN, and so on)
  • Handling complex queries involving aggregations, subqueries, or predicates
  • Incorporating domain-specific knowledge or business rules into the query generation process

Running the query

In this step, the generated SQL query is run against the chosen data store, which could be a relational database, data warehouse, NoSQL database, or an object store like Amazon Simple Storage Service (Amazon S3). The data store serves as the repository for the data required to answer the user’s question. Depending on the architecture and requirements, the data store query may involve additional components or processes, such as:

  • Query optimization and indexing strategies
  • Materialized views for complex queries
  • Real-time data ingestion and updates
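
For Athena specifically, running the generated query amounts to starting an execution and polling until it finishes. The sketch below assumes a boto3 Athena client is passed in; the function name, polling defaults, and error handling are illustrative choices rather than the sample application's implementation.

```python
import time


def run_athena_query(athena, sql, database, output_location,
                     poll_seconds=1.0, max_polls=60):
    """Start an Athena query, wait for it to finish, and return its execution ID."""
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    for _ in range(max_polls):
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            return query_id
        if state in ("FAILED", "CANCELLED"):
            reason = status["QueryExecution"]["Status"].get(
                "StateChangeReason", "unknown"
            )
            raise RuntimeError(f"Athena query {state}: {reason}")
        time.sleep(poll_seconds)  # still QUEUED or RUNNING
    raise TimeoutError(f"Athena query {query_id} did not finish in time")
```

The returned execution ID can then be handed to the client's get_query_results call to fetch the rows.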

Query results

The query engine runs the generated SQL query against the data store and returns the query results. These results contain the insights or answers to the original user question. The presentation of the query results can take various forms, depending on the requirements of the application or UI:

  • Tabular data – The results can be displayed as a table or spreadsheet, suitable for structured data analysis
  • Visualizations – The query results can be rendered as charts, graphs, or other visual representations, providing a more intuitive way to understand and explore the data
  • Natural language responses – In some cases, the query results can be translated back into natural language statements or summaries, making the insights more accessible to non-technical users
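
For the tabular case, a page of Athena results can be turned into a list of records before display. This is a minimal sketch with a hypothetical helper name; it assumes the first row of the page holds the column names, which is the case for the first page of a SELECT query's results.

```python
def rows_to_records(result_page):
    """Convert one page of Athena get_query_results output to a list of dicts."""
    rows = result_page["ResultSet"]["Rows"]
    if not rows:
        return []
    # Assumption: the first row is the header row with column names.
    header = [cell.get("VarCharValue", "") for cell in rows[0]["Data"]]
    records = []
    for row in rows[1:]:
        # Null cells have no VarCharValue key, so they become None.
        values = [cell.get("VarCharValue") for cell in row["Data"]]
        records.append(dict(zip(header, values)))
    return records
```

A list of dictionaries in this shape can be passed directly to a table widget or loaded into a data frame for charting.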

In the following sections, we walk through the steps to deploy the web application and test the solution.

Prerequisites

Complete the following prerequisite steps:

  1. Set up IAM Identity Center and add users that you intend to give access to in your Amazon Q Business application.
  2. Have an existing, working Amazon Q Business application and give access to the users created in the previous step to the application.
  3. Make sure AWS Cost and Usage Reports (AWS CUR) data is available in Athena. If you already have CUR data in Athena, you can skip the following setup steps. If not, set it up as follows:
    1. To set up sample CUR data, refer to the following lab and follow the instructions.
    2. You also need to set up an AWS Glue crawler to make the data available in Athena.
  4. If you already have an SSL certificate, you can skip this step; otherwise, generate a private certificate.
  5. Import the certificate into AWS Certificate Manager (ACM). For more details, refer to Importing a certificate.

Set up the application

Complete the following steps to set up the application:

  1. From your terminal, clone the GitHub repository:
git clone https://github.com/aws-samples/data-insights-with-amazon-q-business.git
  2. Go to the project directory:
cd data-insights-with-amazon-q-business
  3. Based on your CUR table, update the CUR schema under app/schemas/cur_schema.txt. Review the prompts under app/qb_config.py. The schema looks similar to the following code:

  4. Review the data dictionary under app/schemas/service_mappings.csv. You can modify the mappings according to your dataset. A sample data dictionary for CUR might look like the following screenshot.

  5. Zip up the code repository and upload it to an S3 bucket.
  6. Follow the steps in the GitHub repo to deploy the Streamlit application.

Access the web application

As part of the deployment steps, you launched an AWS CloudFormation stack. On the AWS CloudFormation console, navigate to the Outputs tab for the stack and find the URL to access the Streamlit application. When you open the URL in a browser, you’ll see a login screen like the following screenshot. Sign up to create a user in the Amazon Cognito user pool. After you’re validated, you can use the same credentials to log in to the web application.

Query your cost and usage data

Start with a simple query like “What was the total spend for ElasticSearch this year?” A relevant prompt will be created and sent to Amazon Q Business. It will respond back with the corresponding SQL query. Notice the predicate where product_servicecode = ‘AmazonES’. Amazon Q Business is able to formulate the query because it has the schema and the data dictionary in context. It understands that ElasticSearch is an AWS service represented by a column named product_servicecode in the CUR data schema and its corresponding value of ‘AmazonES’. Next, the query is run against Athena and you get the results back.

The sample dataset used in this post is from 2023. If you’re using the sample dataset, natural language queries referring to the current year will not return results. Modify your queries to refer to 2023 or mention the year in the user intent.

The following figure highlights the steps as explained in the data flow.

sample query run

You can also try complex queries like “Give me a list of the top 3 products by total spend last year. For each of these products, what percentage of the overall spend is from this product?” Because the prompt builder has schema and product (AWS services) information in its context, Amazon Q Business creates the corresponding query. In this case, you’ll see a query similar to the following:

SELECT 
product_servicecode,
SUM(line_item_unblended_cost) AS total_spend,
ROUND(SUM(line_item_unblended_cost) * 100.0 / (SELECT SUM(line_item_unblended_cost)
FROM cur_daily WHERE year = '2023'), 2) AS percentage_of_total
FROM cur_daily
WHERE year = '2023'
GROUP BY product_servicecode
ORDER BY total_spend DESC
LIMIT 3;

When the query is run against Athena, you’ll see similar results corresponding to your data.

Along with the data, you can also see a summary and trend analysis of your data on the Description tab of your Streamlit app.

The prompts used in the application are open domain and you’re free to update them in the code. For example, the following is a prompt used for a summary task:

You are an AI assistant. You are required to return a summary based on the provided data in attachment. Use at least 
100 words. The spend is in dollars. The unit of measurement is dollars. Give trend analysis too. Start your response 
with - Here is your summary..

The following screenshot shows the results.

Feedback loop

You also have the option of capturing feedback for the generated queries with the thumbs up/down icon on the web application. Currently, the feedback is captured in a local file under /app/feedback. You can change this implementation to write to a database of your choice and have it serve as a query validation mechanism after your testing, to allow only validated queries to run.
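
One way such a validation mechanism could work is an allow list keyed on a normalized form of each query. The following sketch is an assumption about how you might build it, not part of the sample application: the class and function names are hypothetical, and the in-memory set stands in for the database of your choice.

```python
import hashlib
import re


def normalize_sql(sql: str) -> str:
    """Collapse whitespace and drop a trailing semicolon so trivial edits match."""
    collapsed = re.sub(r"\s+", " ", sql).strip()
    return collapsed.rstrip(";").strip()


def fingerprint(sql: str) -> str:
    """Return a stable hash of the normalized query text."""
    return hashlib.sha256(normalize_sql(sql).encode("utf-8")).hexdigest()


class ValidatedQueries:
    """In-memory allow list; swap the set for a database table in practice."""

    def __init__(self):
        self._approved = set()

    def record_feedback(self, sql: str, thumbs_up: bool) -> None:
        key = fingerprint(sql)
        if thumbs_up:
            self._approved.add(key)
        else:
            self._approved.discard(key)

    def is_allowed(self, sql: str) -> bool:
        return fingerprint(sql) in self._approved
```

The normalization deliberately leaves letter case untouched, because string literals such as service codes are case-sensitive.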

Clean up

To clean up your resources, delete the CloudFormation stack, Amazon Q Business application, and Athena tables.

Conclusion

In this post, we demonstrated how Amazon Q Business can effectively bridge the gap between users and data, enabling you to extract valuable insights from various data stores using natural language queries, without the need for extensive technical knowledge or SQL expertise. The natural language understanding capabilities of Amazon Q Business can accurately interpret user intent, extract relevant entities, and generate SQL to translate the user’s query into executable data operations. You can now empower a wider range of enterprise users to unlock the full value of your organization’s data assets. By democratizing data access and analysis using natural language queries, you can foster data-driven decision-making, drive innovation, and unlock new opportunities for growth and success.

In Part 2 of this series, we demonstrate how to integrate this architecture with LangChain using Amazon Q Business as a custom model. We also cover query validation and accuracy measurement.


About the Authors

Vishal Karlupia is a Senior Technical Account Manager/Lead at Amazon Web Services, Toronto. He specializes in generative AI applications and helps customers build and scale their AI/ML workloads on AWS. Outside of work, he enjoys being outdoors and keeping bonfires alive.

Srinivas Ganapathi is a Principal Technical Account Manager at Amazon Web Services. He is based in Toronto, Canada, and works with games customers to run efficient workloads on AWS.

Cohere Rerank 3 Nimble now generally available on Amazon SageMaker JumpStart

The Cohere Rerank 3 Nimble foundation model (FM) is now generally available in Amazon SageMaker JumpStart. This model is the newest FM in Cohere’s Rerank model series, built to enhance enterprise search and Retrieval Augmented Generation (RAG) systems.

In this post, we discuss the benefits and capabilities of this new model with some examples.

Overview of Cohere Rerank models

Cohere’s Rerank family of models is designed to enhance existing enterprise search systems and RAG systems. Rerank models improve search accuracy over both keyword-based and embedding-based search systems. Cohere Rerank 3 is designed to reorder documents retrieved by initial search algorithms based on their relevance to a given query. A reranking model, also known as a cross-encoder, is a type of model that, given a query and document pair, will output a similarity score. For FMs, words, sentences, or entire documents are often encoded as dense vectors in a semantic space. By calculating the cosine of the angle between these vectors, you can quantify their semantic similarity and output it as a single similarity score. You can use this score to reorder the documents by relevance to your query.
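
The vector-similarity idea described above can be illustrated in a few lines. This is a generic sketch of cosine similarity over toy vectors with hypothetical function names; it is not how Cohere Rerank computes its relevance scores internally.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (document index, score) pairs, most similar first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Vectors that point in the same direction score close to 1, and orthogonal vectors score 0, so sorting by this score orders documents from most to least similar.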

Cohere Rerank 3 Nimble is the newest model from Cohere’s Rerank family of models, designed to improve speed and efficiency over its predecessor, Cohere Rerank 3. According to Cohere’s benchmark tests including BEIR (Benchmarking IR) for accuracy and internal benchmarking datasets, Cohere Rerank 3 Nimble maintains high accuracy while being approximately 3–5 times faster than Cohere Rerank 3. The speed improvement is designed for enterprises looking to enhance their search capabilities without sacrificing performance.

The following diagram represents the two-stage retrieval of a RAG pipeline and illustrates where Cohere Rerank 3 Nimble is incorporated into the search pipeline.

Flow of Solution

In the first stage of retrieval in the RAG architecture, a set of candidate documents relevant to the query is retrieved from the knowledge base. In the second stage, Cohere Rerank 3 Nimble analyzes the semantic relevance between the query and each retrieved document, reordering them from most to least relevant. The top-ranked documents augment the original query with additional context. This process improves search result quality by identifying the most pertinent documents. Integrating Cohere Rerank 3 Nimble into a RAG system enables users to send fewer but higher-quality documents to the language model for grounded generation. This results in improved accuracy and relevance of search results without adding latency.
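
The two stages can be sketched generically as follows. The keyword-overlap retriever and the pluggable score_fn are stand-ins chosen for illustration; in a real pipeline, the first stage would be your search index and the second stage would call a reranking model such as Cohere Rerank 3 Nimble.

```python
def retrieve_candidates(query, corpus, k):
    """Stage 1: cheap keyword-overlap retrieval (a stand-in for a search index)."""
    terms = set(query.lower().split())
    scored = [
        (len(terms & set(doc.lower().split())), i) for i, doc in enumerate(corpus)
    ]
    # Keep only documents sharing at least one term, best overlap first.
    matches = sorted((s, i) for s, i in scored if s > 0)
    matches.sort(key=lambda pair: (-pair[0], pair[1]))
    return [corpus[i] for _, i in matches[:k]]


def rerank(query, candidates, score_fn, top_n):
    """Stage 2: reorder candidates with a more precise relevance scorer."""
    ranked = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_n]
```

Only the top_n documents that survive the second stage would be added to the prompt sent to the language model.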

Overview of SageMaker JumpStart

SageMaker JumpStart offers access to a broad selection of publicly available FMs. These pre-trained models serve as powerful starting points that can be deeply customized to address specific use cases. You can now use state-of-the-art model architectures, such as language models, computer vision models, and more, without having to build them from scratch.

Amazon SageMaker is a comprehensive, fully managed machine learning (ML) platform that revolutionizes the entire ML workflow. It offers an unparalleled suite of tools that cater to every stage of the ML lifecycle, from data preparation to model deployment and monitoring. Data scientists and developers can use the SageMaker integrated development environment (IDE) to access a vast array of pre-built algorithms, customize their own models, and seamlessly scale their solutions. The platform’s strength lies in its ability to abstract away the complexities of infrastructure management, allowing you to focus on innovation rather than operational overhead. The automated ML capabilities of SageMaker, including automated machine learning (AutoML) features, democratize ML by enabling even non-experts to build sophisticated models. Furthermore, its robust governance features help organizations maintain control and transparency over their ML projects, addressing critical concerns around regulatory compliance.

Prerequisites

Make sure your SageMaker AWS Identity and Access Management (IAM) service role has the AmazonSageMakerFullAccess permission policy attached.

To deploy Cohere Rerank 3 Nimble successfully, confirm one of the following:

  • Make sure your IAM role has the following permissions and you have the authority to make AWS Marketplace subscriptions in the AWS account used:
    • aws-marketplace:ViewSubscriptions
    • aws-marketplace:Unsubscribe
    • aws-marketplace:Subscribe
  • Alternatively, confirm your AWS account has a subscription to the model. If so, you can skip the following deployment instructions and start with subscribing to the model package.

Deploy Cohere Rerank 3 Nimble on SageMaker JumpStart

You can access the Cohere Rerank 3 family of models using SageMaker JumpStart in Amazon SageMaker Studio, as shown in the following screenshot.

Cohere SageMaker JumpStart view

Deployment starts when you choose Deploy, and you may be prompted to subscribe to this model through AWS Marketplace. If you are already subscribed, you can choose Deploy again to deploy the model. After deployment finishes, you will see that an endpoint is created. You can test the endpoint by passing a sample inference request payload or by selecting the testing option using the SDK.

Cohere rerank model card

Subscribe to the model package

To subscribe to the model package, complete the following steps:

  1. Depending on the model you want to deploy, open the model package listing page for cohere-rerank-nimble-english or cohere-rerank-nimble-multilingual.
  2. On the AWS Marketplace listing, choose Continue to subscribe.
  3. On the Subscribe to this software page, review and choose Accept Offer if you and your organization agree with the EULA, pricing, and support terms.
  4. Choose Continue to configuration and then choose an AWS Region.

A product ARN will be displayed. This is the model package ARN that you need to specify while creating a deployable model using Boto3.

Deploy Cohere Rerank 3 Nimble using the SDK

To deploy the model using the SDK, copy the product ARN from the previous step and specify it in the model_package_arn in the following code:

from cohere_aws import Client
import boto3
region = boto3.Session().region_name

model_package_arn = "Specify the model package ARN here"

After you specify the model package ARN, you can create the endpoint, as shown in the following code. Specify the name of the endpoint, the instance type, and the number of instances being used. Make sure your account-level service quota for ml.g5.xlarge for endpoint usage allows at least one instance. To request a service quota increase, refer to AWS service quotas.

co = Client(region_name=region)
# SageMaker endpoint names may contain only alphanumeric characters and hyphens
co.create_endpoint(arn=model_package_arn, endpoint_name="cohere-rerank-nimble-multilingual", instance_type="ml.g5.xlarge", n_instances=1)

If the endpoint is already created, you just need to connect to it with the following code:

co.connect_to_endpoint(endpoint_name="cohere-rerank-nimble-multilingual")

Follow a similar process as detailed earlier to deploy Cohere Rerank 3 on SageMaker JumpStart.

Inference example with Cohere Rerank 3 Nimble

Cohere Rerank 3 Nimble offers robust multilingual support. The model is available in both English and multilingual versions supporting over 100 languages.

The following code example illustrates how to perform real-time inference using Cohere Rerank 3 Nimble-English:

documents = [
    {"Title":"Incorrect Password","Content":"Hello, I have been trying to access my account for the past hour and it keeps saying my password is incorrect. Can you please help me?"},
    {"Title":"Confirmation Email Missed","Content":"Hi, I recently purchased a product from your website but I never received a confirmation email. Can you please look into this for me?"},
    {"Title":"Questions about Return Policy","Content":"Hello, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective."},
    {"Title":"Customer Support is Busy","Content":"Good morning, I have been trying to reach your customer support team for the past week but I keep getting a busy signal. Can you please help me?"},
    {"Title":"Received Wrong Item","Content":"Hi, I have a question about my recent order. I received the wrong item and I need to return it."},
    {"Title":"Customer Service is Unavailable","Content":"Hello, I have been trying to reach your customer support team for the past hour but I keep getting a busy signal. Can you please help me?"},
    {"Title":"Return Policy for Defective Product","Content":"Hi, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective."},
    {"Title":"Wrong Item Received","Content":"Good morning, I have a question about my recent order. I received the wrong item and I need to return it."},
    {"Title":"Return Defective Product","Content":"Hello, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective."}
]

In the following code, the top_n inference parameter for Cohere Rerank 3 and Rerank 3 Nimble specifies the number of top-ranked results to return after reranking the input documents. It allows you to control how many of the most relevant documents are included in the final output. To determine an optimal value for top_n, consider factors such as the diversity of your document set, the complexity of your queries, and the desired balance between precision and latency for enterprise search or RAG.

response = co.rerank(documents=documents, query='What emails have been about returning items?', rank_fields=["Title","Content"], top_n=2)
print(f'Documents: {response}')

The following is the output from Cohere Rerank 3 Nimble-English:

Documents: [RerankResult<document: {'Title': 'Received Wrong Item', 'Content': 'Hi, I have a question about my recent order. I received the wrong item and I need to return it.'}, index: 4, relevance_score: 0.0068771075>, RerankResult<document: {'Title': 'Wrong Item Received', 'Content': 'Good morning, I have a question about my recent order. I received the wrong item and I need to return it.'}, index: 7, relevance_score: 0.0064131636>]

Cohere Rerank 3 Nimble multilingual support

The multilingual capabilities of Cohere Rerank 3 Nimble-Multilingual enable global organizations to provide consistent, improved search experiences to users across different Regions and language preferences.

In the following example, we create an input payload for a list of emails in multiple languages. We can take the same set of emails from earlier and translate them to different languages. These examples are available under the SageMaker JumpStart model card and are randomly generated for this example.

documents = [
    {"Title":"Contraseña incorrecta","Content":"Hola, llevo una hora intentando acceder a mi cuenta y sigue diciendo que mi contraseña es incorrecta. ¿Puede ayudarme, por favor?"},
    {"Title":"Confirmation Email Missed","Content":"Hi, I recently purchased a product from your website but I never received a confirmation email. Can you please look into this for me?"},
    {"Title":"أسئلة حول سياسة الإرجاع","Content":"مرحبًا، لدي سؤال حول سياسة إرجاع هذا المنتج. لقد اشتريته قبل بضعة أسابيع وهو معيب"},
    {"Title":"Customer Support is Busy","Content":"Good morning, I have been trying to reach your customer support team for the past week but I keep getting a busy signal. Can you please help me?"},
    {"Title":"Falschen Artikel erhalten","Content":"Hallo, ich habe eine Frage zu meiner letzten Bestellung. Ich habe den falschen Artikel erhalten und muss ihn zurückschicken."},
    {"Title":"Customer Service is Unavailable","Content":"Hello, I have been trying to reach your customer support team for the past hour but I keep getting a busy signal. Can you please help me?"},
    {"Title":"Return Policy for Defective Product","Content":"Hi, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective."},
    {"Title":"收到错误物品","Content":"早上好,关于我最近的订单,我有一个问题。我收到了错误的商品,需要退货。"},
    {"Title":"Return Defective Product","Content":"Hello, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective."}
]

Use the following code to perform real-time inference using Cohere Rerank 3 Nimble-Multilingual:

response = co.rerank(documents=documents, query='What emails have been about returning items?', rank_fields=['Title','Content'], top_n=2)
print(f'Documents: {response}')

The following is the output from Cohere Rerank 3 Nimble-Multilingual:

Documents: [RerankResult<document: {'Title': '收到错误物品', 'Content': '早上好,关于我最近的订单,我有一个问题。我收到了错误的商品,需要退货。'}, index: 7, relevance_score: 0.034553625>, RerankResult<document: {'Title': 'أسئلة حول سياسة الإرجاع', 'Content': 'مرحبًا، لدي سؤال حول سياسة إرجاع هذا المنتج. لقد اشتريته قبل بضعة أسابيع وهو معيب'}, index: 2, relevance_score: 0.00037263767>]

The output translated to English is as follows:

Documents: [RerankResult<document: {'Title': 'Received Wrong Item', 'Content': 'Good morning, I have a question about my recent order. I received the wrong item and need to return it.'}, index: 7, relevance_score: 0.034553625>, RerankResult<document: {'Title': 'Questions about Return Policy', 'Content': 'Hello, I have a question about the return policy for this product. I bought it a few weeks ago and it's defective'}, index: 2, relevance_score: 0.00037263767>]

In both examples, the relevance scores are normalized to be in the range [0, 1]. Scores close to 1 indicate a high relevance to the query, and scores closer to 0 indicate low relevance.
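
Because the scores fall in [0, 1], a simple cutoff can drop weak matches before results reach a user. The following is a minimal sketch, not part of the Cohere SDK: `ScoredDocument` is a hypothetical stand-in for a rerank result, and the 0.01 cutoff is an arbitrary illustrative value that you would tune on your own queries.

```python
from dataclasses import dataclass


@dataclass
class ScoredDocument:
    """Hypothetical stand-in for one rerank result (input position plus score)."""
    index: int
    relevance_score: float


def filter_by_relevance(results, min_score):
    """Keep results scoring at least min_score, highest score first."""
    if not 0.0 <= min_score <= 1.0:
        raise ValueError("min_score must be in [0, 1]")
    kept = [r for r in results if r.relevance_score >= min_score]
    return sorted(kept, key=lambda r: r.relevance_score, reverse=True)


# Scores copied from the multilingual output above.
results = [ScoredDocument(7, 0.034553625), ScoredDocument(2, 0.00037263767)]

# 0.01 is an illustrative cutoff, not a recommended value.
print(filter_by_relevance(results, min_score=0.01))
```

With this cutoff, only the document at index 7 is kept; the second result scores far below it.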

Use cases suitable for Cohere Rerank 3 Nimble

The Cohere Rerank 3 Nimble model provides an option that prioritizes efficiency. The model is ideal for enterprises looking to enable their customers to accurately search complex documentation, build applications that understand over 100 languages, and retrieve the most relevant information from various data stores. In industries such as retail, where website drop-off increases with every 100 milliseconds added to search response time, having a faster AI model like Cohere Rerank 3 Nimble powering the enterprise search system translates to higher conversion rates.

Conclusion

Cohere Rerank 3 and Rerank 3 Nimble are now available on SageMaker JumpStart. To get started, refer to Train, deploy, and evaluate pretrained models with SageMaker JumpStart.

Interested in diving deeper? Check out the Cohere on AWS GitHub repo.


About the Authors

Breanne Warner is an Enterprise Solutions Architect at Amazon Web Services supporting healthcare and life science (HCLS) customers. She is passionate about supporting customers to use generative AI on AWS and evangelizing model adoption. Breanne is also on the Women@Amazon board as co-director of Allyship with the goal of fostering an inclusive and diverse culture at Amazon. Breanne holds a Bachelor of Science in Computer Engineering from the University of Illinois at Urbana-Champaign (UIUC).

Nithin Vijeaswaran is a Solutions Architect at AWS. His area of focus is generative AI and AWS AI Accelerators. He holds a Bachelor’s degree in Computer Science and Bioinformatics. Niithiyn works closely with the Generative AI GTM team to enable AWS customers on multiple fronts and accelerate their adoption of generative AI. He’s an avid fan of the Dallas Mavericks and enjoys collecting sneakers.

Karan Singh is a Generative AI Specialist for third-party models at AWS, where he works with top-tier third-party foundational model providers to define and run joint GTM motions that help customers train, deploy, and scale foundational models. Karan holds a Bachelor of Science in Electrical and Instrumentation Engineering from Manipal University and a Master of Science in Electrical Engineering from Northwestern University, and is currently an MBA Candidate at the Haas School of Business at University of California, Berkeley.

Perform generative AI-powered data prep and no-code ML over any size of data using Amazon SageMaker Canvas

Amazon SageMaker Canvas now empowers enterprises to harness the full potential of their data by enabling support of petabyte-scale datasets. Starting today, you can interactively prepare large datasets, create end-to-end data flows, and invoke automated machine learning (AutoML) experiments on petabytes of data—a substantial leap from the previous 5 GB limit. With over 50 connectors, an intuitive Chat for data prep interface, and petabyte support, SageMaker Canvas provides a scalable, low-code/no-code (LCNC) ML solution for handling real-world, enterprise use cases.

Organizations often struggle to extract meaningful insights and value from their ever-growing volume of data. You need data engineering expertise and time to develop the proper scripts and pipelines to wrangle, clean, and transform data. Then you must experiment with numerous models and hyperparameters requiring domain expertise. Afterward, you need to manage complex clusters to process and train your ML models over these large-scale datasets.

Starting today, you can prepare your petabyte-scale data and explore many ML models with AutoML by chat and with a few clicks. In this post, we show you how you can complete all these steps with the new integration in SageMaker Canvas with Amazon EMR Serverless without writing code.

Solution overview

For this post, we use a sample dataset of a 33 GB CSV file containing flight purchase transactions from Expedia between April 16, 2022, and October 5, 2022. We use the features to predict the base fare of a ticket based on the flight date, distance, seat type, and others.

In the following sections, we demonstrate how to import and prepare the data, optionally export the data, create a model, and run inference, all in SageMaker Canvas.

Prerequisites

You can follow along by completing the following prerequisites:

  1. Set up SageMaker Canvas.
  2. Download the dataset from Kaggle and upload it to an Amazon Simple Storage Service (Amazon S3) bucket.
  3. Add emr-serverless as a trusted entity to the SageMaker Canvas execution role to allow Amazon EMR processing jobs.
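
The third prerequisite amounts to editing the execution role's trust policy. The following is a minimal sketch of that edit, assuming the standard service principal `emr-serverless.amazonaws.com`; the helper names are hypothetical, and `apply_to_role` needs the AWS SDK for Python (boto3) plus IAM permissions, so it is defined here but not called.

```python
import copy
import json

EMR_SERVERLESS_PRINCIPAL = "emr-serverless.amazonaws.com"


def add_trusted_service(trust_policy, service_principal=EMR_SERVERLESS_PRINCIPAL):
    """Return a copy of trust_policy that also lets service_principal assume the role."""
    policy = copy.deepcopy(trust_policy)
    for statement in policy.get("Statement", []):
        principal = statement.get("Principal", {})
        services = principal.get("Service", []) if isinstance(principal, dict) else []
        if isinstance(services, str):
            services = [services]
        # Simplified check: an Allow statement naming the service counts as trusted.
        if statement.get("Effect") == "Allow" and service_principal in services:
            return policy
    policy.setdefault("Statement", []).append({
        "Effect": "Allow",
        "Principal": {"Service": service_principal},
        "Action": "sts:AssumeRole",
    })
    return policy


def apply_to_role(role_name):
    """Read the role's trust policy, add EMR Serverless, and write it back."""
    import boto3  # imported lazily so the helper above runs without the SDK

    iam = boto3.client("iam")
    current = iam.get_role(RoleName=role_name)["Role"]["AssumeRolePolicyDocument"]
    iam.update_assume_role_policy(
        RoleName=role_name,
        PolicyDocument=json.dumps(add_trusted_service(current)),
    )


# Example trust policy of the kind a SageMaker execution role typically starts with.
existing = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "sagemaker.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}
print(json.dumps(add_trusted_service(existing), indent=2))
```

You can make the same change on the IAM console by editing the role's trust relationship.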

Import data in SageMaker Canvas

We start by importing the data from Amazon S3 using Amazon SageMaker Data Wrangler in SageMaker Canvas. Complete the following steps:

  1. In SageMaker Canvas, choose Data Wrangler in the navigation pane.
  2. On the Data flows tab, choose Tabular on the Import and prepare dropdown menu.
  3. Enter the S3 URI for the file and choose Go, then choose Next.
  4. Give your dataset a name, choose Random for Sampling method, then choose Import.

Importing data from the SageMaker Data Wrangler flow allows you to interact with a sample of the data before scaling the data preparation flow to the full dataset. This improves time and performance because you don’t need to work with the entirety of the data during preparation. You can later use EMR Serverless to handle the heavy lifting. When SageMaker Data Wrangler finishes importing, you can start transforming the dataset.

After you import the dataset, you can first look at the Data Quality Insights Report to see recommendations from SageMaker Canvas on how to improve the data quality and therefore improve the model’s performance.

  1. In the flow, choose the options menu (three dots) for the node, then choose Get data insights.
  2. Give your analysis a name, select Regression for Problem type, choose baseFare for Target column, select Sampled dataset for Data Size, then choose Create.

Assessing the data quality and analyzing the report’s findings is often the first step because it can guide the subsequent data preparation steps. Within the report, you will find dataset statistics, high priority warnings around target leakage, skewness, anomalies, and a feature summary.

Prepare the data with SageMaker Canvas

Now that you understand your dataset characteristics and potential issues, you can use the Chat for data prep feature in SageMaker Canvas to simplify data preparation with natural language prompts. This generative artificial intelligence (AI)-powered capability reduces the time, effort, and expertise required for the often complex tasks of data preparation.

  1. Choose the .flow file on the top banner to go back to your flow canvas.
  2. Choose the options menu for the node, then choose Chat for data prep.

For our first example, converting searchDate and flightDate to datetime format lets us perform date manipulations and extract useful features such as year, month, day, and the difference in days between searchDate and flightDate. These features can reveal temporal patterns in the data that influence the baseFare.

  1. Provide a prompt like “Convert searchDate and flightDate to datetime format” to view the code and choose Add to steps.
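
To make the goal of this step concrete, the following is a minimal sketch of the date conversion and the derived features described above. It is not the code that Chat for data prep generates: it assumes the two columns hold ISO-formatted strings (YYYY-MM-DD), works on a single made-up row, and uses hypothetical output column names such as daysUntilFlight.

```python
from datetime import date


def add_date_features(row):
    """Parse searchDate and flightDate, then derive calendar features from them."""
    search = date.fromisoformat(row["searchDate"])
    flight = date.fromisoformat(row["flightDate"])
    return {
        **row,
        "searchDate": search,
        "flightDate": flight,
        "flightYear": flight.year,
        "flightMonth": flight.month,
        "flightDay": flight.day,
        "flightDayOfWeek": flight.weekday(),  # Monday is 0
        "daysUntilFlight": (flight - search).days,
    }


# Made-up sample row; only the column names come from the dataset.
sample = {"searchDate": "2022-04-16", "flightDate": "2022-05-01"}
print(add_date_features(sample))
```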

In addition to data preparation using the chat UI, you can use LCNC transforms with the SageMaker Data Wrangler UI to transform your data. For example, we use one-hot encoding as a technique to convert categorical data into numerical format using the LCNC interface.

  1. Add the transform Encode categorical.
  2. Choose One-hot encode for Transform and add the following columns: startingAirport, destinationAirport, fareBasisCode, segmentsArrivalAirportCode, segmentsDepartureAirportCode, segmentsAirlineName, segmentsAirlineCode, segmentsEquipmentDescription, and segmentsCabinCode.
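
For readers new to the technique, the following is a minimal sketch of what one-hot encoding does to a single categorical column. It is independent of how SageMaker Data Wrangler implements the transform; the sample airport codes are illustrative values, and the generated column naming (column_category) is an assumption.

```python
def one_hot_encode(rows, column):
    """Replace column with one 0/1 indicator column per distinct category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


# Illustrative rows; only the column name comes from the dataset.
rows = [
    {"startingAirport": "ATL"},
    {"startingAirport": "BOS"},
    {"startingAirport": "ATL"},
]
for encoded_row in one_hot_encode(rows, "startingAirport"):
    print(encoded_row)
```

Each distinct value becomes its own column, which is why high-cardinality columns such as fareBasisCode can widen the dataset considerably.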

You can use the advanced search and filter option in SageMaker Canvas to select columns that are of String data type to simplify the process.

Refer to the SageMaker Canvas blog for other examples using SageMaker Data Wrangler. For this post, we simplify our efforts with these two steps, but we encourage you to use both chat and transforms to add data preparation steps on your own. In our testing, we successfully ran all our data preparation steps through the chat using the following prompts as an example:

  • “Add another step that extracts relevant features such as year, month, day, and day of the week which can enhance temporality to our dataset”
  • “Have Canvas convert the travelDuration, segmentsDurationInSeconds, and segmentsDistance column from string to numeric”
  • “Handle missing values by imputing the mean for the totalTravelDistance column, and replacing missing values as ‘Unknown’ for the segmentsEquipmentDescription column”
  • “Convert boolean columns isBasicEconomy, isRefundable, and isNonStop to integer format (0 and 1)”
  • “Scale numerical features like totalFare, seatsRemaining, totalTravelDistance using Standard Scaler from scikit-learn”
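
The last three prompts describe standard transformations. The following is a minimal sketch of each one on plain Python lists, assuming missing values are represented as None; it is not the code SageMaker Canvas generates. The scaling uses the population standard deviation, which matches the behavior of scikit-learn's StandardScaler.

```python
from statistics import fmean, pstdev


def impute_mean(values):
    """Replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = fmean(known)
    return [mean if v is None else v for v in values]


def bools_to_ints(values):
    """Convert True/False to 1/0."""
    return [int(v) for v in values]


def standard_scale(values):
    """Center to zero mean and scale to unit (population) standard deviation."""
    mean, std = fmean(values), pstdev(values)
    if std == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mean) / std for v in values]


# Made-up sample values for illustration.
print(impute_mean([100.0, None, 300.0]))
print(bools_to_ints([True, False, True]))
print(standard_scale([1.0, 2.0, 3.0]))
```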

When these steps are complete, you can move to the next step of processing the full dataset and creating a model.

(Optional) Export your data in Amazon S3 using an EMR Serverless job

You can process the entire 33 GB dataset by running the data flow using EMR Serverless for the data preparation job without worrying about the infrastructure.

  1. From the last node in the flow diagram, choose Export and Export data to Amazon S3.
  2. Provide a dataset name and output location.
  3. We recommend keeping Auto job configuration selected unless you want to change any of the Amazon EMR or SageMaker Processing configurations. (If your data is larger than 5 GB, data processing runs in EMR Serverless; otherwise, it runs within the SageMaker Canvas workspace.)
  4. Under EMR Serverless, provide a job name and choose Export.

You can view the job status in SageMaker Canvas on the Data Wrangler page on the Jobs tab.

You can also view the job status on the Amazon EMR Studio console by choosing Applications under Serverless in the navigation pane.

Create a model

You can also create a model at the end of your flow.

  1. Choose Create model from the node options, and SageMaker Canvas will create a dataset and then navigate you to create a model.
  2. Provide a dataset and model name, select Predictive analysis for Problem type, choose baseFare as the target column, then choose Export and create model.

The model creation process will take a couple of minutes to complete.

  1. Choose My Models in the navigation pane.
  2. Choose the model you just exported and navigate to version 1.
  3. Under Model type, choose Configure model.
  4. Select Numeric model type, then choose Save.
  5. On the dropdown menu, choose Quick Build to start the build process.

When the build is complete, on the Analyze page, you can view the following tabs:

  • Overview – This gives you a general overview of the model’s performance, depending on the model type.
  • Scoring – This shows visualizations that you can use to get more insights into your model’s performance beyond the overall accuracy metrics.
  • Advanced metrics – This contains your model’s scores for advanced metrics and additional information that can give you a deeper understanding of your model’s performance. You can also view information such as the column impacts.

Run inference

In this section, we walk through the steps to run batch predictions against the generated dataset.

  1. On the Analyze page, choose Predict.
  2. To generate predictions on your test dataset, choose Manual.
  3. Select the test dataset you created and choose Generate predictions.
  4. When the predictions are ready, either choose View in the pop-up message at the bottom of the page or navigate to the Status column to choose Preview on the options menu (three dots).

You’re now able to review the predictions.

You have now used the generative AI data preparation capabilities in SageMaker Canvas to prepare a large dataset, trained a model using AutoML techniques, and run batch predictions at scale. All of this was done with a few clicks and using a natural language interface.

Clean up

To avoid incurring future session charges, log out of SageMaker Canvas. To log out, choose Log out in the navigation pane of the SageMaker Canvas application.

When you log out of SageMaker Canvas, your models and datasets aren’t affected, but SageMaker Canvas cancels any Quick build tasks. If you log out of SageMaker Canvas while running a Quick build, your build might be interrupted until you relaunch the application. When you relaunch, SageMaker Canvas automatically restarts the build. Standard builds continue even if you log out.

Conclusion

The introduction of petabyte-scale AutoML support within SageMaker Canvas marks a significant milestone in the democratization of ML. By combining the power of generative AI, AutoML, and the scalability of EMR Serverless, we’re empowering organizations of all sizes to unlock insights and drive business value from even the largest and most complex datasets.

The benefits of ML are no longer confined to the domain of highly specialized experts. SageMaker Canvas is revolutionizing the way businesses approach data and AI, putting the power of predictive analytics and data-driven decision-making into the hands of everyone. Explore the future of no-code ML with SageMaker Canvas today.


About the authors

Bret Pontillo is a Sr. Solutions Architect at AWS. He works closely with enterprise customers building data lakes and analytical applications on the AWS platform. In his free time, Bret enjoys traveling, watching sports, and trying new restaurants.

Polaris Jhandi is a Cloud Application Architect with AWS Professional Services. He has a background in AI/ML & big data. He is currently working with customers to migrate their legacy Mainframe applications to the Cloud.

Peter Chung is a Solutions Architect serving enterprise customers at AWS. He loves to help customers use technology to solve business problems on various topics like cutting costs and leveraging artificial intelligence. He wrote a book on AWS FinOps, and enjoys reading and building solutions.

Delight your customers with great conversational experiences via QnABot, a generative AI chatbot

QnABot on AWS (an AWS Solution) now provides access to Amazon Bedrock foundational models (FMs) and Knowledge Bases for Amazon Bedrock, a fully managed end-to-end Retrieval Augmented Generation (RAG) workflow. You can now provide contextual information from your private data sources that can be used to create rich, contextual, conversational experiences.

The advent of generative artificial intelligence (AI) provides organizations unique opportunities to digitally transform customer experiences. Enterprises with contact center operations are looking to improve customer satisfaction by providing self-service, conversational, interactive chat bots that have natural language understanding (NLU). Enterprises want to automate frequently asked transactional questions, provide a friendly conversational interface, and improve operational efficiency. In turn, customers can ask a variety of questions and receive accurate answers powered by generative AI.

In this post, we discuss how to use QnABot on AWS to deploy a fully functional chatbot integrated with other AWS services, and delight your customers with human agent-like conversational experiences.

Solution overview

QnABot on AWS is an AWS Solution that enterprises can use to enable a multi-channel, multi-language chatbot with NLU to improve end customer experiences. QnABot provides a flexible, tiered conversational interface empowering enterprises to meet customers where they are and provide accurate responses. Some responses need to be exact (for example, regulated industries like healthcare or capital markets), some responses need to be searched from large, indexed data sources and cited, and some answers need to be generated on the fly, conversationally, based on semantic context. With QnABot on AWS, you can achieve all of the above by deploying the solution using an AWS CloudFormation template, with no coding required. The solution is extensible, uses AWS AI and machine learning (ML) services, and integrates with multiple channels such as voice, web, and text (SMS).

QnABot on AWS provides access to multiple FMs through Amazon Bedrock, so you can create conversational interfaces based on your customers’ language needs (such as Spanish, English, or French), sophistication of questions, and accuracy of responses based on user intent. You can access various large language models (LLMs) from leading AI enterprises (such as Amazon Titan, Anthropic Claude 3, Cohere Command, Meta Llama 3, Mistral AI Large Model, and others on Amazon Bedrock) to find the model best suited for your use case. Additionally, native integration with Knowledge Bases for Amazon Bedrock allows you to retrieve specific, relevant data from your data sources via pre-built data source connectors (Amazon Simple Storage Service (Amazon S3), Confluence, Microsoft SharePoint, Salesforce, or web crawlers); the data is automatically converted to text embeddings and stored in a vector database of your choice. You can then retrieve your company-specific information with source attribution (such as citations) to improve transparency and minimize hallucinations. Lastly, if you don’t want to set up custom integrations with large data sources, you can simply upload your documents and support multi-turn conversations. With prompt engineering, managed RAG workflows, and access to multiple FMs, you can provide your customers rich, human agent-like experiences with precise answers.

Deploying the QnABot solution builds the following environment in the AWS Cloud.

Figure 1: QnABot Architecture Diagram

The high-level process flow for the solution components deployed with the CloudFormation template is as follows:

  1. The admin deploys the solution into their AWS account, opens the Content Designer UI or Amazon Lex web client, and uses Amazon Cognito to authenticate.
  2. After authentication, Amazon API Gateway and Amazon S3 deliver the contents of the Content Designer UI.
  3. The admin configures questions and answers in the Content Designer and the UI sends requests to API Gateway to save the questions and answers.
  4. The Content Designer AWS Lambda function saves the input in Amazon OpenSearch Service in a question bank index. If using text embeddings, these requests first pass through an LLM hosted on Amazon Bedrock or Amazon SageMaker to generate embeddings before being saved into the question bank on OpenSearch Service.
  5. Users of the chatbot interact with Amazon Lex through the web client UI, Amazon Alexa, or Amazon Connect.
  6. Amazon Lex forwards requests to the Bot Fulfillment Lambda function. Users can also send requests to this Lambda function through Amazon Alexa devices.
  7. The user and chat information is stored in Amazon DynamoDB to disambiguate follow-up questions from previous question and answer context.
  8. The Bot Fulfillment Lambda function takes the user’s input and uses Amazon Comprehend and Amazon Translate (if necessary) to translate non-native language requests to the native language selected by the user during the deployment, and then looks up the answer in OpenSearch Service. If using LLM features such as text generation and text embeddings, these requests first pass through various LLM models hosted on Amazon Bedrock or SageMaker to generate the search query and embeddings to compare with those saved in the question bank on OpenSearch Service.
  9. If no match is returned from the OpenSearch Service question bank, then the Bot Fulfillment Lambda function forwards the request as follows:
    1. If an Amazon Kendra index is configured for fallback, then the Bot Fulfillment Lambda function forwards the request to Amazon Kendra. The text generation LLM can optionally be used to create the search query and synthesize a response from the returned document excerpts.
    2. If a knowledge base ID is configured, the Bot Fulfillment Lambda function forwards the request to the knowledge base. The Bot Fulfillment Lambda function uses the RetrieveAndGenerate API to fetch the relevant results for a user query, augment the FM’s prompt, and return the response.
  10. User interactions with the Bot Fulfillment function generate logs and metrics data, which is sent to Amazon Kinesis Data Firehose and then to Amazon S3 for later data analysis.
  11. OpenSearch Dashboards can be used to view usage history, logged utterances, no hits utterances, positive user feedback, and negative user feedback, and also provides the ability to create custom reports.
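
The lookup-and-fallback logic in steps 8 and 9 can be sketched as a chain of lookups. The following is an illustrative outline only, not QnABot's actual implementation: the function names are hypothetical, each lookup is passed in as a plain callable, and trying Amazon Kendra before the knowledge base is an assumption made for the example.

```python
from typing import Callable, Optional, Tuple

Lookup = Callable[[str], Optional[str]]


def answer_question(
    question: str,
    search_question_bank: Lookup,
    query_kendra: Optional[Lookup] = None,
    query_knowledge_base: Optional[Lookup] = None,
    no_hits_answer: str = "Sorry, I could not find an answer.",
) -> Tuple[str, str]:
    """Return (source, answer), trying the question bank before any fallback."""
    answer = search_question_bank(question)
    if answer is not None:
        return "question_bank", answer
    # Fallbacks are consulted only when configured (not None) and in this order.
    fallbacks = (("kendra", query_kendra), ("knowledge_base", query_knowledge_base))
    for source, lookup in fallbacks:
        if lookup is None:
            continue
        answer = lookup(question)
        if answer is not None:
            return source, answer
    return "no_hits", no_hits_answer


# Toy lookups standing in for OpenSearch Service and a knowledge base.
faq = {"what are your hours?": "We are open 9 to 5."}
print(answer_question(
    "Do you ship overseas?",
    search_question_bank=lambda q: faq.get(q.lower()),
    query_knowledge_base=lambda q: "Generated answer from the knowledge base.",
))
```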

Prerequisites

To get started, you need the following:

  • An AWS account
  • An active deployment of QnABot on AWS (version 6.0.0 or later)
  • Amazon Bedrock model access (required) for all embeddings and LLM models that will be used in QnABot

Figure 2: Request Access to Bedrock Foundational Models (FMs)

In the following sections, we explore some of QnABot’s generative AI features.

Semantic question matching using an embeddings LLM

QnABot on AWS can use text embeddings to provide semantic search capabilities by using LLMs. The goal of this feature is to improve question matching accuracy while reducing the amount of tuning required when compared to the default OpenSearch Service keyword-based matching.

Some of the benefits include:

  • Improved FAQ accuracy from semantic matching vs. keyword matching (comparing the meaning vs. comparing individual words)
  • Fewer training utterances required to match a diverse set of queries
  • Better multi-language support, because translated utterances only need to match the meaning of the stored text, not the wording

Configure Amazon Bedrock to enable semantic question matching

To enable these expanded semantic search capabilities, QnABot uses an Amazon Bedrock FM to generate text embeddings provided using the EmbeddingsBedrockModelId CloudFormation stack parameter. These models provide the best performance and operate on a pay-per-request model. At the time of writing, the following embeddings models are supported by QnABot on AWS:

For the CloudFormation stack, set the following parameters:

  • Set EmbeddingsApi to BEDROCK
  • Set EmbeddingsBedrockModelId to one of the available options

For example, with semantic matching enabled, the question “What’s the address of the White House?” matches to “Where does the President live?” This example doesn’t match using keywords because they don’t share any of the same words.
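
The idea behind semantic matching can be sketched with a similarity score between vectors. The following is a toy illustration, not QnABot's scoring code: the three-dimensional vectors are invented stand-ins for real embeddings, cosine similarity is an assumed scoring function, and the 0.8 threshold is arbitrary.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(query_vector, stored, threshold):
    """Return (question, score) for the closest stored question, or None if below threshold."""
    scored = ((q, cosine_similarity(query_vector, v)) for q, v in stored.items())
    best = max(scored, key=lambda pair: pair[1], default=None)
    if best is None or best[1] < threshold:
        return None
    return best


# Invented toy vectors; real embeddings have hundreds or thousands of dimensions.
stored = {
    "Where does the President live?": [0.9, 0.1, 0.2],
    "Where did Humpty Dumpty sit?": [0.1, 0.9, 0.3],
}
query = [0.8, 0.2, 0.1]  # stands in for "What's the address of the White House?"
print(best_match(query, stored, threshold=0.8))
```

The two questions share no keywords, yet their vectors point in nearly the same direction, so the first stored question is returned.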

Figure 3: Semantic matching in QnABot

In the UI designer, you can set ENABLE_DEBUG_RESPONSE to true to see the user input, source, or any errors of the answer, as illustrated in the preceding screenshot.

You can also evaluate the matching score on the TEST tab in the content designer UI. In this example, we add a match on “qna item question” with the question “Where does the President live?”

Figure 4: Test and evaluate answers in QnABot

Similarly, you can try a match on “item text passage” with the question “Where did Humpty Dumpty sit?”

Figure 5: Match items or text passages in QnABot

Recommendations for tuning with an embeddings LLM

When using embeddings in QnABot, we recommend generalizing questions because more user utterances will match a general statement. For example, the embeddings LLM will cluster “checking” and “savings” with “account,” so if you want to match both account types, use “account” in your questions.

Similarly, for the question and utterance of “transfer to an agent,” consider using “transfer to someone” because it will better match with “agent,” “representative,” “human,” “person,” and so on.

In addition, we recommend tuning EMBEDDINGS_SCORE_THRESHOLD, EMBEDDINGS_SCORE_ANSWER_THRESHOLD, and EMBEDDINGS_TEXT_PASSAGE_SCORE_THRESHOLD based on the scores. The default values are generalized across multiple models, but you might need to modify them based on your embeddings model and your experiments.

Text generation and query disambiguation using a text LLM

QnABot on AWS can use LLMs to provide a richer, more conversational chat experience. The goal of these features is to minimize the amount of individually curated answers administrators are required to maintain, improve question matching accuracy by providing query disambiguation, and enable the solution to provide more concise answers to users, especially when using a knowledge base in Amazon Bedrock or the Amazon Kendra fallback feature.

Configure an Amazon Bedrock FM with AWS CloudFormation

To enable these capabilities, QnABot uses one of the Amazon Bedrock FMs to generate text, specified using the LLMBedrockModelId CloudFormation stack parameter. These models provide the best performance and operate on a pay-per-request model.

For the CloudFormation stack, set the following parameters:

  • Set LLMApi to BEDROCK
  • Set LLMBedrockModelId to one of the available LLM options

Figure 6: Setup QnABot to use Bedrock FMs

Query disambiguation (LLM-generated query)

By using an LLM, QnABot can take the user’s chat history and generate a standalone question for the current utterance. This enables users to ask follow-up questions that on their own may not be answerable without the context of the conversation. The new disambiguated, or standalone, question can then be used as the search query to retrieve the best FAQ, passage, or Amazon Kendra match.

In QnABot’s Content Designer, you can further customize the prompt and model listed in the Query Matching section:

  • LLM_GENERATE_QUERY_PROMPT_TEMPLATE – The prompt template used to construct a prompt for the LLM to disambiguate a follow-up question. The template may use the following placeholders:
    • history – A placeholder for the last LLM_CHAT_HISTORY_MAX_MESSAGES messages in the conversational history, to provide conversational context.
    • input – A placeholder for the current user utterance or question.
  • LLM_GENERATE_QUERY_MODEL_PARAMS – The parameters sent to the LLM when disambiguating follow-up questions. Refer to the relevant model documentation for additional values that the model provider accepts.
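
To show how the two placeholders fit together, the following is a minimal sketch of filling a disambiguation prompt. It is not QnABot's code: the template wording, the brace syntax ({history}, {input}), the speaker labels, and the sample assistant reply are all assumptions for illustration, and max_messages plays the role of LLM_CHAT_HISTORY_MAX_MESSAGES.

```python
import re

# Hypothetical template text; QnABot ships its own default.
TEMPLATE = (
    "Given the chat history and a follow-up question, rewrite the follow-up "
    "as a standalone question.\n\n"
    "Chat history:\n{history}\n\n"
    "Follow-up: {input}\n"
    "Standalone question:"
)


def build_disambiguation_prompt(template, messages, utterance, max_messages):
    """Fill {history} with the last max_messages turns and {input} with the utterance."""
    recent = messages[-max_messages:] if max_messages > 0 else []
    values = {
        "history": "\n".join(f"{role}: {text}" for role, text in recent),
        "input": utterance,
    }
    # Single pass, so placeholder-like text inside the values is left untouched.
    return re.sub(r"\{(history|input)\}", lambda m: values[m.group(1)], template)


messages = [
    ("Human", "Who was Little Bo Peep"),
    ("AI", "A nursery rhyme character who lost her sheep."),  # made-up reply
]
print(build_disambiguation_prompt(TEMPLATE, messages, "Did she find them again?", 10))
```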

The following screenshot shows an example with the new LLM disambiguation feature enabled, given the chat history context after answering “Who was Little Bo Peep” and the follow-up question “Did she find them again?”

Figure 7: LLM query disambiguation feature enabled

QnABot rewrites that question to provide all the context required to search for the relevant FAQ or passage: “Did Little Bo Peep find her lost sheep again?”

Figure 8: With query disambiguation with LLMs, context is maintained

Answer text generation using QnABot

You can now generate answers to questions from context provided by knowledge base search results, or from text passages created or imported directly into QnABot. This allows you to generate answers that reduce the number of FAQs you have to maintain, because you can now synthesize concise answers from your existing documents in a knowledge base, Amazon Kendra index, or document passages stored in QnABot as text items. Additionally, your generated answers can be concise and therefore suitable for voice or contact center chatbots, website bots, and SMS bots. Lastly, these generated answers are compatible with the solution’s multi-language support—customers can interact in their chosen languages and receive generated answers in the same language.

With QnABot, you can use two different data sources to generate responses: text passages or a knowledge base in Amazon Bedrock.

Generate answers to questions from text passages

In the content designer web interface, administrators can store full text passages for QnABot on AWS to use. When a question gets asked that matches against this passage, the solution can use LLMs to answer the user’s question based on information found within the passage. We highly recommend you use this option with semantic question matching using Amazon Bedrock text embedding. In QnABot content designer, you can further customize the prompt and model listed under Text Generation using the General Settings section.

Let’s look at a text passage example:

  1. In the Content Designer, choose Add.
  2. Select the text, enter an item ID and a passage, and choose Create.

You can also import your passages from a JSON file using the Content Designer Import feature. On the tools menu, choose Import, open Examples/Extensions, and choose LOAD next to TextPassage-NurseryRhymeExamples to import two nursery rhyme text items.

The following example shows QnABot generating an answer using a text passage item that contains the nursery rhyme, in response to the question “Where did Humpty Dumpty sit?”

Figure 9: Generate answers from text passages

You can also use query disambiguation and text generation together, by asking “Who tried to fix Humpty Dumpty?” and the follow-up question “Did they succeed?”

Figure 10: Text generation with query disambiguation to maintain context

You can also modify LLM_QA_PROMPT_TEMPLATE in the Content Designer to answer in different languages. In the template, you can specify the prompt and the answers in different languages (for example, prompts in French or Spanish).

Figure 11: Answer in different languages

You can also specify answers in two languages with bulleted points.

Figure 12: Answer in multiple languages

RAG using an Amazon Bedrock knowledge base

By integrating with a knowledge base, QnABot on AWS can generate concise answers to users’ questions from configured data sources. This removes the need for users to sift through larger text passages to find the answer. You can also create your own knowledge base from files stored in an S3 bucket. Amazon Bedrock knowledge bases with QnABot don’t require EmbeddingsApi and LLMApi because the embeddings and generative response are already provided by the knowledge base. To enable this option, create an Amazon Bedrock knowledge base and use your knowledge base ID for the CloudFormation stack parameter BedrockKnowledgeBaseId.
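
QnABot makes the knowledge base call for you, but it can help to see the shape of a RetrieveAndGenerate request. The following is a minimal sketch using the AWS SDK for Python (boto3); the helper names are hypothetical, the knowledge base ID and model ARN are placeholders you must supply, and `ask_knowledge_base` is defined but not called because it requires AWS credentials and an existing knowledge base.

```python
def build_retrieve_and_generate_request(question, knowledge_base_id, model_arn):
    """Build the keyword arguments for a RetrieveAndGenerate call."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
            },
        },
    }


def ask_knowledge_base(question, knowledge_base_id, model_arn):
    """Send the question to the knowledge base and return (answer text, citations)."""
    import boto3  # imported lazily so the request builder runs without the SDK

    client = boto3.client("bedrock-agent-runtime")
    response = client.retrieve_and_generate(
        **build_retrieve_and_generate_request(question, knowledge_base_id, model_arn)
    )
    return response["output"]["text"], response.get("citations", [])


# Placeholder identifiers; replace them with values from your own account.
request = build_retrieve_and_generate_request(
    "What services are available in AWS for container orchestration?",
    knowledge_base_id="YOUR_KNOWLEDGE_BASE_ID",
    model_arn="YOUR_MODEL_ARN",
)
print(request)
```

The citations returned alongside the answer are what allow a response to carry source attribution.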

To configure QnABot to use the knowledge base, refer to Create a knowledge base. The following is a quick setup guide to get started:

  1. Provide your knowledge base details.

Figure 13: Setup Amazon Bedrock Knowledge Base for RAG use cases

  2. Configure your data source based on the available options. For this example, we use Amazon S3 as the data source. Note that the bucket name must start with qna or QNA.

Figure 14: Setup your RAG data sources for Amazon Knowledge Base

  3. Upload your documents to Amazon S3. For this example, we uploaded the aws-overview.pdf whitepaper to test integration.
  4. Create or choose your vector database store to allow Amazon Bedrock to store, update, and manage embeddings.
  5. Sync the data source and use your knowledge base ID for the CloudFormation stack parameter BedrockKnowledgeBaseId.

Figure 15: Complete setting up Amazon Bedrock Knowledge Base for your RAG use cases

In QnABot Content Designer, you can customize the additional settings listed under Text Generation using RAG with the Amazon Bedrock knowledge base.

QnABot on AWS can now answer questions from the AWS whitepapers, such as “What services are available in AWS for container orchestration?” and “Are there any upfront fees with ECS?”

Figure 16: Generate answers from your Amazon Bedrock Knowledge Base (RAG)

Conclusion

Customers expect quick and efficient service from enterprises in today’s fast-paced world. But providing excellent customer experience can be significantly challenging when the volume of inquiries outpaces the human resources employed to address them. Companies of all sizes can use QnABot on AWS with built-in Amazon Bedrock integrations to provide access to many market leading FMs, provide specialized lookup needs using RAG to reduce hallucinations, and provide a friendly AI conversational experience. With QnABot on AWS, you can provide high-quality natural text conversations, content management, and multi-turn dialogues. The solution comes with one-click deployment for custom implementation, a content designer for Q&A management, and rich reporting. You can also integrate with contact center systems like Amazon Connect and Genesys Cloud CX. Get started with QnABot on AWS.


About the Author

Ajay Swamy is the Product Leader for Data, ML and Generative AI AWS Solutions. He specializes in building AWS Solutions (production-ready software packages) that deliver compelling value to customers by solving for their unique business needs. Other than QnABot on AWS, he manages Generative AI Application Builder, Enhanced Document Understanding, Discovering Hot Topics using Machine Learning, and other AWS Solutions. He lives with his wife and dog (Figaro), in New York, NY.

Abhishek Patil is a Software Development Engineer at Amazon Web Services (AWS) based in Atlanta, GA, USA. With over 7 years of experience in the tech industry, he specializes in building distributed software systems, with a primary focus on Generative AI and Machine Learning. Abhishek is a primary builder on AI solution QnABot on AWS and has contributed to other AWS Solutions including Discovering Hot Topics using Machine Learning and OSDU® Data Platform. Outside of work, Abhishek enjoys spending time outdoors, reading, resistance training, and practicing yoga.

Introducing document-level sync reports: Enhanced data sync visibility in Amazon Q Business

Amazon Q Business is a fully managed, generative artificial intelligence (AI)-powered assistant that helps enterprises unlock the value of their data and knowledge. With Amazon Q, you can quickly find answers to questions, generate summaries and content, and complete tasks by using the information and expertise stored across your company’s various data sources and enterprise systems. At the core of this capability are native data source connectors that seamlessly integrate and index content from multiple repositories into a unified index. This enables the Amazon Q large language model (LLM) to provide accurate, well-written answers by drawing from the consolidated data and information. The data source connectors act as a bridge, synchronizing content from disparate systems like Salesforce, Jira, and SharePoint into a centralized index that powers the natural language understanding and generative abilities of Amazon Q.

Customers appreciate that Amazon Q Business securely connects to over 40 data sources. While using their data source, they want better visibility into the document processing lifecycle during data source sync jobs. They want to know the status of each document they attempted to crawl and index, and they want to be able to troubleshoot why certain documents did not surface in the expected answers. Additionally, they want access to metadata, timestamps, and access control lists (ACLs) for the indexed documents.

We are pleased to announce a new feature now available in Amazon Q Business that significantly improves visibility into data source sync operations. The latest release introduces a comprehensive document-level report incorporated into the sync history, providing administrators with granular indexing status, metadata, and ACL details for every document processed during a data source sync job. This enhancement to sync job observability enables administrators to quickly investigate and resolve ingestion or access issues encountered while setting up an Amazon Q Business application. The detailed document reports are persisted in the new SYNC_RUN_HISTORY_REPORT log stream under the Amazon Q Business application log group, so critical sync job details are available on-demand when troubleshooting.

Lifecycle of a document in a data source sync run job

In this section, we examine the lifecycle of a document within a data source sync in Amazon Q Business. This provides valuable insight into the sync process. The data source sync comprises three key stages: crawling, syncing, and indexing. Crawling involves the connector connecting to the data source and extracting documents meeting the defined sync scope according to the data source configuration. These documents are then synced to Amazon Q Business during the syncing phase. Finally, indexing makes the synced documents searchable within the Amazon Q Business environment.

The following diagram shows a flowchart of a sync run job.

Crawling stage

The first stage is the crawling stage, where the connector crawls all documents and their metadata from the data source. During this stage, the connector also compares the checksum of the document against the Amazon Q index to determine whether a particular document needs to be added, modified, or deleted from the index. This operation corresponds to the CrawlAction field in the sync run history report.

If the document is unmodified, it is marked as UNMODIFIED and skipped in the rest of the stages. If any document fails in the crawling stage, for example because of throttling errors, broken content, or a document size that is too large, that document is marked as failed in the sync run history report with the CrawlStatus as FAILED. If the document was skipped due to any validation errors, its CrawlStatus is marked as SKIPPED. These documents are not sent forward to the next stage. All successful documents are marked as SUCCESS and are sent forward.

We also capture the ACLs and metadata of each document in this stage so they can be added to the sync run history report.

Syncing stage

During the syncing stage, the document is sent to Amazon Q Business ingestion service APIs like BatchPutDocument and BatchDeleteDocument. After a document is submitted to these APIs, Amazon Q Business runs validation checks on the submitted documents. If any document fails these checks, its SyncStatus is marked as FAILED. If there is an irrecoverable error for a particular document, it is marked as SKIPPED and other documents are sent forward.

Indexing stage

In this step, Amazon Q Business parses the document, processes it according to its content type, and persists it in the index. If the document fails to be persisted, its IndexStatus is marked as FAILED; otherwise, it is marked as SUCCESS.

After the statuses of all the stages have been captured, we emit these statuses as an Amazon CloudWatch event to the customer’s AWS account.
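To make the stage-by-stage flow concrete, the following Python sketch walks a document through the crawl, sync, and index statuses in order and reports the first status that is not SUCCESS. It is an illustration of the lifecycle described above, not the service's actual logic. The function name consolidated_status is hypothetical, and returning the stage status unchanged (including UNMODIFIED) is an assumption, because how unmodified documents appear in the consolidated field isn't stated here.

```python
from typing import Optional


def consolidated_status(
    crawl_status: str,
    sync_status: Optional[str] = None,
    index_status: Optional[str] = None,
) -> str:
    """Return the first non-SUCCESS stage status, or SUCCESS if all stages passed.

    Stages are evaluated in order: crawling, syncing, indexing. A document
    that does not succeed in one stage is not sent to the next, so later
    statuses may be omitted in that case.
    """
    for status in (crawl_status, sync_status, index_status):
        if status is None:
            # A stage that should have run has no recorded status.
            raise ValueError("missing status for a stage that should have run")
        if status != "SUCCESS":
            # Assumption: the stage status (FAILED, SKIPPED, UNMODIFIED)
            # is reported as-is.
            return status
    return "SUCCESS"


print(consolidated_status("SUCCESS", "SUCCESS", "SUCCESS"))
print(consolidated_status("SUCCESS", "SKIPPED"))
print(consolidated_status("UNMODIFIED"))
```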

Key features and benefits of document-level reports

The following are the key features and benefits of the new document-level report in Amazon Q Business applications:

  • Enhanced sync run history page – A new Actions column has been added to the sync run history page, providing access to the document-level report for each sync run.
  • Dedicated log stream – A new log stream named SYNC_RUN_HISTORY_REPORT has been created in the Amazon Q Business CloudWatch log group, containing the document-level report.
  • Comprehensive document information – The document-level report includes the following information for each document:
    • Document ID – The document ID, either inherited directly from the data source or mapped by the customer in the data source field mappings.
    • Document title – The title of the document, either taken from the data source or mapped by the customer in the data source field mappings.
    • Consolidated document status – The final consolidated status of the document: SUCCESS, FAILED, or SKIPPED. If the document was successfully processed in all stages, the value is SUCCESS. If the document failed or was skipped in any of the stages, the value is FAILED or SKIPPED.
    • Error message (if the document failed) – The error message with which the document failed. If a document was skipped due to throttling errors or any internal errors, this is also shown in the error message field.
    • Crawl status – Whether the document was crawled successfully from the data source. This status correlates to the syncing-crawling state in the data source sync.
    • Sync status – Whether the document was sent for syncing successfully. This status correlates to the syncing-indexing state in the data source sync.
    • Index status – Whether the document was successfully persisted in the index.
    • ACLs – A list of document-level permissions that were crawled from the data source. Each element in the list contains the following details:
      • Global name: The email or user name of the user. This field is mapped across multiple data sources. For example, suppose a user has three data sources (Confluence, SharePoint, and Gmail) with the local user IDs confluence_user, sharepoint_user, and gmail_user, respectively, and their email address user@email.com is the globalName in the ACL for all of them. Amazon Q Business then understands that all of these local user IDs map to the same global name.
      • Name: The local unique ID of the user, which is assigned by the data source.
      • Type: The principal type. This can be either USER or GROUP.
      • Is Federated: A Boolean flag that indicates whether the group is of INDEX level (true) or DATASOURCE level (false).
      • Access: Whether the user is explicitly allowed or denied access. Values can be either ALLOWED or DENIED.
      • Data source ID: The data source ID. For federated groups (INDEX level), this field is null.
    • Metadata – The metadata fields (other than ACL) that were pulled from the data source. This list also includes the metadata fields mapped by the customer in the data source field mappings, as well as extra metadata fields added by the connector.
    • Hashed document ID (for troubleshooting assistance) – To safeguard your data privacy, we present a secure, one-way hash of the document identifier. This hashed value enables the Amazon Q Business team to efficiently locate and analyze the specific document within our logs, should you encounter any issue that requires further investigation and resolution.
    • Timestamp – When the document status was logged in CloudWatch.
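As a rough illustration of how the ACL list could be inspected after you export a report record, the following Python sketch collects the access values recorded for one global name. It is a sketch under stated assumptions: only globalName is named in this post, so the other keys (name, type, access, dataSourceId) are assumed spellings of the fields described above, and the function access_for_user and the sample data are hypothetical.

```python
def access_for_user(acl_entries, global_name):
    """Return the set of access values recorded for a user's global name.

    acl_entries is a list of dictionaries, one per ACL element. Only
    entries of type USER are considered, and the global name comparison
    is case-insensitive.
    """
    wanted = global_name.lower()
    return {
        entry.get("access")
        for entry in acl_entries
        if entry.get("type") == "USER"
        and (entry.get("globalName") or "").lower() == wanted
    }


# Hypothetical ACL list; the key names other than globalName are assumptions.
sample_acl = [
    {"globalName": "user@email.com", "name": "confluence_user",
     "type": "USER", "access": "ALLOWED", "dataSourceId": "ds-confluence"},
    {"globalName": "user@email.com", "name": "sharepoint_user",
     "type": "USER", "access": "ALLOWED", "dataSourceId": "ds-sharepoint"},
    {"globalName": "analysts", "name": "analysts",
     "type": "GROUP", "access": "DENIED", "dataSourceId": None},
]

print(access_for_user(sample_acl, "user@email.com"))
```

An empty set means no user-level entry was found for that global name, which is a useful first signal when a user cannot retrieve a document.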

In the following sections, we explore different use cases for the logging feature.

Troubleshoot “Sorry, I could not find relevant information” with the new logging feature

The new document-level logging feature in Amazon Q Business can help troubleshoot common issues related to the “Sorry, I could not find relevant information to complete your request” response.

Let’s explore an example scenario. A mutual funds manager uses Amazon Q Business chat for knowledge retrieval and insights extraction across their enterprise data stores. When the fund manager asks, “What is the CAGR of the multi-asset fund?” in the Amazon Q chat, they receive the “Sorry, I could not find relevant information to complete your request” response.

As the administrator managing their Amazon Q Business application, you can troubleshoot the issue using the following approach with the new logging feature. First, you want to determine whether the multi-asset fund document was successfully indexed in the Amazon Q Business application. Next, you need to verify if the fund manager’s user account has the required permission to read the information from the multi-asset fund document. Amazon Q Business enforces the document permissions configured in its data source, and you can use this new feature to verify that the document ACL settings are synced in the Amazon Q Business application index.

You can use the following CloudWatch Logs Insights query to check the document ACL settings:

filter @logStream like 'SYNC_RUN_HISTORY_REPORT/' 
and DocumentTitle = "your-document-title"
| fields DocumentTitle, ConnectorDocumentStatus.Status, Acl
| sort @timestamp desc
| limit 1

This query filter uses the per-document-level logging stream SYNC_RUN_HISTORY_REPORT, and displays the document title and its associated ACL settings. By verifying the document indexing and permissions, you can identify and resolve potential issues that may be causing the “Sorry, I could not find relevant information” response.

The following screenshot shows an example result.
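If you run this check for many documents, you might prefer to build the query string in code rather than edit it by hand. The following Python sketch assembles the same query for a given document title. The function name acl_check_query is hypothetical, and the sketch rejects titles that contain double quotation marks rather than guessing at escaping rules.

```python
def acl_check_query(document_title: str) -> str:
    """Build the ACL check query for one document title."""
    if '"' in document_title:
        # Avoid producing a malformed query; escaping is intentionally
        # not attempted in this sketch.
        raise ValueError("document titles with double quotes are not supported")
    return "\n".join([
        "filter @logStream like 'SYNC_RUN_HISTORY_REPORT/'",
        f'and DocumentTitle = "{document_title}"',
        "| fields DocumentTitle, ConnectorDocumentStatus.Status, Acl",
        "| sort @timestamp desc",
        "| limit 1",
    ])


# Hypothetical document title, used only for illustration.
print(acl_check_query("Multi-asset fund factsheet"))
```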

Determine the optimal boosting duration for recent documents using document-level reporting

When it comes to generating accurate answers, you may want to fine-tune the way Amazon Q prioritizes its content. For instance, you may prefer to boost recent documents over older ones to make sure the most up-to-date passages are used to generate an answer. To achieve this, you can use the relevance tuning feature in Amazon Q Business to boost documents based on the last update date attribute, with a specified boosting duration. However, determining the optimal boosting period can be challenging when dealing with a large number of frequently changing documents.

You can now use the per-document-level report to obtain the _last_updated_at metadata field information for your documents, which can help you determine the appropriate boosting period. To do so, use the following CloudWatch Logs Insights query to retrieve the _last_updated_at metadata attribute for machine learning documents from the SYNC_RUN_HISTORY_REPORT log stream:

filter @logStream like 'SYNC_RUN_HISTORY_REPORT/' 
and Metadata like 'Machine Learning'
| parse Metadata '{"key":"_last_updated_at","value":{"dateValue":"*"}}' as @last_updated_at
| sort @last_updated_at desc, @timestamp desc
| dedup DocumentTitle

With the preceding query, you can gain insights into the last updated timestamps of your documents, enabling you to make informed decisions about the optimal boosting period. This approach makes sure your chat responses are generated using the most recent and relevant information, enhancing the overall accuracy and effectiveness of your Amazon Q Business implementation.

The following screenshot shows an example result.
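One way to turn the retrieved timestamps into a candidate boosting period is to look at how old your documents are. The following Python sketch computes the age, in days, that covers a chosen fraction of documents. This heuristic is our assumption, not a rule from Amazon Q Business. The sketch also assumes the _last_updated_at values are ISO 8601 strings, and the names parse_timestamp and age_covering_fraction are hypothetical.

```python
import math
from datetime import datetime, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, treating naive values as UTC."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def age_covering_fraction(last_updated_values, now, fraction=0.5):
    """Return the document age in days that covers `fraction` of documents.

    With fraction=0.5, half of the documents were updated within the
    returned number of days before `now`.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    ages = sorted(
        (now - parse_timestamp(value)).total_seconds() / 86400
        for value in last_updated_values
    )
    if not ages:
        raise ValueError("no timestamps provided")
    return ages[math.ceil(fraction * len(ages)) - 1]


# Hypothetical timestamps, used only for illustration.
values = [
    "2024-06-29T00:00:00Z",
    "2024-06-20T00:00:00Z",
    "2024-05-31T00:00:00Z",
    "2024-03-22T00:00:00Z",
]
reference = datetime(2024, 6, 30, tzinfo=timezone.utc)
print(age_covering_fraction(values, reference, fraction=0.5))
```

Treat the result as a starting point for the boosting duration and adjust it after checking the answers your users receive.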

Common document indexing observability and troubleshooting methods

In this section, we explore some common admin tasks for observing and troubleshooting document indexing using the new document-level reporting feature.

List all successfully indexed documents from a data source

To retrieve a list of all documents that have been successfully indexed from a specific data source, you can use the following CloudWatch query:

fields DocumentTitle, DocumentId, @timestamp
| filter @logStream like 'SYNC_RUN_HISTORY_REPORT/your-data-source-id/'
and ConnectorDocumentStatus.Status = "SUCCESS"
| sort @timestamp desc | dedup DocumentTitle, DocumentId

The following screenshot shows an example result. 

List all successfully indexed documents from a data source sync job

To retrieve a list of all documents that have been successfully indexed during a specific sync job, you can use the following CloudWatch query:

fields DocumentTitle, DocumentId, ConnectorDocumentStatus.Status AS IndexStatus, @timestamp
| filter @logStream like 'SYNC_RUN_HISTORY_REPORT/your-data-source-id/run-id'
and ConnectorDocumentStatus.Status = "SUCCESS"
| sort DocumentTitle

The following screenshot shows an example result.

List all documents that failed indexing in a data source sync job

To retrieve a list of all documents that failed to index during a specific sync job, along with the error messages, you can use the following CloudWatch query:

fields DocumentTitle, DocumentId, ConnectorDocumentStatus.Status AS IndexStatus, ErrorMsg, @timestamp
| filter @logStream like 'SYNC_RUN_HISTORY_REPORT/your-data-source-id/run-id'
and ConnectorDocumentStatus.Status = "FAILED"
| sort @timestamp desc

The following screenshot shows an example result.
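When a sync job has many failures, it can help to group them by error message before investigating individual documents. The following Python sketch does this for rows shaped like the output of the preceding query, using the IndexStatus alias and the ErrorMsg field from that query. The function name summarize_failures and the sample rows are hypothetical.

```python
from collections import Counter


def summarize_failures(rows):
    """Count failed documents per error message, most frequent first."""
    counts = Counter(
        row.get("ErrorMsg") or "(no error message)"
        for row in rows
        if row.get("IndexStatus") == "FAILED"
    )
    return counts.most_common()


# Hypothetical rows; the error messages are placeholders, not real service output.
sample_rows = [
    {"DocumentTitle": "a.pdf", "IndexStatus": "FAILED", "ErrorMsg": "example error A"},
    {"DocumentTitle": "b.pdf", "IndexStatus": "FAILED", "ErrorMsg": "example error A"},
    {"DocumentTitle": "c.pdf", "IndexStatus": "FAILED", "ErrorMsg": "example error B"},
    {"DocumentTitle": "d.pdf", "IndexStatus": "SUCCESS", "ErrorMsg": None},
]

for message, count in summarize_failures(sample_rows):
    print(count, message)
```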

List all documents that contain a particular user’s ACL permission in an Amazon Q Business application

To retrieve a list of documents that have a specific user’s ACL permission, you can use the following CloudWatch Logs Insights query:

filter @logStream like 'SYNC_RUN_HISTORY_REPORT/' 
and Acl like 'aneesh@mydemoaws.onmicrosoft.com'
| display DocumentTitle, SourceUri

The following screenshot shows an example result.

List the ACL of an indexed document from a data source sync job

To retrieve the ACL information for a specific indexed document from a sync job, you can use the following CloudWatch Logs Insights query:

filter @logStream like 'SYNC_RUN_HISTORY_REPORT/your-data-source-id/run-id'
and DocumentTitle = "your-document-title"
| display DocumentTitle, Acl

The following screenshot shows an example result.

List metadata of an indexed document from a data source sync job

To retrieve the metadata information for a specific indexed document from a sync job, you can use the following CloudWatch Logs Insights query:

filter @logStream like 'SYNC_RUN_HISTORY_REPORT/your-data-source-id/run-id'
and DocumentTitle = "your-document-title"
| display DocumentTitle, Metadata

The following screenshot shows an example result.
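You can also run the queries in this section programmatically. The following Python sketch uses the CloudWatch Logs start_query and get_query_results operations through boto3 and converts each result row into a dictionary. It is a minimal sketch, not a production client: the function names are hypothetical, the log group name is a parameter you must supply for your own application, and boto3 is imported lazily because it is a third-party dependency.

```python
import time


def rows_to_dicts(results):
    """Convert Logs Insights result rows into plain dictionaries.

    Each row is a list of {"field": ..., "value": ...} cells.
    """
    return [{cell["field"]: cell.get("value") for cell in row} for row in results]


def run_insights_query(log_group_name, query, start_time, end_time,
                       poll_seconds=1.0, timeout_seconds=60.0):
    """Run a Logs Insights query and return its rows as dictionaries.

    start_time and end_time are epoch seconds. Requires boto3 and
    AWS credentials with permission to query the log group.
    """
    import boto3  # third-party; imported here so rows_to_dicts has no dependencies

    logs = boto3.client("logs")
    query_id = logs.start_query(
        logGroupName=log_group_name,
        startTime=start_time,
        endTime=end_time,
        queryString=query,
    )["queryId"]

    deadline = time.monotonic() + timeout_seconds
    while True:
        response = logs.get_query_results(queryId=query_id)
        if response["status"] in ("Complete", "Failed", "Cancelled", "Timeout"):
            break
        if time.monotonic() > deadline:
            raise TimeoutError("query did not finish before the local timeout")
        time.sleep(poll_seconds)

    if response["status"] != "Complete":
        raise RuntimeError(f"query ended with status {response['status']}")
    return rows_to_dicts(response["results"])


# Offline example of the row conversion, using a hypothetical result row.
print(rows_to_dicts([[{"field": "DocumentTitle", "value": "aws-overview.pdf"}]]))
```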

Conclusion

The newly introduced document-level report in Amazon Q Business provides enhanced visibility and observability into the document processing lifecycle during data source sync jobs. This feature addresses a critical need expressed by customers for better troubleshooting capabilities and access to detailed information about the indexing status, metadata, and ACLs of individual documents.

The document-level report is stored in a dedicated log stream named SYNC_RUN_HISTORY_REPORT within the Amazon Q Business application CloudWatch log group. This report contains comprehensive information for each document, including the document ID, title, overall document sync status, and error messages (if any), along with the ACLs and metadata retrieved from the data sources. The data source sync run history page now includes an Actions column, providing access to the document-level report for each sync run. This feature significantly improves your ability to troubleshoot issues related to document ingestion, access control, and metadata relevance, and provides better visibility into the documents synced with an Amazon Q index.

To get started with Amazon Q Business, explore the Getting started guide. To learn more about data source connectors and best practices, see Configuring Amazon Q Business data source connectors.


About the authors

Aneesh Mohan is a Senior Solutions Architect at Amazon Web Services (AWS), bringing two decades of experience in creating impactful solutions for business-critical workloads. He is passionate about technology and loves working with customers to build well-architected solutions, focusing on the financial services industry, AI/ML, security, and data technologies.

Ashwin Shukla is a Software Development Engineer II on the Amazon Q for Business and Amazon Kendra engineering team, with 6 years of experience in developing enterprise software. In this role, he works on designing and developing foundational features for Amazon Q for Business.
