Multinational organizations face the complex challenge of effectively managing a workforce and operations across different countries, cultures, and languages. Maintaining consistency and alignment across these global operations can be difficult, especially when it comes to updating and sharing business documents and processes. Delays or miscommunications can lead to productivity losses, operational inefficiencies, or potential business disruptions. Accurate and timely sharing of translated documents across the organization is an important step in making sure that employees have access to the latest information in their native language.
In this post, we show how you can automate language localization through translating documents using Amazon Web Services (AWS). The solution combines Amazon Bedrock and AWS Serverless technologies, a suite of fully managed event-driven services for running code, managing data, and integrating applications—all without managing servers. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, and Stability AI. Amazon Bedrock is accessible through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.
Solution overview
The solution uses AWS Step Functions to orchestrate the translation of the source document into the specified language (English, French, or Spanish) using AWS Lambda functions to call Amazon Translate. Note that Amazon Translate currently supports translation of 75 languages and 3 have been chosen for this demo. It then uses Amazon Bedrock to refine the translation and create natural, flowing content.
Building this solution, shown in the following diagram, on AWS fully managed and serverless technologies eliminates the need to operate infrastructure, manage capacity, or invest significant funding upfront to evaluate the business benefit. The compute and AI services used to process documents for translation run only on demand, resulting in a consumption-based billing model where you only pay for your use.
The document translation and standardization workflow consists of the following steps:
- The user uploads their source document requiring translation to the input Amazon Simple Storage Service (Amazon S3) bucket. The bucket has three folders: English, French, and Spanish. The user uploads the source document to the folder that matches the current language of the document. This can be done using the AWS Management Console, the AWS Command Line Interface (AWS CLI), or third-party tools that allow them to navigate an S3 bucket as a file system.
- The presence of a new document in the input bucket initiates the Step Functions workflow using Amazon S3 Event Notifications.
- The first step of this workflow is an AWS Lambda function that retrieves the source document from the bucket, saves it in temporary storage, and calls the Amazon Translate API
TranslateDocument
specifying the source document as the target for translation. - The second step of the workflow is another Lambda function that queries Amazon Bedrock using a pre-generated prompt with the translated source document included as the target. This prompt instructs Amazon Bedrock to perform a transcreation check on the document content. This validates that the intent, style, and tone of the document is maintained. The final version of the document is now saved in the output S3 bucket.
- The last step of the workflow uses Amazon Simple Notification Service (Amazon SNS) to notify an SNS topic of the outcome of the workflow (success or failure). This will send an email to the subscribers to the topic.
- The user downloads their translated document from the output S3 bucket. This can be done using the console, the AWS CLI, or third-party tools that allow them to navigate an S3 bucket as a file system.
This solution is available on GitHub and provides the AWS Cloud Development Kit (AWS CDK) code to deploy in your own AWS account. The AWS CDK is an open source software development framework for defining cloud infrastructure as code (IaC) and provisioning it through AWS CloudFormation. This provides an automated deployment process for your AWS account.
Prerequisites
For this walkthrough, you should have the following prerequisites:
- An AWS account to deploy the solution to.
- An AWS Identity and Access Management (IAM) role in the account, with sufficient permissions to create the necessary resources. If you have administrator access, no additional action is required.
- The AWS CDK installed on your local machine, or an AWS Cloud9 environment.
- Python 3.9 or later.
- Docker.
Deployment steps
To deploy this solution into your own AWS account:
- Open your code editor of choice and authenticate to your AWS account. Instructions for linking to Visual Studio code can be found in Authentication and access for the AWS Toolkit for Visual Studio Code.
- Clone the solution from the GitHub repository:
- Follow the deployment instructions in the repository README file.
- After the stack is deployed, go to the S3 console. Navigate to the S3 bucket that was created — docstandardizationstack-inputbucket. Upload the word_template.docx file that’s included in the repository. English, French, and Spanish folders will automatically be created.
- Navigate to the Amazon Simple Notification Service (Amazon SNS) console and create a subscription to the topic DocStandardizationStack-ResultTopic created by the stack. After it’s created, make sure that you confirm subscription to the topic before testing the workflow by choosing the confirm subscription link in the automated email you receive from SNS.
- After you have subscribed to the topic, you can test the workflow.
Language translation
To test the workflow, upload a .docx file to the folder corresponding to the document’s original language. For example, if you’re uploading a document that was written in English, this document should be uploaded to the English folder. If you don’t have a .docx file available, you can use the tone_test.docx file that’s included in the repository.
The Step Functions state machine will start after your document is uploaded. Translated versions of your source input document will be added to the other folders that were created in step 5. In this example, we uploaded a document in English and the document was translated in both Spanish and French.
Transcreation process
The translated documents are then processed using Amazon Bedrock. Amazon Bedrock reviews the documents’ intent, style and tone for use in a business setting. You can customize the output tone and style by modifying the Amazon Bedrock prompt to match your specific requirements. The final documents are added to the output S3 bucket with a suffix of _corrected, and each document is added to the folder that corresponds to the document’s language. The output bucket has the same format as the input bucket, with a separate folder created for each language.
The prompt used to instruct the generative AI model for the transcreation task has been designed to produce consistent and valid adjustments. It includes specific instructions, covering both what type of changes are expected from the model and rules to define boundaries that control adjustments. You can adjust this prompt if required to change the outcome of the document processing workflow.
The final documents will have a suffix of _corrected.
When the documents have been processed, you will receive an SNS notification. You will be able to download the processed documents from the S3 bucket DocStandardizationStack-OutputBucket.
Clean up
To delete the deployed resources, run the command cdk destroy
in your terminal, or use the CloudFormation console to delete the CloudFormation stack DocStandardizationStack.
Conclusion
In this post, we explored how to automate the translation of business documents using AWS AI and serverless technologies. Through this automated translation process, companies can improve communication, consistency, and alignment across their global operations, making sure that employees can access the information they need when they need it. As organizations continue to expand their global footprint, tools like this will become increasingly important for maintaining a cohesive and informed workforce, no matter where in the world they might be located. By embracing the capabilities of AWS, companies can focus on their core business objectives without creating additional IT infrastructure overhead.
Bonne traduction!
Feliz traducción!
Happy translating!
Further reading
The solution includes a zero-shot prompt with specific instructions directing what the LLM should and should not modify in the source document. If you want to iterate on the provided prompt to adjust your results, you can use the Amazon Bedrock Prompt Management tool to quickly edit and test the impact of changes to the prompt text.
For additional examples using Amazon Bedrock and other services, visit the AWS Workshops page to get started.
About the Authors
Nadhya Polanco is an Associate Solutions Architect at AWS based in Brussels, Belgium. In this role, she supports organizations looking to incorporate AI and Machine Learning into their workloads. In her free time, Nadhya enjoys indulging in her passion for coffee and exploring new destinations.
Steve Bell is a Senior Solutions Architect at AWS based in Amsterdam, Netherlands. He helps enterprise organizations navigate the complexities of migration, modernization and multicloud strategy. Outside of work he loves walking his labrador, Lily, and practicing his amateur BBQ skills.