How Untold Studios empowers artists with an AI assistant built on Amazon Bedrock

Untold Studios is a leading tech-driven creative studio specializing in high-end visual effects and animation. Our commitment to innovation led us to a pivotal challenge: how to harness the power of machine learning (ML) to sharpen our competitive edge while balancing that technological advancement against strict data security requirements and the need to streamline access to our existing internal resources.

To give our artists access to technology, we need to create good user interfaces. This is a challenge, especially when end users are diverse in their needs and technological experience. We saw an opportunity to use large language models (LLMs) to create a natural language interface, which eases this challenge and takes care of much of the heavy lifting.

This post details how we used Amazon Bedrock to create an AI assistant (Untold Assistant), providing artists with a straightforward way to access our internal resources through a natural language interface integrated directly into their existing Slack workflow.

Solution overview

The Untold Assistant serves as a central hub for artists. In addition to common AI functionality such as text and image generation, it lets them interact with internal data, tools, and workflows through natural language queries.

For the UI, we use Slack’s built-in features rather than building custom frontends. Slack already provides applications for workstations and phones, message threads for complex queries, emoji reactions for feedback, and file sharing capabilities. The implementation uses Slack’s event subscription API to process incoming messages and Slack’s Web API to send responses. Users interact with the Untold Assistant through private direct messages or by mentioning it (@-style tagging) in channels for everybody to see. Because our teams already use Slack throughout the day, this eliminates context switching and the need to adopt new software. Every new message is acknowledged by a gear emoji for immediate feedback, which eventually changes to a check mark if the query was successful or an X if an error occurred. The following screenshot shows an example.

AI Assistant Screenshot

With the use of Anthropic’s Claude 3.5 Sonnet model on Amazon Bedrock, the system processes complex requests and generates contextually relevant responses. The serverless architecture provides scalability and responsiveness, and secure storage houses the studio’s vast asset library and knowledge base. Key AWS services used include Amazon Bedrock, AWS Lambda, Amazon API Gateway, Amazon DynamoDB, Amazon S3, and Amazon CloudWatch.

The following diagram illustrates the solution architecture.

Architecture Diagram

The main components for this application are the Slack integration, the Amazon Bedrock integration, the Retrieval Augmented Generation (RAG) implementation, user management, and logging.

Slack integration

We use a two-function approach to meet Slack’s 3-second acknowledgment requirement. The incoming event from Slack is sent to an API Gateway endpoint, and Slack expects a response within 3 seconds; otherwise, the request fails. The first Lambda function, with reserved capacity, quickly acknowledges the event and forwards the request to the second function, where it can be handled without time restrictions. This setup handles time-sensitive responses while allowing for thorough request processing. We call the second function directly from the first function, without an event through Amazon Simple Notification Service (Amazon SNS) or a queue through Amazon Simple Queue Service (Amazon SQS) in between, to keep latency as low as possible.
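
The following is a minimal sketch of this two-function pattern, assuming illustrative function names and payload fields rather than the production implementation.

import json
import boto3

lambda_client = boto3.client("lambda")

# Hypothetical name of the second, long-running worker function
WORKER_FUNCTION = "untold-assistant-worker"

def handler(event, context):
    """First Lambda function: acknowledge Slack within 3 seconds, then hand off."""
    body = json.loads(event["body"])

    # Slack sends a one-time URL verification challenge when the event
    # subscription is configured; echo it back.
    if body.get("type") == "url_verification":
        return {"statusCode": 200, "body": body["challenge"]}

    # Invoke the worker asynchronously so we can return immediately.
    lambda_client.invoke(
        FunctionName=WORKER_FUNCTION,
        InvocationType="Event",  # fire-and-forget, no 3-second limit
        Payload=json.dumps(body),
    )

    # An empty 200 response acknowledges the event to Slack.
    return {"statusCode": 200, "body": ""}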

Amazon Bedrock integration

Our Untold Assistant uses Amazon Bedrock with Anthropic’s Claude 3.5 Sonnet model for natural language processing. We use the model’s function calling capabilities, enabling the application to trigger specific tools or actions as needed. This allows the assistant to handle both general queries and complex specialized queries or run tasks across our internal systems.

RAG implementation

Our RAG setup uses Amazon Bedrock connectors to integrate with Confluence and Salesforce, tapping into our existing knowledge bases. For other data sources without a pre-built connector available, we export content to Amazon S3 and use the Amazon S3 connector. For example, we export pre-chunked asset metadata from our asset library to Amazon S3, letting Amazon Bedrock handle embeddings, vector storage, and search. This approach significantly decreased development time and complexity, allowing us to focus on improving user experience.
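
As a rough sketch, querying a knowledge base backed by the Amazon S3 connector can look like the following; the knowledge base ID and result handling are placeholders, not our production code.

import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

def search_asset_library(query: str, kb_id: str = "KB_ID_PLACEHOLDER") -> list[str]:
    """Retrieve the most relevant pre-chunked asset metadata for a query."""
    response = bedrock_agent_runtime.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={"text": query},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": 5}
        },
    )
    # Each result carries the chunk text plus its S3 source location and score.
    return [result["content"]["text"] for result in response["retrievalResults"]]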

User management

We map Slack user IDs to our internal user pool, currently stored in DynamoDB (but designed to work with Amazon Cognito). This system tailors the assistant’s capabilities to each user’s role and clearance level, making sure that it operates within the bounds of each user’s authority while maintaining functionality. Access to data sources is controlled through tools: every tool encapsulates a data source, and the LLM’s access to tools is restricted based on the user and their role.

Additionally, if a user tells the assistant something that should be remembered, we store this piece of information in a database and add it to the context every time the user initiates a request. This could be, for example, “Keep all your replies as short as possible” or “If I ask for code it’s always Python.”
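
A minimal sketch of this memory store, assuming a hypothetical DynamoDB table keyed by user ID and memory ID, could look like the following.

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
# Table name and schema are illustrative assumptions.
memory_table = dynamodb.Table("untold-assistant-user-memory")

def save_memory(user_id: str, memory_id: str, info: str) -> None:
    """Store a memory, overwriting any existing item with the same memory ID."""
    memory_table.put_item(
        Item={"user_id": user_id, "memory_id": memory_id, "info": info}
    )

def load_memories(user_id: str) -> list[str]:
    """Fetch all memories for a user so they can be added to the LLM context."""
    response = memory_table.query(
        KeyConditionExpression=Key("user_id").eq(user_id)
    )
    return [item["info"] for item in response["Items"]]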

Logging and monitoring

We use the built-in integration with Amazon CloudWatch in Lambda to track system performance and error states. For monitoring critical errors, we’ve set up direct notifications to a dedicated Slack channel, allowing for immediate awareness and response. Every query and tool invocation is logged to DynamoDB, providing a rich dataset that we use to analyze usage patterns and optimize the system’s performance and functionality.

Function calling with Amazon Bedrock

Like Anthropic’s Claude, most modern LLMs support function calling, which allows us to extend the capabilities of LLMs beyond merely generating text. We can provide a set of function specifications with a description of what the function is going to do and the names and descriptions of the function’s parameters. Based on this information, the LLM decides if an incoming request can be solved directly or if the best next step to solve the query would be a function call. If that’s the case, the model returns the name of the function to call, as well as the parameters and values. It’s then up to us to run the function and initiate the next step. Agents use this system in a loop to call functions and process their output until a success criterion is reached. In our case, we only implement a single pass function call to keep things simple and robust. However, in certain cases, the function itself uses the LLM to process data and format it nicely for the end-user.

Function calling is a very useful feature that helps us convert unstructured user input into structured automatable instructions. We anticipate that over the next couple of months, we will add many more functions to extend the AI assistant’s capabilities and increase its usefulness. Although frameworks like LangChain offer comprehensive solutions for implementing function calling systems, we opted for a lightweight, custom approach tailored to our specific needs. This decision allowed us to maintain a smaller footprint and focus on the essential features for our use case.
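
As an illustration of the single-pass pattern, the following sketch calls the Amazon Bedrock Converse API with a simplified stand-in tool specification; the model ID, tool schema, and dispatch logic are assumptions, not our production code.

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"

# Simplified stand-in for the real tool specifications
tool_config = {
    "tools": [
        {
            "toolSpec": {
                "name": "update_user_memory",
                "description": "Store a piece of information the user wants remembered.",
                "inputSchema": {
                    "json": {
                        "type": "object",
                        "properties": {
                            "info": {
                                "type": "string",
                                "description": "The fact to remember.",
                            }
                        },
                        "required": ["info"],
                    }
                },
            }
        }
    ]
}

def run_tool(name: str, tool_input: dict) -> str:
    # Placeholder dispatch: the real system maps the name to a tool class.
    return f"[ran {name} with {tool_input}]"

def answer(user_message: str) -> str:
    response = bedrock_runtime.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": user_message}]}],
        toolConfig=tool_config,
    )
    message = response["output"]["message"]

    # Single pass: either run the requested tool or return the text reply.
    if response["stopReason"] == "tool_use":
        for block in message["content"]:
            if "toolUse" in block:
                return run_tool(block["toolUse"]["name"], block["toolUse"]["input"])
    return "".join(block.get("text", "") for block in message["content"])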

The following is a code example of using the AiTool base class for extendability.

AI Assistant Code Snippet

All that’s required to add a new function is creating a class like the one in our example. The class is discovered automatically, and its specification is added to the LLM request if the user has access to the function. All the information required to create the function specification is extracted from the code and docstrings:

  • NAME – The ID of the function
  • PROGRESS_MESSAGE – A message that’s sent to the user through Slack for immediate feedback before the function is run
  • EXCLUSIVE_ACCESS_DEPARTMENTS – If set, only users of the specified departments have access to this tool

The tool in this example updates the user memory. For example, the query “Remember to always use Python as a programming language” will trigger the execution of this tool. The LLM will extract the info string from the request, for example, “code should always be Python.” If the existing user memory that is always added to the context already contains a memory about the same topic (for example, “code should always be Java”), the LLM will also provide the memory ID and the existing memory will be overwritten. Otherwise, a new memory with a new ID is created.
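
The original snippet is shown as an image above; the following is only a hedged reconstruction of what such a tool class could look like, based on the attributes described in this section (the base class interface and helper functions are assumptions).

class AiTool:
    """Minimal stand-in for the real base class; the actual interface is assumed."""
    NAME = ""
    PROGRESS_MESSAGE = ""
    EXCLUSIVE_ACCESS_DEPARTMENTS = None

class UpdateUserMemory(AiTool):
    """Store or overwrite a piece of information the user wants remembered.

    Use this tool when the user asks the assistant to remember a preference,
    for example "always reply in Python". If an existing memory covers the
    same topic, pass its memory_id so that memory is overwritten.
    """

    NAME = "update_user_memory"
    PROGRESS_MESSAGE = "Updating your preferences..."
    EXCLUSIVE_ACCESS_DEPARTMENTS = None  # available to every user

    def run(self, user, info: str, memory_id: str | None = None) -> str:
        """
        :param info: The fact to remember, for example "code should always be Python".
        :param memory_id: ID of an existing memory to overwrite, if any.
        """
        new_id = store_user_memory(user.id, memory_id, info)  # hypothetical helper
        return f"Got it, I'll remember that (memory {new_id})."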

Key features and benefits

Slack serves as a single entry point, allowing artists to query diverse internal systems without leaving their familiar workflow. The following features are powered by function calling using Anthropic’s Claude:

  • Various knowledge bases for different user roles (Confluence, Salesforce)
  • Internal asset library (Amazon S3)
  • Image generation powered by Stable Diffusion
  • User-specific memory and preferences (for example, default programming languages, default dimensions for image generation, detail level of responses)

By eliminating the need for additional software or context switching, we’ve drastically reduced friction in accessing critical resources. The system is available around the clock for artist queries and tasks, and our framework for function calling with Anthropic’s Claude allows for future expansion of features.

The LLM’s natural language interface is a game changer for user interaction. It’s inherently more flexible and forgiving compared to traditional interfaces, capable of interpreting unstructured input, asking for missing information, and performing tasks like date formatting, unit conversion, and value extraction from natural language descriptions. The system adeptly handles ambiguous queries, extracting relevant information and intent. This means artists can focus on their creative process rather than worrying about precise phrasing or navigating complex menu structures.

Security and control are paramount in our AI adoption strategy. By keeping all data within the AWS ecosystem, we’ve eliminated dependencies on third-party AI tools and mitigated associated risks. This approach allows us to maintain tight control over data access and usage. Additionally, we’ve implemented comprehensive usage analytics, providing insights into adoption patterns and areas for improvement. This data-driven approach makes sure we’re continually refining the tool to meet evolving artist needs.

Impact and future plans

The Untold Assistant currently handles up to 120 queries per day, with about 10–20% of them calling additional tools, such as image generation or knowledge base search. It saves significant time, especially for new users who aren’t yet familiar with internal workflows and applications. Instead of searching several different Confluence spaces and Slack channels or reaching out to the technology team, they can simply ask the Untold Assistant, which acts as a virtual member of the support team. This can cut the time to an answer from minutes to a few seconds.

Overall, the Untold Assistant, rapidly developed and deployed using AWS services, has delivered several benefits:

  • Enhanced discoverability and usage of previously underutilized internal resources
  • Significant reduction in time spent searching for information
  • Streamlined, authorization-controlled access to multiple internal systems from a central entry point
  • Reduced load on the support and technology team
  • Increased speed of adoption of new technologies by providing a framework for user interaction

Building on this success, we’re expanding functionality through additional function calls. A key planned feature is render job error analysis for artists. This tool will automatically fetch logs from recent renders, analyze potential errors using the capabilities of Anthropic’s Claude, and provide users with explanations and solutions by using both internet resources and our internal knowledge base of known errors.

Additionally, we plan to analyze the saved queries using Amazon Titan Text Embeddings and agglomerative clustering to identify semantically similar questions. When the cluster frequency exceeds our defined threshold (for example, more than 10 similar questions from different users within a week), we enhance our knowledge base or update onboarding materials to address these common queries proactively, reducing repetitive questions and improving the assistant’s efficiency.

These initial usage metrics and the planned technical improvements demonstrate the system’s positive impact on our workflows. By automating common support tasks and continuously improving our knowledge base through data-driven analysis, we reduce the technology team’s support load while maintaining high-quality assistance. The modular architecture allows us to quickly integrate new tools as needs arise, to keep up with the astonishing pace of the progress made in AI and ML.

Conclusion

The Untold Assistant demonstrates how Amazon Bedrock enables rapid development of sophisticated AI applications without compromising security or control. Using function calling and pre-built connectors in Amazon Bedrock eliminated the need for complex vector store integrations and custom embedding pipelines, reducing our development time from months to weeks. The modular architecture using Python classes for tools makes the system highly maintainable and extensible.

By automating routine technical tasks and information retrieval, we’ve freed our artists to focus on creative work that drives business value. The solution’s clean separation between the LLM interface and business logic, built entirely within the AWS ecosystem, enables quick integration of new capabilities while maintaining strict data security. The LLM’s ability to interpret unstructured input and handle ambiguous queries creates a more natural and forgiving interface compared to traditional menu-driven systems. This foundation of technical robustness and improved artist productivity positions us to rapidly adopt emerging AI capabilities while keeping our focus on creative innovation.

To explore how to streamline your company’s workflows using Amazon Bedrock, see Getting started with Amazon Bedrock. If you have questions or suggestions, please leave a comment.


About the Authors

Olivier Vigneresse is a Solutions Architect at AWS. Based in England, he primarily works with SMB Media and Entertainment customers. With a background in security and networking, Olivier helps customers achieve success on their cloud journey by providing architectural guidance and best practices; he is also passionate about helping them bring value with machine learning and generative AI use cases.

Daniel Goller is a Lead R&D Developer at Untold Studios with a focus on cloud infrastructure and emerging technologies. After earning his PhD in Germany, where he collaborated with industry leaders like BMW and Audi, he has spent the past decade implementing software solutions, with a particular emphasis on cloud technology in recent years. At Untold Studios, he leads infrastructure optimisation and AI/ML initiatives, leveraging his technical expertise and background in research to drive innovation in the Media & Entertainment space.

Max Barnett is an Account Manager at AWS who specialises in accelerating the cloud journey of Media & Entertainment customers. He has been helping customers at AWS for the past 4.5 years. Max has been particularly involved with customers in the visual effect space, guiding them as they explore generative AI.

AI-Designed Proteins Take on Deadly Snake Venom

Every year, venomous snakes kill over 100,000 people and leave 300,000 more with devastating injuries — amputations, paralysis and permanent disabilities. The victims are often farmers, herders and children in rural communities across sub-Saharan Africa, South Asia and Latin America. For them, a snakebite isn’t just a medical crisis — it’s an economic catastrophe.

Treatment hasn’t changed in over a century. Antivenoms — derived from the blood of immunized animals — are expensive, difficult to manufacture and often ineffective against the deadliest toxins. Worse, they require refrigeration and trained medical staff, making them unreachable for many who need them most.

Now, a team led by Susana Vázquez Torres, a computational biologist working in Nobel Prize winner David Baker’s renowned protein design lab at the University of Washington, has used AI to create entirely new proteins that neutralize lethal snake venom in laboratory tests — faster, cheaper and more effectively than traditional antivenoms. Their research, published in Nature, introduces a new class of synthetic proteins that successfully protect animals from otherwise lethal doses of snake venom toxins.

Susana Vázquez Torres conducts drug-development research. Credit: Ian C. Haydon, UW Medicine Institute for Protein Design

How AI Cracked the Code on Venom

For over a century, antivenom production has relied on animal immunization, requiring thousands of snake milkings and plasma extractions. Torres and her team hope to replace this with AI-driven protein design, compressing years of work into weeks.

Using NVIDIA Ampere and L40 GPUs, the Baker Lab used its deep learning models, including RFdiffusion and ProteinMPNN, to generate millions of potential antitoxin structures ‘in silico,’ or in computer simulations. Instead of screening a vast number of these proteins in a lab, they used AI tools to predict how the designer proteins would interact with snake venom toxins, rapidly homing in on the most promising designs.

The results were remarkable:

  • Newly designed proteins bound tightly to three-finger toxins (3FTx), the deadliest components of elapid venom, effectively neutralizing their toxic effects.
  • Lab tests confirmed their high stability and neutralization capability.
  • Mouse studies showed an 80–100% survival rate following exposure to lethal neurotoxins.
  • The AI-designed proteins were small, heat-resistant and easy to manufacture — no cold storage required.

A Lifeline for the Most Neglected Victims

Unlike traditional antivenoms, which cost hundreds of dollars per dose, it may be possible to mass-produce these AI-designed proteins at low cost, making life-saving treatment available where it’s needed most.

Many snakebite victims can’t afford antivenom or delay seeking care due to cost and accessibility barriers. In some cases, the financial burden of treatment can push entire families deeper into poverty. With an accessible, affordable and shelf-stable antidote, millions of lives — and livelihoods — could be saved.

Beyond Snakebites: The Future of AI-Designed Medicine

This research isn’t just about snakebites. The same AI-driven approach could be used to design precision treatments for viral infections, autoimmune diseases and other hard-to-treat conditions, according to the researchers.

By replacing trial-and-error drug development with algorithmic precision, researchers using AI to design proteins are working to make life-saving medicines more affordable and accessible worldwide.

Torres and her collaborators — including researchers from the Technical University of Denmark, University of Northern Colorado and Liverpool School of Tropical Medicine — are now focused on preparing these venom-neutralizing proteins for clinical testing and large-scale production.

If successful, this AI-driven advancement could save lives, and uplift families and communities around the world.

Protect your DeepSeek model deployments with Amazon Bedrock Guardrails

The rapid advancement of generative AI has brought powerful publicly available large language models (LLMs), such as DeepSeek-R1, to the forefront of innovation. The DeepSeek-R1 models are now accessible through Amazon Bedrock Marketplace and Amazon SageMaker JumpStart, and distilled variants are available through Amazon Bedrock Custom Model Import. According to DeepSeek AI, these models offer strong capabilities in reasoning, coding, and natural language understanding. However, their deployment in production environments—like all models—requires careful consideration of data privacy requirements, appropriate management of bias in output, and the need for robust monitoring and control mechanisms.

Organizations adopting open source, open weights models such as DeepSeek-R1 have important opportunities to address several key considerations:

  • Enhancing security measures to prevent potential misuse, guided by resources such as OWASP LLM Top 10 and MITRE Atlas
  • Making sure to protect sensitive information
  • Fostering responsible content generation practices
  • Striving for compliance with relevant industry regulations

These concerns become particularly critical in highly regulated industries such as healthcare, finance, and government services, where data privacy and content accuracy are paramount.

This blog post provides a comprehensive guide to implementing robust safety protections for DeepSeek-R1 and other open weight models using Amazon Bedrock Guardrails. We’ll explore:

  • How to use the security features offered by Amazon Bedrock to protect your data and applications
  • Practical implementation of guardrails to prevent prompt attacks and filter harmful content
  • Implementing a robust defense-in-depth strategy

By following this guide, you’ll learn how to use the advanced capabilities of DeepSeek models while maintaining strong security controls and promoting ethical AI practices. Whether you’re developing customer-facing generative AI applications or internal tools, these implementation patterns will help you meet your requirements for secure and responsible AI. With this step-by-step approach, organizations can deploy open weights LLMs such as DeepSeek-R1 in line with best practices for AI safety and security.

DeepSeek models and deployment on Amazon Bedrock

DeepSeek AI, a company specializing in open weights foundation AI models, recently launched their DeepSeek-R1 models, which according to their paper have shown outstanding reasoning abilities and performance in industry benchmarks. According to third-party evaluations, these models consistently achieve top three rankings across various metrics, including quality index, scientific reasoning and knowledge, quantitative reasoning, and coding (HumanEval).

The company has further developed their portfolio by releasing six dense models derived from DeepSeek-R1, built on Llama and Qwen architectures, which they’ve made open weight models. These models are now accessible through AWS generative AI solutions: DeepSeek-R1 is available through Amazon Bedrock Marketplace and SageMaker JumpStart, while the Llama-based distilled versions can be implemented through Amazon Bedrock Custom Model Import.

Amazon Bedrock offers comprehensive security features to help secure hosting and operation of open source and open weights models while maintaining data privacy and regulatory compliance. Key features include data encryption at rest and in transit, fine-grained access controls, secure connectivity options, and various compliance certifications. Additionally, Amazon Bedrock provides guardrails for content filtering and sensitive information protection to support responsible AI use. AWS enhances these capabilities with extensive platform-wide security and compliance measures.

Organizations should customize these security settings based on their specific compliance and security needs when deploying to production environments. AWS conducts vulnerability scanning of all model containers as part of its security process and accepts only models in Safetensors format to help prevent unsafe code execution.

Amazon Bedrock Guardrails

Amazon Bedrock Guardrails provides configurable safeguards to help safely build generative AI applications at scale. Amazon Bedrock Guardrails can also be integrated with other Amazon Bedrock tools including Amazon Bedrock Agents and Amazon Bedrock Knowledge Bases to build safer and more secure generative AI applications aligned with responsible AI policies. To learn more, see the AWS Responsible AI page.

Core functionality

Amazon Bedrock Guardrails can be used in two ways. First, it can be integrated directly with the InvokeModel and Converse API calls, where guardrails are applied to both input prompts and model outputs during the inference process. This method is suitable for models hosted on Amazon Bedrock through Amazon Bedrock Marketplace and Amazon Bedrock Custom Model Import. Alternatively, the ApplyGuardrail API offers a more flexible approach, allowing for independent evaluation of content without invoking a model. This second method is useful for assessing inputs or outputs at various stages of an application and works with custom or third-party models outside of Amazon Bedrock. Both approaches enable developers to implement safeguards customized to their use cases and aligned with responsible AI policies, ensuring secure and compliant interactions in generative AI applications.
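
As a minimal sketch of the second approach, the ApplyGuardrail API can evaluate a piece of text independently of any model invocation; the guardrail ID and version below are placeholders.

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def passes_guardrail(text: str) -> bool:
    """Return True if the text passes the guardrail, False if it was blocked."""
    response = bedrock_runtime.apply_guardrail(
        guardrailIdentifier="your-guardrail-id",  # placeholder
        guardrailVersion="1",                     # placeholder
        source="INPUT",                           # evaluate a user prompt
        content=[{"text": {"text": text}}],
    )
    return response["action"] != "GUARDRAIL_INTERVENED"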

Key Amazon Bedrock Guardrails policies

Amazon Bedrock Guardrails provides the following configurable guardrail policies to help safely build generative AI applications at scale:

  • Content filters
    • Adjustable filtering intensity for harmful content
    • Predefined categories: Hate, Insults, Sexual Content, Violence, Misconduct, and Prompt Attacks
    • Multi-modal content including text and images (preview)
  • Topic filters
    • Capability to restrict specific topics
    • Prevention of unauthorized topics in both queries and responses
  • Word filters
    • Blocks specific words, phrases, and profanity
    • Custom filters for offensive language or competitor references
  • Sensitive information filters
    • Personally identifiable information (PII) blocking or masking
    • Support for custom regex patterns
    • Probabilistic detection for standard formats (such as SSN, DOB, and addresses)
  • Contextual grounding checks
    • Hallucination detection through source grounding
    • Query relevance validation
  • Automated Reasoning checks for hallucination prevention (gated preview)

Other capabilities

Model-agnostic implementation:

  • Compatible with all Amazon Bedrock foundation models
  • Supports fine-tuned models
  • Extends to external custom and third-party models through the ApplyGuardrail API

This comprehensive framework helps customers implement responsible AI, maintaining content safety and user privacy across diverse generative AI applications.

Solution overview

The overall flow of the solution is as follows:

  1. Guardrail configuration
    • Create a guardrail with specific policies tailored to your use case.
  2. Integration with the InvokeModel API
    • Call the Amazon Bedrock InvokeModel API with the guardrail identifier in your request (see the sketch after this list).
    • When you make the API call, Amazon Bedrock applies the specified guardrail to both the input and the output.
  3. Guardrail evaluation process
    1. Input evaluation: Before sending the prompt to the model, the guardrail evaluates the user input against the configured policies.
    2. Parallel policy checking: To reduce latency, the input is evaluated in parallel against each configured policy.
    3. Input intervention: If the input violates any guardrail policies, a pre-configured blocked message is returned and the model inference is discarded.
    4. Model inference: If the input passes the guardrail checks, the prompt is sent to the specified model for inference.
    5. Output evaluation: After the model generates a response, the guardrail evaluates the output against the configured policies.
    6. Output intervention: If the model response violates any guardrail policies, it is either blocked with a pre-configured message or has sensitive information masked, depending on the policy.
    7. Response delivery: If the output passes all guardrail checks, the response is returned to the application without modifications.
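
A sketch of step 2, applied to a model imported through Amazon Bedrock Custom Model Import, might look like the following; the model ARN, guardrail identifiers, and request body schema are placeholders that vary by model.

import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

# Placeholders: ARN of the imported model and the guardrail to apply
MODEL_ARN = "arn:aws:bedrock:us-east-1:111122223333:imported-model/EXAMPLE"
GUARDRAIL_ID = "your-guardrail-id"
GUARDRAIL_VERSION = "1"

def invoke_with_guardrail(prompt: str) -> dict:
    """Invoke the imported model with the guardrail applied to input and output."""
    response = bedrock_runtime.invoke_model(
        modelId=MODEL_ARN,
        # The body schema depends on the imported model; this is a simplified example.
        body=json.dumps({"prompt": prompt, "max_gen_len": 512}),
        guardrailIdentifier=GUARDRAIL_ID,
        guardrailVersion=GUARDRAIL_VERSION,
        trace="ENABLED",  # include guardrail trace details in the response
    )
    return json.loads(response["body"].read())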

Prerequisites

Before setting up guardrails for models imported using the Amazon Bedrock Custom Model Import feature, make sure you meet these prerequisites:

  • An AWS account with access to Amazon Bedrock and an IAM role with the required permissions. For centralized access management, we recommend that you use AWS IAM Identity Center.
  • Make sure that a custom model is already imported using the Amazon Bedrock Custom Model Import service. For illustration, we’ll use DeepSeek-R1-Distill-Llama-8B, which can be imported using Amazon Bedrock Custom Model Import. You have two options for deploying this model:

You can create the guardrail using the AWS Management Console as explained in this blog post. Alternatively, you can follow this notebook for a programmatic example of how to create the guardrail in this solution. This notebook does the following:

  1. Install the required dependencies.
  2. Create a guardrail using the boto3 API and filters to meet the use case mentioned previously (a trimmed-down sketch follows this list).
  3. Configure the tokenizer for the imported model.
  4. Test Amazon Bedrock Guardrails using prompts that show various Amazon Bedrock guardrail filters in action.
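
A trimmed-down sketch of item 2 in the notebook (creating a guardrail with boto3) could look like the following; the filters and messages shown are illustrative, not the notebook's exact configuration.

import boto3

bedrock = boto3.client("bedrock")

response = bedrock.create_guardrail(
    name="deepseek-guardrail-example",
    description="Example guardrail for an imported DeepSeek-R1 distilled model",
    contentPolicyConfig={
        "filtersConfig": [
            # Prompt attack filtering applies to inputs only
            {"type": "PROMPT_ATTACK", "inputStrength": "HIGH", "outputStrength": "NONE"},
            {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        ]
    },
    sensitiveInformationPolicyConfig={
        "piiEntitiesConfig": [
            {"type": "EMAIL", "action": "ANONYMIZE"},
        ]
    },
    blockedInputMessaging="Sorry, I can't help with that request.",
    blockedOutputsMessaging="Sorry, I can't share that response.",
)

guardrail_id = response["guardrailId"]
guardrail_version = response["version"]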

This approach integrates guardrails into both the user inputs and the model outputs. This makes sure that any potentially harmful or inappropriate content is intercepted during both phases of the interaction. For open weight distilled models imported using Amazon Bedrock Custom Model Import, Amazon Bedrock Marketplace, and Amazon SageMaker JumpStart, critical filters to implement include those for prompt attacks, content moderation, topic restrictions, and sensitive information protection.

Implementing a defense-in-depth strategy with AWS services

While Amazon Bedrock Guardrails provides essential content and prompt safety controls, implementing a comprehensive defense-in-depth strategy is crucial when deploying any foundation model, especially open weights models such as DeepSeek-R1. For detailed guidance on defense-in-depth approaches aligned with OWASP Top 10 for LLMs, see our previous blog post on architecting secure generative AI applications.

Key highlights include:

  • Developing organizational resiliency by starting with security in mind
  • Building on a secure cloud foundation using AWS services
  • Applying a layered defense strategy across multiple trust boundaries
  • Addressing the OWASP Top 10 risks for LLM applications
  • Implementing security best practices throughout the AI/ML lifecycle
  • Using AWS security services in conjunction with AI and machine learning (AI/ML)-specific features
  • Considering diverse perspectives and aligning security with business objectives
  • Preparing for and mitigating risks such as prompt injection and data poisoning

The combination of model-level controls (guardrails) with a defense-in-depth strategy creates a robust security posture that can help protect against:

  • Data exfiltration attempts
  • Unauthorized access to fine-tuned models or training data
  • Potential vulnerabilities in model implementation
  • Malicious use of AI agents and integrations

We recommend conducting thorough threat modeling exercises using AWS guidance for generative AI workloads before deploying any new AI/ML solutions. This helps align security controls with specific risk scenarios and business requirements.

Conclusion

Implementing safety protection for LLMs, including DeepSeek-R1 models, is crucial for maintaining a secure and ethical AI environment. By using Amazon Bedrock Guardrails with the Amazon Bedrock InvokeModel API and the ApplyGuardrails API, you can help mitigate the risks associated with advanced language models while still harnessing their powerful capabilities. However, it’s important to recognize that model-level protections are just one component of a comprehensive security strategy.

The strategies outlined in this post address several key security concerns that are common across various open weights models hosted on Amazon Bedrock using Amazon Bedrock Custom Model Import, Amazon Bedrock Marketplace, and through Amazon SageMaker JumpStart. These include potential vulnerabilities to prompt injection attacks, the generation of harmful content, and other risks identified in recent assessments. By implementing these guardrails alongside a defense-in-depth approach, organizations can significantly reduce the risk of misuse and better align their AI applications with ethical standards and regulatory requirements.

As AI technology continues to evolve, it’s essential to prioritize safety and responsible use of generative AI. Amazon Bedrock Guardrails provides a configurable and robust framework for implementing these safeguards, allowing developers to customize protection measures according to their specific use cases and organizational policies. We strongly recommend conducting thorough threat modeling of your AI workloads using AWS guidance to evaluate security risks and implementing appropriate controls across your entire technology stack.

Remember to regularly review and update not only your guardrails but all security controls to address new potential vulnerabilities and help maintain protection against emerging threats in the rapidly evolving landscape of AI security. While today we focus on DeepSeek-R1 models, the AI landscape is continuously evolving with new models emerging regularly. Amazon Bedrock Guardrails, combined with AWS security services and best practices, provides a consistent security framework that can adapt to protect your generative AI applications across various open weights models, both current and future. By treating security as a continuous process of assessment, improvement, and adaptation, organizations can confidently deploy innovative AI solutions while maintaining robust security controls.


About the Authors

Satveer Khurpa is a Sr. WW Specialist Solutions Architect, Bedrock at Amazon Web Services. In this role, he uses his expertise in cloud-based architectures to develop innovative generative AI solutions for clients across diverse industries. Satveer’s deep understanding of generative AI technologies allows him to design scalable, secure, and responsible applications that unlock new business opportunities and drive tangible value.

Adewale Akinfaderin is a Sr. Data Scientist–Generative AI, Amazon Bedrock, where he contributes to cutting edge innovations in foundational models and generative AI applications at AWS. His expertise is in reproducible and end-to-end AI/ML methods, practical implementations, and helping global customers formulate and develop scalable solutions to interdisciplinary problems. He has two graduate degrees in physics and a doctorate in engineering.

Antonio Rodriguez is a Principal Generative AI Specialist Solutions Architect at Amazon Web Services. He helps companies of all sizes solve their challenges, embrace innovation, and create new business opportunities with Amazon Bedrock. Apart from work, he loves to spend time with his family and play sports with his friends.

Cut Your Losses in Large-Vocabulary Language Models

As language models grow ever larger, so do their vocabularies. This has shifted the memory footprint of LLMs during training disproportionately to one single layer: the cross-entropy in the loss computation. Cross-entropy builds up a logit matrix with entries for each pair of input tokens and vocabulary items and, for small models, consumes an order of magnitude more memory than the rest of the LLM combined. We propose Cut Cross-Entropy (CCE), a method that computes the cross-entropy loss without materializing the logits for all tokens into global memory. Rather, CCE only computes the logit…Apple Machine Learning Research

eaSEL: Promoting Social-Emotional Learning and Parent-Child Interaction Through AI-Mediated Content Consumption

As children increasingly consume media on devices, parents look for ways this usage can support learning and growth, especially in domains like social-emotional learning. We introduce eaSEL, a system that (a) integrates social-emotional learning (SEL) curricula into children’s video consumption by generating reflection activities and (b) facilitates parent-child discussions around digital media without requiring co-consumption of videos. We present a technical evaluation of our system’s ability to detect social-emotional moments within a transcript and to generate high-quality SEL-based…Apple Machine Learning Research

Fine-tune and host SDXL models cost-effectively with AWS Inferentia2

Building upon a previous Machine Learning Blog post to create personalized avatars by fine-tuning and hosting the Stable Diffusion 2.1 model at scale using Amazon SageMaker, this post takes the journey a step further. As technology continues to evolve, newer models are emerging, offering higher quality, increased flexibility, and faster image generation capabilities. One such groundbreaking model is Stable Diffusion XL (SDXL), released by StabilityAI, advancing the text-to-image generative AI technology to unprecedented heights. In this post, we demonstrate how to efficiently fine-tune the SDXL model using SageMaker Studio. We show how to then prepare the fine-tuned model to run on AWS Inferentia2 powered Amazon EC2 Inf2 instances, unlocking superior price performance for your inference workloads.

Solution overview

SDXL 1.0 is a text-to-image generation model developed by Stability AI, consisting of over 3 billion parameters. It comprises several key components, including a text encoder that converts input prompts into latent representations and a U-Net model that generates images based on these latent representations through a diffusion process. Despite the impressive capabilities of a model trained on a public dataset, app builders sometimes need to generate images for a specific subject or style that is difficult or inefficient to describe in words. In that situation, fine-tuning is a great option to improve relevance using your own data.

One popular approach to fine-tuning SDXL is to use DreamBooth and Low-Rank Adaptation (LoRA) techniques. You can use DreamBooth to personalize the model by embedding a subject into its output domain using a unique identifier, effectively expanding its language-vision dictionary. This process uses a technique called prior preservation, which retains the model’s existing knowledge about the subject class (such as humans) while incorporating new information from the provided subject images. LoRA is an efficient fine-tuning method that attaches small adapter networks to specific layers of the pre-trained model, freezing most of its weights. By combining these techniques, you can generate a personalized model while tuning an order-of-magnitude fewer parameters, resulting in faster fine-tuning times and optimized storage requirements.

After the model is fine-tuned, you can compile and host the fine-tuned SDXL on Inf2 instances using the AWS Neuron SDK. By doing this, you can benefit from the higher performance and cost-efficiency offered by these specialized AI chips while taking advantage of the seamless integration with popular deep learning frameworks such as TensorFlow and PyTorch. To learn more, visit our Neuron documentation.

Prerequisites

Before you get started, review the list of services and instance types required to run the sample notebooks provided at this GitHub location.

By following these prerequisites, you will have the necessary knowledge and AWS resources to run the sample notebooks and work with Stable Diffusion models and FMs on Amazon SageMaker.

Fine-tuning SDXL on SageMaker

To fine-tune SDXL on SageMaker, follow the steps in the next sections.

Prepare the images

The first step in fine-tuning the SDXL model is to prepare your training images. Using the DreamBooth technique, you need as few as 10–12 images for fine-tuning. It’s recommended to provide a variety of images to help the model better understand and generalize your facial features.

The training images should include selfies taken from different angles, covering various perspectives of your face. Include images with different facial expressions, such as smiling, frowning, and neutral. Preferably, use images with different backgrounds to help the model identify the subject more effectively. By providing a diverse set of images, DreamBooth can better identify the subject from the pictures and generalize your facial features. The following set of images demonstrate this.

prepare a set of training images

Additionally, use 1024×1024 pixel square images for fine-tuning. To simplify the process of preparing the images, there is a utility function that automatically crops and adjusts your images to the correct dimensions.
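
The utility function itself isn't shown in this post; a simple version of the idea, using Pillow and assuming a flat folder of images, might look like this.

from pathlib import Path
from PIL import Image

def crop_and_resize(src_dir: str, dst_dir: str, size: int = 1024) -> None:
    """Center-crop each image to a square and resize it to size x size pixels."""
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    for path in Path(src_dir).glob("*"):
        if path.suffix.lower() not in {".jpg", ".jpeg", ".png"}:
            continue
        img = Image.open(path).convert("RGB")
        side = min(img.size)
        left = (img.width - side) // 2
        top = (img.height - side) // 2
        img = img.crop((left, top, left + side, top + side)).resize((size, size))
        img.save(Path(dst_dir) / path.name)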

Train the personalized model

After the images are prepared, you can begin the fine-tuning process. To achieve this, you use the autoTrain library from Hugging Face, an automatic and user-friendly approach to training and deploying state-of-the-art machine learning (ML) models. Seamlessly integrated with the Hugging Face ecosystem, autoTrain is designed to be accessible, and individuals can train custom models without extensive technical expertise or coding proficiency. To use autoTrain, use the following example code:

!autotrain dreambooth \
--prompt "${INSTANCE_PROMPT}" \
--class-prompt "${CLASS_PROMPT}" \
--model ${MODEL_NAME} \
--project-name ${PROJECT_NAME} \
--image-path "${IMAGE_PATH}" \
--resolution ${RESOLUTION} \
--batch-size ${BATCH_SIZE} \
--num-steps ${NUM_STEPS} \
--gradient-accumulation ${GRADIENT_ACCUMULATION} \
--lr ${LEARNING_RATE} \
--fp16 \
--gradient-checkpointing

First, you need to set the prompt and class-prompt. The prompt should include a unique identifier or token that the model can associate with the subject. The class-prompt, on the other hand, is used to supplement the model training with similar subjects of the same class. This is a requirement for the DreamBooth technique to better associate the new token with the subject of interest, and it’s why DreamBooth can generate exceptional fine-tuned results with fewer input images. Additionally, you’ll notice that even though you didn’t provide examples of the top or back of your head, the model still knows how to generate them because of the class prompt. In this example, you use <<TOK>> as a unique identifier to avoid a name that the model might already be familiar with.

instance_prompt = "photo of <<TOK>>"
class_prompt = "photo of a person"

Next, you need to provide the model, image-path, and project-name. The model name loads the base model from the Hugging Face Hub or locally. The image-path is the location of the training images. By default, autoTrain uses LoRA, a parameter-efficient way to fine-tune. Unlike traditional fine-tuning, LoRA fine-tunes by attaching a small transformer adapter model to the base model. Only the adapter weights are updated during training to achieve fine-tuning behavior. Additionally, these adapters can be attached and detached at any time, making them highly efficient for storage as well. These supplementary LoRA adapters are 98% smaller in size compared to the original model, allowing us to store and share the LoRA adapters without having to duplicate the base model repeatedly. The following diagram illustrates these concepts.

This diagram showcases the value proposition of LoRA fine-tuning techniques

The rest of the configuration parameters are as follows. We recommend starting with these values and adjusting them only if the fine-tuning results don’t meet your expectations.

resolution = 1024          # resolution or size of the generated images
batch_size = 1             # number of samples in one forward and backward pass  
num_steps = 500           # number of training steps
gradient_accumulation = 4  # accumulating gradients over number of batches
learning_rate = 1e-4       # step size
fp16                       # half-precision
gradient-checkpointing     # technique to reduce memory consumption during training

The entire training process takes about 30 minutes with the preceding configuration. After training is done, you can load the LoRA adapter as shown in the following code and generate fine-tuned images.

from diffusers import DiffusionPipeline, StableDiffusionXLImg2ImgPipeline
import torch
import random

# model_name_base, project_name, device, prompt, and negative_prompt are
# defined earlier in the notebook.

seed = random.randint(0, 100000)

# loading the base model
pipeline = DiffusionPipeline.from_pretrained(
    model_name_base,
    torch_dtype=torch.float16,
    ).to(device)

# attach the LoRA adapter
pipeline.load_lora_weights(
    project_name,
    weight_name="pytorch_lora_weights.safetensors",
)

# generate fine tuned images
generator = torch.Generator(device).manual_seed(seed)
base_image = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=50,
    generator=generator,
    height=1024,
    width=1024,
    output_type="pil",
    ).images[0]
base_image

Deploy on Amazon EC2 Inf2 instances

In this section, you learn to compile and host the fine-tuned SDXL model on Inf2 instances. To begin, you need to clone the repository and upload the LoRA adapter onto the Inf2 instance created in the prerequisites section. Then, run the compilation notebook to compile the fine-tuned SDXL model using the Optimum Neuron library. Visit the Optimum Neuron page for more details.

The NeuronStableDiffusionXLPipeline class in Optimum Neuron now has direct support for LoRA. All you need to do is supply the base model, the LoRA adapters, and the model input shapes to start the compilation process. The following code snippet illustrates how to compile and then export the compiled model to a local directory.

from optimum.neuron import NeuronStableDiffusionXLPipeline

model_id = "stabilityai/stable-diffusion-xl-base-1.0"
adapter_id = "lora"
input_shapes = {"batch_size": 1, "height": 1024, "width": 1024, "num_images_per_prompt": 1}

# Compile
pipe = NeuronStableDiffusionXLPipeline.from_pretrained(
    model_id,
    export=True,
    lora_model_ids=adapter_id,
    lora_weight_names="pytorch_lora_weights.safetensors",
    lora_adapter_names="sttirum",
    **input_shapes,
)

# Save locally or upload to the HuggingFace Hub
save_directory = "sd_neuron_xl/"
pipe.save_pretrained(save_directory)

The compilation process takes about 35 minutes. After the process is complete, you can use the NeuronStableDiffusionXLPipeline again to load the compiled model back.

from optimum.neuron import NeuronStableDiffusionXLPipeline

stable_diffusion_xl = NeuronStableDiffusionXLPipeline.from_pretrained("sd_neuron_xl")

You can then test the model on Inf2 and make sure that you can still generate the fine-tuned results.

import torch
# Run pipeline
prompt = """
photo of <<TOK>> , 3d portrait, ultra detailed, gorgeous, 3d zbrush, trending on dribbble, 8k render
"""

negative_prompt = """
ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, disfigured, deformed, body out of frame, blurry, bad anatomy, blurred, 
watermark, grainy, signature, cut off, draft, amateur, multiple, gross, weird, uneven, furnishing, decorating, decoration, furniture, text, poor, low, basic, worst, juvenile, 
unprofessional, failure, crayon, oil, label, thousand hands
"""

seed = 491057365
generator = [torch.Generator().manual_seed(seed)]
image = stable_diffusion_xl(prompt,
                    num_inference_steps=50,
                    guidance_scale=7,
                    negative_prompt=negative_prompt,
                    generator=generator).images[0]

Here are a few avatar images generated using the fine-tuned model on Inf2. The corresponding prompts are the following:

  • emoji of << TOK >>, astronaut, space ship background
  • oil painting of << TOK >>, business woman, suit
  • photo of << TOK >> , 3d portrait, ultra detailed, 8k render
  • anime of << TOK >>, ninja style, dark hair

Sample output images generated by finetuned model

Clean up

To avoid incurring AWS charges after you finish testing this example, make sure you delete the following resources:

  • Amazon SageMaker Studio Domain
  • Amazon EC2 Inf2 instance

Conclusion

This post has demonstrated how to fine-tune the Stable Diffusion XL (SDXL) model using DreamBooth and LoRA techniques on Amazon SageMaker, enabling enterprises to generate highly personalized and domain-specific images tailored to their unique requirements using as few as 10–12 training images. By using these techniques, businesses can rapidly adapt the SDXL model to their specific needs, unlocking new opportunities to enhance customer experiences and differentiate their offerings. Moreover, we showcased the process of compiling and deploying the fine-tuned SDXL model for inference on AWS Inferentia2 powered Amazon EC2 Inf2 instances, which deliver an unparalleled price-to-performance ratio for generative AI workloads, enabling enterprises to host fine-tuned SDXL models at scale in a cost-efficient manner. We encourage you to try the example and share your creations with us using hashtags #sagemaker #mme #genai on social platforms. We would love to see what you make.

For more examples about AWS Neuron, refer to aws-neuron-samples.


About the Authors

Deepti Tirumala is a Senior Solutions Architect at Amazon Web Services, specializing in Machine Learning and Generative AI technologies. With a passion for helping customers advance their AWS journey, she works closely with organizations to architect scalable, secure, and cost-effective solutions that leverage the latest innovations in these areas.

James Wu is a Senior AI/ML Specialist Solutions Architect at AWS, helping customers design and build AI/ML solutions. James’s work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. Prior to joining AWS, James was an architect, developer, and technology leader for over 10 years, including 6 years in engineering and 4 years in the marketing and advertising industries.

Diwakar Bansal is a Principal GenAI Specialist focused on business development and go-to-market for GenAI and machine learning accelerated computing services. Diwakar has led product definition, global business development, and marketing of technology products in the fields of IoT, edge computing, and autonomous driving, focusing on bringing AI and machine learning to these domains. Diwakar is passionate about public speaking and thought leadership in the cloud and GenAI space.

How Aetion is using generative AI and Amazon Bedrock to translate scientific intent to results

This post is co-written with Javier Beltrán, Ornela Xhelili, and Prasidh Chhabri from Aetion. 

For decision-makers in healthcare, it is critical to gain a comprehensive understanding of patient journeys and health outcomes over time. Scientists, epidemiologists, and biostatisticians implement a vast range of queries to capture complex, clinically relevant patient variables from real-world data. These variables often involve complex sequences of events, combinations of occurrences and non-occurrences, as well as detailed numeric calculations or categorizations that accurately reflect the diverse nature of patient experiences and medical histories. Expressing these variables as natural language queries allows users to express scientific intent and explore the full complexity of the patient timeline.

Aetion is a leading provider of decision-grade real-world evidence software to biopharma, payors, and regulatory agencies. The company provides comprehensive solutions that help healthcare and life science customers rapidly and transparently transform real-world data into real-world evidence.

At the core of the Aetion Evidence Platform (AEP) are Measures—logical building blocks used to flexibly capture complex patient variables, enabling scientists to customize their analyses to address the nuances and challenges presented by their research questions. AEP users can use Measures to build cohorts of patients and analyze their outcomes and characteristics.

A user asking a scientific question aims to translate scientific intent, such as “I want to find patients with a diagnosis of diabetes and a subsequent metformin fill,” into algorithms that capture these variables in real-world data. To facilitate this translation, Aetion developed a Measures Assistant to turn users’ natural language expressions of scientific intent into Measures.

In this post, we review how Aetion is using Amazon Bedrock to help streamline the analytical process toward producing decision-grade real-world evidence and enable users without data science expertise to interact with complex real-world datasets.

Amazon Bedrock is a fully managed service that provides access to high-performing foundation models (FMs) from leading AI startups and Amazon through a unified API. It offers a wide range of FMs, allowing you to choose the model that best suits your specific use case.

Aetion’s technology

Aetion is a healthcare software and services company that uses the science of causal inference to generate real-world evidence on the safety, effectiveness, and value of medications and clinical interventions. Aetion has partnered with the majority of top 20 biopharma, leading payors, and regulatory agencies.

Aetion brings deep scientific expertise and technology to life sciences, regulatory agencies (including FDA and EMA), payors, and health technology assessment (HTA) customers in the US, Canada, Europe, and Japan with analytics that can achieve the following:

  • Optimize clinical trials by identifying target populations, creating external control arms, and contextualizing settings and populations underrepresented in controlled settings
  • Expand industry access through label changes, pricing, coverage, and formulary decisions
  • Conduct safety and effectiveness studies for medications, treatments, and diagnostics

Aetion’s applications, including Discover and Substantiate, are powered by the AEP, a core longitudinal analytic engine capable of applying rigorous causal inference and statistical methods to hundreds of millions of patient journeys.

AetionAI, Aetion’s set of generative AI capabilities, are embedded across the AEP and applications. Measures Assistant is an AetionAI feature in Substantiate.

The following figure illustrates the organization of Aetion’s services.

Aetion Services

Measures Assistant

Users build analyses in Aetion Substantiate to turn real-world data into decision-grade real-world evidence. The first step is capturing patient variables from real-world data. Substantiate offers a wide range of Measures, as illustrated in the following screenshot. Measures can often be chained together to capture complex variables.

Measures Assistant

Suppose the user is assessing a therapy’s cost-effectiveness to help negotiate drug coverage with payors. The first step in this analysis is to filter out negative cost values that might appear in claims data. The user can ask AetionAI how to implement this, as shown in the following screenshot.

In another scenario, a user might want to define an outcome in their analysis as the change in hemoglobin over successive lab tests following the start of treatment. A user asks Measures Assistant a question expressed in natural language and receives instructions on how to implement this.

Solution overview

Patient datasets are ingested into the AEP and transformed into a longitudinal (timeline) format. AEP references this data to generate cohorts and run analyses. Measures are the variables that determine conditions for cohort entry, inclusion or exclusion, and the characteristics of a study.

The following diagram illustrates the solution architecture.

Architecture diagram

Measures Assistant is a microservice deployed in a Kubernetes on AWS environment and accessed through a REST API. The data transmitted to the service is encrypted using Transport Layer Security (TLS) 1.2. When a user asks a question through the assistant UI, Substantiate initiates a request containing the question and the previous message history, if available. Measures Assistant incorporates the question into a prompt template and calls the Amazon Bedrock API to invoke Anthropic’s Claude 3 Haiku. The user-provided prompts and the requests sent to the Amazon Bedrock API are encrypted using TLS 1.2.

Aetion chose to use Amazon Bedrock for working with large language models (LLMs) due to its vast model selection from multiple providers, security posture, extensibility, and ease of use. Anthropic’s Claude 3 Haiku LLM was found to be more efficient in runtime and cost than available alternatives.
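
A minimal sketch of this call path, assuming the Amazon Bedrock Converse API, the public Claude 3 Haiku model ID, and a heavily simplified system prompt, is shown below.

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"

def ask_measures_assistant(question: str, history: list) -> str:
    """Send the user question plus prior turns to Claude 3 Haiku on Amazon Bedrock."""
    messages = history + [{"role": "user", "content": [{"text": question}]}]
    response = bedrock_runtime.converse(
        modelId=MODEL_ID,
        system=[{"text": "You are Measures Assistant. Answer only questions about AEP Measures."}],
        messages=messages,
    )
    return response["output"]["message"]["content"][0]["text"]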

Measures Assistant maintains a local knowledge base about AEP Measures, curated by scientific experts at Aetion, and incorporates this information into its responses as guardrails. These guardrails make sure the service returns valid instructions to the user and compensate for logical reasoning errors that the core model might exhibit.

The Measures Assistant prompt template contains the following information:

  • A general definition of the task the LLM is running.
  • Extracts of AEP documentation, describing each Measure type covered, its input and output types, and how to use it.
  • An in-context learning technique that includes semantically relevant solved questions and answers in the prompt.
  • Rules to condition the LLM to behave in a certain manner. For example, how to react to unrelated questions, keep sensitive data secure, or restrict its creativity in developing invalid AEP settings.

To streamline the process, Measures Assistant uses templates composed of two parts:

  • Static – Fixed instructions to be used with user questions. These instructions cover a broad range of well-defined instructions for Measures Assistant.
  • Dynamic – Questions and answers are dynamically selected from a local knowledge base based on semantic proximity to the user question. These examples improve the quality of the generated answers by incorporating similar, previously asked and answered questions into the prompt. This technique models a small-scale, optimized, in-process knowledge base for a Retrieval Augmented Generation (RAG) pattern.

Mixedbread’s mxbai-embed-large-v1 sentence transformer was fine-tuned to generate sentence embeddings for the question-and-answer local knowledge base and for users’ questions. Similarity between questions is calculated as the cosine similarity between their embedding vectors.
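
A sketch of this retrieval step, assuming the sentence-transformers library and the publicly available mxbai-embed-large-v1 checkpoint (the fine-tuned variant isn't public), could look like the following.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")

# Local knowledge base of previously answered questions (illustrative examples)
kb_questions = [
    "How do I filter out negative cost values in claims data?",
    "How do I measure the change in hemoglobin after treatment start?",
]
kb_embeddings = model.encode(kb_questions, convert_to_tensor=True)

def most_similar(user_question: str, top_k: int = 2) -> list:
    """Return the knowledge base questions closest to the user's question."""
    query_embedding = model.encode(user_question, convert_to_tensor=True)
    scores = util.cos_sim(query_embedding, kb_embeddings)[0]
    top = scores.topk(min(top_k, len(kb_questions)))
    return [(kb_questions[int(i)], float(scores[int(i)])) for i in top.indices]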

The generation and maintenance of the question-and-answer pool involve a human in the loop. Subject matter experts continuously test Measures Assistant, and question-and-answer pairs are used to refine it continually to optimize the user experience.

Outcomes

Our implementation of AetionAI capabilities enables users to express scientific intent in natural language and have it translated into the algorithms that capture these variables in real-world data. Users can now turn questions expressed in natural language into Measures in a matter of minutes rather than days, without the need for support staff or specialized training.

Conclusion

In this post, we covered how Aetion uses AWS services to streamline the user’s path from defining scientific intent to running a study and obtaining results. Measures Assistant enables scientists to implement complex studies and iterate on study designs, instantaneously receiving guidance through responses to quick, natural language queries.

Aetion is continuing to refine the knowledge base available to Measures Assistant and expand innovative generative AI capabilities across its product suite to help improve the user experience and ultimately accelerate the process of turning real-world data into real-world evidence.

With Amazon Bedrock, the future of innovation is at your fingertips. Explore Generative AI Application Builder on AWS to learn more about building generative AI capabilities to unlock new insights, build transformative solutions, and shape the future of healthcare today.


About the Authors

Javier Beltrán is a Senior Machine Learning Engineer at Aetion. His career has focused on natural language processing, and he has experience applying machine learning solutions to various domains, from healthcare to social media.

Ornela Xhelili is a Staff Machine Learning Architect at Aetion. Ornela specializes in natural language processing, predictive analytics, and MLOps, and holds a Master’s of Science in Statistics. Ornela has spent the past 8 years building AI/ML products for tech startups across various domains, including healthcare, finance, analytics, and ecommerce.

Prasidh Chhabri is a Product Manager at Aetion, leading the Aetion Evidence Platform, core analytics, and AI/ML capabilities. He has extensive experience building quantitative and statistical methods to solve problems in human health.

Mikhail Vaynshteyn is a Solutions Architect with Amazon Web Services. Mikhail works with healthcare life sciences customers and specializes in data analytics services. Mikhail has more than 20 years of industry experience covering a wide range of technologies and sectors.
