Implement exact match with Amazon Lex QnAIntent

This post is a continuation of Creating Natural Conversations with Amazon Lex QnAIntent and Amazon Bedrock Knowledge Base. In summary, we explored new capabilities available through Amazon Lex QnAIntent, powered by Amazon Bedrock, that enable you to harness natural language understanding and your own knowledge repositories to provide real-time, conversational experiences.

In many cases, Amazon Bedrock is able to generate accurate responses that meet the needs for a wide variety of questions and scenarios, using your knowledge content. However, some enterprise customers have regulatory requirements or more rigid brand guidelines, requiring certain questions to be answered verbatim with pre-approved responses. For these use cases, Amazon Lex QnAIntent provides exact match capabilities with both Amazon Kendra and Amazon OpenSearch Service knowledge bases.

In this post, we walk through how to set up and configure an OpenSearch Service cluster as the knowledge base for your Amazon Lex QnAIntent. Exact match also works with Amazon Kendra: you can create an index, add frequently asked questions to it, and, as detailed in Part 1 of this series, select Amazon Kendra as your knowledge base under Amazon Lex QnA Configurations, provide your Amazon Kendra index ID, and enable the exact response option so your bot returns the answer exactly as Amazon Kendra stores it.

Solution overview

In the following sections, we walk through the steps to create an OpenSearch Service domain, create an OpenSearch index and populate it with documents, and test the Amazon Lex bot with QnAIntent.

Prerequisites

Before creating an OpenSearch Service cluster, you need to create an Amazon Lex V2 bot. If you don’t have an Amazon Lex V2 bot available, complete the following steps:

  1. On the Amazon Lex console, choose Bots in the navigation pane.
  2. Choose Create bot.
  3. Select Start with an example.
  4. For Example bot, choose BookTrip.

Create Lex Sample Bot

  5. Enter a name and description for your bot.
  6. Select Create a role with basic Amazon Lex permissions for your AWS Identity and Access Management (IAM) permissions runtime role.
  7. Select No for Is use of your bot subject to the Children’s Online Privacy Protection Act (COPPA).
  8. Choose Next.
  9. Keep all defaults in the Add Languages to Bot section.
  10. Choose Done to create your bot.
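
If you prefer to script this prerequisite, the following minimal sketch creates a bare Amazon Lex V2 bot with the Boto3 lexv2-models client. It is an illustration under assumptions: the bot name, description, and IAM role ARN are placeholders, and unlike the console's Start with an example option, it does not add the BookTrip intents for you.

import boto3

lex_models = boto3.client("lexv2-models")

# Placeholder values; replace with your own bot name and runtime role ARN
response = lex_models.create_bot(
    botName="ExactMatchDemoBot",
    description="Demo bot for QnAIntent exact match",
    roleArn="arn:aws:iam::111122223333:role/ExampleLexRuntimeRole",
    dataPrivacy={"childDirected": False},  # maps to the COPPA question in the console
    idleSessionTTLInSeconds=300,
)
print(response["botId"], response["botStatus"])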

Create an OpenSearch Service domain

Complete the following steps to create your OpenSearch Service domain:

  1. On the OpenSearch Service console, choose Dashboard under Managed clusters in the navigation pane.
  2. Choose Create domain.

Amazon OpenSearch Dashboard

  3. For Domain name, enter a name for your domain (for this post, we use my-domain).
  4. For Domain creation method, select Easy create.

Create Amazon OpenSearch Domain

  5. Under Engine options, for Version, choose the latest engine version. At the time of writing, the latest engine is OpenSearch_2.11.
  6. Under Network, select Public access for this post. In an enterprise environment, you typically launch your OpenSearch Service cluster in a VPC.
  7. Under Network, select Dual-stack mode. Dual stack allows you to share domain resources across IPv4 and IPv6 address types, and is the recommended option.
  8. Under Fine-grained access control, select Create master user.
  9. Enter the user name and password of your choice.

Fine-grained access control

  10. Leave all other configurations at their default settings.
  11. Choose Create.

Configure OpenSearch Cluster

It will take several minutes for your cluster to launch. When your cluster is ready, you will see a green Active status under Domain processing status.
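
If you want to provision the domain programmatically instead of through the console, the following is a minimal Boto3 sketch that approximates the Easy create settings used in this post. Treat it as an illustration under assumptions: the instance type, volume size, account ID, Region, and master user credentials are placeholders, and you should review the access policy and security settings against your own requirements before using it.

import json

import boto3

opensearch = boto3.client("opensearch")

account_id = "111122223333"  # placeholder AWS account ID
region = "us-east-1"         # placeholder Region
domain_name = "my-domain"

# Open resource policy; with fine-grained access control enabled,
# authentication is still enforced by the master user and role mappings
access_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "*"},
        "Action": "es:ESHttp*",
        "Resource": f"arn:aws:es:{region}:{account_id}:domain/{domain_name}/*",
    }],
}

opensearch.create_domain(
    DomainName=domain_name,
    EngineVersion="OpenSearch_2.11",
    ClusterConfig={"InstanceType": "t3.small.search", "InstanceCount": 1},
    EBSOptions={"EBSEnabled": True, "VolumeType": "gp3", "VolumeSize": 10},
    NodeToNodeEncryptionOptions={"Enabled": True},
    EncryptionAtRestOptions={"Enabled": True},
    DomainEndpointOptions={"EnforceHTTPS": True},
    AdvancedSecurityOptions={
        "Enabled": True,
        "InternalUserDatabaseEnabled": True,
        "MasterUserOptions": {
            "MasterUserName": "master-user",        # placeholder
            "MasterUserPassword": "Str0ngP@ssw0rd!"  # placeholder
        },
    },
    AccessPolicies=json.dumps(access_policy),
)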

Create an OpenSearch Service index

Complete the following steps to create an index:

  1. On the domain details page, copy the domain endpoint under Domain endpoint (IPv4) to use later.
  2. Choose the IPv4 URL link.

The IPv4 link will open the OpenSearch Dashboards login page.

  3. Enter the user name and password you created earlier.

OpenSearch Login Portal

  4. On the OpenSearch Dashboards welcome page, choose Explore on my own.

Disregard pop-up windows

  5. You can dismiss or cancel any additional modals or pop-ups.

Disregard pop-up windows

  6. Choose the options menu, then choose Dev Tools in the navigation pane.

OpenSearch Dashboard Menu

  7. On the Dev Tools page, enter the following code to create an index, then choose the run icon to send the request:
PUT my-domain-index
{
   "mappings": {
      "properties": {
         "question": {
            "type": "text"
         },
         "answer": {
            "type": "text"
         }
      }
   }
}

OpenSearch Dev Tools

If successful, you will see the following message:

{
   "acknowledged": true,
   "shards_acknowledged": true,
   "index": "my-domain-index"
}
  8. Enter the following code to bulk index multiple documents you can use later to test:
POST _bulk
{ "index": { "_index": "my-domain-index", "_id" : "mdi00001" } }
{ "question" : "What are the check-in and check-out times?", "answer": "Check-in time is 3pm and check-out time is 11am at all FictitiousHotels locations. Early check-in and late check-out may be available upon request and availability. Please inquire at the front desk upon arrival." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00002" } }
{ "question" : "Do you offer airport shuttles?", "answer": "Airport shuttles are available at the following FictitiousHotels locations: - FictitiousHotels Dallas: Complimentary airport shuttle available to and from Dallas/Fort Worth International Airport. Shuttle runs every 30 minutes from 5am-11pm. - FictitiousHotels Chicago: Complimentary airport shuttle available to and from O'Hare International Airport and Chicago Midway Airport. Shuttle runs every hour from 5am-11pm. - FictitiousHotels San Francisco: Complimentary airport shuttle available to and from San Francisco International Airport. Shuttle runs every 30 minutes from 5am11pm. - FictitiousHotels New York: Complimentary shuttle available to and from LaGuardia Airport and JFK Airport. Shuttle runs every hour from 5am-11pm. Please contact the front desk at your FictitiousHotels location to schedule airport shuttle service at least 24 hours in advance. Shuttle services and hours may vary by location." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00003" } }
{ "question" : "Is parking available? What is the daily parking fee?", "answer": "Self-parking and valet parking are available at most FictitiousHotels locations. Daily self-parking rates range from $15-$30 per day based on location. Valet parking rates range from $25-$40 per day. Please contact your FictitiousHotels location directly for specific parking information and rates." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00004" } }
{ "question" : "4. What amenities are available at FictitiousHotels?", "answer": "Amenities available at most FictitiousHotels locations include: - Free wireless high-speed internet access - 24-hour fitness center - Outdoor pool and hot tub - 24-hour business center - On-site restaurant and bar - Room service - Laundry facilities - Concierge services - Meeting rooms and event space Specific amenities may vary by location. Contact your FictitiousHotels for details onamenities available during your stay." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00005" } }
{ "question" : "Is there an extra charge for children staying at FictitiousHotels?", "answer": "There is no extra charge for children 18 years and younger staying in the same room as their parents or guardians at FictitiousHotels locations in the United States and Canada. Rollaway beds are available for an additional $15 fee per night, subject to availability. Cribs are available free of charge on request. Please contact the front desk to request cribs or rollaway beds. Additional charges for extra occupants may apply at international FictitiousHotels locations." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00006" } }
{ "question" : "Does FictitiousHotels have a pool? What are the pool hours?", "answer": "Most FictitiousHotels locations have an outdoor pool and hot tub available for guest use. Pool hours vary by location but are generally open from 6am-10pm daily. Specific FictitiousHotels pool hours: - FictitiousHotels Miami: Pool open 24 hours - FictitiousHotels Las Vegas: Pool open 8am-8pm - FictitiousHotels Chicago: Indoor and outdoor pools, open 6am-10pm - FictitiousHotels New York: Rooftop pool, open 9am-7pm Please contact your FictitiousHotels front desk for specific pool hours during your stay. Hours may be subject to change due to weather conditions or seasonal schedules. Proper swimwear is required and no lifeguard is on duty at any time." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00007" } }
{ "question" : "Is the fitness center free for guests? What are the hours?", "answer": "Yes, access to the 24-hour fitness center is included for all FictitiousHotels guests at no extra charge. The fitness center offers a range of cardio and strength training equipment. Some locations also offer fitness classes, saunas, steam rooms, and other amenities for a fee. Please contact your FictitiousHotels for specific fitness center details. Access may be restricted to guests 18 years and older. Proper athletic attire and footwear is required." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00008" } }
{ "question" : "Does FictitiousHotels offer room service? What are the hours?", "answer": "24-hour room service is available at most FictitiousHotels locations. In-room dining menus offer a variety of breakfast, lunch, and dinner options. Hours may vary by on-site restaurants. A $5 delivery fee and 18% service charge applies to all room service orders. For quick service, please dial extension 707 from your guest room phone. Room service hours: - FictitiousHotels San Francisco: 24-hour room service - FictitiousHotels Chicago: Room service 7am-10pm - FictitiousHotels New Orleans: Room service 7am-11pm Please contact the front desk at your FictitiousHotels location for specific room service hours and menu options. Room service availability may be limited based on on-site restaurants." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00009" } }
{ "question" : "Does FictitiousHotels provide toiletries like shampoo, soap, etc?", "answer": "Yes, each FictitiousHotels room is stocked with complimentary toiletries and bath amenities including shampoo, conditioner, soap, lotion, and bath gel. Additional amenities like toothbrushes, razors, and shaving cream are available upon request at the front desk. If any items are missing from your room, please contact housekeeping." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00010" } }
{ "question" : "How can I get extra towels or have my room cleaned?", "answer": "Fresh towels and daily housekeeping service are provided free of charge. To request extra towels or pillows, additional amenities, or to schedule midstay service, please contact the front desk by dialing 0 on your in-room phone. Daily housekeeping includes trash removal, changing sheets and towels, vacuuming, dusting, and bathroom cleaning. Just let us know your preferred service times. A Do Not Disturb sign can be placed on your door to opt out for the day." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00011" } }
{ "question" : "Does FictitiousHotels provide hair dryers in the room?", "answer": "Yes, each guest room at FictitiousHotels locations includes a hair dryer. Hair dryers are typically located in the bathroom drawer or mounted to the bathroom wall. Please contact the front desk immediately if the hair dryer is missing or malfunctioning so we can replace it." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00012" } }
{ "question" : "What type of WiFi or internet access is available at FictitiousHotels?", "answer": "Free high-speed wireless internet access is available throughout all FictitiousHotels locations. To connect, simply choose the FictitiousHotels WiFi network on your device and open a web browser. For questions or issues with connectivity, please contact the front desk for assistance. Wired internet access is also available in FictitiousHotels business centers and meeting rooms. Printers, computers, and IT support may be available for business services and events. Please inquire with your FictitiousHotels for details on business services." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00013" } }
{ "question" : "Does FictitiousHotels have electric car charging stations?", "answer": "Select FictitiousHotels locations offer electric vehicle charging stations on-site, typically located in self-parking areas. Availability varies by location. Please contact your FictitiousHotels to check availability and charging rates. Most stations offer Level 2 charging. Charging station locations include: - FictitiousHotels Portland: 2 stations - FictitiousHotels Los Angeles: 4 stations - FictitiousHotels San Francisco: 6 stations Guests can request an on-site parking spot nearest the charging stations when booking parking accommodations. Charging rates may apply." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00014" } }
{ "question" : "What is the pet policy at FictitiousHotels? Are dogs allowed?", "answer": "Pets are welcome at participating FictitiousHotels locations for an additional fee of $50 per stay. Restrictions may apply based on size, breed, or other factors. Please contact your FictitiousHotels in advance to confirm pet policies. FictitiousHotels locations in Atlanta, Austin, Chicago, Denver, Las Vegas and Seattle allow dogs under 50 lbs. Certain dog breeds may be restricted. Cats may also be permitted. Non-refundable pet fees apply. Pet owners are responsible for cleaning up after pets on hotel grounds. Pets must be attended at all times and may not be a disturbance to other guests. Pets are restricted from restaurants, lounges, fitness areas, and pool decks at all FictitiousHotels locations." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00015" } }
{ "question" : "Does FictitiousHotels have laundry facilities for guest use?", "answer": "Yes, self-service laundry facilities with washers and dryers are available for guests to use at all FictitiousHotels locations. Laundry facilities are typically located on the 2nd floor adjacent to vending machines and ice machines. Detergent is available for purchase via vending machines. The cost is $2.50 to wash and $2.50 to dry per load. Quarters can be obtained at the front desk. For any assistance with laundry services, please dial 0 and speak with the front desk. Valet laundry and dry-cleaning services may be offered for an additional fee." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00016" } }
{ "question" : "Can I request extra pillows or blankets for my FictitiousHotels room?", "answer": "Absolutely. Our housekeeping team is happy to bring additional pillows, blankets, towels and other necessities to make your stay more comfortable. We offer hypoallergenic pillows and have extra blankets available upon request. Please contact the FictitiousHotels front desk to make a special request. Dial 0 on your in-room phone. Extra amenities are subject to availability. Extra bedding must be left in the guest room at checkout to avoid additional fees." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00017" } }
{ "question" : "Does FictitiousHotels provide cribs or rollaway beds?", "answer": "Yes, cribs and rollaway beds are available upon request at all FictitiousHotels locations. Please contact the front desk as far in advance as possible to make arrangements, as these are limited in quantity. Cribs are provided complimentary as a courtesy. Rollaway beds are subject to an additional fee of $15 per night." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00018" } }
{ "question" : "What type of accessible rooms or ADA rooms does FictitiousHotels offer?", "answer": "FictitiousHotels provides accessible guest rooms tailored for those with disabilities and mobility needs. Accessible rooms feature widened doorways, lowered beds and sinks, accessible showers or tubs with grab bars, and other ADA compliant features. Please request an accessible room at the time of booking to ensure availability." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00019" } }
{ "question" : "Does FictitiousHotels provide microwaves and mini-fridges?", "answer": "Microwave and mini-refrigerator combos are available in select room types upon request and subject to availability. When booking your reservation, please inquire about availability of fridges and microwaves at your preferred FictitiousHotels location. A limited number are available. An additional $15 daily fee applies for use." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00020" } }
{ "question" : "Can I rent a conference or meeting room at FictitiousHotels?", "answer": "Yes, FictitiousHotels offers conference and meeting rooms available for rent at competitive rates. Options range from board rooms seating 8 to ballrooms accommodating up to 300 guests. State-of-the-art AV equipment is available for rent. Contact the Events Department to check availability and request a quote." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00021" } }
{ "question" : "Is there an ATM or cash machine at FictitiousHotels?", "answer": "For your convenience, ATMs are located near the front desk and lobby at all FictitiousHotels locations. The ATMs provide 24/7 access to cash in amounts up to $500 per transaction and accept all major credit and debit cards. Foreign transaction fees may apply. Please see the front desk if you need any assistance locating or using the ATM during your stay." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00022" } }
{ "question" : "Does FictitiousHotels have a spa or offer spa services?", "answer": "Select FictitiousHotels locations offer luxurious on-site spas providing massages, facials, body treatments, manicures and pedicures. For availability and booking at your FictitiousHotels, please ask the front desk for details or visit the spa directly. Day passes may be available for non-hotel guests. Additional spa access fees apply." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00023" } }
{ "question" : "Can I get a late checkout from FictitiousHotels?", "answer": "Late checkout may be available at participating FictitiousHotels locations based on availability. The standard checkout time is by 11am. Please inquire about late checkout options at check-in or contact the front desk at least 24 hours prior to your departure date to make arrangements. Late checkouts are subject to a half-day room rate charge." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00024" } }
{ "question" : "Does FictitiousHotels offer room upgrades?", "answer": "Room upgrades may be purchased upon check-in based on availability. Upgrades to suites, executive floors, or rooms with preferred views are subject to additional charges. Rates vary by date, room type, and location. Please inquire about upgrade options and pricing at the front desk during check-in. Advance reservations are recommended to guarantee upgrades." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00025" } }
{ "question" : "Do the FictitiousHotels rooms have air conditioning and heating?", "answer": "Yes, every guest room at all FictitiousHotels locations is equipped with individual climate controls allowing air conditioning or heating as desired. To operate, simply adjust the thermostat in your room. If you have any issues regulating the temperature, please contact the front desk immediately and we will send an engineer." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00026" } }
{ "question" : "Does FictitiousHotels provide wake-up call service?", "answer": "Complimentary wake-up calls are available upon request. Please contact the front desk to schedule a customized wake-up call during your stay. In-room alarm clocks are also provided for your convenience. For international locations, please specify if you need a domestic or international phone call." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00027" } }
{ "question" : "Can I smoke at FictitiousHotels? What is the smoking policy?", "answer": "For the comfort of all guests, FictitiousHotels enforces a non-smoking policy in all guest rooms and indoor public spaces. Designated outdoor smoking areas are available on-site. A minimum $200 cleaning fee will be charged for smoking detected in rooms. Smoking is prohibited by law on all hotel shuttle buses. Thank you for not smoking inside FictitiousHotels." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00028" } }
{ "question" : "Does FictitiousHotels offer child care services?", "answer": "No, we apologize that child care services are not available at FictitiousHotels locations. As an alternative, our front desk can provide recommendations for qualified local babysitting agencies and nanny services to assist families during their stay. Please let us know if you need any recommendations. Additional fees will apply." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00029" } }
{ "question" : "What restaurants are located in FictitiousHotels?", "answer": "Onsite dining options vary by location. Many FictitiousHotelss feature 24-hour cafes, coffee shops, trendy bars, steakhouses, and international cuisine. Please check with your FictitiousHotels front desk for all restaurants available on-site during your stay and operating hours. Room service is also available." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00030" } }
{ "question" : "Does FictitiousHotels provide transportation or town car service?", "answer": "FictitiousHotels can arrange transportation, car service, and limousine transfers for an additional fee. Please contact the concierge desk at least 24 hours in advance to make arrangements. We have relationships with reputable local car services and drivers. Airport shuttles, taxis, and other transportation can also be requested through your FictitiousHotels front desk." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00031" } }
{ "question" : "FictitiousHotels New York City", "answer" : "Ideally situated in Midtown Manhattan on 52nd Street, FictitiousHotels New York City positions you in the heart of the city's top attractions. This modern 25- story glass tower overlooks the bright lights of Broadway and Times Square, just minutes from your guestroom door. Inside, enjoy contemporary styling melded with classic New York flair. 345 well-appointed rooms feature plush bedding, marble bathrooms, room service, and scenic city views. On-site amenities include a state-of-the-art fitness center, business center, cocktail lounge with nightly live music, and farm-to-table restaurant serving sustainably sourced American fare. Venture outside to nearby Rockefeller Center, Radio City Music Hall, Central Park, the Museum of Modern Art and Fifth Avenue’s world-renowned shopping. Catch a Broadway show on the same block or take a short stroll to Restaurant Row’s vast culinary offerings. Grand Central Station sits under 10 minutes away." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00032" } }
{ "question" : "FictitiousHotels Chicago", "answer" : "Conveniently situated just steps from North Michigan Avenue in downtown Chicago, FictitiousHotels Chicago envelopes you in Midwestern hospitality and luxury. This sleek 50-story high rise showcases gorgeous city vistas in each of the 453 elegantly appointed guest rooms and suites. Wake up refreshed in pillowtop beds, slip into plush robes and enjoy gourmet in-room coffee service. The heated indoor pool and expansive fitness center help you stay active and refreshed, while the lobby cocktail lounge serves up local craft beers and signature cocktails. Start your day with breakfast at the Café before venturing out to the city’s top cultural attractions like the Art Institute, Millennium Park, Navy Pier and Museum Campus. Shoppers can walk just next door to Chicago’s best retail at high-end department stores and independent boutiques. Business travelers appreciate our central location and 40,000 square feet of modern event space. Enjoy easy access to Chicago’s finest dining, entertainment and more." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00033" } }
{ "question" : "FictitiousHotels Orlando", "answer" : "FictitiousHotels Orlando welcomes you with sunshine and hospitality just 3 miles from The theme parks. The resort hotel’s sprawling campus features 3 outdoor pools, 6 restaurants and lounges, full-service spa, waterpark and 27-hole championship golf course. 1,500 guestrooms cater to families and couples alike with amenities like mini-fridges, marble bathrooms, themed kids’ suites with bunk beds and separate family suites. Onsite activities range from Camp FictitiousHotels kids’ programs to poolside movies under the stars. Complimentary theme park shuttles take you directly to the theme parks and more. Area attractions like theme parks and water parks are just a short drive away. Golf fans are minutes from various golf courses. With endless recreation under the warm Florida sun, FictitiousHotels Orlando keeps the entire family entertained and happy." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00034" } }
{ "question" : "FictitiousHotels San Francisco", "answer" : "Rising over the San Francisco Bay, FictitiousHotels San Francisco treats you to panoramic waterfront views. Perched on the Embarcadero in the lively Financial District, this sleek downtown hotel blends innovative technology with California charm across 32 floors. Contemporary rooms feature voice activated controls, intuitive lighting, rainfall showers with built-in Bluetooth speakers and floor-to-ceiling windows perfect for gazing at the Bay Bridge. Sample bites from top NorCal chefs at our signature farm- to-table restaurant or sip craft cocktails beside the outdoor heated pool. Stay connected at the lobby work bar or get moving in the 24/7 fitness center. Union Square shopping sits just up the street, while iconic landmarks like the Golden Gate Bridge, Alcatraz and Fisherman's Wharf are only minutes away. Venture to Chinatown and North Beach's Italian flavors or catch a cable car straight up to Ghirardelli Square. Immerse yourself in the best of the City by the Bay." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00035" } }
{ "question" : "FictitiousHotels Honolulu", "answer" : "A true island escape awaits at FictitiousHotels Honolulu, nestled on the pristine shores of Waikiki Beach. Swaying palms frame our family-friendly resort featuring three outdoor pools, cultural activities like lei making and ukulele lessons and the island's largest lagoon waterpark. You’ll feel the spirit of ‘ohana – family – in our welcoming staff and signature Hawaiian hospitality. 1,200 newly renovated rooms open to lanais overlooking swaying palms and the sparkling blue Pacific. Five dining options include Polynesian cuisine, island-inspired plates and indulgent character breakfasts. Complimentary beach chairs and towels invite you to sunbathe on soft white sand just steps out the lobby. Take our shuttle to Pearl Harbor, historic ‘Iolani Palace or the famous North Shore. From snorkeling at Hanauma Bay to whale watching in winter, FictitiousHotels Honolulu lets you experience O’ahu's gorgeous island paradise." }
{ "index": { "_index": "my-domain-index", "_id" : "mdi00036" } }
{ "question" : "FictitiousHotels London", "answer" : "Situated in fashionable South Kensington overlooking Cromwell Road, FictitiousHotels London places you in the heart of Victorian grandeur and modern city buzz. This 19th century row house turned design hotel blends contemporary style with classic British sophistication across 210 rooms. Original touches like working fireplaces and ornate crown molding offset sleek decor and high-tech in-room tablets controlling lights, TV and 24-hour room service. Fuel up on full English breakfast and locally roasted coffee at our indoor café or unwind with afternoon tea in the English Garden. Work out in the fitness studio before indulging in an evening massage. Our concierge arranges VIP access at nearby museums and priority bookings for West End theatre. Top shopping at Harrod's and the King's Road are a quick Tube ride away. Whether here for business or pleasure, FictitiousHotels London provides five-star luxury in an unmatched location." }

If successful, you will see another message similar to that in the following screenshot.

OpenSearch POST data

If you want to update, delete, or add your own test documents, refer to the OpenSearch Document APIs.
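
Optionally, you can confirm the documents are searchable before connecting the bot. The following is a minimal sketch that queries the question field with the Python requests library; the domain endpoint and master user credentials are placeholders you replace with your own values.

import requests

# Placeholders: your domain endpoint (copied earlier) and master user credentials
endpoint = "https://search-my-domain-abc123xyz.us-east-1.es.amazonaws.com"
auth = ("master-user", "master-password")

query = {
    "size": 1,
    "query": {"match": {"question": "What are the check-in and check-out times?"}},
}

# Run the search against the index created earlier
response = requests.get(f"{endpoint}/my-domain-index/_search", auth=auth, json=query)
top_hit = response.json()["hits"]["hits"][0]["_source"]
print(top_hit["answer"])  # prints the stored answer verbatim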

Before setting up QnAIntent, make sure you have added access to the Amazon Bedrock model you intend to use.
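
Model access is granted on the Amazon Bedrock console under Model access. As a quick check, the following sketch only lists the foundation models offered in your Region so you can confirm the model ID you plan to use; it does not grant or verify access by itself.

import boto3

bedrock = boto3.client("bedrock")

# List foundation models offered in this Region
for summary in bedrock.list_foundation_models()["modelSummaries"]:
    print(summary["modelId"])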

Now that test data is populated in the OpenSearch Service domain, you can test it with the Amazon Lex bot.

Test your Amazon Lex bot

To test the bot, complete the following steps:

  1. On the Amazon Lex console, navigate to the QnAIntent feature of the bot you created as a prerequisite.
  2. Choose the language, which for this post is English (US).
  3. Under Generative AI Configurations, choose Configure.

Configure Lex Bot

  4. Under QnA configuration, choose Create QnA intent.
  5. For Intent name, enter a name (for this post, FictitiousHotelsFAQ).
  6. Choose Add.
  7. Choose the intent you just added.

Configure Lex QnAIntent

  8. Under QnA configuration, choose OpenSearch as the knowledge store.
  9. For Domain endpoint, enter the endpoint you copied earlier.
  10. For Index name, enter a name (for example, my-domain-index).
  11. For Exact Response, select Yes.
  12. For Question Field, enter question.
  13. For Answer Field, enter answer.
  14. Choose Save intent.

Configure QnAIntent Knowledge Base

Because you used the Easy create option to launch your OpenSearch Service domain, fine-grained access control was enabled by default. You need to locate the Amazon Lex IAM role and add permissions to the OpenSearch Service domain to allow Amazon Lex to interact with OpenSearch Service.

  1. Navigate to the draft version of your bot in the navigation pane.
  2. Choose the link for IAM permissions runtime role.
  3. Copy the ARN of the role to use later.

Copy Lex IAM Role

  1. Navigate back to OpenSearch Dashboards. If you closed your browser tab or navigated away from this page, you can find it again by locating the IPv4 URL on the OpenSearch Service console from a previous step.
  2. On the options menu, choose Security.
  3. Choose Roles in the navigation pane.
  4. Select the role all_access.

Configure OpenSearch with Lex IAM Role

  1. Choose Mapped users, then choose Manage mapping.
  2. For Backend roles, enter the IAM runtime role ARN you copied earlier.
  3. Choose Map.
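
If you prefer to script this mapping rather than use OpenSearch Dashboards, the following minimal sketch calls the OpenSearch Security REST API with the Python requests library. The endpoint, master user credentials, and role ARN are placeholders; the sketch fetches the current all_access mapping first so any existing users and backend roles are preserved.

import requests

# Placeholders: your domain endpoint, master user credentials, and the
# Amazon Lex IAM runtime role ARN you copied earlier
endpoint = "https://search-my-domain-abc123xyz.us-east-1.es.amazonaws.com"
auth = ("master-user", "master-password")
lex_role_arn = "arn:aws:iam::111122223333:role/ExampleLexRuntimeRole"

mapping_url = f"{endpoint}/_plugins/_security/api/rolesmapping/all_access"

# Read the current mapping so the update is additive
current = requests.get(mapping_url, auth=auth).json().get("all_access", {})
backend_roles = set(current.get("backend_roles", []))
backend_roles.add(lex_role_arn)

payload = {
    "backend_roles": sorted(backend_roles),
    "hosts": current.get("hosts", []),
    "users": current.get("users", []),
}
print(requests.put(mapping_url, auth=auth, json=payload).json())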

  1. On the Amazon Lex console, navigate back to your bot and the English (US) language.
  2. Choose Build to build your bot.
  3. Choose Test to test your bot.

Make sure your bot's IAM runtime role has the permissions required to use QnAIntent. These permissions should be added automatically by default.

  1. When the Amazon Lex test chat window launches, enter a question from your sample OpenSearch Service documents, such as “What are the check-in and check-out times?”

Test Lex Bot
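
You can also exercise the bot programmatically once it has been built. The following is a minimal sketch using the Boto3 lexv2-runtime client; the bot ID and alias ID are placeholders (the draft version is typically exposed through the test alias, which you can confirm on the Amazon Lex console).

import uuid

import boto3

lex_runtime = boto3.client("lexv2-runtime")

# Placeholders: your bot ID and test alias ID from the Amazon Lex console
response = lex_runtime.recognize_text(
    botId="ABCDE12345",
    botAliasId="TSTALIASID",
    localeId="en_US",
    sessionId=str(uuid.uuid4()),
    text="What are the check-in and check-out times?",
)

for message in response.get("messages", []):
    print(message["content"])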

Clean up

To avoid incurring ongoing costs, delete the resources you created as part of this post:

  • Amazon Lex V2 bot
  • OpenSearch Service domain

Conclusion

Amazon Lex QnAIntent provides the flexibility and choice to use a variety of different knowledge bases to generate accurate responses to questions based on your own documents and authorized knowledge sources. You can choose to let Amazon Bedrock generate a response to questions based on the results from your knowledge base, or you can return exact, pre-approved responses using Amazon Kendra or OpenSearch Service knowledge bases.

In this post, we demonstrated how to launch and configure an OpenSearch Service domain, populate an OpenSearch Service index with sample documents, and configure the exact response option using the index with Amazon Lex QnAIntent.

You can start taking advantage of Amazon Lex QnAIntent today and transform your customer experience.


About the Authors

Josh RodgersJosh Rodgers is a Senior Solutions Architect for AWS who works with enterprise customers in the travel and hospitality vertical. Josh enjoys working with customers to solve complex problems with a focus on serverless technologies, DevOps, and security. Outside of work, Josh enjoys hiking, playing music, skydiving, painting, and spending time with family.

Thomas RindfussThomas Rindfuss is a Sr. Solutions Architect on the Amazon Lex team. He invents, develops, prototypes, and evangelizes new technical features and solutions for language AI services that improve the customer experience and ease adoption.

How Krikey AI harnessed the power of Amazon SageMaker Ground Truth to accelerate generative AI development

This post is co-written with Jhanvi Shriram and Ketaki Shriram from Krikey.

Krikey AI is revolutionizing the world of 3D animation with their innovative platform that allows anyone to generate high-quality 3D animations using just text or video inputs, without needing any prior animation experience. At the core of Krikey AI’s offering is their powerful foundation model trained to understand human motion and translate text descriptions into realistic 3D character animations. However, building such a sophisticated artificial intelligence (AI) model requires tremendous amounts of high-quality training data.

Krikey AI faced the daunting task of labeling a vast amount of data input containing body motions with descriptive text labels. Manually labeling this dataset in-house was impractical and prohibitively expensive for the startup. But without these rich labels, their customers would be severely limited in the animations they could generate from text inputs.

Amazon SageMaker Ground Truth is an AWS managed service that makes it straightforward and cost-effective to get high-quality labeled data for machine learning (ML) models by combining ML and expert human annotation. Krikey AI used SageMaker Ground Truth to expedite the development and implementation of their text-to-animation model. SageMaker Ground Truth provided and managed the labeling workforce, provided advanced data labeling workflows, and automated workflows for human-in-the-loop tasks, enabling Krikey AI to efficiently source precise labels tailored to their needs.

SageMaker Ground Truth Implementation

As a small startup working to democratize 3D animation through AI, Krikey AI faced the challenge of preparing a large labeled dataset to train their text-to-animation model. Manually labeling each data input with descriptive annotations proved incredibly time-consuming and impractical to do in-house at scale. With customer demand rapidly growing for their AI animation services, Krikey AI needed a way to quickly obtain high-quality labels across diverse and broad categories. Not having high-quality descriptive labels and tags would severely limit the animations their customers could generate from text inputs. Partnering with SageMaker Ground Truth provided the solution, allowing Krikey AI to efficiently source precise labels tailored to their needs.

SageMaker Ground Truth allows you to set up labeling workflows and use a private or vendor workforce, or a workforce sourced and managed on your behalf, along with additional features to further accelerate and optimize the data labeling process. Krikey AI opted to use SageMaker Ground Truth to take advantage of its advanced data labeling workflows and model-assisted labeling capabilities, which further streamlined and optimized their large-scale labeling process for training their AI animation models. Data was stored in Amazon Simple Storage Service (Amazon S3), and AWS Key Management Service (AWS KMS) was used for data protection.

The SageMaker Ground Truth team provided a two-step solution to prepare high-quality training datasets for Krikey AI’s model. First, the team developed a custom labeling interface tailored to Krikey AI’s requirements. This interface enabled annotators to deliver accurate captions while maintaining high productivity levels. The user-friendly interface provided annotators with various options to add detailed and multiple descriptions, helping them implement comprehensive labeling of the data. The following screenshot shows an example.

Second, the team sourced and managed a workforce that met Krikey AI’s specific requirements. Krikey AI needed to quickly process a vast amount of data inputs with succinct and descriptive labels, tags, and keywords in English. Rapidly processing the large amount of data inputs allowed Krikey AI to enter the market quickly with their unique 3D animation platform.

Integral to Krikey AI’s successful partnership with SageMaker Ground Truth was the ability to frequently review and refine the labeling process. Krikey AI held weekly calls to examine sample labeled content and provide feedback to the SageMaker Ground Truth team. This allowed them to continuously update the guidelines for what constituted a high-quality descriptive label as they progressed through different categories. Having this depth of involvement and ability to recalibrate the labeling criteria was critical for making sure the precise, rich labels were captured across all their data, which wouldn’t have been possible for Krikey AI to achieve on their own.

The following diagram illustrates the SageMaker Ground Truth architecture.

Overall Architecture

Krikey AI built their AI-powered 3D animation platform using a comprehensive suite of AWS services. At the core, they use Amazon Simple Storage Service (Amazon S3) for data storage, Amazon Elastic Kubernetes Service (Amazon EKS) for running containerized applications, Amazon Relational Database Service (Amazon RDS) for databases, Amazon ElastiCache for in-memory caching, and Amazon Elastic Compute Cloud (Amazon EC2) instances for computing workloads. Their web application is developed using AWS Amplify. The critical component enabling their text-to-animation AI is SageMaker Ground Truth, which allows them to efficiently label a massive training dataset. This AWS infrastructure allows Krikey AI to serve their direct-to-consumer AI animation tool to customers globally and enables enterprise customers to deploy Krikey AI’s foundation models using Amazon SageMaker JumpStart, as well as self-host the no-code 3D animation editor within their own AWS environment.

Results

Krikey AI’s partnership with SageMaker Ground Truth enabled them to rapidly build a massive dataset of richly labeled motion data in just 3 months. These high-quality labels fueled their state-of-the-art text-to-animation AI model, accelerated their time-to-market, and saved over $200,000 in labeling costs.

“Amazon SageMaker Ground Truth has been game-changing for Krikey AI. Their skilled workforce and streamlined workflows allowed us to rapidly label the massive datasets required to train our innovative text-to-animation AI models. What would have taken our small team months, SageMaker Ground Truth helped us achieve in weeks—accelerating our ability to bring transformative generative AI capabilities to media, entertainment, gaming, and sports. With SageMaker Ground Truth as an extension of our team, we achieved our goal of providing an easy-to-use animation tool that anyone can use to animate a 3D character. This simply would not have been possible without the speed, scale, and quality labeling delivered by SageMaker Ground Truth. They were a true force multiplier for our AI development.”

– Dr. Ketaki Shriram, Co-Founder and CTO of Krikey AI.

Conclusion

The time and cost savings, along with access to premium labeled data, highlights the immense value SageMaker Ground Truth offers startups working with generative AI. To learn more and get started, visit Amazon SageMaker Ground Truth.

About Krikey AI

Krikey AI Animation tools empower anyone to animate a 3D character in minutes. The character animations can be used in marketing, tutorials, games, films, social media, lesson plans, and more. In addition to a video-to-animation and text-to-animation AI model, Krikey offers a 3D editor that creators can use to add lip-synched dialogue, change backgrounds, facial expressions, hand gestures, camera angles, and more to their animated videos. Krikey’s AI tools are available online at www.krikey.ai today, on Canva Apps, Adobe Express, and AWS Marketplace.


About the Authors

Jhanvi Shriram is the CEO of Krikey, an AI startup that she co-founded with her sister. Prior to Krikey, Jhanvi worked at YouTube as a Production Strategist on operations and creator community programs, which sparked her interest in working with content creators. In 2014, Jhanvi and her sister, Ketaki Shriram, co-produced a feature film that premiered at the Tribeca Film Festival and was acquired by Univision. Jhanvi holds a BA and MBA from Stanford University, and an MFA (Film Producing) from USC.

Dr. Ketaki Shriram is the CTO at Krikey, an AI animation startup. Krikey’s no-code 3D editor empowers anyone to create 3D content regardless of their background. Krikey’s tools can be used to produce content for games, films, marketing materials, and more. Dr. Shriram received her BA, MA, and PhD at the Stanford Virtual Human Interaction Lab. She previously worked at Google [x] and Meta’s Reality Labs. Dr. Shriram was selected for the Forbes 30 Under 30 2020 Class in the Gaming category.

Amanda Lester is a Senior Go-to-Market Specialist at AWS, helping to put artificial intelligence and machine learning in the hands of every developer and ML engineer. She is an experienced business executive with a proven track record of success at fast-growing technology companies. Amanda has a deep background in leading strategic go-to-market efforts for high growth technology. She is passionate about helping accelerate the growth of the tech community through programs to support gender equality, entrepreneurship, and STEM education.

Julia Rizhevsky is responsible for Growth and Go-to-Market for AWS human-in-the-loop services, serving customers building and fine-tuning AI models. Her team works with AWS customers on the cutting edge of generative AI who are looking to leverage human intelligence to guide models to their desired behavior. Prior to AWS, Julia developed and launched consumer products in payments and financial services.

Ami Dani is a Senior Technical Program Manager at AWS focusing on AI/ML services. During her career, she has focused on delivering transformative software development projects for the federal government and large companies in industries as diverse as advertising, entertainment, and finance. Ami has experience driving business growth, implementing innovative training programs, and successfully managing complex, high-impact projects.

Manage Amazon SageMaker JumpStart foundation model access with private hubs

Amazon SageMaker JumpStart is a machine learning (ML) hub offering pre-trained models and pre-built solutions, with access to hundreds of foundation models (FMs). A private hub is a feature in SageMaker JumpStart that allows an organization to share its models and notebooks to centralize model artifacts, facilitate discoverability, and increase reuse within the organization. With new models released daily, many enterprise admins want more control over the FMs that can be discovered and used by users within their organization (for example, only allowing models based on the PyTorch framework to be discovered).

Now enterprise admins can effortlessly configure granular access control over the FMs that SageMaker JumpStart provides out of box so that only allowed models can be accessed by users within their organizations. In this post, we discuss the steps required for an administrator to configure granular access control of models in SageMaker JumpStart using a private hub, as well as the steps for users to access and consume models from the private hub.

Solution overview

Starting today, with SageMaker JumpStart and its private hub feature, administrators can create repositories for a subset of models tailored to different teams, use cases, or license requirements using the Amazon SageMaker Python SDK. Admins can also set up multiple private hubs with different lists of models discoverable for different groups of users. Users are then only able to discover and use models within the private hubs they have access to through Amazon SageMaker Studio and the SDK. This level of control empowers enterprises to consume the latest in open weight generative artificial intelligence (AI) development while enforcing governance guardrails. Finally, admins can share access to private hubs across multiple AWS accounts, enabling collaborative model management while maintaining centralized control. SageMaker JumpStart uses AWS Resource Access Manager (AWS RAM) to securely share private hubs with other accounts in the same organization. The new feature is available in the us-east-2 AWS Region as of writing, and will be available to more Regions soon.

The following diagram shows an example architecture of SageMaker JumpStart with its public and private hub features. The diagram illustrates how SageMaker JumpStart provides access to different model repositories, with some users accessing the public SageMaker JumpStart hub and others using private curated hubs.

In the following section, we demonstrate how admins can configure granular access control of models in SageMaker JumpStart using a private hub. Then we show how users can access and consume allowlisted models in the private hub using SageMaker Studio and the SageMaker Python SDK. Finally, we look at how an admin user can share the private hub with users in another account.

Prerequisites

To use the SageMaker Python SDK and run the code associated with this post, you need the following prerequisites:

  • An AWS account that contains all your AWS resources
  • An AWS Identity and Access Management (IAM) role with access to SageMaker Studio notebooks
  • SageMaker JumpStart enabled in a SageMaker Studio domain

Create a private hub, curate models, and configure access control (admins)

This section provides a step-by-step guide for administrators to create a private hub, curate models, and configure access control for your organization’s users.

  1. The model granular access control feature is integrated in the latest SageMaker Python SDK, so first update the SDK:
    !pip3 install sagemaker --force-reinstall --quiet
  2. Next, import the SageMaker and Boto3 libraries:
    import boto3
    from sagemaker import Session
    from sagemaker.jumpstart.hub.hub import Hub
  3. Configure your private hub:
    HUB_NAME="CompanyHub"
    HUB_DISPLAY_NAME="Allowlisted Models"
    HUB_DESCRIPTION="These are allowlisted models taken from the JumpStart Public Hub."
    REGION="<your_region_name>" # for example, "us-west-2"

    In the preceding code, HUB_NAME specifies the name of your Hub. HUB_DISPLAY_NAME is the display name for your hub that will be shown to users in UI experiences. HUB_DESCRIPTION is the description for your hub that will be shown to users.

  4. Set up a Boto3 client for SageMaker:
    sm_client = boto3.client('sagemaker')
    session = Session(sagemaker_client=sm_client)
    session.get_caller_identity_arn()
  5. Check if the following policies have been already added to your admin IAM role; if not, you can add them as inline policies:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": [
                    "s3:ListBucket",
                    "s3:GetObject",
                    "s3:GetObjectTagging"
                ],
                "Resource": [
                    "arn:aws:s3:::jumpstart-cache-prod-<REGION>",
                    "arn:aws:s3:::jumpstart-cache-prod-<REGION>/*"
                ],
                "Effect": "Allow"
            }
        ]
    }

    Replace the <REGION> placeholder using the configurations in Step 3.

    In addition to setting up IAM permissions to the admin role, you need to scope down permissions for your users so they can’t access public contents.

  6. Use the following policy to deny access to the public hub for your users. These can be added as inline policies in the user’s IAM role:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": "s3:*",
                "Effect": "Deny",
                "Resource": [
                    "arn:aws:s3:::jumpstart-cache-prod-<REGION>",
                    "arn:aws:s3:::jumpstart-cache-prod-<REGION>/*"
                ],
                "Condition": {
                    "StringNotLike": {"s3:prefix": ["*.ipynb", "*/eula.txt"]}
                }
            },
            {
                "Action": "sagemaker:*",
                "Effect": "Deny",
                "Resource": [
                    "arn:aws:sagemaker:<REGION>:aws:hub/SageMakerPublicHub",
                    "arn:aws:sagemaker:<REGION>:aws:hub-content/SageMakerPublicHub/*/*"
                ]
            }
        ]
    }
    

    Replace the <REGION> placeholder in the policy using the configurations in Step 3.

    After you have set up the private hub configuration and permissions, you’re ready to create the private hub.

  7. Use the following code to create the private hub within your AWS account in the Region you specified earlier:
    hub = Hub(hub_name=HUB_NAME, sagemaker_session=session)
    
    try:
      hub.create(
          description=HUB_DESCRIPTION,
          display_name=HUB_DISPLAY_NAME
      )
      print(f"Successfully created Hub with name {HUB_NAME} in {REGION}")
    except Exception as e:
      if "ResourceInUse" in str(e):
        print(f"A hub with the name {HUB_NAME} already exists in your account.")
      else:
        raise e
    
  8. Use hub.describe() to verify the configuration of your hub. After your private hub is set up, you can add a reference to models from the SageMaker JumpStart public hub to your private hub. No model artifacts need to be managed by the customer; the SageMaker team manages any version or security updates. For a list of available models, refer to Built-in Algorithms with pre-trained Model Table.
  9. To search programmatically, run the following command:
    filter_value = "framework == meta"
    response = hub.list_sagemaker_public_hub_models(filter=filter_value)
    models = response["hub_content_summaries"]
    while response["next_token"]:
        response = hub.list_sagemaker_public_hub_models(filter=filter_value,
                                                        next_token=response["next_token"])
        models.extend(response["hub_content_summaries"])
    
    print(models)
    

    The filter argument is optional. For a list of filters you can apply, refer to SageMaker Python SDK.

  10. Use the retrieved models from the preceding command to create model references for your private hub:
    for model in models:
        print(f"Adding {model.get('hub_content_name')} to Hub")
        hub.create_model_reference(model_arn=model.get("hub_content_arn"), 
                                   model_name=model.get("hub_content_name"))

    The SageMaker JumpStart private hub offers other useful features for managing and interacting with the curated models. Administrators can check the metadata of a specific model using the hub.describe_model(model_name=<model_name>) command. To list all available models in the private hub, you can use a simple loop:

    response = hub.list_models()
    models = response["hub_content_summaries"]
    while response["next_token"]:
        response = hub.list_models(next_token=response["next_token"])
        models.extend(response["hub_content_summaries"])
    
    for model in models:
        print(model.get('HubContentArn'))
    

    If you need to remove a specific model reference from the private hub, use the following command:

    hub.delete_model_reference("<model_name>")

    If you want to delete the private hub from your account and Region, you’ll need to delete all the HubContents first, then delete the private hub. Use the following code:

    for model in models:
        hub.delete_model_reference(model_name=model.get('HubContentName')) 
    
    hub.delete()
    

Interact with allowlisted models (users)

This section offers a step-by-step guide for users to interact with allowlisted models in SageMaker JumpStart. We demonstrate how to list available models, identify a model from the public hub, and deploy the model to endpoints from SageMaker Studio as well as the SageMaker Python SDK.

User experience in SageMaker Studio

Complete the following steps to interact with allowlisted models using SageMaker Studio:

  1. On the SageMaker Studio console, choose JumpStart in the navigation pane or in the Prebuilt and automated solutions section.
  2. Choose one of the model hubs you have access to. If you have access to multiple hubs, you'll see a list of hubs, as shown in the following screenshot.
    If you have access to only one hub, you'll go straight to the model list.
    You can view the model details and supported actions like train, deploy, and evaluate.
  3. To deploy a model, choose Deploy.
  4. Modify your model configurations like instances and deployment parameters, and choose Deploy.

User experience using the SageMaker Python SDK

To interact with your models using the SageMaker Python SDK, complete the following steps:

  1. Just like the admin process, the first step is to force reinstall the SageMaker Python SDK:
    !pip3 install sagemaker --force-reinstall --quiet
  2. Import the SageMaker and Boto3 libraries:
    import boto3
    from sagemaker import Session
    from sagemaker.jumpstart.hub.hub import Hub
    from sagemaker.jumpstart.model import JumpStartModel
    from sagemaker.jumpstart.estimator import JumpStartEstimator
  3. To access the models in your private hub, you need the Region and the name of the hub on your account. Fill out the HUB_NAME and REGION fields with the information provided by your administrator:
    HUB_NAME="CompanyHub" 
    REGION="<your_region_name>" # for example, "us-west-2"
    sm_client = boto3.client('sagemaker') 
    sm_runtime_client = boto3.client('sagemaker-runtime') 
    session = Session(sagemaker_client=sm_client, 
                        sagemaker_runtime_client=sm_runtime_client)
    hub = Hub(hub_name=HUB_NAME, sagemaker_session=session)
  4. List the models available in your private hub using the following command:
    response = hub.list_models()
    models = response["hub_content_summaries"]
    while response["next_token"]:
        response = hub.list_models(next_token=response["next_token"])
        models.extend(response["hub_content_summaries"])
    
    print(models)
  5. To get more information about a particular model, use the describe_model method:
    model_name = "huggingface-llm-phi-2"
    response = hub.describe_model(model_name=model_name) 
    print(response)
  6. You can deploy models in a hub with the Python SDK by using JumpStartModel. To deploy a model from the hub to an endpoint and invoke the endpoint with the default payloads, run the following code. To select which model from your hub you want to use, pass in a model_id and version. If you pass in * for the version, it will take the latest version available for that model_id in the hub. If you’re using a model gated behind a EULA agreement, pass in accept_eula=True.
    model_id, version = "huggingface-llm-phi-2", "1.0.0"
    model = JumpStartModel(model_id, version, hub_name=HUB_NAME, 
                                region=REGION, sagemaker_session=session)
    predictor = model.deploy(accept_eula=False)
  7. To invoke your deployed model with the default payloads, use the following code:
    example_payloads = model.retrieve_all_examples()
    for payload in example_payloads:
        response = predictor.predict(payload.body)
        print("\nInput\n", payload.body, "\n\nOutput\n",
              response[0]["generated_text"], "\n\n===============")
  8. To delete the model endpoints that you created, use the following code:
    predictor.delete_model()
    predictor.delete_endpoint()

Cross-account sharing of private hubs

SageMaker JumpStart private hubs support cross-account sharing, allowing you to extend the benefits of your curated model repository beyond your own AWS account. This feature enables collaboration across different teams or departments within your organization, even when they operate in separate AWS accounts. By using AWS RAM, you can securely share your private hubs while maintaining control over access.

To share your private hub across accounts, complete the following steps:

  1. On the AWS RAM console, choose Create resource share.
  2. When specifying resource share details, choose the SageMaker hub resource type and select one or more private hubs that you want to share. When you share a hub with any other account, all of its contents are also shared implicitly.
  3. Associate permissions with your resource share.
  4. Use AWS account IDs to specify the accounts to which you want to grant access to your shared resources.
  5. Review your resource share configuration and choose Create resource share.

It may take a few minutes for the resource share and principal associations to complete.

Admins that want to perform the preceding steps programmatically can enter the following command to initiate the sharing:

# create a resource share using the private hub
aws ram create-resource-share \
    --name test-share \
    --resource-arns arn:aws:sagemaker:<region>:<resource_owner_account_id>:hub/<hub_name> \
    --principals <consumer_account_id> \
    --region <region>

Replace the <resource_owner_account_id>, <consumer_account_id>, <hub_name>, and <region> placeholders with the appropriate values for the resource owner account ID, consumer account ID, name of the hub, and Region to use.

After you set up the resource share, the specified AWS account will receive an invitation to join. They must accept this invitation through AWS RAM to gain access to the shared private hub. This process makes sure access is granted only with explicit consent from both the hub owner and the recipient account. For more information, refer to Using shared AWS resources.

You can also perform this step programmatically:

# list resource share invitations
aws ram get-resource-share-invitations \
    --region <region>

# accept the resource share invitation
# using the ARN from the previous response
aws ram accept-resource-share-invitation \
  --resource-share-invitation-arn <arn_from_previous_request> \
  --region <region>

For detailed instructions on creating resource shares and accepting invitations, refer to Creating a resource share in AWS RAM. By extending your private hub across accounts, you can foster collaboration and maintain consistent model governance across your entire organization.
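
After the consumer account accepts the invitation, users in that account can work with the shared hub. The following is a minimal sketch of how the consumer account might verify access with boto3; it assumes the shared hub is referenced by its ARN, and the placeholder values must be replaced with your own:

import boto3

# Placeholder ARN of the hub shared by the resource owner account
shared_hub_arn = "arn:aws:sagemaker:<region>:<resource_owner_account_id>:hub/<hub_name>"

sm_client = boto3.client("sagemaker", region_name="<region>")

# Confirm the hub is visible from the consumer account
print(sm_client.describe_hub(HubName=shared_hub_arn)["HubStatus"])

# List the models curated in the shared hub
contents = sm_client.list_hub_contents(HubName=shared_hub_arn, HubContentType="Model")
for summary in contents["HubContentSummaries"]:
    print(summary["HubContentName"], summary["HubContentVersion"])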

Conclusion

SageMaker JumpStart allows enterprises to adopt FMs while maintaining granular control over model access and usage. By creating a curated repository of approved models in private hubs, organizations can align their AI initiatives with corporate policies and regulatory requirements. The private hub decouples model curation from model consumption, enabling administrators to manage the model inventory while data scientists focus on developing AI solutions.

This post explained the private hub feature in SageMaker JumpStart and provided steps to set up and use a private hub, with minimal additional configuration required. Administrators can select models from the public SageMaker JumpStart hub, add them to the private hub, and manage user access through IAM policies. Users can then deploy these preapproved models, fine-tune them on custom datasets, and integrate them into their applications using familiar SageMaker interfaces. The private hub builds on the underlying SageMaker infrastructure, allowing it to scale with enterprise-level ML demands.

For more information about SageMaker JumpStart, refer to SageMaker JumpStart. To get started using SageMaker JumpStart, access it through SageMaker Studio.

About the Authors

Raju Rangan is a Senior Solutions Architect at AWS. He works with government-sponsored entities, helping them build AI/ML solutions using AWS. When not tinkering with cloud solutions, you’ll catch him hanging out with family or smashing birdies in a lively game of badminton with friends.

Sherry Ding is a senior AI/ML specialist solutions architect at AWS. She has extensive experience in machine learning with a PhD in computer science. She mainly works with public sector customers on various AI/ML-related business challenges, helping them accelerate their machine learning journey on the AWS Cloud. When not helping customers, she enjoys outdoor activities.

June Won is a product manager with Amazon SageMaker JumpStart. He focuses on making foundation models easily discoverable and usable to help customers build generative AI applications. His experience at Amazon also includes mobile shopping applications and last mile delivery.

Bhaskar Pratap is a Senior Software Engineer with the Amazon SageMaker team. He is passionate about designing and building elegant systems that bring machine learning to people’s fingertips. Additionally, he has extensive experience with building scalable cloud storage services.

Read More

eSentire delivers private and secure generative AI interactions to customers with Amazon SageMaker

eSentire delivers private and secure generative AI interactions to customers with Amazon SageMaker

eSentire is an industry-leading provider of Managed Detection & Response (MDR) services protecting users, data, and applications of over 2,000 organizations globally across more than 35 industries. These security services help their customers anticipate, withstand, and recover from sophisticated cyber threats, prevent disruption from malicious attacks, and improve their security posture.

In 2023, eSentire was looking for ways to deliver differentiated customer experiences by continuing to improve the quality of its security investigations and customer communications. To accomplish this, eSentire built AI Investigator, a natural language query tool for their customers to access security platform data by using AWS generative artificial intelligence (AI) capabilities.

In this post, we share how eSentire built AI Investigator using Amazon SageMaker to provide private and secure generative AI interactions to their customers.

Benefits of AI Investigator

Before AI Investigator, customers would engage eSentire’s Security Operation Center (SOC) analysts to understand and further investigate their asset data and associated threat cases. This involved manual effort for customers and eSentire analysts, forming questions and searching through data across multiple tools to formulate answers.

eSentire’s AI Investigator enables users to complete complex queries using natural language by joining multiple sources of data from each customer’s own security telemetry and eSentire’s asset, vulnerability, and threat data mesh. This helps customers quickly and seamlessly explore their security data and accelerate internal investigations.

Providing AI Investigator internally to the eSentire SOC workbench has also accelerated eSentire's investigation process by improving the scale and efficacy of multi-telemetry investigations. The LLMs augment SOC investigations with knowledge from eSentire's security experts and security data, enabling higher-quality investigation outcomes while also reducing time to investigate. Over 100 SOC analysts are now using AI Investigator models to analyze security data and provide rapid investigation conclusions.

Solution overview

eSentire customers expect rigorous security and privacy controls for their sensitive data, which requires an architecture that doesn’t share data with external large language model (LLM) providers. Therefore, eSentire decided to build their own LLM using Llama 1 and Llama 2 foundational models. A foundation model (FM) is an LLM that has undergone unsupervised pre-training on a corpus of text. eSentire tried multiple FMs available in AWS for their proof of concept; however, the straightforward access to Meta’s Llama 2 FM through Hugging Face in SageMaker for training and inference (and their licensing structure) made Llama 2 an obvious choice.

eSentire has over 2 TB of signal data stored in their Amazon Simple Storage Service (Amazon S3) data lake. eSentire used gigabytes of additional human investigation metadata to perform supervised fine-tuning on Llama 2. This further step updates the FM by training with data labeled by security experts (such as Q&A pairs and investigation conclusions).

eSentire used SageMaker on several levels, ultimately facilitating their end-to-end process:

  • They used SageMaker notebook instances extensively to spin up GPU instances, giving them the flexibility to swap high-power compute in and out when needed. eSentire used instances with CPU for data preprocessing and post-inference analysis and GPU for the actual model (LLM) training.
  • The additional benefit of SageMaker notebook instances is its streamlined integration with eSentire’s AWS environment. Because they have vast amounts of data (terabyte scale, over 1 billion total rows of relevant data in preprocessing input) stored across AWS—in Amazon S3 and Amazon Relational Database Service (Amazon RDS) for PostgreSQL clusters—SageMaker notebook instances allowed secure movement of this volume of data directly from the AWS source (Amazon S3 or Amazon RDS) to the SageMaker notebook. They needed no additional infrastructure for data integration.
  • SageMaker real-time inference endpoints provide the infrastructure needed for hosting their custom self-trained LLMs. This was very useful in combination with SageMaker integration with Amazon Elastic Container Registry (Amazon ECR), SageMaker endpoint configuration, and SageMaker models to provide the entire configuration required to spin up their LLMs as needed. The fully featured end-to-end deployment capability provided by SageMaker allowed eSentire to effortlessly and consistently update their model registry as they iterate and update their LLMs. All of this was entirely automated with the software development lifecycle (SDLC) using Terraform and GitHub, which is made possible by the SageMaker ecosystem. A generic sketch of this deployment flow follows this list.
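
The following is a generic sketch of deploying a self-hosted model to a SageMaker real-time endpoint with the SageMaker API; the names, image URI, model artifact location, and instance type are placeholders for illustration rather than eSentire's actual configuration:

import boto3

sagemaker_client = boto3.client("sagemaker")

# Register the model: a container image in Amazon ECR plus a model artifact in Amazon S3
sagemaker_client.create_model(
    ModelName="custom-llm",
    PrimaryContainer={
        "Image": "<account_id>.dkr.ecr.<region>.amazonaws.com/<llm-inference-image>:latest",
        "ModelDataUrl": "s3://<bucket>/models/custom-llm/model.tar.gz",
    },
    ExecutionRoleArn="arn:aws:iam::<account_id>:role/<sagemaker_execution_role>",
)

# Define the endpoint configuration (instance type and count)
sagemaker_client.create_endpoint_config(
    EndpointConfigName="custom-llm-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "custom-llm",
        "InstanceType": "ml.g5.2xlarge",
        "InitialInstanceCount": 1,
    }],
)

# Create the real-time inference endpoint
sagemaker_client.create_endpoint(
    EndpointName="custom-llm",
    EndpointConfigName="custom-llm-config",
)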

The following diagram illustrates the architecture and workflow.

The application’s frontend is accessible through Amazon API Gateway, using both edge and private gateways. To emulate intricate thought processes akin to those of a human investigator, eSentire engineered a system of chained agent actions. This system uses AWS Lambda and Amazon DynamoDB to orchestrate a series of LLM invocations. Each LLM call builds upon the previous one, creating a cascade of interactions that collectively produce high-quality responses. This intricate setup makes sure that the application’s backend data sources are seamlessly integrated, thereby providing tailored responses to customer inquiries.

When a SageMaker endpoint is constructed, the S3 URI of the bucket containing the model artifact is specified, and the Docker image is shared using Amazon ECR.

For their proof of concept, eSentire selected the NVIDIA A10G Tensor Core GPU housed in an ml.g5.2xlarge instance for its balance of performance and cost. For LLMs with significantly larger numbers of parameters, which demand greater computational power for both training and inference tasks, eSentire used ml.g5.12xlarge instances equipped with four GPUs. This was necessary because the computational complexity and the amount of memory required for LLMs can increase exponentially with the number of parameters. eSentire plans to harness P4 and P5 instance types for scaling their production workloads.

Additionally, a monitoring framework that captures the inputs and outputs of AI Investigator was necessary to enable threat hunting visibility to LLM interactions. To accomplish this, the application integrates with an open sourced eSentire LLM Gateway project to monitor the interactions with customer queries, backend agent actions, and application responses. This framework enables confidence in complex LLM applications by providing a security monitoring layer to detect malicious poisoning and injection attacks while also providing governance and support for compliance through logging of user activity. The LLM gateway can also be integrated with other LLM services, such as Amazon Bedrock.

Amazon Bedrock enables you to customize FMs privately and interactively, without the need for coding. Initially, eSentire’s focus was on training bespoke models using SageMaker. As their strategy evolved, they began to explore a broader array of FMs, evaluating their in-house trained models against those provided by Amazon Bedrock. Amazon Bedrock offers a practical environment for benchmarking and a cost-effective solution for managing workloads due to its serverless operation. This serves eSentire well, especially when customer queries are sporadic, making serverless an economical alternative to persistently running SageMaker instances.

From a security perspective, Amazon Bedrock also doesn't share users' inputs and model outputs with any model providers. Additionally, eSentire has custom guardrails for NL2SQL applied to their models.

Results

The following screenshot shows an example of eSentire’s AI Investigator output. As illustrated, a natural language query is posed to the application. The tool is able to correlate multiple datasets and present a response.

Dustin Hillard, CTO of eSentire, shares: “eSentire customers and analysts ask hundreds of security data exploration questions per month, which typically take hours to complete. AI Investigator is now with an initial rollout to over 100 customers and more than 100 SOC analysts, providing a self-serve immediate response to complex questions about their security data. eSentire LLM models are saving thousands of hours of customer and analyst time.”

Conclusion

In this post, we shared how eSentire built AI Investigator, a generative AI solution that provides private and secure self-serve customer interactions. Customers can get near real-time answers to complex questions about their data. AI Investigator has also saved eSentire significant analyst time.

The LLM gateway project mentioned in this post is eSentire's own product, and AWS bears no responsibility for it.

If you have any comments or questions, share them in the comments section.


About the Authors

Aishwarya Subramaniam is a Sr. Solutions Architect in AWS. She works with commercial customers and AWS partners to accelerate customers’ business outcomes by providing expertise in analytics and AWS services.

Ilia Zenkov is a Senior AI Developer specializing in generative AI at eSentire. He focuses on advancing cybersecurity with expertise in machine learning and data engineering. His background includes pivotal roles in developing ML-driven cybersecurity and drug discovery platforms.

Dustin Hillard is responsible for leading product development and technology innovation, systems teams, and corporate IT at eSentire. He has deep ML experience in speech recognition, translation, natural language processing, and advertising, and has published over 30 papers in these areas.

Read More

Imperva optimizes SQL generation from natural language using Amazon Bedrock

Imperva optimizes SQL generation from natural language using Amazon Bedrock

This is a guest post co-written with Ori Nakar from Imperva.

Imperva Cloud WAF protects hundreds of thousands of websites against cyber threats and blocks billions of security events every day. Counters and insights based on security events are calculated daily and used by users from multiple departments. Millions of counters are added daily, together with 20 million insights updated daily to spot threat patterns.

Our goal was to improve the user experience of an existing application used to explore the counters and insights data. The data is stored in a data lake and retrieved by SQL using Amazon Athena.

As part of our solution, we replaced multiple search fields with a single free text field. We used a large language model (LLM) with query examples to make the search work using the language used by Imperva internal users (business analysts).

The following figure shows a search query that was translated to SQL and run. The results were later formatted as a chart by the application. We have many types of insights—global, industry, and customer level insights used by multiple departments such as marketing, support, and research. Data was made available to our users through a simplified user experience powered by an LLM.

Insights search by natural language

Figure 1: Insights search by natural language

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading artificial intelligence (AI) companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral, Stability AI, and Amazon within a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. Amazon Bedrock Studio is a new single sign-on (SSO)-enabled web interface that provides a way for developers across an organization to experiment with LLMs and other FMs, collaborate on projects, and iterate on generative AI applications. It offers a rapid prototyping environment and streamlines access to multiple FMs and developer tools in Amazon Bedrock.

Read more to learn about the problem, and how we obtained quality results using Amazon Bedrock for our experimentation and deployment.

The problem

Making data accessible to users through applications has always been a challenge. Data is normally stored in databases, and can be queried using the most common query language, SQL. Applications use different UI components to allow users to filter and query the data. There are applications with tens of different filters and other options–all created to make the data accessible.

Querying databases through applications cannot be as flexible as running SQL queries on a known schema. Giving more power to the user comes at the expense of a simple user experience (UX). Natural language can solve this problem: it's possible to support complex yet readable natural language queries without SQL knowledge. On schema changes, the application UX and code remain the same, or require only minor changes, which saves development time and keeps the application user interface (UI) stable for the users.

Constructing SQL queries from natural language isn’t a simple task. SQL queries must be accurate both syntactically and logically. Using an LLM with the right examples can make this task less difficult.

High level database access using an LLM flow

Figure 2: High level database access using an LLM flow

The challenge

An LLM can construct SQL queries based on natural language. The challenge is to assure quality. The user can enter any text, and the application constructs a query based on it. Unlike in traditional applications, there is no way to enumerate every possible input and make sure the application functions correctly. Adding an LLM to an application adds another layer of complexity. The response from the LLM is not deterministic. Examples sent to the LLM are based on the database data, which makes it even harder to control the requests sent to the LLM and assure quality.

The solution: A data science approach

In data science, it’s common to develop a model and fine tune it using experimentation. The idea is to use metrics to compare experiments during development. Experiments might differ from each other in many ways, such as the input sent to the model, the model type, and other parameters. The ability to compare different experiments makes it possible to make progress. It’s possible to know how each change contributes to the model.

A test set is a static set of records that includes a prediction result for each record. Running predictions on the test set records yields the metrics needed to compare experiments. A common metric is accuracy, the percentage of correct results.

In our case, the results generated by the LLM are SQL statements. The SQL statements generated by the LLM are not deterministic and are hard to measure directly; however, running SQL statements on a static test database is deterministic and can be measured. We used a test database and a list of questions with known answers as a test set. This allowed us to run experiments and fine-tune our LLM-based application.
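
As a minimal sketch of this idea, the following code runs generated SQL statements against a static SQLite test database and computes accuracy against known answers; the test set values and the generate_sql helper are hypothetical stand-ins for the real test data and the LLM-based generation:

import sqlite3

# Hypothetical test set: (question, expected result rows)
test_set = [
    ("How many orders were placed?", [(42,)]),
    ("How many distinct items exist?", [(7,)]),
]

def evaluate_accuracy(db_path: str, generate_sql) -> float:
    """Run generated SQL on the static test database and compare with known answers."""
    connection = sqlite3.connect(db_path)
    correct = 0
    for question, expected in test_set:
        try:
            generated_sql = generate_sql(question)  # stand-in for the LLM call
            actual = connection.execute(generated_sql).fetchall()
            correct += int(actual == expected)
        except sqlite3.Error:
            pass  # failed statements count against accuracy; track them separately as an error rate
    connection.close()
    return correct / len(test_set)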

Database access using LLM: Question to answer flow

Given a question we defined the following flow. The question is sent through a retrieval-augmented generation (RAG) process, which finds similar documents. Each document holds an example question and information about it. The relevant documents are built as a prompt and sent to the LLM, which builds a SQL statement. This flow is used both for development and application runtime:

Question to answer flow

Figure 3: Question to answer flow

As an example, consider a database schema with two tables: orders and items. The following figure is a question to SQL example flow:

Question to answer flow example

Figure 4: Question to answer flow example
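
To make the flow concrete, the following sketch assembles a few-shot prompt from retrieved example documents for the orders and items schema above; the example questions, SQL, and helper names are illustrative and not Imperva's actual prompt:

# Each retrieved document holds an example question and its SQL translation
retrieved_examples = [
    {"question": "How many orders were placed last month?",
     "sql": "SELECT COUNT(*) FROM orders WHERE order_date >= date('now', '-1 month')"},
    {"question": "What are the top 5 items by quantity sold?",
     "sql": "SELECT item_id, SUM(quantity) AS total FROM items GROUP BY item_id ORDER BY total DESC LIMIT 5"},
]

def build_prompt(question: str) -> str:
    """Build a few-shot prompt from the retrieved examples and the user question."""
    examples = "\n\n".join(
        f"Question: {example['question']}\nSQL: {example['sql']}"
        for example in retrieved_examples
    )
    return (
        "You translate natural language questions into SQL for a database "
        "with two tables: orders and items.\n\n"
        f"{examples}\n\n"
        f"Question: {question}\nSQL:"
    )

print(build_prompt("How many items were sold per order last week?"))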

Database access using LLM: Development process

To develop and fine-tune the application we created the following data sets:

  • A static test database: Contains the relevant tables and a sample copy of the data.
  • A test set: Includes questions and test database result answers.
  • Question to SQL examples: A set of questions and their translations to SQL. For some examples, returned data is included to allow asking questions about the data, and not only about the schema.

Development of the application is done by adding new questions and updating the different datasets, as shown in the following figure.

Adding a new question

Figure 5: Adding a new question

Datasets and other parameter updates are tracked as part of adding new questions and fine-tuning of the application. We used a tracking tool to track information about the experiments such as:

  • Parameters such as the number of questions, number of examples, LLM type, RAG search method
  • Metrics such as the accuracy and the SQL error rate
  • Artifacts such as a list of the wrong results including generated SQL, data returned, and more

Experiment flow

Figure 6: Experiment flow

Using a tracking tool, we were able to make progress by comparing experiments. The following figure shows the accuracy and error rate metrics for the different experiments we did:

Accuracy and error rate over time

Figure 7: Accuracy and error rate over time

When there's a mistake or an error, we drill down into the incorrect results and the experiment details to understand the source of the error and fix it.

Experiment and deploy using Amazon Bedrock

Amazon Bedrock is a managed service that offers a choice of high-performing foundation models. You can experiment with and evaluate top FMs for your use case and customize them with your data.

By using Amazon Bedrock, we were able to switch between models and embedding options easily. The following is example code using the LangChain Python library, which allows using different models and embeddings:

import boto3
from langchain_community.llms.bedrock import Bedrock
from langchain_community.embeddings import BedrockEmbeddings

def get_llm(model_id: str, args: dict):
   return Bedrock(model_id=model_id,
                  model_kwargs=args,
                  client=boto3.client("bedrock-runtime"))

def get_embeddings(model_id: str):
   return BedrockEmbeddings(model_id=model_id, 
                            client=boto3.client("bedrock-runtime"))
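
For example, the helpers can be called as follows; the model IDs and parameters are illustrative, and any text and embeddings models enabled in your Amazon Bedrock account can be substituted:

# Illustrative usage of the helpers defined above
llm = get_llm("anthropic.claude-instant-v1",
              {"temperature": 0.0, "max_tokens_to_sample": 1024})
embeddings = get_embeddings("amazon.titan-embed-text-v1")

sql_statement = llm.invoke("Write a SQL statement that counts the rows in the orders table.")
query_vector = embeddings.embed_query("How many orders were created last week?")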

We used multiple models and embeddings with different hyperparameters to improve accuracy and decide which model is the best fit for us. We also ran experiments on smaller models to determine whether we could reach the same quality with improved performance and reduced costs. We started with Anthropic Claude 2.1 and experimented with the Anthropic Claude Instant model. Accuracy dropped by 20 percent, but after adding a few additional examples, we achieved the same accuracy as Claude 2.1 with lower cost and faster response time.

Conclusion

We used the same approach used in data science projects to construct SQL queries from natural language. The solution shown can be applied to other LLM-based applications, and not only for constructing SQL. For example, it can be used for API access, building JSON data, and more. The key is to create a test set together with measurable results and progress using experimentation.

Amazon Bedrock lets you use different models and switch between them to find the right one for your use case. You can compare different models, including small ones for better performance and costs. Because Amazon Bedrock is serverless, you don’t have to manage any infrastructure. We were able to test multiple models quickly, and finally integrate and deploy generative AI capabilities into our application.

You can start experimenting with natural language to SQL by running the code samples in this GitHub repository. This workshop is divided into modules that each build on the previous one while introducing a new technique to solve this problem. Many of these approaches are based on existing work from the community and are cited accordingly.


About the Authors

Ori Nakar is a Principal cyber-security researcher, a data engineer, and a data scientist at Imperva Threat Research group.

Eitan Sela is a Generative AI and Machine Learning Specialist Solutions Architect at AWS. He works with AWS customers to provide guidance and technical assistance, helping them build and operate Generative AI and Machine Learning solutions on AWS. In his spare time, Eitan enjoys jogging and reading the latest machine learning articles.

Elad Eizner is a Solutions Architect at Amazon Web Services. He works with AWS enterprise customers to help them architect and build solutions in the cloud and achieve their goals.

Read More

Create natural conversations with Amazon Lex QnAIntent and Knowledge Bases for Amazon Bedrock

Create natural conversations with Amazon Lex QnAIntent and Knowledge Bases for Amazon Bedrock

Customer service organizations today face an immense opportunity. As customer expectations grow, brands have a chance to creatively apply new innovations to transform the customer experience. Although meeting rising customer demands poses challenges, the latest breakthroughs in conversational artificial intelligence (AI) empower companies to meet these expectations.

Customers today expect timely responses to their questions that are helpful, accurate, and tailored to their needs. The new QnAIntent, powered by Amazon Bedrock, can meet these expectations by understanding questions posed in natural language and responding conversationally in real time using your own authorized knowledge sources. Our Retrieval Augmented Generation (RAG) approach allows Amazon Lex to harness both the breadth of knowledge available in repositories as well as the fluency of large language models (LLMs).

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

In this post, we show you how to add generative AI question answering capabilities to your bots. This can be done using your own curated knowledge sources, and without writing a single line of code.

Read on to discover how QnAIntent can transform your customer experience.

Solution overview

Implementing the solution consists of the following high-level steps:

  1. Create an Amazon Lex bot.
  2. Create an Amazon Simple Storage Service (Amazon S3) bucket and upload a PDF file that contains the information used to answer questions.
  3. Create a knowledge base that will split your data into chunks and generate embeddings using the Amazon Titan Embeddings model. As part of this process, Knowledge Bases for Amazon Bedrock automatically creates an Amazon OpenSearch Serverless vector search collection to hold your vectorized data.
  4. Add a new QnAIntent intent that will use the knowledge base to find answers to customers’ questions and then use the Anthropic Claude model to generate answers to questions and follow-up questions.

Prerequisites

To follow along with the features described in this post, you need access to an AWS account with permissions to access Amazon Lex, Amazon Bedrock (with access to Anthropic Claude models and Amazon Titan embeddings or Cohere Embed), Knowledge Bases for Amazon Bedrock, and the OpenSearch Serverless vector engine. To request access to models in Amazon Bedrock, complete the following steps:

  1. On the Amazon Bedrock console, choose Model access in the navigation pane.
  2. Choose Manage model access.
  3. Select the Amazon and Anthropic models. (You can also choose to use Cohere models for embeddings.)


  4. Choose Request model access.

Create an Amazon Lex bot

If you already have a bot you want to use, you can skip this step.

  1. On the Amazon Lex console, choose Bots in the navigation pane.
  2. Choose Create bot.
  3. Select Start with an example and choose the BookTrip example bot.
  4. For Bot name, enter a name for the bot (for example, BookHotel).
  5. For Runtime role, select Create a role with basic Amazon Lex permissions.
  6. In the Children’s Online Privacy Protection Act (COPPA) section, you can select No because this bot is not targeted at children under the age of 13.
  7. Keep the Idle session timeout setting at 5 minutes.
  8. Choose Next.
  9. When using the QnAIntent to answer questions in a bot, you may want to increase the intent classification confidence threshold so that your questions are not accidentally interpreted as matching one of your intents. We set this to 0.8 for now. You may need to adjust this up or down based on your own testing.
  10. Choose Done.
  11. Choose Save intent.

Upload content to Amazon S3

Now you create an S3 bucket to store the documents you want to use for your knowledge base.

  1. On the Amazon S3 console, choose Buckets in the navigation pane.
  2. Choose Create bucket.
  3. For Bucket name, enter a unique name.
  4. Keep the default values for all other options and choose Create bucket.

For this post, we created an FAQ document for the fictitious hotel chain called Example Corp FictitiousHotels. Download the PDF document to follow along.

  1. On the Buckets page, navigate to the bucket you created.

If you don’t see it, you can search for it by name.

  1. Choose Upload.
  2. Choose Add files.
  3. Choose the ExampleCorpFicticiousHotelsFAQ.pdf that you downloaded.
  4. Choose Upload.

The file will now be accessible in the S3 bucket.
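
If you prefer to script this step instead of using the console, the following is a minimal sketch using boto3; the bucket name is a placeholder and must be globally unique, and Regions other than us-east-1 require a location constraint:

import boto3

s3_client = boto3.client("s3")
bucket_name = "<your-unique-bucket-name>"  # placeholder

# Create the bucket (add CreateBucketConfiguration={"LocationConstraint": "<region>"} outside us-east-1)
s3_client.create_bucket(Bucket=bucket_name)

# Upload the FAQ document downloaded earlier
s3_client.upload_file("ExampleCorpFicticiousHotelsFAQ.pdf", bucket_name,
                      "ExampleCorpFicticiousHotelsFAQ.pdf")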

Create a knowledge base

Now you can set up the knowledge base:

  1. On the Amazon Bedrock console, choose Knowledge base in the navigation pane.
  2. Choose Create knowledge base.
  3. For Knowledge base name, enter a name.
  4. For Knowledge base description, enter an optional description.
  5. Select Create and use a new service role.
  6. For Service role name, enter a name or keep the default.
  7. Choose Next.
  8. For Data source name, enter a name.
  9. Choose Browse S3 and navigate to the S3 bucket you uploaded the PDF file to earlier.
  10. Choose Next.
  11. Choose an embeddings model.
  12. Select Quick create a new vector store to create a new OpenSearch Serverless vector store to store the vectorized content.
  13. Choose Next.
  14. Review your configuration, then choose Create knowledge base.

After a few minutes, the knowledge base will be created.

  1. Choose Sync to chunk the documents, calculate the embeddings, and store them in the vector store.

This may take a while. You can proceed with the rest of the steps, but the syncing needs to finish before you can query the knowledge base.

  1. Copy the knowledge base ID. You will reference this when you add this knowledge base to your Amazon Lex bot.
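
Once the sync finishes, you can optionally confirm that the knowledge base returns relevant chunks by querying it programmatically. The following is a minimal sketch using the Retrieve API; the knowledge base ID and question are placeholders:

import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

response = bedrock_agent_runtime.retrieve(
    knowledgeBaseId="<your_knowledge_base_id>",  # the ID you copied in the previous step
    retrievalQuery={"text": "What time does the hotel pool open?"},
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 3}},
)
for result in response["retrievalResults"]:
    print(result["score"], result["content"]["text"][:200])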

Add QnAIntent to the Amazon Lex bot

To add QnAIntent, complete the following steps:

  1. On the Amazon Lex console, choose Bots in the navigation pane.
  2. Choose your bot.
  3. In the navigation pane, choose Intents.
  4. On the Add intent menu, choose Use built-in intent.
  5. For Built-in intent, choose AMAZON.QnAIntent.
  6. For Intent name, enter a name.
  7. Choose Add.
  8. Choose the model you want to use to generate the answers (in this case, Anthropic Claude 3 Sonnet, but you can select Anthropic Claude 3 Haiku for a cheaper option with less latency).
  9. For Choose knowledge store, select Knowledge base for Amazon Bedrock.
  10. For Knowledge base for Amazon Bedrock Id, enter the ID you noted earlier when you created your knowledge base.
  11. Choose Save intent.
  12. Choose Build to build the bot.
  13. Choose Test to test the new intent.

The following screenshot shows an example conversation with the bot.

In the second question about the Miami pool hours, you refer back to the previous question about pool hours in Las Vegas and still get a relevant answer based on the conversation history.

It’s also possible to ask questions that require the bot to reason a bit around the available data. When we asked about a good resort for a family vacation, the bot recommended the Orlando resort based on the availability of activities for kids, proximity to theme parks, and more.

Update the confidence threshold

You may have some questions accidentally match your other intents. If you run into this, you can adjust the confidence threshold for your bot. To modify this setting, choose the language of your bot (English) and in the Language details section, choose Edit.

After you update the confidence threshold, rebuild the bot for the change to take effect.

Add additional steps

By default, the next step in the conversation for the bot is set to Wait for user input after a question has been answered. This keeps the conversation in the bot and allows a user to ask follow-up questions or invoke any of the other intents in your bot.

If you want the conversation to end and return control to the calling application (for example, Amazon Connect), you can change this behavior to End conversation. To update the setting, complete the following steps:

  1. On the Amazon Lex console, navigate to the QnAIntent.
  2. In the Fulfillment section, choose Advanced options.
  3. On the Next step in conversation dropdown menu, choose End conversation.

If you would like the bot to add a specific message after each response from the QnAIntent (such as “Can I help you with anything else?”), you can add a closing response to the QnAIntent.

Clean up

To avoid incurring ongoing costs, delete the resources you created as part of this post:

  • Amazon Lex bot
  • S3 bucket
  • OpenSearch Serverless collection (This is not automatically deleted when you delete your knowledge base)
  • Knowledge bases

Conclusion

The new QnAIntent in Amazon Lex enables natural conversations by connecting customers with curated knowledge sources. Powered by Amazon Bedrock, the QnAIntent understands questions in natural language and responds conversationally, keeping customers engaged with contextual, follow-up responses.

QnAIntent puts the latest innovations in reach to transform static FAQs into flowing dialogues that resolve customer needs. This helps scale excellent self-service to delight customers.

Try it out for yourself. Reinvent your customer experience!


About the Author

Thomas Rindfuss is a Sr. Solutions Architect on the Amazon Lex team. He invents, develops, prototypes, and evangelizes new technical features and solutions for Language AI services that improve the customer experience and ease adoption.

Read More

Evaluate the reliability of Retrieval Augmented Generation applications using Amazon Bedrock

Evaluate the reliability of Retrieval Augmented Generation applications using Amazon Bedrock

Retrieval Augmented Generation (RAG) is a technique that enhances large language models (LLMs) by incorporating external knowledge sources. It allows LLMs to reference authoritative knowledge bases or internal repositories before generating responses, producing output tailored to specific domains or contexts while providing relevance, accuracy, and efficiency. RAG achieves this enhancement without retraining the model, making it a cost-effective solution for improving LLM performance across various applications. The following diagram illustrates the main steps in a RAG system.

Retrieval Augmented Generation RAG Architecture

Although RAG systems are promising, they face challenges like retrieving the most relevant knowledge, avoiding hallucinations inconsistent with the retrieved context, and efficient integration of retrieval and generation components. In addition, RAG architecture can lead to potential issues like retrieval collapse, where the retrieval component learns to retrieve the same documents regardless of the input. A similar problem occurs for some tasks like open-domain question answering—there are often multiple valid answers available in the training data, therefore the LLM could choose to generate an answer from its training data. Another challenge is the need for an effective mechanism to handle cases where no useful information can be retrieved for a given input. Current research aims to improve these aspects for more reliable and capable knowledge-grounded generation.

Given these challenges faced by RAG systems, monitoring and evaluating generative artificial intelligence (AI) applications powered by RAG is essential. Moreover, tracking and analyzing the performance of RAG-based applications is crucial, because it helps assess their effectiveness and reliability when deployed in real-world scenarios. By evaluating RAG applications, you can understand how well the models are using and integrating external knowledge into their responses, how accurately they can retrieve relevant information, and how coherent the generated outputs are. Additionally, evaluation can identify potential biases, hallucinations, inconsistencies, or factual errors that may arise from the integration of external sources or from sub-optimal prompt engineering. Ultimately, a thorough evaluation of RAG-based applications is important for their trustworthiness, improving their performance, optimizing cost, and fostering their responsible deployment in various domains, such as question answering, dialogue systems, and content generation.

In this post, we show you how to evaluate the performance, trustworthiness, and potential biases of your RAG pipelines and applications on Amazon Bedrock. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

RAG evaluation and observability challenges in real-world scenarios

Evaluating a RAG system poses significant challenges due to its complex architecture consisting of multiple components, such as the retrieval module and the generation component represented by the LLMs. Each module operates differently and requires distinct evaluation methodologies, making it difficult to assess the overall end-to-end performance of the RAG architecture. The following are some of the challenges you may encounter:

  • Lack of ground truth references – In many open-ended generation tasks, there is no single correct answer or reference text against which to evaluate the system’s output. This makes it difficult to apply standard evaluation metrics like BERTScore (Zhang et al. 2020), BLEU, or ROUGE used for machine translation and summarization.
  • Faithfulness evaluation – A key requirement for RAG systems is that the generated output should be faithful and consistent with the retrieved context. Evaluating this faithfulness, which also serves to measure the presence of hallucinated content, in an automated manner is non-trivial, especially for open-ended responses.
  • Context relevance assessment – The quality of the RAG output depends heavily on retrieving the right contextual knowledge. Automatically assessing the relevance of the retrieved context to the input prompt is an open challenge.
  • Factuality vs. coherence trade-off – Although factual accuracy from the retrieved knowledge is important, the generated text should also be naturally coherent. Evaluating and balancing factual consistency with language fluency is difficult.
  • Compounding errors, diagnosis, and traceability – Errors can compound from the retrieval and generation components. Diagnosing whether errors stem from retrieval failures or generation inconsistencies is hard without clear intermediate outputs. Given the complex interplay between various components of the RAG architecture, it’s also difficult to provide traceability of the problem in the evaluation process.
  • Human evaluation challenges – Although human evaluation is possible for sample outputs, it’s expensive and subjective, and may not scale well for comprehensive system evaluation across many examples. The need for a domain expert to create and evaluate against a dataset is essential, because the evaluation process requires specialized knowledge and expertise. The labor-intensive nature of the human evaluation process is time-consuming, because it often involves manual effort.
  • Lack of standardized benchmarks – There are no widely accepted and standardized benchmarks yet for holistically evaluating different capabilities of RAG systems. Without such benchmarks, it can be challenging to compare the various capabilities of different RAG techniques, models, and parameter configurations. Consequently, you may face difficulties in making informed choices when selecting the most appropriate RAG approach that aligns with your unique use case requirements.

Addressing these evaluation and observability challenges is an active area of research, because robust metrics are critical for iterating on and deploying reliable RAG systems for real-world applications.

RAG evaluation concepts and metrics

As mentioned previously, a RAG-based generative AI application is composed of two main processes: retrieval and generation. Retrieval is the process where the application uses the user query to retrieve the relevant documents from a knowledge base before adding them as context to augment the final prompt. Generation is the process of generating the final response from the LLM. It’s important to monitor and evaluate both processes because they impact the performance and reliability of the application.

Evaluating RAG systems at scale requires an automated approach to extract metrics that are quantitative indicators of its reliability. Generally, the metrics to look for are grouped by main RAG components or by domains. Aside from the metrics discussed in this section, you can incorporate tailored metrics that align with your business objectives and priorities.

Retrieval metrics

You can use the following retrieval metrics:

  • Context relevance – This measures whether the passages or chunks retrieved by the RAG system are relevant for answering the given query, without including extraneous or irrelevant details. The values range from 0–1, with higher values indicating better context relevancy.
  • Context recall – This evaluates how well the retrieved context matches the annotated answer, treated as the ground truth. It’s computed based on the ground truth answer and the retrieved context. The values range between 0–1, with higher values indicating better performance.
  • Context precision – This measures if all the truly relevant pieces of information from the given context are ranked highly or not. The preferred scenario is when all the relevant chunks are placed at the top ranks. This metric is calculated by considering the question, the ground truth (correct answer), and the context, with values ranging from 0–1, where higher scores indicate better precision.

Generation metrics

You can use the following generation metrics:

  • Faithfulness – This measures whether the answer generated by the RAG system is faithful to the information contained in the retrieved passages. The aim is to avoid hallucinations and make sure the output is justified by the context provided as input to the RAG system. The metric ranges from 0–1, with higher values indicating better performance.
  • Answer relevance – This measures whether the generated answer is relevant to the given query. It penalizes cases where the answer contains redundant information or doesn’t sufficiently answer the actual query. Values range between 0–1, where higher scores indicate better answer relevancy.
  • Answer semantic similarity – This compares the meaning and content of a generated answer with a reference or ground truth answer. It evaluates how closely the generated answer matches the intended meaning of the ground truth answer. The score ranges from 0–1, with higher scores indicating greater semantic similarity between the two answers. A score of 1 means that the generated answer conveys the same meaning as the ground truth answer, whereas a score of 0 suggests that the two answers have completely different meanings. A minimal sketch of computing this kind of similarity follows this list.
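
As a minimal sketch of answer semantic similarity, the following code embeds a generated answer and a ground truth answer with an Amazon Titan embeddings model on Amazon Bedrock and computes their cosine similarity; the model ID is assumed to be enabled in your account:

import json

import boto3
import numpy as np

bedrock_runtime = boto3.client("bedrock-runtime")

def embed(text: str) -> np.ndarray:
    """Return the Titan embedding vector for a piece of text."""
    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-text-v1",  # assumed to be enabled in your account
        body=json.dumps({"inputText": text}),
    )
    return np.array(json.loads(response["body"].read())["embedding"])

def answer_semantic_similarity(generated_answer: str, ground_truth_answer: str) -> float:
    """Cosine similarity between the generated and ground truth answer embeddings."""
    a, b = embed(generated_answer), embed(ground_truth_answer)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))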

Aspects evaluation

Aspects are evaluated as follows:

  • Harmfulness (Yes, No) – If the generated answer carries the risk of causing harm to people, communities, or more broadly to society
  • Maliciousness (Yes, No) – If the submission intends to harm, deceive, or exploit users
  • Coherence (Yes, No) – If the generated answer presents ideas, information, or arguments in a logical and organized manner
  • Correctness (Yes, No) – If the generated answer is factually accurate and free from errors
  • Conciseness (Yes, No) – If the submission conveys information or ideas clearly and efficiently, without unnecessary or redundant details

The RAG Triad proposed by TruLens consists of three distinct assessments, as shown in the following figure: evaluating the relevance of the context, examining the grounding of the information, and assessing the relevance of the answer provided. Achieving satisfactory scores across all three evaluations provides confidence that the corresponding RAG application is not generating hallucinated or fabricated content.

RAG Triad

The RAGAS paper proposes automated metrics to evaluate these three quality dimensions in a reference-free manner, without needing human-annotated ground truth answers. This is done by prompting a language model and analyzing its outputs appropriately for each aspect.

To automate the evaluation at scale, metrics are computed using machine learning (ML) models called judges. Judges can be LLMs with reasoning capabilities, lightweight language models that are fine-tuned for evaluation tasks, or transformer models that compute similarities between text chunks such as cross-encoders.

Metric outcomes

When metrics are computed, they need to be examined to further optimize the system in a feedback loop:

  • Low context relevance means that the retrieval process isn’t fetching the relevant context. Therefore, data parsing, chunk sizes, and embedding models need to be optimized.
  • Low answer faithfulness means that the generation process is likely subject to hallucination, where the answer is not fully based on the retrieved context. In this case, the model choice needs to be revisited or further prompt engineering needs to be done.
  • Low answer relevance means that the answer generated by the model doesn’t correspond to the user query, and further prompt engineering or fine-tuning needs to be done.

Solution overview

You can use Amazon Bedrock to evaluate your RAG-based applications. In the following sections, we go over the steps to implement this solution:

  1. Set up observability.
  2. Prepare the evaluation dataset.
  3. Choose the metrics and prepare the evaluation prompts.
  4. Aggregate and review the metric results, then optimize the RAG system.

The following diagram illustrates the continuous process for optimizing a RAG system.

RAG Evaluation and Optimization Cycle

Set up observability

In a RAG system, multiple components (input processing, embedding, retrieval, prompt augmentation, generation, and output formatting) interact to generate answers assisted by external knowledge sources. Monitoring arriving user queries, search results, metadata, and component latencies helps developers identify performance bottlenecks, understand system interactions, monitor for issues, and conduct root cause analysis, all of which are essential for maintaining, optimizing, and scaling the RAG system effectively.

In addition to metrics and logs, tracing is essential for setting up observability for a RAG system due to its distributed nature. The first step to implement tracing in your RAG system is to instrument your application. Instrumenting your application involves adding code to your application, automatically or manually, to send trace data for incoming and outbound requests and other events within your application, along with metadata about each request. There are several different instrumentation options you can choose from or combine, based on your particular requirements:

  • Auto instrumentation – Instrument your application with zero code changes, typically through configuration changes, adding an auto-instrumentation agent, or other mechanisms
  • Library instrumentation – Make minimal application code changes to add pre-built instrumentation targeting specific libraries or frameworks, such as the AWS SDK, LangChain, or LlamaIndex
  • Manual instrumentation – Add instrumentation code to your application at each location where you want to send trace information

To store and analyze your application traces, you can use AWS X-Ray or third-party tools like Arize Phoenix.
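
As an illustration of manual instrumentation, the following sketch uses the OpenTelemetry Python SDK to trace the retrieval and generation steps of a RAG request; the retrieve_documents and generate_answer helpers are hypothetical placeholders for your own components, and the console exporter stands in for an exporter targeting AWS X-Ray or Arize Phoenix:

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Export spans to the console for illustration only
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
tracer = trace.get_tracer("rag-application")

def answer_question(query: str) -> str:
    with tracer.start_as_current_span("rag.request") as request_span:
        request_span.set_attribute("rag.query", query)
        with tracer.start_as_current_span("rag.retrieval") as retrieval_span:
            chunks = retrieve_documents(query)        # hypothetical retrieval component
            retrieval_span.set_attribute("rag.num_chunks", len(chunks))
        with tracer.start_as_current_span("rag.generation"):
            return generate_answer(query, chunks)     # hypothetical generation component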

Prepare the evaluation dataset

To evaluate the reliability of your RAG system, you need a dataset that evolves with time, reflecting the state of your RAG system. Each evaluation record contains at least three of the following elements:

  • Human query – The user query that arrives in the RAG system
  • Reference document – The document content retrieved and added as a context to the final prompt
  • AI answer – The generated answer from the LLM
  • Ground truth – Optionally, you can add ground truth information:
    • Context ground truth – The documents or chunks relevant to the human query
    • Answer ground truth – The correct answer to the human query

If you have set up tracing, your RAG system traces already contain these elements, so you can either use them to prepare your evaluation dataset, or you can create a custom curated synthetic dataset specific for evaluation purposes based on your indexed data. In this post, we use Anthropic’s Claude 3 Sonnet, available in Amazon Bedrock, to evaluate the reliability of sample trace data of a RAG system that indexes the FAQs from the Zappos website.
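
For illustration, a single evaluation record prepared from trace data might look like the following; the values are invented and are not actual Zappos content:

evaluation_record = {
    "human_query": "What is your return policy for shoes?",
    "reference_document": "Returns are free within 365 days of purchase for items in their original condition...",
    "ai_answer": "You can return shoes for free within 365 days of purchase if they are in their original condition.",
    # Optional ground truth fields
    "context_ground_truth": ["FAQ chunk describing the 365-day free return policy"],
    "answer_ground_truth": "Shoes can be returned for free within 365 days of purchase.",
}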

Choose your metrics and prepare the evaluation prompts

Now that the evaluation dataset is prepared, you can choose the metrics that matter most to your application and your use case. In addition to the metrics we’ve discussed, you can create your own metrics to evaluate aspects that matter to you most. If your evaluation dataset provides answer ground truth, n-gram comparison metrics like ROUGE or embedding-based metrics like BERTScore can be relevant before using an LLM as a judge. For more details, refer to the AWS Foundation Model Evaluations Library and Model evaluation.

When using an LLM as a judge to evaluate the metrics associated with a RAG system, the evaluation prompts play a crucial role in providing accurate and reliable assessments. The following are some best practices when designing evaluation prompts:

  • Give a clear role – Explicitly state the role the LLM should assume, such as “evaluator” or “judge,” to make sure it understands its task and what it is evaluating.
  • Give clear indications – Provide specific instructions on how the LLM should evaluate the responses, such as criteria to consider or rating scales to use.
  • Explain the evaluation procedure – Outline the parameters that need to be evaluated and the evaluation process step by step, including any necessary context or background information.
  • Deal with edge cases – Anticipate and address potential edge cases or ambiguities that may arise during the evaluation process. For example, determine whether an answer based on irrelevant context should be evaluated as factual or hallucinated.

In this post, we show how to create three custom binary metrics that don’t need ground truth data and that are inspired by some of the metrics we’ve discussed: faithfulness, context relevance, and answer relevance. We created three evaluation prompts.

The following is our faithfulness evaluation prompt template:

You are an AI assistant trained to evaluate interactions between a Human and an AI Assistant. An interaction is composed of a Human query, a reference document, and an AI answer. Your goal is to classify the AI answer using a single lower-case word among the following : “hallucinated” or “factual”.

“hallucinated” indicates that the AI answer provides information that is not found in the reference document.

“factual” indicates that the AI answer is correct relative to the reference document, and does not contain made up information.

Here is the interaction that needs to be evaluated:

Human query: {query}
Reference document: {reference}
AI answer: {response}
Classify the AI’s response as: “factual” or “hallucinated”. Skip the preamble or explanation, and provide the classification.

We also created the following context relevance prompt template:

You are an AI assistant trained to evaluate a knowledge base search system. A search request is composed of a Human query and a reference document. Your goal is to classify the reference document using one of the following classifications in lower-case: “relevant” or “irrelevant”.

“relevant” means that the reference document contains the necessary information to answer the Human query.

“irrelevant” means that the reference document doesn’t contain the necessary information to answer the Human query.

Here is the search request that needs to be evaluated:

Human query: {query}
Reference document: {reference}

Classify the reference document as: “relevant” or “irrelevant”. Skip any preamble or explanation, and provide the classification.

The following is our answer relevance prompt template:

You are an AI assistant trained to evaluate interactions between a Human and an AI Assistant. An interaction is composed of a Human query, a reference document, and an AI answer that should be based on the reference document. Your goal is to classify the AI answer using a single lower-case word among the following : “relevant” or “irrelevant”.

“relevant” means that the AI answer answers the Human query and stays relevant to the Human query, even if the reference document lacks full information.

“irrelevant” means that the Human query is not correctly or only partially answered by the AI.

Here is the interaction that needs to be evaluated:

Human query: {query}
Reference document: {reference}
AI answer: {response}

Classify the AI’s response as: “relevant” or “irrelevant”. Skip the preamble or explanation, and provide the classification.
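
To compute these binary metrics at scale, each record can be sent to a judge model together with the corresponding prompt template. The following is a minimal sketch that invokes Anthropic’s Claude 3 Sonnet on Amazon Bedrock with one of the templates above; the model ID is assumed to be enabled in your account and Region, and faithfulness_template is assumed to hold the faithfulness prompt shown earlier with {query}, {reference}, and {response} placeholders:

import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")
JUDGE_MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"  # assumed to be enabled in your account

def judge(prompt_template: str, **fields) -> str:
    """Fill an evaluation prompt template and return the judge's single-word classification."""
    prompt = prompt_template.format(**fields)
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 10,
        "messages": [{"role": "user", "content": prompt}],
    })
    response = bedrock_runtime.invoke_model(modelId=JUDGE_MODEL_ID, body=body)
    return json.loads(response["body"].read())["content"][0]["text"].strip().lower()

# Example: classify one evaluation record for faithfulness
# verdict = judge(faithfulness_template, query=record["human_query"],
#                 reference=record["reference_document"], response=record["ai_answer"])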

Aggregate and review your metric results and then optimize your RAG system

After you obtain the evaluation results, you can store the metrics in your observability system alongside the stored traces to identify areas for improvement based on their values or aggregates.

As indicated in the following diagram, every aspect of a RAG system has a cascading impact on what follows; for instance, suboptimal document parsing affects how reliably chunks are created, which in turn impacts embedding quality, retrieval, and model output. When reviewing the reliability metrics of your RAG system to find out what needs to be optimized, you should start by optimizing and reviewing what comes earlier in the chain, on the left side of the following diagram.

RAG Optimization Steps

In the following table, we present 3 of the 15 queries we used from the Zappos FAQs, together with the corresponding LLM answers, the reference documents, and the calculated metrics for faithfulness, context relevance, and answer relevance.

Question | Retrieved Document | Answer | Faithfulness | Context Relevance | Answer Relevance
What are the specific care instructions for different fabric types like cashmere, silk, and wool?
Silk: Dry cleaning may be required. Some silks are hand- or machine-washable. Do not dry silk in a clothes dryer. Garments usually look best when professionally dry-cleaned.

Cashmere: Generally, hand wash in cold water with a mild detergent. Do not wring or twist. Lay flat to dry.

Silk: Dry cleaning is often recommended. Some silks can be hand- or machine-washed, but avoid using a clothes dryer.

Wool: Check the label for specific instructions. Many wools can be machine-washed on a gentle cycle with cold water. Avoid high temperatures, as they can cause the wool to shrink or felt.

hallucinated
irrelevant
relevant
How can I identify if an email claiming to be from Zappos is legitimate or a phishing attempt?
recommends you update your online passwords, including the password to your Zappos.com Account, and report the email to the Federal Trade Commission (FTC). Visit the FTC website here to report the email and find more information on how to identify and prevent phishing attacks.
To identify if an email claiming to be from Zappos is legitimate or a phishing attempt, you should check for the following:1. Look for the correct sender email address. Zappos emails should come from an address ending in “@zappos.com”.

2. Check for spelling and grammatical errors in the email. Phishing emails often contain mistakes.

3. Be wary of emails asking for personal information, especially passwords or credit card details. Zappos will never ask for this information via email.

4. Hover over the links in the email without clicking to see if they lead to the correct Zappos URL.

5. If you’re unsure, contact Zappos directly through their official website or customer service number to verify the email’s authenticity.Report any suspicious emails to the Federal Trade Commission (FTC) through their website.
hallucinated
relevant
relevant
What is Amazon Pay, and how do I access my transaction history with it on Zappos?
How do I find my Amazon Pay transactions?
Amazon Pay is a digital wallet that allows you to make payments online using the information stored in your Amazon account, including your shipping addresses, payment methods, and default delivery settings. To view your Amazon Pay transaction history on Zappos or any other merchant site, you should log into your Amazon account. Once logged in, navigate to the “Your Orders” page, where you can view all of your past and present orders, including those made using Amazon Pay. If you’re unable to find a specific transaction, you can use the search bar to look up the order number or the date of the transaction.
hallucinated
irrelevant
relevant
(remaining rows omitted)
Aggregate: Faithfulness 87% | Context Relevance 67% | Answer Relevance 100%

From the preceding aggregates, we can see that answer relevance has a high score; however, context relevance in this example RAG system is only 67%, and the system also hallucinates in some cases. Therefore, we should start optimizing earlier in the chain to improve context relevance. If we look at the sample questions where context relevance is classified as irrelevant, we can see that the text is well parsed; however, chunks may start or end in the middle of a sentence or include only the FAQ question without its answer. Therefore, we start by optimizing the chunking method.

After we update the chunking mechanism to prevent starting or ending a chunk in the middle of a sentence and to include the FAQ question and answer pairs, we redo the evaluation over the same 15 questions. The following table shows a sample of our results.

Question | Retrieved Document | Answer | Faithfulness | Context Relevance | Answer Relevance
What are the specific care instructions for different fabric types like cashmere, silk, and wool?

How do I care for XYZ fabric?

Acetate: While most items made of acetate are dry-clean only, some may be hand-washed in lukewarm water. Do not twist the fabric. Iron while the fabric is damp. Press the inside of the garment using a low-temperature setting. Use a pressing cloth when ironing the outside.

Acrylic knit: Most acrylic knit garments can be machine-washed. Read the label, and check for the proper drying option. Some knits retain their shapes best if reshaped and dried flat.

Cashmere: Check the care label before laundering cashmere. To hand-wash, use a mild detergent or shampoo. Gently squeeze the water through the garment, then rinse until the water runs clear. Do not wring or twist. Squeeze out excess water. To dry, lay flat on a towel, away from sunlight or heat.

Cotton: Cotton holds up well to home laundering. Remove the garment from the dryer promptly to reduce wrinkling. Press using spray starch for the crispness of a laundered shirt.

Cotton blend: Dry cotton-blend garments using your dryer’s permanent-press or low-heat cycle, and remove immediately to reduce wrinkling. Touch up with a steam iron; starch for a professionally laundered look.

Linen: Most linen garments need to be dry-cleaned or hand-washed. Follow the instructions on the care label. To touch up or press, use a steam iron on a linen setting for a crisp look.

Nylon: Machine-wash nylon garments in warm water. Use a low-temperature setting when tumble-drying, and include a dryer sheet to reduce static electricity. Use a warm iron to press, if necessary.

Polyester: Read the label. Usually polyester items can be machine-washed (cool) and dried (low). Check the label to see if air-drying is recommended. Touch up with a cool – never hot – iron, if necessary.

Rayon: Make sure to read the care label for rayon clothing. Launder in cool water and a mild detergent. When machine-washing, use the gentle cycle and promptly remove garments after washing. When hand-washing, do not wring or twist the garment. To dry, roll the item in a towel to squeeze out excess water. Lay flat to dry. Iron when still damp and with the garment inside out to prevent the fabric from becoming shiny. Use a pressing cloth when ironing the right side of the garment. Pressing cloths can be purchased, or use something as simple as a piece of white cotton fabric.

Silk: Dry cleaning may be required. Some silks are hand- or machine-washable. Do not dry silk in a clothes dryer. Garments usually look best when professionally dry-cleaned.

Spandex: Use warm water when hand- or machine-washing items made with spandex. Line-dry or use a low temperature setting when machine-drying to preserve the elasticity of the garment.

Wool knit: Typically wool knits need to be dry-cleaned, but check the label. If hand-washable, use cool water and a detergent for fine washables. Squeeze out excess water, then reshape and dry flat.

Wool (lined): Lined garments generally look best when professionally dry-cleaned once a season. Between dry cleanings, remove surface soil with a brush or damp cloth. Refresh the item by hanging it from a padded hanger in a steamy bathroom; moisture will help garment shed wrinkles. When touching up with an iron, use steam in an up-and-down motion (rather than sliding the iron along the fabric).

Cashmere: Check the care label before laundering. To hand-wash, use a mild detergent or shampoo. Gently squeeze the water through the garment, then rinse until the water runs clear. Do not wring or twist. Squeeze out excess water and lay flat to dry, away from sunlight or heat.

Silk: Dry cleaning may be required. Some silks are hand- or machine-washable. Do not dry silk in a clothes dryer. For hand-washing, use a mild detergent and cool water. Squeeze out excess water and lay flat to dry. Iron when still damp and with the garment inside out to prevent the fabric from becoming shiny.

Wool: The care instructions for wool depend on whether it’s a knit or not. For wool knits, they typically need to be dry-cleaned, but check the label. If hand-washable, use cool water and a detergent for fine washables. Squeeze out excess water, then reshape and dry flat. For lined wool garments, they generally look best when professionally dry-cleaned once a season. Between dry cleanings, remove surface soil with a brush or damp cloth and refresh in a steamy bathroom. When touching up with an iron, use steam in an up-and-down motion.

factual
relevant
relevant
How can I identify if an email claiming to be from Zappos is legitimate or a phishing attempt?

How do I identify whether an Email is or is not from Zappos?

Suspicious or fraudulent emails not from Zappos.com may contain:

A reference to a payment source that Zappos.com does not accept, such as PayPal. Zappos.com currently only accepts credit cards, debit cards, and Zappos.com Gift Cards for payment.
A request to update payment information that is not linked to an order you placed on Zappos.com or your Zappos.com account.
Links, including “Here” links, to websites not affiliated with Zappos.com. You should be able to hover over a link without clicking it to show the website address associated with the link. If the link shows a site other than Zappos.com or a site that is not a legitimate Zappos domain, then it is likely phishing.
An order confirmation for an item you didn’t purchase or an attachment to an order confirmation. If you are a Zappos.com customer, go to your Zappos Account and view Your Order History to see if there is an order that matches the details in the correspondence. If you are not a Zappos.com customer, feel free to contact us to verify whether the email is associated with an actual order.
Typos, grammatical errors, or unconventional formatting.
A generic email address with Zappos in the email, for example advert@gmail.com or an email address from a non-US country or site not associated with Zappos. Zappos will never send communications from personal or generic email accounts. Order-related emails typically come from cs-noreply@zappos.com or another zappos.com email address.
Forged email addresses to make it look like the email is coming from Zappos.com. Your email program may allow you to hover over or click on an email address to view its original source.

What should I do if I believe I’ve received a phishing email or if I responded to what may be a phishing email?

If you believe you received a phishing email or responded to what may be a phishing email, Zappos recommends you update your online passwords, including the password to your Zappos.com Account, and report the email to the Federal Trade Commission (FTC). Visit the FTC website here to report the email and find more information on how to identify and prevent phishing attacks.

Payment Information

Do you accept international credit cards?

Currently Zappos.com can accept international credit cards but we can only ship to an address within the United States and its territories. Please note that the credit card must be issued and contain a logo from either Visa, Mastercard, Discover, or American Express. On the checkout page, there will be a billing and shipping section. Please do the following:

For the billing information

Enter your street address on Address Line 1. Enter your City, County or Province, and Postal Code on Address Line 2. Enter your Country for the city. Enter AA for the state. Enter 11111 for the zip code.

For the shipping information

Please enter the correct information in the appropriate field.

Orders placed with the use of an International Credit Card may take up to 72 hours to finalize prior to shipping. We apologize for any inconvenience this may cause.

To identify if an email claiming to be from Zappos is legitimate or a phishing attempt, check for the following:

1. Zappos only accepts certain payment methods: credit cards, debit cards, and Zappos.com Gift Cards.
2. No unsolicited requests for payment information or updates.
3. Links lead to Zappos.com or legitimate Zappos domains.
4. Match the order details in the email with your Zappos Account’s order history.
5. Look for typos, grammatical errors, or unconventional formatting.
6. Check for generic email addresses not affiliated with Zappos.
7. Verify the sender’s email address is from cs-noreply@zappos.com or another zappos.com email address.
8. Be wary of forged email addresses.

If you believe you received a phishing email or responded to what may be a phishing email, update your passwords and report the email to the Federal Trade Commission (FTC).

factual
relevant
relevant
What is Amazon Pay, and how do I access my transaction history with it on Zappos?

How do I find my Amazon Pay transactions?

Go to pay.amazon.com and sign in as a “shopper” using your Amazon username and password. One-time purchases are shown on the “Activity” tab, and subscription agreements are on the “Merchant agreements” tab. You can also view your transactions on Amazon.com. If you are on a desktop, your Amazon Pay transactions can be found under “Amazon Orders” within the Amazon Pay tab or in “Your Account” under “Transactions”. If you are in the Amazon app, all transactions are listed together under “Your Orders” or in “Your Account”.

Amazon Pay is a digital wallet that allows you to make payments online using your Amazon account. However, the context does not provide information on how to access your transaction history with Amazon Pay on Zappos. The instructions given in the context are for accessing Amazon Pay transactions on Amazon’s own platform.
factual
relevant
relevant
(remaining rows omitted)
Aggregate: Faithfulness 100% | Context Relevance 93% | Answer Relevance 100%

After we changed the chunking mechanism to prevent mid-sentence chunking and to include an FAQ question and its corresponding answer in the same chunk, context relevance improved from 67% to 93%. We can also see that improving context relevance resolved the previous hallucinations without changing the prompt template. We can continue iterating on the optimization process by investigating the questions that still have irrelevant retrievals and by adjusting the indexing or retrieval mechanism, for example by retrieving a higher number of chunks or by using hybrid search to combine lexical search with semantic search.

Sample references

To further explore and experiment with different RAG evaluation techniques, you can delve deeper into the sample notebooks available in the Knowledge Bases section of the Amazon Bedrock Samples GitHub repo.

Conclusion

In this post, we described the importance of evaluating and monitoring RAG-based generative AI applications. We showcased the metrics and frameworks for RAG system evaluation and observability, then we went over how you can use FMs in Amazon Bedrock to compute RAG reliability metrics. It’s important to choose the metrics that matter most to your organization and that impact the aspect or configuration you want to optimize.

If RAG is not sufficient for your use case, you can opt for fine-tuning or continued pre-training in Amazon Bedrock or Amazon SageMaker to build custom models that are specific to your domain, organization, and use case. Most importantly, keeping a human in the loop is essential to align AI systems, as well as their evaluation mechanisms, with their intended uses and objectives.


About the Authors

Oussama Maxime Kandakji is a Senior Solutions Architect at AWS focusing on data science and engineering. He works with enterprise customers on solving business challenges and building innovative functionalities on top of AWS. He enjoys contributing to open source and working with data.

Ioan Catana is a Senior Artificial Intelligence and Machine Learning Specialist Solutions Architect at AWS. He helps customers develop and scale their ML solutions and generative AI applications in the AWS Cloud. Ioan has over 20 years of experience, mostly in software architecture design and cloud engineering.

Read More

Connect to Amazon services using AWS PrivateLink in Amazon SageMaker

Connect to Amazon services using AWS PrivateLink in Amazon SageMaker

AWS customers that implement secure development environments often have to restrict outbound and inbound internet traffic. This becomes increasingly important with artificial intelligence (AI) development because of the data assets that need to be protected. Transmitting data across the internet is not secure enough for highly sensitive data. Therefore, accessing AWS services without leaving the AWS network can be a secure workflow.

One of the ways you can secure AI development is by creating Amazon SageMaker instances within a virtual private cloud (VPC) with direct internet access disabled. This isolates the instance from the internet and prevents API calls to other AWS services. This presents a challenge for developers who are building architectures for production in which many AWS services need to function together.

In this post, we present a solution for configuring SageMaker notebook instances to connect to Amazon Bedrock and other AWS services with the use of AWS PrivateLink and Amazon Elastic Compute Cloud (Amazon EC2) security groups.

Solution overview

The following example architecture shows a SageMaker instance connecting to various services. The SageMaker instance is isolated from the internet but is still able to access AWS services through PrivateLink. Note that the connection to Amazon S3 uses a gateway VPC endpoint; see the Amazon VPC documentation to learn more about gateway endpoints.

overall architecture for developing in a VPC environment

In the following sections, we show how to configure this on the AWS Management Console.

Create security groups for outbound and inbound endpoint access

First, you have to create the security groups that will be attached to the VPC endpoints and the SageMaker instance. You create the security groups before creating a SageMaker instance because after the instance has been created, the security group configuration can’t be changed.

You create two groups, one for outbound and another for inbound. Complete the following steps:

1. On the Amazon EC2 console, choose Security Groups in the navigation pane.

2. Choose Create security group.

3. For Security group name, enter a name (for example, inbound-sagemaker).

4. For Description, enter a description.

5. For VPC, choose your VPC.

create a security group for developing in a secure environment in vpc SageMaker

6. Note the security group ID to use in the next steps.

7. Choose Create security group again to create the outbound security group.

8. For Security group name, enter a name (for example, outbound-sagemaker).

9. For Description, enter a description.

10. For VPC, choose the same VPC as the inbound rule.

11. In the Outbound rules section, choose Add rule.

12. Add an outbound rule with the inbound security group ID as the destination using HTTPS as the type.

13. Note the outbound security group ID to use in the next step.

configure security group for connect to AWS services using AWS PrivateLink

14. Return to the inbound security group and add an inbound rule of HTTPS type with the destination set to the outbound security group ID.

set outbound rule for developing in a secure environment
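
If you prefer to script these steps, the following boto3 sketch mirrors the preceding console walkthrough; the VPC ID is a placeholder, and the group names follow the examples used above.

import boto3

ec2 = boto3.client('ec2')
vpc_id = 'vpc-0123456789abcdef0'  # placeholder: use your VPC ID

inbound = ec2.create_security_group(
    GroupName='inbound-sagemaker',
    Description='Inbound HTTPS for VPC interface endpoints',
    VpcId=vpc_id,
)
outbound = ec2.create_security_group(
    GroupName='outbound-sagemaker',
    Description='Outbound HTTPS from the SageMaker instance',
    VpcId=vpc_id,
)

# Inbound group (attached to the endpoints): allow HTTPS from the outbound group
ec2.authorize_security_group_ingress(
    GroupId=inbound['GroupId'],
    IpPermissions=[{
        'IpProtocol': 'tcp', 'FromPort': 443, 'ToPort': 443,
        'UserIdGroupPairs': [{'GroupId': outbound['GroupId']}],
    }],
)

# Outbound group (attached to the SageMaker instance): allow HTTPS to the inbound group
ec2.authorize_security_group_egress(
    GroupId=outbound['GroupId'],
    IpPermissions=[{
        'IpProtocol': 'tcp', 'FromPort': 443, 'ToPort': 443,
        'UserIdGroupPairs': [{'GroupId': inbound['GroupId']}],
    }],
)
# Note: groups created through the API also include a default allow-all egress rule,
# which you can remove with revoke_security_group_egress if you want to restrict traffic further.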

Create a SageMaker instance with the outbound security group

You now create a SageMaker instance with the network configuration shown in the following screenshot. It’s important to choose the same VPC that you used to create the inbound and outbound security groups. You then choose the outbound security group you created earlier.

configure network for developing in secure environment

Create an Interface VPC endpoint

In this step, you create an Interface VPC endpoint using Amazon Virtual Private Cloud (Amazon VPC) that automatically uses PrivateLink, which allows calls from your SageMaker instance to AWS services.

1. On the Amazon VPC console, choose Endpoints in the navigation pane.

2. Choose Create endpoint.

3. For Name tag, enter a name (for example, bedrock-link).

4. For Service category, select AWS services.

5. For Services, search for and choose com.amazonaws.<region>.bedrock-runtime.

create interface endpoint for developing in secure environment

6. Set the VPC to the same one you’ve been working with.

7. Specify the subnet(s).

A subnet is a range of IP addresses within a VPC. If you don’t know what subnet to specify, any subnet will work. Otherwise, specify the subnet that is required by any security requirements from your cloud security team.

8. Set the security group to the inbound security group you created earlier.

After you create the endpoint, it should take some time to become available.

Repeat these steps for every service that you need for your workflow. The following screenshots show examples of services that you can create interface VPC endpoints for, such as Amazon Simple Storage Service (Amazon S3), Amazon Kendra, and AWS Lambda. AWS PrivateLink enables you to connect privately to many AWS services; for the current list, see the AWS PrivateLink documentation.

select service for connecting to AWS services with AWS PrivateLink
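
You can also create the interface endpoints programmatically. The following boto3 sketch creates the Amazon Bedrock runtime endpoint described earlier; the VPC ID, subnet ID, security group ID, and Region are placeholders that you would replace with your own values.

import boto3

ec2 = boto3.client('ec2')

response = ec2.create_vpc_endpoint(
    VpcEndpointType='Interface',
    VpcId='vpc-0123456789abcdef0',                          # placeholder
    ServiceName='com.amazonaws.us-west-2.bedrock-runtime',  # adjust the Region as needed
    SubnetIds=['subnet-0123456789abcdef0'],                 # placeholder
    SecurityGroupIds=['sg-0123456789abcdef0'],              # the inbound security group
    TagSpecifications=[{
        'ResourceType': 'vpc-endpoint',
        'Tags': [{'Key': 'Name', 'Value': 'bedrock-link'}],
    }],
)
print(response['VpcEndpoint']['VpcEndpointId'])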

Test the connection

You can test the connection to Amazon Bedrock using a simple Python API call. The following is a code snippet that invokes the Amazon Bedrock model:

import boto3
import json

bedrock = boto3.client(service_name='bedrock-runtime')
prompt = """
Human: What type of sharks are there?

Assistant:"""

body = json.dumps({
"prompt": prompt,
"max_tokens_to_sample": 4000,
"temperature": 0.1,
"top_p": 0.9,
})

modelId = 'anthropic.claude-instant-v1'
accept = 'application/json'
contentType = 'application/json'

response = bedrock.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
response_body = json.loads(response.get('body').read())

print(response_body.get('completion'))

If you were to run this in a Jupyter notebook cell on the isolated instance, it would return an error because the invocation isn’t pointed at the VPC endpoint. You fix this by adding an endpoint URL to the client instantiation:

bedrock = boto3.client(
    service_name='bedrock-runtime',
    endpoint_url = 'https://vpce-0e452bc86b1f87c50-5xltzdpo.bedrock-runtime.us-west-2.vpce.amazonaws.com'
)

To find the endpoint URL, go back to the VPC endpoint that you created in the previous step and look for DNS names, illustrated in the following screenshot. The private DNS name is the best option because it is the same as the public one, which means you don’t have to change anything to use the private connection. The next best option is the Regional DNS name, which is the first option under DNS names. Both options allow your traffic to fail over to other healthy Availability Zones (AZs) in case the current AZ is impaired.

find the endpoint URL for the interface endpoint
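
If you prefer not to copy the DNS names from the console, you can retrieve them programmatically; the following sketch lists the DNS names for a given endpoint ID (the endpoint ID shown is a placeholder).

import boto3

ec2 = boto3.client('ec2')

resp = ec2.describe_vpc_endpoints(
    VpcEndpointIds=['vpce-0123456789abcdef0'],  # placeholder: your endpoint ID
)
# The first entry is typically the Regional DNS name, followed by per-AZ names
for entry in resp['VpcEndpoints'][0]['DnsEntries']:
    print(entry['DnsName'])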

Clean up

To clean up your resources, complete the following steps:

1. On the SageMaker console, navigate to the notebook configuration page.

2. Stop the instance, then choose Delete to delete the instance.

delete sagemaker notebook endpoint for clean up

3. On the Amazon EC2 console, navigate to the inbound security group’s detail page.

4. On the Actions menu, choose Delete security groups.

5. Repeat these steps for the outbound security group.

delete security group for clean up

6. On the Amazon VPC console, navigate to the VPC endpoint’s details page.

7. On the Actions menu, choose Delete.

8. Repeat this step for every endpoint you created as part of this post.

delete vpc endpoint for clean up

Conclusion

In this post, we showed how to set up VPC endpoints and security groups to allow SageMaker to connect to Amazon Bedrock. When a SageMaker instance has restricted internet access, you can still develop and connect to other AWS services through the use of AWS PrivateLink. This post showed how to connect to Amazon Bedrock from an isolated SageMaker instance, but you can replicate the steps for other services.

We encourage you to get started developing AI applications on AWS. To learn more, visit Amazon SageMaker, Amazon Bedrock, and AWS PrivateLink. Happy coding!


About the Author

Francisco Calderon is a Data Scientist at the AWS Generative AI Innovation Center. As a member of the GenAI Innovation Center, he helps solve critical business problems for AWS customers using the latest technology in Generative AI. In his spare time, Francisco likes to play music and guitar, play soccer with his daughters, and enjoy time with his family.

Sungmin Hong is an Applied Scientist at AWS Generative AI Innovation Center where he helps expedite the variety of use cases of AWS customers. Before joining Amazon, Sungmin was a postdoctoral research fellow at Harvard Medical School. He holds Ph.D. in Computer Science from New York University. Outside of work, Sungmin enjoys hiking, traveling and reading.

Yash Shah is a Science Manager in the AWS Generative AI Innovation Center. He and his team of applied scientists and machine learning engineers work on a range of machine learning use cases from healthcare, sports, automotive and manufacturing.

Anila Joshi has more than a decade of experience building AI solutions. As an Applied Science Manager at AWS Generative AI Innovation Center, Anila pioneers innovative applications of AI that push the boundaries of possibility and guides customers to strategically chart a course into the future of AI.

Read More

Maximize your Amazon Translate architecture using strategic caching layers

Maximize your Amazon Translate architecture using strategic caching layers

Amazon Translate is a neural machine translation service that delivers fast, high quality, affordable, and customizable language translation. Amazon Translate supports 75 languages and 5,550 language pairs. For the latest list, see the Amazon Translate Developer Guide. A key benefit of Amazon Translate is its speed and scalability. It can translate a large body of content or text passages in batch mode or translate content in real-time through API calls. This helps enterprises get fast and accurate translations across massive volumes of content including product listings, support articles, marketing collateral, and technical documentation. When content sets have phrases or sentences that are often repeated, you can optimize cost by implementing a write-through caching layer. For example, product descriptions for items contain many recurring terms and specifications. This is where implementing a translation cache can significantly reduce costs. The caching layer stores source content and its translated text. Then, when the same source content needs to be translated again, the cached translation is simply reused instead of paying for a brand-new translation.

In this post, we explain how setting up a cache for frequently accessed translations can benefit organizations that need scalable, multi-language translation across large volumes of content. You’ll learn how to build a simple caching mechanism for Amazon Translate to accelerate turnaround times.

Solution overview

The caching solution uses Amazon DynamoDB to store translations from Amazon Translate. DynamoDB functions as the cache layer. When a translation is required, the application code first checks the cache—the DynamoDB table—to see if the translation is already cached. If a cache hit occurs, the stored translation is read from DynamoDB with no need to call Amazon Translate again.

If the translation isn’t cached in DynamoDB (a cache miss), the Amazon Translate API is called to perform the translation. The source text is passed to Amazon Translate, the translated result is returned, and the translation is stored in DynamoDB, populating the cache for the next time that translation is requested.

For this blog post, we use Amazon API Gateway as a REST API for translation that integrates with AWS Lambda to perform the backend logic. An Amazon Cognito user pool is used to control who can access your translation REST API. You can also use other mechanisms to control authentication and authorization to API Gateway based on your use case.

Amazon Translate caching architecture

  1. When a new translation is needed, the user or application makes a request to the translation REST API.
  2. Amazon Cognito verifies the identity token in the request to grant access to the translation REST API.
  3. When new content comes in for translation, API Gateway invokes the Lambda function that checks the Amazon DynamoDB table for an existing translation.
  4. If a match is found, the translation is retrieved from DynamoDB.
  5. If no match is found, the content is sent to Amazon Translate to perform a custom translation using parallel data. The translated content is then stored in DynamoDB along with a new entry for hit rate percentage.

These high-value translations are periodically post-edited by human translators and then added as parallel data for machine translation. This improves the quality of future translations performed by Amazon Translate.

We will use a simple schema in DynamoDB to store the cache entries. Each item will contain the following attributes:

  • src_text: The original source text
  • target_locale: The target language to translate to
  • translated_text: The translated text
  • src_locale: The original source language
  • hash: The primary key of the table

The primary key will be constructed from the src_locale, target_locale, and src_text to uniquely identify cache entries. When retrieving translations, items will be looked up by their primary key.
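
The following sketch illustrates the cache-aside lookup against this schema. It is a simplified illustration rather than the code in the sample repository: the hashing scheme and function name are our own, and it skips the sentence-level segmentation described later in this post.

import hashlib

import boto3

dynamodb = boto3.resource('dynamodb')
translate = boto3.client('translate')
table = dynamodb.Table('TRANSLATION_CACHE')

def cached_translate(src_text, src_locale, target_locale):
    # Derive the primary key from the source locale, target locale, and source text
    key = hashlib.sha256(f"{src_locale}:{target_locale}:{src_text}".encode('utf-8')).hexdigest()

    item = table.get_item(Key={'hash': key}).get('Item')
    if item:  # cache hit: return the stored translation
        return item['translated_text']

    # Cache miss: call Amazon Translate, then populate the cache
    result = translate.translate_text(
        Text=src_text,
        SourceLanguageCode=src_locale,
        TargetLanguageCode=target_locale,
    )
    translated = result['TranslatedText']
    table.put_item(Item={
        'hash': key,
        'src_text': src_text,
        'src_locale': src_locale,
        'target_locale': target_locale,
        'translated_text': translated,
    })
    return translated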

Prerequisites

To deploy the solution, you need the following:

  1. An AWS account. If you don’t already have an AWS account, you can create one.
  2. Your access to the AWS account must have AWS Identity and Access Management (IAM) permissions to launch AWS CloudFormation templates that create IAM roles.
  3. The AWS CLI installed.
  4. The jq tool installed.
  5. The AWS Cloud Development Kit (AWS CDK) installed. See Getting started with the AWS CDK.
  6. Postman installed and configured on your computer.

Deploy the solution with AWS CDK

We will use AWS CDK to deploy the DynamoDB table for caching translations. CDK allows defining the infrastructure through a familiar programming language such as Python.

  1. Clone the repo from GitHub.
    git clone https://github.com/aws-samples/maximize-translate-architecture-strategic-caching

  2. Install the Python dependencies listed in requirements.txt.
    python3 -m pip install -r requirements.txt

  3. Open the app.py file and replace the AWS account number and AWS Region with your own.
  4. To verify that the AWS CDK is bootstrapped, run cdk bootstrap from the root of the repository:
cdk bootstrap
⏳ Bootstrapping environment aws://<acct#>/<region>...
Trusted accounts for deployment: (none)
Trusted accounts for lookup: (none)
Using default execution policy of 'arn:aws:iam::aws:policy/AdministratorAccess'. Pass '--cloudformation-execution-policies' to customize.
✅ Environment aws://<acct#>/<region> bootstrapped (no changes).
  1. Define your CDK stack to add DynamoDB and Lambda resources. The DynamoDB and Lambda Functions are defined as follows:
    • This creates a DynamoDB table with hash as the partition key. Because the TRANSLATION_CACHE table is schemaless, you don’t have to define other attributes in advance. It also creates a Lambda function with Python as the runtime.
table = ddb.Table(
    self, 'TRANSLATION_CACHE',
    table_name='TRANSLATION_CACHE',
    partition_key={'name': 'hash', 'type': ddb.AttributeType.STRING},
    removal_policy=RemovalPolicy.DESTROY
)

self._handler = _lambda.Function(
    self, 'GetTranslationHandler',
    runtime=_lambda.Runtime.PYTHON_3_10,
    handler='get_translation.handler',
    code=_lambda.Code.from_asset('lambda'),
    environment={
        'TRANSLATION_CACHE_TABLE_NAME': table.table_name,
    }
)
    • The Lambda function is defined such that it:
      • Parses the request body JSON into a Python dictionary.
      • Extracts the source locale, target locale, and input text from the request.
      • Gets the DynamoDB table name to use for a translation cache from environment variables.
      • Calls generate_translations_with_cache() to translate the text, passing the locales, text, and DynamoDB table name.
      • Returns a 200 response with the translations and processing time in the body.
import json
import os
import time

from botocore.exceptions import ClientError

# generate_translations_with_cache is defined elsewhere in the Lambda package
def handler(event, context):

    print('request: {}'.format(json.dumps(event)))

    request = json.loads(event['body'])
    print("request", request)

    src_locale = request['src_locale']
    target_locale = request['target_locale']
    input_text = request['input_text']
    table_name = os.environ['TRANSLATION_CACHE_TABLE_NAME']

    if table_name == "":
        print("Defaulting table name")
        table_name = "TRANSLATION_CACHE"

    try:
        start = time.perf_counter()
        translations = generate_translations_with_cache(src_locale, target_locale, input_text, table_name)
        end = time.perf_counter()
        time_diff = (end - start)

        translations["processing_seconds"] = time_diff

        return {
            'statusCode': 200,
            'headers': {
                'Content-Type': 'application/json'
            },
            'body': json.dumps(translations)
        }

    except ClientError as error:

        error = {"error_text": error.response['Error']['Code']}
        return {
            'statusCode': 500,
            'headers': {
                'Content-Type': 'application/json'
            },
            'body': json.dumps(error)
        }

    • The generate_translations_with_cache function divides the input text into separate sentences by splitting on a period (“.”) symbol. It stores each sentence as a separate entry in the DynamoDB table along with its translation. This segmentation into sentences is done so that cached translations can be reused for repeating sentences.
    • In summary, it’s a Lambda function that accepts a translation request, translates the text using a cache, and returns the result with timing information. It uses DynamoDB to cache translations for better performance.
  1. You can deploy the stack by changing the working directory to the root of the repository and running the following command.
    cdk deploy

Considerations

Here are some additional considerations when implementing translation caching:

  • Eviction policy: An additional attribute can store the expiration time of each cache entry, and a separate process can then evict expired entries.
  • Cache sizing: Determine expected cache size and provision DynamoDB throughput accordingly. Start with on-demand capacity if usage is unpredictable.
  • Cost optimization: Balance caching costs with savings from reducing Amazon Translate usage. Use a short DynamoDB Time-to-Live (TTL) and limit the cache size to minimize overhead.
  • Sensitive information: DynamoDB encrypts all data at rest by default. If cached translations contain sensitive data, you can grant access to authorized users only. You can also choose not to cache data that contains sensitive information.

Customizing translations with parallel data

The translations generated in the translations table can be human-reviewed and used as parallel data to customize the translations. Parallel data consists of examples that show how you want segments of text to be translated. It includes a collection of textual examples in a source language; for each example, it contains the desired translation output in one or more target languages.

This is a great approach for most use cases, but some outliers might require light post-editing by human teams. The post-editing process can help you better understand the needs of your customers by capturing the nuances of local language that can be lost in translation. For businesses and organizations that want to augment the output of Amazon Translate (and other Amazon artificial intelligence (AI) services) with human intelligence, Amazon Augmented AI (Amazon A2I) provides a managed approach to do so, see Designing human review workflows with Amazon Translate and Amazon Augmented AI for more information.

When you add parallel data to a batch translation job, you create an Active Custom Translation job. When you run these jobs, Amazon Translate uses your parallel data at runtime to produce customized machine translation output. It adapts the translation to reflect the style, tone, and word choices that it finds in your parallel data. With parallel data, you can tailor your translations for terms or phrases that are unique to a specific domain, such as life sciences, law, or finance. For more information, see Customizing your translations with parallel data.
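
For illustration, the following sketch starts a batch translation job that uses a parallel data resource; the job name, S3 locations, IAM role, and parallel data name are placeholders that you would replace with your own.

import boto3

translate = boto3.client('translate')

translate.start_text_translation_job(
    JobName='catalog-translation-with-parallel-data',                    # placeholder
    InputDataConfig={'S3Uri': 's3://my-input-bucket/source/',            # placeholder
                     'ContentType': 'text/plain'},
    OutputDataConfig={'S3Uri': 's3://my-output-bucket/translated/'},     # placeholder
    DataAccessRoleArn='arn:aws:iam::111122223333:role/TranslateBatchRole',  # placeholder
    SourceLanguageCode='en',
    TargetLanguageCodes=['fr'],
    ParallelDataNames=['reviewed-cache-translations'],                   # your parallel data resource
)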

Testing the caching setup

Here is a video walkthrough of testing the solution.

There are multiple ways to test the caching setup. For this example, you will use Postman to send requests. Because the REST API is protected by an Amazon Cognito authorizer, you need to configure Postman to send an authorization token with the API request.

As part of the AWS CDK deployment in the previous step, a Cognito user pool is created with an app client integration. On your AWS CloudFormation console, you can find BaseURL, translateCacheEndpoint, UserPoolID, and ClientID on the CDK stack output section. Copy these into a text editor for use later.

To generate an authorization token from Cognito, the next step is to create a user in the Cognito user pool.

  1. Go to the Amazon Cognito console. Select the user pool that was created by the AWS CDK stack.
  2. Select the Users tab and choose Create User.
  3. Enter the following values and choose Create User.
    1. On Invitation Message verify that Don’t send an invitation is selected.
    2. For Email address, enter test@test.com.
    3. On Temporary password, verify that Set a password is selected.
    4. For Password, enter testUser123!.
  4. Now that the user is created, you will use AWS Command Line Interface (CLI) to simulate a sign in for the user. Go to the AWS CloudShell console.
  5. Enter the following commands on the CloudShell terminal by replacing UserPoolID and ClientID from the CloudFormation output of the AWS CDK stack.
export YOUR_POOL_ID=<UserPoolID>

export YOUR_CLIENT_ID=<ClientID>

export Session_ID=$(aws cognito-idp admin-initiate-auth --user-pool-id ${YOUR_POOL_ID} --client-id ${YOUR_CLIENT_ID} --auth-flow ADMIN_NO_SRP_AUTH --auth-parameters 'USERNAME=test@test.com,PASSWORD="testUser123!"' | jq .Session -r)

aws cognito-idp admin-respond-to-auth-challenge --user-pool-id ${YOUR_POOL_ID} --client-id ${YOUR_CLIENT_ID} --challenge-name NEW_PASSWORD_REQUIRED --challenge-responses 'USERNAME=test@test.com,NEW_PASSWORD="testUser456!"' --session "${Session_ID}"
  1. The output from this call should be a valid session in the following format. The IdToken is the OpenID Connect-compatible identity token that we will pass to the APIs in the authorization header in the Postman configuration. Copy it into a text editor to use later.
{
   "ChallengeParameters": {},
   "AuthenticationResult": {
"AccessToken":"YOU_WILL_SEE_VALID_ACCESS_TOKEN_VALUE_HERE",
      "ExpiresIn": 3600,
      "TokenType": "Bearer",
      "RefreshToken": "YOU_WILL_SEE_VALID_REFRESH_TOKEN_VALUE_HERE",
      "IdToken": "YOU_WILL_SEE_VALID_ID_TOKEN_VALUE_HERE"
   }
}

You now have an authorization token to pass with the API request to your REST API. Go to the Postman website and sign in, or download the Postman desktop client, and create a workspace with the name dev.

  1. Select the workspace dev and choose New request.
  2. Change the method type to POST from GET.
  3. Paste the <TranslateCacheEndpoint> URL from the CloudFormation output of the AWS CDK stack into the request URL textbox. Append the API path /translate to the URL, as shown in the following figure.

Now set up authorization configuration on Postman so that requests to the translate API are authorized by the Amazon Cognito user pool.

  1. Select the Authorization tab below the request URL in Postman. Select OAuth2.0 as the Type.
  2. Under Current Token, copy and paste Your IdToken from earlier into the Token field.

  1. Select Configure New Token. Under Configuration Options add or select the values that follow. Copy the BaseURL and ClientID from the CloudFormation output of the AWS CDK stack. Leave the remaining fields at the default values.
    • Token Name: token
    • Grant Type: Select Authorization Code
    • Callback URL: Enter https://localhost
    • Auth URL: Enter <BaseURL>/oauth2/authorize
    • Access Token URL: Enter <BaseURL>/oauth2/token
    • ClientID: Enter <ClientID>
    • Scope: Enter openid profile translate-cache/translate
    • Client Authorization: Select Send client credentials in body.

  1. Choose Get New Access Token. You will be directed to another page to sign in as a user. Use the following credentials of the test user that was created earlier in your Cognito user pool:
    • Username: test@test.com
    • Password: testUser456!
  2. After authenticating, you will get a new id_token. Copy the new id_token, go back to the Postman Authorization tab, and replace the token value under Current Token with it.
  3. Now select the Body tab for the request. Select raw, change the body type to JSON, and insert the following JSON content. When done, choose Send.
{
"src_locale": "en",
"target_locale": "fr",
"input_text": "Use the Amazon Translate service to translate content from a source language (the language of the input content) to a target language (the language that you select for the translation output). In a batch job, you can translate files from one or more source languages to one or more target languages. For more information about supported languages, see Supported languages and language codes."
}

First translation request to the API

The first request to the API takes more time, because the Lambda function checks the given input text against the DynamoDB database on the initial request. Because this is the first request, it won’t find the input text in the table and will call Amazon Translate to translate the provided text.

Examining the processing_seconds value reveals that this initial request took approximately 2.97 seconds to complete.

Subsequent translations requests to the API

After the first request, the input text and translated output are now stored in the DynamoDB table. On subsequent requests with the same input text, the Lambda function will first check DynamoDB for a cache hit. Because the table now contains the input text from the first request, the Lambda function will find it there and retrieve the translation from DynamoDB instead of calling Amazon Translate again.

Storing requests in a cache allows subsequent requests for the same translation to skip the Amazon Translate call, which is usually the most time-consuming part of the process. Retrieving the translation from DynamoDB is much faster than calling Amazon Translate to translate the text each time.

The second request has a processing time of approximately 0.79 seconds, nearly four times faster than the first request, which took 2.97 seconds to complete.

Cache purge

Amazon Translate continuously improves its translation models over time. To benefit from these improvements, you need to periodically purge translations from your DynamoDB cache and fetch fresh translations from Amazon Translate.

DynamoDB provides a Time-to-Live (TTL) feature that can automatically delete items after a specified expiry timestamp. You can use this capability to implement cache purging. When a translation is stored in DynamoDB, a purge_date attribute set to 30 days in the future is added. DynamoDB will automatically delete items shortly after the purge_date timestamp is reached. This ensures cached translations older than 30 days are removed from the table. When these expired entries are accessed again, a cache miss occurs and Amazon Translate is called to retrieve an updated translation.
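
The following sketch shows what this can look like with boto3: enable TTL on the table once, then include a purge_date epoch timestamp when you write a cache entry. The attribute and table names match the schema described earlier; the exact expiry window is up to you.

import time

import boto3

dynamodb = boto3.client('dynamodb')

# One-time setup: tell DynamoDB which attribute holds the expiry timestamp
dynamodb.update_time_to_live(
    TableName='TRANSLATION_CACHE',
    TimeToLiveSpecification={'Enabled': True, 'AttributeName': 'purge_date'},
)

# When writing a cache entry, include purge_date set 30 days in the future;
# DynamoDB deletes the item shortly after this timestamp passes
purge_date = int(time.time()) + 30 * 24 * 60 * 60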

The TTL-based cache expiration allows you to efficiently purge older translations on an ongoing basis. This ensures your applications can benefit from the continuous improvements to the machine learning models used by Amazon Translate while minimizing costs by still using caching for repeated translations within a 30-day period.

Clean up

When you delete the stack, most resources are deleted along with it; however, that’s not the case for all resources. The DynamoDB table is retained by default. If you don’t want to retain this table, you can configure this in the AWS CDK code by using RemovalPolicy.

Additionally, the Lambda function will generate Amazon CloudWatch logs that are permanently retained. These aren’t tracked by CloudFormation because they’re not part of the stack, so the logs will persist. Use the CloudWatch console to manually delete any logs that you don’t want to retain.

You can either delete the stack through the CloudFormation console or run cdk destroy from the root folder:

cdk destroy

Conclusion

The solution outlined in this post provides an effective way to implement a caching layer for Amazon Translate to improve translation performance and reduce costs. Using a cache-aside pattern with DynamoDB allows frequently accessed translations to be served from the cache instead of calling Amazon Translate each time.

The caching architecture is scalable, secure, and cost-optimized. Additional enhancements such as setting TTLs, adding eviction policies, and encrypting cache entries can further customize the architecture to your specific use case.

Translations stored in the cache can also be post-edited and used as parallel data to train Amazon Translate. This creates a feedback loop that continuously improves translation quality over time.

By implementing a caching layer, enterprises can deliver fast, high-quality translations tailored to their business needs at reduced costs. Caching provides a way to scale Amazon Translate efficiently while optimizing performance and cost.

Additional resources


About the authors

Praneeth Reddy Tekula is a Senior Solutions Architect focusing on EdTech at AWS. He provides architectural guidance and best practices to customers in building resilient, secure and scalable systems on AWS. He is passionate about observability and has a strong networking background.

Reagan Rosario is a Solutions Architect at AWS, specializing in building scalable, highly available, and secure cloud solutions for education technology companies. With over 10 years of experience in software engineering and architecture roles, Reagan loves using his technical knowledge to help AWS customers architect robust cloud solutions that leverage the breadth and depth of AWS.

Read More