Designing generative AI workloads for resilience

Resilience plays a pivotal role in the development of any workload, and generative AI workloads are no different. There are unique considerations when engineering generative AI workloads through a resilience lens. Understanding and prioritizing resilience is crucial for generative AI workloads to meet organizational availability and business continuity requirements. In this post, we discuss the different layers of a generative AI workload and the resilience considerations that apply to each.

Full stack generative AI

Although a lot of the excitement around generative AI focuses on the models, a complete solution involves people, skills, and tools from several domains. Consider the following picture, which is an AWS view of the a16z emerging application stack for large language models (LLMs).

Taxonomy of LLM App Stack on AWS

Compared to a more traditional solution built around AI and machine learning (ML), a generative AI solution now involves the following:

  • New roles – You have to consider model tuners as well as model builders and model integrators
  • New tools – The traditional MLOps stack doesn’t extend to cover the type of experiment tracking or observability necessary for prompt engineering or agents that invoke tools to interact with other systems

Agent reasoning

Unlike traditional AI models that rely solely on their training data, Retrieval Augmented Generation (RAG) produces more accurate and contextually relevant responses by integrating external knowledge sources. The following are some considerations when using RAG:

  • Setting appropriate timeouts is important to the customer experience. Nothing says bad user experience more than being in the middle of a chat and getting disconnected.
  • Make sure to validate prompt input data and prompt input size for allocated character limits that are defined by your model.
  • If you’re performing prompt engineering, you should persist your prompts to a reliable data store. That will safeguard your prompts in case of accidental loss or as part of your overall disaster recovery strategy (a minimal sketch of prompt validation and persistence follows this list).
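
The following is a minimal sketch of that idea, assuming a hypothetical DynamoDB table named prompt-store (with prompt_id as its partition key) and an illustrative 4,000-character limit; substitute the limit your model actually defines.

```python
import time
import uuid

import boto3

MAX_PROMPT_CHARS = 4_000  # illustrative; use the limit your model defines

dynamodb = boto3.resource("dynamodb")
prompt_table = dynamodb.Table("prompt-store")  # hypothetical table name


def validate_and_persist_prompt(prompt: str) -> str:
    """Reject empty or oversized prompts, then persist the prompt before
    invoking the model so it survives accidental loss or a recovery event."""
    if not prompt.strip():
        raise ValueError("Prompt is empty.")
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"Prompt exceeds {MAX_PROMPT_CHARS} characters.")

    prompt_id = str(uuid.uuid4())
    prompt_table.put_item(
        Item={
            "prompt_id": prompt_id,  # assumed partition key
            "prompt_text": prompt,
            "created_at": int(time.time()),
        }
    )
    return prompt_id
```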

Data pipelines

In cases where you need to provide contextual data to the foundation model using the RAG pattern, you need a data pipeline that can ingest the source data, convert it to embedding vectors, and store the embedding vectors in a vector database. This pipeline could be a batch pipeline if you prepare contextual data in advance, or a low-latency pipeline if you’re incorporating new contextual data on the fly. In the batch case, there are a couple of challenges compared to typical data pipelines.

The data sources may be PDF documents on a file system, data from a software as a service (SaaS) system like a CRM tool, or data from an existing wiki or knowledge base. Ingesting from these sources is different from the typical data sources like log data in an Amazon Simple Storage Service (Amazon S3) bucket or structured data from a relational database. The level of parallelism you can achieve may be limited by the source system, so you need to account for throttling and use backoff techniques. Some of the source systems may be brittle, so you need to build in error handling and retry logic.
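
As one example of that error handling, the following sketch wraps a caller-supplied fetch_page callable (a stand-in for whatever SaaS, wiki, or file system API you ingest from) in exponential backoff with jitter; the callable, exception handling, and retry budget are assumptions to adapt to your source system.

```python
import random
import time


def fetch_with_backoff(fetch_page, max_attempts: int = 5):
    """Call a brittle or rate-limited source with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch_page()
        except Exception as exc:  # narrow this to the source's throttling or transient errors
            if attempt == max_attempts:
                raise
            delay = min(30, 2 ** attempt) + random.uniform(0, 1)
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay)
```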

The embedding model could be a performance bottleneck, regardless of whether you run it locally in the pipeline or call an external model. Embedding models are foundation models that run on GPUs and do not have unlimited capacity. If the model runs locally, you need to assign work based on GPU capacity. If the model runs externally, you need to make sure you’re not saturating the external model. In either case, the level of parallelism you can achieve will be dictated by the embedding model rather than how much CPU and RAM you have available in the batch processing system.
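
A simple way to let the embedding model, rather than the batch host, set that ceiling is a fixed-size worker pool. The sketch below assumes a caller-supplied embed_fn and an illustrative concurrency limit; size the limit to what your embedding endpoint can actually sustain.

```python
from concurrent.futures import ThreadPoolExecutor

# The embedding endpoint, not the batch host's CPU count, sets this ceiling.
EMBEDDING_CONCURRENCY = 4  # illustrative value


def embed_documents(documents, embed_fn):
    """Generate embeddings with a fixed concurrency cap so the embedding
    model is never saturated, however large the batch processing host is."""
    with ThreadPoolExecutor(max_workers=EMBEDDING_CONCURRENCY) as executor:
        return list(executor.map(embed_fn, documents))
```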

In the low-latency case, you need to account for the time it takes to generate the embedding vectors. The calling application should invoke the pipeline asynchronously.
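
One common way to decouple the caller is to enqueue the work, for example on an Amazon SQS queue that the pipeline consumes from; the queue URL, message shape, and function name below are illustrative.

```python
import json

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/embedding-jobs"  # hypothetical


def submit_embedding_job(document_id: str, text: str) -> str:
    """Enqueue a document for embedding instead of blocking the caller
    while the vectors are generated."""
    response = sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"document_id": document_id, "text": text}),
    )
    return response["MessageId"]
```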

Vector databases

A vector database has two functions: store embedding vectors, and run a similarity search to find the closest k matches to a new vector. There are three general types of vector databases:

  • Dedicated SaaS options like Pinecone.
  • Vector database features built into other services. This includes native AWS services like Amazon OpenSearch Service and Amazon Aurora.
  • In-memory options that can be used for transient data in low-latency scenarios.

We don’t cover the similarity searching capabilities in detail in this post. Although they’re important, they are a functional aspect of the system and don’t directly affect resilience. Instead, we focus on the resilience aspects of a vector database as a storage system:

  • Latency – Can the vector database perform well against a high or unpredictable load? If not, the calling application needs to handle rate limiting, backoff, and retry (a minimal sketch follows this list).
  • Scalability – How many vectors can the system hold? If you exceed the capacity of the vector database, you’ll need to look into sharding or other solutions.
  • High availability and disaster recovery – Embedding vectors are valuable data, and recreating them can be expensive. Is your vector database highly available in a single AWS Region? Does it have the ability to replicate data to another Region for disaster recovery purposes?
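
The following sketch shows one way the calling application can protect itself from a slow or overloaded vector database, assuming the client library exposes a search call that accepts a per-request timeout; the search_fn signature, retry budget, and delays are assumptions to adapt to your vector database client.

```python
import random
import time


def similarity_search_with_retry(search_fn, query_vector, k: int = 5,
                                 max_attempts: int = 3, timeout_s: float = 2.0):
    """Run a similarity search with a per-call deadline and bounded retries,
    backing off between attempts so a struggling database is not hammered."""
    for attempt in range(1, max_attempts + 1):
        try:
            return search_fn(query_vector, k=k, timeout=timeout_s)
        except Exception:  # narrow to the client's timeout or throttling errors
            if attempt == max_attempts:
                raise
            time.sleep(min(5, 2 ** attempt) + random.uniform(0, 0.5))
```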

Application tier

There are three unique considerations for the application tier when integrating generative AI solutions:

  • Potentially high latency – Foundation models often run on large GPU instances and may have finite capacity. Make sure to use best practices for rate limiting, backoff and retry, and load shedding. Use asynchronous designs so that high latency doesn’t interfere with the application’s main interface (a hedged sketch of client-side timeouts and retries follows this list).
  • Security posture – If you’re using agents, tools, plugins, or other methods of connecting a model to other systems, pay extra attention to your security posture. Models may try to interact with these systems in unexpected ways. Follow the normal practice of least-privilege access, for example restricting incoming prompts from other systems.
  • Rapidly evolving frameworks – Open source frameworks like LangChain are evolving rapidly. Use a microservices approach to isolate other components from these less mature frameworks.
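
As a hedged example of those practices, the sketch below configures an Amazon Bedrock runtime client with explicit connect and read timeouts plus adaptive retries, so a slow model call fails fast instead of blocking the application’s main interface. The model ID, request body shape, and timeout values are illustrative.

```python
import json

import boto3
from botocore.config import Config

# Client-side timeouts and adaptive retries keep one slow model call from
# tying up the application's main request path.
bedrock_runtime = boto3.client(
    "bedrock-runtime",
    config=Config(
        connect_timeout=5,
        read_timeout=60,  # foundation model latency can be high; tune to your SLO
        retries={"max_attempts": 3, "mode": "adaptive"},
    ),
)


def invoke_model(prompt: str) -> dict:
    """Invoke a foundation model with bounded timeouts and retries."""
    response = bedrock_runtime.invoke_model(
        modelId="anthropic.claude-v2",  # illustrative model ID; use the model you deploy
        body=json.dumps(
            {"prompt": f"\n\nHuman: {prompt}\n\nAssistant:", "max_tokens_to_sample": 512}
        ),
    )
    return json.loads(response["body"].read())
```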

Capacity

We can think about capacity in two contexts: inference and training data pipelines. Capacity is a consideration when organizations are building their own pipelines. CPU and memory requirements are two of the biggest factors when choosing instances to run your workloads.

Instances that can support generative AI workloads can be more difficult to obtain than your average general-purpose instance type. Instance flexibility can help with capacity and capacity planning. Depending on what AWS Region you are running your workload in, different instance types are available.

For the user journeys that are critical, organizations will want to consider either reserving or pre-provisioning instance types to ensure availability when needed. This pattern achieves a statically stable architecture, which is a resiliency best practice. To learn more about static stability in the AWS Well-Architected Framework reliability pillar, refer to Use static stability to prevent bimodal behavior.

Observability

Besides the resource metrics you typically collect, like CPU and RAM utilization, you need to closely monitor GPU utilization if you host a model on Amazon SageMaker or Amazon Elastic Compute Cloud (Amazon EC2). GPU utilization can change unexpectedly if the base model or the input data changes, and running out of GPU memory can put the system into an unstable state.
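
If you manage the GPU host yourself, one lightweight approach is to publish GPU and GPU-memory utilization as custom Amazon CloudWatch metrics using the NVIDIA management library. The namespace and metric names below are illustrative, and the sketch assumes a single-GPU instance with the nvidia-ml-py package installed.

```python
import boto3
import pynvml  # from the nvidia-ml-py package; assumes an NVIDIA GPU host

cloudwatch = boto3.client("cloudwatch")


def publish_gpu_metrics(namespace: str = "GenAI/Inference") -> None:
    """Push GPU and GPU-memory utilization as custom CloudWatch metrics."""
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        cloudwatch.put_metric_data(
            Namespace=namespace,
            MetricData=[
                {"MetricName": "GPUUtilization", "Value": util.gpu, "Unit": "Percent"},
                {"MetricName": "GPUMemoryUsed",
                 "Value": 100.0 * mem.used / mem.total, "Unit": "Percent"},
            ],
        )
    finally:
        pynvml.nvmlShutdown()
```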

Higher up the stack, you will also want to trace the flow of calls through the system, capturing the interactions between agents and tools. Because the interface between agents and tools is less formally defined than an API contract, you should monitor these traces not only for performance but also to capture new error scenarios. To monitor the model or agent for any security risks and threats, you can use tools like Amazon GuardDuty.

You should also capture baselines of embedding vectors, prompts, context, and output, and the interactions between these. If these change over time, it may indicate that users are using the system in new ways, that the reference data is not covering the question space in the same way, or that the model’s output is suddenly different.
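
One simple, deliberately coarse signal for that kind of drift is to compare the centroid of recent embedding vectors against a stored baseline. The function below sketches this with cosine similarity; treat it as one illustrative check, not a complete drift-detection strategy.

```python
import numpy as np


def centroid_drift(baseline_vectors: np.ndarray, recent_vectors: np.ndarray) -> float:
    """Cosine similarity between the baseline and recent embedding centroids.
    A value well below 1.0 suggests usage or reference-data drift worth investigating."""
    baseline = baseline_vectors.mean(axis=0)
    recent = recent_vectors.mean(axis=0)
    return float(
        np.dot(baseline, recent)
        / (np.linalg.norm(baseline) * np.linalg.norm(recent))
    )
```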

Disaster recovery

Having a business continuity plan with a disaster recovery strategy is a must for any workload. Generative AI workloads are no different. Understanding the failure modes that are applicable to your workload will help guide your strategy. If you are using AWS managed services for your workload, such as Amazon Bedrock and SageMaker, make sure the service is available in your recovery AWS Region. As of this writing, these AWS services don’t support replication of data across AWS Regions natively, so you need to think about your data management strategies for disaster recovery, and you also may need to fine-tune in multiple AWS Regions.
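
For data you control, such as prompt stores or exported embeddings in Amazon S3, S3 Cross-Region Replication rules are usually the simplest option. The sketch below shows an explicit copy to a recovery Region for cases where you orchestrate replication yourself; the bucket names and Regions are illustrative.

```python
import boto3

# Illustrative bucket names; the recovery bucket lives in the recovery Region.
SOURCE_BUCKET = "genai-artifacts-us-east-1"
RECOVERY_BUCKET = "genai-artifacts-us-west-2"

s3_recovery = boto3.client("s3", region_name="us-west-2")


def copy_artifact_to_recovery_region(key: str) -> None:
    """Copy a prompt export, embedding snapshot, or fine-tuning dataset object
    to the recovery Region as part of the disaster recovery data strategy."""
    s3_recovery.copy_object(
        Bucket=RECOVERY_BUCKET,
        Key=key,
        CopySource={"Bucket": SOURCE_BUCKET, "Key": key},
    )
```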

Conclusion

This post described how to take resilience into account when building generative AI solutions. Although generative AI applications have some interesting nuances, the existing resilience patterns and best practices still apply. It’s just a matter of evaluating each part of a generative AI application and applying the relevant best practices.

For more information about generative AI and using it with AWS services, refer to the following resources:


About the Authors

Jennifer Moran is an AWS Senior Resiliency Specialist Solutions Architect based out of New York City. She has a diverse background, having worked in many technical disciplines, including software development, agile leadership, and DevOps, and is an advocate for women in tech. She enjoys helping customers design resilient solutions to improve resilience posture and publicly speaks about all topics related to resilience.

Randy DeFauw is a Senior Principal Solutions Architect at AWS. He holds an MSEE from the University of Michigan, where he worked on computer vision for autonomous vehicles. He also holds an MBA from Colorado State University. Randy has held a variety of positions in the technology space, ranging from software engineering to product management. He entered the big data space in 2013 and continues to explore that area. He is actively working on projects in the ML space and has presented at numerous conferences, including Strata and GlueCon.

Read More

Analyze security findings faster with no-code data preparation using generative AI and Amazon SageMaker Canvas

Data is the foundation for capturing the maximum value from AI technology and solving business problems quickly. To unlock the potential of generative AI technologies, however, there’s a key prerequisite: your data needs to be appropriately prepared. In this post, we describe how to use generative AI to update and scale your data pipeline using Amazon SageMaker Canvas for data prep.

Typically, data pipeline work requires specialized skills to prepare and organize data before security analysts can extract value from it, which adds risk and increases time to value. With SageMaker Canvas, security analysts can effortlessly and securely access leading foundation models to prepare their data faster and remediate cybersecurity risks.

Data prep involves careful formatting and thoughtful contextualization, working backward from the customer problem. Now with the SageMaker Canvas chat for data prep capability, analysts with domain knowledge can quickly prepare, organize, and extract value from data using a chat-based experience.

Solution overview

Generative AI is revolutionizing the security domain by providing personalized and natural language experiences, enhancing risk identification and remediations, while boosting business productivity. For this use case, we use SageMaker Canvas, Amazon SageMaker Data Wrangler, Amazon Security Lake, and Amazon Simple Storage Service (Amazon S3). Amazon Security Lake allows you to aggregate and normalize security data for analysis to gain a better understanding of security across your organization. Amazon S3 enables you to store and retrieve any amount of data at any time or place. It offers industry-leading scalability, data availability, security, and performance.

SageMaker Canvas now supports comprehensive data preparation capabilities powered by SageMaker Data Wrangler. With this integration, SageMaker Canvas provides an end-to-end no-code workspace to prepare data, build, and use machine learning (ML) and Amazon Bedrock foundation models to accelerate the time from data to business insights. You can now discover and aggregate data from over 50 data sources and explore and prepare data using over 300 built-in analyses and transformations in the SageMaker Canvas visual interface. You’ll also see faster performance for transforms and analyses, and benefit from a natural language interface to explore and transform data for ML.

In this post, we demonstrate three key transformations: filtering, column renaming, and text extraction from a column on the security findings dataset. We also demonstrate using the chat for data prep feature in SageMaker Canvas to analyze the data and visualize your findings.

Prerequisites

Before starting, you need an AWS account. You also need to set up an Amazon SageMaker Studio domain. For instructions on setting up SageMaker Canvas, refer to Generate machine learning predictions without code.

Access the SageMaker Canvas chat interface

Complete the following steps to start using the SageMaker Canvas chat feature:

  1. On the SageMaker Canvas console, choose Data Wrangler.
  2. Under Datasets, choose Amazon S3 as your source and specify the security findings dataset from Amazon Security Lake.
  3. Choose your data flow and choose Chat for data prep, which will display a chat interface experience with guided prompts.

Filter data

For this post, we first want to filter for critical and high severity findings, so we enter instructions into the chat box to remove findings that are not critical or high severity. SageMaker Canvas removes those rows, displays a preview of the transformed data, and provides the option to use the generated code. We can then add it to the list of steps in the Steps pane.

Rename columns

Next, we want to rename two columns, so we enter a prompt in the chat box to rename the desc and title columns to Finding and Remediation. SageMaker Canvas generates a preview, and if you’re happy with the results, you can add the transformed data to the data flow steps.

Extract text

To determine the source Regions of the findings, you can enter instructions in the chat to extract the Region text from the UID column based on the pattern arn:aws:security:securityhub:region:* and create a new column called Region. SageMaker Canvas then generates code to create the new Region column. The data preview shows the findings originate from one Region: us-west-2. You can add this transformation to the data flow for downstream analysis.
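
Chat for data prep produces a code step you can review before adding it to the flow. The generated code will differ, but a pandas sketch of the kind of transform it creates might look like the following (column names match the example above; the regex pattern is illustrative).

```python
import pandas as pd


def add_region_column(df: pd.DataFrame) -> pd.DataFrame:
    """Extract the Region segment from Security Hub ARNs in the UID column,
    e.g. arn:aws:security:securityhub:us-west-2:..., into a new Region column."""
    df = df.copy()
    df["Region"] = df["UID"].str.extract(
        r"arn:aws:security:securityhub:([a-z0-9-]+):", expand=False
    )
    return df
```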

Analyze the data

Finally, we want to analyze the data to determine if there is a correlation between time of day and number of critical findings. You can enter a request to summarize critical findings by time of day into the chat, and SageMaker Canvas returns insights that are useful for your investigation and analysis.

Visualize findings

Next, we visualize the findings by severity over time to include in a leadership report. You can ask SageMaker Canvas to generate a bar chart of severity compared to time of day. In seconds, SageMaker Canvas has created the chart grouped by severity. You can add this visualization to the analysis in the data flow and download it for your report. The data shows the findings originate from one Region and happen at specific times. This gives us confidence on where to focus our security findings investigation to determine root causes and corrective actions.

Clean up

To avoid incurring unintended charges, complete the following steps to clean up your resources:

  1. Empty the S3 bucket you used as a source.
  2. Log out of SageMaker Canvas.

Conclusion

In this post, we showed you how to use SageMaker Canvas as an end-to-end no-code workspace for data preparation to build and use Amazon Bedrock foundation models to accelerate time to gather business insights from data.

Note that this approach is not limited to security findings; you can apply this to any generative AI use case that uses data preparation at its core.

The future belongs to businesses that can effectively harness the power of generative AI and large language models. But to do so, we must first develop a solid data strategy and understand the art of data preparation. By using generative AI to structure our data intelligently, and working backward from the customer, we can solve business problems faster. With SageMaker Canvas chat for data preparation, it’s effortless for analysts to get started and capture immediate value from AI.


About the Authors

Sudeesh Sasidharan is a Senior Solutions Architect at AWS, within the Energy team. Sudeesh loves experimenting with new technologies and building innovative solutions that solve complex business challenges. When he is not designing solutions or tinkering with the latest technologies, you can find him on the tennis court working on his backhand.

John Klacynski is a Principal Customer Solution Manager within the AWS Independent Software Vendor (ISV) team. In this role, he programmatically helps ISV customers adopt AWS technologies and services to reach their business goals more quickly. Prior to joining AWS, John led Data Product Teams for large Consumer Package Goods companies, helping them leverage data insights to improve their operations and decision making.

Read More

GeForce NOW Leaps Into Its Fourth Year With 27 New Games and More Celebrations All Month Long

GeForce NOW is celebrating its fourth anniversary all month — plus an extra day for leap year — during February’s GFN Thursdays, with 2 new games joining the cloud. Keep an eye out for more new games and other announcements for members to come.

Diablo IV and Overwatch 2 on GeForce NOW
The cloud is getting hotter.

Diablo IV and Overwatch 2 heat up the cloud this GFN Thursday to kick off the anniversary celebrations. They’re the next Activision and Blizzard games to join GeForce NOW as part of the NVIDIA and Microsoft partnership, following Call of Duty.

The titles join the RAGE franchise as part of five new games joining the ever-expanding GeForce NOW library this week.

New GeForce NOW IG channel
Did it for the ‘Gram.

Follow the #4YearsofGFN celebration and get the latest cloud gaming news on the newly launched GeForce NOW Instagram and Threads channels. Use #GFNShare and tag these accounts for a chance to be featured.

 

Oh, Heck

Diablo 4 on GeForce NOW
Don’t be a chicken.

Join the fight to protect the world of Sanctuary in the highly acclaimed online action role-playing game Diablo IV with the power of a GeForce RTX 4080 gaming rig in the cloud. Members can look forward to playing the Steam version, with support for the Battle.net launcher coming soon.

The battle between the High Heavens and Burning Hells rages on as chaos threatens to consume Sanctuary. Create customized characters and choose between five classes: Barbarian, Druid, Necromancer, Rogue and Sorcerer. Each features unique skills and talents to help in battle.

Jump into the dark, thrilling campaign solo or with friends. Explore a shared open world where players can form parties to take down World Bosses, or join the fray in player vs. player zones to test skills against others. Explore and loot over 120 monster-filled dungeons, and clear out Strongholds to use as safe havens for Sanctuary citizens.

Ultimate members can stand triumphant at up to 4K resolution and 120 frames per second with support for NVIDIA DLSS technology and experience the action even on low-powered devices.

Be a Hero

Overwatch 2 on GeForce NOW
Time to save the world again.

Team up and answer the call of heroes in Overwatch 2, streaming today on GeForce NOW. Members can look forward to playing the Steam version, with support for the Battle.net launcher coming soon.

Overwatch 2 features 30+ epic heroes, each with game-changing abilities. Battle across all-new maps and modes in this ultimate team-based game. Lead the charge, ambush enemies or aid allies as one of Overwatch’s distinct heroes. Join the battle across dozens of futuristic maps inspired by real-world locations, and master unique game modes in the always-on, ever-evolving, live game.

Ultimate members can join the action with support for 4K and ultra-low latency. It’s the perfect upgrade for teammates on low-powered devices, ensuring everyone’s playing at heroic quality.

Rage Out

Rage 2 on GeForce NOW
No rules, no limits, no mercy!

Get ready for action with this week’s five new additions, including the RAGE series:

And check out what’s coming throughout the rest of the month:

  • Stormgate (Demo on Steam, available Feb. 5-12 during Steam Next Fest)
  • The Inquisitor (New release on Steam, Feb. 8)
  • Banishers: Ghosts of New Eden (New release on Steam, Feb. 13)
  • Solium Infernum (New release on Steam, Feb. 14)
  • Skull and Bones (New release on Ubisoft, Feb. 16)
  • The Thaumaturge (New release on Steam, Feb. 20)
  • Myth of Empires (New re-release on Steam, Feb. 21)
  • Terminator: Dark Fate – Defiance (New release on Steam, Feb. 21)
  • Nightingale (New release on Steam, Feb. 22)
  • Garden Life: A Cozy Simulator (New release on Steam, Feb. 22)
  • Pacific Drive (New release on Steam, Feb. 22)
  • STAR WARS: Dark Forces Remaster (New release on Steam, Feb. 28)
  • Welcome to ParadiZe (New release on Steam, Feb. 29)
  • Aragami 2 (Xbox, available on Microsoft Store)
  • dotAGE (Steam)
  • Fort Solis (Steam)
  • Katamari Damacy REROLL (Steam)
  • Klonoa Phantasy Reverie Series (Steam)
  • PAC-MAN MUSEUM+ (Steam)
  • PAC-MAN WORLD Re-PAC (Steam)
  • Tales of Arise (Steam)
  • Tram Simulator Urban Transit (Steam)
  • The Walking Dead: The Telltale Definitive Series (Steam)

Joyful January

In addition to the 20 games announced last month, 14 more joined the GeForce NOW library:

  • Those Who Remain (New release on Xbox, available on PC Game Pass, Jan. 16)
  • New Cycle (New release on Steam, Jan. 18)
  • Assassin’s Creed: Valhalla (Xbox, available for PC Game Pass)
  • Beacon Pines (Xbox, available on the Microsoft Store)
  • Exoprimal (Steam)
  • FAR: Changing Tides (Xbox, available on the Microsoft Store)
  • Going Under (Xbox, available on the Microsoft Store)
  • Metal: Hellsinger (Xbox, available on the Microsoft Store)
  • Road 96: Mile 0 (Xbox, available on the Microsoft Store)
  • The Talos Principle 2 (Epic Games Store)
  • TUNIC (Xbox, available for PC Game Pass)
  • Turbo Golf Racing (Xbox, available for PC Game Pass)
  • Turnip Boy Commits Tax Evasion (Xbox, available on the Microsoft Store)
  • Zombie Army 4: Dead War (Xbox, available for PC Game Pass)

Discussions with Spike Chunsoft have confirmed that the publisher’s games will remain on GeForce NOW. Members can continue to stream these titles from the cloud:

  • 428 Shibuya Scramble
  • AI: The Somnium Files
  • Conception PLUS: Maidens of the Twelve Stars
  • Danganronpa: Trigger Happy Havoc
  • Danganronpa 2: Goodbye Despair
  • Danganronpa V3: Killing Harmony
  • Danganronpa Another Episode: Ultra Despair Girls
  • Fire Pro Wrestling World
  • Re: ZERO – Starting Life in Another World – The Prophecy of the Throne
  • RESEARCH and DESTROY
  • Shiren the Wanderer: The Tower of Fortune and the Dice of Fate
  • STEINS;GATE
  • Zanki Zero: Last Beginning
  • Zero Escape: The Nonary Games

What are you looking forward to streaming? Let us know on X or in the comments below.

Read More

What’s Your Story: Ivan Tashev

photo of Ivan Tashev with the Microsoft Research Podcast logo

In the Microsoft Research Podcast series What’s Your Story, Johannes Gehrke explores the who behind the technical and scientific advancements helping to reshape the world. A systems expert whose 10 years with Microsoft spans research and product, Gehrke talks to members of the company’s research community about what motivates their work and how they got where they are today.

Partner Software Architect Ivan Tashev’s expertise in audio signal processing has contributed to the design and study of audio components for Microsoft products such as Kinect, Teams, and HoloLens. In this episode, Tashev discusses how a first-place finish in the Mathematical Olympiad fueled a lifelong passion for shooting film; how a company event showcasing cutting-edge projects precipitated his move from product back to research; and how laser focus on things within his control has helped him find success in 25-plus years with Microsoft.

photos of Ivan Tashev throughout his life

Transcript

[TEASER] [MUSIC PLAYS UNDER DIALOGUE]

IVAN TASHEV: To succeed in Microsoft, you have to be laser focused on what you are doing. This is the thing you can change. Focus on the problems you have to solve, do your job, and be very good at it. Those are the most important rules I have used in my career in Microsoft.

[TEASER ENDS]

JOHANNES GEHRKE: Microsoft Research works at the cutting edge. But how much do we know about the people behind the science and technology that we create? This is What’s Your Story, and I’m Johannes Gehrke. In my 10 years with Microsoft, across product and research, I’ve been continuously excited and inspired by the people I work with, and I’m curious about how they became the talented and passionate people they are today. So I sat down with some of them. Now, I’m sharing their stories with you. In this podcast series, you’ll hear from them about how they grew up, the critical choices that shaped their lives, and their advice to others looking to carve a similar path.

[MUSIC FADES]

In this episode, I’m talking with Partner Software Architect Ivan Tashev in the anechoic chamber in Building 99 on our Redmond, Washington, campus. Constructed of concrete, rubber, and sound-absorbing panels, making it impervious to outside noise, this chamber has played a significant role in Ivan’s 25 years with Microsoft.

He’s put his expertise in audio processing to work in the space, helping to design and study the audio components of such products as Kinect, Teams, and HoloLens. Here’s my conversation with Ivan, beginning with his childhood in Bulgaria, where he was raised by two history teachers.


IVAN TASHEV: So I’m born in a city called Yambol in Bulgaria, my origin country. The city [was] created 2,000 years B.C. and now sits on the two shores of the river called Tundzha. It always has been an important transportation and agricultural center in the entire region, and I grew up there in a family of two lecturers. My parents were teaching history. And they loved to travel. So everywhere I go, I had two excellent tourist guides with me: “This in this place happened at this and this and this in this year.”

GEHRKE: Were there quizzes afterwards? [LAUGHTER]

TASHEV: But it happened that I was more fond to engineering, technology, math, all of the devices. It just … mechanical things just fascinated me. When I read in a book about the parachutes, I decided that I will have to try this and jump into it from the second floor of a building with an umbrella to see how much it will slow me down. It didn’t.

GEHRKE: And how … did you get hurt?

TASHEV: Oh, I ended with a twisted ankle for quite a while.

GEHRKE: Oh, OK, good. Nothing more … worse. [LAUGHTER] So you were always hands on, that’s what you’re telling me, right? Always the experimenter?

TASHEV: Yep. So I was doing a lot of this stuff, but also I was very strong in math. It happened that I had good teachers in math, and going to those competitions of mathematical Olympiads was something I started since fifth grade. Pretty much every year, they were well organized on school, city, regional level, and I remember how in my sixth grade, I won the first place of a regional Olympiad, and the prize was an 8mm movie camera. That, I would say, changed my life. This is my hobby since then. I have been holding this, a movie camera of several generations, everywhere I go and travel. In Moscow, in Kyiv, in Venice. Everywhere my parents were traveling, I was shooting 8mm films, and I continue this till today. Today, I have much better equipment but also very powerful computers to do the processing. I produce three to five Blu-ray Discs pretty much every year. Performances of the choir or the dancing groups in the Bulgarian Cultural and Heritage Center of Seattle mostly.

GEHRKE: Wow, that’s fascinating. And was that hobby somehow connected to your, you know, entry into, you know, science and then actually doing a PhD and then actually going and, you know, going into audio, audio processing?

TASHEV: The mathematical high school I attended in my … in the city where I’m born was one of the fifth … one of the five strongest in the country, which means first, math every day, two days, twice; physics every day. Around ninth grade, at the end, we finished the entire high school curriculum and started to study differentials and integrals, something which is more towards the university math courses. But this means that I had no problems entering any of the, of the universities with mathematical exams. I didn’t even have to do that because I qualified in one year, my 11th grade, to become member of the Bulgarian national teams in … for the International Math Olympia and for International Physics Olympia. And they actually coincided, so I had to choose one, and I chose physics. And since then, I’m actually saying that math is the language of physics; physics is the language of engineering. And that kind of showed the tendency … so literally, I was 11th grade and I could literally point and choose any of the universities, and I decided to go and study electronic engineering in the Technical University of Sofia.

GEHRKE: And then how did you end up in the US?

TASHEV: So that’s another interesting story. I defended my … graduated from the university, defended my PhD thesis. It was something amazing.

GEHRKE: What was it on, actually?

TASHEV: It was a control system for a telescope. But not just for observation of celestial objects but for tracking and ranging the distance to a satellite. It’s literally one measurement. You shoot with the laser; it goes to the satellite, which is 60 centimeters in diameter; it returns back; and you measure the time with accuracy of 100 picoseconds. And this was part of studying how the Earth rotates, how the satellites move. The data … there were around 44 stations like this in the entire Earth, and the data were public and used by NASA for finalizing the models for the satellites, which later all became GPS; used by Russians to finalize the models for their GLONASS system; used by people who studied the precession and the rotation of the Earth. A lot of interesting PhD theses came from the data from the results of this device, including tides. For example, I found that Balkan Peninsula moves up and down 2 meters every day because of the tides. So the Earth is liquid inside, and there are tides under us in the same way as with the oceans.

GEHRKE: Oh, wow, super interesting. I actually just wanted to come back … so just to get the right kind of comparison for the, for the unit, and so picoseconds, right? Because I know what a nanosecond is because …

TASHEV: Nanoseconds is 10 to the minus ninth; picoseconds is 10 to the minus 12th.

GEHRKE: OK. Good, good. Just to put that in perspective.

TASHEV: Thank you, Johannes. To, to be exact. So this was the, the accuracy. The light goes 30 centimeters for that time. For one nanosecond. And we needed to go way shorter than that. But why this project was so fascinating for me … can you imagine this is 1988—people having Apple II or compatible computers playing with the joystick a very famous game when you have the crosshair in the space and you shoot with laser the satellites.

GEHRKE: [LAUGHS] Absolutely.

TASHEV: And I was sitting behind the ocular and moving a joystick and shooting at real satellites. [LAUGHS]

GEHRKE: Not with the goal to destroy them, of course.

TASHEV: No. The energy of the laser was one joule. You can put your hand in front. But very short and one nanosecond. So it can go and enter and you have the resolution to measure the distance.

GEHRKE: Super, super exciting.

TASHEV: And after that, I became assistant professor in the Technical University of Sofia. How I came to Microsoft is a consequence of that. So I was teaching data and signal processing, and the changes in Europe already started. Think about 1996. And then a friend of mine came back from a scientific institution from the former Eastern Germany, and he basically shared how much money West Germany has poured into the East German economy to change it, to bring it up to the standards, and that … it was, I think, 900 billion Deutsche Marks.

GEHRKE: But this was after the … 

TASHEV: After the changes. After, after basically the East and West Germany united. And then this was in the first nine years of the changes. And then we looked at each other in the eyes and said, wait a minute. If you model this as a first-order system, this is the time constant, and the process will finish after two times more of the time constant, and they will need another 900 billion Marks. You cannot imagine how exact became that prediction when East Germany will be on equal economically to the West Germany. But then we looked at each other’s eyes and said, what about Bulgaria? We don’t have West Bulgaria. And then this started to make me think that most probably there will be technical universal software, but in this economical crisis, there will be no money for research, none for development, for building skills, for going to conferences. And then pretty much around the same time, somebody said, hey, you know, Microsoft is coming here to hire. And I sent my résumé knowing that, OK, I’m an assistant professor. I can program. But that actually happened that I can program quite well, implementing all of those control systems for the telescope, etc., etc., and literally …

GEHRKE: And so there was a programming testing as part of the interview?

TASHEV: Oh, the interview questions were three or four people, one hour, asking programming questions. The opening was for a software engineer.

GEHRKE: Like on a whiteboard?

TASHEV: Like on a whiteboard. And then I got an email saying that, Ivan, we liked your performance. We want to bring you to Redmond for further interviews. I flew here in 1997. After the interviews, I returned to my hotel, and the offer was waiting for me on the reception.

GEHRKE: Wow, that’s fast.

TASHEV: So this is how we decided to move here in Redmond, and I started and went through two full shipping cycles of programs.

GEHRKE: So you didn’t start out in MSR (Microsoft Research), right?

TASHEV: Nope.

GEHRKE: Where were you first?

TASHEV: So, actually, I was lucky enough both products were version 1.0. One of them was COM+. This is the transactional server and the COM technology, which is the backbone of Windows.

GEHRKE: Was the component model being used at that point in time?

TASHEV: Component object model. Basically, creating an object, calling … getting the interface, and calling the methods there. And my experience with low-level programming on assembly language and microprocessor actually became here very handy. We shipped this as a part of Windows 2000. And the second product was the Microsoft Application Center 2000, which was, OK, cluster management system.

GEHRKE: But both of them had nothing to do with signal processing, right?

TASHEV: Nope. Except there were some load balancing in Application Center. But they had nothing to do with signal processing; just pure programming skills.

GEHRKE: Right.

TASHEV: And then in the year of 2000, there was the first TechFest, and I went to see it and said, wait a minute. There are PhDs in this company and they’re doing this amazing research? My place is here.

GEHRKE: And TechFest, maybe … do you want to explain briefly what TechFest is?

TASHEV: TechFest is an annual event when researchers from Microsoft Research go and show and demonstrate technologies they have created.

GEHRKE: So it used to be, like, in the Microsoft Conference Center.

TASHEV: It used to be in the Microsoft Conference Center …

GEHRKE: Like, a really big two-day event …

TASHEV: … and basically visited by 6, 7,000 Microsoft employees. And usually, Microsoft Research, all of the branches were showing around 150ish demos, and it was amazing. And that was the first such event. Pretty much …

GEHRKE: Oh, the very first time?

TASHEV: The very first TechFest. And pretty much not only me, but the rest of Microsoft Corporation learning that we do have a research organization. In short, in three months, I started in Microsoft Research.

GEHRKE: How did you get a job here then? How did that happen?

TASHEV: So … seriously, visiting TechFest made me to think seriously that I should return back to research, and I opened the career website with potential openings, and there were two suitable for me. One of them was in Rico Malvar’s Signal Processing Group …

GEHRKE: Oh, OK, yeah …

TASHEV: … and the other was in Communication, Collaboration, and Multimedia Group led by Anoop Gupta. So I sent my résumé to both of them. Anoop replied in 15 minutes; next week, I was on informational with him. When Rico replied, I already had an offer from Anoop to join the team. [LAUGHS]

GEHRKE: Got it. And that’s, that’s where your focus on communication came from then?

TASHEV: Yes. So our first project was RingCam.

GEHRKE: OK.

TASHEV: So it’s a 360-camera, eight-element microphone array in the base, and the purpose was to record the meetings, to do a, a meeting diarization, to have a 360 view, but also, based on the signal processing and face detection, to have a speaker view, separate camera for the whiteboard, diarization based on who is speaking based on the direction from the microphone array. Honestly, even today when you read our 2002 paper … Ross Cutler was creator of the 360 camera; I was doing the microphone array. Even today when you read our 2002 paper, you say, wow, that was something super exciting and super advanced.

GEHRKE: And that then you brought it all the way to shipping, right, and it became a Microsoft product?

TASHEV: So yes. At some point, it was actually monitored personally by Bill Gates, and at some point …

GEHRKE: So he was PMing it, basically, or …? [LAUGHS]

TASHEV: He basically was …

GEHRKE: He was just aware of it.

TASHEV: I personally installed the distributed meeting system in Bill Gates’ conference room.

GEHRKE: Wow.

TASHEV: We do have basically 360 images with Bill Gates attending a meeting. But anyway, it was believed that this is something important, and a product team was formed to make it a product. Ross Cutler left Microsoft Research and became architect of that team, and this is what became Microsoft RoundTable device. It was licensed to Polycom, and for many years was sold as Polycom [CX5000].

GEHRKE: Yeah, actually, I remember when I was in many meetings, they used to have exactly that device in the middle, and the nice thing was that even if somebody was remote, right, you could see all the people around the table and you got this, sort of, really nice view of who was next to whom and not sort of the transactional windows that you have right now in Teams. That’s a really interesting view.

TASHEV: So, as you can see, very exciting start. [LAUGHS] But then Anoop went and became Bill Gates’ technical assistant, and the signal processing people from his team were merged with Rico Malvar’s signal processing team, and this is how I continued to work on microphone arrays and the speech enhancement, and this is what I do till today.

GEHRKE: And you mentioned, like, amazing products from Microsoft like Kinect and so on, right. And so you were involved in the, like, audio processing layer of all of those, and they were actually then … part of it was designed here in this room?

TASHEV: Yep.

GEHRKE: So tell me a little bit more about how that happened.

TASHEV: You know, at the time, I was fascinated by a problem which was considered theoretically impossible: multichannel acoustic echo cancellation. There was a paper written in 1998 by the inventor of the acoustic echo cancellation from Bell Labs stating that stereo acoustic echo cancellation is not possible.

GEHRKE: And he proved it, or what does it mean? He just …

TASHEV: It’s very simple. You have two unknowns—the two impulse responses from the left and the right loudspeaker—and one equation; that’s the microphone signal. What I did was to circumvent this. When you start Kinect, you’ll hear some melodic signals, and this is the calibration. At least you know the relation between the two unknowns, and now you have one unknown, which is basically discovered using an adaptive filter, the classic acoustic echo cancellation. So technically, Kinect became the first device ever shipped with surround sound acoustic echo cancellation, the first device ever that could recognize human speech from 4 1/2 meters while the loudspeakers are blasting. And gamers are listening to very loud levels of their loudspeakers.

GEHRKE: So maybe just tell the audience a little bit, what does it mean to do acoustic echo cancellation? What is it actually good for, and what does it do?

TASHEV: So in general, speech enhancement is removing unwanted noises and sounds from the desired signal. Some of them we don’t know anything about, which is the surrounding noise. For some of them, we have a pretty good understanding. This is the sound from our own loudspeakers. So you send the signal to the loudspeakers and then try to estimate on the fly how much of it is captured by the microphone and subtract this estimation, and this is called acoustic echo cancellation. This is part of every single speakerphone. This is one of the oldest applications of the adaptive filtering.

GEHRKE: So would the right way to think about this is that noise cancellation is cancelling unwanted noise from the outside?

TASHEV: Unknown noises …

GEHRKE: … whereas acoustic echo cancellation is cancelling the own noise that actually comes …

TASHEV: … which we know about.

GEHRKE: Right, OK.

TASHEV: And that was an amazing work, but … it also started actually in TechFest. I designed this surround sound echo cancellation, and my target was … at the time, we had Windows Media Center. It was a device designed to stay in the media room and controlling all of those loudspeakers. And I made sure to bring all of the VPs of Windows and Windows Media Center, and then I noticed that I started repeatedly to see some faces which I didn’t invite—I didn’t know—but they came over and over and over. And after the meeting, after TechFest, a person called me and said, “Look, we are working on a thing which your technology fits very well,” and this is how I started to work for Kinect. And in the process of the work, I had to go and talk with industrial designers because of the design of the microphones, with electrical designers because of the circuitry and the requirements for identical microphone channels, and with the software team, which had to implement my algorithms, and this … actually, at some point, I had an office in their building and was literally embedded working with them day and night, especially at the end of the shipping cycle, of the shipping cycle when the device had to go out.

GEHRKE: And this was not a time when you could go, like, in the device and, you know, update software on the device or anything. The device would go out as is, right?

TASHEV: Actually, this was one of the first devices like that.

GEHRKE: Oh, it could?

TASHEV: Yup.

GEHRKE: Wow, I didn’t know that.

TASHEV: Already Kinects were manufactured. They are boxed; they are already distributed to the, to the stores. But there was a deadline when we had to provide the image when you connected Kinect to your Xbox and it has to be uploaded.

GEHRKE: But, no, I get that. But then once it was actually connected to the Xbox, you could still update the firmware on the …

TASHEV: Yes, yes.

GEHRKE: Oh, wow. That’s, that’s really cool. OK.

TASHEV: But it also has a deadline. So that was amazing. Literally left us, all of us, breathless. There are plenty of serious technological challenges to overcome. A lot of firsts as a technology is, basically, was brought to this device to make sure … and this is the audio. And next to us were the video people and the gaming people and the designers, and everybody was excited to be working like hell so we can basically bring this to the customers.

GEHRKE: Wow, that’s super exciting. I mean even just being involved in … I think that’s one of the really big things that is so much fun here at Microsoft, right, that you can get whatever you do in the hands of, you know, millions—if not hundreds of millions—of people, right. Coming, coming back to, you know, your work now in, in audio signal processing, and that whole field is also being revolutionized like many other fields right now with AI, right.

TASHEV: Absolutely.

GEHRKE: Photography, one of the other fields that you’re very passionate about, is also being revolutionized with AI, of course.

TASHEV: Also revolutionized.

GEHRKE: You know, in, in terms of changes that you’ve made in your career, how do you deal with such changes, and what were … you know, this is something where you have been an expert in a certain class of algorithms, and now suddenly it says there’s this completely new technology coming along, and we need to shift. How are you dealing with this? How did you deal with this, personally?

TASHEV: Let me put it in …

GEHRKE: In some sense, you’re becoming a little bit of a dinosaur in a little bit while …

TASHEV: Oh, not at all.

GEHRKE: That’s what I’m saying.

TASHEV: I wouldn’t be in research! [LAUGHS]

GEHRKE: Exactly. How did you overcome that?

TASHEV: So, first, each one of us was working and trying to produce better and better technology, and at the time, the signal processing, speech enhancement, most of the audio processing was based on statistical signal processing. You build statistical models, distributions, hidden Markov models, and get …

GEHRKE: Like speech recognition.

TASHEV: … certain improvements. Yep. And all of us started to sense that this set of tools we have started to saturate. And it was simple. We use the simple models we can derive. Let’s say speech is Gaussian distribution; noise is Gaussian distribution. You derive the suppression rule. But this is simplifying the reality. If you apply a more precise model of the speech signal distribution, then you cannot derive easily the suppression rule, for example, in the case of noise suppression. And it was literally hanging in the air that we have to find a way, a way to learn from data. And I have several papers, actually before the neural networks start to appear, that let’s get a big dataset and learn from the data this suppression rule.

GEHRKE: So a more data-driven approach already.

TASHEV: Data-driven approach. I have several papers from that, and by the way, they were not quite well accepted by my audio processing community. All of them are published on bordering conferences, not in the core conferences. I got those papers rejected. But then appeared neural networks. Not that they were something new. We had neural networks in ’80s, and they didn’t work well. The new … the miracle was that now we had an algorithm which allows us to train them. Literally, the next year after the work of Geoff Hinton on the implementation of deep learning was published, several things happened. At first, my colleagues in the speech research group started to do neural network–based speech recognition, and I, in my audio group, started to do neural network–based speech enhancement. This is the year of 2013 or 2014. We had the speech, neural network–based speech enhancement algorithm surpassing the existing statistical signal processing algorithm literally instantly. It was big. It was heavy. But better.

GEHRKE: When did the first of these ship? What … can you tell any interesting ship stories about this?

TASHEV: The first neural network–based speech enhancement algorithm was shipped in 2020 in Teams.

GEHRKE: OK, OK.

TASHEV: We had to work with that team for quite a while. Actually, four years took us to work with Teams to find … you see, here in the research, industrial research lab we have a little bit different perspective. It’s not just to make it work; it’s not just to make it a technology. That technology has to be shippable. It has to meet a lot of other requirements and limitations in memory and in CPU and in reliability. It’s one thing to publish a paper with very cool results with your limited dataset and completely different to throw this algorithm in the wild, where it can face everything. And this is why it cost us around four years before to ship the first prototype in Teams.

GEHRKE: That, that makes sense. And I think a lot of the infrastructure was also not there at that point in time early on, right, in terms of, you know, how do you upload a model to the client, even in terms of all the model profiling, you know, neural architecture search, quantization, and other tooling that now exists where you can take a model …

TASHEV: That’s correct.

GEHRKE: … and squeeze it on the right kind of computation for the …

TASHEV: That’s correct. And …

GEHRKE: So you did all of that manually, I guess, at that point in time.

TASHEV: Initially, yes. But new architectures arrived. The cloud. Wow, it was a savior. You can press a button; you can get a hundred or thousand machines. You can run in parallel multiple architectures. You can really select the optimal from every single standpoint. Actually, what we did is we ended up with a set of speech enhancement algorithms. Given computing power, we can tell you what is the best architecture for this, or if you want to hit up this improvement, I can tell you how much CPU you will need for that.

GEHRKE: Got it.

TASHEV: But that tradeoff is also something very typical for industrial research lab and not very well understood in academia.

GEHRKE: Makes sense. Let me, let me switch gears one last time, namely, I mean, you have made quite a few changes in your career, you know, throughout, right. You started as an assistant professor and then became, sort of, a core developer, then, you know, were a member of a signal processing group and now you’re, sort of, driving a lot of the audio processing research for the company. How do you deal with this change, and do you have any advice for our listeners on how to, you know, keep your career going, especially as the rate of change seems to be accelerating all the time?

TASHEV: So for 25 years in Microsoft Corporation, I have learned several rules I follow. The first is dealing with ambiguity. It is not just change in the technology but changes in the … of the teams and organizations, etc., etc. Simply put, there are things you cannot change. There are things you cannot hide. Just accept them and go on. And here comes the second rule. To succeed in Microsoft, you have to be laser focused on what you are doing. This is the thing you can change. Focus on the problems you have to solve, do your job, and be very good at it. This is the most important … those are the two most important rules I have used in my career in Microsoft.

GEHRKE: OK, super, super interesting, Ivan. Thank you very much for this amazing conversation.

TASHEV: Thank you for the invitation, Johannes.

GEHRKE: To learn more about Ivan’s work or to see photos of Ivan pursuing his passion for shooting film and video, visit aka.ms/ResearcherStories.
